Stateful and Stateless Crawling

This page only contains notes

Key message:

  • The majority of web measurement studies use stateless crawls, because every observed event can be attributed to the single website being visited. Stateless crawls also do not depend on the crawling order and are easier to parallelize.
  • Stateful crawling is, however, more representative of real users, who rarely clear their browser state.
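The contrast between the two modes can be sketched with a toy simulation. This is only an illustration under stated assumptions, not a real crawler: `BrowserProfile`, `visit`, the `SITES` mapping, and all `*.example` domains are hypothetical, and browser state is reduced to a dictionary of third-party cookies.

```python
from dataclasses import dataclass, field


@dataclass
class BrowserProfile:
    """Hypothetical stand-in for browser state; only cookies are modelled."""
    cookies: dict = field(default_factory=dict)


def visit(profile, site, third_parties):
    """Simulate one page load.

    Returns (site, third_party, cookie_already_present) tuples and then
    stores a cookie for every third party contacted during the visit.
    """
    observations = []
    for tp in third_parties:
        observations.append((site, tp, tp in profile.cookies))
        profile.cookies.setdefault(tp, f"id-set-on-{site}")
    return observations


# Assumed toy input: site -> third parties embedded on that site.
SITES = {
    "news.example": ["tracker.example"],
    "shop.example": ["tracker.example", "ads.example"],
}


def stateless_crawl(sites):
    """Fresh profile per site: visits are independent of each other."""
    results = []
    for site, third_parties in sites.items():
        results.extend(visit(BrowserProfile(), site, third_parties))
    return results


def stateful_crawl(sites):
    """One profile for the whole crawl: earlier visits affect later ones."""
    profile = BrowserProfile()
    results = []
    for site, third_parties in sites.items():
        results.extend(visit(profile, site, third_parties))
    return results


if __name__ == "__main__":
    for name, crawl in (("stateless", stateless_crawl), ("stateful", stateful_crawl)):
        print(name)
        for site, tp, seen in crawl(SITES):
            print(f"  {site}: {tp} cookie already present = {seen}")
```

In the stateless run every third party is seen without a prior cookie, so each observation belongs to exactly one site and the visit order is irrelevant. In the stateful run, `tracker.example` already holds a cookie when `shop.example` is loaded, which is closer to what a returning user would trigger but makes the result depend on the crawl order.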

Relevant Literature

Since the majority of publications use stateless crawling, below we list examples of influential publications that do otherwise. However, not all of them specifically address the difference between stateful and stateless crawling.

Studies of Stateful Aspects

The following studies used stateless crawls but interpreted some stateful properties of the web:

Shallow vs Deep crawling

References

programming/stateful_stateless.1742307641.txt.gz · Last modified: 2025/03/18 14:20 by karelkubicek