====== Welcome to Measure The Web ======

Empirical studies on the web require researchers to navigate a complex landscape of experimental design choices, ranging from selecting a representative sample of websites to choosing the appropriate crawling technology. Similarly, analyzing results involves critical decisions, such as website categorization and statistical methodology. Too often, these decisions are made based on limited guidance, informal advice, or trial and error, despite their profound impact on research outcomes and their applicability.

Measure The Web aims to bridge this gap by providing evidence-based guidance for empirical web measurement studies. The platform evaluates design choices and their implications by referencing relevant academic publications or conducting original studies where necessary.

For novice researchers in the web measurement field, Measure The Web aims to be a complete knowledge base for conducting a study. It should also help experienced researchers, who can find current practices here, along with arguments (or citations) to justify their design choices.

If you are in a mentoring position, consider sharing the website (or the [[Study design checklist]] directly) with your mentees, and also consider reviewing and contributing to the pages. Even small edits, such as adding support for a subjective claim, can help make this website stronger.

===== Outline =====

The website is organized as follows. Note that this outline follows the website structure; for an ordering by research design, see [[Study design checklist]].

  * [[Design|Research design]] clarifies the choices needed to conduct measurements. Example pages:
    * [[Design:Automated measurements]] or [[Design:User studies]]
    * [[Design:Website selection]] and [[Design:Sampling|Representative sampling methods]], [[Design:Website classification]], and [[Design:IP classification]]
    * [[Design:Crawling location]] and [[Design:Archives|Crawling live or using archives]]
    * [[Design:Platforms|Research of specific large platforms]] such as [[Design:Platforms:Facebook]], [[Design:Platforms:Twitter]], [[Design:Platforms:TikTok]], [[Design:Platforms:Amazon]]
  * [[Programming]]
    * [[Programming:Crawler|Comparison of crawling libraries]] such as [[Programming:Crawler:OpenWPM]], [[Programming:Crawler:Tracker Radar Collector]], etc.
    * [[Programming:Multilingual support]]
    * [[Programming:Interaction|Interaction with websites]]
    * [[Programming:Traffic files|Working with traffic files (e.g., HAR)]]
    * A multitude of pages, linked from elsewhere, documenting specific technologies, e.g., [[Programming:CrUX]], [[Programming:Similarweb]].
  * [[Privacy]]
    * Classifying [[Privacy:Requests|Web requests]], [[Privacy:Cookies]], [[Privacy:Fingerprinting]], or [[Privacy:JavaScript]]
    * [[Privacy:consent|Granting consent to websites]]
  * [[Security]]
  * [[Statistics]]
    * [[Statistics:Study preregistration]]
    * [[Statistics:Hypothesis testing]] suitable for web measurements.
    * [[Statistics:Regression]]
    * [[Statistics:Pvalue corrections|P-value corrections]]
    * [[Statistics:Biases]]
  * [[Writing]]
    * [[Writing:Conferences]] suitable for web measurements in the field of security and privacy.
    * [[Writing:Literature review]]
  * [[Artifacts]] explains how to make your research reproducible.
  * [[Practices|Other research practices]] covers aspects related to research but not directly to study design.
    * [[Practices:Ethics]]
    * [[Practices:How to PhD]]
    * [[Practices:How to find academic jobs]]
    * [[Practices:Useful external resources]]

===== Contributing =====

We welcome contributions! Editing functionality is limited to registered users. You might want to check the [[Contributing]] page, which helps with the syntax.

/*
==== Contributors ====

Thanks to everyone who is helping with documenting the methodology of web measurements! We know that it might involve sharing some of your tricks; hopefully you also think that the community prospers better together. Here is a table of authors:
*/