design:website_selection

Differences

This shows you the differences between two versions of the page.

design:website_selection [2025/01/03 17:40] – Added discussion karelkubicek
design:website_selection [2025/02/19 10:13] (current) – highlighted todos karelkubicek
Line 24: Line 24:
 [[https://radar.cloudflare.com/domains|Cloudflare Radar]] provides rankings based on DNS query data and traffic on Cloudflare-operated websites. [[https://radar.cloudflare.com/domains|Cloudflare Radar]] provides rankings based on DNS query data and traffic on Cloudflare-operated websites.
  
-  * **Advantages**: TODO. +  * **Advantages**: <wrap todo>TODO</wrap> 
-  * **Limitations**: TODO.+  * **Limitations**: <wrap todo>TODO</wrap>
   * **More details, API**: [[Programming:Cloudflare_Radar]]   * **More details, API**: [[Programming:Cloudflare_Radar]]
  
Line 44: Line 44:
   * ''Quantcast'': Primarily focused on US traffic.   * ''Quantcast'': Primarily focused on US traffic.
   * ''[[https://en.wikipedia.org/wiki/Alexa_Internet|Alexa]]'': Discontinued as of August 1, 2023; previously widely used in research. Rankings were based on page visits from a user panel and tracking scripts, making them more reliable than DNS-based lists but highly volatile.   * ''[[https://en.wikipedia.org/wiki/Alexa_Internet|Alexa]]'': Discontinued as of August 1, 2023; previously widely used in research. Rankings were based on page visits from a user panel and tracking scripts, making them more reliable than DNS-based lists but highly volatile.
-  * ''[[https://www.domaintools.com/resources/blog/mirror-mirror-on-the-wall-whos-the-fairest-website-of-them-all|Farsight]]'': TODO.+  * ''[[https://www.domaintools.com/resources/blog/mirror-mirror-on-the-wall-whos-the-fairest-website-of-them-all|Farsight]]'': <wrap todo>TODO</wrap>
   * ''[[https://secrank.cn/|SecRank]]'': List based on Chinese DNS data, introduced in USENIX Security 2022 {[xie2022_building]}.   * ''[[https://secrank.cn/|SecRank]]'': List based on Chinese DNS data, introduced in USENIX Security 2022 {[xie2022_building]}.
  
 ===== Best Practices ===== ===== Best Practices =====
-Based on recent studies {[LePochat2019_tranco]} {[ruth2022_toppling]}, the following recommendations can enhance the representativeness and reliability of website selection:+Based on recent studies {[LePochat2019_tranco,ruth2022_toppling]}, the following recommendations can enhance the representativeness and reliability of website selection:
  
   - **Use Aggregated Data**: The majority of research does not require the popularity indices of individual websites, but rather aggregated ranks (e.g., top 10k). This mitigates the [[#Temporal Stability and Manipulations]] limitation.   - **Use Aggregated Data**: The majority of research does not require the popularity indices of individual websites, but rather aggregated ranks (e.g., top 10k). This mitigates the [[#Temporal Stability and Manipulations]] limitation.
Line 82: Line 82:
 Several publications have surveyed the usage of various ranking lists in academic research. The figures below illustrate findings from Scheitle et al. {[scheitle2018_long]} and Xie et al. {[xie2024_crawling]}, noting, however, that recent trends are not fully captured due to the lag in the research process. For instance, Alexa, though discontinued in 2023, is still used in 2024 publications due to sampling occurring at the start of studies. Also, the publication survey ends in 2022. Several publications have surveyed the usage of various ranking lists in academic research. The figures below illustrate findings from Scheitle et al. {[scheitle2018_long]} and Xie et al. {[xie2024_crawling]}, noting, however, that recent trends are not fully captured due to the lag in the research process. For instance, Alexa, though discontinued in 2023, is still used in 2024 publications due to sampling occurring at the start of studies. Also, the publication survey ends in 2022.
  
-<WRAP right 50% box>+<WRAP center 100% box>
 {{design:website_ranking_list_popularity_scheitle2018.png|Popularity of website lists in web measurement publications according to Scheitle et al.}} {{design:website_ranking_list_popularity_scheitle2018.png|Popularity of website lists in web measurement publications according to Scheitle et al.}}
 <div>Popularity of website lists in web measurement publications according to {[scheitle2018_long]}.</div> <div>Popularity of website lists in web measurement publications according to {[scheitle2018_long]}.</div>
 </WRAP> </WRAP>
-<WRAP right 50% box>+<WRAP center 100% box>
 {{design:website_ranking_list_popularity_xie2022.png|Popularity of website lists in web measurement publications according to Xie et al.}} {{design:website_ranking_list_popularity_xie2022.png|Popularity of website lists in web measurement publications according to Xie et al.}}
 <div>Popularity of website lists in web measurement publications according to {[xie2024_crawling]}.</div> <div>Popularity of website lists in web measurement publications according to {[xie2024_crawling]}.</div>
design/website_selection.1735926046.txt.gz · Last modified: 2025/01/03 17:40 by karelkubicek