privacy:cookies

Differences

This shows you the differences between two versions of the page.

privacy:cookies [2025/01/09 11:11] – [Categories] formating karelkubicekprivacy:cookies [2025/03/04 13:53] (current) – [First- and Third-Party Cookies] highlighting common misconception karelkubicek
Line 7: Line 7:
 ===== First- and Third-Party Cookies =====
  
-First-party cookies are set by the domain the user is directly visiting, while all other cookies are from third parties. A common misconception is that first-party cookies are always benign and third-party cookies are always intrusive. However, first-party cookies can also track users or even be set by third parties using CNAME cloaking. First-party cookies are restricted to the website's context, while third-party cookies can track users across multiple websites.
+<WRAP important>A common misconception in research is that first-party cookies are always benign and third-party cookies are always intrusive!</WRAP>
 + 
 +First-party cookies are set by the domain the user is directly visiting, while all other cookies are considered third-party cookies. Although third-party cookies are used for tracking significantly more often than first-party cookies, it is wrong to claim that first-party cookies are always benign and third-party cookies are always intrusive. First-party cookies can also track users or even be set by third parties using [[https://arxiv.org/abs/2102.09301|CNAME cloaking]], and many third-party cookies serve necessary functionality such as SSO.
 + 
 +The only difference is in how the browser handles them. First-party cookies are accessible only from the first-party website's context, while third-party cookies are accessible across multiple websites that embed the same third party. This behavior depends on the browser, however: [[https://webkit.org/blog/8943/privacy-preserving-ad-click-attribution-for-the-web/|Safari]] and [[https://blog.mozilla.org/en/products/firefox/firefox-rolls-out-total-cookie-protection-by-default-to-all-users-worldwide/|Firefox]] keep the storage for third parties separate for every website.
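
The browser-side difference described above can be illustrated with a toy model. This is a minimal sketch, not how any browser actually implements cookie storage: the ''CookieJar'' class, its ''partitioned'' flag, and the ''.example'' hosts are all hypothetical names introduced for illustration. It only shows that keying third-party storage by the visited site prevents the same third party from reading one identifier on two different websites.

```python
class CookieJar:
    """Toy cookie store (hypothetical, for illustration only).

    partitioned=False: third-party cookies are keyed by the cookie host alone,
    so the same third party sees the same cookies on every embedding site.
    partitioned=True: cookies are keyed by (top-level site, cookie host),
    so every visited website gets its own third-party storage.
    """

    def __init__(self, partitioned: bool):
        self.partitioned = partitioned
        self._store = {}

    def _key(self, top_level_site: str, cookie_host: str):
        # The top-level site only becomes part of the key when partitioning is on.
        if self.partitioned:
            return (top_level_site, cookie_host)
        return (None, cookie_host)

    def set(self, top_level_site: str, cookie_host: str, name: str, value: str) -> None:
        self._store.setdefault(self._key(top_level_site, cookie_host), {})[name] = value

    def get(self, top_level_site: str, cookie_host: str, name: str):
        return self._store.get(self._key(top_level_site, cookie_host), {}).get(name)


# The same (made-up) tracker is embedded on two different websites.
shared = CookieJar(partitioned=False)
shared.set("news.example", "tracker.example", "uid", "abc123")
print(shared.get("shop.example", "tracker.example", "uid"))    # abc123

isolated = CookieJar(partitioned=True)
isolated.set("news.example", "tracker.example", "uid", "abc123")
print(isolated.get("shop.example", "tracker.example", "uid"))  # None
```

In the unpartitioned jar the identifier set on one site is readable on the other, which is what enables cross-site tracking; in the partitioned jar it is not.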
  
 Munir et al. {[shaoor2023cookiegraph]} observed that 89.86% of the top-million websites use first-party tracking cookies. Of these, 96.61% are ghostwritten by third-party scripts embedded in the first-party context, and some are set by fingerprinting scripts.
Line 30: Line 34:
 Using datasets of cookies or online classification services has significant disadvantages: they cannot classify unseen data or assign one cookie multiple classes based on dynamic content. However, they offer advantages over ML methods, such as post-crawl classification of detected cookies.
  
-We discuss issues with dynamic cookie names, publicly released datasets, and two main online classification services: Cookiepedia and Cookiedatabase.
-
-==== Dynamic Cookie Names ====
+<WRAP info>
 Some websites deviate from the typical key-value (cookie name and cookie value) scheme by storing data directly in the cookie name. There are several cases, explained by the following examples:
 
   * ''_gat_UA-<ID>'' and ''_ga_<ID>'' (Google Analytics cookies), where the ID is unique to the Google Analytics configuration but not dynamic per user.
-  * ''AMCV_<ID>@<host>'' (Adobe Experience Cloud Identity Service cookie), where the ID is unique per user. Such cookie names cannot be found in databases due to their dynamic nature.
+  * ''AMCV_<ID>@<host>'' (Adobe Experience Cloud Identity Service cookie), where the ID is unique per user. Such cookie names cannot be found in databases due to their dynamic nature, except when the database stores patterns.
 +</WRAP> 
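
The note above on databases that store patterns can be sketched as follows. This is only an illustration under stated assumptions: the ''EXACT_NAMES'' and ''NAME_PATTERNS'' tables, the ''lookup_vendor'' helper, and the sample cookie names are hypothetical and do not reflect the schema of any real cookie database. The regular expressions simply encode the name shapes listed above.

```python
import re

# Hypothetical exact-name table: dynamic names can never be listed here.
EXACT_NAMES = {
    "_ga": "Google Analytics",
}

# Hypothetical pattern table encoding the name shapes from the list above.
NAME_PATTERNS = [
    (re.compile(r"^_gat_UA-.+$"), "Google Analytics"),
    (re.compile(r"^_ga_.+$"), "Google Analytics"),
    (re.compile(r"^AMCV_.+@.+$"), "Adobe Experience Cloud Identity Service"),
]


def lookup_vendor(cookie_name: str):
    """Return a vendor label for the cookie name, or None if nothing matches."""
    # Try the exact name first, then fall back to the stored patterns.
    if cookie_name in EXACT_NAMES:
        return EXACT_NAMES[cookie_name]
    for pattern, vendor in NAME_PATTERNS:
        if pattern.match(cookie_name):
            return vendor
    return None


# Made-up cookie names that follow the shapes described above.
for name in ["_ga_ABC123", "_gat_UA-12345-1", "AMCV_0123ABCD@host.example", "session_id"]:
    print(name, "->", lookup_vendor(name))
```

An exact-name lookup alone would miss the first three names; the pattern fallback is what lets a static database cover them.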
 + 
 +We discuss publicly released datasets and two main online classification services: Cookiepedia and Cookiedatabase. 
  
 ==== OneTrust and CookieBot Dataset ====
Line 81: Line 86:
   - Preferences Cookies (in some jurisdictions known as functionality)
  
 +==== Cookiesearch ====
 +
 +https://cookiesearch.org/ (no experience)
 ===== Machine-Learning Classification =====
 
 Using machine learning to classify cookies, rather than relying on static datasets, addresses the limitation of classifying unseen data. Research indicates that ML methods may even outperform human classification. However, practical deployment of ML-based approaches faces challenges similar to those in ML-based advertising blocking: they are prone to adversarial attacks, may disrupt website functionality, and can potentially be used for fingerprinting. These limitations, however, do not hinder the application of ML-based detection in research.
  
-==== CookieBlock ====
+==== CookieBlock Model ====
 
 In **Automating Cookie Consent and GDPR Violation Detection** {[bollinger2022automating]}, researchers developed an ML model to classify cookies according to the four ICC UK purposes. They scraped data from 30k websites using CMPs like OneTrust and CookieBot, collecting over 2 million cookies labeled by website operators.
Line 218: Line 226:
     * ''timestamp'': change timestamp
  
-==== CookieGraph ====
+==== CookieGraph Model ====
 
 **CookieGraph: Understanding and Detecting First-Party Tracking Cookies** {[shaoor2023cookiegraph]} extends CookieBlock to resist adversarial modifications by avoiding easily mutable features (e.g., name) and leveraging network graph features to capture cookie usage patterns. This approach requires even further instrumentation, available only in their custom crawler.
privacy/cookies.1736421084.txt.gz · Last modified: 2025/01/09 11:11 by karelkubicek