
design:website_classification

Differences

This shows you the differences between two versions of the page.

design:website_classification [2025/01/03 17:19]: [Adult Websites, Security, and Privacy Protection] added all links (karelkubicek)
design:website_classification [2025/02/19 10:11] (current): highlighted todos (karelkubicek)
Line 20 (old), Line 20 (new):
     - Limited granularity in labels, which may not suit detailed marketing or behavioral analysis.
     - Documentation includes deprecated categories, leading to potential misinterpretations.
-  TODO: URL, API
+  <wrap todo>TODO: URL, API</wrap>

 ==== FortiGuard ====
Line 30 (old), Line 30 (new):
     - Lower label granularity may restrict its applicability outside security domains.
     - Limited documentation transparency in certain sensitive categories.
-  TODO: URL, API
+  <wrap todo>TODO: URL, API</wrap>

 ==== Symantec ====
Line 39 (old), Line 39 (new):
     - Taxonomy is less diverse compared to marketing-oriented services.
     - Limited coverage for obscure or long-tail domains.
-  TODO: URL, API
+  <wrap todo>TODO: URL, API</wrap>

 ==== Trend Micro ====
Line 46 (old), Line 46 (new):
     - Labels aligned with threat intelligence, enhancing usability in cybersecurity contexts.
   * **Disadvantages**:
-    - (TODO: URL, API)
+    - <wrap todo>TODO: are there any?</wrap>
+  <wrap todo>TODO: URL, API</wrap>

 ==== Forcepoint ====
Line 55 (old), Line 56 (new):
     - Limited multi-labeling capabilities restrict nuanced classification.
     - Challenges in documenting clear and concise taxonomies.
-  TODO: URL, API
+  <wrap todo>TODO: URL, API</wrap>

 ==== Dr.Web ====
Line 64 (old), Line 65 (new):
     - Very low coverage.
     - Lack of nuanced or detailed labeling reduces utility in research or marketing.
-  TODO: URL, API
+  <wrap todo>TODO: URL, API</wrap>

 ===== Marketing and Content Discovery =====
Line 83 (old), Line 84 (new):
     - Precision and granularity can vary, sometimes complicating results.
     - Documentation and taxonomy definitions require improvement for research usability.
-  TODO: URL, API
+  <wrap todo>TODO: URL, API</wrap>

 ===== General Classification with Human Contributions =====
Line 94 (old), Line 95 (new):
     - Scalability issues due to reliance on human volunteers.
     - Low coverage and subjective biases in labeling.
-  TODO: URL, API
+  <wrap todo>TODO: URL, API</wrap>

 ==== DMOZ (Curlie) ====
Line 103 (old), Line 104 (new):
     - Extremely limited scalability due to a small number of editors.
     - Labels may be outdated due to infrequent updates for many categories.
-  TODO: URL, API
+  <wrap todo>TODO: URL, API</wrap>

 ===== Aggregated Services =====
Line 114 (old), Line 115 (new):
     - Inconsistencies due to integration of outdated or non-standardized data.
     - Lack of direct control over taxonomies used by aggregated providers.
-  TODO: URL, API
+  <wrap todo>TODO: URL, API</wrap>

 ===== Company Datasets =====
Line 133 (old), Line 134 (new):
     - Based on LinkedIn profiles that are self-reported - prone to adversarial data.
     - Only a subset of PeopleDataLabs' full dataset (22M/70M rows, 10/78 columns).
-  TODO: cite ''Machine Learning Compliance Analysis for Email Regulation'' when it is public.
+  <wrap todo>TODO: cite ''Machine Learning Compliance Analysis for Email Regulation'' when it is public.</wrap>

 ==== Crunchbase ====
Line 144 (old), Line 145 (new):
     - URLs are extremely noisy (they are not the priority) (Source: Karel Kubicek's experience).
     - Focuses mostly on variables useful for investments and market competitiveness.
-  TODO: cite ''Machine Learning Compliance Analysis for Email Regulation'' when it is public.
+  <wrap todo>TODO: cite ''Machine Learning Compliance Analysis for Email Regulation'' when it is public.</wrap>

 ==== Orbis ====
Line 235 (old), Line 236 (new):


+~~DISCUSSION~~
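The new revision's edit summary, "highlighted todos", describes a mostly mechanical change: in most hunks, a bare `TODO:` line becomes `<wrap todo>TODO: ...</wrap>`, which is presumably rendered by DokuWiki's Wrap plugin. As a minimal sketch, assuming Python 3.9+, the transformation can be reproduced with a regular expression and checked with the standard library's `difflib.unified_diff`. The helper `highlight_todos` and the sample lines are hypothetical, and DokuWiki builds its diff view with its own code rather than `difflib`, so the printed diff only approximates the view above.

```python
import difflib
import re

# Hypothetical pattern: a line that is only optional indentation followed by "TODO: ...".
BARE_TODO = re.compile(r"^(\s*)(TODO:.*?)\s*$")


def highlight_todos(lines: list[str]) -> list[str]:
    """Hypothetical helper: wrap bare TODO lines in DokuWiki <wrap todo> markup.

    Lines that do not start with "TODO:" after indentation (including lines
    that are already wrapped) are returned unchanged.
    """
    result = []
    for line in lines:
        match = BARE_TODO.match(line)
        if match:
            indent, text = match.groups()
            result.append(f"{indent}<wrap todo>{text}</wrap>")
        else:
            result.append(line)
    return result


# Sample page lines modeled on the Symantec hunk (illustrative only).
old = [
    "    - Limited coverage for obscure or long-tail domains.",
    "  TODO: URL, API",
    "",
    "==== Trend Micro ====",
]
new = highlight_todos(old)

# Print a unified diff of the two versions ("-" removed, "+" added, " " context).
for diff_line in difflib.unified_diff(old, new, "old", "new", lineterm=""):
    print(diff_line)
```

The pattern only matches lines that begin with `TODO:` after optional indentation, so already-wrapped lines are left alone and running it twice is harmless. It does not reproduce the Trend Micro hunk, where the inline `(TODO: URL, API)` list item was replaced by a different TODO text, nor the added `~~DISCUSSION~~` line.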
design/website_classification.1735924767.txt.gz · Last modified: 2025/01/03 17:19 by karelkubicek