design:website_classification
Differences
design:website_classification [2025/01/03 17:17] – [Company Datasets] fixed formatting karelkubicek | design:website_classification [2025/02/19 10:11] (current) – highlighted todos karelkubicek | ||
---|---|---|---|
Line 20: | Line 20: | ||
- Limited granularity in labels, which may not suit detailed marketing or behavioral analysis. | - Limited granularity in labels, which may not suit detailed marketing or behavioral analysis. | ||
- Documentation includes deprecated categories, leading to potential misinterpretations. | - Documentation includes deprecated categories, leading to potential misinterpretations. | ||
- | TODO: URL, API | + | |
==== FortiGuard ==== | ==== FortiGuard ==== | ||
Line 30: | Line 30: | ||
- Lower label granularity may restrict its applicability outside security domains. | - Lower label granularity may restrict its applicability outside security domains. | ||
- Limited documentation transparency in certain sensitive categories. | - Limited documentation transparency in certain sensitive categories. | ||
- | TODO: URL, API | + | |
==== Symantec ==== | ==== Symantec ==== | ||
Line 39: | Line 39: | ||
- Taxonomy is less diverse compared to marketing-oriented services. | - Taxonomy is less diverse compared to marketing-oriented services. | ||
- Limited coverage for obscure or long-tail domains. | - Limited coverage for obscure or long-tail domains. | ||
- | TODO: URL, API | + | |
==== Trend Micro ==== | ==== Trend Micro ==== | ||
Line 46: | Line 46: | ||
- Labels aligned with threat intelligence, | - Labels aligned with threat intelligence, | ||
* **Disadvantages**: | * **Disadvantages**: | ||
- | - (TODO: URL, API) | + | - <wrap todo> |
+ | <wrap todo>TODO: URL, API</ | ||
==== Forcepoint ==== | ==== Forcepoint ==== | ||
Line 55: | Line 56: | ||
- Limited multi-labeling capabilities restrict nuanced classification. | - Limited multi-labeling capabilities restrict nuanced classification. | ||
- Challenges in documenting clear and concise taxonomies. | - Challenges in documenting clear and concise taxonomies. | ||
- | TODO: URL, API | + | |
==== Dr.Web ==== | ==== Dr.Web ==== | ||
Line 64: | Line 65: | ||
- Very low coverage. | - Very low coverage. | ||
- Lack of nuanced or detailed labeling reduces utility in research or marketing. | - Lack of nuanced or detailed labeling reduces utility in research or marketing. | ||
- | TODO: URL, API | + | |
===== Marketing and Content Discovery ===== | ===== Marketing and Content Discovery ===== | ||
Line 83: | Line 84: | ||
- Precision and granularity can vary, sometimes complicating results. | - Precision and granularity can vary, sometimes complicating results. | ||
- Documentation and taxonomy definitions require improvement for research usability. | - Documentation and taxonomy definitions require improvement for research usability. | ||
- | TODO: URL, API | + | |
===== General Classification with Human Contributions ===== | ===== General Classification with Human Contributions ===== | ||
Line 94: | Line 95: | ||
- Scalability issues due to reliance on human volunteers. | - Scalability issues due to reliance on human volunteers. | ||
- Low coverage and subjective biases in labeling. | - Low coverage and subjective biases in labeling. | ||
- | TODO: URL, API | + | |
==== DMOZ (Curlie) ==== | ==== DMOZ (Curlie) ==== | ||
Line 103: | Line 104: | ||
- Extremely limited scalability due to a small number of editors. | - Extremely limited scalability due to a small number of editors. | ||
- Labels may be outdated due to infrequent updates for many categories. | - Labels may be outdated due to infrequent updates for many categories. | ||
- | TODO: URL, API | + | |
===== Aggregated Services ===== | ===== Aggregated Services ===== | ||
Line 114: | Line 115: | ||
- Inconsistencies due to integration of outdated or non-standardized data. | - Inconsistencies due to integration of outdated or non-standardized data. | ||
- Lack of direct control over taxonomies used by aggregated providers. | - Lack of direct control over taxonomies used by aggregated providers. | ||
- | TODO: URL, API | + | |
===== Company Datasets ===== | ===== Company Datasets ===== | ||
Line 133: | Line 134: | ||
- Based on LinkedIn profiles that are self-reported - prone to adversarial data. | - Based on LinkedIn profiles that are self-reported - prone to adversarial data. | ||
- Only a subset of PeopleDataLabs' | - Only a subset of PeopleDataLabs' | ||
- | TODO: cite '' | + | |
==== Crunchbase ==== | ==== Crunchbase ==== | ||
Line 144: | Line 145: | ||
- URLs are extremely noisy (they are not the priority) (Source: Karel Kubicek' | - URLs are extremely noisy (they are not the priority) (Source: Karel Kubicek' | ||
- Focuses mostly on variables useful for investments and market competitiveness. | - Focuses mostly on variables useful for investments and market competitiveness. | ||
- | TODO: cite '' | + | |
==== Orbis ==== | ==== Orbis ==== | ||
Line 184: | Line 185: | ||
* [[https:// | * [[https:// | ||
- | Visit individual privacy-oriented pages for more details regarding classification of [[Privacy: | + | Visit individual privacy-oriented pages for more details regarding classification of [[Privacy: |
==== Marketing Industry ==== | ==== Marketing Industry ==== | ||
Line 234: | Line 235: | ||
<bibtex bibliography></ | <bibtex bibliography></ | ||
- | ====== BibTex ====== | ||
- | <bibtex database> | ||
- | @inproceedings{vallina2020_misshapes, | ||
- | author = {Vallina, Pelayo and Le Pochat, Victor and Feal, \' | ||
- | title = {Mis-shapes, | ||
- | year = {2020}, | ||
- | isbn = {9781450381383}, | ||
- | publisher = {Association for Computing Machinery}, | ||
- | address = {New York, NY, USA}, | ||
- | url = {https:// | ||
- | doi = {10.1145/ | ||
- | abstract = {Domain classification services have applications in multiple areas, including cybersecurity, | ||
- | booktitle = {Proceedings of the ACM Internet Measurement Conference}, | ||
- | pages = {598–618}, | ||
- | numpages = {21}, | ||
- | location = {Virtual Event, USA}, | ||
- | series = {IMC '20} | ||
- | } | ||
- | </ | ||
+ | ~~DISCUSSION~~ |
design/website_classification.1735924679.txt.gz · Last modified: 2025/01/03 17:17 by karelkubicek