programming:similarweb
While the official Similarweb API is paid, Similarweb also offers a browser extension that retrieves its data from an internal API. We recommend first trying the extension manually and watching its API calls in the network tab of the browser's developer tools. There you can find a request to a URL like https://data.similarweb.com/api/v1/data?domain=example.com
together with the headers it sends. Using these headers, you can automate the access with code similar to the following.
- main.py
import requests
from requests.exceptions import HTTPError
from urllib.parse import urlparse
import pandas as pd

API_URL = "https://data.similarweb.com/api/v1/data?domain="
HEADERS = {
    # TODO: update the User-Agent with your current agent, else you will get blocked
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:133.0) Gecko/20100101 Firefox/133.0'
}


def similarweb_get(domain):
    # Similarweb input is only the domain's netloc; urlparse needs the scheme
    # (https://...) in the input, otherwise netloc comes back empty
    domain = urlparse(domain).netloc
    resp = requests.get(API_URL + domain, headers=HEADERS)
    if resp.status_code == 200:
        return resp.json()
    resp.raise_for_status()  # raises HTTPError for 4xx/5xx responses
    return {}  # any other non-200 status: treat as no data


if __name__ == "__main__":
    domains = [
        'https://github.com/',
        'https://measuretheweb.org'  # not in the dataset (yet :)), response is valid but empty
    ]
    responses = []
    success_n = 0
    consecutive_errors_n = 0
    for domain in domains:
        # if you are often getting errors, you might try to add a sleep here
        try:
            resp = similarweb_get(domain)
            responses.append(resp)
            success_n += 1
            consecutive_errors_n = 0
        except HTTPError:
            responses.append({})  # empty record keeps one row per domain
            consecutive_errors_n += 1
            if consecutive_errors_n >= 5:
                break  # give up after 5 failures in a row
    print(f'Succeeded on {success_n} out of {len(domains)} domains.')
    df = pd.json_normalize(responses)  # nested JSON keys become dotted column names
    print(df)
    df.to_csv('similarweb.csv')
Output:
Succeeded on 2 out of 2 domains.
   Version           SiteName                                        Description  ... Notification.Content GlobalCategoryRank.Rank GlobalCategoryRank.Category
0        1         github.com  github is where people build software. more th...  ...                 None                     NaN                         NaN
1        1  measuretheweb.org                                               None  ...                 None                     NaN                         NaN

[2 rows x 39 columns]
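The loop in main.py notes that a sleep may help when requests often fail. The following is a minimal sketch of that idea, assuming a fixed pause between requests that doubles after each failure and resets after a success. The helper name fetch_with_pause, its parameters, and the 2-second default are illustrative assumptions, not part of Similarweb or requests, and the right pause length is something you would have to find by experiment.

```python
import time
from typing import Callable, Iterable


def fetch_with_pause(
    domains: Iterable[str],
    fetch: Callable[[str], dict],
    pause_s: float = 2.0,             # assumed default, tune as needed
    max_consecutive_errors: int = 5,  # same limit as in main.py
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Call fetch(domain) for each domain, pausing between requests.

    The pause doubles after every failure and resets after a success.
    Failed domains are recorded as empty dicts so the row order is kept.
    """
    results = []
    consecutive_errors = 0
    current_pause = pause_s
    for i, domain in enumerate(domains):
        if i > 0:
            sleep(current_pause)  # wait before every request except the first
        try:
            results.append(fetch(domain))
            consecutive_errors = 0
            current_pause = pause_s
        except Exception:  # with requests this would typically be HTTPError
            results.append({})
            consecutive_errors += 1
            current_pause *= 2
            if consecutive_errors >= max_consecutive_errors:
                break
    return results
```

In main.py, the loop body could then be replaced by a call such as fetch_with_pause(domains, similarweb_get). The sleep parameter exists only so the pause can be swapped out when testing without waiting.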
Columns:
Version
SiteName
Description
TopCountryShares
Title
GlobalCategoryRank
IsSmall
Policy
Category
LargeScreenshot
IsDataFromGa
Countries
TopKeywords
SnapshotDate
Engagments.BounceRate
Engagments.Month
Engagments.Year
Engagments.PagePerVisit
Engagments.Visits
Engagments.TimeOnSite
EstimatedMonthlyVisits.2024-09-01
EstimatedMonthlyVisits.2024-10-01
EstimatedMonthlyVisits.2024-11-01
GlobalRank.Rank
CountryRank.Country
CountryRank.CountryCode
CountryRank.Rank
CategoryRank.Rank
CategoryRank.Category
TrafficSources.Social
TrafficSources.Paid Referrals
TrafficSources.Mail
TrafficSources.Referrals
TrafficSources.Search
TrafficSources.Direct
Competitors.TopSimilarityCompetitors
Notification.Content
GlobalCategoryRank.Rank
GlobalCategoryRank.Category
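The EstimatedMonthlyVisits columns carry the month in the column name, so their names depend on when the data was fetched. For analysis it can be handier to have one row per site and month. The following is a minimal sketch, assuming the column prefix and the SiteName column from the listing above; the function name monthly_visits_long and the visit numbers in the example frame are made up for illustration.

```python
import pandas as pd

PREFIX = "EstimatedMonthlyVisits."


def monthly_visits_long(df: pd.DataFrame) -> pd.DataFrame:
    """Turn the per-month visit columns into one row per (site, month)."""
    month_cols = [c for c in df.columns if c.startswith(PREFIX)]
    long_df = df.melt(
        id_vars=["SiteName"],
        value_vars=month_cols,
        var_name="Month",
        value_name="Visits",
    )
    # Strip the prefix and parse the remainder (e.g. 2024-09-01) as a date.
    long_df["Month"] = pd.to_datetime(long_df["Month"].str[len(PREFIX):])
    # Sites without data have NaN in every month column; drop those rows.
    return long_df.dropna(subset=["Visits"]).reset_index(drop=True)


# Made-up numbers, only to show the shape of the result.
example = pd.DataFrame({
    "SiteName": ["example.com", "example.org"],
    "EstimatedMonthlyVisits.2024-09-01": [100, None],
    "EstimatedMonthlyVisits.2024-10-01": [120, None],
})
print(monthly_visits_long(example))
```

With the df from main.py, the call would be monthly_visits_long(df); sites that are not in the dataset simply drop out of the result.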
programming/similarweb.txt · Last modified: 2025/01/03 17:41 by karelkubicek