====== Chrome User Experience Report (CrUX) ======

The Chrome User Experience Report (also known as the Chrome UX Report, or CrUX for short) is a dataset that reflects how real-world Chrome users experience popular destinations on the web.

CrUX data is collected from real browsers around the world, based on certain browser options which determine [[https://developer.chrome.com/docs/crux/methodology#user-eligibility|user eligibility]]. A set of [[https://developer.chrome.com/docs/crux/methodology/dimensions|dimensions]] and [[https://developer.chrome.com/docs/crux/methodology/metrics|metrics]] is collected, which allows site owners to determine how users experience their sites.

The data collected by CrUX is publicly available through a number of [[https://developer.chrome.com/docs/crux/methodology/tools|Google tools]] and third-party tools and is used by Google Search to inform the [[https://developers.google.com/search/docs/advanced/experience/page-experience|page experience ranking factor]].

Not all origins or pages are represented in the dataset. There are separate eligibility criteria for [[https://developer.chrome.com/docs/crux/methodology#origin-eligibility|origins]] and [[https://developer.chrome.com/docs/crux/methodology#page-eligibility|pages]], primarily that they must be publicly discoverable and must have enough visitors to yield a statistically significant dataset.

=== API ===

In the following text, we document access to CrUX via [[https://developer.chrome.com/docs/crux/bigquery|Google BigQuery]].

  - Register a Google BigQuery account. The first 1 TB of data processed by queries each month is free; the limit is reset monthly. You would have to download the monthly CrUX list more than a hundred times to use it up. Still, be careful with your SQL statements: queries over historic data can quickly exhaust the free limit, after which BigQuery can get expensive.
  - Follow the [[https://console.cloud.google.com/projectcreate|BigQuery documentation to create a project]], set up billing if prompted, and enable the BigQuery API.
  - Get the [[https://console.cloud.google.com/apis/credentials|credentials JSON file]].
  - See the [[https://developer.chrome.com/docs/crux/api|CrUX API documentation]] for more details, or skip to the example below.

==== Code Example ====

The following Python code queries the November 2024 CrUX data, including the popularity rank, from the table covering all countries. As written, it fetches only the first 5 rows. It expects that you have set an environment variable with the path to the credentials JSON file: ''export GOOGLE_APPLICATION_CREDENTIALS="/your/path/to/creds.json"''. It also expects you to have installed the ''google-cloud'', ''google-api-python-client'', ''google-cloud-bigquery[pandas]'', and ''pandas'' Python packages.

  import pandas as pd  # needed by to_dataframe() to build a Pandas DataFrame
  from google.cloud import bigquery
  YYYYMM = 202411  # query only the November 2024 data
  LIMIT = 5  # return only the first 5 rows to keep the result small
  # This fails if GOOGLE_APPLICATION_CREDENTIALS is not set.
  client = bigquery.Client()
  query = f"SELECT * FROM `chrome-ux-report.experimental.country` WHERE yyyymm = {YYYYMM} LIMIT {LIMIT}"
  df = client.query(query).to_dataframe()
  print('Dataframe:')
  print(df)
  print('Dataframe columns:')
  print(df.columns)
  df.to_csv(f'crux_all_{YYYYMM}.csv')

Example output:

  Dataframe:
     country_code  yyyymm  ...                          interaction_to_next_paint                                   navigation_types
  0            es  202411  ...  {'histogram': {'bin': [{'start': 0, 'end': 25,...  {'navigate': {'fraction': 0.682}, 'navigate_ca...
  1            es  202411  ...  {'histogram': {'bin': [{'start': 0, 'end': 25,...                                               None
  2            es  202411  ...  {'histogram': {'bin': [{'start': 0, 'end': 25,...                                               None
  3            es  202411  ...  {'histogram': {'bin': [{'start': 0, 'end': 25,...                                               None
  4            es  202411  ...  {'histogram': {'bin': [{'start': 0, 'end': 25,...                                               None
  [5 rows x 15 columns]
  Dataframe columns:
  Index(['country_code', 'yyyymm', 'origin', 'effective_connection_type',
         'form_factor', 'first_paint', 'first_contentful_paint',
         'dom_content_loaded', 'onload', 'first_input', 'layout_instability',
         'largest_contentful_paint', 'experimental',
         'interaction_to_next_paint', 'navigation_types'],
        dtype='object')

==== Columns ====

  * ''country_code'': [[https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2|ISO 3166-1 alpha-2]] code of one of 238 countries. To get the country codes, you have to query the ''chrome-ux-report.experimental.country'' table, which was added in the [[https://developer.chrome.com/docs/crux/release-notes#202004|202004 CrUX release]].
  * ''yyyymm'': year and month
  * ''origin'': URL of the origin
  * ''effective_connection_type'': enum with connection types (e.g., 4G)
  * ''form_factor'': enum of user devices: ''PHONE'', ''TABLET'', and ''DESKTOP''
  * ''first_paint'': histogram
  * ''first_contentful_paint'': histogram
  * ''dom_content_loaded'': histogram (time until the ''DOMContentLoaded'' event)
  * ''onload'': histogram
  * ''first_input'': histogram
  * ''layout_instability'': empty
  * ''largest_contentful_paint'': ''{'histogram': {'bin': array([{'start': 0, 'end': 100, 'density': 0.0022}...''
  * ''experimental'': ''{'time_to_first_byte': {'histogram': {'bin': array([{'start': 0, 'end': 100, 'density': 0.1283},...]), 'interaction_to_next_paint': None, 'permission': None, 'popularity': {'rank': 100000}''

==== Dataset Size ====

Number of distinct origins per country in 202210, with the number of measurements in parentheses (collected by multiple queries like ''SELECT count(DISTINCT origin) as num_orig FROM `chrome-ux-report.experimental.country` WHERE yyyymm = 202210 AND country_code = 'de'''):

  * DE 1'143'612 (1'633'243 measurements)
  * FR 959'672 (1'503'659 measurements)
  * CH 219'127 (297'626 measurements)
  * GB 1'101'480 (1'677'988 measurements)
  * US 3'505'806 (5'508'774 measurements)
  * all 15'629'207 (46'356'015 measurements)

==== Other Useful Code ====

To select the top 10k US websites in November 2024:

  SELECT DISTINCT country_code, origin,
    experimental.popularity.rank AS rank
  FROM `chrome-ux-report.experimental.country`
  -- WHERE cannot reference the SELECT alias, hence the full column path
  WHERE yyyymm = 202411
    AND country_code = 'us'
    AND experimental.popularity.rank <= 10000

The following code samples websites from EU and EFTA countries and the UK, e.g., for privacy studies. It picks up to 500 origins for each ''rank'' x ''country'' combination.

  from random import sample
  import pandas as pd  # needed by to_dataframe() to build a Pandas DataFrame
  from google.cloud import bigquery
  YYYYMM = 202411  # query only the November 2024 data
  RANKS = (1000, 5000, 10000, 50000, 100000, 500000, 1000000, 5000000)  # rank buckets to sample from
  COUNTRIES = (
      'at','be','bg','hr','cy','cz','dk','ee','fi','fr','de','gr','hu','ie','it','lv','lt','lu','mt','nl','pl','pt','ro','sk','si','es','se',  # EU states
      'is','li','no','ch',  # EFTA states
      'gb',  # other states
  )
  SITES_N = 500  # maximum number of origins per rank x country combination
  # This fails if GOOGLE_APPLICATION_CREDENTIALS is not set.
  client = bigquery.Client()
  query = f"SELECT DISTINCT country_code, origin, experimental.popularity.rank as rank FROM `chrome-ux-report.experimental.country` WHERE yyyymm = {YYYYMM}"
  df = client.query(query).to_dataframe()
  df.to_csv(f'crux_all_{YYYYMM}.csv')  # store the original CrUX data
  # Group the origins by country and rank; this takes a while.
  websites = {
      country: {
          rank: set(df[(df['country_code'] == country) & (df['rank'] == rank)]['origin'].values)
          for rank in RANKS
      }
      for country in COUNTRIES
  }
  sampled_websites = set()
  for rank in RANKS:
      for country in COUNTRIES:
          source = websites[country][rank]
          # sample() needs a sequence, so convert the set to a tuple first
          sampled_websites |= set(sample(tuple(source), min(SITES_N, len(source))))
  with open(f'sampled_urls_{YYYYMM}.txt', 'w') as wf:
      for url in sampled_websites:
          wf.write(url + '\n')

=== More Details ===

  * CrUX data for month X is typically released about two weeks after the end of month X. E.g., the X=202411 data for November 2024 was released on December 10, 2024. See the [[https://developer.chrome.com/docs/crux/release-notes|CrUX release notes page]] for more information.

~~DISCUSSION~~
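The histogram columns listed under Columns (e.g., ''largest_contentful_paint'') hold a list of bins, each with ''start'', ''end'', and ''density''. The following minimal sketch shows one way to sum the densities of all bins that end at or below a threshold, e.g., to estimate the share of fast page loads. The helper name ''fraction_below'' and the sample bins are hypothetical and made up for illustration. The sketch assumes that a cell is either ''None'' or a dict shaped like the values shown under Columns.

```python
def fraction_below(metric, threshold):
    """Sum the densities of all histogram bins that end at or below threshold.

    `metric` is assumed to be one cell of a histogram column, i.e. a dict
    like {'histogram': {'bin': [{'start': ..., 'end': ..., 'density': ...}]}}.
    Returns None when the cell is empty (None).
    """
    if metric is None:
        return None
    total = 0.0
    for b in metric['histogram']['bin']:
        end = b.get('end')
        # Assumption: skip bins without an upper bound or without a density.
        if end is not None and b.get('density') is not None and end <= threshold:
            total += b['density']
    return total


# Hypothetical bins, made up for illustration (not real CrUX data).
example = {'histogram': {'bin': [
    {'start': 0, 'end': 100, 'density': 0.25},
    {'start': 100, 'end': 200, 'density': 0.5},
    {'start': 200, 'end': 300, 'density': 0.25},
]}}

print(fraction_below(example, 200))  # prints 0.75
```

With a DataFrame ''df'' loaded as in the code example, a call such as ''df['largest_contentful_paint'].apply(fraction_below, threshold=2500)'' should compute the share per row, assuming the bin boundaries are in milliseconds.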