Chrome User Experience Report (CrUX)
The Chrome User Experience Report (also known as the Chrome UX Report, or CrUX for short) is a dataset that reflects how real-world Chrome users experience popular destinations on the web.
CrUX data is collected from real browsers around the world, based on certain browser options which determine user eligibility. A set of dimensions and metrics are collected which allow site owners to determine how users experience their sites.
The data collected by CrUX is available publicly through a number of Google tools and third-party tools and is used by Google Search to inform the page experience ranking factor.
Not all origins or pages are represented in the dataset. There are separate eligibility criteria for origins and pages; primarily, they must be publicly discoverable and must have enough visitors to form a statistically significant dataset.
API
In the following text, we document access to CrUX via Google BigQuery.
- Register a Google BigQuery account. Queries processing up to 1 TB of data per month are free, and the limit resets monthly. You would have to download the monthly CrUX list more than a hundred times to run out. Still, be careful with your SQL statements: querying historical data can exhaust the limit quickly, and BigQuery can get expensive (see the dry-run sketch after this list).
- Follow the BigQuery documentation to create a project, set up billing if prompted, and enable the BigQuery API.
- See the CrUX API documentation for more details, or skip to the example below.
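Before running a large query, you can let BigQuery estimate how many bytes it would process without executing it and without consuming quota. The following is a minimal sketch using the dry-run option of the Python client; the query itself is only an illustration:

from google.cloud import bigquery

client = bigquery.Client()  # requires credentials, set up as described below

# dry_run=True estimates the cost without executing the query or consuming quota
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
query = (
    "SELECT origin FROM `chrome-ux-report.experimental.country` "
    "WHERE yyyymm = 202411 AND country_code = 'us'"
)
job = client.query(query, job_config=job_config)
print(f"This query would process {job.total_bytes_processed / 1e9:.2f} GB")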
Code Example
The following Python code downloads the CrUX list for November 2024 with the rank across all countries. It expects that you have set an environment variable with the path to the credentials JSON file: export GOOGLE_APPLICATION_CREDENTIALS="/your/path/to/creds.json". It also expects you to have installed the google-cloud, google-api-python-client, google-cloud-bigquery[pandas], and pandas Python packages.
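If you prefer not to rely on the environment variable, the client can also be given credentials explicitly. A minimal sketch (the key-file path is a placeholder):

from google.cloud import bigquery
from google.oauth2 import service_account

# load a service-account key file explicitly instead of using GOOGLE_APPLICATION_CREDENTIALS
credentials = service_account.Credentials.from_service_account_file('/your/path/to/creds.json')
client = bigquery.Client(credentials=credentials, project=credentials.project_id)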
- crux.py
import pandas as pd  # to load the result into a Pandas DataFrame
from google.cloud import bigquery
from google.oauth2 import service_account

YYYYMM = 202411  # let's download only November 2024 data
LIMIT = 5  # limit the query to only the first 5 results to reduce risk

client = bigquery.Client()  # this will fail if you have not set GOOGLE_APPLICATION_CREDENTIALS
query = f"SELECT * FROM `chrome-ux-report.experimental.country` WHERE yyyymm = {YYYYMM} LIMIT {LIMIT}"
df = client.query(query).to_dataframe()

print('Dataframe:')
print(df)
print('Dataframe columns:')
print(df.columns)

df.to_csv(f'crux_all_{YYYYMM}.csv')
Example output:
Dataframe:
  country_code  yyyymm  ...                          interaction_to_next_paint                                   navigation_types
0           es  202411  ...  {'histogram': {'bin': [{'start': 0, 'end': 25,...  {'navigate': {'fraction': 0.682}, 'navigate_ca...
1           es  202411  ...  {'histogram': {'bin': [{'start': 0, 'end': 25,...                                               None
2           es  202411  ...  {'histogram': {'bin': [{'start': 0, 'end': 25,...                                               None
3           es  202411  ...  {'histogram': {'bin': [{'start': 0, 'end': 25,...                                               None
4           es  202411  ...  {'histogram': {'bin': [{'start': 0, 'end': 25,...                                               None

[5 rows x 15 columns]
Dataframe columns:
Index(['country_code', 'yyyymm', 'origin', 'effective_connection_type',
       'form_factor', 'first_paint', 'first_contentful_paint',
       'dom_content_loaded', 'onload', 'first_input', 'layout_instability',
       'largest_contentful_paint', 'experimental', 'interaction_to_next_paint',
       'navigation_types'],
      dtype='object')
Columns
- country_code: ISO 3166-1 alpha-2 code of one of 238 countries. To get the country codes, you have to query the chrome-ux-report.experimental.country table, which was added in the 202004 CrUX release
- yyyymm: year and month
- origin: URL
- effective_connection_type: enum with connection types (e.g., 4G)
- form_factor: enum of user devices: PHONE, TABLET, and DESKTOP
- first_paint: histogram (see the sketch after this list for processing histograms)
- first_contentful_paint: histogram
- dom_content_loaded: histogram
- onload: histogram
- first_input: histogram
- layout_instability: empty
- largest_contentful_paint: {'histogram': {'bin': array([{'start': 0, 'end': 100, 'density': 0.0022}…
- experimental: {'time_to_first_byte': {'histogram': {'bin': array([{'start': 0, 'end': 100, 'density': 0.1283},…]), 'interaction_to_next_paint': None, 'permission': None, 'popularity': {'rank': 100000}
- interaction_to_next_paint: histogram
- navigation_types: fractions per navigation type (e.g., {'navigate': {'fraction': 0.682}, …}), may be None
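The histogram columns arrive as nested Python dicts after to_dataframe(). As a minimal sketch, assuming the bin structure shown above (each bin holding start, end, and density), the share of page loads below a threshold can be estimated by summing bin densities:

def fraction_below(metric: dict, threshold_ms: float) -> float:
    """Sum the densities of all histogram bins that end at or below threshold_ms."""
    total = 0.0
    for b in metric['histogram']['bin']:
        end = b.get('end')  # the last (open-ended) bin may have no 'end'
        if end is not None and end <= threshold_ms:
            total += b['density']
    return total

# e.g., estimated share of loads with LCP under 2.5 s for the first row of df:
# fraction_below(df.iloc[0]['largest_contentful_paint'], 2500)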
Dataset Size
Number of records in 202210, collected by multiple queries like SELECT count(DISTINCT origin) AS num_orig FROM `chrome-ux-report.experimental.country` WHERE yyyymm = 202210 AND country_code = 'de' (a sketch that loops over countries follows the list):
- DE 1'143'612 (1'633'243 measurements)
- FR 959'672 (1'503'659 measurements)
- CH 219'127 (297'626 measurements)
- GB 1'101'480 (1'677'988 measurements)
- US 3'505'806 (5'508'774 measurements)
- all 15'629'207 (46'356'015 measurements)
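The counts above can be reproduced with one COUNT query per country. A minimal sketch; num_rows counts table rows, which is assumed here to be what "measurements" refers to:

from google.cloud import bigquery

client = bigquery.Client()
for cc in ('de', 'fr', 'ch', 'gb', 'us'):
    query = (
        "SELECT COUNT(DISTINCT origin) AS num_orig, COUNT(*) AS num_rows "
        "FROM `chrome-ux-report.experimental.country` "
        f"WHERE yyyymm = 202210 AND country_code = '{cc}'"
    )
    row = next(iter(client.query(query).result()))  # single-row result
    print(cc, row.num_orig, row.num_rows)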
Other Useful Code
To select the top 10k US websites in November 2024 (note that BigQuery does not allow the rank alias in the WHERE clause, so the full column path is repeated there):
SELECT DISTINCT country_code, origin, experimental.popularity.rank AS rank
FROM `chrome-ux-report.experimental.country`
WHERE yyyymm = 202411 AND country_code = 'us' AND experimental.popularity.rank <= 10000
To sample websites from the EU and EFTA for privacy studies, the following code selects up to 500 origins for each rank x country combination.
- sample_crux.py
import pandas as pd  # to load the result into a Pandas DataFrame
from random import sample  # for sampling origins without replacement
from google.cloud import bigquery
from google.oauth2 import service_account

YYYYMM = 202411  # let's download only November 2024 data
RANKS = (1000, 5000, 10000, 50000, 100000, 500000, 1000000, 5000000)  # all ranks
COUNTRIES = (
    'at', 'be', 'bg', 'hr', 'cy', 'cz', 'dk', 'ee', 'fi', 'fr', 'de', 'gr', 'hu', 'ie', 'it',
    'lv', 'lt', 'lu', 'mt', 'nl', 'pl', 'pt', 'ro', 'sk', 'si', 'es', 'se',  # EU states
    'is', 'li', 'no', 'ch',  # EFTA states
    'gb'  # Other states
)
SITES_N = 500

client = bigquery.Client()  # this will fail if you have not set GOOGLE_APPLICATION_CREDENTIALS
query = f"SELECT DISTINCT country_code, origin, experimental.popularity.rank as rank FROM `chrome-ux-report.experimental.country` WHERE yyyymm = {YYYYMM}"
df = client.query(query).to_dataframe()
df.to_csv(f'crux_all_{YYYYMM}.csv')  # store crux original data

# this takes a while to process
websites = {
    country: {
        rank: set(df[(df['country_code'] == country) & (df['rank'] == rank)]['origin'].values)
        for rank in RANKS
    }
    for country in COUNTRIES
}

sampled_websites = set()
for rank in RANKS:
    for country in COUNTRIES:
        source = websites[country][rank]
        sampled = set(sample(tuple(source), min(SITES_N, len(source))))
        sampled_websites = sampled_websites | sampled

with open(f'sampled_urls_{YYYYMM}.txt', 'w') as wf:
    for url in sampled_websites:
        wf.write(url + '\n')
More Details
- CrUX data for month X is typically released about two weeks after the end of month X. For example, the 202411 release (November 2024) was published on December 10, 2024. Check the CrUX release notes page for useful information.