The Chrome User Experience Report (also known as the Chrome UX Report, or CrUX for short) is a dataset that reflects how real-world Chrome users experience popular destinations on the web.
CrUX data is collected from real browsers around the world, based on certain browser options which determine user eligibility. A set of dimensions and metrics are collected which allow site owners to determine how users experience their sites.
The data collected by CrUX is available publicly through a number of Google tools and third-party tools and is used by Google Search to inform the page experience ranking factor.
Not all origins or pages are represented in the dataset. There are separate eligibility criteria for origins and pages, primarily that they must be publicly discoverable and there must be a large enough number of visitors in order to create a statistically significant dataset.
In the following text, we document access to CrUX via Google BigQuery.
The following Python code downloads the CrUX list for November 2024, with ranks, across all countries (limited to the first five rows here). It expects that you have set an environment variable with the path to the credentials JSON file: export GOOGLE_APPLICATION_CREDENTIALS="/your/path/to/creds.json". It also expects that you have installed the google-cloud, google-api-python-client, google-cloud-bigquery[pandas], and pandas Python packages.
    import pandas as pd  # to load it into Pandas DataFrame
    from google.cloud import bigquery
    from google.oauth2 import service_account

    YYYYMM = 202411  # let's download only November 2024 data
    LIMIT = 5  # limit the query to only first 5 results to reduce risk

    client = bigquery.Client()  # this will fail if you have not set GOOGLE_APPLICATION_CREDENTIALS
    query = f"SELECT * FROM `chrome-ux-report.experimental.country` WHERE yyyymm = {YYYYMM} LIMIT {LIMIT}"
    df = client.query(query).to_dataframe()

    print('Dataframe:')
    print(df)
    print('Dataframe columns:')
    print(df.columns)
    df.to_csv(f'crux_all_{YYYYMM}.csv')
Example output:
    Dataframe:
      country_code  yyyymm  ...  interaction_to_next_paint                           navigation_types
    0           es  202411  ...  {'histogram': {'bin': [{'start': 0, 'end': 25,...  {'navigate': {'fraction': 0.682}, 'navigate_ca...
    1           es  202411  ...  {'histogram': {'bin': [{'start': 0, 'end': 25,...  None
    2           es  202411  ...  {'histogram': {'bin': [{'start': 0, 'end': 25,...  None
    3           es  202411  ...  {'histogram': {'bin': [{'start': 0, 'end': 25,...  None
    4           es  202411  ...  {'histogram': {'bin': [{'start': 0, 'end': 25,...  None

    [5 rows x 15 columns]
    Dataframe columns:
    Index(['country_code', 'yyyymm', 'origin', 'effective_connection_type',
           'form_factor', 'first_paint', 'first_contentful_paint',
           'dom_content_loaded', 'onload', 'first_input', 'layout_instability',
           'largest_contentful_paint', 'experimental',
           'interaction_to_next_paint', 'navigation_types'],
          dtype='object')
country_code: ISO 3166-1 alpha-2 code of one of 238 countries. To get the country codes, you have to query the chrome-ux-report.experimental.country table, which was added in the 202004 CrUX release.
yyyymm: year and month
origin: URL of the origin
effective_connection_type: enum of connection types (e.g., 4G)
form_factor: enum of user devices: PHONE, TABLET, and DESKTOP
first_paint: histogram
first_contentful_paint: histogram
dom_content_loaded: histogram (presumably of DOMContentLoaded event timings)
onload: histogram
first_input: histogram
layout_instability: empty
largest_contentful_paint: {'histogram': {'bin': array([{'start': 0, 'end': 100, 'density': 0.0022}…
experimental: {'time_to_first_byte': {'histogram': {'bin': array([{'start': 0, 'end': 100, 'density': 0.1283},…])}}, 'interaction_to_next_paint': None, 'permission': None, 'popularity': {'rank': 100000}}
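Each histogram field holds a list of bins with start, end, and density keys, as in the largest_contentful_paint value above. The following is a minimal sketch of how such a value could be summarized. The helper name fraction_below and the sample bins are made up for illustration. The code assumes that bins lie either fully below or fully above the threshold, and that an open-ended last bin may have no end value.

```python
def fraction_below(metric, threshold):
    """Sum the densities of all bins that end at or below the threshold."""
    bins = metric["histogram"]["bin"]
    return sum(
        b["density"]
        for b in bins
        if b["end"] is not None and b["end"] <= threshold
    )


# Hypothetical, shortened histogram in the same shape as a CrUX metric value.
sample_lcp = {"histogram": {"bin": [
    {"start": 0, "end": 100, "density": 0.25},
    {"start": 100, "end": 200, "density": 0.5},
    {"start": 200, "end": None, "density": 0.25},
]}}

print(fraction_below(sample_lcp, 200))
```

As far as we understand, in the raw tables the densities of one origin are spread over several rows (one per combination of dimensions such as form factor and connection type), so a per-row sum is a share of the origin's total rather than of that row alone.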
Number of records in 202210 (collected by multiple queries like SELECT count(DISTINCT origin) AS num_orig FROM `chrome-ux-report.experimental.country` WHERE yyyymm = 202210 AND country_code = 'de'):
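Instead of one query per country, the counts could probably be collected in a single pass with GROUP BY. The sketch below only builds the query string. The function name build_count_query is hypothetical, and the strict check on country codes is our own assumption, added so that nothing unexpected ends up in the SQL.

```python
import re


def build_count_query(yyyymm, countries):
    """Build one query that counts distinct origins per country."""
    if not isinstance(yyyymm, int):
        raise ValueError("yyyymm must be an integer such as 202210")
    for code in countries:
        # Accept only two lowercase letters (assumed format of country_code).
        if not re.fullmatch(r"[a-z]{2}", code):
            raise ValueError(f"unexpected country code: {code!r}")
    country_list = ", ".join(f"'{code}'" for code in countries)
    return (
        "SELECT country_code, COUNT(DISTINCT origin) AS num_orig "
        "FROM `chrome-ux-report.experimental.country` "
        f"WHERE yyyymm = {yyyymm} AND country_code IN ({country_list}) "
        "GROUP BY country_code ORDER BY country_code"
    )


query = build_count_query(202210, ["de", "fr"])
print(query)
```

The resulting string can be passed to client.query(query).to_dataframe() in the same way as in the first example.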
To select top 10k US websites in November 2024:
    SELECT DISTINCT
      country_code,
      origin,
      experimental.popularity.rank AS rank
    FROM `chrome-ux-report.experimental.country`
    WHERE yyyymm = 202411
      AND country_code = 'us'
      AND experimental.popularity.rank <= 10000
The following code samples websites from EU and EFTA countries (plus Great Britain), for example for privacy studies. It picks up to 500 origins for each rank x country combination.
    import pandas as pd  # to load it into Pandas DataFrame
    from random import sample  # to draw the random samples below
    from google.cloud import bigquery
    from google.oauth2 import service_account

    YYYYMM = 202411  # let's download only November 2024 data
    RANKS = (1000, 5000, 10000, 50000, 100000, 500000, 1000000, 5000000)  # all ranks
    COUNTRIES = (
        'at', 'be', 'bg', 'hr', 'cy', 'cz', 'dk', 'ee', 'fi', 'fr', 'de', 'gr', 'hu', 'ie',
        'it', 'lv', 'lt', 'lu', 'mt', 'nl', 'pl', 'pt', 'ro', 'sk', 'si', 'es', 'se',  # EU states
        'is', 'li', 'no', 'ch',  # EFTA states
        'gb',  # Other states
    )
    SITES_N = 500

    client = bigquery.Client()  # this will fail if you have not set GOOGLE_APPLICATION_CREDENTIALS
    query = f"SELECT DISTINCT country_code, origin, experimental.popularity.rank as rank FROM `chrome-ux-report.experimental.country` WHERE yyyymm = {YYYYMM}"
    df = client.query(query).to_dataframe()
    df.to_csv(f'crux_all_{YYYYMM}.csv')  # store crux original data

    # this takes a while to process
    websites = {
        country: {
            rank: set(df[(df['country_code'] == country) & (df['rank'] == rank)]['origin'].values)
            for rank in RANKS
        }
        for country in COUNTRIES
    }

    # sample up to SITES_N origins for every rank x country combination
    sampled_websites = set()
    for rank in RANKS:
        for country in COUNTRIES:
            source = websites[country][rank]
            sampled = set(sample(tuple(source), min(SITES_N, len(source))))
            sampled_websites = sampled_websites | sampled

    with open(f'sampled_urls_{YYYYMM}.txt', 'w') as wf:
        for url in sampled_websites:
            wf.write(url + '\n')
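The sampling above gives a different result on every run, which can make a study hard to repeat. One possible variation, sketched below, is to use a seeded random generator and to sort the candidates before sampling. The function name sample_per_group, the seed value, and the example.org-style origins are assumptions made for illustration and are not part of the original script.

```python
import random


def sample_per_group(groups, n, seed=0):
    """Sample up to n items from each group, reproducibly."""
    rng = random.Random(seed)        # private generator with a fixed seed
    picked = set()
    for key in sorted(groups):       # fixed iteration order over the groups
        items = sorted(groups[key])  # sets have no stable order, so sort first
        picked.update(rng.sample(items, min(n, len(items))))
    return picked


# Hypothetical origins keyed by (country, rank), mirroring the structure above.
groups = {
    ("de", 1000): {"https://a.example", "https://b.example", "https://c.example"},
    ("fr", 1000): {"https://d.example"},
}

first = sample_per_group(groups, 2, seed=42)
second = sample_per_group(groups, 2, seed=42)
print(first == second)
```

With the same seed and the same input data, both calls return the same set of origins.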