====== Chrome User Experience Report (CrUX) ======
The Chrome User Experience Report (also known as the Chrome UX Report, or CrUX for short) is a dataset that reflects how real-world Chrome users experience popular destinations on the web.
CrUX data is collected from real browsers around the world, based on certain browser options which determine [[https://developer.chrome.com/docs/crux/methodology#user-eligibility|user eligibility]]. A set of [[https://developer.chrome.com/docs/crux/methodology/dimensions|dimensions]] and [[https://developer.chrome.com/docs/crux/methodology/metrics|metrics]] are collected which allow site owners to determine how users experience their sites.
The data collected by CrUX is available publicly through a number of [[https://developer.chrome.com/docs/crux/methodology/tools|Google tools]] and third-party tools and is used by Google Search to inform the [[https://developers.google.com/search/docs/advanced/experience/page-experience|page experience ranking factor]].
Not all origins or pages are represented in the dataset. There are separate eligibility criteria for [[https://developer.chrome.com/docs/crux/methodology#origin-eligibility|origins]] and [[https://developer.chrome.com/docs/crux/methodology#page-eligibility|pages]], primarily that they must be publicly discoverable and there must be a large enough number of visitors in order to create a statistically significant dataset.
===== API =====
In the following text, we document access to CrUX via [[https://developer.chrome.com/docs/crux/bigquery|Google BigQuery]].
- Register a Google BigQuery account. The first 1 TB of data processed by queries each month is free, and the limit resets monthly. You would have to download the monthly CrUX list more than a hundred times to run out. Be careful with your SQL statements, though: if you query historic data, you can quickly exhaust the free limit, and BigQuery can get expensive.
- Follow the [[https://console.cloud.google.com/projectcreate|BigQuery documentation to create a project]], set up billing if prompted, and enable the BigQuery API.
- Create and download a [[https://console.cloud.google.com/apis/credentials|credentials JSON file]].
- For more details, see the [[https://developer.chrome.com/docs/crux/api|CrUX API documentation]], or skip to the example below.
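Because the free limit is counted in bytes processed, it can help to check the cost of a query before running it. The following is a minimal sketch, assuming the ''google-cloud-bigquery'' package is installed and that the free tier is 1 TiB per month; the helper names ''fraction_of_free_tier'' and ''estimate_query_bytes'' are made up for this page. It relies on BigQuery's dry-run mode, which reports the bytes a query would process without running or billing it.

```python
FREE_TIER_BYTES = 1024 ** 4  # assumption: 1 TiB of processed query data per month is free


def fraction_of_free_tier(bytes_processed):
    """Return the share of the monthly free tier that a query would consume."""
    return bytes_processed / FREE_TIER_BYTES


def estimate_query_bytes(sql):
    """Dry-run a query and return the number of bytes it would process."""
    # Imported here so the helper above also works without the BigQuery client.
    from google.cloud import bigquery

    client = bigquery.Client()  # needs GOOGLE_APPLICATION_CREDENTIALS to be set
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)  # a dry run is not billed
    return job.total_bytes_processed
```

For example, ''fraction_of_free_tier(estimate_query_bytes(query))'' gives a rough idea of how much of the monthly allowance a single run of ''query'' would use.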
==== Code Example ====
The following Python code downloads a few rows of the CrUX data for November 2024 from the table that covers all countries. It expects that you have set an environment variable with the path to the credentials JSON file: ''export GOOGLE_APPLICATION_CREDENTIALS="/your/path/to/creds.json"''. It also expects that you have installed the ''google-cloud'', ''google-api-python-client'', ''google-cloud-bigquery[pandas]'', and ''pandas'' Python packages.
<code python>
import pandas as pd  # to load it into Pandas DataFrame
from google.cloud import bigquery

YYYYMM = 202411  # let's download only November 2024 data
LIMIT = 5  # return only the first 5 rows; LIMIT caps the rows returned, not necessarily the bytes scanned

client = bigquery.Client()  # this will fail if you have not set GOOGLE_APPLICATION_CREDENTIALS
query = f"SELECT * FROM `chrome-ux-report.experimental.country` WHERE yyyymm = {YYYYMM} LIMIT {LIMIT}"
df = client.query(query).to_dataframe()  # run the query and load the result

print('Dataframe:')
print(df)
print('Dataframe columns:')
print(df.columns)
df.to_csv(f'crux_all_{YYYYMM}.csv')  # store the downloaded rows
</code>
Example output:
<code>
Dataframe:
  country_code  yyyymm  ...                          interaction_to_next_paint                                   navigation_types
0           es  202411  ...  {'histogram': {'bin': [{'start': 0, 'end': 25,...  {'navigate': {'fraction': 0.682}, 'navigate_ca...
1           es  202411  ...  {'histogram': {'bin': [{'start': 0, 'end': 25,...                                               None
2           es  202411  ...  {'histogram': {'bin': [{'start': 0, 'end': 25,...                                               None
3           es  202411  ...  {'histogram': {'bin': [{'start': 0, 'end': 25,...                                               None
4           es  202411  ...  {'histogram': {'bin': [{'start': 0, 'end': 25,...                                               None

[5 rows x 15 columns]
Dataframe columns:
Index(['country_code', 'yyyymm', 'origin', 'effective_connection_type',
       'form_factor', 'first_paint', 'first_contentful_paint',
       'dom_content_loaded', 'onload', 'first_input', 'layout_instability',
       'largest_contentful_paint', 'experimental', 'interaction_to_next_paint',
       'navigation_types'],
      dtype='object')
</code>
==== Columns ====
* ''country_code'': lowercase [[https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2|ISO 3166-1 alpha-2]] country code (e.g., ''de''); 238 countries are covered. To get the country codes, you have to query the ''chrome-ux-report.experimental.country'' table, which was added in the [[https://developer.chrome.com/docs/crux/release-notes#202004|202004 CrUX release]].
* ''yyyymm'': year and month of the dataset (e.g., ''202411'')
* ''origin'': the website's origin (scheme and host, optionally with a port), not a full page URL
* ''effective_connection_type'': enum with connection types (e.g., 4G)
* ''form_factor'': enum of user devices: ''PHONE'', ''TABLET'', and ''DESKTOP''
* ''first_paint'': histogram
* ''first_contentful_paint'': histogram
* ''dom_content_loaded'': histogram of the time until the ''DOMContentLoaded'' event fires
* ''onload'': histogram
* ''first_input'': histogram
* ''layout_instability'': empty
* ''largest_contentful_paint'': ''{'histogram': {'bin': array([{'start': 0, 'end': 100, 'density': 0.0022}...''
* ''experimental'': ''{'time_to_first_byte': {'histogram': {'bin': array([{'start': 0, 'end': 100, 'density': 0.1283},...]), 'interaction_to_next_paint': None, 'permission': None, 'popularity': {'rank': 100000}''
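The histogram columns are nested structures, so they usually need a small helper before they can be analysed. The following is a minimal sketch of how the share of page loads below a threshold could be computed from one such value. The function name ''fraction_below'' and the sample numbers are made up for illustration; only the ''histogram'' / ''bin'' / ''start'' / ''end'' / ''density'' layout is taken from the column examples above.

```python
def fraction_below(metric, threshold):
    """Sum the density of all histogram bins that end at or below `threshold`.

    Returns None when the metric is missing (None) for a row.
    """
    if metric is None:
        return None
    bins = metric["histogram"]["bin"]
    return sum(
        b["density"]
        for b in bins
        # Defensive assumption: skip bins that have no upper bound.
        if b.get("end") is not None and b["end"] <= threshold
    )


# Hypothetical value, shaped like the largest_contentful_paint column.
example_lcp = {
    "histogram": {
        "bin": [
            {"start": 0, "end": 100, "density": 0.25},
            {"start": 100, "end": 200, "density": 0.5},
            {"start": 200, "end": 300, "density": 0.25},
        ]
    }
}

print(fraction_below(example_lcp, 200))  # 0.75
```

With a DataFrame loaded as in the code example, something like ''df['largest_contentful_paint'].apply(fraction_below, threshold=2500)'' should apply the helper to every row.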
==== Dataset Size ====
Number of records in 202210 (collected by multiple queries like ''SELECT count(DISTINCT origin) as num_orig FROM `chrome-ux-report.experimental.country` WHERE yyyymm = 202210 AND country_code = 'de'''):
* DE 1'143'612 (1'633'243 measurements)
* FR 959'672 (1'503'659 measurements)
* CH 219'127 (297'626 measurements)
* GB 1'101'480 (1'677'988 measurements)
* US 3'505'806 (5'508'774 measurements)
* all 15'629'207 (46'356'015 measurements)
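The per-country counts above were collected with one query per country. As a sketch of an alternative, the helper below builds a single ''GROUP BY'' query that returns the counts for several countries at once. The function name ''build_count_query'' is hypothetical, and using ''COUNT(*)'' for the "measurements" figure is an assumption about how those numbers were obtained.

```python
import re

TABLE = "chrome-ux-report.experimental.country"


def build_count_query(yyyymm, countries):
    """Build one query that counts origins and rows per country for a month."""
    if not re.fullmatch(r"\d{6}", str(yyyymm)):
        raise ValueError(f"expected a YYYYMM value, got {yyyymm!r}")
    for code in countries:
        if not re.fullmatch(r"[a-z]{2}", code):
            raise ValueError(f"expected a lowercase two-letter code, got {code!r}")
    country_list = ", ".join(f"'{code}'" for code in countries)
    return (
        "SELECT country_code, "
        "COUNT(DISTINCT origin) AS num_orig, "
        "COUNT(*) AS num_rows "  # assumed to correspond to "measurements"
        f"FROM `{TABLE}` "
        f"WHERE yyyymm = {yyyymm} AND country_code IN ({country_list}) "
        "GROUP BY country_code ORDER BY country_code"
    )


print(build_count_query(202210, ["de", "fr", "ch", "gb", "us"]))
```

The resulting string can be passed to ''client.query(...)'' in the same way as the queries in the code example. Validating the inputs before formatting them into the SQL text keeps malformed values out of the query.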
==== Other Useful Code ====
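To select the top 10k US websites in November 2024 (the ''rank'' alias cannot be used in the ''WHERE'' clause, so the filter refers to the full field name):
<code sql>
SELECT DISTINCT country_code, origin, experimental.popularity.rank AS rank
FROM `chrome-ux-report.experimental.country`
WHERE yyyymm = 202411
  AND country_code = 'us'
  AND experimental.popularity.rank <= 10000
</code>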
To sample websites from the EU and EFTA for privacy studies, the following code samples up to 500 origins for each ''rank'' x ''country'' combination.
<code python>
from random import sample  # used below to draw the random samples

import pandas as pd  # to load it into Pandas DataFrame
from google.cloud import bigquery

YYYYMM = 202411  # let's download only November 2024 data
RANKS = (1000, 5000, 10000, 50000, 100000, 500000, 1000000, 5000000)  # all ranks
COUNTRIES = (
    'at','be','bg','hr','cy','cz','dk','ee','fi','fr','de','gr','hu','ie','it','lv','lt','lu','mt','nl','pl','pt','ro','sk','si','es','se',  # EU states
    'is','li','no','ch',  # EFTA states
    'gb'  # Other states
)
SITES_N = 500  # maximum number of origins sampled per rank x country combination

client = bigquery.Client()  # this will fail if you have not set GOOGLE_APPLICATION_CREDENTIALS
query = f"SELECT DISTINCT country_code, origin, experimental.popularity.rank as rank FROM `chrome-ux-report.experimental.country` WHERE yyyymm = {YYYYMM}"
df = client.query(query).to_dataframe()
df.to_csv(f'crux_all_{YYYYMM}.csv')  # store crux original data

# group the origins by country and rank; this takes a while to process
websites = {
    country: {
        rank: set(df[(df['country_code'] == country) & (df['rank'] == rank)]['origin'].values)
        for rank in RANKS
    }
    for country in COUNTRIES
}

# draw up to SITES_N origins from every rank x country combination
sampled_websites = set()
for rank in RANKS:
    for country in COUNTRIES:
        source = websites[country][rank]
        sampled = set(sample(tuple(source), min(SITES_N, len(source))))
        sampled_websites = sampled_websites | sampled

# write the sampled origins, one per line
with open(f'sampled_urls_{YYYYMM}.txt', 'w') as wf:
    for url in sampled_websites:
        wf.write(url + '\n')
</code>
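The sampling above draws a different set of origins on every run. If a study needs to be repeatable, a seeded variant may be useful. The following is a sketch using only the standard library, not the method used on this page: the function name ''sample_per_stratum'' and the example rows are made up, and unlike the code above it samples every (country, rank) pair present in the input rather than only those listed in ''COUNTRIES'' and ''RANKS''.

```python
import random
from collections import defaultdict


def sample_per_stratum(rows, n, seed=0):
    """Sample up to `n` origins for every (country_code, rank) pair.

    `rows` is an iterable of (country_code, origin, rank) tuples.
    """
    strata = defaultdict(set)
    for country, origin, rank in rows:
        strata[(country, rank)].add(origin)

    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sampled = set()
    for key in sorted(strata):
        origins = sorted(strata[key])  # sort first: set order is not stable
        sampled.update(rng.sample(origins, min(n, len(origins))))
    return sampled


# Hypothetical rows standing in for the DataFrame returned by BigQuery.
rows = [
    ("de", "https://a.example", 1000),
    ("de", "https://b.example", 1000),
    ("de", "https://c.example", 1000),
    ("fr", "https://a.example", 1000),
    ("fr", "https://d.example", 5000),
]
picked = sample_per_stratum(rows, n=2)
print(sorted(picked))
```

With the DataFrame from the code above, the rows could be produced with ''df[['country_code', 'origin', 'rank']].itertuples(index=False, name=None)''. Grouping in a single pass also avoids filtering the whole DataFrame once per combination.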
===== More Details =====
* The CrUX dataset for month X is typically released about two weeks after the end of month X. For example, the 202411 dataset (November 2024) was released on December 10, 2024. See the [[https://developer.chrome.com/docs/crux/release-notes|CrUX release notes page]] for further details.
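To guess when a dataset will become available, the release date can be estimated from the month. The sketch below assumes that releases happen on the second Tuesday of the following month; this rule is an assumption that matches the December 10, 2024 example above, so check the release notes for the actual date. The function name ''expected_release_date'' is made up for this page.

```python
from datetime import date, timedelta


def expected_release_date(yyyymm):
    """Return the second Tuesday of the month that follows `yyyymm`."""
    year, month = divmod(yyyymm, 100)
    # First day of the following month (rolls over to January after December).
    first = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    # weekday(): Monday is 0, so Tuesday is 1.
    days_to_tuesday = (1 - first.weekday()) % 7
    return first + timedelta(days=days_to_tuesday + 7)


print(expected_release_date(202411))  # 2024-12-10
```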
~~DISCUSSION~~