
This is the 21st article of the JADE Advent Calendar 2025.
Yesterday's article was Hasegawa-san's PoC on adding visualization features to an AI agent
— worth checking out if you're curious.
Happy holidays!
I'm Jeremiah, and I work as a data engineer here at JADE.
I usually work between the consulting and engineering teams, helping build and maintain data pipelines when needed.
For this year's Advent Calendar, I didn't want to write anything too heavy.
Instead, I'll walk through some BigQuery features that came out in 2025 and explain why they felt genuinely useful from a data engineering and analytics perspective—especially around AI.
At JADE we frequently use GA4 data, so to keep things concrete I'll use a single public GA4 dataset for every example.
AI.FUNCTIONS — AI directly inside BigQuery
The biggest "oh wow" feature for me this year was AI.FUNCTIONS, meaning BigQuery's family of AI.* SQL functions (AI.CLASSIFY, AI.SCORE, and friends).
In short, you can now call LLMs directly from BigQuery SQL.
That means things like:
- Classifying text
- Extracting structure from messy strings
- Generating summaries
…all without exporting data or standing up another service.
Classifying GA4 landing pages (AI.CLASSIFY)
Let's say we want to automatically classify landing pages by intent (product, blog, checkout, etc.). With the AI.CLASSIFY function we can do that without maintaining a big rules table.
WITH landing_pages AS (
  -- Take the page_location parameter of session_start events as the landing page
  SELECT
    ep.value.string_value AS page_location
  FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`,
    UNNEST(event_params) AS ep
  WHERE event_name = 'session_start'
    AND ep.key = 'page_location'
  LIMIT 100
)
SELECT
  page_location,
  -- Let the model pick one of the listed categories for each URL
  AI.CLASSIFY(
    page_location,
    ['Product', 'Category', 'Blog', 'Checkout', 'Other']) AS page_type,
  COUNT(*) AS sessions
FROM landing_pages
GROUP BY page_location, page_type
ORDER BY sessions DESC


This gives you an AI-generated dimension you can immediately use downstream, for example to chart how sessions are distributed across page types.
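One possible variation on the query above: it classifies every sampled row, so the same URL can be sent to the model more than once. A sketch that aggregates first and classifies each distinct URL once might look like the following. The AI.CLASSIFY call is copied from the query above; the cut-off of 50 pages is an arbitrary assumption to keep the run small.

```sql
-- Sketch: count sessions per landing page first, then classify each URL once.
WITH landing_pages AS (
  SELECT
    ep.value.string_value AS page_location,
    COUNT(*) AS sessions
  FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`,
    UNNEST(event_params) AS ep
  WHERE event_name = 'session_start'
    AND ep.key = 'page_location'
  GROUP BY page_location
  ORDER BY sessions DESC
  LIMIT 50  -- assumption: only the 50 busiest landing pages
),
classified AS (
  SELECT
    page_location,
    sessions,
    AI.CLASSIFY(
      page_location,
      ['Product', 'Category', 'Blog', 'Checkout', 'Other']) AS page_type
  FROM landing_pages
)
-- Roll up to one row per page type: the numbers behind a distribution chart
SELECT
  page_type,
  COUNT(*) AS pages,
  SUM(sessions) AS sessions
FROM classified
GROUP BY page_type
ORDER BY sessions DESC
```

The result is one row per page type, which is the shape you would typically feed into a bar chart.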
What I like here is how normal and easy to use it feels:
- It works inside Dataform
- Incremental models still work
- IAM, billing, and logging all stay in BigQuery
It doesn't feel like "AI glued on the side".
It feels like SQL just got a new function.
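To make the Dataform point a bit more concrete, here is a rough sketch of what an incremental model around AI.CLASSIFY could look like. It is an illustration under assumptions, not a tested production model: `landing_pages` is a hypothetical upstream Dataform table with a `page_location` column, and the idea is simply to send only not-yet-classified URLs to the model on each run.

```sqlx
config {
  type: "incremental",
  uniqueKey: ["page_location"]
}

-- Hypothetical Dataform model (.sqlx): classify each landing page URL once.
-- ${ref("landing_pages")} is an assumed upstream table, not part of the article.
SELECT
  lp.page_location,
  AI.CLASSIFY(
    lp.page_location,
    ['Product', 'Category', 'Blog', 'Checkout', 'Other']) AS page_type
FROM (
  SELECT DISTINCT page_location
  FROM ${ref("landing_pages")}
  WHERE page_location IS NOT NULL
) AS lp
-- On incremental runs, skip URLs that already exist in this table
${when(incremental(), `
WHERE NOT EXISTS (
  SELECT 1
  FROM ${self()} AS done
  WHERE done.page_location = lp.page_location
)`)}
```

On the first run every URL is classified. Later runs only pay for model calls on URLs that have not been seen before.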
Scoring GA4 sessions by engagement quality (AI.SCORE)
Labels are nice, but in analytics you often want a numeric signal you can rank or threshold.
That's where AI.SCORE fits really well.
Instead of defining rigid rules for "engaged sessions", we can let the model give us a continuous engagement score.
WITH session_summary AS (
  -- Aggregated per user_pseudo_id, used here as a rough stand-in for a session
  SELECT
    user_pseudo_id,
    COUNT(*) AS event_count,
    COUNTIF(event_name = 'purchase') AS purchases
  FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
  GROUP BY user_pseudo_id
  LIMIT 100
)
SELECT
  user_pseudo_id,
  event_count,
  purchases,
  AI.SCORE(
    CONCAT(
      -- CONCAT takes STRING arguments, so cast the counts explicitly
      'Events: ', CAST(event_count AS STRING),
      ', Purchases: ', CAST(purchases AS STRING),
      '. How engaged does this session look from 1 to 10?'
    )
  ) AS engagement_score
FROM session_summary
ORDER BY engagement_score DESC


Instead of a binary "engaged / not engaged" flag, you get a ranking signal:
- Higher scores tend to correspond to sessions with more interaction or conversions
- Lower scores tend to look like quick exits or low-intent sessions
This isn't meant to replace GA4's built-in engagement metrics.
It's more of a cheap, fast heuristic you can layer on top of existing logic to get a quick overview of sessions.
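If you want to go from a ranking signal to something you can filter on, one option is to bucket the score into tiers. The sketch below also groups by `ga_session_id` (read from `event_params`, the usual GA4 export pattern) so that each row is an actual session rather than a user. The tier cut-offs of 7 and 4 are made-up values for illustration, not recommendations.

```sql
-- Sketch: score real GA4 sessions, then bucket the scores into tiers.
WITH events AS (
  SELECT
    user_pseudo_id,
    -- ga_session_id lives in event_params as an integer value
    (SELECT ep.value.int_value
     FROM UNNEST(event_params) AS ep
     WHERE ep.key = 'ga_session_id') AS ga_session_id,
    event_name
  FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
),
session_summary AS (
  SELECT
    user_pseudo_id,
    ga_session_id,
    COUNT(*) AS event_count,
    COUNTIF(event_name = 'purchase') AS purchases
  FROM events
  WHERE ga_session_id IS NOT NULL
  GROUP BY user_pseudo_id, ga_session_id
  LIMIT 100
),
scored AS (
  SELECT
    *,
    AI.SCORE(
      CONCAT(
        'Events: ', CAST(event_count AS STRING),
        ', Purchases: ', CAST(purchases AS STRING),
        '. How engaged does this session look from 1 to 10?'
      )
    ) AS engagement_score
  FROM session_summary
)
SELECT
  user_pseudo_id,
  ga_session_id,
  event_count,
  purchases,
  engagement_score,
  CASE
    WHEN engagement_score IS NULL THEN 'unscored'  -- the model returned no score
    WHEN engagement_score >= 7 THEN 'high'         -- assumed cut-off
    WHEN engagement_score >= 4 THEN 'medium'       -- assumed cut-off
    ELSE 'low'
  END AS engagement_tier
FROM scored
ORDER BY engagement_score DESC
```

Because the scores come from a model, it is worth eyeballing a sample of each tier before building anything on top of the cut-offs.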
TimesFM: forecasting GA4 traffic without suffering
Another feature I ended up liking more than expected is TimesFM, Google's foundation model for time-series forecasting.
Using the same GA4 data, we can build a daily sessions table and forecast traffic with very little setup.
SELECT *
FROM AI.FORECAST(
  (
    -- Daily series: distinct users with a session_start, used as a proxy for daily sessions
    SELECT
      PARSE_DATE('%Y%m%d', event_date) AS session_date,
      COUNT(DISTINCT user_pseudo_id) AS sessions
    FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
    WHERE event_name = 'session_start'
    GROUP BY session_date
  ),
  data_col => 'sessions',
  timestamp_col => 'session_date',
  horizon => 30  -- forecast 30 time points (days here) ahead
)


No feature engineering.
No separate training pipeline.
No extra infrastructure to maintain.
Is it perfect? No.
Is it often good enough for basic forecasting, baselines, or anomaly detection? Absolutely.
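As a rough illustration of the anomaly-detection idea, one approach is to hold out the most recent period, forecast it from the earlier data, and flag days where the actual value falls outside the forecast's prediction interval. The sketch below assumes the sample dataset runs through January 2021 and that AI.FORECAST returns columns named `forecast_timestamp`, `forecast_value`, `prediction_interval_lower_bound`, and `prediction_interval_upper_bound`. Check both against the current documentation before relying on it.

```sql
-- Sketch: forecast January 2021 from earlier data, then compare with actuals.
WITH daily AS (
  SELECT
    PARSE_DATE('%Y%m%d', event_date) AS session_date,
    COUNT(DISTINCT user_pseudo_id) AS sessions
  FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
  WHERE event_name = 'session_start'
  GROUP BY session_date
),
forecast AS (
  SELECT *
  FROM AI.FORECAST(
    (
      SELECT
        PARSE_DATE('%Y%m%d', event_date) AS session_date,
        COUNT(DISTINCT user_pseudo_id) AS sessions
      FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
      WHERE event_name = 'session_start'
        AND _TABLE_SUFFIX <= '20201231'  -- assumption: hold out January 2021
      GROUP BY session_date
    ),
    data_col => 'sessions',
    timestamp_col => 'session_date',
    horizon => 31)
)
SELECT
  d.session_date,
  d.sessions AS actual_sessions,
  f.forecast_value,
  f.prediction_interval_lower_bound,
  f.prediction_interval_upper_bound,
  -- Flag days where the actual value falls outside the prediction interval
  d.sessions NOT BETWEEN f.prediction_interval_lower_bound
                     AND f.prediction_interval_upper_bound AS is_anomaly
FROM daily AS d
JOIN forecast AS f
  ON d.session_date = CAST(f.forecast_timestamp AS DATE)
ORDER BY d.session_date
```

With only about two months of history to learn from, the intervals will likely be wide, so treat the flags as a starting point for investigation rather than an alerting rule.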
One GA4 dataset, a lot of capability
Stepping back a bit, what I like about these examples is that they all use the same single public GA4 dataset, with no special setup.
Using just:
bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*
we were able to:
- Enrich GA4 data with AI-generated page classifications
- Create a lightweight engagement quality signal using AI.SCORE
- Forecast sessions using TimesFM
All of this stays inside BigQuery:
- No exporting data
- No extra services to maintain
- No changes to existing IAM or billing models
It feels less like "adding AI to analytics" and more like analytics itself getting more expressive, right inside BigQuery.
If you're already working with GA4 data in BigQuery, these features slot into existing pipelines surprisingly naturally.
Final thoughts
BigQuery in 2025 feels like it's moving beyond "just a data warehouse".
AI inference and forecasting are becoming native, SQL-accessible features, not bolt-ons.
From a data engineering perspective, I really like this direction:
- Fewer moving parts
- Easier governance
- Faster iteration from raw GA4 data to insight
If you already live in BigQuery, these features let you do some pretty powerful things without changing how you work.
Happy holidays, and thanks for reading! 🎄