A GA4-Aligned Analytics Project for Session Scoring, Revenue Opportunity, and Marketing Decision Support
Repository: Predictive-Funnel-Analytics-GA4
Author: Troy Dela Rosa
Tools: Python · pandas · scikit-learn · XGBoost · SHAP · Streamlit
Focus: Ecommerce Analytics · Revenue Prioritization · Conversion Propensity · Demand Signals · Retail Decision Support
Built a GA4-style ecommerce scoring system that ranks sessions by expected revenue, not just conversion probability.
The top 10% of scored sessions converted at 15.33%, roughly 3.0x higher than the baseline conversion rate of 5.16%.
The project shows how behavioral analytics can support marketing prioritization, margin protection, and revenue-focused decision-making.
Ecommerce teams often know how much traffic they receive, but not all traffic deserves the same level of marketing attention.
Some sessions may convert without an incentive. Others may need a targeted nudge. Low-intent sessions can consume budget that could be better used elsewhere.
This project helps answer:
Which sessions should marketing, pricing, and ecommerce teams prioritize without wasting discounts on customers who are likely to buy anyway?
The goal is to connect:
Behavior → Revenue → Business Action
This project developed a two-stage ecommerce scoring framework that ranks sessions by expected revenue.
The model identified a clear concentration of purchase intent in the highest-ranked sessions. While the overall test conversion rate was 5.16%, the top 10% of scored sessions converted at 15.33%, representing roughly 3.0x lift over baseline.
The strongest use case is not simply predicting who will buy. The stronger use case is helping teams decide:
- Which sessions deserve attention
- Which customers should be protected from unnecessary discounting
- Which mid-funnel users may respond to selective incentives
- Which low-intent sessions should be deprioritized
| Metric | Result |
|---|---|
| Baseline conversion rate | 5.16% |
| Top 10% conversion rate | 15.33% |
| Lift over baseline | 3.0x |
| ROC-AUC | 0.80 |
| Calibrated expected revenue | $1.97M |
| Actual test revenue | $1.74M |
| Revenue variance | +13.4% over actual |
The model is strongest as a ranking and prioritization layer.
It translates ecommerce behavior into operational decisions for marketing, pricing, merchandising, and retention teams.
| Metric Area | Business Question |
|---|---|
| Conversion rate | Are sessions turning into purchases? |
| Propensity lift | Can high-intent sessions be identified earlier? |
| Expected revenue | Which sessions are likely to be worth the most? |
| Segment actionability | What should the business do with each scored session? |
| Calibration accuracy | Can scores be trusted for planning and prioritization? |
The model successfully separated high-intent sessions from general traffic.
The overall conversion rate was 5.16%, while the highest-ranked 10% of sessions converted at 15.33%, roughly 3.0x higher than the baseline rate.
This shows that the conversion opportunity was concentrated in a relatively small group of high-ranked sessions rather than evenly distributed across all traffic.
Business meaning:
Marketing teams can prioritize campaign spend, retargeting, and recovery actions toward sessions with stronger buying signals, while avoiding unnecessary incentives for lower-probability traffic.
A session with high purchase probability is not always the best revenue opportunity.
This project separates:
- Session behavior, which signals purchase intent
- Customer history, which signals wallet potential
Then combines both signals:
Expected Revenue = Probability of Conversion x Predicted Spend
Business meaning:
The decision shifts from “Who is likely to buy?” to “Which session is worth prioritizing?”
High-certainty users may already be likely to convert, so aggressive discounting can reduce margin. Low-interest users may not be worth conversion-focused spend.
The strongest opportunity sits in the middle, where users show enough intent to be worth engaging but may still need a nudge.
Business meaning:
Mid-propensity users are often the best audience for cart recovery, remarketing, limited-time incentives, or personalized offers.
The initial model was useful for ranking sessions, but its raw probabilities were too high for financial planning.
After probability calibration, expected revenue became more realistic:
| Revenue Measure | Amount |
|---|---|
| Calibrated expected revenue | $1.97M |
| Actual test revenue | $1.74M |
| Variance | +$233K |
| Percent variance | +13.4% |
Business meaning:
If scores are used for planning or prioritization, calibration helps prevent overconfident revenue estimates.
High-certainty sessions may already be likely to convert, making aggressive discounting potentially margin-destructive.
Recommended action:
Suppress unnecessary promotions and prioritize non-price messaging such as urgency, reassurance, or product reminders.
The At-Risk segment shows buying signals but may still need a reason to complete the purchase.
Recommended action:
Use this group for cart recovery, remarketing, selective incentives, or personalized product messaging.
Conversion probability identifies likely buyers. Expected revenue provides a stronger business ranking.
Recommended action:
Prioritize sessions using expected revenue when budget, incentive cost, or campaign capacity is limited.
Expected revenue should be paired with product margin, discount cost, and campaign cost.
Recommended action:
Move from expected revenue to expected profit or incremental margin before production deployment.
The model identifies where action may be valuable, but it does not prove incremental lift from intervention.
Recommended action:
Use holdout testing to measure whether targeted actions increase conversion, revenue, or margin.
Each scored session is mapped to a recommended action.
| Segment | Meaning | Recommended Action | Primary Stakeholder |
|---|---|---|---|
| High Certainty | Strong likelihood to convert | Protect margin and avoid unnecessary discounts | Marketing, Pricing |
| At-Risk | Persuadable session with meaningful opportunity | Use selective incentives or recovery messaging | Marketing, CRM |
| Monitor | Some signal, but not enough for immediate action | Wait for stronger behavior before spending | Digital Analytics |
| Low Interest | Weak conversion signal | Reduce conversion-focused spend | Paid Media, Growth |
This project combines business analysis, data preparation, machine learning, validation, and deployment.
Key work completed:
- Cleaned and structured event, customer, transaction, product, and campaign data
- Aggregated ecommerce events into session-level records
- Engineered behavioral, customer-history, traffic-source, and funnel-stage features
- Built and compared Logistic Regression, Random Forest, and XGBoost models
- Selected XGBoost based on ranking performance and buyer identification
- Separated purchase intent from customer wallet potential
- Created an expected revenue scoring framework
- Performed leakage checks, train / validation / test splitting, calibration, and decile lift analysis
- Used SHAP explainability to interpret model drivers
- Translated model outputs into business segments and recommended actions
- Developed a Streamlit app for session scoring and business exploration
The first stage estimates whether a session is likely to convert.
- Model: XGBoost classifier
- Output: Probability of purchase
- Purpose: Rank sessions by purchase intent
The second stage estimates how much the customer may spend if they convert.
- Uses customer history instead of only session behavior
- Estimates expected basket value
- Separates purchase intent from wallet potential
Each session receives:
- Conversion probability
- Predicted spend
- Expected revenue
- Business segment
- Recommended action
Raw Data
↓
Event Aggregation
↓
Feature Engineering
↓
Conversion Model
↓
Spend Estimation
↓
Expected Revenue
↓
Business Segmentation
↓
Recommended Action
↓
Streamlit App
Suggested visual addition: convert this workflow into a simple architecture diagram and save it as
visualizations/project_workflow.png.
| Stakeholder | How This Project Helps |
|---|---|
| Marketing | Prioritize retargeting and cart recovery audiences |
| Pricing | Avoid unnecessary discounting for high-certainty buyers |
| Merchandising | Detect product or category demand signals from session behavior |
| CRM | Identify users who may respond to personalized nudges |
| Ecommerce Analytics | Build a repeatable scoring layer for session-level decision support |
| Operations Planning | Use demand signals to support inventory and replenishment awareness |
A Streamlit app was built to make the scoring system easier to explore.
The app allows users to:
- Load sample session data
- Score sessions by conversion probability
- Estimate expected revenue
- Assign business segments
- Review recommended actions
- Download scored output for further analysis
You can validate this project through either the scored sample file or the interactive Streamlit app.
Open:
data/processed/scored_sample_sessions.csv
Sort by:
expected_revenue
Compare top-ranked sessions against bottom-ranked sessions.
Look for:
- Higher conversion concentration in top-ranked sessions
- Revenue concentrated in higher-priority segments
- Clear mapping from model score to recommended business action
- Difference between purchase intent and predicted spend
- Impact of calibration on expected revenue planning
python -m streamlit run app/streamlit_app.pyThe app opens at:
http://localhost:8501
Then:
- Click Load Sample Data
- View session scoring, segmentation, and demand signals
- Compare expected revenue by segment
- Download the scored output
This project is built around a GA4-style ecommerce workflow.
| Project Concept | GA4-Style Equivalent |
|---|---|
| Event-level user behavior | GA4 event data |
| Session aggregation | Session-level analytics table |
| Customer/session identifier | User and session keys |
| Traffic and campaign signals | Source, medium, campaign fields |
| Purchase revenue | Ecommerce purchase revenue |
| Session scoring | Downstream modeling / activation layer |
The dataset is synthetic, but its structure is intended to mirror a workflow that could be adapted to GA4 BigQuery export data after validation.
- Train / validation / test split
- Leakage detection and removal
- Model comparison: Logistic Regression, Random Forest, XGBoost
- Probability calibration using Platt scaling
- Decile lift analysis
- SHAP explainability
- Revenue reconciliation
The Streamlit app expects the same feature set used during model training.
Required features are stored in:
models/feature_names.joblib
The included sample file is already formatted correctly:
data/processed/scored_sample_sessions.csv
If an uploaded file is missing required columns, the app will stop and show which features are missing.
Predictive-Funnel-Analytics-GA4/
│
├── app/
│ └── streamlit_app.py
│
├── models/
│ ├── pfa_ga4_propensity_model.joblib
│ ├── feature_names.joblib
│ └── customer_spend_lookup.csv
│
├── data/
│ └── processed/
│ └── scored_sample_sessions.csv
│
├── notebooks/
│ ├── 01_modeling_pfa_ga4.ipynb
│ ├── 02_deployment_prep.ipynb
│ └── predictive-funnel-analytics-GA4-Stakeholder-Report.ipynb
│
├── visualizations/
│ ├── header.png
│ ├── streamlit_app_demo.png
│ ├── propensity_decile.png
│ └── revenue_opportunity.png
│
├── README.md
├── requirements.txt
└── .gitignore
Clone the repository:
git clone https://github.com/rynetroy/Predictive-Funnel-Analytics-GA4.git
cd Predictive-Funnel-Analytics-GA4Install the required libraries:
pip install -r requirements.txtRun the app:
python -m streamlit run app/streamlit_app.py- This project uses a synthetic GA4-style ecommerce dataset
- It is designed as an analytics prototype, not a production-ready system
- This is a ranking and prioritization system
- Measuring incremental lift from interventions would require A/B testing
- Expected revenue should be extended to expected profit before production use
Before real deployment, this system would require:
- GA4 BigQuery export validation
- Identity stitching across devices
- Event quality validation
- Attribution logic
- Campaign cost integration
- Product margin and discount-cost logic
- Model drift monitoring
- Automated scoring pipeline
- Experimentation framework for measuring incrementality
Most junior analytics portfolios stop at charts, notebooks, or model scores.
This project is different because it translates analytics into operational decisions.
Most ecommerce models ask:
“Will this customer buy?”
This project asks:
“What is this session worth, and what should the business do about it?”
By combining conversion probability with spend potential, the project turns behavioral data into a practical decision system for revenue prioritization, marketing efficiency, and margin-aware targeting.
Clicks signal intent. History signals wallet.



