Propensity Score for Customers
Propensity modeling framework to predict customer likelihood to purchase or churn.
Propensity Modeling Framework: Predicting Customer Actions
Executive Summary
Marketing efficiency hinges on targeting the right customer at the right time. This project developed a suite of propensity models to predict the likelihood of key customer actions: purchasing (Conversion Propensity) and churning (Churn Propensity). By integrating these scores into our CRM and ad platforms, we achieved a 22% increase in campaign ROI and a 15% reduction in churn rate for high-risk segments.
Problem Statement
Our marketing campaigns were previously broad and unoptimized, often targeting users who were unlikely to buy (wasting ad spend) or ignoring high-risk customers who were about to leave. We needed a data-driven way to score every customer daily on their probability to take specific actions in the next 7-30 days.
Methodology
1. Data Pipeline & Feature Engineering
- Behavioral Data: Aggregated clickstream events (page views, cart adds, search queries) from Google Analytics/Segment.
- Transactional Data: Purchase history, Average Order Value (AOV), recency of purchase.
- Demographic Data: Age, location, device type.
- Engineered Features: days_since_last_active, session_duration_avg, cart_abandonment_rate, category_affinity_score.
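As a rough illustration of how three of these features could be derived, here is a minimal sketch for a single customer. The Session shape and the engineer_features helper are assumptions made for this example, not the project's actual schema or pipeline code, and category_affinity_score is left out because its definition is not given here.

```python
import math
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class Session:
    """One browsing session (hypothetical shape, for illustration only)."""
    started_at: datetime
    duration_seconds: float
    added_to_cart: bool
    purchased: bool


def engineer_features(sessions: list[Session], as_of: date) -> dict[str, float]:
    """Compute three engineered features for one customer as of a scoring date.

    Only sessions that started before `as_of` are used, so the features
    never include data from the prediction window.
    """
    history = [s for s in sessions if s.started_at.date() < as_of]
    if not history:
        # NaN marks "no history"; XGBoost treats NaN as a missing value by default.
        return {
            "days_since_last_active": math.nan,
            "session_duration_avg": math.nan,
            "cart_abandonment_rate": math.nan,
        }

    last_active = max(s.started_at for s in history).date()
    cart_sessions = [s for s in history if s.added_to_cart]
    abandoned = [s for s in cart_sessions if not s.purchased]

    return {
        "days_since_last_active": float((as_of - last_active).days),
        "session_duration_avg": sum(s.duration_seconds for s in history) / len(history),
        # Share of sessions with a cart add that ended without a purchase.
        "cart_abandonment_rate": len(abandoned) / len(cart_sessions) if cart_sessions else 0.0,
    }
```

Filtering on the scoring date before aggregating is the important design choice: it keeps the features consistent between training and the daily scoring run.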
2. Modeling Approach
- Algorithm: XGBoost (Extreme Gradient Boosting) was chosen for its performance on tabular data and interpretability (via SHAP values).
- Target Variable: Binary classification (1 = Event occurred in window, 0 = Did not).
- Handling Imbalance: Utilized SMOTE (Synthetic Minority Over-sampling Technique) and the scale_pos_weight parameter to handle the class imbalance (conversion rates are typically low, <5%).
- Validation: Time-series cross-validation (training on past months, validating on future months) to prevent data leakage.
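The imbalance weighting and the time-ordered validation described above can be sketched in a few lines. This is an illustrative sketch, not the project's training code: the function names, the "YYYY-MM" month labels, and the min_train_months default are assumptions. The negative-to-positive ratio is the commonly recommended starting value for XGBoost's scale_pos_weight.

```python
from collections import Counter
from typing import Iterator


def scale_pos_weight(labels: list[int]) -> float:
    """Ratio of negative to positive labels in the training data."""
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("no positive examples in the training labels")
    return counts[0] / counts[1]


def expanding_window_splits(
    months: list[str], min_train_months: int = 3
) -> Iterator[tuple[list[str], str]]:
    """Yield (training_months, validation_month) pairs in chronological order.

    Each fold trains on every month before a cutoff and validates on the
    month right after it, so no future data reaches the training set.
    Months are assumed to be "YYYY-MM" strings, which sort chronologically.
    """
    ordered = sorted(set(months))
    for i in range(min_train_months, len(ordered)):
        yield ordered[:i], ordered[i]


if __name__ == "__main__":
    # A 5% conversion rate gives a weight of 19 for the positive class.
    print(scale_pos_weight([1] * 5 + [0] * 95))
    for train, valid in expanding_window_splits(
        ["2024-01", "2024-02", "2024-03", "2024-04", "2024-05"]
    ):
        print(train, "->", valid)
```

If SMOTE is used as well, it should be applied only to the training months of each fold, and the weight should be recomputed on the resampled labels so the minority class is not corrected for twice.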
3. Model Explainability
- SHAP (SHapley Additive exPlanations): Used to explain why a specific customer had a high score, e.g., “High likelihood to churn because days_since_last_order > 90 and support_ticket_sentiment is negative.”
- These insights were pushed to the CRM for agents to see.
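To show how per-customer SHAP values might be turned into a short explanation for the CRM, here is a minimal sketch. It assumes the SHAP values have already been computed elsewhere and arrive as a feature-to-contribution mapping; the helper names and the example numbers are hypothetical. For tree models, SHAP values are typically expressed on the model's raw log-odds scale.

```python
def top_contributions(
    shap_values: dict[str, float], k: int = 5
) -> list[tuple[str, float]]:
    """Return the k features with the largest absolute contribution."""
    ranked = sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


def describe(contributions: list[tuple[str, float]]) -> str:
    """Format contributions as signed values; + raises the score, - lowers it."""
    return ", ".join(f"{name} ({value:+.2f})" for name, value in contributions)


if __name__ == "__main__":
    # Made-up contributions for one customer's churn score.
    example = {
        "days_since_last_order": 0.80,
        "support_ticket_sentiment": 0.50,
        "session_duration_avg": -0.30,
        "age": 0.05,
    }
    print(describe(top_contributions(example, k=3)))
```

Ranking by absolute value keeps strongly negative drivers visible alongside positive ones, so a reader sees both what raised and what lowered the score.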
Implementation Details
The system runs as a daily batch job on Databricks/Spark.
- ETL: Nightly job aggregates data from the Data Lake (S3/Delta Lake).
- Inference: The trained XGBoost model scores all 5M+ active users.
- Activation:
- High Propensity (Purchase) -> Pushed to Facebook/Google Ads as “High Intent Audience”.
- High Propensity (Churn) -> Pushed to Salesforce/Email tool for a “We Miss You” retention campaign.
- Low Propensity -> Excluded from expensive campaigns to save budget.
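The activation rules above amount to a small routing function. The sketch below is illustrative only: the 0.7 and 0.1 cutoffs and the audience names are assumptions, and it reads "Low Propensity" as low purchase propensity, which the write-up does not state explicitly.

```python
def route_customer(
    purchase_score: float,
    churn_score: float,
    high: float = 0.7,  # hypothetical cutoff for "high propensity"
    low: float = 0.1,   # hypothetical cutoff for "low propensity"
) -> list[str]:
    """Map one customer's scores to the audiences they should be pushed to."""
    audiences = []
    if purchase_score >= high:
        audiences.append("high_intent_ads")          # Facebook/Google Ads audience
    if churn_score >= high:
        audiences.append("retention_email")          # "We Miss You" campaign
    if purchase_score < low:
        audiences.append("suppress_paid_campaigns")  # excluded to save budget
    return audiences


if __name__ == "__main__":
    print(route_customer(purchase_score=0.85, churn_score=0.20))
    print(route_customer(purchase_score=0.05, churn_score=0.90))
```

In practice the cutoffs would more likely come from score percentiles or campaign budget than from fixed probabilities.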
Challenges & Solutions
- Challenge: “Why did my score drop?” Stakeholders needed transparency.
- Solution: Built a Streamlit dashboard allowing marketers to input a Customer ID and see the top 5 features contributing to their score (positive/negative).
- Challenge: Model drift over time (e.g., during Black Friday).
- Solution: Implemented automated retraining pipelines using Airflow that trigger if model performance metrics (AUC-ROC) drop below a threshold on the previous day’s data.
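The retraining trigger boils down to computing AUC-ROC on recently labeled data and comparing it to a threshold. The sketch below shows that check in plain Python; the 0.75 threshold and the function names are assumptions, and the Airflow wiring is omitted. The pairwise AUC here is quadratic in the number of rows and is meant only to make the definition explicit; a library routine such as scikit-learn's roc_auc_score would be the usual choice at scale.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def should_retrain(
    labels: list[int], scores: list[float], threshold: float = 0.75
) -> bool:
    """Return True when AUC on the monitoring data falls below the threshold."""
    return roc_auc(labels, scores) < threshold


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.1, 0.4, 0.6]
    print(roc_auc(labels, scores), should_retrain(labels, scores, threshold=0.8))
```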
Results and Impact
- Conversion Rate: Email campaigns targeting the top 20% of customers by propensity score (the top two deciles) saw a 3x higher open rate and 2x higher conversion rate.
- Ad Spend Efficiency: Reduced Cost Per Acquisition (CPA) by 18% by suppressing ads to low-propensity users.
- Retention: The proactive churn prevention campaign saved ~1,200 high-value customers per month.
Future Work
- Uplift Modeling: Instead of just predicting who will buy, predicting who is persuadable (i.e., would only buy if we show them an ad).
- Real-time Scoring: Moving from batch to real-time scoring using Feature Stores (Feast) to react to user actions within seconds.
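To make the uplift idea above concrete: for a group of customers, uplift is the difference between the conversion rate of those who saw the ad and those who did not. The sketch below assumes a randomized holdout (control) group exists, which this write-up does not describe; the function names and data are hypothetical.

```python
def conversion_rate(outcomes: list[int]) -> float:
    """Share of customers who converted (1 = bought, 0 = did not)."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(outcomes) / len(outcomes)


def estimated_uplift(treated: list[int], control: list[int]) -> float:
    """Conversion rate with the ad minus conversion rate without it.

    Only meaningful when customers were assigned to the two groups at random.
    """
    return conversion_rate(treated) - conversion_rate(control)


if __name__ == "__main__":
    # A segment that converts at 50% with the ad and 25% without it.
    print(estimated_uplift(treated=[1, 1, 0, 0], control=[1, 0, 0, 0]))
```

A segment with a high propensity score but near-zero uplift would buy anyway, which is why uplift, rather than propensity alone, identifies the persuadable customers.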