Nugget Casino: Daily Coin-In Modeling

End-to-end pipeline: exploratory analysis, spatial clustering, feature engineering, model training (RF / XGBoost / LightGBM), SHAP explainability, and optimization.

Model: Random Forest
Best R²: 0.9342
Typical error: ~$15 on ~$300/day

1) Exploratory Analysis: Daily Coin-In Heatmaps

I started by visualizing daily coin-in at the machine level to understand spatial and behavioral patterns across the floor. The heatmaps helped confirm that where a machine is placed carries strong signal for its daily coin-in.

Figure: Heat map of daily coin-in (customer dollars inserted per machine per day) by machine location.

2) Choosing K: Elbow Method

To discretize the floor into spatial regions, I used the elbow method to choose the number of clusters. I selected K = 13, the point where adding another cluster stopped producing a meaningful drop in inertia (the within-cluster sum of squared distances).

Figure: Elbow method plot for K selection; K = 13 was selected where the slope levels off.
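The elbow search can be sketched roughly as follows. This is a minimal illustration on synthetic coordinates, not the project's actual code: the `inertia_curve` helper, the K range of 2 to 20, and the random floor coordinates are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia_curve(coords, k_values, seed=0):
    """Fit K-means once per K and return the inertia for each fit."""
    inertias = []
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed)
        km.fit(coords)
        inertias.append(km.inertia_)  # within-cluster sum of squared distances
    return np.array(inertias)


# Synthetic stand-in for machine (x, y) floor coordinates (assumption).
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(300, 2))

k_values = list(range(2, 21))
inertias = inertia_curve(coords, k_values)

# Marginal gain: how much inertia drops when one more cluster is added.
gains = -np.diff(inertias)
for k, gain in zip(k_values[1:], gains):
    print(f"K={k:2d}  inertia drop vs. previous K: {gain:10.1f}")
```

Reading the printed drops (or plotting `inertias` against `k_values`) shows where extra clusters stop paying off. The elbow is a visual judgment call, so the chosen K depends on the real floor data.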

3) Spatial Clustering: K-Means on Machine Locations

I ran K-means on machine coordinates to assign each machine a spatial cluster. These clusters became categorical features (and helped define proximity features) used in the supervised model.

Figure: Casino floor layout color-coded by K-means cluster assignment.
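A rough sketch of the cluster assignment step is shown below. The column names (`machine_id`, `x`, `y`), the synthetic coordinates, and the distance-to-centroid proximity proxy are illustrative assumptions; the write-up does not specify how the proximity features were actually defined.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic machine positions; column names are hypothetical.
rng = np.random.default_rng(1)
machines = pd.DataFrame({
    "machine_id": np.arange(200),
    "x": rng.uniform(0, 100, 200),
    "y": rng.uniform(0, 60, 200),
})

# Assign each machine to one of K = 13 spatial clusters.
kmeans = KMeans(n_clusters=13, n_init=10, random_state=0)
machines["cluster_id"] = kmeans.fit_predict(machines[["x", "y"]])

# One possible proximity proxy (assumption): distance to the machine's
# own cluster centroid.
centroids = kmeans.cluster_centers_[machines["cluster_id"].to_numpy()]
xy = machines[["x", "y"]].to_numpy()
machines["dist_to_centroid"] = np.linalg.norm(xy - centroids, axis=1)

# Cluster membership as one-hot categorical features for the model.
cluster_dummies = pd.get_dummies(machines["cluster_id"], prefix="cluster")
print(machines.head())
```

Because both coordinates are in the same floor units here, no scaling is applied before clustering; that would need revisiting if the axes were measured differently.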

4) Feature Engineering

Temporal: day-of-week, weekend flag, holiday flag.
Spatial: near door, near restroom, bar area, cluster id, proximity proxies.
Machine: one-hot encoding for game type / cabinet / denomination categories.
Leakage-aware “machine strength”: a per-machine historical-performance feature, engineered so that the target day's own coin-in does not leak into it.
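The temporal flags, one-hot encoding, and a leakage-aware strength feature could look something like the sketch below. The toy data, the holiday list, the game-type labels, and the choice of an expanding mean over strictly earlier days are all assumptions for illustration; the original feature may have been built differently (for example, with a rolling window).

```python
import numpy as np
import pandas as pd

# Toy data: 3 machines x 14 days of daily coin-in (synthetic).
dates = pd.date_range("2024-01-01", periods=14, freq="D")
rng = np.random.default_rng(2)
df = pd.DataFrame(
    [(m, d, float(rng.gamma(2.0, 150.0))) for m in range(3) for d in dates],
    columns=["machine_id", "date", "coin_in"],
)

# Temporal features.
df["day_of_week"] = df["date"].dt.dayofweek          # Monday = 0
df["is_weekend"] = (df["day_of_week"] >= 5).astype(int)
holidays = pd.to_datetime(["2024-01-01"])            # hypothetical holiday list
df["is_holiday"] = df["date"].isin(holidays).astype(int)

# Machine features: hypothetical game types, one-hot encoded below.
game_types = {0: "video_slot", 1: "reel", 2: "video_poker"}
df["game_type"] = df["machine_id"].map(game_types)

# Leakage-aware strength: mean coin-in over strictly earlier days.
# shift(1) drops the current day before the expanding mean is taken,
# so the target value never feeds its own feature.
df = df.sort_values(["machine_id", "date"]).reset_index(drop=True)
df["machine_strength"] = df.groupby("machine_id")["coin_in"].transform(
    lambda s: s.shift(1).expanding().mean()
)

features = pd.get_dummies(df, columns=["game_type"], prefix="game")
print(features.head())
```

The first day for each machine has no history, so `machine_strength` is `NaN` there; those rows would need to be dropped or imputed before training.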

I used SHAP to see which features drove predictions overall, then looked separately at the most important spatial and temporal drivers.

Figure: Overall feature importance (SHAP).

Figure: Spatial feature importance (SHAP).

Figure: Temporal feature importance (SHAP).
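The overall, spatial, and temporal rankings can be derived from a single matrix of SHAP values, as in the sketch below. It assumes the matrix has already been computed (for tree models, something like `shap.TreeExplainer(model).shap_values(X)`); here a random placeholder stands in for it, and the feature names, grouping, and `rank_features` helper are hypothetical.

```python
import numpy as np

# Hypothetical feature names and groupings.
feature_names = [
    "day_of_week", "is_weekend", "is_holiday",
    "near_door", "near_restroom", "bar_area", "cluster_id",
    "machine_strength",
]
feature_groups = {
    "temporal": ["day_of_week", "is_weekend", "is_holiday"],
    "spatial": ["near_door", "near_restroom", "bar_area", "cluster_id"],
}

# Placeholder SHAP matrix of shape (n_samples, n_features). Real values
# would come from an explainer run on the trained model.
rng = np.random.default_rng(3)
scales = np.linspace(1.0, 8.0, len(feature_names))
shap_values = rng.normal(size=(500, len(feature_names))) * scales


def rank_features(shap_values, feature_names, keep=None):
    """Rank features by mean absolute SHAP value, optionally within a group."""
    importance = np.abs(shap_values).mean(axis=0)
    ranked = sorted(zip(feature_names, importance), key=lambda p: p[1], reverse=True)
    if keep is not None:
        ranked = [(name, value) for name, value in ranked if name in keep]
    return [(name, float(value)) for name, value in ranked]


overall = rank_features(shap_values, feature_names)
spatial = rank_features(shap_values, feature_names, keep=feature_groups["spatial"])
temporal = rank_features(shap_values, feature_names, keep=feature_groups["temporal"])

for name, value in overall:
    print(f"{name:18s} {value:6.2f}")
```

Mean absolute SHAP value is a common global importance summary. The group views simply filter the same ranking, so values stay comparable across the three plots.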

5) Model Training: RF vs XGBoost vs LightGBM

I trained and evaluated Random Forest, XGBoost, and LightGBM. The initial comparison plot is “zoomed out” because it shows the entire target range across all machines and days.

Figure: Model comparison across RF, XGBoost, and LightGBM over the full target range (zoomed out).
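A comparison harness along these lines is one way to evaluate the three models on the same split. The synthetic data, the random 75/25 split, and the hyperparameters are assumptions, not the settings used in the project; XGBoost and LightGBM are imported only if installed, so the sketch still runs with scikit-learn alone.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the engineered features.
rng = np.random.default_rng(4)
X = rng.normal(size=(600, 6))
y = (300 + 80 * X[:, 0] - 40 * X[:, 1] + 20 * X[:, 2] * X[:, 3]
     + rng.normal(scale=15, size=600))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {"RandomForest": RandomForestRegressor(n_estimators=200, random_state=0)}

# Optional dependencies: included only when the packages are available.
try:
    from xgboost import XGBRegressor
    models["XGBoost"] = XGBRegressor(n_estimators=200, random_state=0)
except ImportError:
    pass

try:
    from lightgbm import LGBMRegressor
    models["LightGBM"] = LGBMRegressor(n_estimators=200, random_state=0)
except ImportError:
    pass

# Fit every model on the same training set and score on the same test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, score in sorted(scores.items(), key=lambda p: p[1], reverse=True):
    print(f"{name:12s} R^2 = {score:.4f}")
```

With daily data and a history-based feature, a split ordered by date may give a more honest estimate than a random split. Which one the project used is not stated.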

6) Random Forest: Best Performance (R² = 0.9342)

The Random Forest performed best, with R² = 0.9342. In the most common daily coin-in range, predictions were typically within about $15 on an average of roughly $300 per day (about 5%), which was accurate enough to support layout optimization experiments.

Figure: Zoomed-in Random Forest results showing a tight fit in the most common daily coin-in range.
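The two headline numbers can be computed as sketched below. How "typical error" was measured is not stated, so this example assumes mean absolute error restricted to a chosen coin-in range; the helper names and the toy values are illustrative only.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def typical_error_in_range(y_true, y_pred, lo, hi):
    """Mean absolute error over rows whose true value lies in [lo, hi]."""
    mask = (y_true >= lo) & (y_true <= hi)
    return float(np.mean(np.abs(y_true[mask] - y_pred[mask])))


# Toy values (not project data).
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 320.0, 380.0])

print(r_squared(y_true, y_pred))                         # 0.98
print(typical_error_in_range(y_true, y_pred, 0, 500))    # 15.0
print(typical_error_in_range(y_true, y_pred, 250, 450))  # 20.0
```

Restricting the error to the common range matters because a handful of very high coin-in days can dominate an average taken over the full target range.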
