The Broker’s Compass - An Explainable Quant Assistant for Real-Time Markets
From raw OHLCV to engineered features, short-horizon forecasts, supervised ML signals, constrained portfolio weights, an interactive Reinforcement Learning sandbox, and stress-tested trading recommendations, all rendered in a clean Gradio UI.
ML
Amaan Vora
9/16/2025 · 17 min read
The Challenge of Market Decisions
Imagine you open your terminal on a Monday with five tickers on your watchlist: a mega-cap tech name grinding upward, a cyclical showing early strength, a defensive-sector ETF, a semiconductor darling mid-breakout, and a small-cap with noisy earnings. You’ve got a fixed budget, a one-month window, and a risk limit you actually care about.
What do you buy?
How much of each?
When do you add, trim, or do nothing?
And crucially, how do you justify those choices with numbers rather than vibes?
This looks simple until you try to formalize it. Markets are not tidy lab environments. Returns are non-stationary (yesterday’s distribution is not today’s), volatility clusters (quiet days arrive together, so do storms), and correlations shift exactly when you’d rather they didn’t. Signals that appear clean in a spreadsheet degrade when faced with slippage, gaps, transaction costs, and path dependency. A strategy that “wins on average” can still lose in the only path you actually experience.
Even basic questions resist single-indicator answers. A rising 20-day moving average might mean trend, or it might be the last breath before mean reversion. An overbought RSI can resolve by going sideways while time catches up. A strong backtest can be nothing more than a leak in your pipeline: look-ahead bias, target leakage from improperly aligned features, or normalization done across the entire series instead of on a rolling basis. The hard part isn’t getting a prediction, it’s building a pipeline where predictions survive contact with the market.
Then there’s portfolio context. Choosing “the best stock” is less important than choosing a set of positions that play well together under uncertainty. A portfolio’s risk comes as much from co-movement as from any single name. Two individually attractive bets can become one highly concentrated macro exposure once the regime shifts. Sizing, not selection, often determines whether a good idea feels like skill or looks like ruin.
Time complicates everything further. Most tools judge decisions only at entry; real portfolios live across exits, partial fills, stops, and discretion. The same signal means different things in a bull leg than in a chop regime. What helped last month can hurt this one. Any system that pretends otherwise is telling a comforting story, not managing money.
So the challenge we set for this project is plain to state and hard to execute: ingest data reliably, extract stable features, forecast prudently, learn patterns without fooling ourselves, allocate capital as a portfolio problem, simulate decisions as they would actually unfold, pressure-test the result, and explain every step in English and in numbers. If a recommendation changes, the system should show you why. If a weight is high or low, the math should make that obvious. That’s the bar.
Beyond One-Indicator Thinking
Most retail tools stop at “RSI crossed 70” or “price above the 20-day.” Useful as anecdotes, brittle as systems. Markets express structure along multiple axes at once (trend, reversion, volatility regime, participation, correlation) and the signal you trust today can flip regimes tomorrow.
Concretely, every symbol arrives as a clean daily series (Open/High/Low/Close/Volume, returns) and is transformed into a small but durable feature set. Momentum is captured with rate-of-change over short and medium windows (ROC_5, ROC_20) and with MACD (fast/slow EMA plus signal and histogram) so the model can distinguish impulse from drift. Mean-reversion pressure is proxied with RSI and Bollinger position (where price sits inside its volatility envelope) so “overbought” can be separated from “trend that isn’t done.” Volatility comes in at two horizons (Volatility_5, Volatility_20), because what matters is not just how much a name moves, but how that movement changes across recent memory. Participation is represented by Volume_Ratio (current volume over its rolling baseline), letting the system weigh whether a breakout has actual sponsorship. Market context enters through Market_Correlation and Market_Regime: each series learns not in isolation but relative to the broad tape, so a good idiosyncratic signal isn’t confused with a synchronized macro push.
The mechanics are deliberately strict. All rolling statistics are computed causally (today uses only today and earlier), targets are shifted forward (no peeking at the close you’re trying to predict), and missing values are handled by forward/backward fills and, only as a last resort, local means—never by global normalization that would smear future information backward. Features are aligned to the same index and pruned if a column is entirely missing for a name, so the downstream models see a coherent matrix rather than a hope. Nothing fancy, just the kind of hygiene that prevents “great results” from being artifacts.
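A minimal sketch of that hygiene, assuming standard OHLCV column names and the window defaults from the appendix (the helper name is illustrative):

```python
import pandas as pd

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    feats = pd.DataFrame(index=df.index)
    # Causal rolling statistics: each row uses only today and earlier.
    feats["ROC_5"] = df["Close"].pct_change(5)
    feats["ROC_20"] = df["Close"].pct_change(20)
    feats["Volatility_20"] = df["Close"].pct_change().rolling(20).std()
    feats["Volume_Ratio"] = df["Volume"] / df["Volume"].rolling(20).mean()
    # Conservative imputation: forward fill, then back fill; never a
    # global normalization that would smear future information backward.
    feats = feats.ffill().bfill()
    # Prune any column that is entirely missing for this symbol.
    feats = feats.dropna(axis=1, how="all")
    # Target attached after imputation and shifted forward: features at
    # time t predict the close at t+1, so there is no peeking.
    feats["Target"] = df["Close"].shift(-1)
    return feats.dropna(subset=["Target"])
```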
The payoff isn’t that any single indicator becomes magical; it’s that the composition becomes stable. Together, these small pieces let later stages (forecasting, ML classification, portfolio sizing) reason about direction, confidence, and risk in the same breath.
Forecasts That Don’t Pretend
Short-horizon price forecasting is less about prophecy and more about structure: what’s been happening recently, how strong that behavior is, and how noisy the path tends to be. The forecaster in this system is deliberately modest. It doesn’t try to out-oracle the market; it tries to describe the next few steps in a way that respects both signal strength and volatility.
The first pass is a trend line that earns the right to matter. For each symbol, a linear regression is fit to the last ~30 closes, and the slope is damped by R². A crisp uptrend (high R²) carries forward almost undiluted; a wobbly line (low R²) gets scaled down toward neutrality. On top of that, a small, volatility-scaled random walk is added so the path doesn’t pretend to be perfectly smooth.
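In code, the idea is small. A sketch under the appendix defaults (30-bar fit, noise scale 0.1); the function name and seed handling are illustrative:

```python
import numpy as np

def trend_forecast(closes: np.ndarray, horizon: int = 5, seed: int = 42) -> np.ndarray:
    t = np.arange(len(closes))
    slope, intercept = np.polyfit(t, closes, 1)
    fitted = slope * t + intercept
    # R² measures how much of the variance the line actually explains.
    ss_res = float(np.sum((closes - fitted) ** 2))
    ss_tot = float(np.sum((closes - closes.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    # Damp the slope by R²: a wobbly fit gets pulled toward neutrality.
    damped = slope * max(r2, 0.0)
    steps = np.arange(1, horizon + 1)
    path = closes[-1] + damped * steps
    # Volatility-scaled random-walk noise (sigma * sqrt(h) * 0.1, per the
    # appendix) so the path doesn't pretend to be perfectly smooth.
    sigma = float(np.std(np.diff(closes) / closes[:-1]))
    rng = np.random.default_rng(seed)
    return path * (1.0 + rng.normal(0.0, 1.0, horizon) * sigma * np.sqrt(steps) * 0.1)
```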
Because not all markets are trend markets, the second lens is mean reversion. A long-window average (typically ~60 days where available) acts as gravity. If the name has stretched too far from its base without sponsorship, this model pulls the projection inward; if it’s been suppressed, it lets the rubber band snap back a little.
For stability, the forecaster also keeps a double exponential smoothing (Holt) view: one component tracks the level, another tracks the trend, both updated causally with conservative smoothing coefficients. It’s a workhorse method that often outperforms fancier models when data are short and regimes change. Alongside it sits a moving-average momentum view: the short MA versus the medium MA becomes a damped growth factor, so a persistent but small edge accumulates gently rather than compounding recklessly. Where the history is rich enough and the environment allows, an auto-ARIMA fit is attempted with tight bounds and error-tolerant settings.
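To make the smoothing concrete, here is a compact Holt sketch with the appendix coefficients (α=0.3, β=0.1):

```python
import numpy as np

def holt_forecast(closes, horizon=5, alpha=0.3, beta=0.1):
    # Initialize level at the first price and trend at the first step.
    level, trend = closes[0], closes[1] - closes[0]
    for price in closes[1:]:
        prev_level = level
        # Level tracks where the series is; trend tracks where it's headed.
        level = alpha * price + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Project forward: current level plus h steps of the current trend.
    return np.array([level + h * trend for h in range(1, horizon + 1)])
```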
All of these are then combined into an ensemble with fixed, transparent weights. Trend and smoothing carry most of the mass; mean reversion and MA-momentum provide balance; ARIMA, when available, contributes a modest slice. The forecast horizon is intentionally short (five sessions) and all components are causal: features are computed from information available at time t, and the target sits at t+1, so there’s no accidental look-ahead.
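A sketch of that combination step, with the appendix weights and the ±50% clip; renormalizing over whichever components actually fit is how graceful degradation falls out:

```python
import numpy as np

# Fixed, transparent ensemble weights (see the appendix).
WEIGHTS = {"trend": 0.30, "mean_reversion": 0.20,
           "smoothing": 0.25, "ma_momentum": 0.15, "arima": 0.10}

def ensemble_forecast(components: dict, last_price: float) -> np.ndarray:
    # Graceful degradation: keep only the views that produced a path
    # (ARIMA, for example, may be None on short histories).
    available = {k: np.asarray(v) for k, v in components.items() if v is not None}
    total = sum(WEIGHTS[k] for k in available)
    combo = sum(WEIGHTS[k] / total * v for k, v in available.items())
    # Clip to +/-50% around the last price, per the appendix defaults.
    return np.clip(combo, 0.5 * last_price, 1.5 * last_price)
```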
Two practical details matter as much as any model choice. First, volatility-aware bounds: confidence ribbons scale with recent realized volatility so a quiet name doesn’t get circus-tent bands, and a jumpy one doesn’t get paper-thin ones. Second, graceful degradation: if a symbol lacks enough clean history, the system simply withholds aggressive forecasts (or falls back to the most conservative view) rather than hallucinating precision.
The result is not a single “right” number but a family of reasonable futures, each justified by a simple idea, and a weighted average that reflects how convincing those ideas are right now. In short: forecasts that respect the tape as it is, not as we wish it to be.
The Machine Learning Layer: Patterns with Guardrails
If forecasting is about sketching plausible near-term paths, the ML layer is about learning conditional tendencies: given today’s state, how does tomorrow usually behave for this specific name? It’s deliberately modest in scope and fussy about hygiene. No peeking into the future, no heroic hyperparameters, no “accuracy” inflated by target leakage.
The data that feed it are the same engineered features the forecaster uses: momentum (ROC_5/ROC_20), trend structure (MACD with its signal and histogram), overbought/oversold pressure (RSI), local and medium-term noise (volatility windows), participation (volume ratio), plus market context (rolling correlation to the benchmark and a simple regime tag). Before a single tree is grown, every column is forward-filled, back-filled, and then mean-filled.
There are two targets on purpose. The first is a regression target, because price level matters for sizing and error analysis. The second is a classification target, because trading decisions often live in {buy, hold, sell} space. The split is chronological: ~80% of the series to train, the most recent ~20% to test.
For structure and speed, the first pass uses random forests: one regressor for price and one classifier for direction. Depth is capped, minimum leaf sizes are enforced, and the forest size is generous but not silly. You get several nice properties for free: non-linear interactions without feature crosses, immunity to monotone rescaling, and feature importances that are actually interpretable.
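A sketch of that pass, assuming `X`, `y_price`, and `y_up` are the aligned feature matrix and targets; the split ratio, train-only scaling, and forest settings follow the appendix:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.preprocessing import StandardScaler

def fit_models(X: np.ndarray, y_price: np.ndarray, y_up: np.ndarray):
    split = int(len(X) * 0.8)                 # first 80% train, last 20% test
    scaler = StandardScaler().fit(X[:split])  # scaler fit on train only
    X_tr, X_te = scaler.transform(X[:split]), scaler.transform(X[split:])
    reg = RandomForestRegressor(n_estimators=100, max_depth=10,
                                min_samples_split=5, min_samples_leaf=2,
                                random_state=42).fit(X_tr, y_price[:split])
    clf = RandomForestClassifier(n_estimators=100, max_depth=10,
                                 min_samples_split=5, min_samples_leaf=2,
                                 random_state=42).fit(X_tr, y_up[:split])
    # Average probability of an up day (class 1), carried into allocation
    # as the soft conviction signal described below.
    p_up = clf.predict_proba(X_te)[:, 1].mean()
    return reg, clf, p_up
```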
Predictions are scored in ways that matter operationally: RMSE on the price path, directional accuracy for the classifier, and, more importantly, how the probability of an up day behaves across time. A 0.63 average “up” probability with stable out-of-sample accuracy is more useful to a portfolio allocator than a single-point, high-variance equity RMSE. Those directional probabilities are carried forward into allocation as a soft ML conviction signal.
Guardrails are everywhere. If a symbol has too few clean rows or fewer than a handful of non-empty features, it simply doesn’t get a model; nothing is forced. The scaler is fit on training data only; test data are transformed consistently. The feature list is filtered per-symbol so we’re never training on a column that’s entirely NaN. And the test block is large enough to be meaningful: if holding out only five days would make the metrics noisy, we widen the training window instead of pretending five points form a distribution.
The Portfolio Layer: Risk Budgeting With Opinions
Forecasts and short-horizon tendencies are only useful if they translate into position sizes that respect drawdowns and correlation. The optimizer’s job is exactly that: take a panel of daily returns, a soft ML “wind at your back” signal, and a few common-sense constraints, then turn them into weights that survive contact with volatility.
Everything starts with the return matrix. For each symbol that cleared the data hygiene rules, we collect the daily percentage change column and align them by date into a rectangular frame. Missing points are set to zero rather than forward-filled; it’s conservative for covariance and avoids inventing co-movement where none exists. From this, the statistics you’d expect fall out: mean daily return and sample covariance, both annualized with a 252-day factor. The covariance gets a small diagonal nudge so matrices behave when a series is quiet. I don’t assume that this mean–variance snapshot captures the world perfectly; I do assume it’s the least opinionated baseline we can improve upon.
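A sketch of that baseline in cvxpy, using the appendix objective (λ=0.5) and the 2–40% weight bounds; the size of the diagonal nudge (1e-6 here) is illustrative:

```python
import cvxpy as cp
import numpy as np

def optimize(returns, lam=0.5, w_min=0.02, w_max=0.40):
    # Annualize with the 252-day factor.
    mu = returns.mean().values * 252
    sigma = returns.cov().values * 252
    # Small diagonal nudge so quiet series don't make the matrix misbehave
    # (the 1e-6 magnitude is an assumption, not the shipped value).
    sigma += 1e-6 * np.eye(len(mu))
    w = cp.Variable(len(mu))
    # Maximize mu'w - lambda * w'Sigma*w, fully invested, bounded weights.
    objective = cp.Maximize(mu @ w - lam * cp.quad_form(w, sigma))
    constraints = [cp.sum(w) == 1, w >= w_min, w <= w_max]
    cp.Problem(objective, constraints).solve()
    return w.value
```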
The output isn’t just a vector. Each weight is translated into a dollar target for a user-supplied portfolio size, then rounded down to a whole number of shares. That rounding gap is tracked, not ignored, so the UI can show “target vs. actual” and an efficiency percentage. For every name we also compute what its own annualized return and volatility looked like over the sample window.
There is a small but important interaction with the learning layer. The classifier’s average “up tomorrow” probability is not injected as a phantom return; it’s treated as a conviction hint when ties need breaking. If two names fight for the same slot on similar risk and return, the one with sturdier short-term winds gets the nudge. That keeps the optimizer honest while still letting recent information matter.
Reasoning is first-class output. Alongside the allocation table you’ll find plain-language notes for each position: whether it’s a core or supporting weight, whether its expected return sits above the portfolio average, whether it earned its place through lower volatility or a stronger near-term signal. There’s also a portfolio-level read: expected annual return, volatility, Sharpe, number of positions, a Herfindahl index to quantify concentration, and the cash that remains after share rounding.
The philosophy is simple. Use classical risk math because it’s stable. Add gentle regularization because markets are noisy. Let machine learning inform the tie-breaks, not dictate the outcome. And surface enough numbers that you can see, at a glance, which parts of the decision came from statistics and which came from judgement.
The RL Sandbox: Decisions You Can Overturn
The trading environment is intentionally small and legible. Each timestep is a row from the engineered feature set: RSI, MACD and its signal/histogram, a 20-day volatility proxy, a volume pressure ratio, 20-day rate of change, a market-correlation estimate, and, when available, two context flags (price versus the 20-day moving average and position inside Bollinger bands). Those are concatenated into a state vector and fed to a thin decision layer that doesn’t try to be clever: it maps the state into one of three actions and a confidence score.
There are three “personalities,” each encoded as thresholds and guardrails rather than a black-box reward: conservative waits for oversold/overbought extremes and bigger momentum, moderate needs two confirming conditions, and aggressive reacts earlier and sizes larger. Position size isn’t a mystery number; it’s an explicit scalar (≈30%, 50%, 80% of capital in the single-name sandbox) that flips to zero on exit.
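As an illustration, the conservative style might read like this; the thresholds come from the appendix, though exactly how the app combines the conditions may differ:

```python
def conservative_action(rsi: float, roc_20: float, in_position: bool):
    """Illustrative threshold logic for the conservative personality."""
    # Wait for an oversold extreme plus real momentum before entering.
    if not in_position and rsi < 25 and roc_20 > 0.015:
        return "buy", 0.3            # enter with ~30% of capital
    # Exit on an overbought extreme or momentum rolling over.
    if in_position and (rsi > 75 or roc_20 < -0.015):
        return "sell", 0.0           # exit flips the position scalar to zero
    return "hold", 0.3 if in_position else 0.0
```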
The sandbox measures itself like a trader would. Each round-trip contributes a percentage PnL; across the sequence we compute a total return, a win rate, average win and average loss, and a simple Sharpe on trade returns. The point isn’t to declare victory over benchmarks; it’s to see whether the rule set behaves.
Two design choices keep our output honest. First, no peeking: targets are shifted, and indicators are computed with data available at time t; there’s no look-ahead leakage. Second, no leverage inflation: the position scalar is capped and capital resets only on exits, so a run of buys doesn’t silently stack exposure. If the sandbox looks good, it’s because the rules navigated the path, not because the accounting was generous.
It’s worth stating what this sandbox is not. It isn’t a deep-RL policy gradient trained on millions of episodes; it’s a controlled, interactive harness where you can change thresholds, swap the personality, and immediately see the consequences on real historical sequences. That constraint is intentional. In markets, transparency is a feature, not a bug. Here, every trade is traceable back to a handful of conditions you can read out loud.
The practical payoff is twofold. For engineering, it’s a fast loop to test whether a freshly engineered feature adds anything to decision quality before you wire it into heavier models. For users, it’s a living explainer: the same indicators they see in the price tab drive the trades in the sandbox, and the rationale text ties those worlds together.
Stress Testing: What Breaks and What Bends
The engine runs five canonical shocks. Market Crash applies a −20% return shock and multiplies daily volatility by 2.5—calibrated to COVID-style drawdowns. Volatility Spike barely moves price (−2% drift) but triples volatility to simulate a VIX event where spreads widen and signals get noisy. Mild Correction knocks 10% off with 1.5× volatility—this is the market’s “routine service.” Bull Rally lifts by +15% and dials volatility down to 0.8×, a sentiment overshoot that can expose momentum or concentration risk. Stagflation adds a steady −5% drag with 2× volatility to stress duration risk and correlation breakdowns.
Under the hood, the math is deliberate and visible. Aggregation respects weights instead of inventing cross-terms we didn’t estimate. For each scenario, the portfolio’s stressed return is the weighted sum of asset stresses, and total risk is the root of weighted squared contributions, which is conservative when correlations jump in a crisis. Impact severity is labeled by thresholds you can read without a PhD: below −15% is “Severe,” −8% to −15% “Moderate,” −3% to −8% “Mild,” above +8% “Very Positive,” and the rest “Neutral.”
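A sketch of that roll-up and labeling, using the thresholds above:

```python
import numpy as np

def apply_scenario(weights, asset_returns, asset_vols, shock, vol_mult):
    # Shocks are additive in drift, multiplicative in dispersion.
    stressed_r = np.asarray(asset_returns) + shock
    stressed_v = np.asarray(asset_vols) * vol_mult
    w = np.asarray(weights)
    port_return = w @ stressed_r                         # weighted sum of stresses
    port_risk = np.sqrt(np.sum((w * stressed_v) ** 2))   # root of weighted squares
    return port_return, port_risk

def severity(port_return: float) -> str:
    if port_return < -0.15: return "Severe"
    if port_return < -0.08: return "Moderate"
    if port_return < -0.03: return "Mild"
    if port_return > 0.08:  return "Very Positive"
    return "Neutral"
```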
The design choices bias toward explainability. Shocks are additive in drift and multiplicative in dispersion so they compose cleanly with realized history. The VaR is non-parametric (a percentile on adjusted daily returns) to avoid distributional bravado. Most importantly, stress results flow downstream. The advisory layer reads each name’s worst-case scenario return and raises the risk flag when needed; the portfolio summary carries the resilience score and worst/best case alongside expected return and Sharpe; and the UI pins a short, plain-English explanation under the charts (“Volatility Spike: high uncertainty with minimal price drift; portfolio stability hinges on dispersion control”). Very simply, it’s a disciplined way to answer two practical questions: what hurts the user, and by how much.
The Advisory Layer: From Signals to Decisions
The Broker Advisor reads every artifact the engine produces and writes back a decision for each ticker with a confidence score, a price target, and a risk label. The logic begins with price targets because those anchor expectations. If an ensemble forecast exists, that becomes the short-horizon target; if not, the advisor falls back to the trend or exponential smoothing estimate. A projected +5–10% move tips the recommendation toward constructive; a projected −5–10% warns the other way; anything in the middle is acknowledged as sideways.
Machine-learning signals add directional weight, but only as far as they’ve earned it. The advisor computes the average probability that tomorrow closes higher than today and reads the model’s realized accuracy on a held-out slice.
The RL module contributes in a different voice: it doesn’t predict, it acts. The advisor reads the agent’s most recent buy/hold/sell behavior, its confidence trace, and the running P&L from completed trades. If the agent has been buying with high confidence and positive realized return, that leans the recommendation toward BUY; if it has been selling into weakness with conviction, that counts against the name.
Risk is never bolted on at the end. For each ticker, the stress harness reports a spectrum of scenario returns; the advisor takes the worst case across the negative scenarios and labels the name High, Medium-High, or Medium risk with thresholds you can understand without squinting (worse than −20%, worse than −10%, or better).
Technical context rounds out the picture with the few indicators that actually travel: RSI and moving averages. An RSI in the 20s is mentioned as oversold; one in the 70s as overbought. Price above a rising 20- and 50-day average is noted as bullish structure; below both as bearish.
Under the hood, the advisor tallies bullish, bearish, and neutral votes from each subsystem, accumulates a base confidence (typically around fifty to sixty percent), and then adjusts that confidence based on realized evidence: has the RL engine been profitable here, is the ML classifier actually accurate, do forecasts agree? When bullish evidence materially outweighs bearish and confidence is healthy, the decision resolves to BUY; the mirror image yields SELL; in between, you’ll see HOLD, sometimes with a tilt (“HOLD/BUY” or “HOLD/SELL”) when the evidence is leaning but not decisive.
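Schematically, the resolution might look like the sketch below; the vote margins and confidence cutoff here are illustrative placeholders, not the shipped values:

```python
def resolve(bullish: int, bearish: int, confidence: float) -> str:
    # Clear evidence plus healthy confidence resolves to a hard decision.
    if bullish - bearish >= 2 and confidence >= 0.6:
        return "BUY"
    if bearish - bullish >= 2 and confidence >= 0.6:
        return "SELL"
    # Leaning but not decisive evidence yields a tilted HOLD.
    if bullish > bearish:
        return "HOLD/BUY"
    if bearish > bullish:
        return "HOLD/SELL"
    return "HOLD"
```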
It doesn’t stop at tickers. The same discipline rolls up to a portfolio summary that reads like a one-page memo. You get the expected return, volatility, and Sharpe of the optimized book; a diversification score derived from the Herfindahl index so concentration is numerically explicit; a resilience score from the average stress return across the adverse scenarios; and a count of how many names landed in BUY, HOLD, and SELL, along with the average confidence. The summary also carries a sentence on overall sentiment—bullish, bearish, or cautious—driven by the recommendation mix rather than anyone’s mood.
The net effect isn’t a chorus of models shouting over each other. It’s a clean ledger entry for each name: here’s what the short-term forecasts imply, here’s what the classifier believes and how often it’s been right, here’s how the trader has behaved and what it made, here’s how the name survives the bad weeks, and here’s the technical state of play.
Validation & Backtesting: How We Trust the Numbers
Before any recommendation shows up in the UI, it earns its place through repeatable tests that try to break it. The entire pipeline is evaluated with walk-forward validation so it sees history only as it would have at the time. That means time-ordered splits, rolling retrains, and no peeking.
Leakage is the silent killer of financial ML, so we treat it as a first-class bug. Features are computed from lagged values with explicit shifting where necessary; targets are defined and then separated from the feature matrix before any imputation; scalers and models are fit only on the training slice and applied to the holdout slice; and any cross-sectional signals that reference “the market” use a market proxy that excludes the asset being predicted.
Forecasters report short-horizon error on prices. Classifiers are judged on directional correctness and calibration: accuracy is necessary but insufficient; expected calibration error and reliability curves tell us whether a “70% up” probability actually corresponds to seven up days out of ten. The RL simulator is held to realized trade P&L on the validation walk, not the training period, and its win rate is reported alongside average win and loss to discourage strategies that “win often, lose catastrophically.” Portfolio optimization is evaluated at the portfolio level: the walk-forward book is rebalanced on the same schedule the optimizer assumed, and we record realized return, volatility, drawdown, and out-of-sample Sharpe over each segment before aggregating.
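The split machinery itself is small. A sketch of the rolling-window generator (names illustrative):

```python
def walk_forward(n_rows: int, train_size: int, test_size: int):
    """Yield (train_slice, test_slice) index pairs in time order."""
    start = 0
    while start + train_size + test_size <= n_rows:
        yield (slice(start, start + train_size),
               slice(start + train_size, start + train_size + test_size))
        # Roll the window forward by one test block; retrain each step.
        start += test_size
```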
Baselines matter. Every component must beat something simple that an honest skeptic would use. For forecasts, that’s a drift-adjusted last price and a naive moving average. For direction, it’s a rolling sign of momentum. For the book, it’s equal weight and inverse-volatility allocations with the same rebalance frequency. If a fancy method cannot clear those bars out of sample, it doesn’t ship, and the advisor falls back to the simpler alternative without ceremony.
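For reference, the flavor of baseline meant here; the 20-day drift window is an assumption:

```python
import numpy as np

def drift_adjusted_last(closes: np.ndarray, horizon: int = 5) -> np.ndarray:
    # Last price carried forward with the recent average daily change.
    drift = np.mean(np.diff(closes[-20:]))
    return closes[-1] + drift * np.arange(1, horizon + 1)

def equal_weight(n_assets: int) -> np.ndarray:
    return np.full(n_assets, 1.0 / n_assets)

def inverse_vol(vols: np.ndarray) -> np.ndarray:
    # Inverse-volatility allocation, normalized to sum to one.
    inv = 1.0 / vols
    return inv / inv.sum()
```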
Ablations keep us honest about attribution. We rerun the walk with MACD removed, then with RSI removed, then with the correlation features withheld, and we watch what moves. If the RMSE refuses to budge when a feature “should” matter, we stop treating that feature as signal and we stop talking about it as if it were. The same discipline applies to the optimizer’s constraints: minimum and maximum weights, turnover caps, and risk penalties are toggled on and off in isolation to observe their effect on realized risk, not just on in-sample elegance.
Finally, performance claims are bounded by uncertainty. Confidence intervals are computed by segmenting the walk, bootstrapping where appropriate, and reporting medians with interquartile ranges rather than single-point bravado. The goal isn’t to manufacture certainty; it’s to document where the engine is trustworthy, where it is tentative, and where it is silent.
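A sketch of that segment-and-bootstrap bound (the resample count is illustrative):

```python
import numpy as np

def bootstrap_iqr(segment_metrics, n_boot=2000, seed=42):
    """Median and interquartile range of a per-segment metric."""
    rng = np.random.default_rng(seed)
    x = np.asarray(segment_metrics)
    # Resample walk segments with replacement; record each median.
    medians = [np.median(rng.choice(x, size=len(x), replace=True))
               for _ in range(n_boot)]
    return np.percentile(medians, [25, 50, 75])  # Q1, median, Q3
```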
Deployment & Reproducibility: Making It Run the Same Way Twice
This system isn’t a notebook trick that works once on a lucky day; it’s wired to be repeatable. The stack is straightforward: Python 3.11, pandas/numpy for numerics, scikit-learn for the baseline models, optional lightgbm for gradient boosting, optional pmdarima for ARIMA, cvxpy (or scipy.optimize) for portfolio construction, plotly for charts, and gradio for the interface. Data arrives through yfinance and gets cached to disk so reruns don’t hammer upstream APIs.
Determinism is enforced wherever it makes sense. Random seeds are fixed at 42 for the RF models and NumPy draws; resampling windows are anchored by index; and every transformation that could leak information looks backward only. There’s a FAST_DEMO flag that cuts estimator sizes and disables slow models (ARIMA, LightGBM) so you can smoke-test the UI in seconds, then flip to full mode for a proper run.
Typical runs (daily bars, ~5–12 tickers, 2–5 years) finish on a laptop-class machine (16 GB RAM) in a few minutes in full mode and under a minute in demo mode. The heaviest steps are covariance estimation for optimization and ARIMA fitting; if those become bottlenecks, the code cleanly downgrades to the scipy optimizer and skips ARIMA without breaking the pipeline. Logs announce every phase, what’s being skipped, and why, so you can tell the difference between “didn’t run” and “ran and found nothing.”
Appendix: Defaults, Knobs, and What They Actually Do
Data ingestion. Daily OHLCV via yfinance; default lookback 2–5 years. Caching enabled. Market proxy for correlation features is a broad index (e.g., SPY) fetched alongside the selection. Typical batch: 8–12 tickers, ~500–1,250 trading days per series (~4,000–15,000 rows per run after merges).
Feature engineering. Price MAs (5/20/50), RSI(14), MACD(12,26,9) and histogram, Bollinger Bands(20,2), rolling vol (5,20), volume ratio, ROC(5,20), market correlation (rolling Pearson vs proxy), regime flags (volatility tertiles / trend filters). All features are lag-safe and forward-/back-filled conservatively.
Forecaster. Horizon=5 sessions. Trend fit on last 30 bars (linear regression), slope scaled by R²; random walk noise ~ σ×√h×0.1. Double exponential smoothing with α=0.3, β=0.1. MA-momentum forecast using 5/20 spread with damping 0.1 per step. Optional ARIMA (auto_arima, most recent 100 points, max p/q=3, max_order=6). Ensemble weights: Trend 0.30, Mean-reversion 0.20, ES 0.25, MA 0.15, ARIMA 0.10. All outputs clipped to ±50% around the last price; confidence bands scale with 20-day realized vol.
Supervised ML. Features: RSI, MACD, MACD_Signal, MACD_Hist, Vol_5, Vol_20, Volume_Ratio, ROC_5, ROC_20, Market_Correlation, Market_Regime (subset if missing). Targets: next-day price (regression), next-day direction (binary). Split: first 80% train, last 20% test (time-ordered). Scaling: StandardScaler fit on train only. Models: RandomForestRegressor/Classifier (n_estimators=100 full / 50 demo, max_depth=10, min_samples_split=5, min_samples_leaf=2, random_state=42). Optional LightGBM regressor/classifier (n_estimators=100, max_depth=6, lr=0.1). Reported metrics: RMSE for price, accuracy and probability calibration for direction; feature importances exposed.
Portfolio optimizer. Objective: maximize μᵀw − λ·wᵀΣw with λ=0.5 (annualized μ, Σ). Constraints: sum(w)=1; 2% ≤ wᵢ ≤ 40%. Primary solver: cvxpy; fallback: scipy SLSQP Sharpe-max with concentration penalty (0.1·∑w²). Final fallback: risk parity (inverse vol), then equal weight. Allocations rounded to whole shares; report shows target vs actual, efficiency, weights, and position count.
RL simulator. State: subset of engineered features plus normalized price positioning (e.g., vs MA20, Bollinger position). Actions: buy / hold / sell. Styles: conservative (RSI<25 buy / >75 sell, momentum threshold 0.015, position 0.3, stop 8%, take 12%, hold 10); moderate (30/70, 0.01, 0.5, 8%/12%, hold 7); aggressive (35/65, 0.005, 0.8, 12%/20%, hold 5). Outputs: action stream, position time series, per-trade P&L, cumulative P&L, win rate, average win/loss, simple Sharpe.
Stress tests. Market crash (−20% shock, 2.5× vol), volatility spike (−2% drift, 3× vol), mild correction (−10%, 1.5×), bull rally (+15%, 0.8×), stagflation (−5%, 2×). Portfolio roll-ups: shocked annual return, vol, Sharpe, and VaR-style summary. Impact labeled from Very Positive to Severe.
Advisor. Merges: near-term ensemble forecast delta, ML direction probability and historical accuracy, RL recommendation and realized backtest stats, optimizer target weight, and stress test worst/best case. Output: BUY / HOLD-BUY / HOLD / HOLD-SELL / SELL with confidence, price target, and compact reasoning.
UI. Gradio with tabs for data/technicals, forecasts & ML, optimization, RL, stress tests, and advice. Charts: Plotly with dark theme, high-contrast annotations, readable overlays. States are reset per analysis; dropdowns update to the actual tickers successfully analyzed.
That’s the full arc.
Playground: https://huggingface.co/spaces/deadven7/stockbroker-assist
Code: https://github.com/deadven7/stockbroker-assistant/tree/main