VIX and Yield Signaling with VecViz INTERN-als

April 28, 2026

I wrote “Quantifying the Kindling” in mid-March, after hearing Lloyd Blankfein on a podcast compare market conditions to an abundance of kindling on a forest floor. I knew then that it was a precursor to something more comprehensive I wanted to share on macro risk, which is the subject of this blog.

Today VecViz began publishing expected 5-day forward movements in the VIX and in 2-Year and 10-Year Treasury yields. You can find them here, and we will update them at the end of each week.

In this blog I will discuss the machine learning ensemble behind these forecasts. Then, I will discuss the “human learning ensemble” of five VecViz winter 2025/2026 interns: Bangrui Yan, William Chang, Fathmat Samira Bakayoko, Grant Morgenfeld, and Jackson Keefer, whose research made it possible.

The Machine Learning Ensemble Behind the VIX and Yield Forecasts

The ensemble is a pair of Random Forests (RFs)1, each classifying the next 5 trading days into one of three regimes: higher, flat, or lower2.
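Footnote 2 describes the three regimes as equal terciles of historic 5-day movements. As a minimal sketch of that labeling (the function name and the use of full-sample tercile cutoffs are my simplifying assumptions, not VecViz’s exact procedure):

```python
import numpy as np
import pandas as pd

def regime_labels(level: pd.Series, horizon: int = 5) -> pd.Series:
    """Label each date by its forward `horizon`-day move as 'lower', 'flat',
    or 'higher', using equal tercile cutoffs of the historical moves."""
    fwd_move = level.shift(-horizon) - level      # 5-day forward change
    lo, hi = fwd_move.quantile([1 / 3, 2 / 3])    # tercile boundaries
    return pd.cut(fwd_move, bins=[-np.inf, lo, hi, np.inf],
                  labels=["lower", "flat", "higher"])
```

By construction each regime captures roughly a third of the historical observations, which is what makes 33% the natural chance baseline for overall accuracy later in this post.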

We rely on the consensus of these two models. If they agree, the prediction is set. If they completely diverge (one says ‘higher’, the other ‘lower’), they cancel each other out, resulting in a ‘flat’ prediction. If one predicts ‘flat’ and the other takes a stance, we defer to the model with the directional conviction.
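The consensus rule above is simple enough to state in a few lines. A minimal sketch (the function name and string labels are illustrative):

```python
def consensus(pred_a: str, pred_b: str) -> str:
    """Combine two regime predictions ('higher'/'flat'/'lower') into one call."""
    if pred_a == pred_b:
        return pred_a                              # agreement: prediction is set
    if {pred_a, pred_b} == {"higher", "lower"}:
        return "flat"                              # full divergence cancels out
    # one model says 'flat', the other takes a stance: defer to conviction
    return pred_a if pred_a != "flat" else pred_b
```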

Details of how the models were trained and specified include:

  • The initial training data begins with the start of VecViz’s out-of-sample track record, 1/31/2022, and extends to 4/30/2024. Features considered were daily cross-ticker averages and dispersions of 42 features (VecViz features plus Sigma3), along with 3-day-over-10-day moving-average ratios of each — 168 features in all.
  • To these 168 features we applied a single RF classifier over this initial training period. The top 16 in terms of Feature Importance (the standard RF measure of how much each feature contributes to the trees’ splits) are the only features used in the subsequent testing period, which spans from 4/30/2024 through 4/24/2026.
  • We recalibrate the “All Ticker” model every six months of the testing period using only these 16 features. We also identify, every six months, the sector whose RF, constructed from the same 16 features, has the greatest trailing directional accuracy. The winning sector RF, together with the “All Ticker” RF, forms the forecast ensemble for the following six months.
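The top-16 selection step can be sketched as follows, assuming the 168 features live in a pandas DataFrame and regime labels in a Series; the hyperparameters here are illustrative, not VecViz’s actual settings:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def top_k_features(X: pd.DataFrame, y, k: int = 16, seed: int = 0) -> list:
    """Fit an RF on the initial training window and keep the k features with
    the highest impurity-based Feature Importance (the standard RF measure)."""
    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    rf.fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1][:k]
    return [X.columns[i] for i in order]
```

The returned list would then be the fixed feature set carried through each six-month recalibration of the testing period.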

Forecasts as of last Friday’s close illustrate the consensus approach discussed above:

This consensus approach leads to fairly steady prediction behavior, as illustrated by the consensus predictions generated in the backtest for the VIX. See the report for similar charts corresponding to the USTY_2yr and USTY_10yr:

Backtesting results and two key takeaways

  1. Sigma-based features are entirely absent from the Top 16 for both yield target variables, and appear no higher than #10 among the Top 16 features used in testing for the VIX. That is no small thing, particularly with regard to the VIX, since the VIX is forward Sigma implied by option pricing.
  2. The directional predictions generated by this VecViz-feature-driven Random Forest ensemble for the VIX and Treasury yields were significantly more accurate than chance during the 4/30/2024 through 4/24/2026 test period. Overall accuracy ranged from 40–43% (vs. a 33% baseline) over nearly 100 distinct 5-day periods. Directional accuracy ranged from 60–68% (vs. a 50% baseline) over the 50–60 distinct 5-day periods where neither the prediction nor the outcome was “flat”. See the tables and charts below for more detail.
“Acc.” is an abbreviation for “Accuracy”: the frequency with which the prediction exactly matches the outcome, across all categories, including “Flat”. “DirAcc” is an abbreviation for “Directional Accuracy”: the frequency of exact matches between predictions and outcomes, excluding “Flat” predictions and outcomes.
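The two metrics differ only in which prediction/outcome pairs they count. A minimal sketch of that distinction (function and variable names are mine):

```python
def accuracy_metrics(preds: list, outcomes: list) -> tuple:
    """Return (Acc., DirAcc): exact-match rate over all pairs, and over the
    subset of pairs where neither the prediction nor the outcome is 'flat'."""
    pairs = list(zip(preds, outcomes))
    acc = sum(p == o for p, o in pairs) / len(pairs)
    directional = [(p, o) for p, o in pairs if p != "flat" and o != "flat"]
    dir_acc = (sum(p == o for p, o in directional) / len(directional)
               if directional else float("nan"))
    return acc, dir_acc
```

Note that DirAcc is computed over fewer periods than Acc., which is why the post reports it over 50–60 rather than nearly 100 distinct 5-day windows.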

The variability of the accuracy metrics for the VIX over the entire test period is depicted below. See the report for similar charts corresponding to the USTY_2yr and USTY_10yr.

The Human Learning Ensemble Behind the Machine Learning Ensemble

The five winter interns listed above were brave enough to take on forecasting the VIX and yields with VecViz data as their internship project. Here are the highlights of the contributions they made during that brief four-week period, listed in chronological order. Each intern contributed at least one of the items listed.

  1. A strong start that triggered sector exploration – a Python grid search at the ticker and feature level across a subset of tickers revealed some strong, though narrowly focused, signals. This prompted me to encourage sector-level exploration with the other interns.
  2. Early validation of certain “All Ticker” feature averages – concept-driven exploration at the “All Ticker” average level with autoregressive adjustment surfaced noteworthy features. These broader, though limited, findings were encouraging and were later confirmed with systematic grid search.
  3. Curated exploration of “All Ticker” level average feature strength indicated variability by target regime. This prompted me to encourage the other interns to explore the same dynamic more systematically, and triggered our initial expectation of an RF-based solution, given RFs’ strength with non-linear feature dynamics.
  4. Grid search at the “All Ticker” and sector level of feature averages identified several additional promising features.
  5. Grid search at the “All Ticker” and sector level of feature dispersion identified several more promising features.
  6. Systematic evaluation of feature-strength variability by target regime confirmed meaningful differentials, bolstering our expectations of an RF-based process.
  7. A substantial part of our end-to-end production pipeline was built by an intern, initially with Random Forest and XGBoost across all data. It included the walk-forward with strict look-ahead controls4 and a best-sector selection routine. It also included reporting of directional accuracy — nudging us toward the categorical structure we ultimately adopted.
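The walk-forward structure mentioned in the last item can be sketched as a simple generator; the rolling (vs. expanding) window and the sizes here are illustrative assumptions, not the pipeline’s actual settings:

```python
def walk_forward_windows(n_obs: int, train_size: int, step: int):
    """Yield (train_idx, test_idx) range pairs in which every test window
    starts strictly after its training window ends, so no test observation
    can leak into training (a simple form of look-ahead control)."""
    start = 0
    while start + train_size + step <= n_obs:
        yield (range(start, start + train_size),
               range(start + train_size, start + train_size + step))
        start += step
```

Refitting the model once per yielded pair, on training data only, is what keeps each 5-day forecast strictly out of sample.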

I was still processing their work when I wrote “Quantifying the Kindling” in mid-March. I had to write that blog first in order to demonstrate, both conceptually and through the performance of VecViz’s existing ticker-level metrics, how VecViz features are pertinent to macro volatility. Similar blogs could be written about many of the other VecViz-based features in the top 16 for each target variable, listed below:

The feature suffix “_disp” refers to cross-ticker dispersion. The feature suffix “_ma3o10” refers to a 3-day moving average divided by a 10-day moving average. Features with neither suffix are evaluated as cross-ticker means. Some of these features are discussed in the FAQ and in other methodology blogs. Many were also surfaced by the interns in their research. Reach out to us at admin@vecviz.com for a comprehensive definition list.
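For concreteness, here is how the suffixed variants could be built for one feature, assuming a date-by-ticker panel; treating “_disp” as the cross-ticker standard deviation is my assumption, as the post does not pin down the dispersion measure:

```python
import pandas as pd

def daily_feature_set(panel: pd.DataFrame) -> pd.DataFrame:
    """panel: rows = dates, columns = tickers, values = one VecViz feature.
    Returns the cross-ticker mean, its dispersion ('_disp', assumed here to
    be the cross-ticker std), and the 3-day-over-10-day moving-average
    ratio ('_ma3o10') of each."""
    out = pd.DataFrame(index=panel.index)
    out["feat"] = panel.mean(axis=1)        # cross-ticker mean
    out["feat_disp"] = panel.std(axis=1)    # cross-ticker dispersion
    for col in ["feat", "feat_disp"]:
        ma3 = out[col].rolling(3).mean()
        ma10 = out[col].rolling(10).mean()
        out[col + "_ma3o10"] = ma3 / ma10   # short/long MA ratio
    return out
```

Applied to all 42 base features, this mean/dispersion pair plus the two ratio columns yields the 168 candidate features described earlier.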

Brief musings on our experience vs. what I’ve read of agentic-powered research

I detailed the contributions of VecViz’s winter interns not just to acknowledge their work, but also because our experience can contribute to the discussion of agentic vs. human-driven quant research, which I know is important to many quants, especially those earlier in their careers.

I am not even a novice with regard to building, or even using, agentic research frameworks. That said, from what I have read of them, there is a strong contrast between the organic4 curiosity, flexibility, judgement, and synthesis that I believe is evident in the research journey I described, and the rigid, predefined, role-based tasks assigned to AI agents coordinating with each other via a central, mostly procedural, “referee” agent.

Could an agentic research framework be sufficiently defined and resourced to allow for the creation of something like the RF ensemble described above? I get more and more out of Claude and Gemini each day, but I doubt they would be able to do it.

Even if there were an abundance of material about VecViz that the AI could use to orient the path of its research, there are simply too many model permutations to prescribe and proscribe in advance. For the sake of clarity, I distill some of those choices from the discussion above here:

  1. What value to forecast (continuous vs. categorical, value vs. change, outright or percentile across historical context)?
  2. What data, if any, should be excluded from the analysis?
  3. For what forward time horizon (numerous choices)?
  4. How to aggregate tickers by date, if at all (sector, all, etc.)?
  5. What metrics to apply across tickers by date (central tendency, dispersion, etc. — several choices for each)?
  6. What, if any, metrics to apply across dates (momentum, correlation, etc., all with countless lookback period choices)?
  7. What, if any, feature combinations or transformations to implement (countless possibilities without conceptual justification6)?
  8. What modelling framework (regression, KNN, RF, Neural Net, etc.)?
  9. Whether to combine models into an ensemble, and if so, how?
  10. Where to split the data for training and testing (numerous choices)?
  11. Whether to walk-forward training during the test period, and if so, how frequently (again, numerous choices)?

Conclusion

Hubris is the biggest risk any quant faces, and many a quant has been burned attempting to forecast VIX and rates. Thus, I probably never would have attempted this project without the support of this group of talented interns. It is remarkable what they accomplished, despite the worthy distraction of the holidays and related travel, and against a feature library that gave them no off-the-shelf intuition to lean upon. This was an ambitious project, and they stepped up. I am grateful for their work and for the chance to have worked with them, and I look forward to employing this model and continuing to build upon it.

  1. Random Forests have merit for this exercise beyond their thematic fit with the visual below the title. A Random Forest is a machine learning algorithm that prevents overfitting by building hundreds of individual “decision trees”—each trained on a random, limited slice of data and features—and averaging their predictions into a single, highly robust consensus. ↩︎
  2. Based on equal tercile rankings of historic 5d movements of each. In the post 1/31/2022 period studied and tested, “Flat” for the VIX 5d forward is movement of approximately +/- 1. For the 2yr and 10yr yield it is approximately +/- 5 to 10bps. See the report for details. ↩︎
  3. Gaussian “bell curve” based volatility, enhanced with exponentially weighted time decay of observations over a trailing two year period. Tracked by VecViz for comparison purposes only, plays no part in any VecViz feature or other model. See FAQ for more. ↩︎
  4. In both a figurative and literal sense, I suppose. ↩︎
