Why Soymeal and Soy Oil Can Diverge: A Quantitative Breakdown for Developers


2026-02-17

Engineer crush spreads and divergence signals: compute margins, convert units, and build production-ready features to detect soymeal vs. soy oil decoupling.

Hook: stop guessing — instrument the soy complex like a data product

If your team builds price signals, risk monitors, or arbitrage engines that depend on agricultural commodities, you know the pain: soymeal tanks while soy oil rockets, and your model either raises a false alarm or misses the move altogether. You need reproducible, machine-readable features that explain that divergence and can be operationalized in cloud-native ETL pipelines. This guide is a quantitative walkthrough of how to compute crush spreads, convert units, align futures and cash series, and engineer robust features that capture soymeal/soy oil divergence for trading models, risk dashboards, or supply-chain alerts.

Executive summary (most important first)

  • Soymeal and soy oil diverge because they serve different physical markets: meal is livestock feed (protein) while oil is edible oil and an increasingly important feedstock for renewable diesel.
  • Crush spread quantifies the economics of processing soybeans into meal and oil. It is the core feature you should compute and monitor.
  • Practical engineering: unit conversions, contract-roll logic, and rolling correlation/z-score features turn raw prices into machine learning-ready signals.
  • 2026 context: ongoing renewable diesel capacity expansion (late 2025–2026) and tighter edible oil markets are key structural drivers that can decouple oil from meal.

Why soymeal and soy oil move apart — the mechanics

Soybeans are processed (“crushed”) into two main products: soymeal (protein-rich feed) and soy oil (edible oil and industrial feedstock). Price moves reflect different demand-supply elasticities and policy exposure:

  • Demand drivers: meal demand is tied to livestock margins, feed rations, and regional substitutability (corn/DDGs). Oil demand includes food-grade cooking oil and industrial users (biodiesel, renewable diesel).
  • Policy and energy overlap: between 2023 and 2026, renewable diesel investments and incentives in North America and Europe have increased soy oil demand independently of meal fundamentals, often strengthening oil while meal lags.
  • Processing constraints: crush capacity and maintenance cycles create local spreads; a shortage of crush capacity increases both products' values differently depending on relative demand.
  • Currency, freight and basis: meal is bulky and expensive to transport per unit protein; oil has globalized liquid markets with different freight & storage cost profiles, changing regional spreads.

Key takeaway

Think of soymeal and soy oil as correlated but distinct factors; their divergence is a structural signal — not noise — and can be quantified with derived features.

Quantitative breakdown: computing the crush spread

The most robust engineered feature for detecting divergence is the crush spread: the theoretical margin from crushing a bushel of soybeans into meal and oil. There are multiple conventions. We'll use a clear, reproducible formula and show code to compute it at scale.

Units and common market quotes

  • Soybeans: typically quoted in $ per bushel (1 bushel = 60 lb).
  • Soymeal: typically quoted in $ per short ton (1 short ton = 2000 lb).
  • Soy oil: commonly quoted in cents per pound (¢/lb) or $/lb.

Typical yields (industry convention)

One common crush conversion (approximate, industry-accepted baseline):

  • 1 bushel soybeans (60 lb) → ~44 lb of soybean meal
  • 1 bushel soybeans → ~11 lb of soybean oil
  • Mass balance: small weight loss becomes hulls & processing loss.

Simple crush spread formula (per bushel)

Compute product revenue per bushel and subtract the soybean cost:

crush_margin_per_bushel = (meal_price_per_lb * meal_yield_lb) + (oil_price_per_lb * oil_yield_lb) - bean_price_per_bushel

Where meal_price_per_lb = (meal_price_per_short_ton / 2000).

Numeric example

Use illustrative round-number quotes to walk through the conversions:

  • Bean price: $10.00 / bushel
  • Meal price: $370 / short ton → $0.185 / lb (370 / 2000)
  • Oil price: 65 ¢ / lb → $0.65 / lb
  • Yields: meal_yield = 44 lb, oil_yield = 11 lb

Compute per bushel revenue:

meal_rev = 0.185 * 44 = $8.14
oil_rev  = 0.65  * 11 = $7.15
crush_margin = 8.14 + 7.15 - 10.00 = $5.29 per bushel

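This $5.29 figure is a gross margin per bushel (before processing costs) and measures how attractive it is for crushers to process soybeans. If oil spikes while meal is flat, the margin rises almost entirely through oil_rev: at these yields, each 1 ¢/lb move in oil adds $0.11 per bushel.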
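The arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch using the baseline yields from the text; the function names are illustrative, not part of any library. The second function folds the unit conversions into coefficients (44/2000 = 0.022 and 11/100 = 0.11), which is a convenient shorthand when prices stay in their quoted units.

```python
LB_PER_SHORT_TON = 2000.0
MEAL_YIELD_LB = 44.0  # lb of meal per bushel (baseline yield from the text)
OIL_YIELD_LB = 11.0   # lb of oil per bushel (baseline yield from the text)


def crush_margin_per_bushel(bean_usd_bu, meal_usd_st, oil_cents_lb):
    """Return (meal_rev, oil_rev, margin), all in $ per bushel."""
    meal_rev = (meal_usd_st / LB_PER_SHORT_TON) * MEAL_YIELD_LB
    oil_rev = (oil_cents_lb / 100.0) * OIL_YIELD_LB
    return meal_rev, oil_rev, meal_rev + oil_rev - bean_usd_bu


def crush_margin_coefficients(bean_usd_bu, meal_usd_st, oil_cents_lb):
    """Same margin with the conversions folded into coefficients."""
    return 0.022 * meal_usd_st + 0.11 * oil_cents_lb - bean_usd_bu


meal_rev, oil_rev, margin = crush_margin_per_bushel(10.00, 370.0, 65.0)
print(round(meal_rev, 2), round(oil_rev, 2), round(margin, 2))  # 8.14 7.15 5.29
```

Keeping the yields as named constants makes it easy to rerun the same calculation under a different yield convention.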

Code-first: compute crush spread at scale

Below are compact examples for Python (pandas), JavaScript (Node), and SQL that can be adapted into ETL jobs. They cover unit conversions and rolling features; contract-roll handling is covered under the operational considerations later in this guide.

Python: bulk time-series computation (pandas)

import pandas as pd

# baseline yields per 60 lb bushel (see the yields section above)
MEAL_YIELD = 44.0  # lb of meal
OIL_YIELD = 11.0   # lb of oil

def compute_crush(df):
    # df must contain the columns 'bean_$per_bu' ($/bu), 'meal_$per_st' ($/short ton)
    # and 'oil_cents_per_lb' (cents/lb), sorted by date in ascending order
    df = df.copy()
    # convert to dollars per lb
    df['meal_$per_lb'] = df['meal_$per_st'] / 2000.0
    df['oil_$per_lb'] = df['oil_cents_per_lb'] / 100.0
    # product revenue and margin per bushel
    df['meal_rev_per_bu'] = df['meal_$per_lb'] * MEAL_YIELD
    df['oil_rev_per_bu'] = df['oil_$per_lb'] * OIL_YIELD
    df['crush_per_bu'] = df['meal_rev_per_bu'] + df['oil_rev_per_bu'] - df['bean_$per_bu']

    # engineered features; windows count rows (trading days), not calendar days
    crush_roll = df['crush_per_bu'].rolling(90, min_periods=30)
    df['crush_z'] = (df['crush_per_bu'] - crush_roll.mean()) / crush_roll.std()
    df['oil_meal_ratio'] = df['oil_$per_lb'] / (df['meal_$per_lb'] + 1e-9)
    df['rolling_corr'] = df['meal_rev_per_bu'].rolling(60).corr(df['oil_rev_per_bu'])
    return df

# usage
# prices = pd.read_csv('soy_prices.csv', parse_dates=['date']).set_index('date')
# enriched = compute_crush(prices)

Node.js / JavaScript for streaming ETL

// assumes a pipeline that yields price ticks per date
const MEAL_YIELD = 44.0
const OIL_YIELD = 11.0

function perRecordCrush(record){
  const mealPerLb = record.meal_usd_per_st / 2000.0
  const oilPerLb  = record.oil_cents_per_lb / 100.0
  const mealRev = mealPerLb * MEAL_YIELD
  const oilRev  = oilPerLb * OIL_YIELD
  const crush = mealRev + oilRev - record.bean_usd_per_bu
  return { ...record, mealPerLb, oilPerLb, mealRev, oilRev, crush }
}

module.exports = { perRecordCrush }

SQL: window functions for crush and z-score

-- table soy_prices(date, bean_usd_bu, meal_usd_st, oil_cents_lb)
WITH converted AS (
  SELECT
    date,
    bean_usd_bu,
    meal_usd_st / 2000.0 AS meal_usd_lb,
    oil_cents_lb / 100.0 AS oil_usd_lb,
    (meal_usd_st / 2000.0) * 44.0 AS meal_rev_bu,
    (oil_cents_lb / 100.0) * 11.0 AS oil_rev_bu
  FROM soy_prices
)
SELECT
  date,
  bean_usd_bu,
  meal_rev_bu + oil_rev_bu - bean_usd_bu AS crush_bu,
  -- 90-row rolling mean/std for z-score (rows are trading days, not calendar days)
  ( (meal_rev_bu + oil_rev_bu - bean_usd_bu)
    - AVG(meal_rev_bu + oil_rev_bu - bean_usd_bu) OVER (ORDER BY date ROWS BETWEEN 89 PRECEDING AND CURRENT ROW)
  )
  / NULLIF(STDDEV_SAMP(meal_rev_bu + oil_rev_bu - bean_usd_bu) OVER (ORDER BY date ROWS BETWEEN 89 PRECEDING AND CURRENT ROW),0) AS crush_z
FROM converted
ORDER BY date;

Feature engineering to capture divergence

Beyond the raw crush margin, build multiple orthogonal features so models can detect different flavors of divergence.

  1. Crush z-score: how extreme is the current margin vs. rolling historical mean?
  2. Oil-to-meal revenue ratio: (oil_rev_per_bu / meal_rev_per_bu) highlights oil-driven margins.
  3. Rolling correlation: decreasing correlation between meal_rev and oil_rev often precedes sustained divergence.
  4. Volatility regime indicators: volatility of oil returns vs meal returns (ratio of realized volatilities).
  5. Basis-adjusted features: cash-futures basis for each leg (cash - front-month futures) to capture local dislocations.
  6. Seasonal & phenology features: incorporate calendar windows (planting, harvest) and weather indices (e.g., NOAA anomalies) as covariates.
  7. Supply shock flags: binary features for known events (crusher outages, policy announcements, refinery start-ups) derived from event feeds.
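  8. Cointegration residual: if bean, oil and meal prices are cointegrated (share a long-run equilibrium), the residual from an OLS regression among them is stationary and makes a useful mean-reverting feature. Test for cointegration with Engle-Granger or Johansen on historical backfill first.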
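Feature 4 in the list above (the ratio of realized volatilities) is not computed by the earlier examples. The following is a dependency-free sketch, assuming daily prices in plain lists and a 20-observation window; both the window length and the function names are illustrative choices, not values from the article. In pandas the same idea is a rolling standard deviation of log returns for each leg.

```python
import math
from statistics import stdev


def log_returns(prices):
    """Daily log returns from a list of positive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def rolling_vol(returns, window):
    """Rolling sample std of returns; None until a full window is available."""
    if window < 2:
        raise ValueError("window must be at least 2")
    out = [None] * len(returns)
    for i in range(window - 1, len(returns)):
        out[i] = stdev(returns[i - window + 1 : i + 1])
    return out


def vol_ratio(oil_prices, meal_prices, window=20):
    """Oil realized vol divided by meal realized vol, aligned by position."""
    oil_vol = rolling_vol(log_returns(oil_prices), window)
    meal_vol = rolling_vol(log_returns(meal_prices), window)
    return [
        None if o is None or m is None or m == 0 else o / m
        for o, m in zip(oil_vol, meal_vol)
    ]
```

A ratio well above its own history suggests the oil leg is driving the move, which is the regime the divergence features are meant to flag.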

Example: compute an oil-meal divergence signal

# signal = z-score of (oil_rev_bu / meal_rev_bu)
df['oil_meal_rev_ratio'] = df['oil_rev_per_bu'] / (df['meal_rev_per_bu'] + 1e-9)
ratio_roll = df['oil_meal_rev_ratio'].rolling(120)
df['ratio_z'] = (df['oil_meal_rev_ratio'] - ratio_roll.mean()) / ratio_roll.std()

# build a composite signal (the 0.6/0.4 weights are illustrative; tune them in backtests)
df['divergence_signal'] = 0.6 * df['crush_z'].clip(-3,3) + 0.4 * df['ratio_z'].clip(-3,3)

Operational considerations for ETL and production pipelines

Producing reliable features requires attention to messy realities of commodity data.

  • Contract rolls: futures prices are contract-specific. Build back-adjusted continuous series using volume/open interest rules (back-adjust or proportional roll) and store both front-month and nearby spreads.
  • Unit normalization: convert everything to a single base (we used $/lb + $/bu) early in the pipeline to avoid unit bugs.
  • Temporal alignment: soy oil (vegetable oil markets) may trade on different exchanges/timezones. Use UTC timestamps and align end-of-day snapshots to your business reporting hour.
  • Missing data and interpolation: use forward-fill for market holidays but be explicit — do not impute economic moves across multi-day outages.
  • Latency: for real-time alerting, ingest tick data from exchange feeds or low-latency market-data providers; for daily models, end-of-day settlement is sufficient.
  • Provenance & licensing: choose data providers with clear licensing for commercial use (USDA reports are public; exchange data often requires licensing for redistribution).
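The contract-roll item above is the one most often implemented incorrectly, so a small sketch may help. This shows difference (additive) back-adjustment only, and assumes the roll dates were already chosen upstream by whatever volume or open-interest rule you use. The input layout (one list of (date, settle) tuples per contract, overlapping on the roll date) is a hypothetical convention for this example.

```python
def back_adjust(segments):
    """Stitch per-contract settlement series into one back-adjusted series.

    segments: list of lists of (date, settle) tuples, oldest contract first.
    The last date of each segment must equal the first date of the next one
    (the roll date, on which both contracts have a settlement).
    The newest contract is left unadjusted; older history is shifted by the
    cumulative price gap observed at each roll.
    """
    if not segments:
        return []
    continuous = list(segments[-1])
    offset = 0.0
    # Walk backwards through the rolls: (older contract, contract rolled into).
    for older, newer in zip(reversed(segments[:-1]), reversed(segments[1:])):
        roll_date, old_settle = older[-1]
        new_date, new_settle = newer[0]
        if roll_date != new_date:
            raise ValueError(f"segments do not overlap on roll date: {roll_date} vs {new_date}")
        offset += new_settle - old_settle
        # Drop the older contract's roll-date row; the newer contract supplies it.
        adjusted = [(d, p + offset) for d, p in older[:-1]]
        continuous = adjusted + continuous
    return continuous
```

Additive adjustment preserves day-over-day price changes, which is what spread and margin calculations need. It can push old prices negative over long histories, in which case a proportional roll is the usual alternative.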

Backtesting and validating divergence signals

Before you deploy signals into production, test them against historical outcomes and business KPIs.

  1. Define objective metrics: e.g., P&L of a hedged crush trade, forecast error for meal price, alerts per month.
  2. Use proper walk-forward cross-validation and simulate realistic execution (bid/ask, slippage).
  3. Measure signal persistence: compute how long divergence persists (half-life of mean reversion) using an AR(1) fit on the signal.
  4. Validate across regimes: verify performance across 2018–2026 windows, including late-2025 renewable diesel buildouts.
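Step 3 above (half-life from an AR(1) fit) can be sketched without any dependencies. The function below fits x_t = a + phi * x_{t-1} by ordinary least squares and converts phi to a half-life with -ln(2) / ln(phi). The function name is illustrative, and returning None for the half-life when phi falls outside (0, 1) is a design choice of this sketch.

```python
import math


def ar1_half_life(series):
    """Fit x_t = a + phi * x_{t-1} by OLS and return (phi, half_life).

    half_life is in observations (e.g. trading days) and is None when phi is
    not in (0, 1), i.e. when the series does not mean-revert monotonically.
    """
    x_prev = series[:-1]
    x_next = series[1:]
    n = len(x_prev)
    if n < 3:
        raise ValueError("need at least 4 observations")
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    if var == 0:
        raise ValueError("series is constant")
    phi = cov / var
    if not 0.0 < phi < 1.0:
        return phi, None
    return phi, -math.log(2.0) / math.log(phi)
```

Applied to a divergence signal, a half-life of a few days points to a tactical alert, while a half-life of months points to a structural shift that hedging policy should absorb.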

Data engineers and quant teams should build features with the following 2026 realities in mind:

  • Renewable diesel & biofuel policy (late 2025–2026): expanding renewable diesel capacity has added industrial demand for soy oil, especially in the U.S. Gulf and Northwest Europe, decoupling oil from feed-focused meal.
  • Precision ag yields & satellite monitoring: higher-frequency yield estimates from satellite/NDVI feeds reduce surprise crop shocks, enabling quicker revisions of supply-side features.
  • Data availability & APIs: exchanges and government sources have improved machine-readable endpoints and cloud object storage (2025–26). Integrate versioned datasets and caching to keep your features reproducible.
  • Climate variability: increased frequency of extreme weather events introduces non-stationarity; regime-aware features (volatility regimes, time-varying coefficients) become essential.

Case study (short): oil-driven divergence in late 2025

In late 2025 several renewable diesel plants came online, creating incremental demand for soy oil independent of feed demand. Teams that quickly added a simple oil_meal_rev_ratio feature and a rolling_corr feature could have detected the divergence early and adjusted hedges to protect crush margins. Quantitatively, oil price returns led meal returns by several days and the rolling correlation dropped from ~0.8 to ~0.4 over a 60-day window — a clear, actionable signal for margin management.
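A lead-lag relationship like the one described here can be measured with a lagged cross-correlation of returns. The sketch below is dependency-free and does not reproduce the case-study figures; the function names are hypothetical. It correlates the leader's return at time t with the follower's return at time t + lag.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def lead_lag_corr(leader, follower, max_lag):
    """Map lag -> corr(leader[t], follower[t + lag]) for lag in 0..max_lag."""
    if len(leader) != len(follower):
        raise ValueError("series must be aligned and equal length")
    out = {}
    for lag in range(max_lag + 1):
        xs = leader[: len(leader) - lag]
        ys = follower[lag:]
        out[lag] = pearson(xs, ys)
    return out
```

Passing oil returns as `leader` and meal returns as `follower` gives one correlation per lag; a peak at a positive lag is consistent with oil leading meal, though it should be confirmed out of sample before it drives hedging decisions.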

  1. Ingest: exchange and cash quotes via scheduled pulls or streaming (CME, USDA, regional cash feeds).
  2. Normalize & convert units: canonicalize to $/lb and $/bu as early as possible.
  3. Contract processing: back-adjust front-month continuous series and compute basis.
  4. Feature store: compute and store crush_per_bu, crush_z, oil_meal_ratio, rolling_corr and volatility ratios in a feature store (e.g., Feast, Snowflake + UDFs). Consider integrating with cloud providers and object storage for versioned artifacts.
  5. Model/alerting: use features for ML models, rule-based alerts, and dashboards (Grafana/Looker). Deploy monitoring for data drift and feature freshness.

Operational recipe: checklist before go-live

  • Confirm unit conversion tests (round-trip checks).
  • Validate back-adjusted continuous series against exchange front-month settlements.
  • Implement holiday and roll calendars for each market.
  • Run sensitivity tests (shock oil +10%, meal -5%) and ensure downstream systems behave safely.
  • Document data licensing and retention for audits.
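Two of the checklist items (round-trip unit checks and the oil +10% / meal -5% sensitivity test) are easy to automate. The sketch below assumes the baseline 44 lb / 11 lb yields and reuses the illustrative prices from the numeric example; the helper names are hypothetical.

```python
LB_PER_SHORT_TON = 2000.0
MEAL_YIELD_LB = 44.0
OIL_YIELD_LB = 11.0


def usd_per_st_to_usd_per_lb(price):
    return price / LB_PER_SHORT_TON


def usd_per_lb_to_usd_per_st(price):
    return price * LB_PER_SHORT_TON


def crush_margin(bean_usd_bu, meal_usd_st, oil_cents_lb):
    """Gross crush margin in $ per bushel."""
    meal_rev = usd_per_st_to_usd_per_lb(meal_usd_st) * MEAL_YIELD_LB
    oil_rev = (oil_cents_lb / 100.0) * OIL_YIELD_LB
    return meal_rev + oil_rev - bean_usd_bu


def shocked_margin(bean_usd_bu, meal_usd_st, oil_cents_lb,
                   oil_shock=0.10, meal_shock=-0.05):
    """Margin after proportional shocks to the oil and meal legs."""
    return crush_margin(
        bean_usd_bu,
        meal_usd_st * (1.0 + meal_shock),
        oil_cents_lb * (1.0 + oil_shock),
    )


# Round-trip check: converting to $/lb and back should recover the quote.
assert abs(usd_per_lb_to_usd_per_st(usd_per_st_to_usd_per_lb(370.0)) - 370.0) < 1e-9

base = crush_margin(10.00, 370.0, 65.0)
shocked = shocked_margin(10.00, 370.0, 65.0)
print(round(base, 3), round(shocked, 3), round(shocked - base, 3))
```

Running the shocked inputs through the full downstream pipeline, rather than only through this function, is what confirms that alerts and position sizing stay within safe bounds.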

Advanced strategies & features

Once you have stable, reproducible features, consider:

  • Volatility-scaled signals: divide divergence by realized vol to improve signal stability across regimes.
  • Multi-horizon signals: compute crush_z for 30/90/180-day windows and use a model ensemble to combine horizons.
  • Cross-asset signals: include palm oil and canola oil prices to detect substitution-driven oil moves.
  • NLP event features: ingest news and policy announcements to flag exogenous shocks that may cause divergence; build scrapers and event feeds responsibly.
  • Online learning: update model weights incrementally to adapt to structural shifts such as renewable diesel demand curves.
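The multi-horizon idea above can be sketched as follows. This is a dependency-free illustration: the rolling z-score uses the sample standard deviation, and the equal-weight average in `combine_latest` is only a placeholder for the model ensemble the text suggests. All names are illustrative.

```python
from statistics import mean, stdev


def rolling_z(values, window, min_periods=None):
    """Rolling z-score of the latest value against its trailing window."""
    min_periods = window if min_periods is None else min_periods
    if min_periods < 2:
        raise ValueError("need at least 2 observations for a sample std")
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1) : i + 1]
        if len(chunk) < min_periods:
            out.append(None)
            continue
        sd = stdev(chunk)
        out.append(None if sd == 0 else (values[i] - mean(chunk)) / sd)
    return out


def multi_horizon_z(values, windows=(30, 90, 180)):
    """Map window length -> rolling z-score series."""
    return {w: rolling_z(values, w) for w in windows}


def combine_latest(z_by_window):
    """Equal-weight average of the latest available z-score per horizon."""
    latest = [z[-1] for z in z_by_window.values() if z and z[-1] is not None]
    return sum(latest) / len(latest) if latest else None
```

Agreement across horizons (all windows showing a large z-score) is usually a stronger signal than an extreme reading on the shortest window alone.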

Practical pitfalls and how to avoid them

  • Mixing units: always canonicalize. A single unit mismatch can destroy model accuracy.
  • Static coefficients: avoid fixed linear mixes that ignore regime changes. Use rolling or adaptive fits.
  • Survivorship bias: include delisted instruments and older contracts when backtesting continuous series.
  • Overfitting to a single event: require performance across multiple historical regimes before deploying.
  • ML-specific pitfalls: watch for pattern leakage and false invariants, and design tests accordingly.

Actionable takeaways — build this week

  1. Implement the basic crush compute (per the Python example) and schedule daily job to produce crush_per_bu.
  2. Store derived features in a feature store with metadata: crush_per_bu, crush_z, oil_meal_rev_ratio, rolling_corr(60).
  3. Create an alert rule: trigger when crush_z > 2.0 or ratio_z (the z-score of oil_meal_rev_ratio) > 2.0 for 3 consecutive days.
  4. Backtest these signals for at least 3 historical regimes (including late-2025) before using for trade execution or hedging adjustments.
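The alert rule in step 3 can be read in more than one way. The sketch below takes one reading: a day counts as a breach when either z-score exceeds the threshold, and the alert fires once breaches have occurred on 3 consecutive observations. Missing values (None or NaN) are treated as no breach, which is an assumption of this example.

```python
def consecutive_alerts(crush_z, ratio_z, threshold=2.0, days=3):
    """Return a list of booleans: True where the breach streak reaches `days`.

    A breach is either z-score above `threshold`. None and NaN never breach.
    """
    alerts = []
    streak = 0
    for c, r in zip(crush_z, ratio_z):
        # v == v is False only for NaN, so NaN values are skipped.
        breach = any(v is not None and v == v and v > threshold for v in (c, r))
        streak = streak + 1 if breach else 0
        alerts.append(streak >= days)
    return alerts


# Synthetic example values, not market data.
crush_z = [2.5, 2.1, 1.0, 2.2, None, 2.3, 2.4, 2.6]
ratio_z = [0.0, 0.0, 2.5, 0.0, 0.0, 0.0, 0.0, 0.0]
print(consecutive_alerts(crush_z, ratio_z))
```

If the intent is instead that the same feature must breach on all 3 days, track one streak per feature and alert when either streak reaches the limit.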

Final notes on trust and data sourcing

Reliable features start with trustworthy inputs. Use authoritative sources (public USDA reports for supply/demand balances; exchange settlement data for tradable prices) and include provenance in your metadata. For commercial production, confirm redistribution and latency allowances in provider contracts. Also document audit trails and retention policies so you can answer questions from compliance teams.

Call to action

If you want reproducible, API-ready soy complex feeds and prebuilt ETL blueprints that include unit conversions, contract-roll logic, and feature recipes (crush spread, z-scores, rolling correlations), explore our data APIs and sample notebooks. Instrument the soy complex like a data product today: run the Python example on your price history and push the features into a feature store to see divergence signals in under an hour.
