Open Insurance Ratings Dataset: Collecting AM Best Actions and Insurer Financials for Trend Analysis
A practical guide to building an open AM Best ratings dataset for trend analysis, alerts and risk monitoring—with schema, pipelines and code.
Stop missing early warning signs — build a resilient AM Best ratings dataset
Financial risk teams and platform engineers tell us the same friction points in 2026: rating actions arrive as unstructured press releases, insurer filings live in disparate regulatory silos, and critical financial metrics are scattered across PDFs and XBRL feeds. The result: slow detections, manual triage, and noisy alerts that erode trust. This guide shows how to build an open, maintainable dataset of AM Best rating actions, insurer filings and harmonized financial metrics to enable robust trend analysis and automated alerts.
Executive summary (most important first)
In this article you will get:
- A production-ready data model for rating actions, filings and financials
- Proven pipeline architecture (ingest, normalize, validate, serve)
- Code-first examples (Python scraper/parser, BigQuery SQL, webhook alerts)
- Operational guidance: cadence, provenance, licensing and monitoring
- A short case study capturing a Jan 2026 AM Best upgrade for Michigan Millers
Why this matters in 2026: trends shaping insurer risk datasets
Three developments drove this playbook in late 2025 and early 2026:
- Regulatory machine-readability — broader adoption of XBRL and standardised statutory feeds (NAIC, EU regulators) makes financial extraction easier but uneven across jurisdictions.
- Real-time monitoring expectations — internal risk teams expect near-real-time alerts tied to quantitative metrics, not just event notifications.
- AI-driven context enrichment — LLMs and embedding search are now commonly used to extract rationale snippets and map qualitative rating rationales to structured risk tags (concentration risk, reinsurance support, ESG/climate exposure).
Core dataset design: entities and schema
Design for identity-first joins and traceable provenance. Use globally unique IDs like LEI, CIK, and NAIC company codes where applicable.
Primary tables (recommended)
- insurers: insurer_id (UUID), legal_name, LEI, NAIC_code, CIK, country, primary_group, status, website
- rating_actions: action_id, insurer_id, agency (AM Best), action_date, rating_type (FSR/ICR/PDR), previous_rating, new_rating, outlook, action_code (upgrade/downgrade/affirmation), rationale_snippet, reinsurance_affiliation_code, parent_entity, source_url, source_captured_at, license_terms
- filings: filing_id, insurer_id, filing_type (annual, quarterly, statutory, regulatory), filing_date, format (XBRL/PDF/CSV), raw_url, parsed_url, capture_hash, trust_score
- financial_metrics: metric_id, insurer_id, period_end, metric_name (RBC_ratio, total_assets, loss_ratio, combined_ratio, ROE), metric_value, currency, reporting_basis (GAAP/statutory), source_filing_id
- derived_signals: signal_id, insurer_id, signal_date, signal_type (downgrade_risk, liquidity_shortfall), score, rule_id, supporting_facts
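As a concrete sketch, a rating_actions row can be modelled as a typed record. This is illustrative only (a subset of the fields above; types and placeholder values are assumptions):

```python
from datetime import date, datetime
from typing import Optional, TypedDict

class RatingAction(TypedDict):
    """One structured rating action (subset of the schema above)."""
    action_id: str
    insurer_id: str              # UUID linking to the insurers table
    agency: str                  # e.g. 'AM Best'
    action_date: date
    rating_type: str             # 'FSR', 'ICR' or 'PDR'
    previous_rating: Optional[str]
    new_rating: str
    outlook: Optional[str]       # 'stable', 'positive', 'negative'
    action_code: str             # 'upgrade', 'downgrade', 'affirmation'
    rationale_snippet: str
    source_url: str
    source_captured_at: datetime
    license_terms: str

action: RatingAction = {
    "action_id": "ra-0001",
    "insurer_id": "3f1c0000-0000-0000-0000-000000000001",  # placeholder UUID
    "agency": "AM Best",
    "action_date": date(2026, 1, 16),
    "rating_type": "FSR",
    "previous_rating": "A",
    "new_rating": "A+",
    "outlook": "stable",
    "action_code": "upgrade",
    "rationale_snippet": "Balance sheet strength and operating performance.",
    "source_url": "https://example.com/press-release",  # placeholder
    "source_captured_at": datetime(2026, 1, 16, 14, 0),
    "license_terms": "link-only",
}
```

Keeping the record a plain mapping makes it trivial to load into BigQuery or Postgres without an ORM layer.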
Key design principles
- Store both raw text (press release HTML/PDF) and structured extracts to preserve provenance.
- Tag every record with source_url, captured_at and license_terms.
- Normalize ratings to a numeric scale per rating type (e.g., FSR A++ = 1, A+ = 2, A = 3; score the lowercase ICR scale separately) for trend analysis.
- Use time-versioned tables or CDC for historical reconstruction and backtesting.
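The normalization principle above can be implemented as a simple lookup. The FSR ordering below follows AM Best's published secure/vulnerable scale; keeping ICR in a separate table is a design choice of this sketch:

```python
# Map AM Best Financial Strength Ratings to a numeric rank
# (lower = stronger) so trends can be plotted and diffed.
FSR_RANK = {
    "A++": 1, "A+": 2, "A": 3, "A-": 4,
    "B++": 5, "B+": 6, "B": 7, "B-": 8,
    "C++": 9, "C+": 10, "C": 11, "C-": 12,
    "D": 13,
}

# Long-Term ICR uses its own lowercase scale; keep it separate.
ICR_RANK = {
    "aaa": 1, "aa+": 2, "aa": 3, "aa-": 4,
    "a+": 5, "a": 6, "a-": 7,
    "bbb+": 8, "bbb": 9, "bbb-": 10,
}

def rating_rank(label: str, rating_type: str = "FSR") -> int:
    """Return the numeric rank for a rating label; raises KeyError if unknown."""
    table = FSR_RANK if rating_type == "FSR" else ICR_RANK
    return table[label.strip()]

# An upgrade shows up as a negative rank delta:
delta = rating_rank("A+") - rating_rank("A")  # -1
```

Storing the rank alongside the raw label preserves provenance while making window functions and deltas trivial in SQL.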
Pipeline architecture — from raw press release to alert
Build a modular, observable pipeline. Below is an architecture that teams are using successfully in 2026.
- Ingest: scheduled crawlers + API connectors (AM Best feed if available), SEC/EDGAR, NAIC, company sites, newswires
- Capture raw: store original HTML/PDF/XBRL in object storage (S3/GS)
- Parse & extract: HTML/PDF parsing -> extract metadata and text; XBRL -> map tags to canonical metrics
- Normalize: unify units, currencies, identifiers (LEI/NAIC/CIK), map rating labels to numeric rank
- Enrich: add parent group, reinsurance codes, sector tags, climate exposure score
- Validate: run schema tests and data quality rules (Great Expectations, dbt tests)
- Serve: expose dataset via SQL (BigQuery/Snowflake/Postgres) and an API for event subscriptions
- Alerting & Ops: rules engine (SQL-based or stream processor) to emit Slack/email/PagerDuty alerts
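The stages above can be sketched as plain functions that an orchestrator (Prefect or Airflow) later wraps as tasks. Everything here is illustrative and dependency-free; function names are assumptions, not a standard API:

```python
import hashlib
from datetime import datetime, timezone

def capture_raw(url: str, html: str) -> dict:
    """Store the original document with a content hash for provenance."""
    return {
        "source_url": url,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "capture_hash": hashlib.sha256(html.encode()).hexdigest(),
        "raw": html,
    }

def normalize(record: dict) -> dict:
    """Unify identifiers and rating labels (stub: uppercase FSR labels)."""
    record["new_rating"] = record.get("new_rating", "").strip().upper()
    return record

def validate(record: dict) -> dict:
    """Minimal schema gate before serving: required provenance fields."""
    for field in ("source_url", "captured_at", "capture_hash"):
        if field not in record:
            raise ValueError(f"missing required field: {field}")
    return record

# Wire the stages together for one document:
raw = capture_raw("https://example.com/pr", "<html>AM Best upgrades ...</html>")
row = validate(normalize({**raw, "new_rating": " a+ "}))
```

Each stage returning a plain dict keeps the pipeline observable: any stage's output can be dumped to object storage for replay.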
Tech stack choices (battle-tested)
- Ingestion: Airbyte / custom Python crawlers for press releases
- Orchestration: Prefect or Airflow
- Parsing: Python (requests, BeautifulSoup, pdfminer, lxml), Arelle for XBRL
- Storage: S3 + Delta Lake or BigQuery / Snowflake for analytics
- Transformation: dbt for SQL transformations and documentation
- Quality: Great Expectations, Monte Carlo for data observability
- Alerts: Kafka/Cloud PubSub -> serverless functions -> Slack/PagerDuty
Practical extractor example: parsing AM Best press releases (Python)
Keep parsers resilient: use content-based selectors, rate-limit requests, and capture the full HTML for rebuilds. Below is a compact example to extract structured fields from a press release.
```python
import requests
from bs4 import BeautifulSoup
from datetime import datetime, timezone

url = 'https://www.insurancejournal.com/news/midwest/2026/01/16/854699.htm'
resp = requests.get(url, timeout=10)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, 'html.parser')

# naive selectors — adapt per publisher
title = soup.find('h1').get_text(strip=True)
date_text = soup.select_one('time').get_text(strip=True)
date = datetime.strptime(date_text, '%B %d, %Y')
body = ' '.join(p.get_text(' ', strip=True) for p in soup.select('article p'))

# quick heuristic extraction
if 'upgrad' in body.lower():
    action = 'upgrade'
elif 'downgrad' in body.lower():
    action = 'downgrade'
else:
    action = 'other'

record = {
    'source_url': url,
    'captured_at': datetime.now(timezone.utc).isoformat(),
    'title': title,
    'action_date': date.isoformat(),
    'rationale_snippet': body[:1000],
    'action_code': action,
}
print(record)
```
Parsing XBRL and statutory filings
For public insurers, use SEC EDGAR (XBRL) to extract GAAP metrics. For domestic statutory filings (NAIC), rely on regulators' datasets and third-party aggregators. Use Arelle or a similar open-source XBRL processor to map tags to canonical metric names.
Mapping example: RBC and Combined Ratio
- RBC: extract statutory risk-based capital ratio from NAIC filings
- Combined Ratio: from income statements, compute (losses + expenses) / premiums earned
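The combined ratio formula above can be coded directly. Note this follows the article's simplified definition; statutory convention splits the expense ratio over premiums written, so treat this as a sketch:

```python
def combined_ratio(losses: float, expenses: float, premiums_earned: float) -> float:
    """(losses + expenses) / premiums earned; > 1.0 means an underwriting loss."""
    if premiums_earned <= 0:
        raise ValueError("premiums_earned must be positive")
    return (losses + expenses) / premiums_earned

# A carrier paying 62 in losses and 33 in expenses on 100 of earned premium:
cr = combined_ratio(62.0, 33.0, 100.0)  # 0.95 -> underwriting profit
```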
Trend analysis: sample queries and signals
Below are high-value queries you can run once the dataset is assembled.
1) Quarterly downgrade rate (12-month rolling)
```sql
WITH monthly AS (
  SELECT
    DATE_TRUNC(action_date, MONTH) AS month,
    COUNTIF(action_code = 'downgrade') AS downgrades,
    COUNT(*) AS actions
  FROM rating_actions
  WHERE agency = 'AM Best'
  GROUP BY month
)
SELECT
  month,
  SUM(downgrades) OVER w / SUM(actions) OVER w AS rolling_downgrade_rate
FROM monthly
WINDOW w AS (ORDER BY month ROWS BETWEEN 11 PRECEDING AND CURRENT ROW)
ORDER BY month;
```
2) Correlate RBC decline with downgrades
```sql
WITH rbc_change AS (
  SELECT insurer_id, period_end, metric_value AS rbc
  FROM financial_metrics
  WHERE metric_name = 'RBC_ratio'
),
ranked AS (
  SELECT insurer_id, period_end, rbc,
         LAG(rbc) OVER (PARTITION BY insurer_id ORDER BY period_end) AS prev_rbc
  FROM rbc_change
)
SELECT r.insurer_id,
       (r.rbc - r.prev_rbc) AS change,
       COUNT(a.action_id) AS downgrade_count
FROM ranked r
LEFT JOIN rating_actions a
  ON a.insurer_id = r.insurer_id
 AND a.action_code = 'downgrade'
 AND a.action_date BETWEEN r.period_end AND DATE_ADD(r.period_end, INTERVAL 90 DAY)
WHERE r.prev_rbc IS NOT NULL
GROUP BY r.insurer_id, change
ORDER BY change ASC
LIMIT 100;
```
Automated alerts: rules, examples and best practices
Design alerts that combine qualitative rating actions and quantitative degradations. Use a rules engine or streaming SQL. Prioritize low false positives.
Alert rule examples
- High severity: AM Best downgrade AND (RBC decline > 15% vs prior period OR combined ratio > 110%) → page to PagerDuty
- Medium severity: AM Best outlook negative OR downgrade in a peer with systemic reinsurance exposure → Slack channel
- Low severity: Affirmations but worsening metrics on small carriers → daily digest email
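As a sketch, the three tiers above can be encoded in a single classification function. Field names and thresholds mirror the rules; nothing here is a standard API:

```python
def alert_severity(event: dict) -> "str | None":
    """Classify a combined rating/metric event into the tiers above."""
    downgrade = event.get("action_code") == "downgrade"
    rbc_decline = event.get("rbc_change_pct", 0.0) <= -15.0
    combined_hot = event.get("combined_ratio", 0.0) > 1.10
    if downgrade and (rbc_decline or combined_hot):
        return "high"      # page via PagerDuty
    if event.get("outlook") == "negative" or event.get("peer_systemic_downgrade"):
        return "medium"    # Slack channel
    if event.get("action_code") == "affirmation" and event.get("metrics_worsening"):
        return "low"       # daily digest
    return None

sev = alert_severity({"action_code": "downgrade", "rbc_change_pct": -18.0})
```

Keeping the rules in one pure function makes them unit-testable and gives you a single place to log which clause fired, which helps with the explainability requirement discussed later.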
Webhook example (Node.js) to push Slack notification
```javascript
const fetch = require('node-fetch');

async function sendSlack(text) {
  const url = process.env.SLACK_WEBHOOK;
  const resp = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text })
  });
  if (!resp.ok) throw new Error(`Slack webhook failed: ${resp.status}`);
}

sendSlack('AM Best downgrade detected for Example Mutual (A -> A-) with RBC -18%')
  .catch(console.error);
```
Data quality, testing and monitoring
Instrument the pipeline with observable checks and business-facing SLAs.
- Schema tests: ensure required fields (action_date, source_url, insurer_id) exist
- Freshness checks: no more than X minutes/hours delay for most important feeds
- Anomaly detection: detect metric jumps using rolling z-score and surface to ops
- Backfill and reconciliation: weekly totals vs. external aggregator counts
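The rolling z-score check mentioned above can be sketched with the standard library alone; window size and threshold are tunable assumptions:

```python
import statistics

def zscore_anomalies(values: list, window: int = 8, threshold: float = 3.0) -> list:
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` observations."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.stdev(history)
        if stdev > 0 and abs(values[i] - mean) / stdev > threshold:
            flagged.append(i)
    return flagged

# A stable RBC series with one sudden collapse at the end:
series = [3.0, 3.1, 2.9, 3.0, 3.05, 2.95, 3.0, 3.1, 1.5]
anomalies = zscore_anomalies(series)  # flags index 8
```

Surface flagged indices to ops rather than auto-alerting on them; a z-score spike often just means a restated filing.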
Provenance, licensing and legal constraints
AM Best press releases and ratings text often carry copyright and terms of use. In 2026 legal risk is still a real constraint when making a public dataset.
- Never mirror proprietary full-text without permission. Store original URLs and small rationale snippets (fair use) and link to source.
- Prefer storing derived facts (rating action, previous/next rating, outlook) rather than verbatim press release content.
- Record license_terms per-row: source_copyright, license_url, allowed_redistribution (true/false).
- When using SEC/EDGAR and regulatory XBRL, verify the filing license — most are public domain; NAIC statutory may have restrictions.
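A per-row license_terms payload following the fields above might look like this (all values illustrative, including the URL):

```python
license_terms = {
    "source_copyright": "publisher retains copyright",
    "license_url": "https://example.com/terms",  # placeholder, not a real ToS URL
    "allowed_redistribution": False,             # store facts + link, not full text
}
```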
Operational matters: cadence, scale and cost
Suggested update cadence:
- AM Best rating feeds & news: poll every 6–15 minutes for press release indexes, daily full crawl
- SEC XBRL/EDGAR: daily incremental fetch
- Statutory filings: weekly for bulk feeds, daily for specific insurers under watch
Storage costs can be managed by keeping only parsed outputs in analytics stores and archiving raw PDFs/HTML to cheaper object storage with lifecycle policies.
Case study: capturing the Jan 16, 2026 Michigan Millers upgrade
Example: AM Best upgraded Michigan Millers Mutual Insurance Company to FSR A+ and Long-Term ICR "aa-" (Jan 16, 2026). Key facts to capture and how to model them:
- insurer_id: link to Michigan Millers (LEI or NAIC if available)
- action_date: 2026-01-16
- agency: AM Best
- rating_type: Financial Strength Rating + Long-Term ICR
- previous_rating: A / a
- new_rating: A+ / aa-
- outlook: stable (revised from positive)
- rationale_snippet: citations referencing balance sheet strength, operating performance, reinsurance affiliation with Western National and regulatory approval
- source_url: insurancejournal.com/news/midwest/2026/01/16/854699.htm
- supporting_event: regulatory approval effective 2026-01-01 and pooling participation
Store the press release HTML and a structured action row. Enrich with Western National's group rating and set reinsurance_affiliation_code = 'p'. An automated alert rule would correlate the upgrade with no immediate metric change; downgrade alerts would not trigger, but a watchlist update could be created for consolidation impacts.
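The case study facts map directly onto a rating_actions row. This is a sketch; insurer_id is a placeholder to be resolved against LEI/NAIC, and packing both rating types into one row is a modelling choice (the schema above could equally store two rows, one per rating_type):

```python
michigan_millers_action = {
    "insurer_id": "placeholder-id",          # resolve to LEI/NAIC in practice
    "agency": "AM Best",
    "action_date": "2026-01-16",
    "rating_type": "FSR+ICR",
    "previous_rating": {"FSR": "A", "ICR": "a"},
    "new_rating": {"FSR": "A+", "ICR": "aa-"},
    "outlook": "stable",                     # revised from positive
    "action_code": "upgrade",
    "reinsurance_affiliation_code": "p",     # pooling participation
    "parent_entity": "Western National",
    "source_url": "https://www.insurancejournal.com/news/midwest/2026/01/16/854699.htm",
}
```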
Advanced strategies and future-proofing (2026+)
Prepare for the next five years by adopting these advanced patterns.
- Semantic enrichment: use embedding models to categorize rationales into risk types (reinsurance dependency, reserve adequacy, catastrophes).
- Data contracts and mesh: define data contracts between ingestion teams and consumers so SLAs and versioning are enforced.
- Explainable signals: record the exact metrics and thresholds that triggered an alert for auditability and regulator scrutiny.
- Cross-market signals: integrate reinsurer notices and market-wide indicators (reinsurance rates, catastrophe losses) to detect systemic pressure.
- Privacy & security: treat credentials and subscriptions to proprietary feeds as sensitive secrets; rotate and audit access.
Checklist: launch in 30 days
- Define core schema and required identifiers (LEI/NAIC/CIK).
- Implement daily crawlers for AM Best press pages and wire services.
- Connect to SEC/EDGAR XBRL and configure Arelle extracts for key GAAP metrics.
- Deploy dbt transformations and Great Expectations tests.
- Write 3 alert rules: high/medium/low severity and map delivery channels.
- Document provenance and licensing for every ingestion source.
Actionable takeaways
- Capture facts not full text — store structured rating actions and link to source URLs to avoid copyright issues.
- Use canonical IDs (LEI/NAIC/CIK) to reliably join ratings to financials.
- Blend qualitative and quantitative — correlate rating actions with RBC, combined ratio and liquidity metrics for high-precision alerts.
- Automate quality checks — enforce freshness and schema tests so alerts remain trustworthy.
Final thoughts & call-to-action
Building an open, maintainable AM Best dataset is no longer optional for modern financial risk teams — it's a foundation for timely decisions and defensible alerts. In 2026 the tooling and regulatory signals exist to make this both practical and cost-effective: XBRL feeds, robust open-source parsers, and cloud-native pipelines let you scale from a pilot to enterprise-grade monitoring.
Ready to prototype? Clone a starter repo, deploy the sample crawler and dbt project, or contact your platform team to schedule a 90‑day pilot. If you'd like, we can provide a checklist and sample schema JSON you can drop into your data warehouse.