Open Insurance Ratings Dataset: Collecting AM Best Actions and Insurer Financials for Trend Analysis
A practical guide to building an open AM Best ratings dataset for trend analysis, alerts and risk monitoring—with schema, pipelines and code.
Stop missing early warning signs — build a resilient AM Best ratings dataset
Financial risk teams and platform engineers tell us the same friction points in 2026: rating actions arrive as unstructured press releases, insurer filings live in disparate regulatory silos, and critical financial metrics are scattered across PDFs and XBRL feeds. The result: slow detections, manual triage, and noisy alerts that erode trust. This guide shows how to build an open, maintainable dataset of AM Best rating actions, insurer filings and harmonized financial metrics to enable robust trend analysis and automated alerts.
Executive summary (most important first)
In this article you will get:
- A production-ready data model for rating actions, filings and financials
- Proven pipeline architecture (ingest, normalize, validate, serve)
- Code-first examples (Python scraper/parser, BigQuery SQL, webhook alerts)
- Operational guidance: cadence, provenance, licensing and monitoring
- A short case study capturing a Jan 2026 AM Best upgrade for Michigan Millers
Why this matters in 2026: trends shaping insurer risk datasets
Three developments drove this playbook in late 2025 and early 2026:
- Regulatory machine-readability — broader adoption of XBRL and standardised statutory feeds (NAIC, EU regulators) makes financial extraction easier but uneven across jurisdictions.
- Real-time monitoring expectations — internal risk teams expect near-real-time alerts tied to quantitative metrics, not just event notifications.
- AI-driven context enrichment — LLMs and embedding search are now commonly used to extract rationale snippets and map qualitative rating rationales to structured risk tags (concentration risk, reinsurance support, ESG/climate exposure).
Core dataset design: entities and schema
Design for identity-first joins and traceable provenance. Use globally unique IDs like LEI, CIK, and NAIC company codes where applicable.
Primary tables (recommended)
- insurers: insurer_id (UUID), legal_name, LEI, NAIC_code, CIK, country, primary_group, status, website
- rating_actions: action_id, insurer_id, agency (AM Best), action_date, rating_type (FSR/ICR/PDR), previous_rating, new_rating, outlook, action_code (upgrade/downgrade/affirmation), rationale_snippet, reinsurance_affiliation_code, parent_entity, source_url, source_captured_at, license_terms
- filings: filing_id, insurer_id, filing_type (annual, quarterly, statutory, regulatory), filing_date, format (XBRL/PDF/CSV), raw_url, parsed_url, capture_hash, trust_score
- financial_metrics: metric_id, insurer_id, period_end, metric_name (RBC_ratio, total_assets, loss_ratio, combined_ratio, ROE), metric_value, currency, reporting_basis (GAAP/statutory), source_filing_id
- derived_signals: signal_id, insurer_id, signal_date, signal_type (downgrade_risk, liquidity_shortfall), score, rule_id, supporting_facts
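As a concrete sketch, a rating_actions row can be modelled as a typed record. This is illustrative only (a subset of the fields above; types and placeholder values are assumptions):

```python
from datetime import date, datetime
from typing import Optional, TypedDict

class RatingAction(TypedDict):
    """One structured rating action (subset of the schema above)."""
    action_id: str
    insurer_id: str              # UUID linking to the insurers table
    agency: str                  # e.g. 'AM Best'
    action_date: date
    rating_type: str             # 'FSR', 'ICR' or 'PDR'
    previous_rating: Optional[str]
    new_rating: str
    outlook: Optional[str]       # 'stable', 'positive', 'negative'
    action_code: str             # 'upgrade', 'downgrade', 'affirmation'
    rationale_snippet: str
    source_url: str
    source_captured_at: datetime
    license_terms: str

action: RatingAction = {
    "action_id": "ra-0001",
    "insurer_id": "3f1c0000-0000-0000-0000-000000000001",  # placeholder UUID
    "agency": "AM Best",
    "action_date": date(2026, 1, 16),
    "rating_type": "FSR",
    "previous_rating": "A",
    "new_rating": "A+",
    "outlook": "stable",
    "action_code": "upgrade",
    "rationale_snippet": "Balance sheet strength and operating performance.",
    "source_url": "https://example.com/press-release",  # placeholder
    "source_captured_at": datetime(2026, 1, 16, 14, 0),
    "license_terms": "link-only",
}
```

Keeping the record a plain mapping makes it trivial to load into BigQuery or Postgres without an ORM layer.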
Key design principles
- Store both raw text (press release HTML/PDF) and structured extracts to preserve provenance.
- Tag every record with source_url, captured_at and license_terms.
- Normalize ratings to a numeric scale per rating type (e.g., FSR A++ = 1, A+ = 2, A = 3; score the lowercase ICR scale separately) for trend analysis.
- Use time-versioned tables or CDC for historical reconstruction and backtesting.
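The normalization principle above can be implemented as a simple lookup. The FSR ordering below follows AM Best's published secure/vulnerable scale; keeping ICR in a separate table is a design choice of this sketch:

```python
# Map AM Best Financial Strength Ratings to a numeric rank
# (lower = stronger) so trends can be plotted and diffed.
FSR_RANK = {
    "A++": 1, "A+": 2, "A": 3, "A-": 4,
    "B++": 5, "B+": 6, "B": 7, "B-": 8,
    "C++": 9, "C+": 10, "C": 11, "C-": 12,
    "D": 13,
}

# Long-Term ICR uses its own lowercase scale; keep it separate.
ICR_RANK = {
    "aaa": 1, "aa+": 2, "aa": 3, "aa-": 4,
    "a+": 5, "a": 6, "a-": 7,
    "bbb+": 8, "bbb": 9, "bbb-": 10,
}

def rating_rank(label: str, rating_type: str = "FSR") -> int:
    """Return the numeric rank for a rating label; raises KeyError if unknown."""
    table = FSR_RANK if rating_type == "FSR" else ICR_RANK
    return table[label.strip()]

# An upgrade shows up as a negative rank delta:
delta = rating_rank("A+") - rating_rank("A")  # -1
```

Storing the rank alongside the raw label preserves provenance while making window functions and deltas trivial in SQL.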
Pipeline architecture — from raw press release to alert
Build a modular, observable pipeline. Below is an architecture that teams are using successfully in 2026.
- Ingest: scheduled crawlers + API connectors (AM Best feed if available), SEC/EDGAR, NAIC, company sites, newswires
- Capture raw: store original HTML/PDF/XBRL in object storage (S3/GS)
- Parse & extract: HTML/PDF parsing -> extract metadata and text; XBRL -> map tags to canonical metrics
- Normalize: unify units, currencies, identifiers (LEI/NAIC/CIK), map rating labels to numeric rank
- Enrich: add parent group, reinsurance codes, sector tags, climate exposure score
- Validate: run schema tests and data quality rules (Great Expectations, dbt tests)
- Serve: expose dataset via SQL (BigQuery/Snowflake/Postgres) and an API for event subscriptions
- Alerting & Ops: rules engine (SQL-based or stream processor) to emit Slack/email/PagerDuty alerts
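The stages above can be sketched as plain functions that an orchestrator (Prefect or Airflow) later wraps as tasks. Everything here is illustrative and dependency-free; function names are assumptions, not a standard API:

```python
import hashlib
from datetime import datetime, timezone

def capture_raw(url: str, html: str) -> dict:
    """Store the original document with a content hash for provenance."""
    return {
        "source_url": url,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "capture_hash": hashlib.sha256(html.encode()).hexdigest(),
        "raw": html,
    }

def normalize(record: dict) -> dict:
    """Unify identifiers and rating labels (stub: uppercase FSR labels)."""
    record["new_rating"] = record.get("new_rating", "").strip().upper()
    return record

def validate(record: dict) -> dict:
    """Minimal schema gate before serving: required provenance fields."""
    for field in ("source_url", "captured_at", "capture_hash"):
        if field not in record:
            raise ValueError(f"missing required field: {field}")
    return record

# Wire the stages together for one document:
raw = capture_raw("https://example.com/pr", "<html>AM Best upgrades ...</html>")
row = validate(normalize({**raw, "new_rating": " a+ "}))
```

Each stage returning a plain dict keeps the pipeline observable: any stage's output can be dumped to object storage for replay.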
Tech stack choices (battle-tested)
- Ingestion: Airbyte / custom Python crawlers for press releases
- Orchestration: Prefect or Airflow
- Parsing: Python (requests, BeautifulSoup, pdfminer, lxml), Arelle for XBRL
- Storage: S3 + Delta Lake or BigQuery / Snowflake for analytics
- Transformation: dbt for SQL transformations and documentation
- Quality: Great Expectations, Monte Carlo for data observability
- Alerts: Kafka/Cloud PubSub -> serverless functions -> Slack/PagerDuty
Practical extractor example: parsing AM Best press releases (Python)
Keep parsers resilient: use content-based selectors, rate-limit requests, and capture the full HTML for rebuilds. Below is a compact example to extract structured fields from a press release.
```python
import requests
from bs4 import BeautifulSoup
from datetime import datetime, timezone

url = 'https://www.insurancejournal.com/news/midwest/2026/01/16/854699.htm'
resp = requests.get(url, timeout=10)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, 'html.parser')

# naive selectors — adapt per publisher
title = soup.find('h1').get_text(strip=True)
date_text = soup.select_one('time').get_text(strip=True)
date = datetime.strptime(date_text, '%B %d, %Y')
body = ' '.join(p.get_text(' ', strip=True) for p in soup.select('article p'))

# quick heuristic extraction
if 'upgrad' in body.lower():
    action = 'upgrade'
elif 'downgrad' in body.lower():
    action = 'downgrade'
else:
    action = 'other'

record = {
    'source_url': url,
    'captured_at': datetime.now(timezone.utc).isoformat(),
    'title': title,
    'action_date': date.isoformat(),
    'rationale_snippet': body[:1000],
    'action_code': action,
}
print(record)
```
Parsing XBRL and statutory filings
For public insurers, use SEC EDGAR (XBRL) to extract GAAP metrics. For domestic statutory filings (NAIC), rely on regulators' datasets and third-party aggregators. Use Arelle or a similar open-source XBRL processor to map tags to canonical metric names.
Mapping example: RBC and Combined Ratio
- RBC: extract statutory risk-based capital ratio from NAIC filings
- Combined Ratio: from income statements, compute (losses + expenses) / premiums earned
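The combined ratio formula above can be coded directly. Note this follows the article's simplified definition; statutory convention splits the expense ratio over premiums written, so treat this as a sketch:

```python
def combined_ratio(losses: float, expenses: float, premiums_earned: float) -> float:
    """(losses + expenses) / premiums earned; > 1.0 means an underwriting loss."""
    if premiums_earned <= 0:
        raise ValueError("premiums_earned must be positive")
    return (losses + expenses) / premiums_earned

# A carrier paying 62 in losses and 33 in expenses on 100 of earned premium:
cr = combined_ratio(62.0, 33.0, 100.0)  # 0.95 -> underwriting profit
```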
Trend analysis: sample queries and signals
Below are high-value queries you can run once the dataset is assembled.
1) Quarterly downgrade rate (12-month rolling)
```sql
WITH monthly AS (
  SELECT
    DATE_TRUNC(action_date, MONTH) AS month,
    COUNTIF(action_code = 'downgrade') AS downgrades,
    COUNT(*) AS actions
  FROM rating_actions
  WHERE agency = 'AM Best'
  GROUP BY month
)
SELECT
  month,
  SUM(downgrades) OVER w / SUM(actions) OVER w AS rolling_downgrade_rate
FROM monthly
WINDOW w AS (ORDER BY month ROWS BETWEEN 11 PRECEDING AND CURRENT ROW)
ORDER BY month;
```
2) Correlate RBC decline with downgrades
```sql
WITH rbc_change AS (
  SELECT insurer_id, period_end, metric_value AS rbc
  FROM financial_metrics
  WHERE metric_name = 'RBC_ratio'
),
ranked AS (
  SELECT insurer_id, period_end, rbc,
         LAG(rbc) OVER (PARTITION BY insurer_id ORDER BY period_end) AS prev_rbc
  FROM rbc_change
)
SELECT r.insurer_id,
       (r.rbc - r.prev_rbc) AS change,
       COUNT(a.action_id) AS downgrade_count
FROM ranked r
LEFT JOIN rating_actions a
  ON a.insurer_id = r.insurer_id
 AND a.action_code = 'downgrade'
 AND a.action_date BETWEEN r.period_end AND DATE_ADD(r.period_end, INTERVAL 90 DAY)
WHERE r.prev_rbc IS NOT NULL
GROUP BY r.insurer_id, change
ORDER BY change ASC
LIMIT 100;
```
Automated alerts: rules, examples and best practices
Design alerts that combine qualitative rating actions and quantitative degradations. Use a rules engine or streaming SQL. Prioritize low false positives.
Alert rule examples
- High severity: AM Best downgrade AND (RBC decline > 15% vs prior period OR combined ratio > 110%) → page to PagerDuty
- Medium severity: AM Best outlook negative OR downgrade in a peer with systemic reinsurance exposure → Slack channel
- Low severity: Affirmations but worsening metrics on small carriers → daily digest email
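As a sketch, the three tiers above can be encoded in a single classification function. Field names and thresholds mirror the rules; nothing here is a standard API:

```python
def alert_severity(event: dict) -> "str | None":
    """Classify a combined rating/metric event into the tiers above."""
    downgrade = event.get("action_code") == "downgrade"
    rbc_decline = event.get("rbc_change_pct", 0.0) <= -15.0
    combined_hot = event.get("combined_ratio", 0.0) > 1.10
    if downgrade and (rbc_decline or combined_hot):
        return "high"      # page via PagerDuty
    if event.get("outlook") == "negative" or event.get("peer_systemic_downgrade"):
        return "medium"    # Slack channel
    if event.get("action_code") == "affirmation" and event.get("metrics_worsening"):
        return "low"       # daily digest
    return None

sev = alert_severity({"action_code": "downgrade", "rbc_change_pct": -18.0})
```

Keeping the rules in one pure function makes them unit-testable and gives you a single place to log which clause fired, which helps with the explainability requirement discussed later.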
Webhook example (Node.js) to push Slack notification
```javascript
const fetch = require('node-fetch');

async function sendSlack(text) {
  const url = process.env.SLACK_WEBHOOK;
  const resp = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text })
  });
  if (!resp.ok) throw new Error(`Slack webhook failed: ${resp.status}`);
}

sendSlack('AM Best downgrade detected for Example Mutual (A -> A-) with RBC -18%')
  .catch(console.error);
```
Data quality, testing and monitoring
Instrument the pipeline with observable checks and business-facing SLAs.
- Schema tests: ensure required fields (action_date, source_url, insurer_id) exist
- Freshness checks: no more than X minutes/hours delay for most important feeds
- Anomaly detection: detect metric jumps using rolling z-score and surface to ops
- Backfill and reconciliation: weekly totals vs. external aggregator counts
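The rolling z-score check mentioned above can be sketched with the standard library alone; window size and threshold are tunable assumptions:

```python
import statistics

def zscore_anomalies(values: list, window: int = 8, threshold: float = 3.0) -> list:
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` observations."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.stdev(history)
        if stdev > 0 and abs(values[i] - mean) / stdev > threshold:
            flagged.append(i)
    return flagged

# A stable RBC series with one sudden collapse at the end:
series = [3.0, 3.1, 2.9, 3.0, 3.05, 2.95, 3.0, 3.1, 1.5]
anomalies = zscore_anomalies(series)  # flags index 8
```

Surface flagged indices to ops rather than auto-alerting on them; a z-score spike often just means a restated filing.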
Provenance, licensing and legal constraints
AM Best press releases and ratings text often carry copyright and terms of use. In 2026 legal risk is still a real constraint when making a public dataset.
- Never mirror proprietary full-text without permission. Store original URLs and small rationale snippets (fair use) and link to source.
- Prefer storing derived facts (rating action, previous/next rating, outlook) rather than verbatim press release content.
- Record license_terms per-row: source_copyright, license_url, allowed_redistribution (true/false).
- When using SEC/EDGAR and regulatory XBRL, verify the filing license — most are public domain; NAIC statutory may have restrictions.
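A per-row license_terms payload following the fields above might look like this (all values illustrative, including the URL):

```python
license_terms = {
    "source_copyright": "publisher retains copyright",
    "license_url": "https://example.com/terms",  # placeholder, not a real ToS URL
    "allowed_redistribution": False,             # store facts + link, not full text
}
```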
Operational matters: cadence, scale and cost
Suggested update cadence:
- AM Best rating feeds & news: poll every 6–15 minutes for press release indexes, daily full crawl
- SEC XBRL/EDGAR: daily incremental fetch
- Statutory filings: weekly for bulk feeds, daily for specific insurers under watch
Storage costs can be managed by keeping only parsed outputs in analytics stores and archiving raw PDFs/HTML to cheaper object storage with lifecycle policies.
Case study: capturing the Jan 16, 2026 Michigan Millers upgrade
Example: AM Best upgraded Michigan Millers Mutual Insurance Company to FSR A+ and Long-Term ICR "aa-" (Jan 16, 2026). Key facts to capture and how to model them:
- insurer_id: link to Michigan Millers (LEI or NAIC if available)
- action_date: 2026-01-16
- agency: AM Best
- rating_type: Financial Strength Rating + Long-Term ICR
- previous_rating: A / a
- new_rating: A+ / aa-
- outlook: stable (revised from positive)
- rationale_snippet: citations referencing balance sheet strength, operating performance, reinsurance affiliation with Western National and regulatory approval
- source_url: insurancejournal.com/news/midwest/2026/01/16/854699.htm
- supporting_event: regulatory approval effective 2026-01-01 and pooling participation
Store the press release HTML and a structured action row. Enrich with Western National's group rating and set reinsurance_affiliation_code = 'p'. An automated alert rule would correlate the upgrade with no immediate metric change; downgrade alerts would not trigger, but a watchlist update could be created for consolidation impacts.
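The case study facts map directly onto a rating_actions row. This is a sketch; insurer_id is a placeholder to be resolved against LEI/NAIC, and packing both rating types into one row is a modelling choice (the schema above could equally store two rows, one per rating_type):

```python
michigan_millers_action = {
    "insurer_id": "placeholder-id",          # resolve to LEI/NAIC in practice
    "agency": "AM Best",
    "action_date": "2026-01-16",
    "rating_type": "FSR+ICR",
    "previous_rating": {"FSR": "A", "ICR": "a"},
    "new_rating": {"FSR": "A+", "ICR": "aa-"},
    "outlook": "stable",                     # revised from positive
    "action_code": "upgrade",
    "reinsurance_affiliation_code": "p",     # pooling participation
    "parent_entity": "Western National",
    "source_url": "https://www.insurancejournal.com/news/midwest/2026/01/16/854699.htm",
}
```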
Advanced strategies and future-proofing (2026+)
Prepare for the next five years by adopting these advanced patterns.
- Semantic enrichment: use embedding models to categorize rationales into risk types (reinsurance dependency, reserve adequacy, catastrophes).
- Data contracts and mesh: define data contracts between ingestion teams and consumers so SLAs and versioning are enforced.
- Explainable signals: record the exact metrics and thresholds that triggered an alert for auditability and regulator scrutiny.
- Cross-market signals: integrate reinsurer notices and market-wide indicators (reinsurance rates, catastrophe losses) to detect systemic pressure.
- Privacy & security: treat credentials and subscriptions to proprietary feeds as sensitive secrets; rotate and audit access.
Checklist: launch in 30 days
- Define core schema and required identifiers (LEI/NAIC/CIK).
- Implement daily crawlers for AM Best press pages and wire services.
- Connect to SEC/EDGAR XBRL and configure Arelle extracts for key GAAP metrics.
- Deploy dbt transformations and Great Expectations tests.
- Write 3 alert rules: high/medium/low severity and map delivery channels.
- Document provenance and licensing for every ingestion source.
Actionable takeaways
- Capture facts not full text — store structured rating actions and link to source URLs to avoid copyright issues.
- Use canonical IDs (LEI/NAIC/CIK) to reliably join ratings to financials.
- Blend qualitative and quantitative — correlate rating actions with RBC, combined ratio and liquidity metrics for high-precision alerts.
- Automate quality checks — enforce freshness and schema tests so alerts remain trustworthy.
Final thoughts & call-to-action
Building an open, maintainable AM Best dataset is no longer optional for modern financial risk teams — it's a foundation for timely decisions and defensible alerts. In 2026 the tooling and regulatory signals exist to make this both practical and cost-effective: XBRL feeds, robust open-source parsers, and cloud-native pipelines let you scale from a pilot to enterprise-grade monitoring.
Ready to prototype? Clone a starter repo, deploy the sample crawler and dbt project, or contact your platform team to schedule a 90‑day pilot. If you'd like, we can provide a checklist and sample schema JSON you can drop into your data warehouse.