A 2,846-line Python ETL system that extracts, validates, snapshots, and transforms raw portfolio data from a 313K-cell Excel workbook into investor-grade delinquency analytics — with zero manual intervention.
The 35-sheet, 3.8 MB Excel workbook that serves as the single source of truth for the Rogue Credit Union portfolio. The DQ Engine reads from 9 structured input tabs auto-populated by 126,199 live formulas.
| Sheet Name | Purpose | Cells | Formulas | Rows x Cols |
|---|---|---|---|---|
| Loan Sale Data | Master loan tape — 92 fields per loan | 30,142 | 27,862 | 443 x 92 |
| Rogue | Payment transaction ledger | 48,230 | 13,068 | 3,275 x 17 |
| Rogue Data Dump | Raw servicer data import (02-15-2026) | 33,501 | 0 | 409 x 88 |
| Loan Sale Data (PRE) | Pre-transformation loan tape | 28,212 | 19,577 | 440 x 84 |
| P2 Originated Loans | Phase 2 originated, serviced loans | 26,368 | 797 | 399 x 76 |
| Trial Balance Nov2025 | Trial balance reconciliation | 25,196 | 22,742 | 400 x 68 |
| Midwest Validation | Servicer validation crosscheck | 17,820 | 17,776 | 405 x 44 |
| Midwest TBAL | Midwest trial balance | 19,296 | 0 | 402 x 50 |
| Midwest | Midwest servicer data | 16,872 | 4,067 | 582 x 31 |
| PorchPass | Internal payment tracking | 13,213 | 6,811 | 2,768 x 17 |
| MAIC Analytics | Portfolio analytics & DQ calculations | ~8,500 | ~6,200 | 710 x 34 |
| + 24 additional sheets | Config, QA, Servicing, Mapping, Calendar... | ~46,643 | ~7,299 | — |
1_KPI_Strip — 40 fields2_DQ_Trend_Chart — 12 fields3_DPD_Distribution — 25 fields4_Vintage_Segmentation — dynamic5_Tables_Narrative — 12 + 5 arrays6_Note_Rate_Bands — 18-col matrix7_Weighted_Avg_Stats — 6 metrics8_Land_Type — 18-col matrix9_Credit_Score_Tier — 19-col matrixComplete data flow: Raw Excel → 9 Extractors → Validation → JSON Snapshot → Template/Export/Dashboard.
Each extractor uses hardcoded cell-coordinate maps to pull exact values from the master workbook. Every field traces back to a specific Excel cell address.
Maps 40 JSON keys to exact Excel row/column coordinates. Each DQ tier captures Rate, Count, UPB, UPB Abbreviated, UPB%, and Tooltip — 6 fields each for DQ30+, DQ60+, Demanded, NTV, Repo, Performing plus metadata.
A single polymorphic extractor handles three matrix tabs (Note Rate Bands, Land Type, Credit Score Tier). Dynamically discovers column headers at row 2, then iterates all data rows capturing every cell by header name.
Every snapshot runs through 12 automated QC checks before persistence. Cross-references data between tabs to catch inconsistencies that would undermine investor confidence.
Central datastore holding all historical snapshots with auto-rotating backups (last 10 copies retained before every write).
574 lines generating N-column Excel templates (2-12 periods) with exact cell styling, number formats, color-coded input cells, and dynamic note generation.
Weekly sub-periods (W1-W4) for intra-month tracking. MTD→EOM promotion auto-archives all MTD variants with hard copies in 6_archived_periods/.
Generates a comprehensive Excel workbook with 7 analytical sheets featuring MoM delta calculations, RAG status coloring, and trend arrows.
| # | Sheet | Content | Features |
|---|---|---|---|
| 1 | Executive Summary | 19 headline metrics x N periods | RAG (G/A/R) + MoM + trend arrows |
| 2 | KPI History | All 40 KPI fields x N periods | $, %, integer formatting |
| 3 | DPD History | 25 DPD bucket fields x N periods | UPB + Pct formats |
| 4 | Portfolio Stats | WAC, WALA, WARM, FICO, LTV, DTI | Metric-specific formats |
| 5 | Vintage Analysis | Loan counts + DQ rates by vintage | Chronological sorting |
| 6 | DQ Trend | DQ30/60/120+ rate tracking | Multi-period comparison |
| 7 | Narratives | Watch Items, Strengths, Exec Summary | Wrap text, section-grouped |
| Engine Version | v3.1 (Gap Closure) |
| Lines of Code | 2,846 |
| Language | Python 3.x + openpyxl |
| Extraction Modules | 9 |
| Validation Checks | 12 |
| Menu Operations | 9 |
| Template Builder | 574 lines, 11 output tabs |
| Export Engine | 7 analytical sheets |
| Audit Gaps Resolved | 14/14 |