The problem.
Pablo runs a trauma and rehab practice with two own operating rooms, ten partner hospitals, and over forty collaborating doctors across his catchment area. Every month, the centers send settlement forms with completed acts and surgeries: structured PDFs with ordered tables, XLS files with no rules applied, custom listings with formats invented by each hospital, hand-written scans with no OCR, and XLSX files with the rule already applied. Five irreconcilable formats arriving by email in the admin team's inbox.
A qualified person on the fiscal team spent a full weekend each month opening each document, transcribing lines into a master Excel with forty-four tabs, looking up the doctor, applying the right rule (percentage on schedule, fixed amount, or percentage on profit by act × center × insurer), calculating fees, generating an individual PDF per collaborator, and sending it. In parallel, the secretarial team spent half an hour per surgery filling out insurer authorization templates, coordinating material suppliers, scheduling the operating room in Google Calendar, and sending reminders to the patient. Six hundred surgeries a year.
What we built.
Multi-format capture and OCR + vision pipeline
An n8n workflow monitors the dedicated mailbox, classifies each incoming form by center, and routes to the appropriate extraction prompt. For PDFs and XLS we use direct table parsing; for handwritten scans, GPT-4o Vision with a structured prompt that returns date, doctor, act, applicable insurer, and amount.
Doctor matching with vector store
Doctor names arrive with typos, partial names, and inconsistencies between centers. We vectorize each name and match it against the doctor master in pgvector. 92% match automatically with high confidence; the rest goes to a review queue with the model's top 3 suggestions.
Rules engine with insurer + center matrix
Each act has a different rule depending on doctor, center, and insurer. We codified the master Excel into a rules engine that applies percentages, fixed amounts, or profit-based calculations. Output: net fee per doctor, per act, traceable.
Generation and Holded posting
The system generates an individual PDF per collaborator with the breakdown, posts the corresponding entry in Holded (multi-entity), and sends each doctor their PDF via SMTP. What used to take a weekend now runs unattended overnight.
I went from being scared of every month-end to literally not noticing it. The system runs Saturday night, I review 30 minutes Monday morning, and we're done.
How we shipped it.
5-phase process — see full process.
- Week 1 · Discovery + roadmap: session with owner, secretarial, and fiscal team. Mapped the 5 form formats, inventoried active insurers and centers, documented the master Excel of rules, closed proposal with fixed fee. Quick win: start with the center concentrating 40% of volume.
- Weeks 2-3 · Build: staging build on dedicated VPS and Supabase EU-Frankfurt. Connections with Holded, Google Calendar (the 2 own ORs), Gmail/IMAP, corporate SMTP, and OpenAI API.
- Weeks 4-7 · Build with real data: 4 weeks of running parallel against 6 months of historical forms. Refined matching, rules, exception handling.
- Week 8 · Cutover: 5 days of shadow alongside the manual process, then replacement. System running.
- Weeks 9-11 · Stabilization (3 supervised weeks): weekly meetings with the team to review edge cases, tune validations with rare forms appearing only in real use, and refine the confidence threshold. At close: 100% autonomous production.
- Production: monthly retainer recommended for the first 6 months to onboard new insurers, build parsers for new centers, and evolve the rules engine.
Stack used.
n8nGPT-4o · GPT-4o VisionClaudepgvectorSupabaseHolded multi-entityGoogle CalendarPostgreSQL