Shapefiles, OCR, and 3G: Building a Geo-Political Platform for Low-Bandwidth Mexico
Disclaimer: This project was developed under NDA. I can’t share specific implementation details, client information, or proprietary architecture. What I can share are the general engineering challenges and the lessons I took from them.
The Brief
I was hired to build a web application centered on electoral cartography and geodata for a political campaign in Mexico. The core functionality: visualize geographic districts, overlay demographic and electoral data, and process large volumes of scanned documents via OCR.
Simple enough on paper. In practice, it became the most demanding performance engineering challenge I’ve faced.
The Shapefile Awakening
This was my first encounter with Shapefiles — the decades-old geospatial format that powers most GIS systems worldwide.
For the uninitiated: a Shapefile isn’t really a file. It’s a collection of files (.shp, .shx, .dbf, .prj, and sometimes more) that together describe geometric shapes — polygons for districts, points for locations, lines for boundaries — along with their associated data.
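That multi-file structure is a common source of upload bugs: a user sends the `.shp` alone and everything downstream breaks. A minimal completeness check might look like this (a sketch only — the function and filenames here are illustrative, not the project's actual code):

```python
import os

# .shp = geometry, .shx = shape index, .dbf = attribute table.
# These three are mandatory; .prj (projection) and others are optional.
REQUIRED = {".shp", ".shx", ".dbf"}

def missing_parts(filenames):
    """Given the filenames uploaded for one layer, return which
    required shapefile extensions are absent."""
    exts = {os.path.splitext(name)[1].lower() for name in filenames}
    return sorted(REQUIRED - exts)
```

Rejecting incomplete uploads at the door is much cheaper than discovering a missing `.dbf` halfway through a rendering pipeline.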
The challenge wasn’t parsing them. Libraries exist for that. The challenge was rendering thousands of complex polygons in a browser on hardware and connections that could barely handle a Google Maps embed.
Electoral districts in Mexico are intricate. They’re not clean rectangles — they follow rivers, streets, neighborhood boundaries. A single state’s district map can contain hundreds of thousands of coordinate points. Loading and rendering all of that client-side on a mid-range phone over a shaky cellular connection is a recipe for a blank screen and a crashed tab.
Going Low-Level for Performance
The initial stack was too high-level. We were rendering geospatial data with standard web mapping libraries, and it was unacceptably slow for the target users — campaign workers in the field, often in rural areas with poor connectivity.
What followed was a process I’d describe as descending the stack:
- Simplify the geometry. We applied topological simplification (reducing coordinate density while preserving shape fidelity) to cut file sizes by an order of magnitude. A district that was 2MB of coordinates became 200KB — visually identical at the zoom levels users actually needed.
- Tile and cache aggressively. Instead of sending entire shapefiles to the client, we pre-rendered map tiles at fixed zoom levels. This turned a dynamic GIS problem into a static asset serving problem — and static assets can be cached at the edge, close to users.
- Move computation server-side. Anything that could be pre-computed was pre-computed. Intersections, overlays, demographic calculations — all done on the server and served as lightweight JSON. The client became a thin rendering layer, not a GIS engine.
- Optimize for first meaningful paint. The app needed to show something useful within seconds, even on 3G. We prioritized loading the user’s immediate geographic context first, then progressively loading surrounding areas. If you were in District 5, you saw District 5 instantly. Districts 1-4 loaded in the background.
The lesson was universal: when your users’ hardware and connectivity are the bottleneck, every kilobyte is a design decision.
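The tiling approach relies on a deterministic mapping from coordinates to tiles, so every client asks for the same static assets and the edge cache stays hot. A sketch of the standard "slippy map" tile math (illustrative — not the project's actual serving code):

```python
import math

def latlon_to_tile(lat, lon, zoom):
    """Map a WGS84 lat/lon to (x, y) tile indices at a zoom level,
    using the standard Web Mercator tiling scheme."""
    n = 2 ** zoom  # tiles per axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y
```

Because the mapping is pure and total, tiles can be pre-rendered offline, named by `zoom/x/y`, and served as plain files behind a CDN.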
The OCR System: Accidental MLOps
The second major component was an OCR pipeline for processing scanned documents — thousands of them, in varying quality, many photographed at angles or in poor lighting.
We deployed an open-source OCR engine (Tesseract, or something adjacent to it, if I recall correctly) to handle the text extraction. This was, without me realizing it at the time, my first real MLOps project.
The pipeline looked something like this:
- Ingestion: Documents uploaded via the web app (optimized for large batch uploads over slow connections).
- Pre-processing: Image normalization — rotation correction, contrast adjustment, noise reduction — to improve OCR accuracy.
- Model serving: The OCR model running as a service, accepting image inputs and returning structured text.
- Post-processing: Parsing the extracted text into structured data fields (names, numbers, locations) with validation rules.
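The post-processing stage can be sketched as pattern matching plus validation over the raw OCR text. The field names and rules below are hypothetical (the real ones are NDA-covered); the shape of the solution is what matters:

```python
import re

# Hypothetical field patterns; real field names and validation
# rules are not shown here.
FIELDS = {
    "name":    re.compile(r"NOMBRE:\s*([A-ZÁÉÍÓÚÑ ]+)"),
    "section": re.compile(r"SECCION:\s*(\d{4})"),
}

def parse_document(text):
    """Extract structured fields from OCR output; report which
    required fields failed to match so they can be reviewed."""
    record, errors = {}, []
    for field, pattern in FIELDS.items():
        match = pattern.search(text)
        if match:
            record[field] = match.group(1).strip()
        else:
            errors.append(field)
    return record, errors
```

The key design choice is that a failed match is data, not an exception: it routes the document to review instead of crashing or silently dropping it.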
The engineering challenges were classic MLOps problems, even though I didn’t have the vocabulary for it yet:
- Resource management: The model was heavy. Running it on shared infrastructure alongside the web app caused resource contention. We had to isolate the OCR service with its own compute allocation.
- Throughput vs. latency tradeoffs: Processing a single document needed to be fast (for real-time feedback). Processing thousands needed to be efficient (for batch jobs). These are different optimization targets.
- Accuracy monitoring: OCR isn’t binary — it doesn’t “work” or “not work.” It works partially, with varying confidence scores. We needed to flag low-confidence extractions for human review rather than silently ingesting garbage data.
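That low-confidence flagging amounts to a simple triage over per-token confidence scores. A sketch (the threshold and data shapes here are illustrative, not the production values):

```python
def triage(extractions, threshold=0.85):
    """Split OCR results into auto-accepted and human-review queues.

    extractions: {doc_id: [(token, confidence), ...]}
    A document is accepted only if its mean token confidence
    clears the (illustrative) threshold; empty results always
    go to review.
    """
    accepted, review = [], []
    for doc_id, tokens in extractions.items():
        if not tokens:
            review.append(doc_id)
            continue
        mean_conf = sum(conf for _, conf in tokens) / len(tokens)
        (accepted if mean_conf >= threshold else review).append(doc_id)
    return accepted, review
```

In practice the threshold is a business decision, not a technical one: it trades reviewer hours against the cost of garbage reaching the database.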
Looking back, this was the project that planted the seed for my later work in AI infrastructure. The problems — serving a model, managing compute, monitoring quality — were the same problems I’d face years later at much larger scale. I just didn’t know the name for them yet.
The .NET Collaboration
An unexpected dimension of this project was collaborating closely with a .NET and C# developer. Coming from the JavaScript/Python ecosystem, this was my first real exposure to the Microsoft development world.
What struck me was how different the conventions were — not better or worse, just different. Strongly typed everything. Verbose but explicit. Enterprise patterns that felt heavy at first but made the codebase remarkably navigable for newcomers.
It reinforced something I’d later experience in hackathons: working with engineers from different ecosystems teaches you things that staying in your own ecosystem never will. Not just new tools, but new ways of thinking about structure.
What I Took From This
- Performance optimization is a form of empathy. Every millisecond you save is a user in a rural area who doesn’t give up and close the tab.
- Geodata is humbling. The real world is messy. Rivers don’t follow pixel grids. Districts have exceptions. If your system can handle geographic data gracefully, it can handle almost anything.
- MLOps starts before you know it’s MLOps. The first time you deploy a model as a service and worry about throughput, you’re doing MLOps — whether or not it’s in your job title.
- NDAs protect the work, but lessons are portable. I can’t show you the code. But the patterns — simplify, cache, pre-compute, push work to the server — apply everywhere.
Diego Jiménez Vergara — AI Infrastructure & DevOps Engineer. Building performant systems for users wherever they are — even on 3G.