The hardest bugs in production geospatial systems are not the ones that crash your pipeline. They’re the ones that let it keep running.
I’ve spent the last few years building systems that process millions of agricultural land parcels. The work involves dozens of data sources, multiple coordinate systems, and complex validation rules. In that time, I’ve learned that the most expensive problems are the ones that pass all your checks but still produce wrong results.
A few months ago we noticed farm area calculations in one region were consistently 30-40% too high. The geometries were valid according to PostGIS. The pipeline logs showed no errors. Everything was green. But the numbers were wrong, and they had been wrong for weeks. By the time we caught it, those numbers had already gone into client reports.
The root cause was winding order. Some KML files from field surveys had interior rings with the same vertex order as exterior rings. PostGIS loaded them without complaint. Our validation passed them. But when we calculated areas, the holes were being added instead of subtracted. The geometry was syntactically valid, but it did not represent reality correctly.
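The arithmetic behind the bug is easy to see outside PostGIS. Here is a minimal pure-Python sketch (coordinates and ring layout are illustrative, not our real schema): the shoelace formula gives a signed area, so a hole ring wound the same way as the exterior ring adds its area instead of subtracting it.

```python
def signed_area(ring):
    """Shoelace formula: positive for counter-clockwise rings (y-up)."""
    return sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1])
    ) / 2.0

exterior = [(0, 0), (10, 0), (10, 10), (0, 10)]  # CCW, area +100
hole_ccw = [(2, 2), (6, 2), (6, 6), (2, 6)]      # same order as exterior: wrong
hole_cw = list(reversed(hole_ccw))               # opposite order: correct

# Naive "sum the signed ring areas" calculation, trusting ring orientation:
wrong = signed_area(exterior) + signed_area(hole_ccw)  # 100 + 16 = 116
right = signed_area(exterior) + signed_area(hole_cw)   # 100 - 16 = 84
```

A 16-unit hole inflates the result by 16 units instead of shrinking it by 16, and nothing in the syntax of the polygon is invalid.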
Why this costs more than you think
When a service crashes, the failure is visible and contained. You get an alert, you investigate, you fix it, you redeploy. Silent errors are different. They compound over time.
Bad geometries get stored in your database. They get aggregated in your analytics pipeline. They influence your models. They appear in dashboards that inform business decisions. By the time you notice something is wrong, the corruption has spread through multiple systems and time periods.
The real cost is not technical. A winding order fix takes an hour. The real cost is trust. When you tell a client their quarterly metrics were wrong because some polygons had inverted holes three months ago, you’re not just explaining a geometry bug. You’re explaining why your validation let bad data through in the first place.
The pattern repeats
After you’ve seen enough of these, you start recognizing the pattern. Different specifics, same structure.
Mixed coordinate systems are common. You’re joining farm boundaries in WGS84 with raster footprints in UTM. You forget to transform. PostGIS doesn’t complain because the join operation itself is valid. Your spatial matches are now shifted by dozens of meters. Some farms get matched to the wrong satellite tiles. Nobody notices until someone overlays the results on a map.
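One cheap guard we could run before a join is a coordinate-magnitude sanity check: values that fit inside ±180/±90 look like degrees, anything else looks projected. This is a hedged sketch, not our production code; the function names, sample coordinates, and the degrees-vs-meters heuristic are all illustrative.

```python
def looks_like_lonlat(coords):
    """True if every (x, y) pair is plausible as WGS84 degrees."""
    return all(-180 <= x <= 180 and -90 <= y <= 90 for x, y in coords)

farm_boundary = [(35.2, 31.8), (35.3, 31.8), (35.3, 31.9)]         # degrees
raster_footprint = [(700000.0, 3520000.0), (710000.0, 3520000.0)]  # UTM meters

def check_join_compatibility(a, b):
    """Refuse to join layers whose coordinates live on different scales."""
    if looks_like_lonlat(a) != looks_like_lonlat(b):
        raise ValueError("likely CRS mismatch: degrees joined against projected units")
```

It catches only the grossest mismatches, but those are exactly the ones that otherwise fail silently.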
Multipart confusion happens when your code assumes single polygons but your database stores multipolygons. Most features work fine. But someone owns two non-contiguous parcels, and your area calculation only counts one of them. The pipeline runs successfully. The number is just wrong.
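The multipart bug is two lines of code apart from the correct version. A sketch with illustrative parcel coordinates: code written for a single polygon quietly measures only the first part of a multipolygon.

```python
def ring_area(ring):
    """Unsigned shoelace area of one ring."""
    s = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1])
    )
    return abs(s) / 2.0

# Owner with two non-contiguous parcels, stored as a multipolygon:
parcels = [
    [(0, 0), (4, 0), (4, 4), (0, 4)],      # 16 units
    [(10, 0), (12, 0), (12, 3), (10, 3)],  # 6 units
]

buggy_area = ring_area(parcels[0])                 # drops the second parcel
correct_area = sum(ring_area(p) for p in parcels)  # counts every part
```

Both versions run without error; only one of them matches the parcel register.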
Sliver polygons appear after union operations. Floating-point precision creates hundreds of tiny boundary artifacts. Each one is valid. Your feature count goes from 800 to 1,000. Nobody notices because it’s spread across a large dataset. Your aggregate metrics are inflated by 25%.
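Slivers are hard to spot by area alone, but they are extremely elongated, so a compactness screen works well. A sketch, with an illustrative threshold we would tune per dataset: the isoperimetric quotient 4πA/P² is 1 for a circle and close to 0 for boundary artifacts.

```python
import math

def ring_area(ring):
    s = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1])
    )
    return abs(s) / 2.0

def perimeter(ring):
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:] + ring[:1]))

def is_sliver(ring, min_compactness=0.05):
    """Flag rings whose area is tiny relative to their perimeter."""
    p = perimeter(ring)
    return p > 0 and 4 * math.pi * ring_area(ring) / (p * p) < min_compactness

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
sliver = [(0, 0), (100, 0), (100, 0.01), (0, 0.01)]  # union artifact
```

Run after every union, this turns “feature count drifted from 800 to 1,000” into an alert instead of a surprise.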
None of these trigger error handlers. They all pass standard validation. They all produce output that looks plausible until you compare it against ground truth.
What actually catches silent errors
The answer is not more validation functions. We run ST_IsValid at ingestion. We check for null geometries. We verify CRS. But ST_IsValid doesn’t check winding order. It doesn’t catch CRS mismatches. It doesn’t flag sliver polygons. It tells you if a geometry follows OGC rules, not if it represents reality.
What catches silent errors is observability. Log the right metrics. Know what normal looks like.
We log area before and after every transformation. We log coordinate systems explicitly. We log geometry types and part counts. When something changes unexpectedly, it shows up in the logs even if the pipeline succeeds.
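The shape of that logging is simple. A hedged sketch, not our actual pipeline code: the helper name, the 1% tolerance, and the transform callback are all illustrative; the point is that area in and area out get logged on every step, and a drift beyond tolerance warns even though the step itself succeeded.

```python
import logging

log = logging.getLogger("pipeline.geometry")

def checked_transform(feature_id, area_before, transform, *, tolerance=0.01):
    """Run a transform step, log area before/after, warn on unexpected drift."""
    area_after = transform()  # assumed to return the post-transform area
    log.info("feature=%s area_before=%.4f area_after=%.4f",
             feature_id, area_before, area_after)
    if area_before > 0:
        drift = abs(area_after - area_before) / area_before
        if drift > tolerance:
            log.warning("feature=%s area drifted %.1f%% through transform",
                        feature_id, drift * 100)
    return area_after
```

A reprojection should change area by fractions of a percent; a winding-order bug changes it by double digits, and that difference is visible in the log line.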
An audit query makes this concrete: it flags geometries that pass ST_IsValid but fail our stricter winding and plausible-area checks.

```sql
SELECT
    farm_id,
    ST_IsValid(geom) AS passes_validation,
    ST_IsPolygonCW(geom) AS correct_winding,
    ST_Area(geom::geography) / 10000 AS area_hectares
FROM farms
WHERE ST_IsValid(geom) = true
  AND (NOT ST_IsPolygonCW(geom)
       OR ST_Area(geom::geography) / 10000 NOT BETWEEN 0.1 AND 10000);
```
We also enforce constraints that matter for our use case. Our rendering stack expects right-hand rule winding. We force it on insert rather than hope the source data is correct.
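In PostGIS, ST_ForcePolygonCW does this normalization. For ingestion code that runs before the database, here is a pure-Python sketch of the same idea, assuming the convention our audit query checks (exterior ring clockwise, holes counter-clockwise); the shoelace signed area is negative for clockwise rings in a y-up system.

```python
def signed_area(ring):
    """Shoelace formula: negative for clockwise rings (y-up)."""
    return sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1])
    ) / 2.0

def force_polygon_cw(exterior, holes):
    """Normalize winding instead of trusting the source file."""
    ext = exterior if signed_area(exterior) < 0 else list(reversed(exterior))
    fixed = [h if signed_area(h) > 0 else list(reversed(h)) for h in holes]
    return ext, fixed
```

Normalizing on write means every downstream consumer can rely on the convention instead of re-checking it.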
The other thing that helps is adversarial testing. We maintain a collection of geometries that caused production issues: bowties, inverted holes, slivers, touching rings. Every pipeline change runs against these test cases. Not because they’re common, but because when they appear, they’re expensive to fix after the fact.
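The fixture idea can be sketched in a few lines. This is illustrative, not our real suite: the validator below only covers winding and degenerate area, and the fixtures and threshold are stand-ins for the production collection.

```python
def signed_area(ring):
    return sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1])
    ) / 2.0

def validate(exterior, holes, min_area=0.1):
    """Reject degenerate rings and holes wound the same way as the exterior."""
    if abs(signed_area(exterior)) < min_area:
        return False  # sliver / degenerate
    ext_ccw = signed_area(exterior) > 0
    return all((signed_area(h) > 0) != ext_ccw for h in holes)

# Geometries that caused real incidents become permanent regression fixtures:
FIXTURES = {
    "inverted_hole": ([(0, 0), (10, 0), (10, 10), (0, 10)],
                      [[(2, 2), (6, 2), (6, 6), (2, 6)]]),
    "sliver": ([(0, 0), (100, 0), (100, 0.0001), (0, 0.0001)], []),
}

for name, (ext, holes) in FIXTURES.items():
    assert not validate(ext, holes), f"{name} should be rejected"
```

Every pipeline change runs against the collection, so a regression on any historical failure mode fails loudly in CI rather than silently in production.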
The lesson
In production systems, validation that says “this data is not broken” is not the same as validation that says “this data is correct.” The difference is measured in weeks of cleanup work and difficult client conversations.
Loud failures stop your pipeline. Silent failures quietly corrupt it. And by the time you notice, the damage is already done.
If you’ve run production data systems for a while, you’ve probably seen similar patterns in your domain. The specifics change, but the dynamic doesn’t. What silent errors have cost you the most?