My experience building Plotter, a visual data platform that aggregates dozens of APIs into a unified system, taught me that data integration is a major process bottleneck. Too many businesses assume API integration is straightforward until they actually attempt it.
The Duck Parable
First, what is an API and how can we understand the challenges at a high level?
API stands for Application Programming Interface and is a way for programs or websites to talk to one another.
Let me explain the issue with a simple parable. Say you want to find the number of migrating ducks in the US after reading an article in National Geographic.
Being thorough, you want to verify across multiple sources.
First Source – Login Issues
You open the first site and, in classic internet fashion, you’re greeted with a message: “You need to log in to see this information.” APIs work the same way. Some are wide open — just grab the data and go, like most free APIs. Others require you to sign in or even pay for access. And then there are the truly “secure” ones: the sites that want your ID, your credit card, and a 3D scan of your face. APIs can be just as demanding.
Certain sources make you run entire third-party authentication tools or generate special keys that expire every so often, forcing you to jump through hoop after hoop just to get to the actual data. You clear all the hurdles on the first site and confirm: 32.3 million ducks. Great—now you want a second opinion.
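The expiring-key problem above is usually handled with a small caching layer that refreshes credentials just before they die. Here is a minimal sketch; `fetch_token` is a hypothetical callable standing in for whatever auth flow a real API requires (OAuth, signed keys, a vendor SDK), and the lifetime numbers are placeholders, not any particular API's policy.

```python
import time

class TokenManager:
    """Caches a short-lived API token and refreshes it before expiry."""

    def __init__(self, fetch_token, lifetime_seconds=3600, margin_seconds=60):
        self.fetch_token = fetch_token   # hypothetical: performs the real auth dance
        self.lifetime = lifetime_seconds
        self.margin = margin_seconds
        self._token = None
        self._expires_at = 0.0

    def get(self):
        # Refresh slightly before the real expiry so in-flight requests
        # never race a dying token.
        if self._token is None or time.time() >= self._expires_at - self.margin:
            self._token = self.fetch_token()
            self._expires_at = time.time() + self.lifetime
        return self._token
```

Every request then calls `manager.get()` instead of hard-coding a key, and the hoop-jumping happens in exactly one place.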
Second Source – File Transformation
The next source is a government site, so at least there’s no login. Instead, you’re met with a wall of legalese on a page that hasn’t been redesigned since the ’90s. Eventually you spot the data link… which downloads as a .tar.bz2 file. You Google what a bz2 even is, extract it, and finally get an XML file: technically readable, but painfully tedious to navigate.
APIs can be the same. JSON is common, but plenty of APIs deliver data in odd formats, compressed archives, or “human-friendly” text that needs to be converted back into machine-friendly numbers. Just like cracking open that .tar.bz2 file only to dig through XML, sometimes getting to the real data takes a few extra steps.
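The .tar.bz2-to-numbers pipeline above is a few lines of standard-library Python. The XML layout here (`<record species="…" count="…"/>`) is made up for illustration; a real archive would need its own element names, but the shape of the work is the same.

```python
import tarfile
import xml.etree.ElementTree as ET

def extract_duck_counts(archive_path):
    """Open a .tar.bz2 archive, parse each XML member, and pull the
    numbers back into machine-friendly form.

    Assumes a hypothetical layout:
        <records>
          <record species="Mallard" count="12000000"/>
        </records>
    """
    counts = {}
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        for member in tar.getmembers():
            if not member.name.endswith(".xml"):
                continue  # skip readmes, licenses, etc.
            root = ET.parse(tar.extractfile(member)).getroot()
            for rec in root.iter("record"):
                counts[rec.get("species")] = int(rec.get("count"))
    return counts
```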
Third Source – Cross-Referencing
The next site looks promising: a fan-made duck database with filters for state, species, and gender. You don’t care about the details; you just want the total. So you cycle through every combination and add up the results… only to get around 90 million ducks. That can’t be right.
But it is: ducks migrate, so they’re counted in every state they pass through. Now you need outside knowledge to confirm migration patterns because the raw numbers alone don’t make sense. So now you’re stuck figuring out which states each duck species actually flies through, and how to divide those counts based on their migration paths. Researching these details and building workarounds for sources that don’t return clean or expected data can be a huge time sink.
And to make things harder, the data isn’t in one place; it’s scattered across dozens of tiny tables. You have to gather them one by one and assemble them into one bigger table. APIs are similar: the raw responses can be fragmented, and only by adding outside context and combining them together do you get a coherent result.
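The double-counting fix above can be sketched in a few lines. All the data here is invented for illustration: the per-state tallies and the flyway routes are placeholders for the scattered tables and the outside knowledge you'd have to research.

```python
# Hypothetical per-state tallies: a migrating duck is counted in every
# state it passes through, so naive summing over-counts.
state_counts = {
    "MN": {"Mallard": 9_000_000},
    "IA": {"Mallard": 9_000_000},
    "TX": {"Mallard": 9_000_000},
}

# Outside knowledge the raw tables don't carry: which states each
# species' flyway crosses.
flyway_states = {"Mallard": ["MN", "IA", "TX"]}

def deduplicated_total(species):
    # Each bird appears once per state on its route, so dividing the
    # summed tallies by the route length recovers the true count.
    total = sum(state_counts[s].get(species, 0) for s in flyway_states[species])
    return total // len(flyway_states[species])
```

The point isn't the arithmetic, which is trivial; it's that `flyway_states` had to come from somewhere outside the API entirely.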
Fourth Source – Hierarchical Aggregation
Now imagine a scientific site that catalogs every animal. Its duck data sits in a table of species and scientific names. You might not know offhand that Cairina moschata is a duck, but the site also provides tables linking family → subfamily → genus → species. So you start at the Anatidae family, pull its subfamilies, then its genera, then its species, building the full list step by step.
Computers are great at this kind of repetitive cross-referencing as long as the relationships between tables are documented. APIs often work the same way. You may need to join multiple datasets, follow chains of IDs, and map one table to another. The only difference is that instead of scientific names, you’re usually dealing with cryptic identifiers like 6452183.
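That family-to-species walk is a plain graph traversal once the linkage tables are known. A minimal sketch, with a made-up (and heavily abridged) taxonomy standing in for the site's real tables:

```python
# Hypothetical linkage tables, keyed the way a taxonomy API might
# expose them: each level maps a parent to its children.
children = {
    "Anatidae": ["Anatinae", "Cairininae"],   # family -> subfamilies
    "Anatinae": ["Anas"],                     # subfamily -> genera
    "Cairininae": ["Cairina"],
    "Anas": ["Anas platyrhynchos"],           # genus -> species
    "Cairina": ["Cairina moschata"],
}

def species_under(taxon):
    """Walk family -> subfamily -> genus -> species by repeatedly
    following the documented parent/child relationships."""
    frontier, species = [taxon], []
    while frontier:
        node = frontier.pop()
        kids = children.get(node)
        if kids is None:
            species.append(node)  # no children table entry: a leaf species
        else:
            frontier.extend(kids)
    return sorted(species)
```

Swap the readable names for opaque numeric IDs and this is exactly what an integration layer does against a real API.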
Scaling Issues
Finally, imagine applying that same logic across hundreds, thousands, or even millions of tables. Manually checking each one simply isn’t possible, so we need systems that can handle all of this automatically. That, in a nutshell, is what API integration really is.
Connecting to an API is trivial
Getting usable data is not
APIs bridge systems — like a waiter taking orders between client and server. Connecting to an API is easy, but getting all the available data, in the format you want, in a timely manner, is challenging.
In practice, two major factors determine how hard an API is to work with:
1. Data Format Complexity
Different APIs structure their data in wildly different ways. Each system organizes information in whatever format makes the most sense for its internal logic, not yours.
Some APIs are built around single, detailed records that you fetch one at a time. Others return massive tables. Some use simple JSON; others send compressed files, XML, or custom encodings. Before you can store or use any of it, you often have to clean, normalize, and reshape the data.
In short: unifying inconsistent formats into one cohesive structure takes real work.
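The normalization work looks something like the sketch below. Both source shapes are invented for illustration, but they capture a very common mismatch: one API returns human-friendly strings, another returns raw numbers, and your pipeline has to map both onto a single schema.

```python
def normalize(record):
    """Map responses from two hypothetical APIs onto one schema.

    Source A (assumed): {"speciesName": ..., "population": "32,300,000"}
    Source B (assumed): {"name": ..., "count": 32300000}
    """
    if "speciesName" in record:
        # Source A: strings with thousands separators need converting
        return {"species": record["speciesName"],
                "count": int(record["population"].replace(",", ""))}
    # Source B: already numeric
    return {"species": record["name"], "count": record["count"]}
```

Multiply this by every field, every endpoint, and every vendor quirk, and "real work" starts to feel like an understatement.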
2. Data Discovery and Retrieval
Even once you understand the format, you still have to figure out where the data actually lives.
Many APIs don’t tell you what data exists. You have to manually investigate which tables matter, which fields update, and how they relate. APIs can hide their information across dozens or sometimes hundreds of endpoints, tables, or nested relationships. Learning which parts exist, which matter, and how they connect is often half the job. And if you miss a table, you miss the data. These aren’t edge cases. They’re the norm.
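Discovery in practice often means exhaustively walking links from a root listing, because nothing tells you up front which tables exist. Here is a toy sketch; the `api` dictionary is a stand-in for live endpoint responses, and the paths are invented.

```python
# Hypothetical, simplified endpoint map: each listing links to
# sub-endpoints, and nothing documents the full set in advance.
api = {
    "/": ["/birds", "/mammals"],
    "/birds": ["/birds/ducks", "/birds/geese"],
    "/birds/ducks": [],
    "/birds/geese": [],
    "/mammals": [],
}

def discover(root="/"):
    """Enumerate every reachable endpoint so no table is silently missed."""
    seen, stack = set(), [root]
    while stack:
        path = stack.pop()
        if path in seen:
            continue
        seen.add(path)
        stack.extend(api.get(path, []))
    return sorted(seen)
```

A real crawler would make HTTP calls, respect rate limits, and persist what it finds, but the exhaustive-walk structure is the same: if you miss a branch, you miss the data.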
The core issue
The problem isn’t the connection itself; it’s everything that happens after the connection:
formatting, normalizing, discovering structures, resolving relationships, and stitching scattered pieces into a usable whole.
What’s Coming
This post sets up the problem. In the next one, I’ll break down the three categories of API challenges we encountered building Plotter:
- Technical complexity: Architecture, documentation, authentication, compute requirements
- Data reliability: Format consistency, discovery, cross-referencing, rate limits
- Compliance and operations: Security, data ownership, vendor lock-in
After that, I’ll show the framework we use to evaluate APIs upfront – best-case vs worst-case scenarios – and how we built tooling to handle the worst cases without multiplying engineering effort.
If you’ve done API integration work, you know exactly what I’m talking about. If you haven’t, the next post will show you why “just use their API” is never the answer.
→ Continue to the next post: API Integration Challenges: The Breakdown
