The 5 Major Reasons My Data Integration Projects Failed
9th Sep 2019
It's 3 AM. My alarm goes off and I groggily climb out of bed and crack open my laptop. One of our biggest customers needs their data delivered by 9 AM, and I'm getting up before sunrise to triple-check every data point before their delivery. Even with our robust data platform, built with hundreds of data audits, the complexity in this particular delivery for this customer is just too high to feel 100% confident that we've captured all potential issues. This scenario would soon become a typical morning for me. Wake up. Coffee. Pray to the data gods for an inbox without 500 Zendesk ticket escalations.
My name is Tim Liu. I recently joined Hull as head of data integration, but I've been working in the data management space my whole career. The story above happened years ago in a different industry, with a different team, and at a time when I knew very little about the hidden complexities of data integration. Since then, I've been beaten up a lot, but not without learning a lot of lessons about the nature of data integration.
We all subscribe to the mantras "Data is a company's most important asset" and of course, "You can't manage what you can't measure", but beyond the ideal state for many companies lies a vast wasteland of expensive, failed data projects. That's because data integration is the foundation of most data analytics and ops projects, but it's also undoubtedly the trickiest part to get right.
Data integration issues will kill your project before you start to see any value, but the good news is that it's not all doom and gloom. There is a path to success, but it's a path less traveled, and even less talked about.
I'm here to tell you about what happened when the data projects I worked on failed, why they failed, and what I learned so that you don't run into the same landmines with your own customer data integration project.
Reason #1: I didn't fully think through our identity management strategy
I've managed enough data integration projects to realize that identity management (and identity resolution) is at the center of many common data problems. At its core, it deals with the major entities in your system that you're trying to analyze and bend to your will. Let's take the customer data space for example. What defines a Person in your system? What defines a Company? Poor identity management is the cause of expensive deduplication cleanups and manual intervention. Knowing what identifies the entities in your system is of paramount importance, and I've come to the realization that this is one of the first things your data team will need to decide on upfront. Changing your identity strategy in the middle will inevitably lead to an explosion of duplicates, bad relationships, and a manual cleanup effort.
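To make the idea concrete, here is a minimal sketch of identity resolution, assuming a normalized email address is what identifies a Person. The record shape, the `Person` class, and the `identity_key` and `resolve` helpers are all hypothetical names for illustration, not how Hull or any particular platform does it.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """A resolved person; `email` is the identity key assumed for this sketch."""
    email: str
    attributes: dict = field(default_factory=dict)
    sources: set = field(default_factory=set)


def identity_key(raw_email: str) -> str:
    """Normalize an email so trivial variations resolve to the same person."""
    return raw_email.strip().lower()


def resolve(records: list) -> dict:
    """Group incoming records by identity key instead of creating duplicates."""
    people = {}
    for record in records:
        key = identity_key(record["email"])
        person = people.setdefault(key, Person(email=key))
        person.sources.add(record["source"])
        # Later records overwrite earlier values here; a real pipeline
        # would need a deliberate conflict policy instead.
        person.attributes.update(record.get("attributes", {}))
    return people


# Hypothetical records arriving from two different tools.
records = [
    {"source": "crm", "email": "Ada@Example.com", "attributes": {"name": "Ada"}},
    {"source": "chat", "email": " ada@example.com ", "attributes": {"plan": "pro"}},
    {"source": "crm", "email": "bob@example.com", "attributes": {"name": "Bob"}},
]
people = resolve(records)
```

The important part is that the rule lives in one function. If `identity_key` changes halfway through a project, every record resolved under the old rule has to be re-resolved, which is where the explosion of duplicates comes from.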
Reason #2: I didn't clearly define our leading system of record
If identity management is the first spinning plate, then the relationships between those identified entities are the second. For example, you've got the People in your marketing system, but you've also got the Companies to which they're related. Without an accurate association it will be very difficult to execute on any automated ABM workflows. Relationships between entities are sometimes even harder to maintain because they rely on a robust identity management strategy (and even then, it's difficult to compensate for all the edge cases). This is why it's important, especially for maintaining the correct relationships, that you have a leading system.
What is a leading system? It's a single system that's the arbiter for a particular attribute or relationship. Especially when it comes to a Person-to-Company relationship, you want to make sure that you're creating that relationship in one place. Otherwise, you're in for a world of data loops where individuals are hopping between companies that look similar: AmazonInc.fr vs Amazon.us. Ideally, a leading system should be easily accessible by your data admin, in case there's a scenario where you have to manually intervene to make the correct association.
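As a rough sketch of that rule, assume each attribute has at most one leading system and that updates from any other system are ignored for that attribute. The system names, the `LEADING_SYSTEM` mapping, and the `apply_update` helper are hypothetical, and letting any system write attributes that have no leader is a simplification made for this example.

```python
# Hypothetical mapping: which system is the arbiter for each attribute.
LEADING_SYSTEM = {
    "company_id": "crm",
    "email_opt_in": "marketing",
}


def apply_update(profile: dict, source: str, changes: dict) -> dict:
    """Return a new profile, accepting each change only from its leading system."""
    updated = dict(profile)
    for attribute, value in changes.items():
        leader = LEADING_SYSTEM.get(attribute)
        # Assumption for this sketch: attributes without a leader accept any source.
        if leader is None or leader == source:
            updated[attribute] = value
        # Otherwise the change is dropped, so two systems cannot keep
        # flipping the same relationship back and forth.
    return updated


start = {"email": "ada@example.com", "company_id": "amazon.us"}

# An enrichment tool guesses a look-alike company; it is not the leader, so this is ignored.
after_enrichment = apply_update(start, "enrichment", {"company_id": "amazoninc.fr"})

# The CRM is the leader for company_id, so its value is accepted.
after_crm = apply_update(after_enrichment, "crm", {"company_id": "amazon.us", "title": "CTO"})
```

Because the Person-to-Company link can only be written from one place, a data admin who needs to correct an association knows exactly which system to open.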
"Data integration issues will kill your project before you start to see any value, but the good news is that it's not all doom and gloom. There is a path to success, but it's a path less traveled, and even less talked about."
Tim Liu, Director of Integration at Hull
Reason #3: I underestimated "scope creep"
Okay, so Reason #3 isn't something I personally did, but something I heard enough from customers and prospects that I thought it worthwhile to mention.
In the pre-sales process, I had many conversations with prospects who ended up talking themselves into building the integrations themselves. I always had the same response: Godspeed and good luck! The number of services and the nuances in each application make this problem ridiculously hard to solve even for the experts. Even if you're able to secure engineering time to build the integrations for the handful of applications that you have, you can't forget about the time it takes to tweak and maintain the solution. Oh, and did I mention bugs? Yeah, it's not like those are going to happen, right? The truth is: Yes, there are simple scenarios where it may make sense to complete the project internally. But usually anything that's more complex takes a lot more work.
Reason #4: I didn't have a clear plan in place for legacy data
In some data integration projects, this may not be a problem at all, yet in others, it may be the only problem. A LOT of customers have this fear of losing their legacy data. "But the insights!" they'll say. First, you should check yourself for a minute and determine whether or not you're a data hoarder. Many times, the juice just isn't worth the squeeze. The likelihood of legacy data telling you something useful in the future may be slim. Now consider the time and cost of integrating the legacy data pipeline with the new one.
Many times, especially in projects where new data accumulates fairly quickly, it's easier just to develop a strategy going forward. With customers who insisted on integrating legacy data sets so they could have several months of history, I would usually tell them that the project to clean and integrate their data would be hard, but we could certainly do it in a few months' time. If you need 6 months of clean, pristine history, my general advice would be to make sure your existing data strategy is solid, and then collect 6 months of data from there instead of embarking on a costly data cleanup project.
But in the end, it all depends. At Hull, we have had customers who wanted to bring over legacy cookie data. We ended up keeping that intact for them so that they could differentiate new web visitors from returning visitors. My advice would be to look hard at your legacy data set, save what you absolutely need to, and then Marie Kondo the rest of it. If you must, you can always save a backup of your data somewhere inexpensive to satisfy the hoarder inside yourself!
"Beyond any particular list of potential issues, as long as you understand that data integration is a hard problem and have a data partner you can trust, you should be able to find the balance between the hype and the reality on the ground."
Tim Liu, Director of Integration at Hull
Reason #5: I took on too much too fast
Me on day 1 of my first data integration project: Let's do this thing! Alright, we're going to pull data from Intercom, then we'll cross reference it with the product data in our database, then marry it with our marketing campaigns, and maybe personalize the landing pages based on how far along the prospect is in their customer journey...
Me on day 37: Soooo...that was a little ambitious.
Since then, I've learned to start with some smaller, easy wins. My recommendation for companies integrating customer data into a customer data platform would be to start by identifying a clear use case that will bring your team value once implemented. For your first use case, keep things simple. The fewer the implementation points, the better.
If you don't know what that initial use case is, that's okay. It'll take some time to figure out what makes the most sense for your business. If you need some inspiration, you can check out CDP Use Cases for some use cases with leading SaaS applications.
Here's what we often recommend at Hull: start by bringing in the data. Hook up your different systems to a customer data platform like Hull, and bring the data into one place. From there, you can explore the different data points from your applications, and the "what ifs" that will inevitably come find you soon after.
Disclaimer: Your new database probably won't end up as your production system because it's now probably a big pile of disorganized data sets. But it's your starting sandbox for exploration and discovery.
Let my failures guide you to success
I could elaborate on each reason above in its own small novel. I probably haven't seen it all...but I've seen a lot. I've sweated and bled for projects that were doomed from the start, but I've also been surprised at the projects that overcame tremendous odds to bring real value to our customers. Beyond any particular list of potential issues, as long as you understand that data integration is a hard problem and have a data partner you can trust, you should be able to find the balance between the hype and the reality on the ground.

Tim Liu is the Head of Product at Hull. Outside of work, he loves spending time with his wife and three kids, trying new restaurants, and getting the best deal on live lobsters.