Customer Data Integration Best Practices At Any Scale
31st Jan 2020
The problem every sales, marketing & customer-facing team has - the data you need in one tool is usually in another.
Customer data integration is the process of syncing the data you need about any website visitor, lead, customer, account - any person or any company that has ever interacted with your brand or product - into these tools. It's about making all your tools, teams, and data work seamlessly together.
In the fifth chapter of The Complete Guide to Customer Data, you'll learn the strategies for integrating customer data, and choosing the right method for your business.
Objectives:
- Understand the integration strategies
- Evaluate integration strategies for your company stage
- Understand the data integration paradigm shift as you scale
- Learn the data integration infrastructure you need
The Complete Guide to Customer Data
Subscribe for the complete 7-part series of our guides & best practices we teach at Hull. Get the next guide to your inbox before anyone else:
- Create an ideal customer profile & data models
- Customer journey mapping, analytics & attribution models
- Creating a “single source of truth” for your customer data
- The MarTech Napkin: Choosing marketing tech
- Customer data integration best practices at any scale
- Orchestration is personalization: Creating 1:1 experiences at scale
- Structuring your internal teams and resources
Lesson #1: The overview of customer data integration methods
There are dozens of different methods to integrate customer data, but they fall broadly into three main categories.
- Manual
- Automated
- Engineered
Manual customer data integration involves all things copy-paste, manual data entry, importing & exporting CSVs, updating spreadsheets, and so on.
Automated customer data integration uses APIs and webhooks to sync data between tools. Integration libraries like Salesforce AppExchange, HubSpot Connect, and the Intercom App Store give a variety of "one-click" integrations. Workflow tools like Zapier come under this category too.
Engineered customer data integration is when your teams use APIs and webhooks to build and maintain their own integrations and data flows. This will include writing & maintaining scripts, building & maintaining data warehouses, setting up webhooks, fetching data from your backend database, and so on.
In general, the more complex your data integration needs, the fewer manual methods, and the more automated & engineered your data integration methods will need to be.
Customer data integration beyond tools
Many data integration methods and services focus on the ability to integrate data between tools - any data.
This is a problem, since it's not about integrating tools. It's about integrating data. You need to be able to control the exact data within each tool. This means matching identifiers between tools, and matching like-for-like attributes & events (including mapping event properties to attributes).
Lesson #2: Choosing the integration strategy for your company stage
There's a power dynamic here; teams own tools, tools own data. Your tools, teams, and data all need to tie together. Tooling should never be the limiting factor to the tactics you can employ. Assuming you can hire & train the right people, neither should your team.
As your teams grow and your tooling expands, your data (and integrating your data) often becomes the limiting factor. Effective data integration empowers the use of each tool and each team member's productivity. Poor data integration does the opposite.
Your customer data integration strategy is highly dependent on the stage of your company's maturity. Later stage companies have more teams, tools, and budget, all of which impact how you tie your data together.
In Part 4: Choosing Your Marketing Technology Stack, we shared the MarTech Napkin - our framework for how you should choose tools for sales, marketing, and all things customer-facing. At the bottom of this framework was the note on integration methods.
To recap, early stage (or pre-product/market fit) companies do not have complex data or multiple different tools. When maturing and moving off simple spreadsheets and using tools, it's important to establish a single source of truth - the key tools that will host the entire, up-to-date context on every visitor, lead, and customer.
The goal is to make sure your single source of truth is complete and up-to-date, whilst minimizing the "cost" of managing its data. This is commonly done by replacing common manual data integration with automated data flows - integrations and workflow tools.
If the number of tools is kept low, you can usually integrate every tool directly with each other and maintain a two-way sync. Sometimes you may find you need to "forward" data between tools via another. For instance, a lead captured via Typeform might pass through HubSpot to reach Salesforce.
The common mistake amongst small businesses is trying to do too much, which leads to too many tools (which are under-used), which complicates your data integration strategy. It becomes more challenging to aggregate and use your data between all your tools, which means you can't leverage your data.
Lesson #3: The paradigm shift of scaling customer data integration
When you scale up, there are four trends which fundamentally impact your customer data integration strategy.
- You have significantly more resources, which means more tools, teams, budget & data to manage
- Management of your team & management of your data split apart (data is usually owned by ops)
- Your customer journey (channels, lifecycle, "conversions") becomes more complex
- You have significantly more data and more types of data to integrate.
A simple early marketing team (of one) might only have a few points of conversion: a newsletter subscription, a demo request or trial signup, and a few actions via email or chat. Scale-ups and enterprises support dozens of separate conversions across dozens of separate channels managed by dozens of separate tools.
As the number of tools radically increases (to use your increased resources and take advantage of the new opportunities to grow), the previous models of data integration break down.
Native integrations (like the Salesforce AppExchange, HubSpot Connect, and Intercom App Store) are designed mostly as pairwise, two-way integrations. It is very simple and clean with a small number of tools to integrate everything with everything else.
At scale, this totally breaks down. Most scale-ups and enterprises have dozens of different (often overlapping) tools, each hosting customer data. Without that data shared and integrated across every tool, it is impossible to build a single customer view with the complete context of every person and company that has interacted with your brand ever.
If n is the number of tools in your system of tools, then the number of pairwise integrations needed to make sure all your data stays in sync is (n*(n-1))/2. You'll notice how this grows quadratically: doubling the number of tools roughly quadruples the number of integrations.
Number of tools | Number of pairwise integrations |
---|---|
3 | 3 |
5 | 10 |
10 | 45 |
20 | 190 |
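As a quick check on the table above, here is a minimal JavaScript sketch of the formula. The function name `pairwiseIntegrations` is only illustrative.

```javascript
// Number of pairwise integrations needed to connect n tools directly:
// every tool pairs with every other tool exactly once, so n * (n - 1) / 2.
function pairwiseIntegrations(n) {
  return (n * (n - 1)) / 2;
}

for (const n of [3, 5, 10, 20]) {
  console.log(`${n} tools -> ${pairwiseIntegrations(n)} pairwise integrations`);
}
```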
It's not uncommon to reach this high number of tools hosting customer data very quickly. Core tools for scale-ups include:
- A sales CRM, with person, company, and opportunity/deal objects (e.g. Salesforce)
- A marketing automation tool as the key tool for marketing (e.g. HubSpot, Marketo, Pardot)
- Data enrichment (e.g. Clearbit)
- A website-based live chat tool for sales (e.g. Drift)
- An in-app live chat tool for support & messaging (e.g. Intercom)
- A support ticketing system (e.g. Zendesk)
- A website analytics tracking tool
- A product analytics tracking tool
- Your backend database that powers your product
- Your ad audiences
That's ten categories of tools right away. That's before adding billing tools, landing pages, 3rd party review sites, forms, meetings, SMTP email providers, sales enablement tools, and so on.
Each of your channels (and jobs-to-be-done) needs a best-in-class tool. You could consolidate into fewer tools, which may be simpler to manage (in terms of teams & data), but an inferior toolset would compromise your ability to grow faster by maxing out each channel.
To integrate all these tools in the same pairwise fashion needs real-time two-way data flows, and universal support for every type of data you support. Unlimited attributes, every event (with event properties), segments, person & company-level profiles - these should all be able to sync across all your tools.
In reality, this is not the case. Not every tool can support every data type (like events) without some level of transposition (like writing the latest `Email Opened` as a `last_email_opened` date attribute). Not every API or integration can support every data type in both directions, even if the tool can. Pairwise integrations at scale are rarely "complete" - they can only sync part of your customer data.
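To make the transposition idea concrete, here is a rough sketch of collapsing an event stream into "latest occurrence" date attributes. The event shape (`name`, `occurredAt`) and both function names are assumptions for illustration, not any particular tool's API.

```javascript
// Hypothetical event shape: { name, occurredAt } with an ISO 8601 UTC timestamp.
// Turns an event name like "Email Opened" into "last_email_opened".
function attributeNameFor(eventName) {
  return "last_" + eventName.trim().toLowerCase().replace(/[^a-z0-9]+/g, "_");
}

// Collapse an event stream into "latest occurrence" date attributes,
// for tools that can store attributes but not events.
function eventsToAttributes(events) {
  const attributes = {};
  for (const { name, occurredAt } of events) {
    const key = attributeNameFor(name);
    // ISO 8601 UTC timestamps in the same format compare correctly as strings.
    if (!(key in attributes) || occurredAt > attributes[key]) {
      attributes[key] = occurredAt;
    }
  }
  return attributes;
}

const attrs = eventsToAttributes([
  { name: "Email Opened", occurredAt: "2020-01-10T09:00:00Z" },
  { name: "Email Opened", occurredAt: "2020-01-28T14:30:00Z" },
  { name: "Demo Requested", occurredAt: "2020-01-15T11:00:00Z" },
]);
console.log(attrs);
```

Note that the transposition is lossy: the destination only sees the most recent occurrence, not the full event history.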
Even if every tool doesn't integrate with every other, and you use some tools as intermediaries (like your CRM and marketing automation platforms which usually have stronger integration ecosystems), you still end up with a lot of complexity to manage.
Having hundreds of partial integrations, Zapier workflows, custom scripts (that may or may not be maintained), forwarding data through intermediary tools, and ongoing manual export/import/copy/paste jobs becomes incredibly complex, opaque, and unreliable. It creates a "fRaNkeNsTaCk" of tools & teams which don't work seamlessly together. It also makes customer data integration everyone's problem.
Centralizing your customer data for scaling
Instead of connecting everything to everything else (which requires (n*(n-1))/2 integrations), you need to centralize your customer data. You need a single source of truth.
Earlier in The Complete Guide to Customer Data, Part 3: Creating a "Single Source of Truth" for your Customer Data outlined the importance and make-up of your single source of truth. The process of creating your single source of truth involved:
- Tracking data from every source
- Resolving all that data correctly into person and company-level profiles
- Cleansing the data to make it immediately usable
- Defining fallback strategies to determine the "truest" data source
- Computing a "golden customer record" to sync to all your tools.
Centralizing your data also escapes the quadratic growth in the number of integrations. Instead, complexity grows linearly: each tool needs only one integration, with the central source of truth. 10 tools need only 10 integrations, instead of 45. 20 tools need only 20 integrations, not 190.
Centralizing data operations at scale
In the same way you need to centralize your data, you need to centralize the management of your customer data.
At scale, your customer data is completely cross-functional. Marketing needs product data needs sales data needs finance data - and so on.
Rather than letting it become everyone's problem, centralize the teams that manage your customer data, so your sales reps, marketing managers, support staff, and leadership don't have to wrangle data on top of their own job - the data they need is in the tools they want to use, and it's someone else's job to make sure that happens.
More on this in Part 7: Customer data management & operations
Lesson #4: The difference between transactional and analytics data processing
Often, at the same time as the data integration paradigm shift becomes an obvious pain point, other data integration projects become common - notably data warehousing.
Data warehouses (like Amazon Redshift) are a cheap repository of data from your entire company's operational systems. This doesn't just include customer data, but other data types too like financial or product data.
Data warehouses become the source for business intelligence and analytical tools (like Tableau and Looker) which help to bring visibility over your data across your entire organization.
To fill data warehouses, you need to connect these with your operational tools and systems. These tools are usually classified as "ETL" tools - extract, transform & load.
First, it is important to understand that this is a one-way data flow. Data warehouses are a dump of data. This services the analytical layer - often referred to as OLAP (Online Analytical Processing).
Second, data warehousing data flows are often very slow. Rarely does analysis occur in real-time (reporting does not need to be produced, consumed & acted upon millisecond-by-millisecond). ETL processes are often nightly jobs that run in the background.
Data warehousing is not customer data integration since it is only one-way, and not in real-time.
Remember, the jobs-to-be-done all center around making the data you need from one tool available in another. Your sales & support reps, marketing campaigns & workflows, and other systems of engagement need this data to be available in real-time so they can react to customers (and potential customers) activity.
Data warehouses do not make data readily accessible to any of these tools or teams. They need the data in their tool of choice; data warehousing is simply dumping the data elsewhere. It is akin to asking a guest at a restaurant to leave and go to a warehouse of raw, unprepared food instead of preparing, cooking, and serving food in front of them.
Whilst the insights drawn by business intelligence from data warehousing can be valuable, it requires manual work to run the analysis, report it to teams, and those teams to adjust the customer experience. None of this is real-time.
But customer experiences are very time sensitive:
- Lead qualification
- Drip email campaigns
- Live chat conversations & chat bots
- Website personalization
This means you need a different system for customer data integration between all your tools, teams, and data than you do for your data warehousing. Customer data integration is real-time and transactional - often called OLTP (Online Transactional Processing).
Just as you might choose the best-in-class tools for each channel (vs. all-in-one tools), teams choose tooling optimized for each role. An OLAP toolset for data warehousing and business intelligence, and an OLTP toolset for real-time customer data integration.
Hull enables real-time customer data integration.
Unify data about every lead, customer & company from all your tools, tracking & databases into Hull Profiles. Map, segment & stream every update about people, companies & segments two-way between all your team’s tools.
Lesson #5: Real-time customer data integration infrastructure at scale
Centralized, real-time, two-way customer data integration becomes a data engineering problem. Ten jobs-to-be-done make this work, and you can use them as a checklist to compare tools.
1. Ingest data from all your sources of customer data
Your customer data comes from many different sources, and you must be able to track it all:
- All your tools. Not just sales, marketing & customer support tools, but also product, finance, operations - everywhere
- All your tracking. Across your website, your product, and 3rd party platforms (like review sites & social)
- All your databases. Including your product's backend, and even your data warehouse
For most of this data, like your marketing tools and product tracking, this will be streaming data. Real-time, constantly updating data.
For other data, and for setting up data flows, there will be large and infrequent (or one-off) data imports.
You need an OLTP process that can ingest all of this data from every source.
2. Detect any data type & format, and organize them by source
The data structure between tools often varies, with various levels of "strictness" to the data quality. For example, Salesforce will drop data with incomplete or incorrectly formatted fields.
Whatever the data source, data types like strings, arrays, numerics, booleans, and dates should all be detected automatically. Your data should also be organized by data source to make sense of it later. For instance, `hubspot.contact.job_title`.
You need an OLTP process that can detect & format any type of data, and organize it.
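As a rough illustration of both ideas, the sketch below guesses a type for each incoming value and prefixes each field with its source. The function names, the type labels, and the date heuristic are assumptions made for this example.

```javascript
// Guess a value's type. The categories mirror the ones named in the text.
function detectType(value) {
  if (Array.isArray(value)) return "array";
  if (typeof value === "boolean") return "boolean";
  if (typeof value === "number") return "numeric";
  if (typeof value === "string") {
    // Treat only ISO-style dates (YYYY-MM-DD...) as dates; a simplification.
    if (/^\d{4}-\d{2}-\d{2}/.test(value) && !Number.isNaN(Date.parse(value))) {
      return "date";
    }
    return "string";
  }
  return "unknown";
}

// Prefix each field with its source and object, e.g. hubspot.contact.job_title.
function namespaceRecord(source, objectType, record) {
  const out = {};
  for (const [field, value] of Object.entries(record)) {
    out[`${source}.${objectType}.${field}`] = { value, type: detectType(value) };
  }
  return out;
}

const organized = namespaceRecord("hubspot", "contact", {
  job_title: "Head of Growth",
  created_at: "2020-01-31",
  is_customer: false,
  employees: 42,
  tags: ["saas", "b2b"],
});
console.log(organized);
```

Keeping the source in the attribute name means two tools can disagree about a job title without overwriting each other, which leaves room for a fallback strategy later.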
3. Identity resolution to match data around people & companies
Customer data will be tied to identifiers like `id`s, `email`s, `ip`s, and `domain`s in all your different tools.
As discussed in Part 3: Creating a "Single Source of Truth" for your Customer Data, you need to be able to match up the identifiers from all your different tools, prioritized by "stability" (to minimize false merging of profiles), and aggregate all that data under one common profile for each true person.
For B2B companies, you need to have the same process for aggregating data around companies. This needs to account for the complexity of how company groups can be organized (like holding companies, subsidiaries, or acquisitions), as well as accurately associating people with the company - including anonymous website visitors.
This logic should be set and defined, but also be flexible & adjustable to match up your particular business logic.
You need an OLTP process with a reliable identity resolution system.
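The matching step can be sketched as follows. This is a simplified model under stated assumptions: the identifier names, their priority order, and the `IdentityResolver` class are hypothetical, and merging two profiles that already exist separately is deliberately left out.

```javascript
// Assumed stability order, most stable first. Adjust to your business logic.
const IDENTIFIER_PRIORITY = ["external_id", "email", "anonymous_id"];

class IdentityResolver {
  constructor() {
    this.profiles = []; // each profile: { identifiers: {}, sources: [] }
    this.index = new Map(); // "type:value" -> profile
  }

  // Find the profile for a record, trying the most stable identifier first.
  resolve(record) {
    let profile;
    for (const type of IDENTIFIER_PRIORITY) {
      const value = record.identifiers[type];
      if (value !== undefined && this.index.has(`${type}:${value}`)) {
        profile = this.index.get(`${type}:${value}`);
        break;
      }
    }
    if (!profile) {
      profile = { identifiers: {}, sources: [] };
      this.profiles.push(profile);
    }
    // Attach any identifiers the profile does not have yet, and index them.
    for (const [type, value] of Object.entries(record.identifiers)) {
      if (!(type in profile.identifiers)) {
        profile.identifiers[type] = value;
        this.index.set(`${type}:${value}`, profile);
      }
    }
    profile.sources.push(record.source);
    return profile;
  }
}

const resolver = new IdentityResolver();
resolver.resolve({ source: "website", identifiers: { anonymous_id: "a1" } });
resolver.resolve({
  source: "hubspot",
  identifiers: { anonymous_id: "a1", email: "ada@example.com" },
});
resolver.resolve({
  source: "salesforce",
  identifiers: { email: "ada@example.com", external_id: "u_42" },
});
console.log(resolver.profiles.length, resolver.profiles[0].identifiers);
```

In this run an anonymous visitor, a marketing contact, and a CRM record end up on one profile, because each new record shares at least one identifier with what is already known.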
4. Display a single customer view
With all the data ingested, formatted, and matched with the right identities, you need to be able to "view" this data in a profile (like you can in a CRM).
Profiles should be able to display a complete view of everything known about a person, everything they've ever done (as a timeline of events), and every action taken onto their data (like attribute or segment changes).
These profiles should be updated automatically with any new data ingested into the system.
You need an OLTP process that can produce a real-time unified customer profile.
5. Change detection
As new data is ingested, your profiles will need to update too. This might be new data (a new profile, or a new attribute or event on a profile) or an update to existing data (like updating an existing attribute).
To do this, you need to be able to compare the latest data to be ingested with the existing record.
Change detection is key to preventing "silent" updates being synced to your other tools, which can cause severe backlogs and data loss due to 3rd party API rate limits.
// The Pope is still Catholic
hull.traits({ religion: "Catholic" })
Since your customer experience is real-time, change detection will need to be calculated quickly too - for instance, as someone views different pages on your website, or has a live chat conversation.
You need an OLTP process that can detect the changes between new and existing data.
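A minimal sketch of that comparison, assuming profiles are plain objects of attributes. The `detectChanges` function is illustrative and not part of any specific library.

```javascript
// Return only the attributes whose values differ from the stored profile.
// JSON.stringify is a simple stand-in for deep equality; it is sensitive to
// key order in nested objects, which is acceptable for a sketch.
function detectChanges(existing, incoming) {
  const changes = {};
  for (const [key, value] of Object.entries(incoming)) {
    if (JSON.stringify(existing[key]) !== JSON.stringify(value)) {
      changes[key] = { from: existing[key], to: value };
    }
  }
  return changes;
}

const stored = { religion: "Catholic", city: "Rome" };

// Same value as before: nothing should be synced out.
const noChange = detectChanges(stored, { religion: "Catholic" });
// A genuinely new value: this one should be synced.
const realChange = detectChanges(stored, { city: "Vatican City" });

console.log(Object.keys(noChange).length === 0 ? "nothing to sync" : noChange);
console.log(realChange);
```

Only a non-empty result would trigger an outgoing update, so repeated writes of an unchanged value cost no API calls downstream.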
6. Transform & re-compute data
Usually, data ingested from one tool needs to be transformed in some way before being used in another. Common examples of this include:
- Enrichment: appending data to existing profiles
- Cleansing: normalizing & reformatting data
- Segmentation: grouping people or companies by a query
- Transformation: changing the format of data
You need the flexibility to change data in any way. For some tasks like segmentation, a point-and-click editor is an intuitive way to query and change data.
For other methods, like transforming data according to complex logic (like writing `lead_source` attributes for a multi-attribution model), a visual point-and-click editor can be very limiting compared to writing code.
You need an OLTP process that gives you complete flexibility over what and how you transform your customer data.
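As one example of a transformation that is easier in code, the sketch below derives first-touch and last-touch lead source attributes from a list of touchpoints. The touchpoint shape and the attribute names are assumptions, and first/last touch is only one simple way to approach attribution.

```javascript
// Hypothetical touchpoint shape: { source, occurredAt } with ISO 8601 UTC timestamps.
// Writes first-touch and last-touch lead source attributes.
function computeLeadSources(touchpoints) {
  if (touchpoints.length === 0) return {};
  // Copy before sorting so the caller's array is left untouched.
  const ordered = [...touchpoints].sort((a, b) =>
    a.occurredAt < b.occurredAt ? -1 : a.occurredAt > b.occurredAt ? 1 : 0
  );
  return {
    lead_source_first_touch: ordered[0].source,
    lead_source_last_touch: ordered[ordered.length - 1].source,
    lead_source_touch_count: ordered.length,
  };
}

const leadSources = computeLeadSources([
  { source: "webinar", occurredAt: "2020-01-20T10:00:00Z" },
  { source: "organic_search", occurredAt: "2020-01-05T08:30:00Z" },
  { source: "paid_social", occurredAt: "2020-01-12T16:45:00Z" },
]);
console.log(leadSources);
```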
7. Map fields & filter data flows to tools
With your data ingested, matched with profiles, and transformed, you need to be able to choose the exact data to sync across all your tools.
This is important since not every tool can process every type of data. Most tools also limit how much of each data type they can hold (like the number of attributes per profile) or charge extra (such as HubSpot's per-contact pricing, or Salesforce's data storage overage charges).
You should be able to whitelist every data object you want to sync, and to what data object, field, or "thing" within your end tools. Common data objects include:
- Person (Contact, User, Lead, Visitor)
- Company (Account)
- Segment (List, View, Audience)
- Attribute (Property, Trait)
- Event
Even if your end tool doesn't natively support one of these objects (like company profiles), you should have a method to still write this data in (like "flattening" company profile data and appending to every associated person's profile).
You need an OLTP process that lets you precisely map & filter the exact data being synced out to each tool.
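A small sketch of mapping and filtering, including the company "flattening" mentioned above. The mapping object, the destination field names, and `buildPayload` are all hypothetical; a real destination would have its own field names.

```javascript
// Hypothetical mapping for one destination: central attribute -> destination field.
// Anything not listed here is filtered out (the whitelist).
const crmMapping = {
  email: "Email",
  "hubspot.contact.job_title": "Title",
  "company.name": "CompanyName",
  "company.employees": "CompanyEmployees",
};

// Flatten company data onto the person so tools without a company object
// can still receive it, then keep and rename only whitelisted fields.
function buildPayload(person, company, mapping) {
  const flat = { ...person };
  for (const [key, value] of Object.entries(company || {})) {
    flat[`company.${key}`] = value;
  }
  const payload = {};
  for (const [from, to] of Object.entries(mapping)) {
    if (flat[from] !== undefined) payload[to] = flat[from];
  }
  return payload;
}

const payload = buildPayload(
  {
    email: "ada@example.com",
    "hubspot.contact.job_title": "CTO",
    internal_score: 87, // not whitelisted, so it is never sent
  },
  { name: "Example Ltd", employees: 120, domain: "example.com" },
  crmMapping
);
console.log(payload);
```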
8. Build new notifications to 3rd party tools
Your end tools only need to be updated when a change of a "whitelisted" data object is detected. This prevents overwhelming end services and burning through API limits.
Burning out API limits is a common cause of data loss, particularly if multiple tools are integrated with an external service. This is another advantage of a centralized customer data integration strategy as all updates (and therefore all API calls) are funneled and moderated through one system.
For some services like Salesforce, excess API calls result in overage charges. Salesforce is limited to 15,000 API calls per day by default, which can easily be consumed by streaming data from multiple tools.
Most tools' APIs also support simultaneous 'bulk' updates to minimize the number of API calls. For instance, one call that includes updates for many people and attributes. Your tooling should micro-batch updates to take advantage of these API methods. This is particularly important when setting up new tools in your system, bulk importing data, or changing your setup - anything that results in a lot of updates across a lot of tools.
You need an OLTP process that can build & batch notifications of "true" changes to sync out to all your tools.
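Micro-batching itself is simple to sketch. The batch size of 200 below is an arbitrary assumption; the right size depends on each destination's bulk API.

```javascript
// Split a list of pending updates into batches of at most `size` items,
// so each batch can go out as one bulk API call.
function microBatch(updates, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < updates.length; i += size) {
    batches.push(updates.slice(i, i + size));
  }
  return batches;
}

const pendingUpdates = Array.from({ length: 450 }, (_, i) => ({ id: i, plan: "pro" }));
const batches = microBatch(pendingUpdates, 200);
console.log(`${pendingUpdates.length} updates -> ${batches.length} bulk calls`);
```

Here 450 individual updates become three calls instead of 450, which is where the savings against a daily API limit come from.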
9. Detect dropped data and automatically retry
There are a number of reasons why an API may go down or experience latency. Even the best APIs don't maintain 100% uptime.
When managing data integration across multiple tools, your systems need the resilience to maintain data flows if any part of the system fails. This needs to account for the streaming, real-time data flows instead of just nightly ETL jobs - there is far less tolerance.
To do this, when data is synced out, the response needs to be detected. If there is a slow or negative response, then the outgoing update needs to be held back and automatically retried later. This means temporarily storing the outgoing update in a database, comparing it with any newer updates, and then sending it again.
You need an OLTP process that can detect, store, and retry every outgoing update.
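The store-compare-retry loop can be modelled with a small in-memory outbox. This is a sketch under assumptions: the `Outbox` class and the `send` callback are hypothetical, the storage would be a durable database in practice, and a real system would wait (ideally with increasing delays) between flushes.

```javascript
// A minimal outbox: failed updates are stored and retried on the next flush.
// If a newer update for the same profile arrives in the meantime, it is merged
// over the stored one so stale values are never re-sent.
class Outbox {
  constructor(maxAttempts = 5) {
    this.maxAttempts = maxAttempts;
    this.pending = new Map(); // profileId -> { attributes, attempts }
    this.dead = []; // updates that exhausted their retries, kept for inspection
  }

  enqueue(profileId, attributes) {
    const existing = this.pending.get(profileId);
    this.pending.set(profileId, {
      attributes: { ...(existing ? existing.attributes : {}), ...attributes },
      attempts: existing ? existing.attempts : 0,
    });
  }

  // `send` is a caller-supplied function returning true on success.
  flush(send) {
    for (const [profileId, update] of [...this.pending]) {
      let ok = false;
      try {
        ok = send(profileId, update.attributes) === true;
      } catch (err) {
        ok = false; // treat thrown errors like a negative response
      }
      if (ok) {
        this.pending.delete(profileId);
      } else if (++update.attempts >= this.maxAttempts) {
        this.pending.delete(profileId);
        this.dead.push({ profileId, ...update });
      }
    }
  }
}

// Simulate a destination API that is down for the first flush.
let apiUp = false;
const sent = [];
const send = (id, attrs) => {
  if (!apiUp) return false;
  sent.push({ id, attrs });
  return true;
};

const outbox = new Outbox();
outbox.enqueue("u_42", { plan: "trial" });
outbox.flush(send); // fails, update stays stored
outbox.enqueue("u_42", { plan: "pro" }); // newer value replaces the stale one
apiUp = true;
outbox.flush(send); // succeeds, sends only the latest value
console.log(sent);
```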
10. Log all data flows and make it queryable
Just as you need web and product analytics to understand your customers' experience, you need logs to understand how your data flows.
All ingested, re-computed, and synced data should have fully queryable logs by every data object type (person, company, attribute, segment, event), and the action that happened (like `Successfully Imported`).
This level of transparency into data flows is needed to identify the causes of problems, which can become complex and slow in a large system of tools. Without logging and querying tools, it becomes impossible to diagnose and fix issues, which blocks teams from trusting and using your customer data.
You need an OLTP process that can log all incoming, outgoing, and changes made to your customer data.
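A toy version of queryable logs, to show the shape of the idea. The entry fields, the `Sync Failed` action name, and the in-memory array are assumptions; a production system would write to a searchable log store.

```javascript
// An in-memory log; a real system would use a searchable store instead.
const logEntries = [];

function logAction(objectType, objectId, action, details = {}) {
  logEntries.push({
    at: new Date().toISOString(),
    objectType,
    objectId,
    action,
    details,
  });
}

// Return entries matching every key/value pair in `filter`.
function queryLogs(filter) {
  return logEntries.filter((entry) =>
    Object.entries(filter).every(([key, value]) => entry[key] === value)
  );
}

logAction("person", "u_42", "Successfully Imported", { source: "hubspot" });
logAction("person", "u_42", "Sync Failed", { destination: "salesforce", status: 429 });
logAction("company", "c_7", "Successfully Imported", { source: "clearbit" });

// "Which person updates failed to sync?"
console.log(queryLogs({ objectType: "person", action: "Sync Failed" }));
```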
Lesson #6: Customer data platforms are built for real-time customer data integration at scale
The ten jobs-to-be-done outline the data infrastructure used inside tech companies to make data integration work at scale. They also outline exactly how customer data platforms work behind the scenes.
Customer data platforms are tools for creating this centralized OLTP data infrastructure to integrate all your tools, teams, and data.
Unlike OLAP data warehouses, they build a single customer view, they transform and sync data two-way between all your tools, and they work in real-time.
You can see the detail of how a customer data platform works in Hull's data lifecycle documentation.
Customer data platforms enable companies to grow faster by fully enabling every tool and team to leverage all your customer data.
Fake customer data platforms
As CDPs become more prevalent, there are more tools positioning themselves as customer data platforms. Often these take the form of all-in-one tools, analytics tracking tools, or data warehousing tooling.
In the same way many tools can claim to be a "CRM" or an "email" tool, the difference between solutions comes from their true capability.
The ten jobs-to-be-done outline exactly what you need to look for in your real-time customer data integration infrastructure. This is the minimum viable requirement of what you either need to build and maintain within your team, or you need to list as a requirement from a customer data platform you're looking to buy.
Hull scores 10/10 for the real-time customer data integration scorecard
Hull was built for solving hard customer data integration problems at scale. Trusted by Drift, Appcues, Mention & more - learn how Hull's customer data platform works.
Anticipating your customer data integration strategy
Your company will be either deep amidst the customer data integration problem (an enterprise), or growing rapidly and beginning to experience the complexity of customer data (a scale-up).
In SaaS, this often occurs approaching and after a Series B round where funding is secured to significantly accelerate your already established go-to-market strategy.
The best teams anticipate the problem of data integration, like they do with managing their teams (hiring senior leadership) and tools ("hiring" best-in-class tools for each channel).
If you can recognize the symptoms of the problem early, you can establish a single source of truth and data integration strategy early on, and continue to accelerate growth.
Next up: Orchestration & personalization: Creating 1:1 experiences at scale
So far in The Complete Guide to Customer Data, you've defined the data you need from your ideal customer profile and customer journey map, then you have chosen your tools & how you're going to integrate them to deliver the customer experience you've mapped out.
The next guide is all about how to apply your data with your tools and teams to "orchestrate" a perfectly personalized 1:1 customer experience with our six-part framework.
Prev 'Ed of Growth at Hull, working on all things content, acquisition & conversion. Conference speaker, flight hacker, prev. employee #1 at inbound.org (acq. HubSpot). Now at Behind The Growth