How To Create A Single Source Of Truth For Your Customer Data

Your sales, marketing, customer success, and entire customer-facing operation is only as good as your data.

Creating a single source of truth for your customer data is the first step to good customer data management. It's the key to making your tools and teams work together because it offers a single data set that everyone in the company can work from. This includes non-customer-facing teams who also manage customer data, like product, finance, and operations.

A single source of truth also enables better customer experiences. By having the full context on every person and company in one place, you can be more personal and relevant without continuing to “ask again”. A single source of truth enables personalization at scale (discussed in Part 6 of The Complete Guide to Customer Data).

With a more contextual, more relevant experience, your entire customer-facing operation can become more efficient. Sales, marketing, customer success - everyone - can reach the right person with the right content at the right time. Complete context empowers sales reps to reach out and be relevant when they might otherwise not, marketing to precisely personalize content, and so on.

In the third chapter of The Complete Guide to Customer Data, we’ll show you our framework for unifying, cleansing & distilling data to create your single customer view.

Objectives:

  • Map the identity graph of your data sources
  • Choose a tool to build your single customer view
  • Cleanse your customer data
  • Define fallback strategies
  • Create your “golden customer record”

Lesson #1: Why you have no “single source of truth”

In reality, your customer data is made up of lots of data sources. There are many “sources of truth”.

First, data sources are often siloed in different tools owned by different teams. Like a jigsaw with missing pieces, siloed data can't give you a single source of truth. Sales tools don’t easily integrate web or product analytics data. Marketing tools don’t easily integrate subscription data. Yet you need a complete, unified profile of every person and company that has ever interacted with your brand.

Second, data is often messy and “unusable” out of the box:

  • User-entered data through forms can have errors
  • Form data often has open text fields (which need a human to sift through)
  • Data enrichment providers can sometimes send inconsistent data
  • Sales reps entering and editing data in a CRM can sometimes be inconsistent too 🙂

The challenge isn’t just unifying and cleansing all your customer data. It's making sense of many disparate data sources. Data sources often compete to provide identical or synonymous data about a person or company. Some of them are “truer” (what is truth?) than others, in being more precise, more up-to-date, or more consistent.

Finally, you need the logic for this to all run in as close to real-time as possible. Since data updates all the time, so does the context that every tool and team needs. Marketing tools and sales reps also need to take action “in the moment”, and so they need a persistent “source of truth” profile to pull from — unifying, cleansing and organizing data then syncing it straight into the end tools where they are needed.

The goal is for every tool to be “true”, so your teams can trust and depend on your data instead of guessing around it.

Lesson #2: Map the identity graph of your data sources

In the previous two parts of *The Complete Guide to Customer Data*, you’ve outlined your customer data model to identify your ideal customers and you’ve mapped your entire customer journey to build a tracking plan and attribution model. This included outlining the data sources you need to store the attributes and track every event.

Merging on name is highly unreliable (how many Joe Bloggs can there be?) so most tools use other methods.

Each of these data sources uses an identifier. These often take the form of an ID (like a database ID), plus an email for a person and a domain for a company. The ID provides a constant, never-changing identifier, since identifiers like email addresses and domains can change or may not truly reflect one person or company.

  • A lead might request a demo with a professional email (e.g. @hull.io) but be subscribed to your newsletter via a freemail address (e.g. @gmail.com)
  • Most website visitors do not pass an email address to you, so you only have some web session ID to identify them
  • Companies' corporate and “marketing” website domains can be different
  • Companies may be part of a larger group, so the domain name doesn’t reflect the company

Without a stable identifier, you cannot match your data together. This causes duplicate contacts and accounts in one tool, which can result in duplicate profiles or dropped data across all your tools.

For each of the data sources you use to track your customer journey, enrich your customer data model to identify ideal customers, and manage your key contact data, you need to map and outline its identifiers. You will most often find these in the API documentation for contacts and accounts.

Data Source                   | Person-level Identifiers
HubSpot                       | vid, email
Salesforce                    | Id, email
Clearbit                      | email
Your own backend SQL database |

You’ll notice most services have a form of id as a primary identifier. This is the strongest identifier and should be prioritized.

In the same way as you need a stable identifier within each of your tools, you need a stable identifier to associate all known identifiers together. This may be one of your tools' existing identifiers, or an identifier you create in your product database, customer data platform, or data warehouse.
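
To make the idea of a stable master identifier concrete, here is a minimal sketch of an in-memory identity graph. It is an illustration only, not how Hull or any other tool implements identity resolution. The class name IdentityGraph, the master_N ID format, and the key names (hubspot_vid, salesforce_id, email) are all assumptions made for this example.

```javascript
// Hypothetical in-memory identity graph: every (source, identifier) pair
// points at one stable master ID that never changes.
class IdentityGraph {
  constructor() {
    this.index = new Map(); // "source:value" -> master ID
    this.nextId = 1;
  }

  static key(source, value) {
    const text = String(value).trim();
    // Only emails are lowercased here; IDs from other tools may be case-sensitive.
    return `${source}:${source === "email" ? text.toLowerCase() : text}`;
  }

  // Resolve identifiers such as { hubspot_vid: 101, email: "ed@hull.io" }
  // to a master ID, creating a new one if none of them has been seen before.
  resolve(identifiers) {
    const keys = Object.entries(identifiers)
      .filter(([, value]) => value !== null && value !== undefined && value !== "")
      .map(([source, value]) => IdentityGraph.key(source, value));
    if (keys.length === 0) {
      throw new Error("At least one identifier is required");
    }

    const known = keys.find((k) => this.index.has(k));
    const masterId = known ? this.index.get(known) : `master_${this.nextId++}`;

    // Attach any identifiers not seen before to the same master ID.
    keys.forEach((k) => {
      if (!this.index.has(k)) this.index.set(k, masterId);
    });
    return masterId;
  }
}

// Usage: two records from different tools share an email, so they resolve
// to the same master ID.
const graph = new IdentityGraph();
const fromHubSpot = graph.resolve({ hubspot_vid: 101, email: "Ed@Hull.io" });
const fromSalesforce = graph.resolve({ salesforce_id: "003XX0000012345", email: "ed@hull.io" });
console.log(fromHubSpot === fromSalesforce);
```

This sketch deliberately leaves out the hard case where two identifiers already point at different master IDs. A real tool needs a rule for merging those profiles.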

Lesson #3: Choose the tool to build your single customer view (and its owner)

With all your identifiers mapped together, you can have a method to join data together from all your different sources. This enables you to create a single customer view.

Your single customer view should show everything you know about a person and everything they’ve ever done, according to all your tools, tracking, and databases - ever.

You need to choose a tool to host your master identifier, unify all your customer data together, and become the “single source of truth” for all your other tools. This is all one and the same. There are four common places this happens:

  • CRM systems
  • Marketing automation
  • Customer data platform
  • Data warehouse

Single customer view in your CRM system

CRMs like Salesforce can be used to provide a single customer view. They’re a popular choice because they tend to have strong integration ecosystems to feed the data in (because sales need it) and support reporting capabilities too.

However, most CRM technology dates from the last decade and wasn’t built to ingest the huge volumes of data that modern teams use today -- data such as event data from your backend database, web session data from your own website and beyond, and large sets of enriched and computed attributes.

CRMs often have limits on the format or amount of data you can store (without paying significantly more) and on the number of API calls you can send (making it difficult to stream event data), and it can be harder to run data engineering tasks on them. This limits your ability to build a single customer view and run full funnel reporting from your CRM.

Single customer view in your marketing automation platform

Marketing automation platforms like HubSpot, Marketo, and Pardot can also be used to provide a single customer view. They’re built for a higher volume of contacts and people, including website data. They also have strong integration ecosystems like other tools.

However, the data models in marketing automation platforms can be limiting. Anonymous website traffic is one of the largest groups of people and companies that have interacted with your brand, but most marketing automation platforms only manage simple website tracking, and you can only associate “known” people and companies if they convert in that session. Product usage data and custom data objects (like those in Salesforce) are also challenging to create and manage in their data models. This limits your ability to build a single customer view.

Marketing automation platforms are often best for using data to engage leads vs. managing lead and customer data themselves. This means they tend to have easy to use tools, to build segments, create automated workflows, and so on. This simplicity makes them more accessible to more people to manage their data.

Single customer view in your customer data platform

Customer data platforms like Hull are designed to unify and sync your customer data, not to serve as a working tool for sales reps or for creating marketing campaigns. They automatically map all your identifiers, making it possible to aggregate data from all your data sources (including data that’s been historically difficult for marketers to access, like all anonymous website traffic and product usage data from your backend) into one single customer view.

Customer data platforms host a master identifier to associate all your data and outline your identity resolution strategy to create a unified profile. This enables you to manage your data layer in one place instead of the advanced settings of your CRM, marketing automation, and everywhere else in your martech stack.

However, customer data platforms add an additional tool to your stack. If your team, toolset, and customer data is small and simple, you may find it simpler and more straightforward to use one of your other key tools (like a CRM or marketing automation platform) instead of adding another moving part.

Tools need owners too. Customer data platforms work “behind the scenes”. In smaller companies, the sales or marketing manager who line manages the teams may also own the tools and data. In scaling companies, this can be divided - management of people vs. management of data - and a customer data platform might become the preferred tool for the data operations layer as your company matures.

Single customer view in your data warehouse

Data warehouses like Redshift are common amongst companies with large amounts of data who are looking to aggregate this data in one place to run reporting and business intelligence tools from.

Usually, this is a task owned by engineering teams. ETL tools are used to extract, transform, and load data from your tools, tracking, and other databases into your data warehouse. Once in your data warehouse, it can be organized to pull together all the data around a person.

However, a data warehouse is just another database - there’s no “view” of the customer. You need other tools to produce this view from the data, such as a business intelligence tool. This is why business intelligence tools like Looker and Tableau often come hand-in-hand with warehouses like Redshift. However, they still need to be bought, set up, and configured.

The problem with tools owned by engineering is that they aren’t easily accessible to sales, marketing, and other go-to-market functions. Even with a business intelligence tool, the data isn’t as easily accessible or usable as it is with a CRM, marketing automation platform, or customer data platform. When data is difficult to use, your teams don't build the habit of being data-driven.

Data warehousing is designed to be a cheap way to store data (and is certainly considerably cheaper than storing data in your “front line” tools like CRMs). However, the infrastructure (like ETL and business intelligence tools) to support your data warehouse will often run quickly into six figures. Though this may come from the engineering budget instead of sales or marketing, it can crowd out investment in alternative data integration methods, particularly with expensive CRM and marketing automation tools in the mix too.

Lastly, data warehousing is a one-way data flow — a data dump. The goal of your single customer view is to be the single source of truth for all your sales and marketing tools. Extracting and writing data from your data warehouse into your other tools requires even more tooling and technical expertise. These data flows are rarely real-time, so your data can often be a day or more behind.

The discussion of which tools you need for which job, along with best practices, is coming up in Part 4: Choosing Your Marketing Technology Stack in 2018.

Make Hull your single source of truth

Hull is a customer data platform. Automatically fetch data, build your identity graph, and create a single customer view for each person & company. Use Hull Processor to cleanse your data, apply fallback strategies, and reformat data to sync to all your tools in real-time.

Explore Hull's customer data platform

Lesson #4: Cleanse your data to be useful in your end tools

Data, even if unified in one tool, often needs to be cleansed before it is useful. Think of it like a water treatment plant for your data: you need a mechanism to review, cleanse, and test the “cleanliness” of data before it can be used in all your tools.

Without cleansing data, you can quickly end up filling up all your tools with junk, nonsense data. Unusable data erodes your team’s trust and results in poor customer experience. Cleansing data means you can maximize the value of the data you’ve captured.

Three types of data to cleanse:

  1. Identifiers
  2. Attributes
  3. Events

Cleansing identifiers

You can often receive junk email addresses and other identifiers during lead gen activities. You need to make sure you unify all known data around the same real-life person and real-life company.

Using APIs like Neverbounce, you can see if an email address is still valid and not a junk or defunct email address. This ensures you don’t create “ghost” contacts across all your tools.

Even if the email (or another identifier) is valid, there are a number of challenges to this on the person level:

  • Different name and data variations (like location and job title) around the same email address
  • Multiple email address variations for the same person (such as Gmail-style plus-addressed aliases like ed+spam@hull.io)
  • Multiple separate email addresses for the same person (such as ed@gmail.com and ed@hull.io)
  • Upper and lower case email addresses handled separately
  • Incorrect aliasing and merging by other tools
  • Lack of hidden fields and methods for passing identifiers through

You need methods to reliably link and “join” data together around the same person. This means you need to set the most reliable identifier and be able to use any other identifiers to merge together.

For example, in Hull external_id is the primary, unique, and the most stable way to reference a person (“User”). Multiple duplicate email and anonymous_id's (identifiers from third party tools) will then be merged around the external_id. This provides a safe, reliable, and automated method for unifying data between profiles and triggers a User merged event.

(See how Hull’s identity resolution strategy works.)

You can source other temporary identifiers (which may change or be deleted in the long term) to try to join data together, such as website session IDs from your web analytics tools or social handles (like a LinkedIn or Twitter handle). Methods like this should be used (and tested) with caution, and only if more stable, permanent identifiers do not already exist.

Your tool to maintain your single customer view must be able to capture all possible identifiers, safely reformat identifiers (like upper and lower case email addresses), and enable you to define an identity resolution strategy to merge people’s profiles together.
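
As a rough illustration of "safely reformatting" an identifier, the sketch below normalizes email addresses before they are used for matching. The function name normalizeEmail and the stripPlusTag option are hypothetical. Lowercasing assumes mailboxes are case-insensitive, which is true of most providers in practice. Stripping the "+tag" part is off by default because not every provider treats it as an alias.

```javascript
// Hypothetical helper: normalize an email address so that variants of the
// same address compare as equal. Returns null for values that cannot be an email.
function normalizeEmail(raw, { stripPlusTag = false } = {}) {
  if (typeof raw !== "string") return null;
  const email = raw.trim().toLowerCase();
  const at = email.lastIndexOf("@");
  // Reject values with no local part or no domain.
  if (at < 1 || at === email.length - 1) return null;

  let local = email.slice(0, at);
  const domain = email.slice(at + 1);
  if (stripPlusTag) {
    // Assumption: "ed+spam@..." is an alias of "ed@...".
    const plus = local.indexOf("+");
    if (plus > 0) local = local.slice(0, plus);
  }
  return `${local}@${domain}`;
}

console.log(normalizeEmail("  Ed@Hull.io "));
console.log(normalizeEmail("ed+spam@hull.io", { stripPlusTag: true }));
```

Normalizing only makes matching more consistent. It does not tell you whether the address is deliverable, which is what a validation service is for.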

The same logic that applies to person-level identifiers also applies to company-level identifiers. You need a stable, centralized identifier in your single customer view tool to associate and de-duplicate accounts using other identifiers such as domain name.

Company data is widely available and can be used to backfill and verify company-level identifiers like domain, match the company from an email, and even return the company behind an IP address. This also helps with linking people with companies where those associations aren’t obvious (such as a lead with an @gmail.com address).
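
One simple way to link people to companies is to derive a company domain from the email address and skip freemail providers, where the domain says nothing about the employer. The sketch below assumes a hand-maintained freemail list. The list contents and the function name companyDomainFromEmail are illustrative only, and a real list would be much longer.

```javascript
// Illustrative, incomplete list of freemail domains (an assumption for this sketch).
const FREEMAIL_DOMAINS = new Set(["gmail.com", "yahoo.com", "hotmail.com", "outlook.com"]);

// Returns the domain to use as a company-level identifier, or null when the
// email is missing, malformed, or belongs to a freemail provider.
function companyDomainFromEmail(email) {
  if (typeof email !== "string") return null;
  const at = email.lastIndexOf("@");
  if (at < 1 || at === email.length - 1) return null;
  const domain = email.slice(at + 1).trim().toLowerCase();
  return FREEMAIL_DOMAINS.has(domain) ? null : domain;
}

console.log(companyDomainFromEmail("ed@hull.io"));
console.log(companyDomainFromEmail("ed@gmail.com"));
```

When this returns null, the person-to-company link has to come from elsewhere, such as an enrichment provider or a form field.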

Cleansing attributes

Attributes on profiles often need cleaning up. These are susceptible to poor data entry from sales reps, or other formatting issues which need to be cleaned up before they can be used.

For instance, Salesforce is very specific with the data it requires:

  • Distinct First Name and Last Name for a Contact record
  • No “empty” values. (So set Unknown for values to sync if their value is empty)
  • Set formatting to the Salesforce format (like a pick list)

Salesforce is often “a source of truth” for sales reps, and marketers using this data depend on it being reliably pre-formatted so it can populate email templates without needing formatting on their end. For example, you might set the company size as a range of employees (based on an integer value from a variety of data sources), or as Unknown if there is no value.

Your tools to build and maintain your single customer view need to be able to transform a wide variety of possible inputs reliably into a standardized output. Here’s an example of the logic needed to transform the company size data into pre-defined ranges based on employee count.

// Set company size as a range of employees.
// Only runs when Clearbit has a numeric employee count and company_size is still "Unknown".
if (_.has(user, 'traits')
    && _.isFinite(user.traits.clearbit_company_metrics_employees)
    && user.traits.company_size === "Unknown") {
  const employees = user.traits.clearbit_company_metrics_employees;
  // Each branch is reached only if the previous ones failed, so the lower bound is implied.
  if (employees <= 50) {
    traits({ company_size: "1-50" });
  } else if (employees <= 100) {
    traits({ company_size: "50-100" });
  } else if (employees <= 500) {
    traits({ company_size: "100-500" });
  } else {
    traits({ company_size: "500+" });
  }
}
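
The other two Salesforce requirements listed above (distinct first and last names, and no empty values) can be handled in the same spirit. The sketch below is plain JavaScript rather than Hull Processor code. The function name prepareContact and the output field names are hypothetical. Splitting on whitespace is a naive assumption that will not suit every name.

```javascript
// Hypothetical helper: prepare a contact for a CRM that requires distinct
// first/last names and rejects empty values.
function prepareContact({ name, jobTitle, companySize }) {
  // Naive split: first word is the first name, the rest is the last name.
  const parts = (name || "").trim().split(/\s+/).filter(Boolean);
  const orUnknown = (value) =>
    value === null || value === undefined || String(value).trim() === ""
      ? "Unknown"
      : value;

  return {
    first_name: orUnknown(parts[0]),
    last_name: orUnknown(parts.slice(1).join(" ")),
    job_title: orUnknown(jobTitle),
    company_size: orUnknown(companySize),
  };
}

console.log(prepareContact({ name: "Ed Fry", jobTitle: "" }));
```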

Cleansing events

Unless you have a consistent tracking plan implemented (discussed in Part 2: Customer Journey Mapping, Analytics & Attribution Modeling), you may have a lot of inconsistent events.

The easiest way to cleanse your event data is at source — implement a clean, consistent tracking plan across all your tools.

If you have historical data that is “messy”, the simplest method isn’t necessarily to compute and backfill previous events with a new name. Few tools store & compute event data for long periods of time; even fewer tools allow you to rename events after they’ve happened.

To maintain the context of past actions, the simplest method is to write past events as attributes. For instance, you may have three different email signup events that are equivalent:

  • Newsletter subscribed
  • Email subscribed
  • Added to newsletter

Instead of standardizing historical events, you can write a “Newsletter subscribed at” attribute based on the earliest date of any of these events. This gives you the same result and makes the data more portable (since more tools enable you to sync attributes than events).
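
A minimal sketch of that approach is shown below. It assumes each event is an object with an event name and an ISO 8601 created_at timestamp. Those field names, and the function name newsletterSubscribedAt, are assumptions for this example, so adapt them to however your tool exposes events.

```javascript
// The three historical event names treated as equivalent signups.
const SIGNUP_EVENT_NAMES = new Set([
  "Newsletter subscribed",
  "Email subscribed",
  "Added to newsletter",
]);

// Returns the earliest signup timestamp as an ISO string, or null if the
// person has none of the equivalent events.
function newsletterSubscribedAt(events) {
  const timestamps = events
    .filter((e) => SIGNUP_EVENT_NAMES.has(e.event))
    .map((e) => Date.parse(e.created_at))
    .filter((t) => !Number.isNaN(t)); // ignore unparseable dates
  if (timestamps.length === 0) return null;
  return new Date(Math.min(...timestamps)).toISOString();
}

const history = [
  { event: "Email subscribed", created_at: "2017-03-01T10:00:00Z" },
  { event: "Added to newsletter", created_at: "2017-01-15T09:30:00Z" },
  { event: "Demo requested", created_at: "2016-12-01T08:00:00Z" },
];
console.log(newsletterSubscribedAt(history)); // 2017-01-15T09:30:00.000Z
```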

Lesson #5: Build fallback strategies to determine the “truest” data source

In your list of data sources, you will likely have many competing data sources for the same (or similar) data types. This creates a problem in deciding what is the “true” source.

For instance, most B2B companies will collect some form of company_name. This is used in CRM account records, to personalize messaging, in billing, and so on. There can be many true sources for it including:

  • Demo request form
  • Chat conversation
  • Stripe subscription
  • Clearbit enrichment
  • Datanyze enrichment
  • Salesforce (sales rep input)

The same situation occurs for common B2B data like job titles, location, company size, and so on.

The rules that define which data source you use are called fallback strategies. They define the cascading order of data sources for a given attribute, to ensure you use the most accurate known data and to prevent blank, Unknown fields being synced everywhere.

Designing your fallback strategies

You will likely use multiple types of sources of data. You need to understand how these will relate and overwrite each other. There are two types to understand:

  1. Time-based fallback strategies: When you source what data
  2. Source-based fallback strategies: What data sources have precedence

It may be that you can’t source all the data you need upfront to identify a lead or customer as an ideal customer profile (as discussed in Part 1: How to create an ideal customer profile & data model).

In this case, you need to select which data to prioritize collecting. For instance, instead of using long lead forms or live chat sequences to fill all the data types you need, focus on the top priority questions (besides contact information) that will help you home in on ICPs (and eliminate most of the others).

To create this, transform your abstracted master customer data model into a list ordered by the priority in which you need to ask for each item, where you will source it from, and when. You may want to consider this alongside other sales qualifying criteria like budget, authority, need, and timing (i.e. whether they’re ready for sales or not) that you need to capture through your forms too.

Pull all the data you need to source into an ordered list:

Type                             | Data Type             | Data Source | Captured When
Contact                          | Name                  | Form        | 1 - On demo request
Contact                          | Email                 | Form        | 1 - On demo request
Master data model (ICP)          | Number of employees   | Form        | 1 - On demo request
Sales qualifying criteria (BANT) | Problem to be solved  | Form        | 1 - On demo request
Master data model (ICP)          | Job title             | Enrichment  | 2 - After demo request
Sales qualifying criteria (BANT) | Marketing budget      | CRM         | 3 - First sales call
Sales qualifying criteria (BANT) | Authority to purchase | CRM         | 3 - First sales call
Sales qualifying criteria (BANT) | Timing                | CRM         | 3 - First sales call

Second, you need to consider source-based fallback strategies. For something like job title, there are several potentially accurate sources:

  • Data through form submission
  • Data from 3rd party enrichment tool(s)
  • Reviewing their LinkedIn profile and email signature (e.g. sales rep updating CRM records)

Depending on the exact type of data in your industry, you may find different data sources more reliable (accurate, complete, up-to-date) than others. Should sales reps be able to overwrite anything submitted directly by a lead or customer? So for each data point in your master customer data model, you need to define how you prioritize the different sources.

For instance, for identifying job title you may prioritize your sources like so:

Source of Job Title Data | Tool       | Store as         | Priority
CRM record               | Salesforce | job_title        | 1
Form submission          | HubSpot    | job_title        | 2
Data enrichment          | Clearbit   | employment_title | 3

Wherever you manage your customer data (your CRM, marketing automation, or customer data platform), you should be able to set up a fallback strategy.

Here’s an example of the logic you need to implement this.

// Fallback strategy for job title
// Priority: Salesforce Contact (CRM record updated), then HubSpot (form fill), then Clearbit (enrichment)
const userTraits = {};
if (_.get(user, "unified_data.job_title", null) === null) {
  const jobTitleFallbacks = [
    { dataObject: user, attribute: "salesforce.job_title" },
    { dataObject: user, attribute: "hubspot.job_title" },
    { dataObject: user, attribute: "clearbit.employment_title" }
  ];

  executeFallbackStrategy(userTraits, "unified_data/job_title", jobTitleFallbacks);
}

traits(userTraits);

// Writes the first non-nil value found in the ordered strategy list to dataObject at attributeName.
function executeFallbackStrategy(dataObject, attributeName, strategy) {
  _.forEach(strategy, (s) => {
    if (!_.isNil(_.get(s.dataObject, s.attribute, null))) {
      _.set(dataObject, attributeName, _.get(s.dataObject, s.attribute));
      return false; // stop at the first source that has a value
    }
  });
}

The tool that maintains your single customer view must enable you to define fallback strategies to prioritize your data sources.

Lesson #6: Complete, Real-Time “Golden Customer Record”

With your data all unified into one profile, cleansed, and your fallback strategies defined, you have a complete, accurate, single view of the customer — this is your “golden customer record”.

Your golden customer record is the single source of truth for all your other tools. Instead of mapping data between all your tools (where everything has to integrate with everything else), you pass all your data through the process of unifying, cleansing & running fallback strategies so your golden customer record is always up-to-date. From here, you sync complete & accurate customer data across all your tools.

However, for this to be effective, the data flow needs to happen at the speed at which your teams and tools need to react. This depends on the time you allow between:

  • A Demo requested event and your sales reps reaching out
  • A User signup event and the onboarding drip email series being triggered
  • A website visitor Chat started event and your chatbot responding
  • A subscriber hitting unsubscribe and your next email being sent

Each of these points of engagement should be able to leverage the full context of your golden customer record to personalize the engagement, instead of sending the same message & experience to everyone. This means the data flow needs to happen in as close to real-time as possible.

Earlier, when discussing tools to host your single customer view, we covered the one-way and slow nature of ETL tools and data warehousing. There are plenty of use cases where customer data needs to be captured and stored that don’t require a real-time data flow.

However, for common B2B use cases like lead qualification, email personalization, triggering live chat, and so on, data warehousing doesn’t supply the persistent always-up-to-date golden customer record that these tools need.

This isn’t only about optimizing data flows based on the job-to-be-done, or empowering teams to be more efficient. Maintaining a shared, near-real-time context of every lead and customer across all your tools enables you to create a better customer experience, as discussed in Part 2: Customer Journey Mapping, Analytics & Attribution Modeling.

Put it into practice

There's no ready-made template for providing your single source of truth. Every organization has this challenge, but like a snowflake, every combination of tools, teams & data is different.

Follow the principles in this guide to create a real-time data flow and single source of truth.

Next up: Choosing your marketing technology stack in 2018

With your ideal customer profile and customer journey map defined, and all your data sourced, unified & condensed into your golden customer record - your single source of truth - you need to consider which tools to use.

The temptation for many sales and marketing teams is to buy tools for every pain point, fad, and fashion. This creates a bloated, expensive frankenstack that creates a disjointed customer experience.

In Part 4 of The Complete Guide to Customer Data, we’ll share our pragmatic framework for “hiring tools to do a job”.

Ed Fry

Prev 'Ed of Growth at Hull, working on all things content, acquisition & conversion. Conference speaker, flight hacker, prev. employee #1 at inbound.org (acq. HubSpot). Now at Behind The Growth

If you've questions or ideas, I'd love to geek out together on Twitter or LinkedIn. 👇