Ingest


The first stage of the data lifecycle is to ingest data. Ingestion has four stages:

  1. Data capture
  2. Data formatting
  3. Identity resolution
  4. Publishing User and Account Reports

Data ingestion in Hull

The Hull Platform captures data through two APIs:

  • Firehose API designed for streaming updates (e.g. from connectors)
  • Import API designed for bulk updates (e.g. SQL Importer jobs)

Captured data is then cast into Attributes and Events. Attributes and Events are associated with Entities (Users & Accounts) according to claims. Updates are then published as User and Account Reports.

Firehose API

All connectors sync data to Hull through the Firehose API. This is exposed as a single endpoint on firehose.hullapp.io.

Messages

Each message in the batch represents a Firehose::Event to ingest. It is composed of:

Message | Description
Type | traits, track, alias or unalias
Body | Payload of the message to ingest
Headers | Context to the message, including Hull-Access-Token which contains a JWT with a set of claims signed with the Application secret. Those claims are the data points used in identity resolution
Timestamp | Local timestamp of the client generating the payload

Here is an example payload for the Firehose API.

POST https://firehose.hullapp.io/
> Hull-Organization: example.hullapp.io
> Hull-App-Id: 5a7962e2793d3e942a03b29a
> Hull-Access-Token: 123456789
> Content-Type: application/json
{
  "batch": [{
    "type": "traits",
    "body": {
      "email": "bob@bob.com",
      "name": "Bobby Lapointe",
      "country": "France",
      "city": "Paris"
    },
    "headers": {
      "Hull-Access-Token": "eyJ0eXA..."
    },
    "timestamp": "2018-02-06T09:35:11.146Z"
  }, {
    "type": "track",
    "body": {
      "ip": "1.2.3.4",
      "url": "https://www.hull.io/docs",
      "referrer": "https://www.google.com",
      "event_id": "8fbebdde-68e3-43cf-ae45-0edc8057ab8e",
      "properties": { "name": "iPhone" },
      "event": "Viewed Product"
    },
    "headers": {
      "Hull-Access-Token": "eyJ0eXA..."
    },
    "timestamp": "2018-02-06T09:35:11.574Z"
  }, {
    "type": "alias",
    "body": {
      "anonymous_id": "1234"
    },
    "headers": {
      "Hull-Access-Token": "eyJ0eXA..."
    },
    "timestamp": "2018-02-06T09:35:11.574Z"
  }, {
    "type": "unalias",
    "body": {
      "anonymous_id": "5678"
    },
    "headers": {
      "Hull-Access-Token": "eyJ0eXA..."
    },
    "timestamp": "2018-02-06T09:35:11.574Z"
  }],
  "timestamp": "2018-02-06T09:35:12.576Z",
  "sentAt": "2018-02-06T09:35:12.576Z"
}
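
The envelope above can be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named buildFirehoseBatch (not part of any Hull library); it only builds the JSON body in the shape of the example and does not send it.

```javascript
// Hypothetical helper: wraps messages in the batch envelope shown above.
// It builds the JSON body only; sending it to the Firehose endpoint is out of scope.
function buildFirehoseBatch(messages, accessToken, now = new Date()) {
  const allowedTypes = ["traits", "track", "alias", "unalias"];
  const batch = messages.map(({ type, body, timestamp }) => {
    if (!allowedTypes.includes(type)) {
      throw new Error(`Unsupported message type: ${type}`);
    }
    return {
      type,
      body,
      // Each message carries its own token; its claims drive identity resolution.
      headers: { "Hull-Access-Token": accessToken },
      timestamp: (timestamp || now).toISOString(),
    };
  });
  const sentAt = now.toISOString();
  return { batch, timestamp: sentAt, sentAt };
}

const payload = buildFirehoseBatch(
  [
    { type: "traits", body: { email: "bob@bob.com", city: "Paris" } },
    { type: "track", body: { event: "Viewed Product", properties: { name: "iPhone" } } },
  ],
  "eyJ0eXA...", // placeholder token, as in the example above
  new Date("2018-02-06T09:35:12.576Z")
);
console.log(JSON.stringify(payload, null, 2));
```

Rejecting unknown message types on the client keeps malformed messages out of the batch before it is sent.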

Messaging Lanes (including Fastlane)

Unless otherwise marked, all messages will be processed first in, first out.

For some use cases (such as real-time web personalization), ingestion needs to be accelerated. To enable this, Hull can accelerate ingestion and computation for marked Users. Users marked as active enter Fastlane for 10 minutes.

Notifications of Events (including Attributes Changed) for all active Users will be prioritized ahead of other User types. Note: end-to-end data flow speed may depend on the API limits of the external service.
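
To make the lane behaviour concrete, here is an illustrative sketch of two first in, first out lanes in which messages for recently active Users are served first. The LaneQueue class and its method names are hypothetical; this is a simplified model of the behaviour described above, not Hull's internal implementation.

```javascript
const FASTLANE_WINDOW_MS = 10 * 60 * 1000; // active Users stay in the fast lane for 10 minutes

class LaneQueue {
  constructor() {
    this.activeUntil = new Map(); // userId -> time (ms) at which the fast lane expires
    this.fast = [];
    this.standard = [];
  }

  markActive(userId, nowMs) {
    this.activeUntil.set(userId, nowMs + FASTLANE_WINDOW_MS);
  }

  enqueue(message, nowMs) {
    const expiry = this.activeUntil.get(message.userId);
    const lane = expiry !== undefined && nowMs < expiry ? this.fast : this.standard;
    lane.push(message);
  }

  // Fast-lane messages are served first; each lane stays first in, first out.
  dequeue() {
    return this.fast.length > 0 ? this.fast.shift() : this.standard.shift();
  }
}

const queue = new LaneQueue();
queue.enqueue({ userId: "a", event: "Signed Up" }, 0);
queue.markActive("b", 0);
queue.enqueue({ userId: "b", event: "Viewed Product" }, 1000);
queue.enqueue({ userId: "b", event: "Late Event" }, FASTLANE_WINDOW_MS + 1);

const order = [queue.dequeue(), queue.dequeue(), queue.dequeue()].map((m) => m.event);
console.log(order);
```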

Import API

Bulk updates can be imported into your Hull Organization through the Import API. Data imports are processed separately to the Firehose API to minimize the impact on ingestion of live data.

See our Import API reference documentation for the requirements that imported data must meet.

Importing Users

Create and update Users in bulk through the Import API. Every imported record MUST include at least a valid email or userId identifier to associate with a User (anonymous_id is not supported via the Import API).

User-Account associations can be specified by adding an accountId identifier in the record.

{
  "userId": "123",
  "accountId": "456",
  "traits": {
    "email" : "john@coltrane.com",
    "name" : "John Coltrane"
  }
}

Importing Accounts

Create and update Accounts in bulk through the Import API. Every imported record MUST include an accountId identifier.

{
  "accountId": "111",
  "traits" : {
    "domain" : "hull.io",
    "name" : "Hull"
  }
}
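
The identifier requirements for User and Account records can be checked before a file is submitted. This is a minimal sketch with a hypothetical validateImportRecord function; it assumes a User's email may sit either at the top level or under traits (as in the User example above) and it does not check that the email is well formed.

```javascript
// Hypothetical pre-flight check mirroring the rules stated above.
function validateImportRecord(kind, record) {
  const errors = [];
  if (kind === "user") {
    // Assumption: email may be top-level or nested under "traits".
    const email = record.email || (record.traits && record.traits.email);
    if (!record.userId && !email) {
      errors.push("User records need a userId or an email");
    }
    if (record.anonymous_id !== undefined) {
      errors.push("anonymous_id is not supported via the Import API");
    }
  } else if (kind === "account") {
    if (!record.accountId) {
      errors.push("Account records need an accountId");
    }
  } else {
    errors.push(`Unknown record kind: ${kind}`);
  }
  return errors;
}

console.log(validateImportRecord("user", { userId: "123", accountId: "456", traits: { name: "John Coltrane" } }));
console.log(validateImportRecord("account", { traits: { domain: "hull.io" } }));
```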

Importing User Events

You can update User Events in bulk through the Import API. This performs a User lookup before importing the event, to create or update a User.

{
  "userId":"12453",
  "timestamp":"2018-04-16T00:00:28.000Z",
  "event": "User Registered",
  "eventId":"1754752",
  "properties":{
    "email": "ed@hull.io",
    "plan": "business",
    "price": 129.00
  }
}

Attributes Data Formats

Attributes are properties collected and associated to Entities (Users and Accounts). Attributes are ingested by:

  • traits events from the Firehose API
  • User or Account records from the Import API

Trait types & type detection

Hull discovers new Attributes as new data is ingested and builds a Schema of known Attributes for Users and Accounts. These are visible and managed in the Attributes view on the Dashboard.

Hull Attributes view

Supported Attribute values include:

  • Strings
  • Array of strings
  • Numeric
  • Booleans
  • Dates (ISO-8601 formatted strings or UNIX timestamps)
  • Nested JSON Object (experimental support, use carefully)

Attribute types are detected when Attributes are added to the Schema, and depend on the name of the Attribute or the first value captured:

  • If the Attribute name ends with _at or _date, the Attribute will be typed as a Date
  • Otherwise the type will be set by the first value captured (String, Numeric or Boolean)
  • Nested objects are not supported and will be silently ignored

Attribute names are lowercased in the ingestion step to make them case insensitive.

The following characters are not allowed (and are silently ignored) in Attribute names: . and $.
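
The detection rules above can be sketched as follows. The function names are hypothetical and the type labels ("date", "string", "number", "boolean") are illustrative; the sketch covers only the name-suffix rule, the first-value rule and lowercasing.

```javascript
// Attribute names are lowercased so that they are case insensitive.
function normalizeAttributeName(name) {
  return name.toLowerCase();
}

// Returns an illustrative type label for a newly discovered Attribute.
function detectAttributeType(name, firstValue) {
  const key = normalizeAttributeName(name);
  // The name suffix takes precedence over the first value captured.
  if (key.endsWith("_at") || key.endsWith("_date")) return "date";
  if (typeof firstValue === "string") return "string";
  if (typeof firstValue === "number") return "number";
  if (typeof firstValue === "boolean") return "boolean";
  return undefined; // other shapes are outside the rules listed above
}

console.log(detectAttributeType("Signup_Date", "2018-02-06"));
console.log(detectAttributeType("nps_rating", 10));
```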

Attributes casting

Incoming data is cast into Attributes according to the type recorded in the Schema for each Attribute.

If casting is not possible because the original value captured is not compatible, it will result in a null value.
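
As an illustration of casting with a null fallback, here is a minimal sketch. The castAttribute function is hypothetical, and the compatibility rules it applies (numeric strings accepted as numbers, "true"/"false" accepted as booleans, UNIX timestamps read as seconds) are assumptions rather than Hull's documented behaviour.

```javascript
// Hypothetical casting helper: returns null when the value is not compatible with the type.
function castAttribute(type, value) {
  if (value === null || value === undefined) return null;
  switch (type) {
    case "string":
      return typeof value === "object" ? null : String(value);
    case "number": {
      const n = typeof value === "string" && value.trim() !== "" ? Number(value) : value;
      return typeof n === "number" && Number.isFinite(n) ? n : null;
    }
    case "boolean":
      if (typeof value === "boolean") return value;
      if (value === "true") return true;
      if (value === "false") return false;
      return null;
    case "date": {
      if (typeof value !== "number" && typeof value !== "string") return null;
      // Assumption: numeric values are UNIX timestamps in seconds.
      const d = typeof value === "number" ? new Date(value * 1000) : new Date(value);
      return Number.isNaN(d.getTime()) ? null : d.toISOString();
    }
    default:
      return null;
  }
}

console.log(castAttribute("number", "42"));
console.log(castAttribute("number", "forty-two"));
```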

Updating Attributes

Values can write new Attributes or update existing Attributes. Values can be formatted as atomic operations to be applied to existing Attributes.

Operator | Example
setIfNull | { "foo" : { "operation" : "setIfNull", "value" : "bar" } }
inc | { "foo" : { "operation" : "inc", "value" : 100 } }
dec | { "foo" : { "operation" : "dec", "value" : 100 } }
set | { "foo" : { "operation" : "set", "value" : "bar" } }
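
The effect of each operator on an existing value can be sketched as below. The applyAttributeUpdate function is hypothetical, and treating a missing value as 0 for inc and dec is an assumption.

```javascript
// Hypothetical reducer: applies one incoming value (plain or operation) to the current value.
function applyAttributeUpdate(current, update) {
  const isOperation =
    update !== null &&
    typeof update === "object" &&
    !Array.isArray(update) &&
    "operation" in update;
  if (!isOperation) return update; // plain values overwrite the existing one

  const { operation, value } = update;
  switch (operation) {
    case "set":
      return value;
    case "setIfNull":
      return current === null || current === undefined ? value : current;
    case "inc":
      return (current || 0) + value; // assumption: a missing counter starts at 0
    case "dec":
      return (current || 0) - value;
    default:
      throw new Error(`Unknown operation: ${operation}`);
  }
}

console.log(applyAttributeUpdate(5, { operation: "inc", value: 100 }));
console.log(applyAttributeUpdate("baz", { operation: "setIfNull", value: "bar" }));
```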

Grouping Attributes

Attributes are recorded in a flat key/value structure; Hull does not support complex nested objects.

Similar Attributes can be given a common prefix delimited by / so that they are visually grouped. For example, all Salesforce Attributes can share the salesforce/ prefix:

{
    "type": "traits",
    "body": {
      "salesforce/id": "123",
      "salesforce/name": "Bobby Lapointe",
      "salesforce/type": "Contact"
    },
    "headers": {
      "Hull-Access-Token": "eyJ0eXA..."
     },
    "timestamp": "2018-02-06T09:35:11.146Z"
}
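
Because grouping is only a naming convention on flat keys, a consumer can rebuild the groups by splitting each key on the first /. A minimal sketch, with a hypothetical groupAttributes function and an arbitrary "(top level)" label for unprefixed keys:

```javascript
// Hypothetical helper: rebuilds visual groups from flat, prefixed Attribute names.
function groupAttributes(attributes) {
  const groups = {};
  for (const [key, value] of Object.entries(attributes)) {
    const slash = key.indexOf("/");
    const group = slash === -1 ? "(top level)" : key.slice(0, slash);
    const name = slash === -1 ? key : key.slice(slash + 1);
    groups[group] = groups[group] || {};
    groups[group][name] = value;
  }
  return groups;
}

const grouped = groupAttributes({
  "salesforce/id": "123",
  "salesforce/name": "Bobby Lapointe",
  "salesforce/type": "Contact",
  email: "bob@bob.com",
});
console.log(grouped);
```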

JSON Attributes BETA

Hull supports capturing raw JSON objects and arrays with JSON objects for Users and Accounts. This enables you to:

  • capture raw data coming in from Incoming webhooks, Segment, the REST API, and various other connectors, then transform it with the Processor into top-level attributes that other services can consume.
  • simplify data manipulation in the Processor by storing JSON objects (see the first example below)

Limitations

The nested JSON objects feature is currently in BETA and is subject to the limitations listed below:

  • you can’t build Audience segments with these attributes. You can find them in the Attribute Selector, but there are no filtering operations you can perform, because those deeply nested items aren’t indexed by our engine. If you need to rely on this data to build segments, we suggest extracting the items you need and writing them as top-level attributes with the Processor. You’ll probably need to do this anyway to send this data out to services.
  • we don’t support partial updates or complex operations. The full attribute is replaced if you send it again; we can’t update only a subset of keys in a JSON object or add new entries to an array. You can always use a Processor to perform complex updates (see the second example below).
  • you can select these attributes to be sent to destinations, but keep in mind that most services will either ignore them or flat out reject the whole update.
  • if you want to store an array of JSON objects, make sure that the array contains at least one JSON object the first time this attribute is received. An empty array is always stored as an array of strings, our core format, and once the data format for a given attribute is established it cannot be changed. Valid example: [{ foo: "bar" }]. Invalid example: [] (the attribute will be detected as an array of strings).

Examples

How to handle session data?


const {
  sessions = {},
  latest_session = {}
} = user;

// sessions = {
//   1234: { id: 1234, start: "foo", referrer: "https://google.com" },
//   4567: { id: 4567, start: "bar", referrer: "https://facebook.com" },
// }

// Add or update the session object
sessions[latest_session.id] = latest_session;

// Update it.
traits({ sessions });

// Write to top-level attributes as you need to send results out to services.
traits({
  session_count: _.keys(sessions).length,
  session_referrers: _.map(sessions, s => s.referrer)
});

How to update a JSON object?

We don’t provide partial or atomic operations on JSON objects: each write replaces the whole value, as the example below shows. You can use a Processor to perform more complex operations.

// user 1234 before:
// {
//   external_id: 1234,
//   foo: ["a", "b"]
// }

hull.asUser({ external_id: 1234 }).traits({ foo: ["c"] });

// user 1234 after:
// {
//   external_id: 1234,
//   foo: ["c"]
// }


//  user 4567 before:
// {
//   external_id: 4567,
//   foo: { earth: "mars" }
// }

hull.asUser({ external_id: 4567 }).traits({ foo: { bar: "bat" } });

// user 4567 after:
// {
//   external_id: 4567,
//   foo: { bar: "bat" }
// }

If you want to update objects, you need to first capture the updates in one place, and use the Processor to generate an aggregated, manually merged object:

// For user 4567 changed:
const { foo } = changes.user;
// foo = [{ earth: "mars" }, { bar: "bat" }];

const merged_foo = Object.assign({}, foo[0], foo[1]);

// Don't merge it to `foo` or you'll get an infinite loop...
traits({ merged_foo });

Event data formats

Events are actions collected and associated to Users. Events are ingested by:

  • track events from the Firehose API
  • User Records from the Import API

Events have the following values:

  • Event name
  • Unique event_id
  • Properties

The body of each event contains the following entries:

Event | The event name
event_id | An ID for each event, unique within the Hull organization
Properties | Additional contextual data about the event

Event properties

Hull can capture unlimited event properties for each tracked Event.

Event properties are stored as a set of flat key values associated to the event. No schema is enforced.

Additional contextual data is associated with each event:

source | Defines a namespace (e.g. stripe)
type | Defines an event type (e.g. email)
created_at | Defines the event date. Defaults to now()
ip | Defines the Event’s IP. Set to null if you’re storing a server call; otherwise, geoIP will locate this event.
referrer | Defines the Referrer. null for server calls.

Event Context

Every event in Hull has an additional object called context that has a fixed schema.

Tracking calls from the hull.js library will fill in some of this schema, and so will the Segment.com connector.

You can pass it as the third argument to the Hull.track(name, properties, context) method. Here are the accepted fields:

{
  "useragent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
  "device": {
    "name": "Other"
  },
  "referrer": {
    "url": "https://example.com/",
    "host": "example.com",
    "path": "/",
    "campaign": {
      "term": "funding news",
      "medium": "email",
      "name": "Funding announcement Newsletter",
      "content": "image link",
      "source": "Newsletter"
    }
  },
  "os": {
    "name": "Mac OS X",
    "version": "10.13.6"
  },
  "browser": {
    "major": 69,
    "name": "Chrome",
    "version": "69.0.3497"
  },
  "location": {
    "country": "FR",
    "city": "Paris",
    "timezone": "Europe/Paris",
    "longitude": 2.3833,
    "latitude": 48.9167,
    "region": "IDF",
    "countryname": "France",
    "regionname": "Île-de-France",
    "zipcode": "93300"
  },
  "campaign": {
    "term": "funding news",
    "medium": "email",
    "name": "Funding announcement Newsletter",
    "content": "image link",
    "source": "Newsletter"
  },
  "ip": "10.10.10.10",
  "page": {
    "url": "https://example.com/test",
    "host": "example.com",
    "path": "/884a12fc/overview"
  }
}

Identity Resolution

To associate ingested data with Entities (Users & Accounts), Hull operates on a set of identifiers.

Users and Accounts have different identity resolution strategies.

User Identity Resolution strategy

The following identifiers are used in order of priority to resolve User data:

Identifier | Description | Notes
id | Internal Hull identifier | We do not recommend using this identifier, since ids can be deleted at any moment when a User is deleted or merged
external_id | Primary, unique, and most stable way to reference a User | If external_id is used, then email and anonymous_id identifiers will be used to merge Users
email | Email address | Duplicate emails are allowed (some use cases require it); however, these will be merged if there is no stronger identifier
anonymous_id | Anonymous IDs (e.g. website visitors before signup) and aliases from 3rd party tools | Designed to alias anonymous traffic when visitors later sign up and identify themselves. This is also where the IDs from 3rd party tools (like HubSpot, Mailchimp, Intercom etc.) are stored.

At least one identifier MUST be present for the resolution step to return a User.
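
The priority order can be sketched as a simple lookup. The strongestIdentifier function is hypothetical and only illustrates which claim wins when several are present; actual resolution also involves lookups and merges that are not modelled here.

```javascript
// Identifiers in order of priority, strongest first, as listed above.
const USER_IDENTIFIER_PRIORITY = ["id", "external_id", "email", "anonymous_id"];

// Hypothetical helper: returns the highest-priority identifier present in the claims.
function strongestIdentifier(claims, priority = USER_IDENTIFIER_PRIORITY) {
  for (const key of priority) {
    if (claims[key]) return { key, value: claims[key] };
  }
  throw new Error("At least one identifier must be present");
}

console.log(strongestIdentifier({ anonymous_id: "1234", email: "bob@bob.com" }));
```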

You can configure rules for what a valid external_id looks like in the Organization Settings. This is useful if some tools send invalid data in this field, such as an email address or their own format.

User merging

Users without an external_id can be merged. These are marked as mergeable. Only Users that are explicitly mergeable can be merged.

If the resolution step results in the merging of two Users, the returned User is the recipient of the merge.

The merge operation is destructive and will:

  • Merge Attributes of the merged User to the recipient User of the merge.
  • Re-associate all identifiers and aliases
  • Re-parent and re-index all events
  • Emit a User merged event that captures the id and Attributes of the User being merged

If there are different values for the same Attribute, the recipient of the merge will keep its original value.

User and Account aliasing

The anonymous_id identifier and the alias event are used to alias Users and Accounts. An alias associates multiple identifiers with a single User or Account (e.g. IDs from many third party tools).

After an alias is added (visible as an anonymous_id), it can be removed from a User or an Account with the unalias operation. This only removes the given anonymous_id from the entity; it does not revert any merge operation that was performed because of this anonymous_id.

Account Identity Resolution strategy

The following identifiers are used in order of priority to resolve Account data:

Identifier | Description | Notes
id | Internal Hull identifier | We do not recommend using this identifier, since ids can be deleted at any moment when an Account is deleted or merged
external_id | Primary, unique, and most stable way to reference an Account |
domain | Website domain name | Domain can be sourced from third party tools, or inferred from the email address (e.g. @intercom.com)

Free email domains (e.g. gmail.com) are ignored when passing domain claims.

You can configure additional rules for what a valid external_id looks like, or which additional domains are rejected, in the Organization Settings. This is useful if some tools send invalid data, such as dummy domains.
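
Inferring a domain claim from an email address while skipping free email domains can be sketched as follows. The domainClaimFromEmail function is hypothetical, and the short rejected list is illustrative only; the list Hull actually applies is not given here.

```javascript
// Assumption: a tiny illustrative list of rejected (free email) domains.
const FREE_EMAIL_DOMAINS = new Set(["gmail.com", "yahoo.com", "hotmail.com"]);

// Hypothetical helper: returns a domain claim for an email, or null if none should be passed.
function domainClaimFromEmail(email, rejected = FREE_EMAIL_DOMAINS) {
  const at = email.lastIndexOf("@");
  if (at === -1 || at === email.length - 1) return null;
  const domain = email.slice(at + 1).toLowerCase();
  return rejected.has(domain) ? null : domain;
}

console.log(domainClaimFromEmail("hello@intercom.com"));
console.log(domainClaimFromEmail("bob@gmail.com"));
```

Passing a custom set as the second argument models the additional rejected domains configured in the Organization Settings.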

Resolving and linking Users and Accounts

Special tokens with both User and Account identifiers can be built to link a User to an Account. The link is declarative: the ingestion step evaluates all the identifiers contained in the token and always returns a User and an Account.

For example, the following claims resolve an Account that has the domain example.com and link the User and the Account together:

{
  "io.hull.asUser": { "email": "hello@example.com" },
  "io.hull.asAccount": { "domain": "example.com" },
  "io.hull.subjectType": "account"
}
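
Claims like these travel inside the signed Hull-Access-Token. The sketch below shows one way to sign a claim set as a JWT with Node's built-in crypto module. It assumes HS256 and a bare claim set; the exact algorithm and any additional registered claims Hull expects are not specified here, and signClaims and the secret value are placeholders.

```javascript
const crypto = require("crypto");

const base64url = (input) => Buffer.from(input).toString("base64url");

// Hypothetical helper: signs claims as a compact JWT (assumption: HS256).
function signClaims(claims, secret) {
  const header = base64url(JSON.stringify({ typ: "JWT", alg: "HS256" }));
  const payload = base64url(JSON.stringify(claims));
  const signature = crypto
    .createHmac("sha256", secret)
    .update(`${header}.${payload}`)
    .digest("base64url");
  return `${header}.${payload}.${signature}`;
}

const linkClaims = {
  "io.hull.asUser": { email: "hello@example.com" },
  "io.hull.asAccount": { domain: "example.com" },
  "io.hull.subjectType": "account",
};
const token = signClaims(linkClaims, "application-secret"); // placeholder secret
console.log(token);
```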

Account merging

As with Users, Accounts without an external_id can be merged. These are marked as mergeable internally. Only Accounts that are explicitly mergeable can be merged.

If the resolution step results in the merging of two Accounts, the resulting Account is the recipient of the merge.

The merge operation is destructive and will:

  • Merge Attributes of the merged Accounts to the recipient Account.
  • Re-associate all identifiers such as external_id, domain and anonymous_ids
  • Re-parent and re-index all users that belonged to these accounts

If there are different values for the same attribute, the recipient of the merge will keep its original value.

Example:

// Recipient account
{
  "id" : "1",
  "external_id": "123",
  "name": "Hull",
  "domain" : "hull.io",
  "created_at" : "2018-10-06T14:41:07Z"
}

// Account to be merged
{
  "id" : "2",
  "name": "Hull.io",
  "domain" : "hull.io",
  "created_at" : "2018-10-08T12:05:17Z",
  "hubspot/state" : "opportunity"
}


// Resulting Account
{
  "id" : "1",
  "external_id": "123",
  "name": "Hull",
  "domain" : "hull.io",
  "created_at" : "2018-10-06T14:41:07Z",
  "hubspot/state" : "opportunity"
}
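
The attribute rule shown in this example (the recipient keeps its own value on conflict and gains any Attribute it lacks) can be sketched as below. The mergeAttributes function is hypothetical, and treating a null value on the recipient as missing is an assumption; identifier re-association and re-parenting are not modelled.

```javascript
// Hypothetical helper: merges Attributes of the merged entity into the recipient.
function mergeAttributes(recipient, merged) {
  const result = { ...recipient };
  for (const [key, value] of Object.entries(merged)) {
    // The recipient keeps its original value; only gaps are filled.
    if (result[key] === undefined || result[key] === null) {
      result[key] = value;
    }
  }
  return result;
}

const resulting = mergeAttributes(
  { id: "1", external_id: "123", name: "Hull", domain: "hull.io", created_at: "2018-10-06T14:41:07Z" },
  { id: "2", name: "Hull.io", domain: "hull.io", created_at: "2018-10-08T12:05:17Z", "hubspot/state": "opportunity" }
);
console.log(resulting);
```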

User & Account Reports

The final step of ingestion of User & Account data is to prepare the User Report and Account Report. This is the record that will be indexed and made available for search and segmentation, and visible in the User Profile and Account Profile.

User Report Example

Here is an example of a User Report. It is formatted with:

  • Identifiers
  • Account
  • Identities
  • Sessions
  • Attributes

{
  // Root
  "id": "50cf040bb85d0c8031000001",
  "external_id": "123-456-789",
  "created_at": "2012-12-17T11:37:47Z",
  "email": "hello@hull.io",
  "domain": "hull.io",
  "name": "Stephane Hull",
  "last_name": "Hull",
  "first_name": "Hello",
  "address_city": "Paris",
  "address_state": "Ile-de-France",
  "address_country": "France",
  "accepts_marketing": false,
  "is_approved": true,
  "has_password": true,
  "anonymous_ids": ["310dd12c-a1f3-2dee-54c2-c12426d1367b"],
  "segment_ids": ["56a7902c8d371442330000ee", "595296234d8debfb330026a0"],
  "sign_up_url": "https://accounts.hullapp.io/",

  // Account
  "account": {
    "id": "5a04118fea0662ec4b0471fb",
    "domain": "hull.io",
    "clearbit/name": "hull",
    "created_at": "2017-11-09T08:27:59Z",
    "updated_at": "2017-11-20T20:29:05Z"
  },

  // Identities
  "identities_count":1,
  "main_identity": "github",
  "github_connected_at": "2012-12-17T11:37:47Z",
  "github_id": "42",
  "github_username": "hull",
  "google_connected_at": "2014-03-06T13:10:12Z",
  "google_id": "117779341887200000000",

  // Sessions
  "last_seen_at": "2018-02-11T14:22:47Z",
  "first_seen_at": "2016-09-07T08:30:02Z",
  "first_session_started_at": "2016-09-07T08:30:02Z",
  "first_session_platform_id": "53175bb2635c78c8790032cd",
  "first_session_initial_url": "https://www.hull.io/features",
  "first_session_initial_referrer": "https://www.google.com/",
  "signup_session_started_at": "2016-09-07T08:30:02Z",
  "signup_session_platform_id": "53175bb2635c78c8790032cd",
  "signup_session_initial_url": "https://www.hull.io/features",
  "signup_session_initial_referrer": "https://www.google.com",
  "latest_session_started_at": "2018-02-11T14:19:26Z",
  "latest_session_platform_id": "53175bb2635c78c8790032cd",
  "latest_session_initial_url": "https://dashboard.hullapp.io/",
  "latest_session_initial_referrer": "",

  // Attributes
  "traits_request_demo": true,
  "traits_company_name": "hull.io",
  "traits_nps_rating": 10,
  "traits_nps_score": 100,
  "traits_intercom_email": "hello@hull.io",
  "traits_salesforce_lead/status": "New",
  "traits_salesforce_lead/owner_id": "00546000001HobtAAC",
  "traits_salesforce_lead/company": "hull.io"
}

Account Report Example

Here is an example of an Account Report payload.

Note that when an Account Report is built, this also schedules a rebuild of all associated User Reports.

{
    "id": "5a04118fea0662ec4b0471fb",
    "name": "Hull",
    "domain": "hull.io",
    "clearbit/name": "hull",
    "created_at": "2017-11-09T08:27:59Z",
    "updated_at": "2017-11-20T20:29:05Z"
}

Logs for Incoming Notifications

You can view and query incoming data from Connectors to Hull in the logs.

Incoming data are logged with the incoming.{entity}.{status} format. These are visible in the Logs view on the Dashboard, or Logs view within each Connector’s page (for viewing logs for a specific Connector only).

Log Type | Description
incoming.user.success | User Attributes have been updated successfully
incoming.user.error | User Attributes have not been updated successfully
incoming.user.skip | User update was skipped
incoming.account.success | Account Attributes have been updated successfully
incoming.account.error | Account Attributes have not been updated successfully
incoming.account.skip | Account update was skipped
incoming.event.success | Event was successfully ingested
incoming.event.error | Event was not successfully ingested
incoming.event.skip | Event was skipped
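
When processing exported logs, the incoming.{entity}.{status} format can be split into its parts. A minimal sketch, with a hypothetical parseIncomingLogType function that accepts only the entities and statuses listed above:

```javascript
const ENTITIES = ["user", "account", "event"];
const STATUSES = ["success", "error", "skip"];

// Hypothetical helper: returns { entity, status } or null for anything else.
function parseIncomingLogType(logType) {
  const [direction, entity, status, ...rest] = logType.split(".");
  if (direction !== "incoming" || rest.length > 0) return null;
  if (!ENTITIES.includes(entity) || !STATUSES.includes(status)) return null;
  return { entity, status };
}

console.log(parseIncomingLogType("incoming.user.success"));
console.log(parseIncomingLogType("incoming.user.unknown"));
```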

Identifiers for Incoming Logs

All logs feature an identifier to associate Users and Accounts with the logs. Learn more about identifiers and identity resolution on Hull. For incoming notifications these include:

Identifier | Description
user_id | Hull User ID
user_email | User email
user_external_id | External ID on the User
user_anonymous_id | Identifier from the external service or anonymous ID from web sessions
account_id | ID on the Account
account_external_id | External ID on the Account
account_domain | Account domain

Connector logs

All data sent out through Connectors is logged and queryable by:

Connector identifier | Description
connector_name | Reference name of the connector (e.g. salesforce or processor)
connector_id | ID of the connector