Ingest


The first stage of the data lifecycle is to ingest data. Ingestion has four stages:

  1. Data capture
  2. Data formatting
  3. Identity resolution
  4. Publishing User and Account Reports

Data ingestion in Hull

The Hull Platform captures data through two APIs:

  • Firehose API designed for streaming updates (e.g. from connectors)
  • Import API designed for bulk updates (e.g. SQL Importer jobs)

Captured data is then cast into Traits and Events. Traits and Events are associated with Entities (Users & Accounts) according to claims. Updates are then published as User and Account Reports.

Firehose API

All connectors sync data to Hull through the Firehose API. This is exposed as a single endpoint on firehose.hullapp.io.

Messages

Each message in the batch represents a Firehose::Event to ingest. It is composed of:

  • type: traits, track or alias
  • body: Payload of the message to ingest
  • headers: Context for the message, including Hull-Access-Token, which contains a JWT with a set of claims signed with the Application secret. These claims are the data points used in identity resolution.
  • timestamp: Local timestamp of the client that generated the payload

Here is an example payload for the Firehose API.

POST https://firehose.hullapp.io/
> Hull-Organization: example.hullapp.io
> Hull-App-Id: 5a7962e2793d3e942a03b29a
> Hull-Access-Token: 123456789
> Content-Type: application/json
{
  "batch": [{
    "type": "traits",
    "body": {
      "email": "bob@bob.com",
      "name": "Bobby Lapointe",
      "country": "France",
      "city": "Paris"
    },
    "headers": {
      "Hull-Access-Token": "eyJ0eXA..."
    },
    "timestamp": "2018-02-06T09:35:11.146Z"
  }, {
    "type": "track",
    "body": {
      "ip": "1.2.3.4",
      "url": "https://www.hull.io/docs",
      "referrer": "https://www.google.com",
      "event_id": "8fbebdde-68e3-43cf-ae45-0edc8057ab8e",
      "properties": { "name": "iPhone" },
      "event": "Viewed Product"
    },
    "headers": {
      "Hull-Access-Token": "eyJ0eXA..."
    },
    "timestamp": "2018-02-06T09:35:11.574Z"
  }],
  "timestamp": "2018-02-06T09:35:12.576Z",
  "sentAt": "2018-02-06T09:35:12.576Z"
}
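The request above can be assembled with Python's standard library. This is a minimal sketch: the endpoint, header names and message fields come from the example, while the helper names (`build_message`, `build_firehose_request`) and the token values are hypothetical placeholders. Building the `Request` does not send anything; passing it to `urllib.request.urlopen` would.

```python
import json
import urllib.request
from datetime import datetime, timezone


def iso_now() -> str:
    # ISO-8601 UTC timestamp with milliseconds, e.g. 2018-02-06T09:35:11.146Z
    return datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")


def build_message(msg_type: str, body: dict, access_token: str) -> dict:
    # One entry of the "batch" array: type, body, headers and a client-side timestamp
    if msg_type not in ("traits", "track", "alias"):
        raise ValueError(f"unsupported message type: {msg_type}")
    return {
        "type": msg_type,
        "body": body,
        "headers": {"Hull-Access-Token": access_token},
        "timestamp": iso_now(),
    }


def build_firehose_request(organization: str, app_id: str, access_token: str,
                           messages: list) -> urllib.request.Request:
    sent_at = iso_now()
    payload = {"batch": messages, "timestamp": sent_at, "sentAt": sent_at}
    return urllib.request.Request(
        "https://firehose.hullapp.io/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Hull-Organization": organization,
            "Hull-App-Id": app_id,
            "Hull-Access-Token": access_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical tokens; real ones are issued for your connector
req = build_firehose_request(
    "example.hullapp.io", "5a7962e2793d3e942a03b29a", "123456789",
    [build_message("traits", {"email": "bob@bob.com", "city": "Paris"}, "eyJ0eXA...")],
)
```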

Messaging Lanes (including Fastlane)

Unless otherwise marked, all messages will be processed first in, first out.

For some use cases (such as real-time web personalization), ingestion needs to be accelerated. To enable this, Hull can prioritize ingestion and computation for marked Users. Users marked as active enter FastLane for 10 minutes.

Notifications of Events (including Attributes Changed) for all active Users are prioritized ahead of those for other Users. Note: end-to-end data flow speed may still depend on the API limits of the external service.
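The lane behavior can be pictured as two queues: a first-in, first-out lane for ordinary messages and a priority lane for Users marked active within the last 10 minutes. This is an illustrative model only, not Hull's implementation; the `LaneScheduler` class and its methods are hypothetical.

```python
import heapq
import itertools

FASTLANE_WINDOW_SECONDS = 10 * 60  # active Users stay in the fast lane for 10 minutes


class LaneScheduler:
    """Illustrative model (hypothetical): fast-lane messages first, FIFO within each lane."""

    def __init__(self):
        self._heap = []                # entries are (lane, sequence, message)
        self._seq = itertools.count()  # keeps first-in, first-out order within a lane
        self._active_until = {}        # user_id -> time until which the User is in the fast lane

    def mark_active(self, user_id: str, now: float) -> None:
        self._active_until[user_id] = now + FASTLANE_WINDOW_SECONDS

    def push(self, user_id: str, message: dict, now: float) -> None:
        in_fastlane = self._active_until.get(user_id, float("-inf")) > now
        lane = 0 if in_fastlane else 1  # lane 0 is always served before lane 1
        heapq.heappush(self._heap, (lane, next(self._seq), message))

    def pop(self) -> dict:
        return heapq.heappop(self._heap)[2]
```

The sequence number doubles as a tie-breaker, so messages in the same lane never need to be compared and keep their arrival order.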

Import API

Bulk updates can be imported into your Hull Organization through the Import API. Data imports are processed separately from the Firehose API to minimize the impact on ingestion of live data.

All data to import must be:

See our Import API reference documentation

Importing Users

Create and update Users in bulk through the Import API. Every imported record MUST include at least a valid email or userId identifier to associate it with a User (anonymous_id is not supported by the Import API).

User-Account associations can be specified by adding an accountId identifier in the record.

{
  "userId": "123",
  "accountId": "456",
  "traits": {
    "email" : "john@coltrane.com",
    "name" : "John Coltrane"
  }
}

Importing Accounts

Create and update Accounts in bulk through the Import API. Every imported record MUST include an accountId identifier.

{
  "accountId": "111",
  "traits" : {
    "domain" : "hull.io",
    "name" : "Hull"
  }
}
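Before submitting a bulk import, it can help to check records locally against the rules above: User records need a userId or a valid email, anonymous_id is rejected, and Account records need an accountId. This is a hypothetical pre-flight check, not part of the Import API. Accepting the email either at the top level or inside traits is an assumption based on the example, and the "@" test is only a loose validity check.

```python
def validate_user_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record looks importable."""
    errors = []
    traits = record.get("traits") or {}
    # Assumption: the email may sit at the top level or inside "traits"
    email = record.get("email") or traits.get("email")
    has_valid_email = isinstance(email, str) and "@" in email  # deliberately loose check
    if not record.get("userId") and not has_valid_email:
        errors.append("record needs a userId or a valid email")
    if "anonymous_id" in record:
        errors.append("anonymous_id is not supported by the Import API")
    return errors


def validate_account_record(record: dict) -> list:
    """Account records must carry an accountId."""
    return [] if record.get("accountId") else ["record needs an accountId"]
```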

Importing User Events

You can import User Events in bulk through the Import API. Each event triggers a User lookup before it is imported, creating or updating the User as needed.

{
  "userId":"12453",
  "timestamp":"2018-04-16T00:00:28.000Z",
  "event": "User Registered",
  "eventId":"1754752",
  "properties":{
    "email": "ed@hull.io",
    "plan": "business",
    "price": 129.00
  }
}

Trait Data Formats

Traits are properties collected and associated with Entities (Users and Accounts). Traits are ingested from:

  • traits events from the Firehose API
  • User or Account records from the Import API

Trait types & type detection

Hull discovers new Traits as new data is ingested and builds a Schema of known Traits for Users and Accounts. These are visible and managed in the Attributes view on the Dashboard.


Supported trait values include:

  • Strings
  • Arrays of strings
  • Numeric
  • Booleans
  • Dates (ISO-8601 formatted strings or UNIX timestamps)

Nested objects are not supported and are silently ignored.

Trait types are detected when a Trait is first added to the Schema, based on the name of the Trait or the first value captured:

  • If the Trait name ends with _at or _date, the Trait is typed as a Date
  • Otherwise, the type is set by the first value captured (String, Numeric or Boolean)

Trait names are lowercased during ingestion to make them case-insensitive.

The characters . and $ are not allowed in Trait names and are silently ignored.
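The detection and naming rules above can be summarized in a few lines. This is a sketch of the documented rules, not Hull's code: the function names are hypothetical, stripping . and $ (rather than dropping the whole Trait) is one reading of "silently ignored", and giving arrays their own "array" type is an assumption.

```python
def normalize_trait_name(name: str) -> str:
    # Names are lowercased, and the disallowed characters "." and "$" are dropped
    return name.lower().replace(".", "").replace("$", "")


def detect_trait_type(name: str, first_value):
    """Return the Schema type for a new Trait, or None if it is ignored."""
    if isinstance(first_value, dict):
        return None  # nested objects are not supported and are ignored
    if normalize_trait_name(name).endswith(("_at", "_date")):
        return "date"  # the name alone decides Date typing
    if isinstance(first_value, bool):  # check bool before int: bool subclasses int in Python
        return "boolean"
    if isinstance(first_value, (int, float)):
        return "number"
    if isinstance(first_value, list):
        return "array"  # assumption: arrays of strings get their own type
    return "string"
```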

Traits casting

Incoming values are cast to the type recorded in the Schema for that Trait.

If a value cannot be cast because it is incompatible with that type, the result is a null value.
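A simplified model of casting against the Schema type, returning None (null) when the value is incompatible. The exact coercions Hull applies are not documented here, so the conversions below (accepting numeric strings for number Traits, "true"/"false" strings for booleans, and UNIX timestamps in seconds for dates) are assumptions, and `cast_trait` is a hypothetical name.

```python
from datetime import datetime, timezone


def cast_trait(value, trait_type: str):
    """Cast value to the Schema type; return None when it is not compatible."""
    if value is None:
        return None
    try:
        if trait_type == "string":
            return None if isinstance(value, (dict, list)) else str(value)
        if trait_type == "number":
            # Assumption: numeric strings are accepted; booleans are not numbers here
            return None if isinstance(value, bool) else float(value)
        if trait_type == "boolean":
            if isinstance(value, bool):
                return value
            return value == "true" if value in ("true", "false") else None
        if trait_type == "date":
            if isinstance(value, bool):
                return None
            if isinstance(value, (int, float)):
                # Assumption: UNIX timestamps are expressed in seconds
                return datetime.fromtimestamp(value, tz=timezone.utc)
            return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except (TypeError, ValueError, AttributeError, OverflowError, OSError):
        return None  # incompatible value: cast to null
    return None
```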

Updating Traits

Values can create new Traits or update existing ones. Values can also be formatted as atomic operations to be applied to existing Traits:

  • setIfNull: { "foo" : { "operation" : "setIfNull", "value" : "bar" } }
  • inc: { "foo" : { "operation" : "inc", "value" : 100 } }
  • dec: { "foo" : { "operation" : "dec", "value" : 100 } }
  • set: { "foo" : { "operation" : "set", "value" : "bar" } }
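One way to picture how these operators combine with a User's current Traits is the sketch below. It assumes that a plain value behaves like set and that inc and dec treat a missing Trait as 0; neither detail is stated above, and `apply_trait_updates` is a hypothetical helper.

```python
def apply_trait_updates(current: dict, updates: dict) -> dict:
    """Return a new Traits dict with updates applied; `current` is left unchanged."""
    result = dict(current)
    for name, update in updates.items():
        if isinstance(update, dict) and "operation" in update:
            op, value = update["operation"], update.get("value")
        else:
            op, value = "set", update  # assumption: a plain value is a "set"
        if op == "set":
            result[name] = value
        elif op == "setIfNull":
            if result.get(name) is None:  # only write when there is no value yet
                result[name] = value
        elif op == "inc":
            result[name] = (result.get(name) or 0) + value  # assumption: missing counts as 0
        elif op == "dec":
            result[name] = (result.get(name) or 0) - value
        else:
            raise ValueError(f"unknown operation: {op}")
    return result
```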

Grouping Traits

Traits are recorded in a flat key/value structure; Hull does not support nested objects.

Related Traits can be visually grouped by giving them a common prefix followed by /. For example, all Salesforce Traits:

{
    "type": "traits",
    "body": {
      "salesforce/id": "123",
      "salesforce/name": "Bobby Lapointe",
      "salesforce/type": "Contact"
    },
    "headers": {
      "Hull-Access-Token": "eyJ0eXA..."
    },
    "timestamp": "2018-02-06T09:35:11.146Z"
}
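Because nested objects are ignored, data that arrives nested (for example, from a third-party API) has to be flattened before it is sent. The hypothetical helper below prefixes every key with the group name and /. Joining deeper levels with _ (so an address object becomes salesforce/address_city) is an assumption, since only a single-level prefix is described above.

```python
def group_traits(group: str, data: dict) -> dict:
    """Flatten `data` into `group/key` Traits; deeper levels are joined with "_"."""
    flat = {}

    def walk(obj: dict, path: str) -> None:
        for key, value in obj.items():
            name = f"{path}_{key}" if path else key
            if isinstance(value, dict):
                walk(value, name)  # recurse instead of sending a nested object
            else:
                flat[f"{group}/{name}"] = value

    walk(data, "")
    return flat
```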

Event data formats

Events are actions collected and associated with Users. Events are ingested from:

  • track events from the Firehose API
  • User Event records from the Import API

Events have the following values:

  • Event name
  • Unique event_id
  • Properties

The body of each event contains the following entries:

  • event: The event name
  • event_id: An ID, unique within the Hull Organization, for each event
  • properties: Additional contextual data about the event

Event properties

Hull can capture an unlimited number of event properties for each tracked Event.

Event properties are stored as a set of flat key values associated to the event. No schema is enforced.

The following contextual data can also be associated with each event:

  • source: Defines a namespace (e.g. stripe)
  • type: Defines an event type (e.g. email)
  • created_at: Defines the event date. Defaults to now()
  • ip: Defines the Event’s IP. Set it to null for server-side calls; otherwise, geoIP will be used to locate the event.
  • referrer: Defines the referrer. null for server calls.
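A minimal sketch of a track message body combining the fields above: a unique event_id, the event name, flat properties and the optional contextual fields. For server-side calls, ip and referrer are set to null (None) so that no geoIP lookup applies. Generating event_id with `uuid.uuid4()` and the `track_body` function are assumptions; the field names come from the lists above and the Firehose example.

```python
import uuid
from typing import Optional


def track_body(event: str, properties: dict, *, server_side: bool,
               ip: Optional[str] = None, referrer: Optional[str] = None,
               source: Optional[str] = None, event_type: Optional[str] = None,
               created_at: Optional[str] = None) -> dict:
    if any(isinstance(v, dict) for v in properties.values()):
        raise ValueError("event properties must be flat key/values")
    body = {
        "event": event,
        "event_id": str(uuid.uuid4()),  # assumption: a random UUID is unique enough
        "properties": properties,
        # Server calls: null ip and referrer so the event is not geolocated
        "ip": None if server_side else ip,
        "referrer": None if server_side else referrer,
    }
    if source is not None:
        body["source"] = source          # namespace, e.g. "stripe"
    if event_type is not None:
        body["type"] = event_type        # e.g. "email"
    if created_at is not None:
        body["created_at"] = created_at  # when omitted, the date defaults to now()
    return body
```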

Identity Resolution

To associate ingested data with Entities (Users & Accounts), Hull operates on a set of identifiers.

Users and Accounts have different identity resolution strategies.

User Identity Resolution strategy

The following identifiers are used in order of priority to resolve User data:

  • id (Hull ID): We do not recommend using this identifier, since IDs can be deleted at any moment when a User is deleted or merged.
  • external_id: The primary, unique and most stable way to reference a User. If external_id is used, the email and anonymous_id identifiers will be used to merge Users.
  • email: Email address. Duplicate emails are allowed (some use cases require it); however, these will be merged if there is no stronger identifier.
  • anonymous_id: Anonymous IDs (e.g. website visitors before signup) and aliases from third-party tools. Designed to alias anonymous traffic when visitors later sign up and identify themselves. This is also where the IDs from third-party tools (like HubSpot, Mailchimp, Intercom etc.) are stored.

At least one identifier MUST be present for the resolution step to return a User.
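The priority order can be modelled as a lookup over an in-memory index, where the highest-priority identifier that matches a known User wins. This sketch only illustrates the ordering: it ignores merging and User creation, the index structure and `resolve_user` are hypothetical, and falling through to the next identifier when one does not match is an assumption.

```python
from typing import Optional

USER_IDENTIFIER_PRIORITY = ("id", "external_id", "email", "anonymous_id")


def resolve_user(claims: dict, index: dict) -> Optional[str]:
    """Return the User id matched by the highest-priority identifier, if any.

    `index` maps (identifier, value) pairs to User ids, e.g. ("email", "a@b.com") -> "u1".
    """
    present = [key for key in USER_IDENTIFIER_PRIORITY if claims.get(key)]
    if not present:
        raise ValueError("at least one identifier is required")
    for key in present:
        user_id = index.get((key, claims[key]))
        if user_id is not None:
            return user_id
    return None  # no match: a new User would be created in practice
```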

User merging

Users without an external_id can be merged. These are marked as mergeable. Only Users that are explicitly mergeable can be merged.

If the resolution step results in the merging of two Users, the returned User is the recipient of the merge.

The merge operation is destructive and will:

  • Merge Traits of the merged User to the recipient User of the merge.
  • Re-associate all identifiers and aliases
  • Re-parent and re-index all events
  • Emit a User merged event that captures the id and Traits of the User being merged

If there are different values for the same Trait, the recipient of the merge will keep its original value.
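The merge rules above can be sketched as a pure function over two simplified User records: the recipient keeps its own value for any conflicting Trait, gains the Traits it lacks, takes over the merged User's anonymous IDs and events, and an event capturing the merged User's id and Traits is produced. The record shape, the mergeable flag, and the event's property keys are hypothetical.

```python
import copy


def merge_users(recipient: dict, merged: dict) -> tuple:
    """Merge `merged` into `recipient`; return (updated recipient, merge event)."""
    if not merged.get("mergeable"):
        raise ValueError("only Users marked as mergeable can be merged")
    result = copy.deepcopy(recipient)
    merged_traits = copy.deepcopy(merged.get("traits", {}))
    # Conflicting Traits keep the recipient's original value
    result["traits"] = {**merged_traits, **result.get("traits", {})}
    # Re-associate identifiers and aliases (order-preserving union, no duplicates)
    result["anonymous_ids"] = list(dict.fromkeys(
        result.get("anonymous_ids", []) + merged.get("anonymous_ids", [])))
    # Re-parent events so they belong to the recipient
    result["events"] = result.get("events", []) + [
        {**event, "user_id": result["id"]} for event in merged.get("events", [])]
    merge_event = {
        "event": "User merged",
        "properties": {"merged_user_id": merged["id"], "merged_traits": merged_traits},
    }
    return result, merge_event
```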

User aliasing

The anonymous_id identifier and the alias event are used to alias Users. Aliasing associates multiple identifiers with a single User (e.g. IDs from many third-party tools).

Account Identity Resolution strategy

The following identifiers are used in order of priority to resolve Account data:

  • id (internal Hull identifier): We do not recommend using this identifier, since IDs can be deleted at any moment when an Account is deleted or merged.
  • external_id: The primary, unique and most stable way to reference an Account.
  • domain: Website domain name. The domain can be sourced from third-party tools, or inferred from the email address (e.g. @intercom.com).

Account merging and Account aliasing are not supported.

Free email domains (e.g. gmail.com) are ignored when passing domain claims.
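When a domain is inferred from an email address, free email domains have to be skipped. The sketch below shows that step together with the id, external_id, domain priority order. The `FREE_EMAIL_DOMAINS` set is a small hypothetical sample, not Hull's actual list, and both helper functions are illustrative.

```python
from typing import Optional

# Hypothetical sample; Hull maintains its own list of free email domains
FREE_EMAIL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com"}
ACCOUNT_IDENTIFIER_PRIORITY = ("id", "external_id", "domain")


def domain_from_email(email: str) -> Optional[str]:
    domain = email.rsplit("@", 1)[-1].strip().lower() if "@" in email else ""
    if not domain or domain in FREE_EMAIL_DOMAINS:
        return None  # free email domains are not used as domain claims
    return domain


def account_claims(claims: dict, email: Optional[str] = None) -> dict:
    """Return the Account identifiers that are set, in resolution priority order."""
    domain = claims.get("domain") or (domain_from_email(email) if email else None)
    if domain and domain.lower() in FREE_EMAIL_DOMAINS:
        domain = None  # ignore free email domains even when passed explicitly
    merged = {**claims, "domain": domain}
    return {key: merged[key] for key in ACCOUNT_IDENTIFIER_PRIORITY if merged.get(key)}
```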

Resolving and linking Users and Accounts

Special tokens with both User and Account identifiers can be built to link a User to an Account. The link is declarative: the ingestion step evaluates all the identifiers contained in the token and always returns both a User and an Account.

For example, the following claims resolve the Account with the domain example.com and link it to the User with the email hello@example.com:

{
  "io.hull.asUser": { "email": "hello@example.com" },
  "io.hull.asAccount": { "domain": "example.com" },
  "io.hull.subjectType": "account"
}
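Since the Hull-Access-Token header carries a JWT signed with the Application secret, a token holding these claims can be produced as a standard compact JWT. The sketch below signs with HS256 using only the standard library; the choice of HS256, the absence of other registered claims (such as iss or exp), and the secret value are assumptions, so check what your connector framework expects. In practice a maintained JWT library would normally be used instead.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    # JWT segments use base64url encoding without "=" padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_claims(claims: dict, app_secret: str) -> str:
    """Return a compact JWT for `claims`, signed with HMAC-SHA256 (assumed algorithm)."""
    header = {"typ": "JWT", "alg": "HS256"}
    segments = [b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
                for part in (header, claims)]
    signing_input = ".".join(segments).encode("ascii")
    signature = hmac.new(app_secret.encode("utf-8"), signing_input, hashlib.sha256).digest()
    return ".".join(segments + [b64url(signature)])


claims = {
    "io.hull.asUser": {"email": "hello@example.com"},
    "io.hull.asAccount": {"domain": "example.com"},
    "io.hull.subjectType": "account",
}
token = sign_claims(claims, "my-application-secret")  # hypothetical secret
```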

User & Account Reports

The final step of ingestion of User & Account data is to prepare the User Report and Account Report. This is the record that will be indexed and made available for search and segmentation, and visible in the User Profile and Account Profile.

User Report Example

Here is an example of a User Report. It is organized into the following sections:

  • Identifiers
  • Account
  • Identities
  • Sessions
  • Traits

{
  // Root
  "id": "50cf040bb85d0c8031000001",
  "external_id": "123-456-789",
  "created_at": "2012-12-17T11:37:47Z",
  "email": "hello@hull.io",
  "domain": "hull.io",
  "name": "Stephane Hull",
  "last_name": "Hull",
  "first_name": "Hello",
  "address_city": "Paris",
  "address_state": "Ile-de-France",
  "address_country": "France",
  "accepts_marketing": false,
  "is_approved": true,
  "has_password": true,
  "anonymous_ids": ["310dd12c-a1f3-2dee-54c2-c12426d1367b"],
  "segment_ids": ["56a7902c8d371442330000ee", "595296234d8debfb330026a0"],
  "sign_up_url": "https://accounts.hullapp.io/",

  // Account
  "account": {
    "id": "5a04118fea0662ec4b0471fb",
    "domain": "hull.io",
    "clearbit/name": "hull",
    "created_at": "2017-11-09T08:27:59Z",
    "updated_at": "2017-11-20T20:29:05Z"
  },

  // Identities
  "identities_count":1,
  "main_identity": "github",
  "github_connected_at": "2012-12-17T11:37:47Z",
  "github_id": "42",
  "github_username": "hull",
  "google_connected_at": "2014-03-06T13:10:12Z",
  "google_id": "117779341887200000000",

  // Sessions
  "last_seen_at": "2018-02-11T14:22:47Z",
  "first_seen_at": "2016-09-07T08:30:02Z",
  "first_session_started_at": "2016-09-07T08:30:02Z",
  "first_session_platform_id": "53175bb2635c78c8790032cd",
  "first_session_initial_url": "https://www.hull.io/features",
  "first_session_initial_referrer": "https://www.google.com/",
  "signup_session_started_at": "2016-09-07T08:30:02Z",
  "signup_session_platform_id": "53175bb2635c78c8790032cd",
  "signup_session_initial_url": "https://www.hull.io/features",
  "signup_session_initial_referrer": "https://www.google.com",
  "latest_session_started_at": "2018-02-11T14:19:26Z",
  "latest_session_platform_id": "53175bb2635c78c8790032cd",
  "latest_session_initial_url": "https://dashboard.hullapp.io/",
  "latest_session_initial_referrer": "",

  // Traits
  "traits_request_demo": true,
  "traits_company_name": "hull.io",
  "traits_nps_rating": 10,
  "traits_nps_score": 100,
  "traits_intercom_email": "hello@hull.io",
  "traits_salesforce_lead/status": "New",
  "traits_salesforce_lead/owner_id": "00546000001HobtAAC",
  "traits_salesforce_lead/company": "hull.io"
}
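In the User Report, custom Traits appear at the root with a traits_ prefix, and grouped Traits keep their / separator. A hypothetical helper that pulls those Traits out of a report and groups them by prefix could look like this; it assumes the report has already been parsed into a dict (the // comments above are annotations, not valid JSON).

```python
def custom_traits(report: dict) -> dict:
    """Return {group: {name: value}} for traits_* keys; ungrouped Traits go under ""."""
    groups = {}
    for key, value in report.items():
        if not key.startswith("traits_"):
            continue
        name = key[len("traits_"):]
        group, _, trait = name.rpartition("/")  # group is "" when there is no "/"
        groups.setdefault(group, {})[trait] = value
    return groups
```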

Account Report Example

Here is an example of an Account Report payload.

Note that when an Account Report is built, this also schedules a rebuild of all associated User Reports.

{
    "id": "5a04118fea0662ec4b0471fb",
    "domain": "hull.io",
    "clearbit/name": "hull",
    "created_at": "2017-11-09T08:27:59Z",
    "updated_at": "2017-11-20T20:29:05Z"
}

Logs for Incoming Notifications

You can view and query incoming data from Connectors to Hull in the logs.

Incoming data are logged with the incoming.{entity}.{status} format. These are visible in the Logs view on the Dashboard, or Logs view within each Connector’s page (for viewing logs for a specific Connector only).

  • incoming.user.success: User Traits have been updated successfully
  • incoming.user.error: User Traits have not been updated successfully
  • incoming.user.skip: User update was skipped
  • incoming.account.success: Account Traits have been updated successfully
  • incoming.account.error: Account Traits have not been updated successfully
  • incoming.account.skip: Account update was skipped
  • incoming.event.success: Event was successfully ingested
  • incoming.event.error: Event was not successfully ingested
  • incoming.event.skip: Event was skipped
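Because log types follow the incoming.{entity}.{status} pattern, they are easy to split when filtering exported log lines. The record shape below (a dict with a message key holding the log type) and both helper names are hypothetical.

```python
from typing import Iterable, Optional

ENTITIES = {"user", "account", "event"}
STATUSES = {"success", "error", "skip"}


def parse_log_type(log_type: str) -> Optional[tuple]:
    """Split 'incoming.user.error' into ('user', 'error'); None if it does not match."""
    parts = log_type.split(".")
    if len(parts) != 3 or parts[0] != "incoming":
        return None
    entity, status = parts[1], parts[2]
    if entity not in ENTITIES or status not in STATUSES:
        return None
    return entity, status


def errors_for(records: Iterable[dict], entity: str) -> list:
    # Keep only incoming error logs for the given entity
    return [r for r in records if parse_log_type(r.get("message", "")) == (entity, "error")]
```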

Identifiers for Incoming Logs

All logs feature an identifier to associate Users and Accounts with the logs. Learn more about identifiers and identity resolution on Hull. For incoming notifications these include:

  • user_id: Hull User ID
  • user_email: User email
  • user_external_id: External ID on the User
  • user_anonymous_id: Identifier from the external service or anonymous ID from web sessions
  • account_id: ID on the Account
  • account_external_id: External ID on the Account
  • account_domain: Account domain

Connector logs

All data sent out through Connectors is logged and queryable by:

  • connector_name: Reference name of the connector (e.g. salesforce or processor)
  • connector_id: ID of the connector