Ingest


The Firehose

The Firehose API is exposed as a single endpoint on firehose.hullapp.io. Its only responsibility is to capture incoming traffic, most of which comes from connectors.

  POST https://firehose.hullapp.io/
  > Hull-Organization: example.hullapp.io
  > Hull-App-Id: 5a7962e2793d3e942a03b29a
  > Hull-Access-Token: 123456789
  > Content-Type: application/json
  {
    "batch": [{
      "type": "traits",
      "body": {
        "email": "bob@bob.com",
        "name": "Bobby Lapointe",
        "country": "France",
        "city": "Paris"
      },
      "headers": {
        "Hull-Access-Token": "eyJ0eXA..."
      },
      "timestamp": "2018-02-06T09:35:11.146Z"
    }, {
      "type": "track",
      "body": {
        "ip": "1.2.3.4",
        "url": "https://www.hull.io/docs",
        "referrer": "https://www.google.com",
        "event_id": "8fbebdde-68e3-43cf-ae45-0edc8057ab8e",
        "properties": { "name": "iPhone" },
        "event": "Viewed Product"
      },
      "headers": {
        "Hull-Access-Token": "eyJ0eXA..."
      },
      "timestamp": "2018-02-06T09:35:11.574Z"
    }],
    "timestamp": "2018-02-06T09:35:12.576Z",
    "sentAt": "2018-02-06T09:35:12.576Z"
  }

Each message in the batch represents a Firehose::Event to ingest.

It is composed of

  • type which can be either traits, track or alias
  • body which contains the payload of the message to ingest
  • headers which contains headers that are contextual to the message. Hull-Access-Token contains a JWT with a set of claims signed with the Application secret. Those claims are the data points used in the identity resolution.
  • timestamp is the local timestamp of the client generating the payload
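The envelope described above can be assembled programmatically. The following is a minimal sketch, assuming only the header names and fields shown in the example request; the helper names (iso_now, make_message, build_request) are hypothetical, and the request is prepared but not sent.

```python
import json
from datetime import datetime, timezone
from urllib.request import Request

FIREHOSE_URL = "https://firehose.hullapp.io/"


def iso_now():
    """Current UTC time with millisecond precision, as in the example request."""
    now = datetime.now(timezone.utc)
    return now.strftime("%Y-%m-%dT%H:%M:%S.") + f"{now.microsecond // 1000:03d}Z"


def make_message(msg_type, body, access_token, timestamp=None):
    """Build one batch entry: type, body, per-message headers, client timestamp."""
    if msg_type not in ("traits", "track", "alias"):
        raise ValueError(f"unsupported message type: {msg_type}")
    return {
        "type": msg_type,
        "body": body,
        "headers": {"Hull-Access-Token": access_token},
        "timestamp": timestamp or iso_now(),
    }


def build_request(organization, app_id, app_token, messages):
    """Wrap messages in the batch envelope and prepare (not send) the POST."""
    sent_at = iso_now()
    payload = {"batch": messages, "timestamp": sent_at, "sentAt": sent_at}
    return Request(
        FIREHOSE_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Hull-Organization": organization,
            "Hull-App-Id": app_id,
            "Hull-Access-Token": app_token,
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen would perform the call; that step is left out so the sketch can be run without credentials.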

Message lanes

Messages can also be decorated with a special marker that marks the resolved user as active and places it in the FastLane for the next 10 minutes.

There are 2 ways to mark a message as active

  • via a dedicated io.hull.active = true claim in the signed token
  • by adding active = true to the payload of a track event

All following messages in the next 10 minutes (or as long as the user is marked as active) will be sent to connectors in a separate lane which is treated with a greater priority than the standard lane messages.
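The lane selection can be pictured with a toy model. This is a sketch under stated assumptions, not Hull's implementation: the LaneRouter class and the lane names "fast" and "standard" are hypothetical, and the sketch assumes the message carrying the marker is itself routed to the fast lane.

```python
from datetime import datetime, timedelta, timezone

ACTIVE_WINDOW = timedelta(minutes=10)  # the "next 10 minutes" from the text


def is_marked_active(message, claims):
    """True if the signed claims or a track body carry the active marker."""
    if claims.get("io.hull.active") is True:
        return True
    body = message.get("body", {})
    return message.get("type") == "track" and body.get("active") is True


class LaneRouter:
    """Toy model that remembers until when each user is considered active."""

    def __init__(self):
        self.active_until = {}

    def lane_for(self, user_id, message, claims, now):
        # A marked message (re)starts the 10 minute window for that user.
        if is_marked_active(message, claims):
            self.active_until[user_id] = now + ACTIVE_WINDOW
        expires = self.active_until.get(user_id)
        return "fast" if expires is not None and now < expires else "standard"
```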

traits event

Firehose events of type traits are used to set properties on an Entity (User or Account).

The body contains a set of key values that will be applied as traits on the Entity once resolved.

Keys that contain $ or . are ignored. These characters are matched by the following expression: KEY_REGEXP = /[\$\.]/

Supported values are:

  • Strings
  • Numeric
  • Array of strings
  • Booleans
  • Dates, either as ISO-8601 formatted strings or UNIX timestamps

Nested objects are ignored.

  • values can be formatted as atomic operations to be applied to existing traits
    • setIfNull { "foo" : { "operation" : "setIfNull", "value" : "bar" } }
    • inc { "foo" : { "operation" : "inc", "value" : 100 } }
    • dec { "foo" : { "operation" : "dec", "value" : 100 } }
    • set { "foo" : { "operation" : "set", "value" : "bar" } }
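The rules above (key filtering, ignored nested objects, atomic operations) can be sketched as a single function. This is an illustration only: apply_traits is a hypothetical name, keys containing $ or . are assumed to be the ones dropped (these characters are not allowed in Trait names), and inc/dec are assumed to start from 0 when the trait has no value yet.

```python
import re

KEY_REGEXP = re.compile(r"[\$\.]")  # characters not allowed in trait names
OPERATIONS = {"setIfNull", "inc", "dec", "set"}


def is_operation(value):
    """An atomic operation is a dict with a known "operation" and a "value"."""
    return (
        isinstance(value, dict)
        and value.get("operation") in OPERATIONS
        and "value" in value
    )


def apply_traits(existing, body):
    """Return a new traits dict with `body` applied on top of `existing`."""
    result = dict(existing)
    for key, value in body.items():
        if KEY_REGEXP.search(key):
            continue  # key contains "$" or ".": ignored
        if is_operation(value):
            op, operand = value["operation"], value["value"]
            current = result.get(key)
            if op == "set":
                result[key] = operand
            elif op == "setIfNull":
                if current is None:
                    result[key] = operand
            elif op == "inc":
                result[key] = (current or 0) + operand  # assumption: starts at 0
            elif op == "dec":
                result[key] = (current or 0) - operand  # assumption: starts at 0
        elif isinstance(value, dict):
            continue  # nested objects are ignored
        else:
            result[key] = value
    return result
```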

Group traits

Traits are recorded in a flat key/value structure. Hull does not support complex nested objects.

However, by convention, keys that share a common prefix delimited by / will be visually grouped as a set of attributes in the dashboard.

example:

{
    "type": "traits",
    "body": {
      "salesforce/id": "123",
      "salesforce/name": "Bobby Lapointe",
      "salesforce/type": "Contact"
    },
    "headers": {
      "Hull-Access-Token": "eyJ0eXA..."
    },
    "timestamp": "2018-02-06T09:35:11.146Z"
}
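The grouping convention amounts to splitting each key on its first /. A minimal sketch, assuming ungrouped keys are collected under an empty prefix (the group_traits name and that choice are illustrative, not part of Hull):

```python
from collections import defaultdict


def group_traits(traits):
    """Group flat trait keys by the prefix before the first "/"."""
    groups = defaultdict(dict)
    for key, value in traits.items():
        prefix, separator, name = key.partition("/")
        if separator:
            groups[prefix][name] = value
        else:
            groups[""][key] = value  # no prefix: kept in the ungrouped bucket
    return dict(groups)
```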

track event

Firehose events of type track are used to record Events on a User (Account events are not yet supported).

The body contains the following entries:

  • event: the event name
  • event_id: an identifier for the event, unique across the whole organization. Can be used to ensure idempotent tracking or imports.
  • properties: a set of flat key values associated to the event. No schema is enforced. Large cardinalities can be an issue though because the schema is discovered along the way and used in the dashboard to suggest existing values.

Additional contextual information supported on the event:

  • source: Defines a namespace, such as zendesk, mailchimp, stripe
  • type: Defines an event type, such as mail, ticket, payment
  • created_at: Defines the event date. Defaults to now()
  • ip: Defines the Event’s IP. Set to null if you’re storing a server call, otherwise geoIP will locate this event.
  • referrer: Defines the Referer. null for server calls.

example:

{
  "type" : "track",
  "body" : {
    "ip": "1.2.3.4",
    "url": "https://www.hull.io/docs/",
    "source": "track",
    "created_at": "2018-04-16T12:21:22+00:00",
    "referer": "https://www.hull.io",
    "active": true,
    "event_id": "76bed736-b7ff-4ee8-a509-b400baa9f32d",
    "properties": {
        "path": "/docs/",
        "referrer": "https://www.hull.io",
        "search": "",
        "title": "Hull Docs",
        "url": "https://stootie.com/stoot/mission/menage-666842"
    },
    "event": "page"
  },
  "headers": { "Hull-Access-Token": "eyJ0eXA..." },
  "timestamp": "2018-04-16T12:21:22+00:00"
}

or

{
  "type" : "track",
  "body" : {
    "ip": null,
    "url": null,
    "referer": null,
    "source": "mailchimp",
    "event_id": "84ecd50ffab30e0fbc7984adf7a0fede",
    "created_at": "2018-04-16T12:21:22+00:00",
    "properties": {
        "campaign_name": "Hello docs",
        "list_id": "39cc4f20ca",
        "list_name": "HULL MASTER LIST",
        "ip": "0"
    },
    "event": "Email Opened"
  },
  "headers": { "Hull-Access-Token": "eyJ0eXA..." },
  "timestamp": "2018-04-16T12:21:22+00:00"
}
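Since event_id can be used to make tracking idempotent, one option is to derive it deterministically from the upstream record, so that replaying the same record yields the same identifier. The sketch below is an assumption about how a client might do this (a name-based UUID over a hypothetical "source:record_id" string); Hull does not prescribe a format for event_id.

```python
import uuid


def stable_event_id(source, record_id):
    """Derive the same event_id every time for the same upstream record."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{source}:{record_id}"))


def server_track_message(event, source, record_id, properties, access_token, created_at):
    """Build a track message for a server-side event (no ip, url or referer)."""
    return {
        "type": "track",
        "body": {
            "ip": None,
            "url": None,
            "referer": None,  # key spelled as in the examples above
            "source": source,
            "event_id": stable_event_id(source, record_id),
            "created_at": created_at,
            "properties": properties,
            "event": event,
        },
        "headers": {"Hull-Access-Token": access_token},
        "timestamp": created_at,
    }
```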

alias event

Firehose events of type alias are used to explicitly alias a User to a specified string (Account aliasing is not yet supported).

{
    "type": "alias",
    "body": { "anonymous_id": "123-456-789" },
    "headers": { "Hull-Access-Token": "eyJ0eXA..." },
    "timestamp": "2018-02-06T09:35:11.574Z"
}

Import data API

The Import API provides a more efficient way to import large volumes of data into your Hull Organization. Data import jobs are processed in the background to minimise the impact on the ingestion of live traffic coming from Connectors.

To start an Import, your data must be accessible via HTTP and formatted as a stream of JSON objects, each line representing a record. The options available when creating Import Jobs are listed in the reference API docs (TODO: link to reference API docs here).

Importing Users

Importing Users means importing User Traits. Each line in the data file MUST include at least a valid email or userId (anonymous_id resolution is not supported in Imports). The resolution rules and priorities are similar to the ones described in the section on Identity resolution. Imported records will either update existing records or create new ones if no record matches the identifiers provided.

User <> Account links can also be specified by adding an accountId identifier to the record.

example data :

{ "userId": "123", "accountId": "456", "traits": { "email" : "john@coltrane.com", "name" : "John Coltrane" } }
{ "userId": "235", "accountId": "216", "traits": { "email" : "miles@davis.com", "name" : "Miles Davis" } }
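A data file in this shape can be produced by serializing one JSON object per line. The sketch below uses hypothetical helper names and only checks that an identifier is present; it does not validate the email format, and where the email sits in the record follows the example data above.

```python
import json


def user_record(traits, user_id=None, account_id=None):
    """Build one import record; requires a userId or an email in the traits."""
    if not user_id and not traits.get("email"):
        raise ValueError("each record needs at least a userId or an email")
    record = {}
    if user_id:
        record["userId"] = user_id
    if account_id:
        record["accountId"] = account_id  # optional User <> Account link
    record["traits"] = traits
    return record


def to_ndjson(records):
    """Serialize records as a stream of JSON objects, one per line."""
    return "".join(json.dumps(record) + "\n" for record in records)
```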

Importing Accounts

Account imports are very similar to User imports. The accountId identifier is mandatory.

example data :

{ "accountId": "111", "traits" : { "domain" : "hull.io", "name" : "Hull" } }
{ "accountId": "126", "traits" : { "domain" : "example.com", "name" : "Acme Corporation" } }

Importing Events

Event imports are currently only available for User Events. The import job performs a User lookup before importing the Events, making sure that Users with the proper identifiers are created before the Events are imported.

example data :

{"userId":"123","timestamp":"2018-04-16T00:00:28.000Z","event":"Completed  Order","eventId":"1754752","properties":{"productName":"Trumpet", "price" : 459.00, "quantity" : 12 }}
{"userId":"235","timestamp":"2018-04-16T00:01:08.000Z","event":"Completed  Order","eventId":"1754753","properties":{"productName":"Saxophone", "price" : 699.00, "quantity" : 1 }}

How Hull resolves Identities (Users & Accounts)

Identity resolution is the first step of the ingestion pipeline and operates on a set of claims that identify Entities. Entities can be Users, Accounts and User/Account links.

Resolving Users

Users resolution operates on the following identifiers and in the following order of priority :

  • id: the internal Hull identifier. We generally recommend not using this claim because an id can disappear at any moment when a User is deleted or merged.
  • external_id: guaranteed to be unique across the organization, this is the primary and most stable way to reference a user. If the external_id is used, the email and anonymous_id claims are used to detect potential candidates for merging.
  • email: duplicate emails are allowed, but the resolution step tries hard to merge users that have the same email.
  • anonymous_id: an anonymous_id or guest_id can be associated with one user only. It is mainly used as the mechanism to alias anonymous traffic to explicit identities when an anonymous user logs in or gives their email address.

At least one claim MUST be present in order for the resolution step to return a user. If the set of claims is valid, the resolution will ALWAYS return a matching user, whether by finding the best match or by creating a new User that matches the claims.

The resolution step MIGHT result in a merging of 2 users. In that case the returned User is the recipient of the merge.
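The lookup order can be illustrated with a toy in-memory store. This is a deliberately simplified sketch, not Hull's algorithm: the UserStore class is hypothetical, it returns the first match in priority order or creates a new User, and it leaves out merging, the allow_guests setting, and any enrichment of the matched User with the remaining claims.

```python
import itertools

PRIORITY = ("id", "external_id", "email", "anonymous_id")


class UserStore:
    """Toy store resolving claims in the priority order listed above."""

    def __init__(self):
        self.users = []
        self._ids = itertools.count(1)

    def _find(self, claim, value):
        for user in self.users:
            if claim == "anonymous_id":
                if value in user["anonymous_ids"]:
                    return user
            elif user.get(claim) == value:
                return user
        return None

    def resolve(self, claims):
        present = [(c, claims[c]) for c in PRIORITY if claims.get(c)]
        if not present:
            raise ValueError("at least one claim is required")
        # First matching claim wins, highest priority first.
        for claim, value in present:
            user = self._find(claim, value)
            if user is not None:
                return user
        # No match: create a User from the claims (toy ids, not Hull ids).
        anonymous_id = claims.get("anonymous_id")
        user = {
            "id": f"user-{next(self._ids)}",
            "external_id": claims.get("external_id"),
            "email": claims.get("email"),
            "anonymous_ids": [anonymous_id] if anonymous_id else [],
        }
        self.users.append(user)
        return user
```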

Guest Users

Guest users are a historical feature of Hull that allowed the tracking of traits and events to an anonymous identity identified by a browser cookie (browser_id or guest_id) before a real signup. This feature is enabled via an “allow_guests” setting on the organization, which is also used in the Users resolution strategy to determine whether the creation of users only identified by an anonymous_id is allowed.

Merging Users

Only users that are explicitly mergeable can be merged.

“Real users” using the Identity management part of Hull (whether via social login or traditional email/password login) and users that have an external_id set are NOT mergeable. Mergeable users are generally User objects for which you only have partial information, such as anonymous browsing sessions, and that reveal their identity at some point (via a form submission, for example).

The merge operation is destructive, and will

  • merge attributes of the merged user to the recipient of the merge, only writing traits that are empty
  • re-associate all aliases
  • re-parent and reindex all events
  • emit a “User merged” event that captures the id and list of attributes of the user being merged
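The steps above can be sketched on plain dictionaries. This is an illustration under assumptions: the record layout (traits, aliases, events) and the property names of the emitted event are hypothetical, a trait counts as empty when it is missing or None, and reindexing is not modelled.

```python
def merge_users(recipient, merged):
    """Fold `merged` into `recipient` in place; return the "User merged" event."""
    # Only write traits that are empty on the recipient.
    for key, value in merged["traits"].items():
        if recipient["traits"].get(key) is None:
            recipient["traits"][key] = value
    # Re-associate aliases, skipping the ones the recipient already has.
    for alias in merged["aliases"]:
        if alias not in recipient["aliases"]:
            recipient["aliases"].append(alias)
    # Re-parent events.
    recipient["events"].extend(merged["events"])
    return {
        "event": "User merged",
        "properties": {
            "merged_id": merged["id"],
            "attributes": dict(merged["traits"]),
        },
    }
```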

User aliasing

  • An alias is an identifier that can be associated to a User.
  • A given alias can only belong to one User.
  • A User can have multiple aliases.

Aliases can be attached to a user via

  • an explicit anonymous_id claim in a user token
  • an alias event sent through the Firehose

A call to UserAlias trying to associate an alias to a User will

  • create the entry if the alias was never claimed by another user
  • do nothing if the alias was already associated to the user
  • merge the 2 users if one of them is mergeable
  • be ignored if neither of the 2 users is mergeable
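The four outcomes can be written as a small decision function. This is a sketch with hypothetical data structures (an alias-to-owner mapping and a mergeable flag per user); it only reports which outcome applies and does not perform the merge.

```python
def user_alias(aliases, users, user_id, alias):
    """Decide what associating `alias` with `user_id` would do.

    aliases: dict mapping alias -> owning user id
    users:   dict mapping user id -> {"mergeable": bool}
    """
    owner = aliases.get(alias)
    if owner is None:
        aliases[alias] = user_id  # never claimed: create the entry
        return "created"
    if owner == user_id:
        return "noop"  # already associated to this user
    if users[owner]["mergeable"] or users[user_id]["mergeable"]:
        return "merge"  # at least one of the two users is mergeable
    return "ignored"  # neither user is mergeable
```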

Resolving Accounts

Accounts resolution is very similar to Users resolution. It supports the following claims :

  • id Hull id, same as for users, use with caution
  • external_id which has the same behaviour as the User’s external_id
  • domain which behaves pretty much as the email for Users

Account merging is not supported.

Domains that appear on a list of free email providers are ignored when passed as a domain claim, to avoid creating irrelevant Accounts for freemail domains.
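The freemail filter can be sketched as follows. The three domains listed are an illustrative subset chosen for the example; the actual list used by Hull is not given in this document, and the function name is hypothetical.

```python
# Illustrative subset only; not Hull's actual list.
FREE_EMAIL_DOMAINS = frozenset({"gmail.com", "yahoo.com", "hotmail.com"})


def usable_account_domain(domain):
    """Return the normalized domain, or None if the claim should be ignored."""
    normalized = domain.strip().lower()
    if not normalized or normalized in FREE_EMAIL_DOMAINS:
        return None
    return normalized
```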

Resolving and linking Users and Accounts

Special tokens with both user and account claims can be sent to the Firehose to link a User to an Account. The link, like the resolution strategies described here, is purely declarative. The ingestion step evaluates the full set of claims and always returns a User and an Account, linked or not, in a predictable way.

example claims linking a User to an Account:

{
    "io.hull.asUser": { "email": "hello@example.com" },
    "io.hull.asAccount": { "domain": "example.com" },
    "io.hull.subjectType": "account"
}

This will resolve to an Account that has the domain example.com and link the User and the Account together.
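Claims such as these travel in the Hull-Access-Token header as a JWT signed with the Application secret. The sketch below builds such a token with the standard library only. The signing algorithm is an assumption (HS256), as is the absence of any other claims; check what your Hull application expects before relying on it.

```python
import base64
import hashlib
import hmac
import json


def b64url(data):
    """Base64url encode without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_claims(claims, app_secret):
    """Return a compact JWT for `claims`, signed with HMAC-SHA256 (assumed)."""
    header = {"typ": "JWT", "alg": "HS256"}
    encoded_header = b64url(json.dumps(header, separators=(",", ":")).encode("utf-8"))
    encoded_claims = b64url(json.dumps(claims, separators=(",", ":")).encode("utf-8"))
    signing_input = f"{encoded_header}.{encoded_claims}"
    signature = hmac.new(
        app_secret.encode("utf-8"),
        signing_input.encode("ascii"),
        hashlib.sha256,
    ).digest()
    return f"{signing_input}.{b64url(signature)}"


link_claims = {
    "io.hull.asUser": {"email": "hello@example.com"},
    "io.hull.asAccount": {"domain": "example.com"},
    "io.hull.subjectType": "account",
}
token = sign_claims(link_claims, "app-secret")  # "app-secret" is a placeholder
```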

Traits Casting, supported data formats

Traits are properties collected and attached to Entities (Users and Accounts). Hull discovers new Traits as new data is collected and builds a Schema of known Traits for Users and Accounts.

Traits types and type detection

Trait types are detected when Traits are added to the Schema and can depend on the name of the Trait or on the first value captured. If the name ends with _at or _date, the Trait will be typed as a Date; otherwise the type will be set by the first value captured (String, Numeric or Boolean). Nested objects are not supported and will be silently ignored.

Trait names are lowercased in the ingestion step to make them case insensitive. The characters . and $ are not allowed in Trait names.

Traits casting

Traits are cast to the type determined in the Schema. If casting is not possible because the value captured is not compatible, the result is a null value.
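Type detection and casting can be sketched as two small functions. This is an approximation under assumptions: the type names, the exact coercion rules (for example, whether the string "12" casts to a number) and the handling of arrays are not specified in this document and are chosen here for illustration.

```python
from datetime import datetime, timezone


def detect_type(name, first_value):
    """Pick a trait type from its (lowercased) name or its first value."""
    if name.lower().endswith(("_at", "_date")):
        return "date"
    if isinstance(first_value, bool):  # bool before int: True is an int in Python
        return "boolean"
    if isinstance(first_value, (int, float)):
        return "numeric"
    return "string"


def cast(value, trait_type):
    """Cast `value` to `trait_type`; incompatible values become None."""
    try:
        if trait_type == "date":
            if isinstance(value, bool):
                return None
            if isinstance(value, (int, float)):  # UNIX timestamp
                return datetime.fromtimestamp(value, tz=timezone.utc)
            return datetime.fromisoformat(value.replace("Z", "+00:00"))
        if trait_type == "numeric":
            return None if isinstance(value, bool) else float(value)
        if trait_type == "boolean":
            return value if isinstance(value, bool) else None
        return str(value)
    except (ValueError, TypeError, AttributeError, OverflowError, OSError):
        return None
```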

UserReport and preparing records for Search and Segmentation

The Report building step prepares the record that will be indexed and made available for Search and Segmentation.

This is an example of a User Report, built from known attributes, identities, Sessions data and Traits for Users :

{
  // Root
  "id": "50cf040bb85d0c8031000001",
  "external_id": "123-456-789",
  "created_at": "2012-12-17T11:37:47Z",
  "email": "stephane@hull.io",
  "domain": "hull.io",
  "name": "Stephane Hull",
  "last_name": "Hull",
  "first_name": "Stephane",
  "address_city": "Paris",
  "address_state": "Ile-de-France",
  "address_country": "France",
  "accepts_marketing": false,
  "is_approved": true,
  "has_password": true,
  "anonymous_ids": ["310dd12c-a1f3-2dee-54c2-c12426d1367b"],
  "segment_ids": ["56a7902c8d371442330000ee", "595296234d8debfb330026a0"],
  "sign_up_url": "https://accounts.hullapp.io/",

  // Account
  "account": {
    "id": "5a04118fea0662ec4b0471fb",
    "domain": "hull.io",
    "clearbit/name": "hull",
    "created_at": "2017-11-09T08:27:59Z",
    "updated_at": "2017-11-20T20:29:05Z"
  },

  // Identities
  "identities_count":1,
  "main_identity": "github",
  "github_connected_at": "2012-12-17T11:37:47Z",
  "github_id": "4250",
  "github_username": "sbellity",
  "google_connected_at": "2014-03-06T13:10:12Z",
  "google_id": "117779341887286898992",

  // Sessions
  "last_seen_at": "2018-02-11T14:22:47Z",
  "first_seen_at": "2016-09-07T08:30:02Z",
  "first_session_started_at": "2016-09-07T08:30:02Z",
  "first_session_platform_id": "53175bb2635c78c8790032cd",
  "first_session_initial_url": "https://www.hull.io/features",
  "first_session_initial_referrer": "https://www.google.com/",
  "signup_session_started_at": "2016-09-07T08:30:02Z",
  "signup_session_platform_id": "53175bb2635c78c8790032cd",
  "signup_session_initial_url": "https://www.hull.io/features",
  "signup_session_initial_referrer": "https://www.google.com",
  "latest_session_started_at": "2018-02-11T14:19:26Z",
  "latest_session_platform_id": "53175bb2635c78c8790032cd",
  "latest_session_initial_url": "https://dashboard.hullapp.io/",
  "latest_session_initial_referrer": "",

  // Traits
  "traits_request_demo": true,
  "traits_company_name": "hull.io",
  "traits_nps_rating": 10,
  "traits_nps_score": 100,
  "traits_intercom_email": "stephane@hull.io",
  "traits_salesforce_lead/status": "New",
  "traits_salesforce_lead/owner_id": "00546000001HobtAAC",
  "traits_salesforce_lead/company": "hull.io"
}

AccountReport

{
    "id": "5a04118fea0662ec4b0471fb",
    "domain": "hull.io",
    "clearbit/name": "hull",
    "created_at": "2017-11-09T08:27:59Z",
    "updated_at": "2017-11-20T20:29:05Z"
}

When an AccountReport is built we also schedule the rebuild of UserReports for all the users that belong to that account to propagate the changes and denormalise them in the corresponding UserReports.

This automatic propagation can cause huge work amplification when an Account has thousands of users.
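The fan-out can be made concrete with a toy rebuild function: one Account change produces one rebuilt UserReport per linked User. The record shapes and the account_id field on users are assumptions made for the sketch.

```python
def rebuild_account(account, users):
    """Return the AccountReport and the UserReports rebuilt because of it."""
    account_report = dict(account)
    user_reports = [
        # Denormalise the account into each linked user's report.
        {**user, "account": account_report}
        for user in users
        if user.get("account_id") == account["id"]
    ]
    return account_report, user_reports
```

With an Account linked to thousands of Users, the second element of the returned tuple has thousands of entries for a single Account update, which is the amplification described above.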