The first stage of the data lifecycle is to ingest data. Ingestion has four stages:
The Hull Platform captures data through two APIs: the Firehose API and the Import API.
Captured data is then cast into Attributes and Events. Attributes and Events are associated with Entities (Users & Accounts) according to claims. Updates are then published as User and Account Reports.
All connectors sync data to Hull through the Firehose API. This is exposed as a single endpoint on firehose.hullapp.io.
Each message in the batch represents a Firehose::Event to ingest. It is composed of:
Message | Description |
---|---|
Type | traits, track, alias or unalias |
Body | Payload of the message to ingest |
Headers | Context to the message, including Hull-Access-Token which contains a JWT with a set of claims signed with the Application secret. Those claims are the data points used in identity resolution |
Timestamp | Local timestamp of the client generating the payload |
Here is an example payload for the Firehose API.
POST https://firehose.hullapp.io/
> Hull-Organization: example.hullapp.io
> Hull-App-Id: 5a7962e2793d3e942a03b29a
> Hull-Access-Token: 123456789
> Content-Type: application/json
{
"batch": [{
"type": "traits",
"body": {
"email": "bob@bob.com",
"name": "Bobby Lapointe",
"country": "France",
"city": "Paris"
},
"headers": {
"Hull-Access-Token": "eyJ0eXA..."
},
"timestamp": "2018-02-06T09:35:11.146Z"
}, {
"type": "track",
"body": {
"ip": "1.2.3.4",
"url": "https://www.hull.io/docs",
"referrer": "https://www.google.com",
"event_id": "8fbebdde-68e3-43cf-ae45-0edc8057ab8e",
"properties": { "name": "iPhone" },
"event": "Viewed Product"
},
"headers": {
"Hull-Access-Token": "eyJ0eXA..."
},
"timestamp": "2018-02-06T09:35:11.574Z"
}, {
"type": "alias",
"body": {
"anonymous_id": "1234"
},
"headers": {
"Hull-Access-Token": "eyJ0eXA..."
},
"timestamp": "2018-02-06T09:35:11.574Z"
}, {
"type": "unalias",
"body": {
"anonymous_id": "5678"
},
"headers": {
"Hull-Access-Token": "eyJ0eXA..."
},
"timestamp": "2018-02-06T09:35:11.574Z"
}],
"timestamp": "2018-02-06T09:35:12.576Z",
"sentAt": "2018-02-06T09:35:12.576Z"
}
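As a rough sketch of how a client might assemble the batch envelope shown above: the helper name `buildFirehoseBatch` and its arguments are hypothetical and not part of any Hull library. It only builds the JSON body; sending it to the endpoint is left out.

```javascript
// Hypothetical helper: wraps messages in the batch envelope shown above.
// The four message types come from the table; everything else is illustrative.
const MESSAGE_TYPES = ["traits", "track", "alias", "unalias"];

function buildFirehoseBatch(messages, accessToken, now = new Date()) {
  const batch = messages.map(({ type, body, timestamp }) => {
    if (!MESSAGE_TYPES.includes(type)) {
      throw new Error(`Unsupported message type: ${type}`);
    }
    return {
      type,
      body,
      headers: { "Hull-Access-Token": accessToken },
      // Local timestamp of the client generating the payload
      timestamp: (timestamp || now).toISOString()
    };
  });
  const sentAt = now.toISOString();
  return { batch, timestamp: sentAt, sentAt };
}

const payload = buildFirehoseBatch(
  [{ type: "traits", body: { email: "bob@bob.com", city: "Paris" } }],
  "eyJ0eXA...",
  new Date("2018-02-06T09:35:12.576Z")
);
```

Rejecting unknown types on the client is a design choice of this sketch, made so that malformed messages fail before they are sent.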
Unless otherwise marked, all messages will be processed first in, first out.
For some use cases (such as real-time web personalization), ingestion needs to be accelerated. To enable this, Hull can accelerate ingestion and computation for marked Users. Users marked as active enter FastLane for 10 minutes.
Notifications of Events (including Attributes Changed) for all active Users will be prioritized ahead of other User types. Note: end-to-end data flow speed may depend on the API limits of the external service.
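The ordering described above can be illustrated with a small sketch. This is not Hull's implementation: `orderForProcessing`, the `activeSince` map, and the message shape are assumptions made for the example. Messages for Users marked active within the last 10 minutes go first; everything else keeps first in, first out order.

```javascript
// Illustrative only: FIFO processing with a FastLane for recently active Users.
const FASTLANE_WINDOW_MS = 10 * 60 * 1000; // 10 minutes, as stated above

// messages: array in arrival order, each with a userId (assumed shape)
// activeSince: map of userId -> time (ms) the User was marked active (assumed)
// now: current time in ms
function orderForProcessing(messages, activeSince, now) {
  const inFastLane = (message) => {
    const markedAt = activeSince[message.userId];
    return markedAt !== undefined && now - markedAt < FASTLANE_WINDOW_MS;
  };
  // filter() preserves arrival order within each group
  return [
    ...messages.filter(inFastLane),
    ...messages.filter((message) => !inFastLane(message))
  ];
}
```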
Bulk updates can be imported into your Hull Organization through the Import API. Data imports are processed separately to the Firehose API to minimize the impact on ingestion of live data.
All data to import must be:
See our Import API reference documentation
Create and update Users in bulk through the Import API. Every imported record MUST include at least a valid email or userId identifier to associate with a User (anonymous_id is not supported via the Import API).
User-Account associations can be specified by adding an accountId identifier in the record.
{
"userId": "123",
"accountId": "456",
"traits": {
"email" : "john@coltrane.com",
"name" : "John Coltrane"
}
}
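Before uploading, records can be checked against the rules above. The following is a minimal sketch: `validateUserImportRecord` is a hypothetical helper, the email pattern is a deliberately loose assumption, and it accepts the email either at the top level or inside `traits` because the example record carries it in `traits`.

```javascript
// Hypothetical pre-flight check for User import records.
function validateUserImportRecord(record) {
  const traits = record.traits || {};
  const email = record.email || traits.email;
  const errors = [];

  const hasUserId = typeof record.userId === "string" && record.userId.length > 0;
  // Loose shape check only; what counts as a "valid email" is an assumption here
  const hasEmail = typeof email === "string" && /^[^@\s]+@[^@\s]+$/.test(email);

  if (!hasUserId && !hasEmail) {
    errors.push("record needs a valid email or userId");
  }
  if ("anonymous_id" in record) {
    errors.push("anonymous_id is not supported via the Import API");
  }
  return { valid: errors.length === 0, errors };
}
```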
Create and update Accounts in bulk through the Import API. Every imported record MUST include an accountId identifier.
{
"accountId": "111",
"traits" : {
"domain" : "hull.io",
"name" : "Hull"
}
}
You can update User Events in bulk through the Import API. This performs a Users lookup before importing the event, to create or update a User.
{
"userId":"12453",
"timestamp":"2018-04-16T00:00:28.000Z",
"event": "User Registered",
"eventId":"1754752",
"properties":{
"email": "ed@hull.io",
"plan": "business",
"price": 129.00
}
}
Attributes are properties collected and associated to Entities (Users and Accounts). Attributes are ingested by:
traits events from the Firehose API
Hull discovers new Attributes as new data is ingested and builds a Schema of known Attributes for Users and Accounts. These are visible and managed in the Attributes view on the Dashboard.
Supported Attribute values include:
Attribute types are detected when Attributes are added to the Schema and can depend on the name of the Attribute or the first value captured:
If the Attribute name ends with _at or _date, the Attribute will be typed as a Date
Attribute names are lowercased in the ingestion step to make them case insensitive.
The following characters are not allowed (and are silently ignored) in Attribute names: . and $.
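One way to picture the name handling described above is the sketch below. It assumes that "silently ignored" means the disallowed characters are dropped from the name; the source does not spell this out, and `normalizeAttributeName` is a hypothetical helper.

```javascript
// Sketch of Attribute name normalization, assuming "." and "$" are stripped.
function normalizeAttributeName(name) {
  return name
    .toLowerCase()          // names are case insensitive
    .replace(/[.$]/g, "");  // assumption: disallowed characters are removed
}

const examples = ["Salesforce/ID", "Company.Name", "$Plan"].map(normalizeAttributeName);
// examples is ["salesforce/id", "companyname", "plan"]
```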
Incoming data is cast into Attributes according to the Schema for that type.
If casting is not possible because the original value captured is not compatible, it will result in a null value.
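To make type detection and casting more concrete, here is a simplified sketch. Only the Date rule (names ending in `_at` or `_date`) and the null result on a failed cast come from the text above; the other type names and the detection by first value are assumptions, and both functions are hypothetical.

```javascript
// Simplified sketch: detect a type once, then cast later values to it.
function detectType(name, firstValue) {
  if (/(_at|_date)$/.test(name)) return "date"; // rule from the text
  if (typeof firstValue === "number") return "number";   // assumed
  if (typeof firstValue === "boolean") return "boolean"; // assumed
  return "string";                                       // assumed default
}

function castValue(value, type) {
  if (value === null || value === undefined) return null;
  switch (type) {
    case "number": {
      const n = Number(value);
      return value === "" || Number.isNaN(n) ? null : n;
    }
    case "date": {
      const d = new Date(value);
      return Number.isNaN(d.getTime()) ? null : d.toISOString();
    }
    case "boolean":
      return typeof value === "boolean" ? value : null;
    default:
      return String(value);
  }
}
```

For instance, `castValue("abc", "number")` yields `null` because the captured value is not compatible with the Schema type.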
Values can write new Attributes or update existing Attributes. Values can be formatted as atomic operations to be applied to existing Attributes.
Operators | Example |
---|---|
setIfNull | { "foo" : { "operation" : "setIfNull", "value" : "bar" } } |
inc | { "foo" : { "operation" : "inc", "value" : 100 } } |
dec | { "foo" : { "operation" : "dec", "value" : 100 } } |
set | { "foo" : { "operation" : "set", "value" : "bar" } } |
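The semantics of these operators can be sketched as follows. This is an illustration of the table above, not Hull's code: `applyTraits` and `applyOperation` are hypothetical names, and treating a missing value as 0 for `inc` and `dec` is an assumption.

```javascript
// Illustrative semantics for the atomic operations listed above.
function applyOperation(current, update) {
  const isOperation =
    update !== null && typeof update === "object" && "operation" in update;
  if (!isOperation) return update; // plain value: overwrite

  const { operation, value } = update;
  switch (operation) {
    case "set":
      return value;
    case "setIfNull":
      return current === null || current === undefined ? value : current;
    case "inc":
      return (current || 0) + value; // assumption: missing counts as 0
    case "dec":
      return (current || 0) - value; // assumption: missing counts as 0
    default:
      throw new Error(`Unknown operation: ${operation}`);
  }
}

function applyTraits(attributes, traits) {
  const next = { ...attributes };
  for (const [key, update] of Object.entries(traits)) {
    next[key] = applyOperation(next[key], update);
  }
  return next;
}
```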
Attributes are recorded in a flat key/value structure. Hull does not support complex nested objects.
Similar Attributes can be grouped visually by giving them a common prefix delimited by /. For example, all Salesforce Attributes:
{
"type": "traits",
"body": {
"salesforce/id": "123",
"salesforce/name": "Bobby Lapointe",
"salesforce/type": "Contact"
},
"headers": {
"Hull-Access-Token": "eyJ0eXA..."
},
"timestamp": "2018-02-06T09:35:11.146Z"
}
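Because storage stays flat, any grouping happens on the reading side. A small sketch of how a consumer could group Attributes by prefix for display follows; `groupAttributes` is a hypothetical helper, and placing ungrouped Attributes under an empty-string key is a choice made for this example.

```javascript
// Group flat Attributes by the prefix before the first "/".
function groupAttributes(attributes) {
  const groups = {};
  for (const [key, value] of Object.entries(attributes)) {
    const slash = key.indexOf("/");
    const group = slash === -1 ? "" : key.slice(0, slash); // "" = ungrouped
    const name = slash === -1 ? key : key.slice(slash + 1);
    groups[group] = groups[group] || {};
    groups[group][name] = value;
  }
  return groups;
}

const grouped = groupAttributes({
  "salesforce/id": "123",
  "salesforce/name": "Bobby Lapointe",
  email: "bob@bob.com"
});
// grouped.salesforce is { id: "123", name: "Bobby Lapointe" }
```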
BETA
Hull supports capturing raw JSON objects and arrays with JSON objects for Users and Accounts. This enables you to:
The nested JSON objects feature is currently in BETA and is subject to a few limitations, listed below:
[{ foo: "bar"}]. Invalid example: []. In the latter case, the attribute will be detected as an array of strings
How to handle session data?
const {
sessions = {},
latest_session = {}
} = user;
// sessions = {
// 1234: { id: 1234, start: "foo", referrer: "https://google.com" },
// 4567: { id: 4567, start: "bar", referrer: "https://facebook.com" },
// }
// Add or update the session object
sessions[latest_session.id] = latest_session;
// Update it.
traits({ sessions });
// Write to top-level attributes as you need to send results out to services.
traits({
session_count: _.keys(sessions).length,
session_referrers: _.map(sessions, s => s.referrer)
});
How to update a JSON object?
Hull does not provide partial or atomic operations on JSON objects: writing a new value replaces the whole object or array. You can use a Processor to perform complex operations. See the example below:
// user 1234 before:
// {
// external_id: 1234,
// foo: ["a", "b"]
// }
hull.asUser({ external_id: 1234 }).traits({ foo: ["c"] });
// user 1234 after:
// {
// external_id: 1234,
// foo: ["c"]
// }
// user 4567 before:
// {
// external_id: 4567,
// foo: { earth: "mars" }
// }
hull.asUser({ external_id: 4567 }).traits({ foo: { bar: "bat" } });
// user 4567 after:
// {
// external_id: 4567,
// foo: { bar: "bat" }
// }
If you want to update objects, you need to first capture the updates in one place, and use the Processor to generate an aggregated, manually merged object:
// For user 4567 changed:
const { foo } = changes.user;
// foo = [{ earth: "mars" }, { bar: "bat" }];
const merged_foo = Object.assign({}, foo[0], foo[1]);
// Don't merge it to `foo` or you'll get an infinite loop...
traits({ merged_foo });
Events are actions collected and associated to Users. Events are ingested by:
Events have the following values:
event_id
The body of each event contains the following entries:
Entry | Description |
---|---|
Event | The event name |
event_id | A unique ID to the Hull organization for each event |
Properties | Additional contextual data about the event |
Hull can capture unlimited event properties for each tracked Event.
Event properties are stored as a set of flat key values associated to the event. No schema is enforced.
Additional contextual data is associated with each event:
Field | Description |
---|---|
source | Defines a namespace (e.g. stripe) |
type | Defines an event type (e.g. email) |
created_at | Defines an event date. Defaults to now() |
ip | Defines the Event’s IP. Set to null if you’re storing a server call, otherwise, geoIP will locate this event. |
referrer | Defines the Referrer. null for server calls. |
Every event in Hull has an additional object called context that has a fixed schema.
Tracking calls from the hull.js library will fill in some of this schema, and so will the Segment.com connector.
You can pass it as the third object in the Hull.track(name, properties, context) method. Here are the accepted fields:
{
"useragent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
"device": {
"name": "Other"
},
"referrer": {
"url": "https://example.com/",
"host": "example.com",
"path": "/",
"campaign": {
"term": "funding news",
"medium": "email",
"name": "Funding announcement Newsletter",
"content": "image link",
"source": "Newsletter"
}
},
"os": {
"name": "Mac OS X",
"version": "10.13.6"
},
"browser": {
"major": 69,
"name": "Chrome",
"version": "69.0.3497"
},
"location": {
"country": "FR",
"city": "Paris",
"timezone": "Europe/Paris",
"longitude": 2.3833,
"latitude": 48.9167,
"region": "IDF",
"countryname": "France",
"regionname": "Île-de-France",
"zipcode": "93300"
},
"campaign": {
"term": "funding news",
"medium": "email",
"name": "Funding announcement Newsletter",
"content": "image link",
"source": "Newsletter"
},
"ip": "10.10.10.10",
"page": {
"url": "https://example.com/test",
"host": "example.com",
"path": "/884a12fc/overview"
}
}
To associate ingested data with Entities (Users & Accounts), Hull operates on a set of identifiers.
Users and Accounts have different identity resolution strategies.
The following identifiers are used in order of priority to resolve User data:
Identifier | Description | Notes |
---|---|---|
id | Internal Hull identifier | We do not recommend using this identifier since ids can be deleted at any moment when a User is deleted or merged |
external_id | Primary, unique, and most stable way to reference a User | If external_id is used, then email and anonymous_id identifiers will be used to merge Users |
email | Email address | Duplicate emails are allowed (some use cases require it), however these will be merged if there is no stronger identifier |
anonymous_id | Anonymous IDs (e.g. website visitors before signup) and aliases from 3rd party tools | Designed to alias anonymous traffic when they later signup and identify themselves. This is also where the IDs from 3rd party tools (like HubSpot, Mailchimp, Intercom etc.) are stored. |
At least one identifier MUST be present for the resolution step to return a User.
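The priority order in the table can be expressed as a simple lookup. This sketch only picks the strongest identifier present in a set of claims; actual resolution also involves merging, which is not modeled here, and `strongestUserIdentifier` is a hypothetical name.

```javascript
// Identifier priority for Users, in the order given in the table above.
const USER_IDENTIFIER_PRIORITY = ["id", "external_id", "email", "anonymous_id"];

// Returns the highest-priority identifier found in the claims, or null
// when none is present (in which case no User can be resolved).
function strongestUserIdentifier(claims) {
  for (const key of USER_IDENTIFIER_PRIORITY) {
    if (claims[key]) return { key, value: claims[key] };
  }
  return null;
}
```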
You can configure rules for what a valid external_id looks like in the Organization Settings. This is useful if some tools send invalid data, such as an email in this field, or their own format.
Users without an external_id can be merged. These are marked as mergeable. Only Users that are explicitly mergeable can be merged.
If the resolution step results in the merging of two Users, the returned User is the recipient of the merge.
The merge operation is destructive and will:
User merged event that captures the id and Attributes of the User being merged
If there are different values for the same Attribute, the recipient of the merge will keep its original value.
The anonymous_id identifier and the alias event are used to alias Users and Accounts. An alias associates multiple identifiers with a single User or Account (e.g. IDs from many third party tools).
After an alias is added (visible as an anonymous_id), it can be removed from a User or an Account with the unalias operation. This only removes the given anonymous_id from the entity; it does not revert any merge operation that was performed due to the presence of this anonymous_id.
The following identifiers are used in order of priority to resolve Account data:
Identifier | Description | Notes |
---|---|---|
id | Internal Hull identifier | We do not recommend using this identifier since ids can be deleted at any moment when an Account is deleted or merged |
external_id | Primary, unique, and most stable way to reference an Account | |
domain | Website domain name | Domain can be sourced from third party tools, or inferred from the email address (e.g. @intercom.com) |
Free email domains (e.g. gmail.com) are ignored when passing domain claims.
You can configure additional rules for what a valid external_id looks like or which additional domains are rejected in the Organization Settings. This is useful if some tools send invalid data, such as dummy domains.
Special tokens with both User and Account identifiers can be built to link a User to an Account. The link is declarative: the ingestion step will evaluate all the identifiers contained in the token and always return a User and Account.
For example, to resolve an account that has the domain example.com and link both User and Account together:
{
"io.hull.asUser": { "email": "hello@example.com" },
"io.hull.asAccount": { "domain": "example.com" },
"io.hull.subjectType": "account"
}
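Since the access token is described as a JWT signed with the Application secret, claims like the ones above could be signed as in the sketch below. It assumes an HS256 signature and a bare claim set with no extra registered claims, neither of which is confirmed by this document. `signClaims` and the secret value are hypothetical, and in practice a Hull client library would normally build this token for you.

```javascript
const crypto = require("crypto");

// Base64url encoding as used by JWTs: URL-safe alphabet, no padding.
function base64url(input) {
  return Buffer.from(input)
    .toString("base64")
    .replace(/=+$/, "")
    .replace(/\+/g, "-")
    .replace(/\//g, "_");
}

// Assumption: HS256 (HMAC-SHA256) keyed with the Application secret.
function signClaims(claims, secret) {
  const header = base64url(JSON.stringify({ typ: "JWT", alg: "HS256" }));
  const payload = base64url(JSON.stringify(claims));
  const signature = base64url(
    crypto.createHmac("sha256", secret).update(`${header}.${payload}`).digest()
  );
  return `${header}.${payload}.${signature}`;
}

const token = signClaims(
  {
    "io.hull.asUser": { email: "hello@example.com" },
    "io.hull.asAccount": { domain: "example.com" },
    "io.hull.subjectType": "account"
  },
  "hypothetical-application-secret"
);
```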
As with Users, Accounts without an external_id can be merged. These are marked as mergeable internally. Only Accounts that are explicitly mergeable can be merged.
If the resolution step results in the merging of two Accounts, the resulting Account is the recipient of the merge.
The merge operation is destructive and will:
external_id, domain and anonymous_ids
If there are different values for the same attribute, the recipient of the merge will keep its original value.
Example:
// Recipient account
{
"id" : "1",
"external_id": "123",
"name": "Hull",
"domain" : "hull.io",
"created_at" : "2018-10-06T14:41:07Z"
}
// Account to be merged
{
"id" : "2",
"name": "Hull.io",
"domain" : "hull.io",
"created_at" : "2018-10-08T12:05:17Z",
"hubspot/state" : "opportunity"
}
// Resulting Account
{
"id" : "1",
"external_id": "123",
"name": "Hull",
"domain" : "hull.io",
"created_at" : "2018-10-06T14:41:07Z",
"hubspot/state" : "opportunity"
}
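The attribute rule illustrated by this example (the recipient keeps its own value, and only missing Attributes are filled in) can be sketched as below. `mergeAttributes` is a hypothetical helper covering that one rule; the other effects of a merge are not modeled.

```javascript
// Sketch of the "recipient wins" rule for conflicting Attributes.
function mergeAttributes(recipient, merged) {
  const result = { ...recipient };
  for (const [key, value] of Object.entries(merged)) {
    // Only fill in Attributes the recipient does not already have
    if (result[key] === undefined || result[key] === null) {
      result[key] = value;
    }
  }
  return result;
}

const resulting = mergeAttributes(
  {
    id: "1",
    external_id: "123",
    name: "Hull",
    domain: "hull.io",
    created_at: "2018-10-06T14:41:07Z"
  },
  {
    id: "2",
    name: "Hull.io",
    domain: "hull.io",
    created_at: "2018-10-08T12:05:17Z",
    "hubspot/state": "opportunity"
  }
);
// resulting.name is "Hull" and resulting["hubspot/state"] is "opportunity"
```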
The final step of ingestion of User & Account data is to prepare the User Report and Account Report. This is the record that will be indexed and made available for search and segmentation, and visible in the User Profile and Account Profile.
Here is an example of a User Report. It is formatted with:
{
// Root
"id": "50cf040bb85d0c8031000001",
"external_id": "123-456-789",
"created_at": "2012-12-17T11:37:47Z",
"email": "hello@hull.io",
"domain": "hull.io",
"name": "Stephane Hull",
"last_name": "Hull",
"first_name": "Hello",
"address_city": "Paris",
"address_state": "Ile-de-France",
"address_country": "France",
"accepts_marketing": false,
"is_approved": true,
"has_password": true,
"anonymous_ids": ["310dd12c-a1f3-2dee-54c2-c12426d1367b"],
"segment_ids": ["56a7902c8d371442330000ee", "595296234d8debfb330026a0"],
"sign_up_url": "https://accounts.hullapp.io/",
// Account
"account": {
"id": "5a04118fea0662ec4b0471fb",
"domain": "hull.io",
"clearbit/name": "hull",
"created_at": "2017-11-09T08:27:59Z",
"updated_at": "2017-11-20T20:29:05Z"
},
// Identities
"identities_count":1,
"main_identity": "github",
"github_connected_at": "2012-12-17T11:37:47Z",
"github_id": "42",
"github_username": "hull",
"google_connected_at": "2014-03-06T13:10:12Z",
"google_id": "117779341887200000000",
// Sessions
"last_seen_at": "2018-02-11T14:22:47Z",
"first_seen_at": "2016-09-07T08:30:02Z",
"first_session_started_at": "2016-09-07T08:30:02Z",
"first_session_platform_id": "53175bb2635c78c8790032cd",
"first_session_initial_url": "https://www.hull.io/features",
"first_session_initial_referrer": "https://www.google.com/",
"signup_session_started_at": "2016-09-07T08:30:02Z",
"signup_session_platform_id": "53175bb2635c78c8790032cd",
"signup_session_initial_url": "https://www.hull.io/features",
"signup_session_initial_referrer": "https://www.google.com",
"latest_session_started_at": "2018-02-11T14:19:26Z",
"latest_session_platform_id": "53175bb2635c78c8790032cd",
"latest_session_initial_url": "https://dashboard.hullapp.io/",
"latest_session_initial_referrer": "",
// Attributes
"traits_request_demo": true,
"traits_company_name": "hull.io",
"traits_nps_rating": 10,
"traits_nps_score": 100,
"traits_intercom_email": "hello@hull.io",
"traits_salesforce_lead/status": "New",
"traits_salesforce_lead/owner_id": "00546000001HobtAAC",
"traits_salesforce_lead/company": "hull.io"
}
Here is an example of an Account Report payload.
Note that when an Account Report is built, this also schedules a rebuild of all associated User Reports.
{
"id": "5a04118fea0662ec4b0471fb",
"name": "Hull",
"domain": "hull.io",
"clearbit/name": "hull",
"created_at": "2017-11-09T08:27:59Z",
"updated_at": "2017-11-20T20:29:05Z"
}
You can view and query incoming data from Connectors to Hull in the logs.
Incoming data are logged with the incoming.{entity}.{status} format. These are visible in the Logs view on the Dashboard, or Logs view within each Connector’s page (for viewing logs for a specific Connector only).
Log Type | Description |
---|---|
incoming.user.success | User Attributes have been updated successfully |
incoming.user.error | User Attributes have not been updated successfully |
incoming.user.skip | User update was skipped |
incoming.account.success | Account Attributes have been updated successfully |
incoming.account.error | Account Attributes have not been updated successfully |
incoming.account.skip | Account update was skipped |
incoming.event.success | Event was successfully ingested |
incoming.event.error | Event was not successfully ingested |
incoming.event.skip | Event was skipped |
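When processing exported log lines, the `incoming.{entity}.{status}` format can be split with a small parser. This sketch only recognizes the nine log types in the table above; `parseIncomingLogType` is a hypothetical helper, not part of any Hull tooling.

```javascript
// Parse a log type such as "incoming.user.success" into its parts.
const INCOMING_LOG_PATTERN = /^incoming\.(user|account|event)\.(success|error|skip)$/;

function parseIncomingLogType(logType) {
  const match = INCOMING_LOG_PATTERN.exec(logType);
  if (!match) return null; // not one of the incoming log types listed above
  return { entity: match[1], status: match[2] };
}
```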
All logs feature an identifier to associate Users and Accounts with the logs. Learn more about identifiers and identity resolution on Hull. For outgoing notifications these include:
Identifier | Description |
---|---|
user_id | Hull User ID |
user_email | User email |
user_external_id | External ID on the User |
user_anonymous_id | Identifier from the external service or anonymous ID from web sessions |
account_id | ID on the Account |
account_external_id | External ID on the Account |
account_domain | Account domain |
All data sent out through Connectors is logged and queryable by:
Connector identifier | Description |
---|---|
connector_name | Reference name of the connector (e.g. salesforce or processor) |
connector_id | ID of the connector |