What is ID Unification?
Overview
ID Unification is the process of stitching together multiple tables using various identifiers to assign a unique customer ID (canonical_id or persistent_id) to each user. In simpler terms, it consolidates identifiers such as cookie_id and email addresses from various user data sources to identify and group "the same person."
Since customer data often contains different identifiers across different data sources, simply aggregating this data doesn't link these sources together. This necessitates the ID Unification process to make the data usable.
The above diagram illustrates the relationships between the IDs (identifiers) associated with users. Below, we outline common types of IDs that are unified, showing how the process integrates data to uniquely identify individuals.
Types of IDs Linked to Users
User-Associated IDs
These are identifiers issued by various services, such as membership IDs or email addresses used during registration.
Examples: member_id, customer_id, email_address
Device-Associated IDs
IDs issued for each device, such as ADID/IDFA, are used when collecting application logs.
Examples: ADID, IDFA
Browser-Associated IDs
Cookies are issued per browser and per source (1st-party or 3rd-party). For stitching across data sources, having both 1st-party and 3rd-party cookie_id values is advantageous.
Examples: cookie_id, td_ssc_id
ID Unification Feature Provided by Treasure Workflow
ID Unification is provided as a standard feature accessible to all users. To utilize this tool, users mainly need to prepare:
- A .dig file to invoke the unification workflow.
- A .yml file defining the data sources and stitching keys for ID Unification.
.dig File to Invoke the Unification Workflow
The .dig file makes an HTTP call to invoke the Unification Workflow. This approach eliminates the need to download workflow code from GitHub and ensures all users receive updates simultaneously.
+call_unification:
  http_call>: https://api-cdp.treasuredata.com/unifications/workflow_call
  headers:
    - authorization: ${secret:td.apikey}
  method: POST
  …
.yml File for Defining Data Sources and Stitching Keys
The .yml file specifies the source tables and the keys used for stitching. While .dig files remain relatively consistent across use cases, .yml files depend entirely on the user's data structure and must be written carefully. Copy-pasting a template or another user's file won't suffice.
name: test_id_unification_ex1
keys:
  - name: td_client_id
  - name: td_global_id
tables:
  - database: test_id_unification_ex1
    table: ex1_site_aaa
    key_columns:
      - {column: td_client_id, key: td_client_id}
      - {column: td_global_id, key: td_global_id}
…
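Because the .yml file has to match the user's own data, a quick consistency check before running the workflow can catch simple mistakes. The sketch below is only an illustration: it mirrors the fields shown in the example as a Python dictionary (the part elided by "…" is not represented), and it assumes that every key referenced under key_columns should also be declared under keys. The function name undeclared_keys is hypothetical and is not part of Treasure Workflow.

```python
# Python mirror of the fields shown in the example .yml (illustrative only).
config = {
    "name": "test_id_unification_ex1",
    "keys": [{"name": "td_client_id"}, {"name": "td_global_id"}],
    "tables": [
        {
            "database": "test_id_unification_ex1",
            "table": "ex1_site_aaa",
            "key_columns": [
                {"column": "td_client_id", "key": "td_client_id"},
                {"column": "td_global_id", "key": "td_global_id"},
            ],
        }
    ],
}


def undeclared_keys(config):
    """Return (table, column, key) triples whose key is not listed under 'keys'.

    Assumption: a key used in key_columns is expected to be declared in the
    top-level 'keys' list. This is a hypothetical helper, not a product API.
    """
    declared = {entry["name"] for entry in config.get("keys", [])}
    problems = []
    for table in config.get("tables", []):
        for key_column in table.get("key_columns", []):
            if key_column["key"] not in declared:
                problems.append(
                    (table["table"], key_column["column"], key_column["key"])
                )
    return problems


problems = undeclared_keys(config)
```

For the example configuration the returned list is empty. A mapping such as {column: email, key: email} without a matching entry under keys would be reported.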
Defining Inputs and Outputs of ID Unification
Inputs of ID Unification
ID Unification requires enumerating:
- All tables with identifiers that can be stitched together.
- The identifiers (keys) within each table used for stitching.
These keys are used to traverse all tables and consolidate data.
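To make the idea of traversing tables by key concrete, here is a minimal sketch that groups identifier values appearing in the same row into one component using a union-find structure. This is a stand-in technique chosen for illustration, not a description of how the Unification Workflow is implemented. The function name unify, the sample rows, and the integer labels used in place of a real canonical_id are all assumptions.

```python
def unify(records):
    """Group identifier values that co-occur in a row.

    records: iterable of dicts mapping key name -> value (None if missing).
    Returns a dict mapping (key_name, value) -> integer group label.
    The integer label is a placeholder for a canonical_id (assumption).
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for record in records:
        nodes = [(k, v) for k, v in record.items() if v is not None]
        for node in nodes:
            find(node)  # register the identifier even if it stands alone
        for other in nodes[1:]:
            union(nodes[0], other)  # identifiers in one row belong together

    labels = {}
    result = {}
    for node in list(parent):
        root = find(node)
        labels.setdefault(root, len(labels) + 1)
        result[node] = labels[root]
    return result


# Hypothetical rows drawn from tables that share the two keys.
rows = [
    {"td_client_id": "c1", "td_global_id": "g1"},
    {"td_client_id": "c2", "td_global_id": "g1"},  # linked to c1 through g1
    {"td_client_id": "c3", "td_global_id": "g2"},
]
groups = unify(rows)
```

Here c1 and c2 end up in the same group because both rows carry g1, while c3 and g2 form a separate group.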
Outputs of ID Unification
The most significant output of ID Unification is the assignment of a canonical_id to each identified individual. This process enriches the source tables, appending the canonical_id to facilitate further operations.
All stitched tables are output with the canonical_id, enabling table joins and analysis at the user level. For instance, using this ID as a join key allows unification of other tables for user-based aggregation and analysis.
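As a rough illustration of user-level aggregation with canonical_id as the join key, the sketch below combines two enriched tables held as in-memory lists. The table contents, the column names page_views and amount, and the function aggregate_by_user are hypothetical. In practice this kind of join would more likely be written as a query against the enriched tables.

```python
from collections import defaultdict

# Hypothetical enriched rows: each already carries a canonical_id.
site_rows = [
    {"canonical_id": "A", "td_client_id": "c1", "page_views": 3},
    {"canonical_id": "A", "td_client_id": "c2", "page_views": 2},
    {"canonical_id": "B", "td_client_id": "c3", "page_views": 5},
]
purchase_rows = [
    {"canonical_id": "A", "amount": 120},
    {"canonical_id": "B", "amount": 40},
    {"canonical_id": "B", "amount": 60},
]


def aggregate_by_user(site_rows, purchase_rows):
    """Sum page views and purchase amounts per canonical_id."""
    summary = defaultdict(lambda: {"page_views": 0, "amount": 0})
    for row in site_rows:
        summary[row["canonical_id"]]["page_views"] += row["page_views"]
    for row in purchase_rows:
        summary[row["canonical_id"]]["amount"] += row["amount"]
    return dict(summary)


per_user = aggregate_by_user(site_rows, purchase_rows)
```

Without the canonical_id, the browsing rows for c1 and c2 would be counted as two separate users, and neither could be tied to the purchase records.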
ID Unification for Audience Studio
ID Unification is a fundamental tool for utilizing Audience Studio. It begins by enumerating all source tables (attribute_table, behavior_table) and the keys within these tables.
Outputs for Audience Studio
In addition to the enriched source tables, ID Unification for Audience Studio outputs a master_table containing the canonical_id. This ensures that all tables required for Master Segment creation (master_table, attribute_table, and behavior_table) are prepared with enriched data.
Workflow Examples
The workflow examples in this ID Unification documentation are published on Treasure Boxes. You can try them out in your Treasure Data account.