This page is based on the previous page Learning canonical_id for Cookie ID, so some duplicate explanations have been omitted; please make sure you have read through the previous page first before continuing.
This page deals with a case where the same person is identified by stitching together the td_client_id, td_global_id, td_ssc_id and email of multiple sites. There are four sites, but each table has a different combination of keys that it has.
| site_aaa | site_aaa | site_aaa | site_xxx | site_xxx | site_yyy | site_yyy | site_zzz | site_zzz | |||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 日付 | td_client_id | td_global_id | td_ssc_id | td_ssc_id | td_global_id | td_ssc_id | td_client_id | ||||||
| 1月 | 5日 | aaa_001 | 3rd_001 | ssc_001 | a@ex.com | ssc_001 | |||||||
| 15日 | aaa_001 | 3rd_002 | ssc_001 | zzz_001 | a@ex.com | ||||||||
| 25日 | aaa_001 | 3rd_003 | ssc_001 | ||||||||||
| 2月 | 5日 | aaa_001 | 3rd_004 | ssc_001 | ssc_001 | 3rd_004 | |||||||
| 15日 | ssc_001 | 3rd_005 | a@ex.com | ssc_001 | |||||||||
| 25日 | a@ex.com | ssc_001 | zzz_003 | a@ex.com | |||||||||
| 3月 | 5日 | zzz_003 | a@ex.com | ||||||||||
| 15日 | ssc_001 | 3rd_008 | zzz_003 | a@ex.com | |||||||||
| 25日 | aaa_002 | 3rd_009 | ssc_002 | ssc_001 | 3rd_009 | ||||||||
| 4月 | 5日 | aaa_002 | 3rd_010 | ssc_002 | b@ex.com | ssc_003 | |||||||
| 15日 | b@ex.com | ssc_003 | zzz_004 | b@ex.com | |||||||||
| 25日 | ssc_002 | 3rd_010 | zzz_004 | c@ex.com | |||||||||
| 5月 | 5日 | aaa_003 | 3rd_013 | ssc_003 | ssc_002 | 3rd_013 | |||||||
| 15日 | aaa_003 | 3rd_014 | ssc_003 | ||||||||||
| 25日 | aaa_003 | 3rd_015 | ssc_004 | c@ex.com | ssc_003 | zzz_005 | c@ex.com | ||||||
| 6月 | 5日 | aaa_003 | 3rd_016 | ssc_004 | ssc_003 | 3rd_016 | |||||||
| 15日 | ssc_003 | 3rd_017 | zzz_005 | c@ex.com | |||||||||
| 25日 | c@ex.com | ssc_004 | zzz_005 | c@ex.com |
The above data is assumed to be stored in TD tables with the names site_aaa, site_xxx, site_yyy and site_zzz for each site, as in the site_aaa table below. This table can be generated by using Workflow Samples.
| time | site | td_client_id | td_global_id | td_ssc_id |
|---|---|---|---|---|
| 2023/01/05 | aaa.jp | aaa_001 | 3rd_001 | ssc_001 |
| 2023/01/15 | aaa.jp | aaa_001 | 3rd_002 | ssc_001 |
| 2023/01/25 | aaa.jp | aaa_001 | 3rd_003 | ssc_001 |
| 2023/02/05 | aaa.jp | aaa_001 | 3rd_004 | ssc_001 |
| 2023/02/15 | aaa.jp | |||
| 2023/02/25 | aaa.jp | |||
| 2023/03/05 | aaa.jp | |||
| 2023/03/15 | aaa.jp | |||
| 2023/03/25 | aaa.jp | aaa_002 | 3rd_009 | ssc_002 |
| 2023/04/05 | aaa.jp | aaa_002 | 3rd_010 | ssc_002 |
| 2023/04/15 | aaa.jp | |||
| 2023/04/25 | aaa.jp | |||
| 2023/05/05 | aaa.jp | aaa_003 | 3rd_013 | ssc_003 |
| 2023/05/15 | aaa.jp | aaa_003 | 3rd_014 | ssc_003 |
| 2023/05/25 | aaa.jp | aaa_003 | 3rd_015 | ssc_004 |
| 2023/06/05 | aaa.jp | aaa_003 | 3rd_016 | ssc_004 |
| 2023/06/15 | aaa.jp | |||
| 2023/06/25 | aaa.jp |
| time | site | td_ssc_id | td_global_id |
|---|---|---|---|
| 2023/01/05 | xxx.jp | NULL | NULL |
| 2023/01/15 | xxx.jp | NULL | NULL |
| 2023/01/25 | xxx.jp | NULL | NULL |
| 2023/02/05 | xxx.jp | ssc_001 | 3rd_004 |
| 2023/02/15 | xxx.jp | ssc_001 | 3rd_005 |
| 2023/02/25 | xxx.jp | NULL | NULL |
| 2023/03/05 | xxx.jp | NULL | NULL |
| 2023/03/15 | xxx.jp | ssc_001 | 3rd_008 |
| 2023/03/25 | xxx.jp | ssc_001 | 3rd_009 |
| 2023/04/05 | xxx.jp | NULL | NULL |
| 2023/04/15 | xxx.jp | NULL | NULL |
| 2023/04/25 | xxx.jp | ssc_002 | 3rd_010 |
| 2023/05/05 | xxx.jp | ssc_002 | 3rd_013 |
| 2023/05/15 | xxx.jp | NULL | NULL |
| 2023/05/25 | xxx.jp | NULL | NULL |
| 2023/06/05 | xxx.jp | ssc_003 | 3rd_016 |
| 2023/06/15 | xxx.jp | ssc_003 | 3rd_017 |
| 2023/06/25 | xxx.jp | NULL | NULL |
| time | site | td_ssc_id | |
|---|---|---|---|
| 2023/01/05 | yyy.jp | a@ex.com | ssc_001 |
| 2023/01/15 | yyy.jp | NULL | NULL |
| 2023/01/25 | yyy.jp | NULL | NULL |
| 2023/02/05 | yyy.jp | NULL | NULL |
| 2023/02/15 | yyy.jp | a@ex.com | ssc_001 |
| 2023/02/25 | yyy.jp | a@ex.com | ssc_001 |
| 2023/03/05 | yyy.jp | NULL | NULL |
| 2023/03/15 | yyy.jp | NULL | NULL |
| 2023/03/25 | yyy.jp | NULL | NULL |
| 2023/04/05 | yyy.jp | b@ex.com | ssc_003 |
| 2023/04/15 | yyy.jp | b@ex.com | ssc_003 |
| 2023/04/25 | yyy.jp | NULL | NULL |
| 2023/05/05 | yyy.jp | NULL | NULL |
| 2023/05/15 | yyy.jp | NULL | NULL |
| 2023/05/25 | yyy.jp | c@ex.com | ssc_003 |
| 2023/06/05 | yyy.jp | NULL | NULL |
| 2023/06/15 | yyy.jp | NULL | NULL |
| 2023/06/25 | yyy.jp | c@ex.com | ssc_004 |
| time | site | td_client_id | |
|---|---|---|---|
| 2023/01/05 | zzz.jp | NULL | NULL |
| 2023/01/15 | zzz.jp | zzz_001 | a@ex.com |
| 2023/01/25 | zzz.jp | NULL | NULL |
| 2023/02/05 | zzz.jp | NULL | NULL |
| 2023/02/15 | zzz.jp | NULL | NULL |
| 2023/02/25 | zzz.jp | zzz_003 | a@ex.com |
| 2023/03/05 | zzz.jp | zzz_003 | a@ex.com |
| 2023/03/15 | zzz.jp | zzz_003 | a@ex.com |
| 2023/03/25 | zzz.jp | NULL | NULL |
| 2023/04/05 | zzz.jp | NULL | NULL |
| 2023/04/15 | zzz.jp | zzz_004 | b@ex.com |
| 2023/04/25 | zzz.jp | zzz_004 | c@ex.com |
| 2023/05/05 | zzz.jp | NULL | NULL |
| 2023/05/15 | zzz.jp | NULL | NULL |
| 2023/05/25 | zzz.jp | zzz_005 | c@ex.com |
| 2023/06/05 | zzz.jp | NULL | NULL |
| 2023/06/15 | zzz.jp | zzz_005 | c@ex.com |
| 2023/06/25 | zzz.jp | zzz_005 | c@ex.com |
+call_unification:
http_call>: https://api-cdp.treasuredata.com/unifications/workflow_call
headers:
- authorization: ${secret:td.apikey}
method: POST
retry: true
content_format: json
content:
run_canonical_ids: true
run_enrichments: true
run_master_tables: true
full_refresh: true
keep_debug_tables: true
unification:
!include : unification_ex2.ymlname: test_id_unification_ex2
keys:
- name: td_client_id
- name: td_global_id
- name: td_ssc_id
- name: email
tables:
- database: test_id_unification_ex2
table: site_aaa
key_columns:
- {column: td_client_id, key: td_client_id}
- {column: td_global_id, key: td_global_id}
- {column: td_ssc_id, key: td_ssc_id}
- database: test_id_unification_ex2
table: site_xxx
key_columns:
- {column: td_ssc_id, key: td_ssc_id}
- {column: td_global_id, key: td_global_id}
- database: test_id_unification_ex2
table: site_yyy
key_columns:
- {column: email, key: email}
- {column: td_ssc_id, key: td_ssc_id}
- database: test_id_unification_ex2
table: site_zzz
key_columns:
- {column: td_client_id, key: td_client_id}
- {column: email, key: email}
canonical_ids:
- name: person_id
merge_by_keys: [email, td_ssc_id, td_client_id, td_global_id]
merge_iterations: 5
master_tables:
- name: master_table_ex2
canonical_id: person_id
attributes:
- name: email
source_columns:
- {table: site_yyy, order: last, order_by: time, priority: 1}
- {table: site_zzz, order: last, order_by: time, priority: 1}
- name: td_ssc_id
array_elements: 5
source_columns:
- {table: site_xxx, order: last, order_by: time, priority: 1}
- {table: site_yyy, order: last, order_by: time, priority: 2}
- name: td_client_id
array_elements: 5
source_columns:
- {table: site_aaa, order: last, order_by: time, priority: 1}
- {table: site_zzz, order: last, order_by: time, priority: 4}
- name: td_global_id
array_elements: 5
source_columns:
- {table: site_aaa, order: last, order_by: time, priority: 1}
- {table: site_xxx, order: last, order_by: time, priority: 2}The initial state of the graph is as follows:

The graph_unify_loop_0 table is created within the +extract_and_merge task of the workflow. Let's examine how the graph is constructed.
canonical_ids:
- name: person_id
merge_by_keys: [email, td_ssc_id, td_client_id, td_global_id]
merge_iterations: 5In this configuration of canonical_ids:, merge_by_keys: prioritizes email to ensure the creation of the most universal canonical_id, followed by td_ssc_id.
Using the original data, the graph_unify_loop_0 table is generated through the same steps as in the previous section.

Unlike before, pairs for site_aaa containing three or more keys in a single table are created. In this case, for a single record, the leader and followers are as follows:
- Leader:
td_ssc_id
- Followers:
td_client_idtd_global_idtd_ssc_id
As a result, the record expands into three pairs for each follower (highlighted in green in the figure above).
follower_ns and leader_ns identify which key corresponds to each id. (In this example, 1 is td_client_id, 2 is td_global_id, 3 is td_ssc_id, and 4 is email.) In this configuration, the priority order of ns is [4,3,1,2].
Based on the graph from loop_0, leaders are updated according to the following rules:

The key difference this time is:
- If a leader is connected to a higher-priority leader, it will be replaced before leaders of the same priority adjacent to the follower.
All leaders are replaced with new_leader (unless they are the minimum or maximum priority, in which case they remain unchanged). The table resulting from merging all these replacements is the graph_unify_loop_1 table.


Subsequent loops follow the same process with the graph from the previous iteration.

The process converges in the third loop.

In this example, the order of keys specified in merge_by_keys: prioritizes email as the highest. This is because, unlike other keys, the value of email is less likely to change over time. For users without an email, the td_ssc_id becomes the final leader, and the canonical_id is generated based on this value.

The canonical_id used in the master_table is generally created with all keys specified in merge_by_keys:. However, canonical_id can also be created using a subset of the keys.
That said, the canonical_id configured in the master_table must meet the following condition:
- The configuration must ensure that a
canonical_idis assigned to every record in all source tables. In other words, every source table must have at least one key included inmerge_by_keys:.
For this example (Example 2 data), the keys in each table are as follows:
site_aaatd_client_idtd_global_idtd_ssc_id
site_xxxtd_ssc_idtd_global_id
site_yyyemailtd_ssc_id
site_zzztd_client_idemail
Given this, the merge_by_keys: configurations valid for the master_table are:
- Good:
[email, td_ssc_id, td_global_id, td_client_id] - Good:
[td_ssc_id, td_global_id, td_client_id] - Good:
[email, td_global_id] - Good:
[td_ssc_id, td_client_id]
On the other hand, the following configurations will result in errors:
- Bad:
[td_global_id, td_client_id]- Missing keys from the
site_yyytable.
- Missing keys from the
- Bad:
[email, td_client_id]- Missing keys from the
site_xxxtable.
- Missing keys from the
In such cases, an error like the following will appear:
400 Bad Request: {"master_tables[0].attributes[0].source_columns[0].table":["is invalid because it doesn't have columns merged into this master_table's canonical_id 'person_id'"],"master_tables[0].attributes[1].sou...For example, in the [td_global_id, td_client_id] case, canonical_id is only assigned to records generated using td_global_id and td_client_id. Since the site_yyy table does not include these keys, it cannot be assigned a canonical_id. At least one key from merge_by_keys: (used for stitching) must be present to ensure that every record in all tables is assigned a canonical_id.