Up until the previous page, only one canonical_id was created by setting all the keys specified in merge_by_keys:. In this page, we introduce how to configure multiple canonical_ids.
We will use the data from Learning canonical_id for Cookie ID.
By specifying multiple canonical_ids in canonical_ids:, it is possible to generate multiple independent canonical_ids simultaneously. However, only one canonical_id can be set as the Parent Segment, which is generally the one configured with all keys in merge_by_keys:.
name: test_id_unification_ex3
keys:
- name: td_client_id
invalid_texts: ['']
- name: td_global_id
valid_regexp: "3rd_*"
invalid_texts: ['']
tables:
- database: test_id_unification_ex3
table: site_aaa
key_columns:
- {column: td_client_id, key: td_client_id}
- {column: td_global_id, key: td_global_id}
- database: test_id_unification_ex3
table: site_xxx
key_columns:
- {column: td_client_id, key: td_client_id}
- {column: td_global_id, key: td_global_id}
- database: test_id_unification_ex3
table: site_yyy
key_columns:
- {column: td_client_id, key: td_client_id}
- {column: td_global_id, key: td_global_id}
- database: test_id_unification_ex3
table: site_zzz
key_columns:
- {column: td_client_id, key: td_client_id}
- {column: td_global_id, key: td_global_id}
canonical_ids:
- name: brand_id_ax
merge_by_keys: [td_client_id, td_global_id]
source_tables: ['site_aaa', 'site_xxx']
merge_iterations: 5
- name: brand_id_yz
merge_by_keys: [td_client_id, td_global_id]
source_tables: ['site_yyy', 'site_zzz']
merge_iterations: 5
- name: unified_brand_id
merge_by_keys: [td_client_id, td_global_id]
merge_iterations: 5
master_tables:
- name: master_table_ex3
canonical_id: unified_brand_id
attributes:
- name: td_client_id
invalid_texts: ['']
source_columns:
- {table: 'site_aaa', order: first, order_by: td_client_id, priority: 1}
- {table: 'site_xxx', order: first, order_by: td_client_id, priority: 1}
- {table: 'site_yyy', order: first, order_by: td_client_id, priority: 1}
- {table: 'site_zzz', order: first, order_by: td_client_id, priority: 1}
- name: td_global_id
valid_regexp: "3rd_*"
invalid_texts: ['']
source_columns:
- {table: 'site_aaa', order: last, order_by: time, priority: 1}
- {table: 'site_xxx', order: last, order_by: time, priority: 1}
- {table: 'site_yyy', order: last, order_by: time, priority: 1}
- {table: 'site_zzz', order: last, order_by: time, priority: 1} brand_id_ax is created by performing unification on data from only two sites of the same brand, with the goal of assigning a canonical_id specific to this brand. Similarly, brand_id_yz is configured for another brand. Additionally, a unified_brand_id spanning all brands is created and set for the master_table.
In this configuration, the following three canonical_ids will execute their unification algorithms in parallel:
brand_id_axbrand_id_yzunified_brand_id
Below are the final graphs for each unification algorithm.



In this example, source_tables: is specified as follows in the canonical_ids: configuration:
canonical_ids:
- name: brand_id_ax
merge_by_keys: [td_client_id, td_global_id]
source_tables: ['site_aaa', 'site_xxx']
merge_iterations: 5
- name: brand_id_yz
merge_by_keys: [td_client_id, td_global_id]
source_tables: ['site_yyy', 'site_zzz']
merge_iterations: 5
- name: unified_brand_id
merge_by_keys: [td_client_id, td_global_id]
merge_iterations: 5In this setup, the unification algorithm uses only the keys from the tables specified in source_tables:. This configuration can be useful when you want to generate canonical_ids for specific brands or games in addition to canonical_ids spanning all data sources.
Next, we will introduce how to create a master_table for each canonical_id. Multiple master_table entries can be enumerated as shown below:
master_tables:
- name: master_table_ex3
canonical_id: unified_brand_id
attributes:
- name: td_client_id
invalid_texts: ['']
source_columns:
- {table: '${td.tbl_aaa}', order: first, order_by: td_client_id, priority: 1}
- {table: '${td.tbl_xxx}', order: first, order_by: td_client_id, priority: 1}
- {table: '${td.tbl_yyy}', order: first, order_by: td_client_id, priority: 1}
- {table: '${td.tbl_zzz}', order: first, order_by: td_client_id, priority: 1}
- name: td_global_id
valid_regexp: "3rd_*"
invalid_texts: ['']
source_columns:
- {table: '${td.tbl_aaa}', order: last, order_by: time, priority: 1}
- {table: '${td.tbl_xxx}', order: last, order_by: time, priority: 1}
- {table: '${td.tbl_yyy}', order: last, order_by: time, priority: 1}
- {table: '${td.tbl_zzz}', order: last, order_by: time, priority: 1}
- name: master_table_brand_id_ax
canonical_id: brand_id_ax
attributes:
- name: td_client_id
invalid_texts: ['']
source_columns:
- {table: '${td.tbl_aaa}', order: first, order_by: td_client_id, priority: 1}
- {table: '${td.tbl_xxx}', order: first, order_by: td_client_id, priority: 1}
- name: td_global_id
valid_regexp: "3rd_*"
invalid_texts: ['']
source_columns:
- {table: '${td.tbl_aaa}', order: last, order_by: time, priority: 1}
- {table: '${td.tbl_xxx}', order: last, order_by: time, priority: 1}
- name: master_table_brand_id_yz
canonical_id: brand_id_yz
attributes:
- name: td_client_id
invalid_texts: ['']
source_columns:
- {table: '${td.tbl_yyy}', order: first, order_by: td_client_id, priority: 1}
- {table: '${td.tbl_zzz}', order: first, order_by: td_client_id, priority: 1}
- name: td_global_id
valid_regexp: "3rd_*"
invalid_texts: ['']
source_columns:
- {table: '${td.tbl_yyy}', order: last, order_by: time, priority: 1}
- {table: '${td.tbl_zzz}', order: last, order_by: time, priority: 1}The resulting master_table for each case would look like this:
| unified_brand_id | td_client_id | td_global_id |
|---|---|---|
| jiioKQC2uMFH | aaa_001 | 3rd_018 |
| brand_id_ax | td_client_id | td_global_id |
|---|---|---|
| jiioKQC2uMFH | aaa_001 | 3rd_005 |
| 0_LjXD4ScR1H | aaa_002 | 3rd_010 |
| A0T5jlq77XJH | aaa_003 | 3rd_017 |
| brand_id_yz | td_client_id | td_global_id |
|---|---|---|
| NCnsgr0MusJH | yyy_001 | 3rd_001 |
| UA9xZODmd_pH | yyy_002 | 3rd_008 |
| lb8seLF7YZBH | yyy_003 | 3rd_012 |
| jaUfFmYfVjJH | yyy_004 | 3rd_018 |
| nBTjvWWxqydH | zzz_001 | 3rd_002 |
Even if multiple master_tables are created, only one enriched_table is generated per source table. The key difference from Case1 is that all canonical_ids are enriched into the source tables, allowing linkage with any master_table. However, for brand_id_ax and brand_id_yz specified under source_tables, enrichment is limited to the designated tables. The enriched tables are shown below:
| site | td_client_id | td_global_id | brand_id_ax | unified_brand_id |
|---|---|---|---|---|
| aaa.jp | aaa_001 | 3rd_001 | jiioKQC2uMFH | jiioKQC2uMFH |
| aaa.jp | aaa_001 | 3rd_002 | jiioKQC2uMFH | jiioKQC2uMFH |
| aaa.jp | aaa_001 | 3rd_003 | jiioKQC2uMFH | jiioKQC2uMFH |
| aaa.jp | aaa_001 | 3rd_004 | jiioKQC2uMFH | jiioKQC2uMFH |
Here, the same value appears for both brand_id_ax and unified_brand_id, as both were based on aaa_001, which happened to be the final leader, generating the respective canonical_id. However, the two canonical_ids are not inherently related.
| site | td_client_id | td_global_id | brand_id_ax | unified_brand_id | time |
|---|---|---|---|---|---|
| xxx.jp | 1672876800 | ||||
| xxx.jp | 1673740800 | ||||
| xxx.jp | 1674604800 | ||||
| xxx.jp | xxx_001 | 3rd_004 | jiioKQC2uMFH | jiioKQC2uMFH | 1675555200 |
| xxx.jp | xxx_001 | 3rd_005 | jiioKQC2uMFH | jiioKQC2uMFH | 1676419200 |
| site | td_client_id | td_global_id | brand_id_yz | unified_brand_id | time |
|---|---|---|---|---|---|
| yyy.jp | yyy_001 | 3rd_001 | NCnsgr0MusJH | jiioKQC2uMFH | 1672876800 |
| yyy.jp | 1673740800 | ||||
| yyy.jp | 1674604800 | ||||
| yyy.jp | 1675555200 | ||||
| yyy.jp | yyy_002 | 3rd_005 | UA9xZODmd_pH | jiioKQC2uMFH | 1676419200 |
| site | td_client_id | td_global_id | brand_id_yz | unified_brand_id | time |
|---|---|---|---|---|---|
| zzz.jp | 1672876800 | ||||
| zzz.jp | zzz_001 | 3rd_002 | nBTjvWWxqydH | jiioKQC2uMFH | 1673740800 |
| zzz.jp | 1674604800 | ||||
| zzz.jp | 1675555200 | ||||
| zzz.jp | 1676419200 |
By using merge_by_canonical_ids: in the configuration for each canonical_id within canonical_ids:, you can generate a unified canonical_id by merging other canonical_ids.
canonical_ids:
- name: brand_id_ax
merge_by_keys: [td_client_id, td_global_id]
source_tables: ['site_aaa', 'site_xxx']
merge_iterations: 5
- name: brand_id_yz
merge_by_keys: [td_client_id, td_global_id]
source_tables: ['site_yyy', 'site_zzz']
merge_iterations: 5
- name: unified_brand_id
merge_by_keys: []
merge_by_canonical_ids: [brand_id_ax, brand_id_yz]
merge_iterations: 5In this configuration, unified_brand_id is generated by merging brand_id_ax and brand_id_yz.
Since unified_brand_id is set as the canonical_id for the master_table, it must cover all source tables. Therefore, it is crucial to ensure that the combined source_tables: of brand_id_ax and brand_id_yz cover all necessary tables.
Although the merge_by_keys: for unified_brand_id is empty, the configuration works because merge_by_canonical_ids is specified.
Under this setup, the Unification Algorithm will first process brand_id_ax and brand_id_yz in parallel. Once these are complete, the Unification Algorithm for unified_brand_id will execute.
When merge_by_canonical_ids: is specified, the initial graph of the Unification Algorithm takes on a completely different structure.
In addition to the keys set in merge_by_keys:, the keys from the merge_by_keys: of the canonical IDs specified in merge_by_canonical_ids:, as well as the canonical IDs themselves, are added as keys.
The configuration for unified_brand_id will be used to explain this process.
- name: unified_brand_id
merge_by_keys: []
merge_by_canonical_ids: [brand_id_ax, brand_id_yz]
merge_iterations: 5Under this configuration, graph_unify_loop_0 will look as follows:
Despite having an empty merge_by_keys:, all keys from all data sources and the canonical IDs themselves are included.
The initial graph starts by merging the following graphs, meaning that the process begins with a highly interconnected state:
- The final graph of
brand_id_ax - The final graph of
brand_id_yz - The initial graph generated from the combined source tables of
brand_id_axandbrand_id_yz
The applicable merge_by_keys: configuration is determined by the following rules:
- The canonical IDs specified in
merge_by_canonical_ids:are added as stitching keys. - Then,
merge_by_keys:is determined in the following order of precedence:- Keys in the
merge_by_keys:ofunified_brand_idhave the highest priority. - Canonical IDs from
brand_id_axandbrand_id_yz, as specified inmerge_by_canonical_ids:, have the next highest priority in the order listed. - The
merge_by_keys:settings ofbrand_id_axandbrand_id_yz(excluding keys already used byunified_brand_id) have the lowest priority.
- Keys in the
In this example, the final merge_by_keys: is as follows:
merge_by_keys: [canonical_id (brand_id_ax), canonical_id (brand_id_yz), td_client_id, td_global_id]The Unification Algorithm is applied to graph_unify_loop_0 using this priority order.

| key | canonical_id |
|---|---|
| 04zahkB6hA9G | wONJPFlQsJyj |
In this configuration, the canonical_id is generated based on canonical_id (brand_id_ax). However, generating a canonical_id based on another canonical_id means that any changes in the base keys of the original canonical_id will also affect this one, adding complexity. To avoid this, consider specifying keys in the merge_by_keys: of unified_brand_id:
- name: unified_brand_id
merge_by_keys: [td_global_id]
merge_by_canonical_ids: [brand_id_ax, brand_id_yz]
merge_iterations: 5In this case, the true priority of merge_by_keys: becomes:
merge_by_keys: [td_global_id, canonical_id (brand_id_ax), canonical_id (brand_id_yz), td_client_id]The initial graph remains the same, but the final graph will use td_global_id as the leader.
A canonical_id can be set as an attribute in the Parent Segment.
canonical_ids:
- name: brand_id_ax
merge_by_keys: [td_client_id, td_global_id]
source_tables: ['site_aaa','site_xxx']
merge_iterations: 5
- name: brand_id_yz
merge_by_keys: [td_client_id, td_global_id]
source_tables: ['site_yyy','site_zzz']
merge_iterations: 5
- name: unified_brand_id
merge_by_keys: []
merge_by_canonical_ids: [brand_id_ax, brand_id_yz]
merge_iterations: 5
master_tables:
- name: master_table_ex3
canonical_id: unified_brand_id
attributes:
- name: brand_id_ax
source_canonical_id: brand_id_ax
- name: brand_id_yz
source_canonical_id: brand_id_yz
- name: td_client_id
invalid_texts: ['']
source_columns:
- {table: 'site_aaa', order: first, order_by: td_client_id, priority: 1}
- {table: 'site_xxx', order: first, order_by: td_client_id, priority: 1}
- {table: 'site_yyy', order: first, order_by: td_client_id, priority: 1}
- {table: 'site_zzz', order: first, order_by: td_client_id, priority: 1}
- name: td_global_id
valid_regexp: "3rd_*"
invalid_texts: ['']
source_columns:
- {table: 'site_aaa', order: last, order_by: time, priority: 1}
- {table: 'site_xxx', order: last, order_by: time, priority: 1}
- {table: 'site_yyy', order: last, order_by: time, priority: 1}
- {table: 'site_zzz', order: last, order_by: time, priority: 1} In this configuration, the canonical IDs brand_id_ax and brand_id_yz are set within the attributes: section. Other canonical IDs can also be configured as attributes, but they must always be related to the canonical_id of the master_table (in this case, unified_brand_id) through merge_by_canonical_ids:.
It is also worth reiterating that brand_id_ax and brand_id_yz cannot be set as the canonical ID of the master_table. This is because they do not cover all source tables, which prevents the enriched canonical ID from being applied to all tables.
The output master_table generates records based only on the number of unified_brand_id specified in canonical_id:. It is important to understand that attributes do not influence this record generation.
As seen in Case1, brand_id_ax and brand_id_yz each contained multiple canonical IDs. However, when referenced as attributes, only the canonical ID corresponding to the most recent key (in terms of the time column) is included.
| unified_brand_id | td_client_id | td_global_id | brand_id_ax | brand_id_yz |
|---|---|---|---|---|
| wONJPFlQsJyj | aaa_001 | 3rd_018 | XuCLIRp30bxG | RXcPcKu6CO1G |