As mentioned in About canonical_id, canonical_id is not guaranteed to be immutable. In some cases, a different canonical_id may be assigned to a user who was previously identified as the same person.
To address this issue, a separate mechanism has been introduced to maintain a persistent, robust value similar to canonical_id. This is called persistent_id. This page introduces the characteristics of persistent_id and explains how to configure it.
If persistent_id is used instead of canonical_id, users do not need to set merge_by_keys: to obtain a robust ID.
persistent_id is not supported by do_not_merge_key.
The mechanism for generating values for canonical_id and persistent_id is illustrated using the following examples:
| Day | td_client_id | td_global_id |
|---|---|---|
| 2024-03-01 | aaa_002 | 3rd_001 |
| 2024-03-02 | aaa_001 | 3rd_001 |
Both canonical_id and persistent_id can be configured as follows:

    canonical_ids:
      - name: cid
        merge_by_keys: [td_client_id, td_global_id]
    persistent_ids:
      - name: pid
        merge_by_keys: [td_client_id, td_global_id]

Under this configuration, the smallest value in td_client_id (in this case, aaa_001) is selected as the leader, and the canonical_id is generated based on this value.
For persistent_id, the td_client_id value with the earliest time is chosen as the leader. In this example, aaa_002 (first seen on 2024-03-01) is the leader, and the persistent_id is generated based on this value.
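The two selection rules can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the actual implementation; the function names and the tie-break by value for records with the same time are assumptions.

```python
from datetime import date

# Records from the example table: (day, td_client_id, td_global_id)
records = [
    (date(2024, 3, 1), "aaa_002", "3rd_001"),
    (date(2024, 3, 2), "aaa_001", "3rd_001"),
]

def canonical_leader(rows):
    """Smallest td_client_id value, regardless of when it was seen."""
    return min(client_id for _, client_id, _ in rows)

def persistent_leader(rows):
    """td_client_id of the earliest record (ties broken by value: an assumption)."""
    earliest = min(rows, key=lambda r: (r[0], r[1]))
    return earliest[1]

print(canonical_leader(records))   # aaa_001
print(persistent_leader(records))  # aaa_002
```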
To understand how persistent_id is generated, let’s examine the transition of the leader as new data is added each day.
| Day | td_client_id | td_global_id |
|---|---|---|
| 2024-03-01 | aaa_002 | 3rd_001 |
The leaders for each ID on Day 1 are as follows:
| Day | Leader (canonical_id) | Leader (persistent_id) |
|---|---|---|
| 2024-03-01 | aaa_002 | aaa_002 |
On Day 2, a new record is added:
| Day | td_client_id | td_global_id |
|---|---|---|
| 2024-03-01 | aaa_002 | 3rd_001 |
| 2024-03-02 | aaa_001 | 3rd_001 |
On Day 2, the leaders are as follows:
| Day | Leader (canonical_id) | Leader (persistent_id) |
|---|---|---|
| 2024-03-01 | aaa_002 | aaa_002 |
| 2024-03-02 | aaa_001 | aaa_002 |
The leader for canonical_id changes from the previous day, resulting in a new canonical_id value. However, the leader for persistent_id remains unchanged, so the same persistent_id value is retained.
Because the leader for persistent_id is always the key value with the earliest time, it does not change when new key values arrive later, so the persistent_id value stays stable.
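The day-by-day transition can be reproduced with a small simulation. This is a minimal sketch that recomputes both leaders after each day's record is added; the function name and the recompute-from-scratch approach are assumptions made for illustration.

```python
from datetime import date

# Records from the example: (day, td_client_id, td_global_id)
daily_records = [
    (date(2024, 3, 1), "aaa_002", "3rd_001"),
    (date(2024, 3, 2), "aaa_001", "3rd_001"),
]

def leaders_by_day(rows):
    """Return (day, canonical leader, persistent leader) after each record arrives."""
    seen = []
    history = []
    for row in sorted(rows):
        seen.append(row)
        # canonical_id: smallest td_client_id value seen so far
        canonical = min(client_id for _, client_id, _ in seen)
        # persistent_id: td_client_id of the earliest record seen so far
        persistent = min(seen, key=lambda r: (r[0], r[1]))[1]
        history.append((row[0].isoformat(), canonical, persistent))
    return history

for entry in leaders_by_day(daily_records):
    print(entry)
# ('2024-03-01', 'aaa_002', 'aaa_002')
# ('2024-03-02', 'aaa_001', 'aaa_002')
```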
When two individuals are linked, persistent_id ensures that the value of the earlier key remains as the leader. Consider the following example:
| Day | site_aaa td_client_id | site_aaa td_global_id | site_bbb td_client_id | site_bbb td_global_id |
|---|---|---|---|---|
| 2024-03-01 | bbb_001 | 3rd_001 | | |
| 2024-03-02 | aaa_001 | 3rd_002 | | |
| 2024-03-03 | aaa_001 | 3rd_001 | | |
On Day 2, the two individuals share no key values and are still separate, so each has its own leader:
| Day | Leader (canonical_id) | Leader (persistent_id) |
|---|---|---|
| 2024-03-02 | bbb_001 | bbb_001 |
| Day | Leader (canonical_id) | Leader (persistent_id) |
|---|---|---|
| 2024-03-02 | aaa_001 | aaa_001 |
When Person 1 and Person 2 are linked on Day 3:
| Day | Leader (canonical_id) | Leader (persistent_id) |
|---|---|---|
| 2024-03-03 | aaa_001 | bbb_001 |
In this case, canonical_id merges into the leader of Person 1, while persistent_id merges into the leader of Person 2.
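The linking behavior can be sketched as follows. Records that share any key value are grouped into one individual, and the two leader rules are then applied per group. The grouping function is a simplified stand-in for the real stitching process, and all names here are assumptions.

```python
from datetime import date

# Records from the example: (day, td_client_id, td_global_id)
records = [
    (date(2024, 3, 1), "bbb_001", "3rd_001"),
    (date(2024, 3, 2), "aaa_001", "3rd_002"),
    (date(2024, 3, 3), "aaa_001", "3rd_001"),
]

def group_records(rows):
    """Group records that share any key value (simplified stitching)."""
    groups = []  # each group: {"keys": set of (key name, value), "rows": list}
    for row in rows:
        merged = {
            "keys": {("td_client_id", row[1]), ("td_global_id", row[2])},
            "rows": [row],
        }
        remaining = []
        for group in groups:
            if group["keys"] & merged["keys"]:
                # The new record links this group to the merged one
                merged["keys"] |= group["keys"]
                merged["rows"] += group["rows"]
            else:
                remaining.append(group)
        groups = remaining + [merged]
    return groups

def leaders(rows):
    """Return sorted (canonical leader, persistent leader) pairs, one per individual."""
    result = []
    for group in group_records(rows):
        canonical = min(r[1] for r in group["rows"])
        persistent = min(group["rows"], key=lambda r: (r[0], r[1]))[1]
        result.append((canonical, persistent))
    return sorted(result)

print(leaders(records[:2]))  # Day 2: [('aaa_001', 'aaa_001'), ('bbb_001', 'bbb_001')]
print(leaders(records))      # Day 3: [('aaa_001', 'bbb_001')]
```

After the Day 3 record links the two groups, the canonical leader is the smallest value (aaa_001), while the persistent leader is the earliest value (bbb_001).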
There are two scenarios where persistent_id can change:
- When past records are added or deleted: Changes to past records may affect the leader selection.
- When time is deprioritized in merge_by_keys: The key priority can be explicitly set to override time.
For instance:

    persistent_ids:
      - name: pid
        merge_by_keys: [td_client_id, time, td_global_id]

This configuration prioritizes td_client_id over time, potentially leading to changes in the persistent_id value when higher-priority keys are introduced later.
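The effect of placing a key ahead of time can be illustrated with a hypothetical ordering model. The exact ordering rules are not specified on this page, so the model below is an assumption: keys listed before time outrank everything else, and the remaining candidates are ordered by earliest time. The sample data (a user first seen with only td_global_id) is also made up for illustration.

```python
from datetime import date

def pick_leader(candidates, merge_by_keys):
    """Pick a leader value under an assumed priority model.

    candidates: list of (key_name, value, first_seen).
    Assumption: keys listed before "time" win outright; otherwise earliest time wins.
    If "time" is not listed, it is treated as the highest priority.
    """
    time_pos = merge_by_keys.index("time") if "time" in merge_by_keys else 0

    def sort_key(candidate):
        key_name, value, first_seen = candidate
        pos = merge_by_keys.index(key_name)
        rank = pos if pos < time_pos else time_pos
        return (rank, first_seen, pos, value)

    return min(candidates, key=sort_key)[1]

# Hypothetical data: only td_global_id is known on Day 1; td_client_id arrives on Day 2
day1 = [("td_global_id", "3rd_001", date(2024, 3, 1))]
day2 = day1 + [("td_client_id", "aaa_001", date(2024, 3, 2))]

client_first = ["td_client_id", "time", "td_global_id"]
time_first = ["time", "td_client_id", "td_global_id"]

print(pick_leader(day1, client_first), pick_leader(day2, client_first))  # 3rd_001 aaa_001
print(pick_leader(day1, time_first), pick_leader(day2, time_first))      # 3rd_001 3rd_001
```

Under this model, the leader changes on Day 2 only when td_client_id is listed ahead of time, which matches the scenario described above.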