About persistent_id
As mentioned in About canonical_id, canonical_id is not guaranteed to be immutable. In some cases, a user who was previously identified as the same person may be assigned a different canonical_id.
To address this issue, a separate mechanism called persistent_id has been introduced to maintain a persistent, robust value similar to canonical_id. This page introduces the characteristics of persistent_id and explains how to configure it.
Note
If persistent_id is used instead of canonical_id, users do not need to set merge_by_keys: to obtain a robust ID.
Warning
persistent_id is not supported by do_not_merge_key.
Mechanism for persistent_id to Retain a Robust Value
The mechanism for generating values for canonical_id and persistent_id is illustrated using the following examples:
Day | td_client_id | td_global_id |
---|---|---|
2024-03-01 | aaa_002 | 3rd_001 |
2024-03-02 | aaa_001 | 3rd_001 |
Both canonical_id and persistent_id can be configured as follows:
canonical_ids:
  - name: cid
    merge_by_keys: [td_client_id, td_global_id]
persistent_ids:
  - name: pid
    merge_by_keys: [td_client_id, td_global_id]
How canonical_id is Generated
Under this configuration, the smallest value in td_client_id (in this case, aaa_001) is selected as the leader, and the canonical_id is generated based on this value.
How persistent_id is Generated
For persistent_id, the td_client_id value with the earliest time is chosen as the leader. In this example, aaa_002 is the leader, and the persistent_id is generated based on this value.
Example 1: Daily Transition of Key Values Selected as Leader
To understand how persistent_id is generated, let’s examine the transition of the leader as new data is added each day.
Day 1
Day | td_client_id | td_global_id |
---|---|---|
2024-03-01 | aaa_002 | 3rd_001 |
The leaders for each ID on Day 1 are as follows:
Day | Leader (canonical_id) | Leader (persistent_id) |
---|---|---|
2024-03-01 | aaa_002 | aaa_002 |
Day 2
Day | td_client_id | td_global_id |
---|---|---|
2024-03-01 | aaa_002 | 3rd_001 |
2024-03-02 | aaa_001 | 3rd_001 |
On Day 2, the leaders are as follows:
Day | Leader (canonical_id) | Leader (persistent_id) |
---|---|---|
2024-03-01 | aaa_002 | aaa_002 |
2024-03-02 | aaa_001 | aaa_002 |
The leader for canonical_id changes from the previous day, resulting in a new canonical_id value. However, the leader for persistent_id remains unchanged, so the same persistent_id value is retained.
This mechanism keeps persistent_id stable: the leader is always the earliest value by time, so it does not change when new key values arrive later.
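The daily transition above can be replayed with a short sketch. This is a simplified model of the leader selection described on this page, not the actual implementation: the function name leaders_as_of is hypothetical, "smallest" is assumed to mean plain string comparison, and ties on the same day are assumed to be broken by value. Both records share td_global_id 3rd_001, so they are treated as one person.

```python
from datetime import date

# Records from Example 1 as (day, td_client_id); both belong to one person.
rows = [
    (date(2024, 3, 1), "aaa_002"),
    (date(2024, 3, 2), "aaa_001"),
]

def leaders_as_of(rows, day):
    """Return (canonical leader, persistent leader) using data up to `day`."""
    seen = [r for r in rows if r[0] <= day]
    # canonical_id model: the smallest td_client_id value wins.
    canonical = min(client_id for _, client_id in seen)
    # persistent_id model: the earliest record wins (assumed tie-break: value).
    persistent = min(seen, key=lambda r: (r[0], r[1]))[1]
    return canonical, persistent

for day, _ in rows:
    canonical, persistent = leaders_as_of(rows, day)
    print(day.isoformat(), canonical, persistent)
# 2024-03-01 aaa_002 aaa_002
# 2024-03-02 aaa_001 aaa_002
```

In this model only the canonical leader moves on Day 2, which matches the tables above.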
Example 2: Behavior When Two Individuals Are Linked
When two individuals are linked, persistent_id ensures that the value of the earlier key remains as the leader. Consider the following example:
Day | site_aaa td_client_id | site_aaa td_global_id | site_bbb td_client_id | site_bbb td_global_id |
---|---|---|---|---|
2024-03-01 | bbb_001 | 3rd_001 | | |
2024-03-02 | aaa_001 | 3rd_002 | | |
2024-03-03 | aaa_001 | 3rd_001 | | |
Day 2
Person 1
Day | Leader (canonical_id) | Leader (persistent_id) |
---|---|---|
2024-03-02 | bbb_001 | bbb_001 |
Person 2
Day | Leader (canonical_id) | Leader (persistent_id) |
---|---|---|
2024-03-02 | aaa_001 | aaa_001 |
Day 3
When Person 1 and Person 2 are linked on Day 3:
Day | Leader (canonical_id) | Leader (persistent_id) |
---|---|---|
2024-03-03 | aaa_001 | bbb_001 |
In this case, canonical_id merges into the leader of Person 2 (aaa_001, the smallest value), while persistent_id merges into the leader of Person 1 (bbb_001, the earliest value).
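The linking behavior can be sketched by grouping records that share any key value and then picking one leader per group. This is an illustrative model only: the union-find grouping, the function names find and leaders_as_of, and the tie-breaking rule are assumptions, and the site columns from the table are ignored.

```python
from datetime import date

# Records from Example 2 as (day, td_client_id, td_global_id).
rows = [
    (date(2024, 3, 1), "bbb_001", "3rd_001"),
    (date(2024, 3, 2), "aaa_001", "3rd_002"),
    (date(2024, 3, 3), "aaa_001", "3rd_001"),
]

def find(parent, key):
    """Return the root of the group containing `key` (with path halving)."""
    while parent[key] != key:
        parent[key] = parent[parent[key]]
        key = parent[key]
    return key

def leaders_as_of(rows, day):
    """Return sorted (canonical leader, persistent leader) pairs, one per person."""
    seen = [r for r in rows if r[0] <= day]
    parent = {}
    for _, client_id, global_id in seen:
        # Namespace the keys so values from different columns cannot collide.
        a, b = ("td_client_id", client_id), ("td_global_id", global_id)
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        # Keys that appear in the same record belong to the same person.
        parent[find(parent, a)] = find(parent, b)
    groups = {}
    for row in seen:
        root = find(parent, ("td_client_id", row[1]))
        groups.setdefault(root, []).append(row)
    result = set()
    for members in groups.values():
        canonical = min(r[1] for r in members)                    # smallest value
        persistent = min(members, key=lambda r: (r[0], r[1]))[1]  # earliest record
        result.add((canonical, persistent))
    return sorted(result)

print(leaders_as_of(rows, date(2024, 3, 2)))
# [('aaa_001', 'aaa_001'), ('bbb_001', 'bbb_001')]
print(leaders_as_of(rows, date(2024, 3, 3)))
# [('aaa_001', 'bbb_001')]
```

On Day 3 the shared td_global_id 3rd_001 joins the two groups, and the two leader rules diverge exactly as in the table.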
Cases Where persistent_id May Change
There are two scenarios where persistent_id can change:
- When past records are added or deleted: Changes to past records may affect the leader selection.
- When time is deprioritized in merge_by_keys: The key priority can be explicitly set to override time.
For instance:
persistent_ids:
  - name: pid
    merge_by_keys: [td_client_id, time, td_global_id]
This configuration prioritizes td_client_id over time, potentially leading to changes in the persistent_id value when higher-priority keys are introduced later.
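To illustrate why deprioritizing time can change the leader, here is a rough sketch under stated assumptions. The sample rows are hypothetical (they are not taken from this page), the names sort_key and leader are made up for illustration, and the ranking rule is only one plausible reading of the priority order: keys listed before time outrank it, and keys listed after it act as tie-breakers. It also assumes that time is the top priority when it is not listed explicitly.

```python
from datetime import date

# Hypothetical data: the first record has no td_client_id yet.
rows = [
    (date(2024, 3, 1), {"td_global_id": "3rd_001"}),
    (date(2024, 3, 2), {"td_client_id": "aaa_001", "td_global_id": "3rd_001"}),
]

def sort_key(candidate, order):
    """Rank a (key name, value, day) candidate under a merge_by_keys order."""
    name, value, day = candidate
    time_pos = order.index("time")
    key_pos = order.index(name)
    # Keys listed before `time` outrank it; all others share the rank of `time`.
    rank = key_pos if key_pos < time_pos else time_pos
    return (rank, day, key_pos, value)

def leader(rows, day, order):
    """Return the leader value using records up to `day`."""
    candidates = [
        (name, value, d)
        for d, keys in rows if d <= day
        for name, value in keys.items()
    ]
    return min(candidates, key=lambda c: sort_key(c, order))[1]

time_first = ["time", "td_client_id", "td_global_id"]    # assumed default behavior
client_first = ["td_client_id", "time", "td_global_id"]  # order from the config above

for order in (time_first, client_first):
    print([leader(rows, d, order) for d, _ in rows])
# ['3rd_001', '3rd_001']
# ['3rd_001', 'aaa_001']
```

With time ranked first the leader stays fixed, while the td_client_id-first order switches to aaa_001 once that key appears on the second day.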