About persistent_id

As mentioned in About canonical_id, canonical_id is not guaranteed to be immutable. In some cases, a different canonical_id may be assigned to a user who was previously identified as the same person.

To address this issue, a separate mechanism has been introduced to maintain a persistent, robust value similar to canonical_id. This is called persistent_id. This page introduces the characteristics of persistent_id and explains how to configure it.

Note

If persistent_id is used instead of canonical_id, users do not need to set merge_by_keys: to obtain a robust ID.

Warning

persistent_id does not support do_not_merge_key.

Mechanism for persistent_id to Retain a Robust Value

The mechanism for generating values for canonical_id and persistent_id is illustrated using the following examples:

Day td_client_id td_global_id
2024-03-01 aaa_002 3rd_001
2024-03-02 aaa_001 3rd_001

Both canonical_id and persistent_id can be configured as follows:

canonical_ids:
  - name: cid
    merge_by_keys: [td_client_id, td_global_id]

persistent_ids:
  - name: pid
    merge_by_keys: [td_client_id, td_global_id]

How canonical_id is Generated

Under this configuration, the smallest value in td_client_id (in this case, aaa_001) is selected as the leader, and the canonical_id is generated based on this value.

How persistent_id is Generated

For persistent_id, the td_client_id value with the earliest time is chosen as the leader. In this example, aaa_002 (first seen on 2024-03-01) is the leader, and the persistent_id is generated based on this value.
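The two selection rules can be compared side by side with a small Python sketch. This is a toy model of the rules as described on this page, not the product's actual implementation; the function names canonical_leader and persistent_leader, the record layout, and the tie-breaking rule are assumptions made for illustration.

```python
# Toy model of leader selection (illustrative only).
records = [
    {"day": "2024-03-01", "td_client_id": "aaa_002", "td_global_id": "3rd_001"},
    {"day": "2024-03-02", "td_client_id": "aaa_001", "td_global_id": "3rd_001"},
]


def canonical_leader(rows, key="td_client_id"):
    # Smallest key value, regardless of when it was first seen.
    return min(row[key] for row in rows)


def persistent_leader(rows, key="td_client_id"):
    # Key value of the earliest record. ISO dates sort chronologically as
    # strings. Ties on the day fall back to the smaller key value, which is
    # an assumption of this sketch.
    earliest = min(rows, key=lambda row: (row["day"], row[key]))
    return earliest[key]


print(canonical_leader(records))   # aaa_001
print(persistent_leader(records))  # aaa_002
```

The only difference between the two functions is the sort criterion: value order for canonical_id, time order for persistent_id.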

Example 1: Daily Transition of Key Values Selected as Leader

To understand how persistent_id is generated, let’s examine the transition of the leader as new data is added each day.

Day 1

Day td_client_id td_global_id
2024-03-01 aaa_002 3rd_001

The leaders for each ID on Day 1 are as follows:

Day Leader (canonical_id) Leader (persistent_id)
2024-03-01 aaa_002 aaa_002

Day 2

Day td_client_id td_global_id
2024-03-01 aaa_002 3rd_001
2024-03-02 aaa_001 3rd_001

On Day 2, the leaders are as follows:

Day Leader (canonical_id) Leader (persistent_id)
2024-03-01 aaa_002 aaa_002
2024-03-02 aaa_001 aaa_002

The leader for canonical_id changes from the previous day, resulting in a new canonical_id value. However, the leader for persistent_id remains unchanged, so the same persistent_id value is retained.

This mechanism keeps persistent_id stable: the leader is always the key value with the earliest time, so it does not change when new key values arrive later.
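The daily transition in Example 1 can be reproduced with a short Python sketch that recomputes both leaders from all records seen up to each day. The helper name leaders_by_day and the record layout are hypothetical, and the logic is a simplified model of the rules described here rather than the actual unification algorithm.

```python
records = [
    {"day": "2024-03-01", "td_client_id": "aaa_002"},
    {"day": "2024-03-02", "td_client_id": "aaa_001"},
]


def leaders_by_day(rows):
    """Return (day, canonical leader, persistent leader) for each day."""
    history = []
    for day in sorted({r["day"] for r in rows}):
        # Only records up to and including this day are visible.
        seen = [r for r in rows if r["day"] <= day]
        # canonical_id: smallest key value seen so far.
        canonical = min(r["td_client_id"] for r in seen)
        # persistent_id: key value of the earliest record seen so far.
        persistent = min(seen, key=lambda r: (r["day"], r["td_client_id"]))
        history.append((day, canonical, persistent["td_client_id"]))
    return history


for row in leaders_by_day(records):
    print(row)
```

Under these assumptions the canonical leader moves from aaa_002 to aaa_001 on the second day, while the persistent leader stays at aaa_002.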

Example 2: Behavior When Two Individuals Are Linked

When two individuals are linked, persistent_id ensures that the value of the earlier key remains as the leader. Consider the following example:

Day         site_aaa td_client_id  site_aaa td_global_id  site_bbb td_client_id  site_bbb td_global_id
2024-03-01  -                      -                      bbb_001                3rd_001
2024-03-02  aaa_001                3rd_002                -                      -
2024-03-03  aaa_001                3rd_001                -                      -

Day 2

Person 1

Day Leader (canonical_id) Leader (persistent_id)
2024-03-02 bbb_001 bbb_001

Person 2

Day Leader (canonical_id) Leader (persistent_id)
2024-03-02 aaa_001 aaa_001

Day 3

When Person 1 and Person 2 are linked on Day 3:

Day Leader (canonical_id) Leader (persistent_id)
2024-03-03 aaa_001 bbb_001

In this case, canonical_id merges into the leader of Person 2 (aaa_001, the smaller value), while persistent_id merges into the leader of Person 1 (bbb_001, the earlier key).
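The linking in Example 2 can be modeled in Python with a small union-find over key values: each record connects its td_client_id to its td_global_id, and a leader is then picked per connected group. This is a sketch under assumptions. The functions find and leaders, the tuple-based record layout, and the use of union-find are illustrative choices, not a description of how the product is implemented.

```python
records = [
    ("2024-03-01", "bbb_001", "3rd_001"),  # site_bbb
    ("2024-03-02", "aaa_001", "3rd_002"),  # site_aaa
    ("2024-03-03", "aaa_001", "3rd_001"),  # site_aaa, links both persons
]


def find(parent, x):
    # Follow parent links to the group root, halving the path on the way.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def leaders(rows):
    """Return sorted (canonical leader, persistent leader) pairs per person."""
    parent, first_seen = {}, {}
    for day, client_id, global_id in rows:
        c, g = ("client", client_id), ("global", global_id)
        parent.setdefault(c, c)
        parent.setdefault(g, g)
        first_seen[client_id] = min(day, first_seen.get(client_id, day))
        parent[find(parent, c)] = find(parent, g)  # link the two keys
    groups = {}
    for client_id in first_seen:
        root = find(parent, ("client", client_id))
        groups.setdefault(root, []).append(client_id)
    result = []
    for members in groups.values():
        canonical = min(members)  # smallest value
        persistent = min(members, key=lambda m: (first_seen[m], m))  # earliest
        result.append((canonical, persistent))
    return sorted(result)


print(leaders(records[:2]))  # two separate persons on Day 2
print(leaders(records))      # one merged person on Day 3
```

With the first two records there are two groups, each led by its own td_client_id. Once the third record links them, the merged group has canonical leader aaa_001 and persistent leader bbb_001, matching the Day 3 table.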

Cases Where persistent_id May Change

There are two scenarios where persistent_id can change:

  1. When past records are added or deleted: Changes to past records may affect the leader selection.
  2. When time is deprioritized in merge_by_keys: The key priority can be set explicitly so that another key overrides time.
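The first scenario can be illustrated with a short Python sketch. It assumes the simplified rule that the persistent leader is the td_client_id of the earliest record; the function name persistent_leader and the backfilled record (aaa_003, dated 2024-02-28) are hypothetical.

```python
def persistent_leader(rows):
    # Earliest record wins; ties on the day fall back to the smaller key
    # value (an assumption of this sketch).
    earliest = min(rows, key=lambda r: (r["day"], r["td_client_id"]))
    return earliest["td_client_id"]


rows = [
    {"day": "2024-03-01", "td_client_id": "aaa_002"},
    {"day": "2024-03-02", "td_client_id": "aaa_001"},
]
before = persistent_leader(rows)

# Hypothetical backfill: a record dated before every existing one.
backfilled = rows + [{"day": "2024-02-28", "td_client_id": "aaa_003"}]
after_backfill = persistent_leader(backfilled)

# Deleting the earliest record moves the leader to the next-earliest key.
after_delete = persistent_leader(rows[1:])

print(before, after_backfill, after_delete)
```

In both cases the earliest record for the person changes, so the leader, and with it the persistent_id, changes as well.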

For instance, the following configuration places time after td_client_id:

persistent_ids:
  - name: pid
    merge_by_keys: [td_client_id, time, td_global_id]

This configuration prioritizes td_client_id over time, potentially leading to changes in the persistent_id value when higher-priority keys are introduced later.
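One way to picture the effect of this ordering is the Python sketch below. It is a speculative toy model: it assumes that the entries of merge_by_keys act as sort criteria in list order, that time is implicitly first when it is not listed, and that each observation carries a single key. The function leader and the observation layout are hypothetical, and the actual behavior of the product may differ.

```python
observations = [
    # (time, key name, key value)
    ("2024-03-01", "td_global_id", "3rd_001"),
    ("2024-03-02", "td_client_id", "aaa_001"),
]


def leader(obs, priority):
    """Pick the leader by comparing observations on each priority entry."""
    def rank(o):
        time, key_name, _value = o
        parts = []
        for entry in priority:
            if entry == "time":
                parts.append(time)  # earlier time ranks first
            else:
                # Observations of this key rank ahead of all other keys.
                parts.append(0 if key_name == entry else 1)
        return tuple(parts)
    return min(obs, key=rank)[2]


time_first = ["time", "td_client_id", "td_global_id"]
client_first = ["td_client_id", "time", "td_global_id"]

# Time first: the leader is stable as later data arrives.
print(leader(observations[:1], time_first), leader(observations, time_first))
# td_client_id first: the leader changes once a td_client_id appears.
print(leader(observations[:1], client_first), leader(observations, client_first))
```

Under these assumptions the time-first ordering keeps 3rd_001 as the leader on both days, whereas the td_client_id-first ordering switches the leader from 3rd_001 to aaa_001 when the higher-priority key is introduced on the second day.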