Update entity.md #231

Open
wants to merge 1 commit into
base: master
6 changes: 3 additions & 3 deletions docs/what/entity.md
@@ -140,20 +140,20 @@ further denormalize the `Winner` table.

# How to delete an entity?

-We purposely made all [metadata aspects](aspect.md) immutable, i.e. each edit results in a new version created with no
+We intentionally made all [metadata aspects](aspect.md) immutable, i.e. each edit results in a new version created with no
easy way to remove a specific version. However, since the existence of an entity is determined by the existence of its
associated metadata aspects, it seems that there's no easy way to delete an entity. In fact, this is echoed by the fact
that [GMS](gms.md) doesn't actually provide any `DELETE` API!

The main reason for choosing this append-only design is that a lot of metadata is valuable and irrecoverable once lost,
-e.g. information curated by human or a lineage produced by a one-off pipeline. Audit trial is also extremely imporatnt
+e.g. information curated by human or a lineage produced by a one-off pipeline. Audit trail is also extremely important
when it comes to sensitive metadata such as privacy settings, access control etc. We really don't want to wipe out the
metadata aspects thinking that the entity is no longer needed—to then regret the decision a year later.

Having said that, cluttering your catalog or graph with deleted entities is also undesirable and can lead to a lot of
confusion. To strike a balance, we decided to introduce a special
[`Status`](https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/common/Status.pdl)
aspect to indicate if the entity is deleted or not. All aspects of an entity can now live forever, while the entity
-itself can be "soft deleted" by flipping a flag in the `Status` aspect. The flag is then repsected by the search index &
+itself can be "soft deleted" by flipping a flag in the `Status` aspect. The flag is then respected by the search index &
graph builders when populating the indices. To keep the storage space in check, one can even implement a garbage
collector, which regularly clears out aspects of entities that have been soft-deleted for a long time.
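
The soft-delete flow described in this section can be illustrated with a small in-memory sketch. It is not the GMS implementation: `AspectStore`, `AspectVersion`, the `"status"` aspect key, the `removed` field name, and the retention parameter are all hypothetical names chosen for illustration. It only shows the idea of an append-only version history, a flag that index builders respect, and an optional garbage collector.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AspectVersion:
    """One immutable version of an aspect (hypothetical model)."""
    version: int
    value: dict
    created_at: float  # seconds since epoch


class AspectStore:
    """Append-only store keyed by (entity URN, aspect name)."""

    def __init__(self) -> None:
        self._versions: Dict[Tuple[str, str], List[AspectVersion]] = {}

    def write(self, urn: str, aspect: str, value: dict, now: float) -> AspectVersion:
        # Every edit appends a new version; older versions are never modified.
        history = self._versions.setdefault((urn, aspect), [])
        record = AspectVersion(len(history), dict(value), now)
        history.append(record)
        return record

    def latest(self, urn: str, aspect: str) -> Optional[AspectVersion]:
        history = self._versions.get((urn, aspect))
        return history[-1] if history else None

    def soft_delete(self, urn: str, now: float) -> AspectVersion:
        # "Deleting" is just another write: a new status version with the flag set.
        # The field name `removed` is an assumption made for this sketch.
        return self.write(urn, "status", {"removed": True}, now)

    def is_removed(self, urn: str) -> bool:
        status = self.latest(urn, "status")
        return bool(status and status.value.get("removed", False))

    def searchable_urns(self) -> List[str]:
        # What an index builder would see: soft-deleted entities are skipped.
        urns = {urn for urn, _ in self._versions}
        return sorted(u for u in urns if not self.is_removed(u))

    def collect_garbage(self, now: float, retention_seconds: float) -> List[str]:
        # Physically drop all aspects of entities soft-deleted long enough ago.
        expired = []
        for urn in {urn for urn, _ in self._versions}:
            status = self.latest(urn, "status")
            if (status and status.value.get("removed", False)
                    and now - status.created_at >= retention_seconds):
                expired.append(urn)
        for urn in expired:
            for key in [k for k in self._versions if k[0] == urn]:
                del self._versions[key]
        return sorted(expired)
```

In this sketch a soft-deleted entity disappears from `searchable_urns()` immediately, while its aspects stay readable through `latest()` until `collect_garbage()` is called with a timestamp past the retention window.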