update Entity Integrity
dimitri-yatsenko committed Sep 4, 2024
1 parent 3d03974 commit caccaf5
Showing 1 changed file with 62 additions and 105 deletions.
167 changes: 62 additions & 105 deletions book/30-schema-design/040-entity-integrity.ipynb
@@ -4,118 +4,75 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Entity Integrity \n",
"# Entity Integrity\n",
"\n",
"Entity integrity is the guarantee given by the data management process of 1-to-1 correspondence between entities in the real world and their digital representations. The database cannot do it on its own.\n",
"Databases are designed to support real-world processes, and to do so effectively, they must accurately represent real-world entities and relationships. One of the most critical requirements for reliable database operation is **entity integrity**.\n",
"\n",
"To enforce this, databases enforce the uniqueness of the primary key."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import datajoint as dj\n",
"schema = dj.schema('dimitri_university')"
"Entity integrity ensures a 1:1 correspondence between real-world entities and their digital representations in the database.\n",
"In other words, each real-world object or entity must be represented by exactly one unique record in the database, and each record must correspond to a single, distinct real-world entity.\n",
"\n",
"What difficulties can you foresee when entity integrity breaks down at the businesses you interact with regularly?\n",
"\n",
"For example, what would happen if your university or your dentist's office had two different identifiers for you in their records? What would happen if they occasionally updated your records with another person's information?\n",
"\n",
"# The Foundation of Data Integrity\n",
"\n",
"Entity integrity is the foundation for other aspects of data integrity, such as referential integrity, which relies on the consistent and accurate referencing of records across tables. Without entity integrity, it is impossible to maintain other forms of integrity within the database. For example, a foreign key relationship assumes that every referenced entity exists uniquely and correctly in the database—an assumption that can only hold true if entity integrity is enforced.\n",
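As a minimal sketch of how referential integrity leans on entity integrity, the following uses Python's built-in `sqlite3` module (chosen here only because it needs no database server). The `student` and `enrollment` tables and their columns are hypothetical. The foreign key in `enrollment` points at the primary key of `student`, so each enrollment refers to exactly one student, and a reference to a nonexistent student is rejected.

```python
import sqlite3

# In-memory database; table and column names are hypothetical illustrations.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# Parent table: one row per student, identified by its primary key.
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        full_name  TEXT NOT NULL
    )""")

# Child table: every enrollment must reference an existing student.
conn.execute("""
    CREATE TABLE enrollment (
        student_id  INTEGER NOT NULL REFERENCES student(student_id),
        course_code TEXT    NOT NULL,
        PRIMARY KEY (student_id, course_code)
    )""")

conn.execute("INSERT INTO student VALUES (1, 'Carol Fair')")
conn.execute("INSERT INTO enrollment VALUES (1, 'BIO101')")  # refers to exactly one student

# A reference to a student that does not exist is rejected.
try:
    conn.execute("INSERT INTO enrollment VALUES (2, 'BIO101')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```

SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued on the connection; MySQL (InnoDB) and PostgreSQL enforce them by default.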
"\n",
"# Challenges in Ensuring Entity Integrity\n",
"\n",
"The challenge of ensuring entity integrity lies in the fact that it cannot be fully solved by the database system alone. A reliable system for identifying objects in the real world must be established outside the database to ensure that each entity has a unique identifier that can be consistently used across all related data records.\n",
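The following sketch, assuming Python's built-in `sqlite3` module and a hypothetical `person` table, illustrates why the database cannot solve this problem alone. The key is generated by the database itself, so every row is technically unique, yet the same person entered twice ends up as two separate records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table whose key is generated by the database, not by
# an identification process in the real world.
conn.execute("""
    CREATE TABLE person (
        person_id     INTEGER PRIMARY KEY AUTOINCREMENT,
        full_name     TEXT NOT NULL,
        date_of_birth TEXT NOT NULL
    )""")

# The same real-world person is registered twice, e.g. at two front desks.
for _ in range(2):
    conn.execute(
        "INSERT INTO person (full_name, date_of_birth) VALUES (?, ?)",
        ("Carol Fair", "2020-07-01"),
    )

rows = conn.execute(
    "SELECT person_id, full_name FROM person ORDER BY person_id"
).fetchall()
print(rows)  # [(1, 'Carol Fair'), (2, 'Carol Fair')]
```

Both rows satisfy every constraint the database knows about. Preventing the duplicate requires an identifier assigned by a reliable process outside the database, such as a student ID issued at enrollment.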
"\n",
"For example, the Social Security Number (SSN) system in the United States is a well-known external identification system of this kind. The federal government has established a process to ensure that each person issued an SSN receives a unique number and that, in ordinary circumstances, no individual holds more than one. This identifier allows individuals to be represented accurately and consistently across many government databases, helping ensure that each person is correctly identified.\n",
"\n",
"# Ensuring Entity Integrity in Practice\n",
"\n",
"To achieve entity integrity, you must ensure that every entry in a table is uniquely and persistently connected to the real-world object it describes. This involves using identifiers, such as primary keys, that are both unique (no two records share one) and stable (never reassigned to a different entity), effectively tying each database record to a specific real-world entity.\n",
"\n",
"# Entity Integrity in Relational Schemas\n",
"\n",
"Ensuring entity integrity in a relational database is not just about assigning unique identifiers; it also requires careful consideration of the overall database design. Several key aspects of relational database design contribute to maintaining entity integrity:\n",
"\n",
"1. **Clear Table Definitions Reflecting Entity Types**\n",
"\n",
" Each table in a relational database should clearly indicate the type of real-world entity it represents. The table name plays a crucial role in conveying this information. For instance, if a table is named `Person`, the database must enforce entity integrity for individuals, ensuring each record corresponds to a unique person.\n",
"\n",
" However, if the table uses identifiers that do not ensure a 1:1 mapping to actual persons—such as cell phone numbers, which might be shared or changed—a more appropriate table name should be chosen, like `UserAccount`, to reflect the specific entity type being represented. This clarity helps avoid confusion and ensures that the database design accurately mirrors the real-world relationships it models.\n",
"\n",
"2. **Primary Keys and Unique Indexes**\n",
"\n",
" Every table must have a primary key and, where needed, additional unique indexes that enforce a 1:1 correspondence between records and the real-world entities they represent. The primary key is a unique identifier for each record in the table: no two records can share the same key value. This key is essential for maintaining entity integrity: provided its values come from a reliable identification process, it guarantees that each entity is represented by exactly one record in the table.\n",
"\n",
" Additionally, secondary unique indexes can enforce uniqueness on other attributes that must be unique across the table, such as email addresses, usernames, or Social Security numbers. These indexes reject duplicate values in attributes intended to be unique, further supporting entity integrity.\n",
"\n",
"3. **Mandatory Primary Keys**\n",
"\n",
" Every table in a relational database must have a primary key, and the primary key attributes cannot be nullable. This requirement is crucial for maintaining entity integrity, as it prevents the creation of records that cannot be uniquely identified.\n",
"\n",
" By ensuring that primary keys are always present and unique, the database guarantees that every record can be identified unambiguously, which is the precondition for tying each record to a specific real-world entity. Without a primary key, it would be impossible to ensure that each record represents a distinct entity, leading to potential duplication and inconsistency in the database.\n",
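A minimal sketch of these three requirements, assuming Python's built-in `sqlite3` module and a hypothetical `student` table: the primary key rejects a repeated student ID, a secondary `UNIQUE` constraint rejects a repeated email address, and `NOT NULL` rejects a record with no key at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table keyed by an externally assigned student ID;
# email is a secondary attribute that must also be unique.
conn.execute("""
    CREATE TABLE student (
        student_id TEXT NOT NULL PRIMARY KEY,  -- NOT NULL stated explicitly for SQLite
        email      TEXT NOT NULL UNIQUE,       -- secondary unique index
        full_name  TEXT NOT NULL
    )""")
conn.execute("INSERT INTO student VALUES ('S001', 'carol@example.edu', 'Carol Fair')")


def try_insert(row):
    """Insert one row; return None on success or the IntegrityError message."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as err:
        return str(err)


dup_key = try_insert(("S001", "dave@example.edu", "Dave Lee"))     # repeated primary key
dup_email = try_insert(("S002", "carol@example.edu", "Carol F."))  # repeated email
null_key = try_insert((None, "erin@example.edu", "Erin Moss"))     # missing primary key

for message in (dup_key, dup_email, null_key):
    print(message)

# Only the original record remains.
print(conn.execute("SELECT COUNT(*) FROM student").fetchone()[0])  # 1
```

The explicit `NOT NULL` on `student_id` matters in SQLite: for historical reasons, SQLite allows NULL in a non-`INTEGER` primary key column of an ordinary (rowid) table unless `NOT NULL` is declared. MySQL and PostgreSQL make primary key columns non-nullable automatically.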
"\n",
"# Practical Examples of Ensuring Entity Integrity\n",
"\n",
"To illustrate the importance of entity integrity, consider the following examples and how you would ensure it for different tables:\n",
"\n",
"- **Students at a University:** Each student is uniquely identified by a student ID number assigned during enrollment, ensuring entity integrity by preventing duplicate student records.\n",
"- **Kids at a Daycare Center:** A unique enrollment number or child ID can be assigned upon registration, ensuring consistent and unique representation of each child.\n",
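"- **Airline Passengers:** Passengers can be uniquely identified by a frequent flyer number, ensuring accurate linkage of travel details to individual records. A booking reference number, by contrast, identifies a reservation that may cover several passengers, so it identifies bookings rather than individual passengers.\n",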
"- **Gym Members:** A unique membership ID assigned at joining ensures that each member’s activities are accurately recorded without duplication.\n",
"- **Online Video Game Players:** Each player is assigned a unique gamer tag or player ID, ensuring consistent and unique tracking of player interactions and achievements.\n",
"- **Posts on Facebook:** Each post is associated with a unique post ID, linking the content to the specific user and ensuring that each post is uniquely identified.\n",
"- **Mortgage Loans:** Each mortgage loan is identified by a unique loan number or mortgage ID, ensuring that all related transactions and documents are consistently linked.\n",
"\n",
"# Conclusion\n",
"\n",
"Entity integrity is a critical aspect of relational database design that ensures the accurate and consistent representation of real-world entities within a database. By implementing clear table definitions, enforcing primary keys and unique indexes, and ensuring mandatory primary keys, you can create robust databases that faithfully mirror the real-world entities and relationships they are intended to manage. This foundation of entity integrity supports reliable and accurate data management, enabling databases to function effectively in representing and maintaining the integrity of the data they store.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"@schema\n",
"class Employee(dj.Manual):\n",
" definition = \"\"\"\n",
" employee_id : int unsigned\n",
" ---\n",
" first_name : varchar(30)\n",
" last_name : varchar(30)\n",
" date_of_birth : date \n",
" address : varchar(60)\n",
" \"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dj.Diagram(schema)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Person()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Person.insert1((1, 'Carol', 'Fair', '2020-07-01', 'Far Away'))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Person()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Person().make_sql()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sqlite3"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -134,7 +91,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
"version": "3.10.14"
}
},
"nbformat": 4,