Archive for the ‘DB Concepts’ Category

DB Basics – How to control Data Redundancy in a database system (RDBMS) by Normalization

April 12, 2016

Redundancy in database systems gives rise to various insert, update, and delete anomalies.

To avoid these anomalies, as a first step you need to make sure your database tables or relations are in good normal forms, i.e. normalized up to a certain level.

Normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of the anomalies discussed above, which could otherwise lead to a loss of data integrity.

Normal forms in a database, or the concept of Normalization, make a Relation or Table free from insert/update/delete anomalies and save space by removing duplicate data.
 

–> E. F. Codd stated the objectives of normalization as follows:

1. To free the collection of relations from undesirable insertion, update and deletion dependencies.

2. To reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the life span of application programs.

3. To make the relational model more informative to users.

4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.
 

As of now there are a total of 8 normal forms, but to keep your data consistent & non-redundant the first 3 Normal Forms are generally sufficient.
 

–> To see these anomalies in action, let’s say you have a single EmployeeDepartment table that stores both Employee and Department details:

1. Insert Anomaly: When you insert an Employee’s details, his department details must also be entered with every employee record. Department details thus get repeated across multiple records, storing duplicate data for Departments.

2. Update Anomaly: While updating a department’s details you have to update the same department for various employees, which may leave the data in an inconsistent state if any record is missed during the update or if an error occurs midway.

3. Delete Anomaly: If a department is closed, then deleting the department record will also delete the Employee records, thus losing Employee data.
 

The process of normalization decomposes or splits this EmployeeDepartment table into 2 or more tables linked by Foreign Keys (see the sketch below), thus eliminating duplicate records and data redundancy, and making data/records consistent across all relations/tables.
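Here’s a minimal sketch of such a decomposition in T-SQL (the table and column names here are illustrative, not from an actual schema):

-- Department master table: each department is stored exactly once
CREATE TABLE Department (
    DeptID INT PRIMARY KEY,
    DeptName VARCHAR(100),
    DeptLocation VARCHAR(100)
)

-- Employee table now references Department instead of repeating its details
CREATE TABLE Employee (
    EmployeeID INT PRIMARY KEY,
    EmployeeName VARCHAR(100),
    DeptID INT FOREIGN KEY REFERENCES Department(DeptID)
)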

– 1st NF talks about atomic values and non-repeating groups.

– 2nd NF enforces that a non-Key attribute should depend on the entire Key attribute, not just a part of it.

– 3rd NF makes sure that there is no transitive dependency, i.e. a non-Key attribute should not depend on the Key only through another non-Key attribute.
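For instance, here is a hypothetical table that violates the 3rd NF: DeptLocation depends on DeptID, which in turn depends on the key EmployeeID, so DeptLocation depends on the key only transitively and should be moved out to a Department table:

-- Violates 3rd NF: EmployeeID -> DeptID -> DeptLocation (transitive dependency)
CREATE TABLE EmployeeDept (
    EmployeeID INT PRIMARY KEY,
    EmployeeName VARCHAR(100),
    DeptID INT,
    DeptLocation VARCHAR(100) -- depends on DeptID, not directly on the key
)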

For details on these 3 NFs check my blog post on [Database Normalization].
 


DB Basics – As a Primary Key can identify a row uniquely, in what scenarios do you need a Super Key?

September 17, 2015

 
In a [previous post] I discussed the various types of Keys in DBMS, like Candidate, Primary, Composite & Super Keys.

I got a few questions like:
1. If there is already a Primary Key to identify a row uniquely, then in which scenario can a Super Key be used?
2. How to create a Super Key like a Primary Key in SQL Server or Oracle?
 

–> First of all, what is a Primary Key (PK)? It is the Candidate key attribute/column that is most suited to maintain uniqueness in a table at the tuple/row level.

–> And a Super Key is a super-set of a Candidate Key. If you add any other attribute/column to a Candidate Key then it becomes a Super Key. Conversely, a minimal Super Key is called a Candidate Key.
 

–> Now consider the same example that I discussed in my [previous post]:

CREATE TABLE Employee (
    EmployeeID INT IDENTITY(1,1) NOT NULL, 
    EmployeeName VARCHAR(100), 
    DOB DATE, -- Date of birth
    DOJ DATE, -- Date of joining
    SSN VARCHAR(11), -- Social Security Number
    DeptID INT, -- Department ID
    MgrID INT -- Manager ID
)

Now in this table there are 2 Candidate Keys i.e. EmployeeID & SSN.
– EmployeeID: should be unique, typically an auto-increment column.
– SSN: should also be unique for every employee.

So, here you should not make SSN the Primary Key, because it is PII and sensitive data.
Hence, creating EmployeeID as the Primary Key makes sense; this way SSN is left as an Alternate Key.
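A minimal sketch of how this could be declared in SQL Server (the constraint names are illustrative):

-- EmployeeID as the Primary Key, SSN kept unique as an Alternate Key
ALTER TABLE Employee ADD CONSTRAINT PK_Employee PRIMARY KEY (EmployeeID)
ALTER TABLE Employee ADD CONSTRAINT UQ_Employee_SSN UNIQUE (SSN)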
 

–> Now, coming to the Super Key concept: As per the above definition, a minimal Super Key is called a Candidate or Primary Key. Thus every Primary Key and Alternate Key is also a Super Key. And if you combine any other column with the EmployeeID (PK) column, the combination will also be called a Super Key, like:
– EmployeeID
– EmployeeID + DOB
– EmployeeID + DOJ
– EmployeeID + DeptID + MgrID
– EmployeeID + DOB + DOJ + DeptID + MgrID
– EmployeeID + [other column combinations]

For all the above and other possible combinations the row will still be identified uniquely, thus all these combinations act as Super Keys.
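One quick way to convince yourself (an illustrative check against the Employee table above): any combination that includes the Primary Key can never produce duplicate rows, so a query like the following should return nothing:

-- If (EmployeeID + DOB) is a Super Key, no combination can repeat
SELECT EmployeeID, DOB, COUNT(*) AS Cnt
FROM Employee
GROUP BY EmployeeID, DOB
HAVING COUNT(*) > 1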
 

And in SQL Server or Oracle there is no syntax as such to create a Super Key; it is just a concept in DBMS.

Hope this helps !!!



DB Basics – What are ACID properties of a Transaction in an RDBMS?

September 28, 2014

In order to perform a Transaction in a database system and make sure it works without any issues, there are a few rules a Database Transaction should follow. These rules are standard across all Relational Database systems (RDBMS) and are called the ACID rules.
 

ACID stands for Atomicity, Consistency, Isolation and Durability. So let’s check what each of these rules states.
 

–> A: Atomicity states that every Transaction should be atomic in nature. A Transaction in a Relational Database can contain either a single SQL statement or multiple SQL statements. An atomic Transaction means “all or none”: either all SQL statements/steps in the transaction execute successfully, or they fail as a single unit, none of them is treated as executed, and the system is returned to its original state.

For example: If account-A & account-B each have a $2000 balance and you have to transfer $1000 from account-A to account-B, this involves 2 steps: first, withdrawal from account-A; second, deposit into account-B. Both steps should be treated as a single, atomic unit, and at the end account-A should have $1000 & account-B should have $3000. If the system fails or any error occurs after the first step, then the first step should also be rolled back, and the $1000 withdrawn from account-A should be re-deposited, restoring $2000 in both accounts. There should be no intermediate state where account-A has $1000 and account-B still has $2000.
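A minimal T-SQL sketch of such an atomic transfer (the Accounts table and its columns are hypothetical):

BEGIN TRY
    BEGIN TRANSACTION
        -- Step 1: withdraw $1000 from account-A
        UPDATE Accounts SET Balance = Balance - 1000 WHERE AccountID = 'A'
        -- Step 2: deposit $1000 into account-B
        UPDATE Accounts SET Balance = Balance + 1000 WHERE AccountID = 'B'
    COMMIT TRANSACTION -- both steps succeeded as one unit
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION -- any error undoes both steps
END CATCH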
 

–> C: Consistency states that any Transaction happening in a database takes it from one consistent state to another consistent state. The data finally recorded in the database must be valid according to the defined Rules, Constraints, Cascades, Triggers, etc. If any of these rules is violated, the changes made by the transaction should be rolled back, putting the system back into its earlier consistent state.

For example: Suppose the money deposit process has a Trigger built on top of it. If at the time of money transfer the Trigger fails or any database node goes down, the system should automatically roll back the complete transaction and switch back to the consistent state it was in before the transaction started. If everything executes successfully, the system is committed to a new consistent state.
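For instance, a hypothetical CHECK constraint on the same Accounts table: if a transfer ever tried to overdraw an account, the UPDATE would fail and the whole transaction would be rolled back to the previous consistent state:

-- Any statement leaving a negative balance fails, protecting consistency
ALTER TABLE Accounts ADD CONSTRAINT CK_Accounts_Balance CHECK (Balance >= 0)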
 

–> I: Isolation means concurrent Transactions working on the same data should run in isolation and not interfere with each other, thus avoiding dirty reads & writes. One needs to use proper Transaction Isolation levels and locking in order to ensure this.

For example: Suppose two people access a joint account with a $5000 balance from 2 terminals to withdraw money. Say at the same time John & Marry each apply to withdraw $4000 from two different ATMs. If the two Transactions do not run in isolation and run in parallel, then both John & Marry will be able to withdraw $4000 each, i.e. $8000 total from their account. To make sure this doesn’t happen, such Transactions should not be allowed to run in parallel, by setting Transaction Isolation levels and/or locking methods on the database objects.
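One minimal way to sketch this in SQL Server (again using the hypothetical Accounts table from above): taking an update lock while reading the balance makes the second withdrawal wait until the first one commits:

BEGIN TRANSACTION
    DECLARE @balance MONEY
    -- UPDLOCK holds the row, so a parallel withdrawal blocks until we finish
    SELECT @balance = Balance FROM Accounts WITH (UPDLOCK) WHERE AccountID = 'JOINT'
    IF @balance >= 4000
        UPDATE Accounts SET Balance = Balance - 4000 WHERE AccountID = 'JOINT'
COMMIT TRANSACTION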
 

–> D: Durability: a transaction should be durable, storing its data permanently and keeping it available even in case of power failure, system crash, or any other error. All in all, the data should not get lost in any such mishap, and one should be able to recover it via restores, transaction logging and other methods.
 


DB Basics – What are Candidate, Primary, Composite & Super Keys and Difference between them?

September 15, 2014

Let’s take an example of an Employee table:

CREATE TABLE Employee (
    EmployeeID INT IDENTITY(1,1) NOT NULL, 
    EmployeeName VARCHAR(100), 
    DOB DATE, -- Date of birth
    DOJ DATE, -- Date of joining
    SSN VARCHAR(11), -- Social Security Number
    DeptID INT, -- Department ID
    MgrID INT -- Manager ID
)

1. Candidate Key: is the attribute/column or set of attributes/columns in a relation/table that ensures uniqueness of each tuple/row. A relation/table can have one or more Candidate Keys. A Candidate Key is also known as a minimal Super Key.

Here in the Employee table the columns EmployeeID & SSN can each individually maintain uniqueness in the table, thus both are eligible to be Candidate Keys. The combination EmployeeName + DOB could also make up a Candidate Key, but there is a narrow chance that 2 Employees with the same name were born on the same day.
 

2. Primary Key: is the Candidate key attribute/column that is most suited to maintain uniqueness in a table at the tuple/row level. More about PK.

Here in the Employee table you can choose either the EmployeeID or the SSN column as the PK; EmployeeID is the preferable choice because SSN is secure (PII) data.
 

3. Alternate Key: the other Candidate Key attribute/columns that you didn’t choose as the Primary Key column.

Like if you choose EmployeeID as a PK then SSN would be the Alternate key.
 

4. Super Key: is a superset of a Candidate Key. If you add any other attribute/column to a Candidate Key then it becomes a Super Key.

Like EmployeeID + EmployeeName is a Super Key.
 

5. Composite Key: If a table does not have a single column that qualifies as a Candidate Key, then you have to select 2 or more columns together to make a row unique.

Like if there were no EmployeeID or SSN columns in the Employee table, then you could make EmployeeName + DOB a Composite Primary Key. But there would still be a narrow chance of duplicate rows.
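For the composite case, a minimal hypothetical sketch (assuming the table had no EmployeeID or SSN columns, and that both columns below are declared NOT NULL as a PRIMARY KEY requires):

-- Composite Primary Key over two columns, since no single column is unique
ALTER TABLE Employee ADD CONSTRAINT PK_Employee_NameDOB PRIMARY KEY (EmployeeName, DOB)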
 

–> Check the video:

DBMS Keys Types
 


Using IDENTITY function with SELECT statement in SQL Server

December 15, 2011

In MS BOL for IDENTITY columns it is mentioned that [link]: “It creates an identity column in a table. This property is used with the CREATE TABLE and ALTER TABLE Transact-SQL statements.”

But experimenting with the IDENTITY function I learnt a new thing which is not mentioned (or I could not find) anywhere in MS BOL: “The IDENTITY function can also be used with a SELECT statement, but with a certain condition.”

Let’s check what that condition is:

USE [AdventureWorks]
GO

-- Let's use Person.Contact table in our SELECT statement and add a new column as IDENTITY function to it:
SELECT 
	IDENTITY (INT, 100, 5) AS NEW_ID, 
	ContactID, Title, 
	FirstName, 
	MiddleName, 
	LastName, 
	EmailAddress, 
	Phone
FROM Person.Contact

Error Msg thrown:
Msg 177, Level 15, State 1, Line 1
The IDENTITY function can only be used when the SELECT statement has an INTO clause.

As per the error msg we cannot simply add an IDENTITY column to a SELECT query; the SELECT must be followed by an INTO clause. This way a new table is created and records are inserted with the new IDENTITY column.

-- Let's add INTO clause in SELECT statement:
SELECT 
	IDENTITY (INT, 100, 5) AS NEW_ID, 
	ContactID, 
	Title, 
	FirstName, 
	MiddleName, 
	LastName, 
	EmailAddress, 
	Phone
INTO #tempTable
FROM Person.Contact

Again an Error Msg:
Msg 8108, Level 16, State 1, Line 1
Cannot add identity column, using the SELECT INTO statement, to table ‘#tempTable’, which already has column ‘ContactID’ that inherits the identity property.

Completely agree with this!!! You cannot have 2 IDENTITY columns in a single table.

-- So we will remove the ContactID column, as we want the new ID column instead.
SELECT 
	IDENTITY (INT, 100, 5) AS NEW_ID, 
	Title, 
	FirstName, 
	MiddleName, 
	LastName, 
	EmailAddress, 
	Phone
INTO #tempTable
FROM Person.Contact
-- This worked perfectly.

-- Now let's check the newly created temp table and inserted records:
SELECT * FROM #tempTable

-- Output below shows selected records from Person.Contact table with a new column having values starting from 100 with a gap of 5:
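As an extra sanity check (an optional step, not part of the original demo), you can verify that NEW_ID actually carries the IDENTITY property with the seed and increment we specified:

-- Temp tables live in tempdb, so query its catalog views
SELECT name, seed_value, increment_value
FROM tempdb.sys.identity_columns
WHERE object_id = OBJECT_ID('tempdb..#tempTable')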

-- Final Cleanup
DROP TABLE #tempTable

 

>> Check & Subscribe to my [YouTube videos] on SQL Server.
 

Check the same demo here on YouTube: