Archive
Using DML with XML – UPDATE & DELETE (SQL Server)
In my previous post we saw how we can INSERT records in a XML string. On this post we”ll see the UPDATE & DELETE operations by using same xml.modify() method.
DECLARE @xml XML -- Taking the sample XML string we created in my previous post: SET @xml=N'<Root> <stud id="0"> <fname>Kanchan</fname> <lname>Pandey</lname> <class>10</class> <marks>90.5</marks> </stud> <stud id="1"> <fname>Manoj</fname> <lname>Pandey</lname> <class>10</class> <marks>80.5</marks> </stud> <stud id="2"> <fname>Saurabh</fname> <lname>Sharma</lname> <class>11</class> <marks>82.7</marks> </stud> <stud id="3"> <fname>Rajesh</fname> <lname>Shah</lname> <class>11</class> <marks>70.3</marks> </stud> </Root>'
> UPDATE marks of Manoj to 78.2 (currently 80.5):
SET @xml.modify(' replace value of (/Root/stud[@id="1"]/marks/text())[1] with "78.2"') SELECT @xml
> UPDATE marks of Rajesh to 78.7 stored in a variable:
DECLARE @marks FLOAT SET @marks = 78.7 SET @xml.modify(' replace value of (/Root/stud[@id="3"]/marks/text())[1] with sql:variable("@marks")') SELECT @xml
> DELETE Saurabh record where id = 2:
SET @xml.modify(' delete (/Root/stud[@id="2"])') SELECT @xml
Final XML:
Save XML in a file | Read XML from a file (in SQL Server)
USE [AdventureWorks] GO -- Save XML records to a file: DECLARE @fileName VARCHAR(50) DECLARE @sqlStr VARCHAR(1000) DECLARE @sqlCmd VARCHAR(1000) SET @fileName = 'E:\SQL_Queries\test.xml' SET @sqlStr = 'select TOP 50 [ContactID], [Title], [FirstName], [MiddleName], [LastName], [Suffix], [EmailAddress], [Phone] from AdventureWorks.Person.Contact FOR XML PATH(''Contact''), ROOT(''Person.Contact'')' SET @sqlCmd = 'bcp "' + @sqlStr + '" queryout ' + @fileName + ' -w -T' EXEC xp_cmdshell @sqlCmd
Progress:-
NULL
Starting copy...
NULL
6 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 15 Average : (400.00 rows per sec.)
NULL
Check the file in the directory set above in @filename variable.
Now we will see how can we read the XML file saved above. Will store the XML in into a SQL Server XML variable
and query it with SELECT statement:
DECLARE @xmlStr XML -- Storing XML records into a XML variable: SELECT @xmlStr = BulkColumn FROM OPENROWSET(BULK 'E:\SQL_Queries\test.xml', SINGLE_CLOB) AS X
The above statement gave me ERROR:
SINGLE_CLOB requires a double-byte character set (DBCS) (char) input file. The file specified is Unicode.
Because the XML BCP’d to the file above was saved in unicode format. So use SINGLE_NCLOB instead of SINGLE_CLOB.
SELECT @xmlStr = BulkColumn FROM OPENROWSET(BULK 'E:\SQL_Queries\test.xml', SINGLE_NCLOB) AS X
This again gave me ERROR:
XML parsing: line 1, character 2034, illegal qualified name character
Because the file saved in xml file has CrLf (Carriage Return & Line Feed) in between.
So I removed the CrLf from the XML string as shown below.
SELECT @xmlStr = REPLACE(REPLACE(BulkColumn,CHAR(10),''),CHAR(13),'') FROM OPENROWSET(BULK 'E:\SQL_Queries\test.xml', SINGLE_NCLOB) AS X -- Querying the XML stored in @xmlStr variable with SELECT statement: SELECT T.C.value('ContactID[1]', 'INT') as [ContactID], T.C.value('Title[1]', 'VARCHAR(8)') as [Title], T.C.value('FirstName[1]', 'VARCHAR(50)') as [FirstName], T.C.value('MiddleName[1]', 'VARCHAR(50)') as [MiddleName], T.C.value('LastName[1]', 'VARCHAR(50)') as [LastName], T.C.value('Suffix[1]', 'VARCHAR(10)') as [Suffix], T.C.value('EmailAddress[1]', 'VARCHAR(50)') as [EmailAddress], T.C.value('Phone[1]', 'VARCHAR(50)') as [Phone] FROM @xmlStr.nodes('//Person.Contact/Contact') as T(C)
Using DMLs with XMLs – INSERT (SQL Server)
In my previous XML related posts I wrote about how can we query and SELECT XML strings & pull data from parent-child nodes and hierarchies. In this post we’ll see how we can modify that XML data by applying DMLs to it.
In this part let’s see how we can INSERT new records in a XML string:
DECLARE @xml XML -- Creating sample XML string with single row: SET @xml=N'<Root> <stud id="1"> <fname>Manoj</fname> <lname>Pandey</lname> <class>10</class> <marks>80.5</marks> </stud> </Root>'; -- Now check the sample row by querying it with SELECT statement: SELECT Tab.Col.value('@id','int') AS ID, Tab.Col.value('fname[1]','nvarchar(20)') AS FirstName, Tab.Col.value('lname[1]','nvarchar(20)') AS LastName, Tab.Col.value('class[1]','int') AS class, Tab.Col.value('marks[1]','float') AS Marks FROM @xml.nodes('/Root/stud') Tab(Col)
Output:-
ID FirstName LastName class Marks
1 Manoj Pandey 10 80.5
DMLs in XML provides a common function xml.modify() to modify the XML string in 3 ways, i.e. INSERT, UPDATE & DELETE. Here we will see how we can INSERT new records at different positions in the sample XML created above:
--// Now INSERTing some more records: -- 1. INSERT (basic) SET @xml.modify(' insert <stud id="2"> <fname>Saurabh</fname> <lname>Sharma</lname> <class>11</class> <marks>82.7</marks> </stud> into (/Root)[1]'); -- 2. INSERT at beginning, first record SET @xml.modify(' insert <stud id="0"> <fname>Kanchan</fname> <lname>Pandey</lname> <class>10</class> <marks>90.5</marks> </stud> as first into (/Root)[1]'); -- 3. INSERT at end, last record SET @xml.modify(' insert <stud id="3"> <fname>Rajesh</fname> <lname>Shah</lname> <class>11</class> <marks>70.3</marks> </stud> as last into (/Root)[1]'); -- Let's check the records just inserted: SELECT Tab.Col.value('@id','int') AS ID, Tab.Col.value('fname[1]','nvarchar(20)') AS FirstName, Tab.Col.value('lname[1]','nvarchar(20)') AS LastName, Tab.Col.value('class[1]','int') AS class, Tab.Col.value('marks[1]','float') AS Marks FROM @xml.nodes('/Root/stud') Tab(Col)
Output:-
ID FirstName LastName class Marks
0 Kanchan Pandey 10 90.5
1 Manoj Pandey 10 80.5
2 Saurabh Sharma 11 82.7
3 Rajesh Shah 11 70.3
-- Check final XML with INSERTed records above: SELECT @xml
In my next XML-DML post we will see UPDATE and DELETE operations.
SQL Trivia – Identify & Delete Duplicate records from a table
I see a lot of questions/posts about dealing with duplicate records in many SQL Server forums. Many of these questions are asked in SQL Server Interviews and many developers starting their carrier in database programming find it challenging to deal with. Here we will see how we can deal with such records.
Duplicate data is one of the biggest pain points in our IT industry, which various projects have to deal with. Whatever state-of-art technology and best practices followed, the big ERP, CRM, SCM and other inventory based database management projects ends up in having duplicate & redundant data. Duplicate data keeps on increasing by manual entries and automated data loads. Various data leads getting pumped into system’s databases without proper deduping & data cleansing leads to redundant data and thus duplicated record-sets.
Data cleansing requires regular exercise of identifying duplicates, validating and removing them. To minimize these type of scenarios various checks and filters should also be applied before loading new leads into the system.
–> Lets check this by a simple exercise how we can identify & remove duplicate data from a table:
1. Insert some sample records from Person.Contact table of [AdventureWorks] database:
USE [AdventureWorks] GO SELECT TOP 10 ContactID, FirstName, LastName, EmailAddress, Phone INTO DupContacts FROM Person.Contact SELECT * FROM DupContacts
2. Insert some duplicate records from the same list inserted above:
INSERT INTO DupContacts SELECT TOP 50 PERCENT FirstName, LastName, EmailAddress, Phone from DupContacts SELECT * FROM DupContacts
3. Insert some more duplicate records from the same list inserted above.
INSERT INTO DupContacts SELECT TOP 20 PERCENT FirstName, LastName, EmailAddress, Phone from DupContacts SELECT * FROM DupContacts
–> Identify Duplicate records & delete them
Method #1: by using ROW_NUMBER() function:
;WITH dup as ( SELECT ContactID, FirstName, LastName, EmailAddress, Phone, ROW_NUMBER() OVER(PARTITION BY FirstName, LastName ORDER BY ContactID) AS NumOfDups FROM DupContacts) SELECT * FROM dup WHERE NumOfDups > 1 ORDER BY ContactID
-- Remove/Delete duplicate records: ;WITH dup as ( SELECT ContactID, FirstName, LastName, EmailAddress, Phone, ROW_NUMBER() OVER(PARTITION BY FirstName, LastName ORDER BY ContactID) AS NumOfDups FROM DupContacts) DELETE FROM dup WHERE NumOfDups > 1 SELECT * FROM DupContacts
Method #2: by using SELF-JOIN:
SELECT DISTINCT a.ContactID, a.FirstName, a.LastName, a.EmailAddress, a.Phone FROM DupContacts a JOIN DupContacts b ON a.FirstName = b.FirstName AND a.LastName = b.LastName AND a.ContactID > b.ContactID
-- Remove/Delete duplicate records: DELETE a FROM DupContacts a JOIN DupContacts b ON a.FirstName = b.FirstName AND a.LastName = b.LastName AND a.ContactID > b.ContactID SELECT * FROM DupContacts
Method #3: by using AGGREGATES & Sub-QUERY:
SELECT * FROM DupContacts WHERE ContactID NOT IN (SELECT MIN(ContactID) FROM DupContacts GROUP BY FirstName, LastName)
-- Remove/Delete duplicate records: DELETE FROM DupContacts WHERE ContactID NOT IN (SELECT MIN(ContactID) FROM DupContacts GROUP BY FirstName, LastName) SELECT * FROM DupContacts
–> Final Cleanup
DROP TABLE DupContacts
Check the same demo here:
What are Records, Pages, Extents, Heap in SQL Server?
I found this very interesting series by Paul Randal on some basic terms of SQL Server Storage Engine at MSDN SQL Server blog. This is a 4 part series and provides information on records, pages, extents and heap structures.
I’ve seen many people asking questions related to these topics on MSDN & other SQL Server forums. And there are very few tech articles on internet and MS BOL regarding these. Paul has very breifly and in layman terms explained these topics.
I’d like to share the following topics for my future reference and for my blog viewers:
Part 1. What are Rows/Records?
Part 2. What are Pages?
Part 3. What are Extents?
Part 4. What is a Heap?
–> PAGE: A page in SQL Server is an 8 KB data storage area. There are 8 types of pages:
1. Data page
2. Index page
3. IAM (Index Allocation Map) page
4. GAM (Global Allocation Map) and SGAM (Shared Global Allocation Map) page
5. PFS (Page Free Space) page
6. Text/Image page
7. Sort page
8. Differential Bitmap page
9. Bulk-Changed Map page
10. Boot page
11. FileHeader page