Watch & Subscribe my SQL videos on YouTube | Join me on Facebook

Excel data validation with VBA macros

February 4, 2011 3 comments

You have an important lead set or any other orders/financial recordset in excel and have to import the data in SQL Server. This is not a one time effort but ongoing and in future you have to deals with lot such records/data files. To import valid data in your SQL tables you want to ensure and validate at the frst step instead of correcting the data by firing SQL queries. Also to avoid this manual work to identify the incorrect/invalid data out of thousands of records automated approach would be quite helpful.

Excel not only store data but also have a power to run VBA macro code against and manipulate the data. Lets take a simple example to validate a set of Customer numbers & their insert date, shown below:

-- Test Data which is correct and expected to import in SQL table.
InsertDate	CustomerNo
11/16/2009	91878552
11/16/2009	101899768
11/16/2009	101768884
11/16/2009	123456789123
11/16/2009	101768800

Columns InsertDate should be a valid date & CustomerNo should be numeric & less than 12 characters or max length equal to 12.

-- Lets tweak the test data put some invalid values under both the columns, shown below:
InsertDate	CustomerNo
11/16/2009	91878552
51/96/2009	101899768
11/16/2009	1017abc84
11/16/2009	123456789123
11/16/2009	101768800

If you are on MS Office 2007 create a Macro enabled excel file with extension *.xlsm, OR if you are on 2003 and lower version *.xls file would be fine. Copy the above data in Sheet1 with incorrect values. Now press open up the VBA macro editor or press ALT+F11 keys. Now double click on Sheet1 under “Project -VBAProjects” explorer.
I wish to run the validation while someone tries to save the macro enabled excel file. So the event selected is “BeforeSave”. And the VBA macro code goes below:

Private Sub Workbook_BeforeSave(ByVal SaveAsUI As Boolean, Cancel As Boolean)

	Dim errorCode As Integer
	Dim errorMsg As String

	Dim numOfRecs As Integer
	Dim counter As Integer

	'---------------------
	'Start - Validate Date
	'---------------------
	Sheet1.Activate
	Range("A2").Select

	errorCode = 0
	numOfRecs = 0

	Do Until ActiveCell.Value = vbNullString

		' Validate Date
		If IsDate(ActiveCell.Value) <> True Then
			errorCode = 1
			ActiveCell.Interior.ColorIndex = 3 'Red
		Else
			ActiveCell.Interior.ColorIndex = 2 'White
		End If

		numOfRecs = numOfRecs + 1

		ActiveCell.Offset(1, 0).Select

	Loop

	If errorCode = 1 Then
		errorMsg = errorMsg + vbCrLf & "- Invalid Insert Date"
	End If
	'-------------------
	'End - Validate Date
	'-------------------

	'--------------------------
	'Start - Validate Customer#
	'--------------------------
	Sheet1.Activate
	Range("B2").Select

	errorCode = 0

	Do Until ActiveCell.Value = vbNullString

		' Check for a valid Customer Number
		If IsNumeric(ActiveCell.Value) <> True Or Len(ActiveCell.Value) > 12 Then
			errorCode = 1
			ActiveCell.Interior.ColorIndex = 3 'Red
		Else
			ActiveCell.Interior.ColorIndex = 2 'White
		End If

		ActiveCell.Offset(1, 0).Select

	Loop

	If errorCode = 1 Then
		errorMsg = errorMsg + vbCrLf & "- Invalid Customer #"
	End If
	'------------------------
	'End - Validate Customer#
	'------------------------

	' Go to first cell.
	Range("A1").Select

	'If errors then display the error msg and do not save the file otherwise save the excel file.
	If errorCode = 1 Then
		MsgBox "Workbook not saved. Following are the fields that contain invalid values." _
			& vbCrLf & errorMsg & vbCrLf & vbCrLf & _
			"Please correct the values highlighted in RED color.", vbCritical, "Data Validation ERROR"

		Cancel = True

	End If

End Sub

Invalid values highlighted RED

Now save the macro code and close the editor. Now try to Save the Excel file, you will get the cells with invalid values highlighted in RED, as shown in the image. The excel file not be saved and when closing you will get the dialog box to save it again & again.

In order to save it correctly you need to correct the invalid values, i.e. InsertDate & CustomerNo, and then save it. The RED highlighted cells will become WHITE for the correct values.

This excel file is now very much validated and clean to import in SQL Server or any other database.

 

Try this code and tweak it as per your needs.

I’ve also discussed this logic in MSDN’s TSQL forum at following link: http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/09299d7d-9306-4ea4-bb29-87207572aa04

Suggestions & comments are welcome!!!

SELECT an XML string to a table – SQL Server

January 28, 2011 6 comments

A simple way to SELECT XML string in a tabular format by using Nodes and OPENXML().
 

–> By using FOR XML AUTO:

USE [AdventureWorks]
GO

--// Generate an XML result-set:
SELECT TOP 5 ContactID, FirstName, LastName
FROM Person.Contact
FOR XML AUTO
GO
Output as XML String:
<Person.Contact ContactID="1" FirstName="Gustavo" LastName="Achong"/>
<Person.Contact ContactID="2" FirstName="Catherine" LastName="Abel"/>
<Person.Contact ContactID="3" FirstName="Kim" LastName="Abercrombie"/>
<Person.Contact ContactID="4" FirstName="Humberto" LastName="Acevedo"/>
<Person.Contact ContactID="5" FirstName="Pilar" LastName="Ackerman"/>

 

–> SELECT by using XML Nodes:

DECLARE @xml XML
SET @xml = N'
<root><Person.Contact ContactID="1" FirstName="Gustavo" LastName="Achong"/>
<Person.Contact ContactID="2" FirstName="Catherine" LastName="Abel"/>
<Person.Contact ContactID="3" FirstName="Kim" LastName="Abercrombie"/>
<Person.Contact ContactID="4" FirstName="Humberto" LastName="Acevedo"/>
<Person.Contact ContactID="5" FirstName="Pilar" LastName="Ackerman"/></root>'

SELECT
Tab.Col.value('@ContactID','int') AS ContactID,
Tab.Col.value('@FirstName','varchar(20)') AS FirstName,
Tab.Col.value('@LastName','varchar(20)') AS LastName
FROM   @xml.nodes('/root/Person.Contact') Tab(Col)
GO
Output:
ContactID    FirstName    LastName
1            Gustavo      Achong
2            Catherine    Abel
3            Kim          Abercrombie
4            Humberto     Acevedo
5            Pilar        Ackerman

 

–> SELECT by using OPENXML:

DECLARE @xml XML
SET @xml = N'
<root><Person.Contact ContactID="1" FirstName="Gustavo" LastName="Achong"/>
<Person.Contact ContactID="2" FirstName="Catherine" LastName="Abel"/>
<Person.Contact ContactID="3" FirstName="Kim" LastName="Abercrombie"/>
<Person.Contact ContactID="4" FirstName="Humberto" LastName="Acevedo"/>
<Person.Contact ContactID="5" FirstName="Pilar" LastName="Ackerman"/></root>'

DECLARE @docHandle int
-- Create internal representation of the XML document and return the xml-doc Handle ID
EXEC sp_xml_preparedocument @docHandle OUTPUT, @xml

select @docHandle -- 37

SELECT *
FROM OPENXML(@docHandle, N'//Person.Contact')
WITH (ContactID INT, FirstName VARCHAR(20), LastName VARCHAR(20))
Output:
ContactID    FirstName    LastName
1            Gustavo      Achong
2            Catherine    Abel
3            Kim          Abercrombie
4            Humberto     Acevedo
5            Pilar        Ackerman
-- Remove xml-doc Handle
EXEC sp_xml_removedocument @docHandle
GO

 

>> Check & Subscribe my [YouTube videos] on SQL Server.
 


Using SQL_VARIANT datatype to store any datatype value in SQL Server

January 27, 2011 Leave a comment

According to MS BOL SQL_VARIANT can be used in columns, parameters, variables, and the return values of user-defined functions. sql_variant enables these database objects to support values of other data types.

It can have a maximum length of 8016 bytes including both the base-type information and the base-type value. Thus the maximum length of the actual base-type value is 8,000 bytes.

USE [tempdb]
GO

DECLARE @var SQL_VARIANT

-- Set variable as DATETIME type
SET @var =GETDATE()
SELECT @var as Data, SQL_VARIANT_PROPERTY(@var , 'BaseType') as BaseType
, SQL_VARIANT_PROPERTY(@var , 'Precision') as Precision
, SQL_VARIANT_PROPERTY(@var , 'Scale') as Scale
, SQL_VARIANT_PROPERTY(@var , 'TotalBytes') as TotalBytes
, SQL_VARIANT_PROPERTY(@var , 'MaxLength') as MaxLength
, SQL_VARIANT_PROPERTY(@var , 'Collation') as Collation;
Output:
Data            	BaseType Precision Scale TotalBytes MaxLength Collation
2011-01-27 17:13:53.150	datetime 23        3     10         8         NULL
-- Set variable as INT type
SET @var = 1234
SELECT @var as Data, SQL_VARIANT_PROPERTY(@var , 'BaseType') as BaseType
, SQL_VARIANT_PROPERTY(@var , 'Precision') as Precision
, SQL_VARIANT_PROPERTY(@var , 'Scale') as Scale
, SQL_VARIANT_PROPERTY(@var , 'TotalBytes') as TotalBytes
, SQL_VARIANT_PROPERTY(@var , 'MaxLength') as MaxLength
, SQL_VARIANT_PROPERTY(@var , 'Collation') as Collation;
Output:
Data	BaseType Precision Scale TotalBytes MaxLength Collation
1234	int      10        0     6          4         NULL
-- Set variable as VARCHAR type
SET @var = SYSTEM_USER
SELECT @var as Data, SQL_VARIANT_PROPERTY(@var , 'BaseType') as BaseType
, SQL_VARIANT_PROPERTY(@var , 'Precision') as Precision
, SQL_VARIANT_PROPERTY(@var , 'Scale') as Scale
, SQL_VARIANT_PROPERTY(@var , 'TotalBytes') as TotalBytes
, SQL_VARIANT_PROPERTY(@var , 'MaxLength') as MaxLength
, SQL_VARIANT_PROPERTY(@var , 'Collation') as Collation;
Output:
Data        BaseType Precision Scale TotalBytes MaxLength Collation
XXX\mpandey nvarchar 0         0     30         256       Latin1_General_CS_AS
-- Set variable as BIT type
SET @var = cast(1 as bit)
SELECT @var as Data, SQL_VARIANT_PROPERTY(@var , 'BaseType') as BaseType
, SQL_VARIANT_PROPERTY(@var , 'Precision') as Precision
, SQL_VARIANT_PROPERTY(@var , 'Scale') as Scale
, SQL_VARIANT_PROPERTY(@var , 'TotalBytes') as TotalBytes
, SQL_VARIANT_PROPERTY(@var , 'MaxLength') as MaxLength
, SQL_VARIANT_PROPERTY(@var , 'Collation') as Collation;
Output:
Data	BaseType Precision Scale TotalBytes MaxLength Collation
1	bit      1         0     3          1         NULL

More on SQL_VARIANT: http://msdn.microsoft.com/en-us/library/ms173829.aspx

Creating “Dynamic PIVOT” scripts in SQL Server

January 25, 2011 24 comments

My previous PIVOT UNPIVOT post describes to organize data in a cross tab fashion or to transpose columns into rows and vice-versa.

This is fine only when you know how many columns you need. But when columns vary according to the query then how could you apply them on your scripts. The only way is to store them in a string variable at runtime and apply them in a dynamic SQL query, shown below.

This problem was also discussed on MSDN’s following link: http://social.msdn.microsoft.com/Forums/en/transactsql/thread/26f86fd6-6d06-4cc5-9723-9ee4685bd48a where I proposed an answer also described below:

USE [tempdb]
GO

-- Create test tables
create table table1 (number int, desc varchar(20),
location int, numberatlocation int)
create table table2 (code int, name varchar(20))

-- Insert test data
insert into table1 values (12345,'test',1000,5)
insert into table1 values (12345,'test',1001,2)
insert into table1 values (12345,'test',1002,4)
insert into table1 values (12345,'test',1003,9)
insert into table1 values (12345,'test',1004,7)

insert into table2 values (1000,'loc1')
insert into table2 values (1001,'loc2')
insert into table2 values (1002,'loc3')
insert into table2 values (1003,'loc4')
insert into table2 values (1004,'loc5')

-- Static PIVOT
select number, description, [loc1], [loc2], [loc3], [loc4], [loc5]
from (select number, desc, numberatlocation, name
from table1 join table2 on table1.location=table2.code)p
PIVOT(MAX (numberatlocation) FOR Name IN ( [loc1], [loc2], [loc3], [loc4], [loc5] )
) AS pvt
ORDER BY number
Output of Static query:
number	desc	loc1	loc2	loc3	loc4	loc5
12345	test    5       2       4       9       7
-- Dynamic PIVOT

-- Lets add one more record on both the tables to check the results
insert into table1 values (12345,'test',1005,3)
insert into table2 values (1005,'loc6')

declare @col varchar(1000)
declare @sql varchar(2000)

select @col = COALESCE(@col + ', ','') + QUOTENAME(name)
from table2

select @col -- This gives: [loc1], [loc2], [loc3], [loc4], [loc5], [loc6]

-- Now setting this @col variable in the Dynamic SQL.
set @sql = '
select number, desc, ' + @col + '
from (select number, desc, numberatlocation, name
from table1 join table2 on table1.location=table2.code)p
PIVOT(MAX (numberatlocation) FOR Name IN ( ' + @col + ' )
) AS pvt
ORDER BY number'

print @sql

exec (@sql)
Output of Dynamic query:
number	desc	loc1	loc2	loc3	loc4	loc5    loc6
12345	test    5       2       4       9       7       3
-- Final Cleanup
drop table table1
drop table table2
Categories: SQL Tips Tags: , ,

UNION vs UNION ALL | which one is faster?

January 14, 2011 Leave a comment

UNION and UNION ALL are used to combine data or record sets from two different tables. One could combine more than 2 tables. Just like JOINS, UNION combines data into a single record-set but vertically by adding rows from another table. JOINS combine data horizontally by adding columns from another table.

UNION insures you get DISTINCT records from both the tables.
UNION ALL pulls out all records from both the tables with duplicates.

Generally “UNION ALL” is considered best in performance when compared with UNION.
But you should only use “UNION ALL” when you are sure that you have distinct or no common records in both the records-sets.

For more information on this check my earlier blog post which shows the main differences and how performance differs in b/w the two: https://sqlwithmanoj.wordpress.com/2010/12/30/why-union-all-is-faster-than-union/