Archive for July, 2020

Azure Data Factory (ADF) Pipeline failure – found more columns than expected column count (DelimitedTextMoreColumnsThanDefined)

July 29, 2020

 
I was setting up an Azure Data Factory (ADF) pipeline to copy files from Azure Data Lake Storage Gen1 to Gen2, but every pipeline run failed with the error below:

Operation on target Copy_sae failed: Failure happened on ‘Sink’ side.
ErrorCode=DelimitedTextMoreColumnsThanDefined,
‘Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,
Message=Error found when processing ‘Csv/Tsv Format Text’ source ‘0_2019_11_09_01_43_32.avro’ with row number 53: found more columns than expected column count 27.,
Source=Microsoft.DataTransfer.Common,’

 

After some research I figured out that it's because I had not selected the “Binary Copy” option while creating the Copy Data activity (shown in the image below).

Root Cause: If the folder you are copying contains files with different schemas (a variable number of columns, different delimiters or quote-character settings) or files with data issues, the ADF pipeline will fail with this error. In the error above, for example, a file named ‘0_2019_11_09_01_43_32.avro’ was being parsed as ‘Csv/Tsv Format Text’, and row 53 had more than the expected 27 columns.

So, when bulk copying or migrating your data from one Data Lake to another, try choosing this option. ADF then won’t open the files to read their schema; it simply treats every file as binary and copies it to the other location.


 
Hope this helps!

Migrate ADLS Gen1 to Gen2


SQL DBA – Change RECOVERY mode & SHRINK all databases at once in SQL Server

July 17, 2020

 
There are times when you are left with little or no space on your SQL Server VM disks. The main cause is not only the large amount of data coming in but also the heavy processing your database server is doing, which fills up the entire log space.

 
SQL Server provides some effective commands to shrink database files that have grown due to heavy processing, making room for new data.

ALTER DATABASE <db_name> SET RECOVERY SIMPLE;

DBCC SHRINKFILE (N'<log_file_name>' , 100);
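
The <log_file_name> placeholder above is the logical name of the log file, not its path on disk, and the second argument (100) is the target size in MB. As a minimal sketch, assuming you are connected to the database whose log you want to shrink, the logical name can be looked up in sys.database_files (the column aliases below are only illustrative):

```sql
-- Run this in the database whose log file you want to shrink.
SELECT
    name AS logical_file_name,                       -- value to pass to DBCC SHRINKFILE
    physical_name,                                   -- path of the file on disk
    CAST(size AS bigint) * 8 / 1024 AS current_size_mb  -- size is stored in 8 KB pages
FROM sys.database_files
WHERE type_desc = N'LOG';
```

Comparing current_size_mb with the target size shows whether the shrink has anything to reclaim.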

But this works on one database at a time, so what if you have lots of databases? It is tiring and time-consuming to visit each DB, get the database file names and apply the shrink command. It is also slow to do the same through the SSMS GUI.
 

Note: Do not change the recovery mode in a PROD environment unless it is really required.

 
With the T-SQL statements below you can generate scripts for all the databases and then run them to change the settings of all databases at once:

-- Generate SQL script to change the recovery model to 'SIMPLE'
-- for all DBs that are not already using it:
SELECT 
	'ALTER DATABASE ' + QUOTENAME(name) + ' 
	 SET RECOVERY SIMPLE;' AS SimpleRecovery4AllDBs
FROM sys.databases  
WHERE recovery_model_desc <> 'SIMPLE';

-- Generate SQL script to shrink the log files of all user DBs to 100 MB
-- (database_id > 4 skips master, tempdb, model and msdb; [type] = 1 is a log file):
SELECT 
	'USE ' + QUOTENAME(DB_NAME(database_id)) + '; 
	 DBCC SHRINKFILE (N''' + name + ''' , 100);' AS ShrinkAllDBs
FROM sys.master_files 
WHERE database_id > 4 AND [type] = 1;
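
If copying the generated rows into a new query window is still too manual, the same statements can be assembled into one batch and executed with dynamic SQL. The following is only a sketch under stated assumptions: it relies on STRING_AGG, so it needs SQL Server 2017 or later, and the variable names (@run, @recovery, @shrink, @sql) and the extra filters for online, writable user databases are my own additions rather than part of the scripts above. It only displays the script unless @run is set to 1.

```sql
DECLARE @run bit = 0;   -- assumption: 0 = only show the script, 1 = execute it
DECLARE @recovery nvarchar(max), @shrink nvarchar(max), @sql nvarchar(max);

-- 1) One ALTER DATABASE statement per online, writable user database
--    that is not already in SIMPLE recovery.
SELECT @recovery = STRING_AGG(
           CAST(N'ALTER DATABASE ' + QUOTENAME(name) + N' SET RECOVERY SIMPLE;' AS nvarchar(max)),
           NCHAR(10))
FROM sys.databases
WHERE database_id > 4
  AND state_desc = N'ONLINE'
  AND is_read_only = 0
  AND recovery_model_desc <> N'SIMPLE';

-- 2) One USE + DBCC SHRINKFILE pair per log file ([type] = 1), target size 100 MB.
--    REPLACE doubles any single quote inside a logical file name.
SELECT @shrink = STRING_AGG(
           CAST(N'USE ' + QUOTENAME(d.name) + N'; DBCC SHRINKFILE (N'''
                + REPLACE(mf.name, N'''', N'''''') + N''', 100);' AS nvarchar(max)),
           NCHAR(10))
FROM sys.master_files AS mf
JOIN sys.databases AS d
  ON d.database_id = mf.database_id
WHERE mf.database_id > 4
  AND mf.[type] = 1
  AND d.state_desc = N'ONLINE'
  AND d.is_read_only = 0;

-- CONCAT treats NULL as an empty string, so an empty result set is harmless.
SET @sql = CONCAT(@recovery, NCHAR(10), @shrink);

SELECT @sql AS GeneratedScript;   -- review before running

IF @run = 1
    EXEC sys.sp_executesql @sql;
```

Leaving @run at 0 for the first pass lets you review exactly which databases and files would be touched before anything changes.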

 

Please let me know if you have any other (and more effective) way to do the same!