Archive for the ‘Others’ Category

2020 blogging in review (Thank you & Happy New Year 2021 !!!)

December 31, 2020

 

Happy New Year 2021… from SQL with Manoj !!!

The WordPress.com Stats helper monkeys stopped preparing annual reports for blogs hosted on their platform a few years ago, so I now prepare my own Annual Report at the end of every year to thank my readers for their support, feedback & motivation, and also to check & share the progress of this blog.
 

In 2019 & 2020 I could not dedicate much time here, so I posted very few blogs. In mid 2019 something strange happened and my blog hits started declining day by day. I usually got ~3.5k hits daily, but within a few months that dropped to about half. Once the daily hits were under ~1.5k I checked the SEO section in WordPress admin, and I was surprised to discover that my blog & meta info was missing from Google webmaster.

I re-entered the meta info, and for the last year the blog hits have been stable at ~1k per day, which is far lower than what I was getting ~2 years back. Thus, you can see a drastic decline in hits for 2020 below.

 
–> Here are some Crunchy numbers from 2020

The Louvre Museum has 8.5 million visitors per year. This blog was viewed about 281,285 times by 213,447 unique visitors in 2020. If it were an exhibit at the Louvre Museum, it would take about 40 days for that many people to see it.

There were 22 pictures uploaded, taking up a total of ~1 MB. That’s about ~2 pictures every month.

 

–> All-time posts, views, and visitors

SQL with Manoj all time views


 

–> Posting Patterns

In 2020, there were 10 new posts, growing the total archive of this blog to 561 posts.

LONGEST STREAK: 4 posts in Oct 2020

 

–> Attractions in 2020

These are the top 5 posts that got the most views in 2020:

0. Blog Home Page (49,287 views)

1. The server may be running out of resources, or the assembly may not be trusted with PERMISSION_SET = EXTERNAL_ACCESS or UNSAFE (12,870 views)

2. Windows could not start SQL Server, error 17051, SQL Server Eval has expired (8,498 views)

3. Reading JSON string with Nested array of elements (8,016 views)

4. SQL Server 2016 RTM full and final version available (7,714 views)

5. Windows could not start SQL Server (moved Master DB) (6,988 views)

 

–> How did they find me?

The top referring sites and search engines in 2020 were:

SQL with Manoj 2020 Search Engines and other referrers

 

–> Where did they come from?

Out of 210 countries, the top 5 visitor countries were United States, India, United Kingdom, Canada and Australia:

SQL with Manoj 2020 top Countries visitors

 

–> Followers: 442

WordPress.com: 180
Email: 262
Facebook Page: 1,480

 

–> Alexa Rank (the lower, the better)

Global Rank: 835,163 (as of 31st DEC 2020)
Previous rank: 221,534 (back in 2019)

 

–> YouTube Channel:

SQLwithManoj on YouTube
– Total Subscribers: 17,1700
– Total Videos: 70

 

–> 2021 New Year Resolution

– Write at least 1 blog post every week
– Write on new features in SQL Server 2019
– I’ve also started writing on the Microsoft Big Data Platform, covering Azure Data Lake and Databricks (Spark/Scala), so I will continue to explore more in this area.
– Post at least 1 video every week on my YouTube channel

 

That’s all for 2020, see you in year 2021, all the best !!!
 

Connect with me on Facebook, Twitter, LinkedIn, YouTube, Google, Email


Spark SQL – Beware of Implicit datatype conversions (TypeCoercion)

March 6, 2020

 
While working on some data analysis I noticed that one of my Spark SQL queries was not returning the expected results. The table had a good amount of data, and when I filtered on a value, some records were missing. So I checked online and found that Spark SQL behaves differently from SQL Server in this case, i.e. when comparing columns or variables of two different datatypes.

–> To reproduce the scenario I’m populating some test data: 9 rows, with the decimal values stored as Strings, query below:

CREATE OR REPLACE TEMPORARY VIEW vwTestDataType as 
select * from values 
("row1", "2.0"), 
("row2", "1.5"), 
("row3", "1.0"), 
("row4", "0.8"), 
("row5", "0.6"), 
("row6", "0.4"), 
("row7", "0.2"), 
("row8", "0.0"),
("row9", null);

describe vwTestDataType;

col_name | data_type | comment
col1     | string    | null
col2     | string    | null

 

–> Now, I’ll run a query similar to the one where I observed the issue. The query below should return 7 rows (all the rows with a non-null value greater than zero), but instead it returns just 3 rows.

select * from vwTestDataType where col2 > 0

Running the above query in “SQL Server” against the same dataset throws the error below:

Conversion failed when converting the varchar value ‘2.0’ to data type int.
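
For reference, here is a rough SQL Server repro of that error, just a sketch mirroring the Spark test data above (the temp table is hypothetical, not code from this post):

-- SQL Server: INT has higher datatype precedence than VARCHAR, so col2 is
-- implicitly converted to INT and the value '2.0' fails the conversion
CREATE TABLE #TestDataType (col1 varchar(10), col2 varchar(10));
INSERT INTO #TestDataType VALUES ('row1', '2.0'), ('row2', '1.5'), ('row8', '0.0');
SELECT * FROM #TestDataType WHERE col2 > 0;   -- throws the conversion error shown above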

 

–> Let’s check why the Spark SQL query didn’t fail and why it’s behaving like this.

I will use the EXPLAIN EXTENDED operator to see what’s happening with the query while the Logical Plan is being created.

explain extended select * from vwTestDataType where col2 > 0

Here is the plan. You can see under the Analyzed Logical Plan that the column “col2” is implicitly typecast to INT, because the comparison value is of INT type. Thus all the 0.x values are converted to 0 and filtered out.


== Parsed Logical Plan ==
'Project [*]
+- 'Filter ('col2 > 0)
   +- 'UnresolvedRelation `vwTestDataType`

== Analyzed Logical Plan ==
col1: string, col2: string
Project [col1#13284, col2#13285]
+- Filter (cast(col2#13285 as int) > 0)
   +- SubqueryAlias `vwtestdatatype`
      +- Project [col1#13284, col2#13285]
         +- LocalRelation [col1#13284, col2#13285]

== Optimized Logical Plan ==
LocalRelation [col1#13284, col2#13285]

== Physical Plan ==
LocalTableScan [col1#13284, col2#13285]

 

–> Now, to avoid this issue you must explicitly typecast the column and the value to the same datatype to get the expected result. Here we should convert the String column & the value to Double; this way the query returns all 7 rows as expected:

select * from vwTestDataType where double(col2) > double(0)

-- OR --
select * from vwTestDataType where col2 > 0.0
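
An equivalent form uses the standard CAST syntax, which is also what the Analyzed Logical Plan below shows internally (a minimal sketch against the same test view):

select * from vwTestDataType where cast(col2 as double) > cast(0 as double)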

Let’s again use the EXPLAIN EXTENDED operator to check how the Logical Plan of the modified query looks:

explain extended select * from vwTestDataType where double(col2) > double(0)

-- OR --
explain extended select * from vwTestDataType where col2 > 0.0

== Parsed Logical Plan ==
'Project [*]
+- 'Filter ('double('col2) > 'double(0))
   +- 'UnresolvedRelation `vwTestDataType`

== Analyzed Logical Plan ==
col1: string, col2: string
Project [col1#13213, col2#13214]
+- Filter (cast(col2#13214 as double) > cast(0 as double))
   +- SubqueryAlias `vwtestdatatype`
      +- Project [col1#13213, col2#13214]
         +- LocalRelation [col1#13213, col2#13214]

== Optimized Logical Plan ==
LocalRelation [col1#13213, col2#13214]

== Physical Plan ==
LocalTableScan [col1#13213, col2#13214]

 

So while working with Spark SQL we should make sure there are no such datatype conflicts; better still, these issues should be handled right at the beginning, while modelling the tables with the correct datatypes.
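
As a rough sketch of that last point (reusing the test view from above; the view name vwTestDataTypeFixed is just illustrative), one option is to fix the datatype once in the view or table definition, so every downstream query compares Double with Double and no implicit coercion can kick in:

CREATE OR REPLACE TEMPORARY VIEW vwTestDataTypeFixed as
select col1, double(col2) as col2   -- expose the measure as Double, not String
from vwTestDataType;

select * from vwTestDataTypeFixed where col2 > 0;   -- now returns all 7 expected rows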


SQL Error – “SELECT TOP 100” throws error in SparkSQL – what’s the correct syntax?

January 23, 2020

 
In SQL Server, to get the top-n rows from a table or dataset you just use the “SELECT TOP” clause, specifying the number of rows you want returned, like in the query below.

But when I tried to use the same query in Spark SQL I got a syntax error, which means the TOP clause is not supported with the SELECT statement.

%sql
Select TOP 100 * from SalesOrder

Error in SQL statement: ParseException:
com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input ‘100’ expecting (line 1, pos 11)

== SQL ==
Select top 100 * from SalesOrder
———–^^^

 

As Spark SQL does not support the TOP clause, I tried the MySQL syntax instead, i.e. the “LIMIT” clause.

So I just removed “TOP 100” from the SELECT query and added a “LIMIT 100” clause at the end; it worked and gave the expected results !!!

%sql
Select * from SalesOrder LIMIT 100
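
One thing to keep in mind (just as with TOP in SQL Server): without an ORDER BY the 100 rows returned are arbitrary. If you need a deterministic top-100, add an ORDER BY before the LIMIT; a minimal sketch below, assuming SalesOrder has an OrderDate column (a hypothetical column here):

%sql
Select * from SalesOrder ORDER BY OrderDate DESC LIMIT 100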

SQL Server 2019 released, awesome new features – download now !!!

November 4, 2019

 
Today, on 4th November 2019, at the MSIgnite2019 event, Microsoft announced the release of the new version of SQL Server, i.e. SQL Server 2019.
 

The new stuff in SQL Server 2019 is all about Big Data Clusters for SQL Server, which will allow you to:
– Deploy scalable clusters of SQL Server, Spark, HDFS on Kubernetes
– Read, write, and process big data from Transact-SQL or Spark
– With PolyBase, query data from external SQL Server, Oracle, Teradata, MongoDB, and ODBC data sources via external tables
– and many more, which we will check below…

 
–> Download SQL Server (evaluation version):

To download SQL Server 2019 you can Register and Download the Full or Evaluation version (180 days) here.

Or you can directly download the installer SQL2019-SSEI-Eval.exe
 

–> Free Developer Version:

Back in March 2016 Microsoft announced that, going forward, the Developer edition of any SQL Server release will be free for development and learning purposes. Register and Download the Developer version.

This Developer version is meant for development and testing only, and not for production environments or for use with production data. For more info please check my previous blog post.
 

–> Download SSMS (separate install):

Starting with SQL Server 2016, Microsoft decoupled SSMS from the SQL Server setup, and it is now available as a separate installer. This is basically to support the move towards a universal version of SSMS for both SQL Server on-prem & Azure SQL Database, which ships every month or so. The SSMS setup is available separately as a free download.
 

–> Download SSRS (Reporting Services):

Just like SSMS, SSRS is now also available as a separate install from the Microsoft Download Center, link.
 

–> Check new features in SQL Server 2019:

1. Big data clusters with scalable compute and storage composed of SQL Server, Spark, and HDFS. It provides key elements of a data lake – Hadoop Distributed File System (HDFS), Spark, and analytics tools deeply integrated with SQL Server. [more info]

2. A complete Data Science & AI platform to train and operationalize models in SQL Server ML Services or Spark ML using Azure Data Studio notebooks.

3. Data virtualization allows queries across relational and non-relational data without movement or replication. PolyBase enables you to run a T-SQL query inside SQL Server to pull data from Hadoop and return it in a structured format, all without moving or copying the data.

4. Intelligent Query Processing improves scaling of queries and Automatic Plan Correction resolves performance problems. [more info]
  – Table variable deferred compilation
  – Batch mode on row store
  – T-SQL Scalar UDF inlining
  – Approximate QP (Approximate COUNT DISTINCT), with a quick example after this list
  – Memory grant feedback, row mode

5. In-memory improvements for analytics on operational data using HTAP. Higher concurrency and scale through persistent memory (PMEM). [more info]
– Hybrid buffer pool
– Memory-optimized TempDB
– In-Memory OLTP support for Database Snapshots

6. Greater uptime with more online indexing operations. [more info]

7. Data Discovery & Classification labeling for GDPR and Vulnerability Assessment tool to track compliance.

8. Support for your choice of Windows, Linux, and containers. SQL Server on Linux for Docker Engine [link].

9. High Availability with five synchronous replica pairs, secondary-to-primary replica connection redirection, and the ability to run Always On availability groups on containers using Kubernetes. [more info]

10. Accelerated Database Recovery [more info]

11. SQL Graph enhancements: support for MATCH predicates in a MERGE statement, SHORTEST_PATH inside MATCH, and support for derived tables or view aliases in a graph match query.
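
Here is the quick example promised under item 4 above, just a minimal sketch of the new Approximate COUNT DISTINCT function (the Sales.Orders table and CustomerID column are only placeholders):

-- Approximate distinct count (new in SQL Server 2019): trades a small error
-- margin for much lower memory use on very large tables
SELECT APPROX_COUNT_DISTINCT(CustomerID) AS ApproxUniqueCustomers
FROM Sales.Orders;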
 

Check all the above and many more new features of SQL Server 2019 in MSDN Blogs.


2018 blogging in review (Happy New Year – 2019 !!!)

December 31, 2018

 

Happy New Year 2019… from SQL with Manoj !!!

As the WordPress.com stats helper monkeys have stopped preparing annual reports for any of their blogs, I’ve prepared my own Annual Report for this year again.
 

–> Here are some Crunchy numbers from 2018

SQL with Manoj 2018 Stats

The Louvre Museum has 8.5 million visitors per year. This blog was viewed about 793,171 times by 542,918 unique visitors in 2018. If it were an exhibit at the Louvre Museum, it would take about 17 days for that many people to see it.

There were 68 pictures uploaded, taking up a total of ~6 MB. That’s about ~6 pictures every month.

This blog also got its highest ever hits/views per day (i.e. 3,552 hits) on Sept 25th this year.

 

–> All-time posts, views, and visitors

SQL with Manoj all time views


 

–> Posting Patterns

In 2018, there were 26 new posts, growing the total archive of this blog to 546 posts.

LONGEST STREAK: 6 posts in Feb 2018

 

–> Attractions in 2018

These are the top 5 posts that got the most views in 2018:

1. Download & Install SQL Server Management Studio (SSMS) 2016 (62,101 views)

2. SQL Server 2016 RTM full & final version available – Download now (31,705 views)

3. Getting started with SQL Server 2014 | Download and Install Free & Full version (20,443 views)

4. SQL Basics – Difference b/w WHERE, GROUP BY and HAVING clause (16,113 views)

5. SQL Basics – Difference b/w TRUNCATE, DELETE and DROP? (13,189 views)

 

–> How did they find me?

The top referring sites and search engines in 2018 were:

SQL with Manoj 2018 Search Engines referrers


 

–> Where did they come from?

Out of 210 countries, the top 5 visitor countries were India, United States, United Kingdom, Canada and Australia:

SQL with Manoj 2018 top Countries visitors


 

–> Followers: 407

WordPress.com: 160
Email: 247
Facebook Page: 1,358

 

–> Alexa Rank (the lower, the better)

Global Rank: 221,534
US Rank: 139,012
India Rank: 46,758
Estimated Monthly Revenue: $1,320
Actual Monthly Revenue: $300

SQL with Manoj 2018 Alexa ranking


Alexa history shows how the Alexa rank of sqlwithmanoj.com has varied in the past, which in turn also indicates the trend in site visitors.
 

–> 2019 New Year Resolution

– Write at least 1 blog post every week
– Write on new features in SQL Server 2017 & 2019
– Also explore and write blog post on Azure Data Platform
– Post at least 1 video every week on my YouTube channel

 

That’s all for 2018, see you in year 2019, all the best !!!
 

Connect with me on Facebook, Twitter, LinkedIn, YouTube, Google, Email