Certifications | SQL with Manoj

Prepare for Certification Exam 70-775: Perform Data Engineering on Microsoft Azure HDInsight

April 10, 2018 1 comment

In my [previous post] I’ve tried to collate some basic stuff about HDInsight to let you know the basics and get started. You can also check [Microsoft Docs] for HDInsight to know more and deep dive into the Big-Data platform.

Microsoft Certification Exams is one of a good and easy approach to understand the technology. You can find details about Exam 70-775 certification on the Microsoft Certification page.

Though the web page provides most the details of what would be asked in the Exam, but lacks in providing the study material against each module and topics under it. Thus here with this post I’ve tried to find and provide the study material links against each of the topics covered on these modules:

The exam is divided into 4 Modules:

1. Administer and Provision HDInsight Clusters
2. Implement Big Data Batch Processing Solutions
3. Implement Big Data Interactive Processing Solutions
4. Implement Big Data Real-Time Processing Solutions

Module #1. Administer and Provision HDInsight Clusters

1. Deploy HDInsight clusters
    – Create a HDInsight cluster [Portal] [ARM Template] [PowerShell] [.net SDK] [CLI]
    – Create HDInsight clusters with Hadoop, Spark, Kafka, etc [Link]
    – Select an appropriate cluster type based on workload considerations [Link]
    – Create a cluster in a private virtual network [Link]
    – Create a domain-joined cluster [Link]
    – Create a cluster that has a custom metastore [link]
    – Manage managed disks [with Apache Kafka]
    – Configure vNet peering [Link]

2. Deploy and secure multi-user HDInsight clusters
    – Provision users who have different roles
    – Manage users, groups & permissions [Ambari] [PowerShell] [Apache Ranger]
    – Configure Kerberos [Link]
    – Configure service accounts
    – Implement SSH [Connecting] [Tunneling]
    – Restrict access to data [Link]

3. Ingest data for batch and interactive processing
    – Ingest data from cloud or on-premises data; store data in Azure Data Lake
    – Store data in Azure Blob Storage
    – Perform routine small writes on a continuous basis using Azure CLI tools
    – Ingest data in Apache Hive and Apache Spark by using Apache Sqoop, Application Development Framework (ADF), AzCopy, and AdlCopy
    – Ingest data from an on-premises Hadoop cluster

4. Configure HDInsight clusters
    – Manage metastore upgrades
    – View and edit Ambari configuration groups
    – View and change service configurations through Ambari
    – Access logs written to Azure Table storage
    – Enable heap dumps for Hadoop services
    – Manage HDInsight configuration, use HDInsight .NET SDK, and PowerShell
    – Perform cluster-level debugging
    – Stop and start services through Ambari
    – Manage Ambari alerts and metrics

5. Manage and debug HDInsight jobs
    – Describe YARN architecture and operation
    – Examine YARN jobs through ResourceManager UI and review running applications
    – Use YARN CLI to kill jobs
    – Find logs for different types of jobs
    – Debug Hadoop and Spark jobs
    – Use Azure Operations Management Suite (OMS) to monitor and manage alerts, and perform predictive actions

Module #2. Implement Big Data Batch Processing Solutions

1. Implement batch solutions with Hive and Apache Pig
    – Define external Hive tables; load data into a Hive table
    – Use partitioning and bucketing to improve Hive performance
    – Use semi-structured files such as XML and JSON with Hive
    – Join tables with Hive using shuffle joins and broadcast joins
    – Invoke Hive UDFs with Java and Python; design scripts with Pig
    – Identify query bottlenecks using the Hive query graph
    – Identify the appropriate storage format, such as Apache Parquet, ORC, Text, and JSON

2. Design batch ETL solutions for big data with Spark
    – Share resources between Spark applications using YARN queues and preemption
    – Select Spark executor and driver settings for optimal performance, use partitioning and bucketing to improve Spark performance
    – Connect to external Spark data sources
    – Incorporate custom Python and Scala code in a Spark DataSets program
    – Identify query bottlenecks using the Spark SQL query graph

3. Operationalize Hadoop and Spark
    – Create and customize a cluster by using ADF
    – Attach storage to a cluster and run an ADF activity
    – Choose between bring-your-own and on-demand clusters
    – Use Apache Oozie with HDInsight
    – Choose between Oozie and ADF
    – Share metastore and storage accounts between a Hive cluster and a Spark cluster to enable the same table across the cluster types
    – Select an appropriate storage type for a data pipeline, such as Blob storage, Azure Data Lake, and local Hadoop Distributed File System (HDFS)

Module #3. Implement Big Data Interactive Processing Solutions

1. Implement interactive queries for big data with Spark SQL
    – Execute queries using Spark SQL
    – Cache Spark DataFrames for iterative queries
    – Save Spark DataFrames as Parquet files,
    – Connect BI tools to Spark clusters
    – Optimize join types such as broadcast versus merge joins
    – Manage Spark Thrift server and change the YARN resources allocation
    – Identify use cases for different storage types for interactive queries

2. Perform exploratory data analysis by using Spark SQL
    – Use Jupyter and Apache Zeppelin for visualization and developing tidy Spark DataFrames for modeling
    – Use Spark SQL’s two-table joins to merge DataFrames and cache results
    – Save tidied Spark DataFrames to performant format for reading and analysis (Apache Parquet)
    – Manage interactive Livy sessions and their resources

3. Implement interactive queries for big data with Interactive Hive
    – Enable Hive LLAP through Hive settings
    – Manage and configure memory allocation for Hive LLAP jobs
    – Connect BI tools to Interactive Hive clusters

4. Perform exploratory data analysis by using Hive
    – Perform interactive querying and visualization
    – Use Ambari Views
    – Use HiveQL
    – Parse CSV files with Hive
    – Use ORC versus Text for caching
    – Use internal and external tables in Hive
    – Use Zeppelin to visualize data

5. Perform interactive processing by using Apache Phoenix on HBase
    – Use Phoenix in HDInsight
    – Use Phoenix Grammar for queries
    – Configure transactions, user-defined functions, and secondary indexes
    – Identify and optimize Phoenix performance
    – Select between Hive, Spark, and Phoenix on HBase for interactive processing
    – Identify when to share metastore between a Hive cluster and a Spark cluster

Module #4. Implement Big Data Real-Time Processing Solutions

1. Create Spark streaming applications using DStream API
    – Define DStreams and compare them to Resilient Distributed Dataset (RDDs)
    – Start and stop streaming applications
    – Transform DStream (flatMap, reduceByKey, UpdateStateByKey)
    – Persist long-term data stores in HBase and SQL
    – Persist Long Term Data Azure Data Lake and Azure Blob Storage
    – Stream data from Apache Kafka or Event Hub
    – Visualize streaming data in a PowerBI real-time dashboard

2. Create Spark structured streaming applications
    – Use DataFrames and DataSets APIs to create streaming DataFrames and Datasets
    – Create Window Operations on Event Time
    – Define Window Transformations for Stateful and Stateless Operations
    – Stream Window Functions, Reduce by Key, and Window to Summarize Streaming Data
    – Persist Long Term Data HBase and SQL
    – Persist Long Term Data Azure Data Lake and Azure Blob Storage
    – Stream data from Kafka or Event Hub
    – Visualize streaming data in a PowerBI real-time dashboard

3. Develop big data real-time processing solutions with Apache Storm
    – Create Storm clusters for real-time jobs
    – Persist Long Term Data HBase and SQL
    – Persist Long Term Data Azure Data Lake and Azure Blob Storage
    – Stream data from Kafka or Event Hub
    – Configure event windows in Storm
    – Visualize streaming data in a PowerBI real-time dashboard
    – Define Storm topologies and describe Storm Computation Graph Architecture
    – Create Storm streams and conduct streaming joins
    – Run Storm topologies in local mode for testing
    – Configure Storm applications (Workers, Debug mode)
    – Conduct Stream groupings to broadcast tuples across components
    – Debug and monitor Storm jobs

4. Build solutions that use Kafka
    – Create Spark and Storm clusters in the virtual network
    – Manage partitions
    – Configure MirrorMaker
    – Start and stop services through Ambari
    – Manage topics

5. Build solutions that use HBase
    – Identify HBase use cases in HDInsight
    – Use HBase Shell to create updates and drop HBase tables
    – Monitor an HBase cluster
    – Optimize the performance of an HBase cluster
    – Identify uses cases for using Phoenix for analytics of real-time data
    – Implement replication in HBase

Preparing for Exam 70-761, Querying Data with Transact-SQL | Certification

May 29, 2017 7 comments

This exam (70-761) will earn you MCP in SQL Server 2016 Querying Data with Transact-SQL. It is 1 out of the 2 exams to earn the “MCSA: SQL 2016 Database Development” certification.

I will discuss about the other “Exam 70-762, Developing SQL Databases” in my next post.

Training Kit Book:

For exam preparation you can use: Exam Ref 70-761 Querying Data with Transact-SQL

So, let’s go and understand this exam, links to study material and how to prepare for this.

–> This exam is targeted for students or professionals who want to learn about Transact-SQL or simply T-SQL, which includes:
1. Know about SQL Server and its components
2. Write single/multi table SELECT statements, with SET and Predicate logic
3. Apply Filtering, Sorting, JOIN, etc with SQL Queries
4. Write DDL, DML SQL Statements to Store and Retrieve data to & from Tables
5. Create Views, Stored Procedures, Functions, etc
6. Use inbuilt Functions and Transform data
7. Work with SQL Datatypes, Variables, Conditions, Loops and T-SQL code/scripts

The exam is divided into multiple modules:

1. Introduction to Microsoft SQL Server 2016
    – The Basic Architecture of SQL Server [Architecture]
    – SQL Server on-premise and cloud-based Editions and Versions [Editions]
    – Getting Started with SSMS [SSMS]

2. Introduction to T-SQL Querying
    – Introducing T-SQL with basic SELECT Statements [SELECT]
    – Understanding Sets and SET based Queries in relational DBs
    – Understanding Predicate Logic to Filter Data [WHERE clause, PREDICATES]
    – Understanding the Logical Order of Operations in SELECT statements [Logical Order]

3. Writing SELECT Queries
    – Writing Simple SELECT Statements, its structure and format [SELECT Queries]
    – Eliminating Duplicates with DISTINCT clause [DISTINCT]
    – Using Column and Table Aliases
    – Writing Simple CASE Expressions [CASE]

4. Querying Multiple Tables
– Understanding Joins
– Querying with Inner Joins, Outer Joins, Cross Joins and Self Joins

5. Sorting and Filtering Data
    – Sorting Data with ORDER BY clause
    – Filtering Data with WHERE clause Predicates
    – Filtering Data with TOP and OFFSET-FETCH
    – Working with Unknown/Missing Values or three-valued logic (NULL)

6. Working with SQL Server 2016 Data Types
    – Introducing SQL Server 2016 Data Types
    – Working with Character Data
    – Working with Date and Time Data
    – Working with data types conversion

7. Using DML to Modify Data
– Inserting Data with INSERT and SELECT INTO statements
– Modifying and Deleting Data with UPDATE, MERGE, DELETE, and TRUNCATE statements

8. Using Built-In Functions
    – Writing Queries with Built-In Functions
    – Using Conversion Functions (CAST & CONVERT)
    – Using Logical Functions
    – Using Functions to Work with NULL

9. Grouping and Aggregating Data
    – Using built-in Aggregate Functions
    – Using the GROUP BY Clause
    – Filtering Groups with HAVING

10. Using Subqueries
    – Writing Self-Contained Subqueries
    – Writing Correlated Subqueries
    – Using the EXISTS Predicate with Subqueries

11. Using Table Expressions
    – Using Views
    – Using Inline Table-Valued Functions (TVFs)
    – Using Derived Tables
    – Using Common Table Expressions (CTEs) [CTE]

12. Using Set Operators
    – Writing Queries with the UNION operator (and UNION ALL)
    – Using EXCEPT and INTERSECT operators
    – Using APPLY operators (CROSS APPLY and OUTER APPLY)

13. Using Windows Ranking, Offset, and Aggregate Functions
– Creating Windows with OVER clause with partitioning, ordering, and framing
– Exploring Window Aggregate and Ranking Functions

14. Pivoting and Grouping Sets
    – Writing Queries with [PIVOT and UNPIVOT]
    – Working with Grouping Sets, CUBE and ROLLUP Subclauses
    – Using the GROUPING_ID function

15. Executing Stored Procedures
    – Querying Data with Stored Procedures
    – Creating Simple Stored Procedures
    – Passing Parameters to Stored procedures
    – Stored Procedures with output parameters
    – Working with Dynamic SQL

16. Programming with T-SQL
    – T-SQL Programming Elements (Variables, Batches, )
    – Using variables in a Dynamic SQL Statement
    – Controlling Program Flow (IF-ELSE condition, WHILE loop)
    – Working with Synonyms

17. Implementing Error Handling
– Implementing T-SQL error handling with TRY/CATCH block
– Implementing structured exception handling with THROW keyword

18. Implementing Transactions
    – Transactions and the database engines
    – Differences between batches and transactions
    – Controlling transactions with BEGIN, COMMIT, and ROLLBACK
    – Error handling with the CATCH block
    – Use of SET XACT_ABORT while handling Transactions

Microsoft official link to this exam: https://www.microsoft.com/en-in/learning/course.aspx?cid=20761

Book available as a Training Kit on Amazon, buy it.

SQL Server 2016 articles on my blog: https://sqlwithmanoj.com/sql-server-2016-articles/

Categories: Certifications, SQL Server 2016 Tags: SQL Server 2016, SQL Server Certification

SQL Server 2016 Certification Path

May 21, 2017 21 comments

Microsoft has made few changes to its Certification path for the new version of SQL Server i.e. SQL Server 2016.

And with this post I’m trying to collate and put all exams and certifications in concise and clear manner. Would be happy to accept any comments, changes and suggestions !!!

–> The MCSA i.e. “Microsoft Certified Solution Associate” level now contains the new “MCSA: SQL Server 2016” certification, and this replaces the older “MCSA: SQL Server 2012/14” one.

–> And the top MCSE i.e. “Microsoft Certified Solutions Expert” level now contains the new “MCSE: Data Management and Analytics” certification. This also replaces the following 2 existing certifications (but retains the underlying exams to earn this MCSE level, which I’ve explained below):
– MCSE: Data Platform
– MCSE: Business Intelligence

–> Here are the details of exams at both the levels.

Level 1. “MCSA: SQL Server 2016”: Now to earn this level you need to give either of the following certifications:

   1. MCSA: SQL 2016 Database Development certification
       – Exam [70-761], Querying Data with Transact-SQL [check details here]
       – Exam [70-762], Developing SQL Databases

   2. MCSA: SQL 2016 Database Administration certification
       – Exam [70-764], Administering a SQL Database Infrastructure
       – Exam [70-765], Provisioning SQL Databases

   3. MCSA: SQL 2016 Business Intelligence Development certification
       – Exam [70-767], Implementing an SQL Data Warehouse
       – Exam [70-768], Developing SQL Data Models

Level 2. “MCSE: Data Management and Analytics”: Now to earn this level you need to first get an MCSA on either of the above 3 certifications, or “MCSA: Data Science”. Then need to pass anyone of the below exam:

   – Exam [70-473], Designing and Implementing Cloud Data Platform Solutions
   – Exam [70-475], Designing and Implementing Big Data Analytics Solutions
   – Exam [70-464], Developing Microsoft SQL Server Databases
   – Exam [70-465], Designing Database Solutions for Microsoft SQL Server
   – Exam [70-466], Implementing Data Models and Reports with Microsoft SQL Server
   – Exam [70-467], Designing Business Intelligence Solutions with Microsoft SQL Server

Categories: Certifications, SQL Server 2016 Tags: MCSA, MCSE, SQL Server 2016, SQL Server Certification

Which SQL Server Certification should I take: 98-364, 70-461 or 70-761 ? – MSDN TSQL forum

February 18, 2017 3 comments

–> Question:

I just started learning SQL Server, and I want to get certified, I checked internet, but I got confused.

There is exam 98-364, 70-461, 70-761, and i want to get MCSA in SQL

could some one explain to me what should i do …

–> My Answer:

>> MCSA or Microsoft Certified Solutions Associate in SQL Server can be earned on following areas with respective exams:

– Database Development (761, 762)
– Database Administration (764, 765)
– Business Intelligence (767, 768)

>> Exam 98-364 is to get Microsoft Technology Associate (MTA) certification. It is the most basic exam that focuses just on basic SQL, and is for people new to technology, who need to validate their knowledge of fundamental concepts before moving on to more advanced certifications. This exam covers:

1. RDBMS, DBMS concepts and terms, like Normanization, PK, FK, etc.
2. Plain SQL querying knowledge, like DDL, DML, etc. and indexes.
3. Simple DBA stuff, like security, backup/restore, etc.

Link: https://www.microsoft.com/en-in/learning/exam-98-364.aspx

>> Exam 70-761 is also an advanced exam which is based upon SQL Server 2016, which is the latest version of SQL Server as of today. To earn MCSA you have to take 70-762 exam as well as mentioned above, check this link for MCSA SQL Server 2016: https://www.microsoft.com/en-in/learning/mcsa-sql-2016-certification.aspx

>> Exam 70-461 is also an advance level exam but now older based upon SQL Server 2012 or 2014. Check this link for MCSA SQL Server 2012/2014: https://www.microsoft.com/en-in/learning/mcsa-sql-certification.aspx

Please note: Exam 98-364 is not required here, and as mentioned above its just to make sure you have knowledge in working with basic SQL Querying.

Ref link.

Categories: Certifications Tags: MSDN TSQL forum, SQL Server Certification

Exam 70-461, SQL 2014 update – Querying Microsoft SQL Server 2014

December 1, 2015 7 comments

In one of my previous post I talked about [Exam 70-461 for SQL Server 2012], and I really got good and big response, with ~250 comments. As SQL Server 2014 got released most of the folks asked me about the certification update.

So here in this post I’ll talk about the same exam 70-461 which is now updated for SQL Server 2014, and Microsoft has not changed the exam code for this version of SQL Server.

This updated certification exam is still based on SQL Server 2012 with new features of SQL Server 2014. Previously with SQL 2012 the exam was divided into 4 modules, please check my previous post, link above. Here with SQL 2014 update the exam is reshaped into 20 modules listed below:

1. Introduction to Microsoft SQL Server 2014
– SQL Server architecture | MSDN
– Editions | MSDN
– SQL Server Management Studio | MSDN

2. Introduction to T-SQL Querying
– T-SQL intro | MSDN
– Set based queries
– Predicate Logic | MSDN
– SELECT statement Logical Ordering | MSDN

3. Writing SELECT Queries
– Simple SELECT queries | MSDN
– Find & eliminate duplicates, and DISTINCT keyword
– Working with Columns
– CASE expressions | MSDN

4. Querying Multiple Tables
– SQL JOINs (INNER, OUTER, CROSS & Self) | MSDN | TechNet

5. Sorting and Filtering Data
– Sorting data with ORDER BY clause | MSDN
– Filtering data with WHERE clause MSDN
– Filtering data with TOP keyword | MSDN
– Using TOP-OFFSET clause | MSDN | TechNet
– Working with NULL/unknown values

6. Working with SQL Server 2014 Data Types
– New SQL 2014 datatypes (Character, Date, Time) | MSDN
– Write Queries using different Datatypes | MSDN

7. Using DML to Modify Data
– Inserting Data | MSDN
– Modifying Data | MSDN
– Deleting Data | MSDN

8. Using Built-In Functions
– SQL 2014 new functions (Conversion, Logical)

9. Grouping and Aggregating Data
– Using Aggregate functions | MSDN
– Using GROUP BY clause | MSDN
– Using HAVING clause | MSDN

10. Using Sub-Queries
– Self-Contained Subqueries | TechNet
– Correlated Subqueries | TechNet
– Using EXISTS() predicate with Subqueries | TechNet

12. Using Set Operators
– The UNION, UNION ALL Operator | MSDN
– EXCEPT vs INTERSECT | MSDN
– Using APPLY (CROSS & OUTER) | TechNet

13. Using Window Ranking, Offset, and Aggregate Functions
– Window functions with OVER() clause | MSDN
– More Window functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE) | MSDN

14. Pivoting and Grouping Sets
– Writing queries with PIVOT and UNPIVOT | MSDN | TechNet
– Using Grouping Sets | MSDN | TechNet

16. Programming with T-SQL
– Working with Variables, Batches | MSDN | TechNet
– Using IF conditions | MSDN
– WHILE loops | MSDN

17. Implementing Error Handling
– Using TRY/CATCH block | MSDN
– Working with Error Handling | MSDN
– Returning error information
– Raising user-defined errors and passing system errors | MSDN

18. Implementing Transactions
– Using Transactions | MSDN
– Use of “SET XACT_ABORT” with Transactions | MSDN | MSDN-2
– Effects of Isolation Levels on Transactions | MSDN

19. Improving Query Performance
– Factors affecting Query Performance | MSDN
– Checking Execution plan
– Optimize SQL queries, Query tuning | MSDN
– Optimize SQL Indexes, Index tuning | MSDN

20. Querying SQL Server Metadata
– Querying System Catalog Views | MSDN
– Querying System Catalog Functions | MSDN
– Using System Stored Procedures | MSDN
– Using Dynamic Management Objects (DMVs, DMFs) | MSDN

–> Check SQL Server 2014 videos here: