Advertisements

Archive

Archive for the ‘Microsoft Azure’ Category

Azure SQL Data Sync – keep your data in sync between Azure Hybrid environment SQL Server and SQL DB

July 12, 2018 Leave a comment

 
SQL Data Sync is a service built on Azure SQL Database that lets you synchronize the data you select bi-directionally across multiple SQL databases and SQL Server instances.
 

Internal Mechanism & Performance impact:

Data Sync uses insert, update, and delete triggers to track changes. It creates side tables in the user database for change tracking. These change tracking activities have an impact on your database workload. Assess your service tier and upgrade if needed.

Since Data Sync is trigger-based, transactional consistency is not guaranteed. Microsoft guarantees that all changes are made eventually, and that Data Sync does not cause data loss.
 

Limitations & Requirements:

1. Each table must have a primary key
2. Snapshot isolation must be enabled
3. A table cannot have an identity column that is not the primary key
4. The names of objects (databases, tables, and columns) cannot contain the printable characters period (.), left square bracket ([), or right square bracket (]).
5. Unsupported data types
    a. FileStream
    b. SQL/CLR UDT
    c. XMLSchemaCollection (XML supported)
    d. Cursor, Timestamp, Hierarchyid
6. Azure Active Directory authentication is not supported.

Maximum number of sync groups any database can belong to 5
Maximum number of endpoints in a single sync group 30
Maximum number of on-premises endpoints in a single sync group 5
Database, table, schema, and column names 50 characters per name
Tables in a sync group 500
Columns in a table in a sync group 1000
Data row size on a table 24 Mb
Minimum sync interval 5 Minutes

 

Configurations:

1. Frequency can be set in seconds, minutes, hours and days (min 5 minutes)
2. You can choose and select desired tables and columns to sync
3. For on premises database you must configure a local DMG agent
 

Setup Azure SQL Data Sync:

Check here for Step-by-Step tutorial.

1. Create a Hub database on SQL DB (Hub & Spoke topology)
    a. Hub database: must be Azure SQL DB
    b. Spoke/Member database: rest of databases are either Azure SQL DB, on SQL Server instance

2. Create Sync group (On the Hub Database create a “New Sync Group”)
    a. Sync Schema: which data is being synchronized
    a. Sync metadata database: must be an Azure SQL DB
    b. Sync Interval: frequency
    c. Conflict Resolution Policy: (Hub wins or Member wins)

3. Add Sync Members (Spokes, can be either SQL DB or SQL instance)
    a. Sync Agent Gateway (for on-prem): download and install on on-premise server
    b. Sync Direction: bi-directional, or one direction

4. Configure Sync group
    a. Select the Tables/Columns which you want to sync.
 

Powershell script to:

Sync SQL DB & on-prem SQL Server instance
Sync between multiple SQL Databases
 

Best Practices: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-best-practices-data-sync

Monitor: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-sync-monitor-oms

Troubleshoot: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-troubleshoot-data-sync
 

Recommendations:

Data Sync is not appropriate for the following scenarios:
1. Disaster Recovery
2. Read Scale
3. ETL (OLTP to OLAP)
4. Migration from on-premises SQL Server to Azure SQL Database
 

Further reading: https://blogs.msdn.microsoft.com/igorpag/2017/07/06/azure-sql-data-sync-test-drive-and-first-impressions/


Advertisements

An introduction to Azure HDInsight – Microsoft’s Big-Data/Hadoop solution on Azure

March 3, 2018 1 comment

 
The Microsoft Azure portal has all the details on HDInsight and is very vast. Here in this post I’ve simply curated main and important stuff for myself and others to get started with HDInsight.
 

Azure HDInsight is a standard Apache Hadoop distribution offered as a managed service on Microsoft Azure. It is based on the Hortonworks Data Platform (HDP) and provisioned as clusters on Azure. The clusters can be created on your choice of Windows or Linux Servers.
 

What HDInsight offers:

1. Provides an end-to-end SLA on all your production workloads.
2. Enables you to scale workloads up or down anytime and only pay for what you use.
3. Protects and Secure your data as per government compliance.
4. Provide Log Analytics to monitor your clusters.
5. Globally availability in multiple regions.
6. Provides various productivity tools for development.
 

HDInsight enables a broad range of scenarios such as: Process & Analyze Big-Data, Batch Processing, in-memory processing ETL, Data Warehousing, Machine Learning, IoT and more, by using a broad spectrum of open-source frameworks, like Hadoop, Spark, Kafka, HBase, Hive, Storm and R Server.


 

HDInsight Cluster Types:

1. Hadoop: A simple Map-Reduce programming model to process and analyze batch data in parallel. [Apache Hadoop]

2. Spark: An open-source, parallel-processing framework that supports in-memory processing to boost the performance of big-data analysis applications. [Apache Spark]

3. HBase: A NoSQL database built on Hadoop that provides random access and strong consistency for large amounts of unstructured and semi-structured data. [Apache HBase]

4. R Server: A server for hosting and managing parallel, distributed R processes. It provides data scientists, statisticians, and R programmers with on-demand access to scalable, distributed methods of analytics on HDInsight.

5. Storm: A distributed, real-time computation system for processing large streams of data fast. [Apache Storm]

6. Hive: or Interactive Query (AKA: Live Long and Process), In-memory caching for interactive and faster Hive queries. [Apache Hive]

7. Kafka: An open-source platform that’s used for building streaming data pipelines and applications. Kafka also provides message-queue functionality that allows you to publish and subscribe to data streams. [Apache Kafka]
 

Other Components available with HDInsight:

Ambari Avro HCatalog
Mahout MapReduce YARN
Phoenix Pig Sqoop
Tez Oozie ZooKeeper
 

Storage options in HDInsight:

1. Azure Blob Store

2. Azure Data Lake Store

Azure Data Lake Store vs Azure Blob Storage
 

Role Based Security:

Owner Lets you manage everything
Contributor Lets you manage everything except access to resources
Reader Lets you view everything but not make changes
User Access Administrator Lets you manage user access to Azure resources

 

HDInsight security:

[Overview and more details on Microsoft Docs]


 


What is Lambda Architecture? and what Azure offers with its new Cosmos DB?

February 16, 2018 2 comments

 
Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch processing and stream processing methods, and minimizing the latency involved in querying big data.

It is a Generic, Scalable, and Fault-tolerant data processing architecture to address batch and speed latency scenarios with big data and map-reduce.

–> The system consists of three layers: Batch Layer, Speed Layer & Service Layer

1. All data is pushed into both the Batch layer and Speed layer.

2. The Batch layer has a master dataset (immutable, append-only set of raw data) and pre-computes the batch views.

3. The Serving layer has Batch views for fast queries.

4. The Speed Layer compensates for processing time (to the serving layer) and deals with recent data only.

5. All queries can be answered by merging results from Batch views and Real-time views or pinging them individually.
 

Lambda Architecture with Azure:

Azure offers you a combination of following technologies to accelerate real-time big data analytics:

1. Azure Cosmos DB, a globally distributed and multi-model database service.

2. Apache Spark for Azure HDInsight, a processing framework that runs large-scale data analytics applications.

3. Azure Cosmos DB change feed, which streams new data to the batch layer for HDInsight to process.

4. The Spark to Azure Cosmos DB Connector

How Azure simplifies the Lambda Architecture:

1. All data is pushed into Azure Cosmos DB for processing.

2. The Batch layer has a master dataset (immutable, append-only set of raw data) stored in Azure Cosmos DB. Using HDI Spark, you can pre-compute your aggregations to be stored in your computed Batch Views.

3. The Serving layer is an Azure Cosmos DB database with collections for the master dataset and computed Batch View for fast queries.

4. The Speed layer compensates for processing time (to the serving layer) and deals with recent data only. It utilizes HDI Spark to read the Azure Cosmos DB change feed. This enables you to persist your data as well as to query and process it concurrently.

5. All queries can be answered by merging results from batch views and real-time views, or pinging them individually.
 

–> For complete details check here in Microsoft Docs: Azure Cosmos DB: Implement a lambda architecture on the Azure platform


Create a new Azure SQL Database (on PaaS) step by step

February 15, 2018 1 comment

 
Azure SQL Database is a Microsoft’s cloud based service on Microsoft on premise version of SQL Server relational database. It is a fully-managed relational cloud database service built for developers.
 

–> Today we will see how you can create a new SQL Database on Microsoft Azure:
 

1. Please make sure you have a Azure Account and Subscription.
 

2. Go to Microsoft Azure portal:

– In the menu, click New

– Select the Databases in the Azure Marketplace blade

– Click SQL Database

3. In the SQL Database blade enter the following settings:

– Name: MyFirstAzureSQLDB

– Subscription: Select your Azure Subscription

– Resource Group: Select the Resource Group, or create new

– Select source: We will choose here “Sample (AdventureWorksLT)”

– Server: Create a new logical server with following settings:
   – Server name: Provide a globally unique name
   – Server admin login: Provide a valid user name of your choice
   – Password: Provide a valid password
   – Location: Select a valid location
   – Allow azure services to access server: Select the check box

– Want to use SQL elastic pool? Select “Not now”

– Pricing tier: As you click on this option you will be taken to the Configure Performance blade, select Basic for now, but you can choose higher tiers as per your workloads and needs.

– Collation: Leave it as default, SQL_Latin1_General_CP1_CI_AS

– Pin to dashboard: Select the check box
 

4. Finally click the Create button on the main blade. Provisioning a new SQL Database takes few minutes and you can see the deployment progress at the Top-Right corner of your portal. And once the deployment is done you will see following message at the same place:

5. Now as you had selected to pin the new Database, so you should be able to see it in your dashboard like this:

6. Click on the shortcut and you will be taken to its Settings and Properties, where you can check resources, various options, tweak settings and do some admin related stuff.
 

7. The portal also provides a basic Query Editor to execute SQL Queries, check data in tables, and explore the database, check some DB objects, etc. After selecting the Query Editor you can click on Login button at the top. Provide the User Name and Password you applied while creating the Database.

8. Try explore the Sample Database, it gives limited feature to check Tables, Views and SPs. You can try writing a SQL Query and Run against the tables/views, or execute the SPs. The results will be shown in pane below.

9. You can also connect to this Azure SQL Database from SSMS installed in you PC or a remote server, which I’ll show in my next [blog post].
 

–> Terms used above:

1. Resource Group: is a collection of resources that share the same life cycle, permissions, and policies.

2. Server: or Logical Server acts as a central administrative point for multiple databases, including elastic pools logins, firewall rules, auditing rules, threat detection policies, and failover groups.

3. DTU (or Database Transaction Units): Microsoft guarantees a certain level of resources for that database (independent of any other database in the Azure cloud) and providing a predictable level of performance. This amount of resources is calculated as a number of Database Transaction Units or DTUs, and is a blended measure of CPU, memory, I/O (data and transaction log I/O).

4. Elastic Pools: provide a simple and cost effective solution for managing the performance of multiple databases within a fixed budget. An elastic pool provides compute (eDTUs) and storage resources that are shared between all the databases it contains. Databases within a pool only use the resources they need, when they need them, within configurable limits. The price of a pool is based only on the amount of resources configured and is independent of the number of databases it contains.