Archive

Archive for March, 2018

An introduction to Azure HDInsight – Microsoft’s Big-Data/Hadoop solution on Azure

March 3, 2018 1 comment

 
The Microsoft Azure portal has all the details on HDInsight and is very vast. Here in this post I’ve simply curated main and important stuff for myself and others to get started with HDInsight.
 

Azure HDInsight is a standard Apache Hadoop distribution offered as a managed service on Microsoft Azure. It is based on the Hortonworks Data Platform (HDP) and provisioned as clusters on Azure. The clusters can be created on your choice of Windows or Linux Servers.
 

What HDInsight offers:

1. Provides an end-to-end SLA on all your production workloads.
2. Enables you to scale workloads up or down anytime and only pay for what you use.
3. Protects and Secure your data as per government compliance.
4. Provide Log Analytics to monitor your clusters.
5. Globally availability in multiple regions.
6. Provides various productivity tools for development.
 

HDInsight enables a broad range of scenarios such as: Process & Analyze Big-Data, Batch Processing, in-memory processing ETL, Data Warehousing, Machine Learning, IoT and more, by using a broad spectrum of open-source frameworks, like Hadoop, Spark, Kafka, HBase, Hive, Storm and R Server.


 

HDInsight Cluster Types:

1. Hadoop: A simple Map-Reduce programming model to process and analyze batch data in parallel. [Apache Hadoop]

2. Spark: An open-source, parallel-processing framework that supports in-memory processing to boost the performance of big-data analysis applications. [Apache Spark]

3. HBase: A NoSQL database built on Hadoop that provides random access and strong consistency for large amounts of unstructured and semi-structured data. [Apache HBase]

4. R Server: A server for hosting and managing parallel, distributed R processes. It provides data scientists, statisticians, and R programmers with on-demand access to scalable, distributed methods of analytics on HDInsight.

5. Storm: A distributed, real-time computation system for processing large streams of data fast. [Apache Storm]

6. Hive: or Interactive Query (AKA: Live Long and Process), In-memory caching for interactive and faster Hive queries. [Apache Hive]

7. Kafka: An open-source platform that’s used for building streaming data pipelines and applications. Kafka also provides message-queue functionality that allows you to publish and subscribe to data streams. [Apache Kafka]
 

Other Components available with HDInsight:

Ambari Avro HCatalog
Mahout MapReduce YARN
Phoenix Pig Sqoop
Tez Oozie ZooKeeper
 

Storage options in HDInsight:

1. Azure Blob Store

2. Azure Data Lake Store

Azure Data Lake Store vs Azure Blob Storage
 

Role Based Security:

Owner Lets you manage everything
Contributor Lets you manage everything except access to resources
Reader Lets you view everything but not make changes
User Access Administrator Lets you manage user access to Azure resources

 

HDInsight security:

[Overview and more details on Microsoft Docs]


 


Advertisement

Download SQL Server 2017 for free (with full MSBI stack)

March 1, 2018 Leave a comment

 
With SQL Server 2014 Microsoft made its SQL Server Developer Edition free for Development and Test database in a non-production environment. This edition is not meant for Production environments or for use with production data.

SQL Server 2014 Dev Ed free

With SQL Server 2017 Developer edition developers can build any kind of application on top of SQL Server. It includes all the functionality of Enterprise edition, but is licensed for use as a development and test system, not as a production server.

So, with this free edition you get the Database Engine as well as full MSBI stack with DW/BI capabilities ( i.e. SSIS /AS /RS) for free 🙂
 

Downloads here:

SQL Server 2017 Developer Edition

SQL Server Management Studio (SSMS, latest version)

– Sample databases for SQL Server [AdventureWorks] [Wide World Importers]

SQL Operations Studio