
What is Lambda Architecture, and what does Azure offer with its new Cosmos DB?

February 16, 2018

 
Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch processing and stream processing methods, and minimizing the latency involved in querying big data.

It is a generic, scalable, and fault-tolerant data-processing architecture that addresses both batch and low-latency (speed) scenarios for big data and MapReduce workloads.

–> The system consists of three layers: Batch Layer, Speed Layer & Serving Layer

1. All data is pushed into both the Batch layer and Speed layer.

2. The Batch layer has a master dataset (immutable, append-only set of raw data) and pre-computes the batch views.

3. The Serving layer has Batch views for fast queries.

4. The Speed layer compensates for the batch layer's processing delay (the lag before new data reaches the serving layer) and deals with recent data only.

5. All queries can be answered by merging results from Batch views and Real-time views, or by querying them individually.
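
The five steps above can be sketched in a few lines of plain Python. This is a minimal in-memory illustration under stated assumptions, not a production design: the class name `LambdaStore`, its methods, and the page-view counting example are all hypothetical, and a real system would use a distributed store and processing engine for each layer.

```python
from collections import Counter


class LambdaStore:
    """Toy lambda architecture that counts events per key (hypothetical example)."""

    def __init__(self):
        self.master = []                # batch layer: immutable, append-only raw data
        self.batch_view = Counter()     # serving layer: precomputed batch view
        self.realtime_view = Counter()  # speed layer: covers recent data only

    def ingest(self, key):
        # Step 1: every event is pushed to both the batch layer and the speed layer.
        self.master.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        # Steps 2-3: recompute the batch view from the full master dataset.
        self.batch_view = Counter(self.master)
        # Step 4: the speed layer only needs data the batch view has not absorbed,
        # so its view is reset once the batch run has caught up.
        self.realtime_view = Counter()

    def query(self, key):
        # Step 5: answer by merging the batch view with the real-time view.
        return self.batch_view[key] + self.realtime_view[key]


store = LambdaStore()
for page in ["home", "about", "home"]:
    store.ingest(page)
store.run_batch()          # batch view now holds home=2, about=1
store.ingest("home")       # arrives after the batch run, seen only by the speed layer
print(store.query("home"))  # 3
```

The point of the split is that `run_batch` may be slow and infrequent, while `query` still reflects the latest event because the real-time view fills the gap.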
 

Lambda Architecture with Azure:

Azure offers a combination of the following technologies to accelerate real-time big data analytics:

1. Azure Cosmos DB, a globally distributed and multi-model database service.

2. Apache Spark for Azure HDInsight, a processing framework that runs large-scale data analytics applications.

3. Azure Cosmos DB change feed, which streams new data to the batch layer for HDInsight to process.

4. The Spark to Azure Cosmos DB Connector, which lets Spark read from and write to Azure Cosmos DB.

How Azure simplifies the Lambda Architecture:

1. All data is pushed into Azure Cosmos DB for processing.

2. The Batch layer has a master dataset (immutable, append-only set of raw data) stored in Azure Cosmos DB. Using HDI Spark, you can pre-compute your aggregations to be stored in your computed Batch Views.

3. The Serving layer is an Azure Cosmos DB database with collections for the master dataset and computed Batch View for fast queries.

4. The Speed layer compensates for the batch layer's processing delay (the lag before new data reaches the serving layer) and deals with recent data only. It uses HDI Spark to read the Azure Cosmos DB change feed, which lets you persist your data and query and process it concurrently.

5. All queries can be answered by merging results from batch views and real-time views, or by querying them individually.
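
The role of the change feed in steps 2 to 4 can be illustrated with a small stand-in. The sketch below does not use the Azure Cosmos DB SDK or the Spark connector: `ToyCollection`, `read_changes`, the integer checkpoint, and the `tag` field are hypothetical names that only mimic the idea, namely that one store persists every write while a consumer reads just the writes that arrived since its last checkpoint.

```python
class ToyCollection:
    """In-memory stand-in for a collection with a change feed (hypothetical)."""

    def __init__(self):
        self._log = []  # append-only record of every write, in arrival order

    def upsert(self, doc):
        # A single write both persists the document and extends the change feed.
        self._log.append(dict(doc))

    def read_all(self):
        # Batch layer: full scan of the master dataset.
        return list(self._log)

    def read_changes(self, continuation=0):
        # Speed layer: only documents written after the given checkpoint.
        # Returns the new documents and the checkpoint to resume from.
        return self._log[continuation:], len(self._log)


def count_by_tag(docs):
    """Aggregate documents into a {tag: count} view."""
    counts = {}
    for doc in docs:
        counts[doc["tag"]] = counts.get(doc["tag"], 0) + 1
    return counts


tweets = ToyCollection()
tweets.upsert({"id": 1, "tag": "#azure"})
tweets.upsert({"id": 2, "tag": "#spark"})

# Batch job: precompute the batch view and remember where it stopped.
batch_view = count_by_tag(tweets.read_all())
_, checkpoint = tweets.read_changes()

# New data arrives after the batch run.
tweets.upsert({"id": 3, "tag": "#azure"})

# Speed layer: process only what the batch view has not seen.
recent, checkpoint = tweets.read_changes(checkpoint)
realtime_view = count_by_tag(recent)

# Serving: merge both views to answer the query.
merged = {tag: batch_view.get(tag, 0) + realtime_view.get(tag, 0)
          for tag in set(batch_view) | set(realtime_view)}
print(merged["#azure"])  # 2
```

In the Azure setup described above, the collection and its change feed would be provided by Azure Cosmos DB and the two aggregation passes would run as Spark jobs on HDInsight; the checkpoint plays the part of the position the change feed reader resumes from.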
 

–> For complete details, see the Microsoft Docs article: Azure Cosmos DB: Implement a lambda architecture on the Azure platform


  1. Binoy
    February 16, 2018 at 4:15 pm

    Really a new but tough topic. U-SQL also uses LAMBDA architecture to execute the query. It will be more helpful, if you can add a recording explaining the arch with some example.

    • February 16, 2018 at 4:44 pm

      Thanks for your comments @Binoy, I’ve just started exploring the above stuff. Please feel free to add from your side to enrich this post/blog.
