Connect to Cosmos DB from Databricks and read data by using the Apache Spark to Azure Cosmos DB connector

April 7, 2021

 

In this post we will use the Databricks compute environment to connect to Cosmos DB and read data with the Apache Spark to Azure Cosmos DB connector.

 

First, go to your Azure Databricks cluster and import the Azure Cosmos DB connector library. Download the library JAR from either [Maven links] or the [Uber JAR] to your local drive, then install it on the cluster as a new library.

[Image: Databricks CosmosDB Library]

 

Now open a new notebook with Scala as the language and use the code provided below.

To get the Cosmos DB instance URI and key, go to the Azure portal -> Cosmos DB instance, switch from the Overview tab to the Keys tab, and copy the “URI” and “PRIMARY READ-ONLY KEY” values into the code below as the "Endpoint" and "Masterkey" settings.

import org.joda.time._  
import org.joda.time.format._  

import com.microsoft.azure.cosmosdb.spark.schema._  
import com.microsoft.azure.cosmosdb.spark.CosmosDBSpark  
import com.microsoft.azure.cosmosdb.spark.config.Config  

import org.apache.spark.sql.functions._

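// Connection settings for the Cosmos DB reader; replace the placeholder values with your own.
val readerConfig = Config(Map(
  "Endpoint" -> "https://YourCosmosDBname.documents.azure.com:443/",
  "Masterkey" -> "YourPrimaryKey==",
  "Database" -> "DatabaseName",
  "Collection" -> "CollectionName",
  "query_custom" -> "select * from c" // optional
))

// Read the collection into a DataFrame and show it in the notebook.
val df = spark.sqlContext.read.cosmosDB(readerConfig)
display(df)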
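
If you read from more than one collection, it may help to build the settings map in one place instead of repeating the literal map. The following is only a minimal sketch in plain Scala: the `CosmosReaderSettings` object and its `endpointFor` and `build` methods are hypothetical names made up for this example and are not part of the connector. It assumes the endpoint follows the same URL pattern as in the code above and only uses the setting names already shown in this post.

```scala
// Hypothetical helper (not part of the connector): assembles the settings
// map used for the reader configuration shown in this post.
object CosmosReaderSettings {
  // Setting names that the reader configuration in this post always fills in.
  val requiredKeys: Seq[String] = Seq("Endpoint", "Masterkey", "Database", "Collection")

  // Assumption: the endpoint follows the URL pattern used in this post.
  def endpointFor(accountName: String): String =
    s"https://$accountName.documents.azure.com:443/"

  def build(
      accountName: String,
      key: String,
      database: String,
      collection: String,
      customQuery: Option[String] = None
  ): Map[String, String] = {
    val base = Map(
      "Endpoint"   -> endpointFor(accountName),
      "Masterkey"  -> key,
      "Database"   -> database,
      "Collection" -> collection
    )
    // Add the optional custom query only when one is supplied.
    val settings = base ++ customQuery.map(q => "query_custom" -> q).toList

    // Fail early if a required value was left blank.
    val blank = requiredKeys.filter(k => settings(k).trim.isEmpty)
    val blankNames = blank.mkString(", ")
    require(blank.isEmpty, s"Missing values for: $blankNames")
    settings
  }
}
```

In the notebook, the resulting map could then be passed to Config(...) in the same way as the literal map above, for example Config(CosmosReaderSettings.build("YourCosmosDBname", "YourPrimaryKey==", "DatabaseName", "CollectionName")).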