
Archive for the ‘Microsoft Azure’ Category

Cosmos DB – Retrieve all attributes/fields from the JSON document of all Collections (using .net)

April 6, 2021

 

In our [previous post] we saw how to get the COUNT of items/records from all Collections in all Databases of a Cosmos DB instance by programmatically iterating over all these objects.

 

Here in this post we will extend it and see how to retrieve all attributes/fields from the JSON documents/items present in these Collections. (This is some work I’m doing to maintain a data catalogue and data classification for privacy and compliance purposes.)

 

On the Azure portal go to your Azure Cosmos DB instance, and from the Overview tab go to the Keys tab and copy the “URI” & “PRIMARY READ-ONLY KEY” values into the code below.

We have discussed all the methods used below in our [previous post], so please check it first. Because a Cosmos DB SQL container is schema-agnostic and the items in a container can have arbitrary schemas, we will read the first JSON document/item from each Collection by using the FirstOrDefault() method, deserialize this JSON document to a Dictionary, and finally read all the attributes from the Dictionary’s Keys:

 

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using Newtonsoft.Json;

namespace CosmosDBgetJSONAttributes
{
    class Program
    {
        static void Main(string[] args)
        {
            // Endpoint URI and read-only key copied from the Keys blade of the portal
            string EndPointUri = "https://YourCosmosDBname.documents.azure.com:443/";
            string PK = "YourPrimaryKey==";

            DocumentClient cosmosClient = new DocumentClient(new Uri(EndPointUri), PK);

            // List all databases in this Cosmos DB account
            string sqlQuery = "SELECT * FROM d";
            var dbs = cosmosClient.CreateDatabaseQuery(sqlQuery).ToList();

            foreach (var db in dbs)
            {
                Console.WriteLine("Database: " + db.id);

                // List all collections (containers) in this database via its self-link
                List<DocumentCollection> collections = cosmosClient.CreateDocumentCollectionQuery((String)db._self).ToList();

                foreach (var collection in collections)
                {
                    Console.WriteLine("  - Collection: " + collection.Id);

                    // The container is schema-agnostic, so read one document and treat
                    // its attributes as a representative "schema" for the collection
                    var docu = cosmosClient.CreateDocumentQuery(collection.SelfLink, "SELECT TOP 1 * FROM c",
                                    new FeedOptions() { EnableCrossPartitionQuery = true, MaxItemCount = 1 })
                                    .AsEnumerable().FirstOrDefault();
                    if (docu != null)
                    {
                        var docuStr = Convert.ToString(docu);
                        // Deserialize to a Dictionary keyed by attribute name; object values
                        // keep nested objects/arrays from breaking deserialization
                        var data = JsonConvert.DeserializeObject<Dictionary<string, object>>(docuStr);
                        foreach (var item in data.Keys)
                        {
                            Console.WriteLine("    - Attribute: " + item);
                        }
                    }
                    else
                    {
                        Console.WriteLine("    - Attribute: no data");
                    }
                }
                Console.WriteLine(Environment.NewLine);
            }
            Console.WriteLine("Finished reading all Collections attributes");
            Console.Read();
        }
    }
}
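One caveat: a flat Dictionary only surfaces top-level attribute names, and a nested object shows up as a single attribute. If the data catalogue also needs the nested paths, a small recursive walk over the parsed document can list them. Below is a hypothetical sketch, not part of the original code (the AttributeWalker class name and the dotted-path output format are my own choices), using Newtonsoft’s JObject:

using System;
using Newtonsoft.Json.Linq;

static class AttributeWalker
{
    // Print dotted attribute paths, descending into nested objects and arrays
    public static void PrintAttributes(JToken token, string prefix = "")
    {
        if (token is JObject obj)
        {
            foreach (var prop in obj.Properties())
                PrintAttributes(prop.Value,
                    prefix.Length == 0 ? prop.Name : prefix + "." + prop.Name);
        }
        else if (token is JArray arr && arr.Count > 0)
        {
            // Use the first element as a representative of the array's schema
            PrintAttributes(arr[0], prefix + "[0]");
        }
        else
        {
            Console.WriteLine("    - Attribute: " + prefix);
        }
    }
}

You would call AttributeWalker.PrintAttributes(JObject.Parse(docuStr)) in place of the Dictionary loop above.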

Cosmos DB – Get record COUNT from all Collections in all Databases (using .net)

April 1, 2021

 

Here in this post we will use C# .net code (for beginners like me) to see how to:
1. Connect to a Cosmos DB instance
2. Get list of all Databases in a Cosmos DB
3. Iterate through all the Databases and get the list of all Collections (or Tables)
4. Get COUNT of all documents/items (or records) in these Collections

 

On the Azure portal go to your Azure Cosmos DB instance, and from the Overview tab go to the Keys tab and copy the “URI” & “PRIMARY READ-ONLY KEY” values:

1. First we will use the URI & Primary Key to connect to the Cosmos DB service by creating a DocumentClient() instance.

2. Then we will form a simple query to retrieve all Databases, and use it with the CreateDatabaseQuery() method to get the list of all databases.

3. Now to get the list of all Collections present in each Database, we will iterate through the Databases and call the CreateDocumentCollectionQuery() method with each Database’s self-link property.

4. Finally, to get the document COUNT from each Collection, we will run a COUNT query against it by using the CreateDocumentQuery() method.

 

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

namespace CosmosTables
{
    class ReadCosmosDB
    {
        static void Main(string[] args)
        {
            // Endpoint URI and read-only key copied from the Keys blade of the portal
            string EndPointUri = "https://YourCosmosDBname.documents.azure.com:443/";
            string PK = "YourPrimaryKey==";

            DocumentClient cosmosClient = new DocumentClient(new Uri(EndPointUri), PK);

            // List all databases in this Cosmos DB account
            string sqlQuery = "SELECT * FROM d";
            var dbs = cosmosClient.CreateDatabaseQuery(sqlQuery).ToList();

            foreach (var db in dbs)
            {
                Console.WriteLine("Database: " + db.id);
                Console.WriteLine("- Collections: "); 

                // List all collections (containers) in this database via its self-link
                List<DocumentCollection> collections = cosmosClient.CreateDocumentCollectionQuery((String)db._self).ToList();

                foreach (var collection in collections)
                {

                    // VALUE COUNT(1) returns a single scalar; EnableCrossPartitionQuery
                    // lets the SDK aggregate the count across all partitions
                    var count = cosmosClient.CreateDocumentQuery<int>(collection.SelfLink, "SELECT VALUE COUNT(1) FROM c",
                                    new FeedOptions() { EnableCrossPartitionQuery = true, MaxItemCount = 1 }).AsEnumerable().First();

                    Console.WriteLine("  - " + collection.Id + ", Count: " + count);
                }
                Console.WriteLine(Environment.NewLine);
            }
            Console.WriteLine("Finished reading all DB Collections");
        }
    }
}

Access VSO Work Items in Postman, by using Azure DevOps REST APIs, using PAT (Personal Access Token) authentication

March 4, 2021

 
Here in this post we will try to access VSO Work Items from the Postman tool by using the Azure DevOps REST APIs. To authenticate into Azure DevOps we will first create a PAT (Personal Access Token) and then use it in the Postman tool.
 

Create Token in Azure DevOps portal:

 

1. Go to https://dev.azure.com/{YourOrg}/. On the top-right corner of the browser click on the “User Settings” icon, and select the “Personal access tokens” option from the drop-down.


 

2. Now on the settings page click on the + icon to create a new Token, provide a Name and Expiry date for the Token, and scroll down.


 

3. At the bottom of the page you will see the “Work Items” section; just select the “Read & write” check box, and click Create.


 

4. Now your Token is created; copy it by clicking on the Copy button and save it somewhere secure, because for security reasons you won’t be able to see it again on the portal.


 

Using Postman to access VSO Work Items:

 

5. Download & Install Postman from here [link]
 

6. Open the Postman tool, create a new Collection and add a new Request of type GET.

Populate the following URL: https://dev.azure.com/{{YourOrg}}/{{YourProject}}/_apis/wit/workitems/{{workitemId}}?api-version=6.0

Auth Tab:
  — Type = “Basic Auth”
    — UserName: {{PAT name}} (Azure DevOps ignores the user name when a PAT is supplied, so any value works)
    — Password: {{Copied PAT secret from Azure DevOps portal}}


 

7. Now clicking on the Send button will return the Response with all the VSO Work Item attributes and details.
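If you later need the same call outside Postman, the request can also be issued from C# with HttpClient. This is a minimal sketch under the same placeholders as above (the org, project, work item id and PAT values are all yours to fill in); it encodes the PAT into a Basic authorization header just as Postman does:

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

class GetWorkItem
{
    static async Task Main()
    {
        // Hypothetical placeholders: fill in your own org, project, id and PAT
        string org = "YourOrg", project = "YourProject", workItemId = "1";
        string pat = "YourPATSecretFromAzureDevOpsPortal";

        using (var client = new HttpClient())
        {
            // Basic auth: empty user name, the PAT acts as the password
            string token = Convert.ToBase64String(Encoding.ASCII.GetBytes(":" + pat));
            client.DefaultRequestHeaders.Authorization =
                new AuthenticationHeaderValue("Basic", token);

            string url = $"https://dev.azure.com/{org}/{project}/_apis/wit/workitems/{workItemId}?api-version=6.0";
            string json = await client.GetStringAsync(url);
            Console.WriteLine(json);   // the same JSON body Postman shows
        }
    }
}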


REST API, PATCH request error, Postman – Valid content types for this method are: application/json-patch+json

March 2, 2021

 
I was trying to use the Azure DevOps REST APIs to get some details of VSO Work Items by using the Postman tool. The GET request was not a problem and returned the expected JSON response body with all attributes and details.

But when I tried to update the same VSO Work Item by using a PATCH request, it resulted in the below error:

{
  "$id": "1",
  "innerException": null,
  "message": "The request indicated a Content-Type of \"application/json\" for method type \"PATCH\" which is not supported. Valid content types for this method are: application/json-patch+json.",
  "typeName": "Microsoft.VisualStudio.Services.WebApi.VssRequestContentTypeNotSupportedException, Microsoft.VisualStudio.Services.WebApi",
  "typeKey": "VssRequestContentTypeNotSupportedException",
  "errorCode": 0,
  "eventId": 3000
}

 

Solution:

As the above error mentions, “application/json-patch+json” has to be used as the Content-Type, so the same has to be added under the Headers tab as a Key-Value pair, like:

KEY: Content-Type
VALUE: application/json-patch+json
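The same fix applies when calling the API from code instead of Postman. Here is a minimal C# HttpClient sketch (the work item id, field path and value below are hypothetical placeholders); the essential part is passing application/json-patch+json as the content type of the request body:

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

class UpdateWorkItem
{
    static async Task Main()
    {
        // Hypothetical placeholders: fill in your own org, project, id and PAT
        string org = "YourOrg", project = "YourProject", workItemId = "1";
        string pat = "YourPATSecretFromAzureDevOpsPortal";

        using (var client = new HttpClient())
        {
            string token = Convert.ToBase64String(Encoding.ASCII.GetBytes(":" + pat));
            client.DefaultRequestHeaders.Authorization =
                new AuthenticationHeaderValue("Basic", token);

            // JSON Patch document: here it just sets the work item title
            string body = "[{\"op\": \"add\", \"path\": \"/fields/System.Title\", \"value\": \"Updated title\"}]";

            var request = new HttpRequestMessage(
                new HttpMethod("PATCH"),
                $"https://dev.azure.com/{org}/{project}/_apis/wit/workitems/{workItemId}?api-version=6.0")
            {
                // The fix: the content type must be application/json-patch+json
                Content = new StringContent(body, Encoding.UTF8, "application/json-patch+json")
            };

            var response = await client.SendAsync(request);
            Console.WriteLine(response.StatusCode);
        }
    }
}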


 

If you observe a similar error while working with PowerShell, you have to pass the same value with the ContentType parameter in the Invoke-RestMethod command:

Invoke-RestMethod -Method Patch -Uri $theUri -Headers $theHeader -ContentType "application/json-patch+json" -Body $theBody


Spark error – Parquet does not support decimal. See HIVE-6384

August 5, 2020

 
I was creating a Hive table in a Databricks Notebook from a Parquet file located in Azure Data Lake Store by using the following command:

val df = spark.read.parquet(
 "abfss://adlsstore@MyStorageAccount.dfs.core.windows.net/x/y/z/*.parquet")

df.write.mode("overwrite").saveAsTable("tblOrderDetail")

But I was getting the following error:

warning: there was one feature warning; re-run with -feature for details
java.lang.UnsupportedOperationException: Parquet does not support decimal. See HIVE-6384

 
As per the above error it relates to a Hive version conflict, so I tried checking the Hive metastore version by running the below command, and found that it was pointing to an old version (0.13.0). This version of the Hive metastore does not support datatypes like DECIMAL for Parquet-formatted files.

spark.conf.get("spark.sql.hive.metastore.version")


 

Also, as per the Jira task HIVE-6384, support for these additional datatypes was implemented for the Parquet SerDe in Hive version 1.2.0.

 
So to update the Hive metastore to a newer version you just need to add the below settings in the configuration of the cluster you are using.

Click on “Clusters” -> click “Edit” at the top -> expand “Advanced Options” -> under the “Spark” tab, in the “Spark Config” box, add the below two settings:

spark.sql.hive.metastore.version 1.2.1
spark.sql.hive.metastore.jars builtin

Then just restart the cluster so that the new settings are picked up.
 


 

Some similar errors:
– Parquet does not support date
– Parquet does not support timestamp