Troubleshooting

How to delete Azure blob snapshots using PowerShell

Recently, we came across a scenario where an HDInsight cluster was connected to Azure blob storage and Spark jobs were failing while deleting parquet files for which snapshots already existed.

Azure blob snapshots are read-only, point-in-time copies of blob files.
We could not track down how these snapshots were being created, but the immediate task at hand was to delete them so that Spark could continue processing the regular parquet files.

Identifying and deleting the snapshots manually would have taken ages, as it's a very cumbersome process when done through Azure Storage Explorer.
I did some research on automating this effort, since client libraries for Azure blob storage are already available for Java, PowerShell, and Python.
As I am proficient with PowerShell, and Azure PowerShell is native to the Microsoft cloud stack, I decided to use PowerShell to automate this effort.

If you are new to this, I would recommend reading this post first.

Pre-Requisites
1. PowerShell 5.0 or above
2. Azure PowerShell module
3. Azure Storage account name and key for access through PowerShell
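If the Azure PowerShell module is not installed yet, here is a minimal sketch of getting it from the PowerShell Gallery; this assumes the AzureRM-era Azure.Storage module, which ships the New-AzureStorageContext cmdlet used below:

# Install the storage module from the PowerShell Gallery
# (assumption: this post predates the newer Az module).
Install-Module -Name Azure.Storage -Scope CurrentUser

# Verify the cmdlet used in this post is available.
Get-Command New-AzureStorageContext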

We start by creating a context; this essentially establishes a connection with the Azure Storage account.

# Storage account credentials and the container/blob prefix to work on.
$StorageAccountName = "dummyaccount"
$StorageAccountKey = "xxxx"
$ContainerName = "containername"
$BlobName = "fact"

# Create the storage context (the connection object used by the cmdlets below).
$Ctx = New-AzureStorageContext -StorageAccountName $StorageAccountName -StorageAccountKey $StorageAccountKey

The next cmdlet, Get-AzureStorageContainer, gets a connection object to the container. A storage account can hold many containers; we connect to a single container using the command below.

$blobObject = Get-AzureStorageContainer -Context $Ctx -Name $ContainerName

Now that we are connected to the container, we can query it for all blob files as shown below. The ListBlobs function takes a blob name (this can be a directory, a path to a directory, or the prefix of the blob name), a boolean that selects a flat listing instead of a hierarchical one (with $true, all blobs under the prefix are returned in a single flat list, including those inside virtual sub-directories), and a listing-details value ("Snapshots") that tells it to include snapshots in the results.

For more information, refer to this page.

# List blobs under the prefix: $true selects a flat listing, "Snapshots" includes snapshots.
$ListOfBlobs = $blobObject.CloudBlobContainer.ListBlobs($BlobName, $true, "Snapshots")

To delete the snapshots found by the above command, we loop over the result set and call the Delete method on each blob that is actually a snapshot.

foreach ($CloudBlockBlob in $ListOfBlobs)
{
  # Only delete entries that are snapshots, never the base blob itself.
  if ($CloudBlockBlob.IsSnapshot) {
    Write-Host "Deleting $($CloudBlockBlob.Name), Snapshot was created on $($CloudBlockBlob.SnapshotTime)"
    $CloudBlockBlob.Delete()
  }
}

This is how you delete all the snapshots under a specific blob folder, or for a single blob file directly.
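If you would rather see what the loop is going to remove before deleting anything, a quick dry-run pass over the same listing works; this sketch uses only the properties already shown above:

# Preview pass: count and print the snapshots without deleting them.
$Snapshots = $ListOfBlobs | Where-Object { $_.IsSnapshot }
Write-Host "Found $($Snapshots.Count) snapshot(s) under '$BlobName'"
foreach ($Snap in $Snapshots) {
  Write-Host "$($Snap.Name) (created $($Snap.SnapshotTime))"
}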


How to resolve the GPG key issue for Azure CLI?

Today's blog is part of the troubleshooting series.

We use the Azure CLI to transfer files from our Linux VMs to an Azure file share storage account. This is a very typical use case: moving files from VMs into Azure Storage, and in our case into an Azure file share.

More about the Azure CLI can be found here.
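For context, a typical upload from a VM to a file share looks something like the sketch below; the account, key, share, and file names here are placeholders:

# Upload a local file to an Azure file share (all names are placeholders).
az storage file upload \
    --account-name dummyaccount \
    --account-key "xxxx" \
    --share-name myshare \
    --source /tmp/data.csv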

Recently we found our apps failing with the error below.

While initializing the apps, our Dockerfiles ran the Azure CLI installation steps listed below.

Downloading packages:
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
warning: /var/cache/yum/x86_64/7/azure-cli/packages/azure-cli-2.0.66-1.el7.x86_64.rpm: Header V4 RSA/SHA256 Signature, key ID xxxxxxxx: NOKEY
Public key for azure-cli-2.0.66-1.el7.x86_64.rpm is not installed
--------------------------------------------------------------------------------
Total                                               33 MB/s |  38 MB  00:01     
Retrieving key from https://packages.microsoft.com/keys/microsoft.asc

The GPG keys listed for the "Azure CLI" repository are already installed but they are not correct for this package.
Check that the correct key URLs are configured for this repository.


 Failing package is: azure-cli-2.0.66-1.el7.x86_64
 GPG Keys are configured as: https://packages.microsoft.com/keys/microsoft.asc

Reason:

It seems that, while installing the Azure CLI, we had followed the steps below.

Step 1: Download the GPG key from Microsoft, as this rpm is served from the Microsoft package repository.

sudo rpm --import https://packages.microsoft.com/keys/microsoft.asc

Step 2: Create the local azure-cli repository definition.

sudo sh -c 'echo -e "[azure-cli]\nname=Azure CLI\nbaseurl=https://packages.microsoft.com/yumrepos/azure-cli\nenabled=1\ngpgcheck=1\ngpgkey=https://packages.microsoft.com/keys/microsoft.asc" > /etc/yum.repos.d/azure-cli.repo'
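For readability, the repo file that this one-liner writes to /etc/yum.repos.d/azure-cli.repo looks like this:

[azure-cli]
name=Azure CLI
baseurl=https://packages.microsoft.com/yumrepos/azure-cli
enabled=1
gpgcheck=1
gpgkey=https://packages.microsoft.com/keys/microsoft.asc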

Step 3: Install with the yum install command.

sudo yum install azure-cli

These steps ran daily in an automated pipeline, but the Step 3 command suddenly started failing yesterday.
When we dug further, we found that the GPG key downloaded from Microsoft had changed and no longer matched the key that signed the azure-cli rpm downloaded in Step 3.
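To see which keys rpm already trusts inside the image, a standard diagnostic query helps (the key IDs reported will differ per system):

# List the GPG public keys currently imported into the rpm database.
rpm -q gpg-pubkey --qf '%{NAME}-%{VERSION}-%{RELEASE}\t%{SUMMARY}\n'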

Resolution

Our architect helped us find a resolution to this issue: disable the GPG check in Step 2.
With this resolution, everything in the Azure CLI installation stays the same except Step 2, where gpgcheck is set to 0.

Step 2: Create the local azure-cli repository definition, with the GPG check disabled.

sudo sh -c 'echo -e "[azure-cli]\nname=Azure CLI\nbaseurl=https://packages.microsoft.com/yumrepos/azure-cli\nenabled=1\ngpgcheck=0\ngpgkey=https://packages.microsoft.com/keys/microsoft.asc" > /etc/yum.repos.d/azure-cli.repo' 
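A quick sanity check that the repo definition was written with the check disabled:

# Confirm gpgcheck is now 0 in the repo file.
grep gpgcheck /etc/yum.repos.d/azure-cli.repo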

After this, when you run Step 3, the installation completes successfully.

This may not be the most elegant solution, since disabling gpgcheck means yum no longer verifies package signatures, but it got us past the problem.
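If you would rather keep signature verification enabled, an alternative worth trying is to refresh the imported key instead. This is a sketch; the gpg-pubkey package name to remove depends on the key ID reported by the diagnostic query shown earlier:

# Remove the stale Microsoft key (substitute the actual gpg-pubkey-<keyid> name),
# re-import the current key, then retry the install with gpgcheck=1.
sudo rpm -e gpg-pubkey-<keyid>
sudo rpm --import https://packages.microsoft.com/keys/microsoft.asc
sudo yum install azure-cli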


Troubleshooting series

This series is going to help you troubleshoot issues related to the Big Data stack.

It will include a good number of troubleshooting instances I have faced during daily implementation cycles with technologies like Spark, Azure, Kubernetes, Docker, PowerShell, microservices, Python, Vertica, and much more.

The blog posts will be named tbit{#}, which denotes "troubleshooting bit" plus the sequence number associated with it.

Stay tuned to this series for meaningful insights and faster resolution of your issues.