Troubleshooting

How to Delete Azure Blob Snapshots Using PowerShell

Recently, we came across a scenario in which an HDInsight cluster was connected to Azure Blob storage and Spark jobs were failing when they tried to delete Parquet files for which a snapshot already existed.

Azure snapshots are point-in-time, read-only copies of a blob. A blob that has snapshots cannot be deleted unless its snapshots are deleted as well, which is why the Spark jobs failed.
We were unable to track down how these snapshots had been created, but the important task at hand was to delete them so that Spark could continue processing the regular Parquet files.

Identifying and deleting the snapshots manually would have taken ages, as it is a very cumbersome process when done through Azure Storage Explorer.
I did some research on automating this effort, as client libraries for Azure Blob storage are already available for Java, PowerShell, and Python.
As I am proficient with PowerShell, and Azure PowerShell is native to the Microsoft cloud stack, I decided to use PowerShell to automate this effort.

If you are new to this, I would recommend reading this post first.

Prerequisites
1. PowerShell 5.0 or above
2. Azure PowerShell module
3. Azure Storage account name and key for access through PowerShell

We start by creating a context, which in effect establishes a connection to the Azure Storage account.

$StorageAccountName = "dummyaccount"
$StorageAccountKey = "xxxx"
$ContainerName = "containername"
$BlobName = "fact" 
$Ctx = New-AzureStorageContext -StorageAccountName $StorageAccountName -StorageAccountKey $StorageAccountKey

The next cmdlet, Get-AzureStorageContainer, gets a connection object for the container. A storage account can have many containers; we can connect to a single one using the command below.

$blobObject = Get-AzureStorageContainer -Context $Ctx -Name $ContainerName

Now that we are connected to the container, we can query it for blobs, as below. The ListBlobs function takes a blob name prefix (a directory, a path to a directory, or the beginning of a blob name), a Boolean that selects a flat listing instead of a hierarchical one, and a listing-details value; "Snapshots" asks for snapshots to be included in the results. With a flat listing, every blob under the given prefix is returned, including those in nested virtual directories, and no separate directory entries are returned. Snapshots can only be listed in flat mode, so the Boolean must be $true here.

For more information, refer to this page

$ListOfBlobs = $blobObject.CloudBlobContainer.ListBlobs($BlobName, $true, "Snapshots")
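
Before deleting anything, it can help to preview which entries in the listing are actually snapshots. The following is only a minimal sketch: Get-SnapshotSummary is a hypothetical helper name (it is not part of the Azure PowerShell module), and it assumes the objects passed in expose the Name, IsSnapshot and SnapshotTime properties that the listed blobs have.

```powershell
# Hypothetical helper (not part of Azure PowerShell): summarizes the snapshot
# entries in a blob listing without deleting anything.
function Get-SnapshotSummary {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]]$Blobs
    )

    # Keep only entries flagged as snapshots and show when each was taken.
    $Blobs |
        Where-Object { $_.IsSnapshot } |
        Select-Object Name, SnapshotTime
}

# Usage against a real listing (assumes $ListOfBlobs from the ListBlobs call above):
# Get-SnapshotSummary -Blobs @($ListOfBlobs) | Format-Table
```

Wrapping the listing in @(...) forces the lazily enumerated result into an array, so it can be inspected more than once.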

To delete the snapshots found by the above command, we loop over the result set and call the Delete method on each blob that is actually a snapshot.

foreach ($CloudBlockBlob in $ListOfBlobs)
{
  if ($CloudBlockBlob.IsSnapshot) {
    Write-Host "Deleting $($CloudBlockBlob.Name), Snapshot was created on $($CloudBlockBlob.SnapshotTime)"
    $CloudBlockBlob.Delete() 
  }
}
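
If the loop above is going to be run against production data, a dry-run switch and per-blob error handling may be worth having. The sketch below is one possible way to wrap it, not the only one: Remove-BlobSnapshot is a hypothetical function name, and it assumes each object has the same IsSnapshot, Name and SnapshotTime properties and parameterless Delete() call used above.

```powershell
# Hypothetical wrapper around the loop above (not part of Azure PowerShell):
# adds -WhatIf support and keeps going if a single delete fails.
function Remove-BlobSnapshot {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]]$Blobs
    )

    $deleted = 0
    foreach ($blob in $Blobs) {
        # Leave base blobs untouched; only snapshots are candidates.
        if (-not $blob.IsSnapshot) { continue }

        $target = "$($blob.Name) (snapshot from $($blob.SnapshotTime))"
        if ($PSCmdlet.ShouldProcess($target, 'Delete snapshot')) {
            try {
                [void]$blob.Delete()
                $deleted++
            }
            catch {
                Write-Warning "Could not delete snapshot of $($blob.Name): $_"
            }
        }
    }

    # Emit the number of snapshots that were actually deleted.
    $deleted
}

# Usage (assumes $ListOfBlobs from the ListBlobs call above):
# Remove-BlobSnapshot -Blobs @($ListOfBlobs) -WhatIf   # preview only
# Remove-BlobSnapshot -Blobs @($ListOfBlobs)           # delete for real
```

Running with -WhatIf first prints what would be deleted without calling Delete() at all.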

This is how you delete all the snapshots under a specific blob folder, or those of a single blob file.
