Marcel Cremer | Tech, Digitalisation and Entrepreneurship
February 20, 2023

How to batch download Azure Blobs or AWS S3 Files


If you work in the cloud, you have probably come across the requirement to batch-download files from cloud storage at some point. Whether you want to provide last year's reports, files uploaded by users, or something else, providing an archive of data is a not-so-uncommon use case. However, when I researched whether this is possible with Microsoft Azure, I only found examples like here (MSDN, external) where you download the files either one by one or in parallel, but not as a batch.

This article first explains my journey to the current solution. For a TL;DR, click here.

The “easy” server-side solution

So when I came across the requirement, I implemented this "best practice" approach and failed miserably. The flow that was executed was similar to the following:

Client requests archive via RESTful-API

For a few small files, this approach will even work fine. However, as soon as the amount of data grows, you run into some severe problems:

…and probably some others that I have forgotten right now. Not a very viable solution, and one with plenty of potential to fail, as I know from experience by now.

The “more-complex” server-side solution

As real-time delivery of the data is not possible, my next solution was to offload the task, which also inspired my earlier article about KEDA. The flow now differed a bit for the client, but could run completely in the background:

Client requests archive via RESTful API
Server answers that the request will be processed in the background

The job then basically does the same as the server before, but does not send the result to the client. Instead, it stores the archive in Azure Blob Storage again and afterwards generates a link, which the client receives.
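The upload-and-link step of such a background job could look roughly like the following sketch, using the `@azure/storage-blob` SDK. The account name, key, container name, and blob name are placeholders, and `storeZipAndGetLink` is my own name for the helper, not code from the original project:

```typescript
import {
  BlobServiceClient,
  BlobSASPermissions,
  StorageSharedKeyCredential,
  generateBlobSASQueryParameters,
} from "@azure/storage-blob";

// Sketch: after the job has zipped the requested files into `zipBuffer`,
// upload the archive and return a time-limited, read-only download link.
async function storeZipAndGetLink(zipBuffer: Buffer): Promise<string> {
  const credential = new StorageSharedKeyCredential("myaccount", "<account-key>");
  const service = new BlobServiceClient(
    "https://myaccount.blob.core.windows.net",
    credential
  );
  const container = service.getContainerClient("archives");
  const blob = container.getBlockBlobClient(`archive-${Date.now()}.zip`);
  await blob.uploadData(zipBuffer);

  // Generate a read-only SAS token valid for 24 hours, so the client can
  // download the archive without further authentication.
  const sas = generateBlobSASQueryParameters(
    {
      containerName: "archives",
      blobName: blob.name,
      permissions: BlobSASPermissions.parse("r"),
      expiresOn: new Date(Date.now() + 24 * 60 * 60 * 1000),
    },
    credential
  ).toString();
  return `${blob.url}?${sas}`;
}
```

The SAS link is what would finally be sent to the client, e.g. via a polling endpoint or a notification.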

The background job now creates a few other issues:

So even now that we’ve removed the workload from our core services, we are still struggling with plenty of problems that we don’t want to have, and the process additionally becomes more complex. If only we could send the files directly from Azure or AWS to the user. But that’s not possible… or is it?

A client-based solution for downloading the zip archive

Let’s go back to the original requirement: We want to

Is that even possible without an Azure or AWS batch download? Yes, it is!

I created a proof of concept in this repository here. The demo is written with Vue, but you really don’t need to understand Vue to get what’s going on there:

Whenever you click the Client Batch Download Example button, it triggers the downloadAndStoreAsZip function, which tries to download 1,000 files from a robot-image generator called robohash.org (which is pretty cool, by the way). By default, the downloadInParallel function creates fetch promises for 10 images at a time and downloads them as blobs into RAM. If you want, it could also “throttle” the next batch by waiting a few seconds; otherwise it continues until everything is downloaded. At the same time, the progress is updated after every downloaded batch, so we can visualize it for the client. Finally, we use JSZip to zip everything and provide the download to the client.
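The batching logic can be sketched like this. The function name `downloadInBatches` and its signature are my own; the repository's downloadInParallel may differ in detail:

```typescript
// Sketch: download many URLs in batches of `batchSize`, keeping at most
// `batchSize` requests in flight and reporting progress after each batch.
async function downloadInBatches(
  urls: string[],
  batchSize: number,
  onProgress: (done: number, total: number) => void
): Promise<Blob[]> {
  const results: Blob[] = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    const batch = urls.slice(i, i + batchSize);
    // Fire `batchSize` fetches at once; wait for all of them before
    // starting the next batch.
    const blobs = await Promise.all(
      batch.map((url) => fetch(url).then((response) => response.blob()))
    );
    results.push(...blobs);
    onProgress(results.length, urls.length);
    // Optional throttle between batches, e.g.:
    // await new Promise((resolve) => setTimeout(resolve, 2000));
  }
  return results;
}
```

The resulting blobs can then be handed to JSZip (`zip.file(name, blob)` for each file, followed by `zip.generateAsync({ type: "blob" })`) and offered to the user with FileSaver's `saveAs`.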

Azure Batch Download Progress Bar

That’s much better than any of the solutions we had before:

However, there are also two downsides:

For the first issue, there is a possible solution that is linked directly by our utilised FileSaver library: StreamSaver makes it possible to write the blobs to the local hard disk while the other files are still downloading.
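StreamSaver's `createWriteStream("archive.zip")` returns a standard WritableStream, so the piping logic can be sketched independently of the library. `pipeBlobsToStream` is my own name for this helper, not part of StreamSaver or the demo repository:

```typescript
// Sketch: stream downloaded blobs straight to the target stream instead of
// holding everything in RAM. In the browser, `target` would come from
// StreamSaver's createWriteStream("archive.zip"); here it is a parameter so
// the piping logic stands on its own.
async function pipeBlobsToStream(
  blobs: AsyncIterable<Blob>,
  target: WritableStream<Uint8Array>
): Promise<void> {
  const writer = target.getWriter();
  for await (const blob of blobs) {
    // Write each file's bytes as soon as it has finished downloading;
    // awaiting write() respects the stream's backpressure.
    await writer.write(new Uint8Array(await blob.arrayBuffer()));
  }
  await writer.close();
}
```

In the browser, this would be called as `pipeBlobsToStream(downloadedBlobs, streamSaver.createWriteStream("archive.zip"))`.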

For the second downside, I have to admit that I’m willing to accept it, because the benefits outweigh the downsides by far.

Follow me

I work on the SaaS platform MOBIKO, build up teams, and sometimes give talks.