In this blog, we will see how to use the SPA Google Cloud Storage SDK to automate Google Cloud Storage tasks.
Google Cloud Storage provides an object storage repository, similar to Amazon S3. An object is essentially any file you need to store, for example content that will later be delivered via a Content Delivery Network. Objects can also be archived for long-term storage.
The first thing you need in order to use Cloud Storage is a GCP project with billing enabled. Inside this project, you can create multiple ‘buckets’ to hold your objects.
The idea behind this organization is that, if your company has multiple apps, you can create a project for each of them and place the content used by each app in a bucket. Alternatively, you can have a single project and place the data of your different apps in separate buckets.
The Google Cloud Storage SDK can help you automate everything from bucket creation to uploading objects.
The scenario below explains how you can use this SDK. First, here are some prerequisites:
You need a project in Google Cloud Platform and a service account created inside that project. This page explains how to create a service account in GCP. Also make sure the Cloud Platform scope is enabled: https://www.googleapis.com/auth/cloud-platform. Once the service account is created, go to IAM in GCP and make sure the following permissions are granted to your service account:
In addition, depending on your organization’s cloud policies, you might have to grant the service account the ‘Storage Object Admin’ role on each bucket that has been created:
On the SPA side of things, we need to create an automation and add the Google Authorization SDK and the Google Cloud Storage SDK. The automation will then:
1. Authorize the bot using service account authentication.
2. Create a bucket.
3. Upload multiple objects into a folder within that bucket, compressing the larger objects during upload.
4. Iterate through the list of objects in that folder.
5. Download them one by one.
Let’s see how each step is done in detail below:
Authorize the bot:
Once you create a service account, you need to create a key file for it, which is then provided to the “Authorize Google (Service Account)” activity. This activity also needs the list of scopes the bot has access to; in this case, it needs only the “GoogleCloudPlatformScope”. For further details about authorization, you can refer to this blog.
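For reference, the key file you download for a service account is a JSON document. The Python sketch below is not part of the SPA SDK; it is a hypothetical stdlib-only helper, `load_service_account_key`, that sanity-checks such a file before you hand it to the activity. It assumes the usual key file fields `type`, `project_id`, `private_key` and `client_email`, and pairs the key with the Cloud Platform scope URL from the prerequisites.

```python
import json
from pathlib import Path

# Scope URL from the prerequisites; the SPA activity calls it "GoogleCloudPlatformScope".
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"

# Fields we assume every service account key file contains (hypothetical check, not the SDK's).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def load_service_account_key(path):
    """Load a service account key file and return (project_id, client_email, scopes).

    Raises ValueError if the file is missing fields or is not a service account key.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"key file is missing fields: {', '.join(missing)}")
    if data["type"] != "service_account":
        raise ValueError(f"expected type 'service_account', got {data['type']!r}")
    # In this scenario the bot only needs the Cloud Platform scope.
    return data["project_id"], data["client_email"], [CLOUD_PLATFORM_SCOPE]
```

Failing early here is cheaper than debugging an authorization error inside the bot, for example when some other credentials file is supplied instead of a service account key.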
Create a bucket
Next, we add the “Create Bucket (Google Cloud Storage)” activity. Its mandatory inputs are the bucket name and the ID of the GCP project in which the bucket is to be created. The bucket name must be globally unique, and the project ID can be found in the GCP console. The project ID is highlighted in the screenshot below.
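Because the name must be globally unique and follow Google's naming rules, it helps to validate it before the bot runs. The hypothetical `is_valid_bucket_name` function below is a simplified local check of those rules: lowercase letters, digits, dashes, underscores and dots; 3 to 63 characters (up to 222 with dots, each dot-separated part at most 63); no IP-address form; no “goog” prefix and no “google” in the name. It does not catch every reserved misspelling, and it cannot check global uniqueness, which only the service confirms when the bucket is created.

```python
import re

# Lowercase letters, digits, dashes, underscores and dots; must start and end
# with a letter or digit.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]*[a-z0-9]")
# Dotted-decimal form such as 192.168.1.1 is not allowed.
_IP_PATTERN = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Simplified (not exhaustive) check of Cloud Storage bucket naming rules."""
    if not _NAME_PATTERN.fullmatch(name):
        return False
    max_len = 222 if "." in name else 63
    if not 3 <= len(name) <= max_len:
        return False
    # Names with dots: each dot-separated component may be at most 63 characters.
    if any(len(part) > 63 for part in name.split(".")):
        return False
    if _IP_PATTERN.fullmatch(name):
        return False
    # Reserved words: no "goog" prefix and no "google" anywhere in the name.
    if name.startswith("goog") or "google" in name:
        return False
    return True
```

For example, `is_valid_bucket_name("spa-purchase-orders-2024")` returns `True`, while `"My-Bucket"` fails because of the uppercase letters.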
There is also an optional input called the storage class, which determines the availability SLA and the pricing that apply to the bucket’s data. If you don’t provide a value, the default ‘Standard’ storage class is used. You can read more about storage classes here.
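If you do want to pick a storage class, a rough rule of thumb is to match it to how often the data is read. The sketch below encodes that rule in a hypothetical helper, `suggest_storage_class`, using Google's minimum storage durations (none for Standard, 30 days for Nearline, 90 for Coldline, 365 for Archive). Treat it as a starting point only: retrieval fees and availability also differ between classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageClass:
    name: str
    min_storage_days: int  # minimum storage duration billed for this class


# Ordered from most to least frequently accessed ("hot" to "cold").
STORAGE_CLASSES = (
    StorageClass("STANDARD", 0),
    StorageClass("NEARLINE", 30),
    StorageClass("COLDLINE", 90),
    StorageClass("ARCHIVE", 365),
)


def suggest_storage_class(days_between_accesses: int) -> StorageClass:
    """Pick the coldest class whose minimum storage duration fits the access pattern.

    Rule of thumb: Nearline for data read about once a month or less,
    Coldline about once a quarter or less, Archive about once a year or less.
    """
    if days_between_accesses < 0:
        raise ValueError("days_between_accesses must be non-negative")
    chosen = STORAGE_CLASSES[0]
    for storage_class in STORAGE_CLASSES:
        if days_between_accesses >= storage_class.min_storage_days:
            chosen = storage_class
    return chosen
```

Purchase order documents that the bot reads every day would stay in Standard, which is also the default the activity uses.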
Upload objects:
Next, we are going to upload two files into the bucket: a sample purchase order, which is a PDF, and an MP4 video explaining how to create a purchase order. Let’s look at the screenshot below:
As you can see, we have used the “Upload Object (Google Cloud Storage)” activity to perform the upload. The first parameter is the path of the file to be uploaded. The second parameter is the bucket name, which we take from the output of the “Create Bucket (Google Cloud Storage)” activity in Step 2. The third parameter, “destinationPath”, is optional and specifies the folder in the bucket into which the object should be uploaded. In this case, we want the files stored under the purchaseOrders folder. If the folder does not exist, one is created for you.
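Cloud Storage has a flat namespace: a “folder” is just a prefix in the object name, which is why the folder does not need to exist beforehand. The hypothetical `object_name_for` helper below shows how we assume the resulting object name is formed, namely `<destinationPath>/<file name>`, which is consistent with the PDF appearing under purchaseOrders in the console. It uses `PureWindowsPath` so that both `\` and `/` separators in the local path are handled, since local paths on the bot machine are usually Windows paths.

```python
from pathlib import PureWindowsPath


def object_name_for(local_path: str, destination_path: str = "") -> str:
    """Build the object name an upload is assumed to produce: "<destinationPath>/<file name>".

    A "folder" in Cloud Storage is only a name prefix ending in "/", so nothing
    has to be created before the upload.
    """
    # PureWindowsPath accepts both "\" and "/" as separators.
    file_name = PureWindowsPath(local_path).name
    folder = destination_path.strip("/")
    return f"{folder}/{file_name}" if folder else file_name
```

With `destinationPath` set to `purchaseOrders`, a local file `C:\docs\samplePO.pdf` would become the object `purchaseOrders/samplePO.pdf`; without it, the object lands at the bucket root.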
Once the activity runs, you can see the PDF uploaded into your bucket under the purchaseOrders folder.
The MP4 file is also uploaded in a similar manner:
But you might notice one minor difference here: the gzip parameter is set to true. This means that the MP4 file is compressed before it is uploaded. Compression can reduce transfer size and network costs for large files, although the savings are largest for compressible content such as text; already-compressed formats like MP4 usually shrink only slightly. As you can see below, GCP also indicates that the content is compressed.
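To see why gzip pays off for some files and not others, the sketch below compresses bytes with Python's standard `gzip` module, which we assume is roughly what happens to the payload when the gzip parameter is enabled (the SDK's internals are not shown here). The hypothetical `gzip_for_upload` helper returns the compressed payload and the ratio of compressed to original size. Objects uploaded this way are presumably stored with `Content-Encoding: gzip` metadata, which is how GCP can show that the content is compressed.

```python
import gzip


def gzip_for_upload(data: bytes) -> tuple[bytes, float]:
    """Compress data for upload and return (payload, ratio).

    ratio = compressed size / original size. Values near (or above) 1.0 mean
    compression saved little, which is typical for MP4, JPEG or ZIP content.
    """
    compressed = gzip.compress(data)
    ratio = len(compressed) / len(data) if data else 1.0
    return compressed, ratio
```

Repetitive text such as a CSV export shrinks dramatically, while random or already-compressed bytes come out about the same size or slightly larger, because gzip adds a small header and framing overhead.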
Get Objects in the purchaseOrders folder:
In this step, we use the “List Objects (Google Cloud Storage)” activity to list the files inside a bucket. Notice that we have set the prefix parameter to “purchaseOrders/”, which means that only the objects in the purchaseOrders folder are listed.
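The prefix parameter works on object names, not real directories. The sketch below mimics that behavior over an in-memory list of names with a hypothetical `list_objects` function. It also models the optional `delimiter` parameter of the Cloud Storage JSON API listing call: with a delimiter of `/`, objects in nested subfolders are collapsed into “prefixes” instead of being returned. Whether the SPA activity exposes a delimiter is not covered here, so treat that part as an assumption.

```python
def list_objects(names, prefix="", delimiter=None):
    """Mimic a prefix/delimiter listing over a flat list of object names.

    Returns (objects, prefixes): the matching object names and, when a
    delimiter is given, the sub-"folders" (common prefixes) below the prefix.
    """
    objects, prefixes = [], []
    for name in sorted(names):
        if not name.startswith(prefix):
            continue  # outside the requested "folder"
        rest = name[len(prefix):]
        if delimiter and delimiter in rest:
            # Collapse nested objects into a single common prefix.
            sub = prefix + rest.split(delimiter, 1)[0] + delimiter
            if sub not in prefixes:
                prefixes.append(sub)
        else:
            objects.append(name)
    return objects, prefixes
```

Without a delimiter the listing is effectively recursive, so objects in a subfolder such as `purchaseOrders/2023/` are returned too; add the delimiter if only the top level of the folder should be processed.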
Download the objects:
Next, we take each of the objects returned by the previous list activity and download it to a location on the file system:
As you can see, in the “Download Object (Google Cloud Storage)” activity, the “pathToDownload” parameter specifies the folder into which the objects are downloaded, and the object name is taken from the result of the previous “List Objects (Google Cloud Storage)” activity.
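The iterate-and-download loop can be sketched as follows. `download_all` and its `download_one` callback are hypothetical stand-ins for the loop and the “Download Object (Google Cloud Storage)” activity, so the code runs without network access. We assume each object is saved under its file name inside `pathToDownload`, and we skip zero-byte “folder” placeholder objects (names ending in `/`), which the GCP console may create.

```python
from pathlib import Path, PurePosixPath
from typing import Callable, Iterable, List


def download_all(object_names: Iterable[str], path_to_download: str,
                 download_one: Callable[[str, Path], None]) -> List[Path]:
    """Download listed objects one by one into path_to_download.

    download_one(object_name, local_path) stands in for the
    "Download Object (Google Cloud Storage)" activity (hypothetical callback).
    """
    root = Path(path_to_download)
    root.mkdir(parents=True, exist_ok=True)
    saved = []
    for name in object_names:
        if name.endswith("/"):
            # Folder placeholder objects have no content to download.
            continue
        # Object names always use "/", so keep only the last path segment.
        target = root / PurePosixPath(name).name
        download_one(name, target)
        saved.append(target)
    return saved
```

Because only the file name is kept, two objects with the same name in different subfolders would overwrite each other; if that can happen, keep the relative object path instead.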
And that's it.
As we have seen, the Google Cloud Storage SDK can help you perform all the above steps and much more. It is especially useful in combination with the Google Vision AI and Google Doc AI SDKs. For example, the Google Doc AI SDK takes a file such as a purchase order or sales order, generally in PDF format, and infers information from it. Such files are usually read from the file system, but if a file is too big (>10 MB), Google requires it to be provided as input from Google Cloud Storage. In that case, you can use the Google Cloud Storage SDK to upload files to GCS first and retrieve them later for further processing.
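That decision can be sketched as follows. The hypothetical `document_source` helper checks a file's size against the 10 MB threshold mentioned above (interpreted here as 10 MiB, which is an assumption; check Google's current Document AI limits) and, for larger files, returns the `gs://bucket/object` URI the file should be uploaded to before processing.

```python
import os

# Threshold from the text above, assumed to mean 10 MiB.
INLINE_LIMIT_BYTES = 10 * 1024 * 1024


def document_source(local_path: str, bucket: str, folder: str = "purchaseOrders"):
    """Decide whether a document can be sent inline or must come from Cloud Storage.

    Returns ("inline", local_path) for small files; otherwise ("gcs", uri), where
    uri is the gs:// location the file should first be uploaded to.
    """
    if os.path.getsize(local_path) <= INLINE_LIMIT_BYTES:
        return "inline", local_path
    prefix = folder.strip("/")
    file_name = os.path.basename(local_path)
    object_name = f"{prefix}/{file_name}" if prefix else file_name
    return "gcs", f"gs://{bucket}/{object_name}"
```

In the bot, the “gcs” branch would correspond to an “Upload Object (Google Cloud Storage)” step followed by passing the resulting URI to the Doc AI activity.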
Your feedback and thoughts are most welcome; please leave them in the comments and we will get back to you.