SAP Builders Blog Posts
Learn from peers about their low-code journey and write your own blog posts to share your thoughts and experiences as you become an SAP Builder.
Dan_Wroblewski
Developer Advocate

Someone asked me how to scrape images from Google. I took it as a challenge.

Note that I did an image search and took the Base64 versions of the images from the search page, which is how Google displays them. These are small, maybe 300 pixels square. If you wanted larger images, you'd extend the automation to click on the image to open the side panel, and use the right-click menu to save/download the image (my next challenge 😺).

Dan_Wroblewski_0-1732004995216.png

  1. Open Google.
  2. Enter your keyword in the search bar (the keyword comes from an input parameter).
  3. Click somewhere to get rid of the dropdown that hides the search button.
  4. Click "Google Search".
  5. Now on the results page, click "Images".
  6. Wait – before I added this 1-second wait, my automation was quite unstable: sometimes it worked, sometimes it didn't. I believe not all the elements were in place at first, so waiting made sure they were loaded.
  7.  
  8. Iterate over all the pictures. I declared as an element anything with class = YQ4gaf – this was contained in the img tags.
  9. I retrieved the src property for the img tag – which was the image in Base64.
  10. I decoded the base64.
  11. I wrote the file to the path that was sent as an input parameter.
  12. Finally I set a condition so I would only loop over the first 10 images (well, 11, since I set the condition wrong 😮).
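
For readers who think in code, steps 8 to 12 can be sketched in plain Python. This is only an illustration of the logic, not what SAP Build Process Automation runs internally: the function name `save_first_images`, the `image_N.jpg` file naming, and the assumption that every `src` is a JPEG data URI are all mine.

```python
import base64
from pathlib import Path

MAX_IMAGES = 10  # hypothetical constant for the "first 10 images" goal


def save_first_images(src_values, out_dir, limit=MAX_IMAGES):
    """Decode Base64 data URIs and write at most `limit` files to out_dir."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for src in src_values:
        # Stop as soon as `limit` files exist. A counter that starts at 0
        # and only stops on `counter > limit` would write 11 files.
        if len(written) >= limit:
            break
        # Skip anything that is not an inline data URI (for example a URL).
        if not src.startswith("data:"):
            continue
        # Everything after the first comma is the Base64 payload.
        _, _, payload = src.partition(",")
        # Assumption: the thumbnails are JPEGs, so ".jpg" is used throughout.
        target = out_path / f"image_{len(written) + 1}.jpg"
        target.write_bytes(base64.b64decode(payload))
        written.append(target)
    return written
```

Calling `save_first_images(list_of_src_strings, "C:/temp/cats")` (a made-up path) would return the list of files written, never more than 10.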

I then created a process with a trigger form with 2 inputs: the keyword for the Google search and the local path to store the pictures. But I could almost as easily create an Automation Launcher and User Task so I could trigger it attended whenever I wanted.

Dan_Wroblewski_1-1732006107030.png

 

How I Put It Together

The above makes it look like I knew what I was doing. But it took a little while, especially the following parts:

  • Recording the screens – this was BY FAR the most complicated part for me. I've tried the tutorials here and here, and they are great, but I am still really not sure of myself on how the recording of screens works and the selecting of elements. But I'm getting there. I will work on it and create some videos on the topic.
  • Decoding Base64 – I was not sure how this was done, since we have documentation on using the ctx library to do encoding/decoding, but I soon figured out how to use the Decode activity. I was already a little familiar with Base64 from SAP Build Apps, for which I created a blog on handling images, including in Base64 format.
  • Simulating Enter Key – this I did not even try. I would have needed it because when you enter a keyword the "Google Search" button is hidden, but people online said this was an issue. Instead, I simply clicked the Google logo to get rid of the dropdown that hid the button, and then used a click activity on the search button.
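
To make the Base64 part less mysterious: the `src` of an inline image is a data URI, which is a short header (`data:<mime type>;base64,`) followed by the encoded bytes. Here is a minimal Python sketch of pulling that apart. The function names `decode_data_uri` and `extension_for` and the small extension table are my own, hypothetical helpers; this is not the Decode activity itself.

```python
import base64

# Hypothetical lookup table; extend it if you meet other image types.
EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "image/webp": ".webp",
}


def decode_data_uri(src):
    """Split a Base64 data URI into (mime_type, raw_bytes)."""
    if not src.startswith("data:"):
        raise ValueError("not a data URI")
    # Header and payload are separated by the first comma.
    header, sep, payload = src[len("data:"):].partition(",")
    if not sep or not header.endswith(";base64"):
        raise ValueError("expected a ';base64,' data URI")
    # The MIME type is the first ';'-separated part of the header.
    mime_type = header.split(";")[0] or "text/plain"
    return mime_type, base64.b64decode(payload)


def extension_for(mime_type, default=".bin"):
    """Pick a file extension for a MIME type, falling back to `default`."""
    return EXTENSIONS.get(mime_type, default)
```

For example, `decode_data_uri("data:image/png;base64,aGVsbG8=")` gives back `"image/png"` and the five bytes `b"hello"`, and `extension_for("image/png")` gives `".png"`, which is enough to name the output file sensibly.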

 

Recording Screens

I did a recording by opening Google in a new browser, selecting Create → Application, and then selecting Recorder (not manual capture). This created 2 screens – the Google home page, and then the results page.

Dan_Wroblewski_0-1732008943169.png

On the results page, I created 2 captures: the main results page, and then the results page for images.

Dan_Wroblewski_1-1732009176951.png

I also declared some elements I needed, like the search box and button. Most importantly, I declared the list of images, using the class as the criterion, and setting this to a collection.
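
If it helps to picture what "class as the criterion, set to a collection" means, here is a rough Python equivalent that works on a saved HTML string using only the standard library. It is a stand-in for the idea, not how the screen capture works under the hood. The names `ImageSrcCollector` and `collect_image_sources` are hypothetical, and the class name `YQ4gaf` is simply what I saw on the page, so Google may well change it.

```python
from html.parser import HTMLParser

IMAGE_CLASS = "YQ4gaf"  # class seen in the post; may change at any time


class ImageSrcCollector(HTMLParser):
    """Collect the src of every <img> tag that carries a given class."""

    def __init__(self, css_class=IMAGE_CLASS):
        super().__init__()
        self.css_class = css_class
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # An element can carry several classes separated by whitespace.
        classes = (attr_map.get("class") or "").split()
        src = attr_map.get("src")
        if self.css_class in classes and src:
            self.sources.append(src)


def collect_image_sources(html, css_class=IMAGE_CLASS):
    """Return the src values of all matching images, in page order."""
    collector = ImageSrcCollector(css_class)
    collector.feed(html)
    collector.close()
    return collector.sources
```

The result is an ordered list of `src` strings, which is the "collection" the automation then loops over. Note that this only sees what is in the HTML text you give it, so anything the page fills in later by script would be missed.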

Dan_Wroblewski_3-1732009330179.png

 

Automation and Process

The automation is created for you, based on the recording, but I found there were all kinds of extra artifacts that I needed to delete or combine. Not a big deal, just something you have to do.

And then I had to add the part where we iterate through the images, decode the Base64 strings, save them to a file, and add a condition to just take the first 10 images. All using simple activities. I also created an input parameter for the keyword, instead of the hard-coded "cat" I used in the recording.

Dan_Wroblewski_4-1732012404829.png

I created the artifacts for enabling this as an attended automation, creating:

  • Automation Launcher
  • User task (for entering the keyword and path)

Dan_Wroblewski_7-1732012732853.png

You would have to register the automation launcher in the Control Tower for your environment.

Dan_Wroblewski_6-1732012604599.png

And the user task you would have to add to your automation. Here is an example from my SAP CodeJam demo of sending emails using an automation.

Dan_Wroblewski_8-1732012788917.png

 
