SAP Builders Blog Posts
Dan_Wroblewski
Developer Advocate

Someone asked me how to scrape images from Google. I took it as a challenge.

Note that I did an image search and took the Base64 versions of the images from the search page, which is how Google displays them. These are small, maybe 300 pixels square. If you wanted larger images, you'd extend the automation to click on the image to open the side panel, and use the right-click menu to save/download the image (my next challenge 😺).

Dan_Wroblewski_0-1732004995216.png

  1. Open Google.
  2. Enter your keyword in the search bar (the keyword comes from an input parameter).
  3. Click somewhere to get rid of the dropdown that hides the search button.
  4. Click "Google Search".
  5. Now on the results page, click "Images".
  6. Wait – before I added this 1s wait, my automation was quite unstable: sometimes it worked, sometimes it didn't. I believe not all the elements were in place at first, so waiting made sure they were loaded.
  7.  
  8. Iterate over all the pictures. I declared as an element anything with class = YQ4gaf – this was contained in the img tags.
  9. I retrieved the src property for the img tag – which was the image in Base64.
  10. I decoded the base64.
  11. I wrote the file to the path that was sent as an input parameter.
  12. Finally, I set a condition so I would only loop over the first 10 images (well, 11, since I set the condition wrong 😮).
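
For readers who want to see the logic of steps 8 to 12 outside SAP Build Process Automation, here is a minimal Python sketch. It is not what the automation runs internally. The function names, the `image_<n>` file naming, and the assumption that each src looks like `data:image/jpeg;base64,...` are illustrative only.

```python
import base64
from pathlib import Path


def decode_data_uri(src: str) -> tuple[str, bytes]:
    """Split a 'data:image/jpeg;base64,...' string into (extension, raw bytes)."""
    header, _, payload = src.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a Base64 data URI")
    mime = header[len("data:"):].split(";")[0]   # e.g. "image/jpeg"
    extension = mime.split("/")[-1] or "bin"
    return extension, base64.b64decode(payload)


def save_first_images(srcs: list[str], folder: str, limit: int = 10) -> list[Path]:
    """Decode at most `limit` data URIs and write them into `folder`."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, src in enumerate(srcs):
        # With a 0-based index, "index > limit" here would let an 11th image through.
        if index >= limit:
            break
        extension, data = decode_data_uri(src)
        path = out_dir / f"image_{index}.{extension}"
        path.write_bytes(data)
        saved.append(path)
    return saved
```

The comment in the loop shows one way an off-by-one like the "11 images" slip can happen. The actual condition used in the automation is not shown in the post.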

I then created a process with a trigger form with 2 inputs, the keyword for the Google search and the local path to store the pictures. But I could almost as easily create an Automation Launcher and User Task so I could trigger it as an attended automation whenever I wanted.

Dan_Wroblewski_1-1732006107030.png

 

How I Put It Together

The above makes it look like I knew what I was doing. But it took a little while, especially the following parts:

  • Recording the screens – this was BY FAR the most complicated part for me. I've tried the tutorials here and here, and they are great, but I am still really not sure of myself on how the recording of screens and the selecting of elements work. But I'm getting there. I will work on it and create some videos on the topic.
  • Decoding Base64 – I was not sure how this was done, since we have documentation on using the ctx library to do encoding/decoding, but I soon figured out how to use the Decode activity. I was already a little familiar with Base64 from SAP Build Apps, for which I created a blog on handling images, including in Base64 format.
  • Simulating the Enter key – this I did not even try. I would have needed it because when you enter a keyword the "Google Search" button is hidden, but people online said this was an issue. Instead, I simply clicked the Google logo to get rid of the dropdown that hid the button, and then used a click activity on the search button.

 

Recording Screens

I made the recording by opening Google in a new browser, selecting Create → Application, and then choosing Recorder (not manual capture). This created 2 screens: the Google home page and the results page.

Dan_Wroblewski_0-1732008943169.png

On the results page, I created 2 captures: the main results page, and the results page for images.

Dan_Wroblewski_1-1732009176951.png

I also declared some elements I needed, like the search box and button. Most importantly, I declared the list of images, using the class as the criterion, and setting this to a collection.

Dan_Wroblewski_3-1732009330179.png
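
To make the idea of "a collection of elements recognized by class" concrete, here is a rough Python equivalent using only the standard library's `html.parser`. It is a sketch under assumptions: `ImageCollector` and `collect_image_sources` are hypothetical names, and SAP Build's recorder works on the live page rather than on an HTML string.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the src of every <img> whose class list contains a given class."""

    def __init__(self, wanted_class: str):
        super().__init__()
        self.wanted_class = wanted_class
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # An attribute without a value arrives as None, hence the "or".
        classes = (attributes.get("class") or "").split()
        if self.wanted_class in classes and attributes.get("src"):
            self.sources.append(attributes["src"])


def collect_image_sources(html: str, wanted_class: str = "YQ4gaf") -> list[str]:
    """Return the src values of all matching images, in document order."""
    collector = ImageCollector(wanted_class)
    collector.feed(html)
    return collector.sources
```

Each entry in the returned list plays the role of one item in the declared collection, ready to have its src decoded.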

 

Automation and Process

The automation is created for you, based on the recording, but I found there were all kinds of extra artifacts that I needed to delete or combine. Not a big deal, just something you have to do.

And then I had to add the part where we iterate through the images, decode the Base64 strings, save them to a file, and add a condition to just take the first 10 images. All using simple activities. I also created an input parameter for the keyword, instead of the hard-coded "cat" I used in the recording.

Dan_Wroblewski_4-1732012404829.png

I created the artifacts for enabling this as an attended automation:

  • Automation Launcher
  • User task (for entering the keyword and path)

Dan_Wroblewski_7-1732012732853.png

The automation launcher you would have to register in the Control Tower for your environment.

Dan_Wroblewski_6-1732012604599.png

And the user task you would have to add to your automation. Here is an example from my SAP CodeJam demo of sending emails using an automation.

Dan_Wroblewski_8-1732012788917.png

 

11 Comments
Dan_Wroblewski
Developer Advocate

One thing I forgot to mention. When you are recording your screen, for example with a search term "cat", the recording will set the criteria for recognizing the screen to the word "cat" being in the title.

Dan_Wroblewski_0-1732016602618.png

You have to change this, for example, to contains "Google Search".
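
Why this matters can be shown with a tiny Python sketch of a "title contains" check. The titles below are hypothetical examples in the form `<keyword> - Google Search`, and `screen_matches` is an illustrative helper, not an SAP Build function.

```python
def screen_matches(title: str, must_contain: str) -> bool:
    """Mimic a 'title contains ...' recognition criterion."""
    return must_contain in title


# Hypothetical titles for two different searches.
titles = ["cat - Google Search", "dog - Google Search"]

# Criterion as captured by the recording: only the "cat" search is recognized.
recorded = [screen_matches(t, "cat") for t in titles]

# Keyword-independent criterion: both searches are recognized.
general = [screen_matches(t, "Google Search") for t in titles]
```

With the recorded criterion, any keyword other than "cat" would leave the automation unable to recognize the results screen.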

 

Dan_Wroblewski
Developer Advocate

And if you want to get larger images, you can define a new screen for when one of the images is selected.

Dan_Wroblewski_0-1732016703300.png

The criterion I used for that picture was its class.

Then I retrieve its src property and use the Download File activity. All very simple.

Dan_Wroblewski_0-1732018541109.png
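
As a rough analogue of "get the src, then download it", here is a short Python sketch using only the standard library. The `download_file` helper and its parameters are assumptions for illustration. It does not reproduce how the Download File activity is implemented.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, destination: str) -> Path:
    """Stream the resource at `url` into `destination` and return the path."""
    target = Path(destination)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so a large image is never held fully in memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Here `url` would be the value read from the enlarged image's src property, and `destination` a file name inside the folder passed in as the input parameter.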

 

 

YashminBehera
Explorer

Hi @Dan_Wroblewski 

I am doing the same work but with capture screen, and I get an error in the Get Property (element) activity when the automation runs. Is there something I can change or edit in the capture screen application?

YashminBehera
Explorer

I am unable to retrieve its src property, can you guide me with it.
(I am using capture screen)

@Dan_Wroblewski 

Dan_Wroblewski
Developer Advocate

I had to play around a little bit with the recording, not complicated but a few tweaks.

  • I used the Recorder but manually captured any new screen that had some element I needed to manipulate. I ended up with 2 screens, the Google home page and the results page, and I had 3 captures on the results page: results, results for images, and results for images with an image selected.
Dan_Wroblewski_0-1732020026946.png
  • As for declared elements, I had the following:
    • Google Home: Search box
    • Google Home: Google logo
    • Google Home: Google Search button
    • Results Page: All of the returned images, as collection (using class YQ4gaf)
    • Results Page: "Images" filter
    • Results Page: Enlarged image that appears when you click one of the images (using class sFlh5c FyHeAf iPVvYb)

Can't guarantee these classes are stable and will work tomorrow, but they worked now with multiple searches.

As for Get Property, you should be able to see the attributes the automation finds for your element (in the criteria screen), and these should be available to retrieve.

YashminBehera
Explorer

Hi @Dan_Wroblewski 

I would like to ask: when using the Download File activity, what are you setting as the URL?

I would also like to ask how to scrape images from the web and save them to a local folder using an SAP BTP trial account, since DMS is not available for it.

Thank You
Yashmin Behera

Dan_Wroblewski
Developer Advocate

1)

Download File takes the URL output by Get Property.

Get Property just asks for "src" from one of the declared elements.

Dan_Wroblewski_0-1732203741160.png

The declared element is the enlarged image displayed whenever you click one of the smaller images. Instead of a Base64 string, you get a real URL in the src property of that element.

Dan_Wroblewski_1-1732203947187.png
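
The difference between the two kinds of src value can be sketched in a few lines of Python. `describe_src` is a hypothetical helper, and the example values in the usage note are made up. It only tells the thumbnail case (inline Base64) apart from the enlarged-image case (a real URL).

```python
def describe_src(src: str) -> str:
    """Say which handling a given img src value needs."""
    if src.startswith("data:"):
        return "decode"      # inline Base64 thumbnail: decode and write to disk
    if src.startswith(("http://", "https://")):
        return "download"    # real URL: fetch it, as Download File does
    return "unknown"
```

For example, `describe_src("data:image/jpeg;base64,AAAA")` gives `"decode"`, while `describe_src("https://example.com/cat.jpg")` gives `"download"`.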

 

2) I do not use DMS. I think this should work on trial.

Unless you are saying that YOU want to bring it into DMS. Yes, you cannot create a DMS repo in trial, but you can reference DMS on another BTP account from trial.

 

  

YashminBehera
Explorer

Hi @Dan_Wroblewski 

I am unable to declare the enlarged picture as a uniquely identified element using class as the recognition criterion.

YashminBehera_0-1732262861069.png


Thank You
Yashmin Behera

Dan_Wroblewski
Developer Advocate

@YashminBehera I used class "sFlh5c FyHeAf iPVvYb", which is more detailed than yours. I have to say that I do not know if these classes change from time to time or if different images will display differently, but it worked for me.
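
One way to picture why a more detailed class criterion can make an element unique is the Python sketch below. It treats the class attribute as a set of tokens, the way CSS selectors do. Whether SAP Build compares class criteria this way is an assumption, and the sample page is invented.

```python
def has_all_classes(class_attribute: str, required: str) -> bool:
    """True when every class in `required` appears in the element's class attribute."""
    return set(required.split()) <= set(class_attribute.split())


def matching_elements(elements: list[str], required: str) -> list[str]:
    """Return the class attributes of all elements that satisfy the criterion."""
    return [value for value in elements if has_all_classes(value, required)]


# Invented page: the class attribute of each img element, in document order.
page = ["YQ4gaf", "sFlh5c FyHeAf", "sFlh5c FyHeAf iPVvYb", "YQ4gaf"]

loose = matching_elements(page, "sFlh5c")                 # matches two elements
strict = matching_elements(page, "sFlh5c FyHeAf iPVvYb")  # matches exactly one
```

On this invented page, the single-class criterion is ambiguous, while the three-class criterion picks out one element.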

I may publish the project if there is interest.

YashminBehera
Explorer

@Dan_Wroblewski  Yes that would be great.

Thank You so much!