CRM and CX Blogs by SAP
Stay up-to-date on the latest developments and product news about intelligent customer experience and CRM technologies through blog posts from SAP experts.
joris_quenee
Product and Topic Expert

Introduction

Bots are widely used by third parties to browse your website and your product catalogue. A bot is an automated tool that collects key information such as URLs, product data, prices, performance, quality, and so on.

They are not always fair or good for your business. Sometimes they even disguise themselves to steal crucial data.

Ignoring this aspect can hurt your business, for example by degrading your website's performance.

 

Rightful Bots

The most popular bot is Googlebot, which is used to rank your website in Google's search engine (SEO). This bot is well known (its IP range is published) and it respects best practices: it declares itself (in the HTTP User-Agent header) and crawls your website gently, without generating a load peak.

A second type of bot is less well known and less fair, such as AdsBot, which is used to give a website a quality score when you host adverts. Still, its IP range is published.

To manage these kinds of bots and mitigate potential performance impacts, robots.txt must be set up in your SAP Commerce Cloud solution.
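
As a rough illustration of how robots.txt rules are read by a well-behaved crawler, the sketch below parses a small rule set with Python's standard `urllib.robotparser`. The rules, paths, and the `SomeScraper` name are hypothetical and not taken from any SAP Commerce Cloud default. Keep in mind that robots.txt is only advisory: bots that respect best practices follow it, while malicious bots simply ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /cart
Disallow: /checkout

User-agent: *
Disallow: /
"""

# Parse the rules from memory instead of fetching them over HTTP.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(user_agent: str, path: str) -> bool:
    """Return True if the rules let this user agent crawl the path."""
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent, path in [
        ("Googlebot", "/products/123"),
        ("Googlebot", "/cart"),
        ("SomeScraper", "/products/123"),
    ]:
        print(agent, path, is_allowed(agent, path))
```

In this sketch Googlebot may crawl product pages but not the cart or checkout, and every other declared crawler is asked to stay away entirely.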

 

Malicious Bots

A B2C website is highly vulnerable to all kinds of malicious bots.

With this kind of bot, an aggressive undercutting strategy can be executed so that a competitor is always slightly cheaper than your public price. An unfair competitor then sees its products always at the top of price comparison sites.

Or even worse, a fake copy of your website can be built automatically to capture orders under your brand.

These malicious bots often act anonymously (no user agent declaration, dynamic IP range) and pretend to be a normal end user. They also generate load peaks in order to collect as much of your data as possible in the shortest possible time.

To sum up, malicious or unfair bots are a threat to your business because they can steal your data and degrade your website response time.

 

Bots Filtering

It is not rare to see that half of the traffic comes from bots. See below a real example from Cloudflare CDN monitoring.

[Image: Screenshot 2024-09-10 at 09.07.53.png]

Real user traffic is shown in green, and detected bot traffic (malicious bots or bots pretending to be human) in orange.

Allow list

As you can see, it is very difficult to identify rightful bots and separate them from the others. In this case, one of the best strategies is to set up an allow list and reject everything else.

Fair and good bots publish their IP ranges, as Google does. We should therefore consider other bots a threat to your business rather than an opportunity.

By accepting all bots, you could see load peaks during the day, as in the following example:

[Image: Screenshot 2024-09-10 at 09.17.01.png]

Most common CDN solutions provide a bot IP range filtering strategy. See below an example from the Cloudflare CDN:

[Image: Screenshot 2024-09-10 at 10.06.20.png]
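
To make the allow list idea concrete, here is a minimal sketch using Python's standard `ipaddress` module. It assumes the request has already been identified as bot traffic and only checks whether the source IP falls inside a published range. The CIDR ranges are placeholders taken from the reserved documentation blocks, not real bot ranges, and the function name is hypothetical. In practice this check would normally be configured in the CDN rather than coded by hand.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks.
# Real values would come from the ranges each bot operator publishes.
ALLOWED_BOT_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_allowed_bot(client_ip: str) -> bool:
    """Return True if the bot's source IP is inside an allowed range."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        # Anything that is not a valid IP address is rejected.
        return False
    # A membership test across IP versions is simply False, so mixing
    # IPv4 and IPv6 ranges in one list is safe.
    return any(address in network for network in ALLOWED_BOT_RANGES)


if __name__ == "__main__":
    for ip in ["192.0.2.17", "198.51.100.5", "2001:db8::1"]:
        print(ip, "allowed" if is_allowed_bot(ip) else "rejected")
```

Everything outside the listed ranges is rejected by default, which matches the "allow the known, reject the others" approach described above.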

 

Bot detection and blocking

When a bot does not declare itself, the bot IP range filtering strategy cannot be applied. Only an advanced, smart detection strategy can then be used. For example, the Cloudflare CDN offers a bot score that uses several strategies to distinguish a bot from a real human.

On top of that, a CAPTCHA can be set up to block the smartest bots when the score is too low. The end user impact is negligible compared to the protection it offers your business (less than 1% of users may be challenged by a CAPTCHA).
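
The score-based decision can be sketched as a simple threshold function. This is an illustration only: it assumes a score from 1 to 99 where a low value means the request is likely automated, and the two thresholds, the `Action` names, and the extra "block" step for very low scores are hypothetical choices, not CDN defaults. In a real setup these rules are configured in the CDN itself.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"  # present a CAPTCHA
    BLOCK = "block"


# Hypothetical thresholds; tune them against your own traffic study.
BLOCK_BELOW = 10
CHALLENGE_BELOW = 30


def decide(bot_score: int) -> Action:
    """Map a bot score (assumed 1..99, low = likely bot) to an action."""
    if not 1 <= bot_score <= 99:
        raise ValueError(f"bot score out of range: {bot_score}")
    if bot_score < BLOCK_BELOW:
        return Action.BLOCK
    if bot_score < CHALLENGE_BELOW:
        return Action.CHALLENGE
    return Action.ALLOW


if __name__ == "__main__":
    for score in (5, 20, 80):
        print(score, decide(score).value)
```

Raising `CHALLENGE_BELOW` catches more disguised bots but also challenges more real users, so the share of challenged users is the figure to watch when tuning it.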

 

Conclusion

Bot filtering is a serious subject and should not be ignored. B2C websites are especially vulnerable in this respect.

At a minimum, a study should be carried out, using advanced/smart bot detection, to understand in depth how bot traffic is impacting your solution.

SAP Expert Services can help you perform this study and apply the best strategy for your business.

 

2 Comments
kai_unewisse
Explorer

Hi Joris,
why is the following statement true? "Bot IP range filtering strategy cannot be applied.."

When it comes to SAP Commerce Cloud (CCv2), you can blacklist certain IP ranges that are known for bad bots.
There are also Git repos available with lists of suspicious bot IPs; I cannot find them again at the moment.

Another option for CCv2 is the built-in lightweight firewall, where you can limit the requests per minute. The 3 predefined limits are not very granular, but they should target bots, not real users.

 

 

joris_quenee
Product and Topic Expert

Hi @kai_unewisse,

From experience, none of the strategies you mention is efficient. When bots do not declare themselves, they often use a hyperscaler (AWS, Azure, GCP) to host their bot army. In this case, IP ranges can be changed easily.

And the SAP Commerce Cloud disallow list is not designed to be updated frequently. It is meant rather to allow only IPs from the CDN and/or from employees, or to block an entire geographical region (Asia, for example).

A lightweight firewall inside SAP Commerce (Tomcat) won't be efficient. It sits too far downstream. You would have to set up larger hardware to support the workload / processing...

Unfortunately, a modern CDN is the only serious option.