Every content or E-commerce player lives with the constant threat of web bot attacks and automated scraping aimed at stealing the site's data. The problem becomes serious when competitors use that data to undercut your pricing, at which point it threatens the business itself.
Besides the theft itself, the site also suffers availability problems, because these slow-moving bots consume valuable server resources while downloading the data. Many large E-commerce websites use Web Application Firewalls to detect application-level OWASP security attacks, or CDN systems to fend off IP- and DNS-based attacks. However, scraping is usually done by human-like browsers running on real machines, at a frequency low enough not to trigger the rate-limiting algorithm of a Web Application Firewall. Such traffic blends into the overall traffic, and the E-commerce player is generally unable to detect and stop it. The example below comes from a Gemini customer who was seeing a pattern of suspect traffic in the early morning hours.
Web Bot Detection:
One of the distinguishing features of Gemini is its integrated detection of web bots and scrapers. Apart from site speed and user experience details, we also detect bot or scraping behaviour and give insights about it. In this case, the Gemini customer is an E-commerce player with about 3-4 million daily page views. The scraping traffic, however, appears to be around 100K page views daily, or roughly 3% of the total traffic. Traffic this low, spread across a long time window, is generally very difficult to detect.
We have developed our algorithm so that it detects non-human behaviour with almost 90% probability, even at such low traffic volumes. Beyond the volume figures above, Gemini also identified the following characteristics of the suspect traffic.
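To put these numbers in perspective, here is a small sketch of the arithmetic. The function names are ours, and the per-second figure assumes the suspect traffic is spread evenly over 24 hours, which is a simplification for illustration only.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def traffic_share_percent(suspect_views: int, total_views: int) -> float:
    """Share of total page views attributed to suspect traffic, in percent."""
    return 100.0 * suspect_views / total_views

def average_rate_per_second(daily_views: int) -> float:
    """Average request rate, assuming an even spread across the day (an assumption)."""
    return daily_views / SECONDS_PER_DAY

# Figures from the article: about 100K suspect views against 3-4 million daily views.
share_low = traffic_share_percent(100_000, 4_000_000)   # 2.5
share_high = traffic_share_percent(100_000, 3_000_000)  # about 3.3
suspect_rate = average_rate_per_second(100_000)         # a little over 1 request per second
```

At roughly one request per second in total, divided further across many machines, no single source comes close to a typical rate-limiting threshold.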
- Referrer – 50% of the suspect traffic has no referrer. (This means 50% is direct landing traffic, initiated by someone on purpose.)
- Same Domain Referrer – The remaining 50% of the traffic has a same-domain referrer. This means most sessions probably consist of only 2 pages: the first page is a direct landing view, and the link to the next page is taken from the first page, most probably in an automated way.
- Viewport – Almost 30% of the suspect traffic has a fixed viewport of 1200X1100.
- Browser and version – Almost 32% of the suspect traffic comes from Chrome 86. This browser was launched in 2020, which makes it quite an old browser to be generating this specific behaviour.
- Traffic Pattern – The suspect traffic (37% of all suspects) lands primarily on the Home page and the Product Page. The traffic on the Product Page has no referrer, which means it is direct.
- IP Concentration – The suspect traffic is generated from multiple IP subnets, all showing the same patterns described above.
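As an illustration only, the signals listed above could be combined into a simple per-page-view check like the sketch below. This is not Gemini's actual algorithm: the `PageView` fields, the one-point-per-signal scoring and the threshold of 2 are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class PageView:
    ip: str
    referrer: Optional[str]        # None means a direct landing
    viewport: Tuple[int, int]      # (width, height)
    browser: str
    browser_major: int
    page_type: str                 # hypothetical labels such as "home" or "product"

# Values taken from the pattern described in the list above.
SUSPECT_VIEWPORT = (1200, 1100)
SUSPECT_BROWSER = ("Chrome", 86)

def signal_count(view: PageView) -> int:
    """Count how many of the described signals a single page view matches."""
    signals = 0
    if view.viewport == SUSPECT_VIEWPORT:
        signals += 1
    if (view.browser, view.browser_major) == SUSPECT_BROWSER:
        signals += 1
    if view.referrer is None and view.page_type == "product":
        signals += 1
    return signals

def is_suspect(view: PageView, threshold: int = 2) -> bool:
    """Flag a view when it matches at least `threshold` signals (hypothetical cutoff)."""
    return signal_count(view) >= threshold

bot_like = PageView("203.0.113.7", None, (1200, 1100), "Chrome", 86, "product")
human_like = PageView("198.51.100.4", "https://www.google.com/", (1920, 1080),
                      "Chrome", 120, "product")
```

No single signal is conclusive on its own, since a real visitor can land directly on a product page. Requiring several signals to coincide is what makes the pattern stand out.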
The E-commerce website is receiving a pattern of suspect traffic that is small enough to stay below the radar. It is generated from multiple machines with a fixed 1200X1100 viewport, spread across multiple IP subnets in different geographies, all using the Chrome 86 browser. The traffic lands on the Product Page with no referrer and then moves on to the Product Details Page with a referrer.
It is important to understand that Gemini also determines and detects these page patterns in an automated way, so no additional step is needed to detect the Level 1 suspect bots.
Blocking the Attack:
Once the Level 1 suspects have been detected, meaning all the traffic characteristics point to automated access to the site, Gemini insights are integrated with the customer's load balancer or Web Application Firewall to block them.
Our customers, especially the E-commerce players, like this additional functionality and use it to block web scraping attempts in a very cost-effective way. If you are interested in a free trial of Gemini, please do not hesitate to drop us an email.
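One way such an integration could look is sketched below: suspect IP addresses are grouped into subnets, and only the subnets with repeated hits are handed to the blocking layer. The function name, the /24 grouping and the `min_hits` cutoff are assumptions for illustration, the sketch handles IPv4 addresses only, and the rule format a given load balancer or Web Application Firewall expects will differ.

```python
import ipaddress
from collections import Counter
from typing import Iterable, List

def suspect_subnets(suspect_ips: Iterable[str],
                    prefix_len: int = 24,
                    min_hits: int = 3) -> List[str]:
    """Group suspect IPv4 addresses by subnet and return the CIDR blocks
    that were seen at least `min_hits` times."""
    counts = Counter(
        # strict=False masks the host bits, e.g. 203.0.113.5 -> 203.0.113.0/24
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        for ip in suspect_ips
    )
    return sorted(str(net) for net, hits in counts.items() if hits >= min_hits)

# Documentation-range addresses used as sample data.
sample_ips = ["203.0.113.5", "203.0.113.9", "203.0.113.77", "198.51.100.1"]
blocklist = suspect_subnets(sample_ips)
```

Blocking at subnet level is a trade-off: it covers machines in the same range that have not been seen yet, but it can also catch genuine visitors, so the cutoff deserves care.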