Web Crawler & User Agent Blocking Techniques

This is a simple script that lets attackers block specific crawlers by checking the user-agent sent with each request. It is useful when they want to keep certain traffic from loading certain content, usually a phishing page or a malicious download.

if (preg_match('/bot|crawler|spider|facebook|alexa|twitter|curl/i', $_SERVER['HTTP_USER_AGENT'])) {
    // logger() is the script's own helper; its definition is not shown in this excerpt
    logger("[BOT] {$_SERVER['REQUEST_URI']} - 500");

    header('HTTP/1.1 500 Internal Server Error');
    exit();
}

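Using preg_match with the case-insensitive /i flag, the script looks for known crawler substrings anywhere in the user-agent. When one matches, it logs the request, returns an HTTP 500 error, and exits, so the crawler never receives the page content.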
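
To see which visitors the pattern actually catches, here is a minimal sketch that runs the same regular expression against a few sample user-agent strings. The function name matches_crawler_pattern and the sample strings are assumptions made for illustration and are not part of the original script. Treating a missing user-agent as "no match" is also an assumption.

```php
<?php
// Hypothetical helper: the regex is copied from the snippet above,
// but the function name and null handling are illustrative assumptions.
function matches_crawler_pattern(?string $userAgent): bool
{
    // Assumption: a request without a User-Agent header is not treated as a bot.
    if ($userAgent === null) {
        return false;
    }

    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match('/bot|crawler|spider|facebook|alexa|twitter|curl/i', $userAgent) === 1;
}

// Sample user-agent strings chosen for illustration only.
$samples = [
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)',
    'curl/8.4.0',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0',
];

foreach ($samples as $ua) {
    $verdict = matches_crawler_pattern($ua) ? 'blocked (500)' : 'served';
    echo str_pad($verdict, 14) . $ua . "\n";
}
```

Because the check is a plain substring match on a header the client controls, the first three samples would be blocked while the browser-like string would be served. This is also why a researcher investigating a suspicious page may need to request it with a browser-like user-agent to see the hidden content.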

Continue reading Web Crawler & User Agent Blocking Techniques at Sucuri Blog.

Author: mcnmm
