Google Yahoo Proxy Exploit – Protect your site against it

September 16, 2007

"Defence against the dark arts" sounds like something from Harry Potter, but webmasters really do need to defend their websites against some black-hat search engine techniques.

One such technique, which can be used as a black-hat attack to get a site into trouble with the leading search engines, is the proxy exploit. Used correctly, proxy servers help Internet users who, for whatever reason, must go through a proxy to reach sites they want to access legitimately.

The downside of proxy servers is that they tend to cache the sites accessed through them. The cached copy duplicates the original site, search engines then see it as duplicate content, and the original site is penalised for it. This can be devastating for a website, especially when the owner is unaware that a copy of the site exists elsewhere on the Internet.

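Bagi, one of the senior members of UK Webmaster Forums, has put together a hack that helps protect your website against this proxy duplicate-content exploit. Many proxies pass the visitor's user-agent string straight through, so when a search engine bot crawls your page via a proxy, your server receives a request that claims to be Googlebot (or MSNBot, or Yahoo! Slurp) but comes from the proxy's IP address. The hack catches such requests with a reverse DNS lookup on the requesting IP, confirmed by a forward lookup, and answers them with 403 Forbidden, so the search engine never receives a copy of your content at the proxy's address.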

A snippet of the code is shown below; you can find the complete code here.

<?php
// Run this before any output is sent: header() cannot change the status once output has started.

// Get the user agent (empty string if the client sent none).
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

// Check the user agent to see if it's identifying itself as a search engine bot.
if (stristr($ua, 'msnbot') || stristr($ua, 'googlebot') || stristr($ua, 'yahoo! slurp')) {
    // The user agent is purporting to be MSN's bot, Google's bot or Yahoo! Slurp.
    // If the user agent string is spoofed (or passed through by a proxy),
    // the requesting IP will not reverse-resolve to the crawler's domain.

    // Get the IP address requesting the page.
    $ip = $_SERVER['REMOTE_ADDR'];

    // Reverse DNS lookup the IP address to get a hostname.
    // gethostbyaddr() returns the IP unchanged if the lookup fails.
    $hostname = gethostbyaddr($ip);

    // Check for '.googlebot.com', '.search.live.com' and '.crawl.yahoo.net' in the hostname.
    if (!preg_match('/\.googlebot\.com$/', $hostname)
        && !preg_match('/\.search\.live\.com$/', $hostname)
        && !preg_match('/\.crawl\.yahoo\.net$/', $hostname)) {
        // The hostname belongs to none of googlebot.com, live.com or yahoo.net,
        // yet the UA claimed to be MSNBot, Googlebot or Yahoo! Slurp.
        $block = TRUE;
        header('HTTP/1.0 403 Forbidden');
        exit;
    } else {
        // Now we have a hit that half-passes the check. One last go:
        // forward DNS lookup the hostname to get an IP address.
        $real_ip = gethostbyname($hostname);
        if ($ip != $real_ip) {
            // The hostname does not point back to the requesting IP: spoofed.
            $block = TRUE;
            header('HTTP/1.0 403 Forbidden');
            exit;
        } else {
            // Real bot.
            $block = FALSE;
        }
    }
}
?>

You can also find this code, in Hungarian, on Bagi's blog.
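If you want to test the logic without making real DNS lookups, the same reverse-then-forward check can be factored into a function. This is a minimal sketch, not Bagi's code: the function name `should_block_request` and its two resolver parameters are hypothetical, and it assumes PHP 7.0 or later for the type declarations. In production you would pass PHP's built-in `gethostbyaddr` and `gethostbyname` as the resolvers; in tests you can pass stand-in functions.

```php
<?php
// Hypothetical helper (not part of the original hack): decides whether a
// request claiming to be a search engine crawler should get a 403.
// The DNS steps are injected as callables so the logic can run offline.

/**
 * @param string   $ua      The request's User-Agent header.
 * @param string   $ip      The requesting IP address (REMOTE_ADDR).
 * @param callable $reverse Maps an IP to a hostname, e.g. 'gethostbyaddr'.
 * @param callable $forward Maps a hostname to an IP, e.g. 'gethostbyname'.
 * @return bool TRUE if the request should be refused.
 */
function should_block_request(string $ua, string $ip, callable $reverse, callable $forward): bool
{
    // Same bot names as the snippet above, matched case-insensitively.
    $claims_bot = stripos($ua, 'msnbot') !== false
        || stripos($ua, 'googlebot') !== false
        || stripos($ua, 'yahoo! slurp') !== false;
    if (!$claims_bot) {
        return false; // Ordinary visitors are never blocked by this check.
    }

    // Step 1: reverse DNS must land in one of the crawler domains.
    $hostname = $reverse($ip);
    if (!is_string($hostname)
        || !preg_match('/\.(googlebot\.com|search\.live\.com|crawl\.yahoo\.net)$/i', $hostname)) {
        return true;
    }

    // Step 2: forward DNS of that hostname must point back to the same IP.
    return $forward($hostname) !== $ip;
}

// Usage in a page, before any output:
// if (should_block_request(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '',
//         $_SERVER['REMOTE_ADDR'], 'gethostbyaddr', 'gethostbyname')) {
//     header('HTTP/1.0 403 Forbidden');
//     exit;
// }
```

Injecting the resolvers keeps the decision separate from the network, so the ordinary-visitor, proxied-bot, spoofed-bot and genuine-bot cases can each be checked with fake lookups.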


About the author: Temi



© 2009 Temi Webmaster Blog. All Rights Reserved.