Free Energy | searching for free energy and discussing free energy

Discussion board help and admin topics => Help to access this discussion board => Topic started by: hartiberlin on January 09, 2008, 11:10:48 AM

Title: Google crawler stealing all the bandwidth traffic..
Post by: hartiberlin on January 09, 2008, 11:10:48 AM
Hi All,
although I am still on vacation, I was able to analyse the traffic on this forum, and unfortunately the Google bot crawler generates about 10 to 20 times more traffic than all the power users here.
I already set the option in Google Webmaster Tools to crawl this site more slowly, but this did not help either.
Also, the Google bot does not follow the Crawl-delay parameter in robots.txt.

Is there any other way to stop the Google bot from crawling so fast?
Maybe it is because of the AdSense ads?
Any help would be greatly appreciated,
maybe setting it somehow to return error 503 (temporarily unavailable), so that the site is not thrown out of the index?
Many thanks.
Regards, Stefan.
Title: Re: Google crawler stealing all the bandwidth traffic..
Post by: helmut on January 09, 2008, 11:22:29 AM
Hi Stefan
Don't forget to enjoy your vacation.
The world will keep on turning

helmut
Title: Google crawler stealing all the bandwidth traffic..
Post by: Earl on January 12, 2008, 10:30:03 PM
If your Web server is running under Linux, you could use a cron job to overwrite robots.txt so that for only x hours per night robots.txt says

User-agent: *
Allow: /

The rest of the day it is overwritten to show

User-agent: *
Disallow: /

For example, make a file called allow.txt, and the cron job would say
cp allow.txt robots.txt

and a file called disallow.txt, and the cron job would say
cp disallow.txt robots.txt

allow.txt and disallow.txt are the same except for one line, and each contains
the entire robots.txt
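
A minimal sketch of the swap described above, assuming allow.txt and disallow.txt sit next to robots.txt in the web root. The function name swap_robots, the script path, the web root path and the cron times are all hypothetical and not taken from this thread. It copies to a temporary file and then renames it, so a crawler should not fetch a half-written robots.txt:

```shell
#!/bin/sh
# Hypothetical helper: swap_robots allow|disallow [webroot]
# Copies allow.txt or disallow.txt over robots.txt in the given directory.
swap_robots() {
    mode="$1"
    webroot="${2:-.}"   # assumed default: current directory

    case "$mode" in
        allow|disallow) ;;
        *)
            echo "usage: swap_robots allow|disallow [webroot]" >&2
            return 1
            ;;
    esac

    # Write to a temp name first, then rename it into place.
    cp "$webroot/$mode.txt" "$webroot/robots.txt.tmp" &&
        mv "$webroot/robots.txt.tmp" "$webroot/robots.txt"
}

# When saved as a script and called with arguments (e.g. from cron), run the swap.
if [ "$#" -gt 0 ]; then
    swap_robots "$@"
fi

# Example crontab entries (assumed times and paths):
# open the site to crawlers at 01:00, close it again at 05:00
# 0 1 * * * /usr/local/bin/swap_robots.sh allow /var/www/html
# 0 5 * * * /usr/local/bin/swap_robots.sh disallow /var/www/html
```

How quickly a given crawler notices the changed robots.txt depends on how often it re-fetches the file, so the x-hour window may not be followed exactly.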
For your info, Slurp (Yahoo/AV) and the MSFT bots obey Crawl-delay;
Googlebot does not yet, but most likely will in 2.1+.
Less than 35 percent of servers have a robots.txt file.
This is crazy, but over 75,000 robots.txt files have pictures in them!

Regards, Earl


Title: Re: Google crawler stealing all the bandwidth traffic..
Post by: hartiberlin on January 12, 2008, 11:52:40 PM
Hi Earl,
nice idea !
This sounds like an easy solution.
Many thanks for this tip.

I just wonder whether Google will try again after a few hours to access
my site once it has already been blocked?
Title: Re: Google crawler stealing all the bandwidth traffic..
Post by: amigo on January 13, 2008, 12:49:38 AM
You could use .htaccess in the web root with mod_rewrite (if this server runs Apache) to effectively block Google, or have time-based rules tied to scripts that check the last visited time, etc.

It really depends on what the ultimate goal is, but mod_rewrite is pretty powerful, though it has a steep learning curve to begin with. :)
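
One way the mod_rewrite idea could look, combined with the 503 suggestion from the first post. This is only a sketch under assumptions: the match on "Googlebot" in the user agent, the 06 to 23 hour window and the output file name are illustrative, and it presumes mod_rewrite is enabled and that the Apache version in use accepts a non-3xx status code in the R flag. The shell snippet just writes the rules to a scratch file so they can be reviewed before anything is merged into a real .htaccess:

```shell
#!/bin/sh
# HTACCESS is an assumed scratch path, not the live .htaccess.
HTACCESS="${HTACCESS:-./htaccess.sample}"

# Quoted heredoc delimiter: the shell leaves the rule text untouched.
cat > "$HTACCESS" <<'EOF'
RewriteEngine On
# Match any user agent containing "Googlebot" (case-insensitive).
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
# TIME_HOUR is the server's two-digit hour; ">05" is a lexical comparison,
# so this condition holds from 06 through 23.
RewriteCond %{TIME_HOUR} >05
# Answer with status 503 instead of serving the page.
RewriteRule ^ - [R=503,L]
EOF

echo "Wrote sample rules to $HTACCESS"
```

With these rules the bot would only get pages between 00:00 and 05:59 server time. Whether Google keeps the site in its index while seeing 503 for most of the day is something to watch rather than assume.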
Title: Re: Google crawler stealing all the bandwidth traffic..
Post by: DrStiffler on January 17, 2008, 11:35:31 PM
Well there are still some problems....

When I post a message it goes to neverland, and it may take or it may not. Loading is slow (too much being downloaded to the local machine). If it takes 45 to 180 seconds or longer to download a page or to add a message, then what needs to be done?
Title: Re: Google crawler stealing all the bandwidth traffic..
Post by: hartiberlin on January 18, 2008, 12:30:30 AM
Hmm,
I have now blocked all the spiders, and from my location the site runs very well.
Please post a few traceroute results, so we can see where the
bottlenecks are.
Many thanks.
P.S.: If you post new pictures, be sure that the picture name is
really new and was not used by another user already,
so name the pics:
my_username_pic01.jpg
my_username_pic02.jpg
etc...

Also, it is wise to copy the written text to the Windows clipboard (Ctrl+C)
before hitting the Post button,
so that you still have a copy in case the server times out or does not accept it...
Title: Re: Google crawler stealing all the bandwidth traffic..
Post by: Paul-R on January 18, 2008, 04:06:31 PM
Quote from: hartiberlin
Hmm,
I have now blocked all the spiders, and from my location the site runs very well.

On the subject of bandwidth, every post has a tick box labelled:

"Notify me of replies"

and this tick box defaults to the "Yes" flag. Why not set the
software so that either this service is omitted, or it defaults
to the "No" position, and people have to change it actively to
get it to work?
Paul.
Title: Re: Google crawler stealing all the bandwidth traffic..
Post by: gri on January 18, 2008, 04:47:11 PM
Quote from: Paul-R in ~Google crawler stealing all the bandwidth traffic..~ (http://www.overunity.com/index.php?topic=3898.msg71393#msg71393)
On the subject of bandwidth, every post has a tick box labelled:

"Notify me of replies"

and this tick box defaults to the "Yes" flag.

Why not set the software so that either this service is omitted,
or it defaults to the "No" position,
and people have to change it actively to get it to work?
Paul.

Paul-R,

if a user does not clearly see the results of a setting,
he generally does not know of its existence at all.

It is quite easy and natural to switch off
those settings which are _visible_ to a user.

For example, if a user is not getting notifications by default,
he will think that the SMF developers' monkey team
has not yet begun to develop the human notification system.

But they have begun already. They are just doing it too slowly.
At the tempo of Evolution.
Title: Re: Google crawler stealing all the bandwidth traffic..
Post by: bolt on January 21, 2008, 08:00:05 AM
This site has got way too slow with all the linked ads. For every page refresh a user makes, the main server has to go off and collect all those ad links too, as well as building the page. For end users it's not a pleasant experience taking time to browse and then being subjected to too many ads in the text.

My solution: use Firefox and click Add-ons. Install Adblock Plus and add the following zap lines to the Adblock filters; then the pages here load clean and fast, with no more stupid adverts!

Result: a squeaky clean page  ;D

http://www.shareasale.com/
http://shopcloud.chitika.net/
http://pagead2.googlesyndication.com/
http://mm.chitika.net/
http://bdv.bidvertiser.com/

If you spot any more, just right-click on the advert and click Adblock to zap it.