Sometimes I see a sudden jump in the hits through ShortStat, and I’m left to wonder whether some busy site has linked to Nomadig.com, a search engine robot has gone wild or the site has been sucked by an offline browser.
The spikes of the first kind are more than welcome. More traffic here, more potential audience to be served. The second is usually okay, but there are some search engines that nobody is really interested in and their robots are not behaving properly.
The third category drives me nuts: offline browsers are stupid enough to suck down every page that is linked anywhere on the site. This causes problems when the same page can be reached through different URLs and the site uses these variations. The same page can then be fetched several times, increasing bandwidth usage that, at the end of the day, costs me money. And these suckers are fast; they can fetch several pages in a second, eating bandwidth and processing power from the rest of you.
Until today, I wasn’t able to tell what caused a spike without downloading the raw log file and then trying to analyse it by finding an IP address that showed up again and again in a relatively short period. As the log file contains information about all downloads, including image, JavaScript and CSS files, the task is burdensome.
A week ago I had an a-ha moment, when I suddenly realised that all the required information is stored in the ShortStat database. I could fetch the data with a relatively simple SQL statement and then organise it into a suitable structure for rapid analysis and print-out.
The process is very simple:
- Read all requests for the past X seconds from the database, organised by the IP number and the browser string.
- Count the number of requests per distinct IP number and browser string pair. Store the first and the last access time of these requests.
- Calculate the time span between the first and the last access, and use that number to calculate average hits per second.
- Show the user the number of hits, average hits per second, time span, IP address and browser string, sorted by the number of hits. There is a lower limit for the hits to keep the list relatively short.
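The steps above can be sketched roughly as follows. This is not the code running on the site, just a minimal Python illustration. It assumes the rows for the past X seconds have already been read from the database as (IP, browser string, Unix timestamp) tuples; the names find_leeches, Leech and min_hits are made up for this example, and treating a span shorter than one second as one second is my own choice to avoid dividing by zero.

```python
from collections import namedtuple

# Hypothetical record type for one line of the report.
Leech = namedtuple("Leech", "hits rate span ip browser")


def find_leeches(rows, min_hits=10):
    """Summarise requests per (IP, browser) pair.

    rows: iterable of (ip, browser, unix_time) tuples, assumed to be
    already limited to the time window of interest.
    """
    stats = {}  # (ip, browser) -> [hits, first_seen, last_seen]
    for ip, browser, ts in rows:
        entry = stats.get((ip, browser))
        if entry is None:
            stats[(ip, browser)] = [1, ts, ts]
        else:
            entry[0] += 1
            entry[1] = min(entry[1], ts)
            entry[2] = max(entry[2], ts)

    report = []
    for (ip, browser), (hits, first, last) in stats.items():
        if hits < min_hits:
            continue  # lower limit keeps the list short
        span = last - first
        # Assumption: count a span under one second as one second.
        rate = hits / max(span, 1)
        report.append(Leech(hits, rate, span, ip, browser))

    # Busiest visitors first.
    report.sort(key=lambda r: r.hits, reverse=True)
    return report
```

Printing the result is then a simple loop over the returned list, one line per IP and browser pair.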
At first I fancied doing the whole thing in a single SQL statement, but the time span calculation was an impossible feat for MySQL, at least without creating huge unions.
If you are interested in this, go to my ShortStat page at www.nomadig.com/shortstat/ and click the Leeches link at the top of the page to see who has been leeching Nomadig.com during the past 24, 48 or 72 hours.
I can make the code available, if someone is interested in using the system.