Understanding Web Statistics

MWS Newsletter: Volume 4, Issue 1

First a quick announcement: Margaret and I are going to be in New York for the Folio Show next week. If you'll be at the show or if you're in the New York area and you'd like to set up a meeting with us, please give me a call or send an email.

Now on to the subject of this newsletter: Web stats. I'm going to try to keep this short and to the point. This is fundamental information that anyone who's trying to make money off of Web content needs to understand, but I'm going to approach it from a slightly different angle than usual.

Rather than simply defining the terms "hit", "page view", "visit", and "referrer" (which you should know, or look up now if you don't), I'm going to show you a small section of a Web server log file and explain how these stats are derived from it. Many Web stat collection systems (such as Google Analytics and Omniture) now use JavaScript and cookies to collect visitor data, but the basic idea is the same.

Every time a file is requested from a Web server, a line is written to a text file. There are several different formats for logging Web server requests, but one of the most common and most useful is the "Combined Log Format". Here are two example lines from a log in Combined Log Format:

127.0.0.1 - - [27/Jul/2006:18:00:41 -0700] "GET /aboutme.html HTTP/1.0" 200 1206 "http://www.example.com/start.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.5) Gecko/20060719 Firefox/1.5.0.5"
127.0.0.1 - - [27/Jul/2006:18:00:56 -0700] "GET /picture.jpg HTTP/1.0" 200 65320 "http://www.example.com/aboutme.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.5) Gecko/20060719 Firefox/1.5.0.5"

If you take apart each of these lines, you'll see that they aren't that complicated, and they contain quite a bit of information. Here's a quick rundown, in order, of each important piece of data in this log file format:

  • address of the computer requesting the file (127.0.0.1)
  • two dashes, each standing in for a missing value (the client's identd identity and the logged-in user name). These two spots are usually not filled in or useful.
  • date, time, and time zone on the server
  • file requested and protocol used (aboutme.html, picture.jpg, HTTP 1.0)
  • status code (200 = success)
  • size of the file sent to the visitor (1206 bytes for the first line)
  • the page containing the link that the visitor followed to reach this file (http://www.example.com/start.html). This is known as the referrer, which the HTTP specification spells "referer"
  • user agent (this contains information about the visitor's browser and computer)
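
To make the rundown above concrete, here is a minimal Python sketch that splits one Combined Log Format line into the fields just listed. The regular expression, the field names (host, path, referer, and so on), and the parse_line function are my own illustration, not part of any particular log analysis package, and a production parser would need to handle more edge cases.

```python
import re
from datetime import datetime

# One Combined Log Format line. The group names are illustrative choices,
# not names defined by the format itself.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # A dash in the size position means no body was sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return fields

line = (
    '127.0.0.1 - - [27/Jul/2006:18:00:41 -0700] "GET /aboutme.html HTTP/1.0" '
    '200 1206 "http://www.example.com/start.html" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.5) '
    'Gecko/20060719 Firefox/1.5.0.5"'
)
entry = parse_line(line)
print(entry["host"], entry["path"], entry["status"], entry["size"])
# prints: 127.0.0.1 /aboutme.html 200 1206
```

Everything else in this newsletter (hits, page views, visits) is calculated from fields like these.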

You can find out more about each of the parts by visiting: http://httpd.apache.org/docs/2.2/logs.html (I'm trying to keep this short, after all). I will say this much, just because it's so important: each time any file is successfully downloaded from a Web server, it counts as a "hit". Each time a Web page is downloaded, it counts as a "page view". In the above example, the file aboutme.html contains an image (picture.jpg). The two files download separately and register two hits but only one page view. Page views are therefore the more accurate measurement of how much traffic a site gets. Hopefully you already know this (see note 1 below).
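
Here is a rough Python sketch of the hit versus page view distinction, run against the two example lines (with the user agent shortened to keep the lines readable). It rests on two assumptions of my own: only requests with status 200 count as successful, and only paths ending in .html, .htm, or a slash count as pages. Real log analysis software lets you configure both, so treat this as an illustration rather than the way any particular tool does it.

```python
import re

# Pulls the requested path and the status code out of a log line.
REQUEST = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

# Assumption: files with these extensions are "pages"; everything else
# (images, stylesheets, scripts) only counts as a hit.
PAGE_EXTENSIONS = (".html", ".htm")

def count_hits_and_page_views(lines):
    """Return (hits, page_views) for an iterable of log lines."""
    hits = 0
    page_views = 0
    for line in lines:
        match = REQUEST.search(line)
        # Assumption: only status 200 counts as a successful download.
        if match is None or match.group("status") != "200":
            continue
        hits += 1
        path = match.group("path").split("?")[0]  # ignore any query string
        if path.endswith("/") or path.endswith(PAGE_EXTENSIONS):
            page_views += 1
    return hits, page_views

log_lines = [
    '127.0.0.1 - - [27/Jul/2006:18:00:41 -0700] "GET /aboutme.html HTTP/1.0" '
    '200 1206 "http://www.example.com/start.html" "Mozilla/5.0"',
    '127.0.0.1 - - [27/Jul/2006:18:00:56 -0700] "GET /picture.jpg HTTP/1.0" '
    '200 65320 "http://www.example.com/aboutme.html" "Mozilla/5.0"',
]
totals = count_hits_and_page_views(log_lines)
print(totals)  # prints: (2, 1)
```

A page with twenty images would register twenty-one hits under this scheme, which is why hit counts say so little about real traffic.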

The raw data that can be found in a single line of a log file is pretty impressive. However, even more important is what can be calculated, or at least estimated, from multiple lines of a log file. For example, by looking at all of the logged hits over a certain period of time, log file analysis software can figure out how long someone stayed on your site, which keywords people most commonly use to find your site (and on which search engine), or how many unique visitors your site got during a specified period of time.
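
As a sketch of how such estimates might be made, the Python below groups log lines by visitor and measures how long each visit lasted. The assumptions are mine and are deliberately simple: a "visitor" is a unique combination of address and user agent, and a visit ends after 30 minutes without a request. Actual analysis packages use their own rules (and often cookies), so the numbers they report can differ.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Captures the address, timestamp, and user agent from a log line.
LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Assumption: a gap longer than this starts a new visit.
VISIT_TIMEOUT = timedelta(minutes=30)

def summarize_visits(lines):
    """Return (unique_visitors, list of visit durations in seconds)."""
    times_by_visitor = defaultdict(list)
    for line in lines:
        match = LINE.match(line)
        if match is None:
            continue
        when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        # Assumption: address plus user agent identifies one visitor.
        times_by_visitor[(match.group("host"), match.group("agent"))].append(when)

    durations = []
    for times in times_by_visitor.values():
        times.sort()
        start = previous = times[0]
        for current in times[1:]:
            if current - previous > VISIT_TIMEOUT:
                # The gap was too long: close the old visit, start a new one.
                durations.append((previous - start).total_seconds())
                start = current
            previous = current
        durations.append((previous - start).total_seconds())
    return len(times_by_visitor), durations

log_lines = [
    '127.0.0.1 - - [27/Jul/2006:18:00:41 -0700] "GET /aboutme.html HTTP/1.0" '
    '200 1206 "http://www.example.com/start.html" "Mozilla/5.0"',
    '127.0.0.1 - - [27/Jul/2006:18:00:56 -0700] "GET /picture.jpg HTTP/1.0" '
    '200 65320 "http://www.example.com/aboutme.html" "Mozilla/5.0"',
]
visitors, durations = summarize_visits(log_lines)
print(visitors, durations)  # prints: 1 [15.0]
```

Note that the duration only covers the time between the first and last request. The log cannot tell you how long the visitor looked at the last page, which is one reason these figures are estimates.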

All of this data is essential for knowing how your site is doing, for figuring out where to focus your efforts, and for comparing it with other sites. But wouldn't it be nice if there were a number that you could use that would give you a handle on how people feel about your content, or how "engaged" they are in your content? Yes, of course. Is it possible? Sort of. Defining and measuring "engagement" will be the topic of next month's newsletter.

In the meantime, please contact me if you have any questions.

Thanks,
Chris

Chris Minnick
Minnick Web Services
www.minnickweb.com
www.ebookhost.com
Phone: 916-551-1453
Fax: 916-551-1454

------
1. For more information about meaningless numbers and ways in which Web stats can be misrepresented or artificially inflated, I highly recommend the following article: http://www.emediastrategist.com/blog/?p=17.

------

If you would like to suggest a topic for a future newsletter issue, please send mail to newsletter@minnickweb.com

In the meantime:

For more information about how Minnick Web Services can help you achieve your goals, please visit our Web site, www.minnickweb.com, or contact us.

To find out more about our digital proofing, publishing, and reporting technologies, visit the eBookHost demo at demo.ebookhost.com.
