
Getting instant download statistics

Submitted on August 23, 2004

Introduction

One of the most important measures of website activity is the resource access statistics. This information serves many purposes: optimizing the website content, improving marketing campaigns, and running diagnostic tests. The web server saves detailed information about resource access in its log file(s).

Many applications and tools, such as "WebTrends Log Analyser" (by http://www.webtrends.com), can parse the web server activity logs, compile the statistics, and display them in a user-friendly format. Most of these programs report resource access statistics for some fixed time interval. Such report generators also need some time to process the log files and prepare the reports.

In this article we present a simple ASP.NET application that walks through the web server activity logs, parses them on the fly, and displays a summary report for each fixed time interval (day, month, year) in chronological order.

Log File Parsing

The ASP.NET application needs access to the web server activity log files in order to parse them. For demo purposes we assume that the test web server is configured to save all log files on the same PC where the ASP.NET application runs. All we need to do is read the log files in the appropriate order, parse each of them, and count all occurrences of the given key phrase, lexeme, or resource name.

We also assume that the web server starts a new log file every day and names it using the file mask "exYYMMDD.log", where YY is the two-digit year of the log file creation date, MM is the month, and DD is the day. This lets us take the date from the file name instead of parsing each log file to extract it.
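
The naming convention can be sketched in both directions: building a file name from a date, and recovering the date from a file name. This is only an illustration that assumes the two-digit-year pattern ("yyMMdd") used by the code in this article; the class name LogFileNames and both method names are hypothetical and are not part of the downloadable sample.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical helper, not part of the article's sample code.
public static class LogFileNames
{
    // Builds the daily log file name for a date, e.g. ex040823.log.
    public static string ForDate(DateTime date)
    {
        return "ex" + date.ToString("yyMMdd", CultureInfo.InvariantCulture) + ".log";
    }

    // Recovers the date from a name such as "ex040823.log".
    // Returns false when the name does not match the assumed mask.
    public static bool TryGetDate(string path, out DateTime date)
    {
        date = DateTime.MinValue;
        string name = Path.GetFileNameWithoutExtension(path);
        if (name == null || !name.StartsWith("ex", StringComparison.OrdinalIgnoreCase))
            return false;
        return DateTime.TryParseExact(name.Substring(2), "yyMMdd",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out date);
    }
}
```

For example, TryGetDate("ex040823.log", out date) succeeds and yields 23 August 2004, while a name that does not start with "ex" is rejected.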

Finally, the algorithm that iterates through the log files and counts all occurrences of the specified phrase is shown below:

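// Counts the lines of one log file that contain checkWord (case-insensitive).
int ProcessFile(string fileName, string checkWord) {
   int wordCount = 0;

   // FileShare.ReadWrite lets us read a log that the web server is still writing to.
   // The using blocks close the reader and the stream even if reading throws.
   using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
   using (StreamReader sr = new StreamReader(fs)) {
      string s;
      while ((s = sr.ReadLine()) != null) {
         if(s.IndexOf(checkWord, StringComparison.OrdinalIgnoreCase) > -1) {
            wordCount++;
         }
      }
   }
   return wordCount;
}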

// Walks the daily log files from startDate to endDate (inclusive) and sums the counts.
// LogPath is assumed to be a field that holds the log directory path.
int ProcessFilesByDate(string checkWord, DateTime startDate, DateTime endDate) {
   int totalWordCount = 0;

   for(DateTime dt = startDate; dt <= endDate; dt = dt.AddDays(1)) {
      string file = String.Format("ex{0}.log", dt.ToString("yyMMdd"));
      file = Path.Combine(LogPath, file);

      // Days without a log file are skipped.
      if(File.Exists(file)) {
         int wordCount = ProcessFile(file, checkWord);
         totalWordCount += wordCount;
         // Print the wordCount value for the date dt.
         PrintLogFileWordCount(dt.ToString("dd MMM yyyy"), wordCount);
      }
   }
   return totalWordCount;
}
 

Displaying the statistics on the web page

The resource access statistics can be displayed chronologically for each time interval. This representation is helpful when you want to know the download statistics of a specified resource per time interval (e.g., daily). The code below is the helper that the file enumeration algorithm from the previous section calls to print one table row per date:

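// Table control declared on the page; each call below appends one row to it.
protected System.Web.UI.WebControls.Table tblLogFileWordCount;

// Adds a row with a label (a file name or a formatted date) and its count.
void PrintLogFileWordCount(string file, int wordCount) {
   TableRow row = new TableRow();
   tblLogFileWordCount.Rows.Add(row);

   // First cell: the label. Path.GetFileName strips any directory part;
   // a plain date label passes through unchanged.
   TableCell cell = new TableCell();
   row.Cells.Add(cell);
   cell.Width = Unit.Percentage(20);
   cell.Text = String.Format("{0}:", Path.GetFileName(file));

   // Second cell: the number of matching log records.
   cell = new TableCell();
   row.Cells.Add(cell);
   cell.Width = Unit.Percentage(80);
   cell.Text = wordCount.ToString();
}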
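
The introduction mentions monthly and yearly intervals, while the code above prints one row per day. One possible way to obtain monthly totals is to collect the daily counts first and then sum them per calendar month, as in the sketch below. The class IntervalTotals and the method ByMonth are hypothetical names, and the generic collections assume .NET 2.0 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the article's sample code.
public static class IntervalTotals
{
    // Sums daily counts into one total per calendar month.
    // Each key is the first day of a month; SortedDictionary keeps
    // the months in chronological order.
    public static SortedDictionary<DateTime, int> ByMonth(IDictionary<DateTime, int> dailyCounts)
    {
        SortedDictionary<DateTime, int> totals = new SortedDictionary<DateTime, int>();
        foreach (KeyValuePair<DateTime, int> day in dailyCounts)
        {
            DateTime month = new DateTime(day.Key.Year, day.Key.Month, 1);
            int current;
            totals.TryGetValue(month, out current); // current stays 0 for a new month
            totals[month] = current + day.Value;
        }
        return totals;
    }
}
```

Each entry of the result could then be passed to PrintLogFileWordCount with a label such as month.ToString("MMM yyyy"). Yearly totals follow the same pattern with new DateTime(day.Key.Year, 1, 1) as the key.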

Statistics for multithreaded downloads

Many users rely on special programs to download large files more efficiently. Such programs (download managers) usually fetch a single web resource in several download threads simultaneously. The web server writes a separate log record for each download thread. To keep our log parser from counting these duplicate records, we extract the user IP from each log record and check it against all previously extracted IPs:

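// IPs seen so far. The table is never cleared here, so an IP is counted once
// across all files processed with this table.
Hashtable ipList = new Hashtable();

// Returns true the first time an IP is seen and remembers it.
// An empty IP (unparsable line) is never stored, so such lines always count.
bool IsNewIp(string ipString) {
   bool result = !ipList.Contains(ipString);
   if(result && !ipString.Equals(String.Empty))
      ipList.Add(ipString, ipString);
   return result;
}

// Extracts the third space-separated field of a log line.
// This assumes the server is configured to log the client IP in that position.
string GetIp(string line) {
   int ind = line.IndexOf(" ");           // end of the first field
   if(ind > -1)
      ind = line.IndexOf(" ", ind + 1);   // end of the second field
   if(ind > -1) {
      int indEnd = line.IndexOf(" ", ind + 1);
      if(indEnd > -1)
         return line.Substring(ind + 1, indEnd - ind - 1);
   }
   return String.Empty;
}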

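// Same as the earlier ProcessFile, but counts a matching record only
// when its IP has not been seen before.
int ProcessFile(string fileName, string checkWord) {
   int wordCount = 0;

   // The using blocks close the reader and the stream even if reading throws.
   using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
   using (StreamReader sr = new StreamReader(fs)) {
      string s;
      while ((s = sr.ReadLine()) != null) {
         if(s.IndexOf(checkWord, StringComparison.OrdinalIgnoreCase) > -1) {
            if(IsNewIp(GetIp(s))) {
               wordCount++;
            }
         }
      }
   }
   return wordCount;
}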
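
GetIp relies on the client IP being the third field of every record. W3C extended logs normally begin with a "#Fields:" directive that lists the logged columns, so a parser can look up the position of the c-ip field instead of hard-coding it. The following is a minimal sketch of that idea, not the approach taken in the downloadable sample; the class W3CLog and its methods are hypothetical, and HashSet assumes .NET 3.5 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the article's sample code.
public static class W3CLog
{
    // Returns the zero-based column of fieldName in a "#Fields:" directive line,
    // or -1 if the line is not such a directive or the field is absent.
    public static int FieldIndex(string directiveLine, string fieldName)
    {
        const string prefix = "#Fields:";
        if (directiveLine == null || !directiveLine.StartsWith(prefix, StringComparison.Ordinal))
            return -1;
        string[] names = directiveLine.Substring(prefix.Length)
            .Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return Array.IndexOf(names, fieldName);
    }

    // Counts the distinct client IPs among the records that mention checkWord.
    // Records that appear before any "#Fields:" directive are ignored.
    public static int CountUniqueClients(IEnumerable<string> lines, string checkWord)
    {
        int ipColumn = -1;
        HashSet<string> seen = new HashSet<string>(StringComparer.Ordinal);
        foreach (string line in lines)
        {
            if (line.StartsWith("#", StringComparison.Ordinal))
            {
                // Directive line: remember the c-ip position if this is "#Fields:".
                int index = FieldIndex(line, "c-ip");
                if (index >= 0) ipColumn = index;
                continue;
            }
            if (ipColumn < 0 || line.IndexOf(checkWord, StringComparison.OrdinalIgnoreCase) < 0)
                continue;
            string[] fields = line.Split(' ');
            if (ipColumn < fields.Length)
                seen.Add(fields[ipColumn]); // Add ignores IPs that are already present
        }
        return seen.Count;
    }
}
```

Creating a new set for each call gives a per-file (that is, per-day) count of unique clients. Sharing one set across files, as ipList does above, counts each IP once for the whole date range instead.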

Source Code and working sample

The full source code of all classes described in this article can be downloaded at statistic.zip

This code is constantly being refined and improved and your comments and suggestions are always welcome.

With best regards,
Sergey Shirokov 
Clever Components team.
