Analog is a fast and flexible web log analysis tool. Its configuration can consist of several files nested using include statements. This allows common configuration items to be grouped in separate files. The minimal site specific configuration items can be contained in small include files. Similarly, time period specific include files allow for reports by time period to be easily configured. Each report then requires a configuration file, which includes a few other files.
I have reviewed and updated my previous documentation for analog. This site is hosted on a new server, and I needed to set up analog for the new server. I also made changes to the list of virtual sites being hosted. I generate report sets for each site as well as an overview report for all sites. Each report set includes reports covering the latest week, month, and year of data.
Setting up Apache2
To be able to report on multiple sites, it is important to record the site information in the access log files. The vhost log format is designed to do this. This allows a single log file include file to serve all sites. Alternatively, each site can have its own access log file. Analog can be configured to allow you to mix both types of log files, should you wish to change format without modifying existing files.
The DEFAULTLOGFORMAT you use must match your log files. You can specify multiple formats. The access log format I use is a variation on the vhost_combined format. It differs from the Apache vhost_combined format as follows:
- the remote host is recorded by address rather than by name;
- the remote logname is replaced with the time taken to serve the request.
    LogFormat "%v:%p %a %T %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" local
    LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
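A quick way to confirm that the server is writing the local format rather than vhost_combined is to look at the third field: it should be a whole number of seconds, where vhost_combined has the remote logname (usually "-"). The following is a minimal sketch; the function name check_local_format is made up for illustration.

```shell
#!/bin/sh
# Read access log lines on stdin and print "vhost address seconds" for each
# line whose third field is a whole number (the %T time taken).
# Any other line is reported as a mismatch with its line number.
check_local_format() {
    awk '{
        split($1, vp, ":")              # field 1 is vhost:port
        if ($3 ~ /^[0-9]+$/)
            print vp[1], $2, $3
        else
            print "MISMATCH line " NR
    }'
}
```

For example, `check_local_format < /var/log/apache2/access.log | grep -c MISMATCH` counts the lines that do not look like the local format.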
Common Configuration
The documentation includes a number of sample configuration files. On Ubuntu or Debian systems these are located in /usr/share/doc/analog/examples. You may find bigbyrep.cfg is a good starting point for your reports. This documentation also contains pointers to sites which have configurations for robots, search engines, typealiases, and spam sites. The configuration below assumes you have downloaded these, and includes them.
Create a directory for your configuration files. This example uses ${HOME}/etc. Place all the files you have selected in it. This will simplify your configuration.
These examples assume all reports are on a common reporting site. You will need to create directories on a web site for the reports and images on each site hosting reports. These directories need to be writable by the user running the reports, and readable by the web server. These examples do not include securing access to the reports from the web site. Do not run these reports as root or as the Apache server's user-id.
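Creating the report and image directories can be sketched as below. The function name make_report_dirs and the choice of mode 755 (owner writable, world readable) are assumptions; adjust them to your own access policy.

```shell
#!/bin/sh
# Create a report directory and its images subdirectory.
# Mode 755 lets the owner (the report user) write and everyone else,
# including the web server, read and traverse.
make_report_dirs() {
    base=$1
    mkdir -p "$base/images" && chmod 755 "$base" "$base/images"
}
```

Run it as the user that will generate the reports, for example `make_report_dirs /var/www/webstats`, so that user owns the directories. The parent directory must already allow that user to create them.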
Tune the bigbyrep.cfg configuration for one site using one or two log files. This will become the basis for all your reports. Create a copy of your bigbyrep.cfg file as bigbyrep.inc. Ensure all the lines in the Logfile Input section are commented out. Also ensure the lines for IMAGEDIR, CHARTDIR, and LOCALCHARTDIR are commented out. These lines will be supplied on a per-report basis. Now you can create your common.inc file containing something like the following.
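Commenting out the per-report lines can be done by hand, or with a small filter such as the sketch below. The function name comment_out_per_report is hypothetical, and the pattern assumes the commands are written in upper case at the start of a line, as in the examples here.

```shell
#!/bin/sh
# Read a configuration on stdin and write it to stdout with every active
# LOGFILE, IMAGEDIR, CHARTDIR, or LOCALCHARTDIR line commented out.
# Lines that are already comments do not match and pass through unchanged.
comment_out_per_report() {
    sed -E 's/^[[:space:]]*(LOGFILE|IMAGEDIR|CHARTDIR|LOCALCHARTDIR)[[:space:]]/# &/'
}
```

A typical use would be `comment_out_per_report < etc/bigbyrep.cfg > etc/bigbyrep.inc`. Review the result afterwards, since your tuned file may contain other per-report settings.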
    # common.inc

    #### Basic local configuration

    # Header information - Modify as appropriate or move to vhost file if it varies by site
    HOSTURL /webstats/index.html
    LOGO /graphics/icon.gif

    # Cache DNS Look-ups - use default (/var/cache/analog/dnscache)
    # User must be granted privileges on this file,
    # or a different file may be specified here
    DNS WRITE
    DNSGOODHOURS 1440

    # Enhanced vhost log format - Adjust for your format - you can supply multiple formats
    # Only includes port number in vhost identification if it is not the default http port.
    DEFAULTLOGFORMAT (%v:80 %s %t %u [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b "%f" "%B")
    DEFAULTLOGFORMAT (%v %s %t %u [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b "%f" "%B")
    DEFAULTLOGFORMAT (%s %j %j [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b "%f" "%B")

    # Exclude internal hosts
    HOSTEXCLUDE 127.0.0.1
    HOSTEXCLUDE 192.0.2.*
    HOSTEXCLUDE *.example.com

    # Exclude monitoring requests
    BROWEXCLUDE check_http*

    # Reporting Period - Starting Yesterday - default to Yearly report
    FROM -01-00-00:0000
    TO -00-00-00:0000

    #### External configuration files
    CONFIGFILE etc/bigbyrep.inc
    CONFIGFILE etc/SearchEngines.txt
    CONFIGFILE etc/RobotInclude.txt
    CONFIGFILE etc/RefSpam.txt

    #### Overrides
    DAILYREP OFF
    DAILYSUM ON
    REQINCLUDE *.php
    PAGEINCLUDE *.php
    WARNINGS ON
    WARNINGS -MR
Setting up the first report
Create a configuration file for your first report. Once you have these files created, test the report using the command analog +getc/example_month.conf (the +g option is followed directly by the path, here etc/example_month.conf). You could also use the run_analog.sh script provided below to run the report.
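Because each report is built from nested include files, a mistyped file name is an easy mistake to make. The sketch below lists CONFIGFILE entries that cannot be found. The function name check_includes is hypothetical, it only looks one level deep, and it assumes relative paths are resolved from the current directory, so run it from the directory that contains etc.

```shell
#!/bin/sh
# Print "missing: <path>" for every CONFIGFILE named in the given
# configuration file that does not exist, relative to the current directory.
# Returns non-zero if anything is missing. Included files are not followed.
check_includes() {
    conf=$1
    missing=0
    for inc in $(awk '$1 == "CONFIGFILE" { print $2 }' "$conf"); do
        if [ ! -f "$inc" ]; then
            echo "missing: $inc"
            missing=1
        fi
    done
    return $missing
}
```

Running `check_includes etc/example_month.conf` before the first test run gives a clearer message than a report that silently lacks part of its configuration.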
You should have a naming standard for your configuration files. These examples use a name consisting of the components: site, purpose, and time period. Usually only two components are required to specify a file.
You will also need a naming standard for the OUTFILE and CHARTDIR parameters. The examples use the same approach as for configuration files. LOCALCHARTDIR prefixes CHARTDIR with the directory path from OUTFILE. You will want a different standard if you are placing the reports on each vhost's site, as the directory path will change for each site.
The example_month.conf file contains seven configuration lines. The first four lines specify the output. The remaining lines include the specifications.
    # example_month.conf
    HOSTNAME "Example.com - Month"
    OUTFILE /var/www/webstats/example_month.html
    CHARTDIR images/example_m_
    LOCALCHARTDIR /var/www/webstats/images/example_m_
    CONFIGFILE etc/common_month.inc
    CONFIGFILE etc/example_vhost.inc
    CONFIGFILE etc/log_month.inc
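Since every report file follows the same naming standard, the seven lines can be generated rather than typed. This is a sketch only: the function name make_conf and the WEBROOT variable are hypothetical, and it assumes the site, period, and first letter of the period are combined exactly as in example_month.conf.

```shell
#!/bin/sh
# Print a report configuration named <site>_<period>.conf.
# Chart names use the first letter of the period (m, w, y) as a suffix.
# WEBROOT is a hypothetical override for the report directory.
make_conf() {
    site=$1 title=$2 period=$3
    webroot=${WEBROOT:-/var/www/webstats}
    p=$(printf '%.1s' "$period")
    cat <<EOF
# ${site}_${period}.conf
HOSTNAME "${title}"
OUTFILE ${webroot}/${site}_${period}.html
CHARTDIR images/${site}_${p}_
LOCALCHARTDIR ${webroot}/images/${site}_${p}_
CONFIGFILE etc/common_${period}.inc
CONFIGFILE etc/${site}_vhost.inc
CONFIGFILE etc/log_${period}.inc
EOF
}
```

For instance, `make_conf example "Example.com - Month" month > etc/example_month.conf` writes a file with the same seven configuration lines as the example.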
The nested files also contain minimal information. The contents of common_month.inc are site independent, as are most include files used here.
    # common_month.inc
    CONFIGFILE etc/common.inc
    FROM -00-01-00:0000
    WEEKLY ON
The contents of example_vhost.inc specify how to select records for example.com. This file could include site-specific header information instead of supplying this information in the common.inc file. Any overrides for the site should also be included in this file.
    # example_vhost.inc
    VHOSTINCLUDE www.example.com
The contents of log_month.inc specify which log files to use. This limits the number of records read that are not required for the report. If you have access logs separated by vhost, you will need a log file include per site or report.
    # log_month.inc
    LOGFILE /var/log/apache2/access.log
    LOGFILE /var/log/apache2/access.log.1
    LOGFILE /var/log/apache2/access.log.2.gz
    LOGFILE /var/log/apache2/access.log.3.gz
    LOGFILE /var/log/apache2/access.log.4.gz
    LOGFILE /var/log/apache2/access.log.5.gz
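The log include files differ only in how many rotated logs they list, so they can be generated. The sketch below assumes the rotation scheme visible in the example: access.log.1 is uncompressed and access.log.2 onward are gzipped. The function name make_log_inc and the LOG_BASE variable are hypothetical.

```shell
#!/bin/sh
# Print LOGFILE lines for the current log plus the newest $1 rotated logs.
# Rotation 1 is assumed uncompressed; rotations 2 and up are assumed gzipped.
# LOG_BASE is a hypothetical override for the log path.
make_log_inc() {
    count=$1
    base=${LOG_BASE:-/var/log/apache2/access.log}
    echo "LOGFILE ${base}"
    i=1
    while [ "$i" -le "$count" ]; do
        if [ "$i" -eq 1 ]; then
            echo "LOGFILE ${base}.${i}"
        else
            echo "LOGFILE ${base}.${i}.gz"
        fi
        i=$((i + 1))
    done
}
```

With these assumptions, `make_log_inc 5` prints the six LOGFILE lines used for the monthly report.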
Adding Weekly and Yearly reports
Copy example_month.conf to example_week.conf. Change the HOSTNAME, OUTFILE, CHARTDIR, and LOCALCHARTDIR parameters to unique values. Replace common_month.inc with common_week.inc, and log_month.inc with log_week.inc. Test the configurations as above.
    # example_week.conf
    HOSTNAME "Example.com - Week"
    OUTFILE /var/www/webstats/weekly_example.html
    CHARTDIR images/w_example_
    LOCALCHARTDIR /var/www/webstats/images/w_example_
    CONFIGFILE etc/common_week.inc
    CONFIGFILE etc/example_vhost.inc
    CONFIGFILE etc/log_week.inc
Create common_week.inc. This selects the appropriate time period, and provides the common configuration. As we have only one week, we turn off the weekly report.
    # common_week.inc
    CONFIGFILE etc/common.inc
    FROM -00-00-07:0000
    WEEKLY OFF
Create log_week.inc. This specifies the log files included for the weekly report.
    # log_week.inc
    LOGFILE /var/log/apache2/access.log
    LOGFILE /var/log/apache2/access.log.1
Copy example_month.conf to example_year.conf. Change the HOSTNAME, OUTFILE, CHARTDIR, and LOCALCHARTDIR parameters to unique values. Replace common_month.inc with common_year.inc, and log_month.inc with log_year.inc.
    # example_year.conf
    HOSTNAME "Example.com - Year"
    OUTFILE /var/www/webstats/example_year.html
    CHARTDIR images/example_y_
    LOCALCHARTDIR /var/www/webstats/images/example_y_
    CONFIGFILE etc/common_year.inc
    CONFIGFILE etc/example_vhost.inc
    CONFIGFILE etc/log_year.inc
Create the common_year.inc file. This supplies the common configuration and turns on the monthly report (MONTHLY ON) for the yearly report set. The yearly time period was specified in common.inc, and has been overridden in the other time period include files.
    # common_year.inc
    CONFIGFILE etc/common.inc
    MONTHLY ON
Create the log_year.inc file. The example below includes all log files. It could be modified appropriately if you retain far more than a year's access log files in the log directory.
    # log_year.inc
    LOGFILE /var/log/apache2/access.log*
Adding new sites
Adding a new site consists of creating a few small files. You will need a new vhost specification, and a new .conf file for each report.
If the access logs are separated by site, you will need a logfile include file per site or report. Alternatively, include the LOGFILE specifications in the .conf file.
This example is for the site mail.example.com.
    # mail_example_vhost.inc
    VHOSTINCLUDE mail.example.com
Copy example_month.conf to mail_example_month.conf. Edit as above, changing the vhost include file. Create the weekly and yearly files as was done for the original site. Test these new reports.
    # mail_example_month.conf
    HOSTNAME "Mail.Example.com - Month"
    OUTFILE /var/www/webstats/mail_example_month.html
    CHARTDIR images/mail_example_m_
    LOCALCHARTDIR /var/www/webstats/images/mail_example_m_
    CONFIGFILE etc/common_month.inc
    CONFIGFILE etc/mail_example_vhost.inc
    CONFIGFILE etc/log_month.inc
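The vhost include for a new site is only two lines, so it is a natural candidate for a helper. The sketch below uses a hypothetical function name, write_vhost_inc, and refuses to overwrite an existing file so that any site overrides already added are not lost.

```shell
#!/bin/sh
# Write <dir>/<site>_vhost.inc selecting records for one virtual host.
# Fails without touching anything if the file already exists.
write_vhost_inc() {
    dir=$1 site=$2 host=$3
    file="${dir}/${site}_vhost.inc"
    if [ -e "$file" ]; then
        echo "exists: $file" >&2
        return 1
    fi
    printf '# %s_vhost.inc\nVHOSTINCLUDE %s\n' "$site" "$host" > "$file"
}
```

For the site above this would be called as `write_vhost_inc "$HOME/etc" mail_example mail.example.com`.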
Reporting all sites
Adding an all-sites report consists of creating new .conf files. If your log files are split by site, you will need new log include files as well.
This example is for the all-sites monthly report. Copy example_month.conf to all_month.conf. Edit as above, dropping the VHOSTINCLUDE specification and turning on the VHOST report. Create the weekly and yearly files as was done for the original site. Test these new reports.
    # all_vhost.inc
    VHOST ON
    # all_month.conf
    HOSTNAME "All Sites - Month"
    OUTFILE /var/www/webstats/all_month.html
    CHARTDIR images/all_m_
    LOCALCHARTDIR /var/www/webstats/images/all_m_
    CONFIGFILE etc/common_month.inc
    CONFIGFILE etc/all_vhost.inc
    CONFIGFILE etc/log_month.inc
Scheduling Report Generation
You will need a script to run the reports. The following script will run one or more reports. It defaults to running all the reports.
    #!/bin/sh -x
    # run_analog.sh - Run the analog jobs

    CONF_DIR=$HOME/etc

    # Needs to be above the configuration directory,
    # because the include paths start with etc/
    cd "${CONF_DIR}/.." || exit 1

    # Get the list of config files; default to every report in CONF_DIR
    CONFIGS="$*"
    [ -z "${CONFIGS}" ] && CONFIGS="${CONF_DIR}/*.conf"

    # Run all the conf files (1 per report)
    for CONF in $CONFIGS; do
        CONF=$(basename "${CONF}")
        nice analog "+g${CONF_DIR}/${CONF}"
    done

    # EOF
Schedule this script to run at appropriate times. You can run all reports, or schedule report sets at appropriate times. You may want to run the weekly reports Monday morning, and the Monthly and Yearly reports on the first of the month. Avoid running two sets of reports at the same time.
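One way to keep two report sets from overlapping is a simple lock around the script. The sketch below relies on mkdir being atomic; the function name run_locked and the LOCK_DIR path are hypothetical. A stale lock left by a killed job has to be removed by hand.

```shell
#!/bin/sh
# Run a command only if no other report set holds the lock.
# LOCK_DIR is a hypothetical path; pick one the report user can write to.
LOCK_DIR=${LOCK_DIR:-${TMPDIR:-/tmp}/run_analog.lock}

run_locked() {
    # mkdir either creates the directory or fails, never both,
    # so only one caller at a time gets past this point.
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "another report set is running; skipping" >&2
        return 75
    fi
    status=0
    "$@" || status=$?
    rmdir "$LOCK_DIR"
    return $status
}
```

A scheduled job would then call something like `run_locked "$HOME/bin/run_analog.sh" example_week.conf`, where the script location is also an assumption.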
Final Cleanup
If you have more than one report in a directory, create an index.html file for the directory. If this is the target of the HOSTURL parameter, it will make it easier to navigate the reports.
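The index page can be as simple as a list of links to the report files. The following sketch uses a hypothetical function name, make_index, and assumes the report file names contain no spaces or HTML-special characters, which holds for the naming standard used here.

```shell
#!/bin/sh
# Write index.html in the given directory with one link per report page.
# The index itself is skipped, and the file is replaced in one step so a
# half-written index is never served.
make_index() {
    dir=$1
    {
        echo '<html><head><title>Web statistics</title></head><body><ul>'
        for f in "$dir"/*.html; do
            [ -e "$f" ] || continue        # no reports yet: glob did not match
            name=$(basename "$f")
            [ "$name" = "index.html" ] && continue
            echo "<li><a href=\"${name}\">${name%.html}</a></li>"
        done
        echo '</ul></body></html>'
    } > "$dir/index.html.tmp"
    mv "$dir/index.html.tmp" "$dir/index.html"
}
```

Running `make_index /var/www/webstats` after the reports are generated keeps the list current as sites are added.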
Restrict access to the reports and images using .htaccess or changes to the Apache configuration.