Wednesday, May 21, 2014

Contrivance without conflation

In a previous post, "Contrivance without conclusion," I presented a device to count web page views. As noted in the post, that contrivance really counts web page requests. As such, it would also count requests from web crawlers used by search engines, and other* cases where a human being has not actually viewed the page.

The trick commonly used to count only views of a web page is to embed in the page an image containing the number of views. The technique is sometimes called a web bug. When the page is requested by a user-operated browser, the browser normally** will also request all embedded images so that they can be displayed in their proper places on the page.

There are dozens of web sites which offer free counters of this kind. However, if you choose to use one of these you will be bound by their terms and conditions, and subject to them discontinuing their service, which can occur without notice.

So, why not make your own? Here is a modification of the earlier contrivance that accomplishes this.

#!/bin/bash
echo "Content-type: image/gif"
echo
echo -n 1 >>../../tallies
SUM=`cat ../../tallies | wc -c`
LEN=${#SUM}
IMG="H/$LEN"
for d in `seq 1 $LEN`
  do
    IMG="$IMG S"
    IMG="$IMG P/`echo $LEN-$d | bc`"
    IMG="$IMG `echo -n $SUM | tail -c $d | head -c 1`"
  done
cd ../../counters
cat $IMG T

Line 1, as before, tells the web server that this file is a program written in the "bash" language.

Line 2 instructs the web server to send the user agent an HTTP header informing it of the type of content, in this case, a GIF image.

Line 3 as before is a blank line that signals the end of HTTP headers, so that all that follows will be sent to the user agent as the actual content produced by this CGI script (which happens only at line 15, but that is getting ahead of the story).

Line 4 as before increments the counter.

Line 5 assigns the visitor count as the value of the shell variable named SUM. As a running example, let's assume the value is 1957.

Line 6 assigns the length of the visitor count number (i.e. the number of digits it contains) to the shell variable named LEN. For our running example, this will be 4.
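Lines 4 through 6 can be tried outside a web server. Here is a minimal sketch, assuming a throwaway file from mktemp stands in for ../../tallies (the names TALLIES and i are hypothetical). It uses printf in place of echo -n, and reads the size with wc -c directly from the file, stripping blanks because some versions of wc pad their output.

```shell
#!/bin/bash
# Hypothetical stand-in for ../../tallies, so no real counter is touched.
TALLIES=$(mktemp)

# Simulate twelve page requests: each one appends a single byte.
for i in 1 2 3 4 5 6 7 8 9 10 11 12; do
  printf 1 >>"$TALLIES"
done

# The size of the file in bytes is the count.
SUM=$(wc -c <"$TALLIES" | tr -d ' ')
# The number of characters in that count is the number of digits.
LEN=${#SUM}
echo "count=$SUM digits=$LEN"   # prints count=12 digits=2

rm -f "$TALLIES"
```

Because the count is simply the size of the file, incrementing never requires reading the old value first.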

Line 7 assigns an initial value to the shell variable named IMG, the string "H/" followed by the actual length, "H/4" in our running example.

Line 8 begins a loop which, in our running example, will be used 4 times, since the command `seq 1 4` will return 1 2 3 4.

Line 9 open brackets the lines which will be repeated.

Line 10 appends a space character and the letter S to the IMG variable.

Line 11 appends a space character, then "P/" and a number (the position, zero-based). For our running example, the numbers will be 3 2 1 0, as computed by having the command bc evaluate each of 4-1, 4-2, etc.

Line 12 appends a space character, then a digit from the number to be displayed. In our running example, these digits will be 7 5 9 1 (because we are looking at them in reverse order).

Line 13 close brackets the lines which were to be repeated.
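The order in which line 12 visits the digits can be seen by running the same tail and head pipeline on its own. This is only a sketch for the running example; the variable DIGITS is hypothetical and is not part of the script above.

```shell
#!/bin/bash
SUM=1957
LEN=${#SUM}
DIGITS=""
for d in $(seq 1 "$LEN"); do
  # tail keeps the last d characters; head keeps the first of those.
  DIGITS="$DIGITS $(printf '%s' "$SUM" | tail -c "$d" | head -c 1)"
done
echo "digits visited:$DIGITS"   # prints digits visited: 7 5 9 1
```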

Line 14 changes the current directory to a folder named "counters" (which must be created--see below) which is out of the way of the files which can be served by the web server. This directory or folder must contain image fragments, which are used by the final line.

Line 15 concatenates many bits and pieces, each of which is a binary file. Strung together, and ending with the contents of the binary file named T, they form a valid GIF file: an image of the number taken from the web page visitor count.

In our running example, the value stored in the IMG variable at the end of the loop will be

H/4 S P/3 7 S P/2 5 S P/1 9 S P/0 1

and the complete command of line 15 would thus be

cat H/4 S P/3 7 S P/2 5 S P/1 9 S P/0 1 T

which will output the fourteen fragments--named H/4, S, P/3, 7, etc. and finally, the fragment named T--to the web server, which passes this content along to the user agent (the browser).
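The running example can be checked without a web server. What follows is a sketch, not the script above: it builds the same string using only shell arithmetic (remainder and division by ten) in place of bc, tail, and head, and then counts the fragment names. The names rest, pos, and COUNT are hypothetical, introduced for illustration.

```shell
#!/bin/bash
SUM=1957
LEN=${#SUM}
IMG="H/$LEN"

rest=$SUM   # digits not yet placed
pos=$LEN    # counts down to the zero-based position
while [ "$pos" -gt 0 ]; do
  pos=$(( pos - 1 ))
  # rest % 10 is the rightmost remaining digit.
  IMG="$IMG S P/$pos $(( rest % 10 ))"
  rest=$(( rest / 10 ))
done
echo "$IMG"   # prints H/4 S P/3 7 S P/2 5 S P/1 9 S P/0 1

# Split the string into words, add T, and count the fragments.
set -- $IMG T
COUNT=$#
echo "$COUNT fragments"   # prints 14 fragments
```

In general there are three fragments per digit, plus the header and the terminator.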

To make this work, you will need to upload the file to the cgi-bin folder in the public html folder provided by your web hosting company, naming it, say, "tallyimage.cgi". Once you have done this, anyone who visits the page at
[your domain name]/cgi-bin/tallyimage.cgi
will see the number of visitors, shown as a decimal number. And that is all that will be shown in the browser, because the user will have requested an image. At this point it will be a broken image (because the script will fail to execute correctly), until you install the image fragments.

You will also need to download the image fragments from the file counters.zip (which is also used in a web page entitled "Image generation in DataPerfect") and unzip this file into a new folder named "counters" (for that is the name mentioned in line 14) in the folder containing the public html folder provided by your web hosting company.

As a mnemonic, the 32 fragment names are

folder H for "header" (containing fragments named 1, 2, 3, ... 9, X(not used here))
S for "separator"
folder P for "position" (zero-based, containing fragments named 0, 1, 2, ... 9)
9 for the digit nine, etc.
T for "terminator"
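After unzipping, it may be worth confirming that nothing is missing. The sketch below assumes the digit fragments are named 0 through 9 (which is what brings the total to 32); the function names expected_fragments and missing_fragments are hypothetical.

```shell
#!/bin/bash
# Print the 32 fragment names, one per line.
expected_fragments() {
  for n in 1 2 3 4 5 6 7 8 9 X; do echo "H/$n"; done
  echo S
  for n in 0 1 2 3 4 5 6 7 8 9; do echo "P/$n"; done
  for n in 0 1 2 3 4 5 6 7 8 9; do echo "$n"; done
  echo T
}

# Print each expected fragment that is not a regular file under directory $1.
missing_fragments() {
  expected_fragments | while read -r f; do
    [ -f "$1/$f" ] || echo "$f"
  done
}

# Example: an empty scratch directory is missing all of them.
DIR=$(mktemp -d)
MISSING=$(missing_fragments "$DIR" | wc -l | tr -d ' ')
echo "$MISSING fragments missing"   # prints 32 fragments missing
rmdir "$DIR"
```

Run against the real folder, for example missing_fragments ../../counters from the cgi-bin folder, no output means all the fragments are in place.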

Suppose you don't want to display the count on a web page, but you want to just count it as having been viewed? In this case, here is a much simpler CGI script to accomplish the task.

#!/bin/bash
echo "Content-type: image/gif"
echo
echo -n 1 >>../../tallies
cat 1x1.gif

The first four lines are identical to the previous CGI script.

Line 5 copies a file named 1x1.gif to the web server, which passes it along to the user agent as the requested image. This file, as its name suggests, is a small one by one pixel GIF image, which (though its name does not suggest this) is transparent, so it can be included somewhere on a page without disruption of the visual appearance of the page.
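A file of this kind can also be written byte by byte. The sketch below assumes the widely circulated 43-byte GIF89a encoding of a single transparent pixel; it is not necessarily byte-identical to the 1x1.gif used here. The function name make_1x1_gif and the variable OUT are hypothetical.

```shell
#!/bin/bash
# Write a 43-byte transparent 1x1 GIF to the path given as $1.
make_1x1_gif() {
  {
    # Signature, then 1x1 logical screen with a two-colour global table.
    printf 'GIF89a\001\000\001\000\200\000\000'
    # Colour table: black, white.
    printf '\000\000\000\377\377\377'
    # Graphic control extension: colour index 0 is transparent.
    printf '\041\371\004\001\000\000\000\000'
    # Image descriptor: one pixel at the top left corner.
    printf '\054\000\000\000\000\001\000\001\000\000'
    # Compressed pixel data (a single pixel of colour 0), then the trailer.
    printf '\002\002\104\001\000\073'
  } >"$1"
}

OUT=$(mktemp)
make_1x1_gif "$OUT"
wc -c <"$OUT"   # size in bytes; 43 expected
```

To use it with the script above, call make_1x1_gif 1x1.gif in the folder where the CGI script runs.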

The two contrivances live:
http://sanbachs.net/tallypagewithimage.html
http://sanbachs.net/tallythispage.html

These are normal web pages, which call for and display the contrivance images. On the second page, I have enlarged the image and surrounded it with a border so that you can see where it is. This will allow you to download a 1x1.gif file more easily.
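For a page of your own, all that is needed is an image tag whose source is the CGI script. Here is a minimal sketch that prints such a page; the title, the text, and the function name make_page are placeholders, and it assumes the script was installed as /cgi-bin/tallyimage.cgi. It is not the markup of the pages linked above.

```shell
#!/bin/bash
# Print a small HTML page that embeds the counter image.
make_page() {
  cat <<'EOF'
<!DOCTYPE html>
<html>
<head><title>Tally test page</title></head>
<body>
<p>This page has been viewed this many times:</p>
<img src="/cgi-bin/tallyimage.cgi" alt="page view count">
</body>
</html>
EOF
}

make_page
```

Redirect the output to a file in your public html folder to try it. A browser that loads the page will then request the image, and that request is what gets counted.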

Note that these web pages (and those of the previous post) all share the same "tallies" file, and thus share a page view counter.

As a modification to the previous "contrivance without conclusion," this is a contrivance without conflation of the two kinds of visits: the ones from a genuine request to view a page, and the ones from web crawlers and other user agents which do not request the images on the page.

* Technically, any software which makes a request for a web page is called a User Agent.

** A user can disable the automatic display of images on a web page, unfortunately, so this trick will miss counting such page views.
