Saturday, May 10, 2014

Contrivance without conclusion

This might be the smallest web page which counts and displays the number of times it has been seen.

#!/bin/bash
echo "Content-type: text/plain"
echo
echo -n 1 >>../../tallies
cat ../../tallies | wc -c

It is a contrivance, all right. Without conclusion? Well, not literally so. Eventually, it might run out of disk space. Probably before that the web hosting company owning the disk drive will go out of business. Probably before that the owner of the domain name will leave the planet or otherwise lose interest in maintaining it.

To make this work, you will need to reserve a domain name, engage a web hosting company for disk space and the use of a web server. Then, you will need to upload this file to a folder named "cgi-bin" within the public html folder provided by the web hosting company. You will of course give this file a name when you upload it. Supposing you give it the name "tally.cgi" it will be available to anyone with an internet connection as
[your domain name]/cgi-bin/tally.cgi
Next, you will need to convince some people to look at your web page. And that is in fact the hardest thing of all to accomplish, the social problems being harder than the technical problems.
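If you would like to rehearse the upload step before paying anyone, here is a rough sketch. The folder names ("public_html", and the scratch directory standing in for your hosting account) are assumptions modeled on typical shared hosting, not something your hosting company is guaranteed to use. It also shows one step not mentioned above: most web servers will refuse to run a CGI file unless it is marked executable.

```shell
# Hypothetical rehearsal; "site" stands in for your hosting account's home folder.
site=$(mktemp -d)
mkdir -p "$site/public_html/cgi-bin"

# Write the five-line program where the web server would look for it.
cat >"$site/public_html/cgi-bin/tally.cgi" <<'EOF'
#!/bin/bash
echo "Content-type: text/plain"
echo
echo -n 1 >>../../tallies
cat ../../tallies | wc -c
EOF

# Mark the file executable (assumption: your server requires this, as Apache does).
chmod 755 "$site/public_html/cgi-bin/tally.cgi"

# Rehearse one request by hand from inside cgi-bin. A real server would run
# the file directly, using its first line to find bash.
first_response=$(cd "$site/public_html/cgi-bin" && bash ./tally.cgi)
echo "$first_response"
ls -l "$site/tallies"
```

Notice where "../../tallies" ends up: two folders above cgi-bin, which in this layout is outside the public html folder, so visitors cannot fetch the tallies file itself. This relies on the server starting the program from within cgi-bin, which is an assumption worth checking with your own hosting company.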

Each time someone looks at it, he or she will see a number. Ever larger numbers. And that is all. The number will be equal to the number of times the page has been visited. No one, including you testing it, will ever see the number zero.

How does it work? Delighted that you asked.

The browser asks the web server at your domain (running on the machine belonging to the web hosting company) for the page. The web server runs the program which you have named "tally.cgi" by interpreting it as explained below. The web server uses the output of the program to prepare a response which it then forwards to the browser.

Line 1 tells the web server that this file is a program written in the "bash" language. This is one of the Linux shells, normally used at the command line prompt. So the web server starts up a bash shell and passes the file to it for execution.

Line 2 produces as output an HTTP header which will ultimately let the browser know that the content is plain text.

Line 3 outputs an empty line, which signals to the web server that the HTTP headers are finished, and that what follows will be the actual content to be sent to the browser.

Line 4 produces no output, but adds one to the file named "tallies". A quirk here is that it adds one by adding a digit "1" to the end of the file. The number of visits is maintained in base one, rather than the more familiar base ten.

Line 5 outputs a base ten number expressing the number of characters (all ones, remember) in the file named "tallies".

Then, the program stops, signalling to the web server that the page is complete, and the web server passes this information along to the browser, which will signal to the person that the page is complete.
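To watch the five lines at work without a web server, here is a small sketch. The function "tally" is a stand-in I made up for tally.cgi; it writes to a scratch file rather than "../../tallies", and uses "printf 1" where the original uses "echo -n 1" (the two behave alike here). The splitting at the empty line is only an approximation of what a web server does.

```shell
# Stand-in for tally.cgi, pointed at a scratch file.
tallies=$(mktemp)
tally() {
    echo "Content-type: text/plain"   # line 2: the HTTP header
    echo                              # line 3: empty line ends the headers
    printf 1 >>"$tallies"             # line 4: append one "1"
    wc -c <"$tallies"                 # line 5: count the ones
}

tally >/dev/null      # first visit
tally >/dev/null      # second visit
response=$(tally)     # third visit, captured whole

# Split the output at the first empty line, roughly as the web server does.
headers=$(printf '%s\n' "$response" | sed '/^$/,$d')
body=$(printf '%s\n' "$response" | sed '1,/^$/d')

echo "headers: $headers"
echo "body:    $body"
```

After three visits the body holds the count for the third visit, and the headers hold the single Content-type line.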

The count will include visits of the page by web crawlers, such as the ones used by search engines. So, strictly speaking, it is not counting the number of times the page is viewed by a person (using a browser or mobile phone), but rather the number of times the page is requested of the web server.
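If you wanted to leave the crawlers out of the count, one possible approach is to look at the User-Agent that a CGI program receives in the environment variable HTTP_USER_AGENT. The sketch below is only that: the function names and the list of patterns are my own guesses for illustration, and plenty of crawlers will slip past them.

```shell
# Hypothetical filter: guess from the User-Agent text whether a crawler is asking.
looks_like_crawler() {
    case "$1" in
        *[Bb]ot*|*[Cc]rawl*|*[Ss]pider*) return 0 ;;
        *) return 1 ;;
    esac
}

tallies=$(mktemp)
visit() {
    # $1 stands in for the value of HTTP_USER_AGENT in a real CGI request.
    looks_like_crawler "$1" || printf 1 >>"$tallies"
}

visit "Mozilla/5.0 (X11; Linux x86_64) Firefox/29.0"
visit "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
visit "Mozilla/5.0 (Macintosh) Safari/537.36"

people=$(wc -c <"$tallies")
echo "counted visits: $people"
```

Of the three pretend requests, only the two that do not mention a bot add a "1" to the scratch file.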

Why use base one, instead of base ten? Glad you asked! For some insane reason your web page might become wildly popular, with people around the world accessing it over and over again just for the pleasure of seeing ever larger numbers. What happens if two of these requests arrive at the web server at precisely the same moment? If we had been using base ten, the operation of adding one to the number would involve reading in the (base ten) number, adding one to it, and writing out the next (base ten) number. If the web server is running two copies of your program and both copies read in the same number, both will increment it, and both will write out the same next number, and one visit will be missed. This is known in the computing industry as a race condition, and the read-add-write sequence is a critical section, which would need to be protected by a lock. By contrast, the operation of appending a single character to the end of a file is indivisible (at least on an ordinary local disk, where the operating system places each appended write at the current end of the file), and so no visit will be missed.
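The missed visit can be provoked on your own machine. The following is a sketch under assumptions: the file names and the functions "visit_decimal" and "visit_unary" are invented for the demonstration, the files live on a local disk, and one hundred background jobs stand in for one hundred simultaneous visitors.

```shell
work=$(mktemp -d)
printf 0 >"$work/decimal"     # base ten counter starts at zero
: >"$work/unary"              # base one counter starts empty

visit_decimal() {             # read, add one, write back: three separate steps
    n=$(cat "$work/decimal")
    n=${n:-0}                 # another visitor may have just emptied the file
    echo $((n + 1)) >"$work/decimal"
}
visit_unary() {               # a single append of a single byte
    printf 1 >>"$work/unary"
}

visits=100
i=0
while [ "$i" -lt "$visits" ]; do
    visit_decimal &           # each visit runs in the background,
    visit_unary &             # so many of them overlap in time
    i=$((i + 1))
done
wait                          # let every visit finish

unary_count=$(wc -c <"$work/unary")
decimal_count=$(cat "$work/decimal")
echo "visits made:      $visits"
echo "base one counter: $unary_count"
echo "base ten counter: $decimal_count (often lower, and different on each run)"
```

The base ten result varies from run to run, which is exactly the trouble with it. The base one file should hold one byte per visit every time.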

But won't this require more disk space than a base ten number? Yes, considerably more as the count gets bigger and bigger. This is an example of a trade-off. We are trading off space for the advantage of indivisibility or reliability. Of course, this is a bit contrived, because what does it really matter if the count is off by one once in a while? And if there are billions of page views, this would consume gigabytes of disk space, with each byte being exactly the same digit "1". Well, yes, but that's not going to happen, because billions of people are not going to be viewing a page that merely contains an ever larger number. They have better things to do, such as looking at pictures of cats.
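To put numbers on the trade-off, here is a small piece of arithmetic in shell. The visit counts are made up for illustration. Base one needs one byte per visit, while base ten needs one byte per decimal digit.

```shell
# Bytes needed to store a count in base one versus base ten.
for count in 9 1000 1000000 1000000000; do
    unary_bytes=$count          # one "1" per visit
    decimal_bytes=${#count}     # one byte per decimal digit
    echo "$count visits: base one needs $unary_bytes bytes, base ten needs $decimal_bytes"
done
```

For the last row, a billion visits, base one needs a billion bytes (about a gigabyte) where base ten needs ten.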

Could this be used to count visitors to a real web page? Sure. Adapt it like this.

#!/bin/bash
echo "Content-type: text/html"
echo
echo -n 1 >>../../tallies
COUNT=$(cat ../../tallies | wc -c)
cat <<ENDMARKER
... your real web page goes here ...
<p>This page has been visited $COUNT times.</p>
... the conclusion of your real web page ...
ENDMARKER

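Notice the change to line 2: the content type is now "text/html" rather than "text/plain", because the output is a real web page.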

The adapted line 5 does not output the visitor count, but instead assigns it as the value of a shell variable named "COUNT".

Line 6 starts copying the following lines (your real web page!), up to but not including the end marker, to the output to be sent by the web server to the browser. The construct "$COUNT" will be replaced in the output with the value of the "COUNT" variable, which you recall is the visitor count. You can safely use the plural "times" because no one will see the page when it says "visited 1 times," other than you, the first time you test it.
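The replacement of "$COUNT" depends on the end marker being written without quotes. Here is a small sketch of the difference. The function names "show_expanded" and "show_literal" are invented for the demonstration, and COUNT is set by hand to pretend this is the very first visit.

```shell
COUNT=1    # pretend this is the very first visit

# Unquoted end marker: the shell substitutes $COUNT as it copies the lines.
show_expanded() {
cat <<ENDMARKER
<p>This page has been visited $COUNT times.</p>
ENDMARKER
}

# Quoted end marker: the lines are copied literally, dollar sign and all.
show_literal() {
cat <<'ENDMARKER'
<p>This page has been visited $COUNT times.</p>
ENDMARKER
}

expanded=$(show_expanded)
literal=$(show_literal)
echo "$expanded"
echo "$literal"
```

So if your real web page happens to contain dollar signs that you want left alone, you would need to either quote the end marker (and give up the visitor count) or put a backslash in front of each such dollar sign.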

This paragraph marks the conclusion of this post. So, it was not the post itself which was without conclusion, but rather the contrivance described therein, which in principle would never run out of numbers.

[added Sun May 11 06:59:14 CDT 2014]
The two contrivances live:
http://sanbachs.net/cgi-bin/tally.cgi
http://sanbachs.net/cgi-bin/tallypage.cgi

Note that both web pages share the same "tallies" file, and thus share a page view counter.

[added Sun May 11 14:03:51 CDT 2014]
CGI stands for "Common Gateway Interface".

3 comments:

  1. This is the kind of stuff that keeps Elder Conrad awake at night!!!! SCARY!
    Sister Conrad

  2. This is the kind of stuff that makes Sister Layton's eyes glaze over!!

  3. Love the read :D
    I finally discovered your blog
