Budapest CNEET Workshop, 2001

David Martland D.Martland@kingston.ac.uk

Rahim Rahmani rahim@ite.mh.se

 

Web Caching

In this workshop the emphasis is on efficient distribution of data via the Internet. In particular, this includes Web caching, so we have set up a few exercises for you to carry out. You will be given assistance with the exercises.

The purpose of the exercises is to give you familiarity with some caching systems, which you can use when you return to your own institutions. The exercises are not intended to take you beyond the initial level of familiarity, though if you have previous experience, you may wish to undertake some more challenging work.

In the time allowed, you should be able to work through the first two exercises.

The first exercise is the most tedious. This exercise shows you how to install the Squid web proxy cache. Exercise 2 configures your browser to use a web proxy cache.

Exercises 3-5 are provided for you to work through if you complete the first two exercises early, or to take back for further study. Exercise 3 configures the browser to use an autoconfiguration file. Exercise 4 uses the cache manager to extract useful information and statistics about the operation of the cache.

Exercise 5 configures the web proxy cache to use cache digests, for use with cooperating caches.

You should keep notes, preferably in a logbook, during these exercises. You may need your notes for discussion later on.

You will be asked to give a brief discussion of the work you have carried out at the end of this session.

You should work in small groups (preferably 2-3 people per group).

There are sufficient machines.

Each exercise may require cooperation between different groups, so you should be prepared to discuss the work in your group with members of other groups.

At the end of the practical workshop session you should be able to:

1. Install and run the Squid Proxy cache server package.

2. Configure a browser to run with a proxy server.

Exercise 1 - Installing Squid

  1. Start Linux on your computer (you should already know how to do this). Open a shell or command window.
  2. Create a new working directory squidtemp using the mkdir command, and change into the new directory.
  3. Find, copy and unpack the Squid archive. The next steps compile the proxy server.
  4. Change directory again, into the directory that was created during the unpacking process.
  5. Run the configure script, and enable cache digests.
  6. Now compile the package.
  7. Now install the package.
  8. Now create a user for the squid package. Note: it is dangerous to run Squid as root because this is a security hazard, so it is better to have a dedicated user to run it.
  9. Create a directory for the cache. This can be in /usr/local. Change its ownership to squid and its group ownership also to squid.
  10. Also change the ownership of the logs directory.
  11. Now change directory as follows, then edit the squid configuration file squid.conf
    We have made a simpler configuration file for you to edit, with most of the defaults set. This file is /server/archive/squid.conf
    You can use whichever editor you like, e.g. vi, emacs, pico etc.
  12. Now you are almost ready to start squid. Change to the squid binary directory, then initialise squid, then finally run squid.
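
The steps of Exercise 1 can be outlined as a shell script. This is only a sketch under stated assumptions: the archive location, the name of the unpacked directory, the cache directory and the binary directory are placeholders which are not given in this handout, so check them with the laboratory assistants. It also assumes that creating the user squid creates a group of the same name. The script only prints the commands unless you set DRY_RUN=0, and the later steps need root privileges.

```shell
#!/bin/sh
# Sketch of Exercise 1. Prints each command; runs it only when DRY_RUN=0.
# ASSUMPTIONS (placeholders, not taken from the handout):
#   ARCHIVE   - location of the Squid source archive
#   SRCDIR    - directory created by unpacking the archive
#   SQUID_BIN - directory holding the installed squid binary
#   cache dir - $PREFIX/cache (the handout only says "in /usr/local")
DRY_RUN="${DRY_RUN:-1}"
ARCHIVE="${ARCHIVE:-/path/to/squid-source.tar.gz}"
SRCDIR="${SRCDIR:-squid-source}"
PREFIX="/usr/local/squid"
SQUID_BIN="${SQUID_BIN:-$PREFIX/bin}"

# Print a command, and execute it only in real mode; stop on the first failure.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || { echo "failed: $*" >&2; exit 1; }
    fi
}

install_squid_steps() {
    run mkdir squidtemp                        # step 2
    run cd squidtemp
    run tar xzf "$ARCHIVE"                     # step 3
    run cd "$SRCDIR"                           # step 4
    run ./configure --enable-cache-digests     # step 5
    run make                                   # step 6
    run make install                           # step 7 (root)
    run useradd squid                          # step 8 (root)
    run mkdir "$PREFIX/cache"                  # step 9
    run chown -R squid:squid "$PREFIX/cache"
    run chown -R squid:squid "$PREFIX/logs"    # step 10
    # step 11 (editing squid.conf) is done by hand in an editor
    run cd "$SQUID_BIN"                        # step 12
    run ./squid -z                             # create the cache directories
    run ./squid                                # start the proxy
}

install_squid_steps
```

Read the printed command list before running anything for real, and compare it with the commands given to you in the laboratory.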

 

 

Exercise 2 – manual browser configuration

In this exercise you will configure the Linux version of Netscape Navigator to access the proxy server.

The aim of the exercise is to connect to a proxy server, and to explore the behaviour of the browser when using the proxy.

Netscape 4.xx

Select the Edit->Preferences->Advanced-> Proxy menu option.

You will see three options - direct connection via LAN, direct connection to proxy, and automatic configuration.

Select direct connection to proxy. Find the IP address of a working Squid cache from one of your neighbouring groups. Use this IP address with port number 3128. Fill in the boxes for HTTP, Gopher, and FTP.

Select OK, and finally quit from the dialogue.

 

Once you have configured your browser to run with the cache proxy server, try the following exercises:

 

1. Try to fetch a web page which you know, using the browser. If all goes well, the page should load as normal.

2. Now try to fetch a web page from a domain which does not exist, using the browser. For example, you could try http://www.rehdat.com/index

Look at the message which comes back. This message is generated by the proxy cache, and replaces the browser's own error message for a host that cannot be found, which you would get without the proxy.

3. Select a web page from a distant location. We can provide suggestions for files if you have a problem here. Preferably try to choose web pages which you think might not already be in the cache. Download the page(s). Then do some more browsing to other pages, then try to download the distant web page(s) again. You may observe that the download time is reduced. However, this is sometimes a very subjective thing, and it may depend on the state of the network at the time. On fast links, with fast hardware, download times can be very small, so that the cache may not give much apparent benefit.


Sometimes a cache may even make things worse! It is very difficult to be sure that the cache is having a beneficial effect without measurements.

4. Now look at an FTP site, to try to find a file using FTP. A suitable site would be either the CPAN or CTAN archives (these are for Perl and TeX/LaTeX respectively). Open up the site, and browse the files. If you are observant, you will notice that the directory listings have icons. This is because the proxy cache is generating an HTML page with icons, rather than your browser creating a directory listing.

5. While testing the cache, ask your neighbours to suspend or kill their squid process. They can do this in the following way:

pgrep -u root squid

kill <processid>

where <processid> is the processid returned by the pgrep command. Now try to access pages using the browser, and observe the behaviour. Write down your observations in a notebook or logbook.
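
Step 3 above notes that it is hard to judge the benefit of the cache without measurements. One way to take rough measurements is sketched here. It assumes that the curl program is installed on your machine, which this handout does not guarantee, and the URL and proxy address in the usage comments are placeholders.

```shell
#!/bin/sh
# Sketch: time repeated fetches of one URL, directly or through a proxy.
# ASSUMPTION: curl is installed; the URLs in the usage comments are placeholders.

# fetch_time URL [PROXY] - print the total transfer time in seconds for one fetch
fetch_time() {
    if [ -n "$2" ]; then
        curl -s -o /dev/null -w '%{time_total}\n' -x "$2" "$1"
    else
        curl -s -o /dev/null -w '%{time_total}\n' "$1"
    fi
}

# mean_time - read one time per line on stdin, print the mean to 3 decimals
mean_time() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%.3f\n", sum / n }'
}

# timed_runs COUNT URL [PROXY] - fetch COUNT times and print the mean time
timed_runs() {
    i=0
    while [ "$i" -lt "$1" ]; do
        fetch_time "$2" "$3"
        i=$((i + 1))
    done | mean_time
}

# Usage (placeholders):
#   timed_runs 5 http://www.example.org/                                # direct
#   timed_runs 5 http://www.example.org/ http://193.225.202.<x>:3128    # via proxy
```

The first fetch through the proxy fills the cache, so note that time separately and compare it with the later fetches. Record the figures in your logbook together with the time of day, since the state of the network affects them.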

 

Exercise 3 – Automatic browser configuration

[This part of the exercise is optional, depending on the time available.

You should try Exercise 2 before carrying out this exercise.]

This part of the exercise is to test out the automatic configuration method for configuring your browser with the proxy server. To do this, you will need a script file. This is typically a Javascript file, and should be compatible with both Netscape and Internet Explorer.
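
The contents of the workshop's script file are not reproduced in this handout. To show what such a file typically looks like, the sketch below writes a minimal example to a temporary file. The file name, the proxy host pc01.ceenet.ceu.hu and the rule for local host names are illustrative assumptions, and the file provided for the workshop may differ.

```shell
#!/bin/sh
# Sketch: write a minimal proxy autoconfiguration (PAC) file.
# ASSUMPTIONS: this is NOT the workshop's viaprox.pac; the output file name and
# the proxy host pc01.ceenet.ceu.hu are illustrative placeholders.
PAC_FILE="${PAC_FILE:-${TMPDIR:-/tmp}/example.pac}"

cat > "$PAC_FILE" <<'EOF'
// Called by the browser for every request; returns where to send it.
function FindProxyForURL(url, host) {
    // Host names without a dot (local names) are fetched directly.
    if (isPlainHostName(host)) {
        return "DIRECT";
    }
    // Everything else goes through the proxy, falling back to a direct
    // connection if the proxy does not respond.
    return "PROXY pc01.ceenet.ceu.hu:3128; DIRECT";
}
EOF

echo "wrote $PAC_FILE"
```

The browser calls FindProxyForURL for each request, and the returned string tells it whether to use a proxy or a direct connection.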

Netscape

Use the Edit->Preferences->Advanced->Proxy menu.

Select the automatic configuration option.

Put in the URL of the file which contains the configuration program. Set this to

http://pc<nn>.ceenet.ceu.hu:80/viaprox.pac

Here <nn> refers to the machine number (e.g. for pc01 the host name is pc01.ceenet.ceu.hu).

Confirm with OK.

The web browser must be stopped at this point.

Now you should copy the configuration file /server/archive/viaprox.pac to the appropriate directory:

cp /server/archive/viaprox.pac /server/httpd/htdocs/.

Before restarting the web browser, the web server (Apache) needs a small change. The web server needs to be reconfigured to map the .pac filename extension to the MIME type application/x-ns-proxy-autoconfig.

Change to the directory /server/httpd/conf

Edit the file httpd.conf (using an editor of your choice). Look for the part of the file which contains lines beginning with the word AddType,

and insert the line which follows:

AddType application/x-ns-proxy-autoconfig .pac

There is one more file to edit. The file /usr/local/squid/etc/mime.conf should be edited to include the line:

\.pac$ application/x-ns-proxy-autoconfig anthony-unknown-txt.gif ascii

 

Make sure that the web server and browser are stopped at this stage.

Now the server and the browser should be configured for operation using the autoconfiguration files.
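
Once the web server is running again, you may wish to check that it really serves the .pac file with the new MIME type. The sketch below is one possible check. It assumes that curl is installed, and the host name in the usage comment is only an example.

```shell
#!/bin/sh
# Sketch: check that a response declares the proxy autoconfig MIME type.
# ASSUMPTION: curl is available for the usage example; the host is an example.
EXPECTED_TYPE="application/x-ns-proxy-autoconfig"

# has_pac_type - read HTTP response headers on stdin;
# succeed if the Content-Type header names the expected MIME type
has_pac_type() {
    tr -d '\r' | grep -i '^content-type:' | grep -qi "$EXPECTED_TYPE"
}

# Usage (example host):
#   curl -sI http://pc01.ceenet.ceu.hu/viaprox.pac | has_pac_type \
#       && echo "MIME type is correct" || echo "MIME type is wrong"
```

If the check fails, look again at the AddType line in httpd.conf and make sure that the web server was restarted after the change.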

Comment about Exercises 4 and 5:

Exercises 4 and 5 are more complex, and something may go wrong with the configuration if the instructions are not followed very carefully, or if the system is in an unusual state. You are welcome to experiment with the configuration if this interests you, but doing so is likely to cause problems.

There are more than 300 parameters in the Squid configuration file, and often there are many possible values for each parameter. It is quite possible that you will encounter problems which no-one has met before. The laboratory assistants and lecturers will try to help you, but because of the vast number of possible states they may not succeed.

If this happens, it may be easier to start the exercise again, rather than to try to figure out what the problem is. Do not spend more than half an hour trying to solve such difficult problems – seek help, and if necessary back track and restart from an earlier point.

 

Exercise 4 – Cache Manager

  1. Create a directory for the cache manager CGI script.
  2. Now change directory as follows, then edit the srm.conf file to add:

ScriptAlias /Squid/cgi-bin/ /usr/local/squid/cgi-bin/

  3. Next, you should ensure that only specified workstations can access the cache manager.

That is done in your Apache httpd.conf file. At the bottom of the httpd.conf file insert:

<Location /Squid/cgi-bin/cachemgr.cgi>

# 193.225.202.<x> is your neighbour's IP address

order deny,allow

deny from all

allow from 193.225.202.<x>

</Location>

  4. When you have finished configuring your web server, stop it and then restart it.

  5. Connect to the cache manager from your neighbour's web browser, using a URL such as:

http://pc0<x>.ceenet.ceu.hu/Squid/cgi-bin/cachemgr.cgi

 

Exercise 5 – Cache Digests

A Cache Digest is a summary of the contents of an Internet Object Caching Server. It contains, in a compact format, an indication of whether or not particular URLs are in the cache. A lossy technique is used for compression, which means that very high compression factors can be achieved at the expense of not having 100% correct information.

Now change directory as follows then edit the squid configuration file squid.conf

mcast_groups pc0<x>.ceenet.ceu.hu # your neighbour computer name

cache_peer_domain pc0<x>.ceenet.ceu.hu .ceenet.ceu.hu

cache_peer pc0<x>.ceenet.ceu.hu sibling 3128 3130 digest-url=http://pc0<x>.ceenet.ceu.hu:3128/squid-internal-periodic/store_diges

neighbor_type_domain pc0<x>.ceenet.ceu.hu sibling .ceenet.ceu.hu

and insert the line which follows……

acl proxy-B src 193.225.202.<x>

# Remember that the IP address is your neighbour's IP address

and insert the line which follows……

http_access allow proxy-B

Fetching a Cache Digest

http://pc0<x>.ceenet.ceu.hu:3128/squid-internal-periodic/store_digest

In order to view the cache digest, you will need to set up a helper application within your web browser.

Alternatively, you can look at the log files in

/usr/local/squid/logs

You should then be able to detect that the new requests for the files were served from the remote cache, since the local cache uses the digest to detect the location of the cached files.
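
One way to look for such requests in the logs is sketched below. It assumes that access.log is written in Squid's native log format, in which the ninth field holds a hierarchy code and the peer address, and that requests resolved through a sibling's digest are marked with the code CD_SIBLING_HIT. The log file name is also an assumption, so check it against your squid.conf.

```shell
#!/bin/sh
# Sketch: count requests that Squid fetched from a sibling found via its digest.
# ASSUMPTIONS: access.log uses Squid's native format (field 9 is
# "hierarchy-code/peer") and lives in the logs directory named in this handout.
ACCESS_LOG="${ACCESS_LOG:-/usr/local/squid/logs/access.log}"

# digest_hits [LOGFILE] - print "count peer" for each peer that served digest hits
digest_hits() {
    awk '$9 ~ /^CD_SIBLING_HIT\// {
             split($9, parts, "/")
             hits[parts[2]]++
         }
         END { for (peer in hits) print hits[peer], peer }' "${1:-$ACCESS_LOG}"
}

# Usage:
#   digest_hits                      # uses $ACCESS_LOG
#   digest_hits /some/other/access.log
```

If nothing is printed, no request has yet been resolved through a digest. Digests are exchanged periodically, so it may take some time after start-up before any appear.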