Setting up the Internet Gateway and Web Services
From Colwiki.org
This module will enable you to understand network proxy servers, and to install and configure the Apache web server and the Squid proxy server.
Web Services
There are a number of web servers that can be used on the Linux platform; in this manual we use the Apache web server, whose daemon is also referred to as httpd. With a share of more than 70%, the Apache HTTP Server (Apache) is the world's most widely used web server according to the survey from http://www.netcraft.com/ . Apache, developed by the Apache Software Foundation (http://www.apache.org/), is available for most operating systems.
You can install Apache from the RPM or Debian packages that come with most distribution CDs if it is not already installed by default. You can use any of the automated installers that come with the distribution that you are using.
Apache is controlled by a series of configuration files: httpd.conf, access.conf, and srm.conf (there is also a mime.types file, but you have to deal with that only when you're adding or removing MIME types from your server, which shouldn't be too often). The files contain instructions, called directives, that tell Apache how to run. Several companies offer GUI-based Apache front ends, but it's easier to edit the configuration files by hand.
Use the chkconfig command to configure Apache to start at boot:
[root@linux]# chkconfig httpd on
Use the httpd init script in the /etc/init.d directory to start, stop, and restart Apache after booting:
[root@linux]# /etc/init.d/httpd start
[root@linux]# /etc/init.d/httpd stop
[root@linux]# /etc/init.d/httpd restart
You can test whether the Apache process is running with the pgrep command:
[root@linux]# pgrep httpd
You should get a response of plain old process ID numbers.
This process may differ between distributions. On Debian-based distributions, you type apache or apache2 instead of httpd.
The configuration file used by Apache is /etc/httpd/conf/httpd.conf in Red Hat / Fedora distributions and /etc/apache*/httpd.conf in Debian / Ubuntu distributions. As with most Linux applications, you must restart Apache before changes to this configuration file take effect. All the statements that define the features of each web site are grouped together inside their own <VirtualHost> section, or container, in the httpd.conf file. The most commonly used statements, or directives, inside a <VirtualHost> container are:
- ServerName: Defines the name of the website managed by the <VirtualHost> container. This is needed in named virtual hosting only, as explained later in this module.
- DocumentRoot: Defines the directory in which the web pages for the site can be found.
By default, Apache searches the DocumentRoot directory for an index, or home, page named index.html. For example, if you have a ServerName of www.my-site.com with a DocumentRoot directory of /home/www/site1/, Apache displays the contents of the file /home/www/site1/index.html when you enter http://www.my-site.com in your browser.
Some editors, such as Microsoft FrontPage, create files with an .htm extension, not .html. This isn't usually a problem if all your HTML files have hyperlinks pointing to files ending in .htm as FrontPage does. The problem occurs with Apache not recognizing the topmost index.htm page. The easiest solution is to create a symbolic link (known as a shortcut to Windows users) called index.html pointing to the file index.htm. This then enables you to edit or copy the file index.htm with index.html being updated automatically. You'll almost never have to worry about index.html and Apache again!
This example creates a symbolic link named index.html that points to index.htm in the /home/www/site1 directory.
[root@linux]# cd /home/www/site1
[root@linux]# ln -s index.htm index.html
[root@linux]# ll index.*
-rw-rw-r-- 1 root root 48590 Jun 18 23:43 index.htm
lrwxrwxrwx 1 root root 9 Jun 21 18:05 index.html -> index.htm
[root@linux]#
The l at the very beginning of the index.html entry signifies a link and the -> the link target.
The Default File Location
By default, Apache expects to find all its web page files in the /var/www/html/ directory with a generic DocumentRoot statement at the beginning of httpd.conf. The examples in this chapter use the /home/www directory to illustrate how you can place them in other locations successfully.
File Permissions and Apache
Apache will display Web page files as long as they are world readable. Make sure that all the files and subdirectories in your DocumentRoot have the correct permissions.
It is a good idea to have the files owned by a nonprivileged user so that Web developers can update the files using FTP or SCP without requiring the root password.
To do this:
- Create a user with a home directory of /home/www.
- Recursively change the ownership of the /home/www directory and all its subdirectories.
- Change the permissions on the /home/www directory to 755, which allows all users, including Apache's httpd daemon, to read the files inside.
[root@linux]# useradd -g users www
[root@linux]# chown -R www:users /home/www
[root@linux]# chmod 755 /home/www
Now we test for the new ownership with the ll command.
[root@linux]# ll /home/www/site1/index.*
-rw-rw-r-- 1 www users 48590 Jun 25 23:43 index.htm
lrwxrwxrwx 1 www users 9 Jun 25 18:05 index.html -> index.htm
[root@linux]#
Note: Be sure to FTP or SCP new files to your web server as this new user. This will make all the transferred files automatically have the correct ownership.
If you browse your Web site after configuring Apache and get a "403 Forbidden" permissions-related error on your screen, then your files or directories under your DocumentRoot most likely have incorrect permissions. You may also have to use the Directory directive to make Apache serve the pages once the file permissions have been correctly set. If you have your files in the default /var/www/html directory then this second step becomes unnecessary.
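When a "403 Forbidden" error appears, it can help to list everything under the DocumentRoot that lacks world-read permission. The following is a minimal sketch rather than part of the procedure above: it builds a throwaway directory tree (the file names and the use of mktemp are assumptions made so the example can run anywhere) and uses find to report files and directories that the httpd daemon could not read. On a real server, set docroot to your own DocumentRoot instead of creating sample files.

```shell
# Hypothetical scratch DocumentRoot so the sketch does not touch /home/www.
docroot=$(mktemp -d)
mkdir "$docroot/site1"
echo '<html>ok</html>'      > "$docroot/site1/index.htm"
echo '<html>private</html>' > "$docroot/site1/draft.htm"
chmod 755 "$docroot" "$docroot/site1"
chmod 644 "$docroot/site1/index.htm"
chmod 600 "$docroot/site1/draft.htm"   # not world readable, so Apache would refuse it

# Files that lack the world-read bit.
unreadable_files=$(find "$docroot" -type f ! -perm -o=r)
# Directories that lack world read or execute (execute is needed to enter them).
unreadable_dirs=$(find "$docroot" -type d ! -perm -o=rx)

echo "Files the web server cannot read:"
echo "$unreadable_files"
echo "Directories the web server cannot enter:"
echo "$unreadable_dirs"
```

Here only draft.htm is reported; running chmod 644 on it would clear the list.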
Named Virtual Hosting
You can make your Web server host more than one site per IP address by using Apache's named virtual hosting feature. You use the NameVirtualHost directive in the /etc/httpd/conf/httpd.conf file to tell Apache which IP addresses will participate in this feature. The <VirtualHost> containers in the file then tell Apache where it should look for the Web pages used on each Web site. You must specify the IP address for which each <VirtualHost> container applies.
Named Virtual Hosting Example
Consider an example in which the server is configured to provide content on 97.158.253.26. In the code that follows, notice that within each <VirtualHost> container you specify the primary Web site domain name for that IP address with the ServerName directive. The DocumentRoot directive defines the directory that contains the index page for that site. You can also list secondary domain names that will serve the same content as the primary ServerName using the ServerAlias directive. Apache searches for a perfect match of NameVirtualHost, <VirtualHost>, and ServerName when making a decision as to which content to send to the remote user's Web browser. If there is no match, then Apache uses the first <VirtualHost> in the list that matches the target IP address of the request. This is why the first <VirtualHost> statement contains an asterisk: to indicate it should be used for all other Web queries.
NameVirtualHost 97.158.253.26
<VirtualHost *>
    Default Directives. (In other words, not site #1 or site #2)
</VirtualHost>
<VirtualHost 97.158.253.26>
    ServerName www.my-site.com
    Directives for site #1
</VirtualHost>
<VirtualHost 97.158.253.26>
    ServerName www.another-site.com
    Directives for site #2
</VirtualHost>
Be careful with using the asterisk in other containers. A <VirtualHost> with a specific IP address always gets higher priority than a <VirtualHost> statement with an * intended to cover the same IP address, even if the ServerName directive doesn't match. To get consistent results, try to limit the use of your <VirtualHost *> statements to the beginning of the list to cover any other IP addresses your server may have. You can also have multiple NameVirtualHost directives, each with a single IP address, in cases where your Web server has more than one IP address.
IP-Based Virtual Hosting
The other virtual hosting option is to have one IP address per Web site, which is also known as IP-based virtual hosting. In this case, you will not have a NameVirtualHost directive for the IP address, and you must have only a single <VirtualHost> container per IP address. Also, because there is only one Web site per IP address, the ServerName directive isn't needed in each <VirtualHost> container, unlike in named virtual hosting.
IP Virtual Hosting Example: Single Wild Card
In this example, Apache listens on all interfaces, but gives the same content. Apache displays the content in the first <VirtualHost *> directive even if you add another right after it. Apache also seems to enforce the single <VirtualHost> container per IP address requirement by ignoring any ServerName directives you may use inside it.
<VirtualHost *>
    DocumentRoot /home/www/site1
</VirtualHost>
IP Virtual Hosting Example: Wild Card and IP addresses
In this example, Apache listens on all interfaces, but gives different content for addresses 97.158.253.26 and 97.158.253.27. Web surfers get the site1 content if they try to access the web server on any of its other IP addresses.
<VirtualHost *>
    DocumentRoot /home/www/site1
</VirtualHost>
<VirtualHost 97.158.253.26>
    DocumentRoot /home/www/site2
</VirtualHost>
<VirtualHost 97.158.253.27>
    DocumentRoot /home/www/site3
</VirtualHost>
Because it makes configuration easier, system administrators commonly replace the IP address in the <VirtualHost> and NameVirtualHost directives with the * wildcard character to indicate all IP addresses.
If you installed Apache with support for secure HTTPS/SSL, which is used frequently in credit card and shopping cart Web pages, then wild cards won't work. The Apache SSL module demands at least one explicit <VirtualHost> directive for IP-based virtual hosting. When you use wild cards, Apache interprets it as an overlap of name-based and IP-based <VirtualHost> directives and gives error messages because it can't make up its mind about which method to use:
Starting httpd: [Sat Oct 12 21:21:49 2002] [error] VirtualHost _default_:443 -- mixing * ports and non-* ports with a NameVirtualHost address is not supported, proceeding with undefined results
If you then try to load any Web page on your web server, the browser will report an error as well.
Configuration - Multiple Sites And IP Addresses
To help you better understand the edits needed to configure the /etc/httpd/conf/httpd.conf file, I'll walk you through an example scenario. The parameters are:
- The web site's systems administrator previously created DNS entries for www.my-site.com, my-site.com, www.my-cool-site.com and www.default-site.com to map to the IP address 97.158.253.26 on this web server. The domain www.another-site.com is also configured to point to alias IP address 97.158.253.27. The administrator wants to be able to get to www.test-site.com on all the IP addresses.
- Traffic to www.my-site.com, my-site.com, and www.my-cool-site.com must get content from subdirectory site2. Hitting these URLs causes Apache to display the contents of file index.html in this directory.
- Traffic to www.test-site.com must get content from subdirectory site3.
- Named virtual hosting will be required for 97.158.253.26 as in this case we have a single IP address serving different content for a variety of domains. A NameVirtualHost directive for 97.158.253.26 is therefore required.
- Traffic going to www.another-site.com will get content from directory site4.
- All other domains pointing to this server that don't have a matching ServerName directive will get Web pages from the directory defined in the very first <VirtualHost> container: directory site1. Site www.default-site.com falls in this category.
Securing a Webserver using HTTPS
The Secure Sockets Layer (SSL) protocol allows any networked application to use encryption. This can be thought of as a process which wraps the socket, preparing it to use encryption at the application level. In the case of HTTPS, the server uses a pair of keys, public and private. The client uses the server's public key to encrypt the session key; the server then uses its private key to decrypt the session key for use.
The public key is published using certificates. A certificate contains the following information:
- Name and address, hostname, etc.
- Public key
- TTL
- (optional) ID + signature from a certificate authority (CA)
The certificate will be used to establish the authenticity of the server. A valid signature from a known CA is automatically recognised by the client's browser. With Mozilla, for example, these trusted CA certificates can be found by following the links: Edit -> Preferences -> Privacy & Security -> Certificates, then clicking on the “Manage Certificates” button and the Authorities tab.
On the other hand, communication would be too slow if the whole session were encrypted using public key encryption. Instead, once the authenticity of the server is established, the client generates a unique secret session key, which is encrypted using the server's public key found in the certificate. Once the server receives this session key, it can decrypt it using the private key associated with the certificate. From then on, the communication is encrypted and decrypted using this secret session key generated by the client.
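The session key exchange described above can be imitated by hand with the openssl command-line tool. This is only an illustrative sketch under stated assumptions: the openssl tool is installed, the file names are hypothetical, and a real SSL handshake also validates the certificate and derives the working keys in additional steps that are left out here.

```shell
# Scratch directory; every file name below is made up for the illustration.
workdir=$(mktemp -d)

# Server side: create an RSA key pair. The public half is what a
# certificate would publish.
openssl genrsa -out "$workdir/server.key" 2048 2>/dev/null
openssl rsa -in "$workdir/server.key" -pubout -out "$workdir/server.pub" 2>/dev/null

# Client side: generate a random session key and encrypt it with the
# server's public key.
openssl rand -hex 32 > "$workdir/session.key"
openssl pkeyutl -encrypt -pubin -inkey "$workdir/server.pub" \
    -in "$workdir/session.key" -out "$workdir/session.enc"

# Server side: recover the session key with the private key.
openssl pkeyutl -decrypt -inkey "$workdir/server.key" \
    -in "$workdir/session.enc" -out "$workdir/session.dec"

# Both sides should now hold the same secret.
if cmp -s "$workdir/session.key" "$workdir/session.dec"; then
    exchange=ok
else
    exchange=failed
fi
echo "session key exchange: $exchange"
```

Only the holder of server.key can recover the session key from session.enc, which is why the private key file must be kept secret.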
SSL Virtual Hosts
A separate Apache server can be used to listen on port 443 and implement SSL connections. However, most default configurations involve a single Apache server listening on both ports 80 and 443. For this, an additional Listen directive is set in httpd.conf asking the server to listen on port 443. Apache will then bind to both ports 443 and 80. Unencrypted connections are handled on port 80, while an SSL-aware virtual host is configured to listen on port 443:
<VirtualHost _default_:443>
    SSL CONFIGURATION
</VirtualHost>
The SSL CONFIGURATION lines are:
SSLEngine on
SSLCipherSuite ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP
SSLCertificateFile PATH_TO_FILE.crt
SSLCertificateKeyFile PATH_TO_FILE.key
We need to generate the server's private key (FILE.key) and certificate (FILE.crt) to complete this configuration.
Managing Certificates
The keys and certificates are usually kept in subdirectories of /etc/httpd/conf called ssl.crt and ssl.key. There should also be a Makefile that will generate both a KEY and a CERTIFICATE in PEM format which is base64 encoded data.
Using the Makefile
For example, to generate a self-signed certificate and private key, simply type:
make mysite.crt
The Makefile will generate both files mysite.key (the private key) as well as mysite.crt (the certificate file containing the public key). You can use the following directives in httpd.conf:
SSLCertificateFile ... mysite.crt
SSLCertificateKeyFile ... mysite.key
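If the Makefile is not available on your system, a self-signed pair can also be produced with openssl directly. The sketch below is an alternative to the Makefile, not a description of what it does internally. It assumes the openssl tool is installed, writes to a scratch directory, and uses www.my-site.com as an example subject name. It then checks that the key and the certificate belong together by comparing their RSA moduli.

```shell
certdir=$(mktemp -d)   # scratch directory; a server would use its ssl.key and ssl.crt directories

# Create a 2048-bit key without a passphrase and a certificate valid for 365 days.
# The subject name is only an example.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -subj "/CN=www.my-site.com" \
    -keyout "$certdir/mysite.key" -out "$certdir/mysite.crt" 2>/dev/null

# Show who the certificate was issued to and when it expires.
openssl x509 -in "$certdir/mysite.crt" -noout -subject -enddate

# The key and certificate match if their RSA moduli are identical.
cert_modulus=$(openssl x509 -in "$certdir/mysite.crt" -noout -modulus)
key_modulus=$(openssl rsa -in "$certdir/mysite.key" -noout -modulus 2>/dev/null)
if [ "$cert_modulus" = "$key_modulus" ]; then
    pair_status=match
else
    pair_status=mismatch
fi
echo "key and certificate: $pair_status"
```

A mismatch here is a common reason for Apache refusing to start after the SSLCertificateFile and SSLCertificateKeyFile directives have been edited.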
Certificate Requests
On a production server you would need to generate a new file called a “certificate request” with:
openssl req -new -key mysite.key -out mysite.csr
This file can be sent to a certificate authority (CA) to be signed. The certificate authority will send back the signed certificate.
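Before sending the request to a CA, it is worth checking what it contains. The following sketch assumes the openssl tool is installed; it creates a stand-in key in a scratch directory, supplies an example subject with -subj so that no interactive questions are asked, and then prints the subject and verifies the request's own signature. The organisation and host names are placeholders.

```shell
csrdir=$(mktemp -d)    # scratch directory for the example

# A stand-in for an existing mysite.key.
openssl genrsa -out "$csrdir/mysite.key" 2048 2>/dev/null

# Create the request without interactive prompts; the subject is an example value.
openssl req -new -key "$csrdir/mysite.key" -out "$csrdir/mysite.csr" \
    -subj "/C=US/O=Example Org/CN=www.my-site.com"

# Print the subject, then check the request's self-signature.
csr_subject=$(openssl req -in "$csrdir/mysite.csr" -noout -subject)
echo "$csr_subject"
openssl req -in "$csrdir/mysite.csr" -noout -verify
```

The CN value should be the host name that browsers will use to reach the site, since that is the name the signed certificate will vouch for.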
Pass Phrases
A private key can be generated with or without a passphrase, and a private key without a passphrase can be constructed from an existing private key.
A passphrased file: If a private key has a passphrase set then the file starts with
-----BEGIN RSA PRIVATE KEY-----
Proc-Type: 4,ENCRYPTED
DEK-Info: DES-EDE3-CBC,
---- snip ----
This means that the file is protected by a passphrase using 3DES. It was generated by the line /usr/bin/openssl genrsa -des3 1024 > $@ in the Makefile. If the -des3 flag is omitted, no passphrase is set. You can generate a new private key (mysite-nopass.key) without a passphrase from the old private key (mysite.key) as follows:
openssl rsa -in mysite.key -out mysite-nopass.key
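To tell the two kinds of key file apart without opening them in an editor, you can look for the word ENCRYPTED in the first lines of the file. The sketch below is self-contained and assumes the openssl tool is installed; the passphrase example-pass and the helper name has_passphrase are placeholders, and passing a passphrase on the command line is done here only so the example runs without prompting.

```shell
keydir=$(mktemp -d)    # scratch directory for the example

# Create a passphrase-protected key; "example-pass" is a placeholder.
openssl genrsa -des3 -passout pass:example-pass \
    -out "$keydir/mysite.key" 2048 2>/dev/null

# Write a copy without the passphrase, as in the command above.
openssl rsa -in "$keydir/mysite.key" -passin pass:example-pass \
    -out "$keydir/mysite-nopass.key" 2>/dev/null
chmod 600 "$keydir/mysite-nopass.key"   # an unprotected key should be readable by its owner only

# A protected PEM key mentions ENCRYPTED in its first two lines.
has_passphrase() {
    head -n 2 "$1" | grep -q 'ENCRYPTED'
}

for key in "$keydir/mysite.key" "$keydir/mysite-nopass.key"; do
    if has_passphrase "$key"; then
        echo "$key: passphrase protected"
    else
        echo "$key: no passphrase"
    fi
done
```

A key without a passphrase lets Apache start unattended at boot, at the cost of relying entirely on file permissions to protect the key.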
Reading: From the SuSE Linux Enterprise Server Installation and Administration document, read pages 741 – 778 on Apache implementation on SUSE Linux.
Internet Gateway
An internet gateway is useful in two ways:
- Reduce Internet bandwidth charges
- Limit access to the Web to only authorized users.
On Linux, both goals can be achieved fairly easily with the Squid web caching proxy server. Users configure their web browsers to use the Squid proxy server instead of going to the web directly. The Squid server then checks its web cache for the web information requested by the user. It returns any matching information that it finds in its cache; if there is none, it goes to the web to find it on behalf of the user. Once it finds the information, it populates its cache with it and also forwards it to the user's web browser.
This reduces the amount of data accessed from the web. Another advantage is that you can configure your firewall to accept HTTP web traffic only from the Squid server and no one else. Squid can then be configured to request a username and password from each user that uses its services. This provides simple access control to the Internet.
Squid can be installed from the packages provided by your distribution, using its package manager. To start Squid at boot, use the chkconfig command:
[root@linux]# chkconfig squid on
Use the service command to start, stop, and restart Squid after booting:
[root@linux]# service squid start
[root@linux]# service squid stop
[root@linux]# service squid restart
You can test whether the Squid process is running with the pgrep command:
[root@linux]# pgrep squid
You should get a response of plain old process ID numbers.
The /etc/squid/squid.conf File
The main Squid configuration file is squid.conf and, like most Linux applications, Squid needs to be restarted before changes to the configuration file take effect.
The Visible Host Name
Squid will fail to start if you don't give your server a hostname. You can set this with the visible_hostname parameter. Here, the hostname is set to linux, the real name of the server.
visible_hostname linux
Access Control Lists
You can limit users' ability to browse the Internet with access control lists (ACLs). Each ACL line defines a particular type of activity, such as an access time or source network. ACLs are then linked to an http_access statement that tells Squid whether to deny or allow traffic that matches the ACL.
Squid matches each Web access request it receives by checking the http_access list from top to bottom. If it finds a match, it enforces the allow or deny statement and stops reading further. You have to be careful not to place a deny statement in the list that blocks a similar allow statement below it. The final http_access statement denies everything, so it is best to place new http_access statements above it.
Note: The very last http_access statement in the squid.conf file denies all access. You therefore have to add your specific permit statements above this line. In the chapter's examples, I've suggested that you place your statements at the top of the http_access list for the sake of manageability, but you can put them anywhere in the section above that last line.
Squid has a minimum required set of ACL statements in the ACCESS_CONTROL section of the squid.conf file. It is best to put new customized entries right after this list to make the file easier to read.
Restricting Web Access By Time
You can create access control lists with time parameters. For example, you can allow only business hour access from the home network, while always restricting access to host 192.168.1.23.
#
# Add this to the bottom of the ACL section of squid.conf
#
acl home_network src 192.168.1.0/24
acl business_hours time M T W H F 9:00-17:00
acl RestrictedHost src 192.168.1.23
#
# Add this at the top of the http_access section of squid.conf
#
http_access deny RestrictedHost
http_access allow home_network business_hours
Or, you can allow morning access only:
#
# Add this to the bottom of the ACL section of squid.conf
#
acl mornings time 08:00-12:00
#
# Add this at the top of the http_access section of squid.conf
#
http_access allow mornings
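The top-to-bottom, first-match evaluation of http_access lines can be imitated with a few lines of shell. This is a toy model for reasoning about rule order, not how Squid is implemented: the decide function is hypothetical, the rule list copies the business-hours example from this section, and the caller states by hand which ACL names a request matches. All ACLs named on one line must match for that line to apply.

```shell
# decide "<ACL names the request matches>" prints allow or deny.
# The special name "all" is added automatically because it matches every request.
decide() {
    matched=" $1 all "
    while read -r action acls; do
        hit=yes
        for acl in $acls; do
            case "$matched" in
                *" $acl "*) ;;      # this ACL matches, keep checking the line
                *) hit=no ;;        # one ACL fails, so the whole line fails
            esac
        done
        if [ "$hit" = yes ]; then
            echo "$action"          # the first matching line decides
            return 0
        fi
    done <<'EOF'
deny RestrictedHost
allow home_network business_hours
deny all
EOF
}

decide "home_network business_hours"
decide "RestrictedHost home_network business_hours"
decide "home_network"
```

The three calls print allow, deny and deny: the restricted host is stopped by the first line even during business hours, and a home network request outside business hours falls through to the final deny.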
Restricting Access to specific Web sites
Squid is also capable of reading files containing lists of web sites and/or domains for use in ACLs. In this example we create two lists in files named /usr/local/etc/allowed-sites.squid and /usr/local/etc/restricted-sites.squid.
# File: /usr/local/etc/allowed-sites.squid
www.col.org
www.yahoo.com

# File: /usr/local/etc/restricted-sites.squid
www.porn.com
illegal.com
These can then be used to always block the restricted sites and permit the allowed sites during working hours. This can be illustrated by expanding our previous example slightly.
#
# Add this to the bottom of the ACL section of squid.conf
#
acl home_network src 192.168.1.0/24
acl business_hours time M T W H F 9:00-17:00
acl GoodSites dstdomain "/usr/local/etc/allowed-sites.squid"
acl BadSites dstdomain "/usr/local/etc/restricted-sites.squid"
#
# Add this at the top of the http_access section of squid.conf
#
http_access deny BadSites
http_access allow home_network business_hours GoodSites
Restricting Web Access By IP Address
You can create an access control list that restricts Web access to users on certain networks. In this case, it's an ACL that defines a home network of 192.168.1.0.
#
# Add this to the bottom of the ACL section of squid.conf
#
acl home_network src 192.168.1.0/255.255.255.0
You also have to add a corresponding http_access statement that allows traffic that matches the ACL:
#
# Add this at the top of the http_access section of squid.conf
#
http_access allow home_network
Password Authentication Using NCSA
You can configure Squid to prompt users for a username and password. Squid comes with a program called ncsa_auth that reads any NCSA-compliant encrypted password file. You can use the htpasswd program that comes installed with Apache to create your passwords. Here is how it's done:
1) Create the password file. The name of the password file should be /etc/squid/squid_passwd, and you need to make sure that it's universally readable.
[root@linux]# touch /etc/squid/squid_passwd
[root@linux]# chmod o+r /etc/squid/squid_passwd
2) Use the htpasswd program to add users to the password file. You can add users at any time without having to restart Squid. In this case, you add a username called www:
[root@linux]# htpasswd /etc/squid/squid_passwd www
New password:
Re-type new password:
Adding password for user www
[root@linux]#
3) Find your ncsa_auth file using the locate command.
[root@linux]# locate ncsa_auth
/usr/lib/squid/ncsa_auth
[root@linux]#
4) Edit squid.conf; specifically, you need to define the authentication program in squid.conf, which is in this case ncsa_auth. Next, create an ACL named ncsa_users with the REQUIRED keyword that forces Squid to use the NCSA auth_param method you defined previously. Finally, create an http_access entry that allows traffic that matches the ncsa_users ACL entry. Here's a simple user authentication example; the order of the statements is important:
#
# Add this to the auth_param section of squid.conf
#
auth_param basic program /usr/lib/squid/ncsa_auth /etc/squid/squid_passwd
#
# Add this to the bottom of the ACL section of squid.conf
#
acl ncsa_users proxy_auth REQUIRED
#
# Add this at the top of the http_access section of squid.conf
#
http_access allow ncsa_users
5) To require password authentication and also allow access only during business hours, combine the two ACLs as follows. Once again, the order of the statements is important:
#
# Add this to the auth_param section of squid.conf
#
auth_param basic program /usr/lib/squid/ncsa_auth /etc/squid/squid_passwd
#
# Add this to the bottom of the ACL section of squid.conf
#
acl ncsa_users proxy_auth REQUIRED
acl business_hours time M T W H F 9:00-17:00
#
# Add this at the top of the http_access section of squid.conf
#
http_access allow ncsa_users business_hours
Remember to restart Squid for the changes to take effect.
Forcing Users To Use Your Squid Server
If you are using access controls on Squid, you may also want to configure your firewall to allow HTTP Internet access only from the Squid server. This forces your users to browse the Web through the Squid proxy.
Making Your Squid Server Transparent To Users
It is possible to limit HTTP Internet access to only the Squid server without having to modify the browser settings on your client PCs. This is called a transparent proxy configuration. It is usually achieved by configuring a firewall between the client PCs and the Internet to redirect all HTTP (TCP port 80) traffic to the Squid server on TCP port 3128, which is the Squid server's default TCP port.
Squid Transparent Proxy Configuration
Your first step will be to modify your squid.conf to create a transparent proxy. The procedure is different depending on your version of Squid.
Prior to version 2.6: In older versions of Squid, transparent proxy was achieved through the use of the httpd_accel options, which were originally developed for HTTP acceleration. In these cases, the configuration syntax would be as follows:
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
Version 2.6 and Beyond: Newer versions of Squid simply require you to add the word "transparent" to the default "http_port 3128" statement. In this example, Squid not only listens on TCP port 3128 for proxy connections, but will also do so in transparent mode.
http_port 3128 transparent
Configuring iptables to Support the Squid Transparent Proxy
The examples below are based on the discussion of Linux iptables in Chapter 14, "Linux Firewalls Using iptables". Additional commands may be necessary for your particular network topology.
In both cases below, the firewall is connected to the Internet on interface eth0 and to the home network on interface eth1. The firewall is also the default gateway for the home network and handles network address translation on all the network's traffic to the Internet. Only the Squid server has access to the Internet on port 80 (HTTP), because all HTTP traffic, except that coming from the Squid server, is redirected.
If the Squid server and firewall are the same server, all HTTP traffic from the home network is redirected to the firewall itself on the Squid port of 3128 and then only the firewall itself is allowed to access the Internet on port 80.
iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 \
-j REDIRECT --to-port 3128
iptables -A INPUT -j ACCEPT -m state \
--state NEW,ESTABLISHED,RELATED -i eth1 -p tcp \
--dport 3128
iptables -A OUTPUT -j ACCEPT -m state \
--state NEW,ESTABLISHED,RELATED -o eth0 -p tcp \
--dport 80
iptables -A INPUT -j ACCEPT -m state \
--state ESTABLISHED,RELATED -i eth0 -p tcp \
--sport 80
iptables -A OUTPUT -j ACCEPT -m state \
--state ESTABLISHED,RELATED -o eth1 -p tcp \
--sport 80
Note: This example is specific to HTTP traffic. You won't be able to adapt this example to support HTTPS web browsing on TCP port 443, as that protocol specifically doesn't allow the insertion of a "man in the middle" server for security purposes. One solution is to add IP masquerading statements for port 443, or any other important traffic, immediately after the code snippet. This will allow non HTTP traffic to access the Internet without being cached by Squid.
If the Squid server and firewall are different servers, the statements are different. You need to set up iptables so that all connections to the Web, not originating from the Squid server, are actually converted into three connections; one from the Web browser client to the firewall and another from the firewall to the Squid server, which triggers the Squid server to make its own connection to the Web to service the request. The Squid server then gets the data and replies to the firewall which then relays this information to the Web browser client. The iptables program does all this using these NAT statements:
iptables -t nat -A PREROUTING -i eth1 -s ! 192.168.1.100 \
-p tcp --dport 80 -j DNAT --to 192.168.1.100:3128
iptables -t nat -A POSTROUTING -o eth1 -s 192.168.1.0/24 \
-d 192.168.1.100 -j SNAT --to 192.168.1.1
iptables -A FORWARD -s 192.168.1.0/24 -d 192.168.1.100 \
-i eth1 -o eth1 -m state \
--state NEW,ESTABLISHED,RELATED \
-p tcp --dport 3128 -j ACCEPT
iptables -A FORWARD -d 192.168.1.0/24 -s 192.168.1.100 \
-i eth1 -o eth1 -m state --state ESTABLISHED,RELATED \
-p tcp --sport 3128 -j ACCEPT
In the first statement all HTTP traffic from the home network except from the Squid server at IP address 192.168.1.100 is redirected to the Squid server on port 3128 using destination NAT. The second statement makes this redirected traffic also undergo source NAT to make it appear as if it is coming from the firewall itself. The FORWARD statements are used to ensure the traffic is allowed to flow to the Squid server after the NAT process is complete. The unusual feature is that the NAT all takes place on one interface; that of the home network (eth1).
Squid Log File Analysis Software
Most log analysis tools available for Squid are listed on the following site: http://www.squid-cache.org/Scripts/ The main log file for Squid is /var/log/squid/access.log. Next is a short overview of calamaris and Webalizer. Note also that Webmin produces log reports based on calamaris.
Calamaris
The code is GPL and can be downloaded from http://cord.de/tools/squid/calamaris. You can generate reports as follows:
cat /var/log/squid/access.log | calamaris
There are many more switches available (check the man pages for calamaris). There are also a number of scripts that can run hourly or monthly reports; these scripts are included in the EXAMPLES file distributed with calamaris. To get information on web page requests per host, use the -R switch:
calamaris -R 5 /var/log/squid/access.log
# Incoming TCP-requests by host
host / target                       request  hit-%     Byte  hit-%  sec  kB/sec
--------------------------------- --------- ------ -------- ------ ---- -------
192.168.2.103                            72   0.00   323336   0.00    0   10.24
 *.redhat.com                            35   0.00   126726   0.00    0   10.44
 *.suse.co.uk                            20   0.00    63503   0.00    0   13.15
 *.lemonde.fr                             6   0.00   109712   0.00    1   16.39
 207.36.15.*                              5   0.00     8946   0.00    0    3.94
 *.akamai.net                             4   0.00    12428   0.00    1    4.43
 other: 2 requested urlhosts              2   0.00     2021   0.00    1    0.71
192.168.2.101                            63   0.00   295315   0.00    1    4.65
 cord.de                                 17   0.00   115787   0.00    0   20.86
 *.doubleclick.net                       13   0.00    26163   0.00    1    2.07
 *.google.com                            10   0.00    30646   0.00    1    3.71
 *.squid-cache.org                        8   0.00    51758   0.00    1    6.53
 <error>                                  4   0.00     4290   0.00    0   10474
 other: 6 requested urlhosts             11   0.00    66671   0.00    5    2.28
--------------------------------- --------- ------ -------- ------ ---- -------
Sum                                     135   0.00   618651   0.00    1    6.51
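For a quick look without installing a report tool, a similar per-client summary can be produced with awk. The sketch below is not a replacement for calamaris. It assumes the log is in Squid's native access.log format, where the third field is the client address and the fifth is the reply size in bytes, and it runs on a few made-up sample lines written to a temporary file.

```shell
log=$(mktemp)   # made-up sample data; on a real proxy use /var/log/squid/access.log
cat > "$log" <<'EOF'
1066036250.168    310 192.168.2.103 TCP_MISS/200 3336 GET http://www.redhat.com/ - DIRECT/www.redhat.com text/html
1066036251.402    120 192.168.2.103 TCP_MISS/200 1200 GET http://www.suse.co.uk/ - DIRECT/www.suse.co.uk text/html
1066036252.977    450 192.168.2.101 TCP_MISS/200 5787 GET http://cord.de/ - DIRECT/cord.de text/html
1066036254.015     95 192.168.2.103 TCP_MISS/200 464 GET http://www.redhat.com/logo.png - DIRECT/www.redhat.com image/png
EOF

# Count requests and add up bytes per client address.
summary=$(awk '{ requests[$3]++; bytes[$3] += $5 }
               END { for (host in requests) print host, requests[host], bytes[host] }' "$log" | sort)
echo "$summary"
```

For the sample lines this prints one row per client with its request count and total bytes: 192.168.2.101 with 1 request and 5787 bytes, and 192.168.2.103 with 3 requests and 5000 bytes.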
Webalizer
This tool is installed by default on some Linux distributions. It is also GPL'ed and can be downloaded from http://www.mrunix.net/webalizer/. By editing the /etc/webalizer.conf file, one can choose between Apache access logs, FTP transfer logs or Squid logs. Example graphics generated with Webalizer.
Reading: From the SuSE Linux Enterprise Server Installation and Administration document, read pages 781 – 800 on Squid Proxy and SquidGuard implementation on SUSE Linux.
In this module you learned how to set up a Linux web server and proxy server. These are common functions that you, as the Linux system administrator, will be required to perform from time to time.
Study the following two (2) articles and follow the instructions in them to set up a Linux-based proxy server.
This work is licensed under a Creative Commons Attribution-ShareAlike License.


