Setting up the Internet Gateway and Web Services

From Colwiki.org


Outcomes

This module will enable you to understand network proxy servers, install and configure the Squid proxy server, and:
  • Configure the Squid proxy server to offer controlled, authenticated and validated Internet gateway access
  • Install and configure the Apache web server
  • Carry out advanced Apache web server configuration and integration with databases
  • Tune Apache web server performance



Terminologies

  • Proxy Server: A proxy server is a server (a computer system or an application program) that acts as a go-between for requests from clients seeking resources from other servers. A client connects to the proxy server, requesting some service, such as a file, connection, web page, or other resource, available from a different server. The proxy server evaluates the request according to its filtering rules.
  • Web Server: A computer program that is responsible for accepting HTTP requests from clients (user agents such as web browsers), and serving them HTTP responses along with optional data contents, which usually are web pages such as HTML documents and linked objects (images, etc.).



Contents

Web Services

A number of web servers can be used on the Linux platform; in this manual we will use the Apache web server, also referred to as httpd. With a share of more than 70%, the Apache HTTP Server (Apache) is the world's most widely used web server according to the survey from http://www.netcraft.com/ . Apache, developed by the Apache Software Foundation (http://www.apache.org/), is available for most operating systems.

If Apache is not already installed by default, you can install it from the RPM or Debian packages that come with most distribution CDs. You can use any of the automated installers that come with the distribution that you are using.

Apache is controlled by a series of configuration files: httpd.conf, access.conf, and srm.conf (there's actually also a mime.types file, but you have to deal with that only when you're adding or removing MIME types from your server, which shouldn't be too often). The files contain instructions, called directives, that tell Apache how to run. Several companies offer GUI-based Apache front ends, but it's easier to edit the configuration files by hand.

Use the chkconfig command to configure Apache to start at boot:

[root@linux]# chkconfig httpd on

Use the httpd init script in the /etc/init.d directory to start, stop, and restart Apache after booting:

[root@linux]# /etc/init.d/httpd start
[root@linux]# /etc/init.d/httpd stop
[root@linux]# /etc/init.d/httpd restart

You can test whether the Apache process is running with:

[root@linux]# pgrep httpd

You should get a response of plain old process ID numbers.

This process may differ between distributions. On Debian-based distributions, you will type apache or apache2 instead of httpd.

The configuration file used by Apache is /etc/httpd/conf/httpd.conf in Red Hat / Fedora distributions and /etc/apache*/httpd.conf in Debian / Ubuntu distributions. As with most Linux applications, you must restart Apache before changes to this configuration file take effect. All the statements that define the features of each web site are grouped together inside their own <VirtualHost> section, or container, in the httpd.conf file. The most commonly used statements, or directives, inside a <VirtualHost> container are:

  • ServerName: Defines the name of the website managed by the <VirtualHost> container. This is needed in named virtual hosting only, as I'll explain soon.
  • DocumentRoot: Defines the directory in which the web pages for the site can be found.

By default, Apache searches the DocumentRoot directory for an index, or home, page named index.html. So for example, if you have a ServerName of www.my-site.com with a DocumentRoot directory of /home/www/site1/, Apache displays the contents of the file /home/www/site1/index.html when you enter http://www.my-site.com in your browser.

Some editors, such as Microsoft FrontPage, create files with an .htm extension, not .html. This isn't usually a problem if all your HTML files have hyperlinks pointing to files ending in .htm as FrontPage does. The problem occurs with Apache not recognizing the topmost index.htm page. The easiest solution is to create a symbolic link (known as a shortcut to Windows users) called index.html pointing to the file index.htm. This then enables you to edit or copy the file index.htm with index.html being updated automatically. You'll almost never have to worry about index.html and Apache again!

This example creates a symbolic link named index.html that points to index.htm in the /home/www/site1 directory.

[root@linux]# cd /home/www/site1
[root@linux]# ln -s index.htm index.html
[root@linux]# ll index.*
-rw-rw-r--    1 root     root        48590 Jun 18 23:43 index.htm
lrwxrwxrwx    1 root     root            9 Jun 21 18:05 index.html -> index.htm
[root@linux]#

The l at the very beginning of the index.html entry signifies a link and the -> the link target.

The Default File Location

By default, Apache expects to find all its web page files in the /var/www/html/ directory with a generic DocumentRoot statement at the beginning of httpd.conf. The examples in this chapter use the /home/www directory to illustrate how you can place them in other locations successfully.

File Permissions and Apache

Apache will display Web page files as long as they are world readable. You have to make sure that all the files and subdirectories in your DocumentRoot have the correct permissions.

It is a good idea to have the files owned by a nonprivileged user so that Web developers can update the files using FTP or SCP without requiring the root password.

To do this:

  • Create a user with a home directory of /home/www.
  • Recursively change the file ownership of the /home/www directory and all its subdirectories.
  • Change the permissions on the /home/www directory to 755, which allows all users, including Apache's httpd daemon, to read the files inside.

[root@linux]# useradd -g users www
[root@linux]# chown -R www:users /home/www
[root@linux]# chmod 755 /home/www

Now we test for the new ownership with the ll command.

[root@linux]# ll /home/www/site1/index.*
-rw-rw-r--    1 www     users       48590 Jun 25 23:43 index.htm
lrwxrwxrwx    1 www     users           9 Jun 25 18:05 index.html -> index.htm
[root@linux]#

Note: Be sure to FTP or SCP new files to your web server as this new user. This will make all the transferred files automatically have the correct ownership.

If you browse your Web site after configuring Apache and get a "403 Forbidden" permissions-related error on your screen, then your files or directories under your DocumentRoot most likely have incorrect permissions. You may also have to use the Directory directive to make Apache serve the pages once the file permissions have been correctly set. If you have your files in the default /var/www/html directory, this second step is unnecessary.
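
One way to track down a permissions problem is to list everything under the DocumentRoot that is not world readable. The sketch below is illustrative only: it builds a throwaway directory tree with mktemp so that it can be run safely, and the names docroot, unreadable, noaccess_dirs and private.htm are assumptions made for the example. On a real server you would point docroot at /home/www instead.

```shell
#!/bin/sh
# Build a scratch DocumentRoot so the check can be tried without touching a real site.
docroot=$(mktemp -d)
mkdir -p "$docroot/site1"
echo '<html><body>home</body></html>' > "$docroot/site1/index.htm"
echo 'draft' > "$docroot/site1/private.htm"

chmod 755 "$docroot" "$docroot/site1"
chmod 644 "$docroot/site1/index.htm"      # world readable
chmod 600 "$docroot/site1/private.htm"    # owner only (hypothetical problem file)

# Files and directories that lack the "other read" bit (octal 004).
unreadable=$(find "$docroot" ! -perm -004 -print)

# Directories also need the "other execute" bit so they can be entered (octal 005).
noaccess_dirs=$(find "$docroot" -type d ! -perm -005 -print)

echo "Not world readable:"
echo "$unreadable"
echo "Directories that cannot be entered by other users:"
echo "$noaccess_dirs"
```

Anything printed by the first find command is a candidate for chmod o+r; directories printed by the second need chmod o+rx.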

Named Virtual Hosting

You can make your Web server host more than one site per IP address by using Apache's named virtual hosting feature. You use the NameVirtualHost directive in the /etc/httpd/conf/httpd.conf file to tell Apache which IP addresses will participate in this feature. The <VirtualHost> containers in the file then tell Apache where it should look for the Web pages used on each Web site. You must specify the IP address for which each <VirtualHost> container applies.

Named Virtual Hosting Example

Consider an example in which the server is configured to provide content on 97.158.253.26. In the code that follows, notice that within each <VirtualHost> container you specify the primary Web site domain name for that IP address with the ServerName directive. The DocumentRoot directive defines the directory that contains the index page for that site. You can also list secondary domain names that will serve the same content as the primary ServerName using the ServerAlias directive. Apache searches for a perfect match of NameVirtualHost, <VirtualHost>, and ServerName when making a decision as to which content to send to the remote user's Web browser. If there is no match, then Apache uses the first <VirtualHost> in the list that matches the target IP address of the request. This is why the first <VirtualHost> statement contains an asterisk: to indicate it should be used for all other Web queries.

NameVirtualHost 97.158.253.26
<VirtualHost *>
  # Default directives (in other words, not site #1 or site #2)
</VirtualHost>
<VirtualHost 97.158.253.26>
  ServerName www.my-site.com
  # Directives for site #1
</VirtualHost>
<VirtualHost 97.158.253.26>
  ServerName www.another-site.com
  # Directives for site #2
</VirtualHost>

Be careful with using the asterisk in other containers. A <VirtualHost> with a specific IP address always gets higher priority than a <VirtualHost> statement with an * intended to cover the same IP address, even if the ServerName directive doesn't match. To get consistent results, try to limit the use of your <VirtualHost *> statements to the beginning of the list to cover any other IP addresses your server may have. You can also have multiple NameVirtualHost directives, each with a single IP address, in cases where your Web server has more than one IP address.

IP-Based Virtual Hosting

The other virtual hosting option is to have one IP address per Web site, which is also known as IP-based virtual hosting. In this case, you will not have a NameVirtualHost directive for the IP address, and you must only have a single <VirtualHost> container per IP address. Also, because there is only one Web site per IP address, the ServerName directive isn't needed in each <VirtualHost> container, unlike in named virtual hosting.

IP Virtual Hosting Example: Single Wild Card

In this example, Apache listens on all interfaces, but gives the same content. Apache displays the content in the first <VirtualHost *> directive even if you add another right after it. Apache also seems to enforce the single <VirtualHost> container per IP address requirement by ignoring any ServerName directives you may use inside it.

<VirtualHost *>
  DocumentRoot /home/www/site1
</VirtualHost>

IP Virtual Hosting Example: Wild Card and IP addresses

In this example, Apache listens on all interfaces, but gives different content for addresses 97.158.253.26 and 97.158.253.27. Web surfers get the site1 content if they try to access the web server on any of its other IP addresses.

<VirtualHost *>
  DocumentRoot /home/www/site1
</VirtualHost>
<VirtualHost 97.158.253.26>
  DocumentRoot /home/www/site2
</VirtualHost>
<VirtualHost 97.158.253.27>
  DocumentRoot /home/www/site3
</VirtualHost>

Because it makes configuration easier, system administrators commonly replace the IP address in the <VirtualHost> and NameVirtualHost directives with the * wildcard character to indicate all IP addresses.

If you installed Apache with support for secure HTTPS/SSL, which is used frequently in credit card and shopping cart Web pages, then wild cards won't work. The Apache SSL module demands at least one explicit <VirtualHost> directive for IP-based virtual hosting. When you use wild cards, Apache interprets it as an overlap of name-based and IP-based <VirtualHost> directives and gives error messages because it can't make up its mind about which method to use:

Starting httpd: [Sat Oct 12 21:21:49 2002] [error] VirtualHost _default_:443 -- mixing * ports and non-* ports with a NameVirtualHost address is not supported, proceeding with undefined results

If you then try to load any Web page from your web server, the browser displays an error instead of the page.

Configuration - Multiple Sites And IP Addresses

To help you better understand the edits needed to configure the /etc/httpd/conf/httpd.conf file, I'll walk you through an example scenario. The parameters are:

  1. The web site's systems administrator previously created DNS entries for www.my-site.com, my-site.com, www.my-cool-site.com and www.default-site.com to map to the IP address 97.158.253.26 on this web server. The domain www.another-site.com is also configured to point to alias IP address 97.158.253.27. The administrator wants to be able to get to www.test-site.com on all the IP addresses.
  2. Traffic to www.my-site.com, my-site.com, and www.my-cool-site.com must get content from subdirectory site2. Hitting these URLs causes Apache to display the contents of file index.html in this directory.
  3. Traffic to www.test-site.com must get content from subdirectory site3.
  4. Named virtual hosting will be required for 97.158.253.26 as in this case we have a single IP address serving different content for a variety of domains. A NameVirtualHost directive for 97.158.253.26 is therefore required.
  5. Traffic going to www.another-site.com will get content from directory site4.
  6. All other domains pointing to this server that don't have a matching ServerName directive will get Web pages from the directory defined in the very first <VirtualHost> container: directory site1. Site www.default-site.com falls in this category.
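
The parameters above could translate into virtual host containers along the following lines. This is a sketch, not a definitive configuration: it writes to a scratch file created with mktemp rather than to the live httpd.conf, and it makes two assumptions of its own. First, the default container lists both addresses explicitly and comes first, relying on the rule described earlier that Apache falls back to the first <VirtualHost> matching the target IP address. Second, www.test-site.com is made reachable on "all" addresses by listing both addresses in its container. The NameVirtualHost directive belongs to the Apache generation this module describes; Apache 2.4 no longer needs it.

```shell
#!/bin/sh
# Write the sketch to a scratch file instead of the live httpd.conf.
conf=$(mktemp)
cat > "$conf" <<'EOF'
NameVirtualHost 97.158.253.26
NameVirtualHost 97.158.253.27

# First container for both addresses: used when no ServerName matches
# (for example www.default-site.com)
<VirtualHost 97.158.253.26 97.158.253.27>
  ServerName www.default-site.com
  DocumentRoot /home/www/site1
</VirtualHost>

<VirtualHost 97.158.253.26>
  ServerName www.my-site.com
  ServerAlias my-site.com www.my-cool-site.com
  DocumentRoot /home/www/site2
</VirtualHost>

# Assumption: listing both addresses serves this site on each of them
<VirtualHost 97.158.253.26 97.158.253.27>
  ServerName www.test-site.com
  DocumentRoot /home/www/site3
</VirtualHost>

<VirtualHost 97.158.253.27>
  ServerName www.another-site.com
  DocumentRoot /home/www/site4
</VirtualHost>
EOF

# Summarise which name or directory each container declares.
vhost_count=$(grep -c '^<VirtualHost ' "$conf")
echo "Containers written: $vhost_count"
grep -E '^ *(ServerName|ServerAlias|DocumentRoot) ' "$conf"
```

After merging containers like these into httpd.conf, running apachectl configtest before restarting is a cheap way to catch syntax errors.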

Securing a Webserver using HTTPS

The Secure Sockets Layer (SSL) protocol allows any networked application to use encryption. This can be thought of as a process that wraps the socket, preparing it to use encryption at the application level. In the case of HTTPS, the server uses a pair of keys, one public and one private. The client uses the server's public key to encrypt the session key, and the server then uses its private key to decrypt the session key for use.

The public key is published using certificates. A certificate contains the following information:

  • Name and address, hostname, etc.
  • Public key
  • TTL
  • (Optional) ID and signature from a certificate authority (CA)

The certificate will be used to establish the authenticity of the server. A valid signature from a known CA is automatically recognised by the client's browser. With Mozilla, for example, these trusted CA certificates can be found by following the links Edit -> Preferences -> Privacy & Security -> Certificates, then clicking on the “Manage Certificates” button and the Authorities tab.

Communications would be too slow, however, if the whole session were encrypted using public key encryption. Instead, once the authenticity of the server is established, the client generates a unique secret session key, which is encrypted using the server's public key found in the certificate. Once the server receives this session key, it can decrypt it using the private key associated with the certificate. From then on, the communication is encrypted and decrypted using this secret session key generated by the client.

SSL Virtual Hosts

A separate Apache server can be used to listen on port 443 and implement SSL connections. However, most default configurations involve a single Apache server listening on both ports 80 and 443. For this, an additional Listen directive is set in httpd.conf asking the server to listen on port 443. Apache will then bind to both ports 443 and 80. Unencrypted connections are handled on port 80, while an SSL-aware virtual host is configured to listen on port 443:

<VirtualHost _default_:443>
  SSL CONFIGURATION
</VirtualHost>


The SSL CONFIGURATION lines are:

SSLEngine on
SSLCipherSuite ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP
SSLCertificateFile PATH_TO_FILE.crt
SSLCertificateKeyFile PATH_TO_FILE.key

We need to generate the server's private key (FILE.key) and certificate (FILE.crt) to complete this configuration.

Managing Certificates

The keys and certificates are usually kept in subdirectories of /etc/httpd/conf called ssl.crt and ssl.key. There should also be a Makefile that will generate both a KEY and a CERTIFICATE in PEM format which is base64 encoded data.

Using the Makefile

For example if we want to generate a self-signed certificate and private key simply type:

make mysite.crt

The Makefile will generate both files mysite.key (the private key) as well as mysite.crt (the certificate file containing the public key). You can use the following directives in httpd.conf:

SSLCertificateFile  ... mysite.crt
SSLCertificateKeyFile ... mysite.key
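
Where the Makefile is not available, the same two files can be produced by calling openssl directly. The following is a minimal sketch rather than the procedure the Makefile uses: the scratch directory, the 2048-bit key size, the 365-day lifetime and the common name www.my-site.com (borrowed from the earlier virtual host examples) are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Work in a scratch directory so no existing key or certificate is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a new RSA key without a passphrase (-nodes) and, in the same step,
# a self-signed certificate (-x509) for the assumed host name.
openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout mysite.key -out mysite.crt \
    -days 365 -subj "/CN=www.my-site.com"

# Show the subject and the validity dates stored in the certificate.
openssl x509 -in mysite.crt -noout -subject -dates
```

A self-signed certificate like this is suitable for testing only; browsers will warn about it because no known CA has signed it.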

Certificate Requests

On a production server you would need to generate a new file called a “certificate request” with:

openssl req -new -key mysite.key -out mysite.csr

This file can be sent to a certificate authority (CA) to be signed. The certificate authority will send back the signed certificate.

Pass Phrases

A private key can be generated with or without a passphrase, and a private key without a passphrase can be constructed from an existing private key.

A passphrased file: If a private key has a passphrase set, then the file starts with

       -----BEGIN RSA PRIVATE KEY-----
       Proc-Type: 4,ENCRYPTED
       DEK-Info: DES-EDE3-CBC,
       ---- snip ----

This means that the file is protected by a passphrase using 3DES. This was generated by the line /usr/bin/openssl genrsa -des3 1024 > $@ in the Makefile. If the -des3 flag is omitted, no passphrase is set. You can generate a new private key (mysite-nopass.key) without a passphrase from the old private key (mysite.key) as follows:

openssl rsa -in mysite.key -out mysite-nopass.key
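
Because Apache asks for the passphrase at startup when the key is protected, it can be handy to check which kind of key file you have before pointing SSLCertificateKeyFile at it. The sketch below looks for the header described above. The function name key_has_passphrase is hypothetical, and the two files it inspects are hand-made stand-ins containing headers only, not real keys.

```shell
#!/bin/sh
# Succeeds if the PEM file carries an encryption marker.
# Traditional keys use a "Proc-Type: 4,ENCRYPTED" header; PKCS#8 keys use
# a "BEGIN ENCRYPTED PRIVATE KEY" banner instead.
key_has_passphrase() {
    grep -Eq '^(Proc-Type: 4,ENCRYPTED|-----BEGIN ENCRYPTED PRIVATE KEY-----)' "$1"
}

# Stand-in files for the demonstration (headers only, no key material).
workdir=$(mktemp -d)
printf '%s\n' '-----BEGIN RSA PRIVATE KEY-----' 'Proc-Type: 4,ENCRYPTED' \
    > "$workdir/mysite.key"
printf '%s\n' '-----BEGIN RSA PRIVATE KEY-----' > "$workdir/mysite-nopass.key"

for f in "$workdir/mysite.key" "$workdir/mysite-nopass.key"; do
    if key_has_passphrase "$f"; then
        echo "$(basename "$f"): passphrase protected"
    else
        echo "$(basename "$f"): no passphrase"
    fi
done
```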

Reading: From the SuSE Linux Enterprise Server (Installation and Administration Document), read pages 741 – 778 on Apache implementation on SUSE Linux.

Internet Gateway

An internet gateway is useful in two ways:

  • Reduce Internet bandwidth charges
  • Limit access to the Web to only authorized users.

In Linux this is achieved with the Squid web caching proxy server, which can meet both of these goals fairly easily. Users configure their web browsers to use the Squid proxy server instead of going to the web directly. The Squid server then checks its web cache for the web information requested by the user. It will return any matching information that it finds in its cache, and if there is none, it will go to the web to find it on behalf of the user. Once it finds the information, it will populate its cache with it and also forward it to the user's web browser.

This reduces the amount of data accessed from the web. Another advantage is that you can configure your firewall to accept HTTP web traffic from the Squid server only. Squid can then be configured to request a username and password from each user that uses its services. This provides simple access control to the Internet.

Squid can be installed from the packages provided by your distribution, using its package manager. To start Squid at boot, use the chkconfig command:

[root@linux]# chkconfig squid on

Use the service command to start, stop, and restart Squid after booting:

[root@linux]# service squid start
[root@linux]# service squid stop
[root@linux]# service squid restart

You can test whether the Squid process is running with the pgrep command:

[root@linux]# pgrep squid

You should get a response of plain old process ID numbers.

The /etc/squid/squid.conf File

The main Squid configuration file is squid.conf and, like most Linux applications, Squid needs to be restarted before changes to the configuration file take effect.

The Visible Host Name

Squid will fail to start if you don't give your server a hostname. You can set this with the visible_hostname parameter. Here, the hostname is set to the real name of the server, linux.

visible_hostname linux

Access Control Lists

You can limit users' ability to browse the Internet with access control lists (ACLs). Each ACL line defines a particular type of activity, such as an access time or source network. The ACLs are then linked to an http_access statement that tells Squid whether to deny or allow traffic that matches the ACL.

Squid matches each Web access request it receives by checking the http_access list from top to bottom. If it finds a match, it enforces the allow or deny statement and stops reading further. You have to be careful not to place a deny statement in the list that blocks a similar allow statement below it. The final http_access statement denies everything, so it is best to place new http_access statements above it.

Note: The very last http_access statement in the squid.conf file denies all access. You therefore have to add your specific permit statements above this line. In the chapter's examples, I've suggested that you place your statements at the top of the http_access list for the sake of manageability, but you can put them anywhere in the section above that last line.
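
The top-to-bottom, first-match behaviour can be pictured with a small toy model. The script below is not Squid code and does not read squid.conf. It is a sketch that walks an ordered list of rules and stops at the first line whose ACL names all match, which is how several ACLs on one http_access line combine. The function name decide, the rules file and the ACL names passed to it are assumptions made for the illustration.

```shell
#!/bin/sh
# Toy model of http_access processing; this is not Squid code.
rules=$(mktemp)
cat > "$rules" <<'EOF'
deny RestrictedHost
allow home_network business_hours
deny all
EOF

# decide RULES_FILE ACL_NAME...
# The names after the file are the ACLs that the request matches.
# Prints the action of the first rule whose ACLs all match.
decide() {
    rules_file=$1
    shift
    matched=" all $* "            # every request matches the "all" ACL
    while read -r action acls; do
        hit=yes
        for acl in $acls; do
            case "$matched" in
                *" $acl "*) ;;     # this ACL matches, keep checking the line
                *) hit=no ;;       # one miss means the whole line misses
            esac
        done
        if [ "$hit" = yes ]; then
            echo "$action"         # first matching line wins; stop reading
            return 0
        fi
    done < "$rules_file"
    echo deny                      # the toy model's own fallback
}

decide "$rules" home_network business_hours                  # prints: allow
decide "$rules" home_network business_hours RestrictedHost   # prints: deny
decide "$rules" home_network                                 # prints: deny
```

Swapping the first two lines of the rules file makes the second call print allow, which is the kind of ordering mistake the paragraph above warns about.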

Squid has a minimum required set of ACL statements in the ACCESS_CONTROL section of the squid.conf file. It is best to put new customized entries right after this list to make the file easier to read.

Restricting Web Access By Time

You can create access control lists with time parameters. For example, you can allow only business hour access from the home network, while always restricting access to host 192.168.1.23.


#
# Add this to the bottom of the ACL section of squid.conf
#
acl home_network src 192.168.1.0/24
acl business_hours time M T W H F 9:00-17:00
acl RestrictedHost src 192.168.1.23

#
# Add this at the top of the http_access section of squid.conf
#
http_access deny RestrictedHost
http_access allow home_network business_hours

Or, you can allow morning access only:

#
# Add this to the bottom of the ACL section of squid.conf
#
acl mornings time 08:00-12:00

#
# Add this at the top of the http_access section of squid.conf
#
http_access allow mornings

Restricting Access to specific Web sites

Squid is also capable of reading files containing lists of web sites and/or domains for use in ACLs. In this example we create two lists in files named /usr/local/etc/allowed-sites.squid and /usr/local/etc/restricted-sites.squid.

# File: /usr/local/etc/allowed-sites.squid
www.col.org
www.yahoo.com

# File: /usr/local/etc/restricted-sites.squid
www.porn.com
illegal.com

These can then be used to always block the restricted sites and permit the allowed sites during working hours. This can be illustrated by expanding our previous example slightly.

#
# Add this to the bottom of the ACL section of squid.conf
#
acl home_network src 192.168.1.0/24
acl business_hours time M T W H F 9:00-17:00
acl GoodSites dstdomain "/usr/local/etc/allowed-sites.squid"
acl BadSites  dstdomain "/usr/local/etc/restricted-sites.squid"
#
# Add this at the top of the http_access section of squid.conf
#
http_access deny BadSites
http_access allow home_network business_hours GoodSites

Restricting Web Access By IP Address

You can create an access control list that restricts Web access to users on certain networks. In this case, it's an ACL that defines a home network of 192.168.1.0.

#
# Add this to the bottom of the ACL section of squid.conf
#
acl home_network src 192.168.1.0/255.255.255.0

You also have to add a corresponding http_access statement that allows traffic that matches the ACL:

#
# Add this at the top of the http_access section of squid.conf
#
http_access allow home_network

Password Authentication Using NCSA

You can configure Squid to prompt users for a username and password. Squid comes with a program called ncsa_auth that reads any NCSA-compliant encrypted password file. You can use the htpasswd program that comes installed with Apache to create your passwords. Here is how it's done:

1) Create the password file. The name of the password file should be /etc/squid/squid_passwd, and you need to make sure that it's universally readable.

[root@linux]# touch /etc/squid/squid_passwd
[root@linux]# chmod o+r /etc/squid/squid_passwd

2) Use the htpasswd program to add users to the password file. You can add users at anytime without having to restart Squid. In this case, you add a username called www:

[root@linux]# htpasswd /etc/squid/squid_passwd www
New password:
Re-type new password:
Adding password for user www
[root@linux]#

3) Find your ncsa_auth file using the locate command.

[root@linux]# locate ncsa_auth
/usr/lib/squid/ncsa_auth
[root@linux]#

4) Edit squid.conf; specifically, you need to define the authentication program in squid.conf, which is in this case ncsa_auth. Next, create an ACL named ncsa_users with the REQUIRED keyword that forces Squid to use the NCSA auth_param method you defined previously. Finally, create an http_access entry that allows traffic that matches the ncsa_users ACL entry. Here's a simple user authentication example; the order of the statements is important:

#
# Add this to the auth_param section of squid.conf
#
auth_param basic program /usr/lib/squid/ncsa_auth /etc/squid/squid_passwd

#
# Add this to the bottom of the ACL section of squid.conf
#
acl ncsa_users proxy_auth REQUIRED
 
#
# Add this at the top of the http_access section of squid.conf
#
http_access allow ncsa_users

5) This variation requires password authentication and allows access only during business hours. Once again, the order of the statements is important:

#
# Add this to the auth_param section of squid.conf
#
auth_param basic program /usr/lib/squid/ncsa_auth /etc/squid/squid_passwd

#
# Add this to the bottom of the ACL section of squid.conf
#
acl ncsa_users proxy_auth REQUIRED
acl business_hours time M T W H F 9:00-17:00
#
# Add this at the top of the http_access section of squid.conf
#
http_access allow ncsa_users business_hours

Remember to restart Squid for the changes to take effect.

Forcing Users To Use Your Squid Server

If you are using access controls on Squid, you may also want to configure your firewall to allow HTTP Internet access from the Squid server only. This forces your users to browse the Web through the Squid proxy.

Making Your Squid Server Transparent To Users

It is possible to limit HTTP Internet access to only the Squid server without having to modify the browser settings on your client PCs. This is called a transparent proxy configuration. It is usually achieved by configuring a firewall between the client PCs and the Internet to redirect all HTTP (TCP port 80) traffic to the Squid server on TCP port 3128, which is the Squid server's default TCP port.

Squid Transparent Proxy Configuration

Your first step will be to modify your squid.conf to create a transparent proxy. The procedure is different depending on your version of Squid.

Prior to version 2.6: In older versions of Squid, transparent proxy was achieved through the use of the httpd_accel options, which were originally developed for HTTP acceleration. In these cases, the configuration syntax would be as follows:

httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on

Version 2.6 and Beyond: Newer versions of Squid simply require you to add the word "transparent" to the default "http_port 3128" statement. In this example, Squid not only listens on TCP port 3128 for proxy connections, but will also do so in transparent mode.

http_port 3128 transparent

Configuring iptables to Support the Squid Transparent Proxy

The examples below are based on the discussion of Linux iptables in Chapter 14, "Linux Firewalls Using iptables". Additional commands may be necessary for your particular network topology.

In both cases below, the firewall is connected to the Internet on interface eth0 and to the home network on interface eth1. The firewall is also the default gateway for the home network and handles network address translation on all the network's traffic to the Internet. Only the Squid server has access to the Internet on port 80 (HTTP), because all HTTP traffic, except that coming from the Squid server, is redirected.

If the Squid server and firewall are the same server, all HTTP traffic from the home network is redirected to the firewall itself on the Squid port of 3128 and then only the firewall itself is allowed to access the Internet on port 80.

iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 \
       -j REDIRECT --to-port 3128
iptables -A INPUT -j ACCEPT -m state \
       --state NEW,ESTABLISHED,RELATED -i eth1 -p tcp \
       --dport 3128
iptables -A OUTPUT -j ACCEPT -m state \
       --state NEW,ESTABLISHED,RELATED -o eth0 -p tcp \
       --dport 80
iptables -A INPUT -j ACCEPT -m state \
       --state ESTABLISHED,RELATED -i eth0 -p tcp \
       --sport 80
iptables -A OUTPUT -j ACCEPT -m state \
       --state ESTABLISHED,RELATED -o eth1 -p tcp \
       --sport 80

Note: This example is specific to HTTP traffic. You won't be able to adapt this example to support HTTPS web browsing on TCP port 443, as that protocol specifically doesn't allow the insertion of a "man in the middle" server for security purposes. One solution is to add IP masquerading statements for port 443, or any other important traffic, immediately after the code snippet. This will allow non HTTP traffic to access the Internet without being cached by Squid.

If the Squid server and firewall are different servers, the statements are different. You need to set up iptables so that all connections to the Web that do not originate from the Squid server are actually converted into three connections: one from the Web browser client to the firewall, another from the firewall to the Squid server, and a third that the Squid server makes to the Web to service the request. The Squid server then gets the data and replies to the firewall, which then relays this information to the Web browser client. The iptables program does all this using these NAT statements:

iptables -t nat -A PREROUTING -i eth1 ! -s 192.168.1.100 \
       -p tcp --dport 80 -j DNAT --to 192.168.1.100:3128
iptables -t nat -A POSTROUTING -o eth1 -s 192.168.1.0/24 \
       -d 192.168.1.100 -j SNAT --to 192.168.1.1
iptables -A FORWARD -s 192.168.1.0/24 -d 192.168.1.100 \
       -i eth1 -o eth1 -m state \
       --state NEW,ESTABLISHED,RELATED \
       -p tcp --dport 3128 -j ACCEPT
iptables -A FORWARD -d 192.168.1.0/24 -s 192.168.1.100 \
       -i eth1 -o eth1 -m state --state ESTABLISHED,RELATED \
       -p tcp --sport 3128 -j ACCEPT

In the first statement all HTTP traffic from the home network except from the Squid server at IP address 192.168.1.100 is redirected to the Squid server on port 3128 using destination NAT. The second statement makes this redirected traffic also undergo source NAT to make it appear as if it is coming from the firewall itself. The FORWARD statements are used to ensure the traffic is allowed to flow to the Squid server after the NAT process is complete. The unusual feature is that the NAT all takes place on one interface; that of the home network (eth1).

Log File Analysis Software

Most log analysis tools available for Squid are listed on the following site: http://www.squid-cache.org/Scripts/. The main log file for Squid is /var/log/squid/access.log. Next is a short overview of Calamaris and Webalizer. Also notice that Webmin produces log reports based on Calamaris.
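
Before reaching for a dedicated tool, a quick per-client summary can be pulled out of access.log with awk. The sketch below assumes Squid's native log format, in which the third field is the client address and the fifth is the reply size in bytes. The three log lines are invented sample data written to a scratch file, not output from a real proxy; on a live system you would pass /var/log/squid/access.log instead.

```shell
#!/bin/sh
# Invented sample entries in Squid's native access.log layout.
log=$(mktemp)
cat > "$log" <<'EOF'
1066036250.129    103 192.168.2.103 TCP_MISS/200 4290 GET http://www.redhat.com/ - DIRECT/192.0.2.10 text/html
1066036251.411     87 192.168.2.101 TCP_MISS/200 1523 GET http://www.squid-cache.org/ - DIRECT/198.51.100.7 text/html
1066036252.002     12 192.168.2.103 TCP_HIT/200 812 GET http://www.redhat.com/logo.png - NONE/- image/png
EOF

# Count requests and add up bytes per client address (field 3, field 5).
report=$(awk '{ req[$3]++; bytes[$3] += $5 }
              END { for (h in req) print h, req[h], bytes[h] }' "$log" | sort)

echo "client requests bytes"
echo "$report"
```

Each output line shows a client address followed by its request count and total bytes, which is a rough cousin of the per-host report that Calamaris produces.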

Calamaris

The code is GPL and can be downloaded from http://cord.de/tools/squid/calamaris. You can generate reports as follows:

cat /var/log/squid/access.log | calamaris

Many more switches are available (check the calamaris man pages), and the EXAMPLES file distributed with calamaris includes scripts that can produce hourly or monthly reports. To get information on web page requests per host, use the -R switch:

calamaris -R 5 /var/log/squid/access.log
# Incoming TCP-requests by host
host / target                       request  hit-%     Byte  hit-%  sec  kB/sec
--------------------------------- --------- ------ -------- ------ ---- -------
192.168.2.103                            72   0.00   323336   0.00    0   10.24
 *.redhat.com                            35   0.00   126726   0.00    0   10.44
 *.suse.co.uk                            20   0.00    63503   0.00    0   13.15
 *.lemonde.fr                             6   0.00   109712   0.00    1   16.39
 207.36.15.*                              5   0.00     8946   0.00    0    3.94
 *.akamai.net                             4   0.00    12428   0.00    1    4.43
 other: 2 requested urlhosts              2   0.00     2021   0.00    1    0.71
192.168.2.101                            63   0.00   295315   0.00    1    4.65
 cord.de                                 17   0.00   115787   0.00    0   20.86
 *.doubleclick.net                       13   0.00    26163   0.00    1    2.07
 *.google.com                            10   0.00    30646   0.00    1    3.71
 *.squid-cache.org                        8   0.00    51758   0.00    1    6.53
 <error>                                  4   0.00     4290   0.00    0   10474
 other: 6 requested urlhosts             11   0.00    66671   0.00    5    2.28
--------------------------------- --------- ------ -------- ------ ---- -------
Sum                                     135   0.00   618651   0.00    1    6.51

Webalizer

This tool is installed by default on some Linux distributions. It is also GPL'ed and can be downloaded from http://www.mrunix.net/webalizer/. By editing the /etc/webalizer.conf file, one can choose between Apache access logs, FTP transfer logs or Squid logs. Example graphics generated with Webalizer.


Reading: From the SuSE Linux Enterprise Server (Installation and Administration Document), read pages 781 – 800 on Squid Proxy and SquidGuard implementation on SUSE Linux.

Summary

In this module you learned how to set up a Linux web server and proxy server. These are common functions that you, as the Linux system administrator, will be required to perform from time to time.



Assignment

Study the following two (2) articles and follow the instructions in them to set up a Linux based proxy server


This work is licensed under a Creative Commons Attribution Share-Alike License.
