29. Misc/FAQ/Wisdom from the mailing list

These topics were too short or not central enough to LVS operation to have their own section.

29.1. Having one director handling multiple LVS sites, Multiple VIPs

Multiple VIPs (and their associated services) can co-exist independently on an LVS. On the director, add the extra IPs to a device facing the internet. On the realservers, for LVS-DR or LVS-Tun, add the VIPs to a device and set up services listening on the ports. On the realservers, for LVS-NAT, add the extra services to the RIP.
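Here is a minimal sketch of the director side in shell. It assumes LVS-DR, one internet-facing interface and the wrr scheduler. The helper name `add_vip_service`, the `DRY_RUN` switch and all addresses are made up for illustration. By default it only prints the `ip` (iproute2) and `ipvsadm` commands so you can check them. To execute them, run it as root with `DRY_RUN=` (empty).

```shell
#!/bin/sh
# Sketch: one independent LVS-DR virtual service per VIP on the director.
# DRY_RUN=1 (the default) prints the commands; set DRY_RUN= to execute them.
DRY_RUN=${DRY_RUN-1}

run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# add_vip_service IFACE VIP PORT RIP...   (hypothetical helper)
add_vip_service() {
    iface=$1 vip=$2 port=$3
    shift 3
    # The extra VIP goes on the internet-facing device as a /32.
    run ip addr add "$vip/32" dev "$iface"
    # Each VIP gets its own virtual service, independent of the others.
    run ipvsadm -A -t "$vip:$port" -s wrr
    for rip in "$@"; do
        run ipvsadm -a -t "$vip:$port" -r "$rip" -g -w 1
    done
}

# Two sites, each on its own VIP (placeholder addresses):
# add_vip_service eth0 192.168.1.100 80 192.168.1.11 192.168.1.12
# add_vip_service eth0 192.168.1.101 80 192.168.1.11 192.168.1.12
```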

Keith Rowland wrote:

Can I use Virtual Server to host multiple domains on the cluster? Can VS be set up to respond to 10-20 different IP addresses and use the cluster to respond to any one of them with the proper web directory?

James CE Johnson jjohnson (at) mobsec (dot) com

If I understand the question correctly, then the answer is yes :-) I have one system that has two IP addresses and responds to two names:

  foo.mydomain.com  A.B.C.foo  eth1
  bar.mydomain.com  A.B.C.bar  eth1:0

On that system (kernel 2.0.36 BTW) I have LVS set up as:

  ippfvsadm -A -t A.B.C.foo:80 -R 192.168.42.50:80
  ippfvsadm -A -t A.B.C.bar:80 -R 192.168.42.100:80

To make matters even more confusing, 192.168.42.(50|100) are actually one system where eth0 is 192.168.42.100 and eth0:0 is 192.168.42.50. We'll call that 'node'.

Apache on 'node' is set up to serve foo.mydomain.com on ...50 and bar.mydomain.com on ...100.

It took me a while to sort it out but it all works quite nicely. I can easily move bar.mydomain.com to another node within the cluster by simply changing the ippfvsadm setup on the externally addressable node.

Tao Zhao 6 Nov 2001

What if I need multiple VIPs on the realserver?

Julian Anastasov ja (at) ssi (dot) bg 06 Nov 2001

for i in 180 181 182
do
	ip addr add X.Y.Z.$i dev dummy0
done

There is also an example for setting up multiple VIPs on HA.

29.2. Limiting number of clients connecting to LVS

Milind Patil mpatil (at) iqs (dot) co (dot) in 24 Sep 2001

I want to limit number of users accessing the LVS services at any given time. How can I do it.

Julian

  • for non-NAT cluster (maybe stupid but interesting)

    Maybe an array of policers, for example 1024 policers or a user-defined value (a power of 2). Each client hits one of the policers based on its IP/port. This is mostly a job for QoS ingress, even for distributed attacks, but maybe something can be done for LVS? Maybe we'd better develop a QoS ingress module? The key could be derived from CIP and CPORT, maybe something similar to SFQ but without queueing. It could be implemented as a patch to the normal policer with one extra argument: the real number of policers. This extended policer would then look into the TCP/UDP packets and redirect each packet to one of the real policers.

  • for NAT only

    Run the SFQ qdisc on your external interface(s). It seems this is not a solution for the DR method. Of course, one can run SFQ on the uplink router.

  • Linux 2.4 only

    iptables has support for limiting traffic, but I'm not sure whether it is useful for your requirements. I assume you want to set a limit on each one of these 1024 aggregated flows.
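A minimal sketch of the iptables route on a 2.4 director uses the standard `limit` match to cap the rate of new TCP connection attempts (SYNs) to one virtual service. Note that this limits the rate of new connections, not the number of concurrent users Milind asked about. The helper name `limit_new_conns`, the VIP, the port and the rate values are placeholders. The sketch also assumes that the director's INPUT filter rules see packets before IPVS does, so test it before relying on it.

```shell
#!/bin/sh
# Sketch: cap the rate of new connections to one virtual service with the
# iptables "limit" match. Prints the commands unless DRY_RUN is set empty.
DRY_RUN=${DRY_RUN-1}

run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# limit_new_conns VIP PORT RATE BURST   (hypothetical helper)
# RATE uses the limit match syntax, e.g. 50/second or 3000/minute.
limit_new_conns() {
    vip=$1 port=$2 rate=$3 burst=$4
    # SYNs within the rate (plus the burst allowance) are accepted...
    run iptables -A INPUT -p tcp --syn -d "$vip" --dport "$port" \
        -m limit --limit "$rate" --limit-burst "$burst" -j ACCEPT
    # ...and excess SYNs are dropped; the client's TCP will retransmit.
    run iptables -A INPUT -p tcp --syn -d "$vip" --dport "$port" -j DROP
}

# limit_new_conns 192.168.1.100 80 50/second 100
```

Because the rules are appended with `-A`, an earlier ACCEPT rule in INPUT would bypass them.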

Wenzhuo Zhang

Is anybody actually using the ingress policer for anti-DoS? I tried it several days ago using the script in the iproute2 package: iproute2/examples/SYN-DoS.rate.limit. I've tested it against different 2.2 kernels (2.2.19-7.0.8(redhat kernel), 2.2.19, 2.2.20preX, with all QoS related functions either compiled into the kernel or as modules) and different versions of iproute2. In all cases, tc fails to install the ingress qdisc policer:

    root@panda:~# tc qdisc add dev eth0 handle ffff: ingress
    RTNETLINK answers: No such file or directory
    root@panda:~# /tmp/tc qdisc add dev eth0 handle ffff: ingress
    RTNETLINK answers: No such file or directory

Julian

For 2.2, you need the ds-8 package from the "Package for Differentiated Services on Linux". Compile tc by setting TC_CONFIG_DIFFSERV=y in Config. The right command is:

	tc qdisc add dev eth0 ingress

Ratz

The 2.2.x version is not supported anymore. The advanced routing documentation says to only use 2.4.

For 2.4 ingress is in the kernel but it is still unusable for more than one device (look in linux-netdev for reference).

29.3. Setting up a fake service on the realserver with inetd

from Ratz ratz (at) tac (dot) ch

We're going to set up an LVS cluster from scratch. You need:

  • 4 machines (2 realservers, 1 load balancer, 1 client) wired as described in various sketches throughout this howto.

  • fun and some spare time (quite a lot of it if things don't work out the first time as described)

The goal is to set up a load-balanced TCP application. The application will consist of a self-written shell script invoked by inetd. As you might have guessed, security is a very low priority here; you should just get the idea. Of course I should use xinetd, and of course I should use a tcpwrapper and maybe even SecurID authentication, but here the goal is to understand the fundamental design principles of an LVS cluster and its deployment. All instructions are to be run as root.

Setting up the realserver

Edit /etc/inetd.conf and add the following line:
lvs-test        stream  tcp     nowait  root    /usr/bin/lvs-info       lvs-info

Edit /etc/services and add the following line:
lvs-test        31337/tcp               # supersecure lvs-test port

Now you need to get inetd running. This is different for every Unix, so please look it up for yours. You can verify that it's running with 'ps ax|grep [i]netd'. To verify that it is really listening on this port, run 'netstat -an|grep LISTEN' and look for a line like:

tcp        0      0 0.0.0.0:31337           0.0.0.0:*               LISTEN

If it is there, you're one step closer to the truth. Now we have to supply the script that will be called when you connect to port 31337 on the realserver. Simply do this on your command line (copy 'n' paste):

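cat > /usr/bin/lvs-info << 'EOF' && chmod 755 /usr/bin/lvs-info
#!/bin/sh
# Quoting 'EOF' stops the shell expanding `...` and $2 while writing this
# file, so the lookup runs each time inetd starts the script.
# Print the first IPv4 address other than 127.0.0.1 (net-tools "inet addr:" format).
echo "This is a test of machine `ifconfig -a | grep 'inet addr' | grep -v 127.0.0.1 | head -n 1 | awk '{print $2}' | cut -d: -f2`"
echo
EOF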

Now you can test whether it really works with telnet or phatcat:

telnet localhost 31337
phatcat localhost 31337

This should print something like:

hog:/ # phatcat localhost 31337
This is a test of machine 192.168.1.11

hog:/ #

If it worked, do the same procedure to set up the second realserver. Now we're ready to set up the load balancer. These are the required commands to set it up for our example:

director:/etc/lvs# ipvsadm -A -t 192.168.1.100:31337 -s wrr
director:/etc/lvs# ipvsadm -a -t 192.168.1.100:31337 -r 192.168.1.11 -g -w 1
director:/etc/lvs# ipvsadm -a -t 192.168.1.100:31337 -r 192.168.1.12 -g -w 1

Check it with ipvsadm -L -n:

hog:~ # ipvsadm -L -n
IP Virtual Server version 0.9.14 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
TCP  192.168.1.100:31337 wrr
  -> 192.168.1.12:31337          Route   1      0          0
  -> 192.168.1.11:31337          Route   1      0          0
hog:~ #

Now if you connect from outside with the client node to the VIP=192.168.1.100, you should get to one of the two realservers (presumably ~.12). Reconnect to the VIP and you should get to the other realserver. If so, be happy; if not, go back and check netstat -an, ifconfig -a, the arp problem, routing tables and so on ...
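To get more than two samples, you can collect the banners from a series of connections and count them. Below is a sketch of the counting part. It assumes the banner format of the lvs-info script above. `tally_replies` is a hypothetical helper. The netcat line in the comment is only one way to collect banners, and netcat flags differ between versions. With wrr and equal weights you would expect roughly equal counts.

```shell
#!/bin/sh
# tally_replies: read banners of the form
#   "This is a test of machine <id>"
# on stdin and print "<count> <id>" for each realserver, sorted by id.
tally_replies() {
    awk '/^This is a test of machine / { n[$NF]++ }
         END { for (m in n) print n[m], m }' | sort -k2
}

# Possible use from the client node:
# for i in 1 2 3 4 5 6; do nc 192.168.1.100 31337 < /dev/null; done | tally_replies
```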

29.4. How to bring down a realserver for maintenance (eg swap disks)

I want to use virtual server functionality to allow switching over from one pool of server processes to another without an interruption in service to clients.

Michael Sparks sparks (at) mcc (dot) ac (dot) uk

Current realservers: A, B, C. Servers to swap into the system instead: D, E, F.

  • Add servers D,E,F into the system all with fairly high weights (perhaps ramping the weights up slowly so as not to hit them too hard:-)

  • Change the weights of servers A,B,C to 0.

  • All new traffic should now go to D,E,F

  • When the number of connections through A,B,C reaches 0, remove them from the service. This can take time I know but...

from Joe

Giving a realserver a weight of 0, once a planned feature for ipvsadm, is now implemented. A realserver with weight 0 will not be sent any new connections and will continue serving its current connections until they close. You may have to wait a while if a user is downloading a 40M file from the realserver.
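Here is a sketch of this weight-0 drain as shell functions. It assumes the `ipvsadm -L -n` column layout shown in the previous section (Forward, Weight, ActiveConn, InActConn). `active_conns` and `drain_realserver` are hypothetical names. The forwarding flag is passed again on the edit because `ipvsadm -e` may reset options left off the command line; check this against your ipvsadm version. Only `active_conns` can be tried without root.

```shell
#!/bin/sh
# active_conns SERVICE RIP:PORT
# Reads "ipvsadm -L -n" output on stdin and prints the ActiveConn column for
# RIP:PORT under the virtual service SERVICE (VIP:PORT); prints 0 if absent.
active_conns() {
    awk -v svc="$1" -v rs="$2" '
        $1 == "TCP" || $1 == "UDP" || $1 == "FWM" { cur = $2 }
        $1 == "->" && cur == svc && $2 == rs { print $5; found = 1 }
        END { if (!found) print 0 }'
}

# drain_realserver SERVICE RIP:PORT FWDFLAG   (run as root on the director)
# FWDFLAG is the forwarding method the realserver was added with: -g, -m or -i.
drain_realserver() {
    svc=$1 rs=$2 fwd=$3
    # Weight 0: no new connections, existing ones carry on.
    ipvsadm -e -t "$svc" -r "$rs" "$fwd" -w 0 || return 1
    # Poll until the realserver's active connections have closed.
    while [ "$(ipvsadm -L -n | active_conns "$svc" "$rs")" -gt 0 ]; do
        sleep 10
    done
    ipvsadm -d -t "$svc" -r "$rs"
}

# drain_realserver 192.168.1.100:31337 192.168.1.11:31337 -g
```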

29.5. How to turn your single node ftp/http server into an LVS without taking it off-line

e.g. if you want to test LVS on your BIG Sunserver and how to restore an LVS to a single node server again.

current ftp server:        standalone  A

planned LVS (using LVS-DR): realserver A
		           director    Z

Setup the LVS in the normal way with the director's VIP being a new IP for the network. The IP of the standalone server will now also be the IP for the realserver. You can access the realserver via the VIP while the outside users continue to connect to the original IP of A. When you are happy that the VIP gives the right service, change the DNS IP of your ftp site to the VIP. Over the next 24hrs as the new DNS information is propagated to the outside world, users will change over to the VIP to access the server.

To expand the number of servers (to A, B,...), add another server with duplicated files and add an extra entry to the director's tables with ipvsadm.

To restore - in your DNS, change the IP for the service to the realserver IP. When no-one is accessing the VIP anymore, unplug the director.

29.6. shutdown of LVS

There is no single command to shut down an LVS. However, you can stop it forwarding by clearing the ipvsadm table (ipvsadm -C), then allowing all connections to expire (check the active connections with ipvsadm) and then removing the ipvs modules (rmmod). Since the scheduler modules (ip_vs_rr.o etc.) depend on ip_vs.o, you'll have to remove them first.

Do you know how to shutdown LVS? I tried rmmod but it keeps saying that the device is busy.

Kjetil Torgrim Homme kjetilho (at) linpro (dot) no 18 Aug 2001

Run ipvsadm -C. You also need to remove the module(s) for the balancing algorithm(s) before rmmod ip_vs. Run lsmod to see which modules these are.
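The steps above can be put into a short script. This is only a sketch. `ipvs_modules` and `shutdown_lvs` are hypothetical helpers. The connection-table check uses `ipvsadm -L -n -c` and assumes your ipvsadm supports `-c` and prints two header lines before the entries.

```shell
#!/bin/sh
# ipvs_modules: read lsmod output on stdin and print the loaded ip_vs_*
# modules (schedulers like ip_vs_rr, helpers like ip_vs_ftp). They depend
# on ip_vs, so they have to be removed before ip_vs itself.
ipvs_modules() {
    awk 'NR > 1 && $1 ~ /^ip_vs_/ { print $1 }'
}

# shutdown_lvs: hypothetical helper, run as root on the director.
shutdown_lvs() {
    # Stop forwarding: clear the virtual service table.
    ipvsadm -C || return 1
    # Wait until the connection table is empty (skip the 2 header lines).
    while [ -n "$(ipvsadm -L -n -c | awk 'NR > 2')" ]; do
        sleep 10
    done
    # Remove the dependent modules first, then ip_vs.
    for m in $(lsmod | ipvs_modules); do
        rmmod "$m" || return 1
    done
    rmmod ip_vs
}
```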

Roy Walker Roy (dot) Walker (at) GEZWM (dot) com 18 Mar 2002 could not cleanly shut down his director (LVS 1.0, 2.4.18), which hung at "Send TERM signal". The suggested cure was to bring down the LVS first (we haven't heard back whether it works).

29.7. Other projects like LVS - Beowulf

The difference between a beowulf and an LVS:

The Beowulf project has to do with processor clustering over a network -- parallel computing... Basically putting 64 nodes up and running that all are a part of a collective of resources. Like SMP -- but between a whole bunch of machines with a fast ethernet as a backplane.

LVS, however, is about load-balancing on a network. Someone puts up a load balancer in front of a cluster of servers. Each one of those servers is independent and knows nothing about the rest of the servers in the farm. All requests for services go to the load balancer first. That load balancer then distributes requests to each server. Those servers respond as if the request came straight to them in the first place. So the more servers one adds, the less load goes to each server.

A person might go to a web site that is load balanced, and their requests would be balanced between four different machines. (Or perhaps all of their requests would go to one machine, and the next person's request would go to another machine)

However, a person who used a Beowulf system would actually be using one processing collaborative that was made up of multiple computers...

I know that's not the best explanation of each, and I apologize for that, but I hope it at least starts to make things a little clearer. Both projects could be expanded on to a great extent, but that might just confuse things further.

(Joe) -

both use several (or a lot of) nodes.

A beowulf is a collection of nodes working on a single computation. The computation is broken into small pieces and passed to a node, which replies with the result. Eventually the whole computation is done. The beowulf usually has a single user and the computations can run for weeks.

An LVS is a group of machines offering a service to a client. A dispatcher connects the client to a particular server for the request. When the request is completed, the dispatcher removes the connection between the client and server. The next request from the same client may go to a different server but the client cannot tell which server it has connected to. The connection between client and server may only be seconds long.

from a posting to the beowulf mailing list by Alan Heirich -

Thomas Sterling and Donald Becker made "Beowulf" a registered service mark with specific requirements for use:

-- Beowulf is a cluster
-- the cluster runs Linux
-- the O/S and driver software are open source
-- the CPU is multiple sourced (currently, Intel and Alpha)

I assume they did this to prevent profit-hungry vendors from abusing this term; can't you just imagine Micro$oft pushing a "Beowulf" NT-cluster?

(Joe - I looked up the Registered Service Marks on the internet and Beowulf is not one of them.)

(Wensong) Beowulf is for parallel computing, Linux Virtual Server is for scalable network services.

They are quite different now. However, I think they may be unified under "single system image" some day. In a "single system image", every node sees a single system image (the same memory space, the same process space, the same external storage), and processes/threads can be transparently migrated to other nodes in order to balance load in the cluster. All the processes are checkpointed, so they can be restarted on the same node or on others if they fail; full fault tolerance can be achieved here. It will be easy for programmers to code because of the single space: they don't need to statically partition jobs to different sites and let them communicate through PVM or MPI. They just need to identify the parallelism of their scientific application and fork processes or generate threads, because processes/threads will be automatically load balanced on different nodes. For network services, the service daemons just need to fork processes or generate threads, which is quite simple. I think a lot of investigation is needed into how to implement these mechanisms and keep the overhead as low as possible.

What Linux Virtual Server has done is very simple: a Single IP Address, in which parallel services on different nodes appear as a virtual service on a single IP address. The different nodes have their own spaces; it is far from a "single system image". It means that we have a long way to go. :)

29.8. Projects like LVS - Eddie

Eddie http://www.eddieware.org

(Jacek Kujawa blady (at) cnt (dot) pl) Eddie is load-balancing software for webservers, using NAT (only NAT), written in the language Erlang. Eddie includes an intelligent HTTP gateway and Enhanced DNS.

(Joe) Erlang is a language for writing distributed applications.

29.9. Recommendations for a redundant file system, RAID

Shain Miley 4 Jun 2001

any recommendations for Level 5 SCSI RAID?

Matthew S. Crocker matthew (at) crocker (dot) com 04 Jun 2001

I have had very good luck with Mylex. We use the DAC960, which is a bit old now, but if the newer stuff works as well as what I have, I would highly recommend it. You might also want to think about putting your data on a NAS and separating your CPU from your hard drives.

Don Hinshaw dwh (at) openrecording (dot) com 04 Jun 2001

Mylex work well. I use ICP-Vortex (http://www.icp-vortex.com/index_e.html, link dead Jan 2003) which are supported by the Linux kernel. I've also had good luck with Adaptec 3200s and 3400si.

29.10. Thundering herd problem, when down machine(s) come on line

(now handled by code added to the scheduler)

From: Christopher Seawood cls (at) aureate (dot) com

LVS seems to work great until a server goes down (this is where mon comes in). Here are a couple of things to keep in mind. If you're using the Weighted Round-Robin scheduler, then LVS will still attempt to hit the server once it goes down. If you're using the Least Connections scheduler, then all new connections will be directed to the down server because it has 0 connections. You'd think using mon would fix these problems, but not in all cases.

Adding mon to the LC setup didn't help matters much. I took one of three servers out of the loop and waited for mon to drop the entry. That worked great. When I started the server back up, mon added the entry. During that time, the 2 running servers had gathered about 1000 connections apiece. When the third server came back up, it immediately received all of the new connections. It kept receiving all of the connections until it had an equal number of connections with the other servers (which by this time, a minute or so later, had fallen to ~700). By this time, the 3rd server had been restarted due to triggering a high load sensor also monitoring the machine (a necessary evil or so I'm told). At this point, I dropped back to using WRR as I could envision the cycle repeating itself indefinitely.
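A toy model shows the herd effect. The sketch below repeatedly picks the realserver with the fewest active connections, which is the core of the least-connection idea, and hands it each new connection in turn. It ignores weights, inactive connections and connections finishing, so it is only an illustration and does not model the kernel's scheduler. `lc_pick` and `simulate` are hypothetical names.

```shell
#!/bin/sh
# lc_pick C1 C2 ...: print the 1-based index of the smallest count
# (the first one wins ties).
lc_pick() {
    i=1 best=1 min=$1
    for x in "$@"; do
        if [ "$x" -lt "$min" ]; then
            min=$x best=$i
        fi
        i=$((i + 1))
    done
    echo "$best"
}

# simulate N C1 C2 C3: hand N new connections, one at a time, to three
# servers starting with the given active counts; print how many each got.
simulate() {
    n=$1
    shift
    a=$1 b=$2 c=$3
    got1=0 got2=0 got3=0
    while [ "$n" -gt 0 ]; do
        case $(lc_pick "$a" "$b" "$c") in
            1) a=$((a + 1)); got1=$((got1 + 1)) ;;
            2) b=$((b + 1)); got2=$((got2 + 1)) ;;
            3) c=$((c + 1)); got3=$((got3 + 1)) ;;
        esac
        n=$((n - 1))
    done
    echo "$got1 $got2 $got3"
}

# A server rejoining with 0 connections next to two with 1000 each:
# simulate 500 1000 1000 0    # prints "0 0 500"
```

All 500 new connections land on the server that has just come back, which matches the behaviour described above.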

29.11. on the need for extended testing

(this must have been solved, no-one is complaining about memory leaks now :-)

Jerry Glomph Black black (at) real (dot) com

We have successfully used 2.0.36-vs (direct routing method), but it does fail at extremely high loads. Seems like a cumulative effect, after about a billion or so packets forwarded. Some kind of kernel memory leak, I'd guess.

29.12. loopback on Solaris

Chris Kennedy ckennedy (at) iland (dot) net

The thing I have found out is that on Solaris 2.6, and probably other versions of Solaris, you have to do some magic to get the loopback alias set up. You must run the following commands one at a time:

ifconfig lo0:1 <VIP>
ifconfig lo0:1 <VIP> <VIP>
ifconfig lo0:1 netmask 255.255.255.255
ifconfig lo0:1 up

This works well and is actually a point-to-point link like ppp, which must be the way Solaris defines aliases on the lo interface. It will not let you do this all at once, only one step at a time, or you have to start over from scratch on the interface.

Ramon Kagan rkagan (at) YorkU (dot) ca 05 Jun 2002

Just in case anybody is interested: you can do the following on lo0:1 or, for paranoid people like me, on hme1.

ifconfig <intfc> plumb
ifconfig <intfc> <VIP>
ifconfig <intfc> <VIP> <VIP>
ifconfig <intfc> netmask 255.255.255.255
ifconfig <intfc> up

This is from the FAQ but I'm adding that this doesn't have to be on lo0.

29.13. Running clients (eg telnet) on realservers

There are two types of clients on realservers from the point of view of LVS.

  • Clients which have src_addr=RIP (eg telnet run from the command line). These are simpler to handle.

  • Clients which need to have src_addr=VIP (but call from RIP). These are usually call-backs from the LVS'ed service to a daemon on the LVS client (e.g. identd). Handling these is somewhat problematic.

Both types of clients require the same understanding of LVS, but because the first case is simple, it is discussed here. The second case has all sorts of ramifications for LVS and for that reason is discussed in the section on authd/identd.

You might have valid reasons for running clients on realservers, e.g. so that the sysadmin can telnet to a remote site. The way to allow clients on the realservers to connect to outside servers is to configure these requests so that they are independent of the LVS setup (you do have to use the network and default gw set by the LVS).

One solution is to NAT the client requests. If the clients on the realservers are required for the LVS'ed service (e.g. on a squid realserver, clients have to connect to 0/0:80), then the RIP should be a public IP; see the section on 3-Tier LVSs.

29.13.1. client requests from realservers in a LVS-NAT LVS

This is simple

  • the director is already the default gw for the realserver (a requirement for NAT).

  • each realserver is replying to LVS packets with its RIP, which is unique (there is no VIP on the realservers with LVS-NAT). NAT'ed client requests will return to the correct realserver.

Here's the command to run on a 2.2.x director to allow realserver1 to telnet to the outside world.

director:# ipchains -A forward -p tcp -s realserver1 -d 0.0.0.0/0 telnet -j MASQ

You may have to turn off icmp redirects, if you have a one network LVS-NAT.

director: #echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects
director: #echo 0 > /proc/sys/net/ipv4/conf/eth0/send_redirects

After running this command you can telnet from the realservers. You can do this even if telnet is an LVS'ed service, since the telnet client and daemon operate independently of each other. You can NAT the rsh and identd clients in the same way (replace telnet with shell/auth, the /etc/services names for the rsh and identd ports, and clients on the realserver can connect to their daemons on outside machines).
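On a 2.4 director that uses iptables instead of ipchains, the equivalent would presumably be a MASQUERADE rule in the nat table's POSTROUTING chain. This is a sketch and has not been tested for this HOWTO. `masq_client` is a hypothetical helper, and `eth0` is assumed to be the director's outside interface. IP forwarding must already be on, which it is on an LVS-NAT director.

```shell
#!/bin/sh
# Sketch: NAT outgoing client connections (e.g. telnet) from one realserver
# through a 2.4 LVS-NAT director. Prints commands unless DRY_RUN is set empty.
DRY_RUN=${DRY_RUN-1}

run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# masq_client RIP DPORT OUT_IFACE   (hypothetical helper)
masq_client() {
    # Match on the destination port: the client's source port is ephemeral.
    run iptables -t nat -A POSTROUTING -p tcp -s "$1" --dport "$2" \
        -o "$3" -j MASQUERADE
}

# masq_client 192.168.1.11 23 eth0     # telnet from realserver1
```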

29.13.2. client requests from realservers in LVS-DR or LVS-Tun LVS's

In general this has not been solved. Calls initiated by the identd client on a realserver will come from the VIP, not the RIP. Some hare-brained schemes have been tried but did not work (NAT'ing out the request from the VIP, so that it emerges from the realserver with src_addr=RIP and then NAT'ing the packet again on the director, so it emerges with src_addr=VIP).

There are specific solutions:

In LVS-DR/LVS-Tun, it works if the client and RIP are on the same network. Usually the RIPs on LVS-DR realservers are private addresses. However, if the LVS clients and the LVS are all local and on the same network, this will work.

Clients not associated with the LVS'ed services (i.e. telnet even if telnetd is LVS'ed, but not authd or rshd) can still be NAT'ed out, since the connect request will come from the RIP and not the VIP. Since the default gw for the realserver in LVS-DR is not the director, you can handle this in 2 ways:

  • do the NAT'ing on the default gw box (you may not have access to this machine)

  • make the director the default gw for packets from the RIP (see setting up NAT for clients on LVS-DR).

29.14. Bringing down aliased devices

Note

This is no longer a problem if you use the new Policy Routing.

(without bringing them all down)

Problem: if you down/delete an aliased device (e.g. eth0:1), you also bring down the other eth0 devices. This means that you can't bring down an alias remotely, as you lose your connection (eth0) to that machine. You then have to go to the console of the remote machine to fix it by rmmod'ing the device driver for the device and bringing it up again.

The configure script handles this for you and will exit (with instructions on what to do next) if it finds that an aliased device needs to be removed by rmmod'ing the module for the NIC.

(I'm not sure that all of the following is accurate, please test yourself first).

(Stephen D. Williams sdw (at) lig (dot) net) Whenever you want to down/delete an alias, first set its netmask to 255.255.255.255. This avoids also automatically downing aliases that are on the same netmask and are considered 'secondaries' by the kernel.

(Joe) To bring up an aliased device

$ifconfig eth0:1 192.168.1.10 netmask 255.255.255.0

to bring eth0:1 down without taking out eth0, you do it in 2 steps, first change the netmask

$ifconfig eth0:1 192.168.1.10 netmask 255.255.255.255

then down it

$ifconfig eth0:1 192.168.1.10 netmask 255.255.255.255 down

The eth0 device should be unaffected, but the eth0:1 device will be gone.
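With iproute2 (the tools behind the policy routing mentioned in the note at the top of this section), you can apply the same idea directly. Give the extra address a /32 prefix so that its netmask differs from the /24 on eth0 and the kernel does not treat it as a secondary of that subnet, then delete it by address. This is a sketch with the hypothetical helpers `add_alias`/`del_alias` and placeholder addresses. The `label` keeps the eth0:1 name visible to ifconfig.

```shell
#!/bin/sh
# Sketch: add/remove an aliased address with iproute2 instead of ifconfig.
# Prints commands unless DRY_RUN is set empty.
DRY_RUN=${DRY_RUN-1}

run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# add_alias IFACE IP LABEL   (hypothetical helper)
add_alias() {
    # /32 has a different netmask from eth0's /24, so it is not a secondary.
    run ip addr add "$2/32" dev "$1" label "$3"
}

# del_alias IFACE IP   (hypothetical helper)
del_alias() {
    # Removes only this /32 address; the /24 addresses stay up.
    run ip addr del "$2/32" dev "$1"
}

# add_alias eth0 192.168.1.10 eth0:1
# del_alias eth0 192.168.1.10
```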

This works on one of my machines but not on another (both with 2.2.13 kernels). I will have to look into this. Here's the output from the machine for which this procedure doesn't work.

Examples: Starting setup. The realserver's regular IP/24 on eth0, the VIP/32 on eth0:1 and another IP/24 for illustration on eth0:2. Machine is SMP 2.2.13 net-tools 1.49

chuck:~# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:90:27:71:46:B1
          inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          RX packets:6071219 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6317319 errors:0 dropped:0 overruns:4 carrier:0
          collisions:757453 txqueuelen:100
          Interrupt:18 Base address:0x6000

eth0:1    Link encap:Ethernet  HWaddr 00:90:27:71:46:B1
          inet addr:192.168.1.110  Bcast:192.168.1.110  Mask:255.255.255.255
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          Interrupt:18 Base address:0x6000

eth0:2    Link encap:Ethernet  HWaddr 00:90:27:71:46:B1
          inet addr:192.168.1.240  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          Interrupt:18 Base address:0x6000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:299 errors:0 dropped:0 overruns:0 frame:0
          TX packets:299 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

chuck:~# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.1.110   0.0.0.0         255.255.255.255 UH        0 0          0 eth0
192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U         0 0          0 lo
0.0.0.0         192.168.1.1     0.0.0.0         UG        0 0          0 eth0

Deleting eth0:1 with netmask /32

chuck:~# ifconfig eth0:1 192.168.1.110 netmask 255.255.255.255 down
chuck:~# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:90:27:71:46:B1
          inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          RX packets:6071230 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6317335 errors:0 dropped:0 overruns:4 carrier:0
          collisions:757453 txqueuelen:100
          Interrupt:18 Base address:0x6000

eth0:2    Link encap:Ethernet  HWaddr 00:90:27:71:46:B1
          inet addr:192.168.1.240  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          Interrupt:18 Base address:0x6000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:299 errors:0 dropped:0 overruns:0 frame:0
          TX packets:299 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0


If you do the same thing with eth0:2 with the /24 netmask
chuck:~# ifconfig eth0:2 192.168.1.240 netmask 255.255.255.0 down
chuck:~# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:90:27:71:46:B1
          inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          RX packets:6071237 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6317343 errors:0 dropped:0 overruns:4 carrier:0
          collisions:757453 txqueuelen:100
          Interrupt:18 Base address:0x6000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:299 errors:0 dropped:0 overruns:0 frame:0
          TX packets:299 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

tunl0     Link encap:IPIP Tunnel  HWaddr
          unspec addr:[NONE SET]  Mask:[NONE SET]
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

29.15. Multiple IPs on the Director

Michael Sparks

It's useful for the director to have 3 IP addresses: one which is the real machine's base IP address, one which is the virtual service IP address, and another virtual IP address for servicing the director. The reason for this is associated with director failover.

Suppose:

  • X realservers pinging director on real IP A (assume a heartbeat style monitor) serving pages off virtual IP V. (IP A would be in place of hostip above)

  • Director on IP A fails, backup director (*) on IP B comes online taking over the virtual IP V. By not taking over IP A, IP B can watch for IP A to come back online via the network, rather than via a serial link (etc).

  • The problem is that the realservers are still sending to IP A. For the heartbeat code to be valid on IP B, the realservers need to send their pings to IP B instead. IMO the easiest solution is to allocate a "heartbeat"/monitor virtual IP (this is the vhostip), which the backup director takes over along with V.
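Here is a sketch of what the backup director (IP B) might run on takeover. It assumes the backup already has the same ipvsadm table loaded and that the iputils `arping` is used for gratuitous ARP. `take_over` is a hypothetical helper, and the addresses for V and the monitor VIP are placeholders. It claims V and the monitor VIP but deliberately not IP A.

```shell
#!/bin/sh
# Sketch: backup director takes over the service VIP (V) and the monitor VIP,
# but not IP A, so it can still watch for IP A to come back.
# Prints commands unless DRY_RUN is set empty.
DRY_RUN=${DRY_RUN-1}

run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# take_over IFACE IP...   (hypothetical helper)
take_over() {
    iface=$1
    shift
    for ip in "$@"; do
        run ip addr add "$ip/32" dev "$iface"
        # Unsolicited (gratuitous) ARP so neighbours update their ARP caches.
        run arping -U -c 3 -I "$iface" "$ip"
    done
}

# take_over eth0 192.168.1.100 192.168.1.101   # V and the monitor VIP
```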

29.16. Testimonials

This isn't particularly inclusive. We don't pester people for testimonials as we don't want to scare people from posting to the mailing list and we don't want inflated praise. People seem to understand this and don't pester us with their performance data either. The quotes below aren't scientific data, but it is nice to hear. The people who don't like LVS presumably go somewhere else, and we don't hear any complaints from them.

"Daniel Erdös" 2 Feb 2000

How many connections did you really handle? What are your impressions and experiences in "real life"? What are the problems?

Michael Sparks zathras (at) epsilon3 (dot) mcc (dot) ac (dot) uk

Problems: LVS provides a load balancing mechanism, nothing more, nothing less, and does it *extremely* well. If your back end real servers are flaky in any way, then unless you have monitoring systems in place to take those machines out of service as soon as there are problems with those servers, users will experience glitches in service.

NB, this is essentially a real server stability issue, not an LVS issue - you'd need good monitoring in place anyway if you weren't using LVS!

Another plus in LVS's favour over the commercial boxes in something like this is the fact that the load balancer is a Unix-type box, meaning your monitoring can be as complex or simple as you like. For example, load balancing based on wlc could be supplemented by server info sent to the director.

Drew Streib ds (at) varesearch (dot) com 23 Mar 2000

I can vouch for all sorts of good performance from lvs. I've had single processor boxes handle thousands of simultaneous connections without problems, and yes, the 50,000 connections per second number from the VA cluster is true.

lvs powers SourceForge.net, Linux.com, Themes.org, and VALinux.com. SourceForge uses a single lvs server to support 22 machines, multiple types of load balancing, and an average 25Mbit/sec traffic. With 60Mbit/sec of traffic flowing through the director (and more than 1000 concurrent connections), the box was having no problems whatsoever, and in fact was using very little cpu.

Using DR mode, I've sent request traffic to a director box resulting in near gigabit traffic from the real servers. (Request traffic was on the order of 40Mbit.)

I can say without a doubt that lvs toasts F5/BigIP solutions, at least in our real world implementations. I wouldn't trade a good lvs box for a Cisco Local Director either.

The 50,000 figure is unsubstantiated and was _not_ claimed by anyone at VA Linux Systems. A cluster with 16 apache servers and 2 LVS servers was configured for Linux World New York, but due to interconnect problems the performance was never measured; we weren't happy with the throughput of the NICs, so there didn't seem to be a lot of point. This problem has been resolved and there should be an opportunity to test this again soon.

In recent tests, I've taken multinode clusters to tens of thousands of connections per second. Sorry for any confusion here. The exact 50,000 number from LWCE NY is unsubstantiated.

Jerry Glomph Black black (at) real (dot) com 23 Mar 2000

We ran a very simple LVS-DR arrangement with one PII-400 (2.2.14 kernel) directing about 20,000 HTTP requests/second to a bank of about 20 Web servers, which answered with tiny identical dummy responses, for a few minutes. It worked just fine.

Now, at more terrestrial but still quite high real-world loads, the systems run just fine for months on end (usually with the weighted least-connection algorithm).

We tried virtually all of the commercial load balancers; LVS beats them all for reliability, cost, manageability, you-name-it.

29.17. Transport Layer Security (TLS)

Noma wrote Nov 2000

Are you going to implement TLS (Transport Layer Security) Ver1.0 on LVS?

Wensong

I haven't read the TLS protocol, so don't know if the TLS transmits IP address and/or port number in payload. In most cases, it should not, because SSL doesn't.

If it doesn't, you can use any of the three methods: LVS-NAT, LVS-Tun or LVS-DR. If it does, LVS-Tun and LVS-DR will still work.

Ted Pavlic tpavlic (at) netwalk (dot) com, Nov 2000

I don't see any reason why LVS would have any bearing on TLS. As far as LVS was concerned, TLS connections would just be like any other connections.

Perhaps you are referring to HTTPS over TLS? That protocol has not been completed yet, and when it is, it still will not need any extra work in the LVS code.

The whole point of TLS is that one connects to the same port as usual and then "upgrades" to a higher level of security on that port. All the secure logic happens at a level so high that LVS wouldn't even notice a change. Things would still work as usual.

Julian Anastasov ja (at) ssi (dot) bg

This is an end-to-end protocol layered on another transport protocol. I'm not a TLS expert, but as I understand it, TLS 1.0 is handled just like SSL 3.0 and 2.0 are, i.e. it only requires support for persistent connections.
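In ipvsadm terms, support for persistent connections means giving the virtual service a persistence timeout with -p. Each client then sticks to one realserver, which can reuse its SSL/TLS session for that client. The sketch below assumes port 443, the wlc scheduler, LVS-DR forwarding (-g) and a 360 second timeout. The function name and addresses are placeholders.

```shell
# Hypothetical helper: persistent HTTPS virtual service on VIP:443.
# setup_https_persistent VIP RIP...
# Set IPVSADM="echo ipvsadm" to print the commands instead of running them.
setup_https_persistent() {
    vip=$1
    shift
    # -p 360: connections from the same client are sent to the same
    # realserver, subject to the 360 second persistence timeout.
    ${IPVSADM:-ipvsadm} -A -t "$vip:443" -s wlc -p 360
    for rip in "$@"; do
        ${IPVSADM:-ipvsadm} -a -t "$vip:443" -r "$rip:443" -g -w 1
    done
}
```

Any of the three forwarding methods could replace -g here, since (as Wensong notes above) SSL does not carry the IP address or port in its payload.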

29.18. Setting up a hot spare server

Mark Miller markm (at) cravetechnology (dot) com 09 May 2001

We want a configuration where two Solaris based web servers will be setup in a primary and secondary configuration. Rather than load balancing between the two we really want the secondary to act as a hot spare for the primary.

Here is a quick diagram to help illustrate this question:

                  Internet		LD1,LD2 - Linux 2.4 kernel
                      |			RS1,RS2 - Solaris
                   Router
                      |
               -------+-------
               |             |
             -----         -----
             |LD1|         |LD2|
             -----         -----
               |             |
               -------+-------
                      |
                    Switch
                      |
               ---------------
               |             |
             -----         -----
             |RS1|         |RS2|
             -----         -----

Paul Baker pbaker (at) where2getit (dot) com 09 May 2001

Just use heartbeat on the two firewall machines and heartbeat on the two solaris machines.

Horms horms (at) vergenet (dot) net 09 May 2001

You can either add and remove servers from the virtual service (using ipvsadm) or toggle the weights of the servers from zero to non-zero values.
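Horms' second option, toggling weights, can be sketched as follows. The spare sits in the ipvsadm table at weight 0, so it receives no new connections. On failover it is given weight 1 and the primary is dropped to 0. The function names are hypothetical. LVS-NAT forwarding (-m) is also an assumption, chosen because keepalived uses NAT. Something still has to detect that the primary is down and call the failover.

```shell
# Hypothetical hot-spare helpers built on ipvsadm weights.
# Set IPVSADM="echo ipvsadm" to print the commands instead of running them.

# hot_spare_setup VIP:PORT PRIMARY:PORT SPARE:PORT
hot_spare_setup() {
    ${IPVSADM:-ipvsadm} -A -t "$1" -s wlc
    ${IPVSADM:-ipvsadm} -a -t "$1" -r "$2" -m -w 1   # primary takes the load
    ${IPVSADM:-ipvsadm} -a -t "$1" -r "$3" -m -w 0   # spare: no new connections
}

# hot_spare_failover VIP:PORT PRIMARY:PORT SPARE:PORT
hot_spare_failover() {
    ${IPVSADM:-ipvsadm} -e -t "$1" -r "$3" -m -w 1   # bring the spare in first
    ${IPVSADM:-ipvsadm} -e -t "$1" -r "$2" -m -w 0   # then quiesce the primary
}
```

Setting a weight to 0 quiesces the server rather than removing it. Existing connections to the primary can finish, but new connections go only to the spare.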

Alexandre Cassen alexandre (dot) cassen (at) canal-plus (dot) com 10 May 2001

For your 2 LDs you need to run a hot standby protocol. Heartbeat can be used; you can also use VRRP or HSRP. I am currently working on the IPSEC AH implementation for VRRP. That kind of protocol can be useful because your backup LD can be used even while it is in the backup state (you simply create two LD VIPs and set the default gateway of half of your server pool to LD1 and the other half to LD2).

For your webserver hot-spare needs, you can use the next keepalived release, which will have a "sorry server" facility. This means exactly what you need: you have a pool of realservers, and if all the servers in this pool are down, the sorry server is placed into the ipvsadm table automatically. If you use keepalived, keep in mind that you will be using the NAT topology.

Joe 11 May 2001

Unless there's something else going on that I don't know about, I expect this isn't a great idea. The hot spare is going to degrade (depreciate, disks wear out although not quite as fast, software needs upgrading) just as fast sitting idle as doing work.

You may as well have both working all the time and for the few hours of down time a year that you'll need for planned maintenance, you can make do with one machine. If you only need the capacity of 1 machine, then you can use two smaller machines instead.

29.19. An LVS of LVSs

Since an LVS obeys unix client/server semantics, an LVS can replace a realserver (at least in principle; no-one has done this yet). Each LVS layer could have its own forwarding method, independently of the other LVSs. The LVS of LVSs would look like this, with realserver_3 being in fact the director of another LVS and having no services running on it.

                        ________
                       |        |
                       | client |
                       |________|
			   |
                           |
                        (router)
                           |
			   |
                           |       ____________
                           |  DIP |            |
                           |------| director_1 |
                           |  VIP |____________|
                           |
                           |
                           |
         ------------------------------------
         |                 |                |
         |                 |                |
     RIP1, VIP         RIP2, VIP        RIP3, VIP
   ______________    ______________    _____________
  |              |  |              |  |             |
  | realserver1  |  | realserver2  |  | realserver3 |
  |              |  |              |  | =director_2 |
  |______________|  |______________|  |_____________|
                                            |
                                            |
         ------------------------------------
         |                 |                |
         |                 |                |
     RIP4, VIP         RIP5, VIP        RIP6, VIP
   ______________    ______________    ______________
  |              |  |              |  |              |
  | realserver4  |  | realserver5  |  | realserver6  |
  |              |  |              |  |              |
  |______________|  |______________|  |______________|

If all realservers were offering http and only realservers1..4 were offering ftp, then you would (presumably) setup the directors with the following weights for each service:

  • director_1: realserver1 http,ftp=1; realserver2 http,ftp=1; realserver3 http=3,ftp=1

  • director_2: realserver4 http,ftp=1; realserver5 http=1 (no ftp); realserver6 http=1 (no ftp)
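Expressed as ipvsadm commands, these weights might look like the sketch below. It assumes the geographically remote case, where director_1 forwards everything by LVS-Tun (-i), and it assumes director_2 uses LVS-NAT (-m). VIP and RIP1..RIP6 are placeholder variables, and the function names are hypothetical. FTP also needs extra handling that is not shown: persistence, or the ip_vs_ftp module with LVS-NAT.

```shell
# Hypothetical sketch of the two-level weights above.
# VIP and RIP1..RIP6 are placeholder variables set by the caller.
# Set IPVSADM="echo ipvsadm" to print the commands instead of running them.

# add_service VIP:PORT FWD RIP:WEIGHT...
# Creates a wlc virtual service, then adds each realserver with the given
# forwarding flag (-g, -i or -m) and weight.
add_service() {
    vs=$1
    fwd=$2
    shift 2
    ${IPVSADM:-ipvsadm} -A -t "$vs" -s wlc
    for rw in "$@"; do
        # "${rw%:*}" is the address, "${rw##*:}" the weight
        ${IPVSADM:-ipvsadm} -a -t "$vs" -r "${rw%:*}" "$fwd" -w "${rw##*:}"
    done
}

# director_1: LVS-Tun (-i) to all three. realserver3 is director_2, so it
# gets http weight 3 (three http servers behind it) but ftp weight 1
# (only realserver4 behind it offers ftp).
director_1_setup() {
    add_service "$VIP:80" -i "$RIP1:1" "$RIP2:1" "$RIP3:3"
    add_service "$VIP:21" -i "$RIP1:1" "$RIP2:1" "$RIP3:1"
}

# director_2: LVS-NAT (-m); http on realservers 4..6, ftp on realserver4 only.
director_2_setup() {
    add_service "$VIP:80" -m "$RIP4:1" "$RIP5:1" "$RIP6:1"
    add_service "$VIP:21" -m "$RIP4:1"
}
```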

You might want to do this if realservers4..6 were on a different network (i.e. geographically remote). In this case director_1 would be forwarding by LVS-Tun, while director_2 could use any forwarding method.

29.19.1. An LVS of LVSs: using Windows/Solaris machines with LVS-Tun

This is the sort of idea we were having in the early days. It turns out that not many people are using LVS-Tun, most people are using Linux realservers, and not many people are using geographically distributed LVSs.

Joe, Jun 99

For the foreseeable future many of the servers that could benefit from LVS will be Microsoft or Solaris. The problem is that they don't have tunneling. A solution would be to put a Linux box in front of each real server, on the link from the director to the real server. The Linux box appears to be the server to the director (it has the real IP, eg 192.168.1.2) but does not have the VIP (eg 192.168.1.110). The Linux box decapsulates the packet from the director and now has a packet from the client to the VIP. Can the Linux box route this packet to the real server (presumably to an lo device on the real server)?

The linux box could be a diskless 486 machine booting off a floppy with a patched kernel, like the machines in the Linux router project.

Wensong 29 Jun 1999

We can use a nested (hybrid) LinuxDirector approach. For example,

    LVS-Tun   ---->   LVS-NAT ---->  RealServer1
         |                 |       ...
         |                 ----->  RealServer2
         |
         |           ....
         |
         |
         -------->   LVS-NAT  ....

Real Servers can run any OS. An LVS-NAT load balancer can usually schedule over 10 general servers, and these LVS-NATs can be geographically distributed.

By the way, LinuxDirector in kernel 2.2 can use LVS-NAT, LVS-Tun and LVS-DR together for servers in a single configuration.
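A single virtual service that mixes the three forwarding methods might be set up as in this sketch. The variable names, the 8080 remap and the function name are placeholders. Each realserver still needs the matching setup on its own side: a default route through the director for LVS-NAT, a tunnel device for LVS-Tun, and the VIP on a non-arping device for LVS-DR.

```shell
# Hypothetical sketch: one http virtual service, three forwarding methods.
# VIP, RIP_NAT, RIP_TUN and RIP_DR are placeholder variables.
# Set IPVSADM="echo ipvsadm" to print the commands instead of running them.
mixed_service() {
    i=${IPVSADM:-ipvsadm}
    $i -A -t "$VIP:80" -s wlc
    $i -a -t "$VIP:80" -r "$RIP_NAT:8080" -m -w 1  # LVS-NAT: port may be remapped
    $i -a -t "$VIP:80" -r "$RIP_TUN" -i -w 1       # LVS-Tun: realserver can be remote
    $i -a -t "$VIP:80" -r "$RIP_DR" -g -w 1        # LVS-DR: same segment as director
}
```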

29.20. Connecting from clients through multiple parallel links: the dead gateway problem

Note

This is not an LVS problem, just a normal routing problem. You can have multiple default gateways in Linux. The problem is knowing when one of them has died.

Logu lvslog (at) yahoo (dot) com 5 Oct

I have two ISDN internet connections from two different ISPs. I am going to put an LVS-NAT between the users and these two links so as to load balance the bandwidth.

Julian

You can use the Linux's multipath feature:

# ip ru
0:      from all lookup local
50:     from all lookup main
...
100:    from 192.168.0.0/24 lookup 100
200:    from all lookup 200
32766:  from all lookup main
32767:  from all lookup 253

# ip r l t 100
default  src DUMMY_IP
	nexthop via ISP1  dev DEV1 weight 1
	nexthop via ISP2  dev DEV2 weight 1

# ip r l t 200
default via ISP1 dev DEV1  src MY_IP1
default via ISP2 dev DEV2  src MY_IP2
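The listing above shows the resulting state. The sketch below shows commands that might build it. ISP1/ISP2 (gateway addresses), DEV1/DEV2, DUMMY_IP, MY_IP1 and MY_IP2 are the placeholders from the listing, used here as shell variables, and the function name is hypothetical. Rule 50 consults the main table first, so this assumes main holds only the local and connected routes and no default route.

```shell
# Hypothetical sketch of commands that would build the rules and tables
# listed above. Variables are the placeholders from the listing.
# Set IPCMD="echo ip" to print the commands instead of running them.
setup_multipath() {
    i=${IPCMD:-ip}
    $i rule add pref 50 table main
    $i rule add pref 100 from 192.168.0.0/24 table 100
    $i rule add pref 200 table 200
    # table 100: traffic from the LAN is spread over both ISPs
    $i route add default src "$DUMMY_IP" table 100 \
        nexthop via "$ISP1" dev "$DEV1" weight 1 \
        nexthop via "$ISP2" dev "$DEV2" weight 1
    # table 200: one alternative default route per ISP, each with its own
    # source address ("append" should add the second instead of failing)
    $i route add default via "$ISP1" dev "$DEV1" src "$MY_IP1" table 200
    $i route append default via "$ISP2" dev "$DEV2" src "$MY_IP2" table 200
}
```

Plain Linux does not notice if one of these gateways dies, which is the gap that Julian's patch, described next, addresses.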

You can add my dead gateway detection extension (for now only available for 2.2).

This way you will be able to fully utilize both lines for masquerading. Without this patch you will not be able to select a different public IP for each ISP. These are called "Alternative routes". Of course, in any case managing this is not an easy task; it needs understanding.

anon

I currently have multiple ADSL modems that connect to the internet.

Alexandre Cassen alexandre (dot) cassen (at) wanadoo (dot) fr 11 Apr 2003

This is a routing design problem, commonly solved by load balancing the default route at the routing level (netlink). You add 2 default gateways with the same weight to provide outbound load balancing. Since current Linux kernel routing lacks dead gateway detection, you will need to apply Julian's "dead gateway detection" patch.

29.21. LVS on a Linux/IBM mainframe

Kyle Sparger ksparger (at) dialtoneinternet (dot) net 18 Sep 2001

I'm familiar with the s/390; the zSeries 900 will be similar, but on a 'next-gen' scale -- It's 64 bit and I expect 2-3 times the maximum capacity.

  • The s/390 is ONLY, at most, a 12-way machine in a single frame, 24-way in a two-frame configuration. The CPUs are not super-powered; they're normal CPUs, so imagine a normal 12-24 way, and you have a good idea. It does have special crypto-processors built in, if you can find a way to use them.

  • The s/390, however, has an obnoxiously fast bus -- 24GByte/s. Yes, I did mean gigabytes. Also, I/O takes up almost no CPU time, as the machines have sub-processors to take care of it.

  • The s/390 is a 31bit machine -- yes, 31. One bit defines whether the code is 16 or 31 bit code. The z/900 is a 64bit machine. Note that the s/390, afaik, suffers when attempting to access memory over a certain amount, like any 31/32 bit machine would -- 2 gigs can be addressed in a single clock cycle; greater than that takes longer to process, since it requires more than 32 bits to address.

  • From top to bottom, the entire machine is redundant. There is no single point of failure anywhere in the machine. According to IBM's docs, the MTBF is 30 years. It calls IBM when it's broken, and they come out and fix it. The refrigerator ad was no joke ;) Of course, this doesn't protect you from power outages, but interestingly enough, if I recall correctly, all RAM is either SRAM, or battery backed -- the machine will come back up and continue right where it left off when it lost power. No restarting instances or apps required. No data lost.

There are five premises for the cost-savings:

  • You don't have to design a redundant system -- it's already built in.

  • One machine is easier to manage than n servers.

  • One machine uses fewer facilities than n servers.

  • A single machine, split many ways, can result in higher utilization.

  • Linux, Linux, Linux. All the free software you can shake a stick at.

On the flip-side, there are some constraints:

  • If you have 500 servers, all at 80% CPU usage, there's no way you're going to cram them all onto the mainframe. Part of the premise is that most servers sit at only a fraction of their maximum capacity.

  • The software must be architecture compatible.

  • Mainframe administrators and programmers are rare and expensive.

The ideal situation for an s/390 or z/Series is an application which is not very CPU intensive, but is highly I/O intensive, that must _NEVER_ go down. Could that be why many companies do databases on them? Think airline ticketing systems, financial systems, inventory, etc :) Realize, however, that your cost of entry is probably going to be well over a million dollars, unless you want a crippled entry-level box. You probably don't want to buy this server to run your web site. You probably want to buy it to run your database. That being said, if you happen to order more than you really need -- a reasonably common phenomenon in IT shops -- you can now run Linux instances with that extra capacity. :)

29.22. How do I check to see if my kernel has the ip-vs patch installed?

Short answer: If you need the HOWTO, you shouldn't be using other people's kernels - go compile up an ipvs patched kernel yourself.

Long answer: If you have the kernel binary, then you have a bit of a job ahead of you. If you've compiled it yourself, then you should give it a name like

bzImage-0.9.3-2.4.9-module-forward-shared

to remind you of what it is (here ip_vs 0.9.3 compiled as modules, kernel 2.4.9, forward-shared patch).

Otherwise

  • If ipvs is compiled into the kernel

    $ grep ip_vs_init System.map
    

  • if it's a recent kernel and ip_vs is compiled as a module, running ipvsadm will load the module for you. From there check the loaded modules with lsmod.

  • for older kernels you need to load the module before running ipvsadm. If the kernel doesn't have the ip_vs patch, the module probably won't load.
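The checks above can be rolled into one hypothetical helper, sketched below. The default System.map location (/boot/System.map-<running kernel>) is a common convention, not a universal one, so pass the path explicitly if yours differs. The System.map check also works for a kernel you have built but not booted. Module symbols are not in System.map, so a modular ip_vs shows up through /proc/net/ip_vs once it is loaded.

```shell
# Hypothetical helper following the checks above.
# ipvs_status [SYSTEM_MAP] [PROC_FILE]
ipvs_status() {
    map=${1:-/boot/System.map-$(uname -r)}
    proc=${2:-/proc/net/ip_vs}
    if [ -r "$map" ] && grep -q ip_vs_init "$map"; then
        echo "ip_vs is compiled into the kernel"
    elif [ -e "$proc" ]; then
        # ip_vs is running, here most likely as a loaded module
        echo "ip_vs is loaded"
    elif modprobe -n ip_vs >/dev/null 2>&1; then
        # -n: dry run, only checks that the module can be found
        echo "ip_vs is available as a module (not loaded)"
    else
        echo "no ip_vs found"
    fi
}
```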

29.23. Running a test LVS (director, backup director and realservers) on one box

Can I load both the ipvs code and the failover code in a single stand alone machine?

Joe 09 Jul 2001

VMWare?

Henrik Nordstrom hno (at) marasystems (dot) com

user-mode-linux works beautifully for simulating a network of Linux boxes on a single CPU. I use it extensively when hacking on netfilter/iptables, or when testing our patches on new kernels and/or ipvs versions. It also has the added benefit that you can run the kernel under full control of gdb, which greatly simplifies tracking down kernel bugs if you get into kernel hacking.

Joe

I attended a talk by the UML author at OLS 2001. It's pretty smart software. You can have virtual CPUs, NICs... - you can have a virtual 64-way SMP machine running on your 75MHz pentium I. The performance will be terrible, but you can at least test your application on it.

29.24. mqseries

The LVS worked for a client connected directly to the director, but not from a client on the internet.

Carlos J. Ramos cjramos (at) genasys (dot) es 12 Mar 2002

Now, it seems to be solved by using static routes to hosts instead of using static routes to networks.

There is another important point. The directors use MQSeries from IBM, and the starting sequence in haresources was mqseries, then masq.lvs (the script for NAT). It looks like the 1 minute mqseries needs to come up was confusing(!?) masq.lvs or ldirectord. We just changed the order, bringing up masq.lvs first and mqseries last.

With these two changes it works perfectly.

29.25. LVS log files

Chris Ruegger

Does LVS maintain a log file or can I configure it to use one so I can see a history of the requests that came in and how it forwarded them?

Joe 1 Apr 2002

It doesn't but it could. LVS does make statistics available.

Another question is whether logging is a good idea. The director is a router with slightly different rules than a regular router. It is designed to handle thousands of requests/sec and to operate with no spinning media (eg on a flash card). There's no way you can log all connections to a disk and maintain throughput, and you couldn't review the contents of the logs anyway. People do write filter rules that look for likely problems and log suspicious packets. Even reviewing those files overwhelms most people.
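If some history is still wanted, one low-cost compromise is to sample what LVS already exposes. `ipvsadm -L -n --stats` gives per-service and per-realserver packet and byte counters, and `ipvsadm -L -n -c` lists the current connection table. The hypothetical sketch below appends a timestamped snapshot of that table to a file, for example from cron. The log path is an assumption. Connections that start and end between samples are never seen, so this is not a request log.

```shell
# Hypothetical sketch: append a timestamped snapshot of the LVS
# connection table to LOGFILE (default path is an assumption).
# Set IPVSADM to a substitute command for testing.
# snapshot_conns [LOGFILE]
snapshot_conns() {
    log=${1:-/var/log/lvs-conns.log}
    {
        date '+%Y-%m-%d %H:%M:%S'
        ${IPVSADM:-ipvsadm} -L -n -c
    } >> "$log"
}
```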

Ratz 2 Apr 2002

LVS works on L4. Maybe the following command will make you happy:

echo 666 > /proc/sys/net/ipv4/vs/debug_level

29.26. LVS and linux vlan

Matt Stockdale

Does the current LVS code work in conjunction with the Linux vlan code? We'd like to have a central load balancing device that connects into our core switch with a dot1q trunk and can have virtual interfaces on any of our many netblocks/vlans.

Benoit Gaussen bgaussen (at) fr (dot) colt (dot) net 20 Mar 2002

I tested it and it works. The only problem I encountered was an MTU problem with the eepro100 driver and the 8021q code; however, there is a small patch on the 8021q website. My config was Linux 2.4.18/LVS 1.0.0 configured with LVS-NAT.
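Putting a VIP on one VLAN of the trunk could look like the sketch below. It assumes the 8021q module is available and uses the newer iproute2 `ip link ... type vlan` syntax. On 2.4-era systems the VLAN device was created with the 8021q `vconfig add DEV VLAN_ID` tool instead. The device name, VLAN id, address and function name are placeholders.

```shell
# Hypothetical sketch: put a VIP on a VLAN sub-interface of the
# director's dot1q trunk (modern iproute2 syntax).
# Set IPCMD="echo ip" to print the commands instead of running them.
# add_vlan_vip DEV VLAN_ID VIP/PREFIX
add_vlan_vip() {
    i=${IPCMD:-ip}
    $i link add link "$1" name "$1.$2" type vlan id "$2"
    $i link set "$1.$2" up
    $i addr add "$3" dev "$1.$2"
}
```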

29.27. multi-home, multi-router LVS

Matthew S. Crocker matthew (at) crocker (dot) com 29 Oct 2002

I use LVS in a multi-homed, multi-router HSRP setup.

Each LVS is connected to a separate switch. Each router is connected to each switch and to my upstream providers. We use BGP4 to talk with our upstream providers. The routers use HSRP failover for an IP address that the LVS boxes use as their gateway address.

The LVS setup is pretty much a standard LVS-NAT install using keepalived. Each LVS has a default route pointing to an IP address which is a virtual IP and part of the HSRP router failover system.

The routers are standard Cisco 7500 series running BGP4 between themselves and my providers. They also run HSRP (Hot Standby Router Protocol) between their ethernet interfaces.

With my setup I can lose a link, a router, a switch or an LVS box and not go down.