Linux optimization

A friend owns a two-processor PC. My PC has a single processor. Both of my friend's processors run at 500 MHz, so my 1.5 GHz machine should be faster. Well, it is not. When running a single application, my computer is indeed faster. But when several programs are launched together, his computer is significantly faster. For example, when a big file is being copied and Xine is launched at the same time, my friend's computer reacts swiftly while mine gets really sluggish. It made me a little jealous. Driven by the Dark Side, I got my hands inside the gears and pulleys to get my computer to react as fast as his. I'm quite happy with the results.

The readahead parameter
Kernel timeslice
The sound server
A remote SWAP partition
RAID partitions
File system formats




The readahead parameter


The readahead parameter was a good start for tuning trials. Yet on my current SuSE Linux 10.0 it didn't allow for much improvement. Anyway, it's worth trying and it is not too dangerous for your data files. On previous Linux releases it made a significant difference, and it is quite easy to fiddle with. Just start your computer, open a terminal with root privileges (or use the su command) and type this (assuming your hard drive is /dev/hda):


# hdparm -a 1024 /dev/hda


(If hdparm isn't installed, just install it. I don't know what happens if you have a SCSI hard drive. Note that hdparm allows for many hard drive optimizations, like -m, -c, -d or -X. Also note that some computers or hard drives don't allow changing the readahead parameter.)

Then start a program, say Firefox, and time how many seconds it takes for the Firefox window to come up. On my computer it was 8 seconds.

Now restart the computer (in order to get a clean RAM cache) and issue this command:


# hdparm -a 8 /dev/hda


If you start Firefox, you'll see it takes significantly more time to start: 20 seconds on my computer. You'll get a comparable slowdown, roughly three times, when copying big files.

So a small readahead value seems to slow down the system. Right. But when I start several programs together with a small readahead configured, each one reacts much faster. It's a matter of what you choose to optimize. Either you want single programs to start fast, and then you had better set a high readahead value, or you want multiple programs to run efficiently side by side, and then you need a small readahead value. A small value is, for example, what you need when using several multimedia programs together. Ever started a long system configuration procedure and decided to write a letter to a friend while the configuration continues in the background? On my computer, with the default readahead of 256, the typing is very slow. I may have to wait seconds until typed characters are displayed. Quite annoying. I configured a readahead of 32 and the typing became fluid... with no significant decrease in system configuration speed, I believe.

Today common Linux systems seem to settle on a compromise value of 256. Perhaps the readahead value is the reason why, a few years ago, Debian Linux was so much faster than other Linux releases: the default readahead value among Linux releases was 8, while maybe Debian used a higher default value.

What's the readahead? Well, each time a file sector is read from the hard drive, the system reads some further sectors of that file, just in case. Indeed, most often those further sectors will soon be asked for... The number of sectors is the n given in the hdparm -a n /dev/hdx command. This brings a gain sometimes. Actually a lot of times, since it allows a program to start up to three times faster! Now the problem is that in some cases reading further sectors is pointless or even a loss of time. Suppose a server handles a giant database file. It accesses random tiny parts of the database at a time. If, each time a few bytes of the file are accessed, a whole bunch of 1024 further sectors of the file is read, this represents a severe loss of performance. So setting a low readahead on such a server can greatly enhance performance.
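
To get a feeling for the amounts involved, here is a small sketch that converts a readahead value into the quantity of data fetched ahead. It assumes the value is counted in 512-byte sectors, which is the unit hdparm -a uses; the function name readahead_kib is made up for this illustration.

```shell
#!/bin/sh
# Convert a readahead value, counted in 512-byte sectors (the unit of
# "hdparm -a"), into the number of KiB fetched ahead on each read.
# readahead_kib is a hypothetical helper name, not part of hdparm.
readahead_kib() {
    sectors=$1
    echo $(( sectors * 512 / 1024 ))
}

# The values discussed in the text.
for n in 8 32 256 1024; do
    echo "readahead $n sectors = $(readahead_kib "$n") KiB"
done
```

So in the database example, a readahead of 1024 sectors means half a megabyte may be pulled from the disk for every few bytes actually wanted.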

A readahead lower than 8 sectors means no readahead.

I am quite happy with a readahead of 32. It allows programs to start at an acceptable speed (12 seconds for Firefox) and gives fast reactions when starting several programs together. I imposed this at machine start by adding the command below to a startup script (I chose to put it inside /etc/init.d/checkfs.sh, just below the leading # info lines). The -c and -X parameters are specific to my machine. Yours will be different:


/sbin/hdparm -a 32 -c 1 -X udma5 /dev/hda


Some Linux releases like Mandrake/Mandriva and SuSE offer to tune these hdparm parameters in regular configuration files or using a GUI. See /etc/sysconfig on Mandrake/Mandriva and use YaST on SuSE. Be careful with Mandrake: it didn't behave neatly when I tried this out a while ago.

If you wish to list your hard drive parameters, simply issue the hdparm command with no options:


# hdparm /dev/hda


This is the output on my computer:


/dev/hda:
 multcount    = 16 (on)
 IO_support   =  1 (32-bit)
 unmaskirq    =  0 (off)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 256 (on)


On my installed Debian Sarge Linux release, both my DVD reader and CD writer are configured to use the SCSI IDE wrapper. That is done by adding the kernel parameters hdc=ide-scsi hdd=ide-scsi inside my boot configuration file. So the DVD reader and CD writer become /dev/scd0 and /dev/scd1. In order to change their readahead parameters I need to access them as /dev/scd0 and /dev/scd1:


# hdparm -a 32 /dev/scd0
# hdparm -a 32 /dev/scd1


This is also true when using the eject command:


# eject /dev/scd0
# eject -t /dev/scd0


A detail: some Linux releases don't activate the DMA channel for the CD and DVD drives, or even for the hard disks. You can get a significant speed boost (or a system crash) by activating DMA, still using the hdparm command:


# hdparm -d 1 /dev/hda


A hint for newbies: type the command below to get the manual page for hdparm. You can also ask for man:hdparm in Konqueror or search on the Web.


# man hdparm




Kernel timeslice


I wasn't fully satisfied with the tuning of the readahead parameter. Whatever readahead I used, if a big file was being copied, starting Xine would still be significantly slower. This is irritating because every PC is supposed to use its DMA hardware to copy files. I mean: reading the file and writing a copy is mostly done by a hardware DMA chip on the motherboard, so the processor can focus on starting, say, Xine. But obviously the file copy creates a bottleneck. A bottleneck that doesn't seem to arise on my friend's two-processor PC.

My feeling was that my processor didn't have the opportunity to handle the DMA transfer efficiently. Sure, the DMA hardware performs the transfer, but it still has to be controlled by the processor. The processor acts as the supervisor. If it is busy doing something else, the transfer temporarily comes to a halt. An obvious thing to try in order to improve this situation was to decrease the processor time slice.

What's a processor time slice? You have noticed your Linux system can perform several tasks at once. Actually it doesn't really perform them at once. Rather it performs one task for, say, 100 milliseconds, then a second task for 100 milliseconds, then the first task again for 100 milliseconds... and so on. You get the feeling the two tasks are performed simultaneously while they aren't. Each takes its turn, but fast enough that you don't notice. (Some special events that need immediate attention can interrupt whatever is going on and get the processor to focus on them. This is outside the scope of this text.)
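
As a rough illustration of why the slice length matters, the sketch below computes how long a task may have to wait for its next turn under a purely idealised round-robin. It assumes every runnable task uses its whole slice and ignores priorities and interrupts, which a real kernel scheduler does not; the function name worst_wait_ms is invented for the example.

```shell
#!/bin/sh
# Idealised round-robin: every runnable task gets one full timeslice in turn.
# Priorities, interrupts and sleeping tasks are deliberately ignored.

# worst_wait_ms TASKS SLICE_MS
# Time a task waits for its next turn after using up its own slice,
# when all the other tasks use theirs completely.
worst_wait_ms() {
    tasks=$1
    slice_ms=$2
    echo $(( (tasks - 1) * slice_ms ))
}

for slice in 100 5 1; do
    echo "3 tasks, ${slice} ms slices: up to $(worst_wait_ms 3 "$slice") ms between turns"
done
```

With three busy tasks, a 100 ms slice can leave a task idle for a fifth of a second between turns, while a 1 ms slice brings that down to 2 ms.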


Old approach

What follows yielded good results a year ago. It's still worth reading but don't use it on recent Linux releases. It was useless on my brand new SuSE 10.0. Better use the "New approach" below.

As long as only a few simple tasks are involved, a timeslice of a tenth of a second (100 milliseconds) is fine. But when it comes to driving the hard drive and launching a multimedia program, 0.1 second is an eon. So I decreased the system timeslice to 0.001 seconds (1 millisecond). I also changed the way the processor attributes timeslices. This brought a tremendous increase in system responsiveness.

How do you change the timeslice? You need to hack the kernel source and recompile the kernel. (If you aren't used to this, don't try it on your regular Linux install or you will blow it up. Install a trial Linux in another partition and read through kernel compilation Howtos. Note that you have to find a Howto suited to your specific Linux release, kernel and boot loader.) The file that needs to be changed inside the kernel source is kernel/sched.c. It contains these lines among others (kernel 2.6):


/*
 * These are the 'tuning knobs' of the scheduler:
 *
 * Minimum timeslice is 5 msecs (or 1 jiffy, whichever is larger),
 * default timeslice is 100 msecs, maximum timeslice is 800 msecs.
 * Timeslices get refilled after they expire.
 */
#define MIN_TIMESLICE          max(5 * HZ / 1000, 1)
#define DEF_TIMESLICE          (100 * HZ / 1000)
#define ON_RUNQUEUE_WEIGHT      30
#define CHILD_PENALTY           95
#define PARENT_PENALTY         100
#define EXIT_WEIGHT              3
#define PRIO_BONUS_RATIO        25
#define MAX_BONUS              (MAX_USER_PRIO * PRIO_BONUS_RATIO / 100)
#define INTERACTIVE_DELTA        2
#define MAX_SLEEP_AVG          (DEF_TIMESLICE * MAX_BONUS)
#define STARVATION_LIMIT       (MAX_SLEEP_AVG)
#define NS_MAX_SLEEP_AVG       (JIFFIES_TO_NS(MAX_SLEEP_AVG))


I changed three of them to this:


#define MIN_TIMESLICE          max(1 * HZ / 1000, 1)
#define DEF_TIMESLICE          (1 * HZ / 1000)
#define ON_RUNQUEUE_WEIGHT      1


The first two 1's mean the timeslice will be 1 millisecond. I believe the third one means the kernel will tend to allocate each next timeslice to another program. So each program quickly gets its share of time.
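
To see what these macros evaluate to, it helps to redo their integer arithmetic by hand. The sketch below mirrors the two expressions in shell arithmetic, which truncates positive integer divisions the same way C does. HZ is the number of timer ticks (jiffies) per second and depends on how the kernel was configured; the values 100, 250 and 1000 are only examples, and the function names are made up for this illustration.

```shell
#!/bin/sh
# Mirror of the kernel macros, using integer arithmetic:
#   MIN_TIMESLICE = max(ms * HZ / 1000, 1)
#   DEF_TIMESLICE = (ms * HZ / 1000)
# Results are in jiffies (timer ticks). HZ values below are examples only.

# min_timeslice_jiffies HZ MS
min_timeslice_jiffies() {
    hz=$1
    ms=$2
    j=$(( ms * hz / 1000 ))
    if [ "$j" -lt 1 ]; then
        j=1
    fi
    echo "$j"
}

# def_timeslice_jiffies HZ MS  (no lower bound, as in the macro)
def_timeslice_jiffies() {
    hz=$1
    ms=$2
    echo $(( ms * hz / 1000 ))
}

for hz in 100 250 1000; do
    echo "HZ=$hz, 1 ms: MIN=$(min_timeslice_jiffies "$hz" 1) DEF=$(def_timeslice_jiffies "$hz" 1) jiffies"
done
```

Note that with HZ below 1000 a 1 ms DEF_TIMESLICE truncates to 0 jiffies, while MIN_TIMESLICE is kept at 1 jiffy by its max(..., 1) guard. The "1 millisecond" reading therefore only holds when the kernel ticks at 1000 Hz.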

I can't promise these settings are best for your computer too. Maybe you'll have to use bigger time slices, like 5 for the first line and 20 for the second line... I never got system crashes when fiddling with these parameters, but my bicephalous friend had Xine crash when using 1 millisecond time slices. Actually I did once get the kernel to hang on startup, but I was trying to hack a timeslice lower than 1 millisecond... My current setting is 5 for both.

The result of this tuning is amazing. My machine simply became sort of perfect. Now Firefox starts the first time in only 6 seconds and needs only 2 seconds on a second start. I can launch whatever programs I want concurrently, including heavy file copies. Some programs don't even seem to be slowed down by the others. Programs gently share the processor time. The worst are some games like Tuxracer, whose policy is to drain all processor and graphics chip resources. The result is acceptable anyway, but I tend to impose my will upon Tuxracer to get really neat behavior. For example, if I want to play a DVD using Xine and play Tuxracer at the same time, I launch Xine first (to prevent Tuxracer from seizing the graphics card) and I launch Tuxracer in a console, first imposing a high NICE factor on it:


$ renice 20 $$
$ tuxracer
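
The same effect can be had in one step with the nice command, which starts a program directly at a given niceness instead of changing the niceness of the whole shell. This is only a sketch: the wrapper name run_low_priority is made up, and 19 is used because it is the highest niceness (lowest priority) Linux accepts.

```shell
#!/bin/sh
# Start a command at the lowest CPU priority without touching the
# niceness of the calling shell. run_low_priority is a hypothetical name.
run_low_priority() {
    nice -n 19 "$@"
}

# Demonstration: "nice" without arguments prints the niceness it runs at,
# so this shows what a program started through the wrapper would get.
run_low_priority nice
```

A game would then be started with something like: run_low_priority tuxracer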


Of course I changed the Tuxracer configuration file (~/.tuxracer/options) to get it playing in a window. Here's a screenshot:





Yet another advantage is that I get no more hangs when starting video sequences from within Firefox using the MPlayer plugin. Firefox and MPlayer form an effective tandem, yet in some cases they used to hang. That problem too has gone. On my Debian Sarge system, with all available codecs and Realplay installed, I can view almost all video sequences on the Web. I never saw this universality on an Apple Macintosh or a Microsoft Windows system.

With a timeslice of 1 or 5 milliseconds the readahead parameter seems less important. The default value of 256 seems a good compromise. I chose 32 anyway, because it lets me type text comfortably even under heavy system load.


New approach

Recent kernels can be configured to compute their timeslicing at 1,000 Hz instead of the historic 100 Hz. On my current SuSE 10.0 the default frequency is 250 Hz. Curiously, this is not optimal for multimedia and games. I tried 1,000 Hz and the result was catastrophic. I got the best results with 100 Hz! I suppose this is because 100 Hz is currently assumed by most hardware and software components. 1,000 Hz is expected to become the standard, but I am keeping 100 Hz until things evolve further.

What made a strong difference was changing the kernel preemption model. I asked for "Preemptible Kernel (Low-Latency Desktop)". The result is wonderful. The desktop, games and multimedia behave a lot more fluidly. The computer is supposed to be a little slower, but it gives me the impression of being faster.
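
Before recompiling, it can be handy to check what the running kernel was built with. The sketch below simply filters a kernel configuration file for the timer frequency and preemption options (the CONFIG_HZ and CONFIG_PREEMPT families). Where that file lives is an assumption: many distributions install it as /boot/config-<version>, others only expose /proc/config.gz, and some provide neither. The function name show_sched_config is invented here.

```shell
#!/bin/sh
# show_sched_config FILE
# Print the timer frequency and preemption settings found in a kernel
# configuration file (options that are "not set" are commented out in
# such files and are therefore skipped).
show_sched_config() {
    grep -E '^CONFIG_(HZ|PREEMPT)[A-Z0-9_]*=' "$1"
}

# Assumed location of the running kernel's configuration; adjust as needed.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    show_sched_config "$cfg"
else
    echo "no readable kernel configuration at $cfg"
fi
```

On a kernel built as described above one would expect to see lines such as CONFIG_HZ=100 and CONFIG_PREEMPT=y.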

The picture below shows the window opened by "make xconfig" and the two changes being made:





If you know how to compile a kernel but you'd like a quick procedure on the SuSE 10.0 system, here is one. (Don't do this if you don't know what you are doing. In that case ask someone experienced, read HowTos and make your first trials on a test machine. Note that a complete kernel compilation takes hours on a common computer.)
That's all if your system uses the default GRUB boot loader.




The sound server


(I didn't need to do anything about the sound servers on my current SuSE 10.0.)

One thing I much enjoy with the timeslice trick above is that the esd sound server now behaves neatly. What's a sound server? Historically, there was only one way to get sound out of a Linux box: the OSS sound drivers. Those were Linux kernel modules capable of addressing lots of different kinds of sound cards. When a program wanted to emit sound, it simply had to open a channel towards the OSS system and tell it what sound to emit. Later on came the ALSA sound drivers, which seem more powerful. The ALSA system is capable of emulating OSS. That means a program that wants to open OSS will be allowed to do so; it will really get the feeling it opens OSS, while in fact it opens ALSA. Other programs open ALSA directly.

The problem with OSS and ALSA is that they can handle only one program at a time. Say Xine is playing a movie using ALSA for sound output: if you open a game, that game won't be allowed to emit any sound. The sound channel is already in use... You can get sound from only one program at a time. Even worse: should Xine stop playing, the game won't get the sound channel. It has to be restarted.

A sound server solves this problem. Many sound servers exist, but the two current main ones are aRts and esound (esd, the Enlightened Sound Daemon). aRts is KDE oriented while esd is Gnome oriented. If, say, artsd (the aRts daemon) is running and you launch several sound programs that use aRts, they will be able to output sound together. aRts mixes their sounds cleverly and outputs a single stream towards the OSS or ALSA system. You can even make applications not meant to use aRts use aRts anyway. Simply start them using the artsdsp command. Like this:


$ artsdsp tuxracer


In the same way, if esd (the esound daemon) is running, you can use the esddsp command to make esd-unaware programs use it anyway. Actually those artsdsp and esddsp sound wrappers seldom give good results. It is best to configure the programs to use aRts or esound. Or simply trust the program.

Some programs know only about aRts, others only about esound. The solution to this is to launch both artsd and esd and configure aRts to use esound for output. aRts can be configured from the KDE control panel. It is best to configure aRts to release the sound output after 1 second of inactivity. In order to start esd before artsd I put the line below inside a startup script. On my Debian Sarge I chose the /etc/init.d/gdm script. I put the line just below the leading # lines. Note that the trailing & is mandatory or your system will hang at startup. The -nobeeps parameter means the esd daemon mustn't emit a few tones when starting. Maybe it is best not to use that parameter for your first trials, so that you can hear the esd daemon starting. The -as 1 parameter means the esd daemon must release its access to the sound card drivers after 1 second of inactivity. This allows some asocial programs like Real Player to emit sound anyway, though they then block all other sound programs:


/usr/bin/esd -as 1 -nobeeps &


So I created a chain of servers and drivers towards the sound chip of my motherboard. aRts serves as a funnel for KDE-oriented programs. Esound serves as a funnel for aRts and for Gnome-oriented programs, feeding the whole to the ALSA sound drivers. Actually I'm not sure esound uses the ALSA interface. Maybe it uses the fake OSS interface towards ALSA... If a program's sound output can be configured, I make it output to esd; Xine for example.

Note that most programs need to be compiled with the ability to use these different sound output methods. For example, I am lucky that on Debian Sarge the aRts system has been compiled with the ability to output to esound. On other Linux releases this is not the case; aRts is only capable of outputting directly to OSS or ALSA.




A remote SWAP partition


Yet another trick my friend uses is to put the SWAP partition on another hard drive. My computer has 512 MB of RAM and is mostly used for office software, so it almost never uses its SWAP partition. But when the amount of RAM is low compared to the needs of the programs used, a good practice is to put the SWAP partition on a different hard drive from the main system partition. This allows much faster swap operations while programs are being loaded, since both hard drives are used at the same time. Say your main Linux partition is /dev/hda2; well, you can use a partition on another hard disk for swap, say /dev/hdc7.
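
In practice this comes down to one line in /etc/fstab plus a way of checking which swap areas the kernel is really using. The sketch below shows both, reusing the example partition /dev/hdc7 from the paragraph above; the fstab line and the two root commands are given as comments only, since they must be adapted to your own partitions, and the helper name list_swap_devices is made up.

```shell
#!/bin/sh
# Entry for /etc/fstab putting swap on the second drive
# (/dev/hdc7 is the example partition from the text):
#
#   /dev/hdc7   none   swap   sw   0   0
#
# Preparing and enabling it needs root and ERASES the partition:
#
#   mkswap /dev/hdc7
#   swapon /dev/hdc7

# list_swap_devices [FILE]
# Print "device priority" for each active swap area, read from a file in
# /proc/swaps format (default: /proc/swaps itself).
list_swap_devices() {
    awk 'NR > 1 { print $1, $5 }' "${1:-/proc/swaps}"
}

if [ -r /proc/swaps ]; then
    list_swap_devices
else
    echo "/proc/swaps is not available on this system"
fi
```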




RAID partitions


The principle of RAID 0 is that you use two physical partitions, each on a different hard drive. These two physical partitions, say /dev/hda7 and /dev/hdb3, form a single "logical" RAID partition, say /dev/md1. Files are split over the two partitions. When files are read or written, the two hard drives are used in parallel, which almost halves the time.

When you look at hard drive specs, you see a data transfer rate of, say, 100 or 133 MB/s. Yet the actual rate at which data can be read or written by the hard drive's heads is about 40 MB/s. So the 133 MB/s the IDE cable and chipset allow isn't of much use. Except if you set up RAID partitions... Then both drives can be read at 40 MB/s, which makes a data transfer rate of 80 MB/s. RAID allows more than two physical partitions to form a logical partition, so even higher transfer rates should be possible.
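
The arithmetic can be written down as a tiny estimate. This is a best-case sketch only: it assumes perfect striping and that all drives share a single interface limit, and it uses the round figures quoted above (40 MB/s per drive, 133 MB/s for the interface). The function name raid0_rate is invented for the illustration.

```shell
#!/bin/sh
# raid0_rate DRIVES PER_DRIVE_MBS BUS_MBS
# Best-case RAID 0 throughput in MB/s: the sum of the drive rates,
# capped by the interface limit. Real figures will be lower.
raid0_rate() {
    drives=$1
    per_drive=$2
    bus=$3
    total=$(( drives * per_drive ))
    if [ "$total" -gt "$bus" ]; then
        total=$bus
    fi
    echo "$total"
}

for n in 1 2 3 4; do
    echo "$n drive(s): about $(raid0_rate "$n" 40 133) MB/s at best"
done
```

Under these assumptions the gain stops growing once the drives together exceed what the interface can carry.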

I long thought RAID logical partitions needed special hardware interfaces and dedicated hard drives. Actually I could set up a RAID logical partition using plain IDE drives, attached through their standard IDE cables to a common motherboard. Currently I don't use the RAID 0 scheme. Rather, I installed a RAID 1 logical drive. The principle of RAID 1 is that the data is duplicated on both physical partitions. So, should one hard drive break down, the data will still be present on the other partition. This is very comfortable. I put all my personal data, like my Thunderbird mail directory and my software development directory, on that RAID 1 logical partition. That way I get a real-time backup of my data. I still perform regular backups of course. But I feel a lot more comfortable when writing texts and the like.

(It is probably best to use a text or graphical user interface to configure RAID, instead of the procedures listed below.)

This is the procedure I used to set up a logical RAID 1 partition. Please note it is now partly outdated (more below):

I have two IDE hard drives installed: /dev/hda of 80 GB and /dev/hdc of 853 MB. Using the fdisk command I created a /dev/hda9 partition of a little more than 853 MB. The whole /dev/hdc drive became a single /dev/hdc1 partition of 853 MB.

Both partitions must be given the RAID partition type, that is, type "fd". This, for example, is the output of the p command when applying fdisk to /dev/hdc. You can see /dev/hdc1 is neither a "SWAP" nor a "Linux" partition; rather it is a "Linux raid autodetect" partition. Don't forget to set the type of both partitions that way or the RAID system won't operate:


Command (m for help): p

Disk /dev/hdc: 853 MB, 853622784 bytes
16 heads, 63 sectors/track, 1654 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hdc1   *           1        1654      833584+  fd  Linux raid autodetect


Next, an /etc/raidtab configuration file must be created, for example with the content below. It states that /dev/md0 is made out of /dev/hdc1 and /dev/hda9. The "raid-level" parameter of 1 means RAID 1. I don't know what the "persistent-superblock" and "chunk-size" parameters mean, but their values of 1 and 8 should fit any setup:


#
# sample raiddev configuration file
#
raiddev /dev/md0
    raid-level              1
    nr-raid-disks           2
    persistent-superblock   1
    chunk-size              8

    device                  /dev/hdc1
    raid-disk               0
    device                  /dev/hda9
    raid-disk               1


Then, the RAID logical drive must be constructed, using this command:


# mkraid /dev/md0


The mkraid command will read the /etc/raidtab file and RAID-format the two physical partitions /dev/hda9 and /dev/hdc1. Their union becomes the /dev/md0 logical partition. (Use the -f option to force the formatting.)

Now the RAID 1 /dev/md0 logical partition itself is on. It is supposed to be used exactly like any common partition would, like /dev/hda1 or /dev/hda2...

An entry in /etc/fstab must be created for that /dev/md0 partition. Also you must create a mount point (/media/md0 in my case):


/dev/md0        /media/md0      ext3    rw,users,auto   0       2


Just like any partition, the /dev/md0 partition must be formatted in order to be able to contain files. Any standard format will fit. I will use XFS next time, but this time I used Ext3:


# mkfs.ext3 /dev/md0


To finish, mount the partition and make it open to users:


# mount /media/md0
# chmod a+rwx /media/md0


That's it. Now, every time the Linux system starts, the /etc/raidtab file will be read and the /dev/md0 logical partition will be established automatically. Then the system will read /etc/fstab and mount /dev/md0 as /media/md0, just like it does for all regular partitions. It's all automatic; no commands to type. Once the system is started you can access the content of the logical partition through /media/md0.

My first approach was to move the sensitive directories, like .mozilla-thunderbird, into the /media/md0 partition and to make links towards them inside my home directory. Currently, as a friend gave me a second, big hard disk (Frédéric Cloth, thanks to him), I have set up a whole /root partition as an 8 GB RAID 1 partition.
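
The move-and-link step can be sketched as a small helper. This is an illustration only: the function name relocate_dir is made up, and the demonstration runs inside a throwaway temporary directory rather than on a real home directory or on /media/md0. Close the program that uses the directory before moving the real thing.

```shell
#!/bin/sh
# relocate_dir SRC DEST_PARENT
# Move directory SRC under DEST_PARENT and leave a symbolic link at the
# old location, so programs keep finding their files where they expect.
relocate_dir() {
    src=$1
    dest_parent=$2
    name=$(basename "$src")
    mv "$src" "$dest_parent/$name" && ln -s "$dest_parent/$name" "$src"
}

# Demonstration in a scratch directory standing in for $HOME and /media/md0.
demo=$(mktemp -d)
mkdir -p "$demo/home/.mozilla-thunderbird" "$demo/md0"
echo "mail" > "$demo/home/.mozilla-thunderbird/inbox"

relocate_dir "$demo/home/.mozilla-thunderbird" "$demo/md0"

ls -la "$demo/home"
cat "$demo/home/.mozilla-thunderbird/inbox"
rm -rf "$demo"
```

On a real system the call would look like: relocate_dir ~/.mozilla-thunderbird /media/md0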

I made a trial of setting up the /usr partition as a RAID 0 partition. The Debian Sarge install CD allows such things to be set up right from installation. The result was impressive. Big programs start nearly twice as fast. Yet I abandoned this. I noticed my old hard drive has some communication problems in some circumstances. Those problems didn't interfere with the /usr RAID 0 scheme, but I got a little anxious and stopped taking the risk of a major system crash. Actually the speed boost is only for the first start of a program since computer boot (or a start after a heavy hard disk load). Subsequent starts are done from the memory cache and won't be faster...

Several tools and commands exist to manage RAID partitions. Maybe install them and read the man pages. Apart from mkraid I use these two commands when I want to hack (first unmount /dev/md0):


# raidstop /dev/md0

# raidstart /dev/md0


I suppose I wouldn't notice if one of the physical partitions failed. To verify the RAID 1 system is OK I use this:


# cat /proc/mdstat


Which yields the output below. The data [2/2] means both partitions are in use:


Personalities : [linear] [raid0] [raid1] [raid5]
md0 : active raid1 hdc1[0] hda9[1]
      833472 blocks [2/2] [UU]

unused devices: <none>


It seems the system can send a mail to the system administrator should a RAID system encounter problems. I'm a newbie at RAID. I only used the very basics of a much broader technology. Best look for docs on the Web if you want to know more about RAID.
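
Short of setting up mail notification, the [2/2] check can be automated with a few lines of shell. The sketch below reads a file in /proc/mdstat format and flags any array whose second number (devices currently active) is lower than the first (devices the array should have). The function name check_mdstat is made up, and the demonstration uses a copy of the sample output shown above rather than a live system.

```shell
#!/bin/sh
# check_mdstat [FILE]
# Report each md array as ok or DEGRADED, based on the "[total/active]"
# field of /proc/mdstat. Returns 1 if any array is degraded.
check_mdstat() {
    awk '
        /^md[0-9]+ :/ { array = $1 }
        match($0, /\[[0-9]+\/[0-9]+\]/) {
            # Strip the brackets and split "total/active".
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) {
                print array ": DEGRADED (" n[2] " of " n[1] " devices)"
                bad = 1
            } else {
                print array ": ok"
            }
        }
        END { exit bad + 0 }
    ' "${1:-/proc/mdstat}"
}

# Demonstration on the sample output from the text.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Personalities : [linear] [raid0] [raid1] [raid5]
md0 : active raid1 hdc1[0] hda9[1]
      833472 blocks [2/2] [UU]

unused devices: <none>
EOF
check_mdstat "$sample"
rm -f "$sample"
```

Called without an argument on a machine with RAID arrays, it reads /proc/mdstat directly, so it could be run from a login script or a cron job.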

As mentioned above, RAID seems to have changed. The /etc/raidtab RAID configuration file is no longer in use. Rather, the install procedure of my Debian Sarge set up an /etc/mdadm/mdadm.conf file. Below is an example. /dev/md2 is the logical partition I used as /usr, /dev/md1 is the logical partition I now use as /home and /dev/md0 was a trial SWAP partition (it behaved fine but I don't use it any more, for safety):


DEVICE partitions
ARRAY /dev/md2 level=raid0 num-devices=2 UUID=845552ff:875a5252:be515455:af486446
   devices=/dev/hda11,/dev/hdb11
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=166516ef:fddd6486:4686de46:16166519
   devices=/dev/hda6,/dev/hdb6
ARRAY /dev/md0 level=raid0 num-devices=2 UUID=1868ca86:8763f87e:545611ff:6784515d
   devices=/dev/hda5,/dev/hdb5


Also, the mkraid, raidstart and raidstop commands are no longer in use. They have been replaced by the mdadm command. At first glance at the man page, mdadm seems hardly usable. Yet the examples at the end of the man page make things much clearer.
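
As a rough guide, the old commands map onto mdadm along the following lines. The helper below only prints the mdadm command line and runs nothing, since the real commands need root and --create overwrites the partitions. The name mdadm_equivalent is made up, the RAID level is fixed at 1 to match the example earlier in this section, and the device names are the ones used there.

```shell
#!/bin/sh
# mdadm_equivalent OLD_COMMAND MD_DEVICE [PARTITION...]
# Print (do not run) the mdadm command roughly corresponding to the old
# raidtools command. RAID level 1 is assumed for mkraid.
mdadm_equivalent() {
    old=$1
    md=$2
    shift 2
    case $old in
        mkraid)
            echo "mdadm --create $md --level=1 --raid-devices=$# $*" ;;
        raidstart)
            echo "mdadm --assemble $md $*" ;;
        raidstop)
            echo "mdadm --stop $md" ;;
        *)
            echo "unknown command: $old" >&2
            return 1 ;;
    esac
}

mdadm_equivalent mkraid /dev/md0 /dev/hdc1 /dev/hda9
mdadm_equivalent raidstop /dev/md0
mdadm_equivalent raidstart /dev/md0 /dev/hdc1 /dev/hda9
```

Once an array is running, mdadm --detail /dev/md0 gives a fuller status report than /proc/mdstat.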

RAID support is part of the kernel, either built into the kernel or as modules.




File system formats


Many different file system formats exist for Linux. The ones that seem most obvious are Ext2, Ext3 and ReiserFS. There are also XFS and JFS.

All these formats behave neatly if the system lives in an ideal world. That is: no system crashes or power failures, clean startups and clean shutdowns. I never encountered problems with those file systems in ideal circumstances. They don't seem to contain internal bugs that would be a menace to the files, for my simple usage anyway.

There are differences in speed between the formats. Some allow files to be read, written, appended to, moved or deleted a little faster than the others. In certain circumstances one format can be really much slower than another. On average I would not take speed into account. The differences are too small to warrant a debate.

The acute question to me is what happens to the files when there is a serious problem, that is a system crash, a power failure or an aborted shutdown. The next question is: should the file system be damaged beyond repair while user files are still present, can they be recovered?

Clearly, my preferred format for file security, crash survival and overall comfort is Ext3. One thing I dislike about Ext2 and Ext3 is that once in a while the system takes some time at startup to fully check those file systems. Ext2 is worst in that regard: it needs a lengthy and thorough check after each crash, and sometimes that check doesn't even succeed and you are dropped to a system command line. That makes Linux simply unusable if a technician is not at hand. Ext3 seems fine in that regard. I really like Ext3. I haven't even had to hack to recover my files for months, thanks to Ext3.

ReiserFS has one big advantage: it packs small files close together. That allows you to get much more space out of small hard drives or partitions. Also, I have the feeling ReiserFS lets old and slow computers handle their files more swiftly.

XFS has three big disadvantages:
If in doubt, use Ext3. Clearly.



Eric Brasseur  -  May 21 2005  till  April 6 2006       [ Homepage | eric.brasseur@gmail.com ]