My new office workstation was behaving very strangely. Once in a while it would slow down to a crawl and hang doing something. I checked the process list, but nothing looked wrong. I disabled all the services. I checked my scheduled tasks. Nothing. I left it with top running, but it showed that 100% of the CPU was idle and more than 500 MBytes of RAM were free. Nothing was eating up resources, and there wasn’t any significant network traffic either. The machine wasn’t swapping or anything. Nonetheless, the load average would go up as high as 4.0 or even 7.0!
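A high load average with an idle CPU is less contradictory than it looks: on Linux the load average counts tasks in uninterruptible sleep (state D, usually waiting on I/O) as well as runnable ones. The sketch below is only an illustration of how one might check that by hand; the helper names parse_state and count_states are made up for this example, and it counts processes rather than individual threads, so the numbers are approximate.

```python
import os


def parse_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the last ')'.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_states(proc_root="/proc"):
    """Count processes per state letter (R, S, D, ...) under proc_root."""
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                state = parse_state(f.read())
        except (OSError, IndexError):
            continue  # the process exited while we were looking
        counts[state] = counts.get(state, 0) + 1
    return counts


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        print("load average:", " ".join(f.read().split()[:3]))
    states = count_states()
    print("running (R):", states.get("R", 0))
    print("uninterruptible sleep (D):", states.get("D", 0))
```

If the D count climbs while the slowdown is in progress, the load is coming from tasks stuck waiting rather than from anything burning CPU.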
I investigated a bit more and noticed that the applications themselves were responding fast. It was only when they needed to work with the disk that the slowdown happened. That suggested that the problem was I/O related. I installed the sysstat and procps packages and went through a cycle of troubleshooting with the iostat and vmstat utilities. Nothing came up.
Never before had I had so little data to start with. The problem seemed to be hardware related, as I had it with both Fedora Linux Core 3 and Fedora Linux Core 4 on this machine.
I went to hazard for ideas. He suggested checking whether X had anything to do with it. Good point! I tried to kill the X server when the next slowdown happened, but it failed – the machine was too busy already. I then booted into runlevel 3 and worked for a bit without X. The slowdown happened again.
RAM tests didn’t report any problems. Disk checks went through OK. Unloading and removing everything I possibly could didn’t help much. Updates for Fedora Linux Core 4 were installed, but didn’t help either.
I was pretty much out of ideas and thought that something was wrong with the motherboard or the CPU and that the hardware needed to be replaced.
I decided to bother Vladimir once again. He just looked at my computer from like 3 meters away and suggested a new direction. He remembered that he recently had to troubleshoot a similar problem and that it was indeed connected with hardware. He suggested using the nolapic parameter for the kernel.
I tried it, but it didn’t help, although the delay until the next slowdown was much longer. Vladimir then suggested using the nolapic noapic acpi=off arguments for the kernel. I tried that and it worked! For almost two days now I haven’t had any problems.
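After editing the boot loader entry it is easy to reboot into the wrong one, so it can be worth confirming that the running kernel really received those arguments. Linux exposes the boot command line in /proc/cmdline; the small sketch below checks it. The function name missing_params is made up for this example, and it simply compares whitespace-separated tokens.

```python
import os

# The three arguments that fixed the problem in this post.
WANTED = ["nolapic", "noapic", "acpi=off"]


def missing_params(cmdline, wanted=WANTED):
    """Return the wanted kernel arguments that do not appear in cmdline."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


if __name__ == "__main__" and os.path.exists("/proc/cmdline"):
    with open("/proc/cmdline") as f:
        missing = missing_params(f.read())
    if missing:
        print("not on the kernel command line:", " ".join(missing))
    else:
        print("all arguments are active")
```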
I guess this is when Vladimir’s status officially changes from Linux guru to Linux God! Finding the solution to the problem by just looking at the computer case from 3 meters away. No need to type anything… Thanks Vlad. I am impressed!
You forgot to write that the actual problem was that timer interrupts stopped being generated for some reason. :)
Yeah, well… and the time was going backwards (first time I saw something like that), but is that really interesting to read? :)
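For anyone who wants to look for the timer interrupt symptom mentioned in the comments, the counters in /proc/interrupts are a reasonable place to start. The sketch below is an assumption about how one might watch them, not what was actually done here: it sums the per-CPU counts of every row whose description mentions a timer and prints how much each grew in one second. The name timer_counts is made up for this example, and on tickless kernels some timer rows may legitimately stay flat while others grow.

```python
import os
import time


def timer_counts(interrupts_text):
    """Sum per-CPU counts for every /proc/interrupts row mentioning a timer."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break  # reached the description part of the row
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        if "timer" in description.lower():
            totals[label.strip()] = sum(counts)
    return totals


def read_timer_counts(path="/proc/interrupts"):
    with open(path) as f:
        return timer_counts(f.read())


if __name__ == "__main__" and os.path.exists("/proc/interrupts"):
    before = read_timer_counts()
    time.sleep(1)
    after = read_timer_counts()
    for label in sorted(after):
        print(label, "grew by", after[label] - before.get(label, 0))
```

If every timer row stays at the same value across several samples, that would match the description of timer interrupts no longer arriving.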