The top command is certainly on every system admin's list of frequently used commands. As the man page says, “The top program provides a dynamic real-time view of a running system. It can display system summary information as well as a list of tasks currently being managed by the Linux kernel.” That sounds simple, but the individual fields are rarely explained well. I was learning to install munin-node when I decided to read more about each part of top's display.
(This is a screenshot of top running on idealwebtools.com, a shared server.)
Let's look at each section.
The first section – Uptime
top - 13:46:02 up 1 day, 14:27,
Starting from the left, 13:46:02 is the current time, which you can also get with date:
aji@sawyer [~]# date
Wed May 23 13:46:23 EDT 2007
The next part shows the server uptime, that is, how long the system has been running since the last boot. A healthy server should be able to run for hundreds of days without a restart. You can also check it with
aji@sawyer [~]# uptime
13:48:09 up 1 day, 14:29, 1 user, load average: 2.38, 1.63, 1.62
The second section – Active User
So we have 1 user logged in; nothing more to say here.
The third section – Load Average
This is a very important piece of information.
load average: 2.38, 1.63, 1.62
As most explanations tell you, the three values represent the processor load averaged over the last 1, 5, and 15 minutes, respectively. Each is the average number of processes that were running or queued waiting for the processor during that period (on Linux, processes in uninterruptible sleep, usually waiting on disk I/O, are counted as well). Many feel that fewer than 1 queued process per processor is good. Some feel a system can handle 10 queued processes per processor. I still recommend keeping it as low as 1 per processor; it is certainly achievable. A single reading does not give you the complete picture, as you need to poll it again and again to see the trend. Various applications are available that run a cron job to poll it at regular intervals. You can read more about load average at http://www.teamquest.com/resources/gunther/display/5/index.htm. If you have some time, you can even write a small script and run it every 5 minutes from cron.
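The small script mentioned above could look something like the following. This is only a minimal sketch, assuming Python 3 on a Unix-like system; the function names, the log format and the threshold of 1.0 per processor are my own illustrative choices, not anything top itself provides.

```python
import os
import time


def parse_load_average(line):
    """Extract the 1, 5 and 15 minute averages from a top/uptime style line."""
    _, _, tail = line.partition("load average:")
    return tuple(float(value) for value in tail.split(","))


def load_per_cpu(load, cpus):
    """Normalise a load figure by the number of processors."""
    return load / max(cpus, 1)


def log_line(loads, cpus, now=None):
    """Build one log line; cron can append it to a file every 5 minutes."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    per_cpu = load_per_cpu(loads[0], cpus)
    # 1.0 per processor is the rule of thumb from the text, not a hard limit.
    flag = "HIGH" if per_cpu > 1.0 else "ok"
    return "%s load=%.2f,%.2f,%.2f cpus=%d per_cpu=%.2f %s" % (
        stamp, loads[0], loads[1], loads[2], cpus, per_cpu, flag)


if __name__ == "__main__":
    # os.getloadavg() returns the same three numbers that top and uptime print.
    print(log_line(os.getloadavg(), os.cpu_count() or 1))
```

Saved under a name of your choice and run from a cron entry every 5 minutes with its output appended to a file, this gives you the trend that a single look at top cannot.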
Fourth section – Tasks
The next section shows the task details
Tasks: 194 total, 2 running, 192 sleeping, 0 stopped, 0 zombie
If you have a lot of tasks in the running state, analyse what they are doing. Tasks shown as running are better thought of as ‘ready to run’. If you want to read more about zombie tasks, please visit http://www.ussg.iu.edu/hypermail/linux/kernel/0212.1/0864.html. The rest is quite obvious; also look at the processes that are running and kill any unwanted ones.
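If you are curious where these counts come from, they can be roughly reproduced from /proc. The sketch below assumes a Linux /proc filesystem and Python 3. The mapping of state letters to the four buckets is a simplification of mine and may not match every version of top exactly.

```python
import glob
from collections import Counter

# Simplified mapping from kernel state letters to the buckets on the Tasks line.
BUCKETS = {"R": "running", "S": "sleeping", "D": "sleeping", "I": "sleeping",
           "T": "stopped", "t": "stopped", "Z": "zombie"}


def state_from_stat(stat_line):
    """Return the state letter from the contents of /proc/<pid>/stat.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so split on the last ')' rather than on whitespace.
    """
    after_name = stat_line.rsplit(")", 1)[1]
    return after_name.split()[0]


def count_tasks(stat_lines):
    """Count tasks per bucket, plus a 'total' entry."""
    counts = Counter()
    for line in stat_lines:
        counts[BUCKETS.get(state_from_stat(line), "other")] += 1
        counts["total"] += 1
    return counts


def read_stat_lines():
    """Yield the stat line of every process currently visible in /proc."""
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as handle:
                yield handle.read()
        except OSError:
            pass  # the process exited between listing and reading


if __name__ == "__main__":
    print(dict(count_tasks(read_stat_lines())))
```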
Fifth section – CPUs
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
What do these fields mean? Here is a short explanation of each:
- us -> User CPU time: The time the CPU has spent running users’ processes that are not niced.
- sy -> System CPU time: The time the CPU has spent running the kernel and its processes.
- ni -> Nice CPU time: The time the CPU has spent running users’ processes that have been niced.
- id -> Idle: The time the CPU has spent idle.
- wa -> iowait: The time the CPU has spent waiting for I/O to complete.
- hi -> Hardware IRQ: The time the CPU has spent servicing hardware interrupts.
- si -> Software interrupts: The time the CPU has spent servicing software interrupts.
- st -> Steal time: The time stolen from this virtual machine by the hypervisor. Prior to Linux 2.6.11 this value is not reported.
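To make these fields a little more concrete, here is a rough sketch of how such percentages can be derived from two readings of the aggregate cpu line in /proc/stat. It assumes Linux and Python 3. The function names are mine, and top's real calculation may differ in detail.

```python
import time

# Same order as the first eight counters on the "cpu" line of /proc/stat:
# user, nice, system, idle, iowait, irq, softirq, steal.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu' line of /proc/stat into a dict of tick counts."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(value) for value in parts[1:9]]
    values += [0] * (len(FIELDS) - len(values))  # older kernels report fewer fields
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Share of each field in the ticks that elapsed between two readings."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * deltas[name] / total for name in FIELDS}


def read_cpu_line(path="/proc/stat"):
    """The aggregate line is the first line of /proc/stat."""
    with open(path) as handle:
        return handle.readline()


if __name__ == "__main__":
    first = parse_cpu_line(read_cpu_line())
    time.sleep(1)  # the sampling interval; one second is an arbitrary choice
    second = parse_cpu_line(read_cpu_line())
    shares = cpu_percentages(first, second)
    print(", ".join("%.1f%%%s" % (shares[name], name) for name in FIELDS))
```

The counters in /proc/stat only ever grow, which is why two readings are needed: the percentages describe the interval between them, not the time since boot.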
The Cpu(s) line gives a breakdown of CPU usage, expressed as percentages. Depending on your server's role, you need to optimize for different fields. If you do a lot of disk writing, keep a watch on iowait. You might be wondering what “The time the CPU has spent running users’ processes that are not niced.” means. If you do a “man nice”, it will say “nice – run a program with modified scheduling priority“. It is called “nice” because the number given to a process determines how willing it is to step aside and let other tasks use the processor. The number ranges from -20 to 19. The default value is 0; higher values lower the priority and lower values raise it. If you want to read more about nice, visit http://wiki.linuxquestions.org/wiki/Nice.
When you run top, it shows the NI value of each process:
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
17578 root      15   0 13456  13M  9020 S    18.5  1.3  26:35   1 rhn-applet-gu
19154 root      20   0  1176 1176   892 R     0.9  0.1   0:00   1 top
    1 root      15   0   168  160   108 S     0.0  0.0   0:09   0 init
    2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0 migration/0
    3 root      RT   0     0    0     0 SW    0.0  0.0   0:00   1 migration/1
    4 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 keventd
    5 root      34  19     0    0     0 SWN   0.0  0.0   0:00   0 ksoftirqd/0
    6 root      35  19     0    0     0 SWN   0.0  0.0   0:00   1 ksoftirqd/1
    9 root      15   0     0    0     0 SW    0.0  0.0   0:07   1 bdflush
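A process can also change its own NI value. The following is a small sketch, assuming Python 3 on a Unix-like system; the helper names are made up for illustration. It only ever lowers the priority, since raising it normally requires root.

```python
import os


def current_nice():
    """Return this process's nice value; an increment of 0 changes nothing."""
    return os.nice(0)


def be_nicer(increment=5):
    """Raise the nice value (lower the priority) and return the new value.

    Any user may increase niceness; decreasing it normally needs root.
    The kernel caps the result at 19.
    """
    if increment < 0:
        raise ValueError("this sketch only lowers priority")
    return os.nice(increment)


if __name__ == "__main__":
    print("nice before:", current_nice())
    print("nice after: ", be_nicer(5))
```

From the shell, starting a program with nice -n 5 has the same effect, and the process then shows up in top with the corresponding value in the NI column.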
Sixth section – Memory
Mem: 1536000k total, 1437272k used, 98728k free, 234212k buffers
Swap: 1020116k total, 72k used, 1020044k free, 567208k cached
This is mostly self-explanatory. You can also run free -m to get a different view:
             total       used       free     shared    buffers     cached
Mem:          1500       1403         96          0        228        553
-/+ buffers/cache:        620        879
Swap:          996          0        996
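The -/+ buffers/cache row is simple arithmetic on the numbers top already shows: buffers and cache are subtracted from used and added to free. Here is a small Python sketch of that calculation using the two lines from above; the function names are mine.

```python
import re


def parse_kb_fields(line):
    """Return {label: kilobytes} for a top line such as 'Mem: 1536000k total, ...'."""
    return {label: int(value) for value, label in re.findall(r"(\d+)k (\w+)", line)}


def buffers_cache_view(mem_line, swap_line):
    """Recompute the '-/+ buffers/cache' row of free from top's two memory lines."""
    mem = parse_kb_fields(mem_line)
    # This version of top prints the cache size at the end of the Swap line.
    cached = parse_kb_fields(swap_line)["cached"]
    reclaimable = mem["buffers"] + cached
    return {"used": mem["used"] - reclaimable, "free": mem["free"] + reclaimable}


MEM = "Mem: 1536000k total, 1437272k used, 98728k free, 234212k buffers"
SWAP = "Swap: 1020116k total, 72k used, 1020044k free, 567208k cached"

view = buffers_cache_view(MEM, SWAP)
print({name: kb // 1024 for name, kb in view.items()})
# prints {'used': 620, 'free': 879}, matching the free -m output above
```

So although only 96 MB shows as free, about 879 MB is available to applications once buffers and cache are taken into account.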
This is RAM and swap. If we recall the memory classes we had during post graduation, there are different types of physical memory:
- CPU registers – This is the fastest memory. It is like your hands: the quickest way to get a task done, but very limited.
- CPU cache – This is like your office desk, a quickly accessible location.
- RAM – Random Access Memory – This is like your office; you will have to walk around to get the work done.
- Disk – This is like a different location altogether, so you will have to do a lot of traveling to get the work done. Swap is basically an area of the disk used when RAM itself is not sufficient. Swap is usually kept on a separate partition (this is not required; you can use a swap file instead) so that the OS can make access as fast as possible.
If your server uses a lot of swap on a regular basis, you need to look into it, as it will make your server slow. We try to use swap as little as possible. Note that the “cached” figure on top's Swap line is the page cache (it matches the cached column of free), not swap in use. Swap cached, reported as SwapCached in /proc/meminfo, means data that has been written to swap but is still in memory. The OS anticipates memory needs and pre-swaps inactive data, but keeps it in memory.
(SwapTotal – SwapFree – SwapCached) is the actual swapping (memory that will need to be read back from disk).
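The formula above can be checked against /proc/meminfo. This is a minimal sketch assuming Linux and Python 3; the function names are illustrative.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo style text into {field: value} (values are in kB)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            info[name.strip()] = int(rest.split()[0])
    return info


def actual_swap_kb(info):
    """SwapTotal - SwapFree - SwapCached, as in the formula above."""
    return info["SwapTotal"] - info["SwapFree"] - info["SwapCached"]


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        print(actual_swap_kb(parse_meminfo(handle.read())), "kB")
```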
A few more commands and references for help
- Look at vmstat (do a man vmstat)
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0     72 291196 236744 561308    0    0    15    23    6   42  2  0 97  0  0
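If you want to watch these numbers from a script, the header and the value row pair up one to one. Here is a small sketch in Python using the sample above; the function name is my own. The si and so columns (swap in and swap out) are the ones to watch for active swapping.

```python
def parse_vmstat(output):
    """Pair the column names on the header row with the last row of values."""
    lines = [line for line in output.splitlines() if line.strip()]
    names = lines[-2].split()
    values = [int(value) for value in lines[-1].split()]
    return dict(zip(names, values))


SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0     72 291196 236744 561308    0    0    15    23    6   42  2  0 97  0  0
"""

row = parse_vmstat(SAMPLE)
print("swap in/out:", row["si"], row["so"], "idle:", row["id"])
# prints swap in/out: 0 0 idle: 97
```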
- You can also try the Sysstat suite of resource monitoring tools.
A very big post for the day. Enjoy, and bottoms up for top.