
Troubleshooting High Server CPU Load and Memory Usage

This post walks through how to troubleshoot high CPU load on a server: how to find the root cause and how to resolve it.

========================================================================

Sometimes you need to check the load on the VZ node that hosts your cPanel VPS.

Run the command below to spot which VM is the culprit:


vzlist -o laverage,veid,hostname

Check the result and log in to that VM:


vzctl enter VMID
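
On a node with many containers it can help to sort the list instead of reading it by eye. The following is a minimal sketch, not part of the original procedure: the function name busiest_containers is hypothetical, and it assumes the laverage column is printed as three slash-separated values (1, 5 and 15 minute averages), for example "1.20/0.80/0.50".

```shell
# busiest_containers [N]: read `vzlist -o laverage,veid,hostname` output on
# stdin and print the N containers (default 5) with the highest 1-minute load.
# Assumption: the load average column looks like "1.20/0.80/0.50".
busiest_containers() {
    awk '$1 ~ /^[0-9.]+\// {          # data rows only; the header row is skipped
            split($1, la, "/")        # la[1] is the 1-minute load average
            print la[1], $2, $3
         }' | sort -rn | head -n "${1:-5}"
}
```

Usage would be along the lines of: vzlist -o laverage,veid,hostname | busiest_containers 3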


========================================================================


Finding the connections and users causing load
==============================

By default, dcpumon runs every 5 minutes to log CPU usage ("top" output) and stores the data in /var/log/dcpumon.

# crontab -l | fgrep cpu
*/5 * * * * /usr/local/cpanel/bin/dcpumon >/dev/null 2>&1
#

You can view the report with "dcpumonview" command:

root@cpanel [~]# /usr/local/cpanel/bin/dcpumonview
-----------------------------------------------------------
|User    |Domain                         |CPU%|MEM%|MySQL#|

-----------------------------------------------------------


-------------------------------------------------------------------------

To check the number of connections from each IP to port 80:

root@cpanel [~]# netstat -tn 2>/dev/null | grep ':80 ' | awk '{print $5}' |sed -e 's/::ffff://' | cut -f1 -d: | sort | uniq -c | sort -rn | head
      3 107.170.3.41



To check the number of connections from each IP to port 25:

netstat -tn 2>/dev/null | grep ':25 ' | awk '{print $5}' |sed -e 's/::ffff://' | cut -f1 -d: | sort | uniq -c | sort -rn | head
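
The two pipelines above differ only in the port number, so they can be folded into one helper. This is a sketch under stated assumptions: the function name count_remote_ips is hypothetical, it reads netstat -tn output on stdin, and unlike the grep version it matches the port only in the local address column, so outbound connections to the same remote port are not counted.

```shell
# count_remote_ips PORT: read `netstat -tn` output on stdin and print the
# remote IPs connected to local PORT, most connections first.
count_remote_ips() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $4 ~ (":" port "$") {
            ip = $5
            sub(/:[0-9]+$/, "", ip)     # drop the remote port
            sub(/^::ffff:/, "", ip)     # unwrap IPv4-mapped IPv6 addresses
            print ip
        }' | sort | uniq -c | sort -rn | head
}
```

Usage would be along the lines of: netstat -tn 2>/dev/null | count_remote_ips 25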


To list the number of connections to each domain on the server:

/usr/bin/lynx -dump -width 500 http://127.0.0.1/whm-server-status | awk  'BEGIN { FS = " " } ; { print $12 }' | sed '/^$/d' | sort | uniq -c | sort -n



To list the busiest sites on the server:

/usr/bin/lynx -dump -width 500 http://127.0.0.1/whm-server-status | grep GET | awk '{print $12}' | sort | uniq -c | sort -rn | head



To list the busiest scripts running on the server:

/usr/bin/lynx -dump -width 500 http://127.0.0.1/whm-server-status | grep GET | awk '{print $14}' | sort | uniq -c | sort -rn | head



To list the five users running the most processes:

ps aux | awk '{print $1}' | sort | uniq -c | sort -nk1 | tail -n5



To list the total number of processes run by each user:

ps aux | awk '{print $1}' | sort | uniq -c | sort -nk1 
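
The sort | uniq -c pipelines above also count the "USER" header line of ps aux. A small sketch that skips the header and does the counting in awk follows; the function name procs_per_user is hypothetical, and it expects ps aux output on stdin.

```shell
# procs_per_user [N]: read `ps aux` output on stdin and print the N users
# (default 5) that own the most processes, as "count user".
procs_per_user() {
    awk 'NR > 1 { n[$1]++ }                      # NR > 1 skips the header line
         END    { for (u in n) print n[u], u }' |
        sort -rn | head -n "${1:-5}"
}
```

Usage would be along the lines of: ps aux | procs_per_user 10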


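To kill every process whose command line matches "spam" (use with care, since kill -9 gives the process no chance to clean up):

ps aux | grep '[s]pam' | awk '{print $2}' | xargs kill -9

The [s]pam pattern keeps grep from matching its own process.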

When top shows a process named "php" or "/usr/bin/php", you can find the directory it is working in with:

for i in `ps -ef | awk '/php/{print $2}'`; do ls -l /proc/${i}/cwd; done

You can also refresh this view at a fixed interval, say every 5 seconds:

while true; do clear; for i in `ps -ef | awk '/php/{print $2}'`;do ls -l /proc/${i}/cwd; done; sleep 5; done


How do we confirm that the server is under a DDoS attack? Start by counting the TCP connections in each state:

netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
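
A common rule of thumb is that a very large number of half-open connections (state SYN_RECV) compared with ESTABLISHED ones points towards a SYN flood. The sketch below only extracts the count for one state so it can be compared or alerted on; the function name tcp_state_count is hypothetical, and what counts as "too many" depends on the server's normal traffic.

```shell
# tcp_state_count STATE: read `netstat -an` output on stdin and print how many
# TCP sockets are in STATE (for example SYN_RECV, ESTABLISHED, TIME_WAIT).
tcp_state_count() {
    awk -v want="$1" '$1 ~ /^tcp/ && $6 == want { n++ } END { print n + 0 }'
}
```

Usage would be along the lines of: netstat -an | tcp_state_count SYN_RECV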


For a more detailed report of connections per port and per IP, run the following command:

root@cpanel [~]# netstat -plan |awk '/.*[0-9]+.[0-9]+.[0-9]+.[0-9].*/{gsub(/::ffff:/,"",$0);print $4"\t" $5}'|cut -sd. -f 1->netstat.log;clear;echo "Netstat report";echo;echo "Number of Connections to each port:";cat netstat.log |awk {'print $1'}|cut -d: -f 2|sort|uniq -c|sort -nk 1|tail;echo;echo "Number of connections from each IP:";cat netstat.log |awk {'print $2'}|cut -d: -f 1|sort|uniq -c|sort -nk 1|tail;echo;echo "The number of instances of a particular IP connecting to particular port";cat netstat.log |awk {'print $1 "\t" $2'}|cut -d: -f 2|sort|uniq -c|sort -nk 1|tail;
Netstat report

Number of Connections to each port:
      1 465
      1 55751
      1 587
      1 783
      1 80
      1 953
      1 993
      1 995
      2 22
      6 53

Number of connections from each IP:
      1 114.215.193.84
      1 116.68.67.201
      3 107.170.3.41
     28 0.0.0.0

The number of instances of a particular IP connecting to particular port
      1 44859   107.170.3.41
      1 465     0.0.0.0
      1 55751   114.215.193.84
      1 587     0.0.0.0
      1 783     0.0.0.0
      1 80      0.0.0.0
      1 953     0.0.0.0
      1 993     0.0.0.0
      1 995     0.0.0.0

      6 53      0.0.0.0


====================================================

CPU usage through the top command
---------------------------------------------------------
You can get an overview with the top command:

top -cd 3

The -d option sets the refresh interval in seconds, and the -c option shows the full command line of each process instead of just its name.

Shift+P sorts the processes by CPU utilization
Shift+M sorts the processes by memory utilization
Shift+O and then p sorts by swap usage (this key sequence applies to older versions of top)

Shift+W saves the current top configuration (written to ~/.toprc)

top -n1 -b > file.txt

This saves the output in a readable format: -b runs top in batch mode and -n sets the number of iterations.
=======================================


Vmstat

Next I will often run "vmstat 1", which prints out statistics every second on the system utilization. The first line is the average since the system was last booted:
denver-database:~ # vmstat 1
procs ---------memory---------- --swap-- --io--- -system-- -----cpu-----
 r  b swpd   free   buff  cache  si  so  bi  bo   in   cs  us sy id wa st
 0  0  116 158096 259308 3083748   0   0  47  39   30   58 11  8 76  5  0
 2  0  116 158220 259308 3083748   0   0   0   0 1706 4899 22 14 64  0  0
 1  0  116 158220 259308 3083748   0   0   0 276 1435 1490  4  2 93  0  0
 0  0  116 158220 259308 3083748   0   0   0   0 1502 1569  5  3 92  0  0
 0  0  116 158220 259308 3083748   0   0   0 892 1394 1529  2  1 97  0  0
 1  0  116 158592 259308 3083748   0   0   0 216 1702 1825  8  7 84  1  0
 0  0  116 158344 259308 3083748   0   0   0 368 1465 1461  8  7 84  0  0
 0  0  116 158344 259308 3083748   0   0   0 940 1992 2115  2  2 95  0  0
 0  0  116 158344 259308 3083748   0   0   0 240 1906 1982  6  7 87  0  0
The first thing I'll look at here is the "wa" column: the amount of CPU time spent waiting for I/O. If this is high you almost certainly have something hitting the disc hard.
If the "wa" is high, the next thing I'd look at is the "swap" columns "si" and "so". If these are much above 0 on a regular basis, it probably means you're out of memory and the system is swapping. Since RAM is around a million times faster than a hard drive (10ns instead of 10ms), swapping much can cause the system to really grind to a halt. Note however that some swapping, particularly swapping out, is normal.
Next I'd look at the "id" column under "cpu" for the amount of idle CPU time. If this is around 0, it means the CPU is heavily used. If it is, the "sy" and "us" columns tell us how much time is being used by the kernel and user-space processes.
If CPU "sy" time is high, this can often indicate that there are some large directories (say a user's "spam" mail directory) with hundreds of thousands or millions of entries, or other large directory trees. Another common cause of high "sy" CPU time is the system firewall: iptables. There are other causes of course, but these seem to be the primary ones.
If CPU "us" is high, that's easy to track down with "top".
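
The reading order described above ("wa", then "si"/"so", then "id" with "sy"/"us") can be sketched as a small filter over vmstat output. This is an illustration only: the function name vmstat_triage and the default thresholds (20% I/O wait, 10% idle) are assumptions, and as a simplification it reports any swap activity at all and checks it regardless of the "wa" value.

```shell
# vmstat_triage [WA_MAX] [ID_MIN]: read `vmstat N COUNT` output on stdin and
# print which resource looks saturated. Thresholds are illustrative defaults.
vmstat_triage() {
    awk -v wa_max="${1:-20}" -v id_min="${2:-10}" '
        $1 == "r" && $2 == "b" {            # header row: map column names to positions
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        !("wa" in col) || $1 !~ /^[0-9]+$/ { next }   # skip banner and non-data lines
        ++seen == 1 { next }                # first sample is the average since boot
        {
            n++
            wa += $(col["wa"]); id += $(col["id"])
            us += $(col["us"]); sy += $(col["sy"])
            swap += $(col["si"]) + $(col["so"])
        }
        END {
            if (n == 0) { print "not enough samples"; exit }
            if (wa / n > wa_max) { print "high I/O wait: inspect the disks with iostat"; hit = 1 }
            if (swap > 0)        { print "swapping: look for large processes with ps --sort=vsz"; hit = 1 }
            if (id / n < id_min) {
                print (sy > us ? "CPU busy in the kernel (sy)" : "CPU busy in user space (us): check top")
                hit = 1
            }
            if (!hit) print "no obvious bottleneck"
        }'
}
```

Usage would be along the lines of: vmstat 1 10 | vmstat_triage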


List Active and Inactive Memory

rahul-Inspiron-3542 ~ # vmstat -a
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa
 1  0      0 1758544 695088 1263868    0    0   164   218 1178 2293 21  6 70  4

    1. free – Amount of free (idle) memory.
    2. si – Memory swapped in from disk per second, in kilobytes.
    3. so – Memory swapped out to disk per second, in kilobytes.

Run vmstat every X seconds, N times (here every 2 seconds, 6 times)

 rahul-Inspiron-3542 ~ # vmstat 2 6
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0      0 1742616 117412 951516    0    0   158   211 1188 2317 21  6 70  4
 2  0      0 1741456 117412 951280    0    0     0     0 2242 4159 18  6 75  1
 0  0      0 1740604 117412 951500    0    0     0     0 2171 4069 15  5 79  1
 0  0      0 1739704 117424 951544    0    0     0   218 2283 4222 15  4 80  1
 1  0      0 1738656 117424 951680    0    0     0     0 2071 3988 14  6 79  1
 0  0      0 1737168 117444 951680    0    0     0   150 2166 4146 19  6 71  4

Vmstat with timestamps

vmstat with the -t parameter prints a timestamp on every line, as shown below.

vmstat -t 1 5

Statistics of Various Counters

rahul-Inspiron-3542 ~ # vmstat -s
      3951052 K total memory
      2214136 K used memory
      1249828 K active memory
       727244 K inactive memory
      1736916 K free memory
       118576 K buffer memory
       972724 K swap cache
            0 K total swap
            0 K used swap
            0 K free swap
        89917 non-nice user cpu ticks
         6322 nice user cpu ticks
        24612 system cpu ticks
       320330 idle cpu ticks
        16514 IO-wait cpu ticks
            0 IRQ cpu ticks
          974 softirq cpu ticks
            0 stolen cpu ticks
       701523 pages paged in
       948092 pages paged out
            0 pages swapped in
            0 pages swapped out
      5485628 interrupts
     10703327 CPU context switches
   1427291094 boot time
         3821 forks

Disks Statistics

vmstat with the -d option displays statistics for all disks.

rahul-Inspiron-3542 ~ # vmstat -d
disk- ------------reads------------ ------------writes----------- -----IO------
       total merged sectors      ms  total merged sectors      ms    cur    sec
ram0       0      0       0       0      0      0       0       0      0      0
ram1       0      0       0       0      0      0       0       0      0      0
ram2       0      0       0       0      0      0       0       0      0      0
ram3       0      0       0       0      0      0       0       0      0      0
ram4       0      0       0       0      0      0       0       0      0      0
ram5       0      0       0       0      0      0       0       0      0      0
ram6       0      0       0       0      0      0       0       0      0      0
ram7       0      0       0       0      0      0       0       0      0      0
ram8       0      0       0       0      0      0       0       0      0      0
ram9       0      0       0       0      0      0       0       0      0      0
ram10      0      0       0       0      0      0       0       0      0      0
ram11      0      0       0       0      0      0       0       0      0      0
ram12      0      0       0       0      0      0       0       0      0      0
ram13      0      0       0       0      0      0       0       0      0      0
ram14      0      0       0       0      0      0       0       0      0      0
ram15      0      0       0       0      0      0       0       0      0      0
loop0      0      0       0       0      0      0       0       0      0      0
loop1      0      0       0       0      0      0       0       0      0      0
loop2      0      0       0       0      0      0       0       0      0      0
loop3      0      0       0       0      0      0       0       0      0      0
loop4      0      0       0       0      0      0       0       0      0      0
loop5      0      0       0       0      0      0       0       0      0      0
loop6      0      0       0       0      0      0       0       0      0      0
loop7      0      0       0       0      0      0       0       0      0      0
sda    27433   8650 1379750  757516  10521  39121 1910440  512104      0    135
sr0        0      0       0       0      0      0       0       0      0      0
sdb        0      0       0       0      0      0       0       0      0      0

Display Statistics in Megabytes

vmstat displays values in megabytes when given the -S M parameters (uppercase S, uppercase M for megabytes). By default, vmstat displays statistics in kilobytes.


rahul-Inspiron-3542 ~ # vmstat -S M 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  1      0   1693    116    949    0    0   147   201 1208 2360 21  6 70  4
 3  0      0   1700    116    949    0    0     0     0 2261 4220 20  7 73  1
 3  0      0   1700    116    949    0    0     0     0 2092 4047 16  6 77  0
 3  0      0   1700    116    949    0    0     0   304 2058 3966 15  5 78  2
 4  0      0   1700    116    949    0    0     0     0 2001 3822 15  5 80  0


++++++++++++++++++++++++++++++++++++++++++++

ps awwlx --sort=vsz

If there is swapping going on I like to look at the big processes via "ps awwlx --sort=vsz". This shows processes sorted by virtual sizes (which does include shared libraries, but also counts blocks swapped out to disc).

Iostat

For systems where there is a lot of I/O activity (shown via the "bi" and "bo" being high, but "si" and "so" being low), iostat can tell you more about what hard drives the activity is happening on, and what the utilization is. Normally I will run "iostat -x 5" which causes it to print out updated stats every 5 seconds:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.64    0.00    3.95    0.30    0.00   90.11

Device: rrqm/s wrqm/s   r/s   w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda       0.00   9.60  0.60  2.40   6.40  97.60    34.67     0.01  4.80  4.80  1.44
I'll first look at the "%util" column; if it's approaching 100% then that device is being hit hard. In this case we only have one device, so I can't use this to isolate where the heavy activity might be happening, but if the database were on its own partition that could help track it down.
"await" is a very useful column, it tells us how long the device takes to service a request. If this gets high, it probably indicates saturation.
Other information iostat gives can tell us whether the activity is read-oriented or write-oriented, and whether the requests are small or large (based on the sectors-per-second rate and the number of reads/writes per second).
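
The "%util" check can be sketched as a filter over iostat -x output that prints only the devices above a threshold. The function name busy_disks and the 80% default are assumptions for illustration; the column is located by its header name because the set of columns differs between sysstat versions.

```shell
# busy_disks [LIMIT]: read `iostat -x` output on stdin and print
# "device %util" for every device whose %util is above LIMIT (default 80).
busy_disks() {
    awk -v limit="${1:-80}" '
        NF == 0 || $1 == "avg-cpu:" { u = 0; next }   # a new report section starts
        $1 ~ /^Device/ {                              # header row: locate the %util column
            for (i = 1; i <= NF; i++) if ($i == "%util") u = i
            next
        }
        u && NF >= u && $u + 0 > limit { print $1, $u }
    '
}
```

Usage would be along the lines of: iostat -x 5 3 | busy_disks 90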


Iotop

This requires a very recent kernel (2.6.20 or newer), so this isn't something I tend to run very often: most of the systems I maintain are enterprise distros, so they have older kernels. RHEL/CentOS 3/4/5 are too old, Ubuntu Hardy doesn't have iotop, but Lucid does support it.
iotop is like top but it will show processes that are doing heavy I/O. However, often this may be a kernel process so you still may not be able to tell exactly what process is causing the I/O load. It's much better than what we had in the past though.

Display CPU and I/O statistics

Without arguments, iostat displays CPU statistics and I/O statistics for all devices, as shown below.

rahul-Inspiron-3542 ~ # iostat
Linux 3.11.0-12-generic (rahul-Inspiron-3542)     Wednesday 25 March 2015     _x86_64_    (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          20.07    1.28    5.62    3.46    0.00   69.56

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              15.40       277.43       392.27     689879     975456

Shows only CPU Statistics

iostat with the -c argument displays only CPU statistics, as shown below.

rahul-Inspiron-3542 ~ # iostat -c
Linux 3.11.0-12-generic (rahul-Inspiron-3542)     Wednesday 25 March 2015     _x86_64_    (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          20.12    1.27    5.65    3.43    0.00   69.53


Shows only Disks I/O Statistics

iostat with the -d argument displays only disk I/O statistics, as shown.

rahul-Inspiron-3542 ~ # iostat -d
Linux 3.11.0-12-generic (rahul-Inspiron-3542)     Wednesday 25 March 2015     _x86_64_    (2 CPU)

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              15.12       271.54       387.08     689879     983408

Shows I/O statistics only of a single device.

By default, iostat displays statistics for all devices. With the -p option and a device name, it displays I/O statistics for that device and its partitions only, as shown.


rahul-Inspiron-3542 ~ # iostat -p /dev/sda
Linux 3.11.0-12-generic (rahul-Inspiron-3542)     Wednesday 25 March 2015     _x86_64_    (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          20.20    1.24    5.68    3.39    0.00   69.49

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              14.95       268.29       383.89     689887     987144
sda1              0.06         0.25         0.00        648          0
sda2              0.06         0.26         0.00        656          0
sda3              0.06         0.25         0.00        648          0
sda4              0.00         0.00         0.00          2          0
sda5              0.06         0.25         0.00        648          0
sda6             14.59       266.95       383.89     686433     987144


Display LVM Statistics

With the -N (uppercase) parameter, iostat displays the registered device-mapper names, which is useful for viewing LVM statistics, as shown.


rahul-Inspiron-3542 ~ # iostat -N


====================================================
mpstat

rahul-Inspiron-3542 ~ # mpstat 1
Linux 3.11.0-12-generic (rahul-Inspiron-3542)     Wednesday 25 March 2015     _x86_64_    (2 CPU)

07:47:21  IST  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:47:22  IST  all   12.63    0.00    6.06    0.51    0.00    0.51    0.00    0.00    0.00   80.30
07:47:23  IST  all   13.92    0.00    5.67    1.55    0.00    0.00    0.00    0.00    0.00   78.87
07:47:24  IST  all   23.72    0.00    6.05    0.00    0.00    0.00    0.00    0.00    0.00   70.23
07:47:25  IST  all   18.66    0.00    7.18    5.26    0.00    0.00    0.00    0.00    0.00   68.90
===================================================

lsof

lsof -i TCP:1-1024    list open files on a range of TCP ports
lsof -i TCP:22    list open files on a single TCP port
lsof -i -u rahul    list network files plus files opened by user rahul (add -a to show only rahul's network files)

Find out who is using which files and commands


lsof -i    list all network connections

lsof -p 1    list the files opened by the process with PID 1

kill -9 `lsof -t -u rahul`    kill all processes owned by user rahul
===================================================


Resolving: High Apache Memory Usage


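The script below estimates a value for MaxClients by dividing the free memory (measured while Apache is stopped) by the size of the largest Apache process. Note that it briefly stops Apache, so run it only when a short outage is acceptable.

#!/bin/bash
# Estimate MaxClients from free memory and the largest Apache process.
echo "This is intended as a guideline only!"
if [ -e /etc/debian_version ]; then
    APACHE="apache2"
elif [ -e /etc/redhat-release ]; then
    APACHE="httpd"
else
    echo "Unsupported distribution" >&2
    exit 1
fi
# Largest resident set size (RSS, column 8 of "ps -yl") of any Apache process, in KB
RSS=$(ps -ylC "$APACHE" | awk 'NR > 1 {print $8}' | sort -n | tail -n 1)
if [ -z "$RSS" ]; then
    echo "$APACHE is not running" >&2
    exit 1
fi
RSS=$((RSS / 1024))          # KB -> MB
[ "$RSS" -gt 0 ] || RSS=1    # avoid dividing by zero for very small processes
echo "Stopping $APACHE to calculate free memory"
/etc/init.d/$APACHE stop &> /dev/null
# Free memory in MB (column 4 of the "Mem:" line)
MEM=$(free -m | awk '/^Mem:/ {print $4}')
echo "Starting $APACHE again"
/etc/init.d/$APACHE start &> /dev/null
echo "MaxClients should be around $((MEM / RSS))"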
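
The arithmetic behind the script can be tried without stopping Apache by plugging in numbers by hand. A minimal sketch follows; the function name max_clients is hypothetical, and it uses the same integer division as the script (free memory in MB divided by the largest process size in MB).

```shell
# max_clients FREE_MB RSS_KB: estimate MaxClients as free memory (MB) divided
# by the largest Apache process size (given in KB, as reported by ps).
max_clients() {
    free_mb="$1"
    rss_kb="$2"
    rss_mb=$(( rss_kb / 1024 ))          # KB -> MB, rounded down
    [ "$rss_mb" -gt 0 ] || rss_mb=1      # avoid dividing by zero
    echo $(( free_mb / rss_mb ))
}
```

For example, with 2048 MB free and a largest process of 51200 KB (50 MB), max_clients 2048 51200 prints 40.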
 
To make the kernel less eager to swap, you can also lower vm.swappiness (the default is 60):

echo 20 > /proc/sys/vm/swappiness
==========================================================================================
MYSQL USAGE
+++++++++++++
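mysqladmin proc stat

This shows the running MySQL threads (the process list) followed by a short server status summary.

mysqladmin kill pid

This kills the MySQL thread whose Id (taken from the process list output) is given in place of pid.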
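
To pick out which thread Ids are worth killing, the process list can be filtered by run time. This is a sketch under assumptions: the function name long_queries is hypothetical, and it assumes the usual table layout of mysqladmin processlist, with "|"-separated columns in the order Id, User, Host, db, Command, Time, State, Info.

```shell
# long_queries [SECONDS]: read `mysqladmin processlist` output on stdin and
# print "id user time" for threads running at least SECONDS (default 30).
long_queries() {
    awk -F'[|]' -v min="${1:-30}" '
        NF >= 8 && $2 + 0 > 0 && $7 + 0 >= min {   # numeric Id skips header and borders
            id = $2; user = $3; time = $7
            gsub(/ /, "", id); gsub(/ /, "", user); gsub(/ /, "", time)
            print id, user, time
        }'
}
```

Usage would be along the lines of: mysqladmin processlist | long_queries 60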
