Forum Discussion


  • I suspect there is a process in uninterruptible sleep (D state).

     

    Check which process it is: ps -eo state,pid,cmd | grep "^D"

     

    The zombie process is odd too, and you should check which process it is.
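
If you would rather filter the ps output in Python than with grep, something along these lines should work. It is only a sketch: the function name filter_by_state and the sample text are mine, and the PID 4242 line is a made-up zombie entry for illustration. It expects the text produced by ps -eo state,pid,cmd.

```python
def filter_by_state(ps_output, state):
    """Return (state, pid, cmd) tuples for lines whose state column starts with `state`."""
    matches = []
    for line in ps_output.splitlines():
        # Split on runs of whitespace into at most three fields: state, pid, cmd.
        fields = line.strip().split(None, 2)
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # skip the header line and anything malformed
        if fields[0].startswith(state):
            cmd = fields[2] if len(fields) == 3 else ''
            matches.append((fields[0], int(fields[1]), cmd))
    return matches


# Hypothetical sample in the format of `ps -eo state,pid,cmd`.
SAMPLE = """S   PID CMD
S     1 /sbin/init
D  1547 [kjournald]
Z  4242 [defunct_worker] <defunct>
D 20838 asm_config_server_rpc_handler.pl
"""

for entry in filter_by_state(SAMPLE, 'D'):
    print(entry)
```

Passing 'Z' instead of 'D' lists the zombies, which answers the "which process is it" question for both cases with one helper.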

     

    The output you pasted doesn't show the whole picture.

     

    If it's BIG-IP, you can try running this script:

    #!/usr/bin/python
     
    import commands
    import time
    import re
     
    # 600 seconds = 10 minutes
    MAX_TIME_RUNNING = 600
    start = time.time()
     
    def get_d_processes():
        '''
        Returns a list of processes in uninterruptible sleep (D state).
        Each process is a list containing state, pid and cmd.
        '''
        output = commands.getoutput("ps -eo state,pid,cmd | grep '^D'").split('\n')
        for index, process in enumerate(output):
            output[index] = process.strip().split(' ', 2)
            output[index] = re.sub(r' +', ' ', ' '.join(output[index])).split(' ', 2)
        return output
     
    def get_rw_io_by_pid(pid):
        '''
        Returns the raw contents of /proc/<pid>/io as a string
        (includes read_bytes and write_bytes)
        '''
        return commands.getoutput('cat /proc/%s/io' % pid)
     
    def lsof(pid):
        '''
        Returns the output of lsof -p <pid> (files opened by the process)
        '''
        return commands.getoutput('lsof -p %s' % pid)
     
    while True:
        top = open('/shared/tmp/top.txt', 'ab')
        top.write('--------------------------\n')
        top.write(commands.getoutput('date') + '\n')
        top.write('--------------------------\n')
        top.write(commands.getoutput('top -Hcbn 1') + '\n')
        file = open('/shared/tmp/processes.txt', 'ab')
        all_d_processes = get_d_processes()
        # if there is no process to check, we skip below code
        if '' != all_d_processes[0][0]:
            file.write('--------------------------\n')
            file.write(commands.getoutput('date') + '\n')
            file.write('--------------------------\n')
            file.write ('All D processes: \n')
            file.write(commands.getoutput("ps -eo state,pid,cmd | grep '^D'") + '\n')
            file.write('--------------------------\n')
            file.write('current top IO: \n')
            file.write('--------------------------\n')
            file.write(commands.getoutput('top -Hcbn 1 | head -6') + '\n')
            for process in all_d_processes:
                file.write('--------------------------\n')
                file.write('Process: \n')
                file.write('--------------------------\n')
                file.write(' '.join(process) + '\n')
                file.write('--------------------------\n')
                file.write('cat /proc/%s/io output: \n' % process[1])
                file.write('--------------------------\n')
                file.write(get_rw_io_by_pid(process[1]) + '\n')
                file.write('--------------------------\n')
                file.write('lsof -p %s output: \n' % process[1] + '\n')
                file.write('--------------------------\n')
                file.write(lsof(process[1]) + '\n')
        # close both logs on every pass so file handles are not leaked
        file.close()
        top.close()
        time.sleep(10)
        if time.time() > start + MAX_TIME_RUNNING:
            break
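
One caveat with the script above: the commands module exists only in Python 2 and was removed in Python 3, where subprocess.getoutput provides the same call. If your box runs a newer Python, a guarded import like the one below should help. The run() wrapper is my own naming, not part of the script, and on Python 3 the log files would also need to be opened in text mode ('a') instead of 'ab'.

```python
try:
    from commands import getoutput    # Python 2 only
except ImportError:
    from subprocess import getoutput  # Python 3 equivalent


def run(cmd):
    """Run a shell command and return its output without the trailing newline."""
    return getoutput(cmd)


print(run('echo hello'))
```

With this in place, each commands.getoutput(...) call in the script would become run(...).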

    Have a look at /shared/tmp/processes.txt and the output should be something like this:

    $ cat processes.txt
    --------------------------
    Mon Oct 22 23:19:39 CEST 2018
    --------------------------
    All D processes:
    D  1547 [kjournald]
    D  9134 [kjournald]
    D 20838 asm_config_server_rpc_handler.pl
    --------------------------
    current top IO:
    --------------------------
    top - 23:19:40 up 16 days, 21:35,  2 users,  load average: 4.01, 4.60, 3.71
    Tasks: 755 total,   7 running, 746 sleeping,   0 stopped,   2 zombie
    Cpu(s): 27.1%us,  5.3%sy,  3.1%ni, 61.9%id,  2.0%wa,  0.2%hi,  0.4%si,  0.0%st
    Mem:  16528432k total, 15194320k used,  1334112k free,    78960k buffers
    Swap:  1048572k total,   808448k used,   240124k free,  3821212k cached
     
    --------------------------
    Process:
    --------------------------
    D 1547 [kjournald]
    --------------------------
    cat /proc/1547/io output:
    --------------------------
    rchar: 0
    wchar: 0
    syscr: 0
    syscw: 0
    read_bytes: 6504448
    write_bytes: 3407814656
    cancelled_write_bytes: 0
    --------------------------
    lsof -p 1547 output:
     
    --------------------------
    COMMAND    PID USER   FD      TYPE DEVICE SIZE/OFF NODE NAME
    kjournald 1547 root  cwd       DIR 253,17     1024    2 /
    kjournald 1547 root  rtd       DIR 253,17     1024    2 /
    kjournald 1547 root  txt   unknown
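
The read_bytes and write_bytes counters in /proc/<pid>/io are cumulative, so a single sample only tells you the total so far. To see which D-state process is actually doing I/O, compare two samples taken a few seconds apart. A rough sketch of that, assuming the text format shown above: parse_proc_io and io_delta are names I made up, and the second sample is invented to illustrate the arithmetic.

```python
def parse_proc_io(text):
    """Parse the text of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(':')
        value = value.strip()
        if sep and value.isdigit():
            counters[key.strip()] = int(value)
    return counters


def io_delta(before, after):
    """Return (read_bytes, write_bytes) growth between two parsed samples."""
    return (after['read_bytes'] - before['read_bytes'],
            after['write_bytes'] - before['write_bytes'])


# First sample: the kjournald values from the output above.
FIRST = """rchar: 0
wchar: 0
syscr: 0
syscw: 0
read_bytes: 6504448
write_bytes: 3407814656
cancelled_write_bytes: 0
"""

# Second sample: hypothetical, taken a little later.
SECOND = FIRST.replace('write_bytes: 3407814656', 'write_bytes: 3407818752')

print(io_delta(parse_proc_io(FIRST), parse_proc_io(SECOND)))
```

A delta that stays at zero across several samples suggests the process is blocked waiting rather than writing.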