记一次数据库无法增删改趋于HANG住状态的故障诊断和处理 -电脑资料

电脑资料 时间:2019-01-01 我要投稿
【meiwen.anslib.com - 电脑资料】

    数据库无法增删改,包括v$transaction视图无法查询,类似于HANG的状态,我首先我通过查询v$session_wait视图,情况如下: SQL> select sid,event,p1,p2,p3,wait_time,seconds_in_wait,state from v$session_wait where wait_class <> 'Idle';

    SID EVENT P1 P2 P3 WAIT_TIME SECONDS_IN_WAIT STATE

    ---------- ----------------------------------------- ---------- ---------- ---------- ---------- --------------- -------------------

    125 library cache lock 2130013560 2158412560 301 0 7850 WAITING

    126 library cache lock 2130014088 2158399144 301 0 7850 WAITING

    127 library cache lock 2130014088 2158495528 301 0 9231 WAITING

    128 library cache lock 2130013560 2158380576 301 0 9231 WAITING

    129 library cache lock 2130013560 2158307736 301 0 10611 WAITING

    130 library cache lock 2130014088 2158498840 301 0 10611 WAITING

    131 buffer busy waits 1 11170 1 0 11839 WAITING

    132 library cache lock 2130013560 2158391432 301 0 11992 WAITING

    133 log file switch (checkpoint incomplete) 0 0 0 0 12616 WAITING

    136 log file switch (checkpoint incomplete) 0 0 0 0 12947 WAITING

    138 enq: TX - row lock contention 1415053318 589854 665 0 13321 WAITING

    139 buffer busy waits 2 9 17 0 12616 WAITING

    141 enq: WF - contention 1464205318 0 0 0 1650 WAITING

    144 enq: CI - contention 1128857606 1 5 0 15355 WAITING

    150 log file switch (checkpoint incomplete) 0 0 0 0 12891 WAITING

    159 switch logfile command 0 0 0 0 17051 WAITING

    161 log file switch (checkpoint incomplete) 0 0 0 0 12715 WAITING

    164 rdbms ipc reply 7 21457644 0 0 0 WAITING

    18 rows selected

    以上看到有三个log file switch (checkpoint incomplete)等待事件,意味着checkpoint没完成,于是查询v$log检查在线日志的情况:

    SQL> select * from v$log;

    GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM

    ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------

    1 1 113 52428800 1 NO CURRENT 8590086940 21-MAR-15 <<<

    2 1 111 52428800 1 YES ACTIVE 8590086619 21-MAR-15 <<<

    3 1 112 52428800 1 YES ACTIVE 8590086938 21-MAR-15 <<<

    SQL> archive log list;

    Database log mode Archive Mode

    Automatic archival Enabled <<<<<<归模式档

    Archive destination /home/oracle/arch

    Oldest online log sequence 111

    Next log sequence to archive 113

    Current log sequence 113

    SQL> !

    [oracle@ora10g bdump]$ df -h

    Filesystem Size Used Avail Use% Mounted on

    /dev/sda3 95G 14G 77G 15% / <<<<<<<<<<<<磁盘空间足够,说明不是磁盘空间不够无法归档造成的

    /dev/sda1 99M 12M 82M 13% /boot

    tmpfs 1006M 0 1006M 0% /dev/shm

    [oracle@ora10g bdump]$ ls -lrt /home/oracle/arch/

    total 107176

    -rw-r----- 1 oracle oinstall 952832 Mar 15 12:02 1_103_847657195.dbf

    -rw-r----- 1 oracle oinstall 29585920 Mar 17 21:35 1_104_847657195.dbf

    -rw-r----- 1 oracle oinstall 14306816 Mar 21 12:02 1_105_847657195.dbf

    -rw-r----- 1 oracle oinstall 22298112 Mar 21 12:19 1_106_847657195.dbf

    -rw-r----- 1 oracle oinstall 42112000 Mar 21 17:14 1_107_847657195.dbf

    -rw-r----- 1 oracle oinstall 159232 Mar 21 17:20 1_108_847657195.dbf

    -rw-r----- 1 oracle oinstall 1536 Mar 21 17:21 1_109_847657195.dbf

    -rw-r----- 1 oracle oinstall 15360 Mar 21 17:24 1_110_847657195.dbf

    -rw-r----- 1 oracle oinstall 148480 Mar 21 17:30 1_111_847657195.dbf

    -rw-r----- 1 oracle oinstall 1024 Mar 21 17:30 1_112_847657195.dbf <<<<<<<<自17:30时间以后不再有归档日志产生

    [oracle@ora10g bdump]$ date

    Sat Mar 21 22:24:09 CST 2015

    在线日志切换之后没有并归档之前都是处于ACTIVE状态,该状态下无法被复用,没有在线日志组可用时候,数据库会挂起.

    做了HANGANALYZE:

    ==============

    HANG ANALYSIS:

    ==============

    Open chains found:

    Chain 1 : :

    <0/133/2/0x83a69028/4790/log file switch (checkpoint inco>

    -- <0/129/1/0x83a6afc8/4877/library cache lock>

    -- <0/125/1/0x83a6cf68/4993/library cache lock>

    -- <0/128/3/0x83a6b7b0/4931/library cache lock>

    -- <0/132/3/0x83a69810/4815/library cache lock> <<<<<<<<<<<

    Chain 2 : :

    <0/164/1/0x83a5f208/3048/rdbms ipc reply>

    -- <0/144/47/0x83a660b8/4402/enq: CI - contention>

    -- <0/138/137/0x83a68058/4757/enq: TX - row lock contention> <<<<<

    Other chains found:

    Chain 3 : :

    <0/123/6/0x83a6d750/5378/No Wait>

    Chain 4 : :

    <0/126/2/0x83a6c780/4991/library cache lock>

    -- <0/127/1/0x83a6bf98/4933/library cache lock>

    -- <0/130/1/0x83a6a7e0/4875/library cache lock>

    Chain 5 : :

    <0/131/1/0x83a69ff8/4821/buffer busy waits>

    -- <0/126/2/0x83a6c780/4991/library cache lock>

    -- <0/127/1/0x83a6bf98/4933/library cache lock>

    -- <0/130/1/0x83a6a7e0/4875/library cache lock>

    Chain 6 : :

    <0/136/7/0x83a68840/4770/log file switch (checkpoint inco>

    Chain 7 : :

    <0/139/10/0x83a67870/4609/buffer busy waits>

    Chain 8 : :

    <0/141/6/0x83a668a0/4748/enq: WF - contention>

    Chain 9 : :

    <0/147/3/0x83a658d0/3079/Streams AQ: qmn slave idle wait>

    Chain 10 : :

    <0/148/4/0x83a650e8/3077/Streams AQ: waiting for time man>

    Chain 11 : :

    <0/150/166/0x83a64900/4752/log file switch (checkpoint inco>

    Chain 12 : :

    <0/152/1/0x83a64118/3068/Streams AQ: qmn coordinator idle>

    Chain 13 : :

    <0/159/3/0x83a62960/3062/switch logfile command>

    Chain 14 : :

    <0/161/1/0x83a609c0/3054/log file switch (checkpoint inco>

    Extra information that will be dumped at higher levels:

    [level 4] : 2 node dumps -- [REMOTE_WT] [LEAF] [LEAF_NW]

    [level 5] : 8 node dumps -- [SINGLE_NODE] [SINGLE_NODE_NW] [IGN_DMP]

    [level 6] : 12 node dumps -- [NLEAF]

    [level 10] : 12 node dumps -- [IGN]

    State of nodes

    ([nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor):

    [122]/0/123/6/0x83b52750/5378/SINGLE_NODE_NW/1/2//none

    [124]/0/125/1/0x83b55220/4993/NLEAF/3/8/[128][132]/127

    [125]/0/126/2/0x83b56788/4991/NLEAF/9/12/[130]/126

    [126]/0/127/1/0x83b57cf0/4933/NLEAF/13/14/[125][130]/129

    [127]/0/128/3/0x83b59258/4931/NLEAF/15/16/[128][124][132]/131

    [128]/0/129/1/0x83b5a7c0/4877/NLEAF/4/7/[132]/124

    [129]/0/130/1/0x83b5bd28/4875/NLEAF/17/18/[125][126][130]/none

    [130]/0/131/1/0x83b5d290/4821/NLEAF/10/11/[132]/125

    [131]/0/132/3/0x83b5e7f8/4815/NLEAF/19/20/[128][124][127][132]/none

    [132]/0/133/2/0x83b5fd60/4790/LEAF/5/6//128 <<<<<<

    [135]/0/136/7/0x83b63d98/4770/SINGLE_NODE/21/22//none

    [137]/0/138/137/0x83b66868/4757/NLEAF/23/28/[143]/none

    [138]/0/139/10/0x83b67dd0/4609/NLEAF/29/30/[132]/none

    [139]/0/140/8/0x83b69338/4605/IGN/31/32//none

    [140]/0/141/6/0x83b6a8a0/4748/NLEAF/33/34/[143]/none

    [143]/0/144/47/0x83b6e8d8/4402/NLEAF/24/27/[163]/137

    [146]/0/147/3/0x83b72910/3079/SINGLE_NODE/35/36//none

    [147]/0/148/4/0x83b73e78/3077/SINGLE_NODE/37/38//none

    [149]/0/150/166/0x83b76948/4752/SINGLE_NODE/39/40//none

    [151]/0/152/1/0x83b79418/3068/SINGLE_NODE/41/42//none

    [154]/0/155/1/0x83b7d450/3066/IGN/43/44//none

    [155]/0/156/1/0x83b7e9b8/3064/IGN/45/46//none

    [158]/0/159/3/0x83b829f0/3062/SINGLE_NODE/47/48//none

    [159]/0/160/1/0x83b83f58/3056/IGN/49/50//none

    [160]/0/161/1/0x83b854c0/3054/SINGLE_NODE/51/52//none

    [161]/0/162/1/0x83b86a28/3052/IGN/53/54//none

    [162]/0/163/1/0x83b87f90/3050/IGN/55/56//none

    [163]/0/164/1/0x83b894f8/3048/LEAF/25/26//143 <<<<<<

    [165]/0/166/1/0x83b8bfc8/3044/IGN/59/60//none

    [166]/0/167/1/0x83b8d530/3042/IGN/61/62//none

    [167]/0/168/1/0x83b8ea98/3040/IGN/63/64//none

    [168]/0/169/1/0x83b90000/3038/IGN/65/66//none

    [169]/0/170/1/0x83b91568/3036/IGN/67/68//none

    做了SYSTEMSTATE DUMP:用ass对SYSTEMSTATE处理以后:System State 1

    ~~~~~~~~~~~~~~~~

    1:

    2: waiting for 'pmon timer' wait

    3: waiting for 'rdbms ipc message' wait

    4: waiting for 'rdbms ipc message' wait

    5: waiting for 'rdbms ipc message' wait

    6: waiting for 'rdbms ipc message' wait

    7: waiting for 'rdbms ipc message' wait

    8: waiting for 'rdbms ipc reply' wait

    9: waiting for 'rdbms ipc message' wait

    10: waiting for 'rdbms ipc message' wait

    11: waiting for 'log file switch (checkpoint incomplete)' wait

    12: waiting for 'rdbms ipc message' wait

    13:

    14:

    15: waiting for 'switch logfile command' wait

    16: waiting for 'rdbms ipc message' wait

    17: waiting for 'rdbms ipc message' wait

    18: waiting for 'Streams AQ: qmn coordinator idle wait' wait

    19: waiting for 'log file switch (checkpoint incomplete)' wait

    Cmd: Insert

    20: for 'Streams AQ: waiting for time management or cleanup tasks' wait

    21: waiting for 'Streams AQ: qmn slave idle wait' wait

    22: waiting for 'enq: CI - contention' [Enqueue CI-00000001-00000005] wait

    23: waiting for 'enq: WF - contention' [Enqueue WF-00000000-00000000] wait

    Cmd: PL/SQL Execute

    24: waiting for 'SQL*Net message from client' wait

    25: waiting for 'buffer busy waits' (2,9,11) wait

    Cmd: Select

    26: waiting for 'enq: TX - row lock contention'[Enqueue TX-0009001E-00000299] wait

    27: waiting for 'log file switch (checkpoint incomplete)' wait

    Cmd: Insert

    28: waiting for 'log file switch (checkpoint incomplete)' wait

    29: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait

    30: waiting for 'buffer busy waits' (1,2ba2,1) wait

    31: waiting for 'library cache lock' [LOCK: handle=7ef56f88] wait

    32: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait

    33: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait

    34: waiting for 'library cache lock' [LOCK: handle=7ef56f88] wait

    35: waiting for 'library cache lock' [LOCK: handle=7ef56f88] wait

    36: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait

    37: last wait for 'ksdxexeotherwait'

    Blockers

    ~~~~~~~~

    Above is a list of all the processes. If they are waiting for a resource

    then it will be given in square brackets. Below is a summary of the

    waited upon resources, together with the holder of that resource.

    Notes:

    ~~~~~

    o A process id of '???' implies that the holder was not found in the

    systemstate.

    Resource Holder State

    Enqueue CI-00000001-00000005 8: waiting for 'rdbms ipc reply'

    Enqueue WF-00000000-00000000 22: 22: is waiting for 8:

    Enqueue TX-0009001E-00000299 22: 22: is waiting for 8:

    LOCK: handle=7ef56d78 28: waiting for 'log file switch (checkpoint incomplete)'

    LOCK: handle=7ef56f88 30: waiting for 'buffer busy waits' (1,2ba2,1)

    Object Names

    ~~~~~~~~~~~~

    Enqueue CI-00000001-00000005

    Enqueue WF-00000000-00000000

    Enqueue TX-0009001E-00000299

    LOCK: handle=7ef56d78 TABL:EXFSYS.RLM$SCHDNEGACTION

    LOCK: handle=7ef56f88 TABL:EXFSYS.RLM$EVTCLEANUP

    System State 2

    ~~~~~~~~~~~~~~~~

    1:

    2: waiting for 'pmon timer' wait

    3: waiting for 'rdbms ipc message' wait

    4: waiting for 'rdbms ipc message' wait

    5: waiting for 'rdbms ipc message' wait

    6: waiting for 'rdbms ipc message' wait

    7: waiting for 'rdbms ipc message' wait

    8: waiting for 'rdbms ipc reply' wait

    9: waiting for 'rdbms ipc message' wait

    10: waiting for 'rdbms ipc message' wait

    11: waiting for 'log file switch (checkpoint incomplete)' wait

    12: waiting for 'rdbms ipc message' wait

    13:

    14:

    15: waiting for 'switch logfile command' wait

    16: waiting for 'rdbms ipc message' wait

    17: waiting for 'rdbms ipc message' wait

    18: waiting for 'Streams AQ: qmn coordinator idle wait' wait

    19: waiting for 'log file switch (checkpoint incomplete)' wait

    Cmd: Insert

    20: for 'Streams AQ: waiting for time management or cleanup tasks' wait

    21: waiting for 'Streams AQ: qmn slave idle wait' wait

    22: waiting for 'enq: CI - contention' [Enqueue CI-00000001-00000005] wait

    23: waiting for 'enq: WF - contention' [Enqueue WF-00000000-00000000] wait

    Cmd: PL/SQL Execute

    24: waiting for 'SQL*Net message from client' wait

    25: waiting for 'buffer busy waits' (2,9,11) wait

    Cmd: Select

    26: waiting for 'enq: TX - row lock contention'[Enqueue TX-0009001E-00000299] wait

    27: waiting for 'log file switch (checkpoint incomplete)' wait

    Cmd: Insert

    28: waiting for 'log file switch (checkpoint incomplete)' wait

    29: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait

    30: waiting for 'buffer busy waits' (1,2ba2,1) wait

    31: waiting for 'library cache lock' [LOCK: handle=7ef56f88] wait

    32: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait

    33: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait

    34: waiting for 'library cache lock' [LOCK: handle=7ef56f88] wait

    35: waiting for 'library cache lock' [LOCK: handle=7ef56f88] wait

    36: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait

    37: last wait for 'ksdxexeotherwait'

    Blockers

    ~~~~~~~~

    Above is a list of all the processes. If they are waiting for a resource

    then it will be given in square brackets. Below is a summary of the

    waited upon resources, together with the holder of that resource.

    Notes:

    ~~~~~

    o A process id of '???' implies that the holder was not found in the

    systemstate.

    Resource Holder State

    Enqueue CI-00000001-00000005 8: waiting for 'rdbms ipc reply'

    Enqueue WF-00000000-00000000 22: 22: is waiting for 8:

    Enqueue TX-0009001E-00000299 22: 22: is waiting for 8:

    LOCK: handle=7ef56d78 28: waiting for 'log file switch (checkpoint incomplete)'

    LOCK: handle=7ef56f88 30: waiting for 'buffer busy waits' (1,2ba2,1)

    Object Names

    ~~~~~~~~~~~~

    Enqueue CI-00000001-00000005

    Enqueue WF-00000000-00000000

    Enqueue TX-0009001E-00000299

    LOCK: handle=7ef56d78 TABL:EXFSYS.RLM$SCHDNEGACTION

    LOCK: handle=7ef56f88 TABL:EXFSYS.RLM$EVTCLEANUP

    List of Processes That May Be Stuck

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    8: waiting for 'rdbms ipc reply' wait

    11: waiting for 'log file switch (checkpoint incomplete)' wait

    15: waiting for 'switch logfile command' wait

    18: waiting for 'Streams AQ: qmn coordinator idle wait' wait

    19: waiting for 'log file switch (checkpoint incomplete)' wait

    20: for 'Streams AQ: waiting for time management or cleanup tasks' wait

    21: waiting for 'Streams AQ: qmn slave idle wait' wait

    22: waiting for 'enq: CI - contention' wait

    23: waiting for 'enq: WF - contention' wait

    24: waiting for 'SQL*Net message from client' wait

    25: waiting for 'buffer busy waits' (2,9,11) wait

    26: waiting for 'enq: TX - row lock contention' wait

    27: waiting for 'log file switch (checkpoint incomplete)' wait

    28: waiting for 'log file switch (checkpoint incomplete)' wait

    29: waiting for 'library cache lock' wait

    30: waiting for 'buffer busy waits' (1,2ba2,1) wait

    31: waiting for 'library cache lock' wait

    32: waiting for 'library cache lock' wait

    33: waiting for 'library cache lock' wait

    34: waiting for 'library cache lock' wait

    35: waiting for 'library cache lock' wait

    36: waiting for 'library cache lock' wait

    SO: 0x83b5fd60, type: 4, owner: 0x83a69028, flag: INIT/-/-/0x00

    (session) sid: 133 trans: 0x822d5790, creator: 0x83a69028, flag: (41) USR/- BSY/-/-/-/-/-

    DID: 0001-001C-00000002, short-term DID: 0000-0000-00000000

    txn branch: (nil)

    oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS

    service name: SYS$USERS

    O/S info: user: oracle, term: UNKNOWN,ospid: 4790, machine: ora10g

    program: oracle@ora10g (J001)

    waiting for 'log file switch (checkpoint incomplete)' blocking sess=0x(nil) seq=1 wait_time=0 seconds since wait started=12130

    =0, =0, =0

    Dumping Session Wait History

    for 'log file switch (checkpoint incomplete)' count=1 wait_time=1107898

    =0, =0, =0

    for 'log file switch (checkpoint incomplete)' count=1 wait_time=336 ...

    PROCESS 8:

    ----------------------------------------

    SO: 0x83a5f208, type: 2, owner: (nil), flag: INIT/-/-/0x00

    (process) Oracle pid=8, calls cur/top: 0x83ba9a28/0x83ba9a28, flag: (16) SYSTEM

    int error: 0, call error: 0, sess error: 0, txn error 0

    (post info) last post received: 0 0 33

    last post received-location: ksrpublish

    last process to post me: 83a68840 1 0

    last post sent: 0 0 24

    last post sent-location: ksasnd

    last process posted by me: 83a5ea20 1 6

    (latch info) wait_event=0 bits=0

    Process Group: DEFAULT, pseudo proc: 0x83aa67e8

    O/S info: user: oracle, term: UNKNOWN, ospid: 3048

    OSD pid info: Unix processpid: 3048, image: oracle@ora10g (SMON)

    Short stack dump: ...

    PROCESS 22:

    ----------------------------------------

    SO: 0x83a660b8, type: 2, owner: (nil), flag: INIT/-/-/0x00

    (process) Oracle pid=22, calls cur/top: 0x83bafe00/0x83bafb40, flag: (2) SYSTEM

    int error: 0, call error: 0, sess error: 0, txn error 0

    (post info) last post received: 0 0 0

    last post received-location: No post

    last process to post me: none

    last post sent: 0 0 24

    last post sent-location: ksasnd

    last process posted by me: 83a5e238 1 6

    (latch info) wait_event=0 bits=0

    Process Group: DEFAULT, pseudo proc: 0x83aa67e8

    O/S info: user: oracle, term: UNKNOWN, ospid:4402

    OSD pid info: Unix process pid: 4402, image: oracle@ora10g (m000)

    Short stack dump:

    自此,把注意力转移至SMON进程,首先检查SMON的trace文件: /u01/app/oracle/admin/ora10g/bdump/ora10g_smon_3048.trc

    Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production

    With the Partitioning, OLAP, Data Mining and Real Application Testing options

    ORACLE_HOME = /u01/app/oracle/product/10.2.0

    System name: Linux

    Node name: ora10g

    Release: 2.6.18-194.el5

    Version: #1 SMP Tue Mar 16 21:52:39 EDT 2010

    Machine: x86_64

    Instance name: ora10g

    Redo thread mounted by this instance: 1

    Oracle process number: 8

    Unix process pid: 3048, image: oracle@ora10g (SMON)

    *** 2015-03-21 17:34:17.611

    *** SERVICE NAME:(SYS$BACKGROUND) 2015-03-21 17:34:17.611

    *** SESSION ID:(164.1) 2015-03-21 17:34:17.611

    Waited for detached process: CKPT for 300 seconds:

    *** 2015-03-21 17:34:17.611

    Dumping diagnostic information for CKPT:

    OS pid = 3046

    loadavg : 0.00 0.00 0.00

    memory info: free memory = 0.00M

    swap info: free = 0.00M alloc = 0.00M total = 0.00M

    F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD

    0 S oracle30461 0 75 0 - 184266 - 17:24 ? 00:00:00 ora_ckpt_ora10g

    [Thread debugging using libthread_db enabled]

    0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6

    #0 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6

    #1 0x0000000003be9b09 in sskgpwwait ()

    #2 0x0000000003bccdf0 in skgpwwait ()

    #3 0x0000000000855f4a in ksdxsus ()

    #4 0x0000000000857118 in ksdxffrz ()

    #5 0x0000000000853003 in ksdxcb ()

    #6 0x0000000001ebd0cf in sspuser ()

    #7

    #8 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6

    #9 0x0000000003be9b09 in sskgpwwait ()

    #10 0x0000000003bccdf0 in skgpwwait ()

    #11 0x0000000000798319 in kslwaitns_timed ()

    #12 0x00000000008c3b1d in kskthbwt ()

    #13 0x0000000000797e54 in kslwait ()

    #14 0x00000000029f3f3b in ksarcv ()

    #15 0x000000000082e8bf in ksbabs ()

    #16 0x0000000000835822 in ksbrdp ()

    #17 0x0000000002f4d840 in opirip ()

    #18 0x000000000132b016 in opidrv ()

    #19 0x0000000001eb3146 in sou2o ()

    #20 0x0000000000723245 in opimai_real ()

    #21 0x00000000007230fc in main ()

    A debugging session is active.

    Inferior 1 [process 3046] will be detached.

    Quit anyway? (y or n) [answered Y; input not from terminal]

    *** 2015-03-21 17:34:19.125

    *** 2015-03-21 17:34:29.129

    Waited for detached process: CKPT for 310 seconds:

    *** 2015-03-21 17:34:29.129

    Dumping diagnostic information for CKPT:

    OS pid = 3046

    loadavg : 0.07 0.02 0.00

    memory info: free memory = 0.00M

    swap info: free = 0.00M alloc = 0.00M total = 0.00M

    F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD

    0 S oracle 3046 1 0 75 0 - 184266 - 17:24 ? 00:00:00 ora_ckpt_ora10g

    [Thread debugging using libthread_db enabled]

    0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6

    #0 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6

    #1 0x0000000003be9b09 in sskgpwwait ()

    #2 0x0000000003bccdf0 in skgpwwait ()

    #3 0x0000000000855f4a in ksdxsus ()

    #4 0x0000000000857118 in ksdxffrz ()

    #5 0x0000000000853003 in ksdxcb ()

    #6 0x0000000001ebd0cf in sspuser ()

    #7

    #8 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6

    #9 0x0000000003be9b09 in sskgpwwait ()

    #10 0x0000000003bccdf0 in skgpwwait ()

    #11 0x0000000000798319 in kslwaitns_timed ()

    #12 0x00000000008c3b1d in kskthbwt ()

    #13 0x0000000000797e54 in kslwait ()

    #14 0x00000000029f3f3b in ksarcv ()

    #15 0x000000000082e8bf in ksbabs ()

    #16 0x0000000000835822 in ksbrdp ()

    #17 0x0000000002f4d840 in opirip ()

    #18 0x000000000132b016 in opidrv ()

    #19 0x0000000001eb3146 in sou2o ()

    #20 0x0000000000723245 in opimai_real ()

    #21 0x00000000007230fc in main ()

    A debugging session is active.

    Inferior 1 [process 3046] will be detached.

    Quit anyway? (y or n) [answered Y; input not from terminal]

    *** 2015-03-21 17:34:29.895

    *** 2015-03-21 17:34:39.899

    Waited for detached process: CKPT for 320 seconds:

    *** 2015-03-21 17:34:39.899

    Dumping diagnostic information for CKPT:

    OS pid = 3046

    loadavg : 0.06 0.01 0.00

    memory info: free memory = 0.00M

    swap info: free = 0.00M alloc = 0.00M total = 0.00M

    F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD

    0 S oracle 3046 1 0 75 0 - 184266 - 17:24 ? 00:00:00 ora_ckpt_ora10g

    [Thread debugging using libthread_db enabled]

    0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6

    #0 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6

    #1 0x0000000003be9b09 in sskgpwwait ()

    #2 0x0000000003bccdf0 in skgpwwait ()

    #3 0x0000000000855f4a in ksdxsus ()

    #4 0x0000000000857118 in ksdxffrz ()

    #5 0x0000000000853003 in ksdxcb ()

    #6 0x0000000001ebd0cf in sspuser ()

    #7

    #8 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6

    #9 0x0000000003be9b09 in sskgpwwait ()

    #10 0x0000000003bccdf0 in skgpwwait ()

    #11 0x0000000000798319 in kslwaitns_timed ()

    #12 0x00000000008c3b1d in kskthbwt ()

    #13 0x0000000000797e54 in kslwait ()

    #14 0x00000000029f3f3b in ksarcv ()

    #15 0x000000000082e8bf in ksbabs ()

    #16 0x0000000000835822 in ksbrdp ()

    #17 0x0000000002f4d840 in opirip ()

    #18 0x000000000132b016 in opidrv ()

    #19 0x0000000001eb3146 in sou2o ()

    #20 0x0000000000723245 in opimai_real ()

    #21 0x00000000007230fc in main ()

    A debugging session is active.

    Inferior 1 [process 3046] will be detached.

    Quit anyway? (y or n) [answered Y; input not from terminal]

    *** 2015-03-21 17:34:40.598

    从日志文件看出,SMON进程在等待与CKPT进程通信,难道CKPT进程异常吗?再将注意力转移至CKPT,查看CKPT的trace文件: *** 2015-03-21 17:26:15.972

    *** SERVICE NAME:(SYS$BACKGROUND) 2015-03-21 17:26:15.972

    *** SESSION ID:(165.1) 2015-03-21 17:26:15.972

    Received ORADEBUG command 'suspend' from process Unix process pid: 3062, image: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<问题出在这里!!!!!

    Received ORADEBUG command 'tracefile_name' from process Unix process pid: 3062, image:

    自此,我们就明白了事情的缘由经过:

    1. 由于CKPT进程被挂起,使得检查点无法完成、SMON进程长时间持有资源Enqueue CI-00000001-00000005,最后导致数据库HANG住;2. 随着资源Enqueue CI-00000001-00000005被SMON进程持有,导致一系列等待事件library cache lock,enq: CI - contention,log file switch (checkpoint incomplete)的出现

    重新把CKPT进程恢复:

    SQL> oradebug setospid 3046

    Oracle pid: 7, Unix process pid: 3046, image: oracle@ora10g (CKPT)

    SQL> oradebug resume;

    Statement processed. SQL> select sid,event,p1,p2,p3,wait_time,seconds_in_wait,state from v$session_wait where wait_class <> 'Idle';

    SID EVENT P1 P2 P3 WAIT_TIME SECONDS_IN_WAIT STATE

    ---------- ---------------------------------------------------------------- ---------- ---------- ---------- ---------- --------------- -------------------

    到此,数据库恢复正常

最新文章