PMON failed to acquire latch
Here was the case, the database was hanging caused by some heavy loading processes. Users complained about the high waiting sessions and there's an error showed in the alert log. Apparently, PMON failed to acquire latch for some reasons:
PMON failed to acquire latch, see PMON dump
And I saw one particular process was taking 1 entire CPU-time out of 16 processors total, and the process is an Oracle shared server process (i.e, via virtual circuit). I suspected that it's the process which caused the error. So I decided to identify the heavy loading SQL statement of the process by this approach:
How to Monitor Real-Time Top SQL on CPU Consuming
The identified SQL was like this:
SELECT * FROM V$OPEN_CURSOR WHERE SID = :B
As we can see, the user was watching and listing the open cursors of someone's session for some purposes. In general, this statement will impact performance a little bit, but it won't be so dramatic like this case except that the shared pool is too fragmented by a lot of trivial open cursors. This could cause PMON failed to acquire latch.
Actions
I have no real cures on this, but here are some actions I recommend and they are listed from low cost to high cost:
- Kill the session identified. The action could have no effect because PMON was already in trouble.
- Kill the shared server process. Sometimes, the shared server process was stuck in an unkown loop or infinite waiting. We can kill the shared server process at OS-level, such as use kill -9 pid immediately.
- Flush shared pool. This operation might take several minutes to complete, please be patient with it.
- Shutdown abort. If the database was not responsive to the above actions, you can try shutdown abort to make the service terminated on the spot.
- Refuse such AP behavior. The above actions can only buy you few hours of peace-time, you should told AP development to stop their inproper behaviors during busy-hours.
SQL> alter system flush shared_pool;
System altered.
Please note that, the data of dynamic views (V$XXX) comes from memory. That is, the first impacted resource will be memory when you query a dynamic view.
I suspect this error "PMON failed to acquire latch, see PMON dump" only appear in 9i, especially below 9.2.0.6.
but it won’t be so dramatic like this case except that the shared pool is too fragmented by a lot of trivial open cursors
Yes, you’re right. That’s why I used ALTER SYSTEM FLUSH SHARED_POOL to clean out unused cursors that made memory fragmented.