ORA-3136
At times, some users complained about connection timeout, and you saw error messages ORA-3136 in sqlnet.log as such:
Fatal NI connect error 12170.
ns main err code: 12535
ns secondary err code: 12606
nt OS err code: 0
VERSION INFORMATION:
...
TNS-12535 TNS:operation timed out
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=X.X.X.X)(PORT=XXX))
nt secondary err code: 0
Tracing not turned on.
WARNING: inbound connection timed out (ORA-3136)
If you got the ORA-3136 accompanied with TNS-12535 and ORA-12170. It's possibly caused by two factors, one is network traffic, the other is database heavy loading.
Luckily, the sqlnet.log logged the client connection data containing IP address for you to trace back the network condition at that time. Then, you can compare network response time during peak windows and off-peak windows to judge whether the network condition was normal or not.
Solution to ORA-3136
Heavy loading of a database cannot be easily solved in one second, but we can mitigate the complaining by increase INBOUND_CONNECT_TIMEOUT_LISTENER, the inbound timeout limit (in seconds).
Here are the steps:
Add Parameter to listener.ora
The default value of INBOUND_CONNECT_TIMEOUT_LISTENER is 60 seconds, we can raise the value to 160 seconds.
Single-instance Databases
We should add the parameter at oracle-level.
$ echo "INBOUND_CONNECT_TIMEOUT_LISTENER = 160" >> $ORACLE_HOME/network/admin/listener.ora
RAC Database
We should add the parameter at grid-level on all nodes.
$ echo "INBOUND_CONNECT_TIMEOUT_LISTENER = 160" >> $GRID_HOME/network/admin/listener.ora
Add Parameter to sqlnet.ora
$ echo "SQLNET.INBOUND_CONNECT_TIMEOUT = 180" >> $ORACLE_HOME/network/admin/sqlnet.ora
The value in sqlnet.ora should be slightly larger than the value in listener.ora, because database authentication needs extra time to do.
Restart the listener
$ lsnrctl stop
$ lsnrctl start
RAC Database
$ srvctl stop listener
$ srvctl start listener
Validate the Result
Then, we can validate the new setting by using telnet, the telnet connection will test open port 1521 and be timed out after 160 seconds, which is 2 minutes and 40 seconds:
$ date; telnet 10.10.10.10 1521; date
Thu Oct 11 11:06:20 CST 2012
Trying 192.168.0.21...
Connected to primary01.example.com (10.10.10.10).
Escape character is '^]'.
Connection closed by foreign host.
Thu Oct 11 11:09:00 CST 2012
The IP address above can be replaced with yours.
Notice that, the higher value you set, the higher security risk you take, it may be taken as a possible leak to DOS (Denial of Service) attack.
Nada melhor do que identificar a causa raiz!
Apos alterar os parametros de TIMEOUT já descrito, e ver que o erro persistia,
realizado a ativação do log com mais detalhe para identificar o IP da maquina que está gerando o problema, incluindo os parametros abaixo no SQLNet.ora e reiniciando o listener.
LOG_DIRECTORY_CLIENT = “/ora/diag/rdbms/ora11gh/ORA11GH/trace”
LOG_FILE_CLIENT = “sqlnet.log”
E alterar o parametro DIAG_ADR_ENABLED=ON
No alert do trace foi coletado o IP da maquina e pelo CMD com o comando abaixo, identificamos o nome da maquina para falar com o responsavel.
ping -a 192.168.X.Y
Thanks for your input.