Friday, 5 October 2018

EXAFUSION and IPCLW traces on EXADATA

Recently I had issue with extensive trace generation on new Exadata system. There was no ORA- errors in alert log and adrci show incidend command returns no rows but the common line in almost every trace was IPCLW line.

tail alert_GDPRD5U4.log
ARC1 (PID:111419): Archived Log entry 22126 added for T-4.S-5539 ID 0x1b27ae48 LAD:1
2018-10-01T13:34:54.117829+00:00
Thread 4 advanced to log sequence 5541 (LGWR switch)
  Current log# 41 seq# 5541 mem# 0: +RECOC1/GDPRD5U/ONLINELOG/group_41.26674.978267741
2018-10-01T13:34:54.262530+00:00
ARC2 (PID:111447): Archived Log entry 22130 added for T-4.S-5540 ID 0x1b27ae48 LAD:1
2018-10-01T14:04:55.583755+00:00
Thread 4 advanced to log sequence 5542 (LGWR switch)
  Current log# 42 seq# 5542 mem# 0: +RECOC1/GDPRD5U/ONLINELOG/group_42.26675.978267741
2018-10-01T14:04:55.684417+00:00
ARC3 (PID:111451): Archived Log entry 22134 added for T-4.S-5541 ID 0x1b27ae48 LAD:1
2018-10-01T14:34:56.770965+00:00
Thread 4 advanced to log sequence 5543 (LGWR switch)
  Current log# 43 seq# 5543 mem# 0: +RECOC1/GDPRD5U/ONLINELOG/group_43.26676.978267741
2018-10-01T14:34:56.870617+00:00
ARC4 (PID:111474): Archived Log entry 22138 added for T-4.S-5542 ID 0x1b27ae48 LAD:1
2018-10-01T15:04:54.979521+00:00
Thread 4 advanced to log sequence 5544 (LGWR switch)
  Current log# 44 seq# 5544 mem# 0: +RECOC1/GDPRD5U/ONLINELOG/group_44.26677.978267741
2018-10-01T15:04:55.096113+00:00
ARC5 (PID:111502): Archived Log entry 22142 added for T-4.S-5543 ID 0x1b27ae48 LAD:1
2018-10-01T15:34:56.429518+00:00
Thread 4 advanced to log sequence 5545 (LGWR switch)

$ adrci

ADRCI: Release 18.0.0.0.0 - Production on Thu Oct 4 08:48:08 2018

Copyright (c) 1982, 2018, Oracle and/or its affiliates.  All rights reserved.

ADR base = "/data/oracle/GDPRD5U/admin"
adrci> show incident

ADR Home = /data/oracle/GDPRD5U/admin/diag/rdbms/gdprd5u/GDPRD5U4:
*************************************************************************
0 rows fetched

The typical trace file look like that:

$ cat GDPRD5U4_ora_260599.trc
Trace file /data/oracle/GDPRD5U/admin/diag/rdbms/gdprd5u/GDPRD5U4/trace/GDPRD5U4_ora_260599.trc
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.1.0.0.0
Build label:    RDBMS_18.1CLOUD_LINUX.X64_180131.2
ORACLE_HOME:    /data/oracle/GDPRD5U/product/18.0
System name:    Linux
Node name:      longdpou0304.de.xx.com
Release:        4.1.12-94.7.8.el6uek.x86_64
Version:        #2 SMP Thu Jan 11 20:41:01 PST 2018
Machine:        x86_64
Storage:        Exadata
Instance name: GDPRD5U4
Redo thread mounted by this instance: 4
Oracle process number: 371
Unix process pid: 260599, image: oracle@longdpou0304.de.xx.com


*** 2018-10-04T08:03:09.003257+00:00
*** SESSION ID:(4481.8157) 2018-10-04T08:03:09.003271+00:00
*** CLIENT ID:() 2018-10-04T08:03:09.003275+00:00
*** SERVICE NAME:() 2018-10-04T08:03:09.003279+00:00
*** MODULE NAME:() 2018-10-04T08:03:09.003282+00:00
*** ACTION NAME:() 2018-10-04T08:03:09.003286+00:00
*** CLIENT DRIVER:() 2018-10-04T08:03:09.003289+00:00

IPCLW:[0.0]{-}[CNCT]:RDS: [1538640189001518]ipclw_dump_cnh NO sbuf ctx:0x40003a3eb298 pt:0x400039e5c0e8 sndh:0x400039e58440 pid:192.168.10.1:36015
IPCLW:[0.1]{-}[CNCT]:RDS: [1538640189001518]ipclw_dump_cnh NO sbuf ctx:0x40003a3eb298 pt:0x400039e5c0e8 sndh:0x400039e57138 pid:192.168.10.2:11979
IPCLW:[0.2]{-}[CNCT]:RDS: [1538640189001518]ipclw_dump_cnh NO sbuf ctx:0x40003a3eb298 pt:0x400039e5c0e8 sndh:0x400039e57790 pid:192.168.10.4:2568
IPCLW:[0.3]{-}[CNCT]:RDS: [1538640189001518]ipclw_dump_cnh NO sbuf ctx:0x40003a3eb298 pt:0x400039e5c0e8 sndh:0x400039e57de8 pid:192.168.10.6:48903
IPCLW:[0.4]{-}[CNCT]:RDS: [1538640189001518]ipclw_dump_cnh NO sbuf ctx:0x40003a3eb298 pt:0x400039e5c0e8 sndh:0x400039e56ae0 pid:192.168.10.6:15938


The RAC database (18c) was running on Full Rack Exadata X6-2 (High Capacity) on 18.1.4 EXA Image:

SELECT
 CAST(extract(xmltype(confval), '/cli-output/cell/name/text()') AS VARCHAR2(20))  cv_cellname
  , CAST(extract(xmltype(confval), '/cli-output/cell/releaseVersion/text()') AS VARCHAR2(20))  cv_cellVersion 
  , CAST(extract(xmltype(confval), '/cli-output/cell/kernelVersion/text()')  AS VARCHAR2(30))  kernel_version
  , CAST(extract(xmltype(confval), '/cli-output/cell/makeModel/text()')      AS VARCHAR2(50))  make_model
FROM 
    v$cell_config  
WHERE 
    conftype = 'CELL'
ORDER BY
    cv_cellname
/

CV_CELLNAME          CV_CELLVERSION       KERNEL_VERSION                 MAKE_MODEL
-------------------- -------------------- ------------------------------ --------------------------------------------------
longdpou03cel01      18.1.4.0.0.180125.3  4.1.12-94.7.8.el6uek.x86_64    Oracle Corporation ORACLE SERVER X6-2L High Capaci
longdpou03cel02      18.1.4.0.0.180125.3  4.1.12-94.7.8.el6uek.x86_64    Oracle Corporation ORACLE SERVER X6-2L High Capaci
longdpou03cel03      18.1.4.0.0.180125.3  4.1.12-94.7.8.el6uek.x86_64    Oracle Corporation ORACLE SERVER X6-2L High Capaci
longdpou03cel04      18.1.4.0.0.180125.3  4.1.12-94.7.8.el6uek.x86_64    Oracle Corporation ORACLE SERVER X6-2L High Capaci
longdpou03cel05      18.1.4.0.0.180125.3  4.1.12-94.7.8.el6uek.x86_64    Oracle Corporation ORACLE SERVER X6-2L High Capaci
longdpou03cel06      18.1.4.0.0.180125.3  4.1.12-94.7.8.el6uek.x86_64    Oracle Corporation ORACLE SERVER X6-2L High Capaci
longdpou03cel07      18.1.4.0.0.180125.3  4.1.12-94.7.8.el6uek.x86_64    Oracle Corporation ORACLE SERVER X6-2L High Capaci
longdpou03cel08      18.1.4.0.0.180125.3  4.1.12-94.7.8.el6uek.x86_64    Oracle Corporation ORACLE SERVER X6-2L High Capaci
longdpou03cel09      18.1.4.0.0.180125.3  4.1.12-94.7.8.el6uek.x86_64    Oracle Corporation ORACLE SERVER X6-2L High Capaci
longdpou03cel10      18.1.4.0.0.180125.3  4.1.12-94.7.8.el6uek.x86_64    Oracle Corporation ORACLE SERVER X6-2L High Capaci
longdpou03cel11      18.1.4.0.0.180125.3  4.1.12-94.7.8.el6uek.x86_64    Oracle Corporation ORACLE SERVER X6-2L High Capaci
longdpou03cel12      18.1.4.0.0.180125.3  4.1.12-94.7.8.el6uek.x86_64    Oracle Corporation ORACLE SERVER X6-2L High Capaci
longdpou03cel13      18.1.4.0.0.180125.3  4.1.12-94.7.8.el6uek.x86_64    Oracle Corporation ORACLE SERVER X6-2L High Capaci
longdpou03cel14      18.1.4.0.0.180125.3  4.1.12-94.7.8.el6uek.x86_64    Oracle Corporation ORACLE SERVER X6-2L High Capaci

14 rows selected.

SQL> select CDB from v$database;

CDB
---
NO

SQL> select INSTANCE_NAME,VERSION,STATUS,DATABASE_TYPE,HOST_NAME from gv$instance order by 1

INSTANCE_NAME    VERSION           STATUS       DATABASE_TYPE   HOST_NAME
---------------- ----------------- ------------ --------------- ----------------------------------------------------------------
GDPRD5U1         18.0.0.0.0        OPEN         RAC             longdpou0301.de.xx.com
GDPRD5U2         18.0.0.0.0        OPEN         RAC             longdpou0302.de.xx.com
GDPRD5U3         18.0.0.0.0        OPEN         RAC             longdpou0303.de.xx.com
GDPRD5U4         18.0.0.0.0        OPEN         RAC             longdpou0304.de.xx.com
       
Let's go back to the problem with traces. The was no tracing enabled on the instance/session level but still traces were generated. To check it, I used oradebug eventdump command:

SQL> oradebug setmypid
Statement processed.
SQL> oradebug eventdump session;
SQL> oradebug eventdump system;


If we look at line we can see that some inter-process  communication(IPCLW) happens(CNCT)(maybe connect) using a protocol (RDS) (and not sure but probably dumping connection header(ipclw_dump_cnh)).

IPCLW:[0.0]{-}[CNCT]:RDS: [1538640189001518]ipclw_dump_cnh NO sbuf ctx:0x40003a3eb298 pt:0x400039e5c0e8 sndh:0x400039e58440 pid:192.168.10.1:36015
IPCLW:[0.1]{-}[CNCT]:RDS: [1538640189001518]ipclw_dump_cnh NO sbuf ctx:0x40003a3eb298 pt:0x400039e5c0e8 sndh:0x400039e57138 pid:192.168.10.2:11979
IPCLW:[0.2]{-}[CNCT]:RDS: [1538640189001518]ipclw_dump_cnh NO sbuf ctx:0x40003a3eb298 pt:0x400039e5c0e8 sndh:0x400039e57790 pid:192.168.10.4:2568
IPCLW:[0.3]{-}[CNCT]:RDS: [1538640189001518]ipclw_dump_cnh NO sbuf ctx:0x40003a3eb298 pt:0x400039e5c0e8 sndh:0x400039e57de8 pid:192.168.10.6:48903
IPCLW:[0.4]{-}[CNCT]:RDS: [1538640189001518]ipclw_dump_cnh NO sbuf ctx:0x40003a3eb298 pt:0x400039e5c0e8 sndh:0x400039e56ae0 pid:192.168.10.6:1593

Ok, we have IPCLW and RDS, so this means some cluster communication over infiniband from node 4th with all nodes(GDPRD5U1 ip 192.168.10.1 and 192.168.10.2, GDPRD5U2 ip 192.168.10.4, GDPRD5U3 ip 192.168.10.6).

I didn't find anything on MOS about IPCLW but searching found a presentation from Anil Nair https://www.slideshare.net/AnilNair27/oracle-rac-features-on-exadata which explain the new feature on Exadata platform.
It's called EXAFUSION which is RAC Cache Fusion over Infiniband interconnect.

Here is a description:

A.7 What's New in Oracle Exadata Database Machine 12c Release 1 (12.1.2.1.0)
https://docs.oracle.com/cd/E80920_01/DBMSO/exadata-whatsnew.htm#DBMSO22059

A.7.10 Exafusion Direct-to-Wire Protocol
Exafusion Direct-to-Wire protocol allows database processes to read and send Oracle Real Applications Cluster (Oracle RAC) messages directly over the Infiniband network bypassing the overhead of entering the OS kernel, and running the normal networking software stack. This improves the response time and scalability of the Oracle RAC environment on Oracle Exadata Database Machine. Exafusion is especially useful for OLTP applications because per message overhead is particularly apparent in small OLTP messages.

Minimum software: Oracle Exadata Storage Server Software release 12.1.2.1.0 contains the OS, firmware, and driver support for Exafusion and Oracle Database software release 12.1.0.2.0 BP1

The EXAFUSION is using IPC LightWeight(IPCLW) component bypassing networking software stack and providing better performance which can make a difference in OLTP databases.

On the instance level we can set parameter to enable exafusion(Pls check MOS Note: EXAFUSION details and supported versions (Doc ID 2037900.1)):

SQL> show parameter exa

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
exafusion_enabled                    integer     1

Pls note that EXAFUSION is enabled by default in 12.2 but in 12.1 is still disabled. My database is already 18c so 12.2.0.2...
So let's do some test when this exafusion communication happens (and hopefully find out why the traces are generated). Check following:

SQL*Plus: Release 18.0.0.0.0 Production on Thu Oct 4 12:33:52 2018
Version 18.1.0.0.0

Copyright (c) 1982, 2017, Oracle.  All rights reserved.


Connected to:
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.1.0.0.0

SQL> set lines 300
SQL> col value for a100
---obtain trace file---
SQL> select value from v$diag_info where NAME='Default Trace File';

VALUE
----------------------------------------------------------------------------------------------------
/data/oracle/GDPRD5U/admin/diag/rdbms/gdprd5u/GDPRD5U4/trace/GDPRD5U4_ora_292811.trc

---execute any query on local instance---
 SQL> select count(*) from v$instance;

  COUNT(*)

----------
         1
---check trace file content--- SQL> !cat  /data/oracle/GDPRD5U/admin/diag/rdbms/gdprd5u/GDPRD5U4/trace/GDPRD5U4_ora_292811.trc
cat: /data/oracle/GDPRD5U/admin/diag/rdbms/gdprd5u/GDPRD5U4/trace/GDPRD5U4_ora_292811.trc: No such file or directory

---trace file doesn't exits---
---so now execute a query which required nodes communication---
  SQL> select count(*) from gv$instance;

  COUNT(*)

----------
         4

---check trace file conntent---

SQL> !cat  /data/oracle/GDPRD5U/admin/diag/rdbms/gdprd5u/GDPRD5U4/trace/GDPRD5U4_ora_292811.trc
Trace file /data/oracle/GDPRD5U/admin/diag/rdbms/gdprd5u/GDPRD5U4/trace/GDPRD5U4_ora_292811.trc
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.1.0.0.0
Build label:    RDBMS_18.1CLOUD_LINUX.X64_180131.2
ORACLE_HOME:    /data/oracle/GDPRD5U/product/18.0
System name:    Linux
Node name:      longdpou0304.de.xx.com
Release:        4.1.12-94.7.8.el6uek.x86_64
Version:        #2 SMP Thu Jan 11 20:41:01 PST 2018
Machine:        x86_64
Storage:        Exadata
Instance name: GDPRD5U4
Redo thread mounted by this instance: 4
Oracle process number: 503
Unix process pid: 292811, image: oracle@longdpou0304.de.xx.com (TNS V1-V3)


*** 2018-10-04T12:35:43.330488+00:00
*** SESSION ID:(4482.2286) 2018-10-04T12:35:43.330510+00:00
*** CLIENT ID:() 2018-10-04T12:35:43.330514+00:00
*** SERVICE NAME:(SYS$USERS) 2018-10-04T12:35:43.330518+00:00
*** MODULE NAME:(sqlplus@longdpou0304.de.xx.com (TNS V1-V3)) 2018-10-04T12:35:43.330522+00:00
*** ACTION NAME:() 2018-10-04T12:35:43.330526+00:00
*** CLIENT DRIVER:(SQL*PLUS) 2018-10-04T12:35:43.330529+00:00

IPCLW:[0.0]{-}[CNCT]:RDS: [1538656543330188]ipclw_dump_cnh NO sbuf ctx:0x4000382eac80 pt:0x4000382e6a90 sndh:0x4000382c60e0 pid:192.168.10.6:13943
IPCLW:[0.1]{-}[CNCT]:RDS: [1538656543330563]ipclw_dump_cnh NO sbuf ctx:0x4000382eac80 pt:0x4000382e6a90 sndh:0x4000382c6738 pid:192.168.10.3:45496
IPCLW:[0.2]{-}[CNCT]:RDS: [1538656543330861]ipclw_dump_cnh NO sbuf ctx:0x4000382eac80 pt:0x4000382e6a90 sndh:0x4000382c6d90 pid:192.168.10.2:63990

VOILA! Trace file was generated ... but no tracing is enabled???!!!

SQL> oradebug setmypid
Statement processed.
SQL> oradebug eventdump session;
SQL> oradebug eventdump system;



I found the way to turn off tracing because it can be real problem to your databases as all processes are communicating via interconncect.
In that place the oradebug doc component is very useful as it give us the information about components and events.

SQL> oradebug doc component rdbms.VOS

  VOS                          VOS (ks)
    hang_analysis              Hang Analysis (ksdhng)
    background_proc            Background Processes (ksb, ksbt)
    system_param               System Parameters (ksp, kspt)
    ksu                        Kernel Service User (ksu)
      ksutac                   KSU Timeout Actions ((null))
    ksv_trace                  Kernel Services Slave Management (ksv)
    file                       File I/O (ksfd, ksfdaf)
    ksq                        Kernel Service Enqueues (ksq)
    ksolt_trace                Kernel Services Lightweight Threads (ksolt)
    KSIM                       Kernel Service Instance Management (ksim)
      KSIM_GRPOP               Kernel Service Instance Management Group Operation ((null))
    KSIPC                      VOS IPC (ksipc)
      KSMSQ                    Message queue services (ksmsq)
        KSMSQ_MQL              Message Queueing Layer ((null))
      KSRMA                    ksrma (ksrma)
      KSRMF                    ksrmf (ksrmf)
      KSIPC_AM                 Active Messaging ((null))
      KSIPC_GRP                KSIPC Group Services ((null))
      KSIPC_SN                 KSIPC Shared Nothing ((null))
      KSIPC_KV                 KSIPC Key Value ((null))
      KSIPC_TOPO               KSIPC Topology Services ((null))
      KSIPC_PR                 KSIPC Path Record ((null))
      KSIPC_IPCLW              IPC LightWeight ((null))
      KSIPC_IPCOR              IPC Core Functionality ((null))
      KSIPC_SHREG              KSIPC Shared Registration ((null))
    LREG                       Listener Registration (kml)
    ksupd                      KSU Planned Draining (ksupd)

Disabling EXAFUSION Trace can be done via:

ALTER SESSION SET EVENTS='trace[RDBMS.KSIPC_IPCLW] disk disable, memory disable';
ALTER SYSTEM SET EVENTS='trace[RDBMS.KSIPC_IPCLW] disk disable, memory disable';

Let's reapeat our test:

SQL*Plus: Release 18.0.0.0.0 Production on Thu Oct 4 13:32:14 2018
Version 18.1.0.0.0

Copyright (c) 1982, 2017, Oracle.  All rights reserved.


Connected to:
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.1.0.0.0

SQL> set lines 300
SQL> col value for a100
SQL> select value from v$diag_info where NAME='Default Trace File';

VALUE
----------------------------------------------------------------------------------------------------
/data/oracle/GDPRD5U/admin/diag/rdbms/gdprd5u/GDPRD5U4/trace/GDPRD5U4_ora_212600.trc

SQL> select count(*) from v$instance;

  COUNT(*)
----------
         1

SQL> !cat /data/oracle/GDPRD5U/admin/diag/rdbms/gdprd5u/GDPRD5U4/trace/GDPRD5U4_ora_212600.trc
cat: /data/oracle/GDPRD5U/admin/diag/rdbms/gdprd5u/GDPRD5U4/trace/GDPRD5U4_ora_212600.trc: No such file or directory

SQL> select count(*) from gv$instance;

  COUNT(*)
----------
         4

SQL> !cat /data/oracle/GDPRD5U/admin/diag/rdbms/gdprd5u/GDPRD5U4/trace/GDPRD5U4_ora_212600.trc
cat: /data/oracle/GDPRD5U/admin/diag/rdbms/gdprd5u/GDPRD5U4/trace/GDPRD5U4_ora_212600.trc: No such file or directory

SQL> oradebug setmypidStatement processed.
SQL> oradebug eventdump session;
trace [RDBMS.KSIPC_IPCLW] disk disable, memory disable


No traces anymore.

Summary: The Exafusion is a new exadata feature which is enabled by default in 12.2 and the internal IPCLW tracing is ON