Objective 9.3 – Troubleshoot Common NSX Component Issues

Knowledge

  • Differentiate NSX Edge logging and troubleshooting commands
    • Logging
      • show log <last n|follow|reverse>
        • Display the system log, last n lines, follow the log, show the log in reverse order
    • Troubleshooting
      • debug packet capture: similar to tcpdump
      • debug packet display interface: similar to tcpdump, but for a specific interface
      • ping <interface> addr: ICMP ping, optionally choosing the interface
      • show …: a large number of show commands, e.g. arp, configuration [interface|dhcp|firewall|ipsec|loadbalancer|nat|ospf|syslog], ip [bgp|ospf|route]
        Too many to list here; see the NSX CLI Guide for more detail
  • Verify NSX Controller cluster status and roles
    • SSH to one of your controller VMs to use the CLI
      # show control-cluster status
      Type                Status                                       Since
      --------------------------------------------------------------------------------
      Join status:        Join complete                                09/14 14:08:46
      Majority status:    Connected to cluster majority                09/18 08:45:16
      Restart status:     This controller can be safely restarted      09/18 08:45:06
      Cluster ID:         b20ddc88-cd62-49ad-b120-572c23108520
      Node UUID:          b20ddc88-cd62-49ad-b120-572c23108520
      # show control-cluster roles
                                Listen-IP  Master?    Last-Changed  Count
      api_provider         Not configured      Yes  09/18 08:45:17      6
      persistence_server              N/A      Yes  09/18 08:45:17      5
      switch_manager            127.0.0.1      Yes  09/18 08:45:17      6
      logical_manager                 N/A      Yes  09/18 08:45:17      6
      directory_server                N/A      Yes  09/18 08:45:17      6
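
The status check above can be scripted once the CLI output has been captured as text. The following is a minimal sketch, not an official tool: the function names parse_status and is_healthy are hypothetical, the sample is the output shown above, and the assumption that "Join complete" plus "Connected to cluster majority" means a healthy node is taken from that sample alone.

```python
import re

# Sample text copied from the "show control-cluster status" output above.
SAMPLE_STATUS = """\
Type                Status                                       Since
--------------------------------------------------------------------------------
Join status:        Join complete                                09/14 14:08:46
Majority status:    Connected to cluster majority                09/18 08:45:16
Restart status:     This controller can be safely restarted      09/18 08:45:06
Cluster ID:         b20ddc88-cd62-49ad-b120-572c23108520
Node UUID:          b20ddc88-cd62-49ad-b120-572c23108520
"""


def parse_status(text):
    """Return {field name: status text} for every 'Name:  value' line."""
    fields = {}
    for line in text.splitlines():
        # Columns are separated by runs of two or more spaces.
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) >= 2 and parts[0].endswith(":"):
            fields[parts[0][:-1]] = parts[1]
    return fields


def is_healthy(fields):
    """Assumed health rule: node has joined and sees the cluster majority."""
    return (
        fields.get("Join status") == "Join complete"
        and fields.get("Majority status") == "Connected to cluster majority"
    )


if __name__ == "__main__":
    status = parse_status(SAMPLE_STATUS)
    print(status["Join status"], "|", status["Majority status"])
    print("healthy:", is_healthy(status))
```

How the output is collected (for example over SSH) is left out on purpose; the sketch only covers the parsing step.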
  • Verify NSX Controller node connectivity
    • # show control-cluster connections
      role                port            listening open conns
      --------------------------------------------------------
      api_provider        api/443         Y         1
      --------------------------------------------------------
      persistence_server  server/2878     Y         2
                          client/2888     Y         3
                          election/3888   Y         0
      --------------------------------------------------------
      switch_manager      ovsmgmt/6632    Y         0
                          openflow/6633   Y         0
      -------------------------------------------------------- 
      system              cluster/7777    Y         2

      The Controller cluster majority leader will be listening on port 2878; other nodes show “-” in the listening column. The number of “open conns” on the persistence_server server/2878 line should equal the number of remaining nodes in the cluster, e.g. 2 for a 3-node cluster.
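
The rule just described (the leader listens on server/2878 and should hold one open connection per remaining node) can be checked mechanically. This is a rough sketch under stated assumptions: parse_connections and check_majority_leader are hypothetical names, the sample is the output above with its columns realigned, and the parser assumes every data row has either four whitespace-separated fields or three when the role column is blank.

```python
# Sample text based on the "show control-cluster connections" output above.
SAMPLE_CONNECTIONS = """\
role                port            listening open conns
--------------------------------------------------------
api_provider        api/443         Y         1
--------------------------------------------------------
persistence_server  server/2878     Y         2
                    client/2888     Y         3
                    election/3888   Y         0
--------------------------------------------------------
switch_manager      ovsmgmt/6632    Y         0
                    openflow/6633   Y         0
--------------------------------------------------------
system              cluster/7777    Y         2
"""


def parse_connections(text):
    """Return a list of (role, port, listening, open_conns) tuples."""
    rows = []
    role = None
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines, the header row and the dashed separators.
        if not tokens or tokens[0] == "role" or set(line.strip()) == {"-"}:
            continue
        if len(tokens) == 4:
            # A new role starts; continuation rows reuse the last role seen.
            role = tokens[0]
            tokens = tokens[1:]
        if len(tokens) != 3 or role is None:
            continue
        port, listening, conns = tokens
        rows.append((role, port, listening, int(conns)))
    return rows


def check_majority_leader(rows, cluster_size):
    """Return True/False for the leader, or None if this node is not the leader."""
    for role, port, listening, conns in rows:
        if role == "persistence_server" and port == "server/2878":
            if listening != "Y":
                return None  # "-" in the listening column: not the leader
            return conns == cluster_size - 1
    return None


if __name__ == "__main__":
    rows = parse_connections(SAMPLE_CONNECTIONS)
    print("leader sees all peers:", check_majority_leader(rows, cluster_size=3))
```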

  • Check NSX Controller API service
    • # show control-cluster connections
      role                port            listening open conns
      --------------------------------------------------------
      api_provider        api/443         Y         1
      --------------------------------------------------------
  • Validate VXLAN and Logical Router mapping tables
    • VXLAN: from an ESXi host, use the esxcli command line
      # esxcli network vswitch dvs vmware vxlan network vtep list --vds-name <value> --vxlan-id <value>
        [--segment-id <value> --vtep-ip <value>]
      IP                  Segment ID        Is MTEP 
      192.168.0.2         192.168.0.0       False
    • Logical Router
      From the NSX controller

      show control-cluster logical-routers instance all

      This gives the LR instance IDs

      show control-cluster logical-routers interface-summary [instance ID] 
      Interface             Type        Id         IP[]
      lif0                  vlan        0          10.0.0.0/24
      lif1                  vlan        101        10.0.1.0/24
      lif2                  vxlan       5020       172.16.10.1/24
  • List Logical Router instances and statistics
    • List instances
      show control-cluster logical-routers instance all

      or

      show control-cluster logical-routers
    • Statistics
      show control-cluster logical-routers stats
  • Verify Logical Router interface and route mapping tables
    • # show control-cluster logical-routers interface-summary 1
      Interface             Type        Id         IP[]
      lif0                  vlan        0          10.0.0.0/24
      lif1                  vlan        1          10.0.1.0/24
    • # show control-cluster logical-routers routes 1
      LR-Id       Destination         Next-Hop
      1           70.70.70.0/24       10.0.1.2
      1           80.80.80.0/24       10.0.0.2
  • Verify active controller connections
    • # show control-cluster core stats
      messages.received 40
      messages.received.dropped 0
      messages.transmitted 22
      messages.transmit.dropped 0
      messages.processing.dropped 0
      connections.up 2
      connections.down 0
      connections.timeout 0
      connections.active 2
      connections.sharding.subscribed 0
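
When comparing these counters across several controllers it can help to turn them into a dictionary and flag the non-zero error counters. A small sketch follows; parse_core_stats and suspicious_counters are hypothetical names, and the choice of which counters count as "suspicious" (anything ending in .dropped, plus connections.down and connections.timeout) is an assumption made for illustration, not VMware guidance.

```python
# Sample text copied from the "show control-cluster core stats" output above.
SAMPLE_STATS = """\
messages.received 40
messages.received.dropped 0
messages.transmitted 22
messages.transmit.dropped 0
messages.processing.dropped 0
connections.up 2
connections.down 0
connections.timeout 0
connections.active 2
connections.sharding.subscribed 0
"""


def parse_core_stats(text):
    """Return {counter name: integer value} for each 'name value' line."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def suspicious_counters(stats):
    """Return the non-zero counters assumed to indicate trouble."""
    watched = ("connections.down", "connections.timeout")
    return {
        name: value
        for name, value in stats.items()
        if value > 0 and (name.endswith(".dropped") or name in watched)
    }


if __name__ == "__main__":
    stats = parse_core_stats(SAMPLE_STATS)
    print("active connections:", stats["connections.active"])
    print("suspicious:", suspicious_counters(stats))
```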
  • View Bridge instances and learned MAC addresses
    • Dump bridge info
      # net-vdr --bridge -l <vdrName>
      
      VDR default+edge-1:1460487509 Bridge Information :
      
      Bridge config:
      Name:id             mybridge:1
      Portset name:
      DVS name:           Mgmt_Edge_VDS
      Ref count:          2
      Number of networks: 2
      Number of uplinks:  0
       
              Network 'vlan-100-type-bridging' config:
              Ref count:          2
              Network type:       1
              VLAN ID:            100
              VXLAN ID:           0
              Ageing time:        300
              Fdb entry hold time:1
              FRP filter enable:  1
      
                      Network port '50331655' config:
                      Ref count:          2
                      Port ID:            0x3000007
                      VLAN ID:            4095
                      IOChains installed: 0
      
              Network 'vxlan-5000-type-bridging' config:
              Ref count:          2
              Network type:       1
              VLAN ID:            0
              VXLAN ID:           5000
              Ageing time:        300
              Fdb entry hold time:1
              FRP filter enable:  1
      
                      Network port '50331655' config:
                      Ref count:          2
                      Port ID:            0x3000007
                      VLAN ID:            4095
                      IOChains installed: 0
    • List the MAC table, learned on both the VXLAN and VLAN sides
      # net-vdr -b --mac default+edge-1
      
      VDR default+edge-1:1460487509 Bridge Information :
      
      Network 'vlan-100-type-bridging' MAC address table:
      MAC table on PortID:              0x0
      MAC table paging mode:            0
      Single MAC address enable:        0
      Single MAC address:               00:00:00:00:00:00
      MAC table last entry shown:       00:50:56:91:5e:93 VLAN-VXLAN: 100-0 Port: 50331661
      total number of MAC addresses:    1
      number of MAC addresses returned: 1
      MAC addresses:
      Destination Address  Address Type  VLAN ID  VXLAN ID  Destination Port  Age
      -------------------  ------------  -------  --------  ----------------  ---
      00:50:56:91:5e:93    Dynamic           100         0          50331661  0
       
      Network 'vxlan-5000-type-bridging' MAC address table:
      MAC table on PortID:              0x0
      MAC table paging mode:            0
      Single MAC address enable:        0
      Single MAC address:               00:00:00:00:00:00
      MAC table last entry shown:       00:50:56:ae:9b:be VLAN-VXLAN: 0-5000 Port: 50331650
      total number of MAC addresses:    1
      number of MAC addresses returned: 1
      MAC addresses:
      Destination Address  Address Type  VLAN ID  VXLAN ID  Destination Port  Age
      -------------------  ------------  -------  --------  ----------------  ---
      00:50:56:ae:9b:be    Dynamic             0      5000          50331650  0
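
To check whether a given MAC address was learned on the VLAN or the VXLAN side of the bridge, the table rows above can be pulled out of the captured text. This is a sketch only: parse_bridge_macs is a hypothetical name, the sample is an abridged copy of the output above, and the regular expressions assume the exact layout shown in this post (other NSX versions may format the table differently).

```python
import re

# Abridged sample based on the "net-vdr -b --mac" output above.
SAMPLE_MAC_OUTPUT = """\
Network 'vlan-100-type-bridging' MAC address table:
total number of MAC addresses:    1
MAC addresses:
Destination Address  Address Type  VLAN ID  VXLAN ID  Destination Port  Age
-------------------  ------------  -------  --------  ----------------  ---
00:50:56:91:5e:93    Dynamic           100         0          50331661  0

Network 'vxlan-5000-type-bridging' MAC address table:
total number of MAC addresses:    1
MAC addresses:
Destination Address  Address Type  VLAN ID  VXLAN ID  Destination Port  Age
-------------------  ------------  -------  --------  ----------------  ---
00:50:56:ae:9b:be    Dynamic             0      5000          50331650  0
"""

NETWORK_RE = re.compile(r"^Network '([^']+)' MAC address table:")
ENTRY_RE = re.compile(
    r"^([0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5})\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)$"
)


def parse_bridge_macs(text):
    """Return one dict per learned MAC, tagged with its bridge network name."""
    entries = []
    network = None
    for raw in text.splitlines():
        line = raw.strip()
        header = NETWORK_RE.match(line)
        if header:
            network = header.group(1)
            continue
        row = ENTRY_RE.match(line)
        if row:
            mac, kind, vlan, vxlan, port, age = row.groups()
            entries.append({
                "network": network,
                "mac": mac.lower(),
                "type": kind,
                "vlan_id": int(vlan),
                "vxlan_id": int(vxlan),
                "port": int(port),
                "age": int(age),
            })
    return entries


if __name__ == "__main__":
    for entry in parse_bridge_macs(SAMPLE_MAC_OUTPUT):
        side = "VXLAN" if entry["vxlan_id"] else "VLAN"
        print(entry["mac"], "learned on the", side, "side of", entry["network"])
```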
    • Display Logical Router instances
      • # net-vdr --instance -l
        
        VDR Instance Information :
        ---------------------------
        VDR Instance:               default+edge-1:1460487509
        Vdr Name:                   default+edge-1
        Vdr Id:                     1460487509
        Number of Lifs:             3
        Number of Routes:           1
        State:                      Enabled
        Controller IP:              192.168.110.201
        Control Plane Active:       Yes
        Control Plane IP:           192.168.110.52
        Edge Active:                Yes
    • Verify NSX Manager services status
      • Service status can be viewed through the NSX Manager web interface
    • View Logical Interfaces and routing tables
      • Logical interfaces: from the CLI on an ESXi host
        # net-vdr --lif -l default+edge-1
        
        VDR default+edge-1:1460487509 LIF Information :
        
        Name:                570d45550000000c
        Mode:                Routing, Distributed, Internal
        Id:                  Vxlan:5004
        Ip(Mask):            10.10.10.1(255.255.255.0)
        Connected Dvs:       Mgmt_Edge_VDS
        VXLAN Control Plane: Enabled
        VXLAN Multicast IP:  0.0.0.1
        State:               Enabled
        Flags:               0x2288
        
        Name:                570d45550000000b
        Mode:                Bridging, Sedimented, Internal
        Id:                  Vlan:100
        Bridge Id:           mybridge:1
        Ip(Mask):            0.0.0.0(0.0.0.0)
        Connected Dvs:       Mgmt_Edge_VDS
        Designated Instance: No
        DI IP:               192.168.110.51
        State:               Enabled
        Flags:               0xd4
        
        Name:                570d45550000000a
        Mode:                Bridging, Sedimented, Internal
        Id:                  Vxlan:5000
        Bridge Id:           mybridge:1
        Ip(Mask):            0.0.0.0(0.0.0.0)
        Connected Dvs:       Mgmt_Edge_VDS
        VXLAN Control Plane: Enabled
        VXLAN Multicast IP:  0.0.0.1
        State:               Enabled
        Flags:               0x23d4
      • Routing
        # net-vdr -R -l default+edge-1
        
        VDR default+edge-1:1460487509 Route Table
        Legend: [U: Up], [G: Gateway], [C: Connected], [I: Interface]
        Legend: [H: Host], [F: Soft Flush] [!: Reject]
         
        Destination      GenMask          Gateway          Flags    Ref Origin   UpTime     Interface
        -----------      -------          -------          -----    --- ------   ------     ---------
        10.10.10.0       255.255.255.0    0.0.0.0          UCI      1   MANUAL   410777     570d45550000000c
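
A common follow-up question is "which route, and therefore which LIF, would a given destination IP use?". The sketch below parses the route table above and performs a longest-prefix match with Python's standard ipaddress module. The names parse_vdr_routes and lookup are hypothetical, and the parser assumes the eight-column layout shown in this post.

```python
import ipaddress

# Sample text copied from the "net-vdr -R -l" output above.
SAMPLE_ROUTES = """\
VDR default+edge-1:1460487509 Route Table
Legend: [U: Up], [G: Gateway], [C: Connected], [I: Interface]
Legend: [H: Host], [F: Soft Flush] [!: Reject]

Destination      GenMask          Gateway          Flags    Ref Origin   UpTime     Interface
-----------      -------          -------          -----    --- ------   ------     ---------
10.10.10.0       255.255.255.0    0.0.0.0          UCI      1   MANUAL   410777     570d45550000000c
"""


def parse_vdr_routes(text):
    """Return a list of route dicts; non-route lines are skipped."""
    routes = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) != 8:
            continue
        dest, mask, gateway, flags, _ref, origin, _uptime, interface = tokens
        try:
            # ip_network accepts the "address/netmask" form.
            network = ipaddress.ip_network(f"{dest}/{mask}")
        except ValueError:
            continue  # header, separator or legend line
        routes.append({
            "network": network,
            "gateway": gateway,
            "flags": flags,
            "origin": origin,
            "interface": interface,
        })
    return routes


def lookup(routes, ip):
    """Return the most specific route containing ip, or None."""
    address = ipaddress.ip_address(ip)
    matches = [r for r in routes if address in r["network"]]
    if not matches:
        return None
    return max(matches, key=lambda r: r["network"].prefixlen)


if __name__ == "__main__":
    routes = parse_vdr_routes(SAMPLE_ROUTES)
    hit = lookup(routes, "10.10.10.25")
    print(hit["network"], "via LIF", hit["interface"])
```

The interface value returned matches the LIF name reported by net-vdr --lif -l, so the two outputs can be cross-referenced.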
    • Analyze NSX Edge statistics
      • Log in to the vSphere Web Client.
      • Click Networking & Security and then click NSX Edges.
      • Double-click an NSX Edge.
      • Click the Monitor tab.
      • Select the period for which you want to view the statistics.

Tools

  • NSX Administration Guide
  • NSX Command Line Interface Reference Guide
  • NSX API Guide
  • NSX Controller CLI
  • NSX Edge CLI
  • NSX API
  • vSphere Web Client
  • VDS Health Check
  • net-vdr
  • http://www.yet.org/2014/09/nsxv-troubleshooting/  – very useful for this section (it’s where I’ve pulled a lot of the above from)
