We detected an anomaly on the active sup of both gra-g1-n7 and gra-g2-n7: both memory cards are in error (raid dead).
The same applies to 1 memory card on the standby sup of gra-g1-n7 (raid degraded).
gra-g2-n7# show system internal raid | i i cmos|block | head line 5
RAID data from CMOS = 0xa5 0xc3 <<<<
77888 blocks [2/1] [_U]
78400 blocks [2/1] [_U]
39424 blocks [2/1] [_U]
1802240 blocks [2/1] [_U]
gra-g2-n7# slot 6 show system internal raid | i i cmos|block | head line 5
RAID data from CMOS = 0xa5 0xf0 <<<<
77888 blocks [2/2] [UU]
78400 blocks [2/2] [UU]
39424 blocks [2/2] [UU]
1802240 blocks [2/2] [UU]
gra-g1-n7# show system internal raid | i i cmos|block | head line 5
RAID data from CMOS = 0xa5 0xc3
77888 blocks [2/1] [_U]
78400 blocks [2/1] [_U]
39424 blocks [2/1] [_U]
1802240 blocks [2/1] [_U]
gra-g1-n7# slot 6 show system internal raid | i i cmos|block | head line 5
RAID data from CMOS = 0xa5 0xe1
77888 blocks [2/1] [_U]
78400 blocks [2/1] [_U]
39424 blocks [2/1] [_U]
1802240 blocks [2/1] [_U]
gra-g1-n7# exit
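The filtered outputs above can be checked mechanically. The following is a minimal Python sketch, not a Cisco tool: the function name `parse_raid_summary` and the status table are illustrative. The meanings of 0xf0, 0xe1 and 0xc3 follow the states described in this report; the 0xd2 entry and the primary/mirror distinction are assumptions.

```python
import re

# Meaning of the second CMOS byte. 0xf0, 0xe1 and 0xc3 match the states
# seen in this report; 0xd2 and the primary/mirror split are assumptions.
CMOS_STATUS = {
    0xF0: "healthy",
    0xE1: "one flash failed (assumed: primary)",
    0xD2: "one flash failed (assumed: mirror)",
    0xC3: "both flashes failed",
}

CMOS_RE = re.compile(r"RAID data from CMOS = 0x[0-9a-f]+ 0x([0-9a-f]+)", re.I)
BLOCK_RE = re.compile(r"(\d+) blocks \[(\d+)/(\d+)\] \[([U_]+)\]")


def parse_raid_summary(text):
    """Return (cmos_status, degraded_arrays) for one filtered command output."""
    cmos = None
    degraded = []
    for line in text.splitlines():
        m = CMOS_RE.search(line)
        if m:
            cmos = CMOS_STATUS.get(int(m.group(1), 16), "unknown")
            continue
        m = BLOCK_RE.search(line)
        if m:
            blocks, total, active, flags = m.groups()
            # An array is degraded when fewer members are active than expected.
            if int(active) < int(total) or "_" in flags:
                degraded.append((int(blocks), flags))
    return cmos, degraded
```

Feeding it the output of one `show system internal raid | i i cmos|block` call returns the decoded CMOS status and the list of arrays that are missing a member.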
This is a known Cisco bug affecting the sup2E (CSCus22805).
A fix exists: we need to run a Cisco tool that repairs the raid.
To do so, at least one of the 2 memory cards must be UP on the sup.
Starting at midnight, we will perform a switchover to the standby sup of gra-g2-n7 => no downtime
During the switchover the former active sup reloads; we expect to recover at least one memory card, which will let us launch the Cisco tool.
Once fixed => switchover again to return to the initial state before the maintenance.
We will proceed the same way for gra-g1-n7
If the switchover does not recover one of the 2 memory cards, we will proceed with an RMA of the sup
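The plan above boils down to a small decision rule per supervisor. The sketch below restates it in Python for illustration only; the function name `next_step`, its parameters and the returned labels are hypothetical and do not correspond to any NX-OS command.

```python
def next_step(flashes_up, reloaded):
    """Pick the next maintenance action for one supervisor.

    flashes_up: number of memory cards (0 to 2) currently usable on the sup.
    reloaded:   True once the sup has already been reloaded via a switchover.
    """
    if flashes_up == 2:
        return "nothing to repair"
    if flashes_up == 1:
        # The recovery tool needs at least one working card.
        return "run recovery tool"
    # No usable card: the tool cannot run yet.
    if not reloaded:
        return "switchover to reload the sup"
    return "RMA the sup"
```

For example, an active sup with both cards dead first gets a switchover; if the reload brings one card back the tool is run, otherwise the sup goes to RMA.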
Update(s):
Date: 2016-03-10 00:50:08 UTC gra-g1-n7# sh system redundancy status
Redundancy mode
---------------
administrative: HA
operational: HA
This supervisor (sup-1)
-----------------------
Redundancy state: Active
Supervisor state: Active
Internal state: Active with HA standby
Other supervisor (sup-2)
------------------------
Redundancy state: Standby
Supervisor state: HA standby
Internal state: HA standby
Everything is OK! We can resume normal activity.
Date: 2016-03-10 00:45:25 UTC User Access Verification
2016 Mar 10 02:38:40 gra-g1-n7 %$ VDC-1 %$ %USBHSD-2-MOUNT: logflash: online
gra-g1-n7(standby) login: 2016 Mar 10 02:45:10 gra-g1-n7 %$ VDC-1 %$ Mar 10 02:45:10 %KERN-2-SYSTEM_MSG: [ 489.298936] Switchover started by redundancy driver - kernel
2016 Mar 10 02:45:10 gra-g1-n7 %$ VDC-1 %$ %SYSMGR-2-HASWITCHOVER_PRE_START: This supervisor is becoming active (pre-start phase).
2016 Mar 10 02:45:10 gra-g1-n7 %$ VDC-1 %$ %SYSMGR-2-HASWITCHOVER_START: Supervisor 5 is becoming active.
2016 Mar 10 02:45:11 gra-g1-n7 %$ VDC-1 %$ %SYSMGR-2-SWITCHOVER_OVER: Switchover completed.
User Access Verification
gra-g1-n7 login:
User Access Verification
Date: 2016-03-10 00:44:54 UTC gra-g1-n7# slot 5 show system internal raid
RAID data from CMOS = 0xa5 0xf0
RAID data from driver disks 0 bad 0 name
Current RAID status info:
Bootflash: /dev/sdc
Mirrorflash: /dev/sdd
md5 : active raid1 sdc5[0] sdd5[1]
78400 blocks [2/2] [UU]
md4 : active raid1 sdc4[0] sdd4[1]
39424 blocks [2/2] [UU]
md3 : active raid1 sdc3[0] sdd3[1]
1802240 blocks [2/2] [UU]
gra-g1-n7# slot 6 show system internal raid
RAID data from CMOS = 0xa5 0xf0
RAID data from driver disks 2 bad 1 name sdc3
Current RAID status info:
Bootflash: /dev/sdc
Mirrorflash: /dev/sdb
md5 : active raid1 sdc5[0] sdb5[1]
78400 blocks [2/2] [UU]
md4 : active raid1 sdc4[0] sdb4[1]
39424 blocks [2/2] [UU]
md3 : active raid1 sdc3[0] sdb3[1]
1802240 blocks [2/2] [UU]
Date: 2016-03-10 00:42:39 UTC gra-g1-n7# slot 6 show system internal raid
RAID data from CMOS = 0xa5 0xf0
RAID data from driver disks 2 bad 1 name sdc3
Current RAID status info:
Bootflash: /dev/sdc
Mirrorflash: /dev/sdb
md3 : active raid1 sdc3[2] sdb3[1]
1802240 blocks [2/1] [_U]
[==>..................] recovery = 11.1% (201152/1802240) finish=2.2min speed=11832K/sec
gra-g1-n7# slot 5 show system internal raid
RAID data from CMOS = 0xa5 0xf0
RAID data from driver disks 0 bad 0 name
Current RAID status info:
Bootflash: /dev/sdc
Mirrorflash: /dev/sdd
bootflash:n7000-s2-flash-recovery-tool.10.0.2.gbin
bootflash:n7000-s2-flash-recovery-tool.10.0.2.tar.gz
gra-g1-n7# load bootflash:n7000-s2-flash-recovery-tool.10.0.2.gbin
Loading plugin version 10.0(2)
###############################################################
Warning: debug-plugin is for engineering internal use only!
For security reason, plugin image has been deleted.
###############################################################
INFO: Running on active slot 6, checking if a ha-standby is available...
INFO: Standby present in slot 5. Copying the recovery tool...
###############################################################
Warning: debug-plugin is for engineering internal use only!
For security reason, plugin image has been deleted.
###############################################################
INFO: Running on the standby in slot 5, Checking RAID status...
INFO: Primary=sdc(sdc) Secondary=sdd(sdd) Working=sdd
WARNING: Attempting recovery of primary device sdc
INFO: Removing /dev/sdc from RAID configuration...
INFO: Resetting primary flash...
INFO: Found primary device sdc in 9 seconds.
INFO: Running health checks on the recovered device /dev/sdc...
INFO: Basic I/O tests passed. /dev/sdc looks healthy and responsive.
INFO: Verifying RAID configuration. Got primary=sdc Secondary=sdd
INFO: Adding sdc3 back into md3 RAID configuration...
INFO: sdd3 is already a part of md3.
INFO: Adding sdc4 back into md4 RAID configuration...
INFO: sdd4 is already a part of md4.
INFO: Adding sdc5 back into md5 RAID configuration...
INFO: sdd5 is already a part of md5.
INFO: Adding sdc6 back into md6 RAID configuration...
INFO: sdd6 is already a part of md6.
INFO: Resetting RAID status in CMOS...
WARNING: Flash recovery attempted on module 5.
INFO: A detailed copy of the this log was saved as volatile:flash_repair_log_mod5.tgz.
INFO: Recovery procedures complete on module 5.
INFO: Please check for any errors in previous messages.
INFO: Run 'show system internal file /proc/mdstat' and check 'up status' [UU] for all disks.
INFO: Run 'show diagnostic result module ' on all available supervisor slots.
INFO: And restart CompactFlash test (7) instances if not in running state.
Loading plugin version 10.0(2)
INFO: Now starting the flash recovery procedures on active.
INFO: Primary=sdc(sdc) Secondary=sdb(sdb) Working=sdb
WARNING: Attempting recovery of primary device sdc
INFO: Removing /dev/sdc from RAID configuration...
INFO: Resetting primary flash...
INFO: Found primary device sdc in 9 seconds.
INFO: Running health checks on the recovered device /dev/sdc...
INFO: Basic I/O tests passed. /dev/sdc looks healthy and responsive.
INFO: Verifying RAID configuration. Got primary=sdc Secondary=sdb
INFO: Adding sdc3 back into md3 RAID configuration...
INFO: sdb3 is already a part of md3.
INFO: Adding sdc4 back into md4 RAID configuration...
INFO: sdb4 is already a part of md4.
INFO: Adding sdc5 back into md5 RAID configuration...
INFO: sdb5 is already a part of md5.
INFO: Adding sdc6 back into md6 RAID configuration...
INFO: sdb6 is already a part of md6.
INFO: Resetting RAID status in CMOS...
WARNING: Flash recovery attempted on module 6.
INFO: A detailed copy of the this log was saved as volatile:flash_repair_log_mod6.tgz.
INFO: Recovery procedures complete on module 6.
INFO: Please check for any errors in previous messages.
INFO: Run 'show system internal file /proc/mdstat' and check 'up status' [UU] for all disks.
INFO: Run 'show diagnostic result module ' on all available supervisor slots.
INFO: And restart CompactFlash test (7) instances if not in running state.
gra-g1-n7#
gra-g1-n7#
Date: 2016-03-10 00:33:27 UTC gra-g1-n7#
gra-g1-n7# out-of-service module 5
gra-g1-n7# 2016 Mar 10 02:33:09 gra-g1-n7 %$ VDC-1 %$ %PLATFORM-2-MOD_PWRDN: Module 5 powered down (Serial number )
Date: 2016-03-10 00:32:52 UTC well, actually no...
Unmounting file systems...
Making partitions on physical devices...
Partitioning of /dev/sdb failed
Copying saved files back to bootflash...
/dev/md3: Invalid argument
mount: /dev/md3: can't read superblock
Failed to mount bootflash.
Checking obfl filesystem.r
Checking all filesystems..... done.
Starting mcelog daemon
##############################################################
Disk initialization can take over 5 minutes. To avoid interruption,
please run 'system standby manual-boot' on active supervisor
##############################################################
Initializing the system...
Unmounting file systems...
Stopping RAID services...
Making partitions on physical devices...
INIT: Sending processes the TERM signal
INIT: Sending processes the KILL signal
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac Copyright (c) 2002-2013, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are
owned by other third parties and used and distributed under
license. Certain components of this software are licensed under
the GNU General Public License (GPL) version 2.0 or the GNU
Lesser General Public License (LGPL) Version 2.1. A copy of each
such license is available at
http://www.opensource.org/licenses/gpl-2.0.php and
http://www.opensource.org/licenses/lgpl-2.1.php switch(boot)#
power off / power on
Date: 2016-03-10 00:31:57 UTC the sup seems to be booting.
Reset Reason Registers: 0x1 0x0
Filesystem type is ext2fs, partition type 0x83
GNU GRUB version 0.97
Autobooting bootflash:/n7000-s2-kickstart.6.2.6.bin bootflash:/n7000-s2-dk9.6.2.6.bin...
Filesystem type is ext2fs, partition type 0x83
Booting kickstart image: bootflash:/n7000-s2-kickstart.6.2.6.bin....
...............................................................................
..........................................
Kickstart digital signature verification Successful
Image verification OK
INIT: version 2/etc/rc.d/rcS.d/S05usb-devs-init: line 273: 2080 Segmentation fault mdadm --assemble /dev/md$dev --verbose $boot_node$dev $boot_mir_node$dev >> /var/log/mdstat.boot 2>&1
RAID assembly failed. Stopping all RAID partitions...
Trying to mount bootflash /dev/sdd3...
Mounted primary /dev/sdd3 as /bootflash
Existing bootflash found, saving files...
Saving 20130124_025211_poap_6564_init.log
Saving 20130124_060701_poap_7064_init.log
Saving 20130321_130854_poap_6973_init.log
Date: 2016-03-10 00:31:33 UTC User Access Verification
gra-g1-n7(standby) login: 2016 Mar 10 02:30:21 gra-g1-n7 %$ VDC-1 %$ Mar 10 02:30:21 %KERN-2-SYSTEM_MSG: [64030798.995927] Switchover started by redundancy driver - kernel
2016 Mar 10 02:30:22 gra-g1-n7 %$ VDC-1 %$ %SYSMGR-2-HASWITCHOVER_PRE_START: This supervisor is becoming active (pre-start phase).
2016 Mar 10 02:30:22 gra-g1-n7 %$ VDC-1 %$ %SYSMGR-2-HASWITCHOVER_START: Supervisor 6 is becoming active.
2016 Mar 10 02:30:22 gra-g1-n7 %$ VDC-1 %$ %SYSMGR-2-SWITCHOVER_OVER: Switchover completed.
User Access Verification
gra-g1-n7 login:
User Access Verification
Date: 2016-03-10 00:23:19 UTC we are moving on to gra-g1-n7
Date: 2016-03-10 00:16:32 UTC gra-g2-n7(standby) login: 2016 Mar 10 02:16:10 gra-g2-n7 %$ VDC-1 %$ Mar 10 02:16:10 %KERN-2-SYSTEM_MSG: [ 604.955502] Switchover started by redundancy driver - kernel
2016 Mar 10 02:16:10 gra-g2-n7 %$ VDC-1 %$ %SYSMGR-2-HASWITCHOVER_PRE_START: This supervisor is becoming active (pre-start phase).
2016 Mar 10 02:16:10 gra-g2-n7 %$ VDC-1 %$ %SYSMGR-2-HASWITCHOVER_START: Supervisor 5 is becoming active.
2016 Mar 10 02:16:11 gra-g2-n7 %$ VDC-1 %$ %SYSMGR-2-SWITCHOVER_OVER: Switchover completed.
User Access Verification
gra-g2-n7 login:
User Access Verification
gra-g2-n7 login:
Date: 2016-03-10 00:15:57 UTC it looks OK
we are doing the switchover in the other direction
Date: 2016-03-10 00:15:03 UTC after a poweroff / no poweroff, the sup boots again, hooray!
Autobooting bootflash:/n7000-s2-kickstart.6.2.6.bin bootflash:/n7000-s2-dk9.6.2.6.bin...
Filesystem type is ext2fs, partition type 0x83
Booting kickstart image: bootflash:/n7000-s2-kickstart.6.2.6.bin....
...............................................................................
..........................................
Kickstart digital signature verification Successful
Image verification OK
INIT: version 2Checking obfl filesystem.
Checking all filesystems..r.r.r.r done.
Starting mcelog daemon
r/bootflash//n7000-s2-dk9.6.2.6.bin read done
Loading system software
System image digital signature verification successful.
Uncompressing system image: bootflash:/n7000-s2-dk9.6.2.6.bin Thu Mar 10 02:07:10 CEST 2016
blogger: nothing to do.
C
we recovered one of the 2 flash cards, launching the Cisco tool!
gra-g2-n7# load bootflash:n7000-s2-f
bootflash:n7000-s2-flash-recovery-tool.10.0.2.gbin
bootflash:n7000-s2-flash-recovery-tool.10.0.2.tar.gz
gra-g2-n7# load bootflash:n7000-s2-flash-recovery-tool.10.0.2.gbin
Loading plugin version 10.0(2)
###############################################################
Warning: debug-plugin is for engineering internal use only!
For security reason, plugin image has been deleted.
###############################################################
INFO: Running on active slot 6, checking if a ha-standby is available...
INFO: Standby present in slot 5. Copying the recovery tool...
###############################################################
Warning: debug-plugin is for engineering internal use only!
For security reason, plugin image has been deleted.
###############################################################
INFO: Running on the standby in slot 5, Checking RAID status...
INFO: Primary=sdd(sdd) Secondary=sdc(sdc) Working=sdc
WARNING: Attempting recovery of primary device sdd
INFO: Removing /dev/sdd from RAID configuration...
INFO: Resetting primary flash...
INFO: Found primary device sdd in 9 seconds.
INFO: Running health checks on the recovered device /dev/sdd...
INFO: Basic I/O tests passed. /dev/sdd looks healthy and responsive.
INFO: Verifying RAID configuration. Got primary=sdd Secondary=sdc
INFO: Adding sdd3 back into md3 RAID configuration...
INFO: sdc3 is already a part of md3.
INFO: Adding sdd4 back into md4 RAID configuration...
INFO: sdc4 is already a part of md4.
INFO: Adding sdd5 back into md5 RAID configuration...
INFO: sdc5 is already a part of md5.
INFO: Adding sdd6 back into md6 RAID configuration...
INFO: sdc6 is already a part of md6.
INFO: Resetting RAID status in CMOS...
WARNING: Flash recovery attempted on module 5.
INFO: A detailed copy of the this log was saved as volatile:flash_repair_log_mod5.tgz.
INFO: Recovery procedures complete on module 5.
INFO: Please check for any errors in previous messages.
INFO: Run 'show system internal file /proc/mdstat' and check 'up status' [UU] for all disks.
INFO: Run 'show diagnostic result module ' on all available supervisor slots.
INFO: And restart CompactFlash test (7) instances if not in running state.
Loading plugin version 10.0(2)
INFO: Now starting the flash recovery procedures on active.
INFO: Both disks are found to be healthy.
INFO: Verifying RAID configuration. Got primary=sdc Secondary=sdb
INFO: RAID device md3 is healthy.
INFO: RAID device md4 is healthy.
INFO: RAID device md5 is healthy.
INFO: RAID device md6 is healthy.
INFO: No recovery was attempted on module 6. All flashes left intact.
INFO: A detailed copy of the this log was saved as volatile:flash_repair_log_mod6.tgz.
INFO: Recovery procedures complete on module 6.
INFO: Please check for any errors in previous messages.
INFO: Run 'show system internal file /proc/mdstat' and check 'up status' [UU] for all disks.
INFO: Run 'show diagnostic result module ' on all available supervisor slots.
INFO: And restart CompactFlash test (7) instances if not in running state.
gra-g2-n7#
gra-g2-n7#
gra-g2-n7#
gra-g2-n7# slot 5 show system internal raid | i i cmos|block | head line 5
RAID data from CMOS = 0xa5 0xf0
77888 blocks [2/1] [_U]
78400 blocks [2/1] [_U]
39424 blocks [2/1] [_U]
1802240 blocks [2/1] [_U]
gra-g2-n7#
gra-g2-n7#
gra-g2-n7# slot 5 show system internal raid | i i cmos|block | head line 5
RAID data from CMOS = 0xa5 0xf0
77888 blocks [2/2] [UU]
78400 blocks [2/2] [UU]
39424 blocks [2/2] [UU]
1802240 blocks [2/2] [UU]
gra-g2-n7#
gra-g2-n7#
gra-g2-n7#
Date: 2016-03-09 23:36:37 UTC The sup is dead, we are opening an RMA
Date: 2016-03-09 23:22:44 UTC Unmounting file systems...
Making partitions on physical devices...
Partitioning of /dev/sdb failed
Copying saved files back to bootflash...
/dev/md3: Invalid argument
mount: /dev/md3: can't read superblock
Failed to mount bootflash.
Checking obfl filesystem.r
Checking all filesystems..... done.
Starting mcelog daemon
##############################################################
Disk initialization can take over 5 minutes. To avoid interruption,
please run 'system standby manual-boot' on active supervisor
##############################################################
Initializing the system...
Unmounting file systems...
Stopping RAID services...
Making partitions on physical devices...
Telling INIT to Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac Copyright (c) 2002-2013, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are
owned by other third parties and used and distributed under
license. Certain components of this software are licensed under
the GNU General Public License (GPL) version 2.0 or the GNU
Lesser General Public License (LGPL) Version 2.1. A copy of each
such license is available at
http://www.opensource.org/licenses/gpl-2.0.php and
http://www.opensource.org/licenses/lgpl-2.1.php switch(boot)#
Date: 2016-03-09 23:20:12 UTC gra-g2-n7(standby)# 2016 Mar 10 01:19:32 gra-g2-n7 %$ VDC-1 %$ Mar 10 01:19:32 %KERN-2-SYSTEM_MSG: [64027168.155478] Switchover started by redundancy driver - kernel
2016 Mar 10 01:19:32 gra-g2-n7 %$ VDC-1 %$ %SYSMGR-2-HASWITCHOVER_PRE_START: This supervisor is becoming active (pre-start phase).
2016 Mar 10 01:19:32 gra-g2-n7 %$ VDC-1 %$ %SYSMGR-2-HASWITCHOVER_START: Supervisor 6 is becoming active.
2016 Mar 10 01:19:33 gra-g2-n7 %$ VDC-1 %$ %SYSMGR-2-SWITCHOVER_OVER: Switchover completed.
gra-g2-n7#
gra-g2-n7# sh module
Mod Ports Module-Type Model Status
--- ----- ----------------------------------- ------------------ ----------
1 48 1/10 Gbps Ethernet Module N7K-F248XP-25E ok
2 48 1/10 Gbps Ethernet Module N7K-F248XP-25E ok
5 0 Supervisor Module-2 powered-up
6 0 Supervisor Module-2 N7K-SUP2E active *
Date: 2016-03-09 23:15:14 UTC We will begin the intervention on gra-g2-n7 in a few minutes
Posted Mar 09, 2016 - 16:35 UTC
This scheduled maintenance affected: Infrastructure || GRA (GRA1, GRA2, GRA3).