Friday 13 November 2015

Virtual Machine Component Protection


  • vSphere 6.0 introduces a powerful new feature as part of vSphere HA called VM Component Protection (VMCP). 
  • VMCP allows HA to respond when the connection to a virtual machine's datastore is lost, either temporarily or permanently.
  • It protects virtual machines from storage-related events, specifically Permanent Device Loss (PDL) and All Paths Down (APD) incidents.

  • PDL occurs when the storage array issues a SCSI sense code indicating that the device is unavailable.
  • When a PDL state is detected, the host stops sending I/O requests to the array, because it considers the device permanently unavailable and there is no reason to issue further I/O to it.
  • An unplanned PDL occurs when a storage device is unexpectedly unpresented from the storage array without the unmount and detach operations being executed on the ESXi host.
  • SCSI sense codes follow an industry standard maintained by the T10 technical committee, which is part of the InterNational Committee for Information Technology Standards (INCITS). All storage arrays that communicate with an ESXi host conform to this standard.
  • In the vmkernel.log system log file from an ESXi 5.0 host, you see entries similar to the following:
  • 2011-04-04T21:07:30.257Z cpu2:2050)ScsiDeviceIO: 2315: Cmd(0x4124003edb00) 0x12, CmdSN 0x51 to dev "naa.600508e000000000c9f6baa7c19f6900" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
  • Mar 9 23:53:24 esx405 vmkernel: 2:14:39:54.069 cpu3:4300)ScsiDeviceIO: 1688: Command 0x28 to device "naa.60000970000292600219533031453245" failed H:0x1 D:0x0 P:0x3 Possible sense data: 0x0 0x0 0x0.
  • Examples of PDL are a failed LUN or an administrator inadvertently removing a WWN from the zone configuration.
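
The status fields in vmkernel.log entries like the ones above can be pulled apart with a short script. The following is only a minimal Python sketch, not a VMware tool: the names ScsiStatus and parse_scsi_status are hypothetical, and the regular expression assumes the "H:.. D:.. P:.. Valid/Possible sense data: .. .. .." layout seen in the sample entries. It only extracts the numbers; interpreting them still requires the T10 tables.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Layout assumed from the sample vmkernel.log entries in this post.
_STATUS_RE = re.compile(
    r"H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+) "
    r"(Valid|Possible) sense data: "
    r"(0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+)"
)


@dataclass
class ScsiStatus:
    """Hypothetical container for the status fields of one log entry."""
    host: int          # H: host status
    device: int        # D: device status
    plugin: int        # P: plugin status
    sense_valid: bool  # True for "Valid sense data", False for "Possible"
    sense_key: int
    asc: int           # additional sense code
    ascq: int          # additional sense code qualifier


def parse_scsi_status(line: str) -> Optional[ScsiStatus]:
    """Return the status fields of a log line, or None if none are found."""
    match = _STATUS_RE.search(line)
    if match is None:
        return None
    h, d, p, validity, key, asc, ascq = match.groups()
    return ScsiStatus(
        host=int(h, 16),
        device=int(d, 16),
        plugin=int(p, 16),
        sense_valid=(validity == "Valid"),
        sense_key=int(key, 16),
        asc=int(asc, 16),
        ascq=int(ascq, 16),
    )


if __name__ == "__main__":
    sample = (
        '2011-04-04T21:07:30.257Z cpu2:2050)ScsiDeviceIO: 2315: '
        'Cmd(0x4124003edb00) 0x12, CmdSN 0x51 to dev '
        '"naa.600508e000000000c9f6baa7c19f6900" failed '
        'H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.'
    )
    print(parse_scsi_status(sample))
```

Lines that carry no status fields return None, so the function can be applied to every line of a log file and the empty results filtered out.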

  • An All Paths Down (APD) condition occurs when a storage device is removed from the ESXi host in an uncontrolled manner, due to either administrator error or device failure.
  • If PDL sense codes are not returned from a device, the device is considered to be in an APD state, and ESXi continues to send I/O requests until it receives a response.
  • The APD situation needs to be resolved at the storage array/fabric layer to restore connectivity to the host.
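
The difference between the two conditions can be summarised as a small decision rule. The sketch below is a simplified Python illustration of the behaviour described in this post, not ESXi's actual implementation; DeviceState, classify and host_keeps_sending_io are hypothetical names, and the two boolean inputs are an assumed simplification of what the host really observes.

```python
from enum import Enum


class DeviceState(Enum):
    ACCESSIBLE = "accessible"
    PDL = "permanent device loss"
    APD = "all paths down"


def classify(responding: bool, pdl_sense_received: bool) -> DeviceState:
    """Classify a device from two simplified observations (assumed inputs)."""
    if pdl_sense_received:
        # The array explicitly reported that the device is gone.
        return DeviceState.PDL
    if not responding:
        # No PDL sense code and no response: treated as APD.
        return DeviceState.APD
    return DeviceState.ACCESSIBLE


def host_keeps_sending_io(state: DeviceState) -> bool:
    """PDL stops I/O to the device; in APD the host keeps retrying."""
    return state is not DeviceState.PDL


if __name__ == "__main__":
    for responding, pdl in [(True, False), (False, False), (False, True)]:
        state = classify(responding, pdl)
        print(state.name, "keeps sending I/O:", host_keeps_sending_io(state))
```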

Refer to the KB articles below for more information.
