Hardware
Drive Replacement Procedures
ASSUMPTIONS
- Servers are Sun Microsystems systems running Solaris 2.6 through Solaris 10.
- Storage is manufactured by Sun Microsystems.
- JBOD (Just a Bunch Of Disks) means storage with no hardware RAID. RAID, if used, is provided by software such as Veritas Volume Manager or Solaris Volume Manager.
- HW-RAID means any storage device with a built-in RAID controller. No software RAID is required.
- This procedure explains how to locate the correct drive for replacement, not how to replace it.
- This document is not a replacement for understanding and documenting the system configuration.
- This document covers only servers and arrays that support hot-plugging.
- Any procedure that removes a drive from Solaris requires that every application using the drive release it first. For example, Veritas must already have disabled the drive, and all file systems on the drive must be unmounted.
For additional information on any specific platform, see
http://sunsolve.sun.com/handbook_pub/
1. General Procedure for all storage other than HW-RAID
01. Obtain the serial number of the disk drive from Solaris:
a. Determine which drive is failing from the messages files (/var/adm/messages) or the console messages.
Example message:
Mar 24 23:50:17 wpsz36 sendmail[29001]: [ID 801593 mail.info] k2P7oGp28967: to=\root, ctladdr=root (0/1), delay=00:00:01, xdelay=00:00:00, mailer=local, pri=120252, relay=local, dsn=2.0.0, stat=Sent
Mar 25 04:54:22 wpsz36 scsi: [ID 107833 kern.warning] WARNING: /pci@6,4000/scsi@3,1/sd@2,0 (sd47):
Mar 25 04:54:22 wpsz36 Error for Command: read(10) Error Level: Retryable
Mar 25 04:54:22 wpsz36 scsi: [ID 107833 kern.notice] Requested Block: 2219209 Error Block: 2219217
Mar 25 04:54:22 wpsz36 scsi: [ID 107833 kern.notice] Vendor: FUJITSU Serial Number: 0145022958
Mar 25 04:54:22 wpsz36 scsi: [ID 107833 kern.notice] Sense Key: Media Error
Mar 25 04:54:22 wpsz36 scsi: [ID 107833 kern.notice] ASC: 0x11 (<vendor unique code 0x11>), ASCQ: 0x1, FRU: 0x0
b. Identify the failed or failing drive from the message. In this example, “sd47” is the failed or failing drive. Determine its serial number with the Solaris iostat command:
# iostat -E | more
Search for “sd47”. The entry will look similar to:
sd47      Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: FUJITSU  Product: MAJ3364M SUN36G Revision: 5804 Serial No: 0145022958
RPM: 10025 Heads: 27 Size: 36.42GB <36418595328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c. From here you can see that the serial number is:
Serial Number = 0145022958
02. Provide the Serial Number of the drive to the Engineer physically replacing the drive.
In most cases the Serial Number and/or WWN is printed on the handle of the drive. The engineer can then replace that drive with confidence.
03. If the serial number is not available for some reason, locate the drive with the device-specific procedure for your enclosure or server, described in the sections that follow.
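The serial-number lookup in step 01 can be scripted when a messages file holds many entries. The sketch below is a minimal example, not a supported tool: it assumes a POSIX shell such as ksh or bash (the old Solaris /bin/sh lacks $( )), and it assumes the message and iostat -E formats shown in step 01. The function names sd_instance and serial_for are hypothetical. On Solaris, use nawk in place of /usr/bin/awk, which does not support -v.

```shell
# Hypothetical helper: print the instance name (for example sd47) from the
# most recent "WARNING: <physical path> (sdNN)" line read on standard input.
sd_instance() {
    sed -n 's/.*WARNING: [^ ]* (\([a-z]*[0-9][0-9]*\)).*/\1/p' | sed -n '$p'
}

# Hypothetical helper: print the serial number that "iostat -E" output
# (read on standard input) reports for instance $1. The "Serial No:" field
# is expected on a line after that instance's "Soft Errors" header line.
serial_for() {
    awk -v dev="$1" '
        $1 == dev && /Soft Errors/ { found = 1; next }   # start of our device block
        found && /Soft Errors/     { exit }               # next device: give up
        found && /Serial No:/ {
            sub(/.*Serial No: */, "")                      # keep text after the label
            print $1
            exit
        }'
}

# Example (on the affected host):
#   dev=$(sd_instance < /var/adm/messages)
#   iostat -E | serial_for "$dev"
```

Taking the last WARNING line favours the most recent error; if several disks are logging errors, review the messages file by hand as well.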
A5000/A5100/A5200
The following procedure will blink the light on the specified drive, using either the enclosure/device nomenclature, or the device path.
# luxadm led_blink enclosure,dev
# luxadm led_blink box1,f3
box1 = the name of the enclosure
f3 = front of the array, slot 3 (slots are numbered from 0)
OR
# luxadm led_blink pathname
# luxadm led_blink /sbus@6,0/SUNW,socal@d,10000/sf@0,0/ssd@w220000203767d881,0
D1000
There are two versions of the D1000: an 8-drive version and a 12-drive version. The target ID of each slot depends on the version, on the option switch settings, and on whether the tray is “split”. If the failure is hard, the amber light on the drive is often lit, which makes the failed drive easy to identify.
The target ID can be read from the cXtXdXsX name that Solaris uses: in c0t2d0s0, t2 means target 2. Find that target in the tables below, which list the slots from left to right.
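The cXtXdXsX naming described above can be decoded with a single sed expression. This is a sketch assuming a POSIX shell; target_of is a hypothetical name, and it deliberately prints nothing for the WWN-style target names that some Fibre Channel devices use.

```shell
# Hypothetical helper: print the target number N from a Solaris logical
# device name such as c0t2d0s0 or /dev/rdsk/c12t3d0s2. Prints nothing when
# no decimal cNtNdN part is found (for example, WWN-style targets).
target_of() {
    echo "$1" | sed -n 's/^.*c[0-9][0-9]*t\([0-9][0-9]*\)d[0-9][0-9]*.*$/\1/p'
}

# Example: target_of c0t2d0s0 prints 2
```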
8-Slot D1000
Option Switch 1=UP and 2=DOWN (default)
Targets, left to right: 0 1 2 3 8 9 10 11
Option Switch 1=UP and 2=UP
Targets, left to right: 8 9 10 11 8 9 10 11
Option Switch 1=DOWN and 2=UP
Targets, left to right: 8 9 10 11 0 1 2 3
Option Switch 1=DOWN and 2=DOWN
Targets, left to right: 8 9 10 11 8 9 10 11
12-Slot D1000
Option Switch 1=UP and 2=DOWN (default)
Targets, left to right: 0 1 2 3 4 5 8 9 10 11 12 13
Option Switch 1=UP and 2=UP
Targets, left to right: 8 9 10 11 12 13 8 9 10 11 12 13
Option Switch 1=DOWN and 2=UP
Targets, left to right: 8 9 10 11 12 13 0 1 2 3 4 5
Option Switch 1=DOWN and 2=DOWN
Targets, left to right: 8 9 10 11 12 13 8 9 10 11 12 13
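The D1000 tables above can be encoded as a lookup that reports which physical slot carries a given target. The following is a sketch assuming a POSIX shell; d1000_targets and d1000_positions are hypothetical names, switch settings are written as up or down, and positions count from 1 at the left of the tray.

```shell
# Hypothetical helper transcribed from the D1000 tables: print the target
# IDs of the tray, left to right. Usage: d1000_targets 8|12 SWITCH1 SWITCH2
d1000_targets() {
    case "$1:$2:$3" in
        8:up:down)              echo "0 1 2 3 8 9 10 11" ;;
        8:down:up)              echo "8 9 10 11 0 1 2 3" ;;
        8:up:up|8:down:down)    echo "8 9 10 11 8 9 10 11" ;;
        12:up:down)             echo "0 1 2 3 4 5 8 9 10 11 12 13" ;;
        12:down:up)             echo "8 9 10 11 12 13 0 1 2 3 4 5" ;;
        12:up:up|12:down:down)  echo "8 9 10 11 12 13 8 9 10 11 12 13" ;;
        *)                      return 1 ;;   # unknown size or setting
    esac
}

# Hypothetical helper: print every slot position (1 = leftmost) that uses
# TARGET. Usage: d1000_positions 8|12 SWITCH1 SWITCH2 TARGET
d1000_positions() {
    ids=$(d1000_targets "$1" "$2" "$3") || return 1
    pos=0
    out=""
    for id in $ids; do
        pos=$((pos + 1))
        if [ "$id" = "$4" ]; then out="$out $pos"; fi
    done
    echo $out    # unquoted on purpose: drops the leading space
}
```

For example, d1000_positions 12 up down 2 prints 3: with the default switch settings, target 2 is the third slot from the left. When two positions are printed, the target ID is in use on both halves of the tray, and presumably the controller number (cN) in the device name is what tells the two drives apart.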
D2
The D2 is similar to the D1000 but comes only in a 12-slot configuration. Its slot target IDs depend on whether the tray is split or not. The target ID can be read from the cXtXdXsX name that Solaris uses: in c0t2d0s0, t2 means target 2. Find that target in the lists below, which give the slots from left to right.
Split-Bus
8 9 10 11 12 13 8 9 10 11 12 13
Single-Bus
0 1 2 3 4 5 8 9 10 11 12 13
A3500/A3000/RSM2000/A1000
These arrays are managed with the host-based GUI software “RM6”, which should be used to maintain them. When a drive fails, RM6 lights the LED on that drive. If for some reason the LED is not lit, RM6 can be used to light it manually.
D240
On this boot device, only the two center drives are hot-swappable. If the optical media or tape drives are replaced with disk drives, those disk drives are NOT hot-swappable. The D240 can be configured as split-bus, which changes the target ID of each slot. The target ID can be read from the cXtXdXsX name that Solaris uses; in a single-bus configuration, for example, c0t1d0s0 is target 1, the upper center slot.
Split-Bus
Left Slot = t6
Center Top = t0
Center Bottom = t0
Right Slot = t6
Single-Bus
Left Slot = t6
Center Top = t1
Center Bottom = t0
Right Slot = t4
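The D240 lists above amount to a small lookup table. The sketch below assumes a POSIX shell and uses a hypothetical function name, d240_bays; in split-bus mode the same target appears in two bays, so it prints both, and the controller number in the device name would then be needed to decide between them.

```shell
# Hypothetical helper transcribed from the D240 lists: print the bay(s)
# that carry TARGET. Usage: d240_bays split|single TARGET
# The left and right bays normally hold the optical or tape drives;
# disks fitted there are NOT hot-swappable.
d240_bays() {
    case "$1:$2" in
        split:0)  echo "center-top center-bottom" ;;
        split:6)  echo "left right" ;;
        single:0) echo "center-bottom" ;;
        single:1) echo "center-top" ;;
        single:4) echo "right" ;;
        single:6) echo "left" ;;
        *)        return 1 ;;   # not a target listed for this tray
    esac
}

# Example: d240_bays single 1 prints center-top
```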
D130
The target ID can be read from the cXtXdXsX name that Solaris uses: in c0t3d0s0, t3 means target 3. Depending on the configuration, the SCSI IDs of this three-drive enclosure, from left to right when viewed from the front, are either:
3, 4, 5 or 10, 11, 12
SE6020/SE6120/T3/3310/3320/A3500/A3500FC
These hardware RAID systems are managed by connecting to the array controller through its serial or network port. The array should light the LED of a failed drive. If the LED is not lit, the array has not failed the drive, and the drive may still be active. Work with Sun support at 1-800-USA-4SUN to determine the correct course of action.
S1
The target ID can be read from the cXtXdXsX name that Solaris uses: in c0t3d0s0, t3 means target 3. A toggle switch on this array sets the target ID of the first drive in the tray, which is always the leftmost drive when viewed from the front. Each following drive takes the next target ID: if the base target is set to 2, the next drive is target 3, and so on.
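The S1 rule above (the base target on the leftmost drive, increasing by one to the right) reduces to simple arithmetic. This is a sketch assuming a POSIX shell; s1_position is a hypothetical name, and it does not check how many drives the tray actually holds.

```shell
# Hypothetical helper: print the position of TARGET, counting from 1 at the
# leftmost drive, given the BASE target set on the switch.
# Usage: s1_position BASE TARGET
s1_position() {
    pos=$(( $2 - $1 + 1 ))
    if [ "$pos" -ge 1 ]; then
        echo "$pos"
    else
        return 1    # target is below the base: not in this tray
    fi
}

# Example: with the base set to 2, s1_position 2 5 prints 4
```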
Multipack/Unipack
The target ID can be read from the cXtXdXsX name that Solaris uses: in c0t3d0s0, t3 means target 3. A Unipack holds only one drive, so identifying the correct drive is simple. There are several types of Multipack, but all set target IDs the same way: the SCSI switch on the back selects which target IDs the enclosure uses, and the target IDs are silk-screened on the slots.
V880/V890/280R/480R/V490
These systems use FC-AL internal drives. To hot-plug a drive, follow the correct procedure; doing so lights the hot-plug LED on the correct drive.
To hot-plug a drive in a V880/V890:
# luxadm remove_device -F /dev/rdsk/c#t#d#s2
This lights the hot-plug LED next to the correct drive and prepares the OS for a new drive. Other commands bring the new drive into the configuration; they are beyond the scope of this document.
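The luxadm command above takes the raw slice-2 path of the disk. A small helper can build that path from whatever form of the name is at hand. This is a sketch assuming a POSIX shell; rdsk_s2 is a hypothetical name, and the check accepts decimal or hexadecimal (WWN-style) target parts.

```shell
# Hypothetical helper: turn c1t3d0, c1t3d0s0 or /dev/dsk/c1t3d0s0 into
# /dev/rdsk/c1t3d0s2. Fails with a message for anything else.
rdsk_s2() {
    name=${1##*/}                                        # drop any directory
    name=$(echo "$name" | sed 's/s[0-9][0-9]*$//')       # drop any slice
    if echo "$name" | grep '^c[0-9][0-9]*t[0-9A-Fa-f][0-9A-Fa-f]*d[0-9][0-9]*$' >/dev/null; then
        echo "/dev/rdsk/${name}s2"
    else
        echo "rdsk_s2: not a cNtNdN disk name: $1" >&2
        return 1
    fi
}

# Example (on the affected host):
#   luxadm remove_device -F "$(rdsk_s2 c1t3d0)"
```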
E450
This procedure definitively maps a logical drive to a physical drive for the E450.
01. Determine the UNIX physical device name from the SCSI error message. SCSI error messages are typically displayed on the system console and logged in the /var/adm/messages file.
WARNING: /pci@6,4000/scsi@4,1/sd@3,0 (sd228)
Error for Command: read(10) Error level: Retryable
Requested Block: 3991014 Error Block: 3991269
Vendor: FUJITSU Serial Number: 9606005441
Sense Key: Media Error
ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0x0
In the example SCSI error message above, the UNIX physical device name is /pci@6,4000/scsi@4,1/sd@3.
02. Determine the UNIX logical device name by listing the contents of the /dev/rdsk directory. Use the grep command to filter the output for any occurrence of the UNIX physical device name determined in Step 1:
% ls -l /dev/rdsk | grep /pci@6,4000/scsi@4,1/sd@3
lrwxrwxrwx 1 root root 45 Jan 30 09:07 c12t3d0s0 -> ../../devices/pci@6,4000/scsi@4,1/sd@3,0:a,raw
lrwxrwxrwx 1 root root 45 Jan 30 09:07 c12t3d0s1 -> ../../devices/pci@6,4000/scsi@4,1/sd@3,0:b,raw
lrwxrwxrwx 1 root root 45 Jan 30 09:07 c12t3d0s2 -> ../../devices/pci@6,4000/scsi@4,1/sd@3,0:c,raw
lrwxrwxrwx 1 root root 45 Jan 30 09:07 c12t3d0s3 -> ../../devices/pci@6,4000/scsi@4,1/sd@3,0:d,raw
lrwxrwxrwx 1 root root 45 Jan 30 09:07 c12t3d0s4 -> ../../devices/pci@6,4000/scsi@4,1/sd@3,0:e,raw
lrwxrwxrwx 1 root root 45 Jan 30 09:07 c12t3d0s5 -> ../../devices/pci@6,4000/scsi@4,1/sd@3,0:f,raw
lrwxrwxrwx 1 root root 45 Jan 30 09:07 c12t3d0s6 -> ../../devices/pci@6,4000/scsi@4,1/sd@3,0:g,raw
lrwxrwxrwx 1 root root 45 Jan 30 09:07 c12t3d0s7 -> ../../devices/pci@6,4000/scsi@4,1/sd@3,0:h,raw
The resulting output shows the associated UNIX logical device name. In this example, the logical device name is c12t3d0.
03. Determine the disk slot number using the prtconf command. Substitute the string disk@ for sd@ in the physical device name determined in Step 1. The result in this example is /pci@6,4000/scsi@4,1/disk@3.
Use the grep command to find this name in the output of the prtconf command:
% prtconf -vp | grep /pci@6,4000/scsi@4,1/disk@3
slot#11: '/pci@6,4000/scsi@4,1/disk@3'
The resulting output indicates the corresponding disk slot number. In this example, the disk slot number is 11.
NOTE! When using the "format" command, you will need to type "/sd@3,0" in place of "/disk@3":
# format /pci@6,4000/scsi@4,1/sd@3,0
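Steps 02 and 03 of the E450 procedure are text lookups that can be scripted. The sketch below assumes a POSIX shell and the ls -l and prtconf -vp output formats shown above; logical_name, disk_path and slot_of are hypothetical names. On Solaris, use nawk in place of /usr/bin/awk, which does not support -v.

```shell
# Hypothetical helper for step 02: read "ls -l /dev/rdsk" output on standard
# input and print the logical name (cNtNdN) whose link points at physical
# path $1, for example /pci@6,4000/scsi@4,1/sd@3 (a trailing ,0 is accepted).
logical_name() {
    awk -v p="${1%,0}," '
        index($0, p) {
            for (i = 2; i <= NF; i++)
                if ($i == "->") {
                    name = $(i - 1)
                    sub(/s[0-9]+$/, "", name)      # c12t3d0s0 -> c12t3d0
                    print name
                    exit
                }
        }'
}

# Hypothetical helper for step 03: rewrite .../sd@3,0 or .../sd@3 as .../disk@3.
disk_path() {
    echo "$1" | sed -e 's/,0$//' -e 's|/sd@|/disk@|'
}

# Hypothetical helper for step 03: read "prtconf -vp" output on standard
# input and print the slot number on the slot line that names disk path $1.
slot_of() {
    awk -v p="'$1'" 'index($0, p) && /slot[#]/ { sub(/:.*/, ""); sub(/.*slot[#]/, ""); print; exit }'
}

# Example (on the affected host):
#   ls -l /dev/rdsk | logical_name /pci@6,4000/scsi@4,1/sd@3
#   prtconf -vp | slot_of "$(disk_path /pci@6,4000/scsi@4,1/sd@3,0)"
```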
V440/220R/420R
The internal drive slot numbers match the SCSI target IDs. The target ID can be read from the cXtXdXsX name that Solaris uses: c0t3d0s0 is target 3, in slot 3.
V210/V240
The drive must be removed from Solaris before it is removed from the hardware. The following commands unconfigure the drive in Solaris and light the LED next to it.
1. Determine the correct Ap_Id of the drive prior to replacing it:
# cfgadm -al
Ap_Id                Type         Receptacle   Occupant     Condition
c0                   scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0       CD-ROM       connected    configured   unknown
c1                   scsi-bus     connected    configured   unknown
c1::dsk/c1t0d0       disk         connected    configured   unknown
c1::dsk/c1t1d0       disk         connected    configured   unknown
c2                   scsi-bus     connected    unconfigured unknown
2. Unconfigure the drive from Solaris:
# cfgadm -c unconfigure c1::dsk/c1t1d0
3. Verify that the drive is disabled:
# cfgadm -al | grep c1t1d0
c1::dsk/c1t1d0       unavailable  connected    unconfigured unknown
The Drive LED should now be lit.
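The three cfgadm steps above can be wrapped so that the Ap_Id is taken from the listing rather than typed by hand. This is a sketch assuming a POSIX shell and the column layout shown above (Ap_Id, Type, Receptacle, Occupant, Condition); ap_id_for and occupant_of are hypothetical names. On Solaris, use nawk in place of /usr/bin/awk.

```shell
# Hypothetical helper: read "cfgadm -al" output on standard input and print
# the Ap_Id of disk $1 (for example c1t1d0).
ap_id_for() {
    awk -v d="$1" '$1 ~ ("::dsk/" d "$") { print $1; exit }'
}

# Hypothetical helper: print the Occupant column (configured or
# unconfigured) for Ap_Id $1.
occupant_of() {
    awk -v a="$1" '$1 == a { print $4; exit }'
}

# Example (on the affected host):
#   ap=$(cfgadm -al | ap_id_for c1t1d0)
#   cfgadm -c unconfigure "$ap"
#   cfgadm -al | occupant_of "$ap"     # expect: unconfigured
```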
E250
As on other servers, each E250 disk slot has its own target ID, but on the E250 the slot numbers and target IDs do not line up in an obvious way. The slot-to-target layout of the E250 is:
Disk Slot   Logical Device Name   Physical Device Name
Slot 0      c0t0d0                /devices/pci@1f,4000/scsi@3/sd@0,0
Slot 1      c0t8d0                /devices/pci@1f,4000/scsi@3/sd@8,0
Slot 2      c0t9d0                /devices/pci@1f,4000/scsi@3/sd@9,0
Slot 3      c0t10d0               /devices/pci@1f,4000/scsi@3/sd@a,0
Slot 4      c0t11d0               /devices/pci@1f,4000/scsi@3/sd@b,0
Slot 5      c0t12d0               /devices/pci@1f,4000/scsi@3/sd@c,0
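The E250 table shows that physical device names write the target in hexadecimal (sd@a is target 10), while the slot numbers follow their own order. The sketch below assumes a POSIX shell and encodes only the six rows of the table; target_from_path and e250_slot are hypothetical names.

```shell
# Hypothetical helper: convert the hex unit address in a physical path such
# as /devices/pci@1f,4000/scsi@3/sd@a,0 to a decimal target ID.
target_from_path() {
    hex=$(echo "$1" | sed -n 's|.*/sd@\([0-9a-f][0-9a-f]*\),[0-9a-f]*$|\1|p')
    [ -n "$hex" ] || return 1
    printf '%d\n' "0x$hex"
}

# Hypothetical helper transcribed from the table: print the E250 disk slot
# that holds a decimal target ID.
e250_slot() {
    case "$1" in
        0)              echo 0 ;;
        8|9|10|11|12)   echo $(( $1 - 7 )) ;;   # targets 8-12 are slots 1-5
        *)              return 1 ;;
    esac
}

# Example: target_from_path /devices/pci@1f,4000/scsi@3/sd@a,0 prints 10,
# and e250_slot 10 prints 3.
```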