Sun Volume Manager

01. State Database and Replicas

     1.1 Creating or removing state database replicas
     1.2 Displaying the replica’s status
     1.3 Deleting state replicas

02. Stripe and Concatenation
03. Volume Tasks

     3.1 Creating a Stripe - (RAID 0)
     3.2 Creating a Concatenation - (RAID 0)
     3.3 Creating a simple mirror from new partitions
     3.4 Mirroring Partitions with data which can be unmounted
     3.5 Mirroring Partitions with data which can not be unmounted - root and /usr
     3.6 Creating RAID 5
     3.7 Removing a Volume

04. Hotspare Pool

     4.1 Associating a Hot Spare Pool with Submirrors
     4.2 Associating or changing a Hot Spare Pool with a RAID5 Metadevice
     4.3 Adding a Hot Spare Slice to All Hot Spare Pools

05. TransMeta Device

     5.1 TransMeta device for unmountable partition
     5.2 TransMeta device for non unmountable partition
     5.3 TransMeta device using Mirror

06. Cookbooks

     6.1 Recovering from a Boot Disk Failure with Solstice


State Database and Replicas:
The Solaris Volume Manager state database contains configuration and status information for all volumes, hot spares, and disk sets. Solaris Volume Manager maintains multiple copies (replicas) of the state database to provide redundancy and to prevent the database from being corrupted during a system crash.

If all state database replicas are lost, you could lose all data that is stored on your Solaris Volume Manager volumes. For this reason, it is good practice to create enough state database replicas on separate drives and across controllers to prevent catastrophic failure. You can add additional state database replicas to the system at any time.

If your system loses a state database replica, SVM must figure out which state database replicas still contain valid data. SVM determines this information by using a majority consensus algorithm. This algorithm requires that a majority (half + 1) of the state database replicas be available and in agreement before any of them are considered valid. You must create at least three state database replicas when you set up your disk configuration. A consensus can be reached as long as at least two of the three state database replicas are available.

The majority consensus algorithm provides the following:

  • The system will stay running if at least half of the state database replicas are available.
  • The system will panic if fewer than half of the state database replicas are available.
  • The system will not reboot into multiuser mode unless a majority (half + 1) of the total number of state database replicas is available.
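
The three rules above reduce to simple integer arithmetic. The following is a minimal sketch, not part of Solaris Volume Manager; the function name replica_state and its output strings are hypothetical and chosen only for illustration.

```shell
# replica_state TOTAL AVAILABLE
# Hypothetical helper: classifies a system by the majority consensus rules.
replica_state() {
    total=$1
    avail=$2
    if [ $((avail * 2)) -lt "$total" ]; then
        echo "panic"                       # fewer than half available
    elif [ "$avail" -ge $((total / 2 + 1)) ]; then
        echo "multiuser"                   # majority (half + 1) available
    else
        echo "running-no-multiuser-boot"   # exactly half available
    fi
}

# Four replicas spread over two disks, one disk lost:
replica_state 4 2
```

With an even replica count split over two disks, losing one disk leaves exactly half of the replicas, so the system keeps running but cannot reboot into multiuser mode. This is one reason to spread replicas over more than two drives where possible.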

Each state database replica occupies 4 Mbytes (8192 disk sectors) by default. Replicas can be stored on the following devices:

  • a dedicated local disk partition
  • a local partition that will be part of a volume
  • a local partition that will be part of a UFS logging device

Note – Replicas cannot be stored on the root (/), swap, or /usr slices, or on slices that contain existing file systems or data. After the replicas have been stored, volumes or file systems can be placed on the same slice. When a state database replica is placed on a slice that becomes part of a volume, the capacity of the volume is reduced by the space that is occupied by the replica(s).


Create State Database Replicas

 # metadb -a -c n -l nnnn -f ctds-of-slice

 -a specifies to add a state database replica.
 -f specifies to force the operation, even if no replicas exist.
 -c n specifies the number of replicas to add to the specified slice.
 -l nnnn specifies the size of the new replicas, in blocks.
 ctds-of-slice specifies the name of the component that will hold the replica.

 # metadb -a -f -c2 c0t1d0s7
 # metadb -a -f -c2 c0t0d0s7
 # metadb
        flags           first blk     block count
     a        u         16            8192            /dev/dsk/c0t1d0s7
     a        u         8208          8192            /dev/dsk/c0t1d0s7
     a        u         16            8192            /dev/dsk/c0t0d0s7
     a        u         8208          8192            /dev/dsk/c0t0d0s7

If you are replacing existing state database replicas, you might need to specify a replica size. Particularly if you have existing state database replicas (on a system upgraded from Solstice DiskSuite, perhaps) that share a slice with a file system, you must replace existing replicas with other replicas of the same size or add new replicas in a different location.

Caution – Do not replace default-sized (1034 block) state database replicas from Solstice DiskSuite with default-sized Solaris Volume Manager replicas on a slice shared with a file system. If you do, the new replicas will overwrite the beginning of your file system and corrupt it.

 # metadb -a -c 3 -l 1034 c0t0d0s7
 # metadb
 flags first blk  block count
 ...
 a u   16         1034         /dev/dsk/c0t0d0s7
 a u   1050       1034         /dev/dsk/c0t0d0s7
 a u   2084       1034         /dev/dsk/c0t0d0s7
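
In both listings above, the first replica on a slice starts at block 16 and each further replica starts one replica length later. The sketch below reproduces that arithmetic. The starting block of 16 is read off the sample output rather than taken from a specification, and replica_offsets is a hypothetical name.

```shell
# replica_offsets COUNT SIZE_IN_BLOCKS
# Prints the expected "first blk" value of each replica on one slice,
# assuming the first replica starts at block 16 as in the sample output.
replica_offsets() {
    count=$1
    size=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        echo $((16 + i * size))
        i=$((i + 1))
    done
}

replica_offsets 3 1034    # Solstice DiskSuite default replica size
replica_offsets 2 8192    # Solaris Volume Manager default replica size
```

The last offset plus the replica size gives a rough idea of how much of the slice the replicas occupy, which matters when the slice is shared with a file system.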


To display the replica’s status:

 # metadb -i
        flags           first blk       block count
     a        u         16              8192            /dev/dsk/c0t1d0s7
     a        u         8208            8192            /dev/dsk/c0t1d0s7
     a        u         16              8192            /dev/dsk/c0t0d0s7
     a        u         8208            8192            /dev/dsk/c0t0d0s7
 r - replica does not have device relocation information
 o - replica active prior to last mddb configuration change
 u - replica is up to date
 l - locator for this replica was read successfully
 c - replica's location was in /etc/lvm/mddb.cf
 p - replica's location was patched in kernel
 m - replica is master, this is replica selected as input
 W - replica has device write errors
 a - replica is active, commits are occurring to this replica
 M - replica had problem with master blocks
 D - replica had problem with data blocks
 F - replica had format problems
 S - replica is too small to hold current data base
 R - replica had device read errors
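
In the legend above, the capital-letter flags (W, M, D, F, S, R) all describe a problem, while the lowercase flags are informational. A minimal sketch of a filter that picks out problem replicas from metadb output follows. The function name bad_replicas and the sample input are made up for illustration, and the filter assumes that the last three fields of each replica line are first block, block count, and device, as in the listings above.

```shell
# bad_replicas: reads metadb output on stdin and prints the device and
# flags of every replica that carries an uppercase (error) flag.
bad_replicas() {
    awk '$NF ~ /^\/dev\// {
        flags = ""
        # Everything before the last three fields is a flag column.
        for (i = 1; i <= NF - 3; i++) flags = flags $i
        if (flags ~ /[WMDFSR]/) print $NF, flags
    }'
}

# Hypothetical sample input; on a live system use: metadb | bad_replicas
bad_replicas <<'EOF'
        flags           first blk       block count
     a        u         16              8192            /dev/dsk/c0t1d0s7
      W   p  l          16              8192            /dev/dsk/c0t0d0s7
EOF
```

The match is case-sensitive on purpose: lowercase m marks the master replica, whereas uppercase M reports a problem with master blocks.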


To delete state replicas:

 # metadb -d -f ctds-of-slice
 -d specifies to delete a state database replica.
 -f specifies to force the operation, even if no replicas exist.
 ctds-of-slice specifies the name of the component that holds the replica.

 # metadb -d -f c0t0d0s7

RAID 0 (Stripe and Concatenation)

RAID 0 (Stripe) Volume: A RAID 0 (stripe) volume spreads data equally across all components in the stripe, forming one logical storage unit. These segments are interleaved. When you create a stripe, you can set the interlace value or use the Solaris Volume Manager default interlace value of 16 Kbytes. Once you have created the stripe, you cannot change the interlace value.
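
To make the interleaving concrete, the sketch below maps a logical block of a stripe to a component and an offset on that component. It is a generic RAID 0 model under the assumption of equally sized components and simple round-robin placement; it is not taken from the Solaris Volume Manager implementation, and stripe_map is a hypothetical name.

```shell
# stripe_map LOGICAL_BLOCK INTERLACE_BLOCKS COMPONENT_COUNT
# Prints "<component index> <block offset on that component>".
stripe_map() {
    blk=$1
    interlace=$2
    ncomp=$3
    unit=$((blk / interlace))      # which interlace-sized chunk
    comp=$((unit % ncomp))         # chunks rotate across components
    off=$(( (unit / ncomp) * interlace + blk % interlace ))
    echo "$comp $off"
}

# 32-block (16 Kbyte) interlace across two components:
stripe_map 0 32 2
stripe_map 32 32 2
stripe_map 70 32 2
```

Consecutive interlace-sized chunks land on alternating components, which is why large sequential I/O can be serviced by several disks at once.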

RAID 0 (Concatenation) Volume: A concatenated volume writes data to the first available component until it is full, and then moves to the next available component. The data for a concatenated volume is organized serially and adjacently across disk slices, forming one logical storage unit. A concatenation enables you to dynamically expand storage capacity and file system sizes online. A concatenation can also expand any active and mounted UFS file system without having to bring down the system. You can also create a concatenation from a single component. Later, when you need more storage, you can add more components to the concatenation.
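
The serial layout of a concatenation can be sketched the same way: a logical block belongs to the first component whose cumulative size exceeds it. This is an illustrative model only; concat_map and the component sizes are hypothetical.

```shell
# concat_map LOGICAL_BLOCK SIZE1 [SIZE2 ...]
# Prints "<component index> <block offset on that component>".
concat_map() {
    blk=$1
    shift
    idx=0
    for size in "$@"; do
        if [ "$blk" -lt "$size" ]; then
            echo "$idx $blk"
            return 0
        fi
        blk=$((blk - size))        # skip past this component
        idx=$((idx + 1))
    done
    echo "out-of-range"
    return 1
}

# Two components of 100 and 200 blocks:
concat_map 150 100 200
```

Appending a component only extends the address range at the end, which is why existing data stays in place when a concatenation is grown.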

Note – To increase the capacity of a stripe, you need to build a concatenated stripe. You must use a concatenation to encapsulate root (/), swap, /usr, /opt, or /var when mirroring these file systems.

RAID 0 (Concatenated Stripe) Volume: A concatenated stripe is a stripe that has been expanded by adding additional components (stripes).



Creating a Stripe volume - (RAID 0)

The following example uses the metainit command to create a striped volume, d10 (/dev/md/rdsk/d10), from 2 slices.

 metainit {vol-name} {number-of-stripes} {components-per-stripe} 
          {component-names…} [-i interlace-value]

01. Create Stripe:

 # metainit d10 1 2 c0t1d0s0 c0t1d0s1
 d10: Concat/Stripe is setup

02. Use the metastat command to query your new volume

 # metastat d10
 d10: Concat/Stripe
     Size: 4194288 blocks (2.0 GB)
     Stripe 0: (interlace: 32 blocks)
        Device     Start Block  Dbase   Reloc
        c0t1d0s0          0     No      Yes
        c0t1d0s1          0     No      Yes

 Device Relocation Information:
 Device   Reloc  Device ID
 c0t1d0   Yes    id1,dad@AST38410A=5CS09PSH
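
The metastat output reports sizes in disk blocks. Assuming 512-byte blocks (consistent with the 4 Mbytes per 8192 sectors stated for replicas), a small conversion shows that the 32-block interlace is the 16 Kbyte default. The helper name blocks_to_kbytes is hypothetical.

```shell
# blocks_to_kbytes BLOCKS: converts 512-byte disk blocks to Kbytes.
blocks_to_kbytes() {
    echo $(( $1 * 512 / 1024 ))
}

blocks_to_kbytes 32         # interlace shown by metastat
blocks_to_kbytes 4194288    # volume size shown by metastat
```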

03. Create file system on it

 # newfs /dev/md/rdsk/d10

04. Mount the file system

 # mount /dev/md/dsk/d10 /mnt


Creating a Concatenation - (RAID 0)

The method used for creating a Concatenated Volume is very similar to that used in creating a Striped Volume: both use the metainit command (with different options) and the same method for creating and mounting a UFS file system.

Creating RAID 0 (Concatenation) Volumes:

 metainit {volume-name} {number-of-stripes} 
   { [components-per-stripe] | [component-names]…}

Creating a Concatenation of One Slice:

 # metainit d25 1 1 c0t1d0s2
 d25: Concat/Stripe is setup

This example shows the creation of a concatenation, d25, that consists of one stripe (the first number 1) made of a single slice (the second number 1 in front of the slice). The system verifies that the volume has been set up.

Note – This example shows a concatenation that can safely encapsulate existing data.

Creating a concatenated volume with 2 stripes:

 # metainit d20 2 1 c0t1d0s3 1 c0t1d0s4
 d20: Concat/Stripe is setup
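
The argument pattern for a concatenation is the stripe count followed by "1 slice" for each single-slice stripe. The sketch below only builds and prints the command line so that it can be reviewed before being run; concat_cmd is a hypothetical helper, not a Solaris Volume Manager tool.

```shell
# concat_cmd VOLUME SLICE [SLICE ...]
# Prints (does not execute) the metainit command for a concatenation
# in which every stripe consists of exactly one slice.
concat_cmd() {
    vol=$1
    shift
    cmd="metainit $vol $#"         # $# is now the number of stripes
    for slice in "$@"; do
        cmd="$cmd 1 $slice"
    done
    echo "$cmd"
}

concat_cmd d20 c0t1d0s3 c0t1d0s4
```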

Expanding a nonmetadevice slice Filesystem:

 # metainit d25 2 1 c0t1d0s2 1 c0t2d0s2
 d25: Concat/Stripe is setup

This example shows the creation of a concatenation called d25 out of two slices, /dev/dsk/c0t1d0s2 (which contains a file system) and /dev/dsk/c0t2d0s2. The file system must first be unmounted.

Caution – The first slice in the metainit command must be the slice that contains the file system. If not, you will corrupt your data.

Expanding an existing RAID 0 volume Filesystem:

01. Concatenate existing stripes using metattach command:

 # metattach d20 c0t1d0s5 
 d20: component is attached

02. Extend the filesystem

 # growfs -M /mnt  /dev/md/rdsk/d20


Creating a simple mirror from new partitions

1. Create two stripes for two submirrors as d21 & d22

 # metainit d21 1 1 c0t0d0s2 
 d21: Concat/Stripe is setup 

 # metainit d22 1 1 c1t0d0s2
 d22: Concat/Stripe is setup 

2. Create a mirror device (d20) using one of the submirrors (d21)

 # metainit   d20   -m   d21 
 d20: Mirror is setup 

3. Attach the second submirror (d22) to the main mirror device (d20)

 # metattach d20 d22
 d20: Submirror d22 is attached

4. Make file system on new metadevice

 # newfs /dev/md/rdsk/d20

Edit /etc/vfstab to mount /dev/md/dsk/d20 on a mount point.
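
The four steps above always follow the same order: build both submirrors, create a one-way mirror, attach the second submirror, then make the file system. A minimal dry-run sketch that prints this sequence for review is shown below; mirror_plan is a hypothetical helper and it executes nothing.

```shell
# mirror_plan MIRROR SUBMIRROR1 SLICE1 SUBMIRROR2 SLICE2
# Prints (does not execute) the commands for a two-way mirror
# built from new, empty slices.
mirror_plan() {
    mirror=$1; sub1=$2; slice1=$3; sub2=$4; slice2=$5
    echo "metainit $sub1 1 1 $slice1"
    echo "metainit $sub2 1 1 $slice2"
    echo "metainit $mirror -m $sub1"     # one-way mirror first
    echo "metattach $mirror $sub2"       # second half resyncs afterwards
    echo "newfs /dev/md/rdsk/$mirror"
}

mirror_plan d20 d21 c0t0d0s2 d22 c1t0d0s2
```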


Mirroring Partitions with data which can be unmounted

 # metainit -f d1 1 1 c1t0d0s0
 d1: Concat/Stripe is setup 

 # metainit d2 1 1 c2t0d0s0 
 d2: Concat/Stripe is setup 

 # metainit d0 -m d1 
 d0: Mirror is setup 

 # umount /local 

(Edit the /etc/vfstab file so that the file system references the mirror)

 # mount /local
 # metattach d0 d2
 d0: Submirror d2 is attached 


Mirroring Partitions with data which can not be unmounted - root and /usr

Mirroring /usr filesystem

 # metainit -f d12 1 1 c0t3d0s6 
 d12: Concat/Stripe is setup 

 # metainit d22 1 1 c1t0d0s6 
 d22: Concat/Stripe is setup 

 # metainit d2 -m d12 
 d2: Mirror is setup 

(Edit the /etc/vfstab file so that /usr references the mirror)

 # reboot 
 ... 
 ... 

 # metattach d2 d22 
 d2: Submirror d22 is attached 

Mirroring / (root) filesystem

 # metainit -f d11 1 1 c0t3d0s0 
 d11: Concat/Stripe is setup 

 # metainit d12 1 1 c1t3d0s0 
 d12: Concat/Stripe is setup 

 # metainit d10 -m d11 
 d10: Mirror is setup 

 # metaroot d10 
 # lockfs -fa 
 # reboot 
 … 
 … 
 # metattach d10 d12 
 d10: Submirror d12 is attached 

Make Mirrored disk bootable

 # installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t3d0s0

Create alternate name for Mirrored boot disk

a.) Find physical path name for the second boot disk

 # ls -l /dev/rdsk/c1t3d0s0
 lrwxrwxrwx 1 root root 55 Sep 12 11:19 /dev/rdsk/c1t3d0s0 ->
 ../../devices/sbus@1,f8000000/esp@1,200000/sd@3,0:a

b.) Create an alias for booting from disk2

 ok> nvalias bootdisk2 /sbus@1,f8000000/esp@1,200000/sd@3,0:a
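
The path given to nvalias is the symbolic link target with the leading ../../devices removed. A small sketch of that text transformation follows; obp_path is a hypothetical helper name, and the input line is the listing from step a.) joined onto one line.

```shell
# obp_path: reads "ls -l" output for a /dev/rdsk link on stdin and
# prints the link target with the ../../devices prefix stripped.
obp_path() {
    sed -n 's|.*-> *\.\./\.\./devices||p'
}

# On a live system: ls -l /dev/rdsk/c1t3d0s0 | obp_path
echo 'lrwxrwxrwx 1 root root 55 Sep 12 11:19 /dev/rdsk/c1t3d0s0 -> ../../devices/sbus@1,f8000000/esp@1,200000/sd@3,0:a' | obp_path
```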


Creating RAID 5

The system must contain at least three state database replicas before you can create RAID5 metadevices.

A RAID5 metadevice can tolerate only a single slice failure. A RAID5 metadevice can be grown by concatenating additional slices to the metadevice. The new slices do not store parity information; however, they are parity protected. The resulting RAID5 metadevice continues to tolerate a single slice failure. Creating a RAID5 metadevice from a slice that contains an existing file system will erase the data during the RAID5 initialization process. The interlace value is key to RAID5 performance. It is configurable at the time the metadevice is created; thereafter, the value cannot be modified. The default interlace value is 16 Kbytes, which is reasonable for most applications.

A RAID level 5 metadevice is defined using the -r option with an interlace size of 20 Kbytes. The data and parity segments are striped across the slices c1t1d0s2, c1t2d0s2, and c1t3d0s2.

To set up RAID5 on three slices of different disks:

 # metainit d10 -r c1t1d0s2 c1t2d0s2 c1t3d0s2 -i 20k
 d10: RAID is setup
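
For planning purposes, the usable size of such a metadevice can be estimated with standard RAID 5 arithmetic: one slice's worth of space goes to parity, and the smallest slice limits every other slice. This rule is an assumption based on general RAID 5 behavior, not a figure stated in this document, and raid5_capacity is a hypothetical helper.

```shell
# raid5_capacity SIZE1 SIZE2 SIZE3 [...]
# Estimates usable capacity as (number of slices - 1) * smallest slice.
raid5_capacity() {
    n=$#
    min=$1
    for size in "$@"; do
        if [ "$size" -lt "$min" ]; then
            min=$size
        fi
    done
    echo $(( (n - 1) * min ))
}

# Three slices of 100, 120 and 90 units:
raid5_capacity 100 120 90
```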


To remove a Volume:

01. Unmount the filesystem

      # umount /mnt

02. Remove the volume using ‘metaclear’ command.

      # metaclear d20
      d20: Concat/Stripe is cleared

HotSpare Pool

A hot spare pool is a collection of slices reserved by DiskSuite to be automatically substituted in case of a slice failure in either a submirror or a RAID5 metadevice. A hot spare cannot be a metadevice, and a hot spare pool can be associated with multiple submirrors or RAID5 metadevices. However, a submirror or RAID5 metadevice can be associated with only one hot spare pool. Replacement is based on a first fit for the failed slice, and the failed slices then need to be replaced with repaired or new slices. Hot spare pools may be allocated, deallocated, or reassigned at any time unless a slice in the hot spare pool is being used to replace a damaged slice of its associated metadevice.
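
The first-fit rule can be illustrated with a short sketch: walk the pool in order and take the first spare that is large enough for the failed slice. Treating "fit" as "at least as large" is an assumption made for this example, and first_fit, the slice names, and the sizes are hypothetical.

```shell
# first_fit NEEDED_BLOCKS NAME:SIZE [NAME:SIZE ...]
# Prints the first spare, in pool order, whose size covers the need.
first_fit() {
    need=$1
    shift
    for spare in "$@"; do
        name=${spare%%:*}
        size=${spare##*:}
        if [ "$size" -ge "$need" ]; then
            echo "$name"
            return 0
        fi
    done
    return 1                       # no spare in the pool is large enough
}

first_fit 2000 c2t0d0s2:1000 c3t0d0s2:4000 c4t0d0s2:2000
```

Because selection follows pool order rather than best fit, a large spare listed early can be consumed by a small failed slice; ordering spares from smallest to largest avoids that.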


Associating a Hot Spare Pool with Submirrors

 # metaparam -h     hsp100     d10
 # metaparam -h     hsp100     d11
 # metastat d0
 d0: Mirror
 Submirror 0: d10
 State: Okay
 Submirror 1: d11
 State: Okay
 ...
 d10: Submirror of d0
 State: Okay
 Hot spare pool: hsp100
 ...
 d11: Submirror of d0
 State: Okay
 Hot spare pool: hsp100


Associating or changing a Hot Spare Pool with a RAID5 Metadevice

 # metaparam     -h    hsp001     d10
 # metastat     d10
 d10:RAID
 State: Okay
 Hot spare Pool: hsp001


Adding a Hot Spare Slice to All Hot Spare Pools

 # metahs   -a    -all    /dev/dsk/c3t0d0s2
 hsp001: Hotspare is added
 hsp002: Hotspare is added
 hsp003: Hotspare is added

Creating a Trans Meta Device

Trans metadevices enable UFS logging. A trans metadevice consists of a logging device and a master device: all file system changes are written to the logging device and then posted to the master device. This greatly reduces the fsck time for very large file systems, as fsck has to check only the logging device, which is usually at most 64 Mbytes in size. The logging device should preferably be mirrored and located on a different drive and controller than the master device.

UFS logging cannot be used for the root partition.


Trans Metadevice for a File System That Can Be Unmounted

For /home2 Filesystem

1. Setup metadevice

 # umount /home2
 # metainit d63 -t c0t2d0s2 c2t2d0s1
 d63: Trans is setup

Logging becomes effective for the file system when it is remounted.

2. Change vfstab entry & reboot

 from
 /dev/dsk/c0t2d0s2 /dev/rdsk/c0t2d0s2 /home2 ufs 2 yes -
 to
 /dev/md/dsk/d63 /dev/md/rdsk/d63 /home2 ufs 2 yes -

 # mount /home2

The next reboot displays the following message for the logging device:

 # reboot
 ...
 /dev/md/rdsk/d63: is logging


Trans Metadevice for a File System That Cannot Be Unmounted

For /usr Filesystem

1.) Setup metadevice

 # metainit -f d20 -t c0t3d0s6 c1t2d0s1
 d20: Trans is setup

2.) Change vfstab entry & reboot:

 from
 /dev/dsk/c0t3d0s6     /dev/rdsk/c0t3d0s6   /usr   ufs      1     no     -
 to
 /dev/md/dsk/d20      /dev/md/rdsk/d20     /usr    ufs     1     no     -

 # reboot
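
The vfstab change is a plain substitution of the first two fields. A minimal sketch that applies it to text on standard input is shown below, so the result can be inspected before /etc/vfstab itself is touched. vfstab_swap is a hypothetical helper; note that awk rewrites a matching line with single spaces between fields.

```shell
# vfstab_swap OLD_BLOCK OLD_RAW NEW_BLOCK NEW_RAW
# Filters vfstab text on stdin, replacing the block and raw device
# fields of the entry that matches the old device pair.
vfstab_swap() {
    awk -v oldb="$1" -v oldr="$2" -v newb="$3" -v newr="$4" '
        $1 == oldb && $2 == oldr { $1 = newb; $2 = newr }
        { print }
    '
}

echo '/dev/dsk/c0t3d0s6 /dev/rdsk/c0t3d0s6 /usr ufs 1 no -' |
    vfstab_swap /dev/dsk/c0t3d0s6 /dev/rdsk/c0t3d0s6 \
                /dev/md/dsk/d20 /dev/md/rdsk/d20
```

Writing the filtered output to a scratch file and comparing it with the original is a cautious way to confirm that only the intended entry changed.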


TransMeta device using Mirrors

1.) Setup metadevice

 # umount /home2
 # metainit d64 -t d30 d12
 d64: Trans is setup

2.) Change vfstab entry & reboot:

 from
 /dev/md/dsk/d30 /dev/md/rdsk/d30 /home2 ufs 2 yes -
 to
 /dev/md/dsk/d64 /dev/md/rdsk/d64 /home2 ufs 2 yes -

Cookbooks

Recovering from a Boot Disk Failure with Solstice on Solaris

Recovering from a failed boot disk is not a very difficult procedure using Solstice DiskSuite when the system has been properly set up and documented initially. The first step is to identify which piece of hardware failed. In this example it will be the boot disk, which is /dev/dsk/c0t0d0. Once the failed disk has been identified, it is important to boot the system from the second half of the mirror before the failed device is replaced.

1. Boot from the secondary disk

  1. Identify the failed disk (the examples in this document use /dev/dsk/c0t0d0).
  2. Boot from the secondary disk before replacing the failed disk:
             ok boot altdisk

If no alternate boot alias is available, try to boot off of one of the built-in alternate devices provided in the Open Boot PROM. These are numbered disk0 through disk6 and generally only apply to disks on the system's internal SCSI controller. If all else fails, use probe-scsi-all to determine the device path to the secondary disk, and then create an alias from which to boot.

2. When the system comes up, it will complain about the stale database replicas and will only allow booting into single-user mode. In single-user mode, use the metadb command without any arguments to list the replicas that have failed. Delete the stale replicas using metadb -d:

  # metadb
        flags           first blk       block count
     a m  p  luo        16              1034            /dev/dsk/c0t0d0s7
     a    p  luo        1050            1034            /dev/dsk/c0t0d0s7
     a    p  luo        2084            1034            /dev/dsk/c0t0d0s7
     a m  p  luo        16              1034            /dev/dsk/c0t1d0s7
     a    p  luo        1050            1034            /dev/dsk/c0t1d0s7
     a    p  luo        2084            1034            /dev/dsk/c0t1d0s7

 # metadb -d /dev/dsk/c0t0d0s7

3. Shut down the system and replace the failed disk

4. Partition the replacement disk identically to its mirror. Use prtvtoc to print out the volume table of contents (VTOC) of the good disk, and then use the fmthard command to write the table to the new disk:

 # prtvtoc /dev/rdsk/c0t1d0s2 > /tmp/format.out
 # fmthard -s /tmp/format.out /dev/rdsk/c0t0d0s2
  or
 # prtvtoc -h /dev/rdsk/c0t1d0s2 | fmthard -s - /dev/rdsk/c0t0d0s2

Example prtvtoc -h output

 # prtvtoc -h /dev/rdsk/c0t1d0s2
       0      0    00          0   5121944   5121943
       1      0    00    5121944   3072224   8194167
       2      5    00          0  35368272  35368271
       3      0    00    8194168   4099440  12293607
       4      0    00   12293608   7171664  19465271
       5      0    00   19465272  15794624  35259895
       7      0    00   35259896    108376  35368271
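
In the prtvtoc -h listing, the last three columns are first sector, sector count, and last sector, so each row should satisfy first + count - 1 = last. The sketch below checks that relation on text from standard input before the table is handed to fmthard. vtoc_check is a hypothetical helper, and the column meaning is inferred from the sample output above.

```shell
# vtoc_check: reads "prtvtoc -h" style rows on stdin, reports rows whose
# last sector does not equal first sector + sector count - 1, and
# returns a non-zero status if any such row was found.
vtoc_check() {
    awk '($4 + $5 - 1) != $6 {
             printf "slice %s: last sector should be %d\n", $1, $4 + $5 - 1
             bad = 1
         }
         END { exit bad }'
}

# Two rows copied from the example listing:
vtoc_check <<'EOF' && echo "VTOC is consistent"
       0      0    00          0   5121944   5121943
       7      0    00   35259896    108376  35368271
EOF
```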

5. Recreate the deleted database replicas with the metadb -a command:

 # metadb -a /dev/dsk/c0t0d0s7

6. Detach the failed submirrors to stop read and write operations to that half of the mirror when activity occurs on the metadevice. The detach must be forced.

In this case the mirrors are defined as:

d2: Mirror

    Submirror 0: d0
    Submirror 1: d1

d12: Mirror

    Submirror 0: d10
    Submirror 1: d11

d22: Mirror

    Submirror 0: d20
    Submirror 1: d21

d32: Mirror

    Submirror 0: d30
    Submirror 1: d31

d42: Mirror

    Submirror 0: d40
    Submirror 1: d41


 # metadetach -f d2 d0  (/ filesystem)
 # metadetach -f d12 d10  (swap partition)
 # metadetach -f d22 d20  (/var filesystem)
 # metadetach -f d32 d30  (/export/home filesystem)
 # metadetach -f d42 d40  (/u01 filesystem)

7. Clear the failed submirrors:

 # metaclear d0
 # metaclear d10
 # metaclear d20
 # metaclear d30
 # metaclear d40

8. Recreate the submirrors using the metainit command:

 # metainit d0 1 1 c0t0d0s0
 # metainit d10 1 1 c0t0d0s1
 # metainit d20 1 1 c0t0d0s3
 # metainit d30 1 1 c0t0d0s4
 # metainit d40 1 1 c0t0d0s5

9. Reattach the submirrors using the metattach command:

 # metattach d2 d0
 # metattach d12 d10
 # metattach d22 d20
 # metattach d32 d30
 # metattach d42 d40

The submirrors will automatically start to resynchronize.
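
Steps 6 through 9 repeat the same four commands for every mirror, so the sequence can be generated from a small table of mirror, failed submirror, and replacement slice. The following dry-run sketch only prints the commands for review; recovery_plan and its input format are hypothetical.

```shell
# recovery_plan: reads "MIRROR SUBMIRROR SLICE" rows on stdin and prints
# (does not execute) the detach, clear, init and attach commands,
# grouped by step as in the procedure above.
recovery_plan() {
    table=$(cat)
    printf '%s\n' "$table" | while read -r m s c; do echo "metadetach -f $m $s"; done
    printf '%s\n' "$table" | while read -r m s c; do echo "metaclear $s"; done
    printf '%s\n' "$table" | while read -r m s c; do echo "metainit $s 1 1 $c"; done
    printf '%s\n' "$table" | while read -r m s c; do echo "metattach $m $s"; done
}

recovery_plan <<'EOF'
d2 d0 c0t0d0s0
d12 d10 c0t0d0s1
d22 d20 c0t0d0s3
d32 d30 c0t0d0s4
d42 d40 c0t0d0s5
EOF
```

Keeping such a table alongside the system documentation also records which slice backs which submirror, which is exactly the information needed during a recovery.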

10. Monitor resynchronization using the metastat command. (Resynchronization time depends on the amount of data on the partitions):

 # metastat | more

11. Check the dump device:

 # dumpadm -d swap

12. After resynchronization completes, reboot the system:

 # init 6

Ensure that the system is now properly booting from its primary boot disk and that everything is operating normally.