LRAID cmds

$ ssh lraid5
Last login: Sun Jul  3 09:14:04 2011 from totoro
Oracle Corporation      SunOS 5.11      snv_151a        November 2010
lundman@solaris:~$  sudo bash
Password:
root@solaris:/home/lundman# uname -a
SunOS solaris 5.11 snv_151a i86pc i386 i86pc Solaris

Drives attached to the system:

# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
sata1/0::dsk/c10t0d0           disk         connected    configured   ok
sata1/1::dsk/c10t1d0           disk         connected    configured   ok
sata1/2::dsk/c10t2d0           disk         connected    configured   ok
sata1/3::dsk/c10t3d0           disk         connected    configured   ok
sata1/4::dsk/c10t4d0           disk         connected    configured   ok
sata1/5                        sata-port    empty        unconfigured ok

(Ok, there should be a disk in sata1/5, but the kids probably popped it out)


There are two parts to ZFS. The "pool" of storage, and "file-systems". All pool commands are done with "zpool" and all filesystem commands are done with "zfs".


List your pools; initially you will only have the boot pool:

# zpool list
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
rpool  29.8G  5.46G  24.3G    18%  1.00x  ONLINE  -
mypool 4.53T  1.39T  3.14T    30%  1.00x  DEGRADED  -

The default boot pool is called "rpool" in Solaris; you can change the name if you want.


Check out the status of your zpools:

# zpool status
 pool: rpool
state: ONLINE
scan: none requested
config:

       NAME         STATE     READ WRITE CKSUM
       rpool        ONLINE       0     0     0
         c10t0d0s0  ONLINE       0     0     0

errors: No known data errors

 pool: mypool
state: DEGRADED
status: One or more devices has been removed by the administrator.
       Sufficient replicas exist for the pool to continue functioning in a
       degraded state.
action: Online the device using 'zpool online' or replace the device with
       'zpool replace'.
scan: resilvered 27.5G in 0h21m with 0 errors on Thu Jun 30 16:19:20 2011
config:

       NAME                        STATE     READ WRITE CKSUM
       mypool                      DEGRADED     0     0     0
         raidz1-0                  DEGRADED     0     0     0
           c10t5d0                 REMOVED      0     0     0
           c10t4d0                 ONLINE       0     0     0
           c10t3d0                 ONLINE       0     0     0
           c10t2d0                 ONLINE       0     0     0
           c10t1d0                 ONLINE       0     0     0
       logs
         /dev/zvol/dsk/rpool/slog  ONLINE       0     0     0

errors: No known data errors


As you can see, "rpool" is the boot pool (root pool), on disk "c10t0d0s0" (controller 10, target 0, device 0, slice 0). "Slice" is the Solaris word for "partition". The device part is no longer really used; it is a legacy thing from SCSI days. "rpool" is a single-disk pool.

My main data pool, "mypool", is missing a disk, but is otherwise fine lookin'. Finally, the "slog" of "mypool" (the transaction log in ZFS) is stored on SSD for speed. This is a "raidz1" (raid 5) setup: I can lose one disk without data loss (20% goes to parity). Use "raidz2" if you want to be able to lose 2 HDDs without loss (40% to parity), and so on. You can also "mirror" (raid 1).
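
For illustration (the pool and device names here are made up), a raidz2 pool or a simple mirror would be created with something like:

# zpool create bigpool raidz2 c10t1d0 c10t2d0 c10t3d0 c10t4d0 c10t5d0 c10t6d0
# zpool create smallpool mirror c10t1d0 c10t2d0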


ZFS has two kinds of filesystems: the regular ZFS filesystem, and the "Volume". The latter creates a virtual block device (a fake disk) which you can then use as a regular disk, i.e. format it "ext3" if you really want to, mount it, etc. In this case, we create a Volume for "swap", and one for "slog". The size of both ZFS filesystems and Volumes can be changed on the fly.
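
As a quick sketch (the Volume name and size are made up): create a 10G Volume, and since this is Solaris, put a UFS filesystem on it rather than ext3, just to show it behaves like a real disk:

# zfs create -V 10G mypool/testvol
# newfs /dev/zvol/rdsk/mypool/testvol
# mount /dev/zvol/dsk/mypool/testvol /mnt

(newfs wants the raw device under /dev/zvol/rdsk/, mount wants the block device under /dev/zvol/dsk/)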


Listing your filesystems:

# zfs list
NAME                        USED  AVAIL  REFER  MOUNTPOINT
rpool                      8.04G  21.2G    93K  /rpool
rpool/ROOT                 2.84G  21.2G    31K  legacy
rpool/ROOT/realtek         2.84G  21.2G  2.71G  /
rpool/ROOT/solaris         2.04M  21.2G  2.52G  /
rpool/dump                 1019M  21.2G  1019M  -
rpool/export                141M  21.2G    32K  /export
rpool/export/home           141M  21.2G    32K  /export/home
rpool/export/home/lundman   141M  21.2G   141M  /export/home/lundman
rpool/slog                 2.06G  21.9G  1.36G  -
rpool/swap                    2G  23.1G   123M  -

mypool                     1.11T  2.45T  55.9K  /mypool
mypool/backup               232G  2.45T   232G  /backup
mypool/data                1.42G  2.45T  1.42G  /data
mypool/secure               906G  2.45T   906G  /media
mypool/swap                   2G  2.45T  25.6K  -
mypool/test                54.3K  2.45T  54.3K  /mypool/test


Note that a default Solaris install creates lots of little filesystems: one for each boot environment (realtek & solaris), and one each for "dump", "swap", "export", "export/home" and "export/home/lundman". But then, filesystems are really cheap in ZFS, so why not; it gives you lots of control. Traditionally filesystems were quite monolithic, whereas in ZFS they are closer to "mkdir".
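
For instance, nothing stops you from giving every user (or project) its own filesystem, much like you would mkdir (names made up):

# zfs create mypool/home
# zfs create mypool/home/alice
# zfs create mypool/home/bob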

Checking the attributes of the "rpool":

# zpool get all rpool
NAME   PROPERTY       VALUE               SOURCE
rpool  size           29.8G               -
rpool  capacity       18%                 -
rpool  altroot        -                   default
rpool  health         ONLINE              -
rpool  guid           15126656221277650189  default
rpool  version        31                  default
rpool  bootfs         rpool/ROOT/realtek  local
rpool  delegation     on                  default
rpool  autoreplace    off                 default
rpool  cachefile      -                   default
rpool  failmode       wait                default
rpool  listsnapshots  off                 default
rpool  autoexpand     off                 default
rpool  dedupditto     0                   default
rpool  dedupratio     1.00x               -
rpool  free           24.3G               -
rpool  allocated      5.46G               -
rpool  readonly       off                 -

You won't change the pool attributes that often, but in this case, "failmode" defines what the system should do if the pool has a failure. Here it is "wait" (i.e. hang until it is fixed), which is not so useful. It is often better for it to "continue" in failure mode, so we can try to fix it.

# zpool set failmode=continue rpool

"autoreplace" controls whether it should automatically replace a dead HDD with a new one, when it detects you have swapped the hardware. Why not? Unless you want to try the commands manually, of course.

"autoexpand" controls whether it should automatically grow the pool once all HDDs in the pool have been replaced with larger disks. You probably want this on too.

But we are actually looking at "rpool", the SSD, so the last two options do not really make sense here. You do want to set them for your data pool though.
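
For the data pool that would be something like:

# zpool set autoreplace=on mypool
# zpool set autoexpand=on mypool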

Likewise, ZFS filesystems (and Volumes) also have attributes; these are more fun to play with.

# zfs get all rpool
NAME   PROPERTY                        VALUE                           SOURCE
rpool  type                            filesystem                      -
rpool  creation                        Fri Dec 24  2:38 2010           -
rpool  used                            8.04G                           -
rpool  available                       21.2G                           -
rpool  referenced                      93K                             -
rpool  compressratio                   1.00x                           -
rpool  mounted                         yes                             -
rpool  quota                           none                            default
rpool  reservation                     none                            default
rpool  recordsize                      128K                            default
rpool  mountpoint                      /rpool                          default
rpool  sharenfs                        off                             default
rpool  checksum                        on                              default
rpool  compression                     off                             default
rpool  atime                           on                              default
rpool  nbmand                          off                             default
rpool  sharesmb                        off                             default
rpool  dedup                           off                             default
rpool  encryption                      off                             -

(this list is considerably larger, I just picked out some of the juicier options).

Most are self-explanatory. But note that "quota" here effectively means the "filesystem size": if it is not set (the normal case) you can create files in there until the pool is full; if it is set to, say, 100G, you can only put 100G of data in. "reservation" is how much space the fs should reserve in advance. With only a quota of 100G, that directory (user) may fill 100G, but no space is actually set aside, so if the pool runs out the user cannot actually reach 100G. If you want to pre-allocate the promised 100G, set a reservation as well. You probably don't want to; but do note we set it for "swap", since you do not want swap to be denied the space it was promised.
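
For example (the dataset name is made up), a 100G quota, optionally backed by a matching reservation:

# zfs set quota=100G mypool/home/alice
# zfs set reservation=100G mypool/home/alice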

"sharenfs" and "sharesmb" controls if it should exported as NFS, or samba. atime is good to disable (speed). "compression" is often good to have on (lessens the disk read/write and expense of CPU - alas, does nothing for media like movies).

Checking swap:

# zfs get all rpool/swap
rpool/swap  compression                     off
rpool/swap  volsize                         2G
rpool/swap  refreservation                  2G

So, swap is set to use 2G of space, pre-allocated. Note that your ZFS Volumes can be compressed: if you put an "ext3" filesystem on one, ZFS will compress the blocks underneath it, so it takes up less space. Very sexy.




You use "format" command in Solaris to play with partitions. This is legacy crap you do no longer need to bother with (Since the boot is already done). Looks like this:

# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
      0. c10t0d0 <ATA    -OCZ-AGILITY    -1.4  cyl 3889 alt 2 hd 255 sec 63>
         /pci@0,0/pci1043,8496@11/disk@0,0
      1. c10t1d0 <ATA-SAMSUNG HD103SI-1113-931.51GB>
         /pci@0,0/pci1043,8496@11/disk@1,0
      2. c10t2d0 <ATA-SAMSUNG HD103SI-1118-931.51GB>
         /pci@0,0/pci1043,8496@11/disk@2,0
      3. c10t3d0 <ATA-SAMSUNG HD103SI-1118-931.51GB>
         /pci@0,0/pci1043,8496@11/disk@3,0
      4. c10t4d0 <ATA-SAMSUNG HD103SI-1118-931.51GB>
         /pci@0,0/pci1043,8496@11/disk@4,0
Specify disk (enter its number):


Create a pool from disks:

# zpool create mypool raidz c10t1d0 c10t2d0 c10t3d0 c10t4d0 c10t5d0

You can name it anything you want; you don't need to use 'mypool'.

# zpool status mypool

Pools don't have to be on actual disks either; you can build one on fake files:

# mkfile 1G /var/tmp/fakedisk1
# zpool create playpool /var/tmp/fakedisk1


When you do it "for realz" you want slog on SSD. So create a volume for slog, usually the same size as memory.

# zfs create -V 2G rpool/slog

(-V creates a Volume instead of an fs; creating a Volume also creates a device node at /dev/zvol/dsk/$poolname/$volumename)

# zfs set refreservation=2G rpool/slog

(pre-allocate the disk space, so slog is guaranteed 2G)

# zpool add mypool log /dev/zvol/dsk/rpool/slog

(Attach the Volume slog from pool rpool, as the log device of pool mypool)


Create a file-system:

# zfs create mypool/roger
mypool/roger            2.5T   55K  2.5T   1% /mypool/roger

Don't like where it is mounted?

# zfs set mountpoint=/roger mypool/roger

(Note the dataset name, ie pool+fs, starts with pool name 'mypool', no leading slash). ZFS remounts it for you.

And remove it:

# zfs destroy mypool/roger

How you will probably create your media folder:

# zfs create -o mountpoint=/media -o atime=off -o compression=on mypool/secure
# zfs set sharenfs=rw=@192.168.11,root=@192.168.11 mypool/secure

Note I created "mypool/secure" (dataset name) but asked for it to be mounted on "/media". Dataset name vs mountpoint. Just as you could create "mypool/warez" and have it mounted as "/ftp".

# zfs list mypool/secure
mypool/secure                906G  2.45T   906G  /media

Confirm that the NFS sharing worked:

# showmount -e 127.0.0.1
export list for 127.0.0.1:
/media  @192.168.11

CAVEAT: NFS will only work if it can look up hostnames. So your A100/PC needs to live in /etc/hosts on the NAS (and most likely NAS needs to be in hosts on those machines too). Or your router returns some name for those IPs when looked up.
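
Something like this in /etc/hosts on the NAS would do (names and addresses made up):

192.168.11.10   a100
192.168.11.11   mypc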

When you set "sharenfs" Solaris will automatically share the filesystem, but you can control it manually as well.

# zfs share -a
# zfs unshare mypool/secure

But you will most likely only need "zfs share -a" if you use "encryption" on your filesystem. If you do use encryption, this is what you do after a reboot:

# zfs mount mypool/secure
Enter passphrase: $password
# zfs share mypool/secure
(Or # zfs share -a)

All ZFS attributes (or are they called properties?) can also carry user-defined values. As it happens, llink will read a property called "net.lundman:sharellink" and, if it is set to "on", will add that filesystem to its ROOTs.

# zfs set net.lundman:sharellink=on mypool/secure
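
You can read the property back with "zfs get", and clear it again with "zfs inherit":

# zfs get net.lundman:sharellink mypool/secure
# zfs inherit net.lundman:sharellink mypool/secure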


If you lose a HDD, pull the dead HDD out of the NAS and insert a replacement drive.

First you need to tell the OS that a new disk is attached:

# cfgadm -al
Ap_Id                                    Type         Receptacle   Occupant     Condition
sata1/5                                 disk         connected    unconfigured   ok
# cfgadm -c configure sata1/5

If you have "autoreplace" set on the pool, you are done. But let's assume you have not.

# zpool status
           c10t5d0                 FAILED      0     0     0
# zpool replace mypool c10t5d0 c10t5d0
# zpool status
action: Online the device using 'zpool online' or replace the device with
       'zpool replace'.
scan: resilvered 27.5G in 0h21m (27%) ETA 4 hours 12 minutes.

       NAME                        STATE     READ WRITE CKSUM
       mypool                      DEGRADED     0     0     0
           c10t5d0                 FAILED       0     0     0
           c10t5d0/1               replacing    0     0     0


Snapshots can be useful if you are about to try something new, make some changes, or upgrade the OS, or simply to protect yourself from accidentally deleting all your files.

# zfs list mypool/secure
mypool/secure                906G  2.45T   906G  /media
# zfs snapshot mypool/secure@20110708

# zfs list -t all
mypool/secure                             906G  2.45T   906G  /media
mypool/secure@20110708                       0      -   906G  -

To create a snapshot, just give the dataset name (mypool/secure) followed by @snapshot-name. It can be any name, but by convention many people use the date. Note the snapshot is taking 0 bytes of space; it only records the differences. So, let's delete something:

-rwxrwxrwx  1 501 501  250974208 2007-08-22 19:15 make_big_words.avi
# rm make_big_words.avi

mypool/secure                             906G  2.45T   905G  /media
mypool/secure@20110708                    238M      -   906G  -

Deleted 250M worth of data, so the snapshot now takes up 238M.
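
If you only want one file back, you do not have to roll back the whole filesystem; every snapshot is also visible, read-only, under the hidden .zfs directory of the mountpoint, so you can just copy it out:

# cp /media/.zfs/snapshot/20110708/make_big_words.avi /media/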

If you are happy with the changes, just destroy the snapshot to release the space back.

# zfs destroy mypool/secure@20110708

Am I the only one who is unhappy that the same command 'destroy' is used for both? Since "zfs destroy mypool/secure" and "zfs destroy mypool/secure@20110708" are vastly different. Or even "mypool/secure @20110708". Oops, that space....

However, if you changed your mind, and want it back:

# zfs rollback mypool/secure@20110708
# zfs list -t all

mypool/secure                             906G  2.45T   906G  /media
mypool/secure@20110708                   1.60K      -   906G  -

# ls -l make_big_words.avi 
-rwxrwxrwx 1 501 501 250974208 2007-08-22 19:15 make_big_words.avi

# zfs destroy mypool/secure@20110708


General Solaris crap:

The old SYSV-style /etc/init.d and /etc/rc2.d/ still exist, but are no longer used. Some older, or legacy, programs may still put start/stop scripts in there; they will work.

The new system uses SMF. List all your services, and states:

# svcs -a
legacy_run     Jun_06   lrc:/etc/rc2_d/S72autoinstall
disabled       Jun_06   svc:/network/ipsec/ike:default
disabled       Jun_06   svc:/network/ftp:default
online         Jun_06   svc:/network/ssh:default

(Insert giant list here).

You can see S72autoinstall ran as legacy stuff. You have ipsec/ike, but it is disabled. And you have ssh (the daemon) and it is running just fine.

If something dies (say, sshd) the SMF system will automatically restart it. This is the main advantage of the new system.

To enable something to run:

# svcadm enable ftp

You could use "svcadm enable svc:/network/ftp:default" here, but that is a bit tedious, so as long as it is unique, you can shorten it.

# svcs ftp
STATE          STIME    FMRI
online         Jun_06   svc:/network/ftp:default

turn it off:

# svcadm disable ftp

Restarting:

# svcadm restart llink

Although, since you know it is going to be restarted automatically, you can just "pkill llink". I mean, it's rude, but works :)

If something dies too many times, it can go into the "maintenance" state:

# svcs llink
STATE          STIME    FMRI
maintenance   Jun_06   svc:/network/llink:default

Then you need to clear its state first.

# svcadm clear llink
# svcadm enable llink

Note that SMF has dependencies built in. So ssh is dependent on network/physical etc, and will only start when those run correctly:

# svcs -l ssh
dependency   require_all/none svc:/system/filesystem/local (online)
dependency   optional_all/none svc:/system/filesystem/autofs (online)
dependency   require_all/none svc:/network/loopback (online)
dependency   require_all/none svc:/network/physical (multiple)
dependency   require_all/none svc:/system/cryptosvc (online)
dependency   require_all/none svc:/system/utmp (online)
dependency   optional_all/error svc:/network/ipfilter:default (disabled)
dependency   require_all/restart file://localhost/etc/ssh/sshd_config (online)

So if something is not starting, it may not be that something is wrong with that program, but rather that a service it depends on is not running.
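
"svcs -xv" is handy here; it explains which services are not running and why, including any dependencies that are holding them back:

# svcs -xv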


There is a "hd" program that lists some bits, like your HDD temperatures:

# /data/hd

Device    Serial        Vendor   Model             Rev  Temperature
------    ------        ------   -----             ---- -----------
c10t0d0p0  43M324HP783S  ATA      OCZ-AGILITY       1.4  255 C (491 F)
c10t1d0p0  XHJDWS510824  ATA      SAMSUNG HD103SI   1113 31 C (87 F)
c10t2d0p0  XGJ90Z400082  ATA      SAMSUNG HD103SI   1118 35 C (95 F)
c10t3d0p0  XGJ90SB01390  ATA      SAMSUNG HD103SI   1118 34 C (93 F)
c10t4d0p0  XGJ90Z400080  ATA      SAMSUNG HD103SI   1118 34 C (93 F)

I assure you that the SSD is not on fire.

Power management is in /etc/power.conf

# cat /etc/power.conf
device-dependency-property removable-media /dev/fb

autopm                  enable
autoS3                  default

cpupm                   enable
cpu-threshold           1s
# Auto-Shutdown         Idle(min)       Start/Finish(hh:mm)     Behavior
autoshutdown            30              9:00 9:00               noshutdown

device-thresholds  /pci@0,0/pci8086,244e@1e/pci11ab,11ab@0/disk@0,0    10m
device-thresholds  /pci@0,0/pci8086,244e@1e/pci11ab,11ab@0/disk@1,0    10m
device-thresholds  /pci@0,0/pci8086,244e@1e/pci11ab,11ab@0/disk@2,0    10m
device-thresholds  /pci@0,0/pci8086,244e@1e/pci11ab,11ab@0/disk@3,0    10m
device-thresholds  /pci@0,0/pci8086,244e@1e/pci11ab,11ab@0/disk@4,0    10m
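
After editing /etc/power.conf the settings are not applied automatically; running pmconfig re-reads the file and applies them:

# pmconfig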



  • The package manager is called "pkg".
# pkg list
# pkg install gnu-emacs-no-x11
# pkg search python


  • llink lives in /usr/local/etc/llink/ with binaries in /usr/local/bin/