Rebuilding a RAID10 array [Software RAID on Linux]
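It all started with a misunderstanding, but I ended up rebuilding the RAID10 array on my file server.
The misunderstanding went like this.
Looking through dmesg, I saw: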
[ 7.130124] raid10: raid set md0 active with 4 out of 4 devices
[ 7.130151] md0: detected capacity change from 0 to 4000797556736
[ 7.131405] md0:
[ 7.133764] md: raid6 personality registered for level 6
[ 7.133767] md: raid5 personality registered for level 5
[ 7.133769] md: raid4 personality registered for level 4
[ 7.134502] unknown partition table
Spotting the "unknown partition table" complaint, I immediately checked the disk:
# fdisk /dev/sda
WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util fdisk doesn't support GPT. Use GNU Parted.
The number of cylinders for this disk is set to 243201.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): p
Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000
Device Boot Start End Blocks Id System
/dev/sda1 1 243202 1953514583+ ee GPT
# parted /dev/sda
GNU Parted 1.8.8.1.159-1e0e
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: ATA Hitachi HDS72202 (scsi)
Disk /dev/sda: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number Start End Size File system Name Flags
1 17.4kB 2000GB 2000GB
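As it turned out, it would have been fine to leave things as they were, but I backed up the file server and went ahead and changed the partition's system type.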
# fdisk /dev/sda
WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util fdisk doesn't support GPT. Use GNU Parted.
The number of cylinders for this disk is set to 243201.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): print
Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000
Device Boot Start End Blocks Id System
/dev/sda1 1 243202 1953514583+ ee GPT
Command (m for help): t
Selected partition 1
Hex code (type L to list codes): L
0 Empty 24 NEC DOS 81 Minix / old Lin bf Solaris
1 FAT12 39 Plan 9 82 Linux swap / So c1 DRDOS/sec (FAT-
2 XENIX root 3c PartitionMagic 83 Linux c4 DRDOS/sec (FAT-
3 XENIX usr 40 Venix 80286 84 OS/2 hidden C: c6 DRDOS/sec (FAT-
4 FAT16 <32M 41 PPC PReP Boot 85 Linux extended c7 Syrinx
5 Extended 42 SFS 86 NTFS volume set da Non-FS data
6 FAT16 4d QNX4.x 87 NTFS volume set db CP/M / CTOS / .
7 HPFS/NTFS 4e QNX4.x 2nd part 88 Linux plaintext de Dell Utility
8 AIX 4f QNX4.x 3rd part 8e Linux LVM df BootIt
9 AIX bootable 50 OnTrack DM 93 Amoeba e1 DOS access
a OS/2 Boot Manag 51 OnTrack DM6 Aux 94 Amoeba BBT e3 DOS R/O
b W95 FAT32 52 CP/M 9f BSD/OS e4 SpeedStor
c W95 FAT32 (LBA) 53 OnTrack DM6 Aux a0 IBM Thinkpad hi eb BeOS fs
e W95 FAT16 (LBA) 54 OnTrackDM6 a5 FreeBSD ee GPT
f W95 Ext'd (LBA) 55 EZ-Drive a6 OpenBSD ef EFI (FAT-12/16/
10 OPUS 56 Golden Bow a7 NeXTSTEP f0 Linux/PA-RISC b
11 Hidden FAT12 5c Priam Edisk a8 Darwin UFS f1 SpeedStor
12 Compaq diagnost 61 SpeedStor a9 NetBSD f4 SpeedStor
14 Hidden FAT16 <3 63 GNU HURD or Sys ab Darwin boot f2 DOS secondary
16 Hidden FAT16 64 Novell Netware af HFS / HFS+ fb VMware VMFS
17 Hidden HPFS/NTF 65 Novell Netware b7 BSDI fs fc VMware VMKCORE
18 AST SmartSleep 70 DiskSecure Mult b8 BSDI swap fd Linux raid auto
1b Hidden W95 FAT3 75 PC/IX bb Boot Wizard hid fe LANstep
1c Hidden W95 FAT3 80 Old Minix be Solaris boot ff BBT
1e Hidden W95 FAT1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)
Command (m for help): print
Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000
Device Boot Start End Blocks Id System
/dev/sda1 1 243202 1953514583+ fd Linux raid autodetect
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.
Hmm, a warning of some kind...
I did the same for /dev/sdb, /dev/sdc and /dev/sdd.
After rebooting, the dmesg line
[ 7.134502] unknown partition table
was indeed gone, but running parted gave this:
# parted /dev/sda
GNU Parted 1.8.8.1.159-1e0e
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Warning: /dev/sda contains GPT signatures, indicating that it has a GPT table. However, it does not have a valid
fake msdos partition table, as it should. Perhaps it was corrupted -- possibly by a program that doesn't understand
GPT partition tables. Or perhaps you deleted the GPT table, and are now using an msdos partition table. Is this a
GPT partition table?
Yes/No?
So now parted prints a warning as well.
It looks like changing the system type with fdisk, which only supports MBR, left the GPT in a broken state.
It also seems that md does not recognize GPT correctly.
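To avoid this kind of mix-up, it may help to check which partition table a disk carries before choosing a tool. The following is a minimal sketch, not what I actually ran: `partition_table_type` and `suggest_tool` are hypothetical helper names, and the parsing assumes the `Partition Table:` line that parted printed in the session above.

```shell
# Print the partition table type from `parted ... print` output on stdin.
# Reading stdin means the function can be tried without touching a disk.
partition_table_type() {
    awk -F': *' '/^Partition Table:/ { print $2; exit }'
}

# Suggest a partitioning tool for a given table type (hypothetical helper).
suggest_tool() {
    case "$1" in
        gpt)   echo "parted" ;;
        msdos) echo "fdisk" ;;
        *)     echo "unknown" ;;
    esac
}

# Real use would look like this (needs root):
#   suggest_tool "$(parted -s /dev/sda print | partition_table_type)"
```

On a disk whose GPT is already damaged, parted may ask a question instead of printing the table, so the result should be treated as a hint only.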
Oh well. It had been working fine before, so I recreated the GPT with parted.
Continuing the session above:
(parted) mklabel
Warning: The existing disk label on /dev/sda will be destroyed and all data on this disk will be lost. Do you want to
continue?
Yes/No? yes
New disk label type? [gpt]? gpt
(parted) print
Model: ATA Hitachi HDS72202 (scsi)
Disk /dev/sda: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number Start End Size File system Name Flags
(parted) mkpart
Partition name? []? [Enter]
File system type? [ext2]? [Enter]
Start? 0
End? 2000GB
(parted) print
Model: ATA Hitachi HDS72202 (scsi)
Disk /dev/sda: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number Start End Size File system Name Flags
1 17.4kB 2000GB 2000GB
(parted) set 1 raid on
(parted) print
Model: ATA Hitachi HDS72202 (scsi)
Disk /dev/sda: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number Start End Size File system Name Flags
1 17.4kB 2000GB 2000GB raid
(parted) quit
Information: You may need to update /etc/fstab.
I repeated this for sdb, sdc and sdd.
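Repeating the interactive session by hand is tedious, so here is a rough sketch of how the same steps could be scripted. It only prints the commands instead of running them. The function name is made up, and `-s` (script mode) with `0% 100%` as start and end is my assumption about a reasonable non-interactive equivalent; the session above used `0` and `2000GB`, and an old parted such as 1.8.8 may behave differently.

```shell
# Print (do not run) the parted commands for each member disk.
# Assumption: one whole-disk partition per drive, flagged for RAID.
print_parted_commands() {
    for disk in "$@"; do
        echo "parted -s /dev/$disk mklabel gpt"
        echo "parted -s /dev/$disk mkpart primary 0% 100%"
        echo "parted -s /dev/$disk set 1 raid on"
    done
}

print_parted_commands sdb sdc sdd
```

On GPT the word `primary` ends up as the partition name, not a partition type. Review the printed lines before pasting them into a root shell, since `mklabel` discards the existing partition table.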
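After rebooting, the message shows up again after all. Note the md0: prefix: the message refers to the array device itself, which holds a filesystem directly and has no partition table, so it is harmless.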
[ 7.145004] md0: unknown partition table
Then I mounted the filesystem and benchmarked it.
# bonnie++ -u root -b -d /st1/nao/tmp
Version 1.03c ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
fs 4G 70546 93 206169 34 71823 19 56805 89 225009 27 182.7 0
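Having come this far, I decided to also recreate the array with different RAID creation options.
Recreating the array takes the following steps:
Unmount the filesystem
Stop the array
Zero the superblocks
Create the array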
# umount /st1
# mdadm --misc --stop /dev/md0
# mdadm --misc --zero-superblock /dev/sd[abcd]1
# mdadm --create /dev/md0 -v --raid-devices=4 --level=raid10 --chunk=128 /dev/sd[abcd]1
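Since `--zero-superblock` wipes the RAID metadata, it seems worth confirming that the array is really stopped first. A small sketch, assuming the usual /proc/mdstat line format `md0 : active raid10 ...`; `md_is_active` is a hypothetical helper name.

```shell
# Succeeds if the named array is listed as active in mdstat-style text on stdin.
md_is_active() {
    grep -q "^$1 : active"
}

# Real use:
#   if md_is_active md0 < /proc/mdstat; then
#       echo "md0 is still running; stop it before zeroing superblocks" >&2
#   fi
```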
The default chunk size is 64 KB, but I tried 128 KB.
Once the resync finished, I formatted the array.
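One way to tell whether the resync is still running is to look for a progress line in /proc/mdstat. The sketch below assumes the usual `resync = 12.3%` style line (or `resync=DELAYED`); `md_is_syncing` is a hypothetical helper name. Alternatively, `mdadm --wait /dev/md0` blocks until the resync is done.

```shell
# Succeeds while mdstat-style text on stdin shows a resync or recovery
# that is running or queued.
md_is_syncing() {
    grep -Eq '(resync|recovery) *='
}

# Real use: poll once a minute until the array is in sync.
#   while md_is_syncing < /proc/mdstat; do sleep 60; done
```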
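Since the chunk size is now 128 KB, I specified stride = 128 and stripe-width = 512. (Note that mke2fs interprets both values as counts of filesystem blocks rather than KB, so these numbers are probably larger than intended.)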
# mke2fs -t ext4 -m 0 -E stride=128,stripe-width=512 /dev/md0
mke2fs 1.41.9 (22-Aug-2009)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
244195328 inodes, 976757184 blocks
0 blocks (0.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
29809 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 33 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
The filesystem was automatically created with a block size of 4096 bytes.
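For reference, here is how the `-E` values could be derived from the block size reported above. According to the mke2fs man page, stride is the chunk size expressed in filesystem blocks, and stripe-width is stride multiplied by the number of data-bearing disks. The sketch assumes that a 4-disk RAID10 with two copies of each block has 2 data-bearing disks per stripe; `ext4_raid_geometry` is a hypothetical helper name.

```shell
# Compute mke2fs -E values from the RAID geometry.
#   $1 = md chunk size in KiB
#   $2 = filesystem block size in bytes
#   $3 = number of data-bearing disks per stripe (assumed 2 for this array)
ext4_raid_geometry() {
    chunk_kib=$1
    block_bytes=$2
    data_disks=$3
    stride=$(( chunk_kib * 1024 / block_bytes ))
    stripe_width=$(( stride * data_disks ))
    echo "stride=$stride,stripe-width=$stripe_width"
}

ext4_raid_geometry 128 4096 2   # prints stride=32,stripe-width=64
```

I have not benchmarked these values, so whether they would beat the defaults on this machine is an open question.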
I mounted it and ran the benchmark again:
# bonnie++ -u root -b -d /st1/nao
Version 1.03c ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
fs 4G 68201 97 206894 36 71065 18 57637 85 147445 19 169.3 0
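Read throughput dropped noticeably: sequential block read fell from 225009 K/sec to 147445 K/sec.
So I reformatted once more, this time without specifying stride = 128, stripe-width = 512.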
# mke2fs -t ext4 -m 0 /dev/md0
mke2fs 1.41.9 (22-Aug-2009)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
244195328 inodes, 976757184 blocks
0 blocks (0.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
29809 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 37 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
And once more:
# bonnie++ -u root -b -d /st1/nao
Version 1.03c ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
fs 4G 73175 97 204599 37 71022 19 60744 88 163908 22 168.0 0
...and read throughput improved slightly (block read: 163908 K/sec).
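To put numbers on "dropped" and "improved slightly", the block read figures from the three bonnie++ runs can be compared as percentages. A small sketch; `pct_change` is a hypothetical helper and the inputs are the K/sec values from the tables above.

```shell
# Print the relative change from $1 to $2 as a signed percentage.
pct_change() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%+.1f%%\n", (b - a) / a * 100 }'
}

pct_change 225009 147445   # original array -> chunk 128 with stride options: -34.5%
pct_change 147445 163908   # with stride options -> without them: +11.2%
```

Even after dropping the stride options, block read stays well below the first run on the original array.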
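So the defaults are the best choice after all?
This is a pattern I keep running into lately:
rather than fiddling with this and that, "the defaults are optimal".
So I am recreating the array once more with chunk size = 64.
I really did want to try the layout option too, though...
Machine specs for the benchmark
CPU: Intel E3200
HDD: HGST HDS722020ALA330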
Building a RAID array with Linux software RAID.