Paul Tuckfield spoke at the MySQL conference in April about optimizing YouTube.
Toward the end of the talk, Paul shifts to a mainly system-oriented focus on optimization and presents a few tips:
This entry was posted on Sunday, February 10th, 2008 at 12:38 am and is filed under Stories.
February 19, 2008 at 7:54 pm
I’d like to see you take this a bit further and talk about the different types of RAID (10, 50, 2, 3) and which ones work best for which database. Also, is there still a focus on installing the data on a DB-native filesystem (installing on raw devices), or does that not come into play much anymore?
February 19, 2008 at 8:32 pm
I’ve never used RAID2 or RAID3. The most popular RAID levels remain 1, 5
and 10.
RAID50 seems too odd a hybrid and I haven’t seen it be useful in
applications I’ve worked on.
RAID50 takes RAID5 sets of, say, 3 disks each and stripes across
them in a RAID0 configuration. That means a minimal RAID50
would be 6 disks, with two disks’ worth of capacity used for parity.
RAID5 is quite slow for writes because every write also has to update
parity, and a full-stripe write touches every disk in the array. This
becomes a real problem when your array exceeds 4 disks because you have
five or more spindles busy for each such write. With RAID50, a write
only involves the disks in one of the smaller RAID5 sets. If each disk
is 1 TB, the 6-disk RAID50 above gives you 4 TB of usable storage, with
no hot spares.
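To make the parity cost concrete, here is a minimal Python sketch of how RAID5-style XOR parity is computed and updated. It is an illustration only: the block values and the function names (parity, small_write, rebuild) are made up for this example, and real controllers work on much larger chunks and rotate parity across the disks.

```python
from functools import reduce
from operator import xor

def parity(blocks):
    """XOR all data blocks together to get the parity block."""
    return reduce(xor, blocks)

def small_write(blocks, par, index, new_value):
    """Update one data block and the parity (read-modify-write).

    This models 2 reads (old data, old parity) and 2 writes
    (new data, new parity) for a single logical write.
    """
    old_value = blocks[index]
    new_par = par ^ old_value ^ new_value
    updated = list(blocks)
    updated[index] = new_value
    return updated, new_par

def rebuild(blocks, par, lost_index):
    """Recover one lost data block from the survivors plus parity."""
    survivors = [b for i, b in enumerate(blocks) if i != lost_index]
    return reduce(xor, survivors, par)

# Hypothetical 4-bit "blocks" on 4 data disks, plus one parity block.
data = [0b1010, 0b0110, 0b1111, 0b0001]
p = parity(data)

data2, p2 = small_write(data, p, 2, 0b0000)
recovered = rebuild(data2, p2, 1)
```

The point of the sketch is that a single logical write turns into several physical I/Os, which is where the RAID5 write penalty comes from.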
You can contrast that with, say, RAID10, which is very fast and
favors protection. Using the same 6 disks described above, you’d
end up with only 3 TB, but you’d have better performance and
better protection in case of disk problems.
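The capacity numbers above are simple arithmetic, sketched here in Python. The function name usable_tb and its parameters are hypothetical, and the sketch assumes equal-sized disks, two-way mirrors for RAID10, and no hot spares.

```python
def usable_tb(level, disks, size_tb, raid5_groups=2):
    """Usable capacity in TB for a few common RAID levels.

    Assumes equal-sized disks, two-way mirrors for RAID10,
    and no hot spares.
    """
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID5 needs at least 3 disks")
        return (disks - 1) * size_tb          # one disk's worth of parity
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID10 needs an even number of disks, 4 or more")
        return (disks // 2) * size_tb         # half the disks hold mirror copies
    if level == "raid50":
        if disks % raid5_groups or disks // raid5_groups < 3:
            raise ValueError("each RAID5 set needs at least 3 disks")
        return (disks - raid5_groups) * size_tb  # one parity disk per RAID5 set
    raise ValueError(f"unsupported level: {level}")

raid50_tb = usable_tb("raid50", 6, 1)
raid10_tb = usable_tb("raid10", 6, 1)
raid5_tb = usable_tb("raid5", 6, 1)
```

With six 1 TB disks this gives 4 TB for RAID50 and 3 TB for RAID10, matching the figures in the comment, and 5 TB for a single RAID5 set.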
This is important because it’s common to find that when a disk in a RAID
fails, another disk in the same enclosure will fail soon after.
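A rough way to compare how the 6-disk layouts cope with a second failure is to count which two-disk failures each one survives. The Python sketch below is a simplified model with made-up names (survives, survivable_pairs): it treats each array as a RAID0 stripe across groups, where every group (a two-way mirror or a RAID5 set) tolerates exactly one failed disk, and it ignores rebuild time and how likely each failure actually is.

```python
from itertools import combinations

def survives(groups, failed):
    """True if no group has lost more than one disk."""
    return all(len(group & failed) <= 1 for group in groups)

def survivable_pairs(groups):
    """Return (survivable, total) counts over all two-disk failures."""
    disks = sorted(set().union(*groups))
    pairs = list(combinations(disks, 2))
    ok = sum(1 for pair in pairs if survives(groups, set(pair)))
    return ok, len(pairs)

# Six disks numbered 0..5, arranged three different ways.
raid10 = [{0, 1}, {2, 3}, {4, 5}]     # three mirrored pairs
raid50 = [{0, 1, 2}, {3, 4, 5}]       # two 3-disk RAID5 sets
raid5 = [{0, 1, 2, 3, 4, 5}]          # one 6-disk RAID5 set

raid10_result = survivable_pairs(raid10)
raid50_result = survivable_pairs(raid50)
raid5_result = survivable_pairs(raid5)
```

Under these assumptions RAID10 survives 12 of the 15 possible double failures, RAID50 survives 9, and a single RAID5 set survives none.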
Ultimately, in large environments, you won’t find much use of RAID50
because the industry has moved away from this model of
storage. Organizations like Google use individual hosts as storage
nodes and replicate data across nodes. In these cases you’re limited
to the storage in your computer enclosure and you’re inclined to
either treat the entire host as disposable (i.e. no protection) or be
focused on reliability (RAID1/RAID10).
On the other end of the spectrum are SAN solutions, which use more
complex RAID schemes that provide greater protection and scalability.
And lastly, you’re seeing and will continue to see a move away from
block-level RAID to hybrid LVM/filesystem solutions such as those found
in Sun’s ZFS and the HAMMER filesystem (which is still in development).
In these schemes, the filesystem itself provides the RAID abstraction.
This means the RAID can be more flexible (able to accommodate different
sized disks, resize dynamically, etc.) and provide faster data access.
The Sun Thumper is designed around ZFS and has been reported to
provide good speed at low cost.