If you want to build a moderately high-end file server (at the consumer level, anyway), the path is fraught with traps.
It’s also quite costly if you want a system that suffers minimal downtime, requires little attention once set up, and has a decent amount of longevity built in.
I recently acquired the gorgeous Chenbro RM41416B, complete with SATA backplane, slim DVD drive and triple redundant PSU from eBay. The slim DVD drive was IDE — a potential problem — but otherwise this thing reaches perfection for the home enthusiast, happily taking either ATX or eATX boards.

The Chenbro RM41416B
File and Operating Systems
We have a few issues though.
For a start, RAID 5 is not enough: firstly because single-disk redundancy is too little (I have been burned before, losing a stack of data to a simultaneous double hard drive failure); and secondly, because of the RAID 5 write hole, where a power loss partway through a write can leave data and parity out of sync.
The solutions aren’t many. There’s RAID 6, which allows two hard drives to fail before things go pear-shaped, but still suffers from write hole issues. Hardware RAID is expensive and often proprietary, while NAS units are great but often involve the same proprietary issues. Both are also a single point of failure, and you may not be able to recover your data if identical replacement hardware isn’t available.
The answer is clearly in software, and in this case the saviour is the Zettabyte File System (ZFS).
ZFS is the brainchild of Sun, the company whose name several people curse for the existence of Java, which first slowed down our PCs, and is now busy slowing down phones. Despite this blight, Sun has managed to end up with a few nifty things that make up for it, including ZFS. It’s created quite a stir online, and can be found in Solaris, FreeBSD, and soon Apple’s OS X Server 10.6 (which may be runnable on a PC, assuming some wizardry can be performed). OS X 10.6 is slated for around July 2009, although if we’re lucky, it might come sooner. Two things about ZFS appeal to me. First, it avoids the RAID write hole: writes are copy-on-write, so a stripe is never updated in place, and everything is checksummed so silent corruption gets detected. Second, it can create snapshots amazingly quickly, again thanks to copy-on-write (an initial snapshot of a file takes up essentially zero space; as the file changes, the old blocks are kept for the snapshot so it can revert to the original file). So it looks like RAIDZ2, the ZFS equivalent of RAID 6, is the order of the day.
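To make the zero-space snapshot idea concrete, here is a toy model in Python. It is only a sketch of the general copy-on-write idea, not ZFS’s actual on-disk design; the class `ToyCowStore` and all of its method names are invented for illustration.

```python
# Toy copy-on-write store: an illustration only, NOT how ZFS is implemented.
class ToyCowStore:
    def __init__(self):
        self.blocks = {}      # block id -> bytes (shared by live files and snapshots)
        self.live = {}        # file name -> block id
        self.snapshots = {}   # snapshot name -> {file name: block id}
        self._next_id = 0

    def write(self, name, data):
        # Never overwrite in place: new data always lands in a new block.
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.live[name] = block_id
        self._collect()

    def snapshot(self, snap_name):
        # A snapshot copies references only, so it adds no data blocks.
        self.snapshots[snap_name] = dict(self.live)

    def rollback(self, snap_name):
        self.live = dict(self.snapshots[snap_name])
        self._collect()

    def read(self, name):
        return self.blocks[self.live[name]]

    def used_bytes(self):
        return sum(len(b) for b in self.blocks.values())

    def _collect(self):
        # Free blocks that neither the live view nor any snapshot references.
        referenced = set(self.live.values())
        for snap in self.snapshots.values():
            referenced.update(snap.values())
        self.blocks = {i: b for i, b in self.blocks.items() if i in referenced}


store = ToyCowStore()
store.write("report.txt", b"version one")
before = store.used_bytes()
store.snapshot("monday")
after_snapshot = store.used_bytes()   # unchanged: the snapshot shares the block
store.write("report.txt", b"version two!")
after_edit = store.used_bytes()       # grows: the old block is kept for the snapshot
store.rollback("monday")
restored = store.read("report.txt")
```

In this toy, taking the snapshot leaves `used_bytes()` untouched, and space is only consumed once the file is rewritten and the old block has to be kept around.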
I haven’t used FreeBSD before (and hence this may be unfounded), but I’m a little nervous about the quality of its ZFS implementation. It would make more sense to use OpenSolaris, being Sun’s OS. However, the Hardware Compatibility List (HCL) is, well, not very complete, and learning Solaris after coming from a Linux background is like the death of a thousand cuts: just trying to install Nano or Flash Player for Firefox gave me a headache. It also has nowhere near the package management ease of Ubuntu, but don’t expect Ubuntu to get usable ZFS any time soon due to ridiculous posturing. Finally there’s Nexenta, which promises everything and more, but something about that makes me nervous as well.
Resolution: Use OpenSolaris with ZFS, and attempt to get a whole bunch of Intel based hardware on the basis that it should be (gulp) supported.
Cost: AUD$0
Hard Drives
For expansion’s sake, 1TB 7,200RPM hard drives are the order of the day, each costing around AUD$175. Through circumstances slightly out of my control, I came into possession of five Western Digital RE3 1TB hard drives, all from the same batch. Ideally, drives from a single batch shouldn’t share an array, as they’re likely to fail at the same time. However, it’s also hard to ignore free TBs.

It's hard to ignore free TBs, even if they're in the same batch.
I’d like to set up two arrays of eight drives, the first array as quickly as possible (as free space is waning), the second over a number of months. This means that for the first array, I don’t have time to purchase and manually age hard drives in an attempt to get different batches. There are a few ways around this: buy different models and brands (of which there are a finite number), or buy from different stores and hope they have different batches. I’m opting for the first, while taking the RE3s into account and playing to RAIDZ2’s tolerances.
The proposed arrays then, are as follows:
Array 1:
- Western Digital RE3
- Western Digital RE3
- Seagate Barracuda 7200.11 (Older batch)
- Seagate Barracuda 7200.11 (Newer batch)
- Samsung F1
- Western Digital WD10EADS
- Western Digital WD1001FALS
- Western Digital RE3 (Hot Spare)
Array 2:
- Western Digital RE3
- Western Digital RE3
- Samsung F1
- Seagate Barracuda 7200.11
- Samsung F1
- Western Digital WD10EADS
- Western Digital WD1001FALS
- Seagate Barracuda 7200.11 (Hot Spare)
Hitachi and other server-grade hard drives are sadly just too expensive. Another option would be to set up three arrays of five drives: this would minimise having to use similar drives in each, and in the event of catastrophic failure would mean the loss of 3TB of usable data versus 5TB. Still, in terms of balance, I believe the two-array option is superior. From what I read, hot spares can be shared between pools as well, which would add another level of safety.
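The capacity trade-off between the two layouts can be checked with a few lines of Python. This is back-of-envelope arithmetic only: it assumes 1TB drives, two drives’ worth of parity per RAIDZ2 array, one hot spare inside each eight-drive array and none inside the five-drive arrays, and it ignores filesystem overhead. The function name `usable_tb` is made up for this sketch.

```python
DRIVE_TB = 1        # every drive in the plan is 1TB
RAIDZ2_PARITY = 2   # RAIDZ2 gives up two drives' worth of space to parity


def usable_tb(drives_in_array, hot_spares=0):
    """Rough usable space of one RAIDZ2 array (assumption: no other overhead)."""
    data_drives = drives_in_array - hot_spares - RAIDZ2_PARITY
    return data_drives * DRIVE_TB


per_array_of_eight = usable_tb(8, hot_spares=1)   # the two-array plan
per_array_of_five = usable_tb(5)                  # the three-array alternative

total_two_arrays = 2 * per_array_of_eight
total_three_arrays = 3 * per_array_of_five

print(per_array_of_eight, per_array_of_five)      # what one dead array costs
print(total_two_arrays, total_three_arrays)       # total usable space
```

Under these assumptions a lost array costs 5TB in the two-array plan versus 3TB in the three-array plan, while the totals come to 10TB and 9TB respectively.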
Resolution: Kit out Array 1 first, then slowly acquire the drives for Array 2.
Cost: AUD$1,750 over multiple months (not including the six drives already owned)
Motherboards and Storage Controllers
The Chenbro chassis features 16 hot-swappable 3.5″ bays, along with a 5.25″ bay, a 3.5″ internal and floppy bay, and a slimline DVD bay. Finding a motherboard with enough SATA ports to do the case justice is a pointless task: most boards with more than six ports rely on an extra controller that is not well supported outside of Windows (like the annoyingly prevalent JMicron JMB363, although JMicron has claimed increased compatibility of late), and those that exceed eight tend to use a chip to mirror two of the ports.

Ah, so that's the sound of one pain clapping.
Since Intel dropped IDE support from its chipsets, vendors have been using third-party controllers (yep, that dastardly JMB363 again) to do the job. That makes IDE pointless to use outside of Windows, which increases our need for SATA ports once more. So it’s time to look at controllers.
For a start, PCI is out. PCI uses a shared bus of 133MB/s, and our array of hard drives will punish it way beyond its capabilities. PCI-X is also out: while it’s theoretically capable of just over a GB/s (64-bit @ 133MHz = 1,064MB/s), we may as well use the significantly faster PCI Express (PCI-E), which can handle a nice 250MB/s per lane in each direction.
Taking into account that a 1TB hard drive can reach around 130MB/s average read, bundling two ports per PCI-E lane isn’t too terrible a thing to do. It’s not optimal for future expansion should things get faster (as SSDs already are), but it does save on cost — and we’re essentially only crippling internal speeds, as external to the box we’ll be limited to gigabit ethernet anyway.
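The bus arithmetic above can be laid out in a short Python sketch. It uses only the figures quoted in the text (133MB/s PCI, 1,064MB/s PCI-X, 250MB/s per PCI-E lane, 130MB/s per drive) plus the theoretical 125MB/s of gigabit ethernet, and it assumes the worst case of every drive streaming at once. The helper name `share_per_drive` is invented for this example.

```python
PCI_MBPS = 133               # shared 32-bit PCI bus
PCIX_MBPS = 64 // 8 * 133    # 64-bit @ 133MHz = 1,064MB/s
PCIE_LANE_MBPS = 250         # one PCI-E lane, one direction
DRIVE_MBPS = 130             # average read of a 1TB drive, as quoted above
GIGABIT_MBPS = 1000 / 8      # theoretical gigabit ethernet, before overhead


def share_per_drive(bus_mbps, drives):
    """Bandwidth each drive gets if all stream at once, capped at drive speed."""
    return min(DRIVE_MBPS, bus_mbps / drives)


array_demand = 8 * DRIVE_MBPS                      # eight drives flat out
pci_share = share_per_drive(PCI_MBPS, 8)           # eight drives on plain PCI
lane_share = share_per_drive(PCIE_LANE_MBPS, 2)    # two ports per PCI-E lane

print(f"Eight-drive array demand: {array_demand}MB/s")
print(f"Per drive on shared PCI:  {pci_share:.1f}MB/s")
print(f"Per drive, two per lane:  {lane_share:.1f}MB/s")
print(f"Gigabit ethernet ceiling: {GIGABIT_MBPS:.0f}MB/s")
```

With two drives per lane each drive still gets 125MB/s, only slightly short of its 130MB/s average, whereas eight drives sharing plain PCI would be left with under 17MB/s each.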
Thing is, getting a controller with eight or more ports without a RAID engine (as we don’t need one, thanks to ZFS) is next to impossible, driving up the cost considerably. One thing is certain: at this number of drives you’re getting a Serial Attached SCSI (SAS) or Mini-SAS controller, with a set of breakout cables. Typically a breakout cable will fan out to four SATA connectors, and a SAS-based card will work with SATA drives just fine.
Intel makes something that’s close, but it’s sold in batches of five and seems to only be overseas. The next sensible option seems to be some second hand HP SmartArray P400s, which do have OpenSolaris support, and go for around AUD$350 at either Systemax or GraysOnline.
This is where we run into issues with the motherboard again: to use two P400s, you need two PCI-E x8 slots, wiping out any chance of using a PCI-E graphics card. Even the fancy boards with three PCI-E x16 slots are crippled. While they’re all certainly the right length, electrically it’s only a pair of x16 and one x4.
You could get a PCI video card, but at this stage it’s a choice of two evils — Nvidia’s FX5200 (which has known problems in Windows displaying widescreen resolutions like 1,680×1,050 over DVI), or ATI’s Radeon 9250 (and ATI’s drivers equal pain in the Linux world, let alone Solaris).
So it’s here you start thinking about server boards with integrated video like Tyan’s i5400PL and adding in an eight port controller card, and weighing it against the cost of getting a standard board with a HighPoint RocketRaid 2340. Of course Sun’s HCL isn’t particularly helpful in mentioning support for either the XGI Z9S GPU on the Tyan, or the HighPoint card (although HighPoint cites FreeBSD support and offers an open source Linux driver, which is a good start). The Adaptec 31605 is on the HCL, but costs around AUD$600 more.
Resolution: Test JMB363 usability and speed in Solaris before committing further.
Cost: Undetermined
Networking
Most onboard network chipsets these days lean on CPU cycles to do their job. A dedicated Intel PCI-E network card may therefore provide increased performance. It would also be interesting to see the effect of teaming/bonding across multiple ports, and where the performance ceiling is. Mind you, this once again falls into the trap of not having enough PCI-E lanes on a motherboard to support a controller, video card and whatever else may be included.

It looks sexy, but is the extra performance worth the cost?
Resolution: Acquire some Intel cards and do performance testing.
Cost: Undetermined.
Memory and CPUs
For the sake of education, I’d like to run a virtualised copy of Windows Server 2008 on top of Solaris. To this end, I figure 4GB of RAM should be enough for now. Whether I can get away with unbuffered DIMMs depends on the motherboard employed.
From the CPU side, it’s quad core all the way. Thanks to our friends at SpeedLabs, we know that ZFS is multithreaded and loves on-die cache, although it would be fascinating to see exactly how this scales during a RAID rebuild or file operations. With this in mind we’ll end up with either a Socket 771 Xeon E5405 or a Socket 775 Core 2 Quad Q9550, as both feature 12MB of L2 cache. The cheaper Core 2 Quad Q9450 sadly doesn’t seem to be in stock in this country.
Resolution: Figure out motherboard first.
Cost: Undetermined.