Posts Tagged ‘ZFS’

Sometimes giving in is easier

December 13th, 2009

OpenSolaris’ ZFS implementation recently picked up one of the tastiest things it possibly could: block level dedupe.

Except I no longer care.

Too impatient to wait for the RMA on the dead Asus P5Q-E (the replacement for which is now a spare swap-in board), I picked up a Gigabyte GA-EP45 Extreme thanks to an incredibly generous friend… which OpenSolaris b127 hated, refusing to boot on it. After a few days of hair pulling and switching off almost everything I could in the BIOS to try to rectify the issue, I finally admitted OpenSolaris was not to be.

The Gigabyte GA-EP45 Extreme, great board, hated by OpenSolaris


Not willing to risk Nexenta, I dropped to FreeBSD 8, the last bastion of ZFS hope (no folks, FUSE does not count).

FreeBSD worked wonderfully on the compatibility front, but I soon discovered that when it came to virtualisation, it had the same options as a prisoner faced with the Spanish Inquisition: basically none. There is, ironically, a version of Sun’s VirtualBox floating around, but it’s a hack job that hates 64-bit, and like most things FreeBSD, if you’re not running from the command line you’re asking for pain.

And so, hoping that one day Larry Ellison would open up ZFS licensing a little more so the GPL crowd would stop whining and just integrate it already, I sighed, flicked the 3ware 9650SE into hardware RAID 6 and reached for the Ubuntu 9.10 64-bit disc.

It worked.

Post mortem: List of controller cards that will work with OpenSolaris

While I note with grim satisfaction that Areca has still failed to produce a Solaris driver for its ARC-1300ix series, here’s a list of PCI-Express cards known to work with OpenSolaris without requiring any RAID 0/JBOD workarounds, and being able to control at least eight drives.

  1. LSI SAS3081E-R
  2. Intel SASUC8I flashed with the SAS3081E-R’s IT (initiator target) firmware
  3. 3ware 9650SE series

Tiny, yes? The last, which I ended up with due to non-availability of the first two in Australia, is significantly more expensive as it has hardware RAID capability as well.

Post mortem: Final system

Rack: HP 10622
OS: Ubuntu 9.10
PSU: Corsair TX-850
CPU: Intel Q9550
Memory: 8GB Corsair Dominator PC2-8500
Motherboard: Gigabyte GA-EP45 Extreme
GPU: Geforce 7600GS silent (to be swapped out with a PCI card when a second 3ware controller card is bought)
Controller card: 3ware 9650SE-8LPML
Network card: HP NC364T
Case: Chenbro RM41416B
UPS: APC Smart-UPS 750
Switch: Netgear GS724T
System drives: Samsung HD501LJ SATA
Array drives (RAID 6 w/XFS): WD RE3 1TB x3, Samsung HD103UJ 1TB x2, Seagate 7200.11 x2, Seagate 7200.12

The only problem left is the Seagate 7200.12, which seems to keep dropping from the array. I’ll have to see if a firmware update to the 3ware card fixes it, otherwise I may need to swap in a new drive (Update: turns out the ridiculously expensive Mini-SAS to SATA cables I bought were dodgy. Upon replacing, I’ve had no dropouts).

The continuous controller conundrum

June 8th, 2009

Strike another one off the list — the HP Smart Array P400 doesn’t present drives through JBOD to the OS, only through RAID 0.

This adds an extra layer of complexity to rebuilding disks, as when a disk fails, the card assumes a RAID 0 array has died, regardless of what you’re doing with ZFS. Apart from removing the ability to yank a disk on a live array then pop it back in and continue as normal, this adds extra overhead as the card is managing RAID 0 data for every drive attached to it on top of the RAID-Z already being done on the software side. Bad, bad, bad.

The LSI MegaRAID SAS 84016E


We have a new contender though, the LSI MegaRAID SAS 84016E (also known as the Intel SRCSASPH16I), which definitely has OpenSolaris driver support, but as usual is not available in Australia (the Intel is, but is over AU$1,000). It’s more expensive than the vaporous ARC-1300ix-16, thanks to it being PCI-E 8x rather than 4x. It’s also a true RAID card with 256MB of memory, and can handle up to RAID 60 thanks to a 500MHz Intel IOP333 processor.

PC Pitstop sells them at US$689, and the site even has a section saying it ships to Australia. Now if only a certain eBay seller wasn’t selling it for almost US$100 cheaper with free shipping…

Then there’s the Intel RAID Controller SRCSATAWB. This is a modified LSI MegaRAID SAS 8708ELP, doesn’t work in PCI-E 2.0, seems to have virtually the same featureset as the 84016E, but with only two mini-SAS ports. EYO Drop Shipping is currently selling it for AU$576.18.

The Intel RAID Controller SRCSATAWB


For both, the manuals mention nothing about JBOD, which may consign them to the same scrap heap as the P400. They do mention virtual drives, but these seem to only be accessible when creating an array. There’s no mention of running single drives in order to access software RAID.

Edit: Neither card offers JBOD functionality. At this rate I’ll end up buying the crazily expensive Adaptec 31605 just to get working gear.

One step forwards, two steps back

February 16th, 2009

Some purchasing has recently happened to start the file server project:

  • Intel Q9550 ~ AUD$450
  • 8GB Corsair DDR2 8500 ~ AUD$380
  • MSI P7N Diamond ~ AUD$360

MSI’s P7N Diamond was chosen for one point alone — four PCI-e x16 slots. While a lot of boards have a number of physical x16 slots, they fail to back this up electrically beyond two slots. The MSI board has three x16 electrical slots, with the fourth yellow one being an x8 — perfect for expansion.

The P7N Diamond has just the right amount of PCI-E lanes to satisfy our expansion needs.


OpenSolaris 2008.11 was installed on this setup, on a 500GB drive hooked up to one of the NV sata ports, a DVD drive hooked up to the JMB363 controlled IDE port, a previously acquired GeForce 7600GS inserted, alongside a HighPoint RocketRaid 2340. For kicks, an Intel X25-E was hooked up to check out some awesome transfer speeds.

It wasn’t to be.

Things I’ve learned:

  • OpenSolaris loves the MSI board, pretty much enabling everything. While it recognises the X-Fi sound, sound does not actually work. This isn’t a deal breaker. To my never ending surprise, JMB363 seems to work just fine.
  • Turning off AHCI only results in the rear eSATA ports turning off.
  • Most curiously, OpenSolaris will not recognise the X25-E drive at all. Whether this is related to the NV sata ports or otherwise, I do not know.
  • The HighPoint RocketRaid 2340 is not supported. The dual Marvell 88SX6081 chips on it technically are with voodoo beyond the install process, but are the cause of some problems. These have been patched it seems, but all up it seems less trouble to grab something based off LSI chipsets. While FreeBSD certainly supports the 2340, once again the sturdiness of its implementation of ZFS gives me pause.
  • There’s something called Solaris eXpress Community Edition, which abbreviates to the unfortunate SXCE, or “sexy”. It’s basically a beta containing future code, and sadly also didn’t recognise the X25-E, 2340 or X-Fi.

The remaining options are few to be able to set up a 16 drive array in Solaris. Either acquire the Adaptec 31605 for around AUD$1200, or two HP P400s for around AUD$700. Obviously the HP option is significantly cheaper – so long as it works.

While Solaris may seem ideal, it certainly isn’t cheap to get working thanks to limited hardware support. It could seriously be a wait for Snow Leopard and some Hackintoshing, although this is much better suited to an Intel board than this 780i.

Developments, plodding along

January 15th, 2009

A few things have occurred since the last postings, on both the file server and media centre fronts. I figured I should document them before I forget.

Media Centre

  • Apple did not release an updated Mac Mini, so we’re back to waiting on Nvidia’s Ion, which had some impressive demos at CES2009. Steve is a bit busy dying, so there’s obviously other things to focus on (although rumours keep on spinning).
  • XBMC 9.04, due in April, will feature not only Dolby TrueHD decode, but Blu-ray container support (M2TS/M2T/MTS) and the ability to load a file through an external player. Since Media Player Classic can run without GUI, this should work seamlessly. DTS-HD doesn’t seem to be there yet, unless it’s known under some other name I’m not aware of. Either way, a big step along the way to becoming the software of choice. We’ll have to wait and see if it’ll load the Blu-ray disc automatically though, or if you need to point it right at the M2TS files.
  • After some reading around the net, I’ll have to test out Windows Home Server as a base OS. Otherwise at this stage to save pain it will most likely be a straight XP Professional install. While XBMC’s focus is Linux, I don’t expect easy Blu-ray playback to hit that platform any time soon.
Apparently a 2.5-inch drive can fit in the Ion reference case.


File Server

  • Zebra over at Speedlabs suggested I’d need more than 4GB RAM to make sure Windows Server 2008 virtualisation is snappy. May as well double it to 8GB!
  • Finding out if HighPoint’s RocketRaid 2340 is OpenSolaris compatible is nigh on impossible without simply buying it, even with journalist contacts. If anyone knows somebody within HighPoint, please let me know.
  • Apparently ZFS on FreeBSD is stable so long as you run the 64-bit version, and have over 1GB of RAM according to a friend who has played with it for the last year. It might have to be a reserve option.
  • Crap. I have two of these drives, and Seagate is going all Apple on there being no acknowledgment. Very, very vexing.

The only thing holding up the purchasing of equipment is finding out about the HighPoint card — so here’s hoping I can dig up the information soon.

Building a file server: an exercise in compromise

December 23rd, 2008

If you want to build a moderately high end file server (at the consumer level, anyway), the path is fraught with a number of traps.

It’s also quite costly if you want a system that will suffer minimal downtime, require little attention after setting up, and has a decent amount of longevity built in.

I recently acquired the gorgeous Chenbro RM41416B, complete with SATA backplane, slim DVD drive and triple redundant PSU from eBay. The slim DVD drive was IDE — a potential problem — but otherwise this thing reaches perfection for the home enthusiast, happily taking either ATX or eATX boards.

The Chenbro RM41416B


File and Operating Systems

We have a few issues though.

For a start, RAID 5 is not enough — firstly because a one disk redundancy is too little (I have been burned before, losing a stack of data with a simultaneous double hard drive failure); and secondly, because of the RAID 5 write hole.

The solutions aren’t many. There’s RAID 6, which allows two hard drives to fail before things go pear-shaped, but still suffers from write hole issues. Hardware RAID is expensive and often proprietary, while NAS boxes are great but often raise the same proprietary issues. And both are subject to a single point of failure, with the potential of not being able to recover your data should identical replacement hardware not be available.

The answer is clearly in software, and in this case the saviour is the Zettabyte File System (ZFS).

ZFS is the brainchild of Sun, the company whose name several people curse for the existence of Java, which first slowed down our PCs, and is now busy slowing down phones. Despite this blight, Sun has managed to end up with a few nifty things that make up for it, including ZFS. It’s created quite a stir online, and can be found in Solaris, FreeBSD, and soon Apple OSX Server 10.6 (which may be runnable on PC, assuming some wizardry can be performed). OSX 10.6 is slated for around July 2009, although if we’re lucky, it might come sooner. There are two things appealing to me in ZFS. First, it avoids the RAID write hole by never overwriting live data in place (copy-on-write), and it checksums everything so corruption gets detected. Second, it can create snapshots amazingly quickly based on diffs: an initial snapshot of a file takes up zero space, and as the file changes, more information is added to the snapshot so it can revert to the original file. So it looks like RAIDZ2, the ZFS equivalent of RAID 6, is the order of the day.
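To make the snapshot behaviour a little more concrete, here is a toy Python sketch of the idea. It is not how ZFS lays anything out on disk; the `Block` and `ToyVolume` names are made up for illustration, and the only thing it tries to show is why a fresh snapshot costs nothing and only grows as the live data diverges.

```python
class Block:
    """One chunk of data. A new Block object is created on every write,
    so old blocks are never modified in place (copy-on-write)."""

    def __init__(self, data: bytes):
        self.data = data


class ToyVolume:
    """Hypothetical stand-in for a filesystem; not ZFS's real structure."""

    def __init__(self):
        self.live = {}        # block number -> Block currently in use
        self.snapshots = {}   # snapshot name -> {block number -> Block}

    def write(self, block_no: int, data: bytes) -> None:
        # Point the live view at a brand new block; any snapshot that
        # referenced the old block keeps referencing it.
        self.live[block_no] = Block(data)

    def read(self, block_no: int) -> bytes:
        return self.live[block_no].data

    def snapshot(self, name: str) -> None:
        # Copies the table of references only, never the data itself.
        self.snapshots[name] = dict(self.live)

    def snapshot_extra_bytes(self, name: str) -> int:
        # Space held only because of the snapshot: blocks the live view
        # has since replaced or dropped.
        snap = self.snapshots[name]
        return sum(
            len(block.data)
            for block_no, block in snap.items()
            if self.live.get(block_no) is not block
        )

    def rollback(self, name: str) -> None:
        self.live = dict(self.snapshots[name])


if __name__ == "__main__":
    vol = ToyVolume()
    vol.write(0, b"a" * 4096)
    vol.write(1, b"b" * 4096)
    vol.snapshot("before-upgrade")
    print(vol.snapshot_extra_bytes("before-upgrade"))  # nothing changed yet
    vol.write(0, b"c" * 4096)
    print(vol.snapshot_extra_bytes("before-upgrade"))  # one old block retained
```

Taking the snapshot is just copying a table of references, which is why it is near-instant regardless of how much data sits underneath.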

I haven’t used FreeBSD before (and hence this may be unfounded), but I’m a little nervous about the quality of its ZFS implementation. It would make more sense to use OpenSolaris, being Sun’s OS, however the Hardware Compatibility List (HCL) is, well, not very complete, and learning Solaris after coming from a Linux background is like the death of a thousand cuts: just trying to install Nano or Flash Player for Firefox gave me a headache. It also has nowhere near the package management ease of Ubuntu, but don’t expect Ubuntu to get usable ZFS any time soon, thanks to ridiculous licensing posturing. Finally there’s Nexenta, which promises everything and more, but something about that makes me nervous as well.

Resolution: Use OpenSolaris with ZFS, and attempt to get a whole bunch of Intel based hardware on the basis that it should be (gulp) supported.

Cost: AUD$0

Hard Drives

For expansion’s sake, 1TB 7,200RPM hard drives are the order of the day, each costing around AUD$175. Slightly out of my control, I came into possession of five Western Digital RE3 1TB hard drives, all from the same batch. Ideally there shouldn’t be hard drives from the one batch in the one array, as they’re likely to fail at the same time — however it’s also hard to ignore free TBs.

It's hard to ignore free TBs, even if they're in the same batch.


I’d like to set up two arrays of eight drives, the first array as quickly as possible (as free space is waning), the second over a number of months. This means that for the first array, I don’t have time to purchase and manually age hard drives in an attempt to get different batches. There are a few ways around this: buy different models and brands (of which there are a finite number), or buy from different stores and hope they have different batches. I’m opting for the first, while working in the RE3s, and playing to RAIDZ2’s tolerances.

The proposed arrays then, are as follows:

Array 1:

  1. Western Digital RE3
  2. Western Digital RE3
  3. Seagate Barracuda 7200.11 (Older batch)
  4. Seagate Barracuda 7200.11 (Newer batch)
  5. Samsung F1
  6. Western Digital WD10EADS
  7. Western Digital WD1001FALS
  8. Western Digital RE3 (Hot Spare)

Array 2:

  1. Western Digital RE3
  2. Western Digital RE3
  3. Samsung F1
  4. Seagate Barracuda 7200.11
  5. Samsung F1
  6. Western Digital WD10EADS
  7. Western Digital WD1001FALS
  8. Seagate Barracuda 7200.11 (Hot Spare)

Hitachi and other server-grade hard drives are sadly just too expensive. Another option would be to set up three arrays of five drives; this would minimise having to use similar drives in each, and in the event of catastrophic failure would mean the loss of 3TB of usable data versus 5TB. Still, in terms of balance, I believe the two-array option is superior. From what I read, the Hot Spares can be shared between pools as well, which would add another level of safety.
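For the curious, the 3TB versus 5TB figures fall out of some simple arithmetic, sketched here in Python. The assumptions are mine: every drive is 1TB, RAIDZ2 gives up two drives' worth of space to parity, each eight-drive array keeps one drive aside as a hot spare, and the five-drive arrays have no spare. Real usable space would come out a bit lower once filesystem overhead is counted.

```python
PARITY_DRIVES_RAIDZ2 = 2  # RAIDZ2 survives two drive failures


def raidz2_usable_tb(drives_in_array: int, hot_spares: int = 0,
                     drive_tb: int = 1) -> int:
    """Rough usable capacity of one RAIDZ2 array, ignoring overhead."""
    data_drives = drives_in_array - hot_spares - PARITY_DRIVES_RAIDZ2
    if data_drives < 1:
        raise ValueError("not enough drives for RAIDZ2")
    return data_drives * drive_tb


# Option A: two arrays of eight, one hot spare in each.
per_array_a = raidz2_usable_tb(8, hot_spares=1)   # 5 TB at risk per array
total_a = 2 * per_array_a                         # 10 TB overall

# Option B: three arrays of five, no hot spares (assumption).
per_array_b = raidz2_usable_tb(5)                 # 3 TB at risk per array
total_b = 3 * per_array_b                         # 9 TB overall

if __name__ == "__main__":
    print(f"Two arrays of eight:  {per_array_a}TB each, {total_a}TB total")
    print(f"Three arrays of five: {per_array_b}TB each, {total_b}TB total")
```

So the three-array layout risks less per failure but gives up a terabyte overall and loses the hot spares, under these assumptions.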

Resolution: Kit out Array 1 first, then slowly acquire the drives for Array 2.

Cost: AUD$1,750 over multiple months (not including the six drives already owned)
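The cost figure is just the remaining bays multiplied by the going rate. A quick Python check, using the numbers quoted in this post (16 bays across the two arrays, six drives already in hand, roughly AUD$175 per 1TB drive):

```python
TOTAL_DRIVES = 2 * 8          # two arrays of eight drives
DRIVES_ALREADY_OWNED = 6      # per the cost line above
PRICE_PER_DRIVE_AUD = 175     # approximate price of a 1TB 7,200RPM drive

drives_to_buy = TOTAL_DRIVES - DRIVES_ALREADY_OWNED
total_cost_aud = drives_to_buy * PRICE_PER_DRIVE_AUD

if __name__ == "__main__":
    print(f"{drives_to_buy} drives to buy, about AUD${total_cost_aud:,}")
```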

Motherboards and Storage Controllers

The Chenbro chassis features 16 hot-swappable 3.5″ bays, along with a 5.25″ bay, a 3.5″ internal and floppy bay, and a slimline DVD bay. Acquiring a motherboard with a suitable number of SATA ports to do the case justice is a pointless task: most boards with more than six ports feature a controller that is not well supported outside of Windows (like the annoyingly prevalent JMicron JMB363, although JMicron has claimed increased compatibility of late), and those that exceed eight tend to use a chip to mirror two of the ports.

Ah, so that's the sound of one pain clapping.


Since Intel dropped support for IDE from its chipsets, vendors are using third party chipsets (yep, that dastardly JMB363 again) to do the job, making IDE pointless to use outside of Windows, which increases our need for SATA ports once more. So it’s time to look at controllers.

For a start, PCI is out. PCI uses a shared bus of 133MB/s, and our array of hard drives will punish it way beyond its capabilities. PCI-X is also out: while it’s theoretically capable of just over a GB/s (64-bit @ 133MHz = 1,064MB/s), we may as well use the significantly faster PCI Express (PCI-E), which can handle a nice 250MB/s per lane in each direction.

Taking into account that a 1TB hard drive can reach around 130MB/s average read, bundling two ports per PCI-E lane isn’t too terrible a thing to do. It’s not optimal for future expansion should things get faster (as SSDs already are), but it does save on cost — and we’re essentially only crippling internal speeds, as external to the box we’ll be limited to gigabit ethernet anyway.
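The bus numbers above can be lined up in a few lines of Python. This is back-of-envelope stuff under stated assumptions: the 133MB/s, 250MB/s per lane and 130MB/s per drive figures are the ones quoted in this post, gigabit ethernet is taken as a raw 1,000Mbit/s with no protocol overhead, and every drive is assumed to be streaming flat out at once (the worst case).

```python
PCI_MB_S = 133                    # shared 32-bit PCI bus
PCIX_MB_S = (64 // 8) * 133       # 64-bit @ 133MHz -> 8 bytes per clock
PCIE_LANE_MB_S = 250              # per lane, per direction
DRIVE_MB_S = 130                  # average read of a 1TB drive
GIGABIT_MB_S = 1000 / 8           # raw gigabit ethernet, no overhead


def array_demand_mb_s(drives: int) -> int:
    """Worst-case throughput if every drive streams at once."""
    return drives * DRIVE_MB_S


def lane_load(ports_per_lane: int) -> float:
    """Fraction of one PCI-E lane consumed by the drives sharing it."""
    return ports_per_lane * DRIVE_MB_S / PCIE_LANE_MB_S


if __name__ == "__main__":
    demand = array_demand_mb_s(8)
    print(f"Eight drives want {demand}MB/s")
    print(f"PCI covers {PCI_MB_S / demand:.0%} of that")
    print(f"PCI-X tops out at {PCIX_MB_S}MB/s")
    print(f"Two ports per PCI-E lane is {lane_load(2):.0%} of the lane")
    print(f"Gigabit ethernet caps the outside world at {GIGABIT_MB_S:.0f}MB/s")
```

Two drives on one lane only oversubscribe it by a few percent, while the network link is saturated by a single drive, which is why the compromise is tolerable.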

Thing is, getting a controller with eight or more ports without a RAID engine (as we don’t need one, thanks to ZFS) is next to impossible, driving up the cost considerably. One thing is certain: at this number of drives you’re getting a Serial Attached SCSI (SAS) or Mini-SAS controller, with a set of breakout cables. Typically a breakout cable will fan out to four SATA connectors, and a SAS based card will work with SATA drives just fine.

Intel makes something that’s close, but it’s sold in batches of five and seems to only be overseas. The next sensible option seems to be some second hand HP SmartArray P400s, which do have OpenSolaris support, and go for around AUD$350 at either Systemax or GraysOnline.

This is where we run into issues with the motherboard again — in that to use two P400s, you need two PCI-E 8x slots, wiping out any chance of using a PCI-E graphics card. Even the fancy boards with three PCI-E x16 ports are crippled — while they’re all certainly the right length, electrically it’s only a pair of 16x and one 4x.
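The slot problem is easy to check mechanically. Below is a small Python sketch that asks whether a set of cards fits a board, judged by electrical lanes rather than physical slot length. The card and slot widths are illustrative: the two x8 controllers and the 16/16/4 board come from the paragraph above, while treating the graphics card as needing a full x16 is my assumption, and the second board is a hypothetical one with three x16 slots plus an x8.

```python
def cards_fit(card_lanes: list[int], slot_lanes: list[int]) -> bool:
    """True if every card can get a slot with at least as many
    electrical lanes as it needs (one card per slot)."""
    if len(card_lanes) > len(slot_lanes):
        return False
    # Pair the widest card with the widest slot, and so on down.
    cards = sorted(card_lanes, reverse=True)
    slots = sorted(slot_lanes, reverse=True)
    return all(card <= slot for card, slot in zip(cards, slots))


TWO_CONTROLLERS = [8, 8]              # two PCI-E 8x controller cards
WITH_GRAPHICS = TWO_CONTROLLERS + [16]  # plus an x16 graphics card (assumed)

FANCY_BOARD = [16, 16, 4]             # three long slots, only two fully wired
WIDER_BOARD = [16, 16, 16, 8]         # hypothetical board with more lanes

if __name__ == "__main__":
    print(cards_fit(TWO_CONTROLLERS, FANCY_BOARD))
    print(cards_fit(WITH_GRAPHICS, FANCY_BOARD))
    print(cards_fit(WITH_GRAPHICS, WIDER_BOARD))
```

Matching widest card to widest slot is enough here because each slot takes exactly one card; if that pairing fails, no other arrangement will work either.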

You could get a PCI video card, but at this stage it’s a choice of two evils — Nvidia’s FX5200 (which has known problems in Windows displaying widescreen resolutions like 1,680×1,050 over DVI), or ATI’s Radeon 9250 (and ATI’s drivers equal pain in the Linux world, let alone Solaris).

So it’s here you start thinking about server boards with integrated video like Tyan’s i5400PL and adding in an eight port controller card, and weighing it against the cost of getting a standard board with a HighPoint RocketRaid 2340. Of course Sun’s HCL isn’t particularly helpful in mentioning support for either the XGI Z9S GPU on the Tyan, or the HighPoint card (although HighPoint cites FreeBSD support and offers an open source Linux driver, which is a good start). The Adaptec 31605 is on the HCL, but costs around AUD$600 more.

Resolution: To test JMB363 usability and speed in Solaris before committing further.

Cost: Undetermined

Networking

Most onboard network chipsets these days lean on the CPU to do much of their work. To this end, a dedicated Intel PCI-E network card may provide increased performance; it would also be interesting to see the effect of teaming/bonding across multiple ports, and where the performance ceiling is. Mind you, this once again falls into the trap of not having enough PCI-E lanes on a motherboard to support a controller, video card and whatever else may be included.

It looks sexy, but is the extra performance worth the cost?


Resolution: Acquire some Intel cards and do performance testing.

Cost: Undetermined.

Memory and CPUs

For the sake of education, I’d like to run a virtualised copy of Windows Server 2008 on top of Solaris. To this end, I figure 4GB RAM should be enough for now — as to whether I can get away with unbuffered DIMMs depends on the motherboard employed.

From the CPU side, it’s quad core all the way. Thanks to our friends at SpeedLabs, we know that ZFS is multithreaded, and loves on-die cache, although it would be fascinating to see the exact scalability of this during a RAID rebuild or file operations. With this in mind we’ll either end up with a Socket 771 Xeon E5405, or Socket 775 Core 2 Quad Q9550, as both feature 12MB of L2 cache. The cheaper Core 2 Quad Q9450 sadly seems to not exist in stock in this country.

Resolution: Figure out motherboard first.

Cost: Undetermined.