Howard Marks

Network Computing Blogger

Disk Drive Failure: An Unavoidable Reality

The folks at Backblaze, the online backup service that’s probably as well known for its ultra-low cost storage pods as it is for its core business, recently released some data based on its experience with more than 25,000 disk drives. As anyone who read the Google and Carnegie Mellon studies a few years ago would expect, disk drives in the real world fail at a much higher rate than the annual failure rate of under 1% that vendors tout on their spec sheets.

In fact, Backblaze reports that about 22% of its disk drives fail in their first four years -- an annual failure rate of over 5%. As expected, the distribution of drive failures over time forms a bathtub-shaped curve. Some drives suffer from infant mortality, as manufacturing defects of one sort or another cause parts to fail within the first year. Others fail at random, and as drives age, parts or lubricants wear out, increasing the failure rate.
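To make the arithmetic behind that annual figure concrete, here is a minimal Python sketch of two common ways to annualize a cumulative failure percentage. The function names are illustrative, not anything Backblaze publishes, and the second method assumes a constant failure rate from year to year, which the bathtub curve says is only an approximation.

```python
def simple_annual_rate(cumulative_fraction, years):
    """Spread the cumulative failure fraction evenly over the period."""
    return cumulative_fraction / years


def compound_annual_rate(cumulative_fraction, years):
    """Annual rate that, applied to each year's survivors, gives the cumulative fraction.

    Assumes the same failure rate every year (an approximation).
    """
    survival = 1.0 - cumulative_fraction
    return 1.0 - survival ** (1.0 / years)


# Figures quoted in the text: about 22% of drives failing within four years.
simple = simple_annual_rate(0.22, 4)
compound = compound_annual_rate(0.22, 4)
print(f"simple average: {simple:.1%}")
print(f"constant rate:  {compound:.1%}")
```

Either way the result lands above 5% a year: about 5.5% by simple averaging, and about 6% if the failures are compounded against the shrinking pool of surviving drives.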

I’m sure some of you are thinking, “But Howard, Backblaze doesn’t only -- or even primarily -- use enterprise-class disk drives. My array vendor does qualification and testing of every drive it sells me. That’s why I’m happy to pay two to five times what a Seagate or Western Digital drive would cost for an HP/HDS/EMC nearline SAS or SATA drive.” Those “special” drives must have higher reliability, mustn’t they?

When disk drives were in short supply due to the floods in Thailand last year, Backblaze went so far as to encourage users to buy Seagate USB drives from Costco and send them to Backblaze to be shucked of their cases and USB controllers.

It’s true that while a casual observer may believe, to paraphrase Gertrude Stein, that a 4TB disk is a 4TB disk is a 4TB disk, there are some significant differences between nearline and consumer disk drives.

The most significant are in the firmware, where nearline drives -- which are presumably connected to a RAID controller of some sort -- report errors faster while consumer drives continue to retry. Nearline drives are also programmed to react better to the sympathetic vibrations set up by large groups of drives in arrays, and some vendors use more powerful magnets in their nearline drive positioners. These differences can result in better performance, as demonstrated in a video by Sun Microsystems.

At least one vendor has started shipping its shingled recording drives in USB cases under the theory that the USB port would act as a bottleneck and hide the lower random IO performance of the shingled drive. If Backblaze used some of those disks, they would perform quite differently from conventional drives.

Google’s experience demonstrated that there wasn’t a significant reliability difference between consumer and enterprise disks. Clearly, Google, which has been known to use Velcro to hold disk drives to motherboards, represents a less-than-optimal environment for disk drives. The same holds for Backblaze, which offered a first-generation pod that -- in my opinion -- was a bit lacking in the vibration mitigation department. Both make up for it in the application software layer.

Still, as your organization’s data custodian, you should plan for a 5% AFR and count yourself lucky if you have fewer drive failures. That means not only keeping spares and service contracts up to date but also using more advanced data protection models than simple RAID-5 single parity to ensure failed drives can be rebuilt from the remaining disks.
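As a rough planning aid, the Python sketch below turns a 5% AFR into two numbers: the drive failures to expect per year across a fleet, and the chance that another drive in the same RAID group fails while a rebuild is running. The fleet size, group size, and rebuild time are hypothetical, and the model assumes independent failures at a constant rate, so treat the output as a floor rather than a forecast.

```python
import math

HOURS_PER_YEAR = 8760


def expected_failures_per_year(drive_count, afr=0.05):
    """Average number of drive failures per year at the given AFR."""
    return drive_count * afr


def prob_additional_failure(remaining_drives, rebuild_hours, afr=0.05):
    """Chance that at least one surviving drive fails during the rebuild window.

    Assumes independent failures at a constant rate derived from the AFR.
    """
    hourly_hazard = -math.log(1.0 - afr) / HOURS_PER_YEAR
    return 1.0 - math.exp(-hourly_hazard * remaining_drives * rebuild_hours)


# Hypothetical example: 200 drives in total, built as 8-drive RAID-5 groups
# (7 survivors after one failure), with a 24-hour rebuild.
print(f"expected failures per year: {expected_failures_per_year(200):.0f}")
print(f"second failure during rebuild: {prob_additional_failure(7, 24):.2%}")
```

With these inputs the per-rebuild risk works out to roughly 0.1%, but that understates the real exposure: the model ignores unrecoverable read errors on the surviving disks and failures that cluster by drive batch or age, both of which single parity cannot absorb.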

Once you accept that disk drives are less reliable than you thought, the more predictable failures SSDs can suffer as they face write endurance exhaustion should be a bit less scary.
