SSD reliability in the real world: Google's experience

Author
TheMaartian
Max Output Level: -47.5 dBFS
  • Total Posts : 2774
  • Joined: 2015/05/21 18:30:52
  • Location: Flagstaff, AZ
  • Status: offline
2016/02/25 10:20:50 (permalink)

From ZDNet this morning:
=================
Using data from millions of drive days in Google datacenters, a new paper offers production lifecycle data on SSD reliability. Surprise! SSDs fail differently than disks - and in a dangerous way. Here's what you need to know.
 
SSDs are a new phenomenon in the datacenter. We have theories about how they should perform, but until now, little data. That's just changed.
 
The FAST 2016 paper "Flash Reliability in Production: The Expected and the Unexpected" (not available online until Friday), by Professor Bianca Schroeder of the University of Toronto and Raghav Lagisetty and Arif Merchant of Google, covers:
  • Millions of drive days over 6 years
  • 10 different drive models
  • 3 different flash types: MLC, eMLC and SLC
  • Enterprise and consumer drives

KEY CONCLUSIONS

  • Ignore Uncorrectable Bit Error Rate (UBER) specs. A meaningless number.
  • Good news: Raw Bit Error Rate (RBER) increases more slowly than expected from wearout and is not correlated with UBER or other failures.
  • High-end SLC drives are no more reliable than MLC drives.
  • Bad news: SSDs fail at a lower rate than disks, but their UBER is higher (see below for what this means).
  • SSD age, not usage, affects reliability.
  • Bad blocks in new SSDs are common, and drives with a large number of bad blocks are much more likely to lose hundreds of other blocks, most likely due to die or chip failure.
  • 30-80 percent of SSDs develop at least one bad block and 2-7 percent develop at least one bad chip in the first four years of deployment.
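
To make the two error metrics in the list above concrete, here is a minimal sketch, assuming the usual definitions: RBER counts bit errors seen before error correction, UBER counts bit errors that error correction could not repair, and both are divided by the total number of bits read. The function name and all counter values are hypothetical and are not taken from the paper.

```python
def bit_error_rate(error_bits: int, bits_read: int) -> float:
    """Return bit errors per bit read (assumed definition, see lead-in)."""
    if bits_read <= 0:
        raise ValueError("bits_read must be positive")
    return error_bits / bits_read


# Hypothetical lifetime counters for one drive (illustrative numbers only).
bits_read = 8 * 10**15          # roughly 1 PB read
raw_bit_errors = 4 * 10**8      # bit errors seen before ECC correction
uncorrectable_bit_errors = 2    # bit errors ECC could not repair

rber = bit_error_rate(raw_bit_errors, bits_read)
uber = bit_error_rate(uncorrectable_bit_errors, bits_read)
print(f"RBER = {rber:.1e}, UBER = {uber:.1e}")
```

With these made-up counters the raw rate is many orders of magnitude above the uncorrectable rate, because error correction absorbs almost all raw errors. This is only an illustration of the definitions; it does not by itself explain the study's finding that the two rates are not correlated.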

THE STORAGE BITS TAKE

Two standout conclusions from the study. First, that MLC drives are as reliable as the more costly SLC "enterprise" drives. This mirrors hard drive experience, where consumer SATA drives have been found to be as reliable as expensive SAS and Fibre Channel drives.
 
One of the major reasons that "enterprise" SSDs are more expensive is greater over-provisioning. SSDs are over-provisioned for two main reasons: to allow ample replacement of blocks that go bad as the flash wears out, and to ensure that garbage collection does not cause write slowdowns.
 
The paper's second major conclusion, that age, not use, correlates with increasing error rates, means that over-provisioning for fear of flash wearout is not needed. None of the drives in the study came anywhere near their write limits, even the 3,000 writes specified for the MLC drives.
 
But it isn't all good news. SSDs have a higher UBER than disks, which means that backing up SSDs is even more important than it is with disks. The SSD is less likely to fail during its normal life, but more likely to lose data.
post edited by TheMaartian - 2016/02/25 10:36:20

Intel i7 3.4GHz, 16 GB RAM, 2 TB HD Win10 Home 64-bit Tascam US-16x08
Studio One 4 Pro NotionMelodyne 4 Studio Acoustica 7 Guitar Pro 7
PreSonus FaderPort Nektar P6 M-Audio BX8 D2 Beyerdynamic DT 880 Pro
NI K9U XLN AK, AD2 AAS VS-2, GS-2, VA-2, EP-4, CP-2, OD Toontrack SD3, EZK
#1

    mettelus
    Max Output Level: -22 dBFS
    • Total Posts : 5321
    • Joined: 2005/08/05 03:19:25
    • Location: Maryland, USA
    • Status: offline
    Re: SSD reliability in the real world: Google's experience 2016/02/25 11:23:03 (permalink)
    It is good to see this published, as Google is (I believe) the largest single consumer of drives. My SSD is 4.5 years old now, and I have backed it up to magnetic media pretty religiously in anticipation of a catastrophic failure. I can also confirm the UBER issue: pagefile.sys was on the SSD for a long time and began causing random BSODs for no apparent reason, until I intercepted a memory error message just prior to a BSOD. Moving pagefile.sys to magnetic media resolved this. Benchmarks of that same SSD have also shown about a 10% performance reduction in 4 years, but for "primarily read operations" SSDs have proven their worth.
     
    Two system files to definitely keep off an SSD are the virtual memory file (pagefile.sys) and the Windows indexing file. Stick both of those guys on a magnetic drive.

    ASUS ROG Maximus X Hero (Wi-Fi AC), i7-8700k, 16GB RAM, GTX-1070Ti, Win 10 Pro, Saffire PRO 24 DSP, A-300 PRO, plus numerous gadgets and gizmos that make or manipulate sound in some way.
    #2
    tlw
    Max Output Level: -49.5 dBFS
    • Total Posts : 2567
    • Joined: 2008/10/11 22:06:32
    • Location: West Midlands, UK
    • Status: offline
    Re: SSD reliability in the real world: Google's experience 2016/02/25 13:35:50 (permalink)
    Personally I regard SSDs as semi-consumables. At least until a format with a more assured life expectancy is developed. I don't regard HDDs as especially reliable either. Between 2011 and 2013 I had three new Seagate Barracudas fail without warning, the second two being warranty replacements for the first.

    Which means using a backup strategy that's effective. Unfortunately there are millions and millions of people with laptops and other devices containing only SSDs who never think of backing data up until they learn the hard way why it's a good idea.

    As for the swap file, in performance terms putting that on an SSD can make a huge difference. Just be prepared to replace the drive once every few years.

    Sonar Platinum 64bit, Windows 8.1 Pro 64bit, I7 3770K Ivybridge, 16GB Ram, Gigabyte Z77-D3H m/board,
    ATI 7750 graphics+ 1GB RAM, 2xIntel 520 series 220GB SSDs, 1 TB Samsung F3 + 1 TB WD HDDs, Seasonic fanless 460W psu, RME Fireface UFX, Focusrite Octopre.
    Assorted real synths, guitars, mandolins, diatonic accordions, percussion, fx and other stuff.
    #3
    Cactus Music
    Max Output Level: 0 dBFS
    • Total Posts : 8424
    • Joined: 2004/02/09 21:34:04
    • Status: offline
    Re: SSD reliability in the real world: Google's experience 2016/02/25 20:10:03 (permalink)
    Interesting stuff. I have an IT friend who has so far refused to use SSD drives. It surprised me that they were very strongly opposed to them as main drives in office machines and laptops. The reason given was "Seen too many things go wrong and corruption of data".
    I have SSD drives in 3 laptops as the only drives. They are all only a year old.
    I also have my DAW and our office computer on SSD drives for drive C. All data drives are larger 7200 RPM regular drives.
    I guess the plan will be to swap them out every 3 years. Possibly by then they will have ironed out these issues. I certainly don't think I'd trust the larger SSD drives yet. Seems too far too fast.
     

    Johnny V  
    Cakelab  
    Focusrite 6i61st - Tascam us1641. 
    3 Desktops and 3 Laptops W7 and W10
     http://www.cactusmusic.ca/
     
     
    #4
    robert_e_bone
    Moderator
    • Total Posts : 8968
    • Joined: 2007/12/26 22:09:28
    • Location: Palatine, IL
    • Status: offline
    Re: SSD reliability in the real world: Google's experience 2016/02/26 04:56:36 (permalink)
    I spent BIG money on large SSDs a few years back, maybe 2012, when they first came out at the 512 GB size.  I think they were each around $700-$800, and one failed miserably within the first few MONTHS.
     
    It freaked me out so bad that a high-end drive touting its reliability and all that would fail so quickly, that I rethought my approach to building my monster machine (32 GB memory and liquid cooling and 2 high end video cards and such).  Anyways, I returned the piece of junk failed drive and got store credit, and got 2 separate 2 TB SATA III 7,200 drives and a 43" HDTV with the store credit from returning the bad SSD.
     
    Things have gotten MUCH better with them, but I am still quite careful to back up things of value to me from any SSD drive FAR more often than from my 7,200 drives.
     
    Bob Bone
     

    Wisdom is a giant accumulation of "DOH!"
     
    Sonar: Platinum (x64), X3 (x64) 
    Audio Interfaces: AudioBox 1818VSL, Steinberg UR-22
    Computers: 1) i7-2600 k, 32 GB RAM, Windows 8.1 Pro x64 & 2) AMD A-10 7850 32 GB RAM Windows 10 Pro x64
    Soft Synths: NI Komplete 8 Ultimate, Arturia V Collection, many others
    MIDI Controllers: M-Audio Axiom Pro 61, Keystation 88es
    Settings: 24-Bit, Sample Rate 48k, ASIO Buffer Size 128, Total Round Trip Latency 9.7 ms  
    #5
    tlw
    Max Output Level: -49.5 dBFS
    • Total Posts : 2567
    • Joined: 2008/10/11 22:06:32
    • Location: West Midlands, UK
    • Status: offline
    Re: SSD reliability in the real world: Google's experience 2016/02/26 08:17:34 (permalink)
    Cactus Music
    Interesting stuff. I have an IT friend who has so far refused to use SSD drives. It surprised me that they were very strongly opposed to them as main drives in office machines and laptops. The reason given was "Seen too many things go wrong and corruption of data".
    I have SSD drives in 3 laptops as the only drives. They are all only a year old.
    I also have my DAW and our office computer on SSD drives for drive C. All data drives are larger 7200 RPM regular drives.
    I guess the plan will be to swap them out every 3 years. Possibly by then they will have ironed out these issues. I certainly don't think I'd trust the larger SSD drives yet. Seems too far too fast.
     


    One of the most common causes of damage to office laptops with conventional drives is people moving the laptop around, dropping it "only" a couple of inches onto their desks, or turning it on then slinging it into fashionable desk-tidy accessories that hold it vertical while it's running and connected to a monitor. Basically, repeatedly doing things that can shock the internal HDD. Which is one reason (besides speed and power saving) laptops fitted with SSDs are increasingly common. For anyone buying them in large quantities removing HDD damage is a worthwhile potential saving. And if you buy your laptops by the dozen, odds are you have an industrial-strength backup regime and the laptops will be replaced in three or four years anyway (even if you still run XP on them).

    The days of the laptop HDD are probably numbered.

    Sonar Platinum 64bit, Windows 8.1 Pro 64bit, I7 3770K Ivybridge, 16GB Ram, Gigabyte Z77-D3H m/board,
    ATI 7750 graphics+ 1GB RAM, 2xIntel 520 series 220GB SSDs, 1 TB Samsung F3 + 1 TB WD HDDs, Seasonic fanless 460W psu, RME Fireface UFX, Focusrite Octopre.
    Assorted real synths, guitars, mandolins, diatonic accordions, percussion, fx and other stuff.
    #6
    azslow3
    Max Output Level: -42.5 dBFS
    • Total Posts : 3297
    • Joined: 2012/06/22 19:27:51
    • Location: Germany
    • Status: offline
    Re: SSD reliability in the real world: Google's experience 2016/02/26 08:20:37 (permalink)
    robert_e_bone
    It freaked me out so bad that a high-end drive touting its reliability and all that would fail so quickly, that I rethought my approach to building my monster machine (32 GB memory and liquid cooling and 2 high end video cards and such).  Anyways, I returned the piece of junk failed drive and got store credit, and got 2 separate 2 TB SATA III 7,200 drives and a 43" HDTV with the store credit from returning the bad SSD.

    "Monster" is relative. After 2 years, I am still lucky with our recent servers: 256 GB of RAM, OS on 2 mirrored SSDs and 60 TB in RAID6, and no bad drives so far.

    Sonar 8LE -> Platinum infinity, REAPER, Windows 10 pro
    GA-EP35-DS3L, E7500, 4GB, GTX 1050 Ti, 2x500GB
    RME Babyface Pro (M-Audio Audiophile Firewire/410, VS-20), Kawai CN43, TD-11, Roland A500S, Akai MPK Mini, Keystation Pro, etc.
    www.azslow.com - Control Surface Integration Platform for SONAR, ReaCWP, AOSC and other accessibility tools
    #7