Monday, June 20, 2011

Data Remanence and Solid State Drives

Deep in the subterranean bowels of the Palatial Overclock Estate (a.k.a. the Heavily Armed Overclock Compound) is a box containing every disk drive from every computer that has ever graced our home. There is another box containing every personal digital assistant, every mobile phone, and any other device featuring persistent read-write storage that may contain personal information about me or Mrs. Overclock.

We're not paranoid. We're just thinking clearly.

The issue here is data remanence: how to ensure that data has really been erased from a storage device. Data that may contain passwords, account numbers, records, schedules, notes, and anything else that could be used and misused by a miscreant, a competitor, or a court of law.

We're not the only ones concerned about this. Mrs. Overclock, a.k.a. Dr. Overclock, Medicine Woman, has to conform carefully to regulations and laws concerning the privacy of her patients' personal medical records, including computerized medical records. Several times I've been on site at a major financial firm back east whose privacy policies for handling clients' personal financial data required its security people to roam the multi-building campus and confiscate any device, from laptop to thumb drive, left unsecured and unattended at some poor careless slob's desk. And it seems like every day brings another news story about someone who bought a used computer only to discover that it contained a fascinating wealth of personal data from the previous owner.

I'm writing this article on my beloved MacBook Air, a slim, light, beautifully designed laptop. It is the first laptop I've ever used that does not contain a rotating disk like my other laptops and desktops. I'm using the term "rotating" here quite deliberately to refer to traditional disk drives that have rotating platters that store information magnetically. Nor does it have raw flash memory like my personal digital assistants; flash is a semiconductor device that stores data even when powered off but, unlike a rotating disk, has no moving mechanical parts. Instead it has a hybrid device: a solid-state drive (SSD). An SSD stores data persistently in a semiconductor device with no moving parts, but emulates a traditional rotating disk drive. This allows the SSD to be used transparently (more or less, as we shall soon see) in applications which expect a disk drive.

SSDs offer extreme ruggedness, lower power consumption, and potentially much higher performance than rotating disk drives, albeit at a much higher price. This makes them attractive in applications ranging from hand-held portable devices such as mobile phones and laptops (where the attraction is principally the ruggedness and lower power consumption) to very high end servers (for the performance and lower power consumption). At the same time, SSDs bring exciting, some might say terrifying, new issues to the realm of data remanence. To understand why, we have to briefly look at other forms of digital storage, and kind of sneak up from behind on how SSDs actually work on the inside.

Rotating Disks and Tapes

Rotating disks and tapes are inherently mechanical devices that store data magnetically and have to go through a lot of drama, involving moving parts you never see, just to read and write your data. While they require power to read and write, when powered off they are capable of storing data for a long, long time, years or even decades, without significant degradation. With some extremely expensive equipment you can actually see the tiny magnetic dots that form the bits of your data. How tiny? Modern magnetic media store many gigabits per square inch. A single disk or tape that you can hold in your hand can store several terabytes of data. A terabyte is a million million bytes. That's a lot of bytes.

Both rotating disks and tapes store data in chunks or blocks of bytes, and the disk or tape drive reads or writes an entire block at a time. A typical disk drive like the one in your laptop stores data in fixed-size blocks of 512 bytes each. Lots of software, including the operating system running on your laptop, was designed, implemented, and optimized under the assumption that data is stored in 512-byte blocks. This assumption will become important when we look at SSDs.
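
To make that concrete, here is a minimal POSIX sketch of how software typically bakes in that assumption, reading logical block N of a raw device a whole block at a time. The device and the function name here are illustrations of the idiom, not any particular system's code.

    #include <fcntl.h>
    #include <stdint.h>
    #include <unistd.h>

    #define BLOCK_SIZE 512  /* the ubiquitous assumption */

    /* Read logical block n from an already opened raw device. */
    ssize_t read_block(int fd, uint64_t n, uint8_t buffer[BLOCK_SIZE])
    {
        if (lseek(fd, (off_t)(n * BLOCK_SIZE), SEEK_SET) == (off_t)-1) {
            return -1;
        }
        return read(fd, buffer, BLOCK_SIZE); /* one whole block at a time */
    }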

Both rotating disks and tapes incur a lot of latency in both read and write operations. In the case of a random read or write to a rotating disk, a mechanical arm containing a read-write head has to swing laterally across the surface of the magnetic platter to be positioned over the correct cylinder where the data resides, and then wait for the correct sector where the data resides within the cylinder to travel beneath the head as the platter rotates. These two delays are referred to as the seek latency and the rotational latency respectively, and together can amount to several milliseconds. Tapes incur seek latency as well, but since tapes are typically used for strictly sequential reads and writes, and tape drive hardware and software is optimized to keep the tape moving, tape drives can and typically do outperform disk drives in terms of raw read and write performance. Only a madman, or someone with lots of time on their hands, uses tapes for random reads and writes. I've done it.
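
A back-of-the-envelope calculation shows where those milliseconds go. The rotational speed and average seek time below are vendor-typical assumptions, not measurements of any particular drive.

    /* Back-of-the-envelope average random access time for a hypothetical
       7200 RPM disk drive; both figures are vendor-typical assumptions. */
    double random_access_ms(void)
    {
        double rpm = 7200.0;
        double rotational = ((60.0 / rpm) / 2.0) * 1000.0; /* half a revolution: ~4.2 ms */
        double seek = 9.0;                                 /* typical average seek time */
        return seek + rotational;                          /* ~13 ms per random access */
    }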

With rotating disk drives and tapes, the issue of data remanence has been explored thoroughly since the 1960s. The gold standard in paranoia is the U.S. National Security Agency (NSA), whose publicly available document on this topic, the Storage Device Declassification Manual, recommends degaussing a disk or tape, that is, using a strong electromagnet to disrupt the magnetic properties of the media, then physically damaging the drive so that it is a little more obvious to the casual observer that it cannot be used. In both cases, incineration is a second option. Other sources recommend dropping the disk drive in an industrial shredder that reduces it to tiny metal bits.

Note that merely deleting a disk file is entirely inadequate. As everyone who has not been living in a cave realizes, deleting a file merely deletes its metadata, its name and information about its location on the disk, from a directory that is also stored on the disk. It may not even do that; it may just mark the file as deleted, leaving all the metadata at least temporarily in place. Unless you are using some very clever software, the data bits in the original file are all still on disk, where they can be recovered by some even cleverer software, as a lot of crooked politicians, child pornographers, and bent accountants have all discovered to their woe.

To truly erase a file from disk or tape, you must write new bits directly over the old bits of the file. This will keep all but the most motivated and technically advanced individuals from recovering the data. But there has been a lot of research showing that even then some, many, or even most of the overwritten bits can be recovered, or at least inferred with a high degree of certainty, using devices like electron microscopes. Which is exactly why the NSA requires destructive degaussing if not outright incineration. Nuke it from orbit. It's the only way to be sure.
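
For the merely motivated adversary, a single overwrite pass is the usual software remedy. Here is a minimal sketch, assuming POSIX and assuming a single pass of zeros satisfies your threat model; serious sanitization tools, like those evaluated in the sources below, make multiple passes with varying patterns.

    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Overwrite every byte of a file with zeros, then flush to the medium. */
    int overwrite_file(const char * path)
    {
        int fd = open(path, O_WRONLY);
        if (fd < 0) { return -1; }
        struct stat status;
        if (fstat(fd, &status) < 0) { close(fd); return -1; }
        static const uint8_t zeros[512] = { 0 };
        off_t remaining = status.st_size;
        while (remaining > 0) {
            size_t chunk = (remaining < (off_t)sizeof(zeros)) ? (size_t)remaining : sizeof(zeros);
            if (write(fd, zeros, chunk) != (ssize_t)chunk) { close(fd); return -1; }
            remaining -= chunk;
        }
        if (fsync(fd) < 0) { close(fd); return -1; } /* force it out of the page cache */
        return close(fd);
    }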

Random Access Memory

Random access memory (RAM) is a semiconductor device, a hunk of metal and silicon, with no moving parts. It is used to store data and software temporarily while it is in active use. RAM is what software developers mean when they say "memory". RAM stores data reliably only as long as power is applied to it. Although we think of RAM as being read or written a byte at a time, modern memory systems only read and write a word at a time to a single RAM device, where a word might be two, four, or even eight bytes in width. There is a lot of hard-wired logic in current microprocessors to handle the conversion between byte access and word access.

The NSA seems quite confident that RAM data is not recoverable after loss of power. However, there has been some research regarding supercooling certain types of RAM right at power off, then using yet more extremely expensive devices to read or infer the bits from the deeply frozen device. As paranoid as I am, I can't say I'm concerned about data remanence in RAM. But people who are looking at the ability to extract secret keys from cryptographic devices that are used, for example, in automated teller machines, are.

Flash Memory

Like RAM, flash memory is also a semiconductor device with no moving parts. It stores your data at the atomic level, as electrons that are trapped in the physical material of the device. Like rotating disk drives, flash memory is persistent, maintaining its data after being powered off. Data storage in flash memory isn't permanent, and its longevity isn't measured in decades like that of the magnetic storage of rotating disks or tapes. But it takes years for the electrical charges stored in a flash device to bleed out, long after you have upgraded to a larger thumb drive.

There are broadly two kinds of flash memory: NOR flash and NAND flash. NOR (literally "not or") and NAND ("not and") describe the kinds of logical operations the device emulates in its very low level operation. That's of less interest to us here than the other properties that make NOR and NAND flash differ from one another, and that makes flash memory of either variety very different from rotating disks, tapes, or even RAM.

NOR flash can be read just like random access memory: data are referenced directly using just an address, and hence can be read by software without much drama. In fact, if the hardware and operating system are suitably designed, software programs can be executed directly out of NOR flash, and the system is none the wiser as long as it doesn't try to write to the NOR flash as if it were RAM. This capability is called execute in place (XIP).
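
On such a system, reading NOR flash really is just a memory reference. A sketch, assuming a hypothetical board where the NOR device is memory-mapped at a known physical address; the base address here is made up, and in real code it comes from the board's memory map.

    #include <stdint.h>

    /* Hypothetical base address where the NOR device is memory-mapped. */
    #define NOR_BASE ((const volatile uint32_t *)0x08000000UL)

    uint32_t read_nor_word(uint32_t offset)
    {
        return NOR_BASE[offset]; /* reads like RAM: no commands, no drama */
    }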

NOR flash is written more like a disk drive: commands are sent to a hardware controller implemented inside the NOR flash device that writes data a fixed-size page at a time. The size of a single flash page depends on the specific flash device, but typical sizes are two or four kilobytes, several times the size of a typical logical disk block. So you can't just write a single byte or even word of NOR flash. You have to write an entire page, and you have to go through a pretty arcane series of steps to do it. That's how NOR flash differs functionally from RAM, and is also why NOR flash is often used as a read-only memory (ROM).
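
Just how arcane? Here is a sketch of the word-program sequence for one common command set, the AMD/Spansion-style interface; other vendors differ, buffered page programming adds yet more steps, and the unlock addresses and names here come from typical datasheets rather than from any specific part.

    #include <stdint.h>

    #define NOR ((volatile uint16_t *)0x08000000UL) /* hypothetical mapping, as before */

    /* Program one 16-bit word at a word offset using the AMD-style command set. */
    void nor_program_word(uint32_t offset, uint16_t value)
    {
        NOR[0x555] = 0x00AA;            /* first unlock cycle */
        NOR[0x2AA] = 0x0055;            /* second unlock cycle */
        NOR[0x555] = 0x00A0;            /* program command */
        NOR[offset] = value;            /* the actual data write */
        while (NOR[offset] != value) {} /* poll until the device finishes */
    }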

NAND flash extends this disk drive-like interface to both writing and reading. Writes and reads to and from NAND flash are done a fixed-size page of bytes at a time. Unlike NOR flash, you can't just read a single word. That's how NAND flash differs functionally from NOR flash.

Now here's how both NOR and NAND flash differ from everything we've talked about so far. Whereas blocks on disks and tapes, and words in RAM, can be read, written, and over-written, flash memory of either variety requires a third operation: erase. This isn't an optional extra like the overwrite operation that exists in some disk or tape drives. A write operation in flash can only change a binary one to a binary zero. That's how it works right down at the atomic level.

If you want to change the value of that bit, you first have to erase it, turning it back into a binary one. A virgin flash device right from the factory will read as all binary ones, because it arrives at your door in the fully erased state. When you write to it a page at a time, the silicon is selectively turning ones into zeros in the semiconductor medium of the flash device. If you want to turn zeros back into ones, you must first erase. Furthermore, an erase operation is done not a page at a time, but a block at a time. The size of a flash block again depends on the specific flash device, but a typical size is 128 kilobytes, many times the size of a flash page.
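
You can model this in a few lines of C: a flash write behaves like a bitwise AND against the current cell contents, and only an erase can set bits back to one. This is a toy model of the semantics, not any real device's interface.

    #include <stdint.h>

    uint8_t cell = 0xFF; /* factory-fresh, fully erased: all ones */

    /* A "write" can only clear bits: it ANDs the new data into the cell. */
    void flash_write(uint8_t data) { cell &= data; }

    /* Only an erase can set bits back to one, and on a real device only
       for an entire erase block at once. */
    void flash_erase(void) { cell = 0xFF; }

    /* flash_write(0x5A) leaves 0x5A; a later flash_write(0xFF) cannot
       restore the cleared bits: the cell remains 0x5A until erased. */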

You can see this is getting complicated. You read and write pages, but you erase blocks, and you must erase an entire block in order to write over even a single page within that block. But it gets worse. Erases don't come for free. Every time you erase a block in a flash device, you are actually permanently altering the device. Do enough erases and eventually the flash device quits working. Unlike RAM and other semiconductor devices, flash wears out. And yes, this applies to consumer devices that use NAND flash, like a USB thumb drive or your tiny digital music player.

Again broadly speaking, flash memory is manufactured in two basic flavors, single-level cell (SLC) and multi-level cell (MLC), depending on how many bits are stored in each flash cell, the basic unit of flash storage. SLC stores a single bit, MLC more than one bit. SLC can endure about one hundred thousand erase cycles per block before it quits working. MLC can endure about a tenth of that. On a price-per-bit basis, SLC is much more expensive, so most of the flash you find in consumer devices is MLC.

In terms of data remanence, both the NSA and I are pretty confident that when you erase a flash block, whether it be of the NOR or NAND variety, you run very little risk of that data ever being recovered.

Solid State Drives

Flash memory, and in particular NAND flash, fills an interesting niche in the ecosystem of persistent digital storage devices. A read of a page of NAND flash takes on the order of tens of microseconds, making it a thousand times faster than a rotating disk drive with all the mechanical bits it has to move. The write of a page of NAND flash takes on the order of hundreds of microseconds, still pretty good. But the erase of a block of NAND flash takes on the order of thousands of microseconds, meaning milliseconds. Not so good.

It is NAND flash that is used in SSDs. Underneath the hood, SSDs are one or more NAND flash devices on top of which is layered a very smart (or in my personal experience, sometimes not so smart) controller that tries to make the NAND flash look like a rotating disk drive. This is no small feat, since the controller has to emulate the disk by handling application reads and writes in 512 byte blocks, while physically reading and writing two or four kilobyte pages, erasing 128 kilobyte blocks, and trying its best to hide the hideous erase latency inherent in the flash device.

This real-time juggling act is the responsibility of what is generically referred to as the flash translation layer (FTL). The FTL has the following sometimes contradictory goals: transparently map logical disk blocks to pages of flash; hide the write page and most especially the erase block latency; and even out erasing and writing across the entire device so that all erase blocks wear at more or less the same rate, a process that is called wear leveling. All of these goals are accomplished through an interlocking set of design and implementation strategies in the SSD controller.
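
The core remap-on-write idea can be sketched in a few lines. This is a toy illustration of the principle, not any vendor's FTL; the flash primitives it calls are hypothetical stand-ins for what the controller firmware actually does.

    #include <stdint.h>

    #define LOGICAL_BLOCKS 1024

    extern void flash_program_page(uint32_t page, const uint8_t * data); /* assumed primitive */
    extern void mark_stale(uint32_t page);                               /* assumed primitive */

    static uint32_t map[LOGICAL_BLOCKS]; /* logical disk block -> flash page */
    static uint32_t next_free_page = 0;  /* drawn from the pre-erased pool */

    /* "Overwriting" a logical block never overwrites flash in place. */
    void ftl_write(uint32_t lblock, const uint8_t * data)
    {
        uint32_t page = next_free_page++; /* take a fresh pre-erased page */
        flash_program_page(page, data);   /* write the new data there */
        mark_stale(map[lblock]);          /* old page now awaits garbage collection */
        map[lblock] = page;               /* remap the logical block */
    }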

An SSD contains more flash storage capacity than the capacity of the disk drive it emulates. It uses this extra flash capacity to maintain a pool of pre-erased blocks to which to write, instead of forcing the application to wait for a lengthy erase cycle to complete. This is called over provisioning. SSD vendors are for the most part quiet on the amount of over provisioning, but at a recent SSD seminar I attended one vendor quoted 28% as being typical for their SSDs.

The mapping of logical disk blocks to flash pages is completely abstract. On a rotating disk, when you write to logical block 42, you are always writing to the same logical block in the same physical location on the disk platter. Okay, this is not strictly speaking true: bad block replacement management in the disk controller firmware may map logical block 42 to a replacement block somewhere else on disk, but it is always the same replacement block. But when you write to logical block 42 on an SSD you are always writing to a completely different flash page than the last time you wrote to logical block 42.

This means the controller embedded in the SSD has to maintain a very complex and dynamic mapping of logical disk blocks to flash pages, not just because disk blocks and flash pages are different sizes, but because the mapping changes every time you think you are overwriting a logical disk block with new data. Because this mapping has to persist between power cycles on the SSD, it is stored in the very same flash device in the SSD, in overhead flash pages inaccessible to you, to be re-read the next time the SSD is powered up. These overhead flash pages are themselves subject to the same wear leveling and mapping algorithms as the flash pages that hold your data.

When you write logical disk block 42, your data is written to a flash page that was erased when the erase block of which it is a part was erased. The flash page containing the disk block formerly known as 42 is slated for eventual erasure, but it is part of an erase block that may contain flash pages holding logical disk blocks that have not been overwritten. The SSD coalesces multiple logical disk blocks containing good data into a single flash page and updates its mapping in a process known as garbage collection. This activity and the erasure of completely unused erase blocks are more or less constant background activities on the part of the SSD.
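
Continuing the toy FTL sketch from above, garbage collection looks roughly like this; again, every helper named here is a hypothetical stand-in, not any real controller's interface.

    extern uint32_t first_page(uint32_t block), last_page(uint32_t block); /* assumed */
    extern int page_is_valid(uint32_t page);                              /* assumed */
    extern void flash_copy_page(uint32_t from, uint32_t to);              /* assumed */
    extern void remap_page(uint32_t old_page, uint32_t new_page);         /* assumed */
    extern void flash_erase_block(uint32_t block);                        /* assumed */
    extern uint32_t next_free_page; /* shared with the write path above */

    /* Reclaim one erase block: copy out its still-valid pages, then erase it. */
    void gc_one_block(uint32_t victim)
    {
        for (uint32_t page = first_page(victim); page <= last_page(victim); page++) {
            if (page_is_valid(page)) {
                uint32_t fresh = next_free_page++;
                flash_copy_page(page, fresh); /* coalesce the good data */
                remap_page(page, fresh);      /* update the logical->physical map */
            }
        }
        flash_erase_block(victim); /* back into the pre-erased pool */
    }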

Because multiple disk blocks are stored per flash page, and multiple flash pages are stored per erase block, and because of the abstract mapping between disk blocks and flash pages, there are patterns of disk block writing behavior that appear to be uncorrelated from an application point of view, but that can result in the same erase block being erased and rewritten many times. This is called write amplification. Because the mapping from disk blocks to flash pages is opaque to the application, this is hard to detect, and even harder to prevent.
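
Write amplification is usually quantified as the ratio of bytes the flash actually absorbs to bytes the host thought it wrote. A worst-case illustration, using the page and erase block sizes assumed earlier:

    /* Worst case: rewriting one 4 KB page of a 128 KB erase block forces the
       FTL to relocate the other 124 KB of still-valid data in that block. */
    double worst_case_write_amplification(void)
    {
        double host_bytes = 4.0 * 1024.0;    /* what the application wrote */
        double flash_bytes = 128.0 * 1024.0; /* what the flash actually absorbed */
        return flash_bytes / host_bytes;     /* 32.0 */
    }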

Flash cells can only be read so many times before the captured charge is depleted. To deal with this, the SSD controller reads flash pages containing good data and rewrites the data to a flash page in an erased block in a process called static wear leveling. This is also a constant background activity on the part of the SSD.

Write latency may be further hidden by the inclusion of a RAM buffer inside the SSD. Logical disk blocks that are read may be cached in RAM, and logical disk blocks written may be buffered in RAM until an erased flash page becomes available and they can be written. The mappings used by the SSD to keep all of this straight may also be cached in RAM and written asynchronously as time permits.

Garbage collection, static wear leveling, and cache write-back are asynchronous processes inside the SSD over which you have no control, and of whose occurrence you have no visibility, through the disk interface the device presents to your system.

The algorithms that implement the FTL are very complex, much more complex than the relatively simple state machine controllers implemented within the flash device itself. Indeed, inside every SSD is a powerful processor core on which the FTL runs. The SSD also implements the external electrical and command interface (e.g. IDE or SATA) that allows the SSD to connect to its host system as if it were a rotating disk drive. On at least some SSDs, the firmware that implements all of this is stored in the very same flash memory inside the SSD. The SSD controller may, in some sense, boot from itself.

The integrity of not just your data, but of the mapping metadata, and of the SSD's own firmware, is of such critical importance to the stability of the SSD that vendors place capacitors on the SSD to power its operation in the event of a power failure, until the SSD can reach a stable recoverable state. Apparently this sometimes even works. But there is a lot of anecdotal evidence that it frequently does not. In his popular software development blog Coding Horror, Jeff Atwood has written several articles on his own experiences with SSDs. He loves them. For their performance. But in the article "The Hot/Crazy Solid State Drive Scale" he says:
I feel ethically and morally obligated to let you in on a dirty little secret I've discovered in the last two years of full time SSD ownership. Solid state hard drives fail. A lot. And not just any fail. I'm talking about catastrophic, oh-my-God-what-just-happened-to-all-my-data instant gigafail. It's not pretty.
The emphasis here is his. The comments on this article by his readership are similarly hair-raising. There is a growing body of evidence that SSDs have to be treated with some care lest they be bricked, along with all your precious data. This is also in part why "Hard Power-Off Is Dead" will be a future article in my own blog.

But their reliability isn't my primary concern for this article. How do you deal with data remanence when using SSDs? They can't be degaussed like rotating disk drives because they aren't based on magnetic media. They can't be erased like NOR and NAND flash because even though they use NAND flash as their storage medium, they don't expose the low level flash erase function through their emulated disk drive interface.

Michael Wei and his colleagues at the Center for Magnetic Recording Research (CMRR) at U. C. San Diego very recently published a paper, "Reliably Erasing Data from Flash-Based Solid State Drives", trying to answer this question. What they found was that alternative methods of declassifying a disk drive, like overwriting all of the data on the drive, were not effective. You can overwrite every logical disk block on the drive. But thanks to RAM buffering, over provisioning, and background erasure, you can power off the SSD immediately after the overwrite operation completes, crack the SSD open, remove the flash components, plug them into a flash programming tool, and recover a significant portion of the supposedly overwritten data sitting in erase blocks pending erasure. Worse, because of the complex mapping between logical disk blocks and flash pages, sequentially overwriting the entire drive may cause erase blocks to be erased more than once, eating into the erase budget of the drive and reducing its longevity.

There is some hope. Many disk drive standards have some form of secure erase command that accomplishes the same thing as overwriting a disk block or even the entire disk. Most SSDs do not implement this command. Or, as Wei et al. point out, some say they do but it doesn't actually work. But as the SSD industry shakes out, a correct implementation of secure erase may become more common. Also, some disk drive standards have defined a trim command specifically in support of SSDs, where a logical disk block can be marked as no longer used without the necessity of writing over it. Few SSDs currently support this command, and as Wei et al. point out, even so this has no reliable effect on sanitizing an SSD. But again, as things shake out, an SSD controller might take all logical disk blocks being trimmed as a hint that all flash blocks used for logical disk block storage should be erased.
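
On Linux, that trim hint can be sent to a whole device through the block layer's discard interface. A hedged sketch, assuming kernel and device support, and bearing in mind Wei et al.'s warning that trim is a hint to the FTL, not a sanitization guarantee:

    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/fs.h> /* BLKDISCARD, BLKGETSIZE64 */

    /* Ask the kernel to discard (trim) every logical block on the device. */
    int discard_whole_device(const char * device)
    {
        int fd = open(device, O_WRONLY);
        if (fd < 0) { return -1; }
        uint64_t size = 0;
        if (ioctl(fd, BLKGETSIZE64, &size) < 0) { close(fd); return -1; }
        uint64_t range[2] = { 0, size }; /* offset, length */
        int rc = ioctl(fd, BLKDISCARD, &range);
        close(fd);
        return rc;
    }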

Until then, I find data remanence on solid state drives to be very problematic. SSDs are too new, alas, for the NSA to have an opinion, at least in the publicly available information to which I have access. I predict an industrial shredder will play a crucial role, making that expensive SSD even more expensive because it cannot be reused once it contains sensitive information. (I also question their use in mission critical applications because of questions concerning their reliability and stability, but that's a whole other soapbox.)

Flash-based file systems like JFFS2 and UBIFS, used in Linux and other operating systems, are FTLs implemented in software on top of raw flash devices. Although they have to deal with all the same issues as SSDs, data remanence isn't a problem, because you have access to the low-level erase functions of the flash device. You can simply unmount the file system and erase the flash using a standard utility program.
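
Here is a sketch of what that erasure looks like through the Linux MTD interface, roughly what a utility like flash_eraseall does, minus the bad-block handling that real code needs.

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <mtd/mtd-user.h> /* MEMGETINFO, MEMERASE */

    /* Erase every block of a raw flash partition, e.g. "/dev/mtd0". */
    int erase_whole_mtd(const char * device)
    {
        int fd = open(device, O_RDWR);
        if (fd < 0) { return -1; }
        mtd_info_t info;
        if (ioctl(fd, MEMGETINFO, &info) < 0) { close(fd); return -1; }
        erase_info_t erase;
        for (erase.start = 0; erase.start < info.size; erase.start += info.erasesize) {
            erase.length = info.erasesize;
            if (ioctl(fd, MEMERASE, &erase) < 0) { close(fd); return -1; }
        }
        return close(fd);
    }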

JFFS2 (and maybe UBIFS too, although I haven't had to dig into its internals) is a log-structured file system. It writes blocks not to a hierarchical structure but instead to a kind of sequential transaction journal. The file system that the application sees is the result of the file system software reading the journal and reconstructing the hierarchical file system in memory at mount time. This kind of architecture maps pretty well to flash memory. But how this is done by the FTL inside of SSDs is vendor proprietary and specific.

Given the growing popularity of SSDs, improving their performance, longevity, and reliability is an area of active research. At the 27th IEEE Symposium on Massive Storage Systems and Technologies (a.k.a. MSST 2011) conference I recently attended, there were many papers on the topic of improving FTLs or optimizing them for niche applications. Data remanence was only rarely mentioned, and then mostly by the cloud storage providers; never in the context of solid state drives.

Sources

O. Al Homaidi, Data Remanence: Secure Deletion of Data in SSDs, Master's thesis, MCS-2009:3, Blekinge Institute of Technology, Sweden, 2009-02

Department of Commerce NIST Computer Security Division, Guidelines for Media Sanitization, Recommendations of the National Institute of Standards and Technology, NIST Special Publication 800-88, 2006-09

Department of Defense, National Industrial Security Program Operating Manual (NISPOM), DoD 5220.22-M, 2006-02-28

Department of Energy Cyber Security Program, Media Clearing, Purging, and Destruction Guidance, DOE CIO Guidance CS-11, 2007-01

D. Feenberg, Can Intelligence Agencies Read Overwritten Data?, National Bureau of Economic Research, 2004-05-14

P. Gutmann, "Secure Deletion of Data from Magnetic and Solid-State Memory", USENIX Security Symposium Proceedings, 1996-07

P. Gutmann, Data Remanence in Semiconductor Devices, IBM T.J. Watson Research Center

J. Handy, SSDs in Enterprise, Embedded, and Client Applications, Objective Analysis, AvNet SSD Seminar, Westminster Colorado, 2011-05-17

G. Hughes, T. Coughlin, Tutorial on Disk Drive Data Sanitization, UCSD

M. Murugan, D. Du, "Rejuvenator: A Static Wear Leveling Algorithm for NAND Flash Memory with Minimized Overhead", Proceedings of the 27th IEEE Symposium on Massive Storage Systems and Technologies, Denver Colorado, 2011-05

National Security Agency/Central Security Service, Storage Device Declassification Manual, NSA/CSS Manual 9-12 (supersedes NSA/CSS Manual 130-2), 2000-11-10

National Computer Security Center, A Guide to Understanding Data Remanence in Automated Information Systems, NCSC-TG-025, Library No. 5-236,082, Version 2

Royal Canadian Mounted Police Technical Security Branch, IT Media Overwrite and Secure Erase Products, Lead Agency Publication B2-002, 2009-05

S. Skorobogatov, Data Remanence in Flash Memory Devices, University of Cambridge, Computer Laboratory, UK

M. Wei et al., "Reliably Erasing Data from Flash-Based Solid State Drives", Center for Magnetic Recording Research, U. C. San Diego, 9th USENIX Conference on File and Storage Technologies, 2011-02

Wikipedia, Data remanence

Wikipedia, Flash memory

C. Wright, D. Kleiman, S. Sundhar R. S., Overwriting Hard Drive Data: The Great Wiping Controversy, Springer-Verlag, 2008