HK1186537A

HK1186537A - Managing unreliable memory in data storage systems

Info

Publication number: HK1186537A
Application number: HK13113760.6A
Authority: HK
Inventors: J. Booth; M-M. L. Syu
Original assignee: Western Digital Technologies, Inc.
Priority date: 2012-04-25
Filing date: 2013-12-11
Publication date: 2014-03-14

Description

Managing unreliable memory in a data storage system

Technical Field

The present disclosure relates to data storage systems, such as solid state drives, for computer systems. In particular, the present disclosure relates to managing unreliable memory in a data storage system.

Background

A non-volatile memory array may include defective locations, such as pages with uncorrectable Error Correction Code (ECC) errors or pages with correctable ECC errors but a high raw bit error count. These defects may arise during manufacture of the memory array or during its use. For example, after a memory array has been subjected to a large number of program-erase cycles (e.g., 30,000 cycles or more), a page of the memory array is more likely to encounter or generate a memory error. If the memory error is not resolved, it may result in a loss of stored data. Accordingly, an improved apparatus and method for managing defective memory locations is desired.

Disclosure of Invention

Drawings

Systems and methods embodying various features of the present invention will now be described with reference to the following drawings, in which:

FIG. 1 illustrates a storage system that manages unreliable memory units according to one embodiment of the invention.

FIG. 2 is a flow diagram illustrating a process to manage unreliable memory units when performing memory access operations according to one embodiment of the invention.

FIG. 3 is a flow diagram illustrating a process for managing unreliable memory units when performing program operations according to one embodiment of the invention.

FIG. 4 is a graph illustrating voltage threshold distributions for memory cells in a page at a given level of program-erase cycles, according to one embodiment of the invention.

FIG. 5 is a graph illustrating voltage threshold distributions for memory cells in two pages of a block at two different program-erase cycle levels according to one embodiment of the present invention.

FIG. 6 is a graph illustrating voltage threshold distributions for memory cells in two pages of a block at a given level of a program-erase cycle, according to one embodiment of the invention.

Detailed Description

While specific embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of protection. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms. Furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the scope of protection.

SUMMARY

In some data storage systems (e.g., solid state storage systems), when a page of a memory block becomes unreliable, the data storage system determines that the entire block should no longer be used for memory access operations. However, such blocks, once removed from active use, still include a large number of reliable memory cells. Thus, the data storage system disclosed herein may track unreliable memory at a finer granularity than the block granularity, thereby enabling continued use of memory blocks that may otherwise be marked as unavailable or unreliable. The overall effect is to extend the usable life of the memory device. For example, the lifetime of some memory devices employing embodiments of the present invention may be extended beyond the number of manufacturer-guaranteed program/erase (PE) cycles. This may be particularly useful given the widespread use of multi-level cell (MLC) NAND, which has lower endurance (fewer PE cycles) than single-level cell (SLC) NAND.

In some embodiments of the present invention, the data storage system manages unreliable memory units at a granularity of multiple pages, one page, or partial pages. The data storage system is configured to perform a memory access operation for a memory unit of the non-volatile memory array and detect a memory error indicating a failure to perform the memory access operation. In some embodiments, if a failure is detected, the data storage system adds an entry corresponding to the memory unit to the list of unreliable memory units, marking the memory unit as unreliable. Further, the data storage system may periodically copy (flush) the list of unreliable memory units from volatile memory to non-volatile memory.

In certain embodiments, the data storage system determines a total number of memory units in the memory block that are marked as unreliable. If the total number exceeds a selected threshold, the data storage system adds an entry corresponding to the block to a list of unreliable blocks, marking the block as unreliable.

In certain embodiments, a data storage system receives a request from a host system to perform a programming operation associated with a memory unit of a non-volatile memory array. In response, the data storage system selects a memory block containing the memory unit and utilizes the list of unreliable memory units and the list of unreliable blocks to determine whether the memory unit and the block are unreliable. If a memory unit or block is determined to be unreliable, the data storage system may select another reliable combination of memory unit and block to perform a programming operation (e.g., store data).

Overview of the System

FIG. 1 illustrates a storage system 120 that manages unreliable memory units, according to one embodiment of the invention. As shown, storage system 120 (e.g., a hybrid hard drive, a solid state drive, etc.) includes a controller 130 and a non-volatile memory array 150, the non-volatile memory array 150 including one or more memory blocks, identified as block "A" (152) through block "N" (154). Each block includes a plurality of pages. For example, block A (152) of FIG. 1 includes a plurality of pages, identified as pages A (153), B (155) through N (157). In some embodiments, each "block" is the smallest grouping of memory pages or locations of the non-volatile memory array 150 that are erasable in a single operation or as a unit, and each "page" is the smallest grouping of memory cells that can be programmed in a single operation or as a unit. (Other embodiments may use differently defined blocks and pages.) The term "memory unit" is used herein to refer to a group of memory locations that has fewer memory locations than a memory block. For example, a memory unit may include multiple pages, a page, or a partial page. However, in one embodiment, a memory unit may further refer to a group having a greater number of memory locations than a memory block, e.g., 1.5, 2.0, or 2.25 memory blocks, etc.

The controller 130 may be configured to receive data and/or storage access commands from the storage interface module 112 (e.g., a device driver) in the host system 110. The storage access commands passed by the storage interface 112 may include write commands and read commands issued by the host system 110. The read command and the write command may indicate logical block addresses in the storage system 120. The controller 130 may execute the received commands in the non-volatile memory array 150.

The controller 130 includes a memory management module 132. In one embodiment, the memory management module 132 manages the unreliable memory of the non-volatile memory array 150 at a finer level of granularity than the granularity of the memory block, such as the granularity of multiple pages, one page, or a partial page of memory (e.g., 4KB, 8KB, or 16 KB). In another embodiment, the memory management module 132 manages the unreliable memory of the non-volatile memory array 150 at a coarser granularity than the granularity of the blocks. To facilitate management of the unreliable memory, the controller 130 and/or the memory management module 132 maintains a list 134 of unreliable memory units, the list 134 of unreliable memory units including a plurality of entries corresponding to memory units labeled as reliable or unreliable. Further, the controller 130 and/or the memory management module 132 maintains an unreliable block list 136, the unreliable block list 136 including a plurality of entries corresponding to blocks marked as reliable or unreliable. The unreliable memory unit list 134 and the unreliable block list 136 may be stored external to the controller 130 (as shown in FIG. 1), internal to the controller 130, or partially internal to the controller 130 and partially external to the controller 130.

The controller 130 and/or the memory management module 132 may copy the unreliable memory unit list 134 and the unreliable block list 136 from volatile memory to non-volatile memory (e.g., the non-volatile memory array 150) to prevent loss of the unreliable memory unit list 134 and the unreliable block list 136 when power is lost to the volatile memory. For example, the unreliable memory unit list 134 may be copied periodically at certain intervals, or in response to certain events, such as detecting an unreliable power source, finding an unreliable memory unit, finding several unreliable memory units, etc. In one embodiment, the unreliable memory unit list 134 is copied from one non-volatile memory to another non-volatile memory.
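The flush behavior described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the class name `UnreliableListFlusher`, the `flush_after` count, and the dictionary standing in for non-volatile memory are all hypothetical.

```python
class UnreliableListFlusher:
    """Keeps the list in volatile memory and copies it out on trigger events."""

    def __init__(self, nonvolatile_store, flush_after=4):
        self.volatile_list = set()      # in-RAM set of unreliable unit ids
        self.nv = nonvolatile_store     # dict standing in for NVM (assumption)
        self.flush_after = flush_after  # hypothetical: flush after N new entries
        self.pending = 0                # entries added since the last flush

    def flush(self):
        # Copy the volatile list to the non-volatile store.
        self.nv["unreliable_units"] = sorted(self.volatile_list)
        self.pending = 0

    def mark_unreliable(self, unit_id):
        # Event trigger: finding one or several unreliable units.
        if unit_id not in self.volatile_list:
            self.volatile_list.add(unit_id)
            self.pending += 1
        if self.pending >= self.flush_after:
            self.flush()

    def on_power_warning(self):
        # Event trigger: an unreliable power source was detected.
        self.flush()
```

A timer-driven call to `flush()` would cover the interval-based case; it is omitted here to keep the sketch deterministic.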

The non-volatile memory array 150 may be implemented using NAND flash memory devices. Other types of solid state memory devices may alternatively be used, such as flash integrated circuit arrays, chalcogenide RAM (C-RAM), phase change memory (PC-RAM or PRAM), programmable metallization cell RAM (PMC-RAM or PMCM), Ovonic Unified Memory (OUM), Resistive RAM (RRAM), NOR memory, EEPROM, ferroelectric memory (FeRAM), Magnetoresistive RAM (MRAM), other discrete NVM (non-volatile memory) chips, or any combination of the above. In one embodiment, non-volatile memory array 150 preferably includes a multi-level cell (MLC) device having multi-level cells capable of storing more than a single bit of information, although single-level cell (SLC) memory devices, or a combination of SLC and MLC devices, may also be used. In one embodiment, the storage system 120 may include other memory modules, such as one or more magnetic memory modules.

The storage system 120 may store data received from the host system 110. That is, the storage system 120 may act as memory storage for the host system 110. To facilitate this functionality, the controller 130 may implement a logical interface. The logical interface may present the storage system memory to the host system 110 as a set of logical addresses (e.g., contiguous addresses) at which data may be stored. Internally, the controller 130 may map logical addresses to various physical memory addresses in the non-volatile memory array 150 and/or other memory modules.
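As a rough illustration of such a logical interface, the sketch below maps logical addresses to (block, page) pairs. The class name `LogicalInterface`, the append-only allocation, and the `pages_per_block` parameter are assumptions made for this example; the disclosure does not specify a mapping scheme.

```python
class LogicalInterface:
    """Maps host-visible logical addresses to physical (block, page) locations."""

    def __init__(self, pages_per_block):
        self.pages_per_block = pages_per_block
        self.l2p = {}            # logical address -> (block, page)
        self.next_physical = 0   # next free physical page (append-only, assumption)

    def write(self, lba):
        # Allocate the next physical page and (re)map the logical address to it.
        block, page = divmod(self.next_physical, self.pages_per_block)
        self.l2p[lba] = (block, page)
        self.next_physical += 1
        return block, page

    def lookup(self, lba):
        # Returns None for a logical address that has never been written.
        return self.l2p.get(lba)
```

Because the host only sees logical addresses, the controller is free to steer a write away from an unreliable physical location without the host noticing.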

Management of unreliable memory

FIG. 2 is a flow diagram illustrating a process 200 for managing unreliable memory units when performing memory access operations, according to one embodiment of the invention. Process 200 may be performed by controller 130 and/or memory management module 132. Advantageously, process 200 can extend the operational life of data storage system 120. After a block of the non-volatile memory array 150 has been determined to include unreliable memory units, the process 200 allows memory access operations to be directed to certain reliable memory units of the block.

At block 202, the process 200 performs a memory access operation for a unit of memory. The memory access operation may include a program operation or a read operation.

At block 204, the process 200 determines whether the execution of the memory access operation resulted in a memory error indicating an execution failure. For example, a memory error, such as an ECC error, a read error, or a program error, may be detected that indicates a failure to perform a memory access operation.

If execution does not result in a memory error indicating execution failure, at block 206, the process 200 continues with normal operations, such as performing a next memory access operation.

Alternatively, if the execution results in a memory error indicating an execution failure, the process 200 adds an entry corresponding to the memory unit to the unreliable memory unit list 134 at block 208. In one embodiment, the process 200 adds entries to the unreliable memory unit list 134 that correspond to the memory unit and to other related memory units that may also become unreliable. For example, experiments may show that certain memory units tend to become unreliable in pairs or groups, and thus, if one memory unit becomes unreliable, entries corresponding to the other memory units in the pair or group may be added to the unreliable memory unit list 134. Further, the process 200 may also trigger the copying of the unreliable memory unit list 134 from volatile memory to non-volatile memory.
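Marking related memory units together could look like the sketch below. The pairing table `related_units` is hypothetical; in practice such groupings would come from characterization of the particular memory device.

```python
def mark_with_related(unreliable, unit, related_units):
    """Add `unit` and any units known to fail alongside it to the set `unreliable`."""
    unreliable.add(unit)
    # related_units maps a unit to the other members of its pair or group.
    unreliable.update(related_units.get(unit, ()))
    return unreliable
```

For example, with `related_units = {4: (5,), 5: (4,)}`, a failure on unit 4 marks both units 4 and 5.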

The unreliable memory unit list 134 may include a plurality of entries corresponding to memory units that are marked as reliable or unreliable. Advantageously, the list of unreliable memory units 134 is able to track unreliable memory at a finer level of granularity than the minimum level of granularity at which the non-volatile memory array can be erased as a unit. For example, the unreliable memory unit list 134 may correspond to multiple pages, one page, or partial pages of memory (e.g., 4KB, 8KB, or 16 KB). As another example, the unreliable memory unit list 134 may include entries corresponding to partial pages where the size of the partial pages matches the granularity of the ECC process of the data storage system 120 (e.g., the ECC process granularity may be 2KB and the page size may be 16 KB).
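To make the partial-page case concrete, the following sketch computes which ECC-sized memory unit a byte offset falls into, using the 2KB ECC granularity and 16KB page size from the example above. The function name `unit_index` and the flat numbering of units within a block are assumptions.

```python
PAGE_SIZE = 16 * 1024                    # 16KB page, as in the example above
ECC_CHUNK = 2 * 1024                     # 2KB ECC granularity
UNITS_PER_PAGE = PAGE_SIZE // ECC_CHUNK  # 8 partial-page units per page


def unit_index(page, byte_offset):
    """Return the block-relative index of the partial-page unit holding byte_offset."""
    if not 0 <= byte_offset < PAGE_SIZE:
        raise ValueError("byte_offset outside the page")
    return page * UNITS_PER_PAGE + byte_offset // ECC_CHUNK
```

With these sizes, an ECC failure at byte 5000 of page 3 maps to unit 26, so only that 2KB slice needs to be marked rather than the whole page.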

In one embodiment, the unreliable memory unit list 134 comprises a table. Each entry in the table corresponds to a unit of memory in the non-volatile memory array 150 that is labeled as reliable or unreliable. For example, the table may be stored as a bitmap, where each bit corresponds to a unit of memory. If a 0 value is stored, the corresponding memory unit may be marked as reliable. If a 1 value is stored, the corresponding memory unit may be marked as unreliable. In other cases, the labels for the 0 and 1 bit values may be reversed. Advantageously, this table design allows for fast access to data from the unreliable memory unit list 134. In some cases, the table may be compressed, thereby reducing the storage required to maintain the table.
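A bitmap of this kind might be sketched as follows, using the convention that a 1 bit marks a unit as unreliable. The class name `UnitBitmap` and the bit ordering within each byte are assumptions made for illustration.

```python
class UnitBitmap:
    """One bit per memory unit: 0 = reliable, 1 = unreliable."""

    def __init__(self, num_units):
        self.num_units = num_units
        self.bits = bytearray((num_units + 7) // 8)  # all units start reliable

    def _check(self, unit):
        if not 0 <= unit < self.num_units:
            raise IndexError(unit)

    def mark_unreliable(self, unit):
        self._check(unit)
        self.bits[unit >> 3] |= 1 << (unit & 7)

    def is_unreliable(self, unit):
        self._check(unit)
        return bool(self.bits[unit >> 3] & (1 << (unit & 7)))
```

Lookups are a single index and mask, which is the fast-access property noted above; the cost is one bit for every unit in the array, whether or not it has ever failed.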

In one embodiment, the unreliable memory unit list 134 includes entries corresponding only to certain memory units of the non-volatile memory array 150. In this case, the unreliable memory unit list 134 may be stored as a series of linked lists, where only blocks containing unreliable memory units are included in the linked lists. An example data structure according to an embodiment may use an 8-byte value encoding, as described below.

The first byte (i.e., byte 0) may store a channel number and a chip number corresponding to the unreliable memory unit (e.g., bits 0 through 3 may store the channel number and bits 4 through 7 may store the chip number). The next two bytes (i.e., bytes 1-2) may store the block number corresponding to the unreliable memory unit. The next byte (i.e., byte 3) may store the first unreliable memory unit or offset of the block (e.g., if the NAND block includes 256 pages, the value 224 may represent the starting page number 224 in the block). The last four bytes (i.e., bytes 4-7) may store a bitmap of reliable or unreliable memory units in the block starting at the first unreliable memory unit or offset (e.g., the bitmap may include entries corresponding to reliable and unreliable pages, starting at page 224 and ending at page 256). Advantageously, such a linked-list design may use less storage than a bitmap that stores entries for all memory units of the non-volatile memory array 150. Further, in one embodiment, dedicated software or hardware may be used to access the unreliable memory unit list 134, such as a linked list, to increase the speed of each lookup in the linked list.
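The 8-byte layout described above can be sketched with Python's `struct` module. The field widths follow the text; the little-endian byte order, the function names, and the convention that bit i of the bitmap covers unit `first_unit + i` with 1 meaning unreliable are assumptions. Under zero-based numbering, a 32-bit bitmap starting at page 224 covers pages 224 through 255.

```python
import struct

ENTRY_FORMAT = "<BHBI"  # byte 0, bytes 1-2, byte 3, bytes 4-7; no padding


def pack_entry(channel, chip, block, first_unit, bitmap):
    """Encode one list entry into 8 bytes."""
    if not (0 <= channel < 16 and 0 <= chip < 16):
        raise ValueError("channel and chip must each fit in 4 bits")
    byte0 = (chip << 4) | channel  # bits 0-3: channel, bits 4-7: chip
    return struct.pack(ENTRY_FORMAT, byte0, block, first_unit, bitmap)


def unpack_entry(raw):
    """Decode 8 bytes back into named fields."""
    byte0, block, first_unit, bitmap = struct.unpack(ENTRY_FORMAT, raw)
    return {
        "channel": byte0 & 0x0F,
        "chip": byte0 >> 4,
        "block": block,
        "first_unit": first_unit,
        "bitmap": bitmap,
    }


def is_unit_unreliable(entry, unit):
    """True if this entry's bitmap marks `unit` unreliable.

    Units outside the 32-unit window are simply not covered by the entry,
    so the function reports False for them.
    """
    offset = unit - entry["first_unit"]
    return 0 <= offset < 32 and bool((entry["bitmap"] >> offset) & 1)
```

Each entry costs 8 bytes regardless of how many of its 32 covered units are bad, so this layout pays off when unreliable units are sparse across the array and clustered within a block.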

In some embodiments, other storage and/or search methods may be employed to store and/or search the list of unreliable memory units 134. For example, a hash lookup, a balanced tree, or a binary tree may be used. Also, the unreliable memory unit list 134 may include entries corresponding only to reliable memory units or unreliable memory units, rather than entries corresponding to reliable and unreliable memory units.

At block 210, the process 200 determines a total number of unreliable memory units in the memory block corresponding to the memory unit. For example, the process 200 may reference the list 134 of unreliable memory units and calculate the total number of memory units within the block that are marked as unreliable.

At block 212, the process 200 determines whether the total number of unreliable memory units exceeds a threshold. The threshold may be an experimentally determined value beyond which the access time to the remaining reliable memory units in the block does not justify the continued use of the block. In some cases, the threshold may be arbitrarily selected based on the percentage of pages in the block that are determined to be unreliable (e.g., when 25%, 50%, or 75% of the pages in the block are unreliable). Further, the threshold may vary from block to block depending on the rate of increase of the number of unreliable memory units in the block or in adjacent blocks.
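A percentage-based threshold check of the kind mentioned above might look like the following. The function name and the default of 50% are illustrative assumptions only.

```python
def block_exceeds_threshold(unreliable_count, units_per_block, fraction=0.5):
    """True when the unreliable units in a block exceed the chosen fraction."""
    return unreliable_count > units_per_block * fraction
```

For a 256-page block with a 50% threshold, 128 unreliable pages does not yet exceed the threshold, while 129 does. A per-block `fraction` could be supplied to model thresholds that vary from block to block.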

If the total number of unreliable memory units does not exceed the threshold, process 200 moves to block 214. At block 214, the process 200 may continue with normal operations, such as performing a memory access operation on another memory unit that does not include an entry in the unreliable memory unit list 134 that marks the other memory unit as unreliable.

If the total number of unreliable memory units exceeds the threshold, process 200 moves to block 216. At block 216, the process 200 adds an entry corresponding to the memory block to the unreliable block list or bad block list 136. The unreliable block list 136 may include a number of entries corresponding to blocks marked as reliable or unreliable. The process 200 may then continue with normal operations, such as performing a memory access operation on another memory unit or block that does not have an entry in the unreliable memory unit list 134 or the unreliable block list 136 that marks the memory unit or block as unreliable.
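Blocks 204 through 216 of process 200 can be summarized in a short sketch. The data structures (a dictionary of sets for the unit list, a set for the block list), the function name, and the returned status strings are assumptions chosen for readability.

```python
def handle_access_result(block, unit, error, unreliable_units,
                         unreliable_blocks, threshold):
    """Update both lists after a memory access to (block, unit)."""
    if not error:
        return "ok"  # block 206: continue normal operation
    # Block 208: record the failing unit in the unreliable memory unit list.
    unreliable_units.setdefault(block, set()).add(unit)
    # Block 210: count the unreliable units recorded for this block.
    total = len(unreliable_units[block])
    # Block 212: compare the count against the threshold.
    if total > threshold:
        unreliable_blocks.add(block)  # block 216: mark the whole block
        return "block_marked"
    return "unit_marked"  # block 214: block stays in use
```

The block is only added to the unreliable block list once the count passes the threshold; until then its remaining units stay available.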

FIG. 3 is a flow diagram illustrating a process 300 of managing unreliable memory units when performing a programming operation, according to one embodiment of the invention. A request to perform a programming operation may be received from host system 110 and may be directed to or associated with a memory unit of a block of non-volatile memory array 150 that is selected for programming. Process 300 may be performed by controller 130 and/or memory management module 132.

At block 302, the process 300 determines whether the unreliable block list 136 includes an entry marking the block of the memory unit as unreliable. If the unreliable block list 136 includes an entry marking the block as unreliable, the process 300 moves to block 304. At block 304, the process 300 performs the program operation in a different memory block that is reliable. For example, the process 300 may select another memory unit from the different block in which to perform the programming operation. To determine the reliability of the different block, the process 300 may restart at block 302 and determine whether the unreliable block list 136 includes an entry marking the different block as unreliable.

If the unreliable block list 136 does not include an entry marking the block as unreliable, the process 300 moves to block 306. At block 306, the process 300 determines whether the list of unreliable memory units 134 includes an entry marking the memory unit as unreliable.

If the list of unreliable memory units 134 does not include an entry marking the memory unit as unreliable, the process 300 moves to block 308. At block 308, the process 300 performs the program operation in the memory unit.

On the other hand, if the list of unreliable memory units 134 includes an entry marking the memory unit as unreliable, the process 300 moves to block 310. At block 310, the process 300 performs the program operation in a different, reliable memory unit. The different, reliable memory unit may be a memory unit of a block for which neither the unreliable memory unit list 134 nor the unreliable block list 136 has an entry marking the memory unit or the block as unreliable. In one embodiment, the process 300 may perform the programming operation in a different, reliable memory unit that is in the same block and replaces the original memory unit. In another embodiment, the process 300 may perform the programming operation in a different, reliable memory unit that is in a different block and replaces the original memory unit. Once another memory unit or block is selected, the process 300 may restart at block 302 and determine whether the unreliable block list 136 includes an entry marking the selected block as unreliable.
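The selection logic of process 300 might be sketched as below, on the understanding that a memory unit is programmed only when neither list marks it or its block as unreliable. The function name, the candidate ordering (requested unit, then the same block, then other blocks), and the data structures are assumptions; real NAND constraints such as erase state and page programming order are ignored for brevity.

```python
def select_program_target(block, unit, unreliable_units, unreliable_blocks,
                          num_blocks, units_per_block):
    """Return a (block, unit) pair that neither list marks unreliable, or None."""
    # Try the requested location first, then the same block, then other blocks.
    candidates = [(block, unit)]
    candidates += [(block, u) for u in range(units_per_block) if u != unit]
    candidates += [(b, u) for b in range(num_blocks) if b != block
                   for u in range(units_per_block)]
    for b, u in candidates:
        if b in unreliable_blocks:            # block 302: check the block list
            continue
        if u in unreliable_units.get(b, ()):  # block 306: check the unit list
            continue
        return b, u                           # blocks 304/308/310: program here
    return None                               # no reliable location remains
```

Every candidate is checked against the block list before the unit list, mirroring the restart at block 302 whenever a replacement is chosen.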

FIG. 4 is a graph illustrating voltage threshold distributions for memory cells in a page at a given program-erase cycle level (cycle level) according to one embodiment of the invention. Graph 400 shows the voltage threshold distribution of memory cells in a page at the 1000 program-erase cycle level in an MLC NAND flash memory after a random data pattern is programmed. The x-axis is the voltage code axis corresponding to the voltage level. The y-axis corresponds to the probability distribution of the cells in the page. As shown in the graph, the voltage threshold distribution of the cell forms relatively defined, narrow, discrete peaks at three approximate voltage reference levels, indicating an overall higher quality and level of reliability or endurance of the page.

FIG. 5 shows graphs of voltage threshold distributions for memory cells in two pages of an example block at two different program-erase cycle levels, according to one embodiment of the present invention. Graph 500 shows the voltage threshold distributions of the memory cells in two pages in an MLC NAND flash memory after a random data pattern is programmed. The x-axis is the voltage code axis corresponding to the voltage level. The y-axis corresponds to the probability distribution of the cells in the page. Series 1 and series 3 show page 0 of the block at the 1000 and 30000 program-erase cycle levels, respectively. Series 2 and series 4 show the page 250 of the block at the 1000 and 30000 program-erase cycle levels, respectively. In one embodiment, the same random data pattern has been written to page 0 and page 250. In another embodiment, different random data patterns have been written to page 0 and page 250.

As can be seen from series 1 and series 2, at the 1000 program-erase cycle level the voltage threshold distributions of the cells of page 0 and page 250 form relatively well-defined, narrow, separate peaks at three approximate voltage reference levels. This indicates an overall higher quality and level of reliability or endurance for page 0 and page 250 because, among other reasons, such distributions leave more room for adjustment when data is retrieved. However, as seen from series 3 and series 4, the peaks of the voltage threshold distributions of page 0 and page 250 become wider and shorter at the 30000 program-erase cycle level, indicating an overall reduced quality and level of reliability or endurance of the cells. Specifically, series 4 shows wider and shorter peaks than series 3, indicating that page 250 exhibits a lower quality and level of reliability or endurance than page 0. Thus, some pages may advantageously be included in the unreliable memory unit list 134 before other pages, as some pages may exhibit a lower quality and level of reliability or endurance than other pages. For example, pages that are physically located closer to the end of a block may exhibit a lower quality and level of reliability or endurance than other pages of the same block, as shown in FIG. 5. As discussed above, the unreliable memory unit list 134 enables pages with higher quality to continue to be used even though they may be in blocks with lower quality pages that can no longer be reliably used.

FIG. 6 is a graph illustrating voltage threshold distributions of memory cells in two pages of a block at a given level of a program-erase cycle, according to one embodiment of the present invention. Graph 600 shows the voltage threshold distributions of the memory cells in two pages in an MLC NAND flash memory after a random data pattern is programmed. The x-axis is the voltage code axis corresponding to the voltage level. The y-axis corresponds to the probability distribution of the cells in the page. Series 1 shows page 4 of the block at the 30000 program-erase cycle level. Series 2 shows a page 254 of the block at a level of 30000 program-erase cycles. In one embodiment, the same random data pattern has been written to page 4 and page 254. In another embodiment, different random data patterns have been written to page 4 and page 254.

As seen from series 1 and series 2, the peaks of series 2 appear wider and shorter than the peaks of series 1, indicating that page 254 exhibits a lower quality and level of reliability or endurance than page 4. Thus, some pages may advantageously be included in the unreliable memory unit list 134 before other pages, as some pages may exhibit a lower quality and level of reliability or endurance than other pages. For example, pages that are physically located closer to the end of a block may exhibit a lower quality and level of reliability or endurance than other pages in the same block, as shown in FIG. 6.

Other variants

It will be appreciated by those skilled in the art that, in some embodiments, other approaches and methods may be used to store and manage the unreliable memory unit list 134 of the non-volatile memory array 150. Further, events other than exceeding a threshold number of unreliable memory units may be used to determine when to include a block in the unreliable block list 136. For example, a block may be included a particular time after an erase error occurs or after a significantly large number of unreliable memory units have been identified in the block. Additional system components may also be used, and the disclosed system components may be combined or omitted. For example, the host system 110 may be configured to store a copy of the unreliable memory unit list 134, or to cause the unreliable memory unit list 134 to be copied from volatile memory to non-volatile memory. Furthermore, the actual steps employed in the disclosed processes (e.g., the processes shown in FIGS. 2 and 3) may differ from those shown in the figures. Some of the above steps may be deleted and other steps may be added depending on the embodiment. Accordingly, the scope of the present disclosure is intended to be limited only by the following claims.

While specific embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of protection. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms. Also, omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the protection. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit. For example, the systems and methods disclosed herein may be applied to hard drives, hybrid hard drives, and the like. In addition, other forms of storage may additionally or alternatively be used (e.g., DRAM or SRAM, battery backed volatile DRAM or SRAM devices, EPROM, EEPROM memory, etc.). As another example, the various components shown in the figures may be implemented as software and/or firmware on a processor, ASIC/FPGA, or dedicated hardware. Likewise, the features and attributes of the specific embodiments described above can be combined in various ways to form additional embodiments, all of which fall within the scope of the present disclosure. While the present disclosure provides certain preferred embodiments and applications, other embodiments that are apparent to those of ordinary skill in the art, including embodiments that do not provide all of the features and advantages set forth herein, are also within the scope of the present disclosure. Accordingly, the scope of the present disclosure is to be limited only by the following claims.

Claims

1. A data storage system, comprising:

a non-volatile memory array; and

a controller configured to:

perform a memory access operation for a memory unit of the non-volatile memory array, the memory unit having fewer memory locations than a memory block, wherein the memory block is a minimum number of memory locations that are erasable as a unit;

detect a memory error indicating a failure to perform the memory access operation; and

in response to detecting the memory error indicating a failure to perform the memory access operation:

add an entry corresponding to the memory unit to a list of unreliable memory units, the list of unreliable memory units comprising a plurality of entries corresponding to memory units marked as unreliable,

thereby tracking unreliable memory at a finer level of granularity than the minimum level of granularity at which the non-volatile memory array can be erased,

wherein the operational lifetime of the data storage system is extended by allowing memory access operations to be directed to some reliable memory units in a memory block after the block has been determined to include unreliable memory units.

2. The data storage system of claim 1, wherein a block of memory comprises a plurality of pages, and wherein each entry in the unreliable memory unit list corresponds to a plurality of pages of memory, a page of memory, or a partial page of memory.

3. The data storage system of claim 2, wherein the size of the partial page of memory matches the granularity of error correction code processing.

4. The data storage system of claim 1, wherein the list of unreliable memory units comprises a table, and each entry in the table corresponds to a memory unit labeled as reliable or unreliable.

5. The data storage system of claim 1, wherein the list of unreliable memory units comprises a list, and each entry in the list corresponds to a memory unit labeled as reliable or unreliable.

6. The data storage system of claim 1, wherein the memory access operation comprises a program operation or a read operation.

7. The data storage system of claim 1, wherein in response to detecting a memory error indicating a failure to perform the memory access operation, the controller is further configured to:

determine, using the list of unreliable memory units, a total number of memory units that are marked as unreliable within the memory block corresponding to the memory unit; and

in response to determining that the total number exceeds a threshold, add an entry corresponding to the memory block to an unreliable block list.

8. The data storage system of claim 1, wherein the controller is further configured to:

store the list of unreliable memory units in volatile memory; and

periodically copy the list of unreliable memory units from the volatile memory to the non-volatile memory array.

9. In a data storage system including a controller and a non-volatile memory array, a method for managing unreliable memory units, the method comprising:

performing a memory access operation for a memory unit of the non-volatile memory array, the memory unit having fewer memory locations than a memory block, wherein the memory block is a minimum number of memory locations that are erasable as a unit;

thereby tracking unreliable memory at a finer level of granularity than the minimum level of granularity at which the non-volatile memory array can be erased.

10. The method of claim 9, wherein a block of memory comprises a plurality of pages, and wherein each entry in the unreliable memory unit list corresponds to a plurality of pages of memory, one page of memory, or a partial page of memory.

11. The method of claim 10, wherein the size of the partial page of the memory matches the granularity of error correction code processing.

12. The method of claim 9, wherein the list of unreliable memory units comprises a table, and each entry in the table corresponds to a memory unit labeled as reliable or unreliable.

13. The method of claim 9, wherein the list of unreliable memory units comprises a list, and each entry in the list corresponds to a memory unit labeled as reliable or unreliable.

14. The method of claim 9, wherein the memory access operation comprises a program operation or a read operation.

15. The method of claim 9, further comprising, in response to detecting a memory error indicating a failure to perform the memory access operation:

16. The method of claim 9, wherein the list of unreliable memory units is stored in volatile memory and periodically copied from the volatile memory to the non-volatile memory array.

17. In a data storage system including a controller and a non-volatile memory array, a method for storing data, the method comprising:

receiving a program operation associated with a first memory unit of the non-volatile memory array and data to be programmed, wherein the first memory unit has fewer memory locations than a minimum number of memory locations that are erasable as a unit;

selecting, for programming, a first memory block including the first memory unit;

determining whether a list of unreliable memory units includes an entry indicating that the first memory unit is unreliable; and

in response to determining that the list of unreliable memory units includes the entry:

selecting a second memory unit from the first memory block that does not have an associated entry on the list of unreliable memory units; and

storing the data in the second memory unit,

wherein the method is performed under control of the controller.

18. The method of claim 17, further comprising, prior to determining whether the list of unreliable memory units includes an entry indicating that the first memory unit is unreliable:

determining whether an unreliable block list includes an entry indicating that the first memory block is unreliable; and

in response to determining that the unreliable block list includes the entry:

selecting a second memory block that does not have an associated entry on the list of unreliable blocks; and

replacing the first memory unit with a third memory unit of the second memory block.

19. A data storage system, comprising:

a non-volatile memory array; and

a controller configured to:

storing the data in the second memory unit.

20. The data storage system of claim 19, wherein the controller is further configured to:

in response to determining that the unreliable block list includes the entry: