Dual-Mode SSD for Optimal Hyper-Scale Infrastructure Performance

With the proliferation of new applications such as artificial intelligence (AI), the Internet of Things (IoT), big data, and cloud computing, today’s hyper-scale data centers run far more diverse and complex workloads than ever before. These applications can have drastically different I/O patterns, performance or Quality of Service (QoS) targets, and usage models, and some require different sets of features from storage devices. Traditional architectures, consisting only of elastic computing and object storage resources, can no longer meet these demands.

Limitations of Traditional Hardware

Traditional storage architecture is built around standardized hardware that provides a generic block I/O interface, with a software stack layered on top of that abstract, generic block device. Although this architecture offers portability and backward compatibility, it faces serious challenges in today’s hyper-scale data centers.

  • Standard hardware must conform to specifications and leaves limited room for customization, which makes it difficult to adapt to many different I/O patterns and usage models.
  • In traditional architecture, hardware and software are designed and optimized separately, without knowledge of each other. This separation is a major obstacle to further optimization.
  • Standard hardware is mostly a black box to host software: it conceals most of its internal mechanisms to present the illusion of a “generic block device.” The drawback of this encapsulation is that software has no control over performance once an I/O request reaches the device.

What Is Open Channel SSD?

Despite the superior performance of SSDs over HDDs, SSDs still suffer from shortcomings such as tail latency and under-utilized resources. In response, a new type of SSD, dubbed “Open Channel SSD,” has been proposed. The main idea of Open Channel SSD is to expose hardware capabilities and the physical address space to the host, so that host software can make better use of hardware resources and achieve higher performance. To achieve this, some functions traditionally performed by SSD firmware, such as logical-to-physical address mapping and garbage collection, are moved to the host.
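To make the idea of moving firmware functions to the host more concrete, here is a toy host-side page-mapping sketch. The class name `HostFTL`, its round-robin channel striping, and the `(channel, page)` address format are hypothetical simplifications chosen for illustration; they are not Alibaba’s implementation or any real Open Channel command set.

```python
from dataclasses import dataclass, field


@dataclass
class HostFTL:
    """Toy host-side flash translation layer (hypothetical, for illustration).

    The host keeps the logical-to-physical (L2P) map that a conventional SSD
    hides inside its firmware, and stripes writes across parallel channels.
    """
    num_channels: int
    pages_per_channel: int
    # Logical page number -> physical (channel, page) address.
    l2p: dict = field(default_factory=dict)
    # Physical pages holding stale data that garbage collection could reclaim.
    invalid: set = field(default_factory=set)
    # Next free page in each channel (flash is written sequentially).
    next_page: list = field(default_factory=list, init=False)
    # Round-robin cursor used to pick the next channel.
    _rr: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        self.next_page = [0] * self.num_channels

    def write(self, lpn: int) -> tuple:
        # Try each channel once, starting at the cursor, so consecutive
        # writes spread across independent parallel units.
        for _ in range(self.num_channels):
            ch = self._rr
            self._rr = (self._rr + 1) % self.num_channels
            if self.next_page[ch] < self.pages_per_channel:
                break
        else:
            raise RuntimeError("no free pages; garbage collection needed")
        ppa = (ch, self.next_page[ch])
        self.next_page[ch] += 1
        # Flash pages cannot be overwritten in place: mark the old copy stale.
        if lpn in self.l2p:
            self.invalid.add(self.l2p[lpn])
        self.l2p[lpn] = ppa
        return ppa

    def read(self, lpn: int) -> tuple:
        # The host, not the device, resolves the physical address.
        return self.l2p[lpn]
```

Because the host sees the physical layout, it decides which channel each write lands on, a decision a conventional SSD makes internally and hides from software. A real host-side FTL would also handle wear leveling, garbage collection, and error recovery, all omitted here.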

In the conventional Open Channel software architecture, the Open Channel SSD interacts with applications through a set of kernel-space drivers: the Open Channel NVMe driver, the Open Channel subsystem driver, and typically a block device target driver. This software stack therefore relies heavily on the Linux kernel and kernel-mode drivers. Although reliable, the architecture can be a disadvantage for high-speed applications, because every I/O request must pass through several kernel layers before reaching the device.

A New Standard for Cloud Storage

To solve this problem, the Alibaba Infrastructure Services team has developed a dual-mode SSD (Solid State Drive), a storage device that supports both a customized Open Channel mode and a native NVMe mode.

  • Native NVMe Mode (Device-based Mode). The device behaves just like a standard NVMe SSD, exposing the generic block interface described earlier.
  • Open Channel Mode. The device communicates with the host through Alibaba Cloud’s customized Open Channel command interface.
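The practical difference between the two modes is who translates logical addresses into physical flash locations. The sketch below is a hypothetical illustration only: the `Mode` enum, `build_read_command`, and the dictionary-based command format are invented for clarity and do not reflect Alibaba Cloud’s customized command interface, which this article does not describe.

```python
from enum import Enum


class Mode(Enum):
    NVME = "nvme"                  # device-based: SSD firmware maps addresses
    OPEN_CHANNEL = "open_channel"  # host-based: host software maps addresses


def build_read_command(mode: Mode, lba: int, host_l2p: dict) -> dict:
    """Build a hypothetical read command for a dual-mode device."""
    if mode is Mode.NVME:
        # Standard block interface: send the logical block address and let
        # the firmware translate it to a physical flash location.
        return {"op": "read", "lba": lba}
    # Open Channel mode: the host owns the mapping table and addresses
    # flash by physical location.
    channel, page = host_l2p[lba]
    return {"op": "read", "channel": channel, "page": page}
```

In NVMe mode the host-side mapping table is unused, so existing block-based software runs unchanged; in Open Channel mode the same request carries a physical address chosen by the host.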

An integrated hardware/software solution based on this dual-mode SSD is currently being deployed on Alibaba’s internal servers. This storage system is expected to reduce read latency by 75% and to improve the overall storage performance of data centers by up to five times.
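As a quick check on what a 75% reduction means, write the read latency before and after as L_old and L_new (symbols introduced here only for this calculation):

```latex
L_{\text{new}} = (1 - 0.75)\,L_{\text{old}} = 0.25\,L_{\text{old}}
\quad\Longrightarrow\quad
\frac{L_{\text{old}}}{L_{\text{new}}} = \frac{1}{0.25} = 4
```

In other words, a 75% reduction in read latency corresponds to reads completing four times faster; for example, a hypothetical 100 µs read would take 25 µs. The "up to five times" figure refers to overall storage performance, a separate metric.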

“The increasing proliferation of AI and cloud computing has created more sophisticated demands in data centers, while traditional storage systems face severe limitations in meeting them. In light of these challenges, Alibaba has pioneered the research and development of a new storage system, the dual-mode SSD infrastructure. This underscores our commitment to driving the innovation and optimization of technology infrastructure in a new AI and cloud era,” said Shu Li, Senior Staff Engineer at Alibaba Infrastructure Services.

“By creating and sharing the dual-mode SSD specification, we are also working with different manufacturers on related firmware and hardware products, accelerating the development of SSD-centered infrastructure and ecosystems,” added Shu Li.

The Leading Edge in Cloud Storage

Alibaba Cloud, for example, runs numerous applications serving our business units and customers, such as E-Commerce, Search, Online Promotion, Multimedia, Financial Service, Logistic Service, and Cloud Computing. Some of these applications demand features that are not available in standard SSDs. Their requirements also change frequently, so the storage system must be agile and respond quickly.

Alibaba Cloud tackled these challenges with a hardware/software co-design approach built on the dual-mode SSD. We combine in-house SSD hardware with first-hand understanding of applications and use cases, and work closely with business teams to design and optimize the entire I/O stack. The result is a set of integrated hardware/software solutions that are highly optimized for the applications in our data centers.

The dual-mode SSD reflects Alibaba Cloud’s consistent effort to improve performance in hyperscale data centers through hardware/software co-design. On the hardware side, we developed an in-house dual-mode SSD that supports both NVMe device-based mode and Open Channel mode. On the software side, we developed a user-space Open Channel I/O stack that tightly integrates the SSD hardware, firmware, driver, and operating system with our applications.

Furthermore, the dual-mode SSD demonstrates the potential of joint hardware/software optimization, as shown in the use case of advanced I/O scheduling. Evaluation results show that the dual-mode SSD deployed in hyperscale infrastructure reduces access latency by 75% and improves 99th-percentile latency by a factor of 5.8.
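Tail-latency figures such as a 99th-percentile improvement are typically computed from a set of per-request latency samples. Below is a minimal sketch, assuming the nearest-rank definition of percentile (other definitions interpolate and can give slightly different values). The function names and the sample data are hypothetical and unrelated to Alibaba’s measurements.

```python
import math


def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def tail_improvement(before: list, after: list, p: float = 99) -> float:
    """Factor by which the p-th percentile latency dropped after a change."""
    return percentile(before, p) / percentile(after, p)


# Synthetic latencies in microseconds (illustrative only).
before = [100] * 98 + [400, 900]
after = [40] * 98 + [100, 150]
print(tail_improvement(before, after))  # → 4.0
```

Note that the p99 ignores the single slowest request in each 100-sample set (900 µs and 150 µs here), which is why tail metrics are usually reported at several percentiles.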

