Phison CEO Discusses Memory Bottlenecks in AI and the Future of 244TB SSDs

Phison's CEO shares insights on memory limitations in AI models, the significance of 244TB SSDs, and the challenges of high-bandwidth flash technology.

Updated Jan 14, 2026
Sarah Collins

Computing Editor

Specializes in PCs, laptops, components, and productivity-focused computing tech.

The technology sector is increasingly focused on GPUs as the backbone of AI infrastructure, yet the true limiting factor for model performance is memory.

In an extensive interview, Phison CEO Pua Khein Seng, the inventor of the first single-chip USB flash drive, explained to our publication that the emphasis on computational power has overshadowed a fundamental constraint that affects everything from local AI inference on laptops to large-scale AI data centers.

“In AI models, the real bottleneck isn’t computing power - it’s memory,” Pua stated. “If you don’t have enough memory, the system crashes.”

Compensating for DRAM Limits

This insight underpins Phison’s aiDAPTIV+ initiative, which the company unveiled at CES 2026. This technology aims to enhance AI processing on integrated GPU systems by utilizing NAND flash as a memory pool.

Pua described it as leveraging SSD capacity to offset DRAM limitations, allowing GPUs to concentrate on computation rather than waiting for memory access.

“Our invention uses SSDs as a complement to DRAM memory,” he explained. “We use this as memory expansion.”
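Phison has not published the internals of aiDAPTIV+, but the general shape of the idea (keep hot data in DRAM and spill the overflow to a fast NVMe tier) can be sketched in a few lines of Python. The DRAM budget, spill directory, and placement policy below are illustrative assumptions, not the company's actual design:

```python
import os
import numpy as np

# Minimal sketch of a two-tier store: hold tensors in DRAM up to a budget,
# then spill the rest to memory-mapped files on an NVMe SSD. The budget,
# path, and policy are hypothetical, not Phison's aiDAPTIV+ implementation.
DRAM_BUDGET_BYTES = 8 * 1024**3   # assume 8 GiB of DRAM reserved for tensors
SPILL_DIR = "/mnt/nvme/spill"     # assumed fast-SSD spill directory

class TieredStore:
    def __init__(self):
        os.makedirs(SPILL_DIR, exist_ok=True)
        self.dram = {}       # name -> ndarray resident in RAM
        self.ssd = {}        # name -> file path for data spilled to flash
        self.dram_used = 0

    def put(self, name, array):
        if self.dram_used + array.nbytes <= DRAM_BUDGET_BYTES:
            self.dram[name] = array
            self.dram_used += array.nbytes
        else:
            path = os.path.join(SPILL_DIR, f"{name}.npy")
            np.save(path, array)      # overflow goes to the SSD tier
            self.ssd[name] = path

    def get(self, name):
        if name in self.dram:
            return self.dram[name]
        # mmap lets the OS page cache decide what actually sits in DRAM
        return np.load(self.ssd[name], mmap_mode="r")
```

The point of the sketch is the placement decision, not the mechanics: the GPU keeps computing against whatever fits in fast memory, while the SSD absorbs the capacity that would otherwise force a crash or a smaller model.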

A key objective is to enhance responsiveness during inference, particularly the Time to First Token (TTFT), which measures the delay between submitting a prompt and receiving the first output. Pua argues that prolonged TTFT can make local AI feel ineffective, even if the model ultimately completes the task.

“If you ask your device something and have to wait 60 seconds for the first word, would you wait?” he questioned. “When I ask something, I can wait two seconds. But if it takes 10 seconds, users will think it’s garbage.”
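TTFT itself is straightforward to measure: start a clock when the prompt is submitted and stop it when the first streamed token arrives. The sketch below assumes a generic streaming inference call (generate_stream is a placeholder, not a specific vendor API):

```python
import time

def measure_ttft(generate_stream, prompt):
    """Time from prompt submission to the first streamed token (TTFT)."""
    start = time.perf_counter()
    stream = generate_stream(prompt)   # hypothetical call returning a token iterator
    first_token = next(stream)         # block until the first token lands
    return first_token, time.perf_counter() - start
```

By Pua's yardstick, a result around two seconds feels acceptable, while anything approaching ten seconds makes the whole system feel like "garbage" to the user.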

Pua ties improvements in TTFT to better reuse of memory-intensive inference data, particularly the KV cache, likening needless recomputation to a doctor who has to repeat the same instructions at every visit because nothing was saved from the last one.

“In AI inference, there’s something called KV cache - it’s like cookies in web browsing,” he elaborated. “Most systems don’t have enough DRAM, so every time you ask the same question, it has to recompute everything.”

Phison’s strategy, Pua added, is to “store frequently used cache in the storage” so that the system can quickly retrieve it when a user repeats or revisits a query.
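Phison has not detailed how aiDAPTIV+ stores that cache, but the concept of persisting a prompt's KV cache to fast storage and reloading it on a repeat query can be illustrated simply. The directory, hash-based key, and pickle format below are assumptions for the example only:

```python
import hashlib
import os
import pickle

CACHE_DIR = "/mnt/nvme/kv_cache"   # assumed SSD-backed cache location

def _key(prompt: str) -> str:
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

def load_kv_cache(prompt: str):
    """Return a previously saved KV cache for this prompt, or None."""
    path = os.path.join(CACHE_DIR, _key(prompt) + ".pkl")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)    # skip the expensive prefill recompute
    return None

def save_kv_cache(prompt: str, kv_cache) -> None:
    """Spill the prefill result to the SSD tier for later reuse."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(os.path.join(CACHE_DIR, _key(prompt) + ".pkl"), "wb") as f:
        pickle.dump(kv_cache, f)
```

A hit on this cache is what turns a repeated question from a full recompute into a quick lookup, which is exactly the TTFT improvement Pua is describing.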

This memory-centric approach extends beyond laptops to how companies design GPU servers. Pua noted that many organizations purchase additional GPUs not for computational throughput, but to increase VRAM, leading to inefficient use of resources.

“Without our solution, people buy multiple GPU cards primarily to aggregate memory, not for compute power,” he remarked. “Most of those expensive GPUs end up idle because they’re just being used for their memory.”

If SSDs can supply that larger memory pool, Pua argues, GPUs can instead be purchased and scaled for computation. “Once you have enough memory, then you can focus on compute speed,” he noted. “If one GPU is slow, you can add two, four, or eight GPUs to enhance computing power.”

244TB SSDs

Pua then shifted focus to the economics of hyperscalers and cloud service providers (CSPs), describing the current surge in GPU investment as necessary but incomplete: the business case for AI rests on inference, and inference in turn depends on data storage.

“CSPs have invested over $200 billion in GPUs,” he stated. “They’re not making money directly from GPUs. The revenue comes from inference, which requires massive data storage.”

He encapsulated the situation with a recurring statement: “CSP profit equals storage capacity.”

This perspective also aligns with Phison’s push for high-capacity enterprise SSDs. The company has announced a 244TB model, with Pua explaining, “Our current 122TB drive utilizes our X2 controller with 16-die NAND stacking. To achieve 244TB, we simply need 32-die stacking. The design is complete, but the challenge lies in manufacturing yield.”

He also mentioned an intriguing alternative: higher-density NAND dies. “We’re awaiting 4Tb NAND dies; with those, we could reach 244TB with just 16 dies per stack,” he said, noting that the timeline would depend on manufacturing advancements.
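The arithmetic behind both routes is simple doubling. Assuming, for illustration only, 2Tb dies and roughly 32 NAND packages on today's 122TB-class drive (figures Phison has not confirmed), either twice the stack height or twice the die density lands in the same place:

```python
def drive_capacity_tb(die_terabits: float, dies_per_package: int, packages: int) -> float:
    # terabits per die -> terabytes, then multiplied out across the drive
    return die_terabits / 8 * dies_per_package * packages

print(drive_capacity_tb(2, 16, 32))  # ~128 TB raw: roughly today's 122TB class
print(drive_capacity_tb(2, 32, 32))  # double the stack height: ~256 TB raw
print(drive_capacity_tb(4, 16, 32))  # or double the die density at 16 dies: same result
```

Raw NAND capacity runs ahead of the advertised figure because part of it is reserved for over-provisioning and spare blocks.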

Regarding PLC NAND, Pua clarified that Phison does not control its release, but he expressed intent to support it once manufacturers can deliver it reliably.

“PLC is five-bit NAND, which is primarily a decision for NAND manufacturers, not ours,” he stated. “When NAND companies mature their PLC technology, our SSD designs will be ready to support it.”

He was more skeptical about another storage trend: integrating flash directly into GPU memory stacks, often referred to as high-bandwidth flash. Pua argued that the endurance mismatch creates significant risks.

“The challenge with integrating NAND directly with GPUs is the write cycle limitation,” he explained. “NAND has finite program/erase cycles. If you integrate them, when the NAND reaches end-of-life, you have to discard the entire expensive GPU card.”
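The arithmetic behind that concern is easy to sketch. Using illustrative numbers only (dense NAND is often rated in the low thousands of program/erase cycles; the capacity, write rate, and write amplification below are assumptions, not figures Pua cited), a soldered-on flash stack can be exhausted well within a GPU's useful life:

```python
capacity_tb = 8              # assumed flash capacity co-packaged with the GPU
pe_cycles = 1_000            # assumed program/erase rating for dense NAND
write_amplification = 2.0    # assumed controller write amplification

total_host_writes_tb = capacity_tb * pe_cycles / write_amplification  # ~4,000 TB
daily_writes_tb = 10         # assume a write-heavy AI pipeline pushes 10 TB/day

years = total_host_writes_tb / daily_writes_tb / 365
print(f"{years:.1f} years of writes before the soldered flash wears out")  # ~1.1
```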

Phison’s preferred approach is modular: “keeping SSDs as replaceable, plug-and-play components. When an SSD wears out, you simply replace it while retaining the costly GPU.”

Overall, Pua envisions the future of AI hardware as less about pursuing ever-larger GPUs and more about creating systems where memory capacity is affordable, scalable, and easily replaceable.

Whether the goal is local inference on an integrated GPU or large-scale inference at a hyperscale data center, the company believes that storage density and memory expansion will dictate what is feasible long before the next leap in computational power arrives.
