As artificial intelligence continues to advance, demand for powerful computing resources has surged. AI data centres, which house the specialised hardware required for AI workloads, face significant challenges in removing the heat generated by their high-density compute environments. Because virtually all of the electricity a data centre consumes is ultimately released as heat, a large hyperscale data centre drawing 100 MW of power would emit over 300 million BTU (British thermal units) of heat per hour. As chips draw more power and are packed more densely, the need for more efficient cooling grows. Effective cooling is not a luxury but a necessity for operational efficiency, reliability, and sustainability. Among the available cooling technologies, liquid cooling has emerged as a critical solution for advanced AI data centres.
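The heat figure above follows from a standard unit conversion: one watt of continuous power corresponds to roughly 3.412 BTU per hour. The short Python sketch below reproduces the arithmetic, assuming that the facility's entire power draw ends up as heat; the function name is purely illustrative.

```python
# Approximate conversion factor: 1 watt of continuous power ~= 3.412 BTU per hour.
WATTS_TO_BTU_PER_HOUR = 3.412


def heat_output_btu_per_hour(power_mw: float) -> float:
    """Approximate heat output in BTU/h for a facility drawing power_mw megawatts.

    Assumes (illustratively) that all electrical power is eventually released as heat.
    """
    watts = power_mw * 1_000_000  # MW -> W
    return watts * WATTS_TO_BTU_PER_HOUR


if __name__ == "__main__":
    btu_h = heat_output_btu_per_hour(100)
    print(f"100 MW -> {btu_h / 1e6:.1f} million BTU/h")
```

At 100 MW this gives about 341 million BTU per hour, consistent with the figure of over 300 million.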
AI workloads, such as deep learning and neural network training, involve extensive computation that generates substantial heat. Servers and GPUs (graphics processing units), the workhorses of AI data centres, can quickly overheat without adequate cooling. Overheating not only reduces the performance of these components but can also significantly shorten their lifespan. Keeping temperatures within specified limits is therefore essential: it avoids thermal throttling, in which chips lower their clock speeds to protect themselves and performance drops, and it reduces the risk of heat-related hardware failures.
Effective cooling systems keep the temperature within the data centre inside safe operating limits. Traditional air-based methods, such as Computer Room Air Conditioner (CRAC) units and hot aisle/cold aisle layouts, have long been the backbone of data centre cooling. However, as AI hardware evolves, these methods are often insufficient to handle the heat densities produced by modern AI workloads.
The Rise of Liquid Cooling
Liquid cooling has become increasingly popular in advanced AI data centres because of its superior thermal management capabilities. Unlike air cooling, which relies on circulating air to carry heat away, liquid cooling uses a coolant (often water, a water-glycol mixture, or an engineered fluid) that can absorb and transport heat far more efficiently, because liquids have a much higher heat capacity and thermal conductivity than air.
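To give a sense of scale for the efficiency advantage of liquid coolants, the sketch below compares how much heat a cubic metre of water and of air can absorb per kelvin of temperature rise (density multiplied by specific heat). The property values are rounded room-temperature approximations chosen for illustration, not design data.

```python
# Rounded, approximate room-temperature properties (illustrative assumptions):
#   water: density ~1000 kg/m^3, specific heat ~4186 J/(kg*K)
#   air:   density ~1.2 kg/m^3,  specific heat ~1005 J/(kg*K)
FLUIDS = {
    "water": {"density": 1000.0, "specific_heat": 4186.0},
    "air": {"density": 1.2, "specific_heat": 1005.0},
}


def volumetric_heat_capacity(fluid: str) -> float:
    """Heat absorbed per cubic metre per kelvin of temperature rise, in J/(m^3*K)."""
    props = FLUIDS[fluid]
    return props["density"] * props["specific_heat"]


if __name__ == "__main__":
    ratio = volumetric_heat_capacity("water") / volumetric_heat_capacity("air")
    print(f"Water absorbs about {ratio:,.0f}x more heat per unit volume than air")
```

With these values water comes out roughly 3,500 times higher than air per unit volume, which is why a modest liquid flow can do the work of a very large volume of moving air.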
1. Direct Liquid Cooling (DLC)
One of the most effective forms of liquid cooling is Direct Liquid Cooling (DLC), in which coolant is brought directly to the heat-producing components. Cold plates or cooling jackets are attached to CPUs and GPUs, and coolant circulates through these plates to absorb heat at the source. DLC systems can remove high heat loads more efficiently than air cooling, making them well suited to AI data centres with densely packed servers.
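The coolant flow a DLC loop needs can be estimated from the heat balance Q = m_dot × c_p × ΔT, where Q is the heat load, m_dot the mass flow rate, c_p the coolant's specific heat and ΔT its temperature rise across the cold plates. The Python sketch below assumes plain water with rounded property values; the 100 kW rack load and 10 K temperature rise are hypothetical inputs chosen for illustration, not figures from any particular system.

```python
WATER_SPECIFIC_HEAT = 4186.0  # J/(kg*K), approximate value for liquid water
WATER_DENSITY = 1000.0  # kg/m^3, approximate value for liquid water


def required_flow_litres_per_minute(heat_load_w: float, delta_t_k: float) -> float:
    """Water flow needed to absorb heat_load_w watts with a delta_t_k kelvin rise.

    Rearranges Q = m_dot * c_p * dT to m_dot = Q / (c_p * dT).
    """
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    mass_flow_kg_s = heat_load_w / (WATER_SPECIFIC_HEAT * delta_t_k)
    volume_flow_m3_s = mass_flow_kg_s / WATER_DENSITY
    return volume_flow_m3_s * 1000.0 * 60.0  # m^3/s -> litres per minute


if __name__ == "__main__":
    # Hypothetical example: a 100 kW rack with a 10 K coolant temperature rise.
    flow = required_flow_litres_per_minute(100_000, 10)
    print(f"Required flow: {flow:.0f} L/min")
```

For those inputs the estimate is roughly 143 litres per minute. Because flow is inversely proportional to ΔT, halving the allowed temperature rise doubles the required flow.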
2. Immersion Cooling
Another advanced liquid cooling technique is immersion cooling, in which entire servers are submerged in a dielectric fluid. Because the fluid is electrically non-conductive, it can contact the electronics directly without causing short circuits, allowing safe and efficient heat transfer. Immersion cooling can manage extremely high heat densities and provides uniform cooling across all components, minimising hotspots. This makes it particularly attractive for compute-intensive AI applications that generate large amounts of heat.
Liquid cooling provides several significant benefits:
- Higher efficiency: liquid cooling removes heat more effectively than air-based systems, so less energy is typically spent on fans and chillers and overall energy consumption falls. This translates into lower operating costs and a smaller carbon footprint, in line with sustainability goals.
- Increased density: with liquid cooling, data centres can support higher server densities without overheating. This capability is crucial for AI data centres that need to maximise computational power within limited physical space.
- Enhanced reliability: keeping components at stable, lower temperatures improves hardware reliability and longevity, reducing the frequency of maintenance and replacements. This matters for AI data centres, where downtime can cause significant disruption and financial losses.
- Scalability: liquid cooling solutions can be scaled to meet the growing demands of AI workloads. As AI models become more complex and require more computational power, liquid cooling systems can often be expanded to accommodate them without extensive modifications.
Effective cooling is central to the operation of AI data centres. As AI technologies advance, the heat generated by high-density computing environments presents significant challenges. Liquid cooling offers a robust solution, providing superior thermal management, higher efficiency, and enhanced reliability. By adopting liquid cooling technologies, AI data centres can support optimal performance, sustainability, and scalability, paving the way for continued innovation and growth in artificial intelligence.
Over the past year, we have been looking for ways to play the AI theme and have invested in a number of companies at the forefront of liquid cooling.