Client Overview:
A bank on a path toward intensive use of AI, with applications spread across multiple departments.
Challenges:
The conventional approach today is to buy heavy-duty servers, each loaded with a minimum of 4 GPUs. This creates silos of AI machines that may sit only partly used over a given week, with data scientists locked into a fixed amount of processing capacity. Not to mention the additional costs of large servers: big chassis, extra cooling, and higher power draw in certain racks.
Solution: Hardware disaggregation by Drut
Building a pool of GPUs that can be shared across multiple servers (each with a smaller rack footprint) means every team or organization can obtain anywhere from 1 to N GPUs when needed, as sketched below.
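As an illustration, on-demand composition of this kind is typically driven through a fabric-manager API: a host requests GPUs from the shared pool, uses them for a job, then returns them. The sketch below is hypothetical; the endpoint paths, host address, payload fields, and function names are assumptions for illustration, not Drut's documented interface.

```python
# Minimal sketch of on-demand GPU composition via a fabric-manager REST API.
# Hypothetical: endpoints, address, and payload fields are assumptions,
# not Drut's documented API.
import os
import requests

FM_URL = "https://fabric-manager.example.internal/api/v1"  # assumed address
TOKEN = os.environ.get("FM_TOKEN", "")                     # assumed auth scheme


def compose_gpus(host_id: str, gpu_count: int) -> dict:
    """Request gpu_count GPUs from the shared pool and attach them to host_id."""
    resp = requests.post(
        f"{FM_URL}/hosts/{host_id}/attach",
        json={"resource_type": "gpu", "count": gpu_count},
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


def release_gpus(host_id: str) -> None:
    """Return the host's composed GPUs to the shared pool when the job finishes."""
    resp = requests.post(
        f"{FM_URL}/hosts/{host_id}/detach",
        json={"resource_type": "gpu"},
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()


if __name__ == "__main__":
    # A data-science team grabs 4 GPUs for a training run, then gives them back.
    print(compose_gpus(host_id="train-node-07", gpu_count=4))
    release_gpus(host_id="train-node-07")
```

The key point is that GPU capacity follows the workload rather than being hard-wired into each chassis, so the same pool serves many teams over time.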
Advantages of Using Drut:
- CAPEX savings (20%+ on servers), driven by the smaller number of GPUs required and the re-use of existing GPUs (an illustrative calculation follows this list)
- More real estate available in server racks
- On-demand GPU composition for applications, enabling data scientists to speed up their jobs depending on availability
- Capacity to mix and match servers and GPUs from different vendors, with no lock-in
- Opportunity to dedicate a rack in a specific part of the data center just for GPUs, simplifying cooling and reducing the need for liquid cooling on the servers
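To make the CAPEX point concrete, the calculation below compares a siloed model (every team buys its own 4-GPU server) against a disaggregated model (smaller hosts plus a shared pool sized to peak concurrent demand). All prices and counts are hypothetical placeholders, not the bank's figures; the point is the shape of the comparison, not the exact numbers.

```python
# Illustrative CAPEX comparison. All figures are assumed for illustration only.
TEAMS = 6                   # assumed number of teams, each needing GPU access
SERVER_4GPU = 45_000        # assumed price of a heavy 4-GPU server (GPUs included)
SERVER_SMALL = 12_000       # assumed price of a smaller host with no local GPUs
GPU_PRICE = 8_000           # assumed price per pooled GPU
PEAK_CONCURRENT_GPUS = 16   # assumed peak simultaneous GPU demand across all teams

# Siloed: each team buys a full 4-GPU server, idle or not.
siloed = TEAMS * SERVER_4GPU

# Disaggregated: smaller servers per team, plus a shared pool sized to the peak.
pooled = TEAMS * SERVER_SMALL + PEAK_CONCURRENT_GPUS * GPU_PRICE

savings = 1 - pooled / siloed
print(f"Siloed: ${siloed:,}  Pooled: ${pooled:,}  Savings: {savings:.0%}")
# With these assumed figures: Siloed: $270,000  Pooled: $200,000  Savings: 26%
```

Re-using GPUs the bank already owns would shrink the pooled figure further, which is where much of the additional saving comes from.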
Outcome:
By engaging on the path of hardware disaggregation with Drut, the bank moved to a model where capacity is added only once the limits of the current pool are clearly in sight, while existing assets are re-used in the meantime.