Second Green AI Summit at Harvard and Boston University Successfully Convened
How the World’s Largest Tech Companies Are Designing the Next Generation of Data Centers
The rise of artificial intelligence has dramatically increased demand for high-density, high-efficiency data centers. Hyperscale cloud providers such as Meta, Amazon, Microsoft, and Google are leading a transformation in infrastructure design, focused on supporting AI workloads while reducing environmental impact.
This chapter explores how these companies are evolving their physical and digital infrastructure to meet the challenges of AI, including hardware architecture, cooling systems, and power sourcing strategies. It also highlights their efforts to scale renewable energy adoption and explore advanced nuclear options to ensure long-term energy resilience.
Meta
Meta is scaling its infrastructure to support business-facing AI services and large language model development (e.g., Llama 3). In 2024, Meta announced a $10 billion hyperscale data center project in Louisiana, designed specifically for AI. Like its Project Cosmo in Wyoming, the facility will run entirely on renewable energy.
Meta’s latest clusters include over 24,000 Nvidia H100 GPUs, integrated with high-speed storage and an advanced networking fabric. The company uses both RoCE-based Ethernet and Nvidia InfiniBand fabrics to optimize AI training performance. Meta’s architecture combines its Grand Teton server design with custom file systems and a robust PyTorch environment, all built for large-scale distributed AI workloads.
Amazon (AWS)
Amazon’s AI infrastructure is powered by its custom Trainium chips. In late 2024, AWS released Trainium2 and launched EC2 Trn2 instances, which deliver 20.8 peak petaflops of compute per instance, alongside larger Trn2 UltraServers that link multiple instances together. These systems are engineered for training large language models at scale, with 30–40% better price-performance than GPU-based alternatives.
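To make the 30–40% price-performance claim concrete, one can compare compute delivered per dollar. The sketch below uses the quoted 20.8 petaflops for a Trn2 instance, but the hourly rates and the baseline GPU figure are hypothetical placeholders, not AWS list prices:

```python
def petaflop_hours_per_dollar(peak_pflops: float, usd_per_hour: float) -> float:
    """Price-performance: peak petaflop-hours of compute bought per dollar."""
    return peak_pflops / usd_per_hour

# Hypothetical rates chosen only to illustrate the comparison:
gpu_baseline = petaflop_hours_per_dollar(13.0, 32.0)   # assumed GPU-based instance
trn2 = petaflop_hours_per_dollar(20.8, 37.0)           # Trn2 at its quoted 20.8 PF
improvement = trn2 / gpu_baseline - 1.0                # ~0.38, i.e. ~38% better
```

With these illustrative numbers, the Trn2 instance delivers about 38% more compute per dollar, squarely in the quoted 30–40% range.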
Amazon’s internal network leverages ultra-fast NeuronLink interconnects, and its strategy centers on custom silicon, vertical integration, and regional infrastructure clusters designed for efficient scaling of AI applications.
Microsoft
Microsoft has embedded AI deeply into its Azure platform. In 2024, it launched new ND H200 v5 virtual machines, which include Nvidia H200 GPUs and expanded high-bandwidth memory. These clusters increase inference throughput for large models by up to 35% compared to the previous generation.
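One reason expanded high-bandwidth memory lifts inference throughput: autoregressive decoding is often memory-bandwidth-bound, so a simple throughput ceiling is memory bandwidth divided by model size. The figures below (a 7B-parameter model in fp16 and roughly 4.8 TB/s of HBM bandwidth, an H200-class number) are illustrative assumptions, not Microsoft benchmarks:

```python
def decode_tokens_per_second(model_bytes: float, mem_bandwidth_bytes_s: float) -> float:
    """Rough upper bound for memory-bound LLM decoding: generating each
    token requires streaming every weight from HBM once."""
    return mem_bandwidth_bytes_s / model_bytes

# 7B parameters * 2 bytes (fp16), ~4.8 TB/s of HBM bandwidth (assumed)
ceiling = decode_tokens_per_second(7e9 * 2, 4.8e12)  # roughly 340 tokens/s
```

Under this simple model, any increase in memory bandwidth or capacity translates directly into a higher decoding ceiling, which is why HBM upgrades matter so much for serving large models.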
Microsoft also offers specialized clusters for OpenAI's models and is investing in silicon innovation, workload scheduling optimization, and thermal efficiency improvements to support increasingly complex AI deployments.
Google
Google’s AI Hypercomputer platform supports advanced training and inference workloads through A3 Ultra VMs and Hypercomputer Clusters. These new instances use Nvidia H200 GPUs and feature up to 3.2 Tbps of GPU-to-GPU communication via RDMA over Converged Ethernet.
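To put 3.2 Tbps of GPU-to-GPU bandwidth in perspective, a back-of-the-envelope sketch: the time to move one full copy of a large model's gradients across such a link. The 70B-parameter model in bf16 (2 bytes per parameter) is an assumption for illustration, not a Google workload:

```python
def transfer_time_seconds(num_params: float, bytes_per_param: int,
                          link_tbps: float) -> float:
    """Time to push one full gradient copy over a point-to-point link."""
    payload_bytes = num_params * bytes_per_param
    link_bytes_per_s = link_tbps * 1e12 / 8  # Tbps -> bytes per second
    return payload_bytes / link_bytes_per_s

t = transfer_time_seconds(70e9, 2, 3.2)  # ~0.35 s per full gradient exchange
```

At roughly a third of a second per full exchange, inter-GPU bandwidth at this level keeps synchronization from dominating each training step.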
Google’s infrastructure emphasizes tightly integrated compute, networking, and thermal management layers. Hypercomputer Clusters allow clients to deploy AI at massive scale while controlling resource placement and minimizing latency across accelerator nodes.
Microsoft
Microsoft is deploying closed-loop liquid cooling systems in water-scarce regions like Phoenix and Wisconsin. These systems are projected to save 125 million liters of water per year per site. Microsoft is also investing heavily in nuclear energy, including a 20-year agreement with Constellation Energy for 835 MW of zero-carbon electricity and a fusion research partnership with Helion Energy.
Additionally, the company ran Project Natick, an experimental underwater data center that used ocean water for passive cooling. The project achieved a power usage effectiveness (PUE) of 1.12, making it one of the most efficient systems in the industry.
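PUE is defined as total facility power divided by power delivered to IT equipment, so a PUE of 1.12 means only 12% overhead goes to cooling, power conversion, and other non-compute loads. A minimal sketch with made-up load figures:

```python
def power_usage_effectiveness(total_facility_kw: float, it_load_kw: float) -> float:
    """PUE = total facility power / IT equipment power (1.0 is the ideal)."""
    return total_facility_kw / it_load_kw

# Hypothetical loads: 1,000 kW of IT load plus 120 kW of cooling/overhead
pue = power_usage_effectiveness(1120.0, 1000.0)  # 1.12
```

For comparison, a conventional air-cooled facility commonly runs well above this, which is what makes passive ocean cooling notable.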
Amazon
Amazon has built one of the world’s largest corporate renewable energy portfolios, with over 500 solar and wind projects globally. Its solar-battery hybrid facilities, especially in California, are designed to support AI workloads with 24/7 availability.
Amazon is also testing hydrogen fuel cells for backup power and exploring small modular reactors (SMRs) as part of its long-term strategy to secure scalable, clean electricity for future hyperscale facilities.
Google
Google’s energy strategy includes a landmark geothermal power agreement with NV Energy to support its Nevada data centers. Geothermal energy offers a consistent power profile, unlike intermittent solar or wind, making it well suited to round-the-clock AI operations.
On the hardware side, Google’s Tensor Processing Units (TPUs) offer up to 40% better energy efficiency than general-purpose GPUs. Each TPU generation has improved performance-per-watt, aligning with Google’s commitment to operate on 24/7 carbon-free energy by 2030.
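A 40% performance-per-watt improvement translates into roughly a 29% energy reduction for the same amount of work, since energy for a fixed workload scales inversely with efficiency. A quick sketch (the 1.4x factor is the chapter's quoted figure; the rest is arithmetic):

```python
def energy_ratio(baseline_perf_per_watt: float, improved_perf_per_watt: float) -> float:
    """Energy needed for a fixed workload scales as 1 / (perf per watt)."""
    return baseline_perf_per_watt / improved_perf_per_watt

ratio = energy_ratio(1.0, 1.4)   # ~0.714: same work at ~71% of the energy
savings = 1.0 - ratio            # ~0.286, i.e. roughly 29% less energy
```

This is why per-generation performance-per-watt gains compound directly into lower data center energy demand at fixed workload.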
Meta
Meta supports industry-wide sustainability innovation through the Open Compute Project, sharing open-source designs for energy-efficient server hardware, cooling systems, and power delivery frameworks.
The company continues to focus on water conservation, closed-loop cooling, and long-term clean energy contracts to decouple its data center expansion from fossil fuel dependency.
The AI era has pushed hyperscalers to reinvent data center infrastructure from the ground up. These companies are not only building larger and more powerful computing clusters but are also committing to decarbonized energy systems, more efficient cooling, and custom hardware.
Together, their innovations offer a blueprint for how the global digital infrastructure sector can support the growth of AI without compromising on sustainability goals.