IC Acceleration Cluster

Hardware Acceleration Cluster Solution for an Integrated Circuits Research Institute

AI-assisted integrated circuit (IC) placement and routing has emerged as a high-potential research direction in advanced semiconductor design. By integrating artificial intelligence algorithms with EDA methodologies, this approach aims to improve layout quality, routing efficiency, convergence performance, and overall design productivity through intelligent optimization techniques.

As device complexity increases and process nodes continue to scale, AI-driven EDA workflows demand significantly higher computational resources than traditional rule-based or manual design approaches. To support this shift, a dedicated hardware acceleration cluster was deployed to enable large-scale AI-assisted placement and routing research.

Project Background:

Placement and routing tasks in modern IC design are becoming increasingly complex due to:

Rapid growth in circuit scale and transistor density
Shortened product development cycles
Stricter requirements on power, performance, and signal integrity
Advanced process technologies down to 5nm and below

Compared to conventional methodologies, AI-assisted placement and routing requires:

Large-scale data processing
Graph-based feature extraction
Model training and iterative optimization
High-speed SPICE simulation validation

These workloads demand substantial computing power, memory capacity, storage throughput, and GPU-based parallel acceleration. To address these requirements, a scalable heterogeneous hardware acceleration cluster was designed to support AI-EDA tool development and simulation tasks.

Core requirements:

High-Performance Computing

AI-assisted placement and routing involves compute-intensive processes such as model training, circuit data preprocessing, and optimization algorithms. Multi-core CPUs combined with GPU acceleration provide the necessary performance and parallel efficiency.

Large Memory Capacity

Massive circuit datasets, model parameters, and intermediate results must be processed in memory. Sufficient RAM capacity ensures stable execution and minimizes data-swapping bottlenecks.

High-Speed Storage

Simulation outputs, intermediate datasets, and model checkpoints generate significant storage demands. Enterprise-grade SSDs and high-performance storage arrays ensure fast data access and reliability.

Parallel Acceleration

GPU-based parallel computing significantly improves model training speed and SPICE simulation efficiency. Heterogeneous CPU-GPU architecture is essential to achieve scalable performance gains.

Solution Architecture

Structural diagram:

Hardware-Accelerated Cluster Node (Single Node Configuration):

Integrated with 4 × NVIDIA A100 40GB PCIe GPUs
Designed for high-density AI training and SPICE simulation workloads
Optimized for CPU-GPU heterogeneous acceleration

Each GPU delivers high double-precision and mixed-precision floating-point performance suitable for AI-EDA and circuit simulation tasks.

*The entire system integrates four Nvidia A100 40G GPU computing cards, each with a floating-point computing capability of [missing information].


FP64	9.7 TFLOPS
FP64 Tensor Core	19.5 TFLOPS
FP32	19.5 TFLOPS
Tensor Float 32 (TF32)	156 TFLOPS \| 312 TFLOPS*
BFLOAT16 Tensor Core	312 TFLOPS \| 624 TFLOPS*
FP16 Tensor Core	312 TFLOPS \| 624 TFLOPS*
INT8 Tensor Core	624 TFLOPS \| 1248 TFLOPS*

Empyrean ALPS-GT Heterogeneous Accelerator Platform:

The system integrates eight Nvidia Nvlink HGX-A100 GPU computing cards.

Integrated with 8 × NVIDIA HGX A100 GPUs via NVLink interconnect
High-bandwidth GPU-to-GPU communication architecture
Designed for large-scale distributed AI model training and high-capacity SPICE simulation
This configuration enables enhanced scalability and supports advanced research workloads requiring high parallel efficiency.

*The entire system integrates four Nvidia A100 40G GPU computing cards, each with a floating-point computing capability of [missing information].


FP64	9.7 TFLOPS
FP64 Tensor Core	19.5 TFLOPS
FP32	19.5 TFLOPS
Tensor Float 32 (TF32)	156 TFLOPS \| 312 TFLOPS*
BFLOAT16 Tensor Core	312 TFLOPS \| 624 TFLOPS*
FP16 Tensor Core	312 TFLOPS \| 624 TFLOPS*
INT8 Tensor Core	624 TFLOPS \| 1248 TFLOPS*

Features of the Heterogeneous Acceleration Solution:

SPICE Circuit Simulation

Simulation capacity exceeding 100 million devices
Proprietary CPU/GPU parallel simulation technology
Up to 10× acceleration compared to CPU-based commercial parallel SPICE simulators
Mixed-signal simulation support
Co-simulation with industry-leading digital simulators
Strong convergence and performance for power management circuits
Comprehensive static and dynamic circuit checking
Save/Recover breakpoint resume functionality
Support for advanced 5nm process nodes

SMS-GT Matrix Solver

GPU-native matrix solving architecture
Fully leverages GPU computational resources
Lossless numerical precision
More than 10× acceleration compared to CPU-based parallel matrix solvers

This solver significantly enhances simulation throughput for large-scale analog and mixed-signal designs.

Measurement example	CPU-Tools (Hours)	Hardware-accelerated cluster simulation (hours)	acceleration ratio
Transmitter	157.4	18.1	8.7X
Serdes_vco	135.1	9.8	13.8X
ADC	174.1	16.3	10.7X
Serdes_TX	2752.8	12.0	22.9X

Conclusion:

AI-assisted placement and routing represents a strategic advancement in modern IC design research. As circuit complexity and AI model scale continue to increase, scalable heterogeneous acceleration infrastructure becomes a critical enabler.

Through the deployment of a high-performance hardware acceleration cluster, the research institute achieved a substantial upgrade in computational capability, enabling the transition toward next-generation AI-driven intelligent layout planning systems. This enhancement supports improved design quality, faster iteration cycles, and more efficient development workflows.

EmpowerX provides integrated hardware acceleration solutions tailored for AI-EDA and high-performance semiconductor research environments, enabling institutions and enterprises to build scalable, future-ready computing infrastructure.