AMD

Full Time

Posted 2 months ago

To apply for this job, please visit careers.amd.com.

The Role

We are looking for a highly skilled Linux / HPC Systems Engineer to design, operate, and scale high-performance computing (HPC) environments alongside modern DevOps infrastructure. This role blends hands-on expertise in Slurm-managed HPC clusters, GPU compute platforms, and Kubernetes-based orchestration with strong automation and CI/CD practices.

The ideal candidate is comfortable working in fast-paced, collaborative environments, takes ownership of complex systems with minimal supervision, and is passionate about building reliable, scalable, and high-performance infrastructure.

The Person

You are an experienced infrastructure engineer with a strong foundation in DevOps, Site Reliability Engineering (SRE), or platform engineering. You bring deep technical expertise in Linux systems, Kubernetes, and automation, along with practical experience supporting GPU-accelerated workloads and HPC environments.

You thrive on solving complex problems, communicate clearly across technical teams, and consistently drive execution from design through production.

Key Responsibilities

- Deploy, configure, and operate HPC clusters using Slurm
- Manage GPU compute environments, high-speed interconnects, and parallel storage systems
- Design, build, and maintain CI/CD pipelines using tools such as Buildkite, GitHub Actions, and Jenkins
- Automate infrastructure provisioning and configuration using Ansible, Terraform, Python, and Bash
- Deploy and manage containerized workloads using Docker, Kubernetes, and Helm
- Monitor system health, performance, and reliability using Grafana, Prometheus, and Checkmk
- Collaborate with cross-functional teams to optimize workflows, resolve issues, and document best practices

Preferred Experience & Skills

- Strong hands-on experience with Slurm or equivalent HPC schedulers
- Proven expertise in DevOps, CI/CD pipelines, and infrastructure automation
- Experience managing GPU compute stacks (CUDA and/or ROCm)
- Advanced Linux administration, shell scripting, and distributed systems troubleshooting
- Containerization and orchestration experience with Docker, Kubernetes, and Helm
- Agile, collaborative mindset with excellent verbal and written communication skills

Education & Experience

Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related technical field

Job Overview

Industry
- Information Technology
Experience
- 0-2 Years
Qualification
- Bachelor’s Degree
