Systems Engineer – Linux/HPC

AMD

  • Full Time

To apply for this job, please visit careers.amd.com.

The Role

We are looking for a highly skilled Linux/HPC Systems Engineer to design, operate, and scale high-performance computing (HPC) environments alongside modern DevOps infrastructure. This role blends hands-on expertise in Slurm-managed HPC clusters, GPU compute platforms, and Kubernetes-based orchestration with strong automation and CI/CD practices.

The ideal candidate is comfortable working in fast-paced, collaborative environments, takes ownership of complex systems with minimal supervision, and is passionate about building reliable, scalable, and high-performance infrastructure.

The Person

You are an experienced infrastructure engineer with a strong foundation in DevOps, Site Reliability Engineering (SRE), or platform engineering. You bring deep technical expertise in Linux systems, Kubernetes, and automation, along with practical experience supporting GPU-accelerated workloads and HPC environments.

You thrive on solving complex problems, communicate clearly across technical teams, and consistently drive execution from design through production.

Key Responsibilities

  • Deploy, configure, and operate HPC clusters using Slurm

  • Manage GPU compute environments, high-speed interconnects, and parallel storage systems

  • Design, build, and maintain CI/CD pipelines using tools such as Buildkite, GitHub Actions, and Jenkins

  • Automate infrastructure provisioning and configuration using Ansible, Terraform, Python, and Bash

  • Deploy and manage containerized workloads using Docker, Kubernetes, and Helm

  • Monitor system health, performance, and reliability using Grafana, Prometheus, and Checkmk

  • Collaborate with cross-functional teams to optimize workflows, resolve issues, and document best practices

Preferred Experience & Skills

  • Strong hands-on experience with Slurm or equivalent HPC schedulers

  • Proven expertise in DevOps, CI/CD pipelines, and infrastructure automation

  • Experience managing GPU compute stacks (CUDA and/or ROCm)

  • Advanced Linux administration, shell scripting, and distributed systems troubleshooting

  • Containerization and orchestration experience with Docker, Kubernetes, and Helm

  • Agile, collaborative mindset with excellent verbal and written communication skills

Education & Experience

  • Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related technical field
