DevOps Engineer

NVIDIA

  • Full Time

To apply for this job, please visit nvidia.wd5.myworkdayjobs.com.

Senior DevOps Engineer

Location: Yokneam, Israel

Join NVIDIA's cutting-edge team driving the future of computing. As a Senior DevOps Engineer, you will play a pivotal role in ensuring the smooth operation and continuous innovation of our groundbreaking technologies. Our mission is to harness the power of AI to deliver pioneering solutions that make a meaningful global impact.

What You'll Do

  • Take ownership of the solutions you develop, working closely with cross-functional teams to ensure successful implementation and delivery.
  • Collaborate effectively in a dynamic, fast-paced environment to guarantee seamless project execution.
  • Continuously enhance automation processes to streamline provisioning and management of solutions.
  • Identify and troubleshoot performance issues, recommending improvements to uphold exceptional service quality.
  • Conduct capacity planning and management to support evolving operational requirements.
  • Participate in incident reviews, help pinpoint root causes, and author detailed RCA (Root Cause Analysis) reports.
  • Deliver Site Reliability Engineering (SRE) solutions across a global, multi-cloud hybrid environment including AWS, GCP, and on-premises infrastructure.
  • Contribute to the team's on-call rotation to maintain system reliability.

What We're Looking For

  • Bachelor's degree in Computer Science or a related technical field, or equivalent practical experience.
  • 10+ years of experience supporting and building critical services, with at least 5 years of coding/scripting proficiency in two or more languages such as Python, Go, Ruby, or Groovy.
  • Strong expertise in Kubernetes administration, modern CI/CD pipelines, and Infrastructure as Code (IaC) practices.
  • In-depth knowledge of Linux operating systems and TCP/IP networking fundamentals.
  • Hands-on experience with at least one leading cloud provider (AWS, GCP, or Azure).
  • Proven end-to-end SRE expertise, including observability and monitoring.
  • Skilled in metrics collection, application performance monitoring (APM), container orchestration, and log aggregation tools.
  • Excellent problem-solving, debugging, communication, and documentation skills.

What Will Make You Stand Out

  • Linux certification from a recognized vendor such as Red Hat or Oracle.
  • Experience managing large-scale Kubernetes deployments in production environments.
  • Strong understanding of modern container networking and storage architectures.
  • Industry-recognized cloud certifications.
  • Hands-on experience with Slurm or LSF cluster management systems.