Senior DevOps Engineer
Location: Yokneam, Israel
Join NVIDIA's cutting-edge team driving the future of computing. As a Senior DevOps Engineer, you will play a pivotal role in ensuring the smooth operation and continuous innovation of our groundbreaking technologies. Our mission is to harness the power of AI to deliver pioneering solutions that make a meaningful global impact.
What You Will Do
- Take ownership of the solutions you develop, working closely with cross-functional teams to ensure successful implementation and delivery.
- Collaborate effectively in a dynamic, fast-paced environment to ensure seamless project execution.
- Continuously enhance automation processes to streamline provisioning and management of solutions.
- Identify and troubleshoot performance issues, recommending improvements to uphold exceptional service quality.
- Conduct capacity planning and management to support evolving operational requirements.
- Participate in incident reviews, help pinpoint root causes, and author detailed root cause analysis (RCA) reports.
- Deliver Site Reliability Engineering (SRE) solutions across a global, multi-cloud hybrid environment including AWS, GCP, and on-premises infrastructure.
- Contribute to the team's on-call rotation to maintain system reliability.
What We Are Looking For
- Bachelor's degree in Computer Science or a related technical field, or equivalent practical experience.
- 10+ years of experience supporting and building critical services, including at least 5 years of coding/scripting experience in two or more languages, such as Python, Go, Ruby, or Groovy.
- Strong expertise in Kubernetes administration, modern CI/CD pipelines, and Infrastructure as Code (IaC) practices.
- In-depth knowledge of Linux operating systems and TCP/IP networking fundamentals.
- Hands-on experience with at least one leading cloud provider (AWS, GCP, or Azure).
- Proven end-to-end SRE expertise, including observability and monitoring.
- Skilled in metrics collection, application performance monitoring (APM), container orchestration, and log aggregation tools.
- Excellent problem-solving, debugging, communication, and documentation skills.
What Will Make You Stand Out
- Linux certification from recognized vendors such as Red Hat or Oracle.
- Experience managing large-scale Kubernetes deployments in production environments.
- Strong understanding of modern container networking and storage architectures.
- Industry-recognized cloud certifications.
- Hands-on experience with Slurm or LSF cluster management systems.
