Cloud – DevOps / SRE Engineer – Remote

Binance

  • Full Time

To apply for this job, please visit www.binance.com.

Responsibilities

  • Lead production incident handling and conduct post-mortem analyses to drive system stability and continuous improvement.
  • Design, deploy, monitor, and troubleshoot Kafka and Redis clusters in production environments, ensuring optimal performance, scalability, and reliability.
  • Collaborate closely with development teams to ensure smooth, reliable, and automated application/system deployments.
  • Manage and optimize cloud infrastructure (AWS / AliCloud) for performance, cost efficiency, and operational resilience.
  • Build and enhance internal DevOps platforms, including online load-testing systems and change-management tools.
  • Continuously explore and apply AI-driven insights to improve reliability, reduce alert noise, and enable intelligent decision-making across engineering operations.
  • Bonus: Utilize LLMs and AI frameworks (OpenAI, Dify, Agno, LangChain) to automate DevOps workflows such as intelligent alert triage, root-cause analysis (RCA), and chat-based operations (ChatOps).

Requirements

  • 5+ years of hands-on experience operating Kafka and Redis in large-scale production environments, with the ability to work with developers to optimize application code.
  • Experience using or integrating tools such as Dify, Agno, or LangChain into operational or automation workflows.
  • Proficiency in at least one programming language (Python or Go) and solid SQL skills.
  • Strong hands-on experience with containerization and orchestration technologies (Docker, Kubernetes).
  • Proficiency with CI/CD and automation tools such as GitHub Actions, Ansible, and Terraform.
  • Bonus: Experience designing or operating AIOps systems (e.g., anomaly detection, alert correlation, auto-healing, or RCA automation).
  • Bonus: Familiarity with LLM-powered DevOps automation (e.g., ChatOps assistants, AI-driven observability workflows).