Site Reliability Engineer

1 week ago


Cluj-Napoca, Cluj, Romania Human Direct Full time 50,000 - 70,000 per year

Role Summary

This is a hybrid role that balances proactive engineering projects—such as enhancing automation and scaling Kubernetes—with a strong focus on operational excellence. You'll contribute to both the day-to-day stability and the long-term reliability of production systems.

It's an exciting opportunity to make a real impact: our client is in the process of formally adopting SRE principles, and you'll be a key player in defining and implementing these practices. The role is well-suited for a proactive problem-solver who is passionate about building resilient systems and eager to stay ahead in the fast-evolving cloud landscape.

Responsibilities

  • Reliability & Availability:
    Design, implement, and test High Availability, Backup, and Disaster Recovery strategies.
  • Monitoring & Observability:
    Build a comprehensive monitoring and alerting strategy using Azure Monitor, Application Insights, and related tools.
  • SRE Practice:
    Help establish and implement SRE best practices, define SLOs/SLIs, and drive data-informed decisions.
  • Kubernetes Management:
    Deploy, manage, and scale applications on Azure Kubernetes Service (AKS).
  • Infrastructure & Automation:
    Build and maintain Azure infrastructure using IaC (Bicep, Azure DevOps) and enhance CI/CD pipelines.
  • Cloud Governance:
    Implement best practices for security, cost optimization, and compliance.
  • Operational Support:
    Participate in an on-call rotation and help drive a blameless post-mortem culture.
  • Collaboration:
    Work closely with developers to ensure services are reliable, scalable, and secure from the start.

Qualifications

  • 3+ years of experience as a Cloud Engineer, DevOps Engineer, or SRE.
  • Hands-on experience with Microsoft Azure (App Service, VMs, AKS, networking).
  • Infrastructure as Code expertise, especially Bicep.
  • Experience with monitoring and alerting (Azure Monitor, Application Insights, Log Analytics, Zabbix).
  • Strong troubleshooting, root cause analysis, and telemetry analysis skills.
  • Experience with CI/CD concepts and tools, especially Azure DevOps.
  • Proactive, problem-solving mindset with a passion for automation.

Nice-to-Have Skills

  • Hands-on experience with AKS containerization and orchestration.
  • Curiosity to learn about VoIP technologies (SIP, Asterisk).
  • Familiarity with Azure AI services (OpenAI, Cognitive Services, AI Foundry).
  • Prior exposure to SRE frameworks (SLOs/SLIs, error budgets).
  • Experience with databases such as Azure SQL, Cosmos DB, MySQL, and PostgreSQL.


  • Cluj-Napoca, Cluj, Romania Newxel Full time 60,000 - 120,000 per year

    We're looking for a Site Reliability Engineer (SRE) to design, build, and maintain scalable cloud infrastructure. You'll work with modern cloud technologies, automation, and infrastructure-as-code to ensure high system performance and reliability. Your role will involve optimizing existing systems, building resilient infrastructure, and automating processes...


  • Cluj-Napoca, Cluj, Romania ING Hubs Romania Full time €40,000 - €80,000 per year

    Discover ING Hubs Romania. ING Hubs Romania offers 130 services in software development, data management, non-financial risk & compliance, audit, and retail operations to 24 ING units worldwide, with the help of over 2000 high-performing engineers, risk, and operations professionals. We started out in 2015 as ING's software development hub, then steadily...


  • Cluj-Napoca, Cluj, Romania AppNiv Full time €30,000 - €60,000 per year

    About us: The product directly controls and optimizes refineries and chemical plants with AI to add millions of dollars to the plant's bottom line while managing safe operating limits, energy efficiency, and sustainability objectives. The Closed Loop Neural Network platform allows customers to leverage Reinforcement Learning (RL) developed by world-class...


  • Cluj-Napoca, Cluj, Romania Flutter International Full time €40,000 - €80,000 per year

    Site Reliability Engineer - Flutter Functions, Hybrid. Anti-Money Laundry 1. About Betfair Romania Development: Betfair Romania Development is the largest technology hub of Flutter Entertainment, with over 2,000 people powering the world's leading sports betting and iGaming brands. Exciting, immersive and safe experiences are delivered to over 18 million...


  • Cluj-Napoca, Cluj, Romania Garmin Cluj Full time 30,000 - 60,000 per year

    At Garmin we create products that are designed indoors for outdoor activities. We do this to enable our customers to make the most of their time spent pursuing their passions. Cloud Platform Technology (CPT) is the core of the company which helps all segments and engineers to have reliable and sustainable tools to perform the day-to-day business. The CPT...


  • Cluj-Napoca, Cluj, Romania Garmin Full time 15,000 - 30,000 per year

    At Garmin we create products that are designed indoors for outdoor activities. We do this to enable our customers to make the most of their time spent pursuing their passions. Cloud Platform Technology (CPT) is the core of the company which helps all segments and engineers to have reliable and sustainable tools to perform the day-to-day business. The CPT...


  • Cluj-Napoca, Cluj, Romania OpenText Full time €80,000 - €120,000 per year

    OPENTEXT. OpenText is a global leader in information management, where innovation, creativity, and collaboration are the key components of our corporate culture. As a member of our team, you will have the opportunity to partner with the most highly regarded companies in the world, tackle complex issues, and contribute to projects that shape the future of...


  • Cluj-Napoca, Cluj, Romania Garmin Cluj Full time €40,000 - €80,000 per year

    At Garmin we create products that are designed indoors for outdoor activities. We do this to enable our customers to make the most of their time spent pursuing their passions. Garmin Private Cloud (GPC) will be our internal cloud, developed entirely using open-source technologies such as OpenStack and Kubernetes. GPC will enable Garmin to fully manage the...


  • Cluj-Napoca, Cluj, Romania Emerson Full time €40,000 - €80,000 per year

    As the Machinery Health Management Systems Engineer (MHM Engineer), you will be responsible for properly completing the projects based on customer specifications and keeping them in line with the project strategy. In this role, this person will also be responsible for implementing, installing, modifying, testing, and validating condition monitoring & machine...


  • Cluj-Napoca, Cluj, Romania Emerson Full time 30,000 - 60,000 per year

    Description: As the Machinery Health Management Systems Engineer (MHM Engineer), you will be responsible for properly completing the projects based on customer specifications and keeping them in line with the project strategy. In this role, this person will also be responsible for implementing, installing, modifying, testing, and validating condition...