Site Reliability Engineer
7 days ago
About Betfair Romania Development:
Betfair Romania Development is the largest technology hub of Flutter Entertainment, with over 2,000 people powering the world's leading sports betting and iGaming brands. Exciting, immersive and safe experiences are delivered to over 18 million customers worldwide, from our office in Cluj-Napoca. Driven by relentless innovation and commitment to excellence, we operate our own unbeatable portfolio of diverse proprietary brands such as FanDuel, PokerStars, SportsBet, Betfair, Paddy Power, or Sky Betting & Gaming.
Our Values:
The values we share at Betfair Romania Development define what makes us unique as a team. They empower us by giving meaning to our contributions, and they ensure that we consistently strive for excellence in everything we do. We are looking for passionate individuals who align with our values and are committed to making a difference.
Win together | Raise the bar | Got your back | Own it | Positive impact
About Flutter Functions:
The Flutter Functions division is a key component of Flutter Entertainment, responsible for providing essential support and services across the organization. The division encompasses various corporate functions, including finance, legal, human resources, technology, and more, ensuring seamless operations and strategic alignment throughout the company.
Role Overview:
The Site Reliability Engineer will be responsible for ensuring the reliability, availability, and performance of Flutter Entertainment's critical gaming and betting platforms across our global operations. This role combines software engineering expertise with operational excellence to maintain 24/7/365 service availability for millions of customers worldwide. As part of the Service Management Function within Flutter Functions, you will collaborate closely with development teams, infrastructure specialists, and business stakeholders to maintain the high-performance, scalable systems that power our iGaming & Sport platforms across multiple markets. Your role will involve implementing automation, monitoring, and incident response procedures to support Flutter's mission of delivering world-class entertainment experiences.
You understand and embrace the philosophy of continuous improvement (CI) and have experience leading teams operating within a CI culture. You don't complain about recurring incidents; you drive process improvements and implement preventative measures to eliminate root causes. You work with internal and external teams to deliver best-in-class, real-world solutions and positive user experiences in every interaction.
This role requires exceptional communication skills, as interaction and engagement with senior management during incident escalations and post-incident reviews will be a regular aspect of the role.
Key Accountabilities & Responsibilities:
Maintain 99.9%+ uptime for critical gaming and betting platforms serving millions of concurrent users
Design and implement monitoring, alerting, and observability solutions using tools such as Grafana, Splunk & CloudWatch
Conduct capacity planning and performance optimization to ensure systems can handle peak loads during major sporting events
Establish and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for all critical services with support from Service Management
Support ProdOps and Service Management teams during P1/P2 incident response, providing technical expertise and facilitating cross-functional coordination to minimize customer impact
Collaborate with Service Management on post-incident reviews, contributing technical insights and supporting the implementation of preventative measures to reduce repeat occurrences
Assist in developing and maintaining comprehensive runbooks and incident response procedures in partnership with Service Management teams
Grafana Stack Management: Design, deploy, and maintain comprehensive Grafana dashboards for real-time system visibility across all Flutter platforms
Advanced Visualization: Create custom Grafana panels and dashboards for business metrics, technical KPIs, and operational insights tailored to different stakeholder needs
Multi-Source Data Integration: Configure and optimize Grafana data sources including Prometheus, InfluxDB, Elasticsearch, CloudWatch, and custom APIs
Alerting Strategy: Implement intelligent alerting rules using Grafana Alerting, reducing alert fatigue while ensuring critical issues are promptly escalated
Performance Monitoring: Establish application performance monitoring (APM) using Grafana Agent and integrate with existing observability stack
Custom Metrics Development: Work with development teams to implement custom business and technical metrics that provide actionable insights
Partner with development teams to improve application reliability and deployment practices
Mentor junior team members and contribute to the development of SRE practices across Flutter
Participate in architecture reviews and provide reliability expertise for new system designs
Document procedures, troubleshooting guides, and system architecture for knowledge sharing
Look for ways to use AI to triage and investigate alerts allowing for more rapid resolution
Use AI to find root cause by connecting the dots between code changes, alerts and past incidents
Investigate the use of AI to provide more collaboration and identify possible resolutions to incidents
Skills, Capabilities & Experience Required:
Cloud Platforms: Advanced experience with AWS, Azure, or Google Cloud Platform services and architecture
Containerization: Proficiency with Docker and Kubernetes for container orchestration and management
Programming: Strong scripting abilities in Python, Go, Bash, or PowerShell; familiarity with Java or .NET advantageous
Monitoring & Observability: Hands-on experience with Prometheus, Grafana, ELK stack, or similar monitoring solutions
CI/CD: Proficiency with Jenkins, GitLab CI, Azure DevOps, or similar continuous integration tools
Database Technologies: Working knowledge of SQL databases (PostgreSQL, MySQL) and NoSQL solutions
Networking: Understanding of load balancers, CDNs, DNS, and network security principles
Benefits:
Hybrid & remote working options
€1,000 per year for self-development
Company share scheme
25 days of annual leave per year
20 days per year to work abroad
5 personal days/year
Flexible benefits: travel, sports, hobbies
Extended health, dental and travel insurance
Customized well-being programmes
Career growth sessions
Thousands of online courses through Udemy
A variety of engaging office events
Disclaimer:
We are an inclusive employer. By embracing diverse experiences and perspectives, we create a lasting, positive impact for our employees, customers, and the communities we're part of. You don't have to meet all the requirements listed to apply for this role. If you need any adjustments to make this role work for you, let us know, and we'll see how we can accommodate them.
We thank all applicants for their interest; however, only the candidates who best meet the job requirements will be contacted for an interview.
By submitting your application online, you agree that your details will be used to progress your application for employment. If your application is successful, your details will be used to administer your personnel record. If your application is unsuccessful, we will retain your details for a period no longer than three years, to consider you for prospective roles within the company.
-
Site Reliability Engineer
5 hours ago
Cluj-Napoca, Cluj, Romania Newxel Full time 60,000 - 120,000 per year
We're looking for a Site Reliability Engineer (SRE) to design, build, and maintain scalable cloud infrastructure. You'll work with modern cloud technologies, automation, and infrastructure-as-code to ensure high system performance and reliability. Your role will involve optimizing existing systems, building resilient infrastructure, and automating processes...
-
Senior Site Reliability Engineer
1 week ago
Cluj-Napoca, Cluj, Romania ING Hubs Romania Full time €40,000 - €80,000 per year
Discover ING Hubs Romania
ING Hubs Romania offers 130 services in software development, data management, non-financial risk & compliance, audit, and retail operations to 24 ING units worldwide, with the help of over 2000 high-performing engineers, risk, and operations professionals. We started out in 2015 as ING's software development hub, then steadily...
-
Site Reliability Engineer
2 days ago
Cluj-Napoca, Cluj, Romania Garmin Full time 15,000 - 30,000 per year
At Garmin we create products that are designed indoors for outdoor activities. We do this to enable our customers to make the most of their time spent pursuing their passions. Cloud Platform Technology (CPT) is the core of the company which helps all segments and engineers to have reliable and sustainable tools to perform the day-to-day business. The CPT...
-
Site Reliability Engineer
7 days ago
Cluj-Napoca, Cluj, Romania Human Direct Full time 50,000 - 70,000 per year
Role Summary: This is a hybrid role that balances proactive engineering projects, such as enhancing automation and scaling Kubernetes, with a strong focus on operational excellence. You'll contribute to both the day-to-day stability and the long-term reliability of production systems. It's an exciting opportunity to make a real impact: our client is in the...
-
Reliability & Condition Monitoring Engineer
2 weeks ago
Cluj-Napoca, Cluj, Romania Emerson Career Site Full time €40,000 - €80,000 per year
As the Machinery Health Management Systems Engineer (MHM Engineer), you will be responsible for properly completing the projects based on customer specifications and keeping them in line with the project strategy. In this role, this person will also be responsible for implementing, installing, modifying, testing, and validating condition monitoring & machine...
-
Junior Project Engineer
5 days ago
Cluj-Napoca, Cluj, Romania Emerson Career Site Full time €30,000 - €60,000 per year
As a Junior Project Engineer, your role will involve completing projects that deliver Distributed Control Systems for process industry automation. These industries include Pharmaceutical, Chemical, Oil & Gas, and Refining. In this role, you will be responsible for the implementation and factory acceptance test phases. You will also be responsible for...
-
OpenStack | Senior Site Reliability Engineer
7 days ago
Cluj-Napoca, Cluj, Romania OpenText Full time €40,000 - €80,000 per year
OPENTEXT
OpenText is a global leader in information management, where innovation, creativity, and collaboration are the key components of our corporate culture. As a member of our team, you will have the opportunity to partner with the most highly regarded companies in the world, tackle complex issues, and contribute to projects that shape the future of...
-
Proposal Engineer
7 days ago
Cluj-Napoca, Cluj, Romania Emerson Career Site Full time 15,000 - 30,000 per year
As the Project Proposal Engineer in Cluj-Napoca, you will be responsible for end-to-end delivery of Technical & Commercial Project Proposals to our customers in Europe across various process industries (such as Life Sciences, Chemical, Energy, Power and Renewables) within automation projects business portfolio and support Emerson sales teams in the pursuit...
-
Proposal Engineer
2 days ago
Cluj-Napoca, Cluj, Romania Emerson Career Site Full time €30,000 - €90,000 per year
As the Proposal Engineer (24 months) in Cluj-Napoca, you will be responsible for end-to-end delivery of Technical & Commercial Project Proposals to our customers in Europe across various process industries (such as Life Sciences, Chemical, Energy, Power and Renewables) within automation projects business portfolio and support Emerson sales teams in the...
-
Senior Project Engineer
2 days ago
Cluj-Napoca, Cluj, Romania Aspen Technology Full time €30,000 - €60,000 per year
The driving force behind our success has always been the people of AspenTech. What drives us, is our aspiration, our desire and ambition to keep pushing the envelope, overcoming any hurdle, challenging the status quo to continually find a better way. You will experience these qualities of passion, pride and aspiration in many ways, from a rich set of...