Site Reliability Engineer
6 days ago
About us:
The product directly controls and optimizes refineries and chemical plants with AI to add millions of dollars to the plant's bottom line while managing safe operating limits, energy efficiency, and sustainability objectives. The Closed Loop Neural Network platform allows customers to leverage Reinforcement Learning (RL) developed by world-class machine learning scientists. Through our patented approach to applying RL to industrial processes, industry leaders have been able to fundamentally change the way they optimize their plants and improve profitability in real time. The solution is currently optimizing the manufacturing facilities of Fortune 500 companies, and it combines industry expertise from companies like Exxon and Shell with award-winning data scientists endorsed by Google. The company is backed by tier-1 venture capital firms such as Insight Partners.
We are looking for:
A top-notch Site Reliability Engineer who will design and support our cloud infrastructure. You will work with a variety of cloud technologies, automation, and infrastructure-as-code. Additionally, our SREs keep an ever-watchful eye on our systems' capacity and performance. Much of our time is spent optimizing existing systems, building infrastructure, and reducing repetitive work through automation.
You will also play a critical role in incident management, swiftly identifying and resolving issues to minimize downtime and ensure seamless operations. Collaboration is key in this role, as you will work closely with software developers, DevOps engineers, and other stakeholders to implement robust solutions and drive continuous improvement. As a proactive member of our team, you will stay updated with the latest industry trends and best practices, applying this knowledge to enhance our infrastructure's resilience and scalability. Your contributions will directly impact the reliability and efficiency of our services, making you an integral part of our success.
In Your Role You Will:
- Design, deploy and maintain cloud infrastructure to provide high uptime, scalability and security.
- Leverage public cloud services and tools to improve efficiency and reliability of our services and workflows.
- Architect and manage cross-cloud network infrastructure (e.g. subnets, routing tables, IPSec VPNs, Transit Gateways, firewall rules).
- Engage in and improve the whole lifecycle of services, from inception and design, through deployment, operation and refinement.
- Participate in infrastructure on-call rotation and respond in a timely manner.
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
Our toolbox includes: Kubernetes (GKE and AKS), serverless functions (AWS Lambda and GCP Cloud Functions), Git, Docker, Terraform and Ansible, Helm, Python and Bash, GitLab, Jenkins.
Requirements:
- 5 years' experience maintaining production-level cloud infrastructure, including public cloud services (e.g. AWS, GCP).
- BA/B.Sc. in Computer Science or equivalent (preferred).
- Experience with a programming language such as Python or Go.
- Experience deploying and supporting services in Kubernetes, including GitOps management tools such as ArgoCD.
- Familiarity with software development principles and concepts (e.g. version control with Git, the software development lifecycle).
- Experience implementing and utilizing monitoring tools (e.g. New Relic, Splunk, Grafana, Prometheus).
- Experience managing production databases (e.g. PostgreSQL), including managed services (e.g. AWS RDS).
- Experience with infrastructure-as-code concepts and tools (e.g. Terraform, Ansible).
- Cluj-based and open to hybrid work 1-2 times per week.
Nice to have:
- Development experience, preferably in Python
- Hands-on production troubleshooting experience
- Experience in maintaining web-based products
- Experience in handling on-premises infrastructure
- Experience building out VPCs
What we offer:
- Great salary (CIM/PFA/SRL), long-term and stable employment, regular reviews
- Fully flexible working hours
- Temporary remote work from anywhere in the world (after 6 months of employment)
- Awesome employee referral program - between 1,000 and 2,000 euros for each successful referral
- Work with cutting-edge technologies
-
Senior Site Reliability Engineer
6 days ago
Cluj-Napoca, Cluj, Romania | ING Hubs Romania | Full time | €40,000 - €80,000 per year
Discover ING Hubs Romania: ING Hubs Romania offers 130 services in software development, data management, non-financial risk & compliance, audit, and retail operations to 24 ING units worldwide, with the help of over 2000 high-performing engineers, risk, and operations professionals. We started out in 2015 as ING's software development hub, then steadily...
-
Site Reliability
2 weeks ago
Cluj-Napoca, Cluj, Romania | Canonical - Jobs | Full time | €40,000 - €80,000 per year
Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and silicon providers,...
-
Site Reliability Engineer
4 days ago
Cluj-Napoca, Cluj, Romania | Flutter International | Full time | €40,000 - €80,000 per year
Site Reliability Engineer - Flutter Functions, Hybrid. Anti-Money Laundry 1. About Betfair Romania Development: Betfair Romania Development is the largest technology hub of Flutter Entertainment, with over 2,000 people powering the world's leading sports betting and iGaming brands. Exciting, immersive and safe experiences are delivered to over 18 million...
-
Site Reliability Engineer
2 days ago
Cluj-Napoca, Cluj, Romania | Garmin Cluj | Full time | 30,000 - 60,000 per year
At Garmin we create products that are designed indoors for outdoor activities. We do this to enable our customers to make the most of their time spent pursuing their passions. Cloud Platform Technology (CPT) is the core of the company which helps all segments and engineers to have reliable and sustainable tools to perform the day-to-day business. The CPT...
-
Site Reliability Engineer
4 days ago
Cluj-Napoca, Cluj, Romania | Human Direct | Full time | 50,000 - 70,000 per year
Role Summary: This is a hybrid role that balances proactive engineering projects, such as enhancing automation and scaling Kubernetes, with a strong focus on operational excellence. You'll contribute to both the day-to-day stability and the long-term reliability of production systems. It's an exciting opportunity to make a real impact: our client is in the...
-
Senior Site Reliability
2 weeks ago
Cluj-Napoca, Cluj, Romania | Canonical - Jobs | Full time | €80,000 - €120,000 per year
Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and silicon providers,...
-
OpenStack | Site Reliability Engineer
4 days ago
Cluj-Napoca, Cluj, Romania | OpenText | Full time | €80,000 - €120,000 per year
OpenText is a global leader in information management, where innovation, creativity, and collaboration are the key components of our corporate culture. As a member of our team, you will have the opportunity to partner with the most highly regarded companies in the world, tackle complex issues, and contribute to projects that shape the future of...
-
Site Reliability Engineer
2 weeks ago
Cluj-Napoca, Cluj, Romania | Garmin Cluj | Full time | 30,000 - 60,000 per year
At Garmin we create products that are designed indoors for outdoor activities. We do this to enable our customers to make the most of their time spent pursuing their passions. Garmin Private Cloud (GPC) will be our internal cloud, developed entirely using open-source technologies such as OpenStack and Kubernetes. GPC will enable Garmin to fully manage the...
-
Site Reliability Engineer
2 days ago
Cluj-Napoca, Cluj, Romania | Garmin Cluj | Full time | €40,000 - €80,000 per year
At Garmin we create products that are designed indoors for outdoor activities. We do this to enable our customers to make the most of their time spent pursuing their passions. Garmin Private Cloud (GPC) will be our internal cloud, developed entirely using open-source technologies such as OpenStack and Kubernetes. GPC will enable Garmin to fully manage the...
-
Reliability & Condition Monitoring Engineer
1 week ago
Cluj-Napoca, Cluj, Romania | Emerson | Full time | €40,000 - €80,000 per year
As the Machinery Health Management Systems Engineer (MHM Engineer), you will be responsible for properly completing the projects based on customer specifications and keeping them in line with the project strategy. In this role, this person will also be responsible for implementing, installing, modifying, testing, and validating condition monitoring & machine...