Senior Site Reliability Engineer
Galway Remote
Applications have closed
Oomnitza
Enterprise technology management solutions that increase IT agility and mitigate security risks with key business process automations.
At Oomnitza, we're passionate about building software that solves problems. We count on our Site Reliability Engineers (SREs) to use DevSecOps methodologies to give our users the rich feature set, high availability, and stellar performance they need to pursue their missions. Our dynamic and innovative team is growing, and we are looking to add a highly motivated and experienced Site Reliability Engineer.
As an experienced DevSecOps practitioner, you will operate and deliver working systems based on insights gathered from massive-scale data in real time, ensuring Oomnitza's internal and external services are reliable while keeping an ever-watchful eye on our systems, capacity, and performance. You'll have the opportunity to experience the complex challenges of building and running large-scale, fault-tolerant, and secure distributed microservice-based systems worldwide.
Specifically, we are searching for someone who:
- Brings fresh ideas to the table and demonstrates a unique and informed viewpoint
- Enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences at every interaction
Responsibilities:
- Gather and analyze metrics from our platform and applications to continually improve our performance tuning and fault finding
- Partner with our world-class engineering teams to improve services through rigorous testing and release procedures
- Create sustainable systems and services through automation and uplifts while working closely with engineering professionals within the company to enable projects to be completed efficiently
- Develop, monitor, and manage the entire system landscape by balancing feature development speed and reliability with well-defined service level objectives, ensuring minimal downtime and maximum availability.
- Participate in the development and implementation of practices, procedures, and technology to ensure our system landscapes are operating within our Security, Compliance, and Availability commitments.
- Plan, prepare, and execute system upgrades.
- Mentor and train other engineers throughout the company and seek to continually improve processes company-wide
- This position will be part of an on-call rotation
Qualifications:
- Kubernetes: Extensive experience with container orchestration and managing production clusters, focusing on deployment, scaling, and troubleshooting within Kubernetes environments. Proven ability to set up and manage Kubernetes clusters effectively for enterprise applications. Experience with Amazon EKS is a plus.
- Configuration Management: Proficiency in tools like Ansible, Helm, and Kustomize for automating infrastructure provisioning, configuration, and deployment. Skilled in managing Kubernetes manifests and application releases to streamline processes and ensure consistency across various deployment environments.
- Monitoring: Experience with Prometheus, Grafana, or similar to proactively track system health, detect anomalies, and optimize performance across the platform.
- AWS Cloud Services: Deep knowledge of the AWS ecosystem, including EC2, S3, IAM, VPC, and other essential services for building and managing scalable infrastructure.
- Infrastructure as Code (IaC): Hands-on experience with Terraform to provision and manage cloud resources, ensuring version control, repeatability, and efficiency in infrastructure deployment.
- Queuing Systems: Familiarity with message queuing systems like RabbitMQ and Kafka, as well as managed queuing services such as Amazon MQ. Skilled in setting up, managing, and optimizing message brokers for high-throughput, reliable communication between distributed systems.
- Database Management: Strong background in managing MySQL databases and leveraging Amazon RDS for high availability, performance tuning, and secure database management in cloud environments.
- Networking and Security Best Practices: Understanding of network design and security protocols to protect systems, enforce compliance, and meet industry-standard audit requirements.
- High-Uptime / Low-Downtime Environments: Experience meeting high-uptime agreements for critical systems, implementing strategies for fault tolerance, disaster recovery, and proactive monitoring to maintain service availability and minimize downtime.
- Cross-functional Collaboration: Proven ability to work effectively with cross-functional teams from multiple departments to achieve project goals and execute project plans in an orderly and efficient manner.
- Programming Skills: Ability to develop and maintain code in one or more high-level programming languages such as Python, Go, or JavaScript. Familiarity with modern development tools and CI/CD pipelines to automate testing, deployment, and monitoring.
- Problem Solving and Performance Optimization: A proactive mindset towards identifying system issues, areas for process improvement, and resolving performance bottlenecks.
What We Can Offer You:
- Healthcare for dependents and spouse
- A progressive, healthy work culture with excellent opportunities for professional and personal development.
- Top performers will have an opportunity to help shape the team, working directly with the founders to drive initiatives and create a structure that scales.
- A once-in-a-lifetime career opportunity to get on board a fast-growing business that is venture-backed by C5 Capital, Shasta Ventures, Riverside Acceleration Capital, and Hummer Winblad
Our Benefits Package:
- Dental & Vision Insurance
- Employee equity plan
- Health Insurance for your spouse and dependents
- Pension, Life insurance and Income protection
- Remote working & flexible work schedules
- Working-from-home equipment allowance
- Choice of preferred equipment, Mac or PC.
- Regular, fun social events and workshops.
Oomnitza recruits, employs, trains, compensates and promotes regardless of race, religion, color, national origin, sex, disability, age, veteran status, and other protected status as required by applicable law.
Tags: Ansible Audits Automation AWS CI/CD Cloud Compliance DevSecOps EC2 Grafana Helm IAM JavaScript Kafka Kubernetes Monitoring MySQL Prometheus Python RabbitMQ S3 SaaS Terraform
Perks/benefits: Career development Equity / stock options Flex hours Health care Insurance Startup environment Team events