Cloud Site Reliability Staff Engineer

Ottawa, ON, Canada

Full Time Senior-level / Expert CAD 122K - 162K

Barracuda Networks Inc.

Barracuda Networks is the worldwide leader in Email Protection, Application Protection, Network Security, and Data Protection Solutions.

Posted 3 weeks ago

Req ID: 26-124 Managed Service Provider (MSP) and Managed Extended Detection and Response (XDR)

Come join our passionate team! Barracuda is a leading cybersecurity company providing complete protection against complex threats. Our platform protects email, data, applications, and networks with innovative solutions, and a managed XDR service, to strengthen cyber resilience. Hundreds of thousands of IT professionals and managed service providers worldwide trust us to protect and support them with solutions that are easy to buy, deploy, and use.

We are committed to a candidate selection process and work environment that is inclusive and barrier free. To ensure candidates are assessed in a fair and equitable manner, accommodations will be provided to prospective employees in accordance with the Accessibility for Ontarians with Disabilities Act (AODA) and the Ontario Human Rights Code.

Envision yourself at Barracuda

We seek a passionate and experienced Site Reliability Staff Engineer (SRE) for the Managed Service Provider (MSP) and Managed XDR business units with great technical acumen and a strong background in operations, automation, implementation, and development. As a Staff SRE, you will be responsible for ensuring the availability and seamless scaling of high-volume, critical SaaS applications. The application portfolio spans a broad spectrum of MSP and XDR products.

What will you be working on:

Application Infrastructure Design: Engage with internal customers to understand application design and cloud infrastructure needs, focusing on scalability, security, and reliability
Infrastructure Automation: Create and design templates, tools, and accelerators for deployment infrastructure to support development teams
Architectural Leadership: Lead architectural decisions and approve major system design changes, implementing contemporary architectural patterns
Platform Development: Design and develop self-service platforms for Product Engineering teams
Service Level Management: Define, implement, and track SLIs, SLOs, and SLAs across services
Incident Management: Lead incident response processes and conduct post-incident learning reviews
Disaster Recovery: Develop and maintain disaster recovery and business continuity plans
Technical Design: Plan and implement non-functional requirements including security, performance, deployment frequency, and monitoring
Solution Architecture: Oversee architecture snapshots, solution design, prototyping, and code reviews
Technology Stack Implementation: Drive modern solutions using AWS, Kubernetes, GitHub Actions, Jenkins, Terraform, Pulumi, and other current technologies
Data Infrastructure: Build support infrastructure for global data pipeline and storage using Databricks, Spark, and ELK stack
Deployment Automation: Lead initiatives to convert manual deployments to automated processes
Observability Systems: Build and enhance monitoring and reliability systems
On-Call Duties: Participate in on-call rotation to ensure 24/7 system reliability
Team Development: Mentor junior team members and foster a positive team culture

What you bring to the role:

Technical Expertise: 10+ years hands-on infrastructure design experience, including 5+ years cloud development and 3+ years in SRE/DevOps roles
Cloud Infrastructure: Deep expertise in AWS cloud infrastructure development, security, and operations with proven success in large-scale production environments
Infrastructure as Code: Extensive experience with Terraform, CloudFormation, Pulumi, and Crossplane for cloud infrastructure automation
CI/CD & Automation: Strong background with GitHub, GitHub Actions, Jenkins, Packer, Ansible, and Puppet
Deployment Patterns: Expertise in blue/green, canary, rolling deployments, and draining strategies
Container Orchestration: Comprehensive experience with Docker, Kubernetes, and EKS in AWS environments
Programming: Strong coding abilities in Python, Go, Ruby, etc.
Operating Systems: Advanced Linux knowledge including system internals
Observability: Extensive experience with New Relic, Elastic APM, CloudWatch, Prometheus, and Grafana...
Data Engineering: Experience with Databricks, Apache Spark, Kafka, and DataStage
Problem Solving: Strong systematic debugging and troubleshooting capabilities
Communication: Excellent verbal and written communication skills
Certifications: AWS certifications (Solutions Architect, DevOps) and Kubernetes certifications (CKA, CKAD, CKS) a plus

What you’ll get from us:

A team where you can voice your opinion and make an impact, and where you and your experience are valued.
Internal mobility: opportunities for cross-training and the ability to attain your next career step within Barracuda.
Equity, in the form of non-qualifying options.

The anticipated on-target earnings range for this role is CAD 122,000 to CAD 162,000. Actual compensation offered will depend on the individual's skills, experience, and qualifications as they directly relate to the requirements of the position, the budget for the position, and applicable employment laws.

#LI-hybrid