Cloud Site Reliability Staff Engineer

Ottawa, ON, Canada

Barracuda Networks Inc.

Barracuda Networks is the worldwide leader in Email Protection, Application Protection, Network Security, and Data Protection Solutions.

Req ID: 26-124

Managed Service Provider (MSP) and Managed Extended Detection and Response (XDR)

Come join our passionate team!

Barracuda is a leading cybersecurity company providing complete protection against complex threats. Our platform protects email, data, applications, and networks with innovative solutions, and a managed XDR service, to strengthen cyber resilience. Hundreds of thousands of IT professionals and managed service providers worldwide trust us to protect and support them with solutions that are easy to buy, deploy, and use.

We are committed to a candidate selection process and work environment that is inclusive and barrier free. To ensure candidates are assessed in a fair and equitable manner, accommodations will be provided to prospective employees in accordance with the Accessibility for Ontarians with Disabilities Act (AODA) and the Ontario Human Rights Code.

Envision yourself at Barracuda

We seek a passionate and experienced Site Reliability Staff Engineer (SRE) for the Managed Service Provider (MSP) and Managed XDR business units, with great technical acumen and a strong background in operations, automation, implementation, and development. As a Staff SRE, you will be responsible for ensuring the availability and seamless scaling of high-volume, critical SaaS applications. The application portfolio spans a broad spectrum of MSP and XDR products.

What will you be working on:
  • Application Infrastructure Design: Engage with internal customers to understand application design and cloud infrastructure needs, focusing on scalability, security, and reliability
  • Infrastructure Automation: Create and design templates, tools, and accelerators for deployment infrastructure to support development teams
  • Architectural Leadership: Lead architectural decisions and approve major system design changes, implementing contemporary architectural patterns
  • Platform Development: Design and develop self-service platforms for Product Engineering teams
  • Service Level Management: Define, implement, and track SLIs, SLOs, and SLAs across services
  • Incident Management: Lead incident response processes and conduct post-incident learning reviews
  • Disaster Recovery: Develop and maintain disaster recovery and business continuity plans
  • Technical Design: Plan and implement non-functional requirements including security, performance, deployment frequency, and monitoring
  • Solution Architecture: Oversee architecture snapshots, solution design, prototyping, and code reviews
  • Technology Stack Implementation: Drive modern solutions using AWS, Kubernetes, GitHub Actions, Jenkins, Terraform, Pulumi, and other current technologies
  • Data Infrastructure: Build support infrastructure for global data pipeline and storage using Databricks, Spark, and ELK stack
  • Deployment Automation: Lead initiatives to convert manual deployments to automated processes
  • Observability Systems: Build and enhance monitoring and reliability systems
  • On-Call Duties: Participate in on-call rotation to ensure 24/7 system reliability
  • Team Development: Mentor junior team members and foster a positive team culture
What you bring to the role:
  • Technical Expertise: 10+ years hands-on infrastructure design experience, including 5+ years cloud development and 3+ years in SRE/DevOps roles 
  • Cloud Infrastructure: Deep expertise in AWS cloud infrastructure development, security, and operations with proven success in large-scale production environments 
  • Infrastructure as Code: Extensive experience with Terraform, CloudFormation, Pulumi, and Crossplane for cloud infrastructure automation 
  • CI/CD & Automation: Strong background with GitHub, GitHub Actions, Jenkins, Packer, Ansible, and Puppet 
  • Deployment Patterns: Expertise in blue/green, canary, rolling deployments, and draining strategies 
  • Container Orchestration: Comprehensive experience with Docker, Kubernetes, and EKS in AWS environments 
  • Programming: Strong coding abilities in Python, Go, Ruby, etc.
  • Operating Systems: Advanced Linux knowledge including system internals 
  • Observability: Extensive experience with New Relic, Elastic APM, CloudWatch, Prometheus, and Grafana... 
  • Data Engineering: Experience with Databricks, Apache Spark, Kafka, and DataStage 
  • Problem Solving: Strong systematic debugging and troubleshooting capabilities 
  • Communication: Excellent verbal and written communication skills 
  • Certifications: AWS certifications (Solutions Architect, DevOps) and Kubernetes certifications (CKA, CKAD, CKS) a plus 
What you’ll get from us:
A team where you can voice your opinion, make an impact, and where you and your experience are valued. Internal mobility – there are opportunities for cross training and the ability to attain your next career step within Barracuda. In addition, you will receive equity, in the form of non-qualifying options.

The anticipated on-target earnings range for this role is CAD 122,000 to CAD 162,000. Actual compensation offered will be dependent upon the individual's skills, experience, and qualifications as they directly relate to the requirements of the position, the budget for the position, and applicable employment laws.

#LI-hybrid

Tags: Ansible Automation AWS CI/CD Cloud Databricks DevOps Docker ELK GitHub Grafana Incident response Jenkins Kafka Kubernetes Linux Monitoring Prometheus Prototyping Puppet Python Ruby SaaS SLAs SLOs Terraform XDR

Perks/benefits: Career development Equity / stock options

Region: North America
Country: Canada
