CloudOps Site Reliability Engineer

Tacoma, Washington, United States; Austin, Texas, United States; Alpharetta, Georgia, United States

Applications have closed

Infoblox

Infoblox unites networking and security, empowering customers to deliver better performance and protection and ensure their businesses thrive.


It’s an exciting time to be at Infoblox. Named a Top 25 Cyber Security Company by The Software Report and one of Inc. magazine’s Best Workplaces for 2020, Infoblox is the leader in cloud-first networking and security services. Our solutions empower organizations to take full advantage of the cloud to deliver network experiences that are inherently simple, scalable, and reliable for everyone. Infoblox customers are among the largest enterprises in the world and include 70% of the Fortune 500, and our success depends on bright, energetic, talented people who share a passion for building the next generation of networking technologies—and having fun along the way. 

We are looking for a CloudOps Site Reliability Engineer to join our Incident Management Engineering team located in Tacoma, WA, or remote, reporting to the manager of Cloud Operations. In this role, you will be part of the Incident Management team responsible for monitoring and supporting Infoblox cloud-based services. You will monitor and maintain the infrastructure that runs our SaaS services and ensure these services are running at peak performance. You will also be responsible for maintaining the services and assisting in the automation that enables Infoblox services in the cloud.

You are the ideal candidate if you are a proactive, hands-on professional who picks up new technology quickly, has excellent interpersonal skills, and is driven to find solutions while collaborating across teams.

What you’ll do:

  • Provide real-time monitoring, triage, and escalation of critical and major issues and incoming alarms within the environment
  • Participate in incident management calls and coordinate response, triage, recovery, and reporting of incidents
  • Stay actively engaged throughout service restoration and keep senior leadership informed of the activities being carried out
  • Expand and mature existing incident response processes and activities, including managing and administering the runbook
  • Partner with Engineering and NOC to prepare and present RCA reports for incidents, their impact, and resolution
  • Implement and use SRE-developed tools for incident response
  • Assist in the development of resilient and self-scaling systems
  • Lead complex projects focused on building and maintaining observability/monitoring for the application, monitoring key performance indicators, maintaining alerting, and continuously improving visibility

What you’ll bring:

  • Minimum 5 years of combined experience in DevOps, SRE, and/or incident management and monitoring tools
  • Hands-on experience with cloud architecture and deploying infrastructure in a cloud environment
  • Solid networking experience, such as TCP/IP, BGP routing, load balancing, and DNS
  • Experience with monitoring tools, such as Grafana, Loki, PagerDuty, AWS Lambda, etc.
  • Experience with Linux distributions, including CentOS, Ubuntu, and Amazon Linux
  • Experience with Amazon Web Services, including EC2, VPC, ELB, S3, RDS, CloudFormation, etc.
  • Experience with configuration management, such as Terraform, Chef, Puppet, Ansible, and/or Salt
  • Experience with CI/CD toolchains, such as Git, Jenkins, or Spinnaker
  • Experience with Python, Java, Golang, Kubernetes, Linux Containers, and Docker is preferred
  • Bachelor's degree in computer science, information security, computer engineering or electrical engineering is required

What success looks like:

After six months, you will…

  • Provide real-time monitoring, triage, and escalation of critical and major issues and incoming alarms within the environment
  • Participate in incident management calls and coordinate response, triage, recovery, and reporting of incidents

After about a year, you will…

  • Partner with SRE/DevOps to resolve infrastructure maintenance tasks and internal access requests and issues, and to manage monitoring and CI/CD tools
  • Use knowledge and experience to identify strategies that increase system reliability and performance through on-call rotation and process optimization
  • Partner with Engineering stakeholders to develop runbooks and implement application monitoring and RCA action items

We’ve got you covered:

In the spirit of pay transparency, we are excited to share our compensation philosophy. At Infoblox, we believe in paying for performance. You can expect our employment offers to take many factors into consideration, including but not limited to the location of the role, internal equity, applicable past experience, individual skill set, education, and professional certifications. The typical base salary range for this position is $96,500 to $140,690, plus corporate bonus. Please keep in mind that this range covers base salary only.

Our holistic benefits package includes coverage of your health, wealth, and wellness—as well as a great work environment, employee programs, and company culture. We offer a competitive salary and benefits package, including a 401k with company match and generous paid time off to help you balance your life. We have a strong culture and live our values every day—we believe in transparency, curiosity, respect, and above all, having fun while delighting our customers. 

Speaking of a great work environment, here are just a few of the perks you may enjoy, depending on your location…

  • Onsite massages, clubs, farmers market, and fitness classes
  • Delicious and healthy snacks and beverages
  • Electric vehicle charging stations
  • Outdoor amenities, seating, and courtyard BBQ
  • Dog park and pet-friendly programs
  • Newly remodeled offices with state-of-the-art amenities

Why Infoblox?

We’ve created a culture that embraces diversity, equity, and inclusion and rewards innovation, curiosity, and creativity. We achieve remarkable results by working together in a supportive environment that focuses on continuous learning and embraces change. So, whether you’re a software engineer, marketing manager, customer care pro, or product specialist, you belong here, where you will have the opportunity to grow and develop your career. Check out what it’s like to be a Bloxer. We think you’ll be excited to join our team. 

 



Tags: Ansible Automation AWS CI/CD Cloud Computer Science DevOps DNS Docker EC2 Golang Grafana Incident response Java Jenkins Kubernetes Lambda Linux Loki Monitoring Puppet Python S3 SaaS TCP/IP Terraform Ubuntu

Perks/benefits: 401(k) matching Career development Competitive pay Equity / stock options Health care Salary bonus Snacks / Drinks Team events Wellness

Region: North America
Country: United States
