Manager, Reliability and Observability

US MO Remote, United States

Zelis

Discover the connected platform that's bridging gaps and aligning interests of healthcare payers, providers, and healthcare consumers.

Zelis is looking for an experienced and visionary Manager, Reliability and Observability to lead a team of specialized Site Reliability Engineers, including the Disaster Recovery Generalist, Vulnerability Specialist, Golden Signals Lead, and Database Operations Engineer. This role focuses on driving excellence in system reliability, observability, and operational efficiency across the organization. The Manager, Reliability and Observability will establish strategic direction, manage day-to-day operations, and foster collaboration between IT, engineering, and infrastructure teams to achieve measurable results in platform reliability and resilience.

What You'll Do:

Team Leadership and Development:

  • Oversee the Reliability and Observability team, providing leadership, mentorship, and career development opportunities.

  • Align team goals with organizational objectives, ensuring each specialist is effectively contributing to system reliability and operational priorities.

  • Foster a collaborative and high-performing team culture that encourages innovation and continuous improvement.

Strategic Planning and Execution:

  • Define and execute the strategic roadmap for SRE practices, focusing on disaster recovery, vulnerability management, observability, and database health.

  • Ensure alignment between the Reliability and Observability team and enterprise infrastructure teams, bridging gaps and reducing silos.

  • Set clear KPIs for reliability, performance, and incident response, and hold the team accountable for achieving them.

Cross-Functional Collaboration:

  • Partner with engineering, production, and infrastructure teams to improve system resilience and scalability.

  • Serve as the escalation point for major incidents and ensure timely and effective resolution.

  • Advocate for best practices in system architecture, monitoring, and security across all platforms.

Operational Excellence:

  • Oversee disaster recovery planning, ensuring robust processes for backup, testing, and response.

  • Drive initiatives to reduce and prevent vulnerabilities in code and infrastructure, collaborating with the Vulnerability Specialist.

  • Support the Golden Signals Lead in defining consistent observability practices and optimizing monitoring tools.

  • Ensure the Database Operations Engineer has the resources and support needed to maintain database health and resolve production issues.

Reporting and Continuous Improvement:

  • Develop and deliver regular reports on system reliability metrics, vulnerabilities, and incident trends to senior leadership.

  • Identify opportunities for automation, tooling improvements, and resource optimization.

  • Champion a culture of learning from incidents, implementing feedback to prevent recurring issues.

What You'll Bring to Zelis:

  • Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent experience).

  • 8+ years of experience in IT Operations, Site Reliability Engineering, or related fields.

  • 3+ years of experience in managing technical teams, preferably with diverse areas of expertise.

  • Experience with disaster recovery, observability practices, vulnerability management, and database optimization.

  • Strong understanding of SRE principles, including golden signals, incident management, and post-incident reviews.

  • Familiarity with monitoring tools (e.g., Splunk, Grafana, Prometheus), cloud platforms (AWS, Azure, GCP), and CI/CD pipelines.

  • Knowledge of database systems and performance optimization techniques.

  • Experience with vulnerability management tools and frameworks (e.g., OWASP, Nessus).

  • Exceptional leadership and team management skills, with the ability to inspire and motivate diverse teams.

  • Strong problem-solving and analytical abilities, with a focus on data-driven decision-making.

  • Excellent communication and interpersonal skills, with the ability to influence stakeholders at all levels.

  • Certifications in cloud platforms, DevOps, or security (e.g., AWS Certified Solutions Architect, Certified Kubernetes Administrator, CISSP).

  • Experience managing globally distributed teams is helpful.

  • Knowledge of automation tools and scripting languages (e.g., Python, Terraform).

Location and Workplace Flexibility
We have offices in Atlanta GA, Boston MA, Morristown NJ, Plano TX, St. Louis MO, St. Petersburg FL, and Hyderabad, India. We foster a hybrid and remote-friendly culture, and all our employees' work locations are based on the needs of the position and determined by the Leadership team. In-office work and activities, if applicable, vary based on the work and team objectives in accordance with Company policies.

Zelis is modernizing the healthcare financial experience by providing a connected platform that bridges the gaps and aligns interests across payers, providers, and healthcare consumers. This platform serves more than 750 payers, including the top 5 national health plans, BCBS insurers, regional health plans, TPAs and self-insured employers, and millions of healthcare providers and consumers. Zelis sees across the system to identify, optimize, and solve problems holistically with technology built by healthcare experts – driving real, measurable results for clients.

Commitment to Diversity, Equity, Inclusion, and Belonging 
At Zelis, we champion diversity, equity, inclusion, and belonging in all aspects of our operations. We embrace the power of diversity and create an environment where people can bring their authentic and best selves to work. We know that a sense of belonging is key not only to your success at Zelis, but also to your ability to bring your best each day.

Equal Employment Opportunity  
Zelis is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws. 

We encourage members of traditionally underrepresented communities, including women, LGBTQIA people, people of color, and people with disabilities, to apply, even if you do not believe you 100% fit the qualifications of the position.

Accessibility Support 

We are dedicated to ensuring our application process is accessible to all candidates. If you are a qualified individual with a disability or a disabled veteran and require a reasonable accommodation with any part of the application and/or interview process, please email TalentAcquisition@zelis.com.  

SCAM ALERT: There is an active nationwide employment scam that uses the Zelis name to obtain personal information or commit financial fraud. This site is secure, and any applications made here are with our legitimate partner. If you’re contacted by a Zelis Recruiter, please ensure that whoever is contacting you truly represents Zelis Healthcare. We will never ask for the exchange of any money or credit card details during the recruitment process. Please be aware of any suspicious email activity from people who could be pretending to be recruiters or senior professionals at Zelis.

Category: Leadership Jobs

Tags: Automation AWS Azure CI/CD CISSP Cloud Computer Science DevOps GCP Grafana Incident response KPIs Kubernetes Monitoring Nessus OWASP Prometheus Python Scripting Splunk Terraform Vulnerabilities Vulnerability management

Perks/benefits: Career development

Regions: Remote/Anywhere North America
Country: United States
