Observability Management Lead
Charlotte, NC - 2320 Cascade Pointe Boulevard, United States
Truist
Your journey to better banking starts with Truist. Checking and savings accounts, credit cards, mortgages, small business, commercial banking, and more. The position is described below. If you want to apply, click the Apply Now button at the top or bottom of this page. After you click Apply Now and complete your application, you'll be invited to create a profile, which will let you see your application status and any communications. If you already have a profile with us, you can log in to check your status.
If you have a disability and need assistance with the application, you can request a reasonable accommodation. Send an email to Accessibility (accommodation requests only; other inquiries won't receive a response).
Regular or Temporary: Regular
Language Fluency: English (Required)
Work Shift: 1st shift (United States of America)
Please review the following job description:
The Head of Observability and Monitoring will lead the strategy, architecture, and implementation of observability, monitoring, and telemetry capabilities within a regulated banking environment. This role is critical to ensuring the resilience, performance, and security of the Bank’s technology landscape. The ideal candidate will bring deep technical expertise, a strategic mindset, and strong collaboration skills to drive best-in-class monitoring solutions that align with regulatory and business requirements.
ESSENTIAL DUTIES AND RESPONSIBILITIES
Technical Leadership & Expertise:
Develop and execute a comprehensive observability strategy, integrating logging, metrics, and distributed tracing across the Bank’s technology stack.
Lead the design and deployment of monitoring platforms, ensuring real-time visibility into system performance, availability, and security threats.
Own the end-to-end observability architecture, including tool selection, automation, and integration with cloud, on-prem, and hybrid environments.
Drive the adoption of AI/ML-powered monitoring to enhance anomaly detection, predictive analytics, and automated incident response.
Ensure robust service level indicators (SLIs), service level objectives (SLOs), and error budgets are established and tracked for critical services.
Strategic Planning & Governance:
Define and implement observability governance frameworks, ensuring compliance with regulatory requirements (e.g., FFIEC, OCC, Basel III, GDPR).
Develop strategies to support real-time monitoring, root cause analysis, and proactive remediation to minimize downtime and business impact.
Partner with engineering, security, business unit, risk, and compliance teams to align observability initiatives with operational stability and performance targets, as well as with business continuity and disaster recovery plans.
Champion operational resilience by ensuring monitoring covers end-to-end customer journeys, critical business services, and third-party dependencies.
Establish and maintain a centralized observability platform, standardizing logging and metrics collection across microservices, APIs, databases, and infrastructure.
Collaboration & Stakeholder Management:
Work closely with platform teams to embed observability best practices into CI/CD pipelines and software development lifecycles.
Partner with Cybersecurity to integrate security monitoring, anomaly detection, and threat intelligence into observability solutions.
Engage with business and operations teams to ensure monitoring capabilities support customer experience, regulatory reporting, and incident management.
Serve as the Bank’s SME on observability, engaging with industry forums, vendors, and regulatory bodies to stay ahead of trends and compliance needs.
Technical Skills:
Proven expertise in modern observability stacks such as Splunk, Dynatrace, AppDynamics, ThousandEyes, ServiceNow AIOps, or Datadog.
Deep understanding of cloud-native monitoring across AWS, Azure, and Google Cloud, including serverless, Kubernetes, and container-based architectures.
Strong hands-on experience with log aggregation, distributed tracing (Jaeger, Zipkin), and application performance monitoring (APM).
Knowledge of AI-driven monitoring, automated remediation, and self-healing infrastructure.
Familiarity with SIEM tools and security monitoring, ensuring alignment with SOC and threat detection capabilities.
Experience in API monitoring, network telemetry, and database performance tuning.
Leadership & Strategic Experience:
10+ years of experience in observability, monitoring, or infrastructure resilience roles within regulated financial services or banking environments.
Proven track record of designing and implementing enterprise-scale observability platforms in a complex, multi-cloud environment.
Experience leading cross-functional teams to drive cultural adoption of observability and monitoring best practices.
Strong knowledge of regulatory and compliance requirements related to operational resilience, incident management, and monitoring.
Soft Skills & Collaboration:
Ability to translate complex technical monitoring data into actionable insights for senior executives and non-technical stakeholders.
Strong problem-solving skills with a proactive and forward-thinking approach to technology and resilience.
Excellent communication and leadership abilities, fostering collaboration across engineering, risk, and business teams.
Compliance and Regulatory Knowledge:
In-depth understanding of compliance in regulated industries (e.g., financial services, healthcare).
Experience working with audit and risk management processes.
Stakeholder Engagement & Communication:
Facilitate collaboration between application, infrastructure, and business teams to drive efficiency and innovation.
Demonstrated ability to partner with line-of-business leaders, security teams, and developers to drive collaborative outcomes.
Excellent communication and influence skills to balance business, technology, and compliance needs.
QUALIFICATIONS
Required Qualifications:
The requirements listed below are representative of the knowledge, skill and/or ability required. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.
1. Bachelor’s degree and 20 to 30 years of related experience, or an equivalent combination of education and experience.
2. Managed technology or technology process teams for more than 15 years, or managed teams of 30 or more technologists.
3. Excellent knowledge of technical management and data governance.
4. Knowledge of current trends in the IT hardware and systems software field.
5. Database management skills with the ability to produce reports.
6. Familiarity with the support and troubleshooting of personal computers and tablet devices.
7. Training ability and experience are a plus.
8. The position requires strong problem-solving and analytical skills, with the ability to work independently and exercise sound judgment.
9. The ability to make commitments and be held accountable for them, organizing workloads to meet deadlines.
10. Exhibit adaptability to accept or bring about change when needed.
11. Strong written and verbal communication skills.
12. The ability to excel in a team environment and advance overall team objectives.
13. The ability to ensure customer satisfaction by delivering excellence in products and service.
14. Ability to work and communicate with peers, vendors, and internal staff, including software program leadership and others.
15. Consistently demonstrate a professional, positive, and approachable attitude, demeanor, and discretion.
16. Demonstrate sensitivity in handling confidential information.
17. Formulate and clearly communicate ideas to others.
OTHER JOB REQUIREMENTS / WORKING CONDITIONS
Sitting / Standing / Walking / Bending / Lifting
Able to sit for extended periods of time and periodically move about during the workday.
Visual / Audio / Speaking
Able to access and interpret client information from the computer, and able to hear and speak with individuals in person and on the phone.
Manual Dexterity / Keyboarding
Able to work standard office equipment, including PC keyboard and mouse, copy/fax machines, and printers.
Mental
Able to focus, interpret information logically to solve problems, and answer customers’ questions appropriately.
Availability
Able to work all hours scheduled, including overtime as directed by manager/supervisor and required by business need.
Travel
Up to 50%
Physical Conditions / Environment
Normal office environment where there is little or no discomfort due to temperature, dust, noise, or other disagreeable elements.
General Description of Available Benefits for Eligible Employees of Truist Financial Corporation: All regular teammates (not temporary or contingent workers) working 20 hours or more per week are eligible for benefits, though eligibility for specific benefits may be determined by the division of Truist offering the position. Truist offers medical, dental, vision, life insurance, disability, accidental death and dismemberment, tax-preferred savings accounts, and a 401k plan to teammates. Teammates also receive no less than 10 days of vacation (prorated based on date of hire and by full-time or part-time status) during their first year of employment, along with 10 sick days (also prorated), and paid holidays. For more details on Truist’s generous benefit plans, please visit our Benefits site. Depending on the position and division, this job may also be eligible for Truist’s defined benefit pension plan, restricted stock units, and/or a deferred compensation plan. As you advance through the hiring process, you will also learn more about the specific benefits available for any non-temporary position for which you apply, based on full-time or part-time status, position, and division of work.
Truist is an Equal Opportunity Employer that does not discriminate on the basis of race, gender, color, religion, citizenship or national origin, age, sexual orientation, gender identity, disability, veteran status, or other classification protected by law. Truist is a Drug Free Workplace.
EEO is the Law; Pay Transparency Nondiscrimination Provision; E-Verify