Senior Site Reliability Engineer (SRE) / Team Lead
San Antonio, TX
Full Time | Senior-level / Expert | Clearance required | USD 155K - 190K
Dark Wolf Solutions
Dark Wolf Solutions operates at the nexus of mission and technology to meet our Nation’s most challenging missions.
Dark Wolf Solutions is seeking a Senior Site Reliability Engineer (SRE) / Team Lead to support the Unified Platform Cyber Operations & Security Center (COSC) in San Antonio, TX. The Senior SRE / Team Lead will be responsible for leading a multi-disciplinary team focused on platform reliability, operational resilience, performance optimization, and cloud infrastructure automation across multi-tenant, classified, and hybrid mission environments. This role blends deep technical expertise in cloud-native operations, Infrastructure as Code (IaC), observability, and automation with leadership responsibilities for mentoring, guiding, and growing a team of SREs.
Key Responsibilities
- Lead the Site Reliability Engineering team supporting platform monitoring, incident response automation, service resilience, and performance optimization for COSC environments.
- Architect and oversee deployment of observability solutions including logging, monitoring, alerting, telemetry ingestion, and performance dashboards.
- Design and maintain Infrastructure as Code (IaC) pipelines to automate provisioning, scaling, and configuration of critical platform components.
- Implement and enforce Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets to drive operational excellence.
- Develop and optimize incident response workflows and playbooks; lead root cause analysis (RCA) and post-incident reviews (PIRs).
- Manage cloud-native infrastructure in AWS GovCloud, Azure Government, and/or classified cloud environments.
- Collaborate with Platform Engineers, Cloud Security Engineers, and DevSecOps teams to continuously improve system availability and performance.
- Integrate monitoring and telemetry into the COSC SIEM and observability frameworks.
- Support compliance efforts by ensuring observability and operational artifacts align with RMF, STIGs, and NIST standards.
- Mentor junior SREs and cloud engineers, providing technical leadership and professional development support.
Basic Qualifications
- Bachelor’s degree in Computer Science, Cybersecurity, Engineering, Information Technology, or a related technical field, or equivalent industry experience.
- Minimum of 8–10 years of experience in system engineering, cloud engineering, or platform operations.
- Minimum of 3 years of experience in a leadership or technical team lead role.
- Strong experience operating cloud-native environments with AWS, Azure, and/or Kubernetes orchestration.
- Deep understanding of Infrastructure as Code (IaC) tooling (e.g., Terraform, CloudFormation, Ansible) and GitOps practices (e.g., ArgoCD).
- Expertise in monitoring and observability frameworks (Elastic Stack, Prometheus, Grafana, Fluentd, Loki, or equivalent).
- Strong knowledge of SRE principles, service reliability engineering, and resilience patterns.
- Experience with security compliance alignment including NIST 800-53, DoD STIGs, and Continuous ATO practices.
- Experience leading incident management activities and conducting root cause analysis (RCA).
- US Citizenship required with an active Secret clearance and eligibility for Top Secret/SCI.
Desired Qualifications
- Certifications such as AWS Certified Solutions Architect, Certified Kubernetes Administrator (CKA), or Certified DevOps Engineer.
- Familiarity with performance testing, chaos engineering, and fault injection frameworks.
- Hands-on experience building or maintaining self-service internal developer platforms (IDPs).
- Experience supporting DoD or Intelligence Community environments.
- Knowledge of Zero Trust Architecture (ZTA) implementation in cloud-native environments.
The estimated salary range is $155,000.00 - $190,000.00, commensurate with experience, technical expertise, certifications, and clearance level.
Primary work location is San Antonio, TX. Hybrid model with a mix of remote and on-site support; on-site presence required for classified system activities.
We are proud to be an EEO/AA employer: Minorities/Women/Veterans/Disabled and other protected categories.
In compliance with federal law, all persons hired will be required to verify identity and eligibility to work in the United States and to complete the required employment eligibility verification form upon hire.
Tags: Ansible Automation AWS Azure Clearance Cloud Compliance Computer Science DevOps DevSecOps DoD Grafana Incident response Kubernetes Loki Monitoring NIST NIST 800-53 Prometheus RMF SIEM SLOs STIGs Terraform Top Secret TS/SCI Zero Trust
Perks/benefits: Team events