Senior Site Reliability Engineer
San Antonio, TX
Full Time | Senior-level / Expert | Clearance required | USD 125K - 233K * est.
BridgePhase
BridgePhase is a software engineering company focused on designing, building, securing, and operating cutting-edge software solutions that drive mission success and operational excellence for Federal Government organizations. We are dedicated to supporting the Air Force’s technological edge by delivering innovative software engineering services that directly support cyber operations, threat defense, and mission assurance. Our goal is to be a trusted mission partner in enabling cyber readiness and resilience across the Air Force and U.S. Cyber Command.
We’re looking for a skilled Site Reliability Engineer (SRE) to support the mission of the U.S. Air Force’s Unified Platform Software Factory in San Antonio. In this role, you’ll help ensure the stability, scalability, and security of mission-critical cyber systems. As part of a collaborative, agile team, you’ll be responsible for building observability, managing performance, automating operations, and enhancing the resilience of cloud-native platforms that support cyber operations across the Department of Defense.
Ideal candidates are comfortable bridging the gap between software development and infrastructure operations, and bring a mindset of continuous improvement, automation, and secure-by-design architecture to everything they do. You will play a key role in maintaining platform reliability under demanding mission conditions.
We are hiring for both fully remote positions and hybrid roles based in San Antonio, TX. Candidates local to San Antonio should expect a mix of onsite and remote work as part of a hybrid schedule.
In this position, you can expect to:
- Build and maintain scalable, resilient infrastructure using Infrastructure as Code (IaC) and Configuration as Code (CaC) tools such as Terraform, Ansible, and Helm.
- Design, implement, and maintain robust observability solutions—logging, metrics, and tracing—to support 24/7 mission awareness.
- Automate platform operations, including system provisioning, patching, and recovery, to reduce manual effort and increase uptime.
- Monitor system performance and lead root cause analysis and incident response for infrastructure-related issues.
- Collaborate with development and cybersecurity teams to ensure deployments are secure, compliant, and aligned with COSC and DoD requirements.
- Apply system hardening techniques and continuously evaluate system health, threat posture, and availability.
- Manage containerized workloads using Kubernetes and Docker; optimize Kubernetes clusters for security and performance.
- Support integration of cyber defense tools and event monitoring systems aligned with COSC mission operations.
- Assist in load testing, chaos engineering, and fault injection to validate platform resiliency.
- Maintain compliance with DoD cybersecurity controls including DISA STIGs, NIST SP 800-53, and the DoD DevSecOps Reference Design.
As with any technical environment, the exact role responsibilities will evolve with the changing needs of our client. We are looking for versatile engineers who thrive on new challenges and can readily adapt to additional responsibilities beyond those listed above.
Preferred Experience and Qualifications:
- Hands-on experience in Site Reliability Engineering, Cloud Infrastructure, DevSecOps, or System Administration within secure, mission-critical environments.
- Strong expertise in observability tools such as Prometheus, Grafana, ELK Stack, Fluent Bit, or OpenTelemetry.
- Deep knowledge of containerization (Docker) and orchestration (Kubernetes), including optimization and troubleshooting.
- Proficiency in AWS services and cloud-native architecture for high-availability systems.
- Experience developing Infrastructure as Code (IaC) and Configuration as Code (CaC) with tools like Terraform, Ansible, and Helm.
- Familiarity with secure CI/CD pipelines and automation practices in compliance with DoD standards.
- Strong scripting skills in Bash or Python for automation and monitoring.
- Experience implementing DoD cybersecurity controls and supporting cyber operations or SOC environments.
- Exceptional communication skills and a desire to work in a ‘badgeless’ collaborative team environment made up of Government clients, other contractors, and stakeholders.
- Strong desire to work in a mission-driven team and continuously build new skills.
- Active Secret clearance is required; TS/SCI is preferred.
- B.S. in Engineering, Computer Science, or a related technical field, or equivalent industry experience.
- Hybrid roles combine weekly on-site support in San Antonio, TX with remote work.
While we've outlined our ideal candidate, we recognize that talent comes in many forms. If you don't check every box but possess a strong technical aptitude, a passion for building resilient systems, and a drive to learn and grow, we strongly encourage you to apply. We value engineers who demonstrate curiosity, adaptability, and a solid foundation in operational engineering principles. If you're excited about supporting critical Air Force cyber missions and are ready to help scale secure, high-availability systems, we want to hear from you.
About Our Company:
At BridgePhase, our values shape our culture and guide our actions. We act with integrity, honesty, and respect, earning trust and fostering collective success. We are critical thinkers and problem solvers, driving innovation and positive disruption to solve hard challenges at speed and scale. Our work is characterized by courage, compassion, commitment, and teamwork. We apply disciplined engineering principles and a proven agile approach to deliver flexible, simple, durable, and performant solutions that drive continuous improvement and provide lasting impact and value. Additionally, we invest in our communities through strategic charitable initiatives, empowering our employees to make meaningful contributions to causes they are passionate about.
Our Benefits:
We pride ourselves on providing top-tier benefits that rival those found in larger organizations. Below are some of the perks our team enjoys:
- Competitive compensation based on experience
- Flexible PTO plan
- Paid Sick Leave
- 100% Paid Parental Leave (16 weeks Maternity, 6 weeks Paternity)
- 401(k) plan with 6% employer matching (zero vesting period)
- Excellent health, dental, and vision benefits
- Professional development budget that can be used for certifications and training
- Paid community service days
* Salary range is an estimate based on our InfoSec / Cybersecurity Salary Index 💰