Site Reliability Architect
USD 155K-190K (estimate) | Senior-level | Full Time
Tasks
- Define and manage SLIs and SLOs
- Design and implement unified observability dashboards
- Enable GenAI for incident summarization and runbook recommendations
- Implement error budgets
- Implement static and dynamic alerting
- Integrate OpenTelemetry telemetry pipelines
- Leverage AI/ML for anomaly detection and incident prediction
- Monitor microservices and downstream APIs
- Operate Dynatrace for metrics, traces, and logs
- Perform root cause analysis
- Propose auto-remediation actions
- Reduce alert noise using alert correlation
- Troubleshoot distributed system dependency issues
- Use ELK or EFK for log analytics
- Use Prometheus and Grafana for monitoring
Perks/Benefits
- N/A
Skills/Tech-stack
AIOps | AWS | Alerting | Anomaly Detection | Azure | Baseline Modeling | Data platforms | Dependency Mapping | Distributed Systems | Dynamic Thresholds | EFK | ELK | Error budget | GenAI | Google Cloud | Grafana | Infrastructure as Code | JSON | Kafka | Language Models | Large Language Models | Log Analytics | Machine Learning | Microservices | OpenTelemetry | Prometheus | Reliability Engineering | Root Cause Analysis | Root cause | Runbook Automation | SLI | SLO | Seasonality Detection | Site Reliability | Site Reliability Engineering | Streaming Data | Streaming Data Platforms | Telemetry enrichment | Terraform | Time Series | Time Series Analysis | Trace Correlation | Unified Observability | “as-code”
Education
N/A