Staff Software Engineer, ML Infrastructure
Tasks
- Build feedback loops between cloud inference, edge devices, and data pipelines
- Define SLOs and observability standards for ML services
- Design and evolve real-time cloud inference systems
- Drive architecture decisions for the Kubernetes-based ML platform
- Establish best practices for model lifecycle management and on-call
- Identify and remove bottlenecks in ML deployment infrastructure
- Improve throughput, latency, and cost for production CV models
- Lead incident response and postmortems for ML systems
- Lead technical reviews for system design and capacity planning
- Mentor engineers through design reviews, code reviews, pairing, and guidance
- Set technical direction for ML infrastructure
- Shape production LLM serving patterns and evaluation pipelines
- Write documentation, runbooks, and architecture decision records
Perks/Benefits
- Employee resource groups
- Free home security system
- Hybrid work model
- Inclusive, safe work environment
- Professional home monitoring
- Wellness support
Skills/Tech-stack
AWS EKS | AWS IAM | Amazon S3 | Amazon Web Services | Autoscaling | Batching | C++ | CI/CD | Caching | Capacity Planning | Distributed Systems | GPU scheduling | GPU-based inference | Go | High Throughput | High-Throughput Systems | Incident Response | Infrastructure as Code | KServe | KV cache | Kafka | Kubernetes | LLM serving | Load Balancing | Low Latency | Low-Latency Systems | Multi-tenancy | NVIDIA Triton | Observability | Postmortems | Python | Queuing | Ray | Rust | SLOs | Terraform | vLLM | Web Services
Education
N/A