Staff Machine Learning Engineer, ML Infrastructure
Tasks
- Build feedback loops between cloud inference, edge devices, and the data flywheel
- Define SLOs and observability standards for ML services
- Define model lifecycle best practices: registry, deployment, monitoring, rollback, and drift detection
- Design and evolve cloud inference systems for real-time video and events
- Drive architecture decisions for the Kubernetes-based ML platform
- Identify and remove bottlenecks in ML deployment infrastructure
- Improve throughput, latency, and cost for production CV models
- Lead deep technical reviews covering system design, capacity planning, and reliability
- Lead incident response and postmortems for critical ML systems
- Set technical direction for ML infrastructure
- Shape LLM serving in production
Skills/Tech-stack
AWS EKS | Amazon IAM | Amazon S3 | Autoscaling | Batching | C++ | CI/CD | Docker | Drift Detection | GPU scheduling | Go | Infrastructure as Code | KServe | KV cache | Kafka | Kubernetes | LLM serving | ML inference | MLflow | Model Monitoring | Model Registry | Multi-tenancy | NVIDIA Triton | Python | Quantization | Ray | Rust | Speculative decoding | vLLM | Weights and Biases
Education
N/A