Sr. ML Platform Engineer (Hybrid)
Tasks
- Build observability solutions
- Conduct post-mortems
- Configure alerting workflows
- Debug memory leaks
- Debug resource contention
- Debug scheduling conflicts
- Develop runbooks
- Diagnose distributed systems issues
- Implement automated health checks
- Improve HPC cluster utilization
- Maintain platform reliability metrics
- Mentor engineers on debugging techniques
- Optimize GPU allocation
- Optimize Ray clusters
- Optimize Slurm job scheduling
- Optimize Spark jobs
- Optimize resource allocation
- Perform root cause analysis
- Profile performance bottlenecks
- Resolve production incidents for inference pipelines
- Resolve production incidents for training pipelines
- Troubleshoot JupyterHub spawner issues
- Troubleshoot kernel crashes
Perks/Benefits
- Employee networks
- On-call support
- Paid adoption leave
- Paid parental leave
- Professional development
- Vacation and holidays
- Volunteer opportunities
- Wellness programs
Skills/Tech-stack
AWS | Airflow | Apache Spark | CUDA | Capacity Planning | Chaos Engineering | Debugging | Distributed tracing | Docker | Google Cloud | Grafana | JupyterHub | Kubeflow | Kubernetes | Linux | Log Aggregation | MLflow | Microsoft Azure | OCI | Observability | Performance Tuning | Profiling | Prometheus | Python | Ray | Slurm | Unix
Education
N/A