Senior HPC Cluster Administrator - Deep Learning Frameworks Infrastructure
Tasks
- Automate infrastructure with IaC
- Build CI/CD infrastructure pipelines
- Design and scale storage solutions
- Evaluate and introduce new networking and storage technologies
- Maintain observability stacks
- Manage and optimize job scheduling with Slurm
- Mentor engineers and define engineering standards
- Own GPU compute cluster lifecycle
- Tune cluster configuration for distributed training
Perks/Benefits
- N/A
Skills/Tech-stack
Ansible | Apptainer | Bash | cgroups | DCGM | Docker | EFA | GitLab | Grafana | IPMI | InfiniBand | Kubernetes | Linux | Lustre | NFS | NVIDIA MIG | NVIDIA NVSwitch | NVLink | NVSwitch diagnostics | Prometheus | Python | RDMA | Redfish | Singularity | Slurm | Terraform | WekaFS
Education