AI HPC Cluster Administrator (f/m/d)

72076 Tübingen
Full-time
14.01.2025
Universitätsklinikum Tübingen


Job description

The Faculty of Medicine is one of the four founding faculties of the Eberhard Karls University of Tübingen. With its non-clinical facilities as well as its research and teaching area corresponding to the organisational units of the University Hospital, it is one of the largest medical training and research institutions in Baden-Württemberg.

The "Hertie Institute for AI in Brain Health" (Hertie AI) is seeking, as soon as possible, an

AI HPC Cluster Administrator (f/m/d)

The position will initially be filled on a fixed-term basis until 31.01.2028, with a strong prospect of extension.

The "Hertie Institute for AI in Brain Health" (Hertie AI) is a research institute of the Faculty of Medicine, funded by the Gemeinnützige Hertie Stiftung, with the aim of detecting diseases of the nervous system earlier and treating them better with the help of artificial intelligence. Hertie AI is currently in a dynamic build-up phase and cooperates with the strong and innovative AI ecosystem in Tübingen (e.g. Cyber Valley, Cluster of Excellence “Machine Learning in Science”, Tübingen AI Center). It benefits greatly from infrastructure shared with these initiatives, such as the Machine Learning Cloud (ML Cloud), but has special compute requirements because of its goal of analyzing brain data and simulating neural circuits. The ML Cloud is a state-of-the-art compute infrastructure with powerful CPU and GPU capacities for AI workloads and petabyte-scale storage volumes, used by more than 400 researchers and engineers.

About the role:

We are seeking a skilled and proactive Cluster System Administrator to join our team, responsible for managing and optimizing our high-performance computing environment specifically designed for AI workloads. In this role, you will work closely with a team of HPC experts, AI researchers, and IT specialists to ensure that our systems operate at peak performance, supporting AI and ML teams with reliable, scalable computing resources.

What you'll do:

  • Cluster Management: Oversee and manage daily operations of the compute infrastructure, including configuration, deployment, and optimization of nodes and networks to maximize performance for AI workloads.
  • System Monitoring and Maintenance: Monitor system performance, storage, and network utilization to ensure the clusters operate efficiently. Address hardware and software issues as they arise.
  • User Support: Provide technical assistance to AI researchers, data scientists, and developers on efficient use of cluster resources.
  • Documentation and Reporting: Create and maintain comprehensive documentation on system configuration, maintenance tasks, and troubleshooting procedures. Generate regular reports on system performance, uptime, and resource usage for management.

What you will bring (position requirements):

  • Education and Experience: Specialist knowledge and professional experience in information technology, applied computer science or computer engineering equivalent to the level of a Master's degree
  • Technical Skills: Proficiency in HPC cluster management tools (e.g., SLURM, PBS, or Torque) and in Linux system administration
  • Scripting and Automation: Strong scripting skills in Python, Bash, or other languages to automate tasks, optimize processes, and improve system reliability
  • Networking and Storage: Solid understanding of high-speed networking, parallel file systems, and large-scale storage solutions (e.g., Lustre, Ceph)
  • Problem-Solving: Excellent troubleshooting abilities and a proactive approach to resolving system issues before they impact users
  • Interest in artificial intelligence and motivation to collaborate with scientists and professionals in the field of AI research
  • English proficiency

Relevant experience in some of the following technologies:

  • Experience with automation tools for configuration management (e.g. Ansible, Puppet, Chef) and revision control systems (e.g. Git)
  • Experience with containers (Docker, Singularity, Podman, Kubernetes)

What we offer:

  • Collaboration in the multifaceted environment of a modern university hospital, which, in addition to patient care, also focuses on medical research and teaching
  • A future-proof workplace and location, attractive remuneration including a company pension scheme (VBL), and the most flexible working hours possible
  • A subsidized job ticket for public transport and attractive discounts on employee offer platforms
  • A structured onboarding phase and the hospital's own academy for developing professional, social, and methodological skills
  • Preventive health care through a wide range of sports activities

Contact:

We offer remuneration in accordance with TV-L (collective wage agreement for the Public Service of the German Federal States). In line with its internationalization agenda, the University of Tübingen welcomes applications from outside Germany. The University of Tübingen is committed to equal opportunity, diversity, and inclusion and wishes to increase the share of women and under-represented groups employed in research. Women are expressly encouraged to apply. Applications from equally qualified candidates with disabilities will be given preference. In principle, the position can be shared. Employment is based on the relevant provisions of university law. Please observe the applicable vaccination regulations. Unfortunately, interview expenses cannot be reimbursed.

To apply, please send a cover letter and CV in English, together with all relevant certificates, as a single PDF file by 01.01.2025. For more information or questions about technical aspects of the position, please contact Dr. Kristina Kapanova at kristina.kapanova@uni-tuebingen.de.

If you have any questions, please contact:
Dr. Kristina Kapanova

hertieai@medizin.uni-tuebingen.de

Closing date for applications:
02.02.2025

We look forward to receiving your application, including CV and cover letter, addressed to Dr. Kristina Kapanova and quoting index number 5579.
