The Archival Infrastructure and Technology team is looking for an enthusiastic, highly motivated bioinformatician to join the FAIRplus project in the BioSamples database team.
The project
The BioSamples database at EMBL-EBI is a resource that integrates biological samples from a wide variety of sources to provide a single location to apply standards and ontologies to sample data. It has grown from 14,000 samples in 2010 to over 6 million samples in 2019.
The FAIRplus project aims to develop tools and guidelines for making life science data FAIR (Findable, Accessible, Interoperable, Reusable). It is a European collaboration between industrial and academic partners. It aims to increase the discovery, accessibility and reusability of data from selected projects funded by the EU’s Innovative Medicines Initiative, and of internal data from pharmaceutical industry partners. The project started at the end of January 2019, so this is a timely opportunity to join and help shape it.
The BioSamples team is a multidisciplinary team of bioinformaticians and programmers with experience in ontologies, databases, backend Java technologies and user interfaces.
Your role
The position is a fantastic opportunity to gain hands-on experience working on a project with major scientific impact in a world-leading bioinformatics institute.

You will be responsible for establishing and assessing processes for storing the FAIRplus data, leveraging and improving on existing resources. Those include other EMBL-EBI archives as well as our semantic-as-a-service suite.

This might involve developing the necessary access tools to establish standardised data workflows to FAIRify incoming datasets, liaising with users to gather requirements and translating these into technical specifications and test plans, and developing new applications to complement and replace the existing BioSamples infrastructure. This might also involve diving deeper into the industrial partners' data, or building systems to broker it through public databases. Your duties will include teleconference calls and on-site events such as Bring Your Own Data workshops. You will present progress and demo pipelines to all project members during those events, soliciting feedback and using it to design the next processes to be applied to datasets.

We use a variety of frameworks and technologies, including Solr, MongoDB, Docker and Spring. We value matching the right solution to the right problem, and you will have the opportunity to improve and contribute to further development of the architecture. We follow agile techniques in our approach to development, so if you’re the sort of person who likes to work in sprints, has worked with tools like Jira in the past to prioritise user requirements, or has ever tried pair-programming or code reviews, then you’ll be a good fit for our team. All of our software is built and published using continuous integration and version control, so you should at least be familiar with GitHub, and you should be confident in making your code public for others to install and run.
We are looking for a bioinformatician, or a software developer with experience in handling biological data and biological requirements. You should have a biological background but you should also be able to find your way around a terminal. Programming experience in at least one programming language (such as Python, Perl or Java) and experience with managing data or data processing pipelines would be desirable. Previous experience of working in a bioinformatics-focused environment would be beneficial. Data FAIRification involves ongoing work with semantic annotation, so if you’ve used ontologies in the past, or are just keen to know more, this would be ideal.

You’ll be working within the AIT team at EBI alongside developers, bioinformaticians and ontologists. As part of your day-to-day job, you can expect to interact with other groups at EMBL-EBI as well as our external collaborators in order to improve submission of and access to data in the BioSamples database.
You have

Experience handling biological data and requirements
Experience of at least one programming language
Practical experience of data management
Knowledge of common data formats such as XML, JSON and JSON schema
Good communication skills and the ability to work as part of a team of people with a range of skills and a diversity of backgrounds

You might also have

Several years' experience with other programming languages, such as Python or Java
Familiarity with semantic technologies such as ontologies
Experience of working in an agile development environment
Familiarity with Spring Boot, Solr, MongoDB
Interest in Cloud technologies

We value people who demonstrate that they’re eager to learn whatever is needed to support our users’ requirements.

Application Instructions: 

To apply please submit a covering letter and CV through our online system.

Application Closing Date: 
26 April