This course will introduce life scientists to practical data science topics used in life sciences, such as data visualisation, image analysis, application of AI, and use of publicly available databases.
You will initially be introduced to data science theory and practice, including best practices for undertaking analyses, data management, and reproducibility.
The course will provide hands-on training in tools and resources appropriate to your research, including an introduction to Python for handling and visualising data, statistical analysis, and the application of machine learning.
Group projects
This course includes group projects in which you will be placed in a small group to work on a challenge set by trainers from EMBL-EBI. This allows you to explore the data science methods and resources you learn about during the course and apply them to a set problem, giving you hands-on experience. The group work will culminate in a flash talk session involving everyone on the final day of the course.
Groups are mentored and supported by the trainers who set the initial challenge, but the groups will be responsible for driving their projects forward, with all members expected to take an active role.
There are two group project topics: gene expression and protein structure. Both projects will provide an opportunity for participants to apply the knowledge and skills learnt during the other sessions of the course, including data handling in Python, data visualisation, statistics and machine learning. The projects will also allow participants to gain experience of using EMBL-EBI data resources, including Expression Atlas and the Single Cell Expression Atlas for the gene expression project, and PDBe and AlphaFold for the protein structure project. You will be asked during your application to select the group project topic you would most benefit from.
The projects cover mammalian data sets; however, in many cases the methods and approaches taught are transferable to data from other species.
Who is this course for?
Applicants are expected to be at an early stage of using data science in their research and to need to develop their knowledge and skills further. The course would most suit PhD students who are ready to start analysing their own data. No prior knowledge of programming is required for this course; however, participants will be asked to complete some pre-course learning. We recommend this free tutorial to start learning Python: http://swcarpentry.github.io/python-novice-gapminder/.
After the course you should be able to:
- Use Python to handle and visualise biological data
- Describe and access data using EMBL-EBI data services
- Apply statistical methods to analyse biological data
- Discuss applications of machine learning in life sciences
- Use Cytoscape to explore networks
During this course you will learn about:
- Using Python for biological data handling and visualisation
- Accessing data from EMBL-EBI data services
- Statistical analysis of life sciences data
- Uses of machine learning for analysis of life sciences data
- Network analysis using Cytoscape
Wellcome Genome Campus
Hinxton
CB10 1SA
UK