Workshop
Introduction to reproducible data analysis in R
This two-day workshop provides a hands-on introduction to reproducible data analysis using R and RStudio, covering essential scientific computing concepts such as file systems, working directories, data organisation, and data visualisation. Through live coding exercises, participants will learn to import, analyse, and present data in R, then apply their skills to biological datasets.
Background
The growing complexity and scale of biological data mean biologists are increasingly expected to develop the data skills needed to design reproducible workflows for simulating, collecting, analysing and presenting data. Coding is at the heart of reproducibility because it explicitly describes everything you do with your raw data, making your work completely transparent and reproducible.
The good news is that generative AI tools like ChatGPT and GitHub Copilot have transformed what is possible for non-programmers. Writing code is no longer the barrier it once was: if you can describe what you need, you can get working code back in seconds. But working code is not enough, because code can run without doing what you think it is doing. The ability to describe precisely what you want, and to question and validate what AI produces, comes from learning to code.
Research consistently shows that AI coding assistants work best for people who already understand the code being generated. Without that foundation, it’s hard to know when the output is wrong, incomplete, or quietly doing something other than what you intended. A little coding knowledge helps you write better prompts and know when to trust the result.
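As a small illustration of code that runs without error but does not do what was intended, here is a short R sketch. The values are made up for this example. The behaviour shown is standard R: converting a factor straight to numbers returns its internal level codes rather than the values you see printed.

```r
# Made-up measurements that were read in as text and stored as a factor
conc <- factor(c("10", "20", "5"))

# Runs without error, but returns the internal level codes (1, 2, 3),
# not the measurements themselves
wrong <- as.numeric(conc)

# Converting to character first recovers the numbers that were intended
right <- as.numeric(as.character(conc))

mean(wrong)  # 2, the mean of the level codes
mean(right)  # about 11.67, the mean of the actual measurements
```

Both lines produce a number and neither gives a warning, so the mistake is easy to miss unless you know how R stores this kind of data.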
That’s where Introduction to reproducible data analysis in R comes in!
R is a free and open source language especially well-suited to data analysis and visualisation. It has a reputation for catering to users who do not see themselves as programmers, but then allowing them to slide gradually into programming.
About the Workshop
This two-day workshop will introduce you to R and RStudio, the most widely used interface for working with R.
- It will start with “what they forgot to teach you about computers”, covering file systems, paths and working directories. These are threshold concepts in scientific computing which, if not understood, block your ability to make progress.
- The workshop uses a live-coding format, allowing you to code alongside the instructor as you learn to navigate RStudio and perform core tasks such as creating, importing, summarising and plotting data and saving outputs.
- We will cover how the types of variable you have affect how you analyse and visualise them, and how to organise data in spreadsheets.
- In the final part of the workshop you will be able to work with one (or more!) canonical biology examples: qPCR analysis, RNA sequence analysis, flow cytometry analysis or ImageJ files.
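To give a flavour of the core tasks listed above (creating, importing, summarising and plotting data and saving outputs), here is a minimal sketch in base R. It is an illustration only, not the workshop's own material: the data values, object names and file names are hypothetical, and `tempdir()` stands in for a project folder.

```r
# Create a small data frame (made-up values, for illustration only)
plants <- data.frame(
  treatment = c("control", "control", "fertiliser", "fertiliser"),
  height_cm = c(10.2, 11.5, 14.1, 15.3)
)

# Build a file path from its parts; tempdir() stands in for a project folder
csv_path <- file.path(tempdir(), "plants.csv")

# Save the data, then import it again as you would with your own file
write.csv(plants, csv_path, row.names = FALSE)
imported <- read.csv(csv_path)

# Summarise: mean height for each treatment
means <- tapply(imported$height_cm, imported$treatment, mean)
print(means)

# Plot the data and save the figure as a PDF file
pdf_path <- file.path(tempdir(), "plants.pdf")
pdf(pdf_path, width = 6, height = 4)
boxplot(height_cm ~ treatment, data = imported,
        xlab = "Treatment", ylab = "Height (cm)")
invisible(dev.off())
```

Because every step is written down, anyone can rerun the script on the same file and get the same summary and figure, which is the sense of "reproducible" used throughout this workshop.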
Programme
The workshop takes place over two days and combines teaching with live coding.
Tuesday 28th July
| Time | Session | Description |
|---|---|---|
| 10:00–12:30 | What they forgot to teach you about computers | File system organisation, file types, working directories and paths. |
| | Introduction to R, RStudio and project organisation | You will learn about data types such as “numerics” and “characters” and object types such as “vectors” and “dataframes” and create your first graph! These are the building blocks for the rest of your R journey. You will also learn about the layout of RStudio and a workflow using scripts and RStudio Projects to keep your work organised. |
| 12:30–13:30 | Lunch | |
| 13:30–16:00 | Types of variable, summarising and plotting distributions | Revise the difference between continuous and discrete values and how we summarise and visualise them. You will also import data from text files and Excel files, developing your understanding of working directories and paths. |
Wednesday 29th July
| Time | Session | Description |
|---|---|---|
| 10:00–12:30 | Summarising data with several variables | Building on the previous day’s work exploring single variables, you will learn how to summarise and visualise datasets containing multiple variables. You will identify response and explanatory variables, explore the principles of “tidy” data, carry out a simple data tidying exercise, and learn how to save figures for publication and reporting. |
| 12:30–13:30 | Lunch | |
| 13:30–16:00 | Data organisation in spreadsheets | Learn how to recognise the underlying structure of your data and arrange it in a ‘tidy’ format to make your life easier. |
| | R workflows | Explore and modify workflows for qPCR analysis, RNA sequence analysis, flow cytometry analysis or ImageJ files. |
Audience
This workshop is designed for researchers, technical staff, and postgraduate students in the life sciences who want to develop practical data analysis skills using R and RStudio.
The course is suitable for:
- PhD students, postdoctoral researchers, research assistants, and laboratory technicians working with biological data.
- Academic staff and industry scientists seeking to improve the reproducibility and efficiency of their data analysis workflows.
- Researchers who currently use spreadsheets or point-and-click software and want to transition to code-based analysis.
- Participants with little or no prior programming experience who need a supportive introduction to scientific computing concepts.
- Anyone who has attempted to learn R independently but found file management, working directories, data organisation, or coding workflows challenging.
Prerequisites
No previous programming experience is required.
There are Windows PCs at the venue and you are not required to bring your own machine. Participants from outside of York will be provided with a temporary IT account.
What you will learn
After this workshop the successful learner will be able to:
- explain what an operating system is and how files and directories are organised in a file system
- explain root, home and working directories along with absolute and relative file paths
- find their way around the RStudio windows, use an RStudio Project to organise work and a script to run commands
- use the R command line to create and use the basic data types in R
- distinguish between continuous and discrete variables and summarise and plot them appropriately in R
- import data into RStudio from a variety of file types
- explain what is meant by ‘tidy’ data and organise data in spreadsheets
- customise and save publication quality figures
- use and modify R workflows for qPCR analysis, RNA sequence analysis, flow cytometry analysis or ImageJ files
Venue
The workshop will take place in the Department of Biology at the University of York.
Room B/R/012 (PC classroom) is located on the Ground Floor of Biology Block R in the Biomedical and Natural Sciences Building at the University of York’s Campus West. Use the University of York Campus Map for directions.
Registration fees and travel scholarships
Fees: Standard £200 | Academic/student/not-for-profit £150
We are pleased to offer 10 travel scholarships of up to £200 to help cover travel and accommodation costs as follows:
- Standard Travel: economy rail or bus travel
- Accommodation: Up to £140 inc. VAT
Scholarships will be allocated on a first-come, first-served basis to researchers at academic institutions and not-for-profit organisations who contribute to research in the UK.
You will need to reclaim these expenses by completing an expense form and submitting receipts after attending the event.
We will contact you within a few days of your registration to confirm whether you have been awarded a scholarship.