Big data wrangling: Bioinformatics and cancer research

12 Dec 2016

The Human Genome Project was just the beginning. We asked bioinformatician Chelsea Mayoh why big data wrangling is a growth area in medical research.

Last month, Chelsea went to Parliament House in Canberra as part of a mentoring program for women in IT. As a member of our Children’s Cancer Institute bioinformatics team, funded by the Kids Cancer Alliance, she works with researchers to get the most out of their data. We asked a few questions about working in this emerging field.

What is bioinformatics and why is it important?

Bioinformatics is formally defined as the combination of biology, computer science, mathematics and statistics to interpret and understand biological data. I see it as the field that takes the massive amounts of data collected from biology experiments and attempts to interpret the meaning behind the data. It draws on techniques such as data mining, statistics, data curation and analysis, and it involves creating and using specialised computer programs designed to make sense of ‘big data’.

Bioinformatics is fairly new and is very important in children’s cancer research. As we obtain more and more data from tumours, we can analyse them down to the genetic level and attempt to identify what differs between the tumour and the patient’s ‘germline’ (normal) tissue. We look at the individual’s DNA and tumour DNA to identify mutations in genes/proteins, measure expression levels of each and see which parts of the DNA have been significantly altered by rearrangements, duplications, deletions and so on.
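To give a feel for the tumour-versus-germline comparison, here is a minimal Python sketch. It is an illustration only, not the team’s actual pipeline: the `Variant` record, the `tumour_only_variants` function and the example data are all made up, and real analyses work from sequencing files and have to account for sequencing errors and data quality.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A single DNA change, identified by where it is and what changed."""
    chromosome: str
    position: int
    ref: str  # base(s) in the reference genome
    alt: str  # base(s) observed in the sample


def tumour_only_variants(tumour: set[Variant], germline: set[Variant]) -> list[Variant]:
    """Return variants seen in the tumour but not in the patient's normal tissue."""
    return sorted(tumour - germline)


# Made-up example data, for illustration only.
germline = {Variant("chr1", 1000, "A", "G"), Variant("chr2", 2500, "C", "T")}
tumour = germline | {Variant("chr17", 7000, "G", "A")}

print(tumour_only_variants(tumour, germline))
```

The set difference keeps only the changes found in the tumour sample, which is the simplest possible version of asking what is unique to the tumour.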

For a single tumour in a single patient we can generate terabytes (thousands of gigabytes) of data. We need to narrow down all that information to something that’s readable, manageable and interpretable. And we need to look beyond just one patient to large patient cohorts to find a commonality within a cancer type, then expand further to multiple cancer types and so on. So it works at many scales.

In the clinical setting, we also help to find specific clinical risk factors that may identify patients who’ll respond or not respond to treatment, or will develop particular side-effects. With all this data being generated, bioinformaticians are needed to help interpret it.

How did you get into bioinformatics?

I started out wanting to go into molecular biology because I’ve always been interested in the human body and how it works, and why some people develop diseases and other people don’t. When I was enrolling in my first year of university, I decided to take some computer science classes. I figured that understanding computers and technology would give me an advantage in the workforce. I started speaking to professors and mentors who introduced me to the field of bioinformatics. I did some research and fell in love with it. So I ended up doing degrees in both Molecular Biology and Computer Science, a combination that leads you into a bioinformatics position.

What’s a typical day like?

My typical day starts with checking all the jobs I had running overnight and making sure they’ve finished, and finished successfully. If a job has failed, I look into the reason why, fix it and run it again. I then review my day and schedule any meetings with researchers who would like to discuss the results of their data analysis, or who have contacted me to review their experimental design before they begin their experiments. Then I start working on the highest priority jobs.

I can work on several projects in a day and have to be very organised in terms of time management, ensuring I always have a project moving forward and my computer running jobs at all times.

Before I leave for the day, I make sure I’m running as many jobs in parallel as possible (generally jobs take 8+ hours to run) so my computer keeps working even though I’m not physically present. Then, when I come in the next morning, the big jobs are done and I don’t have to spend actual working hours waiting for a job to complete and wasting CPU power. It’s why there’s a saying in bioinformatics that we actually work 24 hours a day – or our computers do.
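The routine of launching jobs in parallel and checking for failures the next morning can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the `run_job` and `run_all` helpers and the two placeholder jobs are hypothetical, and real analysis jobs would more likely be submitted to a shared computing cluster than run from a single script.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(command: list[str]) -> tuple[list[str], int]:
    """Run one command and return it together with its exit code."""
    result = subprocess.run(command, capture_output=True)
    return command, result.returncode


def run_all(commands: list[list[str]], max_parallel: int = 4) -> list[list[str]]:
    """Run commands in parallel and return the ones that failed."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(run_job, commands))
    return [command for command, code in results if code != 0]


# Placeholder jobs: one that succeeds and one that fails on purpose.
jobs = [
    [sys.executable, "-c", "print('analysis finished')"],
    [sys.executable, "-c", "raise SystemExit(1)"],
]
failed = run_all(jobs)
print(f"{len(failed)} of {len(jobs)} jobs need another look")
```

A non-zero exit code is treated as a failure here, so the returned list is the set of jobs to investigate, fix and run again.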

What’s the future for bioinformatics?

Over the years, I’ve noticed a larger and larger amount of data being produced, and with that comes the challenge of where to store it. Because of this, there is an increasing number of collaborative services such as The Cancer Genome Atlas (TCGA), TARGET, and the International Cancer Genome Consortium (ICGC) where previously analysed cancer data is stored and made accessible to other cancer researchers.

New bioinformatics programs are constantly being developed by bioinformaticians to help other bioinformaticians analyse biological data. These programs are bringing new, improved and more accurate methods for analysing the data.

In future, I think both data sharing and bioinformatics programs will become more common and the field of bioinformatics will continue growing as more bioinformaticians work within the research space.

Read more about our childhood cancer research.