Sequencing Blood Cells

A community science initiative – HAEMCODE – has been welcomed for its contribution to our understanding of blood cells and, ultimately, to the development of better treatments for leukaemia.

We curated more than 300 different studies from a wide range of mouse cell line models to create a compendium that covered 84 transcription factors

Professor Bertie Gottgens

A new initiative called HAEMCODE (developed by the Haematopoietic Stem Cell Lab at the Cambridge Institute for Medical Research and hosted by the Wellcome Trust and MRC Cambridge Stem Cell Institute) demonstrates how community-based science can improve our understanding of the functions of genes – and, in particular, the mechanisms that determine the function of normal and abnormal blood cells. 

Large-scale DNA sequencing efforts began over 30 years ago following the conception of the Human Genome Project, with the Wellcome Trust Sanger Institute, just outside Cambridge, making numerous key contributions to this initial era of genome research.

In the early days, genome sequencing was limited to large research organisations. But, thanks to huge advances in technology (so-called next-generation sequencing), sequencing large amounts of DNA has become ever faster and cheaper: the cost of sequencing a whole human genome fell from $95,000,000 in 2001 to less than $5,000 in 2013.

“As the result of technology developed here in Cambridge, notably by a spin-out from the Chemistry Department led by Shankar Balasubramanian, biomedical research has become democratised. Rather than being the preserve of a small number of large centres, exciting DNA-sequencing based research is now being carried out by lots of small labs,” said Professor Bertie Gottgens, who heads the Haematopoietic Stem Cell Lab at the Cambridge Institute for Medical Research.

“These laboratories tend to have a high level of specialist expertise in specific areas – for example, my lab specialises in the study of rare blood stem cells that have the ability to develop into all types of mature blood cells throughout adult life, while others may focus on specific cell types involved in immune disorders or specific subtypes of cancer.”

Earlier this month Professor Gottgens’s lab launched an online tool called HAEMCODE, hosted by the Wellcome Trust and MRC Cambridge Stem Cell Institute. The website has been acclaimed by researchers in the field. One reviewer commented that the initiative proved that a community approach to ‘curating’ datasets from small projects could outperform large consortia efforts at a much lower cost, and that this finding was “important to policy makers, scientific administrators and individual scientists”.

HAEMCODE joins a number of pioneering initiatives that are reaping the benefits of a community approach to gathering and sharing scientific data in a spirit of open collaboration. Along with a number of other specialist groups worldwide, the Gottgens lab is dedicated to understanding the function of blood cells and, ultimately, to providing a scientific platform for the development of new and more effective treatments for leukaemia, a cancer of the blood or bone marrow which commonly reveals itself as an abnormal number of white blood cells.

Essentially, HAEMCODE is a repository, and point of access, for genomic information relating to mouse haematopoiesis – the process of production, multiplication, and specialisation of blood cells in the bone marrow. One of the keys to understanding how cells differ, and how to treat them when they are behaving abnormally, is to discover the regulatory mechanisms that make them different. Central to these mechanisms are proteins called transcription factors (TFs). TFs have highly specific functions: they bind to DNA and control the flow of information from DNA to messenger RNA. In other words, they are central regulators of which genes are switched on to produce proteins.

The Gottgens lab used the community approach to look at information about TFs available in public databases. This enabled the researchers to get a big-science view of the topic. Professor Gottgens explained: “We manually curated more than 300 different studies from a wide range of mouse cell line models to create a compendium that covered 84 transcription factors. Currently available data from large consortium projects covers less than half of this.”

HAEMCODE also gives experimental and computational biologists access to a range of online analysis tools. For example, analysis of patient samples may have identified a number of genes associated with a particular outcome, such as a favourable response to a given treatment option. HAEMCODE analysis tools provide streamlined ways to identify the likely regulators of such ‘prognostic’ genes. This in turn offers a fast-track route towards experiments that can uncover the underlying regulatory mechanisms, a critical first step in the development of new rationally designed therapies.

In an editorial for the journal Science last year, editor-in-chief Bruce Alberts made a strong plea in support of small-scale projects during a time when funding is (at best) static and resources are tight.  He wrote: “A typical human cell contains approximately 10,000 different proteins, organised into hundreds of different complexes that function as protein machines…. To make sense of the biology and gain the likely health benefits from such an understanding, each of these proteins will need to be studied in detail by biochemists – work that typically takes place in small laboratories.”

Professor Gottgens says that one of the next steps is to create a resource equivalent to HAEMCODE with a special emphasis on both normal and leukaemic blood cells, with the specific aim of accelerating research into the causes and potential treatment options for leukaemia. He has been contacted by a number of investigators from across the globe, who want to incorporate the data compiled for HAEMCODE into other community science initiatives.

The HAEMCODE project was funded by the Biotechnology and Biological Sciences Research Council, Leukaemia and Lymphoma Research, the Medical Research Council (MRC), Cancer Research UK, the Cambridge National Institute for Health Research (NIHR) Biomedical Research Centre and core support grants from the Wellcome Trust–MRC Cambridge Stem Cell Institute.

For more information about this story, contact Alex Buxton, Office of Communications, University of Cambridge, amb206@admin.cam.ac.uk, 01223 761673.
