The Institute for Molecular Medicine Finland (FIMM) at the University of Helsinki was founded with an ambitious mission: to understand the causes of human diseases, improve diagnostics, and develop new treatments for common health problems. In 2017, it embarked on a major new initiative called FinnGen, which advances the goals of personalized medicine by bringing together the genomic data and national health register data of 500,000 Finnish citizens—or a tenth of the country’s population. The field of personalized medicine aims to leverage large-scale biomedical data about populations into customized medical diagnoses and treatments for individual patients. FinnGen has partnered with nine international pharmaceutical companies as well as public institutions to become one of the largest collaborative biomedical research efforts to date.
Mari Kaunisto, Communications Director and geneticist at FinnGen, explains that the project draws on Finland’s unique genetic profile: “we have a limited amount of genetic variation in the Finnish population and very good reference genome databases. The combination of genotype information and health data enables genetic discoveries that improve our understanding of disease mechanisms, creating medical breakthroughs.” The project analyzes de-identified samples collected by participating biobanks from all over the nation, using microarrays to create genotypes. This data is then cross-referenced with comprehensive digital health records from multiple health registries that provide longitudinal information over an individual’s lifetime. The process of collecting and combining the biological and clinical information is designed to protect the privacy and ensure the consent of each individual donor, with coded records and granular controls for access and permissions.
"The combination of comprehensive genomic information from the Finnish founder population and national health register data enables genetic discoveries that improve our understanding of disease mechanisms and benefit global healthcare systems long into the future."
Mari Kaunisto, Communications Director and Senior Researcher, FinnGen
The challenges of scaling and protecting genomic data
The project has already processed 100,000 individual records, generating three terabytes of raw data, and is adding 50,000 participants every six months. To achieve its goals, FinnGen needed a technological infrastructure that could scale to manage an estimated 1.5 petabytes of genomic data over the next three years. It also needed rigorous security and privacy controls to meet donor, institutional, and European Union requirements. To manage all of these demands, Jarmo Harju, Head of IT and Data Management at FinnGen, and his team turned to Google Cloud Platform (GCP). By building on Google’s Compute Engine and Cloud Storage and integrating other Google tools like Dataproc, App Engine, Identity and Access Management (IAM), BigQuery, and Cloud SQL, the team designed a seamless infrastructure for scaling data while keeping it secure and private.
The FinnGen solution depends on two separate structures: a Data Factory, which imports and processes the genomic and private health data, and a Data Library, which stores them. Data Factory information is identified by FinnGen IDs and can only be seen by in-house operators. The Data Library stores the coded data and releases it at regular intervals to researchers. “It’s a challenge to allow access to researchers to analyze the data without allowing them to download or copy them,” Harju points out. “So we also developed a Sandbox and provide customized access to our partners.” He explains that the Sandbox requires users to access the data through a service account on a virtual machine that runs on an isolated network without external IP addresses. This creates an extra layer of privacy protection and security between the researcher and the data. The Data Library has already migrated to GCP; the Data Factory is next. Harju reports that “quite soon we should have all the FinnGen data storage and processing on Google Cloud, which will improve security and usability from the end users’ point of view. All personal data is stored within the EU, and in the future possibly in Finland.”
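The Sandbox constraints Harju describes (a service account, and a virtual machine with no external IP address) can be pictured as a simple configuration check. The sketch below is illustrative only and is not FinnGen's implementation: the function name, VM names, network name, and service account email are hypothetical. It assumes VM descriptions shaped like the Compute Engine instance resource, where external IPv4 access appears under a network interface's `accessConfigs` field and external IPv6 access under `ipv6AccessConfigs`.

```python
from typing import Any, Dict, List


def sandbox_violations(instance: Dict[str, Any]) -> List[str]:
    """Return the reasons a VM description breaks the assumed sandbox rules."""
    problems: List[str] = []

    interfaces = instance.get("networkInterfaces", [])
    if not interfaces:
        problems.append("no network interface defined")

    for nic in interfaces:
        name = nic.get("name", "<unnamed>")
        # A non-empty accessConfigs / ipv6AccessConfigs list means the
        # interface has external IP access configured.
        if nic.get("accessConfigs") or nic.get("ipv6AccessConfigs"):
            problems.append(f"{name}: external IP access is configured")

    # The sandbox described in the text is reached through a service account.
    if not instance.get("serviceAccounts"):
        problems.append("no service account attached")

    return problems


# Hypothetical VM descriptions, for illustration only.
isolated_vm = {
    "name": "sandbox-vm-1",
    "networkInterfaces": [{"name": "nic0", "network": "sandbox-net"}],
    "serviceAccounts": [{"email": "sandbox-reader@example.iam.gserviceaccount.com"}],
}
exposed_vm = {
    "name": "sandbox-vm-2",
    "networkInterfaces": [
        {"name": "nic0", "accessConfigs": [{"type": "ONE_TO_ONE_NAT"}]}
    ],
    "serviceAccounts": [],
}

for vm in (isolated_vm, exposed_vm):
    print(vm["name"], sandbox_violations(vm))
```

An empty list means the description satisfies both assumed rules. A real deployment would enforce these constraints through network and IAM configuration rather than an after-the-fact check like this one.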
"Quite soon we should have all the FinnGen data storage and processing on Google Cloud, which will improve security and usability from the end users’ point of view."
Jarmo Harju, Head of IT and Data Management, FinnGen
The promise of new medical breakthroughs
The new datasets will be made available to FinnGen’s international research team every six months, fueling more collaboration and potential discoveries. “We hope that FinnGen will reveal new biological or molecular mechanisms behind a number of diseases,” Kaunisto says. “This will help deliver better diagnostics and treatments, or even ways to prevent disease. For example, if someone learns they are at high risk for cardiovascular disease, can we help them make lifestyle changes to decrease the risk?” FinnGen’s pharmaceutical partners see the potential in using this research to improve their decision-making about which molecules to select for further study. Genetically supported drug targets can lead to more successful drug trials, thus improving the odds of developing effective and safe drug treatments. With 500,000 participants, the study bears an enormous responsibility, Kaunisto says, which it takes very seriously: “we value the trust of our study participants and by ensuring their data privacy now we can all reap the long-term benefits of this research.”