Richard Linchangco, Jeremy J Jay and Cory Brouwer (2017). Linking Nutrition and Molecular Biology Using Data Mining and Graph Theory. The FASEB Journal 31(1).
Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC
Cardiovascular disease, cancer, and diabetes are among the leading causes of death in the United States. Risk factors for these chronic diseases include physical activity, genetics, and diet, but of these only physical activity and diet can be controlled. Traditionally, the relationship between diet and human health has been examined through epidemiological research at the phenotypic and observational level. The molecular mechanisms underlying the health benefits of certain foods remain largely unknown, creating a disconnect between plant science, nutrition, human health, and molecular biology.
Linking the disparate domains of nutrition and molecular biology is a difficult problem. Existing curated resources provide a strong baseline within each domain, but data connecting the domains does not yet exist in machine-readable form. This work applies data mining techniques to large collections of medical and agricultural publication abstracts to generate links between nutrition and biology. The resulting network of integrated data is stored in a Neo4j graph database, the first application of Neo4j to a knowledge-oriented nutrigenomics graph. This graph enables novel queries that can uncover new evidence in support of open nutrigenomics questions.
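The abstract does not specify which data mining technique generates the nutrition-biology links. One common approach is dictionary-based co-occurrence: a food term and a gene term that appear in the same abstract are counted as a candidate link, and the counts become edge weights in the graph. The sketch below illustrates that idea under explicit assumptions. The term dictionaries, the `Food`/`Gene` node labels, the `CO_OCCURS_WITH` relationship type, and the helper names are all hypothetical and are not taken from the original work. The code only builds parameterized Cypher `MERGE` statements; sending them to a Neo4j instance through a client is left out.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical term dictionaries; a real pipeline would draw these from
# curated nutrition and gene vocabularies.
FOOD_TERMS = {"broccoli", "green tea", "soy"}
GENE_TERMS = {"NRF2", "TP53", "ESR1"}


def find_terms(text, terms):
    """Return the dictionary terms that appear as whole words in text."""
    found = set()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(term)
    return found


def cooccurrence_edges(abstracts, min_count=1):
    """Count food-gene pairs mentioned together in the same abstract."""
    counts = Counter()
    for abstract in abstracts:
        foods = find_terms(abstract, FOOD_TERMS)
        genes = find_terms(abstract, GENE_TERMS)
        # Every food/gene pair in one abstract counts as one co-occurrence.
        counts.update(product(sorted(foods), sorted(genes)))
    # Drop weakly supported pairs below the (hypothetical) threshold.
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Parameterized Cypher that upserts both nodes and the weighted edge.
# Labels and relationship type are illustrative assumptions.
MERGE_EDGE = (
    "MERGE (f:Food {name: $food}) "
    "MERGE (g:Gene {symbol: $gene}) "
    "MERGE (f)-[r:CO_OCCURS_WITH]->(g) "
    "SET r.abstracts = $count"
)


def to_cypher_params(edges):
    """Turn edge counts into parameter dicts for MERGE_EDGE, in sorted order."""
    return [
        {"food": food, "gene": gene, "count": n}
        for (food, gene), n in sorted(edges.items())
    ]


if __name__ == "__main__":
    sample = [
        "Sulforaphane from broccoli activates NRF2 signaling.",
        "Broccoli intake and NRF2 target gene expression in humans.",
        "Soy isoflavones bind ESR1 in breast tissue.",
    ]
    for params in to_cypher_params(cooccurrence_edges(sample)):
        print(MERGE_EDGE, params)
```

Using `MERGE` instead of `CREATE` keeps the load idempotent: rerunning it over overlapping abstract collections updates existing nodes and edges instead of duplicating them. Raw co-occurrence is noisy, so a production pipeline would likely add normalization, such as synonym mapping, and a statistical association score before treating an edge as evidence.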