Director of the Cornell Center for Vertebrate Genomics. Director of the Cornell EpiGenomics Core Facility. Assistant Research Professor in Molecular Biology and Genetics. Assistant Research Professor in Computational Biology. 458 Biotechnology Building, Cornell, Ithaca, NY, USA 14850
The primary focus of my research program is to understand how protein complexes target the genome, how protein binding changes across cell states, and how misregulation of these interactions can result in disease phenotypes. My group combines novel genomic assays with custom bioinformatic software development. We pursue ‘big-data’ projects that generate thousands of unique (epi)genomic datasets across human, mouse, and yeast model systems. As the Director of the Cornell EpiGenomics Core Facility, I lead a group that works hand-in-hand with physician scientists at the Cornell Weill medical campus to apply our epigenomic technology and algorithmic approaches to biomedical specimens and investigate the fundamental nature of human diseases. Combining wet-bench biochemical genomic technology with analytical bioinformatic algorithms allows us to understand the fundamental rules of protein-DNA interactions and how these rules are broken in diseased patient samples. The volume and dimensionality of our data typically require machine learning approaches as well as eXplainable AI algorithms. To support our analysis, we rely heavily on NSF-provided ACCESS resources. Our students are trained to perform advanced informatic analysis across a wide spectrum of heterogeneous and high-performance compute systems.
The growing complexity of STEM research now frequently demands expertise in more domains of science than any single research group can encompass. As the Director of the Cornell Center for Vertebrate Genomics, my mandate is to identify, develop, and support cross-discipline collaborations and foster an engaged community across the University. In a world that increasingly requires collaboration to leverage the latest and most powerful technologies, it is my privilege to support the researchers in the Center.
The greatest biological discoveries are achieved when we can pair multiple distinct approaches (e.g., biochemistry, molecular biology, bioinformatics, etc.) to build something greater than the sum of its parts.
2025
Protein structure prediction and design for high-throughput computing.
Mathew VS, Kellogg GD, Lai WKM.
bioRxiv 2025, https://doi.org/10.1101/2025.07.18.665594
Adversarial attack of sequence-free enhancer prediction identifies chromatin architecture.
Gafur J, Lang OW, Lai WKM.
Bioinformatics 2025, 41(7). PMID: 40581823; PMCID: PMC12240468. https://doi.org/10.1093/bioinformatics/btaf371
2024
Adversarial Robustness and Explainability of Machine Learning Models.
Gafur J, Goddard S, Lai WKM.
Practice and Experience in Advanced Research Computing 2024, https://doi.org/10.1145/3626203.3670522
Multi-dimensional analyses identify genes of high priority for pancreatic cancer research.
Nwosu ZC, Giza H, Nassif M, Charlestin V, Menjivar RE, Kim D, Kemp SB, Lai WKM, Loveless I, Steele NG, Hu J, Hu B, Wang S, Magliano MP, Lyssiotis CA.
JCI Insight 2024, 10(4):e174264. PMID: 39774001; PMCID: PMC11949049. https://doi.org/10.1172/jci.insight.174264
2023
GenoPipe: identifying the genotype of origin within (epi)genomic datasets.
Lang O, Srivastava D, Pugh BF, Lai WKM.
Nucleic Acids Research 2023, 51(22):12054-12068. PMID: 37933851; PMCID: PMC10711449.
Joint sequence & chromatin neural networks characterize the differential abilities of Forkhead transcription factors to engage inaccessible chromatin.
Arora S, Yang J, Akiyama T, James DQ, Morrissey A, Blanda TR, Badjatia N, Lai WKM, Ko MSH, Pugh BF, Mahony S.
bioRxiv 2023, https://doi.org/10.1101/2023.10.06.561228