Director of the Cornell EpiGenomics Core Facility. Assistant Research Professor in Molecular Biology and Genetics. Assistant Research Professor in Computational Biology. 458 Biotechnology Building, Cornell, Ithaca, NY, USA 14850
The primary focus of my research program is to understand how protein complexes target the genome, how protein binding changes across cell states, and how misregulation of these interactions can result in disease phenotypes. My group combines novel genomic assays with custom bioinformatic software development. We pursue ‘big-data’ projects that generate thousands of unique (epi)genomic datasets across human, mouse, and yeast model systems. As the Director of the Cornell EpiGenomics Core Facility, I lead a group that works hand-in-hand with physician scientists at the Cornell Weill medical campus to apply our epigenomic technology and algorithmic approaches to biomedical specimens and investigate the fundamental nature of human diseases. Combining wet-bench biochemical genomic technology with analytical bioinformatic approaches allows us to understand the fundamental rules of protein-DNA interactions and how these rules are broken in diseased patient samples. The volume and dimensionality of our data typically require machine learning approaches as well as eXplainable AI algorithms. To support our analysis, we rely heavily on NSF-provided ACCESS resources. Our students are trained to perform advanced informatic analysis across a wide spectrum of heterogeneous and high-performance compute systems.
The greatest biological discoveries are achieved when we can pair biochemical and bioinformatic approaches to build something greater than the sum of its parts.
2024
Sequence-free identification of enhancers identifies conserved patterns of chromatin.
Gafur J, Lang OW, Lai WKM.
Submitted 2024.
Adversarial Robustness and Explainability of Machine Learning Models.
Gafur J, Goddard S, Lai WKM.
Practice and Experience in Advanced Research Computing 2024, https://doi.org/10.1145/3626203.3670522
Multi-dimensional analyses identify genes of high priority for pancreatic cancer research.
Nwosu ZC, Giza H, Nassif M, Charlestin V, Menjivar RE, Kim D, Kemp SB, Lai WKM, Loveless I, Steele NG, Hu J, Hu B, Wang S, Magliano MP, Lyssiotis CA.
JCI Insight (accepted) 2024, https://www.biorxiv.org/content/10.1101/2021.05.28.446056v2
2023
GenoPipe: identifying the genotype of origin within (epi)genomic datasets.
Lang O, Srivastava D, Pugh BF, Lai WKM.
Nucleic Acids Research 2023, 51 (22), 12054-12068. PMID: 37933851; PMCID: PMC10711449.
Joint sequence & chromatin neural networks characterize the differential abilities of Forkhead transcription factors to engage inaccessible chromatin.
Arora S, Yang J, Akiyama T, James DQ, Morrissey A, Blanda TR, Badjatia N, Lai WKM, Ko MSH, Pugh BF, Mahony S.
bioRxiv 2023, https://doi.org/10.1101/2023.10.06.561228