Bioinformatic infrastructure development

Current genomic projects generate hundreds of terabytes of sequencing data and associated metadata. Properly tracking, analyzing, and visualizing all of that data requires extensive management. We use the STENCIL platform to curate a variety of Galaxy workflows into an integrated space that also provides interactive tools for further analysis. This platform has been crucial to understanding and interpreting the data we generate, and it gives biochemists with minimal genomic training a way to understand the results. Additionally, in response to the NIH’s continued support for enhanced experimental rigor and reproducibility, we use and develop the PEGR platform to track the metadata associated with each experiment as it is performed. PEGR tracks all aspects of an experiment (e.g., enzyme catalog #, user ID) in real time and embeds that information into a searchable interface that links directly to the downstream sequencing results and to the Galaxy platform, connecting reproducible bioinformatics all the way back to the original experimental design.

Highlights

  • GenoPipe: identifying the genotype of origin within (epi)genomic datasets.
    Lang O, Srivastava D, Pugh BF, Lai WKM.
    Nucleic Acids Research 2023, 51(22):12054-12068. PMID: 37933851; PMCID: PMC10711449.

  • ScriptManager: an interactive platform for reducing barriers to genomics analysis.
    Lang O, Pugh BF, Lai WKM.
    Practice and Experience in Advanced Research Computing 2022, https://doi.org/10.1145/3491418.3535161

  • PEGR: a flexible management platform for reproducible epigenomic and genomic research.
    Shao D, Kellogg G, Nematbakhsh A, Kuntala PK, Mahony S, Pugh BF, Lai WKM.
    Genome Biology 2022 Apr 19;23(1):99. PMID: 35440038; PMCID: PMC9016988.

  • STENCIL: A web templating engine for visualizing and sharing life science datasets.
    Sun Q, Nematbakhsh A, Kuntala PK, Kellogg G, Pugh BF, Lai WKM.
    PLoS Comput Biol. 2022 Feb 9;18(2):e1009859. doi: 10.1371/journal.pcbi.1009859. PMID: 35139076; PMCID: PMC8863220.

See all publications

Mechanisms of gene regulation

Our group applies high-resolution genomic technology to answer basic biological questions. The resolution of our assays provides unprecedented insights into the mechanisms of gene regulation. We use ChIP-exo, PIP-seq, PB-exo, and MNase-ChIP-seq to map the landscape of protein-DNA regulatory mechanisms at base-pair resolution. High-resolution assays have revealed protein binding specificity beyond the core sequence motif. We prefer to combine multiple orthogonal assays (e.g., PIP-seq, ChIP-exo, and RNA-seq in S. cerevisiae) to gain a multi-faceted view of the gene regulatory response.

Highlights

  • An integrated SAGA and TFIID PIC assembly pathway selective for poised and induced promoters.
    Mittal C, Lang O, Lai WKM, Pugh BF.
    Genes & Development 2022, 36(17-18):985-1001. PMID: 36302553; PMCID: PMC9732905.

  • Acute stress drives global repression through two independent RNA polymerase II stalling events in Saccharomyces.
    Badjatia N, Rossi MJ, Bataille AR, Mittal C, Lai WKM, Pugh BF.
    Cell Rep. 2021 Jan 19;34(3):108640. doi: 10.1016/j.celrep.2020.108640. PMID: 33472084; PMCID: PMC7879390.

  • Genome-wide determinants of sequence-specific DNA binding of general regulatory factors.
    Rossi MJ, Lai WKM, Pugh BF.
    Genome Res. 2018 Apr; 28(4):497-508. doi: 10.1101/gr.229518.117. Epub 2018 Mar 21. PMID: 29563167; PMCID: PMC5880240.

  • Genome-wide uniformity of human ‘open’ pre-initiation complexes.
    Lai WKM, Pugh BF.
    Genome Res. 2017 Jan;27(1):15-26. doi: 10.1101/gr.210955.116. Epub 2016 Nov 10. PMID: 27927716; PMCID: PMC5204339.

See all publications

Genomic data generation at scale

Cost and scalability are major considerations for many genomic assays. In collaboration with the Pugh lab, we are constantly working to develop significantly cheaper, faster, and higher-yield versions of ChIP-exo/seq. We have demonstrated the value of these optimizations by generating thousands of unique datasets in yeast and human model systems. Our work in the S. cerevisiae system produced the first near-complete (>400 proteins) high-resolution atlas of protein binding. We classified the promoter architecture of every gene in yeast and applied the ChIP-exo assay to identify distinct modes of binding related to gene regulation (e.g., Mediator binding at SAGA genes). At the NIH’s direct request, and in collaboration with several other groups, we also applied our optimized ChIP-exo assay to biochemically validate >1,000 monoclonal antibodies generated through the NIH Common Fund. Of the antibodies tested, 5% produced high-quality data and another 34% produced datasets distinct from background that warrant further investigation. These preliminary epigenomic maps will serve as guides for future hypothesis-driven research.

Highlights

  • A high-resolution protein architecture of the budding yeast genome.
    Rossi MJ, Kuntala PK, Lai WKM, Yamada N, Badjatia N, Mittal C, Kuzu G, Bocklund K, Farrell NP, Blanda TR, Mairose JD, Basting AV, Mistretta KS, Rocco DJ, Perkinson ES, Kellogg GD, Mahony S, Pugh BF.
    Nature. 2021 Apr; 592(7853):309-314. doi: 10.1038/s41586-021-03314-8. Epub 2021 Mar 10. PMID: 33692541; PMCID: PMC8035251.

  • A ChIP-exo screen of 887 Protein Capture Reagents Program transcription factor antibodies in human cells.
    Lai WKM, Mariani L, Rothschild G, Smith ER, Venters BJ, Blanda TR, Kuntala PK, Bocklund K, Mairose J, Dweikat SN, Mistretta K, Rossi MJ, James D, Anderson JT, Phanor SK, Zhang W, Zhao Z, Shah AP, Novitzky K, McAnarney E, Keogh MC, Shilatifard A, Basu U, Bulyk ML, Pugh BF.
    Genome Res. 2021 Sep; 31(9):1663-1679. doi: 10.1101/gr.275472.121. Epub 2021 Aug 23. PMID: 34426512; PMCID: PMC8415381.

  • Simplified ChIP-exo assays.
    Rossi MJ, Lai WKM, Pugh BF.
    Nat Commun. 2018 Jul 20;9(1):2842. doi: 10.1038/s41467-018-05265-7. PMID: 30030442; PMCID: PMC6054642.

See all publications

Application of novel bioinformatic algorithms to genomic analysis

The development of algorithms capable of taking advantage of high-resolution data is critical to maximizing the utility of genomic assays. Our approaches focus on simultaneously integrating data for various chromatin-interacting proteins across multiple biochemical platforms to draw biologically relevant conclusions. In particular, the ArchAlign algorithm addressed the need for high-resolution reference points for the proper interpretation of high-throughput data. In collaboration with Shaun Mahony’s research group, we published a novel ChIP-exo peak-calling algorithm that detects ChIP-target co-factors by identifying enriched peaks whose sequence and tag distributions are distinct from those of the target.

Highlights

  • ArchAlign: coordinate-free chromatin alignment reveals novel architectures.
    Lai WK, Buck MJ.
    Genome Biol. 2010; 11(12):R126. doi: 10.1186/gb-2010-11-12-r126. Epub 2010 Dec 23. PMID: 21182771; PMCID: P

  • ArchTEx: accurate extraction and visualization of next-generation sequence data.
    Lai WK, Bard JE, Buck MJ.
    Bioinformatics. 2012 Apr 1;28(7):1021-3. doi: 10.1093/bioinformatics/bts063. Epub 2012 Feb 2. PMID: 22302569.

  • An integrative approach to understanding the combinatorial histone code at functional elements.
    Lai WK, Buck MJ, Rizzo JM.
    Bioinformatics. 2013 Sep 15;29(18):2231-7. doi: 10.1093/bioinformatics/btt382. Epub 2013 Jul 2. PMID: 23821650; PMCID: PMC4107033.

  • Characterizing protein-DNA binding event subtypes in ChIP-exo data.
    Yamada N, Lai WKM, Farrell N, Pugh BF, Mahony S.
    Bioinformatics. 2019 Mar 15;35(6):903-913. doi: 10.1093/bioinformatics/bty703. PMID: 30165373; PMCID: PMC6419906.

See all publications