Zhao Lab

Human Genome Annotations

GenoCanyon

A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of genomic conservation measures and biochemical annotations.

GenoSkyline

Tissue-specific functional annotation through integrative analysis of Roadmap epigenomic data.

GenoSkyline-Plus is a comprehensive update of GenoSkyline that incorporates more annotation data into the framework and extends to 127 integrated annotation tracks covering a spectrum of human tissue and cell types.

Genome Wide Association Study

GWAS.PC

GWAS.PC (GWAS Power Calculation) is an R package (Code) that performs power analysis for genome-wide association studies. In particular, it accounts for genotyping error in the power calculation.
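To illustrate the kind of calculation involved, here is a minimal sketch of a textbook power approximation for a single additive SNP and a quantitative trait. It is not the GWAS.PC implementation and it ignores genotyping error. The function name `gwas_power` and its parameters are hypothetical, and the sketch assumes Hardy-Weinberg equilibrium and a normal approximation to the 1-df test statistic.

```python
from statistics import NormalDist


def gwas_power(n, maf, beta, alpha=5e-8, trait_var=1.0):
    """Approximate power of a two-sided 1-df association test.

    Assumptions (illustrative only): additive effect `beta` per allele,
    Hardy-Weinberg equilibrium, total trait variance `trait_var`,
    and no genotyping error.
    """
    # Fraction of trait variance explained by the SNP.
    q2 = 2.0 * maf * (1.0 - maf) * beta ** 2 / trait_var
    if not 0.0 <= q2 < 1.0:
        raise ValueError("SNP must explain less than all of the trait variance")

    # Non-centrality parameter of the test statistic.
    ncp = n * q2 / (1.0 - q2)
    shift = ncp ** 0.5

    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)

    # Probability that |Z + shift| exceeds the critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    for sample_size in (5_000, 20_000, 50_000):
        print(sample_size, round(gwas_power(sample_size, maf=0.3, beta=0.1), 3))
```

With a null effect (`beta=0`) the function returns the significance level itself, which is a quick sanity check on the approximation.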

GenoWAP

Post-GWAS prioritization through integrated analysis of GWAS summary statistics and GenoCanyon genomic functional annotation.

GPA

A statistical approach to prioritizing GWAS results by integrating pleiotropy information and annotation data.

GBR

An R package (Code Example) implementing a post-GWAS prioritization algorithm that incorporates rewiring information from co-expression networks to prioritize GWAS signals.

GWAS with MRF pathway

A program implementing the Markov Random Field (MRF) method, which incorporates pathway topology into genome-wide association studies. (Example.R, fun_network.R, network.csv, pval.txt)

SUPERGNOVA

Local genetic correlation analysis reveals heterogeneous etiologic sharing of complex traits.

GENJI

Estimating genetic correlation jointly using individual-level and summary-level GWAS data.

Composite-trait LDSC

Estimating correlation between composite phenotypes and traits.

BayesMEModel

A Bayesian Approach to Correcting the Attenuation Bias of Regression Using Polygenic Risk Score.

MAJAR

A statistical model to assess the replicability of biomarkers.

UKin

UKin is an improved kinship estimation method which can reduce both bias and root mean square error (RMSE) in the estimation of genomic relationship matrix.

SuSiE²

Integration of expression QTLs with fine mapping via SuSiE.

TWASKnockoff

Knockoff procedure improves identification of candidate causal genes in conditional transcriptome-wide association studies.

cWAS

cWAS is a statistical framework to identify cell types whose genetically regulated proportions are associated with complex diseases.

REML-mediation

REML-mediation is a restricted-maximum-likelihood (REML)-based mediation analysis framework that adjusts for genetic confounding effects.

LDER-GE

LDER-GE improves the accuracy of estimating the phenotypic variance component explained by genome-wide G×E interactions using large-scale biobank association summary statistics.

BV-LDER-GE

BV-LDER-GE harnesses both correlations with additive genetic effects and full LD information to enhance the statistical power to detect genome-scale G×E interactions.

QTL

LORS

A Low-Rank representation and Sparse regression for eQTL mapping. This algorithm accounts for confounding factors such as unobserved covariates, experimental artifacts, and unknown environmental perturbations.

HBI

A hierarchical Bayesian interaction model to estimate cell-type-specific methylation quantitative trait loci.

CASE

CASE is an R package designed for multi-trait fine-mapping analysis, with a particular focus on single-cell eQTL fine-mapping.

TWAS

UTMOST

UTMOST (Unified Test for MOlecular SignaTures) is a principled method to perform cross-tissue expression imputation and gene-level association analysis.

T-GEN

T-GEN (Transcriptome-mediated identification of disease-associated Genes with Epigenetic aNnotation) is a framework to identify disease-associated genes leveraging epigenetic information.

Genomics

CosGeneGate

CosGeneGate selects multi-functional and credible biomarkers for single-cell analysis.

Geneverse

Geneverse: A collection of Open-source Multimodal Large Language Models for Genomic and Proteomic Research.

Pathway Analysis

Pathway Analysis using Random Forests

R code for pathway-based classification and regression using Random Forests.

COSINE

An R package (source) to extract the globally most discriminative sub-network from multiple gene expression data sets by integrating protein-protein interaction data.

Cancer Genomics

iPAC

iPAC (Identification of Protein Amino acid mutation Clustering) finds mutation clusters on the amino acid level while taking into account the protein structure.

GraphPAC

A Bioconductor R package for identifying mutational clusters of amino acids in a protein while utilizing the protein tertiary structure via a graph theoretical model.

GRAPE

GRAPE is a template method that allows for identification of perturbed pathways in individual tumor samples relative to a reference collection of samples (e.g., matched healthy tissue). GRAPE is sensitive to biological variability, robust to batch effects and can be applied to any gene expression platform.

SpacePAC

Identifies clustering of somatic mutations in proteins via a simulation approach while considering the protein's tertiary structure.

dcGSA

Distance-correlation based Gene Set Analysis for longitudinal gene expression profiles. In longitudinal studies, gene expression profiles are collected from each subject at each visit, so each subject has multiple measurements. The dcGSA package can be used to assess the associations between gene sets and clinical outcomes of interest by taking full advantage of the longitudinal nature of both the gene expression profiles and the clinical outcomes.

QuartPAC

Identifies clustering of somatic mutations in proteins over the entire quaternary structure.

Genetic Risk Prediction

EBPRS

EB-PRS is a novel method that leverages effect-size information across all markers to improve prediction accuracy. The method requires no parameter tuning and no external information. This R package calculates polygenic risk scores from the given training summary statistics and testing data. EB-PRS can be used to extract main information, estimate empirical Bayes parameters, derive polygenic risk scores for each individual in the testing data, and evaluate the PRS by AUC and predictive r².
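As an illustration of the last two steps only (scoring individuals and evaluating the score), here is a minimal sketch in plain Python. It does not reproduce the empirical Bayes estimation in EB-PRS. The function names `polygenic_scores`, `auc`, and `predictive_r2` are hypothetical, and the sketch assumes genotypes are coded as allele counts and that per-marker effect sizes have already been estimated.

```python
def polygenic_scores(genotypes, effect_sizes):
    """Score each individual as the effect-size-weighted sum of allele counts."""
    return [
        sum(g * b for g, b in zip(person, effect_sizes))
        for person in genotypes
    ]


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def predictive_r2(scores, phenotype):
    """Squared Pearson correlation between the score and a quantitative phenotype."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_y = sum(phenotype) / n
    cov = sum((s - mean_s) * (y - mean_y) for s, y in zip(scores, phenotype))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_y = sum((y - mean_y) ** 2 for y in phenotype)
    return cov * cov / (var_s * var_y)


if __name__ == "__main__":
    # Toy data: three individuals, two markers (made-up values).
    toy_genotypes = [[0, 1], [2, 0], [1, 1]]
    toy_effects = [0.5, -1.0]
    print(polygenic_scores(toy_genotypes, toy_effects))
```

AUC suits binary case-control outcomes, while predictive r² suits quantitative outcomes; which one applies depends on the phenotype in the testing data.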

SDPR

A fast and robust Bayesian nonparametric method for prediction of complex traits using GWAS summary statistics.

SDPRX

A statistical method for cross-population prediction of complex traits.

SDPR_admix

A statistical method to calculate PRS in admixed populations.

JointPRS

A statistical model for multi-population PRS calculation.

Single Cell

ResPAN

ResPAN: a powerful batch correction model for scRNA-seq data through residual adversarial networks.

scAAnet

Non-linear archetypal analysis of single-cell RNA-seq data by deep autoencoders.

MuSe-GNN

MuSe-GNN: Learning Unified Gene Representation From Multimodal Biological Graph Data.

scNAT

scNAT: a deep learning method for integrating paired single-cell RNA and T cell receptor sequencing profiles.

MARBLES

A Markov random field model-based approach for differentially expressed gene detection from single-cell RNA-seq data.

WES/WGS

M-DATA

A statistical model to jointly analyze de novo mutations for multiple traits.

N-DATA

A network-assisted model of de novo variants using protein-protein interaction information.

PERADIGM

PERADIGM: Phenotype Embedding Similarity-based Rare Disease Gene Mapping.

Others

ViRandomForests

Variable importance-weighted Random Forests (viRandomForests) is an R package that samples features according to their variable importance scores and then selects the best split from the sampled features, improving prediction accuracy in the presence of weak signals and high noise.
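The feature-sampling step described above can be sketched as follows. This is only an illustration of importance-weighted sampling without replacement, not the package's code. The function name `sample_features_by_importance` and the parameter `mtry` (the number of candidate features per split) are assumptions made for this example.

```python
import random


def sample_features_by_importance(importances, mtry, rng=None):
    """Draw `mtry` distinct feature indices, favoring high-importance features.

    Features are drawn one at a time with probability proportional to their
    (non-negative) importance among the features not yet chosen.
    """
    rng = rng or random.Random()
    weights = [max(w, 0.0) for w in importances]
    candidates = list(range(len(weights)))
    chosen = []
    for _ in range(min(mtry, len(candidates))):
        remaining = [weights[j] for j in candidates]
        if sum(remaining) > 0:
            pick = rng.choices(candidates, weights=remaining, k=1)[0]
        else:
            # All remaining importances are zero: fall back to uniform sampling.
            pick = rng.choice(candidates)
        chosen.append(pick)
        candidates.remove(pick)
    return chosen


if __name__ == "__main__":
    # Made-up importance scores for five features.
    print(sample_features_by_importance([0.05, 0.4, 0.1, 0.4, 0.05], mtry=2,
                                        rng=random.Random(0)))
```

A tree-growing routine would call this at each node and then search for the best split among the returned features only, in place of the uniform feature sampling used by a standard random forest.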

CorBin

We design algorithms whose time complexity is linear in the dimension for three commonly studied correlation structures (exchangeable, decaying-product, and K-dependent), and extend the algorithms to generate binary data with general non-negative correlation matrices in quadratic time.
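For intuition about the exchangeable case, here is a minimal sketch of a textbook mixture construction. It is not the CorBin algorithm: it assumes every coordinate has the same marginal probability `p`, and the function name `exchangeable_binary` is hypothetical. Each coordinate copies a shared Bernoulli draw with probability sqrt(rho) and is otherwise drawn independently, which gives pairwise correlation rho at a cost linear in the dimension per sample.

```python
import random


def exchangeable_binary(n, dim, p, rho, seed=None):
    """Generate n binary vectors of length dim with P(X_i = 1) = p and
    Corr(X_i, X_j) = rho for all i != j (requires 0 <= rho <= 1).
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= rho <= 1.0):
        raise ValueError("p and rho must lie in [0, 1]")
    rng = random.Random(seed)
    copy_prob = rho ** 0.5
    samples = []
    for _ in range(n):
        shared = 1 if rng.random() < p else 0
        row = []
        for _ in range(dim):
            if rng.random() < copy_prob:
                # Copy the shared draw; two coordinates both do so with
                # probability rho, which induces the correlation.
                row.append(shared)
            else:
                # Independent draw with the same marginal probability.
                row.append(1 if rng.random() < p else 0)
        samples.append(row)
    return samples


if __name__ == "__main__":
    data = exchangeable_binary(n=5, dim=4, p=0.3, rho=0.4, seed=0)
    for row in data:
        print(row)
```

The construction works because two coordinates are dependent only when both copy the shared draw, so their covariance is rho times p(1 - p), the variance of a single coordinate.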
