A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of genomic conservation measures and biochemical annotations.
Tissue-specific functional annotation through integrative analysis of Roadmap epigenomic data.
GenoSkyline-Plus is a comprehensive update of GenoSkyline that incorporates more annotation data into the framework and extends to 127 integrated annotation tracks covering a spectrum of human tissue and cell types.
GWAS.PC (GWAS Power Calculation) is an R package (Code) that performs power analysis for genome-wide association studies. In particular, the power calculation accounts for genotyping error.
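As a rough illustration of what a GWAS power calculation involves, the sketch below computes the power of a two-sided single-SNP test using a normal approximation. It is not the GWAS.PC code: it assumes an additive model for a quantitative trait under Hardy-Weinberg equilibrium, and it ignores genotyping error, which GWAS.PC is described as accounting for. The function name `gwas_power` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def gwas_power(n, maf, beta, sd=1.0, alpha=5e-8):
    """Approximate power of a two-sided single-SNP association test.

    Assumptions (illustrative only): additive effect `beta` per allele,
    quantitative trait with residual standard deviation `sd`,
    Hardy-Weinberg equilibrium, and no genotyping error.
    """
    std_normal = NormalDist()
    # Critical value for the chosen genome-wide significance level.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected z-score: effect size divided by its approximate standard error.
    ncp = beta * sqrt(2 * maf * (1 - maf) * n) / sd
    # Probability that |Z| exceeds the critical value when Z ~ N(ncp, 1).
    return std_normal.cdf(ncp - z_crit) + std_normal.cdf(-ncp - z_crit)
```

Under these assumptions, power grows with sample size, allele frequency (up to 0.5), and effect size; with `beta = 0` the function returns the significance level itself.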
Post-GWAS prioritization through integrated analysis of GWAS summary statistics and GenoCanyon genomic functional annotation.
A statistical approach to prioritizing GWAS results by integrating pleiotropy information and annotation data.
This is an R package (Code Example) implementing a post-GWAS prioritization algorithm that incorporates rewiring information from co-expression networks to prioritize GWAS signals.
This is a program that implements the Markov random field (MRF) method to incorporate pathway topology into genome-wide association studies. (Example.R, fun_network.R, network.csv, pval.txt)
Local genetic correlation analysis reveals heterogeneous etiologic sharing of complex traits.
Estimating genetic correlation jointly using individual-level and summary-level GWAS data.
Estimating correlation between composite phenotypes and traits.
A Bayesian Approach to Correcting the Attenuation Bias of Regression Using Polygenic Risk Score.
A statistical model to assess the replicability of biomarkers.
UKin is an improved kinship estimation method that reduces both bias and root mean square error (RMSE) in the estimation of the genomic relationship matrix.
Integration of expression QTLs with fine mapping via SuSiE.
Knockoff procedure improves identification of candidate causal genes in conditional transcriptome-wide association studies.
cWAS is a statistical framework to identify cell types whose genetically regulated proportions are associated with complex diseases.
REML-mediation is a restricted maximum likelihood (REML)-based mediation analysis framework that adjusts for genetic confounding effects.
LDER-GE improves the accuracy of estimating the phenotypic variance component explained by genome-wide gene-environment (GE) interactions using large-scale biobank association summary statistics.
BV-LDER-GE harnesses both correlations with additive genetic effects and full LD information to enhance the statistical power to detect genome-scale G×E interactions.
A Low-Rank representation and Sparse regression for eQTL mapping. This algorithm accounts for confounding factors such as unobserved covariates, experimental artifacts, and unknown environmental perturbations.
A hierarchical Bayesian interaction model to estimate cell-type-specific methylation quantitative trait loci.
CASE is an R package designed for multi-trait fine-mapping analysis, with a particular focus on single-cell eQTL fine-mapping.
UTMOST (Unified Test for MOlecular SignaTures) is a principled method to perform cross-tissue expression imputation and gene-level association analysis.
T-GEN (Transcriptome-mediated identification of disease-associated Genes with Epigenetic aNnotation) is a framework to identify disease-associated genes leveraging epigenetic information.
CosGeneGate selects multi-functional and credible biomarkers for single-cell analysis.
Geneverse: A collection of Open-source Multimodal Large Language Models for Genomic and Proteomic Research.
R code for pathway-based classification and regression using Random Forests.
iPAC (Identification of Protein Amino acid mutation Clustering) finds mutation clusters on the amino acid level while taking into account the protein structure.
A Bioconductor R package for identifying mutational clusters of amino acids in a protein while utilizing the protein tertiary structure via a graph-theoretical model.
GRAPE is a template method that allows for identification of perturbed pathways in individual tumor samples relative to a reference collection of samples (e.g., matched healthy tissue). GRAPE is sensitive to biological variability, robust to batch effects and can be applied to any gene expression platform.
Identifies clustering of somatic mutations in proteins via a simulation approach while considering the protein's tertiary structure.
Distance-correlation-based Gene Set Analysis (dcGSA) for longitudinal gene expression profiles. In longitudinal studies, gene expression profiles are collected from each subject at each visit, so each subject has multiple measurements. The dcGSA package can be used to assess associations between gene sets and clinical outcomes of interest by taking full advantage of the longitudinal nature of both the gene expression profiles and the clinical outcomes.
Identifies clustering of somatic mutations in proteins over the entire quaternary structure.
EB-PRS is a novel method that leverages effect-size information across all markers to improve prediction accuracy. It requires no parameter tuning and no external information. This R package calculates polygenic risk scores (PRS) from the given training summary statistics and testing data. EB-PRS can be used to extract the main information, estimate the empirical Bayes parameters, derive a polygenic risk score for each individual in the testing data, and evaluate the PRS by AUC and predictive r².
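The last two steps described above (scoring individuals and evaluating the scores) can be sketched generically as follows. This is not the EB-PRS implementation: the function names are hypothetical, the effect sizes are taken as given rather than estimated by empirical Bayes, AUC is computed as the fraction of case-control pairs ranked correctly, and predictive r² is taken to be the squared Pearson correlation, which may differ from the package's definition.

```python
def polygenic_scores(genotypes, effects):
    """PRS per individual: weighted sum of allele counts (rows = individuals)."""
    return [sum(g * b for g, b in zip(row, effects)) for row in genotypes]


def auc(scores, labels):
    """Fraction of case-control pairs in which the case has the higher score."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5  # ties count as half a win
    return wins / (len(cases) * len(controls))


def predictive_r2(scores, outcome):
    """Squared Pearson correlation between the scores and the outcome."""
    n = len(scores)
    ms = sum(scores) / n
    mo = sum(outcome) / n
    cov = sum((s - ms) * (o - mo) for s, o in zip(scores, outcome))
    var_s = sum((s - ms) ** 2 for s in scores)
    var_o = sum((o - mo) ** 2 for o in outcome)
    return cov * cov / (var_s * var_o)
```

The pairwise AUC loop is quadratic in sample size and is meant only to make the definition explicit; a rank-based formula would be preferable for large testing sets.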
A fast and robust Bayesian nonparametric method for prediction of complex traits using GWAS summary statistics.
A statistical method for cross-population prediction of complex traits.
A statistical method to calculate PRS in admixed populations.
A statistical model for multi-population PRS calculation.
ResPAN: a powerful batch correction model for scRNA-seq data through residual adversarial networks.
Non-linear archetypal analysis of single-cell RNA-seq data by deep autoencoders.
MuSe-GNN: Learning Unified Gene Representation From Multimodal Biological Graph Data.
scNAT: a deep learning method for integrating paired single-cell RNA and T cell receptor sequencing profiles.
A Markov random field model-based approach for differentially expressed gene detection from single-cell RNA-seq data.
Variable importance-weighted Random Forests (viRandomForests) is an R package that samples features according to their variable importance scores and then selects the best split from the sampled features, improving prediction accuracy in the presence of weak signals and high noise.
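The feature-sampling step can be illustrated with a small sketch. It is not the viRandomForests code: it assumes that, at each split, a fixed number of candidate features is drawn without replacement with probability proportional to a positive importance score. The function name `sample_features` and the parameter `mtry` are hypothetical.

```python
import random


def sample_features(importance, mtry, rng):
    """Draw up to `mtry` distinct feature indices, weighted by importance.

    `importance` must contain positive scores; features are drawn one at a
    time, and each drawn feature is removed before the next draw.
    """
    candidates = list(range(len(importance)))
    weights = [float(w) for w in importance]
    chosen = []
    for _ in range(min(mtry, len(candidates))):
        # Pick a position among the remaining candidates, proportional to weight.
        pick = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(pick))
        weights.pop(pick)
    return chosen
```

A tree-growing routine would then search for the best split only among the returned features, so informative features are considered more often than under the uniform sampling used by standard random forests.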
We design algorithms for generating correlated binary data whose time complexity is linear in the dimension for three commonly studied correlation structures (exchangeable, decaying-product, and K-dependent), and extend these algorithms to generate binary data with general non-negative correlation matrices in quadratic time.
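For the exchangeable case, one well-known linear-time construction mixes a shared Bernoulli draw with independent ones. The sketch below uses that construction for illustration and is not necessarily the algorithm in this software; it assumes all coordinates share the same marginal probability `p` and a common non-negative pairwise correlation `rho`. The function name is hypothetical.

```python
import random


def exchangeable_binary(dim, p, rho, rng):
    """One binary vector with marginal P(Y_i = 1) = p and corr(Y_i, Y_j) = rho.

    Each coordinate copies a shared Bernoulli(p) draw with probability
    sqrt(rho) and otherwise takes its own independent Bernoulli(p) draw.
    Two coordinates both copy the shared draw with probability rho, which
    yields the target correlation. Cost is linear in `dim`.
    """
    a = rho ** 0.5
    shared = 1 if rng.random() < p else 0
    out = []
    for _ in range(dim):
        if rng.random() < a:
            out.append(shared)
        else:
            out.append(1 if rng.random() < p else 0)
    return out
```

To see why this works: the covariance between two coordinates is rho * p * (1 - p), because they agree through the shared draw with probability rho and are independent otherwise, and dividing by the common variance p * (1 - p) gives rho.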
© 2022 Hongyu Zhao, Ph.D.
Created by Eddie, Chen and Wangjie.