TSUNAMI:

Translational Bioinformatics Tool SUite for Network Analysis and MIning

Introduction
Gene co-expression network (GCN) mining aims to identify gene modules with highly correlated expression profiles across sample cohorts. It may help reveal latent molecular interactions, identify novel gene functions, pathways, and drug targets, and provide insights into disease mechanisms for biological researchers. TSUNAMI was developed to allow biological researchers with no programming background to perform GCN mining themselves. Its main features and advantages are:
  • User-friendly, easily accessible web interface for real-time co-expression network mining;
  • Direct access to and search of the GEO database, as well as support for user-uploaded expression matrices;
  • Support for multiple data formats, with a bundled data preprocessing interface;
  • Multiple co-expression analysis tools with highly flexible variable selection;
  • Integrated downstream Enrichr GO enrichment analysis, with links to other GO tools;
  • All results can be downloaded in multiple formats (CSV, txt, etc.).
Together, these features make the tool convenient for a range of research purposes.
Pipeline Flowchart
TSUNAMI Flowchart
Figure above: TSUNAMI flowchart. Blue blocks represent operations; pink rounded rectangles represent download processes; dashed arrows indicate optional processes.



Overview

Gene Expression Omnibus
GEO is a public functional genomics data repository supporting MIAME-compliant data submissions. Numerous array- and sequence-based data are available for downstream analysis.
Statistics



Database
Example GSE Microarray Data: GSE17537; GSE88882; GSE98761; GSE40294; GSE73119; GSE31399; GSE21361; GSE13002; GSE4309; GSE61084; GSE61085.
Example Single-cell RNA-seq Data: GSE59739_DataTable
Table of illuminahiseq rnaseqv2 RSEM genes normalized mRNA-seq data

File uploader

Note: maximum file size allowed for uploading is 300 MB. If data is uploaded from a .xlsx or .xls file, separator can be any value, but please make sure data are located in Sheet1.

Data Summary

                    
Data Preview
Verify starting column and row of expression data:
Choose starting column and row for expression data. Default values when leaving the input boxes blank: starting row = 1, starting column = 2.
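As a rough illustration of what the starting row and column mean, the sketch below slices a small table using 1-based positions and the defaults stated above. The function name extract_expression, the assumption that the header row has already been read as column names, and the assumption that gene identifiers sit in the first column are all hypothetical; this is not the server's own code.

```python
def extract_expression(table, start_row=1, start_col=2):
    """Return (gene_ids, values) from a table, using 1-based start positions."""
    # Blank inputs (None) fall back to the defaults: row 1, column 2.
    start_row = 1 if start_row is None else start_row
    start_col = 2 if start_col is None else start_col
    rows = table[start_row - 1:]
    # Assumption: the first column holds the gene or probe identifier.
    gene_ids = [row[0] for row in rows]
    values = [[float(x) for x in row[start_col - 1:]] for row in rows]
    return gene_ids, values

table = [
    ["GeneA", "1.0", "2.0", "3.0"],
    ["GeneB", "4.0", "5.0", "6.0"],
]
ids, vals = extract_expression(table, start_row=None, start_col=None)
```

With the defaults, every row is kept and the numeric values start at the second column.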
Convert probe ID to gene symbol:
Convert probe ID to gene symbol with identified platform (optional for self-uploaded data): Be sure to verify (modify) gene symbol.

Remove genes:
Remove rows (genes) whose mean expression value across all samples falls in the lowest percentile. Then remove rows whose variance across samples falls in the lowest percentile. Default value when the input boxes are left blank: 0.
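The two-step filter can be sketched as follows. This is a minimal illustration, not the server's implementation: the function names, the simple rank-based percentile cutoff, and the use of population variance are assumptions made for the example.

```python
import statistics

def percentile_cutoff(values, pct):
    """Value below which roughly pct percent of `values` fall (simple rank rule)."""
    if not pct or not values:  # 0 or blank: nothing is removed
        return float("-inf")
    ordered = sorted(values)
    k = min(int(len(ordered) * pct / 100), len(ordered) - 1)
    return ordered[k]

def filter_genes(expr, mean_pct=0, var_pct=0):
    """expr maps gene symbol -> list of expression values, one per sample."""
    # Step 1: drop genes whose mean expression is in the lowest mean_pct percent.
    means = {g: statistics.fmean(v) for g, v in expr.items()}
    cut = percentile_cutoff(list(means.values()), mean_pct)
    kept = {g: v for g, v in expr.items() if means[g] >= cut}
    # Step 2: among the remaining genes, drop the lowest var_pct percent by variance.
    variances = {g: statistics.pvariance(v) for g, v in kept.items()}
    cut = percentile_cutoff(list(variances.values()), var_pct)
    return {g: v for g, v in kept.items() if variances[g] >= cut}

expr = {
    "g1": [0, 0, 0, 0],   # low mean, zero variance
    "g2": [5, 5, 5, 5],   # zero variance
    "g3": [1, 9, 1, 9],
    "g4": [2, 8, 2, 8],
}
filtered = filter_genes(expr, mean_pct=25, var_pct=50)
```

Leaving both percentiles at 0 keeps every gene, which matches the stated default.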
Select samples subgroup
Select/Deselect all

lmQCM: An Algorithm for Detecting Weak Quasi-Cliques in Weighted Graph

If you benefit from the results, please cite: Zhang, Jie, and Kun Huang. "Normalized lmQCM: An Algorithm for Detecting Weak Quasi-Cliques in Weighted Graph with Applications in Gene Co-Expression Module Discovery in Cancers." Cancer Informatics 13 (2014): CIN-S14021.
Parameter Choosing
Gamma (γ) (Default = 0.7, Recommended: 0.70 ~ 0.75) controls the threshold for initiating each new module. Lambda (λ) (Default = 1) and t (Default = 1) define the adaptive threshold on module density, which provides the stopping criterion for the greedy search for each module (usually λ and t do not need to be changed). Beta (β) (Default = 0.4) is the overlap-ratio threshold for merging. Weight normalization normalizes the correlation matrix (Default: not selected); we recommend enabling it when the expression data come from microarrays.
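To give a feel for what the β threshold does, the sketch below merges gene modules whose overlap ratio reaches β. It is only an illustration under stated assumptions: the overlap ratio is taken here as the number of shared genes divided by the size of the smaller module, modules are processed largest first, and the function names are hypothetical. The lmQCM package may differ in its details.

```python
def overlap_ratio(a, b):
    """Shared genes divided by the size of the smaller module (assumed definition)."""
    a, b = set(a), set(b)
    return len(a & b) / min(len(a), len(b))

def merge_modules(modules, beta=0.4):
    """Greedily merge non-empty modules, largest first, when overlap ratio >= beta."""
    merged = []
    for module in sorted((set(m) for m in modules), key=len, reverse=True):
        for target in merged:
            if overlap_ratio(module, target) >= beta:
                target |= module  # absorb the smaller module into the larger one
                break
        else:
            merged.append(module)  # no sufficiently overlapping module found
    return merged

modules = [{"A", "B", "C", "D"}, {"C", "D", "E"}, {"X", "Y"}]
result = merge_modules(modules, beta=0.4)
```

In this toy input, the second module shares two of its three genes with the first (ratio about 0.67), so the two are merged, while the third module stays separate.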





WGCNA: An R package for weighted correlation network analysis

If you benefit from the results, please cite:
The WGCNA analysis method is described in: Zhang B and Horvath S (2005) A General Framework for Weighted Gene Co-Expression Network Analysis. Statistical Applications in Genetics and Molecular Biology: Vol. 4: No. 1, Article 17. PMID: 16646834.
The package implementation is described in: Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008, 9:559.
Step 1: Pick Soft Thresholding
The soft threshold is the power to which the gene-gene correlations are raised. The assumption is that raising the correlations to a power reduces the noise in the adjacency matrix. To pick a threshold, use the pickSoftThreshold function, which calculates, for each candidate power, how closely the resulting network resembles a scale-free graph. Use the power that produces the highest similarity to a scale-free network.
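The power transform itself can be illustrated in a few lines. The sketch below assumes an unsigned network, where the adjacency is the absolute correlation raised to the chosen power. It does not reproduce the scale-free fit that pickSoftThreshold performs, and the function name soft_threshold is hypothetical.

```python
def soft_threshold(cor, power=6):
    """Unsigned adjacency: a_ij = |cor_ij| ** power."""
    return [[abs(c) ** power for c in row] for row in cor]

# Toy correlation matrix for three genes.
cor = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, -0.3],
    [0.2, -0.3, 1.0],
]
adj = soft_threshold(cor, power=6)

# Strong correlations survive, weak ones shrink toward zero.
strong = adj[0][1]  # 0.9 ** 6
weak = adj[0][2]    # 0.2 ** 6
```

A correlation of 0.9 keeps an adjacency above 0.5, while a correlation of 0.2 drops to well below 0.001, which is the noise suppression the text describes.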

Step 2: Choose Parameters
Choose the power and the remaining parameters. Defaults are as shown.
  • power (β, Default = 6): The soft threshold. 6 is usually large enough that the resulting network exhibits approximate scale-free topology.
  • reassignThreshold (Default = 0): P-value ratio threshold for reassigning genes between modules.
  • mergeCutHeight (Default = 0.25): Dendrogram cut height for module merging.
  • minModuleSize (Default = 10): Minimum module size for module detection.




Final Incoming Data

You can verify the final incoming data and also download it. Download Final Data (CSV)
GO Enrichment Analysis for all of the following genes. Warning: Directly processing a large number of genes may make the GO analysis very slow. We suggest performing co-expression clustering first and running GO analysis on a small number of genes.
Circos Plot. When finished, go to the "4. Result" Circos plots section. We strongly recommend cleaning the gene symbols first in the Data Preprocessing section. Uncleaned symbols such as RBM|123 cannot be found in the hg38 database, even though RBM is expected to be there.

Data Preview

Download Results

Download
Download

Preview

Survival Analysis

Select the row of the eigengene matrix to use for survival analysis. Samples are dichotomized into two groups at the median value of the selected eigengene.
Copy and paste the following information in the same order as the sample IDs (the column names in the eigengene matrix above). Note: the separator (space, comma, new line, tab, or semicolon) is identified automatically.
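The two steps above, reading pasted values with any of the listed separators and splitting samples at the median, can be sketched as follows. The function names are hypothetical, and assigning values equal to the median to the "low" group is an assumption; the server may handle ties differently.

```python
import re
import statistics

def parse_pasted(text):
    """Split pasted values on spaces, commas, new lines, tabs, or semicolons."""
    return [tok for tok in re.split(r"[\s,;]+", text.strip()) if tok]

def dichotomize(eigengene):
    """Label each sample 'high' or 'low' relative to the median eigengene value."""
    median = statistics.median(eigengene)
    # Assumption: values equal to the median go to the 'low' group.
    return ["high" if value > median else "low" for value in eigengene]

eigengene = [0.12, -0.30, 0.45, -0.05]  # one row of the eigengene matrix
survival_times = [float(x) for x in parse_pasted("12, 30;45\t60")]
groups = dichotomize(eigengene)
```

The pasted values must line up with the sample order of the eigengene row, so that each survival time is paired with the right group label.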

Circos Plot

Parameters of Circos Plot

You can directly use your own data here without any previous operation.

Enrichment Analysis - by Enrichr

Adjusted P-value (q-value):
The q-value is an adjusted p-value using the Benjamini-Hochberg method for correction for multiple hypotheses testing. Users can read more about this method, and why it is needed here: Yoav Benjamini and Yosef Hochberg. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) Vol. 57, No. 1 (1995), pp. 289-300
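For readers who want to see the adjustment concretely, here is a minimal sketch of the Benjamini-Hochberg procedure in plain Python. The function name is hypothetical, and Enrichr computes its q-values on its own servers; this only illustrates the method.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

pvals = [0.01, 0.04, 0.03, 0.005]
qvals = benjamini_hochberg(pvals)
```

Each p-value is scaled by the number of tests divided by its rank, and the running minimum keeps a smaller p-value from receiving a larger q-value than the one ranked after it.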
Relationship between P-value, Z-score, and combined score:
The combined score combines the p-value and the z-score by multiplying the log of the p-value by the z-score: c = ln(p) * z, where c is the combined score, p is the p-value computed using Fisher's exact test, and z is the z-score computed to assess the deviation from the expected rank. The combined score provides a compromise between both methods, and in several benchmarks we show that it reports the best rankings when compared with the other scoring schemes.
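The formula translates directly into code. The sketch below is only an illustration of c = ln(p) * z; the function name is hypothetical and the inputs are made-up values, not Enrichr output.

```python
import math

def combined_score(p_value, z_score):
    """c = ln(p) * z, for a p-value in (0, 1]."""
    return math.log(p_value) * z_score

# Made-up inputs: a p-value of e^-2 and a z-score of -1.5.
score = combined_score(math.exp(-2), -1.5)
```

Because ln(p) is negative for p below 1, a negative z-score gives a positive combined score, and the score grows as the p-value shrinks.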

Download Results & Further Analysis

Download

The target gene symbols can be copied and used in other GO analysis websites:
ToppGene
DAVID
Enrichr

Tutorial

Google Slides

Presentation

Google Slides

Github

https://github.com/huangzhii/TSUNAMI

Report Bugs

https://github.com/huangzhii/TSUNAMI/issues/

Frequently Asked Questions

General Questions

What is the TSUNAMI website?

The TSUNAMI (Translational Bioinformatics Tool Suite for Network Analysis and Mining) was developed at Indiana University School of Medicine.

How do I get started?

Please refer to our tutorial.

Can I use TSUNAMI to analyze my data from my mobile devices?

Yes you can!


The TSUNAMI website uses responsive web design and is compatible with mobile devices. Every process, analysis, and computation is performed on the server, not in your mobile browser. File uploading also works on a phone, even while you are waiting for a bus.

News

April 24, 2018

  • Updated the offline GEO data list as of 04/24/2018.
  • Fixed a bug when percentiles are 0 or NULL.
  • Moved the platform converter to the right sidebar.
  • Added a conditional panel on Advanced Data Preprocessing.

March 27, 2018

  • Modified text.
  • Added footer.
  • Updated pipeline flowchart.
  • Updated funding information.
  • Added a Google Slides tutorial talk.

March 20, 2018

  • R package 'lmQCM' was released to CRAN.
  • Created flowchart.

March 16, 2018

  • Renamed our website as TSUNAMI.
  • Fixed various bugs.

March 02, 2018

  • First prototype platform has been deployed.

About Us

The TSUNAMI (Translational Bioinformatics Tool Suite for Network Analysis and Mining) was developed at Indiana University School of Medicine. This user-friendly implementation of the TSUNAMI pipeline provides a comprehensive tool suite for studying gene interactions, from the raw transcriptomic data level all the way to the gene ontology level, with just a few button clicks.

Our Other Software

lmQCM
R package: lmQCM

annoPeak
annoPeakR: a web-tool to annotate, visualize and compare peak sets from ChIP-seq/ChIP-exo

iGenomicsR
iGenomicsR: An integrative platform to explore, visualize, and analyze multidimensional genomics data for disease

Circos Viewer
Circos Viewer: A Circos Plot Viewer.

iGPSe Plus
iGPSe Plus: Integrative Genomic based Cancer Patient Stratification

Development Team

Prof. Kun Huang's Laboratory
  • Zhi Huang
  • Zhi Han
  • Jie Zhang
  • Kun Huang

Publications

  • -

Funding for TSUNAMI is or has been provided by:

  • Partially supported by IUSM startup fund, the NCI ITCR U01 (CA188547).
  • Data Science and Bioinformatics Program for Precision Health Initiative, Indiana University.

TSUNAMI Version v1.8 | IUSM | RI

Questions and feedback: zhihuan@iu.edu | Report Issue | Github