Run scGNN

scGNN.py is the main entry point of scGNN for imputation and clustering. It accepts a number of parameters that can be set to meet users' requirements.

Required

  • datasetName defines the folder of the scRNA-Seq data
  • LTMGDir defines the folder of the preprocessed LTMG output
  • outputDir defines the output folder for the results

Optional: Hyperparameters

  • EM-iteration defines the number of iterations, default is 10
  • Regu-epochs defines the number of epochs for the initial training of the Feature Autoencoder, default is 500
  • EM-epochs defines the number of epochs for the Feature Autoencoder within each iteration, default is 200
  • cluster-epochs defines the number of epochs for the Cluster Autoencoder, default is 200
  • k is the k of the K-Nearest-Neighbor Graph
  • knn-distance is the distance type used to build the K-Nearest-Neighbor Graph; supported types: euclidean/cosine/correlation (default: euclidean)
  • GAEepochs is the number of epochs to train the Graph Autoencoder
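
To make the roles of k and knn-distance concrete, here is a minimal pure-Python sketch of building a K-Nearest-Neighbor graph under the three supported distance types. It is an illustration only, not scGNN's own implementation: the function names (knn_graph, euclidean, cosine, correlation) and the toy cells are assumptions made for this example.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two expression vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def correlation(a, b):
    # 1 - Pearson correlation: cosine distance of the mean-centered vectors.
    # Assumes neither vector is constant.
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])

# Hypothetical lookup table mirroring the knn-distance choices.
DISTANCES = {"euclidean": euclidean, "cosine": cosine, "correlation": correlation}

def knn_graph(cells, k, distance="euclidean"):
    # Map each cell index to the indices of its k nearest other cells.
    dist = DISTANCES[distance]
    graph = {}
    for i, cell_i in enumerate(cells):
        others = [(dist(cell_i, cell_j), j)
                  for j, cell_j in enumerate(cells) if j != i]
        graph[i] = [j for _, j in sorted(others)[:k]]
    return graph

# Toy data: four cells, three genes each (made-up values).
cells = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.5], [9.0, 1.0, 0.5], [8.0, 1.5, 1.0]]
print(knn_graph(cells, k=1, distance="euclidean"))
```

A larger k connects each cell to more neighbors and gives a denser graph. The distance type changes which cells count as near: euclidean is sensitive to overall expression magnitude, while cosine and correlation compare the shape of the expression profile.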

Optional: Performance

  • quickmode sets whether to bypass the Cluster Autoencoder
  • useGAEembedding sets whether to use the Graph Autoencoder
  • regulized-type is the regularizer type: noregu/LTMG, default is to use LTMG
  • alphaRegularizePara is alpha in the manuscript, the intensity of the regularizer
  • EMregulized-type defines the imputation regularizer type: noregu/Graph/Celltype, default: Celltype
  • gammaImputePara defines the intensity of the LTMG regularizer in imputation
  • graphImputePara defines the intensity of the graph regularizer in imputation
  • celltypeImputePara defines the intensity of the cell type regularizer in imputation
  • L1Para defines the intensity of the L1 regularizer, default: 1.0
  • L2Para defines the intensity of the L2 regularizer, default: 0.0
  • saveinternal sets whether to output internal results for debugging

Optional: Speed

  • no-cuda defines the device to use. The default is GPU; add --no-cuda on the command line if you only have a CPU.
  • coresUsage defines how many cores can be used, default: 1. Increase this value to use more cores.
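
As described at the end of this section, coresUsage accepts either all or a number of cores. The sketch below shows one way such a value could be turned into a concrete core count. How scGNN handles this internally is not shown here; the helper name resolve_cores and the choice to cap the request at the machine's core count are assumptions made for illustration.

```python
import os

def resolve_cores(cores_usage="1"):
    # Hypothetical helper: turn a coresUsage-style value into an integer.
    available = os.cpu_count() or 1  # cpu_count() may return None
    if str(cores_usage).lower() == "all":
        return available
    requested = int(cores_usage)
    if requested < 1:
        raise ValueError("coresUsage must be 'all' or a positive integer")
    # Assumption: never ask for more cores than the machine has.
    return min(requested, available)

print(resolve_cores("1"))    # the documented default
print(resolve_cores("all"))  # every core on this machine
```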

Example:

CSV format

For CSV format, add --nonsparseMode to the command line.

Without LTMG:

python -W ignore scGNN.py --datasetName GSE138852 --datasetDir ./ --outputDir outputdir/ --EM-iteration 2 --Regu-epochs 50 --EM-epochs 20 --quickmode --nonsparseMode

(Optional) Using LTMG:

python -W ignore scGNN.py --datasetName GSE138852 --datasetDir ./ --LTMGDir ./ --outputDir outputdir/ --EM-iteration 2 --Regu-epochs 50 --EM-epochs 20 --quickmode --nonsparseMode --regulized-type LTMG

10X format

Without LTMG:

python -W ignore scGNN.py --datasetName 481193cb-c021-4e04-b477-0b7cfef4614b.mtx --datasetDir liver/ --outputDir outputdir/ --EM-iteration 2 --Regu-epochs 50 --EM-epochs 20 --quickmode

(Optional) Using LTMG:

python -W ignore scGNN.py --datasetName 481193cb-c021-4e04-b477-0b7cfef4614b.mtx --LTMGDir liver/ --datasetDir liver/ --outputDir outputdir/ --EM-iteration 2 --Regu-epochs 50 --EM-epochs 20 --quickmode --regulized-type LTMG

On these demo datasets, using a single CPU, the demo commands take about 33 min/26 min to run. With a full-length run, which takes about 6 hours on a single CPU, users should get exactly the same results as shown in the paper. To use multiple CPUs, set --coresUsage to all or to any number of cores the machine has.
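
The demo commands above can also be assembled from Python, which helps when scripting several runs. The following is a minimal sketch: the helper build_scgnn_command and its argument names are hypothetical, and it uses only flags that appear in the examples above.

```python
import shlex

def build_scgnn_command(dataset_name, dataset_dir, output_dir, ltmg_dir=None,
                        csv_format=False, em_iteration=2, regu_epochs=50,
                        em_epochs=20, quickmode=True):
    # Hypothetical helper: build the argument list for one scGNN.py demo run.
    cmd = ["python", "-W", "ignore", "scGNN.py",
           "--datasetName", dataset_name,
           "--datasetDir", dataset_dir]
    if ltmg_dir is not None:
        cmd += ["--LTMGDir", ltmg_dir]
    cmd += ["--outputDir", output_dir,
            "--EM-iteration", str(em_iteration),
            "--Regu-epochs", str(regu_epochs),
            "--EM-epochs", str(em_epochs)]
    if quickmode:
        cmd.append("--quickmode")
    if csv_format:
        # CSV input needs --nonsparseMode, as noted above.
        cmd.append("--nonsparseMode")
    if ltmg_dir is not None:
        cmd += ["--regulized-type", "LTMG"]
    return cmd

# Reproduces the CSV demo command without LTMG.
cmd = build_scgnn_command("GSE138852", "./", "outputdir/", csv_format=True)
print(shlex.join(cmd))
```

To launch the run, pass the list to subprocess.run(cmd, check=True) from the directory that contains scGNN.py. Passing a list instead of a single string avoids shell quoting problems with dataset names or paths.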