Run scGNN¶
The program scGNN.py is the main entry point of scGNN for imputation and clustering. It accepts a number of parameters that can be set to meet users’ requirements.
Required¶
- datasetName defines the folder of the scRNA-Seq data
- LTMGDir defines the folder of the preprocessed LTMG output
- outputDir defines the output folder of the results
Optional: Hyperparameters¶
- EM-iteration defines the number of iterations, default is 10
- Regu-epochs defines the number of epochs in the Feature Autoencoder initially, default is 500
- EM-epochs defines the number of epochs in the Feature Autoencoder during the iterations, default is 200
- cluster-epochs defines the number of epochs in the Cluster Autoencoder, default is 200
- k is the k of the K-Nearest-Neighbor graph
- knn-distance defines the distance type used to build the K-Nearest-Neighbor graph; supported types: euclidean/cosine/correlation (default: euclidean)
- GAEepochs defines the number of epochs to train the Graph Autoencoder
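To make the roles of k and knn-distance concrete, the following is a minimal sketch of building a K-Nearest-Neighbor graph over cells with the three supported distance types. It is an illustration only, not scGNN's own implementation: the function names (knn_graph, euclidean, cosine, correlation) and the toy expression vectors are assumptions introduced here, and the sketch assumes that no cell vector is all-zero or constant.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two expression vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    # Cosine distance: 1 - cosine similarity (assumes non-zero vectors).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def correlation(a, b):
    # Correlation distance: cosine distance of the mean-centered vectors
    # (assumes neither vector is constant).
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])

# Hypothetical lookup table mirroring the documented knn-distance choices.
DISTANCES = {"euclidean": euclidean, "cosine": cosine, "correlation": correlation}

def knn_graph(cells, k, distance="euclidean"):
    # Map each cell index to the indices of its k nearest other cells.
    dist = DISTANCES[distance]
    graph = {}
    for i, cell_i in enumerate(cells):
        others = [(dist(cell_i, cell_j), j)
                  for j, cell_j in enumerate(cells) if j != i]
        others.sort()
        graph[i] = [j for _, j in others[:k]]
    return graph

# Toy data: four cells with three genes each (made up for illustration).
cells = [[0.0, 1.0, 2.0], [0.1, 1.1, 2.1], [5.0, 0.0, 1.0], [5.2, 0.1, 0.9]]
graph = knn_graph(cells, k=1, distance="euclidean")
print(graph)
```

A larger k connects each cell to more neighbors, and the choice of distance changes which cells count as close: euclidean is sensitive to expression magnitude, while cosine and correlation compare expression patterns.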
Optional: Performance¶
- quickmode defines whether to bypass the Cluster Autoencoder
- useGAEembedding defines whether to use the Graph Autoencoder
- regulized-type defines the regularizer type: noregu/LTMG, default is to use LTMG
- alphaRegularizePara is alpha in the manuscript, the intensity of the regularizer
- EMregulized-type defines the imputation regularizer type: noregu/Graph/Celltype, default: Celltype
- gammaImputePara defines the intensity of the LTMG regularizer in imputation
- graphImputePara defines the intensity of the graph regularizer in imputation
- celltypeImputePara defines the intensity of the celltype regularizer in imputation
- L1Para defines the intensity of the L1 regularizer, default: 1.0
- L2Para defines the intensity of the L2 regularizer, default: 0.0
- saveinternal defines whether to output internal results for debugging
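As a rough illustration of what the L1Para and L2Para intensities control, the sketch below computes a generic weighted L1 plus L2 penalty. This is the textbook form of such a penalty, not code taken from scGNN; which quantities scGNN actually penalizes is not stated here, and the function name weighted_penalty is an assumption made for this example.

```python
def weighted_penalty(values, l1_para=1.0, l2_para=0.0):
    # Generic penalty: l1_para * sum(|v|) + l2_para * sum(v^2).
    # The default intensities mirror the documented defaults (1.0 and 0.0).
    l1_term = sum(abs(v) for v in values)
    l2_term = sum(v * v for v in values)
    return l1_para * l1_term + l2_para * l2_term

values = [1.0, -2.0, 0.5]
default_penalty = weighted_penalty(values)           # L1 only
l2_only_penalty = weighted_penalty(values, 0.0, 1.0)  # L2 only
print(default_penalty, l2_only_penalty)
```

With the documented defaults, only the L1 term contributes; raising L2Para above 0.0 adds the squared term on top of it.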
Optional: Speed¶
- no-cuda defines the device in use. The default is to use the GPU; add --no-cuda to the command line if you only have a CPU.
- coresUsage defines how many cores can be used, default: 1. Change this value if you want to use more.
Example:¶
CSV format¶
For CSV format, we need to add --nonsparseMode
Without LTMG:
python -W ignore scGNN.py --datasetName GSE138852 --datasetDir ./ --outputDir outputdir/ --EM-iteration 2 --Regu-epochs 50 --EM-epochs 20 --quickmode --nonsparseMode
(Optional) Using LTMG:
python -W ignore scGNN.py --datasetName GSE138852 --datasetDir ./ --LTMGDir ./ --outputDir outputdir/ --EM-iteration 2 --Regu-epochs 50 --EM-epochs 20 --quickmode --nonsparseMode --regulized-type LTMG
10X format¶
Without LTMG:
python -W ignore scGNN.py --datasetName 481193cb-c021-4e04-b477-0b7cfef4614b.mtx --datasetDir liver/ --outputDir outputdir/ --EM-iteration 2 --Regu-epochs 50 --EM-epochs 20 --quickmode
(Optional) Using LTMG:
python -W ignore scGNN.py --datasetName 481193cb-c021-4e04-b477-0b7cfef4614b.mtx --LTMGDir liver/ --datasetDir liver/ --outputDir outputdir/ --EM-iteration 2 --Regu-epochs 50 --EM-epochs 20 --quickmode --regulized-type LTMG
On these demo datasets, using a single CPU, the running time of the demo code is ~33min/26min. Users should get exactly the same results as shown in the paper with the full run, which takes ~6 hours on a single CPU. To use multiple CPUs, the parameter --coresUsage can be set to all or to any number of cores the machine has.
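The four demo commands differ only in the input format flag, the optional LTMG arguments, and the dataset location, so they can be assembled programmatically. The following is a small sketch of such a helper; build_scgnn_command and its arguments are hypothetical names introduced for this example and are not part of scGNN. It only emits flags that appear in the commands and parameter lists above and does not launch scGNN itself.

```python
import shlex

def build_scgnn_command(dataset_name, dataset_dir, output_dir,
                        ltmg_dir=None, csv_format=False, cores_usage=None):
    # Assemble the demo command as an argument list (hypothetical helper).
    cmd = ["python", "-W", "ignore", "scGNN.py",
           "--datasetName", dataset_name,
           "--datasetDir", dataset_dir]
    if ltmg_dir is not None:
        cmd += ["--LTMGDir", ltmg_dir]
    # Reduced demo settings taken from the example commands.
    cmd += ["--outputDir", output_dir,
            "--EM-iteration", "2",
            "--Regu-epochs", "50",
            "--EM-epochs", "20",
            "--quickmode"]
    if csv_format:
        # CSV input needs the non-sparse mode flag.
        cmd.append("--nonsparseMode")
    if ltmg_dir is not None:
        cmd += ["--regulized-type", "LTMG"]
    if cores_usage is not None:
        # Either a number of cores or the string "all".
        cmd += ["--coresUsage", str(cores_usage)]
    return cmd

# CSV demo without LTMG, and the same demo with LTMG on all cores.
csv_cmd = build_scgnn_command("GSE138852", "./", "outputdir/", csv_format=True)
ltmg_cmd = build_scgnn_command("GSE138852", "./", "outputdir/",
                               ltmg_dir="./", csv_format=True,
                               cores_usage="all")
command_line = shlex.join(csv_cmd)
print(command_line)
```

The returned list can be passed to subprocess.run from the directory containing scGNN.py, or joined with shlex.join (Python 3.8+) to obtain a shell command line.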