GCS

The Graph Clustering System (GCS) is a free platform that provides graph clustering analysis, change-point detection, and curve fitting based on the Clustering-Segmented Autoregressive Sigmoid (CSAS) model. The platform reproduces the data analysis results that the paper reports for multiple datasets, and it also offers the corresponding data analysis tools to users. For more details about the data and code in the paper, please visit Meiqian-Chen/GraphCpClust.

Paper

Exploring the space-time pattern of log-transformed infectious count of COVID-19: a clustering-segmented autoregressive sigmoid model

Abstract: At the end of April 20, 2020, there were only a few new COVID-19 cases remaining in China, whereas the rest of the world had shown increases in the number of new cases. It is of extreme importance to develop an efficient statistical model of COVID-19 spread, which could help in the global fight against the virus. We propose a clustering-segmented autoregressive sigmoid (CSAS) model to explore the space-time pattern of the log-transformed infectious count. Four key characteristics are included in this CSAS model, including unknown clusters, change points, stretched S-curves, and autoregressive terms, in order to understand how this outbreak is spreading in time and in space, to understand how the spread is affected by epidemic control strategies, and to apply the model to updated data from an extended period of time. We propose a nonparametric graph-based clustering method for discovering dissimilarity of the curve time series in space, which is justified with theoretical support to demonstrate how the model works under mild and easily verified conditions. We propose a very strict purity score that penalizes overestimation of clusters. Simulations show that our nonparametric graph-based clustering method is faster and more accurate than the parametric clustering method regardless of the size of data sets. We provide a Bayesian information criterion (BIC) to identify multiple change points and calculate a confidence interval for a mean response. By applying the CSAS model to the collected data, we can explain the differences between prevention and control policies in China and selected countries.

Clustering results shown in the paper for the log-transformed infection counts in China and in 33 selected countries.

Change-point estimates and fitted curves shown in the paper for the log-transformed infection counts in China and in 33 selected countries.

COVID-19 datasets

This part shows graph clustering analysis, change-point detection, and curve fitting for three COVID-19 datasets (Our World in Data, WHO, Wuhan-2019-nCoV). Only the last dataset, Wuhan-2019-nCoV, is analyzed in our paper.

Graph Clustering

You can select a time interval and a dataset to display the clustering, change-point, and fitting results. Note that GCS log-transforms the infectious counts before the analysis. In the returned results, country names are replaced with their ISO alpha-2 codes. For more details, please see the Supplemental document.
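The preprocessing described above can be sketched in a few lines. This is only an illustration, not the GCS implementation: the helper names `log_transform` and `relabel`, the three-entry lookup table, and the sample counts are all hypothetical, and the `+1` offset inside the logarithm is an assumption made here so that zero counts stay finite (this page does not state how GCS treats zeros).

```python
import math

# Hypothetical excerpt of a country name -> ISO 3166-1 alpha-2 lookup;
# a real table would cover every country in the dataset.
ISO_ALPHA2 = {"China": "CN", "Canada": "CA", "Italy": "IT"}

def log_transform(counts):
    """Return log(1 + c) for each count (the +1 offset is an assumption)."""
    return [math.log1p(c) for c in counts]

def relabel(series_by_country):
    """Log-transform each series and key it by alpha-2 code.

    Names missing from the lookup table are left unchanged.
    """
    return {
        ISO_ALPHA2.get(name, name): log_transform(counts)
        for name, counts in series_by_country.items()
    }

# Made-up counts, for illustration only.
example = relabel({"China": [0, 9, 99], "Italy": [0, 0, 2]})
print(example)
```

The returned dictionary is keyed by codes such as `CN` and `IT`, which is the form in which the results above label each country.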


Change-Point and Fitting

Each curve shows the log-transformed sum of the infectious counts of the countries in the corresponding cluster.
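As a rough sketch of how such a cluster curve could be computed: the counts of the member countries are summed day by day first, and the logarithm is applied to the total afterwards. The function name `cluster_curve`, the sample counts, and the cluster assignment below are hypothetical, and the `+1` offset is again an assumption to handle zero totals.

```python
import math

def cluster_curve(series_by_country, members):
    """Sum the raw counts of the member countries per day, then log-transform.

    The log is applied to the daily total, not to each country separately.
    """
    totals = [sum(day) for day in zip(*(series_by_country[m] for m in members))]
    return [math.log1p(t) for t in totals]  # +1 offset is an assumption

# Made-up counts and cluster labels, for illustration only.
counts = {"CN": [1, 3, 7], "IT": [0, 1, 5], "CA": [0, 0, 2]}
clusters = {1: ["CN"], 2: ["IT", "CA"]}

curves = {label: cluster_curve(counts, members)
          for label, members in clusters.items()}
print(curves)
```

Summing before taking the log matters: the log of a sum is generally not the sum of the logs, so transforming each country first and then adding would produce a different curve.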

Tools

This part provides the corresponding tools: you can upload your own data and run graph clustering analysis, change-point detection, and curve fitting on the uploaded file.

Graph Clustering

Change-Point and Fitting

Support

The research is partially supported by a grant (No. RGPIN-2016-05694) from the Natural Sciences and Engineering Research Council of Canada, and also partially supported by a grant (No. 71871149) from the Natural Science Foundation of China.

Team

Xiaoping Shi <xiaoping.shi@ubc.ca> Irving K. Barber School of Arts and Sciences,
University of British Columbia,
Kelowna, BC V1V 1V7, Canada.

Meiqian Chen <mqchen@stu.scu.edu.cn> Center for Network Big Data and Decision-Making,
Business School, Sichuan University,
Chengdu, China.

Yucheng Dong <ycdong@scu.edu.cn> Center for Network Big Data and Decision-Making,
Business School, Sichuan University,
Chengdu, China.

Calyampudi Radhakrishna Rao <crr1@psu.edu> Department of Biostatistics, University at Buffalo,
The State University of New York, Buffalo,
NY 14221-3000; C.R. Rao Advanced Institute of
Mathematics, Statistics, and Computer Science,
Hyderabad 500046, India.
