Graph Clustering System (GCS)

If the page does not display correctly, please try a different browser (Google Chrome is recommended).
If you are interested in GCS, please be sure to read the supporting "Supplemental document".

GCS

The Graph Clustering System (GCS) is a free platform that provides graph clustering analysis, change-point detection, and curve fitting based on the Clustering-Segmented Autoregressive Sigmoid (CSAS) model. The platform not only reproduces the data analysis results that the paper reports for multiple datasets, but also offers users the corresponding data analysis tools. For more details about the data and code in the paper, please visit Meiqian-Chen/GraphCpClust.

Paper

Abstract: At the end of April 20, 2020, there were only a few new COVID-19 cases remaining in China, whereas the rest of the world had shown increases in the number of new cases. It is of extreme importance to develop an efficient statistical model of COVID-19 spread, which could help in the global fight against the virus. We propose a clustering-segmented autoregressive sigmoid (CSAS) model to explore the space-time pattern of the log-transformed infectious count. Four key characteristics are included in this CSAS model, including unknown clusters, change points, stretched S-curves, and autoregressive terms, in order to understand how this outbreak is spreading in time and in space, to understand how the spread is affected by epidemic control strategies, and to apply the model to updated data from an extended period of time. We propose a nonparametric graph-based clustering method for discovering dissimilarity of the curve time series in space, which is justified with theoretical support to demonstrate how the model works under mild and easily verified conditions. We propose a very strict purity score that penalizes overestimation of clusters. Simulations show that our nonparametric graph-based clustering method is faster and more accurate than the parametric clustering method regardless of the size of data sets. We provide a Bayesian information criterion (BIC) to identify multiple change points and calculate a confidence interval for a mean response. By applying the CSAS model to the collected data, we can explain the differences between prevention and control policies in China and selected countries.

Clustering results shown in the paper for the log-transformed infection counts in China and in 33 selected countries.

Change-point estimates and fitted curves shown in the paper for the log-transformed infection counts in China and in 33 selected countries.

COVID-19 datasets

This part shows graph clustering analysis, change-point detection, and curve fitting for three COVID-19 datasets (Our World in Data, WHO, Wuhan-2019-nCoV). Only the last dataset, Wuhan-2019-nCoV, is analyzed in our paper.

Graph Clustering

You can select different time intervals and datasets to display the clustering, change-point, and fitting results. Note that GCS log-transforms the infectious counts before the analysis. In the returned results, country names are replaced with ISO alpha-2 codes. For more details, please see the Supplemental document.
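The preprocessing described above can be sketched roughly as follows. This is an illustration only, not the GCS implementation: the function names, the small `ISO_ALPHA2` lookup table, the use of the natural logarithm, and the offset of 1 (added so that zero counts stay finite) are all assumptions, since this page does not state the log base or how zero counts are handled.

```python
import math

# Hypothetical lookup table; a real one would cover every country in the dataset.
ISO_ALPHA2 = {"China": "CN", "Canada": "CA", "India": "IN"}


def log_transform(counts, offset=1.0):
    """Return log(count + offset) for each count.

    The natural log and the offset are assumptions made for this sketch.
    """
    return [math.log(c + offset) for c in counts]


def relabel(series_by_country):
    """Replace country names with ISO alpha-2 codes; unknown names are kept as-is."""
    return {ISO_ALPHA2.get(name, name): values
            for name, values in series_by_country.items()}


# Made-up infectious counts, used only to exercise the functions.
raw = {"China": [0, 9, 99], "Canada": [0, 1, 3]}
transformed = relabel({name: log_transform(v) for name, v in raw.items()})
```

Here `transformed` is keyed by `"CN"` and `"CA"` and holds the log-transformed series for each country.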

    Change-Point and Fitting

    Each curve shows the log-transformed sum of the infectious counts of the countries included in the corresponding cluster.
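A minimal sketch of how one such per-cluster curve could be computed, assuming the counts are first summed across the cluster's countries at each time point and the logarithm is taken afterwards. The function name `cluster_curve`, the natural log, and the offset of 1 are hypothetical choices for illustration and are not taken from GCS.

```python
import math


def cluster_curve(counts_by_country, cluster, offset=1.0):
    """Sum counts over the cluster's countries per time point, then take the log.

    The offset keeps the log finite when the total is zero; it is an assumption.
    """
    series = [counts_by_country[code] for code in cluster]
    totals = [sum(day) for day in zip(*series)]
    return [math.log(t + offset) for t in totals]


# Made-up counts keyed by ISO alpha-2 code, used only for illustration.
counts = {"CN": [1, 10, 100], "CA": [0, 2, 5], "IN": [0, 1, 4]}
curve = cluster_curve(counts, ["CA", "IN"])
```

Note that this takes the log of the summed counts, which is not the same as summing the individual log-transformed series.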

      Tools

      This part provides the corresponding tools: you can upload your own data and run graph clustering analysis, change-point detection, and curve fitting on the uploaded file.

      Graph Clustering

        Please upload your file (.csv)
        HELP (Description about uploading CSV file)
        Get the Search Code

        Change-Point and Fitting

          Please upload your file (.csv)
          HELP (Description about uploading CSV file)
          Get the Search Code

          Copyright statement

          The copyright of this platform belongs to the GCS Team. GCS is free for all registered users, but it cannot be used for commercial purposes. If you use GCS for scientific research or other regular purposes, please be sure to cite our paper.

          Support

          This research is partially supported by a grant (No. RGPIN-2016-05694) from the Natural Sciences and Engineering Research Council of Canada and by a grant (No. 71871149) from the Natural Science Foundation of China.

          Team

          (In no particular order)

          Xiaoping Shi <xiaoping.shi@ubc.ca> Irving K. Barber School of Arts and Sciences,
          University of British Columbia,
          Kelowna, BC V1V 1V7, Canada.

          Meiqian Chen <mqchen@stu.scu.edu.cn> Center for Network Big Data and Decision-Making,
          Business School, Sichuan University,
          Chengdu, China.

          Yucheng Dong <ycdong@scu.edu.cn> Center for Network Big Data and Decision-Making,
          Business School, Sichuan University,
          Chengdu, China.

          Calyampudi Radhakrishna Rao <crr1@psu.edu> Department of Biostatistics, University at Buffalo,
          The State University of New York, Buffalo,
          NY 14221-3000; C.R. Rao Advanced Institute of
          Mathematics, Statistics, and Computer Science,
          Hyderabad 500046, India.

          Author

          Maintainers: Wenhao Lin <virtuallwh@163.com> and Meiqian Chen <mqchen@stu.scu.edu.cn>
          Authors: Xiaoping Shi, Meiqian Chen, and Yucheng Dong

          Version 1.0