SDOclust Evaluation Tests v2

SDOclust Evaluation Tests v2, conducted for the paper "Parameterization-Free Clustering with Sparse Data Observers".

Context and methodology

SDOclust is a clustering extension of the Sparse Data Observers (SDO) algorithm. SDOclust uses data observers as graph nodes and clusters them based on connected components and local thresholding. The observers' labels are subsequently propagated to the data points.

In this repository, SDOclust is evaluated on 235 datasets (both synthetic and real) taken from the literature on clustering evaluation, and compared with HDBSCAN, k-means--, CLASSIX, N2D (Deep Learning Clustering), Fuzzy Clustering, and Hierarchical Clustering algorithms. This repository is framed within research in the following domains: algorithm evaluation, clustering, unsupervised learning, machine learning, data mining, and data analysis. The datasets and algorithms can be used for experiment replication and for further clustering evaluation and comparison.

Technical details

Experiments are conducted in Python 3. The file and folder structure is as follows:

[datasets] contains datasets as CSV files (the last column is the label).
[comparisons] contains boxplots and LaTeX tables with algorithm comparisons summarized from the [results] folder.
[results] contains CSV files with tables that collect the algorithms' performances obtained by running the "run.py" script.
[algorithms] contains scripts wrapping the algorithm classes used and the parameter adjustment phases.
[utils] contains scripts for clustering validation and for measuring dataset properties.
"dependencies.sh" installs Python dependencies.
"run.py" runs the evaluation experiments.
"comparison.py" summarizes performances in LaTeX tables and boxplots.
"LICENSE" file.
"README.md" contains further details, links to sources, and instructions for reproducibility.

License

The CC-BY license applies to all data generated with MDCgen. All distributed code is under the MIT license.
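
As an illustration of the [datasets] layout described under Technical details (CSV files with the label in the last column), the following is a minimal sketch of how such a file could be read with the Python standard library. The function name load_labeled_csv is hypothetical, and the sketch assumes comma-delimited files without a header row and with numeric feature columns. None of these details is confirmed by this record, so check the "README.md" before relying on them.

```python
import csv
from pathlib import Path


def load_labeled_csv(path):
    """Read a CSV file whose last column holds the cluster label.

    Assumptions (not confirmed by the repository record): comma-delimited,
    no header row, numeric feature columns.
    """
    features, labels = [], []
    with Path(path).open(newline="") as handle:
        for row in csv.reader(handle):
            if not row:  # skip blank lines
                continue
            *values, label = row  # last field is the label
            features.append([float(v) for v in values])
            labels.append(label.strip())  # kept as a string, format unknown
    return features, labels
```

A call such as load_labeled_csv("datasets/example.csv"), where the file name is a placeholder, would return the feature rows and the ground-truth labels as two parallel lists, ready to be passed to a clustering algorithm and a validation routine.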

Identifier
DOI https://doi.org/10.48436/rnf34-61z36
Related Identifier IsDerivedFrom https://github.com/CN-TU/pysdoclust
Related Identifier IsVersionOf https://doi.org/10.48436/0rpc3-7rh34
Metadata Access https://researchdata.tuwien.ac.at/oai2d?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:researchdata.tuwien.ac.at:rnf34-61z36
Provenance
Creator Iglesias Vazquez, Felix (ORCID: 0000-0001-6081-969X)
Publisher TU Wien
Publication Year 2024
Rights Creative Commons Attribution 4.0 International; MIT License; https://creativecommons.org/licenses/by/4.0/legalcode; https://opensource.org/licenses/MIT
OpenAccess true
Contact tudata(at)tuwien.ac.at
Representation
Resource Type Software
Version 2.0.0
Discipline Other