
fabiovalse

Instance Similarity III

DBpedia has a strong imbalance between the number of classes and the number of instances. For this reason, it is necessary to extend the DBpedia ontology with more hierarchical levels. Clustering algorithms can be used for this purpose.

Depending on the purpose of the visualization, different criteria can be used for the clustering. Since this experiment aims to help developers and experts of Semantic Web technologies explore DBpedia, a criterion based on the structure of the instances, in terms of the predicates they use, has been adopted.

Given a certain class, the number of distinct predicates used by its instances is limited. Hence, an instance can be described as a binary vector with one entry per predicate, stating whether the instance has that predicate. The overall set of vectors is then used as input to the clustering process.
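The binary-vector encoding can be sketched as follows. This is only an illustration under stated assumptions: the instance names, the predicate names, and the helper `predicate_vectors` are hypothetical and are not taken from the experiment's code.

```python
def predicate_vectors(instances):
    """Map each instance to a 0/1 vector over the class's predicate vocabulary.

    `instances` maps an instance name to the set of predicates it uses.
    """
    # The vocabulary is the union of all predicates used by the instances,
    # sorted so that every vector uses the same column order.
    vocabulary = sorted({p for preds in instances.values() for p in preds})
    vectors = {
        name: [1 if p in preds else 0 for p in vocabulary]
        for name, preds in instances.items()
    }
    return vocabulary, vectors


# Hypothetical instances and predicates, for illustration only.
instances = {
    "instance_a": {"birthDate", "birthPlace", "name"},
    "instance_b": {"birthDate", "name"},
    "instance_c": {"name", "occupation"},
}
vocabulary, vectors = predicate_vectors(instances)
# vocabulary            -> ['birthDate', 'birthPlace', 'name', 'occupation']
# vectors['instance_a'] -> [1, 1, 1, 0]
```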

This experiment extends the previous one by giving all the instances of a cluster the same color, using the d3 category20 scale (since the scale has only 20 colors, colors are reused across clusters). Unlike the previous example, it is possible to change the clustering metric (e.g., Euclidean, Manhattan and Max Distance) and the threshold that defines the stop criterion of the clustering.
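For reference, the three metrics named above can be written in a few lines. This is a minimal sketch and not the experiment's own implementation; the function names and `color_index` are hypothetical, and the modulo rule is only one plausible way of reusing a 20-color palette.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def max_distance(u, v):
    # Also known as the Chebyshev distance.
    return max(abs(a - b) for a, b in zip(u, v))


PALETTE_SIZE = 20  # a category20-style palette provides 20 colors


def color_index(cluster_id):
    # Assumed reuse rule: cluster ids beyond the palette wrap around.
    return cluster_id % PALETTE_SIZE


u = [1, 1, 1, 0]
v = [0, 0, 1, 1]
# manhattan(u, v)    -> 3 (number of predicates on which the two differ)
# max_distance(u, v) -> 1
# color_index(23)    -> 3
```

On binary vectors the metrics behave quite differently: Manhattan counts the differing predicates, Euclidean is the square root of that count, and Max Distance is 1 as soon as two instances differ in any predicate.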

Changing the threshold value sets the stop criterion of the clustering and shows how the hierarchical clustering algorithm groups instances together. The lower the threshold, the higher the number of clusters.
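The effect of the threshold can be illustrated with a small agglomerative clustering sketch. The page does not say which linkage rule or implementation is used, so single linkage and the Manhattan metric are assumptions here, and the `cluster` function and the sample vectors are hypothetical.

```python
def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def cluster(vectors, threshold, distance=manhattan):
    """Agglomerative clustering that stops once the two closest clusters
    are farther apart than `threshold`.

    Single linkage (distance between the closest members) is an assumption.
    Returns a list of clusters, each a list of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(distance(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # stop criterion reached
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


# Hypothetical binary predicate vectors, for illustration only.
vectors = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
]
# cluster(vectors, 0) -> [[0], [1], [2], [3]]
# cluster(vectors, 1) -> [[0, 1], [2, 3]]
# cluster(vectors, 4) -> [[0, 1, 2, 3]]
```

With a threshold of 0 every instance stays in its own cluster, while raising it lets progressively more distant groups merge.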