Thanaporn-sk

Instance Similarity III

DBpedia shows a strong imbalance between the number of classes and the number of instances. For this reason, visualization techniques such as Linked Data Maps need to extend the DBpedia hierarchical ontology with more subgroups. Clustering algorithms can provide additional clusters within ontological classes, enriching the hierarchical ontology with more specific groups of instances.

Different criteria can be used for the clustering, depending on the purpose of the visualization. This experiment aims to help developers and experts of Semantic Web technologies explore DBpedia, hence the criterion we chose is based on the structure/description of the instances in terms of predicates.

Given a certain class C, its instances use a limited number N of predicates. Hence, an instance can be described as a binary vector of length N stating which predicates the instance has. The overall set of vectors is then used as input to the clustering process.
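As a minimal sketch of this encoding, the snippet below builds one binary vector per instance over the predicates seen in a class. The instance names, the predicate names, and the `predicate_vectors` helper are hypothetical and only illustrate the idea; they are not taken from the experiment's code.

```python
def predicate_vectors(instances):
    """Map each instance to a 0/1 vector over the sorted predicate vocabulary.

    `instances` maps an instance name to the set of predicates it uses.
    """
    # The N predicates used by at least one instance of the class.
    vocabulary = sorted({p for preds in instances.values() for p in preds})
    # Position k is 1 when the instance has predicate vocabulary[k], else 0.
    vectors = {
        name: [1 if p in preds else 0 for p in vocabulary]
        for name, preds in instances.items()
    }
    return vocabulary, vectors


# Hypothetical instances of one class, with made-up predicate sets.
instances = {
    "Rome": {"dbo:country", "dbo:populationTotal", "rdfs:label"},
    "Paris": {"dbo:country", "rdfs:label"},
    "Pisa": {"dbo:populationTotal", "rdfs:label"},
}

vocabulary, vectors = predicate_vectors(instances)
for name, vector in vectors.items():
    print(name, vector)
```

Sorting the vocabulary keeps the vector positions stable across runs, so the same index always refers to the same predicate.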

This experiment extends the previous one by giving all the instances of a cluster the same color, taken from the d3 category20 scale (since the scale has only 20 colors, colors are reused when there are more clusters). Unlike the previous example, it is possible to change the clustering metric (e.g., Euclidean, Manhattan and Max Distance) and the threshold that sets the stopping criterion of the clustering.
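The three metrics can be sketched as follows for equal-length binary vectors. The function names and the `color_index` helper are assumptions made for illustration; the modulo in `color_index` is one plausible way to reuse a 20-color palette and is not confirmed by the original code.

```python
import math


def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def max_distance(a, b):
    """Largest absolute coordinate difference (Chebyshev distance)."""
    return max(abs(x - y) for x, y in zip(a, b))


PALETTE_SIZE = 20  # number of colors in a 20-color categorical scale


def color_index(cluster_id):
    """Assumed color reuse: cluster ids wrap around the palette."""
    return cluster_id % PALETTE_SIZE


a = [1, 1, 1, 0]
b = [1, 0, 0, 1]
print(euclidean(a, b), manhattan(a, b), max_distance(a, b))
print(color_index(3), color_index(23))
```

On 0/1 vectors the Manhattan distance counts the predicates on which two instances differ, the Euclidean distance is the square root of that count, and the Max Distance is 1 as soon as the two instances differ on any predicate.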

By changing the threshold value, it is possible to move the stopping criterion of the clustering and see how the hierarchical clustering algorithm groups instances together. The lower the threshold, the higher the number of clusters.
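The effect of the threshold can be sketched with a naive agglomerative procedure. This sketch assumes single linkage, the Manhattan distance, and a threshold interpreted as the largest distance at which two clusters may still be merged; the linkage actually used by the experiment is not stated, and the `cluster` function and sample vectors are hypothetical.

```python
def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cluster(vectors, threshold, distance=manhattan):
    """Merge the two closest clusters until their distance exceeds threshold.

    Single linkage (an assumption): the distance between two clusters is the
    smallest distance between any pair of their members.
    Returns a list of clusters, each a list of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None  # (distance, index of first cluster, index of second)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    distance(vectors[a], vectors[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # stopping criterion: closest pair is too far apart
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical binary predicate vectors for four instances.
vectors = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]

for threshold in (0, 1, 3):
    print(threshold, cluster(vectors, threshold))
```

With these sample vectors, lowering the threshold leaves more clusters unmerged, which matches the behavior described above. The pairwise search is quadratic per merge and is only meant for small illustrative inputs.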

forked from fabiovalse's block: Instance Similarity III