Learning Similarity Functions for Event Identification using Support Vector Machines

Reuter T, Cimiano P (2011)
In: Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR2011).

Conference paper | Published | English
 
Abstract / Note
Every clustering algorithm requires a similarity measure, ideally optimized for the task in question. In this paper we are concerned with the task of identifying events in social media data and address the question of how a suitable similarity function can be learned from training data for this task. The task essentially consists of grouping social media documents by the event they belong to. In order to learn a similarity measure using machine learning techniques, we extract relevant events from last.fm and match the unique machine tags for these events to pictures uploaded to Flickr, thus obtaining a gold standard where each picture is assigned to its corresponding event. We evaluate the similarity measure with respect to accuracy on the task of assigning a picture to its correct event. We use SVMs to train an appropriate similarity measure and investigate the performance of different types of SVMs (Ranking SVMs vs. standard SVMs), different strategies for creating training data, as well as the impact of the amount of training data and the kernel used. Our results show that a suitable similarity measure can be learned from only a few examples, given a suitable strategy for creating training data. We also show that Ranking SVMs i) can learn from fewer examples, ii) are more robust than standard SVMs, in the sense that their performance does not vary significantly for different sizes and samples of training data, and iii) are not as prone to overfitting as standard SVMs.
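The idea of learning a similarity function from ranked examples can be illustrated with a minimal sketch. It is not the authors' implementation: it trains a linear pairwise ranker with the hinge loss that underlies a linear Ranking SVM, using plain subgradient descent instead of an SVM solver. The feature layout (time similarity, tag overlap, an uninformative third feature), the toy values, and all function names are assumptions made for illustration only.

```python
import random


def train_pairwise_ranker(pairs, dim, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Learn weights w so that w . pos exceeds w . neg by a margin of 1.

    pairs: list of (pos, neg) feature vectors. pos describes a
    (picture, correct event) pair, neg a (picture, wrong event) pair.
    Illustrative only: subgradient descent on the L2-regularised hinge loss.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    pairs = list(pairs)  # copy so the caller's list is not shuffled
    for _ in range(epochs):
        rng.shuffle(pairs)
        for pos, neg in pairs:
            diff = [p - n for p, n in zip(pos, neg)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            violated = margin < 1.0
            for i in range(dim):
                # Regularisation shrinks w; a violated pair pushes w along diff.
                grad = reg * w[i] - (diff[i] if violated else 0.0)
                w[i] -= lr * grad
    return w


def similarity(w, features):
    """Learned similarity: weighted sum of the per-feature similarities."""
    return sum(wi * fi for wi, fi in zip(w, features))


def assign_event(w, candidates):
    """Return the candidate event id with the highest learned similarity."""
    return max(candidates, key=lambda event: similarity(w, candidates[event]))


# Hypothetical features: [time similarity, tag overlap, uninformative feature]
training_pairs = [
    ([0.9, 0.8, 0.2], [0.1, 0.2, 0.9]),
    ([0.8, 0.6, 0.5], [0.3, 0.1, 0.4]),
    ([0.7, 0.9, 0.1], [0.2, 0.3, 0.8]),
]
w = train_pairwise_ranker(training_pairs, dim=3)

# Similarities of one unseen picture to two made-up candidate events.
candidates = {"concert_a": [0.85, 0.7, 0.3], "concert_b": [0.2, 0.1, 0.6]}
best = assign_event(w, candidates)
print(best)
```

Each training example here is a preference (the correct event should score higher than a wrong one) rather than an absolute label, which is the main difference between a ranking formulation and a standard binary classifier over (picture, event) pairs.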
Keywords
Support Vector Machine; data mining; weight adjustment; event identification; similarity function; machine learning; clustering/classification
Publication year
2011
Proceedings title
Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR2011)
Conference
International Conference on Knowledge Discovery and Information Retrieval (KDIR2011)
Page URI
https://pub.uni-bielefeld.de/record/2310522

Cite

Reuter T, Cimiano P. Learning Similarity Functions for Event Identification using Support Vector Machines. In: Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR2011). 2011.
Reuter, T., & Cimiano, P. (2011). Learning Similarity Functions for Event Identification using Support Vector Machines. Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR2011)
Reuter, Timo, and Cimiano, Philipp. 2011. “Learning Similarity Functions for Event Identification using Support Vector Machines”. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR2011).
Reuter, T., and Cimiano, P. (2011). “Learning Similarity Functions for Event Identification using Support Vector Machines” in Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR2011).
Reuter, T., & Cimiano, P., 2011. Learning Similarity Functions for Event Identification using Support Vector Machines. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR2011).
T. Reuter and P. Cimiano, “Learning Similarity Functions for Event Identification using Support Vector Machines”, Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR2011), 2011.
Reuter, T., Cimiano, P.: Learning Similarity Functions for Event Identification using Support Vector Machines. Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR2011). (2011).
Reuter, Timo, and Cimiano, Philipp. “Learning Similarity Functions for Event Identification using Support Vector Machines”. Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR2011). 2011.
All files are available under the following license(s):
Copyright Statement:
This object is protected by copyright and/or related rights. [...]
Full text(s)
Access Level
OA Open Access
Last uploaded
2019-09-06T08:57:52Z
MD5 checksum
cd12fbd5a18e26fd580a00abd9b636b5

