SuMACC research project
Finding named entities on the Internet is becoming more difficult, especially given the rapid growth of multimedia data online. Although current methods for detecting named entities in textual data are fairly mature, detection in diverse multimedia objects appears much harder to model. As a result, far more data is needed to build robust identifiers. However, the high cost of annotating multimedia data limits the use of statistical methods, which have otherwise proved effective. Moreover, the concepts (or entities) being sought appear in different forms depending on the medium (audio, text, or video). Designing generic methods is therefore a major scientific challenge in the field of multimedia detection. The SuMACC project proposes to explore novel learning methods that detect multimedia entities using specific detection patterns. These patterns provide a unified framework for expressing different combination rules. In this context, we will propose weakly supervised methods to estimate an entity's signature in each medium. We will also develop active learning and cross-media co-learning methods aimed at considerably reducing the supervised learning effort. All these methods will be evaluated on the Wikio web portal, which provides an initial structuring of the data and makes it possible to run evaluations under real conditions. The SuMACC project will address these topics as a 36-month Fundamental Research Project led by the Laboratoire Informatique d'Avignon (LIA, Université d'Avignon), the EURECOM laboratory, and the companies Syllabs and Wikio.