The main goal of the task is to evaluate the quality of document similarity metrics through two different subtasks:
- IberRDI-U: Proposing a document representation and using it to compute similarities between documents from a uniform (homogeneous) collection, i.e. texts from the same corpus, e.g. scientific papers.
- IberRDI-H: Computing semantic similarities between heterogeneous texts, i.e. documents from different corpora, e.g. scientific papers and patents.
Participating teams are expected to propose a document representation and a metric for comparing any two documents in that representation. The aim is to explore metrics that work well in different circumstances, such as comparisons within a single corpus and comparisons across corpora.
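To make the "representation plus metric" pairing concrete, here is a minimal sketch of one classic baseline: TF-IDF weighted bag-of-words vectors compared with cosine similarity. This is an illustrative assumption, not a method prescribed by the task. The function names (`tfidf_vectors`, `cosine`) are hypothetical, and the naive whitespace tokenization would need replacing for real health-domain text.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenization (an assumption for illustration only).
    return text.lower().split()


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Represent each document as a sparse TF-IDF vector (term -> weight)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF: terms present in every document still get weight 1.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


docs = [
    "insulin therapy for type 2 diabetes patients",         # e.g. a publication
    "method for delivering insulin to diabetes patients",   # e.g. a patent
    "mobile app for tracking hospital bed occupancy",       # e.g. a project
]
vecs = tfidf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])
sim_02 = cosine(vecs[0], vecs[2])
```

Because the documents are fitted jointly, the same sketch applies whether the texts come from one corpus (IberRDI-U) or several (IberRDI-H); how well such a lexical representation bridges vocabulary differences between, say, papers and patents is exactly what the task sets out to measure.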
The task involves the analysis of three corpora from the health sector, one of the priority areas of the PTL (Villegas, 2017), containing:
- Scientific publications
- Project proposals
- Patent applications
Aggregated measures computed on each corpus will be used to evaluate subtask IberRDI-U, and aggregated measures computed on each pair of corpora will be used to evaluate subtask IberRDI-H.
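With three corpora, this evaluation scheme yields three within-corpus groups for IberRDI-U and three cross-corpus pairs for IberRDI-H. The sketch below shows that grouping. The corpus labels are hypothetical, and the per-pair score and the use of a plain mean as the aggregate are assumptions for illustration; the actual evaluation measure is not specified here.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean

# Hypothetical labels for the three health-sector corpora.
CORPORA = ("publications", "projects", "patents")

# IberRDI-U: one group per corpus. IberRDI-H: one group per pair of distinct corpora.
U_GROUPS = [(c, c) for c in CORPORA]
H_GROUPS = list(combinations(CORPORA, 2))


def group_key(corpus_a: str, corpus_b: str) -> tuple[str, str]:
    """Order-independent key for the corpus pair a document pair belongs to."""
    return tuple(sorted((corpus_a, corpus_b), key=CORPORA.index))


def aggregate(pair_scores):
    """Average per-pair scores within each corpus group.

    pair_scores: iterable of (corpus_a, corpus_b, score). Using the mean is an
    assumption; any other aggregate could be substituted.
    """
    buckets = defaultdict(list)
    for a, b, score in pair_scores:
        buckets[group_key(a, b)].append(score)
    return {group: mean(scores) for group, scores in buckets.items()}


results = aggregate([
    ("publications", "publications", 0.5),
    ("publications", "publications", 0.75),
    ("patents", "publications", 0.25),  # order of corpora does not matter
])
u_results = {g: s for g, s in results.items() if g in U_GROUPS}
h_results = {g: s for g, s in results.items() if g in H_GROUPS}
```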