IberRDI is an evaluation task targeting the content-based text analysis of research, development and innovation (RDI) output written in Spanish. The task aims to encourage Natural Language Processing (NLP) groups to process technical and scientific texts and to tackle the challenges this type of data poses. We focus on documents from the biomedical domain, but the tools and techniques developed are expected to be useful for other domains as well.

The main goal of the task is to evaluate the quality of document similarity metrics through two different subtasks:

  1. Proposing a document representation and using it to compute similarities between documents from a homogeneous collection, i.e. texts from the same corpus, e.g. scientific papers.
  2. Computing semantic similarities between heterogeneous texts, i.e. documents from different corpora, e.g. scientific papers and patents.
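To make the two subtasks concrete, the following is a minimal sketch of one common baseline, TF-IDF vectors compared with cosine similarity. It is not the representation prescribed by IberRDI. All function names and the toy Spanish sentences are hypothetical illustrations. For the heterogeneous case, the sketch assumes that IDF weights are fitted on the union of both collections. Note that a purely lexical baseline like this only rewards shared vocabulary, so it captures little of the semantic similarity that subtask 2 targets.

```python
import math
import re
from collections import Counter

# Hypothetical TF-IDF + cosine baseline; not the official IberRDI method.


def tokenize(text: str) -> list[str]:
    # Lowercase and keep Unicode word characters (covers á, é, ñ, ...).
    return re.findall(r"\w+", text.lower())


def fit_idf(corpus: list[str]) -> dict[str, float]:
    # Document frequency: in how many documents each term appears.
    n_docs = len(corpus)
    df: Counter = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    # Smoothed IDF, log((1 + N) / (1 + df)) + 1, keeps every weight positive.
    return {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}


def tfidf_vector(text: str, idf: dict[str, float]) -> dict[str, float]:
    # Sparse vector: raw term count times IDF; unseen terms are dropped.
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(docs_a: list[str], docs_b: list[str],
                      idf: dict[str, float]) -> list[list[float]]:
    # Row i, column j holds the similarity between docs_a[i] and docs_b[j].
    vecs_a = [tfidf_vector(d, idf) for d in docs_a]
    vecs_b = [tfidf_vector(d, idf) for d in docs_b]
    return [[cosine(a, b) for b in vecs_b] for a in vecs_a]


# Toy data (invented for illustration).
papers = [
    "Detección de proteínas en células tumorales mediante anticuerpos",
    "Análisis genómico de bacterias resistentes a antibióticos",
]
patents = [
    "Dispositivo para la detección de proteínas con anticuerpos marcados",
]

# Subtask 1 (homogeneous): papers against papers, IDF fitted on papers only.
idf_h = fit_idf(papers)
sim_h = similarity_matrix(papers, papers, idf_h)

# Subtask 2 (heterogeneous): papers against patents, IDF fitted on both.
idf_x = fit_idf(papers + patents)
sim_x = similarity_matrix(papers, patents, idf_x)

for row in sim_x:
    print([round(s, 3) for s in row])
```

In this toy example the first paper shares "detección", "proteínas" and "anticuerpos" with the patent, so it scores higher against it than the second paper, which shares only the stopword "de". A real system would at least remove stopwords and would likely replace these sparse vectors with richer representations.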