REDTab: A relation extraction dataset for knowledge extraction from web tables

Siffi Singh; Alham Fikri Aji; Gaurav Singh; Christos Christodoulopoulos

2022

Relational web-tables are important sources of structured information and are widely used for relation extraction and for populating knowledge graphs with facts. To turn web-table data into knowledge, we need to identify the relations that hold between column pairs. Only a handful of publicly available datasets annotate relations on natural web-tables. Most datasets are built from synthetic tables that lack valuable metadata, or are too small to serve as a challenging evaluation set. In this paper, we present REDTab, the largest natural-table relation extraction dataset. We annotated ~9K tables and ~22K column pairs using crowdsourced annotators on Amazon Mechanical Turk (MTurk), giving 50x more column pairs than the existing human-annotated benchmark. Our test set is designed to be challenging, as our experimental results with TaBERT show.
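To make the task concrete, the sketch below shows one way a web table could be turned into column-pair relation extraction instances, each awaiting a relation label from an annotator. It is a minimal illustration under stated assumptions, not REDTab's actual data format or pipeline: the class and function names (`WebTable`, `ColumnPairInstance`, `column_pairs`), the choice to enumerate every ordered column pair, and the example table are all hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Optional, Tuple


@dataclass
class WebTable:
    """Hypothetical minimal web table: caption (metadata), headers, and rows."""
    caption: str
    headers: List[str]
    rows: List[List[str]]


@dataclass
class ColumnPairInstance:
    """One candidate relation between an ordered (subject, object) column pair."""
    table_caption: str
    subject_header: str
    object_header: str
    sample_values: List[Tuple[str, str]]
    relation: Optional[str] = None  # to be filled in by an annotator


def column_pairs(table: WebTable, max_samples: int = 3) -> List[ColumnPairInstance]:
    """Enumerate every ordered column pair (an assumption: relations are directed,
    and a real pipeline might keep only pairs involving a subject column), each
    with a few aligned cell values as evidence for the annotator."""
    instances = []
    for i, j in permutations(range(len(table.headers)), 2):
        samples = [(row[i], row[j]) for row in table.rows[:max_samples]]
        instances.append(
            ColumnPairInstance(table.caption, table.headers[i], table.headers[j], samples)
        )
    return instances


# Usage on a small, made-up example table.
table = WebTable(
    caption="European capital cities",
    headers=["City", "Country", "Continent"],
    rows=[["Paris", "France", "Europe"], ["Berlin", "Germany", "Europe"]],
)
pairs = column_pairs(table)
for p in pairs:
    print(p.subject_header, "->", p.object_header, p.sample_values)
```

A table with n columns yields n(n-1) ordered pairs under this assumption, so the three-column example produces six candidates. Keeping the caption alongside each pair reflects the abstract's point that natural tables carry metadata that synthetic tables lack.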