This repository accompanies our ACL 2023 paper "Question-Answering in a Low-resourced Language: Benchmark Dataset and Models for Tigrinya", which was selected for the Outstanding Paper Award.
Question-Answering (QA) has seen significant advances recently, achieving near human-level performance on some benchmarks. However, these advances have focused on high-resourced languages such as English, while the task remains unexplored for most other languages, mainly due to the lack of annotated datasets. This work presents TiQuAD, the first human-annotated QA dataset for Tigrinya, an East African language. The dataset contains 10.6K question-answer pairs (6.5K unique questions) spanning 572 paragraphs extracted from 290 news articles on various topics. The paper presents the dataset construction method, which is applicable to building similar resources for related languages.
In addition to the gold-standard TiQuAD, we develop Tigrinya-SQuAD, a silver dataset created by machine-translating and filtering the English SQuAD v1.1 dataset and used as an additional training resource.
We present comprehensive experiments and analyses of several resource-efficient approaches to QA, including monolingual, cross-lingual, and multilingual setups, along with comparisons against machine-translated silver data. Our strong baseline models reach 81% in the F1 score, while the estimated human performance is 92%, indicating that the benchmark presents a good challenge for future work.
A human-annotated question-answering dataset with <Paragraph, Question, Answer> entries.
📥 Download via HuggingFace Hub
Split | Articles | Paragraphs | Questions | Answers |
---|---|---|---|---|
Train | 205 | 408 | 4,452 | 4,454 |
Dev | 43 | 76 | 934 | 2,805 |
Test* | 42 | 96 | 1,122 | 3,378 |
Total | 290 | 572 | 6,508 | 10,637 |
Data Statistics of TiQuAD: The number of Articles, Paragraphs, Questions, and Answers. The dataset is partitioned by articles.
Note: The test set is not publicly available, to maintain evaluation integrity. See the TiQuAD Test Set Access section below.
TiQuAD Dataset Construction Pipeline. The five-stage process includes article collection, context selection, question-answer pair annotation, additional answers annotation for evaluation sets, and quality-focused post-processing.
The training split of the English SQuAD v1.1 dataset, machine-translated into Tigrinya and filtered.
📥 Download via HuggingFace Hub
Split | Articles | Paragraphs | Questions | Answers |
---|---|---|---|---|
Train | 442 | 17,391 | 46,737 | 46,737 |
Data Statistics of Tigrinya-SQuAD: The number of Articles, Paragraphs, Questions, and Answers in the Tigrinya translation of SQuAD v1.1 training set.
Install the `datasets` library by running `pip install -U datasets` in a terminal. Make sure the latest version is installed, as older versions may not load the data properly. Then pull and load the datasets in Python as follows:
TiQuAD:
from datasets import load_dataset
# Load TiQuAD
tiquad = load_dataset("fgaim/tiquad")
print(tiquad)
Output:
DatasetDict({
train: Dataset({
features: ['id', 'question', 'context', 'answers', 'article_title', 'context_id'],
num_rows: 4452
})
validation: Dataset({
features: ['id', 'question', 'context', 'answers', 'article_title', 'context_id'],
num_rows: 934
})
})
Tigrinya-SQuAD:
from datasets import load_dataset
# Load Tigrinya-SQuAD
tigrinya_squad = load_dataset("fgaim/tigrinya-squad")
print(tigrinya_squad)
Output:
DatasetDict({
train: Dataset({
features: ['id', 'question', 'context', 'answers', 'article_title', 'context_id'],
num_rows: 46737
})
})
A sample entry from the TiQuAD validation set:
{
"id": "5dda7d3e-f76f-4500-a3af-07648a1afa51",
"question": "ኣልታሕሪር ናይ ቅድም ስማ እንታይ ኔሩ?",
"context": "ሃብቶም ክብረኣብ (ሞጀ)\nሞጀ ኣብ 80’ታትን ኣብ ፈለማ 90’ታትን ካብቶም ናይ ክለብ ኣልታሕሪር ንፉዓት ተኸላኸልቲ ነይሩ፣ ብድሕሪ’ዚ’ውን ኣብ ውድድራት ሓይልታት ምክልኻል ንንውሕ ዝበለ ዓመታት ከም ኣሰልጣኒ ክለብ በኒፈር ኮይኑ ዝነጥፍ ዘሎ ገዲም ተጻዋታይን ኣሰልጣንን’ዩ። ምሉእ ስሙ ሃብቶም ክብርኣብ (ሞጀ) እዩ። ሞጀ ብ1968 ኣብ ኣስመራ ተወሊዱ ዓብዩ። ንሱ ካብ ንኡስ ዕድሚኡ ብኩዕሶ ጨርቂ ጸወታ ጀሚሩ። ብድሕሪኡ ብደረጃ ምምሕዳር ኣብ ዝካየድ ዝነበረ ናይ ‘ቀበሌ’ ጸወታታት ምስ ጸሓይ በርቂ ምስ እትበሃል ጋንታ ተጻዊቱ። ኣብ 1987 ምስ ዳህላክ እትበሃል ዝነበረት ጋንታ ንሓደ ዓመት ድሕሪ ምጽዋቱ ከኣ ኣብ መወዳእታ ወርሒ 1987 ናብ ጋንታ ፖሊስ (ናይ ሎሚ ኣልታሕሪር) ብምጽንባር ክሳብ 1988 ተጻዊቱ። ምስታ ናይ ቅድሚ ናጽነት ጋንታ ፖሊስ ኣብ ዝተጻወተሉ ሰለስተ ዓመታት ከኣ ዝተፈላለየ ዓወታት ተጐናጺፉ ዋናጩ ከልዕል በቒዑ’ዩ። ድሕሪ ናጽነት ስም ክለቡ ኣልታሕሪር ምስ ተቐየረ፣ ሞጀ ናይታ ክለብ ተጻዋታይ ኮይኑ ውድድሩ ቀጺሉ። ኣብ መጀመርታ ናጽነት (1991) ኣብ ዝተኻየደ ናይ ፋልማይ ዋንጫ ስውኣት መን ዓተረ ውድድር ሞጀ ምስ ክለቡ ኣልታሕሪር ዋንጫ ከልዕል በቒዑ። ብዘይካ’ዚ ኣብ 1992 ብብሉጻት ተጻወትቲ ተተኽቲኻ ዝነበረት ኣልታሕሪር ናይ ፋልማይ ዋንጫ ናጽነት ከምኡ’ውን ሻምፕዮን ክትከውን ከላ ሞጀ ኣባል’ታ ጋንታ ነይሩ። ምስ ክለብ ኣልታሕሪር ፍቕርን ሕውነትን ዝመልኦ ምቁር ናይ ጸወታ ዘመን ከም ዘሕለፈ ዝጠቅስ ሞጀ፣ ምስ ኣልታሕሪር ናብ ከም ሱዳንን ኢትዮጵያን ዝኣመሰላ ሃገራት ብምጋሽ ኣህጉራዊ ጸወታታት’ውን ኣካይዱ’ዩ።",
"answers": [
{"answer_start": 414, "text": "ፖሊስ"},
{"answer_start": 414, "text": "ፖሊስ"},
{"answer_start": 410, "text": "ጋንታ ፖሊስ"},
],
"article_title": "ሃብቶም ክብረኣብ (ሞጀ)",
"context_id": "17.1",
}
Note: Samples in the validation and test sets of TiQuAD have up to three answers labeled by different annotators.
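For programmatic access, the snippet below is a minimal sketch of reading those reference answers from the loaded dataset. It assumes TiQuAD has been loaded with `load_dataset("fgaim/tiquad")` as above; the small helper handles both the list-of-records layout shown in the sample and the standard SQuAD dict-of-lists layout, since the exact feature schema on the Hub may use either.

```python
from datasets import load_dataset

tiquad = load_dataset("fgaim/tiquad")

def iter_answers(answers):
    """Yield (text, answer_start) pairs, whether the answers field is a list
    of {text, answer_start} records (as in the sample above) or a SQuAD-style
    dict of parallel lists."""
    if isinstance(answers, dict):
        yield from zip(answers["text"], answers["answer_start"])
    else:
        for ans in answers:
            yield ans["text"], ans["answer_start"]

example = tiquad["validation"][0]
print(example["question"])
for text, start in iter_answers(example["answers"]):
    print(f"  reference: {text!r} (starts at character {start})")
```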
To maintain evaluation integrity and avoid data contamination, the TiQuAD test set is not publicly available.
Researchers who wish to access the test set for evaluation purposes should email the first author of the paper with the following details:
- Subject: TiQuAD Test Set Request
- Your full name and affiliation
- Research purpose and usage plan
- Acknowledgment that the dataset will be used for evaluation only
We review requests to ensure legitimate research use while maintaining benchmark integrity.
Model | Layers | Attention Heads | Params | Languages | Pre-trained on Tigrinya |
---|---|---|---|---|---|
tielectra-small | 12 | 4 | 14M | 1 | yes |
tiroberta-base | 12 | 12 | 125M | 1 | yes |
afriberta_base | 8 | 6 | 112M | 11 | yes |
xlm-roberta-base | 12 | 12 | 278M | 100 | no |
xlm-roberta-large | 24 | 16 | 560M | 100 | no |
- MT: Tigrinya-SQuAD (Machine Translated SQuAD v1.1 train set) — Tigrinya
- Native: TiQuAD train set — Tigrinya
- SQuAD: SQuAD v1.1 train set — English
╭───────────────────┬─────────────────╮
╭────┬─────────────────┬───────────────────┬────────┬───────┤ TiQuAD Dev │ TiQuAD Test │
│ │ Dataset │ Model │ Epochs │ Batch │ EM │ F1 │ EM │ F1 │
├────┼─────────────────┼───────────────────┼────────┼───────┼─────────┼─────────┼────────┼────────┤
│ 1 │ MT │ tielectra-small │ 3 │ 16 │ 38.54 │ 46.04 │ 39.25 │ 48.36 │
│ 2 │ MT │ tiroberta-base │ 3 │ 16 │ 48.5 │ 56.39 │ 48.17 │ 58.81 │
│ 3 │ MT │ afriberta_base │ 3 │ 16 │ 40.36 │ 48.72 │ 40.68 │ 52.96 │
│ 4 │ MT │ xlm-roberta-base │ 3 │ 16 │ 51.71 │ 59.64 │ 53.17 │ 62.61 │
│ 5 │ MT │ xlm-roberta-large │ 3 │ 16 │ 59.85 │ 67.06 │ 61.55 │ 70.85 │
│ 6 │ Native │ tielectra-small │ 5 │ 8 │ 36.19 │ 43.06 │ 28.81 │ 37 │
│ 7 │ Native │ tiroberta-base │ 5 │ 8 │ 56.21 │ 64.36 │ 53.08 │ 61.82 │
│ 8 │ Native │ afriberta_base │ 5 │ 8 │ 38.01 │ 44.85 │ 35.06 │ 44.24 │
│ 9 │ Native │ xlm-roberta-base │ 5 │ 8 │ 56.53 │ 65.37 │ 55.75 │ 65.49 │
│ 10 │ Native │ xlm-roberta-large │ 5 │ 8 │ 63.17 │ 71.32 │ 64.94 │ 72.62 │
│ 11 │ MT+Native │ tielectra-small │ 3 │ 16 │ 46.36 │ 53.6 │ 47.46 │ 56.64 │
│ 12 │ MT+Native │ tiroberta-base │ 3 │ 16 │ 62.42 │ 70.12 │ 62.18 │ 70.42 │
│ 13 │ MT+Native │ afriberta_base │ 3 │ 16 │ 52.68 │ 59.38 │ 47.37 │ 58.35 │
│ 14 │ MT+Native │ xlm-roberta-base │ 3 │ 16 │ 61.99 │ 70.44 │ 64.76 │ 73.53 │
│ 15 │ MT+Native │ xlm-roberta-large │ 3 │ 16 │ 70.88 │ 77.96 │ 74.67 │ 82.31 │
│ 16 │ SQuAD │ tielectra-small │ 3 │ 16 │ 9.85 │ 20.91 │ 9.81 │ 20.41 │
│ 17 │ SQuAD │ tiroberta-base │ 3 │ 16 │ 10.71 │ 20.88 │ 10.88 │ 20.69 │
│ 18 │ SQuAD │ afriberta_base │ 3 │ 16 │ 20.24 │ 32.05 │ 20.52 │ 32.95 │
│ 19 │ SQuAD │ xlm-roberta-base │ 3 │ 16 │ 17.99 │ 27.81 │ 22.66 │ 34.44 │
│ 20 │ SQuAD │ xlm-roberta-large │ 3 │ 16 │ 29.12 │ 40.26 │ 34.7 │ 43.96 │
│ 21 │ SQuAD+MT │ tielectra-small │ 3 │ 16 │ 37.69 │ 46.06 │ 39.07 │ 49.07 │
│ 22 │ SQuAD+MT │ tiroberta-base │ 3 │ 16 │ 51.28 │ 59.25 │ 51.12 │ 60.75 │
│ 23 │ SQuAD+MT │ afriberta_base │ 3 │ 16 │ 44.33 │ 51.43 │ 45.58 │ 56.36 │
│ 24 │ SQuAD+MT │ xlm-roberta-base │ 3 │ 16 │ 52.89 │ 61.06 │ 57.36 │ 66.37 │
│ 25 │ SQuAD+MT │ xlm-roberta-large │ 3 │ 16 │ 61.03 │ 67.75 │ 61.91 │ 71.05 │
│ 26 │ SQuAD+Native │ tielectra-small │ 3 │ 16 │ 33.73 │ 41.51 │ 32.74 │ 40.53 │
│ 27 │ SQuAD+Native │ tiroberta-base │ 3 │ 16 │ 57.07 │ 65.75 │ 59.05 │ 67.3 │
│ 28 │ SQuAD+Native │ afriberta_base │ 3 │ 16 │ 51.93 │ 59.66 │ 51.38 │ 62.13 │
│ 29 │ SQuAD+Native │ xlm-roberta-base │ 3 │ 16 │ 62.42 │ 69.95 │ 63.07 │ 71.76 │
│ 30 │ SQuAD+Native │ xlm-roberta-large │ 3 │ 16 │ 67.24 │ 76.19 │ 71.54 │ 78.39 │
│ 31 │ SQuAD+MT+Native │ tielectra-small │ 3 │ 16 │ 45.72 │ 53.4 │ 47.73 │ 57.1 │
│ 32 │ SQuAD+MT+Native │ tiroberta-base │ 3 │ 16 │ 65.2 │ 71.88 │ 62.53 │ 71.08 │
│ 33 │ SQuAD+MT+Native │ afriberta_base │ 3 │ 16 │ 51.93 │ 59.47 │ 53.26 │ 63.22 │
│ 34 │ SQuAD+MT+Native │ xlm-roberta-base │ 3 │ 16 │ 64.78 │ 72.8 │ 68.06 │ 76.58 │
│ 35 │ SQuAD+MT+Native │ xlm-roberta-large │ 3 │ 16 │ 72.59 │ 79.66 │ 74.13 │ 81.39 │
╰────┴─────────────────┴───────────────────┴────────┴───────┴─────────┴─────────┴────────┴────────╯
The xlm-roberta-large experiments were added after the paper was published. It outperforms the other models mainly due to its larger parameter count, demonstrating the strong transfer capability of fine-tuned multilingual models even with minimal or no exposure to the target language during pre-training.
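The fine-tuned baselines follow the standard extractive QA recipe over the training configurations listed above. As a minimal sketch (not the authors' training code), the combined MT+Native setup of rows 11–15 can be assembled by concatenating the two Hub datasets, assuming their feature schemas match as the shared column list above suggests; the result can then be fed to any SQuAD-style fine-tuning script (e.g., the Hugging Face `run_qa.py` example) with the hyperparameters from the table (3 epochs, batch size 16).

```python
from datasets import load_dataset, concatenate_datasets

# Native gold data (TiQuAD train) plus machine-translated silver data
# (Tigrinya-SQuAD train), i.e. the "MT+Native" configuration.
native = load_dataset("fgaim/tiquad", split="train")
mt = load_dataset("fgaim/tigrinya-squad", split="train")

# Concatenation assumes both datasets expose identical features
# (id, question, context, answers, article_title, context_id).
combined = concatenate_datasets([native, mt]).shuffle(seed=42)
print(combined)  # expected: 4,452 + 46,737 = 51,189 rows
```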
We provide the official evaluation script `evaluate-tiquad.py` for computing TiQuAD benchmark scores. The script supports evaluation against both the HuggingFace dataset and local JSON files. Install the dependencies by running `pip install -U datasets numpy`.
The script reports the following metrics (a re-implementation sketch follows the list):
- Exact Match (EM): Percentage of predictions that match ground truth exactly
- Token-level F1: F1 score computed over tokens
- Multi-reference handling: Max score across multiple reference answers
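For reference, the following is a minimal re-implementation sketch of these metrics, not the official `evaluate-tiquad.py`; in particular, the answer normalization for Ge'ez-script text is an illustrative approximation and may differ from the official script.

```python
import re
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace (SQuAD-style).
    Illustrative only; the official script may normalize differently."""
    text = re.sub(r"[^\w\s]", " ", text.lower(), flags=re.UNICODE)
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> float:
    return float(normalize(prediction) == normalize(reference))

def token_f1(prediction: str, reference: str) -> float:
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

def score_question(prediction: str, references: list[str]) -> tuple[float, float]:
    """Multi-reference handling: take the maximum score over all references."""
    em = max(exact_match(prediction, r) for r in references)
    f1 = max(token_f1(prediction, r) for r in references)
    return em, f1
```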
Your predictions file should be a JSON file with question IDs as keys and predicted answer texts as values:
{
"5dda7d3e-...": "ጋንታ ፖሊስ",
...
}
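One way to produce such a file is to run a fine-tuned extractive QA model over the validation split with the Transformers pipeline. This is a minimal sketch; the model identifier below is a placeholder for your own checkpoint, not an official release.

```python
import json
from datasets import load_dataset
from transformers import pipeline

# Placeholder: path or Hub ID of your own fine-tuned extractive QA checkpoint.
qa = pipeline("question-answering", model="path/to/your-tiquad-model")

validation = load_dataset("fgaim/tiquad", split="validation")

predictions = {}
for example in validation:
    result = qa(question=example["question"], context=example["context"])
    predictions[example["id"]] = result["answer"]

# ensure_ascii=False keeps the Tigrinya answer strings human-readable.
with open("predictions.json", "w", encoding="utf-8") as f:
    json.dump(predictions, f, ensure_ascii=False, indent=2)
```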
# Evaluate against HuggingFace dataset (specific split)
python evaluate-tiquad.py predictions.json --use-hf-dataset --split validation
# Evaluate against a local JSON file (TiQuAD/SQuAD format)
python evaluate-tiquad.py predictions.json --eval-set-path eval-set-v1.json
Add the `--verbose` option to print more details.
Sample Output:
Loading predictions from: predictions.json
Loading validation set from HF dataset...
Computing evaluation scores...
===================================
TiQuAD EVALUATION RESULTS
===================================
Exact Match (EM): 0.6542 (65.42%)
F1 Score: 0.7321 (73.21%)
Questions evaluated: 934
===================================
This work can be cited as follows:
@inproceedings{gaim-etal-2023-tiquad,
title = "Question-Answering in a Low-resourced Language: Benchmark Dataset and Models for {T}igrinya",
author = "Fitsum Gaim and Wonsuk Yang and Hancheol Park and Jong C. Park",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.acl-long.661",
pages = "11857--11870",
}
- Native Tigrinya speakers who contributed to the annotation process of TiQuAD
- Hadas Ertra newspaper and Eritrean Ministry of Information (shabait.com) for source articles
- The SQuAD team for the foundational work that served as the source for Tigrinya-SQuAD
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.