1 Introduction

Code-mixing involves borrowing words from one language and incorporating them into another without affecting the context [1, 2]. Code-switching, or language alternation, occurs when individuals alternate between two or more languages within a single conversation or situation [3]. In the context of code-mixed and code-switched (CMCS) text, we distinguish two subtypes: (1) text comprising words that alternate between two languages, and (2) text transitioning from one script to another by substituting letters in a predictable manner, known as Transliteration [4].

Code-mixing and code-switching are intricate phenomena of linguistic behaviour, characterized by the intentional or spontaneous alternation of languages within a single discourse. Another characteristic of CMCS data is lexical borrowing, where words or phrases from one language are used in another. Grammatical hybridity [5], a distinct feature of CMCS, results in blending grammatical structures from different languages. Furthermore, CMCS is influenced by linguistic, social, and cultural constraints, leading to a specific contextual framework.

CMCS is commonly observed in online conversations. A thorough understanding of CMCS data is pivotal for effective communication, advertising, sentiment analysis, and fostering inclusivity across language boundaries. However, the inherent characteristics of CMCS data introduce unique challenges to Natural Language Processing (NLP) systems. In particular, the inclusion of multiple scripts and lexical patterns, together with the potential misidentification of transliterated tokens, poses challenges even to modern NLP systems when processing such text. These challenges are particularly pronounced when working with low-resource languages [6, 7].

In recent years, the domain of NLP has witnessed remarkable advancements, notably propelled by the emergence of pre-trained language models (PLMs) [8, 9]. These PLMs are trained on extensive datasets while remaining agnostic to the specific tasks for which they will later be used. To leverage the extensive knowledge embedded in PLMs for diverse NLP tasks, the PLM has to be fine-tuned with task-specific data [10]. This “pre-train and fine-tune” paradigm has been able to activate and harness the comprehensive knowledge within PLMs, leading to very promising results across various downstream tasks such as text classification and named entity recognition [10, 11]. On the negative side, this paradigm suffers from the disparity between pre-training and fine-tuning objectives, which leads to inefficient use of PLMs across diverse tasks: fine-tuned models may be unstable in low-resource settings and less transferable to new tasks [10,11,12,13].

Prompt-based learning has recently been demonstrated to yield promising results compared to full fine-tuning of PLMs for many downstream tasks [13], even in low-resource scenarios [14]. This paradigm involves redefining downstream tasks using textual prompts, encompassing both prompt engineering and answer engineering [11]. In contrast to fine-tuning, prompt-based learning leverages the existing knowledge of PLMs by redefining downstream tasks as pre-training objectives [10, 11, 15]. This removes the need for extensive parameter updates in PLMs, thus preserving their transferability across various tasks. Prompt-based learning has been extended to incorporate pre-trained multilingual language models (PMLMs) as well, enabling experimentation in languages beyond English [16,17,18].

Existing research on CMCS text classification mainly focuses on the full fine-tuning of PMLMs for downstream tasks [6, 19]. On the other hand, while prompt-based learning has shown success over full fine-tuning for monolingual text, its application to CMCS data has not been explored. Given that prompt-based learning relies on textual prompts, designing effective prompts for CMCS text remains an open question. In other words, a prompt formulated in one language might not be suitable for effectively classifying CMCS data. The absence of multilingual prompts poses a challenge in inducing knowledge from PMLMs effectively, and the potential misidentification of transliterated tokens adds further complexity to accurate classification. These challenges are even more pronounced for low-resource languages. Therefore, addressing these unique challenges is crucial for advancing CMCS text classification through prompt-based learning.

In this study, we focus on prompt-based learning for CMCS text classification. To the best of our knowledge, we are the first to explore prompt-based learning for CMCS text classification. Therefore, we first delve into the challenges surrounding CMCS text classification and the intricacies introduced by the presence of multiple scripts within a single text. Our experiments unveil that the performance of prompt-based CMCS text classification is influenced by the inclusion of multiple scripts and the intensity of code-mixing.

In response to the aforementioned challenges, we propose a novel methodology named Dynamic+AdapterPrompt. This approach employs distinct models for each script to generate script-specific representations by considering the script of the input sentence (DynamicPrompt). Additionally, it effectively captures task-specific representations necessary for the respective CMCS classification tasks through the utilization of adapters (AdapterPrompt). This combined approach leverages the benefits of both adapters and dynamic script considerations.

We have conducted extensive experiments across Sinhala-English, Kannada-English, and Hindi-English datasets, for the tasks of sentiment classification, hate-speech detection, and humour detection. It is noteworthy that Sinhala and Kannada are categorized as low-resource languages [20]. The outcomes demonstrate that our novel approach, Dynamic+AdapterPrompt, outperforms the existing methodologies: full fine-tuning, adapter-based fine-tuning, and conventional prompt-based learning techniques.

To summarize, the key contributions of this paper are as follows:

  • We present an extensive study on prompt-based learning for CMCS text classification and the first comprehensive exploration of the impact of the script on CMCS text classification.

  • We introduce a novel prompt tuning approach for CMCS text classification termed Dynamic+AdapterPrompt that provides script-specific and task-specific representations, to address the intricacies introduced by the inclusion of multiple scripts in CMCS data.

2 Related work

In this section, we delve into three key areas: prompt-based learning, adapter-based fine-tuning of PLMs, and the challenges and advancements in CMCS text classification.

2.1 Prompt-based learning

Until recently, full fine-tuning, also known as vanilla fine-tuning, was the predominant method for adapting PLMs to downstream tasks [13, 21, 22]. In full fine-tuning, all the parameters of the PLM are trained for an underlying downstream task, which demands a significant amount of computational resources. Full fine-tuning also struggles to fully exploit the linguistic knowledge acquired during pre-training, due to the disparity between the objectives of the pre-training and fine-tuning stages [10, 23, 24]. While pre-training typically encompasses self-supervised tasks such as masked language modelling, full fine-tuning has to use task-specific training objectives (e.g. classification, sequence labelling, or generation). Prompt-based learning aims to bridge this gap between pre-training and fine-tuning objectives. In other words, prompt-based learning reformulates downstream tasks to be similar to the training objectives used during PLM pre-training [11]. For encoder-based models that use a masked language modelling objective, one such reformulation technique is to convert the downstream task into a cloze-style format, as illustrated in Figure 1.
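To make the cloze-style reformulation concrete, the following is a minimal sketch using the Hugging Face transformers library and xlm-roberta-base; the template, label words, and variable names are illustrative assumptions, not the prompts used in this work.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Minimal cloze-style sentiment classification sketch (illustrative template and verbalizer).
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")

sentence = "I passed the exam"
template = f"{sentence} . It was {tokenizer.mask_token} ."      # cloze-style prompt
label_words = {"positive": "great", "negative": "terrible"}      # discrete verbalizer

inputs = tokenizer(template, return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
with torch.no_grad():
    mask_logits = model(**inputs).logits[0, mask_pos]

# Score each label by the logit of the first sub-token of its label word.
scores = {label: mask_logits[tokenizer(word, add_special_tokens=False).input_ids[0]].item()
          for label, word in label_words.items()}
print(max(scores, key=scores.get))
```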

Fig. 1 Prompt-Based Learning

Prompt-based learning primarily comprises three key components: the prompt, the PLM, and the verbalizer [11]. As depicted in Figure 1, prompt engineering involves the selection of a prompt template for a downstream task. Early research used manually designed human-readable prompts, referred to as manual or discrete prompts [22, 23, 25]. Subsequent studies have shifted focus towards soft prompts, also known as continuous prompts, which are optimized during training for specific downstream tasks [22, 23, 25]. Answer engineering refers to the selection of the verbalizer. The verbalizer is the component that maps the PLM's prediction for the mask token to the intended label [26], as illustrated in Figure 1. Verbalizers that are human-readable are denoted as discrete verbalizers, whereas soft verbalizers undergo optimization during the training process. Several studies have explored designing suitable verbalizers for downstream tasks, utilizing both discrete and soft tokens [12, 16, 27]. The primary aim of these studies has been to broaden the coverage of the answer space of the verbalizer for each respective label. The effectiveness of this pipeline is significantly determined by prompt engineering and answer engineering [28].

2.2 Adapter-based fine-tuning of PLMs

Adapters are compact trainable modules that can be integrated into transformer layers. They provide a lightweight fine-tuning alternative to the full fine-tuning approach [29]. Houlsby [29] and Pfeiffer [30] are the two most commonly used adapter architectures. The key distinction between the two is that the Houlsby adapter employs two down- and up-projection modules per layer, whereas the Pfeiffer adapter utilizes only one. Adapters can generally be categorized into two types: task adapters, which learn task-specific representations, and language adapters, which learn language-specific representations [30]. Typically, language adapters are used in conjunction with task adapters [6, 31]. Extensive research has been conducted on adapters as a parameter-efficient fine-tuning method for various tasks. In Rathnayake et al. [6], Sinhala-English CMCS text classification was performed employing different combinations of adapters, yielding improved results compared to full fine-tuning with minimal parameter updates. Moreover, Rücklé et al. [32] demonstrated the benefits of adapters beyond lightweight fine-tuning: they observed a minimal impact on task performance when adapters were dropped from the lower layers of the PLM.

The application of adapters has proven to be beneficial for prompt-based learning as well. Karimi Mahabadi et al. [15] introduced a few-shot learning method utilizing a masked language modelling objective, and leveraged task-specific adapters as a prompt-free strategy. Their experimental results showcased the effectiveness of this technique in comparison to manual and soft prompts.

Smaller language models face difficulties with soft prompts, as discussed by Shah et al. [33]. Li and Liang [22] and Reynolds and McDonell [34] suggest that, as the model size increases, the performance gap between prompt-based approaches and fine-tuning narrows, indicating that smaller models benefit less from prompt-based approaches. To enhance the effectiveness of smaller language models, Shah et al. [33] suggest using adapters in combination with soft prompts. Their approach shows promise in optimizing smaller models, achieving up to 98% of the performance of full fine-tuning.

2.3 CMCS text classification

Classifying CMCS text poses a significant challenge in NLP, largely due to the scarcity of annotated datasets, particularly in the context of low-resource languages. Despite these challenges, efforts have been made to develop manually annotated CMCS text classification datasets for low-resource languages [2, 6, 19, 35, 36]. A range of Deep Learning (DL) approaches has been employed for classifying CMCS data. For instance, Chathuranga and Ranathunga [37] and Kamble and Joshi [38] utilized techniques such as capsule networks, LSTM, and BiLSTM for CMCS text classification.

Currently, state-of-the-art performance in CMCS text classification is achieved using PMLMs [4, 6, 19, 31, 39,40,41,42,43]. However, Zhang et al. [44] showed that PMLMs are not perfectly compatible with code-switching. In zero-shot settings (i.e. when no training examples are provided), PLMs are less effective on CMCS-related tasks than models specifically trained for the task, and they exhibit limited learning capabilities in few-shot settings. Table 1 provides a summary of different PMLM approaches for CMCS text classification.

Table 1 Related Work in CMCS Text Classification

3 Datasets

We select three publicly available CMCS datasets. These cover low-resource languages (Sinhala, Kannada), as well as Hindi, a high-resource language [20]. They exhibit different levels of code-mixing and have been annotated for various classification tasks.

Table 2 Variations of CMCS Data Across Datasets

The first dataset [6] includes CMCS sentences in Sinhala and English languages. This dataset has been annotated for sentiment classification, humour detection, and hate-speech detection tasks. The second dataset [46] consists of Kannada and English CMCS content and has annotations for sentiment analysis and hate-speech detection. The Hindi-English datasetFootnote 1 contains CMCS content written in the Latin script, which has been annotated for the humour detection task. Each language possesses its unique script for writing (Latin for English, Sinhala for Sinhala, Kannada for Kannada, and Devanagari for Hindi).

Altogether, these corpora exhibit six distinct CMCS variations with respect to the script used in training instances, as shown in Table 2. In the first two variants, the text is exclusively composed in one language, employing characters from the same language. Conversely, in the next two variants, the text is written in one language, utilizing characters from a different language. The fifth variant comprises sentences that alternate between languages, with each sentence written in the script corresponding to the language, while the last variant involves sentences that blend elements from two or more of the aforementioned types.

To better analyze the extent of language mixing, we systematically classify sentences in each corpus based on the percentage of characters from each scriptFootnote 2 as outlined in Algorithm 1. Opting to examine sentences at the character level, as opposed to the word level, enables us to capture finer details of language mixing. In CMCS sentences, particularly in informal communication, individual words can seamlessly blend characters from multiple scripts.

Algorithm 1 Instance Classification Based on Script. The term [other] represents the script of the language combined with English in the CMCS context.

A threshold of 100% is considered, implying that if all characters in a sentence belong to one script, it is categorized under that script; otherwise, it is labelled as a mixed-script sentence. The following examples illustrate this algorithm:

  • Latin Script Example:

    Sentence: “Mama wibhagaya samath una” (Sinhala written in Latin script)
    Script Label: Latin (100% of characters are in Latin script)

  • [Other] Script Example:

    Sentence: “ ” (Sinhala written in Sinhala script)
    Script Label: Sinhala (100% of characters are in Sinhala script)

  • Mixed-Script Example:

    Sentence: “I passed the විභාගය”
    Script Label: Mixed (a combination of Latin and Sinhala script characters)
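The script-identification step described by Algorithm 1 and illustrated in the examples above can be approximated as follows. This is a minimal sketch assuming a Sinhala-English pair; the Unicode-name test, function name, and threshold handling are our own illustrative choices.

```python
import unicodedata

# Character-level script tests (assumed for the Sinhala-English case).
SCRIPT_TESTS = {
    "Latin": lambda ch: "LATIN" in unicodedata.name(ch, ""),
    "Sinhala": lambda ch: "SINHALA" in unicodedata.name(ch, ""),
}

def identify_script(sentence: str, threshold: float = 1.0) -> str:
    letters = [ch for ch in sentence if ch.isalpha()]     # ignore digits, spaces, punctuation
    if not letters:
        return "Mixed"
    for name, test in SCRIPT_TESTS.items():
        share = sum(test(ch) for ch in letters) / len(letters)
        if share >= threshold:                            # 100% of characters in one script
            return name
    return "Mixed"

print(identify_script("Mama wibhagaya samath una"))       # -> Latin
```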

As shown in Table 2, note that the Hindi-English dataset does not have content in Devanagari script; instead, Hindi words have been written in Latin script. Comprehensive statistics for all three datasets are provided in Appendix A.

4 Baselines

For our experiments, we use a random baseline (assigning class labels to instances at random) and majority/minority class baselines (assigning every instance the majority or minority class label), along with three additional baselines associated with PMLMs. As mentioned earlier, only full fine-tuning and adapter-based fine-tuning of PMLMs have been employed for CMCS text classification [6]. Therefore, we use these two techniques as our baselines. Basic prompting entails training artificial tokens while keeping the PMLM frozen. Since prompt-based learning has not been attempted previously for CMCS text classification, we utilize Soft Prompt + Soft Verbalizer as our baseline for prompting.

4.1 Full fine-tuning (Full FT)

We train the PLM by updating all parameters, including the task-dependent sequence classification head added on top, as proposed by Devlin et al. [9]. Throughout this process, the PLM weights are adjusted using task-specific data, which facilitates the learning of task-specific representations. We fine-tune the PLM separately for each downstream task (single-task fine-tuning).
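As a point of reference, a minimal sketch of this setting with the Hugging Face transformers library is shown below; the model name and label count are illustrative.

```python
from transformers import AutoModelForSequenceClassification

# Full fine-tuning sketch: the PLM plus a sequence classification head, with every
# parameter trainable (model name and label count are illustrative).
model = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=3)
assert all(p.requires_grad for p in model.parameters())   # all weights are updated during training
```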

4.2 Adapter-based fine-tuning (A-B FT)

We integrate randomly initialized adapters into the PLM. During the fine-tuning phase, we specifically train the introduced adapter parameters while keeping the original PLM parameters frozen. For each downstream classification task, we train distinct sets of adapters. We experiment with both Houlsby [29] and Pfeiffer [30] adapter architectures.
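A minimal sketch of this setting, assuming the Adapter-Transformers `AutoAdapterModel` API (the task name and label count are illustrative):

```python
from transformers import AutoAdapterModel   # Adapter-Transformers API (assumed)

# Adapter-based fine-tuning sketch: a randomly initialized adapter and classification
# head are added per task, and only these are trained while the PLM stays frozen.
model = AutoAdapterModel.from_pretrained("xlm-roberta-base")
model.add_adapter("sentiment", config="houlsby")           # or config="pfeiffer"
model.add_classification_head("sentiment", num_labels=3)
model.train_adapter("sentiment")                           # freezes the PLM weights
```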

4.3 Prompt-based learning with soft prompt + soft verbalizer (SP+SV)

We employ a soft prompt (SP), which comprises artificial token embeddings, and a soft verbalizer (SV), which consists of artificial tokens as label words, together with the PLM, as proposed by Hambardzumyan et al. [27]. SP and SV replace traditional discrete tokens with artificial ones. During the training phase, we fine-tune the SP and SV while keeping the PLM parameters frozen.
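For illustration, the following is a minimal PyTorch sketch of the SP+SV idea: trainable prompt embeddings are prepended to the input, and a trainable verbalizer maps the mask-position logits to label scores while the PLM stays frozen. The prompt length, template, and names are assumptions; the actual experiments use the OpenPrompt framework.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
plm = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")
for p in plm.parameters():                      # the PLM stays frozen
    p.requires_grad = False

n_prompt, hidden, n_labels = 10, plm.config.hidden_size, 2
soft_prompt = nn.Parameter(torch.randn(n_prompt, hidden) * 0.02)          # soft prompt
soft_verbalizer = nn.Linear(plm.config.vocab_size, n_labels, bias=False)  # soft verbalizer

def sp_sv_forward(sentence: str) -> torch.Tensor:
    enc = tokenizer(f"{sentence} {tokenizer.mask_token}", return_tensors="pt")
    tok_embeds = plm.get_input_embeddings()(enc.input_ids)                 # (1, L, H)
    inputs_embeds = torch.cat([soft_prompt.unsqueeze(0), tok_embeds], dim=1)
    attn = torch.cat([torch.ones(1, n_prompt, dtype=enc.attention_mask.dtype),
                      enc.attention_mask], dim=1)
    logits = plm(inputs_embeds=inputs_embeds, attention_mask=attn).logits
    mask_pos = n_prompt + (enc.input_ids[0] == tokenizer.mask_token_id).nonzero()[0, 0]
    return soft_verbalizer(logits[0, mask_pos])                            # label scores

print(sp_sv_forward("I passed the exam").shape)   # torch.Size([2])
```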

5 Experimental setup

PMLMs excel in CMCS text classification by leveraging their contextual understanding and transfer learning capabilities. Their multilingual proficiency, derived from diverse training datasets, enables effective handling of language variations within the same text. In this study, we utilize the XLM-RoBERTa-base (XLM-R) [8] model as the PMLM for our experiments. This choice is motivated by the fact that XLM-R has been pre-trained on a wide range of languages, including the languages considered in our study. Moreover, it proves to be well-suited for our work, particularly within the constraints of a resource-efficient computing infrastructure. For full fine-tuning and adapter-based fine-tuning, we employ the code released by Rathnayake et al. [6]. We implement all the prompt-based learning models using the OpenPromptFootnote 3 [26] framework, which supports Hugging Face TransformersFootnote 4 and is built upon the PyTorch frameworkFootnote 5. For adapter-based implementations within OpenPrompt, we utilize the Adapter-TransformersFootnote 6 library, which is built on Hugging Face Transformers.

The datasets specified in Section 3 are partitioned into training, validation, and testing subsets in a stratified manner, with respective proportions of 80%, 10%, and 10% (statistics are provided in Appendix A). As suggested by Rathnayake et al. [6], we employ Random Oversampling (ROS) to address the class imbalance issue of the hate-speech detection task within the Sinhala-English CMCS dataset. Given the pronounced class imbalance in these datasets, we adopt the Macro F1-Score as our primary evaluation metric, as it facilitates a more consistent and reliable comparison.

All models are tested across three different seeds (8, 42, 77), and the average results are reported. The maximum sequence length for the input sentence is set at 128. We conduct each experiment for 20 epochs with a batch size of 32. Early stopping is employed in the experiments with a patience of 5 epochs. An evaluation is conducted at the end of each epoch, and the best-performing model is chosen for testing. We use the Adam optimizer as the gradient optimizer, paired with a linear learning rate scheduler. Additionally, a grid search is conducted for hyperparameter tuning to boost the performance of each model. The optimized hyperparameters for each experiment are delineated in Appendix B. All the experiments are conducted using NVIDIA Tesla P100 GPU machines on the KaggleFootnote 7 platform.
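For reference, the training configuration described above can be collected as follows (the dictionary and its key names are illustrative; the values are those stated in this section):

```python
# Training configuration used across experiments (collected from this section).
TRAIN_CONFIG = {
    "seeds": [8, 42, 77],            # results averaged over three seeds
    "max_seq_length": 128,
    "epochs": 20,
    "batch_size": 32,
    "early_stopping_patience": 5,    # evaluation at the end of every epoch
    "optimizer": "Adam",
    "lr_scheduler": "linear",
    "metric": "macro_f1",
}
```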

6 Impact of script variation and code-mixing intensity on CMCS text classification

The variations in CMCS data, as outlined in Table 2, underscore the unique properties and characteristics inherent to CMCS data. To better understand the complexities in handling CMCS text, consider an illustrative example: “I passed the විභාගය”. This sentence illustrates a classic instance of Sinhala-English code-mixing. The word “විභාගය” is a Sinhala term for “examination”. When entirely transliterated into the Latin script, it might read: “I passed the wibhagaya”.

For a model primarily trained on English data, the term might be unfamiliar. Conversely, for a model with extensive Sinhala training, the transliterated version “wibhagaya” might pose confusion.

To explore the influence of scripts on CMCS text classification, we conduct training on baseline models using the training set outlined in Section 5, which encompasses training samples from all scripts. Table 3 illustrates the script-wise results of this experiment. In the Sinhala-English context, despite the Latin script demonstrating the best performance in full fine-tuning, the Sinhala script outperforms the other two scripts in adapter-based fine-tuning and SP+SV. Conversely, in the Kannada-English context, the Latin script yields the highest performance in full fine-tuning and SP+SV, while the Mixed script excels in adapter-based fine-tuning. The sentences in the Kannada script exhibit the lowest results. It is evident that in both CMCS contexts, significant performance variations exist based on the script of the training instance.

Table 3 Results by Script obtained through training using the entire training dataset: Sentiment Classification

To delve further into the impact of script on CMCS text classification, we create distinct training sets by considering the language script of the training instances. These training sets are employed to train the models, each focusing on a single script, to investigate the impact of the training script. For both the Sinhala-English and Kannada-English datasets, we first create separate training sets based on script type, resulting in distinct portions for each script (e.g., for the Sinhala-English corpus, we have portions for Latin script only, Sinhala script only, and mixed script). The first experiment involves selecting a subset of training data from each script-based portion, ensuring that the size of each subset equals 10% of the total training data.

We then expand our analysis to include a larger subset, comprising 20% of the overall training data for each script category. In this phase, 20% of the training data is selected from the Latin script instances and another 20% from a combined subset of Sinhala/Kannada and mixed script instances, because the sentences in Sinhala/Kannada and mixed scripts individually constitute less than 20% of the overall training data. Each subset is stratified based on the label distribution of the task, ensuring a balanced representation of task labels within each script category. Subsequently, we train the PMLM utilizing the aforementioned training subsets.
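A minimal sketch of how such a script-specific, label-stratified subset can be drawn, assuming scikit-learn and a dataframe with `script` and `label` columns (both names are illustrative):

```python
from sklearn.model_selection import train_test_split

# Draw a subset from one script-based portion whose size equals a fraction of the
# *full* training set, stratified by the task label.
def script_subset(df, script, frac=0.10, seed=42):
    portion = df[df["script"] == script]
    n = int(frac * len(df))                      # size relative to the full training set
    if n >= len(portion):                        # portion smaller than the requested size
        return portion
    subset, _ = train_test_split(portion, train_size=n,
                                 stratify=portion["label"], random_state=seed)
    return subset
```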

The test dataset utilized in all experiments is as described in Section 5. Note that the Hindi-English dataset, consisting solely of Latin script instances, is excluded from this particular experiment. Also, note that this analysis is focused exclusively on the sentiment classification task.

Table 4 Script-Based Analysis: Sentiment Classification

Table 4 depicts that in the Sinhala-English context, training on Latin-script sentences leads to optimal results across all three baselines. With the 10% training dataset proportion, full fine-tuning shows similar outcomes when trained on either Sinhala-script or mixed-script sentences. With adapter-based fine-tuning, training on mixed-script sentences yields better performance compared to training on Sinhala-script sentences, whereas SP+SV shows better results with Sinhala-script compared to the mixed script. At a 20% training dataset proportion, training with a combination of Sinhala and mixed-script sentences results in lower performance compared to training with Latin-script sentences.

In the Kannada-English context, with the 10% training dataset proportion, training with mixed-script sentences excels over Latin and Kannada scripts in full fine-tuning and adapter-based fine-tuning, while SP+SV is most effective with Latin-script sentences. Training on Kannada-script sentences results in the lowest performance across all baselines. At a 20% training dataset proportion, the patterns between Latin-script and Kannada+mixed-script training resemble those at the 10% level.

The prompt-based learning baseline, SP+SV, reaches its highest performance with Latin-script training in both Sinhala-English and Kannada-English contexts. It can be observed that the performance of full fine-tuning, adapter-based fine-tuning, and SP+SV for both datasets exhibit significant fluctuations based on the training script.

Revisiting our example, when trained on Latin-script sentences, the model might proficiently classify the sentence “I passed the wibhagaya” due to the dominance of Latin content. However, the original sentence, “I passed the විභාගය”, which blends Sinhala and Latin scripts, might be more challenging, primarily contingent on the model's familiarity with Sinhala characters.

The findings from the aforementioned experiments make it evident that performance disparities are contingent on the script of the input sentence. We therefore conclude that CMCS text classification performance is significantly influenced by the inclusion of multiple scripts and the degree of code-mixing intensity.

7 Optimizing prompt-based learning through script-based adaptations

To address the limitations observed when employing soft prompts with small PLMs [33], as mentioned in Section 2.2, we conduct experiments with adapters in the context of prompt-based learning, referred to as AdapterPrompt. As elaborated in Section 6, the effectiveness of prompt-based learning for CMCS text classification depends on the script of the input text. To address this dependency, we posit that dynamically determining the prompt and verbalizer based on the input script, instead of using the same soft prompt and soft verbalizer for inputs of all scripts, could enhance the model's capability. To achieve this, we propose DynamicPrompt. Finally, we combine DynamicPrompt with adapters, forming a fusion of DynamicPrompt and AdapterPrompt that leverages the strengths of both, and present Dynamic+AdapterPrompt.

7.1 AdapterPrompt

As mentioned in Section 2.2, when utilizing soft prompts, the effectiveness of small pre-trained language models such as XLM-R diminishes, thereby reducing the efficacy of prompt-based learning [33]. To mitigate this, we utilize AdapterPrompt, while preserving the static state of the PLM parameters.

Fig. 2 AdapterPrompt without Adapter Dropping

Fig. 3 AdapterPrompt with Adapter Dropping

In AdapterPrompt, we integrate adapters with the SP+SV model to classify CMCS data, as depicted in Figure 2. Instead of solely querying the PLM using soft prompts as in the SP+SV approach, we incorporate task adapters into the PLM. This enhancement augments the task-specific representation for the underlying task.

We experiment with the two commonly used adapter architectures, Houlsby [29] and Pfeiffer [30], integrating both into SP+SV models. Additionally, employing the adapter-dropping technique [32], we progressively remove adapters, starting from the higher layers of the PLM, as illustrated in Figure 3. This iterative process aims to identify the optimal set of adapters necessary to effectively acquire the task-specific representations for the respective task associated with the adapters.
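A minimal sketch of adapter dropping, assuming the Adapter-Transformers config API; using `leave_out` to skip the top layer is our assumed mechanism for keeping adapters only in layers 0-10 of the 12-layer XLM-R base model:

```python
from transformers import AutoAdapterModel
from transformers.adapters import HoulsbyConfig   # Adapter-Transformers API (assumed)

# Adapter-dropping sketch: adapters are omitted from the listed layers via `leave_out`.
plm = AutoAdapterModel.from_pretrained("xlm-roberta-base")
plm.add_adapter("prompt_task", config=HoulsbyConfig(leave_out=[11]))  # drop the top-layer adapter
plm.train_adapter("prompt_task")                                      # PLM weights remain frozen
```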

7.2 DynamicPrompt

Our DynamicPrompt approach consists of separate SP+SV modelsFootnote 8, each optimized for a specific script category. Each model is trained exclusively with sentences from its respective script, yielding script-specific representations. While the PLM remains frozen, all SP+SV models share this common PLM.

Fig. 4 DynamicPrompt Architecture: A Sinhala-English Example. The input sentence translates into English as “I like to watch cricket matches”

The script of the input sentence is programmatically determined by the script identifier, as shown in Figure 4, based on the percentage of characters from each script, as elaborated in Section 3. A threshold of 100% is applied, meaning that if all characters in a sentence belong to one script, it is categorized under that script; otherwise, it is labelled as a mixed-script sentence.

Based on the identified script, the corresponding SP+SV model is selected dynamically. We then concatenate the soft prompt with the input sentence and feed it into the PLM, which predicts the masked token based on the surrounding context. Subsequently, the soft verbalizer maps the predicted answer tokens to the corresponding label using soft answer tokens.
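The routing logic of DynamicPrompt can be sketched as follows; `identify_script()` refers to the sketch in Section 3, and the class and attribute names are illustrative.

```python
# Routing sketch for DynamicPrompt: a separate SP+SV model per script category,
# selected at inference time by the script identifier.
class DynamicPrompt:
    def __init__(self, sp_sv_models):
        # e.g. {"Latin": sp_sv_latin, "Sinhala": sp_sv_sinhala, "Mixed": sp_sv_mixed},
        # where every SP+SV model wraps the same frozen PLM.
        self.sp_sv_models = sp_sv_models

    def classify(self, sentence: str):
        script = identify_script(sentence)     # programmatic script identification
        model = self.sp_sv_models[script]      # script-specific soft prompt + verbalizer
        return model(sentence)                 # label scores from the selected model
```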

7.3 Dynamic+AdapterPrompt

By combining the aforementioned approaches, we propose a novel prompt-based learning methodology termed Dynamic+AdapterPrompt. With the introduction of Dynamic+AdapterPrompt, we train separate SP+SV models, each augmented with adapters, for each script category. Each model is exclusively trained on sentences corresponding to its designated script, thereby providing a script-specific representation for enhanced classification.

The frozen PLM serves as the backbone shared across all the SP+SV models, with adapters being integrated into the PLM to encapsulate task-specific functionality. This strategy effectively capitalizes on the inherent strengths of both DynamicPrompt and AdapterPrompt.

Two architectural variants of Dynamic+AdapterPrompt can be implemented, employing distinct methods for integrating adapters into the PLM. For both variants, separate SP+SV models are employed for each script category. These variants are described in the following two sub-sections.

7.3.1 Dynamic+AdapterPrompt with shared adapters setting

As illustrated in Figure 5, we employ adapters that are shared across all SP+SV models. This means that each SP+SV model in Dynamic+AdapterPrompt shares the same set of adapters with the PLM. The goal of this approach is to allow the models to leverage common task-specific functionality through shared adapters while simultaneously benefiting from the script-specific representation provided by the separate SP+SV models.

Fig. 5 Dynamic+AdapterPrompt Architecture with Shared Adapters Setting: A Sinhala-English Example. The input sentence translates into English as “I like to watch cricket matches”

7.3.2 Dynamic+AdapterPrompt with distinct adapters setting

As depicted in Figure 6, we integrate distinct adapters for each SP+SV model. This approach involves using a common PLM across all SP+SV models, but each SP+SV model employs a separate set of adapters that are not shared among them. When the script of the input is identified, the set of adapters relevant to that script is activated along with the corresponding SP+SV model. The objective of this approach is to facilitate fine-tuning and adaptation specific to the characteristics of each script category.

Fig. 6 Dynamic+AdapterPrompt Architecture with Distinct Adapters Setting: A Sinhala-English Example. The input sentence translates into English as “I like to watch cricket matches”

In our experiments, we explore the first architectural variant, employing separate SP+SV models for each script category, while applying shared adapters across all models. Subsequently, in an ablation study detailed in Section 8.3, we investigate the second variant to compare the effectiveness of both architectures and to determine the impact and viability of such architectural variants. Within these variations, we experiment with the Houlsby adapter architecture.
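The difference between the two settings can be sketched as follows, again assuming the Adapter-Transformers API; the adapter names, the script list, and `identify_script()` (from the Section 3 sketch) are illustrative assumptions.

```python
from transformers import AutoAdapterModel   # Adapter-Transformers API (assumed)

plm = AutoAdapterModel.from_pretrained("xlm-roberta-base")

# Shared setting: one task adapter, used by every script-specific SP+SV model.
plm.add_adapter("task_shared", config="houlsby")

# Distinct setting: one adapter per script category, activated to match the input script.
for script in ["Latin", "Sinhala", "Mixed"]:
    plm.add_adapter(f"task_{script}", config="houlsby")

def activate_for(sentence: str) -> None:
    plm.set_active_adapters(f"task_{identify_script(sentence)}")
```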

8 Evaluation and analysis

In this section, we evaluate the effectiveness of the proposed Dynamic+AdapterPrompt approach compared to the baselines. Subsequently, we conduct two ablation studies, with a particular focus on the sentiment classification task in the Sinhala-English context: (1) a script-based analysis to examine the impact of different scripts, and (2) a comparative study to explore the effectiveness of the two adapter integration architectures within Dynamic+AdapterPrompt - shared adapters and script-wise adapters. Following these analyses, an error analysis is conducted to identify common issues related to misclassified sentences within the sentiment classification task for the Sinhala-English context using the Dynamic+AdapterPrompt approach. The training and test datasets utilized in all experiments are as outlined in Section 5. Throughout this section, Ac. denotes Accuracy, while Precision (Pr.), Recall (Re.), and F1 correspond to macro averages.

8.1 Overall evaluation

We first conduct a detailed study to determine the optimal adapter architecture for prompt-based learning in CMCS text classification. In the results for the Sinhala-English sentiment analysis task, as depicted in Appendix C, the Houlsby architecture, with adapters activated in layers 0-10, yields the highest performance in AdapterPrompt. Therefore, the results reported in Tables 5, 6, and 7 are obtained with the Houlsby adapter architecture.

Table 5 Overall Results: Sentiment Classification
Table 6 Overall Results: Hate-Speech Detection
Table 7 Overall Results: Humour Detection

In the baseline evaluation presented in Tables 5, 6, and 7, for the majority class, minority class, and random baselines, it is observed that the random baseline outperforms the majority and minority class baselines. However, all three of these baselines exhibit significantly lower performance in comparison to the baselines associated with PLMs.

Notably, the SP+SV approach outperforms the XLM-R full fine-tuning in all cases, except in the Hindi-English context. It achieves superior or competitive results compared to adapter-based fine-tuning, except for hate-speech detection in the Sinhala-English context. The enhanced performance of SP+SV over full fine-tuning can be traced back to the substantial discrepancy in objectives during the pre-training and fine-tuning phases within the fine-tuning paradigm, which hinders the full exploitation of knowledge within PLMs, as we discussed in Section 2.1.

AdapterPrompt consistently demonstrates superiority over the baseline results in sentiment classification, hate-speech detection, and humour detection as shown in Tables 5, 6, and 7 across all language contexts, except hate-speech detection in the Sinhala-English context. This observation is aligned with previous research that highlights the effectiveness of integrating adapters into the PLM within the fine-tuning paradigm for improving CMCS text classification [6]. Importantly, our findings reiterate this trend, emphasizing that even within the prompt-based learning paradigm, the integration of adapters into the PLM results in performance improvements. This strategic use of adapters significantly enhanced the SP+SV approach, showcasing a substantial improvement in the model’s understanding of specific task intricacies by providing task-specific representations.

DynamicPrompt exhibits lower performance compared to the baselines in all language contexts, except for sentiment classification in the Sinhala-English context. Notably, DynamicPrompt yields inferior results compared to AdapterPrompt across all tasks and language contexts. In Section 8.2, we delve into an in-depth analysis of the models’ performance across different script categories.

Despite the lower performance observed with DynamicPrompt, the Dynamic+AdapterPrompt approach outperforms the results of DynamicPrompt and AdapterPrompt in the majority of tasks across all language contexts (except for sentiment classification). This improvement can be attributed to the adapters’ ability to learn task-specific representations, while the soft prompt and soft verbalizer within each model acquire script-specific knowledge for classification. This underscores the proficiency of the combined approach in adeptly addressing challenges intrinsic to both script and task across various language combinations. Note that in the context of humour detection in Hindi-English, the DynamicPrompt and Dynamic+AdapterPrompt techniques are not explored, primarily because the dataset is entirely in the Latin script.

In summary, the SP+SV approach demonstrates superior or competitive results compared to XLM-R full fine-tuning and adapter-based fine-tuning for most tasks, with only a few exceptions. AdapterPrompt consistently outperforms baseline results, showcasing the effectiveness of integrating adapters into the PLM within the prompt-based learning paradigm. DynamicPrompt alone exhibits lower overall performance. However, the combination of DynamicPrompt and AdapterPrompt, Dynamic+AdapterPrompt, emerges as the most effective strategy. This underscores the benefits of leveraging both script-based prompts and adapters to address the intricacies of script and task variations in CMCS text classification.

8.2 Script-based analysis

Section 6 unveiled a significant variance in performance, particularly in the context of prompt-based learning, depending on the script of the input sentence. In this ablation study, we further analyze the impact of the script, employing the sentiment classification task in the Sinhala-English context as a case study.

Table 8 Results by Script: Sentiment Classification for Sinhala-English

Table 8 presents the script-wise results of the sentiment classification task for Sinhala-English, further highlighting the discrepancy introduced by the inclusion of multiple scripts. Despite the integration of adapters into SP+SV (AdapterPrompt), this variance persists. This can be ascribed to the adapters providing only task-specific representations, which may not fully rectify the script-related disparities. However, it is noteworthy that AdapterPrompt has demonstrated notable effectiveness when the input is in a single script.

Although DynamicPrompt has a relatively lower performance, it has reduced the script’s influence on CMCS text classification, as demonstrated in Table 8. This method is effective because DynamicPrompt provides script-specific representations, making it more robust against script variations compared to fine-tuning, SP+SV, and AdapterPrompt. This underscores its contribution to addressing challenges related to including multiple scripts in CMCS text classification.

When considering Dynamic+AdapterPrompt, the outcomes indicate that the integration of adapters led to a variance in performance, similar to the observations in AdapterPrompt. However, it narrows the gap between Latin and Mixed scripts compared to SP+SV and AdapterPrompt, due to the script-specific representations provided by DynamicPrompt.

In conclusion, DynamicPrompt helps reduce script variations, and AdapterPrompt enhances performance with task-specific representations. The combination, Dynamic+AdapterPrompt, achieves even better results by leveraging the strengths of both approaches.

8.3 Evaluating the efficacy of the architecture in Dynamic+AdapterPrompt

In Section 7.3, we employ the shared adapter architecture; however, it is crucial to note that the script-wise adapter architecture remains a viable alternative. To explore the comparative effectiveness of these two adapter integration architectures, we conduct an ablation study, and the results are presented in Table 9. The superiority observed in the shared adapter architecture can be attributed to its capacity to develop a uniform task representation across all scripts. By sharing adapters across all SP+SV models, the shared adapters in the Dynamic+AdapterPrompt model can benefit from the knowledge obtained from the entire training dataset. Conversely, script-wise adapters are confined to learning solely from the samples of a specific script with which they are associated, resulting in a more circumscribed and script-dependent understanding. Consequently, in the Dynamic+AdapterPrompt, the shared adapter architecture enables the model to leverage a more extensive spectrum of training data, leading to a performance that is markedly superior to that of the script-wise adapter architecture.

Table 9 Dynamic+AdapterPrompt Results by Architecture: Sentiment Classification for Sinhala-English

8.4 Error analysis

To identify issues related to misclassified sentences, we conduct an error analysis on the Sinhala-English dataset in the sentiment classification task. For this analysis, we employ the Dynamic+AdapterPrompt approach for handling the intricacies introduced by the inclusion of multiple scripts in CMCS data. Based on our analysis, we have identified the following issues with misclassified sentences:

  1. Context-specific sentiment in an input sentence: An input sentence may convey sentiment in a specific context that is not apparent to the model. For example, the sentence “Matews kiyanne poll buruwek” was labelled as Negative but predicted as Neutral. This is a Sinhala sentence written in Latin script, where “poll buruwek” means “idiot”. The model may not understand this context, especially since the word “poll” has an entirely different meaning in English.

  2. Words highlighting polarity: Words that emphasize the positive or negative polarity of a sentence can impact the prediction. For example, the sentence “Nidokin ara dilshan ain wena ekamy hoda...” was labelled as Negative but predicted as Positive. The word “hoda”, a Sinhala term written in Latin script meaning “good”, highlights the positive sentiment of the sentence.

9 Conclusion and future work

In this paper, we explore the potential of prompt-based learning for CMCS text classification, providing a thorough investigation into the impact of the script on classifying CMCS text. Our comprehensive experiments reveal that the effectiveness of prompt-based CMCS text classification is significantly affected by the inclusion of multiple scripts and the intensity of code-mixing. In light of these findings, we introduce a novel prompt-based tuning method named Dynamic+AdapterPrompt. We employ separate models for each script category, integrated with adapters to encapsulate the script-specific representation and the task-oriented functionality of CMCS text. The experimental results prove that our proposed method outperforms strong baselines across various CMCS contexts and text classification tasks. This underscores its robustness and efficiency in classifying CMCS text, particularly involving low-resource languages.

Our proposed approach, Dynamic+AdapterPrompt, is suitable for CMCS text where the scripts are distinguishable from each other (such as Sinhala and English). For CMCS contexts where the scripts are the same or similar (such as German and English), replacing or supplementing script identification with language identification would be effective; we leave that for future work. Additionally, our approach requires balanced data for each script in the dataset to yield optimal results. However, balanced datasets, especially in low-resource settings, are typically challenging to obtain, potentially limiting the advancement of CMCS classification using this approach.

As part of our future work, we intend to delve into the application of multi-task learning for code-mixed text classification, leveraging prompt-based learning techniques. Future research could also validate the generalizability of Dynamic+AdapterPrompt by incorporating a wider range of language scripts and classification tasks. Additionally, exploring adapter fine-tuning variants, such as LoRA (Low-Rank Adaptation), in conjunction with the proposed approach could provide valuable insights. We have released our code to facilitate future researchFootnote 9.