- Introduction
- Features
- Requirements
- Build and installation
- Customization and contribution
- Notable users
- How to cite
KonText is an advanced corpus query interface and corpus data integration platform built around the corpus search engine Manatee-open. It is written in Python 3 and TypeScript and runs on any major Linux distribution. Development is maintained by the Institute of the Czech National Corpus.
- fully editable query chain
  - any operation in a user-defined sequence (e.g. query -> filter -> sample -> sorting) can be changed, and the whole sequence is then re-executed
- simple and advanced query types
  - advanced CQL editor with syntax highlighting and attribute recognition
  - interactive PoS tag composing tool for positional and key-value tagsets
  - customizable query suggestions and simple type query refinement (e.g. for homonym disambiguation)
- support for spoken corpora
  - defined text segments can be played back as audio
  - KWIC detail with easily distinguishable speeches
- rich concordance view options and tools
  - any positional attribute can be set as primary
  - multiple ways to display other attributes
  - user-defined line groups: filtering, reviewing group ratios
  - tokens and KWICs can be connected to external data services (e.g. dictionaries, encyclopedias)
- rich subcorpus-related functionality
  - a subcorpus can be either private or published
  - text types metadata can be gradually refined to a specific subcorpus ("which publishers are there in case only fiction is selected?")
  - a custom text types ratio can be defined ("give me 20% fiction and 80% journalism")
- frequency distribution
  - univariate
    - positional attributes (including tuples of multiple attributes per token)
    - structural attributes
  - multivariate distribution (2 dimensions) for both positional and structural attributes
- collocation analysis
- persistent URLs: any result page can be easily shared even if the original query is megabytes long
- access to previous queries, named queries
- convenient corpus access
  - finding a corpus by keyword (tag), size, or description
  - adding a corpus to favorites (incl. subcorpora, aligned corpora)
- saving results to Excel, CSV, XML, TXT
- integrability with existing information systems
- modern client-side application (written in TypeScript, event stream architecture, React components, extensible)
- server-side written as a WSGI application with fully decoupled background concordance/frequency/collocation calculation (using an integrated worker server)
- modular code design with dynamically loadable plug-ins providing custom functionality implementation (e.g. custom database adapters, authentication method, corpus listing widgets, HTTP session management)
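The "fully editable query chain" feature listed above can be pictured with a small sketch. This is not KonText's actual implementation: the `Op` class, `run_chain`, `edit_op`, and the toy corpus are hypothetical names invented for illustration, and the "sample" step is a deterministic slice rather than a random sample. It only shows the general idea that changing one step means re-running the whole sequence from the start.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in: a "concordance" is just a list of text lines.
Lines = List[str]


@dataclass
class Op:
    """One step of the chain (illustrative, not a KonText class)."""
    name: str
    apply: Callable[[Lines], Lines]


def run_chain(corpus: Lines, chain: List[Op]) -> Lines:
    """Execute every operation in order, feeding each result to the next."""
    result = corpus
    for op in chain:
        result = op.apply(result)
    return result


def edit_op(chain: List[Op], index: int, new_op: Op) -> List[Op]:
    """Return a new chain with one step replaced; nothing is executed here."""
    return chain[:index] + [new_op] + chain[index + 1:]


corpus = ["the cat sat", "a dog ran", "the dog sat", "a cat ran", "the cat ran"]

chain = [
    Op("query", lambda ls: [l for l in ls if "cat" in l]),
    Op("filter", lambda ls: [l for l in ls if l.startswith("the")]),
    Op("sample", lambda ls: ls[:2]),  # simplified: first two lines, not random
    Op("sort", lambda ls: sorted(ls)),
]

first = run_chain(corpus, chain)

# Change only the initial query, then re-execute the whole sequence.
edited = edit_op(chain, 0, Op("query", lambda ls: [l for l in ls if "dog" in l]))
second = run_chain(corpus, edited)

print(first)
print(second)
```

Keeping the chain as plain data and re-running it from the beginning is the simplest way to guarantee that later steps (filter, sample, sort) always reflect the edited step.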
- Python 3.6 (or newer)
- Manatee corpus search engine - version 2.167.8 and onwards
- a key-value storage
- a task queue - Rq (recommended), Celery task queue (supported)
- HTTP proxy server
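One plausible way a key-value storage can support the persistent URLs mentioned among the features is to store the full query under a short, content-derived key and put only that key in the URL. The sketch below is an assumption made for illustration, not a description of KonText's internals: the in-memory `store` dictionary stands in for a real key-value database, and `persist_query`, `load_query`, `result_url`, the `query_id` parameter, and the `example.org` address are all hypothetical.

```python
import hashlib
import json
from typing import Dict, List

# Hypothetical in-memory stand-in for a real key-value storage.
store: Dict[str, str] = {}


def persist_query(ops: List[dict]) -> str:
    """Serialize the query operations and store them under a short hash key."""
    payload = json.dumps(ops, sort_keys=True)
    key = hashlib.sha1(payload.encode("utf-8")).hexdigest()[:12]
    store[key] = payload
    return key


def load_query(key: str) -> List[dict]:
    """Restore the original operations from the stored payload."""
    return json.loads(store[key])


def result_url(base: str, key: str) -> str:
    """Build a short shareable URL (the parameter name is made up)."""
    return f"{base}?query_id={key}"


# A deliberately huge query: the URL stays short regardless of its size.
long_query = [{"op": "query", "cql": '[word="a"]' * 100000}]
key = persist_query(long_query)
url = result_url("https://example.org/view", key)
print(url)
```

Because the key is derived from the query content, persisting the same query twice yields the same key, so identical queries share one URL.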
KonText provides a script for automatic installation to an existing Ubuntu system. The easiest way to install KonText is to create an LXC/LXD container, clone the repository there and run the script. On a decently fast network, the whole process takes only a couple of seconds. Please refer to the doc/INSTALL.md file for details.
For customization and contribution guidelines, please refer to our Wiki.
- Institute of the Czech National Corpus
- LINDAT/CLARIAH-CZ
- CLARIN-PL
- CLARIN-SI
- Інститут української
- Serbski Institut (API version of KonText)
Tomáš Machálek (2020) - KonText: Advanced and Flexible Corpus Query Interface
@inproceedings{machalek-2020-kontext,
title = "{K}on{T}ext: Advanced and Flexible Corpus Query Interface",
author = "Mach{\'a}lek, Tom{\'a}{\v{s}}",
booktitle = "Proceedings of the 12th Language Resources and Evaluation Conference",
month = may,
year = "2020",
address = "Marseille, France",
publisher = "European Language Resources Association",
url = "https://www.aclweb.org/anthology/2020.lrec-1.865",
pages = "7003--7008",
language = "English",
ISBN = "979-10-95546-34-4",
}