
Introduction

KonText is an advanced corpus query interface and corpus data integration platform built around the corpus search engine Manatee-open. It is written in Python 3 and TypeScript and runs on any major Linux distribution. Development is maintained by the Institute of the Czech National Corpus.

Features

- fully editable query chain
  - any operation in a user-defined sequence (e.g. query -> filter -> sample -> sorting) can be changed, and the whole sequence is then re-executed
- simple and advanced query types
  - advanced CQL editor with syntax highlighting and attribute recognition
  - interactive PoS tag composing tool for positional and key-value tagsets
  - customizable query suggestions and simple-type query refinement (e.g. for homonym disambiguation)
- support for spoken corpora
  - defined text segments can be played back as audio
  - KWIC detail with easily distinguishable speeches
- rich concordance view options and tools
  - any positional attribute can be set as primary
  - multiple ways to display other attributes
  - user-defined line groups: filtering, reviewing group ratios
  - tokens and KWICs can be connected to external data services (e.g. dictionaries, encyclopedias)
- rich subcorpus-related functionality
  - a subcorpus can be either private or published
  - text type metadata can be gradually refined to a specific subcorpus ("which publishers are there in case only fiction is selected?")
  - a custom text type ratio can be defined ("give me 20% fiction and 80% journalism")
- frequency distribution
  - univariate
    - positional attributes (including tuples of multiple attributes per token)
    - structural attributes
  - multivariate distribution (2 dimensions) for both positional and structural attributes
- collocation analysis
- persistent URLs: any result page can be easily shared even if the original query is megabytes long
- access to previous queries, named queries
- convenient corpus access
  - finding a corpus by keyword (tag), size, or description
  - adding a corpus to favorites (incl. subcorpora and aligned corpora)
- saving results to Excel, CSV, XML, TXT
- integrability with existing information systems
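
To illustrate the idea behind the editable query chain, here is a minimal Python sketch. It is not KonText's actual implementation: the `Operation` class, `run_chain`, `replace_operation` and the tiny in-memory "corpus" are all hypothetical names invented for this example. It only shows the principle that changing one step means re-running the whole sequence from the original data.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Lines = List[str]


@dataclass(frozen=True)
class Operation:
    """One step of a chain (hypothetical model, not a KonText class)."""
    name: str
    apply: Callable[[Lines], Lines]


def run_chain(source: Sequence[str], chain: Sequence[Operation]) -> Lines:
    """Execute every operation in order, starting from the source lines."""
    lines = list(source)
    for op in chain:
        lines = op.apply(lines)
    return lines


def replace_operation(chain: Sequence[Operation], index: int,
                      new_op: Operation) -> List[Operation]:
    """Return a copy of the chain with one step swapped out."""
    updated = list(chain)
    updated[index] = new_op
    return updated


# A toy stand-in for corpus lines (assumption: real data comes from Manatee).
corpus = ["the cat sat", "a dog barked", "the dog slept", "the cat purred"]

chain = [
    Operation("query", lambda ls: [l for l in ls if "the" in l.split()]),
    Operation("filter", lambda ls: [l for l in ls if "cat" in l]),
    Operation("sort", lambda ls: sorted(ls)),
]
first = run_chain(corpus, chain)

# Edit the middle step and re-execute the whole sequence.
edited = replace_operation(
    chain, 1, Operation("filter", lambda ls: [l for l in ls if "dog" in l]))
second = run_chain(corpus, edited)

print(first)
print(second)
```

Keeping each step as a value and re-running from the source keeps later steps (here, sorting) consistent with the edited step without any special-case update logic.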

Internal features

- modern client-side application (written in TypeScript, event stream architecture, React components, extensible)
- server side written as a WSGI application with fully decoupled background concordance/frequency/collocation calculation (using an integrated worker server)
- modular code design with dynamically loadable plug-ins providing custom implementations of functionality (e.g. custom database adapters, authentication methods, corpus listing widgets, HTTP session management)
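
The general technique of dynamically loadable plug-ins can be sketched in a few lines of Python. This is an illustration only and does not reproduce KonText's plug-in loader: the `PLUGIN_CONF` mapping, the `load_plugins` function and the "serializer" slot are hypothetical, and the standard-library `json` module stands in for a real plug-in module.

```python
import importlib
from types import ModuleType
from typing import Dict, Iterable

# Hypothetical configuration: plug-in slot -> importable module path.
# The stdlib "json" module is used here only as a stand-in plug-in.
PLUGIN_CONF = {"serializer": "json"}

# Names every "serializer" plug-in must provide (assumed interface).
REQUIRED_ATTRS = ("dumps", "loads")


def load_plugins(conf: Dict[str, str],
                 required: Iterable[str] = REQUIRED_ATTRS) -> Dict[str, ModuleType]:
    """Import each configured module at runtime and verify its interface."""
    plugins = {}
    for slot, module_path in conf.items():
        module = importlib.import_module(module_path)
        missing = [attr for attr in required if not hasattr(module, attr)]
        if missing:
            raise ImportError(
                f"plug-in {module_path!r} for slot {slot!r} lacks {missing}")
        plugins[slot] = module
    return plugins


plugins = load_plugins(PLUGIN_CONF)
encoded = plugins["serializer"].dumps({"corpus": "demo"})
print(encoded)
```

Because the module path comes from configuration, swapping an implementation (for example a different database adapter) needs no change in the calling code, as long as the replacement exposes the same interface.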

Requirements

- Python 3.6 (or newer)
  - WSGI-compatible server: Gunicorn (recommended), uWSGI (supported)
  - Werkzeug web application library
  - Jinja2 template engine
  - lxml library
  - PyICU library (optional but preferred)
  - markdown library (optional, for formatted corpora references)
  - openpyxl library (optional, for XLSX export)
  - Babel library
- Manatee corpus search engine, version 2.167.8 or newer
- a key-value storage
  - Redis (recommended), SQLite (supported), custom implementations possible
- a task queue: RQ (recommended), Celery (supported)
- an HTTP proxy server
  - Nginx (recommended), Apache, ...
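
As a quick sanity check before installation, the Python-side requirements above could be verified with a short script like the following. This is a sketch and not part of KonText: the `check_environment` and `missing_modules` helpers are hypothetical, the import names (`icu` for PyICU, `babel` for Babel, and so on) are the packages' usual module names, and Manatee, the key-value storage, the task queue and the proxy server are not checked.

```python
import importlib.util
import sys

# Import names of the Python libraries listed above.
REQUIRED = ["werkzeug", "jinja2", "lxml", "babel"]
OPTIONAL = ["icu", "markdown", "openpyxl"]  # "icu" is PyICU's module name


def missing_modules(names):
    """Return the module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(min_python=(3, 6)):
    """Collect a small report on the interpreter version and libraries."""
    return {
        "python_ok": sys.version_info >= min_python,
        "missing_required": missing_modules(REQUIRED),
        "missing_optional": missing_modules(OPTIONAL),
    }


report = check_environment()
print(report)
```

An empty `missing_required` list together with `python_ok` being true suggests the Python dependencies are in place; missing optional modules only disable the features noted in the list.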

Build and installation

KonText provides a script for automatic installation on an existing Ubuntu system. The easiest way to install KonText is to create an LXC/LXD container, clone the repository there and run the script. On a decently fast network, the whole process takes only a few minutes. Please refer to the doc/INSTALL.md file for details.

Customization and contribution

Please refer to our Wiki.

Notable users

Institute of the Czech National Corpus
LINDAT/CLARIAH-CZ
CLARIN-PL
CLARIN-SI
Інститут української
Serbski Institut (API version of KonText)

How to cite KonText

Tomáš Machálek (2020) - KonText: Advanced and Flexible Corpus Query Interface

@inproceedings{machalek-2020-kontext,
    title = "{K}on{T}ext: Advanced and Flexible Corpus Query Interface",
    author = "Mach{\'a}lek, Tom{\'a}{\v{s}}",
    booktitle = "Proceedings of the 12th Language Resources and Evaluation Conference",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://www.aclweb.org/anthology/2020.lrec-1.865",
    pages = "7003--7008",
    language = "English",
    ISBN = "979-10-95546-34-4",
}

