Geolocation prediction for a given Tweet or short text. The system trains a neural network, as described in
Philippe Thomas and Leonhard Hennig (2017), "Twitter Geolocation Prediction using Neural Networks." In Proceedings of GSCL 2017.
To train models, training data (tweets and gold labels) needs to be retrieved. As Tweets cannot be shared directly, we refer to the WNUT'16 workshop page for further information.
After the training files have been retrieved, the preprocess script converts the tweets into the representation used to train a neural network. Models can be trained from scratch using the trainindividual script. Pretrained models are available in HDF5 format here. Additionally, some information about the model and the preprocessors (e.g., tokenizers) is required; it is provided here. The evaluation of models is implemented here.
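The following is a minimal, self-contained sketch of what the preprocessing and training steps amount to for the text branch. The tokenizer settings, layer choices, and file names are illustrative assumptions and not the repository's actual configuration:

# Minimal sketch of the text-branch pipeline (illustrative only): fit a
# tokenizer on tweet texts, pad the index sequences, and train a small
# classifier over city labels.
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# Toy training data: tweet texts with gold city labels
texts = ["Montmartre is truly beautiful", "Walking along the Thames tonight"]
labels = ["paris-a875-fr", "city of london-enggla-gb"]

# Preprocessing: texts -> padded integer sequences
tokenizer = Tokenizer(num_words=50000)
tokenizer.fit_on_texts(texts)
x = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=10)

# Gold labels -> class indices
classes = sorted(set(labels))
y = np.asarray([classes.index(label) for label in labels])

# Text branch: embedding + LSTM + softmax over city classes
model = Sequential([
    Embedding(input_dim=50000, output_dim=100, input_length=10),
    LSTM(100),
    Dense(len(classes), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x, y, epochs=1, batch_size=2)
model.save("textBranch.h5")

In the actual repository, the preprocess script persists the fitted tokenizers and sequence lengths (processors.obj and vars.obj, used below), so that inputs at prediction time are encoded exactly as during training.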
Alternatively, we provide a Docker container here, which contains processed data (e.g., tokenizers), pretrained models, evaluation data, and scripts. Extract, load, and connect to the container using:
unlzma geolocation.docker.lzma
docker load --input geolocation.docker
docker run -it geolocation:v1 bash
Evaluate tweet-level and user-level performance with:
python3 /root/code/EvaluateTweet.py
python3 /root/code/EvaluateUser.py
The code below briefly shows how to use our neural network trained on text only. For other examples (e.g., using Twitter text and metadata), see the two evaluation scripts.
from keras.models import load_model
import pickle
from keras.preprocessing.sequence import pad_sequences
import numpy as np
# Load the pretrained text-only model
textBranch = load_model('data/w-nut-latest/models/textBranchNorm.h5')

# Load tokenizers, encoders, and the class mapping
with open("data/w-nut-latest/binaries/processors.obj", 'rb') as file:
    descriptionTokenizer, domainEncoder, tldEncoder, locationTokenizer, sourceEncoder, textTokenizer, nameTokenizer, timeZoneTokenizer, utcEncoder, langEncoder, timeEncoder, placeMedian, classes, colnames = pickle.load(file)

# Load the sequence-length properties used during training
with open("data/w-nut-latest/binaries/vars.obj", 'rb') as file:
    MAX_DESC_SEQUENCE_LENGTH, MAX_LOC_SEQUENCE_LENGTH, MAX_TEXT_SEQUENCE_LENGTH, MAX_NAME_SEQUENCE_LENGTH, MAX_TZ_SEQUENCE_LENGTH = pickle.load(file)
# Predict a location for an example text
testTexts = ["Montmartre is truly beautiful"]
textSequences = textTokenizer.texts_to_sequences(testTexts)
textSequences = np.asarray(textSequences)
textSequences = pad_sequences(textSequences, maxlen=MAX_TEXT_SEQUENCE_LENGTH)
predict = textBranch.predict(textSequences)
# Print the top 5 predictions
for index in reversed(predict.argsort()[0][-5:]):
    print("%s with score=%.3f" % (colnames[index], float(predict[0][index])))

This prints output similar to:
paris-a875-fr with score=0.275
city of london-enggla-gb with score=0.079
boulogne billancourt-a892-fr with score=0.032
saint denis-a893-fr with score=0.024
meaux-a877-fr with score=0.015
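The predicted class can be mapped back to approximate coordinates via placeMedian, which was loaded from processors.obj above and stores median training coordinates per class. The lookup below is a sketch that assumes placeMedian is a dictionary keyed by the class name and holding a (latitude, longitude) pair; verify the exact structure against the pickled object before relying on it.

# Sketch: map the most probable class back to coordinates via placeMedian.
# Assumption: placeMedian is a dict keyed by class name with a
# (latitude, longitude) value; check the pickled processors before use.
best = colnames[int(predict.argsort()[0][-1])]
if best in placeMedian:
    lat, lon = placeMedian[best]
    print("%s -> lat=%.3f, lon=%.3f" % (best, lat, lon))
else:
    print("No median coordinates stored for %s" % best)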