this is no way near an OCR. this project contains some perfect test images of sinhala characters (generated) and a test CNN (convolution neural network) based on lenet to train the data set in order to identify images with sinhala characters.
here we have used caffe
for implementing and training the network. do anything you want with the provided dataset.
.
├── commands.sh # contains useful caffe commands
├── imgs # dataset
│ ├── adjusted # traing images with label files (TRAIN & TEST phase)
│ ├── labels_unicode.txt
│ └── unicode # a new dataset to evaluvate trained model
├── lenet.prototxt # TEST model
├── lenet_solver.prototxt # caffe solver
├── lenet_train_test.prototxt # TRAIN model
├── map.txt # mapping of labels / unicode char / other sinhala fonts
├── model # output models / solverstate directory
├── node_modules
├── README.md
├── test.py # pythin script to evaluvate trained model
└── textgen.js # script to generate sinhala chars
ones you have defined lenet_train_test.prototxt
and configured solver lenet_solver.prototxt
start training using following commands.
view commands.sh
file.
use the test.py
to test the trained model on your new images.
sudo apt-get install libcairo2-dev libjpeg8-dev libpango1.0-dev libgif-dev build-essential g++
you need to have all the fonts mentioned in the code installed in your system. make sure font names don't have any spaces or escape characters.
npm install
node textgen.js
created image will be saved in imgs/
directory.