Offline indexing for documentation to make it searchable.
- Clone this repository
- Clone https://github.com/cloudfoundry/docs-bosh (leave the name
docs-bosh) cd pythonmake(assumes you havepython3installed)
This will index the documentation, creating index_*.json files in the current directory.
Now that we have the indexes, we can feed it into a search engine to test things out. Let's copy the indexes into a place where the web app can find it.
$ make syncNow that the data is copied, we prepare and launch the web app. First, we setup the python libraries needed to run the app:
pip install -r requirements.txt
Next we need to configure some essential variables for the app:
source dev_variables(if you typeecho $DOCS_URLand get nothing, something went wrong)
We're now ready to launch the app:
$ make runIf you get an error about ports, edit Makefile and change 8001 to something else. If all went well, you should now navigate to http://localhost:8001 to play with the search engine.
Currently two types of search are possible:
- search/title/[query]
- search/content/[query]
As the names suggest, the first matches documents based on title, the second based on the entire contents.
- 01/15/17
- started project in Golang
- rewrote initial project in Python
- introduced stop words
- 01/16/17
- introduced punctuation
- finished a basic prototype
- introduced a search client
- 01/20/17
- introduced LocalRepository, Document
- tokenize entire contents of docs, not just titles
- store positional info on tokens
- useful for efficient phrasal queries
- supports induction of titles (line_num == 2)
- decided not to do stemming
- 01/21/17
- introduced TitleIndexer, ContextIndexer
- added options to search by title or by content in the web app
- refactored
repomodule - extracted dict to json routine into a utility function
- The set of stop words is from Kevin Bougé.