This repository was archived by the owner on Dec 10, 2021. It is now read-only.
Merged
1 change: 1 addition & 0 deletions CHANGES.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/).

### Added

- Support bulk update #41
- Support Badger #38
- Add index stats #37
- Add Wikipedia example #35
129 changes: 114 additions & 15 deletions README.md
@@ -129,7 +129,7 @@ $ make \
#### macOS

```bash
$ make GOOS=darwin \
$ make \
GOOS=darwin \
BUILD_TAGS="kagome icu libstemmer cld2 cznicb leveldb badger" \
CGO_ENABLED=1 \
@@ -166,6 +166,41 @@ blast-index
```


## Testing Blast

If you want to test your changes, run command like following:

```bash
$ make \
test
```

You can test with all the Bleve extensions supported by Blast as follows:


### Linux

```bash
$ make \
BUILD_TAGS="kagome icu libstemmer cld2 cznicb leveldb badger" \
CGO_ENABLED=1 \
test
```


#### macOS

```bash
$ make \
BUILD_TAGS="kagome icu libstemmer cld2 cznicb leveldb badger" \
CGO_ENABLED=1 \
CGO_LDFLAGS="-L/usr/local/opt/icu4c/lib" \
CGO_CFLAGS="-I/usr/local/opt/icu4c/include" \
build
```



## Starting Blast index node

Running a Blast index node is easy. Start Blast data node like so:
@@ -188,7 +223,15 @@ You can now put, get, search and delete the documents via CLI.
For document indexing, execute the following command:

```bash
$ cat ./example/doc_enwiki_1.json | xargs -0 ./bin/blast-index index --grpc-addr=:5050 enwiki_1
$ cat ./example/doc_enwiki_1.json | xargs -0 ./bin/blast-index index --grpc-addr=:5050 --id=enwiki_1
```

You can see the result in JSON format. The result of the above command is:

```bash
{
"count": 1
}
```


@@ -197,7 +240,7 @@ $ cat ./example/doc_enwiki_1.json | xargs -0 ./bin/blast-index index --grpc-addr
Getting a document is as following:

```bash
$ ./bin/blast-index get --grpc-addr=:5050 enwiki_1
$ ./bin/blast-index get --grpc-addr=:5050 --id=enwiki_1
```

You can see the result in JSON format. The result of the above command is:
@@ -390,7 +433,49 @@ Please refer to following document for details of search request and result:
Deleting a document is as following:

```bash
$ ./bin/blast-index delete --grpc-addr=:5050 enwiki_1
$ ./bin/blast-index delete --grpc-addr=:5050 --id=enwiki_1
```

You can see the result in JSON format. The result of the above command is:

```bash
{
"count": 1
}
```


### Indexing documents in bulk via CLI

Indexing documents in bulk, run the following command:

```bash
$ cat ./example/docs_wiki.json | xargs -0 ./bin/blast-index index --grpc-addr=:5050
```

You can see the result in JSON format. The result of the above command is:

```bash
{
"count": 4
}
```
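The bulk commands take one JSON argument that holds many documents. The contents of `./example/docs_wiki.json` are not shown here; judging from the Wikipedia indexing loop in this README, a bulk payload appears to be a JSON array of objects, each with an `id` and a `fields` object. A minimal Go sketch that builds such a payload under that assumption (the `BulkDoc` type and `buildBulkPayload` helper are hypothetical, not part of Blast):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BulkDoc is a hypothetical type for the assumed bulk element shape:
// {"id": "...", "fields": {...}}.
type BulkDoc struct {
	ID     string                 `json:"id"`
	Fields map[string]interface{} `json:"fields"`
}

// buildBulkPayload serializes docs into a single JSON array.
func buildBulkPayload(docs []BulkDoc) ([]byte, error) {
	return json.Marshal(docs)
}

func main() {
	payload, err := buildBulkPayload([]BulkDoc{
		{ID: "enwiki_1", Fields: map[string]interface{}{"title_en": "Example"}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(payload))
}
```

Saved to a file, such a payload could be piped to `blast-index index` the same way as `docs_wiki.json` above.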


### Deleting documents in bulk via CLI

Deleting documents in bulk, run the following command:

```bash
$ cat ./example/docs_wiki.json | xargs -0 ./bin/blast-index delete --grpc-addr=:5050
```

You can see the result in JSON format. The result of the above command is:

```bash
{
"count": 4
}
```
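Both bulk commands print a JSON object with a `count` field. When scripting them, one possible sanity check is to compare `count` with the number of documents sent, assuming `count` reports how many documents were processed (the README only shows the values 1 and 4). A small sketch; `countResult` and `checkCount` are hypothetical names:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// countResult mirrors the {"count": N} output shown above.
type countResult struct {
	Count int `json:"count"`
}

// checkCount decodes a result and reports whether its count matches
// the number of documents that were sent.
func checkCount(output []byte, sent int) error {
	var r countResult
	if err := json.Unmarshal(output, &r); err != nil {
		return fmt.Errorf("decode result: %w", err)
	}
	if r.Count != sent {
		return fmt.Errorf("expected count %d, got %d", sent, r.Count)
	}
	return nil
}

func main() {
	out := []byte("{\n  \"count\": 4\n}")
	if err := checkCount(out, 4); err != nil {
		fmt.Println("mismatch:", err)
		return
	}
	fmt.Println("ok")
}
```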


@@ -401,10 +486,10 @@ Also you can do above commands via HTTP REST API that listened port 8080.

### Indexing a document via HTTP REST API

Putting a document via HTTP is as following:
Indexing a document via HTTP is as following:

```bash
$ curl -X PUT 'http://127.0.0.1:8080/documents/enwiki_1' -d @./example/doc_enwiki_1.json
$ curl -s -X PUT 'http://127.0.0.1:8080/documents/enwiki_1' -d @./example/doc_enwiki_1.json
```


@@ -413,7 +498,7 @@ $ curl -X PUT 'http://127.0.0.1:8080/documents/enwiki_1' -d @./example/doc_enwik
Getting a document via HTTP is as following:

```bash
$ curl -X GET 'http://127.0.0.1:8080/documents/enwiki_1'
$ curl -s -X GET 'http://127.0.0.1:8080/documents/enwiki_1'
```


@@ -435,6 +520,24 @@ $ curl -X DELETE 'http://127.0.0.1:8080/documents/enwiki_1'
```


### Indexing documents in bulk via HTTP REST API

Indexing documents in bulk via HTTP is as following:

```bash
$ curl -s -X PUT 'http://127.0.0.1:8080/documents' -d @./example/docs_wiki.json
```


### Deleting documents in bulk via HTTP REST API

Deleting documents in bulk via HTTP is as following:

```bash
$ curl -X DELETE 'http://127.0.0.1:8080/documents' -d @./example/docs_wiki.json
```


## Bringing up a cluster

Blast is easy to bring up the cluster. Blast data node is already running, but that is not fault tolerant. If you need to increase the fault tolerance, bring up 2 more data nodes like so:
@@ -620,13 +723,9 @@ $ ./WikiExtractor.py -o ~/tmp/enwiki --json ~/tmp/enwiki-20190101-pages-articles
```bash
$ for FILE in $(find ~/tmp/enwiki -type f -name '*' | sort)
do
echo "${FILE}"
cat ${FILE} | while read -r LINE; do
TIMESTAMP=$(date -u "+%Y-%m-%dT%H:%M:%SZ")
ID=$(echo ${LINE} | jq -r .id)
FIELDS=$(echo "${LINE}" | jq -c -r '{url: .url, title_en: .title, text_en: .text, timestamp: "'${TIMESTAMP}'", _type: "enwiki"}')
echo "- ${ID} ${FIELDS}"
curl -X PUT "http://127.0.0.1:8080/documents/${ID}" -d "${FIELDS}"
done
echo "Indexing ${FILE}"
TIMESTAMP=$(date -u "+%Y-%m-%dT%H:%M:%SZ")
DOCS=$(cat ${FILE} | jq -r '. + {fields: {url: .url, title_en: .title, text_en: .text, timestamp: "'${TIMESTAMP}'", _type: "enwiki"}} | del(.url) | del(.title) | del(.text) | del(.fields.id)' | jq -s)
curl -s -X PUT -H 'Content-Type: application/json' "http://127.0.0.1:8080/documents" -d "${DOCS}"
done
```
47 changes: 40 additions & 7 deletions cmd/blast-index/delete.go
@@ -15,6 +15,7 @@
 package main
 
 import (
+	"encoding/json"
 	"errors"
 	"fmt"
 	"os"
@@ -26,18 +27,43 @@ import (
 
 func execDelete(c *cli.Context) error {
 	grpcAddr := c.String("grpc-addr")
+	id := c.String("id")
 
-	id := c.Args().Get(0)
+	// create documents
+	docs := make([]*pbindex.Document, 0)
+
 	if id == "" {
-		err := errors.New("key argument must be set")
-		return err
-	}
+		if c.NArg() == 0 {
+			err := errors.New("arguments are not correct")
+			return err
+		}
+
+		// documents
+		docsStr := c.Args().Get(0)
+
+		var docMaps []map[string]interface{}
+		err := json.Unmarshal([]byte(docsStr), &docMaps)
+		if err != nil {
+			return err
+		}
+
+		for _, docMap := range docMaps {
+			// create document
+			doc := &pbindex.Document{
+				Id: docMap["id"].(string),
+			}
 
-	doc := &pbindex.Document{
-		Id: id,
+			docs = append(docs, doc)
+		}
+	} else {
+		doc := &pbindex.Document{
+			Id: id,
+		}
+
+		docs = append(docs, doc)
 	}
 
 	// create client
 	client, err := index.NewGRPCClient(grpcAddr)
 	if err != nil {
 		return err
@@ -49,10 +75,17 @@ func execDelete(c *cli.Context) error {
 		}
 	}()
 
-	err = client.Delete(doc)
+	result, err := client.BulkDelete(docs)
 	if err != nil {
 		return err
 	}
 
+	resultBytes, err := json.MarshalIndent(result, "", " ")
+	if err != nil {
+		return err
+	}
+
+	fmt.Fprintln(os.Stdout, fmt.Sprintf("%v\n", string(resultBytes)))
+
 	return nil
 }
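In the new `execDelete`, `docMap["id"].(string)` is an unchecked type assertion, so an element with no `id`, or with a numeric one, makes the CLI panic instead of returning an error. One way to fail gracefully is the comma-ok form of the assertion. A standalone sketch of just the parsing step; `parseDeleteIDs` is a hypothetical helper, and building the `pbindex.Document` values is left out so the example compiles on its own:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseDeleteIDs decodes a JSON array of objects and returns their "id"
// values, returning an error instead of panicking on a bad element.
func parseDeleteIDs(docsStr string) ([]string, error) {
	var docMaps []map[string]interface{}
	if err := json.Unmarshal([]byte(docsStr), &docMaps); err != nil {
		return nil, err
	}

	ids := make([]string, 0, len(docMaps))
	for i, docMap := range docMaps {
		// ok is false when "id" is missing or is not a JSON string.
		id, ok := docMap["id"].(string)
		if !ok || id == "" {
			return nil, fmt.Errorf("document %d: missing, empty or non-string \"id\"", i)
		}
		ids = append(ids, id)
	}
	return ids, nil
}

func main() {
	ids, err := parseDeleteIDs(`[{"id": "enwiki_1"}, {"id": 2}]`)
	fmt.Println(ids, err)
}
```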
8 changes: 3 additions & 5 deletions cmd/blast-index/get.go
@@ -20,19 +20,17 @@ import (
 	"fmt"
 	"os"
 
-	"github.com/mosuka/blast/protobuf"
-
 	"github.com/mosuka/blast/index"
+	"github.com/mosuka/blast/protobuf"
 	pbindex "github.com/mosuka/blast/protobuf/index"
 	"github.com/urfave/cli"
 )
 
 func execGet(c *cli.Context) error {
 	grpcAddr := c.String("grpc-addr")
-
-	id := c.Args().Get(0)
+	id := c.String("id")
 	if id == "" {
-		err := errors.New("key argument must be set")
+		err := errors.New("arguments are not correct")
 		return err
 	}
 