这是indexloc提供的服务,不要输入任何密码
Skip to content
/ phalanx Public

Phalanx is a cloud-native distributed search engine that provides endpoints through gRPC and traditional RESTful API.

License

Notifications You must be signed in to change notification settings

mosuka/phalanx

Repository files navigation

Phalanx

Phalanx is a cloud-native full-text search and indexing server written in Go built on top of Bluge that provides endpoints through gRPC and traditional RESTful API.
Phalanx implements a cluster formation by hashicorp/memberlist and managing index metadata on etcd, so it is easy to bring up a fault-tolerant cluster.
Metrics for system operation can also be output in Prometheus exposition format, so that monitoring can be done immediately using Prometheus.
Phalanx is using object storage for the storage layer, it is only responsible for the computation layer, such as indexing and retrieval processes. Therefore, scaling is easy, and you can simply add new nodes to the cluster.
Currently, it is an alpha version and only supports MinIO as the storage layer, but in the future it will support Amazon S3, Google Cloud Storage, and Azure Blob Storage.

Start Phalanx on local machine with local file system

Phalanx can be started on a local machine as if it were using local storage instead of object storage.
A configuration file is available for starting on the local machine. You can start Phalanx by using it or by specifying the contents described in it with flags in the CLI. The following command starts with a configuration file:

% phalanx --config-file=./examples/phalanx_local.yml

Start Phalanx on local machine with MinIO and etcd

To experience the features of Phalanx, let's try to start Phalanx using MinIO and etcd.
First of all, let's start MinIO and etcd that Phalanx depends on. You can start them yourself, but there is a docker-compose.yml in this repository. You can use this to easily run MinIO and etcd on Docker.

% docker-compose up

Once the container has been started, you can check the MinIO and etcd data in your browser at the following URL.

Then, let's start Phalanx. A configuration file is available for starting on the local machine with MinIO and etcd. You can start Phalanx by using it or by specifying the contents described in it with flags in the CLI. The following command starts with a configuration file:

% phalanx --config-file=./examples/phalanx.yml
% phalanx --config-file=./examples/phalanx.yml --bind-port=2001 --grpc-port=5001 --http-port=8001 --seed-addresses=0.0.0.0:2000
% phalanx --config-file=./examples/phalanx.yml --bind-port=2002 --grpc-port=5002 --http-port=8002 --seed-addresses=0.0.0.0:2000

The above command overwrites the values in the configuration file with CLI flags in order to start multiple Phalanx nodes. Above example shows each Phalanx node running on the same host, so each node must listen on different ports. This would not be necessary if each node ran on a different host.
This instructs each new node to join an existing node, each node recognizes the joining clusters when started. So you have a 3-node cluster.

Health check

Liveness check

% curl -XGET http://localhost:8000/livez | jq .
{
  "state":"alive"
}

Readiness check

% curl -XGET http://localhost:8000/readyz | jq .
{
  "state":"ready"
}

There are endpoints, but it is not yet fully implemented.

Metrics exposition

% curl -XGET http://localhost:8000/metrics
# HELP phalanx_grpc_server_handled_total Total number of RPCs completed on the server, regardless of success or failure.
# TYPE phalanx_grpc_server_handled_total counter
phalanx_grpc_server_handled_total{grpc_code="Aborted",grpc_method="AddDocuments",grpc_service="index.Index",grpc_type="unary"} 0
phalanx_grpc_server_handled_total{grpc_code="Aborted",grpc_method="Cluster",grpc_service="index.Index",grpc_type="unary"} 0
...

Cluster status

% curl -XGET http://localhost:8000/cluster | jq .
{
  "indexer_assignment": {},
  "indexes": {},
  "nodes": {
    "node-KWkaw0jJ": {
      "addr": "0.0.0.0",
      "meta": {
        "grpc_port": 5000,
        "http_port": 8000,
        "roles": [
          "indexer",
          "searcher"
        ]
      },
      "port": 2000,
      "state": "alive"
    }
  },
  "searcher_assignment": {}
}

Create index

Create index on local file system

If you have started Phalanx to use the local file system, you can use this command to create an index.

% curl -XPUT -H 'Content-type: application/json' http://localhost:8000/v1/indexes/example_en --data-binary @./examples/create_index_example_en_local.json

Create index with MinIO and etcd

If you have started Phalanx to use MinIO and etcd, use this command to create the index.

% curl -XPUT -H 'Content-type: application/json' http://localhost:8000/v1/indexes/example_en --data-binary @./examples/create_index_example_en.json

The difference between the above commands is the difference between index_uri and lock_uri in the configuration file. This parameter specifies where the index and its lock file will be created.

Delete index

% curl -XDELETE http://localhost:8000/v1/indexes/example_en

Add / Update documents

% curl -XPUT -H 'Content-type: application/x-ndjson' http://localhost:8000/v1/indexes/example_en/documents --data-binary @./examples/add_documents.ndjson

Delete documents

% curl -XDELETE -H 'Content-type: text/plain' http://localhost:8000/v1/indexes/example_en/documents --data-binary @./examples/delete_ids.txt

Search

% curl -XPOST -H 'Content-type: text/plain' http://localhost:8000/v1/indexes/example_en/_search --data-binary @./examples/search.json | jq .
{
  "documents": [
    {
      "fields": {
        "_timestamp": "2021-12-10T13:03:18Z",
        "category": "/language/rust",
        "description": "Sonic is a fast, lightweight and schema-less search backend.",
        "name": "Sonic",
        "popularity": 7895,
        "publish_date": "2019-12-10T14:13:00Z",
        "url": "https://github.com/valeriansaliou/sonic"
      },
      "id": "7",
      "score": 0.37863163826497015
    },
    {
      "fields": {
        "_timestamp": "2021-12-10T13:03:18Z",
        "category": "/language/python",
        "description": "Whoosh is a fast, pure Python search engine library.",
        "name": "Whoosh",
        "popularity": 0,
        "publish_date": "2019-10-07T20:30:26Z",
        "url": "https://bitbucket.org/mchaput/whoosh/wiki/Home"
      },
      "id": "11",
      "score": 0.3731338946601548
    },
    {
      "fields": {
        "_timestamp": "2021-12-10T13:03:18Z",
        "category": "/language/java",
        "description": "Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java.",
        "name": "Lucene",
        "popularity": 3135,
        "publish_date": "2019-12-19T05:08:00Z",
        "url": "https://lucene.apache.org/"
      },
      "id": "9",
      "score": 0.3710793549141038
    },
    {
      "fields": {
        "_timestamp": "2021-12-10T13:03:18Z",
        "category": "/language/go",
        "description": "Riot is Go Open Source, Distributed, Simple and efficient full text search engine.",
        "name": "Riot",
        "popularity": 4948,
        "publish_date": "2019-12-15T22:12:00Z",
        "url": "https://github.com/go-ego/riot"
      },
      "id": "5",
      "score": 0.3611255085637879
    },
    {
      "fields": {
        "_timestamp": "2021-12-10T13:03:18Z",
        "category": "/language/rust",
        "description": "Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust.",
        "name": "Tantivy",
        "popularity": 3142,
        "publish_date": "2019-12-19T01:07:00Z",
        "url": "https://github.com/quickwit-inc/tantivy"
      },
      "id": "8",
      "score": 0.34530979286026436
    },
    {
      "fields": {
        "_timestamp": "2021-12-10T13:03:18Z",
        "category": "/language/java",
        "description": "Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured.",
        "name": "Elasticsearch",
        "popularity": 46054,
        "publish_date": "2019-12-18T23:19:00Z",
        "url": "https://www.elastic.co/products/elasticsearch"
      },
      "id": "3",
      "score": 0.13076457838717315
    },
    {
      "fields": {
        "_timestamp": "2021-12-10T13:03:18Z",
        "category": "/language/go",
        "description": "Phalanx is a cloud-native full-text search and indexing server written in Go built on top of Bluge that provides endpoints through gRPC and traditional RESTful API.",
        "name": "Phalanx",
        "popularity": 0,
        "publish_date": "2021-12-10T12:00:00Z",
        "url": "https://github.com/mosuka/phalanx"
      },
      "id": "1",
      "score": 0.13076457838717315
    },
    {
      "fields": {
        "_timestamp": "2021-12-10T13:03:18Z",
        "category": "/language/rust",
        "description": "Toshi is meant to be a full-text search engine similar to Elasticsearch. Toshi strives to be to Elasticsearch what Tantivy is to Lucene.",
        "name": "Toshi",
        "popularity": 2448,
        "publish_date": "2019-12-01T19:00:00Z",
        "url": "https://github.com/toshi-search/Toshi"
      },
      "id": "6",
      "score": 0.13076457838717315
    },
    {
      "fields": {
        "_timestamp": "2021-12-10T13:03:18Z",
        "category": "/language/go",
        "description": "Blast is a full text search and indexing server, written in Go, built on top of Bleve.",
        "name": "Blast",
        "popularity": 654,
        "publish_date": "2019-10-18T10:50:00Z",
        "url": "https://github.com/mosuka/blast"
      },
      "id": "4",
      "score": 0.08523749485612774
    },
    {
      "fields": {
        "_timestamp": "2021-12-10T13:03:18Z",
        "category": "/language/rust",
        "description": "Quickwit is a distributed search engine built from the ground up to offer cost-efficiency and high reliability.",
        "name": "quickwit",
        "popularity": 0,
        "publish_date": "2021-07-13T15:07:00Z",
        "url": "https://github.com/quickwit-inc/quickwit"
      },
      "id": "13",
      "score": 0.08063697039612684
    }
  ],
  "hits": 10,
  "index_name": "example_en"
}

About

Phalanx is a cloud-native distributed search engine that provides endpoints through gRPC and traditional RESTful API.

Topics

Resources

License

Stars

Watchers

Forks

Sponsor this project

 

Packages

No packages published

Contributors 3

  •  
  •  
  •