
Contents

base elasticsearch
1. Overview
1.1 Basic concepts
1.2 Installation
1.2.1 Preparation
1.2.2 Single-node deployment
1.2.2.1 elasticsearch
1.2.2.2 kibana
2. Search
2.1 Basic search
2.1.1 _cat
2.1.2 Indexing a document
2.1.3 Get a document & optimistic concurrency control
2.1.4 Updating a document
2.1.5 Deleting an index or a document
2.1.6 Bulk API: _bulk
2.2 Advanced search
2.2.1 query-dsl: basic syntax
2.2.2 query-dsl: match
2.2.3 query-dsl: multi_match
2.2.4 query-dsl: bool
2.2.5 query-dsl: filter
2.2.6 query-dsl: term
2.2.7 aggregations
2.3 Mapping
2.4 Analysis (tokenization)
3. java-api
3.1 Dependencies

base elasticsearch

1. Overview

1.1 Basic concepts

  • documents: the basic unit of information that can be indexed

  • index: a collection of documents with similar characteristics

  • type: formerly a logical category or partition within an index that allowed different kinds of documents to live in the same index; types have been deprecated since 6.0

  • mapping: the rules that define how a document and its fields are stored and indexed

  • Near Realtime (NRT): there is a slight delay (usually under one second) between indexing a document and it becoming searchable

1.2 Installation

1.2.1 Preparation

1.2.2 Single-node deployment

1.2.2.1 elasticsearch
  • Extract the archive:

    $ tar -zxf elasticsearch-6.8.15.tar.gz

  • Start it directly:

    $ cd elasticsearch-6.8.15

    $ ./bin/elasticsearch

    • Starting it directly as root fails with an exception:

    ```txt
    [2021-03-26T16:32:58,140][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [unknown] uncaught exception in thread [main]
    org.elasticsearch.bootstrap.StartupException: java.lang.RuntimeException: can not run elasticsearch as root
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:163) ~[elasticsearch-6.8.15.jar:6.8.15]
        at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:150) ~[elasticsearch-6.8.15.jar:6.8.15]
        at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-6.8.15.jar:6.8.15]
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-6.8.15.jar:6.8.15]
        at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-6.8.15.jar:6.8.15]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:116) ~[elasticsearch-6.8.15.jar:6.8.15]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:93) ~[elasticsearch-6.8.15.jar:6.8.15]
    Caused by: java.lang.RuntimeException: can not run elasticsearch as root
        at org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:103) ~[elasticsearch-6.8.15.jar:6.8.15]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:170) ~[elasticsearch-6.8.15.jar:6.8.15]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:333) ~[elasticsearch-6.8.15.jar:6.8.15]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159) ~[elasticsearch-6.8.15.jar:6.8.15]
        ... 6 more
    ```
    • As the log says, Elasticsearch cannot be run as root, so create a dedicated user:

    // add the user elasticsearch-user
    $ adduser elasticsearch-user

    // set a password for elasticsearch-user
    $ passwd elasticsearch-user
    Changing password for user elasticsearch-user.
    New password:
    Retype new password:
    passwd: all authentication tokens updated successfully.

    // change the owner of the install directory
    $ chown -R elasticsearch-user /data/elasticsearch

    // switch to the new user
    $ su elasticsearch-user

    • Try starting it again, this time in the background:

    nohup ./bin/elasticsearch &

    ```log
    [elasticsearch-user@elasticsearch1 elasticsearch-6.8.15]$ nohup ./bin/elasticsearch &
    [1] 9755
    [elasticsearch-user@elasticsearch1 elasticsearch-6.8.15]$ nohup: ignoring input and appending output to ‘nohup.out’
    ^C
    [elasticsearch-user@elasticsearch1 elasticsearch-6.8.15]$ tail -100f nohup.out
    [2021-03-31T09:17:08,259][INFO ][o.e.e.NodeEnvironment ] [No5lu90] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [41.9gb], net total_space [49.9gb], types [rootfs]
    [2021-03-31T09:17:08,274][INFO ][o.e.e.NodeEnvironment ] [No5lu90] heap size [989.8mb], compressed ordinary object pointers [true]
    [2021-03-31T09:17:08,277][INFO ][o.e.n.Node ] [No5lu90] node name derived from node ID [No5lu90XRxiKNN-5R0aXfQ]; set [node.name] to override
    [2021-03-31T09:17:08,278][INFO ][o.e.n.Node ] [No5lu90] version[6.8.15], pid[9755], build[default/tar/c9a8c60/2021-03-18T06:33:32.588487Z], OS[Linux/3.10.0-693.el7.x86_64/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_151/25.151-b12]
    [2021-03-31T09:17:08,278][INFO ][o.e.n.Node ] [No5lu90] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch-4819315221049591280, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -Xloggc:logs/gc.log, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=32, -XX:GCLogFileSize=64m, -Des.path.home=/data/elasticsearch/elasticsearch-6.8.15, -Des.path.conf=/data/elasticsearch/elasticsearch-6.8.15/config, -Des.distribution.flavor=default, -Des.distribution.type=tar]
    [2021-03-31T09:17:10,670][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [aggs-matrix-stats]
    [2021-03-31T09:17:10,670][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [analysis-common]
    [2021-03-31T09:17:10,670][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [ingest-common]
    [2021-03-31T09:17:10,670][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [ingest-geoip]
    [2021-03-31T09:17:10,671][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [ingest-user-agent]
    [2021-03-31T09:17:10,671][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [lang-expression]
    [2021-03-31T09:17:10,671][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [lang-mustache]
    [2021-03-31T09:17:10,671][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [lang-painless]
    [2021-03-31T09:17:10,671][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [mapper-extras]
    [2021-03-31T09:17:10,671][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [parent-join]
    [2021-03-31T09:17:10,671][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [percolator]
    [2021-03-31T09:17:10,671][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [rank-eval]
    [2021-03-31T09:17:10,671][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [reindex]
    [2021-03-31T09:17:10,672][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [repository-url]
    [2021-03-31T09:17:10,672][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [transport-netty4]
    [2021-03-31T09:17:10,672][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [tribe]
    [2021-03-31T09:17:10,672][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [x-pack-ccr]
    [2021-03-31T09:17:10,672][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [x-pack-core]
    [2021-03-31T09:17:10,672][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [x-pack-deprecation]
    [2021-03-31T09:17:10,672][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [x-pack-graph]
    [2021-03-31T09:17:10,673][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [x-pack-ilm]
    [2021-03-31T09:17:10,673][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [x-pack-logstash]
    [2021-03-31T09:17:10,673][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [x-pack-ml]
    [2021-03-31T09:17:10,673][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [x-pack-monitoring]
    [2021-03-31T09:17:10,673][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [x-pack-rollup]
    [2021-03-31T09:17:10,673][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [x-pack-security]
    [2021-03-31T09:17:10,673][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [x-pack-sql]
    [2021-03-31T09:17:10,674][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [x-pack-upgrade]
    [2021-03-31T09:17:10,674][INFO ][o.e.p.PluginsService ] [No5lu90] loaded module [x-pack-watcher]
    [2021-03-31T09:17:10,674][INFO ][o.e.p.PluginsService ] [No5lu90] no plugins loaded
    [2021-03-31T09:17:15,589][INFO ][o.e.x.s.a.s.FileRolesStore] [No5lu90] parsed [0] roles from file [/data/elasticsearch/elasticsearch-6.8.15/config/roles.yml]
    [2021-03-31T09:17:16,329][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [No5lu90] [controller/9931] [Main.cc@114] controller (64 bit): Version 6.8.15 (Build 4cd69e3d9d9390) Copyright (c) 2021 Elasticsearch BV
    [2021-03-31T09:17:17,068][DEBUG][o.e.a.ActionModule ] [No5lu90] Using REST wrapper from plugin org.elasticsearch.xpack.security.Security
    [2021-03-31T09:17:17,393][INFO ][o.e.d.DiscoveryModule ] [No5lu90] using discovery type [zen] and host providers [settings]
    [2021-03-31T09:17:18,543][INFO ][o.e.n.Node ] [No5lu90] initialized
    [2021-03-31T09:17:18,544][INFO ][o.e.n.Node ] [No5lu90] starting ...
    [2021-03-31T09:17:23,820][INFO ][o.e.t.TransportService ] [No5lu90] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
    [2021-03-31T09:17:23,917][WARN ][o.e.b.BootstrapChecks ] [No5lu90] max file descriptors [4096] for elasticsearch process is too low, increase to at least [65535]
    [2021-03-31T09:17:27,015][INFO ][o.e.c.s.MasterService ] [No5lu90] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {No5lu90}{No5lu90XRxiKNN-5R0aXfQ}{EauWlcgjRVaCWUOC9LDE3Q}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=16824958976, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
    [2021-03-31T09:17:27,023][INFO ][o.e.c.s.ClusterApplierService] [No5lu90] new_master {No5lu90}{No5lu90XRxiKNN-5R0aXfQ}{EauWlcgjRVaCWUOC9LDE3Q}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=16824958976, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, reason: apply cluster state (from master [master {No5lu90}{No5lu90XRxiKNN-5R0aXfQ}{EauWlcgjRVaCWUOC9LDE3Q}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=16824958976, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
    [2021-03-31T09:17:27,105][INFO ][o.e.h.n.Netty4HttpServerTransport] [No5lu90] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
    [2021-03-31T09:17:27,106][INFO ][o.e.n.Node ] [No5lu90] started
    [2021-03-31T09:17:27,490][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] [No5lu90] Failed to clear cache for realms [[]]
    [2021-03-31T09:17:27,557][INFO ][o.e.l.LicenseService ] [No5lu90] license [91784dfe-a371-44cd-b9e1-0bb3181a49ee] mode [basic] - valid
    [2021-03-31T09:17:27,571][INFO ][o.e.g.GatewayService ] [No5lu90] recovered [0] indices into cluster_state
    ```
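
    • The startup log also warns that max file descriptors [4096] is too low and should be at least [65535]. A sketch of one common fix on CentOS-style systems (an assumption beyond the original notes; it uses the elasticsearch-user created above and takes effect on the next login):

    ```txt
    # /etc/security/limits.conf
    elasticsearch-user soft nofile 65535
    elasticsearch-user hard nofile 65535
    ```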
    • Verify that it is running:

    $ curl http://127.0.0.1:9200

    ```json
    {
      "name" : "No5lu90",
      "cluster_name" : "elasticsearch",
      "cluster_uuid" : "cDaupHYNQRWw7vlocIbGoQ",
      "version" : {
        "number" : "6.8.15",
        "build_flavor" : "default",
        "build_type" : "tar",
        "build_hash" : "c9a8c60",
        "build_date" : "2021-03-18T06:33:32.588487Z",
        "build_snapshot" : false,
        "lucene_version" : "7.7.3",
        "minimum_wire_compatibility_version" : "5.6.0",
        "minimum_index_compatibility_version" : "5.0.0"
      },
      "tagline" : "You Know, for Search"
    }
    ```
1.2.2.2 kibana
  • Extract and rename:

    $ tar -zxf kibana-6.8.15-linux-x86_64.tar.gz

    $ mv kibana-6.8.15-linux-x86_64 kibana-6.8.15

  • Edit the configuration:

    $ vim config/kibana.yml

    ```yaml
    server.host: "192.168.242.121"
    ```
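
  • Kibana also needs to know where Elasticsearch is. A minimal sketch of the relevant kibana.yml lines, assuming Kibana runs on the same machine as the Elasticsearch node started above (elasticsearch.url is the 6.x setting name; 7.x renamed it to elasticsearch.hosts); after that it can be started in the background the same way:

    ```yaml
    server.host: "192.168.242.121"
    elasticsearch.url: "http://127.0.0.1:9200"
    ```

    $ nohup ./bin/kibana &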

2. Search

2.1 Basic search

2.1.1 _cat

  • Commands:

    GET /_cat/nodes: list all nodes

    GET /_cat/health: show cluster health

    GET /_cat/master: show the elected master node

    GET /_cat/indices: list all indices

  • Example:

```txt
$ GET _cat/nodes
192.168.240.63 16 81 0 0.04 0.04 0.05 cdhilmrstw * node-1

$ GET _cat/health
1617068246 01:37:26 elasticsearch yellow 1 1 11 11 0 0 1 0 - 91.7%

$ GET _cat/master
fU1EPHXAT-uVDhz05ctvmQ 192.168.240.63 192.168.240.63 node-1

$ GET _cat/indices
green  open .apm-custom-link                HRtTvn9jTXOCP-7RqZj7BQ 1 0  0      0 208b   208b
green  open .kibana_task_manager_1          KlmGX3mZQaW_vHPctcOy-g 1 0  5 346287 27.2mb 27.2mb
green  open .apm-agent-configuration        aB0fHKdnTRO14368O55PtQ 1 0  0      0 208b   208b
green  open .kibana-event-log-7.10.1-000002 Ysid-__OToyXmIWgUAxKHw 1 0  0      0 208b   208b
green  open .kibana-event-log-7.10.1-000001 KU0GM5MfT5ixtdrzEUVC_g 1 0  3      0 11.3kb 11.3kb
green  open .kibana_1                       WZlQtWvoRxOT7dmMj7z1qg 1 0 39      5 4.2mb  4.2mb
yellow open class                           cq30cyBHTEe437bX5-TyXA 1 1  2      0 8kb    8kb
green  open .kibana-event-log-7.10.1-000003 E9yWvm7dTvCugOCPt3nKVg 1 0  0      0 208b   208b
```

2.1.2 Indexing a document

Note: 7.x no longer has the concept of a type, so adding data with PUT <index>/<type>/<id> is not recommended any more; simply use _doc in place of the type.

  • Commands

    ```http
    PUT  /<target>/_doc/<_id>
    POST /<target>/_doc/
    PUT  /<target>/_create/<_id>
    POST /<target>/_create/<_id>
    ```
  • Example

    ```json
    $ PUT customer/_doc/2
    { "name": "小明2" }

    // result
    { "_index" : "customer", "_type" : "_doc", "_id" : "2", "_version" : 1, "result" : "created",
      "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 1, "_primary_term" : 1 }

    // ---- run the same PUT again
    $ PUT customer/_doc/2
    { "name": "小明2" }

    // result
    { "_index" : "customer", "_type" : "_doc", "_id" : "2", "_version" : 2, "result" : "updated",
      "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 4, "_primary_term" : 1 }

    // ---- POST without an id generates one
    $ POST customer/_doc
    { "name": "小明3" }

    { "_index" : "customer", "_type" : "_doc", "_id" : "KCbJhXgBy-3NOEnaoBlO", "_version" : 1, "result" : "created",
      "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 7, "_primary_term" : 1 }
    ```

2.1.3 Get a document & optimistic concurrency control

```json
$ GET customer/_doc/2
{
  "_index" : "customer",      // index
  "_type" : "_doc",
  "_id" : "2",                // id
  "_version" : 4,             // version, changes on every update
  "_seq_no" : 6,              // concurrency-control field, incremented on every write; used for optimistic locking
  "_primary_term" : 1,        // similar; changes when the primary shard is reallocated, e.g. after a restart
  "found" : true,
  "_source" : {               // the actual document
    "name" : "小明2"
  }
}
```
  • Update with optimistic locking

    ```json
    $ PUT customer/_doc/2?if_seq_no=10&if_primary_term=1
    { "name": "小明2" }

    { "_index" : "customer", "_type" : "_doc", "_id" : "2", "_version" : 6, "result" : "updated",
      "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 11, "_primary_term" : 1 }

    // run it again with the same if_seq_no
    $ PUT customer/_doc/2?if_seq_no=10&if_primary_term=1
    { "name": "小明2" }

    {
      "error" : {
        "root_cause" : [
          {
            "type" : "version_conflict_engine_exception",
            "reason" : "[2]: version conflict, required seqNo [10], primary term [1]. current document has seqNo [11] and primary term [1]",
            "index_uuid" : "p1v3s28RSempuLo3UBQyOg",
            "shard" : "0",
            "index" : "customer"
          }
        ],
        "type" : "version_conflict_engine_exception",
        "reason" : "[2]: version conflict, required seqNo [10], primary term [1]. current document has seqNo [11] and primary term [1]",
        "index_uuid" : "p1v3s28RSempuLo3UBQyOg",
        "shard" : "0",
        "index" : "customer"
      },
      "status" : 409
    }
    ```

2.1.4 Updating a document

  • Command

    ```http
    POST /<target>/_update/<id>
    ```
    Compares the submitted data with the stored document and only performs the update if something actually changed (legacy form: POST /<target>/<type>/<id>/_update).
    • If the data has not changed, result is noop and _version and _seq_no stay the same

    • The data must be wrapped in a doc property:

      ```json
      { "doc": { ${data} } }
      ```
    • By contrast, the plain POST (without _update) and PUT from 2.1.2 rewrite the document on every call and do no comparison

  • Example

    ```json
    $ POST customer/_update/2
    { "doc": { "name": "小明3" } }

    { "_index" : "customer", "_type" : "_doc", "_id" : "2", "_version" : 7, "result" : "updated",
      "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 12, "_primary_term" : 1 }

    // run it again
    $ POST customer/_update/2
    { "doc": { "name": "小明3" } }

    { "_index" : "customer", "_type" : "_doc", "_id" : "2", "_version" : 7, "result" : "noop",
      "_shards" : { "total" : 0, "successful" : 0, "failed" : 0 }, "_seq_no" : 12, "_primary_term" : 1 }
    ```

2.1.5 Deleting an index or a document

  • Commands

    ```http
    DELETE /<target>
    DELETE /<target>/_doc/<id>
    ```
  • Example

    ```json
    $ DELETE customer/_doc/1
    { "_index" : "customer", "_type" : "_doc", "_id" : "1", "_version" : 2, "result" : "deleted",
      "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 13, "_primary_term" : 1 }

    $ GET customer/_doc/1
    { "_index" : "customer", "_type" : "_doc", "_id" : "1", "found" : false }
    ```

2.1.6 Bulk API: _bulk

  • Syntax

    ```http
    POST /_bulk
    POST /<target>/_bulk

    {"action": {metadata}}
    {request body}
    {"action": {metadata}}
    {request body}
    ```
    The body is newline-delimited JSON: each action/metadata line is followed, where the action needs one, by a request-body line.
    • action:
      • create: fails if a document with the same _id already exists
      • index: creates the document, or replaces it if it already exists
      • delete: deletes the document
      • update: partially updates the document
  • Example

    ```json
    POST /_bulk
    {"create":{"_index":"mytest","_id":1}}
    {"name":"test1","age":11}
    {"index":{"_index":"mytest","_id":2}}
    {"name":"test2","age":12}
    {"delete":{"_index":"mytest","_id":1}}
    {"update":{"_index":"mytest","_id":2}}
    {"doc":{"name":"test3","age":13}}
    {"update":{"_index":"mytest","_id":1}}
    {"doc":{"name":"test1","age":10}}

    {
      "took" : 160,
      "errors" : true,
      "items" : [
        { "create" : { "_index" : "mytest", "_type" : "_doc", "_id" : "1", "_version" : 1, "result" : "created",
            "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 0, "_primary_term" : 1, "status" : 201 } },
        { "index" : { "_index" : "mytest", "_type" : "_doc", "_id" : "2", "_version" : 1, "result" : "created",
            "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 1, "_primary_term" : 1, "status" : 201 } },
        { "delete" : { "_index" : "mytest", "_type" : "_doc", "_id" : "1", "_version" : 2, "result" : "deleted",
            "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 2, "_primary_term" : 1, "status" : 200 } },
        { "update" : { "_index" : "mytest", "_type" : "_doc", "_id" : "2", "_version" : 2, "result" : "updated",
            "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 3, "_primary_term" : 1, "status" : 200 } },
        { "update" : { "_index" : "mytest", "_type" : "_doc", "_id" : "1", "status" : 404,
            "error" : { "type" : "document_missing_exception", "reason" : "[_doc][1]: document missing",
              "index_uuid" : "1L8DZPSITY6Xs91xR0xDbA", "shard" : "0", "index" : "mytest" } } }
      ]
    }
    ```
  • Import the official Elastic sample data

    • Download: account.json
    • Load the sample file with a POST {yourIndexName}/_bulk request; here POST bank/_bulk is used (a curl sketch follows below)
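
    A minimal curl sketch for loading the file from the shell. It assumes the downloaded file is in the current directory and Elasticsearch is listening on localhost:9200; --data-binary keeps the newline-delimited body intact, which _bulk requires:

    $ curl -H "Content-Type: application/x-ndjson" -X POST "localhost:9200/bank/_bulk?pretty" --data-binary "@account.json"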

2.2 Advanced search

  • ES supports two basic ways of searching:
    1. sending the search parameters as part of the REST request URI (url + query parameters)
    2. sending them in the REST request body (url + request body): query-dsl
  • query-dsl is the approach mainly used below
  • References: getting-started-search, query-dsl, search-aggregations

2.2.1 query-dsl: basic syntax

A query-dsl query is composed of two kinds of clauses: Leaf query clauses and Compound query clauses.

Leaf query clauses look for a particular value in a particular field, e.g. the match, term or range queries; a leaf clause can be used on its own. Compound query clauses combine multiple leaf or compound queries into a single query in a logical way.

Query clauses behave differently depending on whether they are used in query context or filter context.

```json
GET /bank/_search
{
  "query": { "match_all": {} },
  "from": 0,
  "size": 3,
  "sort": [
    { "age": "asc" }
  ]
}
```

2.2.2 query-dsl: match

```json
{
  "query": {
    "match": { "FIELD": "TEXT" }
  }
}
```
  • On non-string fields, match behaves as an exact-value query
  • On string (text) fields, match performs a full-text search (see the sketch below)
    • By default the results are sorted by relevance score
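
A small sketch against the bank sample data imported in 2.1.6 (the field names come from that dataset; exact hits and scores depend on your data):

```json
// full-text search on a text field: matches documents whose address contains "mill" or "road"
GET /bank/_search
{
  "query": {
    "match": { "address": "mill road" }
  }
}

// exact match on a numeric field
GET /bank/_search
{
  "query": {
    "match": { "age": 38 }
  }
}
```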

2.2.3 query-dsl: multi_match

  • Syntax
```json
GET /<target>/_search
{
  "query": {
    "multi_match": {
      "query": "search keywords",
      "fields": ["field1", "field2"]
    }
  }
}
```
  • Example
```json
GET /bank/_search
{
  "query": {
    "multi_match": {
      "query": "mill",
      "fields": ["address", "city"]
    }
  }
}

// result
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 4, "relation" : "eq" },
    "max_score" : 5.4032025,
    "hits" : [
      { "_index" : "bank", "_type" : "_doc", "_id" : "970", "_score" : 5.4032025,
        "_source" : { "account_number" : 970, "balance" : 19648, "firstname" : "Forbes", "lastname" : "Wallace", "age" : 28, "gender" : "M", "address" : "990 Mill Road", "employer" : "Pheast", "email" : "forbeswallace@pheast.com", "city" : "Lopezo", "state" : "AK" } },
      { "_index" : "bank", "_type" : "_doc", "_id" : "136", "_score" : 5.4032025,
        "_source" : { "account_number" : 136, "balance" : 45801, "firstname" : "Winnie", "lastname" : "Holland", "age" : 38, "gender" : "M", "address" : "198 Mill Lane", "employer" : "Neteria", "email" : "winnieholland@neteria.com", "city" : "Urie", "state" : "IL" } },
      { "_index" : "bank", "_type" : "_doc", "_id" : "345", "_score" : 5.4032025,
        "_source" : { "account_number" : 345, "balance" : 9812, "firstname" : "Parker", "lastname" : "Hines", "age" : 38, "gender" : "M", "address" : "715 Mill Avenue", "employer" : "Baluba", "email" : "parkerhines@baluba.com", "city" : "Blackgum", "state" : "KY" } },
      { "_index" : "bank", "_type" : "_doc", "_id" : "472", "_score" : 5.4032025,
        "_source" : { "account_number" : 472, "balance" : 25571, "firstname" : "Lee", "lastname" : "Long", "age" : 32, "gender" : "F", "address" : "288 Mill Street", "employer" : "Comverges", "email" : "leelong@comverges.com", "city" : "Movico", "state" : "MT" } }
    ]
  }
}
```

2.2.4 query-dsl: bool

A compound query that combines multiple query conditions.

  • Syntax
```json
GET /<target>/_search
{
  "query": {
    "bool": {
      "must": [        // conditions that must match
        { "match": { ... } }
      ],
      "must_not": [    // conditions that must not match
        { "match": { ... } }
      ],
      "should": [      // conditions that should preferably match (they boost the score)
        { "match": { ... } }
      ]
    }
  }
}
```
  • Example
```json
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "gender": "M" } },
        { "match": { "age": "38" } },
        { "match": { "address": "Mill" } }
      ],
      "should": [
        { "match": { "address": "Avenue" } }
      ]
    }
  }
}

// result
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 2, "relation" : "eq" },
    "max_score" : 8.622485,
    "hits" : [
      { "_index" : "bank", "_type" : "_doc", "_id" : "345", "_score" : 8.622485,
        "_source" : { "account_number" : 345, "balance" : 9812, "firstname" : "Parker", "lastname" : "Hines", "age" : 38, "gender" : "M", "address" : "715 Mill Avenue", "employer" : "Baluba", "email" : "parkerhines@baluba.com", "city" : "Blackgum", "state" : "KY" } },
      { "_index" : "bank", "_type" : "_doc", "_id" : "136", "_score" : 7.0824604,
        "_source" : { "account_number" : 136, "balance" : 45801, "firstname" : "Winnie", "lastname" : "Holland", "age" : 38, "gender" : "M", "address" : "198 Mill Lane", "employer" : "Neteria", "email" : "winnieholland@neteria.com", "city" : "Urie", "state" : "IL" } }
    ]
  }
}
```

2.2.5 query-dsl: filter

Filters the result set.

```json
GET /<target>/_search
{
  "query": {
    "bool": {
      "filter": { ... }
    }
  }
}
```
  • filter does not compute a relevance score (see the sketch below)
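
A small sketch on the bank sample data, filtering on a balance range (the bounds are arbitrary); the hits come back with a score of 0 because nothing was scored:

```json
GET /bank/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "balance": { "gte": 10000, "lte": 20000 }
        }
      }
    }
  }
}
```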

2.2.6 query-dsl: term

Exact-value lookup.

  • Used the same way as match
  • Only suitable for non-text fields
  • For exact matching on text, use match_phrase or the <field>.keyword sub-field (see the sketch below)
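
A short sketch on the bank sample data contrasting the options above (field names come from that dataset):

```json
// term on a non-text (numeric) field: exact match
GET /bank/_search
{
  "query": {
    "term": { "age": 38 }
  }
}

// exact match on the whole text value via the keyword sub-field
GET /bank/_search
{
  "query": {
    "match": { "address.keyword": "990 Mill Road" }
  }
}

// match_phrase: the phrase must appear in the analyzed text, in order
GET /bank/_search
{
  "query": {
    "match_phrase": { "address": "mill road" }
  }
}
```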

2.2.7 aggregations

Aggregations provide the ability to group data and extract statistics from it. The simplest aggregations are roughly equivalent to SQL GROUP BY and the SQL aggregate functions.

A query and several aggregations can be combined in a single request, which keeps the API simple and avoids multiple network round trips.

  • Syntax

    ```json
    GET /<target>/_search
    {
      "aggs": {
        "agg-name": {
          "agg-type": { ... }
        }
      }
    }
    ```
    • Several aggregations can be declared side by side

    • An aggregation can contain further sub-aggregations

  • Example

    ```json
    // Age distribution, and within each age bucket the average balance of M and of F accounts,
    // plus the overall average balance for that age bucket
    GET bank/_search
    {
      "query": { "match_all": {} },
      "aggs": {
        "ageTerm": {
          "terms": { "field": "age", "size": 100 },
          "aggs": {
            "genderTerm": {
              "terms": { "field": "gender.keyword", "size": 100 },
              "aggs": {
                "balanceAvg": { "avg": { "field": "balance" } }
              }
            },
            "balanceAvg": { "avg": { "field": "balance" } }
          }
        }
      },
      "size": 0
    }

    // result
    {
      "took" : 6,
      "timed_out" : false,
      "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : { "value" : 1000, "relation" : "eq" },
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "ageTerm" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : 31,
              "doc_count" : 61,
              "genderTerm" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [
                  { "key" : "M", "doc_count" : 35, "balanceAvg" : { "value" : 29565.628571428573 } },
                  { "key" : "F", "doc_count" : 26, "balanceAvg" : { "value" : 26626.576923076922 } }
                ]
              },
              "balanceAvg" : { "value" : 28312.918032786885 }
            },
            ...
          ]
        }
      }
    }
    ```

2.3 Mapping

  • Reference: mapping

  • Look up the type of every field

    ```json
    GET /<target>/_mapping
    ```
  • Create a mapping

    ```json
    PUT /<target>
    {
      "mappings": {
        "properties": {
          "field1": { "type": "integer" },
          "field2": { "type": "keyword" },
          "field3": { "type": "text" }
        }
      }
    }
    ```
    • Mappings that already exist cannot be modified
  • Add a new field

    ```json
    PUT /<target>/_mapping
    {
      "properties": {
        "field4": { "type": "keyword", "index": false }
      }
    }
    ```
  • Change a mapping

    • The mapping of an existing field cannot be changed
    • If you really need a different mapping, create a new index with the correct mapping and migrate the data with reindex
    ```json
    POST _reindex
    {
      "source": {
        "index": "my-index-000001"    // older versions may also need a "type" field here
      },
      "dest": {
        "index": "my-new-index-000001"
      }
    }
    ```

2.4 Analysis (tokenization)

  • A tokenizer receives a stream of characters, splits it into individual tokens (usually individual words), and then outputs a stream of tokens

  • The tokenizer also records the order or position of each term (used for phrase and word-proximity queries), as well as the start and end character offsets of the original word each term represents (used to highlight matched text)

  • Elasticsearch ships with many built-in tokenizers that can be used to build custom analyzers

  • Reference: analysis

  • Test an analyzer

    ```json
    POST _analyze
    {
      "analyzer": "standard",
      "text": "Hello World 2."
    }

    {
      "tokens" : [
        { "token" : "hello", "start_offset" : 0, "end_offset" : 5, "type" : "<ALPHANUM>", "position" : 0 },
        { "token" : "world", "start_offset" : 6, "end_offset" : 11, "type" : "<ALPHANUM>", "position" : 1 },
        { "token" : "2", "start_offset" : 12, "end_offset" : 13, "type" : "<NUM>", "position" : 2 }
      ]
    }
    ```

3. java-api

3.1 Dependencies

```xml
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.12.0</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>7.12.0</version>
</dependency>
```

Test code: MyClient.java

```java
package com.study.elasticsearch.client;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

import java.io.IOException;

/**
 * @author XuZhuohao
 * @date 2021/4/7
 */
public class MyClient {
    public static void main(String[] args) throws IOException {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("192.168.240.63", 9200, "http")));
        client.close();
    }
}
```

Output

```text
10:22:13.302 [main] DEBUG org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager - Connection manager is shutting down
10:22:13.309 [main] DEBUG org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager - Connection manager shut down
```
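
The client above only opens and closes a connection. As a next step, a minimal search sketch with the same high-level client is shown below; the class name MySearch, the bank index and the match query on address are illustrative only (they reuse the sample data from section 2.2), and the host/port are this deployment's values:

```java
package com.study.elasticsearch.client;

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

import java.io.IOException;

/**
 * Roughly equivalent to: GET /bank/_search { "query": { "match": { "address": "mill" } } }
 */
public class MySearch {
    public static void main(String[] args) throws IOException {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("192.168.240.63", 9200, "http")));
        try {
            // build the request body, mirroring the query-dsl examples above
            SearchSourceBuilder source = new SearchSourceBuilder()
                    .query(QueryBuilders.matchQuery("address", "mill"))
                    .from(0)
                    .size(3);
            SearchRequest request = new SearchRequest("bank").source(source);

            // execute synchronously and print the hits
            SearchResponse response = client.search(request, RequestOptions.DEFAULT);
            for (SearchHit hit : response.getHits().getHits()) {
                System.out.println(hit.getId() + " -> " + hit.getSourceAsString());
            }
        } finally {
            client.close();
        }
    }
}
```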
