
Getting Started with Elasticsearch: A Tutorial Summary


Contents

  • Installation
  • Concepts
  • Analyzers
  • RESTful API calls
  • Debugging plugin
  • Cluster setup
  • Java API calls
  • Spring Data Elasticsearch
  • How it works
  • Database vs. Elasticsearch performance
  • References

Installation

  1. Download the archive: https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.1.zip
  2. Unzip it.
  3. On Windows, run elasticsearch.bat in the bin directory of the extracted folder.
  4. On Linux, run elasticsearch in the bin directory of the extracted folder.
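  5. To verify the node is up, send a GET request to localhost:9200 (the default HTTP port); the response is a small JSON document containing the node name, cluster name, and version.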

Concepts

  • Index: the counterpart of a database
  • Type: the counterpart of a table
  • Document: the counterpart of a row
  • Property: the counterpart of a column
  • Mapping: the definition of the table schema

Analyzers

An analyzer splits a piece of text into terms according to its own rules. Those terms are what fuzzy (full-text) search matches against: the terms become the entries of the inverted index.

  • Chinese analyzer (IK): splits sentences into terms according to Chinese word-formation rules.
  • Download: https://github.com/medcl/elasticsearch-analysis-ik
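  • A quick way to check that the analyzer is installed is the _analyze API (a sketch; the exact response format can differ slightly between versions):

    POST request: localhost:9200/_analyze

    {
        "analyzer": "ik_smart",
        "text": "中华人民共和国国歌"
    }

    The response lists the tokens produced; with ik_smart the sentence is split into Chinese words instead of individual characters.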

RESTful API calls

  • Create an index and define a type:

    PUT request: localhost:9200/blog

    {
        "mappings": {
            "hello": {
                "properties": {
                    "id":      { "type": "long", "store": true },
                    "title":   { "type": "text", "store": true, "analyzer": "ik_smart" },
                    "content": { "type": "text", "store": true, "analyzer": "ik_smart" }
                }
            }
        }
    }
    
  • Add a document:

    POST request (the document id is taken from the URL; if it is omitted, Elasticsearch generates one automatically, it is not taken from the id property in the body): localhost:9200/blog/hello/1

    {
        "id": 1,
        "title": "字符过滤器",
        "content": "首先字符串经过过滤器(character filter),他们的工作是在表征化(注:这个词叫做断词更适合)前处理字符串。字符过滤器能够去除HTML标记,或者转化为“&”为“and”。"
    }
    
  • Update a document:

    Same request as adding a document: sending the full body to the same URL overwrites the existing document.
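    Elasticsearch also supports partial updates through the _update endpoint (shown here as a sketch; only the fields inside "doc" are changed, the rest of the stored document is kept):

    POST request: localhost:9200/blog/hello/1/_update

    {
        "doc": { "title": "new title" }
    }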

  • Delete a document:

    DELETE request: localhost:9200/blog/hello/1

  • Query documents: queries (query) versus filters (filter). A query scores each matching document and does relevance-based, fuzzy matching, so it is relatively slow; a filter is a plain yes/no match with no scoring, so it is faster. A combined query-plus-filter example is shown at the end of this list.

    • match_all: matches all documents

      {
          "match_all": {}
      }
      
    • match: full-text search on analyzed (full-text) fields, exact matching on non-analyzed fields

      {
          "match": { "tweet": "About Search" }
      }
      
    • multi_match: runs a match query against several fields at once

      {
          "multi_match": {
              "query":  "full text search",
              "fields": [ "title", "body" ]
          }
      }
      
    • range: matches numbers or dates that fall within the given interval

      {
          "range": {
              "age": {
                  "gte": 20,
                  "lt":  30
              }
          }
      }
      
    • term: exact-value matching; the input is not analyzed

      { "term": { "age":    26           } }
      { "term": { "date":   "2014-09-01" } }
      { "term": { "public": true         } }
      { "term": { "tag":    "full_text"  } }
      
    • terms: like term, but matches any of several exact values

      {
          "terms": { "tag": [ "search", "full_text", "nosql" ] }
      }
      
    • exists and missing: test whether a field has a value, similar to SQL's IS NOT NULL (exists) and IS NULL (missing)

      {
          "exists": {
              "field": "title"
          }
      }
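    • Combining a query with a filter (a sketch: the match clause on title is scored for relevance, while the term clause inside filter is a cheap yes/no check and is not scored):

      POST request: localhost:9200/blog/hello/_search

      {
          "query": {
              "bool": {
                  "must":   { "match": { "title": "过滤器" } },
                  "filter": { "term":  { "id": 1 } }
              }
          }
      }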
      

Debugging plugin

  • elasticsearch-head: a browser tool for inspecting indices, types, and documents, and for running common queries

Cluster setup

  1. Edit the elasticsearch.yml configuration file in the config directory:

    cluster.name: my-es-cluster      # cluster name
    node.name: node-2                # node name
    http.port: 9202                  # HTTP port used for data operations
    transport.tcp.port: 9302         # TCP port used for node-to-node communication
    discovery.zen.ping.unicast.hosts: ["127.0.0.1:9301", "127.0.0.1:9302", "127.0.0.1:9303"]  # unicast host list for discovery
    
  2. Start elasticsearch on each node.

  3. Connect to the cluster and operate on the data; a quick health check is shown below.
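    To confirm that the nodes actually formed one cluster, query the cluster health endpoint (the node configured above listens on HTTP port 9202):

    GET request: localhost:9202/_cluster/health

    The response contains the cluster name, a green/yellow/red status, and number_of_nodes, which should match the number of nodes you started.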

Java API calls

  • Dependencies
<dependencies>
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>5.5.1</version>
    </dependency>
    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>transport</artifactId>
        <version>5.5.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/io.netty/netty-transport -->
    <!-- needed by PreBuiltTransportClient -->
    <dependency>
        <groupId>io.netty</groupId>
        <artifactId>netty-transport</artifactId>
        <version>4.1.13.Final</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.9.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb -->
    <!-- jars for connecting to MongoDB -->
    <dependency>
        <groupId>com.github.richardwilly98.elasticsearch</groupId>
        <artifactId>elasticsearch-river-mongodb</artifactId>
        <version>2.0.9</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/junit/junit -->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter-api</artifactId>
        <version>5.0.0-M4</version>
        <scope>test</scope>
    </dependency>
    <!-- Lombok annotations to reduce boilerplate -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.4</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-core</artifactId>
        <version>${jackson.version}</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>${jackson.version}</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-annotations</artifactId>
        <version>${jackson.version}</version>
    </dependency>
</dependencies>
  • Entity class for the type
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Article {
    private long id;
    private String title;
    private String content;
}
  • Test class
public class ElasticSearchClientTest {
    private TransportClient client;

    // Connect to the cluster
    @Before
    public void init() throws Exception {
        Settings settings = Settings.builder().put("cluster.name", "my-es-cluster").build();
        client = new PreBuiltTransportClient(settings);
        client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9301))
              .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9302));
    }

    // Create an index (this test builds its own client instead of using the one from init())
    @Test
    public void createIndex() throws UnknownHostException {
        Settings settings = Settings.builder().put("cluster.name", "my-es-cluster").build();
        TransportClient client = new PreBuiltTransportClient(settings);
        client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9301));
        client.admin().indices().prepareCreate("hello-index").get();
        client.close();
    }

    // Define the mapping for the "article" type
    @Test
    public void setMappings() throws Exception {
        Settings settings = Settings.builder().put("cluster.name", "my-es-cluster").build();
        TransportClient client = new PreBuiltTransportClient(settings);
        client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9301))
              .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9302));
        XContentBuilder builder = XContentFactory.jsonBuilder()
                .startObject()
                    .startObject("article")
                        .startObject("properties")
                            .startObject("id").field("store", true).field("type", "long").endObject()
                            .startObject("title").field("store", true).field("type", "text").field("analyzer", "ik_smart").endObject()
                            .startObject("content").field("store", true).field("type", "text").field("analyzer", "ik_smart").endObject()
                        .endObject()
                    .endObject()
                .endObject();
        // Submit the mapping: index name, type name, and the source (an XContentBuilder or a JSON string)
        client.admin().indices().preparePutMapping("hello-index").setType("article").setSource(builder).get();
        client.close();
    }

    // Add a single document (the client comes from init())
    @Test
    public void testAddDocument() throws Exception {
        XContentBuilder builder = XContentFactory.jsonBuilder()
                .startObject()
                .field("id", 2)
                .field("title", "分词器")
                .field("content", "下一步,分词器(tokenizer)被表征化(断词)为独立的词。一个简单的分词器(tokenizer)可以根据空格或逗号将单词分开(注:这个在中文中不适用)。")
                .endObject();
        // Set the index, type and id (a random id is generated if none is set), then submit
        client.prepareIndex().setIndex("hello-index").setType("article").setId("2").setSource(builder).get();
        client.close();
    }

    // Insert a batch of documents for testing
    @Test
    public void testDocument2() throws Exception {
        for (int i = 0; i < 96; i++) {
            Article article = new Article();
            article.setId(i);
            article.setTitle("表征过滤" + i);
            article.setContent(i + "最后,每个词都通过所有表征过滤(token filters),他可以修改词(例如将“Quick”转为小写),去掉词(例如停用词像“a”、“and”、“the”等等),或者增加词(例如同义词像“jump”和“leap”)");
            String articleJson = new ObjectMapper().writeValueAsString(article);
            client.prepareIndex().setIndex("hello-index").setType("article").setId("" + i)
                  .setSource(articleJson, XContentType.JSON).get();
        }
        client.close();
    }

    // Print the total hit count and the fields of every hit (shared by the search tests below)
    private void printHits(SearchHits hits) {
        System.out.println("total: " + hits.getTotalHits());
        Iterator<SearchHit> iterator = hits.iterator();
        while (iterator.hasNext()) {
            SearchHit next = iterator.next();
            Map<String, Object> sourceAsMap = next.getSourceAsMap();
            System.out.println(next.getSourceAsString());
            System.out.println("id:" + sourceAsMap.get("id"));
            System.out.println("title:" + sourceAsMap.get("title"));
            System.out.println("content:" + sourceAsMap.get("content"));
        }
    }

    // Query by id
    @Test
    public void searchById() throws Exception {
        QueryBuilder builder = QueryBuilders.idsQuery().addIds("1", "2");
        SearchResponse response = client.prepareSearch("hello-index").setTypes("article").setQuery(builder).get();
        printHits(response.getHits());
        client.close();
    }

    // Term query
    @Test
    public void searchByTerm() throws Exception {
        QueryBuilder builder = QueryBuilders.termQuery("title", "过滤");
        SearchResponse response = client.prepareSearch("hello-index").setTypes("article").setQuery(builder).get();
        printHits(response.getHits());
        client.close();
    }

    // query_string query
    @Test
    public void searchByQueryString() throws Exception {
        QueryBuilder builder = QueryBuilders.queryStringQuery("过滤器").defaultField("title");
        SearchResponse response = client.prepareSearch("hello-index").setTypes("article").setQuery(builder).get();
        printHits(response.getHits());
        client.close();
    }

    // query_string query with paging
    @Test
    public void searchByQueryStringPage() throws Exception {
        QueryBuilder builder = QueryBuilders.queryStringQuery("表征").defaultField("title");
        SearchResponse response = client.prepareSearch("hello-index").setTypes("article").setQuery(builder)
                // from = first row, size = rows per page
                .setFrom(0).setSize(5).get();
        printHits(response.getHits());
        client.close();
    }

    // query_string query with paging and highlighting
    @Test
    public void searchByQueryStringPageHilight() throws Exception {
        QueryBuilder builder = QueryBuilders.queryStringQuery("表征").defaultField("title");
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("title").preTags("<em>").postTags("</em>");
        SearchResponse response = client.prepareSearch("hello-index").setTypes("article").setQuery(builder)
                // from = first row, size = rows per page
                .setFrom(0).setSize(5)
                // enable highlighting
                .highlighter(highlightBuilder).get();
        SearchHits hits = response.getHits();
        System.out.println("total: " + hits.getTotalHits());
        Iterator<SearchHit> iterator = hits.iterator();
        while (iterator.hasNext()) {
            SearchHit next = iterator.next();
            Map<String, Object> sourceAsMap = next.getSourceAsMap();
            System.out.println(next.getSourceAsString());
            System.out.println("id:" + sourceAsMap.get("id"));
            System.out.println("title:" + sourceAsMap.get("title"));
            System.out.println("content:" + sourceAsMap.get("content"));
            System.out.println("highlight: " + next.getHighlightFields());
            System.out.println("highlight fragment: " + next.getHighlightFields().get("title").getFragments()[0].string());
        }
        client.close();
    }

    // Query by id (no highlighting is actually applied here)
    @Test
    public void searchByIdHilight() throws Exception {
        QueryBuilder builder = QueryBuilders.idsQuery().addIds("1", "2");
        SearchResponse response = client.prepareSearch("hello-index").setTypes("article").setQuery(builder).get();
        printHits(response.getHits());
        client.close();
    }
}

Spring Data Elasticsearch

  • Dependencies
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.0.4.RELEASE</version>
    <relativePath/> <!-- lookup parent from repository -->
</parent>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
    </dependency>
    <!-- Lombok annotations to reduce boilerplate -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.4</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>compile</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-lang3</artifactId>
        <version>3.9</version>
    </dependency>
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-collections4</artifactId>
        <version>4.1</version>
    </dependency>
</dependencies>
  • Entity class for the type
@Data
@NoArgsConstructor
@AllArgsConstructor
// The annotation declares the index name and the type
@Document(indexName = "zoo",type = "animal")
public class Animal {
    // Field annotations describe each property's mapping
    @Field(type = FieldType.Long)
    private long id;
    @Field(type = FieldType.Text, analyzer = "ik_smart")
    private String name;
    @Field(type = FieldType.Text, analyzer = "ik_smart")
    private String hobby;
}
  • Repository interface
// Extending ElasticsearchRepository lets you query by declaring interface methods; the type parameters are the entity class and the type of its id
@Component
public interface AnimalRepository extends ElasticsearchRepository<Animal, Long> {
    List<Animal> findAnimalByHobbyOrName(String hobby, String name);

    List<Animal> findAnimalByHobbyOrName(String hobby, String name, Pageable pageable);
}
  • Test class
@RunWith(SpringRunner.class)
@SpringBootTest(classes = ESApp.class)
public class ElasticsearchApplicationTest {
    @Autowired
    private AnimalRepository animalRepository;
    @Autowired
    private ElasticsearchTemplate template;

    // Add documents
    @Test
    public void testSaveAnimal() {
        for (int i = 0; i < 100; i++) {
            Animal animal = new Animal();
            animal.setId(2L + i);
            animal.setName("12我是一个大号人啊啊啊啊" + i);
            animal.setHobby("456我是尼采,我是一个大坏人啊啊啊" + i);
            animalRepository.save(animal);
        }
    }

    // Delete a document
    @Test
    public void testDeleteAnimal() {
        Animal animal = new Animal();
        animal.setId(2L);
        animal.setName("123");
        animal.setHobby("456");
        animalRepository.delete(animal);
    }

    // Query a document by id
    @Test
    public void testQueryAnimal() {
        Optional<Animal> optional = animalRepository.findById(5L);
        Animal animal = optional.get();
        System.out.println(animal);
    }

    // Query through a derived query method declared on the repository interface
    @Test
    public void testFindAnimal() {
        List<Animal> animal = animalRepository.findAnimalByHobbyOrName("尼采", "一个");
        System.out.println(animal);
        System.out.println(animal.size());
    }

    // Derived query method with paging
    @Test
    public void testFindAnimalPage() {
        Pageable pageable = PageRequest.of(0, 15);
        List<Animal> animal = animalRepository.findAnimalByHobbyOrName("尼采", "一个", pageable);
        System.out.println(animal);
        System.out.println(animal.size());
    }

    // Query with the native Elasticsearch query builders
    @Test
    public void testNativeQuery() {
        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.queryStringQuery("坏人"))
                .withQuery(QueryBuilders.termQuery("name", "我是"))
                .withPageable(PageRequest.of(0, 20))
                .build();
        List<Animal> animal = template.queryForList(searchQuery, Animal.class);
        System.out.println(animal);
        System.out.println(animal.size());
    }
}
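  • Connection settings: the starter also needs to know how to reach the cluster. A minimal application.properties sketch, assuming the transport ports 9301/9302 from the cluster section above (these spring.data.elasticsearch.* keys are the Spring Boot 2.0.x property names):

# must match cluster.name in elasticsearch.yml
spring.data.elasticsearch.cluster-name=my-es-cluster
# transport (TCP) addresses of the nodes, not the HTTP ports
spring.data.elasticsearch.cluster-nodes=127.0.0.1:9301,127.0.0.1:9302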

How it works

Elasticsearch is fast mainly because it uses an inverted index.

Inverted index:

  1. The analyzer splits the content of each document into terms.
  2. An index is built on those terms; each term points to the IDs of the documents that contain it.
  3. At search time, looking up a term directly yields the documents that contain it.

Forward index: the document ID is the index key. An inverted index instead uses the terms inside the documents as keys. A minimal sketch of the idea follows.
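As a rough illustration only (whitespace tokenization and a plain Java map, nothing like the Lucene segments Elasticsearch actually uses), the following sketch shows the core idea of mapping terms to document IDs:

import java.util.*;

public class InvertedIndexDemo {
    public static void main(String[] args) {
        // A couple of "documents": id -> text
        Map<Integer, String> docs = new HashMap<>();
        docs.put(1, "character filter strips html tags");
        docs.put(2, "token filter lowercases every term");

        // Build the inverted index: term -> ids of the documents containing it
        Map<String, Set<Integer>> index = new HashMap<>();
        docs.forEach((id, text) -> {
            for (String term : text.split("\\s+")) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        });

        // Searching is a direct lookup on the term instead of scanning every document
        System.out.println(index.getOrDefault("filter", Collections.emptySet())); // [1, 2]
    }
}

Looking up a term is a single map access regardless of how many documents are stored, which is why full-text search stays fast as the corpus grows.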

Database vs. Elasticsearch performance

  • Elasticsearch advantages: fuzzy/full-text queries are faster by many orders of magnitude; statistical (aggregation) queries are convenient; multi-condition queries are easy to express
  • Elasticsearch disadvantages: no transactions; extra learning cost compared with plain SQL; deep pagination (large page offsets) is not fast
  • Typical use cases: search services, log search, and similar workloads

Further reading: a discussion of the advantages and limitations of Elasticsearch.

References

Elasticsearch: The Definitive Guide

全文搜索引擎 Elasticsearch 入门教程 (Full-text search engine Elasticsearch: a getting-started tutorial)
