Elasticsearch IK analyzer: custom dictionary (local file, CentOS) - Ma Yumin (马育民)

# Create the dictionary file

The `ik/config/` directory holds IK's **configuration files** and **dictionary files**. Change into it:

```
cd /program/elasticsearch-7.9.3/plugins/ik/config/
```

Create your own dictionary file:
```
vim myword.dic
```

Add the following content, one word per line:

```
李雷
韩梅梅
```

Save the file with `UTF-8` encoding.
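If you prefer to generate the dictionary instead of typing it in `vim`, here is a minimal Python sketch (not part of IK) that writes one word per line as UTF-8 without a BOM. Avoiding the BOM is a precaution: Java's UTF-8 decoder does not strip it, so it could end up glued to the first word. The function name `write_dictionary` is hypothetical.

```python
from pathlib import Path
from typing import Iterable


def write_dictionary(path: Path, words: Iterable[str]) -> None:
    """Write an IK-style dictionary file: one word per line, UTF-8 without BOM."""
    # Trim whitespace and drop blank entries so every line holds exactly one word.
    cleaned = [w.strip() for w in words if w.strip()]
    # encoding="utf-8" (unlike "utf-8-sig") writes no BOM; newline="\n" forces LF line endings.
    with path.open("w", encoding="utf-8", newline="\n") as f:
        for word in cleaned:
            f.write(word + "\n")


# Example (path taken from this article; adjust to your own installation):
# write_dictionary(Path("/program/elasticsearch-7.9.3/plugins/ik/config/myword.dic"), ["李雷", "韩梅梅"])
```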

# Edit the IK configuration

Edit `IKAnalyzer.cfg.xml` in the same directory so that it reads as follows:

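```
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extension configuration</comment>
	<!-- Local extension dictionary: myword.dic in this config directory -->
	<entry key="ext_dict">myword.dic</entry>
	<!-- Local extension stopword dictionary (not used here) -->
	<entry key="ext_stopwords"></entry>
</properties>
```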
Save and exit.

**Explanation:**

```
<entry key="ext_dict">myword.dic</entry>
```

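This entry configures a local extension dictionary. The file name is resolved relative to the `ik/config/` directory.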
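To check which dictionary files a given `IKAnalyzer.cfg.xml` actually declares, here is a small Python sketch using the standard `xml.etree.ElementTree` module. It assumes, as IK does when loading, that the `ext_dict` value may list several files separated by `;` and that relative names are resolved against the config directory; the function name `ext_dict_paths` is hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List


def ext_dict_paths(cfg_xml: str, config_dir: Path) -> List[Path]:
    """Return the ext_dict files declared in an IKAnalyzer.cfg.xml document.

    Assumption: several files may be separated by ';', and relative names
    are resolved against config_dir (absolute names are kept as-is).
    """
    # Parse bytes so the encoding="UTF-8" declaration is honoured.
    root = ET.fromstring(cfg_xml.encode("utf-8"))
    for entry in root.iter("entry"):
        if entry.get("key") == "ext_dict":
            value = entry.text or ""
            return [config_dir / name.strip() for name in value.split(";") if name.strip()]
    return []
```

An empty list means no local extension dictionary is configured, so custom words would not be recognized.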

# Restart Elasticsearch

Restart Elasticsearch so that IK loads the new dictionary.

# Test the analyzer

### Test 1

Run the following analysis request again (Kibana Dev Tools console syntax):
```
GET _analyze
{
  "analyzer":"ik_smart",
  "text":"李雷的博客"
}
```

The result shows that `李雷` is now recognized as a single word:

```
{
  "tokens" : [
    {
      "token" : "李雷",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "的",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "博客",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    }
  ]
}
```
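Offsets are character positions in the input text, so `李雷` (two characters) spans 0 to 2. A small Python sketch (a hypothetical helper, not part of Elasticsearch) can compute the expected offsets for a list of non-overlapping tokens, such as `ik_smart` produces. It assumes offsets count UTF-16 code units, which for the BMP Chinese characters used here equals Python's character index.

```python
from typing import List, Tuple


def expected_offsets(text: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Compute (start_offset, end_offset) for tokens that appear in order in text.

    Only valid for non-overlapping tokens (e.g. ik_smart output); ik_max_word
    may emit overlapping tokens, which this simple scan does not handle.
    """
    offsets = []
    pos = 0
    for token in tokens:
        start = text.index(token, pos)  # raises ValueError if the token is missing
        end = start + len(token)
        offsets.append((start, end))
        pos = end
    return offsets
```

For example, `expected_offsets("李雷的博客", ["李雷", "的", "博客"])` returns `[(0, 2), (2, 3), (3, 5)]`, matching the response above.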

### Test 2

Run the following analysis request again:
```
GET _analyze
{
  "analyzer":"ik_max_word",
  "text":"韩梅梅的博客"
}
```
The result shows that `韩梅梅` is now recognized as a single word:

```
{
  "tokens" : [
    {
      "token" : "韩梅梅",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "的",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "博客",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 2
    }
  ]
}
```
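The same requests can also be sent from a script instead of the console. Here is a minimal Python sketch using only the standard library. It assumes an unsecured node at `http://localhost:9200` (hypothetical; adjust host, port and authentication to your setup), and the helper names `analyze` and `token_strings` are illustrative.

```python
import json
import urllib.request
from typing import List


def analyze(text: str, analyzer: str = "ik_smart",
            base_url: str = "http://localhost:9200") -> dict:
    """POST an _analyze request and return the decoded JSON response.

    Assumes an unsecured Elasticsearch node reachable at base_url.
    """
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def token_strings(response: dict) -> List[str]:
    """Extract just the token strings from an _analyze response."""
    return [t["token"] for t in response.get("tokens", [])]
```

With the dictionary loaded, `token_strings(analyze("韩梅梅的博客", "ik_max_word"))` should contain `韩梅梅` as one token.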

# Add new words

Whenever you have new words, add them to this file:
```
myword.dic
```
Put one word per line.
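Adding words by hand can leave duplicates or a missing final newline that glues two words together. A minimal Python sketch (the function name `add_words` is hypothetical) that appends words one per line, skipping ones already present:

```python
from pathlib import Path
from typing import Iterable, List


def add_words(path: Path, new_words: Iterable[str]) -> List[str]:
    """Append words to an IK dictionary file, one per line, skipping duplicates.

    Returns the words that were actually appended.
    """
    existing = set()
    needs_newline = False
    if path.exists():
        content = path.read_text(encoding="utf-8")
        existing = {line.strip() for line in content.splitlines() if line.strip()}
        # If the last line lacks a trailing newline, add one before appending.
        needs_newline = bool(content) and not content.endswith("\n")
    added = []
    for word in new_words:
        w = word.strip()
        if w and w not in existing:
            existing.add(w)
            added.append(w)
    if added:
        with path.open("a", encoding="utf-8", newline="\n") as f:
            if needs_newline:
                f.write("\n")
            f.write("".join(w + "\n" for w in added))
    return added
```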

### Note

After saving the file, restart Elasticsearch for the new words to take effect.

Original source: http://malaoshi.top/show_1IX4VH2xJRtt.html