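SpringBoot - Elasticsearch Rest Client Usage Guide 3 (Aggregation Statistics, Retrieving All Buckets)
Author: hangge | 2024-12-18 08:38
Elasticsearch can group documents by a field and compute aggregate statistics over each group. The supported operations include count(), sum(), avg(), max(), and min(). This article demonstrates them with examples.
V. Aggregation statistics (aggregations)
1. count aggregation (counting students of the same age)
(1) Suppose we need to count how many students share each age. First, call the Elasticsearch API to add some initial data: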
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/1' -d'{"name":"tom","age":18}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/2' -d'{"name":"jack","age":29}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/3' -d'{"name":"jessica","age":18}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/4' -d'{"name":"dave","age":19}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/5' -d'{"name":"lilei","age":18}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/6' -d'{"name":"lili","age":29}'
(2) The following sample code runs the aggregation and counts the students of each age:
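// Imports needed by this snippet (package names as in the high-level REST client)
import java.util.List;
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Obtain the RestClient connection
RestHighLevelClient client = new RestHighLevelClient(
        RestClient.builder(
                new HttpHost("192.168.121.128", 9200, "http"),
                new HttpHost("192.168.121.129", 9200, "http"),
                new HttpHost("192.168.121.130", 9200, "http")));
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("stu");
// Specify the query conditions
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Specify the grouping; a terms aggregation performs a count by default
TermsAggregationBuilder aggregation = AggregationBuilders.terms("age_term")
        .field("age");
searchSourceBuilder.aggregation(aggregation);
searchRequest.source(searchSourceBuilder);
// Execute the search
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
// Read the buckets (one per distinct age)
Terms terms = searchResponse.getAggregations().get("age_term");
List<? extends Terms.Bucket> buckets = terms.getBuckets();
for (Terms.Bucket bucket : buckets) {
    System.out.println(bucket.getKey() + "---" + bucket.getDocCount());
}
// Close the connection
client.close();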
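(3) When run, the program prints one line per age group in the form key---count. For the data above, age 18 has 3 students, age 29 has 2, and age 19 has 1.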
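As a rough illustration of what the terms aggregation computes, the plain-Java sketch below groups the six sample ages and counts each group without contacting Elasticsearch. The class name AgeCountSketch and its methods are hypothetical names made up for this illustration. Buckets are ordered by count, largest first, which is the default order of a terms aggregation; breaking ties by age is an assumption of the sketch, not a statement about Elasticsearch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class; it is not part of the Elasticsearch client.
class AgeCountSketch {

    // Counts how often each age occurs and returns the groups ordered by
    // count (largest first); ties are ordered by age (sketch assumption).
    static Map<Integer, Long> countByAge(List<Integer> ages) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (Integer age : ages) {
            counts.merge(age, 1L, Long::sum);
        }
        List<Map.Entry<Integer, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : Integer.compare(a.getKey(), b.getKey());
        });
        // LinkedHashMap keeps the sorted order when iterating.
        Map<Integer, Long> ordered = new LinkedHashMap<>();
        for (Map.Entry<Integer, Long> e : entries) {
            ordered.put(e.getKey(), e.getValue());
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Ages of the six students inserted with curl above.
        List<Integer> ages = Arrays.asList(18, 29, 18, 19, 18, 29);
        for (Map.Entry<Integer, Long> e : countByAge(ages).entrySet()) {
            System.out.println(e.getKey() + "---" + e.getValue());
        }
    }
}
```

Running main prints one key---count line per age, in the same format as the client code above.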
2. sum aggregation (total score of each student)
(1) Suppose we need to compute the total score of each student. First, call the Elasticsearch API to add some initial data:
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/score/_doc/1' -d'{"name":"tom","subject":"chinese","score":59}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/score/_doc/2' -d'{"name":"tom","subject":"math","score":89}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/score/_doc/3' -d'{"name":"jack","subject":"chinese","score":78}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/score/_doc/4' -d'{"name":"jack","subject":"math","score":85}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/score/_doc/5' -d'{"name":"jessica","subject":"chinese","score":97}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/score/_doc/6' -d'{"name":"jessica","subject":"math","score":68}'
(2) The following sample code runs the aggregation and computes the total score of each student:
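// Imports needed by this snippet (package names as in the 7.x high-level REST client;
// in 6.x, Sum lives in org.elasticsearch.search.aggregations.metrics.sum)
import java.util.List;
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.Sum;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Obtain the RestClient connection
RestHighLevelClient client = new RestHighLevelClient(
        RestClient.builder(
                new HttpHost("192.168.121.128", 9200, "http"),
                new HttpHost("192.168.121.129", 9200, "http"),
                new HttpHost("192.168.121.130", 9200, "http")));
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("score");
// Specify the query conditions
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Specify the grouping and the sum
TermsAggregationBuilder aggregation = AggregationBuilders.terms("name_term")
        // Grouping field; for a string (text) field, group on its keyword sub-field
        .field("name.keyword")
        // Compute the sum; avg, min, max, etc. are also supported
        .subAggregation(AggregationBuilders.sum("sum_score").field("score"));
searchSourceBuilder.aggregation(aggregation);
searchRequest.source(searchSourceBuilder);
// Execute the search
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
// Read the buckets (one per student name)
Terms terms = searchResponse.getAggregations().get("name_term");
List<? extends Terms.Bucket> buckets = terms.getBuckets();
for (Terms.Bucket bucket : buckets) {
    // Read the result of the sum sub-aggregation
    Sum sum = bucket.getAggregations().get("sum_score");
    System.out.println(bucket.getKey() + "---" + sum.getValue());
}
// Close the connection
client.close();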
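(3) When run, the program prints one line per student in the form name---total. For the data above, the totals are 148 for tom, 163 for jack, and 165 for jessica.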
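To check these totals by hand, the plain-Java sketch below mirrors the grouping and the sub-aggregation on the six sample score documents, without contacting Elasticsearch. ScoreStatsSketch, Score, sampleData, and statsByName are hypothetical names made up for this illustration. It uses the JDK's DoubleSummaryStatistics, so the same pass also yields the avg, min, and max that the comment in the sample code mentions.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class; it is not part of the Elasticsearch client.
class ScoreStatsSketch {

    // One document of the "score" index.
    static final class Score {
        final String name;
        final String subject;
        final double score;

        Score(String name, String subject, double score) {
            this.name = name;
            this.subject = subject;
            this.score = score;
        }
    }

    // The six documents inserted with curl above.
    static List<Score> sampleData() {
        return Arrays.asList(
                new Score("tom", "chinese", 59),
                new Score("tom", "math", 89),
                new Score("jack", "chinese", 78),
                new Score("jack", "math", 85),
                new Score("jessica", "chinese", 97),
                new Score("jessica", "math", 68));
    }

    // Groups the rows by name and accumulates sum, average, min and max
    // of the scores in each group. Keys are sorted by name (TreeMap).
    static Map<String, DoubleSummaryStatistics> statsByName(List<Score> rows) {
        Map<String, DoubleSummaryStatistics> stats = new TreeMap<>();
        for (Score row : rows) {
            stats.computeIfAbsent(row.name, k -> new DoubleSummaryStatistics())
                 .accept(row.score);
        }
        return stats;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, DoubleSummaryStatistics> e : statsByName(sampleData()).entrySet()) {
            DoubleSummaryStatistics s = e.getValue();
            System.out.println(e.getKey() + "---" + s.getSum()
                    + " (avg " + s.getAverage() + ", min " + s.getMin() + ", max " + s.getMax() + ")");
        }
    }
}
```

In the real client code, swapping AggregationBuilders.sum for the avg, min, or max builder plays the role that getAverage, getMin, and getMax play in this sketch.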
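Appendix: retrieving all buckets from an aggregation
1. Only 10 buckets are returned by default
(1) By default, ES returns only 10 buckets. Building on Example 1 above (counting students of the same age), we load another batch of test data: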
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/61' -d'{"name":"s1","age":31}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/62' -d'{"name":"s2","age":32}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/63' -d'{"name":"s3","age":33}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/64' -d'{"name":"s4","age":34}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/65' -d'{"name":"s5","age":35}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/66' -d'{"name":"s6","age":36}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/67' -d'{"name":"s7","age":37}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/68' -d'{"name":"s8","age":38}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/69' -d'{"name":"s9","age":39}'
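(2) Running the Example 1 code again returns only 10 buckets, even though the data now contains 12 distinct ages.
(3) Raising the page size, as in the line below, still yields 10 buckets. The paging parameters apply only to the search hits and have no effect on aggregation buckets.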
searchSourceBuilder.from(0).size(20);
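2. Retrieving a specified number of buckets
(1) Calling the size method on the aggregation itself sets how many buckets are returned. For example, the following code returns at most 20 buckets: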
TermsAggregationBuilder aggregation = AggregationBuilders.terms("age_term")
        .field("age")
        .size(20); // return at most this many buckets
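- Because the limit is set to at most 20 buckets and the data contains only 12 groups, every group is returned.
(2) If you do not know in advance how many groups there are but still want all of them, pass Integer.MAX_VALUE as the size. In practice this is large enough to cover every group.
Note: a very large number of buckets puts considerable load on ES. That is why the number of buckets is limited by default and users must state explicitly how many they want.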
TermsAggregationBuilder aggregation = AggregationBuilders.terms("age_term")
        .field("age")
        .size(Integer.MAX_VALUE); // large enough to return every bucket
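The effect of the three size values used in this appendix (the default of 10, then 20, then Integer.MAX_VALUE) can be sketched in plain Java by counting the 15 sample ages and cutting the bucket list at the requested size. BucketSizeSketch, sampleAges, and topBuckets are hypothetical names made up for this illustration. The sketch only shows how the limit caps the number of buckets; which of the equally sized groups Elasticsearch keeps when it truncates is not modelled, and ordering ties by age is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class; it is not part of the Elasticsearch client.
class BucketSizeSketch {

    // Ages from both batches of test data: the six original students
    // plus s1..s9 (ages 31 to 39), giving 12 distinct ages.
    static List<Integer> sampleAges() {
        List<Integer> ages = new ArrayList<>();
        int[] firstBatch = {18, 29, 18, 19, 18, 29};
        for (int age : firstBatch) {
            ages.add(age);
        }
        for (int age = 31; age <= 39; age++) {
            ages.add(age);
        }
        return ages;
    }

    // Returns at most `size` (age, count) buckets, largest count first;
    // ties are ordered by age (sketch assumption).
    static List<Map.Entry<Integer, Long>> topBuckets(List<Integer> ages, int size) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (Integer age : ages) {
            counts.merge(age, 1L, Long::sum);
        }
        List<Map.Entry<Integer, Long>> buckets = new ArrayList<>(counts.entrySet());
        buckets.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : Integer.compare(a.getKey(), b.getKey());
        });
        // The limit never returns more buckets than actually exist.
        int limit = Math.min(size, buckets.size());
        return new ArrayList<>(buckets.subList(0, limit));
    }

    public static void main(String[] args) {
        List<Integer> ages = sampleAges();
        int[] sizes = {10, 20, Integer.MAX_VALUE};
        for (int size : sizes) {
            System.out.println("size=" + size + " -> "
                    + topBuckets(ages, size).size() + " buckets");
        }
    }
}
```

With 12 distinct ages, a size of 10 cuts the list short, while 20 and Integer.MAX_VALUE both return all 12 groups.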