SpringBoot - Elasticsearch Rest Client in Detail, Part 3 (Aggregation Statistics, Retrieving All Bucket Data)

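Author: hangge | 2024-12-18 08:38
    Elasticsearch can group documents by a field and compute statistics over each group. Aggregations support count(), sum(), avg(), max(), min(), and more. This article demonstrates them with examples.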

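5. aggregations: aggregate statistics

1. count aggregation (count the students of each age)

(1) Suppose we need to count how many students share the same age. First, add some initial data through the Elasticsearch API: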
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/1' -d'{"name":"tom","age":18}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/2' -d'{"name":"jack","age":29}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/3' -d'{"name":"jessica","age":18}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/4' -d'{"name":"dave","age":19}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/5' -d'{"name":"lilei","age":18}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/6' -d'{"name":"lili","age":29}'
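
If the test data has to be seeded from Java rather than from a shell, the same requests can be built with the JDK's own HTTP client (Java 11+). The following is only a sketch: SeedStudents and indexRequest are hypothetical names, the host is copied from the curl commands above, and main only prints the requests instead of sending them.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: builds the same POST requests as the curl commands above.
final class SeedStudents {

    // Builds POST {baseUrl}/{index}/_doc/{id} with a JSON document as the body.
    static HttpRequest indexRequest(String baseUrl, String index, int id, String json) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_doc/" + id))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        String baseUrl = "http://192.168.121.128:9200"; // host taken from the curl commands
        Map<Integer, String> docs = new LinkedHashMap<>();
        docs.put(1, "{\"name\":\"tom\",\"age\":18}");
        docs.put(2, "{\"name\":\"jack\",\"age\":29}");
        docs.put(3, "{\"name\":\"jessica\",\"age\":18}");
        docs.put(4, "{\"name\":\"dave\",\"age\":19}");
        docs.put(5, "{\"name\":\"lilei\",\"age\":18}");
        docs.put(6, "{\"name\":\"lili\",\"age\":29}");

        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            HttpRequest request = indexRequest(baseUrl, "stu", doc.getKey(), doc.getValue());
            // This sketch does not send anything; it only shows which requests would be issued.
            System.out.println(request.method() + " " + request.uri());
        }
    }
}
```

To actually index the documents, each request could be passed to java.net.http.HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), assuming the cluster is reachable from the machine running the code.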

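(2) The following sample code counts the students of each age with a terms aggregation:
// Imports needed by this snippet (package names as in the 7.x high-level REST client)
import java.util.List;
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Create the RestHighLevelClient connection
RestHighLevelClient client = new RestHighLevelClient(
        RestClient.builder(
                new HttpHost("192.168.121.128", 9200, "http"),
                new HttpHost("192.168.121.129", 9200, "http"),
                new HttpHost("192.168.121.130", 9200, "http")));
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("stu");

// Build the search source (no query is set, so all documents are matched)
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Group by age; a terms aggregation returns the document count of each bucket by default
TermsAggregationBuilder aggregation = AggregationBuilders.terms("age_term")
        .field("age");
searchSourceBuilder.aggregation(aggregation);

searchRequest.source(searchSourceBuilder);

// Execute the search
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

// Read the buckets of the "age_term" aggregation
Terms terms = searchResponse.getAggregations().get("age_term");
List<? extends Terms.Bucket> buckets = terms.getBuckets();
for (Terms.Bucket bucket: buckets) {
    // Print the bucket key (age) and the number of documents in the bucket
    System.out.println(bucket.getKey()+"---"+bucket.getDocCount());
}

// Close the connection
client.close();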

(3) When run, the program prints one line per bucket in the form age---count.
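
As a cross-check on what the terms aggregation computes, the grouping can be reproduced in plain Java over the six sample documents. This is only a sketch: AgeCountSketch and countByAge are hypothetical names, no Elasticsearch call is made, and the ordering (document count descending, then key ascending) is an assumption modeled on the default terms ordering.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: an in-memory stand-in for the "age_term" terms aggregation.
final class AgeCountSketch {

    // Counts documents per age and orders buckets by count descending, then age ascending.
    static Map<Integer, Long> countByAge(List<Integer> ages) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (int age : ages) {
            counts.merge(age, 1L, Long::sum);
        }
        List<Map.Entry<Integer, Long>> buckets = new ArrayList<>(counts.entrySet());
        buckets.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : Integer.compare(a.getKey(), b.getKey());
        });
        Map<Integer, Long> ordered = new LinkedHashMap<>();
        for (Map.Entry<Integer, Long> bucket : buckets) {
            ordered.put(bucket.getKey(), bucket.getValue());
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Ages of tom, jack, jessica, dave, lilei and lili from the sample data
        List<Integer> ages = List.of(18, 29, 18, 19, 18, 29);
        for (Map.Entry<Integer, Long> bucket : countByAge(ages).entrySet()) {
            System.out.println(bucket.getKey() + "---" + bucket.getValue());
        }
    }
}
```

The sketch prints 18---3, 29---2 and 19---1. The Elasticsearch program should report the same counts, provided the stu index holds only the six documents added in step (1).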

2. sum aggregation (total score of each student)

(1) Suppose we need to compute the total score of each student. First, add some initial data through the Elasticsearch API:
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/score/_doc/1' -d'{"name":"tom","subject":"chinese","score":59}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/score/_doc/2' -d'{"name":"tom","subject":"math","score":89}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/score/_doc/3' -d'{"name":"jack","subject":"chinese","score":78}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/score/_doc/4' -d'{"name":"jack","subject":"math","score":85}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/score/_doc/5' -d'{"name":"jessica","subject":"chinese","score":97}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/score/_doc/6' -d'{"name":"jessica","subject":"math","score":68}'

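(2) The following sample code computes the total score of each student:
// Imports needed by this snippet (package names as in the 7.x high-level REST client)
import java.util.List;
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.Sum; // 6.x: ...metrics.sum.Sum
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Create the RestHighLevelClient connection
RestHighLevelClient client = new RestHighLevelClient(
        RestClient.builder(
                new HttpHost("192.168.121.128", 9200, "http"),
                new HttpHost("192.168.121.129", 9200, "http"),
                new HttpHost("192.168.121.130", 9200, "http")));
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("score");

// Build the search source (no query is set, so all documents are matched)
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Group by student name and sum the scores inside each group
TermsAggregationBuilder aggregation = AggregationBuilders.terms("name_term")
        .field("name.keyword")// group-by field; a string (text) field must be aggregated through its keyword sub-field
        // Sum the score field per bucket; avg, min, max, etc. are supported in the same way
        .subAggregation(AggregationBuilders.sum("sum_score").field("score"));
searchSourceBuilder.aggregation(aggregation);

searchRequest.source(searchSourceBuilder);

// Execute the search
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

// Read the buckets of the "name_term" aggregation
Terms terms = searchResponse.getAggregations().get("name_term");
List<? extends Terms.Bucket> buckets = terms.getBuckets();
for (Terms.Bucket bucket: buckets) {
    // Read the result of the sum sub-aggregation for this bucket
    Sum sum = bucket.getAggregations().get("sum_score");
    System.out.println(bucket.getKey()+"---"+sum.getValue());
}

// Close the connection
client.close();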

(3) When run, the program prints one line per student in the form name---total score.
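
The totals can be worked out by hand from the sample data, or with a short plain-Java sketch like the one below. ScoreSumSketch, Score, sampleRows and sumByName are hypothetical names, no Elasticsearch call is made, and the sketch sorts by name on the assumption that buckets with equal document counts come back in ascending key order.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: an in-memory stand-in for terms("name_term") + sum("sum_score").
final class ScoreSumSketch {

    // One document of the "score" index.
    static final class Score {
        final String name;
        final String subject;
        final double score;

        Score(String name, String subject, double score) {
            this.name = name;
            this.subject = subject;
            this.score = score;
        }
    }

    // The six sample documents added in step (1).
    static List<Score> sampleRows() {
        return List.of(
                new Score("tom", "chinese", 59),
                new Score("tom", "math", 89),
                new Score("jack", "chinese", 78),
                new Score("jack", "math", 85),
                new Score("jessica", "chinese", 97),
                new Score("jessica", "math", 68));
    }

    // Sums the scores per student; TreeMap keeps the names in ascending order.
    static Map<String, Double> sumByName(List<Score> rows) {
        Map<String, Double> sums = new TreeMap<>();
        for (Score row : rows) {
            sums.merge(row.name, row.score, Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Double> bucket : sumByName(sampleRows()).entrySet()) {
            System.out.println(bucket.getKey() + "---" + bucket.getValue());
        }
    }
}
```

The sketch prints jack---163.0, jessica---165.0 and tom---148.0. The Elasticsearch program should report the same sums for these six documents, since Sum.getValue() also returns a double.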

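Appendix: retrieving all buckets from an aggregation

1. Only 10 buckets are returned by default

(1) By default, ES returns only 10 buckets for a terms aggregation. Building on example 1 above (counting the students of each age), add another batch of test data: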
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/61' -d'{"name":"s1","age":31}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/62' -d'{"name":"s2","age":32}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/63' -d'{"name":"s3","age":33}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/64' -d'{"name":"s4","age":34}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/65' -d'{"name":"s5","age":35}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/66' -d'{"name":"s6","age":36}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/67' -d'{"name":"s7","age":37}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/68' -d'{"name":"s8","age":38}'
curl -H "Content-Type: application/json" -XPOST 'http://192.168.121.128:9200/stu/_doc/69' -d'{"name":"s9","age":39}'

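(2) Run the code from example 1 again. Only 10 buckets are returned, even though the sample data now covers 12 distinct ages.

(3) Increasing the page size does not help: the result still contains 10 buckets. The from and size settings on the search source page the search hits and have no effect on aggregation buckets.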
searchSourceBuilder.from(0).size(20);
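
The truncation can be illustrated without a cluster. The sketch below counts the 15 sample ages in memory and keeps only the first `size` buckets. BucketLimitSketch, sampleAges and topBuckets are hypothetical names, and the ordering (count descending, then key ascending) is an assumption modeled on the default terms ordering, not a call into Elasticsearch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: mimics a bucket limit on an in-memory terms aggregation.
final class BucketLimitSketch {

    // Ages of the 6 original students plus s1..s9 (31 to 39).
    static List<Integer> sampleAges() {
        return List.of(18, 29, 18, 19, 18, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39);
    }

    // Returns at most `size` buckets, ordered by count descending, then age ascending.
    static List<Map.Entry<Integer, Long>> topBuckets(List<Integer> ages, int size) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (int age : ages) {
            counts.merge(age, 1L, Long::sum);
        }
        List<Map.Entry<Integer, Long>> buckets = new ArrayList<>(counts.entrySet());
        buckets.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : Integer.compare(a.getKey(), b.getKey());
        });
        return buckets.subList(0, Math.min(size, buckets.size()));
    }

    public static void main(String[] args) {
        System.out.println(topBuckets(sampleAges(), 10).size()); // 10
        System.out.println(topBuckets(sampleAges(), 20).size()); // 12
    }
}
```

With a limit of 10, two of the 12 age groups are cut off. With a limit of 20, all 12 are returned. In Elasticsearch this limit belongs to the aggregation itself, not to the search source.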

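2. Retrieving a specified number of buckets

(1) Call the size method on the aggregation to set how many buckets to return. For example, the following code returns up to 20 buckets:
TermsAggregationBuilder aggregation = AggregationBuilders.terms("age_term")
        .field("age")
        .size(20);// return at most 20 buckets

(2) If the number of buckets is not known in advance and all of them are needed, size can be set to Integer.MAX_VALUE, which in practice returns every bucket.
Note: a very large number of buckets puts considerable load on ES, which is why the limit exists and the user has to state explicitly how many buckets to fetch.
TermsAggregationBuilder aggregation = AggregationBuilders.terms("age_term")
        .field("age")
        .size(Integer.MAX_VALUE);// effectively no limit on the number of buckets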