druid.git
3 months agoQueryScheduler: Log per-query message at DEBUG level. (#12467)
Gian Merlino [Fri, 22 Apr 2022 18:22:34 +0000 (11:22 -0700)] 
QueryScheduler: Log per-query message at DEBUG level. (#12467)

We generally want to avoid having any routine per-query messages at
INFO level, because they pollute logs.

3 months agostringFirst and stringLast supported in ingestion (#12466)
Victoria Lim [Fri, 22 Apr 2022 02:28:49 +0000 (19:28 -0700)] 
stringFirst and stringLast supported in ingestion (#12466)

3 months agoupdated docs for sql query context (#12406)
Victoria Lim [Thu, 21 Apr 2022 18:19:39 +0000 (11:19 -0700)] 
updated docs for sql query context (#12406)

3 months agoSupress CVE 2022 26612 (#12463)
Tejaswini Bandlamudi [Thu, 21 Apr 2022 15:48:20 +0000 (21:18 +0530)] 
Supress CVE 2022 26612 (#12463)

* suppress CVE-2022-26612

* adding packageUrl

* suppressing CVE-2022-26612

* adding packageUrl

* moving to hadoop section

3 months agoAdd support for authorizing query context params (#12396)
Jihoon Son [Thu, 21 Apr 2022 08:51:16 +0000 (01:51 -0700)] 
Add support for authorizing query context params (#12396)

The query context lets a user pass hints to the Druid query engine, either to enforce a certain behavior or to make the engine prefer a certain plan during query planning. Today, there are three types of query context params:

* Default context params. They are set via druid.query.default.context in runtime properties. Any user context param can be a default param.
* User context params. They are set in the user query request. See https://druid.apache.org/docs/latest/querying/query-context.html for parameters.
* System context params. They are set by the Druid query engine during query processing. These params override other context params.
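
The layering of the three param types can be sketched as a simple map merge. This is a minimal illustration, not Druid's implementation: the class and method names are hypothetical, and the assumption that user params override defaults is not stated in this message (only that system params override the others).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names are illustrative and not taken from Druid.
final class ContextMergeSketch {
    // Later layers win: defaults < user < system.
    // Assumption: user params override defaults; the commit message only
    // states that system params override the other two.
    static Map<String, Object> effectiveContext(
            Map<String, Object> defaults,
            Map<String, Object> user,
            Map<String, Object> system) {
        Map<String, Object> merged = new HashMap<>(defaults);
        merged.putAll(user);
        merged.putAll(system);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = Map.of("timeout", 30000, "useCache", true);
        Map<String, Object> user = Map.of("timeout", 5000);
        Map<String, Object> system = Map.of("queryId", "example-id");
        System.out.println(effectiveContext(defaults, user, system));
    }
}
```
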
Today, users may set any context param. This can cause
1) a bad UX if the context param is not yet mature, or
2) in the worst case, query failure or a system fault if a sensitive param such as maxSubqueryRows is abused.

This PR adds the ability to restrict context params per user role: a query fails if it sets a context param that the user is not allowed to use. To do that, this PR adds a new built-in resource type, QUERY_CONTEXT. The resource to authorize has the name of the context param (such as maxSubqueryRows) and the type QUERY_CONTEXT. To allow a certain context param for a user, grant the user WRITE permission on the context param resource. Here is an example of the permission.

{
  "resourceAction" : {
    "resource" : {
      "name" : "maxSubqueryRows",
      "type" : "QUERY_CONTEXT"
    },
    "action" : "WRITE"
  },
  "resourceNamePattern" : "maxSubqueryRows"
}
Each role can have multiple permissions for context params. Each permission should cover a different context param.

When a query is issued with a query context param X, the query fails if the user who issued it does not have WRITE permission on X. In this case:

* HTTP endpoints return a 403 response code.
* JDBC throws ForbiddenException.

Note: there is a context param called brokerService that is used only by the router. It pins your query to a specific broker. Because authorization is done in the broker rather than in the router, a query that sets brokerService without the proper permission fails in the broker after routing is done. Technically, this is not right because the authorization is checked after the context param takes effect. However, this should not cause any user-facing issue and thus should be OK. The query will still fail if the user doesn't have permission for brokerService.

The context param authorization can be enabled using druid.auth.authorizeQueryContextParams. It is disabled by default to avoid surprises for anyone who upgrades a cluster without reading the release notes.
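
The check described above amounts to comparing the context keys in a query against the params the user holds WRITE permission on. The following is a rough sketch under that reading only; the class, the method names, and the plain 200/403 mapping are hypothetical and are not Druid's authorizer API.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: not Druid's authorizer API.
final class ContextAuthSketch {
    // Returns the context params present in the query that the user
    // has no WRITE permission on (sorted for stable output).
    static Set<String> deniedParams(Map<String, Object> queryContext,
                                    Set<String> writableParams) {
        Set<String> denied = new TreeSet<>(queryContext.keySet());
        denied.removeAll(writableParams);
        return denied;
    }

    // Maps the result to the HTTP status an endpoint would return:
    // 403 if any param is denied, 200 otherwise.
    static int httpStatusFor(Map<String, Object> queryContext,
                             Set<String> writableParams) {
        return deniedParams(queryContext, writableParams).isEmpty() ? 200 : 403;
    }

    public static void main(String[] args) {
        Map<String, Object> context = Map.of("maxSubqueryRows", 500000, "timeout", 5000);
        Set<String> granted = Set.of("timeout");
        System.out.println("denied: " + deniedParams(context, granted));
        System.out.println("status: " + httpStatusFor(context, granted));
    }
}
```

With only timeout granted, the query above is rejected because maxSubqueryRows has no matching QUERY_CONTEXT permission.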

3 months agoEmit vectorized metric dimension by default (#12464)
Rohan Garg [Thu, 21 Apr 2022 04:14:55 +0000 (09:44 +0530)] 
Emit vectorized metric dimension by default (#12464)

3 months agoFix GCS based ingestion if bucket name contains underscores (#12445)
Tejaswini Bandlamudi [Thu, 21 Apr 2022 03:52:35 +0000 (09:22 +0530)] 
Fix GCS based ingestion if bucket name contains underscores (#12445)

GCP allows bucket names to contain underscores. When a location in such a bucket
is mapped to `java.net.URI`, `URI.getHost()` returns null. `URI.getHost()` is used as
the bucket name in `CloudObjectLocation`, leading to an NPE.

This commit uses `URI.getAuthority()` as the bucket name if `URI.getHost()` is null.
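
The `java.net.URI` behavior behind this fix can be reproduced in isolation. The sketch below uses only the JDK; the class and the `bucketOf` helper are hypothetical and stand in for the fallback described above, not for the actual `CloudObjectLocation` code.

```java
import java.net.URI;

// Hypothetical helper illustrating the getHost()/getAuthority() fallback.
final class BucketNameSketch {
    // An underscore is not legal in a URI hostname, so URI parsing falls back
    // to a registry-based authority and getHost() returns null.
    static String bucketOf(URI uri) {
        String host = uri.getHost();
        return host != null ? host : uri.getAuthority();
    }

    public static void main(String[] args) {
        URI plain = URI.create("gs://plain-bucket/path/file.json");
        URI underscored = URI.create("gs://my_bucket/path/file.json");
        System.out.println(underscored.getHost());   // prints "null"
        System.out.println(bucketOf(plain));         // prints "plain-bucket"
        System.out.println(bucketOf(underscored));   // prints "my_bucket"
    }
}
```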

3 months agoupdate httpclient due to cve (#12422)
PJ Fanning [Thu, 21 Apr 2022 02:12:19 +0000 (04:12 +0200)] 
update httpclient due to cve (#12422)

https://github.com/apache/druid/issues/12421

3 months agoissue-12426 upgrade k8s client due to cve (#12427)
PJ Fanning [Thu, 21 Apr 2022 02:11:55 +0000 (04:11 +0200)] 
issue-12426 upgrade k8s client due to cve (#12427)

* issue-12426 upgrade k8s client due to cve

* compile issues

* try to fix license check

3 months agoUpdating an error msg (#12450)
somu-imply [Wed, 20 Apr 2022 14:56:09 +0000 (07:56 -0700)] 
Updating an error msg (#12450)

* Updating an error msg

* Added an extra [] so removing it

3 months agoSuppress CVE-2021-43138 (#12437)
Jihoon Son [Tue, 19 Apr 2022 03:00:06 +0000 (20:00 -0700)] 
Suppress CVE-2021-43138 (#12437)

* Suppress CVE-2021-43138

* revert netty 3.10.5.Final

3 months agoDocument expression post-aggregators (#11896)
jacobtolar [Tue, 19 Apr 2022 02:36:19 +0000 (21:36 -0500)] 
Document expression post-aggregators (#11896)

* Document expression post-aggregators

* Update docs/querying/post-aggregations.md

Co-authored-by: Frank Chen <frankchen@apache.org>
3 months agoRemove h2 database from dependency (#12447)
Frank Chen [Tue, 19 Apr 2022 02:25:17 +0000 (10:25 +0800)] 
Remove h2 database from dependency (#12447)

3 months agoDocument running it tests from intellij IDE (#12440)
TSFenwick [Tue, 19 Apr 2022 02:24:46 +0000 (19:24 -0700)] 
Document running it tests from intellij IDE (#12440)

* document running IT tests in intellij

* clean up unnecessary changes

* address comments

3 months agorecommendation for comparing strings and numbers (#12442)
Victoria Lim [Mon, 18 Apr 2022 16:28:32 +0000 (09:28 -0700)] 
recommendation for comparing strings and numbers (#12442)

3 months agoDocs - query caching (#11584)
Peter Marshall [Mon, 18 Apr 2022 09:00:21 +0000 (10:00 +0100)] 
Docs - query caching (#11584)

* Update caching.md

Knowledge from https://the-asf.slack.com/archives/CJ8D1JTB8/p1597781107153900

Update caching.md

A few additional updates OTBO https://the-asf.slack.com/archives/CJ8D1JTB8/p1608669046041300

* Update caching.md

Typos

* Amendments on the segment cache

Significant updates on content around the segment cache, pull process, and in-memory cache

* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/operations/basic-cluster-tuning.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/operations/basic-cluster-tuning.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update basic-cluster-tuning.md

typo

* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Whole-query caching update

Made more succinct and removed specific config to change.

* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
3 months agoFixes a small typo in ingestion spec doc (#12143)
Charles Smith [Mon, 18 Apr 2022 08:53:50 +0000 (01:53 -0700)] 
Fixes a small typo in ingestion spec doc (#12143)

* small typo

* Update docs/ingestion/ingestion-spec.md

Co-authored-by: sthetland <steve.hetland@imply.io>
3 months agoFail fast incase a lookup load fails (#12397)
Rohan Garg [Mon, 18 Apr 2022 07:44:02 +0000 (13:14 +0530)] 
Fail fast incase a lookup load fails (#12397)

Currently, while loading a lookup for the first time, the loading thread blocks
for the full `waitForFirstRunMs` if the lookup fails to load. If `waitForFirstRunMs`
is long (like 10 minutes), such blocking can slow down the loading of other lookups.

This commit allows the thread to progress as soon as the loading of the lookup fails.
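
One way to get this behavior is to wait on a future that is completed exceptionally when the load fails, so the waiter returns at once instead of sitting out the timeout. This is a sketch of the idea only; the class and method are hypothetical, and whether the actual change uses `CompletableFuture` is an assumption.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: not the lookup loading code from this commit.
final class FailFastLookupSketch {
    // Returns true if the first load succeeded, false if it failed or timed out.
    // A failed load completes the future exceptionally, so get() returns
    // immediately rather than blocking for the whole waitForFirstRunMs.
    static boolean awaitFirstLoad(CompletableFuture<Void> firstLoad, long waitForFirstRunMs) {
        try {
            firstLoad.get(waitForFirstRunMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (ExecutionException | TimeoutException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IllegalStateException("lookup source unreachable"));

        long start = System.nanoTime();
        boolean loaded = awaitFirstLoad(failed, TimeUnit.MINUTES.toMillis(10));
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("loaded=" + loaded + ", waited about " + elapsedMs + " ms");
    }
}
```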

3 months agoDocs - added another common config property to tuningConfig (#11935)
Peter Marshall [Mon, 18 Apr 2022 05:41:39 +0000 (06:41 +0100)] 
Docs - added another common config property to tuningConfig (#11935)

* Update ingestion-spec.md

Added indexSpecForIntermediatePersists as a common configuration property.

* Update ingestion-spec.md

Amended to remove "below" and add link to the table.

* Update ingestion-spec.md

Removed passive.

3 months agoUpdate tutorial-compaction.md to change an unclear statement (#11988)
Alexandre BERTHIOT [Mon, 18 Apr 2022 05:25:09 +0000 (05:25 +0000)] 
Update tutorial-compaction.md to change an unclear statement (#11988)

* Update tutorial-compaction.md

Unclear statement on the explanation of tuningConfig section.

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
3 months agoFix bug in auto compaction preserveExistingMetrics feature (#12438)
Maytas Monsereenusorn [Fri, 15 Apr 2022 22:47:47 +0000 (15:47 -0700)] 
Fix bug in auto compaction preserveExistingMetrics feature (#12438)

* fix bug

* fix test

* fix IT

3 months agoMake tombstones ingestible by having them return an empty result set. (#12392)
Agustin Gonzalez [Fri, 15 Apr 2022 16:08:06 +0000 (09:08 -0700)] 
Make tombstones ingestible by having them return an empty result set. (#12392)

* Make tombstones ingestible by having them return an empty result set.

* Spotbug

* Coverage

* Coverage

* Remove unnecessary exception (checkstyle)

* Fix integration test and add one more to test dropExisting set to false over tombstones

* Force dropExisting to true in auto-compaction when the interval contains only tombstones

* Checkstyle, fix unit test

* Changed flag by mistake, fixing it

* Remove method from interface since this method is specific to only DruidSegmentInputentity

* Fix typo

* Adapt to latest code

* Update comments when only tombstones to compact

* Move empty iterator to a new DruidTombstoneSegmentReader

* Code review feedback

* Checkstyle

* Review feedback

* Coverage

3 months agoUse binary search to improve DimensionRangeShardSpec lookup (#12417)
hqx871 [Fri, 15 Apr 2022 16:07:06 +0000 (00:07 +0800)] 
Use binary search to improve DimensionRangeShardSpec lookup (#12417)

If there are many shards, the mapper of IndexGeneratorJob seems to spend a lot of time calling
DimensionRangeShardSpec.isInChunk to look up the target shard. This can be significantly improved
by using binary search instead of comparing an input row to every shardSpec.

Changes:
* Add `BaseDimensionRangeShardSpec` which provides a binary-search-based
   implementation for `createLookup`
* `DimensionRangeShardSpec`, `SingleDimensionShardSpec`, and
   `DimensionRangeBucketShardSpec` now extend `BaseDimensionRangeShardSpec`
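
The difference between the two lookup strategies can be sketched on a single string dimension. This is a simplified illustration, not `BaseDimensionRangeShardSpec`: the class is hypothetical, real shard specs can range over multiple dimensions, and the boundary layout below is an assumption made for the example.

```java
import java.util.Arrays;

// Hypothetical single-dimension sketch of range shard lookup.
final class RangeShardLookupSketch {
    // boundaries[i] is the inclusive lower bound of shard i + 1, sorted ascending.
    // Shard 0 has no lower bound; the last shard has no upper bound.
    private final String[] boundaries;

    RangeShardLookupSketch(String[] sortedBoundaries) {
        this.boundaries = sortedBoundaries.clone();
    }

    // Baseline: compare the key against every boundary in turn, O(n) per row.
    int findShardLinear(String key) {
        int shard = 0;
        for (String boundary : boundaries) {
            if (key.compareTo(boundary) >= 0) {
                shard++;
            } else {
                break;
            }
        }
        return shard;
    }

    // Binary search, O(log n) per row. Arrays.binarySearch returns the index
    // on an exact hit, or -(insertionPoint) - 1 otherwise.
    int findShard(String key) {
        int pos = Arrays.binarySearch(boundaries, key);
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        RangeShardLookupSketch lookup = new RangeShardLookupSketch(new String[] {"g", "n", "t"});
        for (String key : new String[] {"a", "g", "h", "t", "z"}) {
            System.out.println(key + " -> shard " + lookup.findShard(key));
        }
    }
}
```

With boundaries g, n, t there are four shards, and both methods agree on every key; only the number of comparisons per input row differs.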

3 months agoHandling planning with alias for time for group by and order by (#12418)
somu-imply [Fri, 15 Apr 2022 04:59:17 +0000 (21:59 -0700)] 
Handling planning with alias for time for group by and order by (#12418)

An outer scan query that requires ordering on a column should be considered an invalid query.

3 months agogood stuff (#12435)
Vadim Ogievetsky [Thu, 14 Apr 2022 07:23:06 +0000 (00:23 -0700)] 
good stuff (#12435)

3 months agofix issue with boolean expression input (#12429)
Clint Wylie [Wed, 13 Apr 2022 23:34:01 +0000 (16:34 -0700)] 
fix issue with boolean expression input (#12429)

3 months agoAdd docs to metric spec for auto compaction (#12415)
Maytas Monsereenusorn [Wed, 13 Apr 2022 20:27:00 +0000 (13:27 -0700)] 
Add docs to metric spec for auto compaction (#12415)

* add docs

* Update docs/configuration/index.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
* Update index.md

* Update docs/configuration/index.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
3 months agoFix indexMerger to respect the includeAllDimensions flag (#12428)
Jihoon Son [Wed, 13 Apr 2022 19:43:11 +0000 (12:43 -0700)] 
Fix indexMerger to respect the includeAllDimensions flag (#12428)

* Fix indexMerger to respect the includeAllDimensions flag; jsonInputFormat should set keepNullColumns if useFieldDiscovery is set

* address comments

3 months agoAdd Kinesis ListShards permission (#12387)
Katya Macedo [Wed, 13 Apr 2022 09:59:56 +0000 (04:59 -0500)] 
Add Kinesis ListShards permission (#12387)

* add Kinesis permission

* List Kinesis IAM permissions

* Adopt review suggestions

* Fix merge conflicts

3 months agoWeb console: Misc fixes and improvements (#12361)
Vadim Ogievetsky [Wed, 13 Apr 2022 05:20:28 +0000 (22:20 -0700)] 
Web console: Misc fixes and improvements  (#12361)

* Misc fixes

* pad column numbers

* make shard_type filterable

3 months agoCopy of #11309 with fixes (#12402)
Parag Jain [Mon, 11 Apr 2022 15:35:24 +0000 (21:05 +0530)] 
Copy of #11309 with fixes (#12402)

* Optionally load segment index files into page cache on bootstrap and new segment download

* Fix unit test failure

* Fix test case

* fix spelling

* fix spelling

* fix test and test coverage issues

Co-authored-by: Jian Wang <wjhypo@gmail.com>
3 months agoFix zulu8 set-up Dockerfile for hadoop and hadoop3 in hadoop ingestion tutorial ...
Tiffany Yeh [Mon, 11 Apr 2022 14:58:09 +0000 (10:58 -0400)] 
Fix zulu8 set-up Dockerfile for hadoop and hadoop3 in hadoop ingestion tutorial (#12248)

Fix errors related to zulu8 installation for building the Hadoop Docker image in the Load From Apache Hadoop tutorial.

The steps to download zulu8 in the Dockerfile and setup-zulu-repo.sh were replaced with the steps in the Dockerfile released by zulu-openjdk: https://github.com/zulu-openjdk/zulu-openjdk/blob/be45d20302e42df5aa95d2de078bb5e4214f5dba/centos/8u282-8.52.0.23/Dockerfile.

4 months agoBump PostgreSQL JDBC driver to 42.3.3 (CVE-2022-21724) (#12410)
Jihoon Son [Sat, 9 Apr 2022 10:08:26 +0000 (03:08 -0700)] 
Bump PostgreSQL JDBC driver to 42.3.3 (CVE-2022-21724) (#12410)

* Bump PostgreSQL JDBC driver to 42.3.3 (CVE-2022-21724)

* update license file

4 months agoMake error messages for insert statements consistent with select statements (#12414)
Adarsh Sanjeev [Sat, 9 Apr 2022 06:51:40 +0000 (12:21 +0530)] 
Make error messages for insert statements consistent with select statements (#12414)

For a query like
INSERT INTO tablename SELECT channel, added as count FROM wikipedia
the error message is: Encountered "as count". However, the insert statement
INSERT INTO t SELECT channel, added as count FROM wikipedia PARTITIONED BY ALL
incorrectly returns: INSERT statements must specify PARTITIONED BY clause explicitly. This PR corrects this.

Changes:
* Add EOF to end of Druid SQL Insert statements
* Rename SQL Insert statements in the parser to reflect the behaviour change

4 months agoImprove metrics for Auto Compaction (#12413)
Maytas Monsereenusorn [Sat, 9 Apr 2022 03:14:36 +0000 (20:14 -0700)] 
Improve metrics for Auto Compaction (#12413)

* add impl

* add docs

* fix

4 months agoAdd a new flag for ingestion to preserve existing metrics (#12185)
Maytas Monsereenusorn [Fri, 8 Apr 2022 18:02:02 +0000 (11:02 -0700)] 
Add a new flag for ingestion to preserve existing metrics (#12185)

* add impl

* add impl

* fix checkstyle

* add impl

* add unit test

* fix stuff

* fix stuff

* fix stuff

* add unit test

* add more unit tests

* add more unit tests

* add IT

* add IT

* add IT

* add IT

* add ITs

* address comments

* fix test

* fix test

* fix test

* address comments

* address comments

* address comments

* fix conflict

* fix checkstyle

* address comments

* fix test

* fix checkstyle

* fix test

* fix test

* fix IT

4 months agoUpdate index.md (#12390)
mark-imply [Fri, 8 Apr 2022 12:31:54 +0000 (06:31 -0600)] 
Update index.md (#12390)

Added guidance on when to increase druid.indexer.storage.recentlyFinishedThreshold.

4 months agoFix the other 2 python scripts that generates license. (#12340)
Didip Kerabat [Fri, 8 Apr 2022 11:13:17 +0000 (04:13 -0700)] 
Fix the other 2 python scripts that generates license. (#12340)

Fixes YAML.load_all issues on two of the Python scripts that generate license.

The broken Python files interfere with some of the Maven tasks.

4 months agoUpdate basic-cluster-tuning.md (#12412)
mark-imply [Fri, 8 Apr 2022 09:59:55 +0000 (03:59 -0600)] 
Update basic-cluster-tuning.md (#12412)

Changed "Other useful JVM flags" to "Other generally useful JVM flags" in order to align with the introduction to the doc.

4 months agofix(docs): clarify what s3 permissions are needed based on the access management...
317brian [Thu, 7 Apr 2022 23:22:56 +0000 (16:22 -0700)] 
fix(docs): clarify what s3 permissions are needed based on the access management type (#12405)

* fix(docs): clarify what s3 permissions are needed based on the permissions model

* fix typo

* Update docs/development/extensions-core/s3.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>
4 months agoBump minimist from 1.2.5 to 1.2.6 in /website (#12400)
dependabot[bot] [Thu, 7 Apr 2022 10:08:39 +0000 (03:08 -0700)] 
Bump minimist from 1.2.5 to 1.2.6 in /website (#12400)

Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6.
- [Release notes](https://github.com/substack/minimist/releases)
- [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6)

---
updated-dependencies:
- dependency-name: minimist
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4 months agoBump minimist from 1.2.5 to 1.2.6 in /web-console (#12401)
dependabot[bot] [Wed, 6 Apr 2022 23:55:14 +0000 (16:55 -0700)] 
Bump minimist from 1.2.5 to 1.2.6 in /web-console (#12401)

Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6.
- [Release notes](https://github.com/substack/minimist/releases)
- [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6)

---
updated-dependencies:
- dependency-name: minimist
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4 months agoclean up some bp3 classes (#12403)
Vadim Ogievetsky [Wed, 6 Apr 2022 22:27:44 +0000 (15:27 -0700)] 
clean up some bp3 classes (#12403)

4 months agoDocument data format and example for featureSpec (#12394)
Victoria Lim [Wed, 6 Apr 2022 22:17:15 +0000 (15:17 -0700)] 
Document data format and example for featureSpec (#12394)

* add data format and example for featureSpec

* add second feature in example

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
4 months agodocs(fix): add clarity around granularitySpec (#12362)
317brian [Wed, 6 Apr 2022 16:24:37 +0000 (09:24 -0700)] 
docs(fix): add clarity around granularitySpec (#12362)

* fix: add clarify around granularitySpec

* fix spacing

* Update docs/ingestion/compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
4 months agoDocument config for ingesting null columns (#12389)
Victoria Lim [Tue, 5 Apr 2022 16:15:42 +0000 (09:15 -0700)] 
Document config for ingesting null columns (#12389)

* config for ingesting null columns

* add link

* edit .spelling

* what happens if storeEmptyColumns is disabled

4 months agoupgrade surefire 3.0.0-M6 (#12395)
aggarwalakshay [Tue, 5 Apr 2022 06:56:15 +0000 (23:56 -0700)] 
upgrade surefire 3.0.0-M6 (#12395)

* upgrade surefire 3.0.0-M6

* increasing memory

4 months agoMethod to specify eternity in the scan query builder (#12223)
Paul Rogers [Mon, 4 Apr 2022 22:11:32 +0000 (15:11 -0700)] 
Method to specify eternity in the scan query builder (#12223)

* Method to specify eternity in the scan query builder

* Fix checkstyle issue

* Renamed eternity() to eternityInterval()

* Minor fixes

4 months agoBlueprint 4 (#12391)
John Gozde [Mon, 4 Apr 2022 17:34:22 +0000 (11:34 -0600)] 
Blueprint 4 (#12391)

* Update blueprint dependencies & LICENSES

* Switch to bp4 namespace; use bp-ns variable in overrides

* Add webpack alias for colors.scss

* Snapshots

* Update selectors in e2e tests

4 months agoPackage kinesis client jar within the extension (#12370)
AmatyaAvadhanula [Mon, 4 Apr 2022 16:01:18 +0000 (21:31 +0530)] 
Package kinesis client jar within the extension (#12370)

amazon-kinesis-client was not covered under the Apache license and had to be added to the Kinesis extension separately.
This can now be avoided since it is covered, and including it within Druid helps prevent incompatibilities.

Allows enabling deaggregation out of the box by packaging amazon-kinesis-client (1.14.4) with Druid for Kinesis ingestion.

4 months agoIncrease default DatasourceCompactionConfig.inputSegmentSizeBytes to Long.MAX_VALUE...
Tejaswini Bandlamudi [Mon, 4 Apr 2022 10:58:53 +0000 (16:28 +0530)] 
Increase default DatasourceCompactionConfig.inputSegmentSizeBytes to Long.MAX_VALUE (#12381)

The current default value of inputSegmentSizeBytes is 400MB, which is pretty
low for most compaction use cases. Thus most users are forced to override the
default.

The default value is now increased to Long.MAX_VALUE.

4 months agoAdd feature flag for Kinesis listShards API usage (#12383)
AmatyaAvadhanula [Mon, 4 Apr 2022 09:28:10 +0000 (14:58 +0530)] 
Add feature flag for Kinesis listShards API usage (#12383)

As part of #12161, the listShards API was used to get all the shards for Kinesis ingestion to improve its resiliency.

However, this may require additional permissions in the IAM policy where the stream is present. (Please refer to: https://docs.aws.amazon.com/kinesis/latest/APIReference/API_ListShards.html).

A dynamic configuration, useListShards, has been added to KinesisSupervisorTuningConfig to control the usage of this API and prevent issues upon upgrade. It can be safely turned on (and is recommended when using Kinesis ingestion) by setting this configuration to true.

4 months agoIntroducing a new config to ignore nulls while computing String Cardinality (#12345)
somu-imply [Tue, 29 Mar 2022 21:31:36 +0000 (14:31 -0700)] 
Introducing a new config to ignore nulls while computing String Cardinality (#12345)

* Counting nulls in String cardinality with a config

* Adding tests for the new config

* Wrapping the vectorize part to allow backward compatibility

* Adding different tests, cleaning the code and putting the check at the proper position, handling hasRow() and hasValue() changes

* Updating testcase and code

* Adding null handling test to improve coverage

* Checkstyle fix

* Adding 1 more change in docs

* Making docs clearer

4 months agoDocs - S3 masking and nav update to S3 page (#11490)
Peter Marshall [Tue, 29 Mar 2022 16:13:05 +0000 (17:13 +0100)] 
Docs - S3 masking and nav update to S3 page (#11490)

* Docs: Masking S3 creds and some rewording

Knowledge transfer from https://groups.google.com/g/druid-user/c/FydcpFrA688

* Removed bold in one of the quote sections

* Update s3.md

* Update s3.md

Quick grammar change

* Update docs/development/extensions-core/s3.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/s3.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/s3.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/s3.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/s3.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update s3.md

Typo

* Update docs/development/extensions-core/s3.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update s3.md

Active lang

* Update s3.md

LAng nit

* Update native-batch.md

LAng nit

* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Grammar tidy-up and link fix

Corrected 2 x links to old page H2s, resolved the question around precedence, and some other grammatical changes.

* Update docs/development/extensions-core/s3.md

* Update s3.md

Removed an Erroneous E

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
4 months agoDocs – expressions link back and timestamp hint (#11674)
Peter Marshall [Tue, 29 Mar 2022 16:12:30 +0000 (17:12 +0100)] 
Docs – expressions link back and timestamp hint (#11674)

* Update math-expr.md

Link back to transformSpec

* Update ingestion-spec.md

Moved info about using the timestamp inside transforms into the actual timestamp section.

* Update ingestion-spec.md

Active language.

4 months agoUpdate ingestion-spec.md (#12371)
mark-imply [Tue, 29 Mar 2022 16:12:02 +0000 (10:12 -0600)] 
Update ingestion-spec.md (#12371)

* Update ingestion-spec.md

Added best practice point to dimensions description.

* Update docs/ingestion/ingestion-spec.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
4 months agoAdd an integration test for null-only columns (#12365)
Jihoon Son [Mon, 28 Mar 2022 23:40:45 +0000 (16:40 -0700)] 
Add an integration test for null-only columns (#12365)

* integration test for null-only-columns

* metadata query

* fix test

4 months agoDocs for request logging (#12363)
Victoria Lim [Mon, 28 Mar 2022 21:09:41 +0000 (14:09 -0700)] 
Docs for request logging (#12363)

* add docs for request logging

* remove stray character

* Update docs/operations/request-logging.md

Co-authored-by: TSFenwick <tsfenwick@gmail.com>
* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: TSFenwick <tsfenwick@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
4 months agofix messageGap metric (#12337)
Yuanli Han [Mon, 28 Mar 2022 16:21:06 +0000 (00:21 +0800)] 
fix messageGap metric (#12337)

4 months agoUse javaOptsArray provided in task context (#12326)
AmatyaAvadhanula [Mon, 28 Mar 2022 11:03:40 +0000 (16:33 +0530)] 
Use javaOptsArray provided in task context (#12326)

The `javaOpts` property is being read from task context but not `javaOptsArray`.
Changes:
- Read `javaOptsArray` from task context in `ForkingTaskRunner`.
- Add test to verify that `javaOptsArray` in task context takes precedence over `javaOpts`

4 months agoBump java-dogstatsd-client from 2.13.0 to 4.0.0 (#12353)
dependabot[bot] [Sat, 26 Mar 2022 23:25:13 +0000 (16:25 -0700)] 
Bump java-dogstatsd-client from 2.13.0 to 4.0.0 (#12353)

* Bump java-dogstatsd-client from 2.13.0 to 4.0.0
Bumps [java-dogstatsd-client](https://github.com/DataDog/java-dogstatsd-client) from 2.13.0 to 4.0.0.
- [Release notes](https://github.com/DataDog/java-dogstatsd-client/releases)
- [Changelog](https://github.com/DataDog/java-dogstatsd-client/blob/master/CHANGELOG.md)
- [Commits](https://github.com/DataDog/java-dogstatsd-client/compare/v2.13.0...v4.0.0)

* migrate statsd-emitter tests from easymock to mockito
* add simple init test to make diff coverage happy

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xavier Léauté <xvrl@apache.org>
4 months agoDuties in Indexing group (such as Auto Compaction) does not report metrics (#12352)
Maytas Monsereenusorn [Thu, 24 Mar 2022 01:18:28 +0000 (18:18 -0700)] 
Duties in Indexing group (such as Auto Compaction) does not report metrics (#12352)

* add impl

* add unit tests

* fix checkstyle

* address comments

* fix checkstyle

4 months agoStore null columns in the segments (#12279)
Jihoon Son [Wed, 23 Mar 2022 23:54:04 +0000 (16:54 -0700)] 
Store null columns in the segments (#12279)

* Store null columns in the segments

* fix test

* remove NullNumericColumn and unused dependency

* fix compile failure

* use guava instead of apache commons

* split new tests

* unused imports

* address comments

4 months agoAdded support in urls, and grouped metrics (#12296)
syacobovitz [Tue, 22 Mar 2022 18:22:05 +0000 (20:22 +0200)] 
Added support in urls, and grouped metrics (#12296)

4 months agoFix OOM failures in dimension distribution phase of parallel indexing (#12331)
Kashif Faraz [Tue, 22 Mar 2022 13:58:15 +0000 (19:28 +0530)] 
Fix OOM failures in dimension distribution phase of parallel indexing (#12331)

Parallel indexing with range partitioning can often cause OOM in the
`ParallelIndexSupervisorTask` during the dimension distribution phase.
This typically happens because of too many `StringSketch` objects
obtained from the different `partial_dimension_distribution` sub-tasks.

We need not keep any of the sketches in memory until we need to compute
the PartitionBoundaries for the respective interval.

Changes
- Extract `StringDistribution` from `DimensionDistributionReport`s when they are received
  and write to disk inside the task/temp/distributions
- After all the subtasks have finished, iterate over all the intervals one by one
- For each interval, read the distributions from disk, merge them and create `PartitionBoundaries`.
- Cleanup task/temp/distributions directory when all `PartitionBoundaries` have been determined
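
A minimal sketch of the spill-then-merge flow listed above. It assumes a hypothetical `SpillDistributionsSketch` class in which a plain list of strings stands in for `StringDistribution` and a sorted union stands in for the sketch merge. None of these names or signatures come from the patch; only the order of operations (write on receipt, read and merge one interval at a time, clean up at the end) follows the commit message.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Stream;

// Hypothetical illustration only; not the ParallelIndexSupervisorTask code.
final class SpillDistributionsSketch {
  private final Path dir; // stands in for task/temp/distributions

  SpillDistributionsSketch(Path dir) {
    this.dir = dir;
  }

  /** Called as each sub-task report arrives: persist the values, keep nothing on heap. */
  void onReport(String interval, String subTaskId, List<String> values) {
    try {
      Path intervalDir = Files.createDirectories(dir.resolve(interval));
      // One value per line; assumes values contain no line breaks.
      Files.write(intervalDir.resolve(subTaskId), values);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Called once per interval after all sub-tasks finish: read, merge, return. */
  List<String> mergeInterval(String interval) {
    TreeSet<String> merged = new TreeSet<>(); // sorted union as a stand-in for a sketch merge
    try (Stream<Path> files = Files.list(dir.resolve(interval))) {
      for (Path file : (Iterable<Path>) files::iterator) {
        merged.addAll(Files.readAllLines(file));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new ArrayList<>(merged);
  }

  /** Deletes the spill directory, children before parents. */
  void cleanup() {
    try (Stream<Path> paths = Files.walk(dir)) {
      paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With this shape, only one interval's distributions are on heap at a time, which is the property the change is after.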

4 months agoConvert inQueryThreshold into query context parameter. (#12357)
Adarsh Sanjeev [Tue, 22 Mar 2022 13:03:57 +0000 (18:33 +0530)] 
Convert inQueryThreshold into query context parameter. (#12357)

Added Calcite's InQueryThreshold as a query context parameter. Setting this parameter appropriately reduces the time taken for queries with a large number of values in their IN conditions.

4 months agofix use of deprecated initMocks method (#12351)
Xavier Léauté [Sat, 19 Mar 2022 17:19:02 +0000 (10:19 -0700)] 
fix use of deprecated initMocks method (#12351)

follow-up to #12341
- fix use of deprecated initMocks methods and properly close mocks on teardown

4 months agoupgrade maven-pmd-plugin to fix warning (#12349)
Xavier Léauté [Sat, 19 Mar 2022 17:18:26 +0000 (10:18 -0700)] 
upgrade maven-pmd-plugin to fix warning (#12349)

We sometimes see warnings similar to the one described in
https://issues.apache.org/jira/browse/MPMD-325

Upgrading the plugin should hopefully reduce the occurrence of those.

4 months agoBump slf4j.version from 1.7.12 to 1.7.36 (#11594)
dependabot[bot] [Fri, 18 Mar 2022 20:45:44 +0000 (13:45 -0700)] 
Bump slf4j.version from 1.7.12 to 1.7.36 (#11594)

Bump slf4j.version from 1.7.12 to 1.7.36

- [Release notes](https://www.slf4j.org/news.html)

Updates `jcl-over-slf4j` from 1.7.12 to 1.7.36
- [Commits](https://github.com/qos-ch/slf4j/compare/v_1.7.12...v_1.7.36)

Updates `slf4j-simple` from 1.7.12 to 1.7.36
- [Commits](https://github.com/qos-ch/slf4j/compare/v_1.7.12...v_1.7.36)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Suneet Saldanha <suneet@apache.org>
Co-authored-by: Xavier Léauté <xvrl@apache.org>
4 months agoFix auto compaction by adjusting compaction task's interval to align with segmentGran...
Maytas Monsereenusorn [Fri, 18 Mar 2022 19:46:16 +0000 (12:46 -0700)] 
Fix auto compaction by adjusting compaction task's interval to align with segmentGranularity when segmentGranularity is set (#12334)

* add impl

* add ITs

* address comments

* address comments

* address comments

* fix failure

* fix checkstyle

* fix checkstyle

4 months agoupdate surefire plugin to 3.0.0-M4 (#12342)
Xavier Léauté [Fri, 18 Mar 2022 15:20:28 +0000 (08:20 -0700)] 
update surefire plugin to 3.0.0-M4 (#12342)

stay on surefire 3.0.0-M4 until we can upgrade to 3.0.0-M6
with a fix for https://issues.apache.org/jira/browse/SUREFIRE-1815
causing issues in RetryUtilsTest.

4 months agoimprove test compatibility with Java 17 and remove deprecated methods (#12341)
Xavier Léauté [Fri, 18 Mar 2022 15:19:28 +0000 (08:19 -0700)] 
improve test compatibility with Java 17 and remove deprecated methods (#12341)

* remove use of reflection in EnvironmentVariableDynamicConfigProvider for Java 17 compatibility
* fix mock objects not getting closed properly, causing issues with Java 17
* remove use of deprecated methods and rules in tests

4 months agoFix missing conversionFactor in prometheus emitter (#12338)
Aurélien Dunand [Fri, 18 Mar 2022 04:46:06 +0000 (05:46 +0100)] 
Fix missing conversionFactor in prometheus emitter (#12338)

query/node/ttfb metrics are in milliseconds.

4 months agofix build due to com.nimbusds:lang-tag update (#12348)
Xavier Léauté [Fri, 18 Mar 2022 00:44:08 +0000 (17:44 -0700)] 
fix build due to com.nimbusds:lang-tag update (#12348)

the version of com.nimbusds:oauth2-oidc-sdk we depend on does not
specify an exact version dependency for com.nimbusds:lang-tag, and
instead uses a version range (see
    https://search.maven.org/artifact/com.nimbusds/oauth2-oidc-sdk/6.5/jar)

Recently a new version of lang-tag was released requiring us to update
the license file accordingly.

4 months agoBump maven-site-plugin from 3.1 to 3.11.0 (#12310)
dependabot[bot] [Thu, 17 Mar 2022 07:17:29 +0000 (15:17 +0800)] 
Bump maven-site-plugin from 3.1 to 3.11.0 (#12310)

Bumps [maven-site-plugin](https://github.com/apache/maven-site-plugin) from 3.1 to 3.11.0.
- [Release notes](https://github.com/apache/maven-site-plugin/releases)
- [Commits](https://github.com/apache/maven-site-plugin/compare/maven-site-plugin-3.1...maven-site-plugin-3.11.0)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-site-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4 months agoFix a race condition in the '/tasks' Overlord API (#12330)
Jihoon Son [Thu, 17 Mar 2022 01:47:45 +0000 (10:47 +0900)] 
Fix a race condition in the '/tasks' Overlord API (#12330)

* finds complete and active tasks from the same snapshot

* overlord resource

* unit test

* integration test

* javadoc and cleanup

* more cleanup

* fix test and add more

4 months agoAdd JDK 11 (#12333)
Frank Chen [Wed, 16 Mar 2022 22:03:04 +0000 (06:03 +0800)] 
Add JDK 11 (#12333)

4 months agoAdding k8s support for human readable parsing (#12316)
Dr. Sizzles [Wed, 16 Mar 2022 03:18:47 +0000 (20:18 -0700)] 
Adding k8s support for human readable parsing (#12316)

* Adding k8s support for human readable parsing

* Update docs/configuration/human-readable-byte.md

Co-authored-by: Frank Chen <frankchen@apache.org>
* Update docs/configuration/human-readable-byte.md

Co-authored-by: Frank Chen <frankchen@apache.org>
* Update core/src/main/java/org/apache/druid/java/util/common/HumanReadableBytes.java

Co-authored-by: Frank Chen <frankchen@apache.org>
* Changes per review

Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
Co-authored-by: Frank Chen <frankchen@apache.org>
4 months agoupgrade Error Prone to 2.11 (requires Java 11) (#12306)
Xavier Léauté [Tue, 15 Mar 2022 02:40:48 +0000 (19:40 -0700)] 
upgrade Error Prone to 2.11 (requires Java 11) (#12306)

The latest version of Error Prone now requires Java 11. Upgrading means we can
remove a lot of the maven profile complexity required to run checks with Java 8.
This also requires switching our strict build to use Java 11.

* update error-prone to 2.11
* remove need for specific maven profiles for Java 8 and Java 15
* fix additional Error Prone warnings with Java 11
* update strict build to use Java 11

4 months agoGraceful null handling and correctness in DoubleMean Aggregator (#12320)
somu-imply [Mon, 14 Mar 2022 23:52:47 +0000 (16:52 -0700)] 
Graceful null handling and correctness in DoubleMean Aggregator (#12320)

* Adding null handling for double mean aggregator

* Updating code to handle nulls in DoubleMean aggregator

* oops last one should have checkstyle issues. fixed

* Updating some code and test cases

* Checking on object is null in case of numeric aggregator

* Adding one more test to improve coverage

* Changing one test as asked in the review

* Changing one test as asked in the review for nulls

4 months agobug fix: merge results of group by limit push down (#11969)
mchades [Fri, 11 Mar 2022 17:04:34 +0000 (01:04 +0800)] 
bug fix: merge results of group by limit push down (#11969)

4 months agokubernetes: restart watch on null response (#12233)
Kyle Larose [Thu, 10 Mar 2022 20:56:40 +0000 (15:56 -0500)] 
kubernetes: restart watch on null response (#12233)

* kubernetes: restart watch on null response

Kubernetes watches allow a client to efficiently process changes to
resources. However, they have some idiosyncrasies. In particular, they
can error out for various reasons leading to what would normally be seen
as an invalid result.

The Druid kubernetes node discovery subsystem does not handle a certain
case properly. The watch can return an item with a null object. This
leads to a null pointer exception. When this happens, the provider needs
to restart the watch, because rerunning the watch from the same resource
version leads to the same result: yet another null pointer exception.

This commit changes the provider to handle null objects by restarting
the watch.

* review: add more coverage

This adds a bit more coverage to the K8sDruidNodeDiscoveryProvider watch
loop, and removes an unnecessary return.

* kubernetes: reduce logging verbosity

The log messages about items being NULL don't really deserve to be at a
level other than DEBUG since they are not actionable, particularly since
we automatically recover now. Move them to the DEBUG level.
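
A minimal sketch of the restart-on-null behaviour described above, without any Kubernetes client dependency. The `WatchRestartSketch` class, the `openWatch` supplier (standing in for "list the current state and open a new watch from a fresh resource version") and the `maxRestarts` bound are all hypothetical; the actual `K8sDruidNodeDiscoveryProvider` loop is not shown in this log.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration only; not the K8sDruidNodeDiscoveryProvider code.
final class WatchRestartSketch {
  /**
   * Consumes watch items. A null item abandons the current watch and opens a
   * fresh one instead of resuming from the same position, which would only
   * reproduce the same null.
   */
  static List<String> consume(Supplier<Iterator<String>> openWatch, int maxRestarts) {
    List<String> seen = new ArrayList<>();
    for (int restarts = 0; restarts <= maxRestarts; restarts++) {
      Iterator<String> watch = openWatch.get();
      boolean sawNull = false;
      while (watch.hasNext()) {
        String item = watch.next();
        if (item == null) {
          sawNull = true; // not actionable by itself; just restart
          break;
        }
        if (!seen.contains(item)) {
          seen.add(item); // a fresh watch may replay items already seen
        }
      }
      if (!sawNull) {
        return seen; // watch ended normally
      }
    }
    throw new IllegalStateException("watch kept returning null items");
  }
}
```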

4 months agoFix error message for groupByEnableMultiValueUnnesting. (#12325)
Gian Merlino [Thu, 10 Mar 2022 19:37:24 +0000 (11:37 -0800)] 
Fix error message for groupByEnableMultiValueUnnesting. (#12325)

* Fix error message for groupByEnableMultiValueUnnesting.

It referred to the incorrect context parameter.

Also, create a dedicated exception class, to allow easier detection of this
specific error.

* Fix other test.

* More better error messages.

* Test getDimensionName method.

5 months agofix supervisor auto scaler config serde bug (#12317)
Parag Jain [Thu, 10 Mar 2022 00:17:12 +0000 (05:47 +0530)] 
fix supervisor auto scaler config serde bug (#12317)

5 months agoGit hooks should fail on errors; pass args to git hooks (#12322)
Jihoon Son [Thu, 10 Mar 2022 00:07:50 +0000 (09:07 +0900)] 
Git hooks should fail on errors; pass args to git hooks (#12322)

* Git hooks should fail on errors

* don't set shell to pass args

5 months agoReuse the InputEntityReader in SettableByteEntityReader (#12269)
Abhishek Agarwal [Wed, 9 Mar 2022 22:38:31 +0000 (04:08 +0530)] 
Reuse the InputEntityReader in SettableByteEntityReader (#12269)

* Reuse the InputEntityReader in SettableByteEntityReader

* Fix logic

* Fix kafka streaming ingestion

* Add Tests for kafka input format change

* Address review comments

5 months agopush value range and set index get operations into BitmapIndex (#12315)
Clint Wylie [Wed, 9 Mar 2022 21:30:58 +0000 (13:30 -0800)] 
push value range and set index get operations into BitmapIndex (#12315)

* push value range and set index get operations into BitmapIndex

* fix bug

* oops, fix better

* better like, fix test, javadocs

* fix checkstyle

* simplify and fixes

* cache

* fix tests

* move indexOf into GenericIndexed

* oops

* fix tests

5 months agoFix join query incase of filter explosion during CNF conversion (#12324)
Rohan Garg [Wed, 9 Mar 2022 20:43:09 +0000 (02:13 +0530)] 
Fix join query incase of filter explosion during CNF conversion (#12324)

5 months agoimprove FileWriteOutBytes.readFully (#12323)
Clint Wylie [Wed, 9 Mar 2022 19:45:45 +0000 (11:45 -0800)] 
improve FileWriteOutBytes.readFully (#12323)

* improve FileWriteOutBytes.readFully

* no need to flush if out of bounds

5 months agoFacilitate lazy initialization of connections to mitigate overwhelming of Coordinator...
AmatyaAvadhanula [Wed, 9 Mar 2022 17:47:43 +0000 (23:17 +0530)] 
Facilitate lazy initialization of connections to mitigate overwhelming of Coordinator (#12298)

Add config for eager / lazy connection initialization in ResourcePool

Description
Currently, when multiple tasks are launched, each of them eagerly initializes a full pool's worth of connections to the coordinator.

While this is acceptable when the number of eagerConnections (== maxSize) is small, it can be problematic in environments where that value is large (say 1000) and multiple tasks are launched simultaneously: the many connections created to the Coordinator can overwhelm it.

Patch
Nodes like the broker may require eager initialization of resources and do not create connections with the Coordinator.
It is unnecessary to do this with other types of nodes.

A config parameter eagerInitialization is added, which when set to true, initializes the max permissible connections when ResourcePool is initialized.

If set to false, lazy initialization of connection resources takes place.

NOTE: All nodes except the broker have this new parameter set to false in the quickstart as part of this PR

Algorithm
The current implementation relies on the creation of maxSize resources eagerly.

The new implementation's behaviour is as follows:

If a resource has been previously created and is available, lend it.
Else if the number of created resources is less than the allowed parameter, create and lend it.
Else, wait for one of the lent resources to be returned.
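
The three rules above can be sketched as a small pool. This is an illustration under stated assumptions, not Druid's `ResourcePool`: the class name `LazyPoolSketch`, the methods `take`, `giveBack` and `createdCount`, and the constructor shape are hypothetical, and only the `eagerInitialization` and `maxSize` names are taken from the commit message.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Hypothetical illustration only; not Druid's ResourcePool.
final class LazyPoolSketch<T> {
  private final ArrayDeque<T> idle = new ArrayDeque<>();
  private final Supplier<T> factory;
  private final int maxSize;
  private int created = 0;

  LazyPoolSketch(Supplier<T> factory, int maxSize, boolean eagerInitialization) {
    this.factory = factory;
    this.maxSize = maxSize;
    if (eagerInitialization) {
      // Old behaviour: create the full pool up front.
      for (int i = 0; i < maxSize; i++) {
        idle.push(factory.get());
        created++;
      }
    }
  }

  synchronized T take() {
    try {
      while (true) {
        if (!idle.isEmpty()) {
          return idle.pop(); // rule 1: lend an existing idle resource
        }
        if (created < maxSize) {
          T resource = factory.get(); // rule 2: create on demand, up to maxSize
          created++;
          return resource;
        }
        wait(); // rule 3: block until a lent resource is returned
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for a resource", e);
    }
  }

  synchronized void giveBack(T resource) {
    idle.push(resource);
    notifyAll(); // wake any caller blocked in take()
  }

  synchronized int createdCount() {
    return created;
  }
}
```

With `eagerInitialization` set to false, a task that only ever needs one connection creates exactly one, rather than `maxSize` of them.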

5 months agoGuard against exponential increase of filters during CNF conversion (#12314)
Rohan Garg [Wed, 9 Mar 2022 07:49:52 +0000 (13:19 +0530)] 
Guard against exponential increase of filters during CNF conversion (#12314)

Currently, the CNF conversion of a filter is unbounded, which means that it can create an arbitrarily large number of filters, leading to OOMs in the Historical heap. We should throw an error or disable CNF conversion if the filter count starts getting out of hand. There are ways to do CNF conversion with only a linear increase in filters, but they are out of scope for this change because those algorithms add new variables to the predicate, which can be contentious.
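
To illustrate why the filter count can explode and what a guard might look like, here is a minimal sketch. It assumes a hypothetical `CnfGuardSketch.toCnf` helper that takes an OR of ANDs (literals as plain strings) and distributes it into CNF clauses, failing once a caller-supplied `maxClauses` limit is reached. The limit, the exception type and the representation are assumptions; the actual threshold and error used in Druid are not stated in this log.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Druid's filter classes.
final class CnfGuardSketch {
  /**
   * Converts (a1 AND a2 ...) OR (b1 AND b2 ...) OR ... into CNF by distribution.
   * The clause count is the product of the conjunction sizes, so it grows
   * exponentially with the number of OR branches; the guard aborts early.
   */
  static List<List<String>> toCnf(List<List<String>> orOfAnds, int maxClauses) {
    List<List<String>> clauses = new ArrayList<>();
    clauses.add(new ArrayList<>()); // start from a single empty clause
    for (List<String> conjunction : orOfAnds) {
      List<List<String>> next = new ArrayList<>();
      for (List<String> clause : clauses) {
        for (String literal : conjunction) {
          if (next.size() >= maxClauses) {
            throw new IllegalStateException(
                "CNF conversion would exceed " + maxClauses + " clauses");
          }
          List<String> extended = new ArrayList<>(clause);
          extended.add(literal); // clause OR literal
          next.add(extended);
        }
      }
      clauses = next;
    }
    return clauses;
  }
}
```

For example, `(a AND b) OR (c AND d)` becomes the four clauses `(a OR c)`, `(a OR d)`, `(b OR c)`, `(b OR d)`; each additional two-literal branch doubles the count.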

5 months agouse a non-concurrent map for lookups-cached-global unless incremental updates are...
Clint Wylie [Wed, 9 Mar 2022 05:54:25 +0000 (21:54 -0800)] 
use a non-concurrent map for lookups-cached-global unless incremental updates are actually required (#12293)

* use a non-concurrent map for lookups-cached-global unless incremental updates are actually required
* adjustments
* fix test

5 months agoBatch ingestion replace (#12137)
Agustin Gonzalez [Wed, 9 Mar 2022 03:07:02 +0000 (20:07 -0700)] 
Batch ingestion replace (#12137)

* Tombstone support for replace functionality

* A used segment interval is the interval of a current used segment that overlaps any of the input intervals for the spec

* Update compaction test to match replace behavior

* Adapt ITAutoCompactionTest to work with tombstones rather than dropping segments. Add support for tombstones in the broker.

* Style plus simple queriableindex test

* Add segment cache loader tombstone test

* Add more tests

* Add a method to the LogicalSegment to test whether it has any data

* Test filter with some empty logical segments

* Refactor more compaction/dropexisting tests

* Code coverage

* Support for all empty segments

* Skip tombstones when looking-up broker's timeline. Discard changes made to tool chest to avoid empty segments since they will no longer have empty segments after lookup because we are skipping over them.

* Fix null ptr when segment does not have a queriable index

* Add support for empty replace interval (all input data has been filtered out)

* Fixed coverage & style

* Find tombstone versions from lock versions

* Test failures & style

* Interner was making this fail since the two segments were considered equal due to their ids being equal

* Cleanup tombstone version code

* Force timeChunkLock whenever replace (i.e. dropExisting=true) is being used

* Reject replace spec when input intervals are empty

* Documentation

* Style and unit test

* Restore test code deleted by mistake

* Allocate forces TIME_CHUNK locking and uses lock versions. TombstoneShardSpec added.

* Unused imports. Dead code. Test coverage.

* Coverage.

* Prevent killer from throwing an exception for tombstones. This is the killer used in the peon for killing segments.

* Fix OmniKiller + more test coverage.

* Tombstones are now marked using a shard spec

* Drop a segment factory.json in the segment cache for tombstones

* Style

* Style + coverage

* style

* Add TombstoneLoadSpec.class to mapper in test

* Update core/src/main/java/org/apache/druid/segment/loading/TombstoneLoadSpec.java

Typo

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
* Update docs/configuration/index.md

Missing

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
* Typo

* Integrated replace with an existing test since the replace part was redundant and, more importantly, the test file was very close to or exceeding the 10 min default "no output" CI Travis threshold.

* Range does not work with multi-dim

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
5 months agoadjust topn heap operation when string is dictionary encoded, but not uniquely (...
Clint Wylie [Tue, 8 Mar 2022 22:32:40 +0000 (14:32 -0800)] 
adjust topn heap operation when string is dictionary encoded, but not uniquely (#12291)

* add topn heap optimization when string is dictionary encoded, but not uniquely

* use array instead

* is same

* fix javadoc

* fix

* Update StringTopNColumnAggregatesProcessor.java

5 months agoAdd git hooks that can run multiple scripts (#12300)
Jihoon Son [Tue, 8 Mar 2022 22:16:47 +0000 (07:16 +0900)] 
Add git hooks that can run multiple scripts (#12300)

* Add git hooks that can run multiple scripts

* scripts to install/uninstall hooks

* better message for uninstall; support pre-push params

5 months agoGroupBy: Cap dictionary-building selector memory usage. (#12309)
Gian Merlino [Tue, 8 Mar 2022 21:13:11 +0000 (13:13 -0800)] 
GroupBy: Cap dictionary-building selector memory usage. (#12309)

* GroupBy: Cap dictionary-building selector memory usage.

New context parameter "maxSelectorDictionarySize" controls when the
per-segment processing code should return early and trigger a trip
to the merge buffer.

Includes:

- Vectorized and nonvectorized implementations.
- Adjustments to GroupByQueryRunnerTest to exercise this code in
  the v2SmallDictionary suite. (Both the selector dictionary and
  the merging dictionary will be small in that suite.)
- Tests for the new config parameter.

* Fix issues from tests.

* Add "pre-existing" to dictionary.

* Simplify GroupByColumnSelectorStrategy interface by removing one of the writeToKeyBuffer methods.

* Adjustments from review comments.

5 months agoBreak up parallel indexing unit test to reduce test times (#12313)
Kashif Faraz [Mon, 7 Mar 2022 23:26:24 +0000 (04:56 +0530)] 
Break up parallel indexing unit test to reduce test times (#12313)

* Break up parallel indexing unit test to reduce test times

* Fix checkstyle

5 months agoAlways reopen stream in FileUtils.copyLarge, RetryingInputStream. (#12307)
Gian Merlino [Sat, 5 Mar 2022 22:39:14 +0000 (14:39 -0800)] 
Always reopen stream in FileUtils.copyLarge, RetryingInputStream. (#12307)

* Always reopen stream in FileUtils.copyLarge, RetryingInputStream.

When an InputStream throws an exception from one of its read methods,
we should assume it's bad and reopen it.

The main changes here are:

- In FileUtils.copyLarge, replace InputStream with InputStreamSupplier.
- In RetryingInputStream, collapse retryCondition and resetCondition
  into a single condition. Also, make it required, since every usage
  is passing in a specific condition anyway.

* Test fixes.

* Fix read impl.
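
A minimal sketch of the "assume the stream is bad and reopen it" idea from the first item above. The `ReopenOnFailureSketch` class, its `StreamOpener` interface and the `readAll` method are hypothetical and are not the signatures of `FileUtils.copyLarge`, `InputStreamSupplier` or `RetryingInputStream`. Resuming at the byte offset already read is also an assumption of this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not Druid's FileUtils or RetryingInputStream.
final class ReopenOnFailureSketch {
  /** Opens a new stream positioned at the given byte offset. */
  interface StreamOpener {
    InputStream open(long offset) throws IOException;
  }

  /** Reads everything, discarding and reopening the stream after any read failure. */
  static byte[] readAll(StreamOpener opener, int maxTries) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[8192];
    IOException last = new IOException("maxTries must be positive");
    for (int attempt = 0; attempt < maxTries; attempt++) {
      // A stream that has thrown is never reused: each attempt opens a new one
      // at the number of bytes already copied.
      try (InputStream in = opener.open(out.size())) {
        int n;
        while ((n = in.read(buffer)) != -1) {
          out.write(buffer, 0, n);
        }
        return out.toByteArray();
      } catch (IOException e) {
        last = e; // remember the failure and retry with a fresh stream
      }
    }
    throw new UncheckedIOException(last);
  }
}
```

Taking an opener rather than an already-open `InputStream` is what makes the reopen possible, which mirrors the move from `InputStream` to a supplier described in the commit message.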

5 months agocorrect errors on compaction doc (#12308)
Victoria Lim [Fri, 4 Mar 2022 23:33:35 +0000 (15:33 -0800)] 
correct errors on compaction doc (#12308)

5 months agoOfficially support Java 11. (#12232)
Gian Merlino [Fri, 4 Mar 2022 22:15:45 +0000 (14:15 -0800)] 
Officially support Java 11. (#12232)

There aren't any changes in this patch that improve Java 11
compatibility; these changes have already been done separately. This
patch merely updates documentation and explicit Java version checks.

The log message adjustments in DruidProcessingConfig are there to make
things a little nicer when running in Java 11, where we can't measure
direct memory _directly_, and so we may auto-size processing buffers
incorrectly.