druid.git
3 months ago Docs - query caching (#11584)
Peter Marshall [Mon, 18 Apr 2022 09:00:21 +0000 (10:00 +0100)] 
Docs - query caching (#11584)

* Update caching.md

Knowledge from https://the-asf.slack.com/archives/CJ8D1JTB8/p1597781107153900

Update caching.md

A few additional updates on the back of https://the-asf.slack.com/archives/CJ8D1JTB8/p1608669046041300

* Update caching.md

Typos

* Amendments on the segment cache

Significant updates on content around the segment cache, pull process, and in-memory cache

* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/operations/basic-cluster-tuning.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/operations/basic-cluster-tuning.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update basic-cluster-tuning.md

typo

* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Whole-query caching update

Made more succinct and removed specific config to change.

* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
3 months ago Fixes a small typo in ingestion spec doc (#12143)
Charles Smith [Mon, 18 Apr 2022 08:53:50 +0000 (01:53 -0700)] 
Fixes a small typo in ingestion spec doc (#12143)

* small typo

* Update docs/ingestion/ingestion-spec.md

Co-authored-by: sthetland <steve.hetland@imply.io>
Co-authored-by: sthetland <steve.hetland@imply.io>
3 months ago Fail fast in case a lookup load fails (#12397)
Rohan Garg [Mon, 18 Apr 2022 07:44:02 +0000 (13:14 +0530)] 
Fail fast in case a lookup load fails (#12397)

Currently, while loading a lookup for the first time, the loading thread blocks
for `waitForFirstRunMs` in case the lookup fails to load. If `waitForFirstRunMs`
is long (like 10 minutes), such blocking can slow down the loading of other lookups.

This commit allows the thread to progress as soon as the loading of the lookup fails.
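
The fail-fast behaviour described above can be sketched in Python as follows. This is only an illustration, not Druid's Java implementation: `load_with_fail_fast`, `loader`, and the returned status strings are hypothetical names, and only the idea of a `waitForFirstRunMs`-style timeout comes from the commit message.

```python
import threading


def load_with_fail_fast(loader, wait_for_first_run_ms):
    """Run `loader` in a background thread and wait for its first run.

    Returns "loaded", "failed", or "timed_out". The waiting thread is
    released as soon as the loader finishes, whether it succeeded or raised.
    """
    done = threading.Event()
    outcome = {}

    def run():
        try:
            loader()
            outcome["status"] = "loaded"
        except Exception as exc:  # a failed load must also release the waiter
            outcome["status"] = "failed"
            outcome["error"] = exc
        finally:
            done.set()

    threading.Thread(target=run, daemon=True).start()
    # Wait at most the configured time; return early once `done` is set.
    if not done.wait(timeout=wait_for_first_run_ms / 1000.0):
        return "timed_out"
    return outcome["status"]
```

The key point is the `finally: done.set()`: without it, a loader that raises would leave the caller waiting for the full timeout, which is the slowdown the commit describes.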

3 months ago Docs - added another common config property to tuningConfig (#11935)
Peter Marshall [Mon, 18 Apr 2022 05:41:39 +0000 (06:41 +0100)] 
Docs - added another common config property to tuningConfig (#11935)

* Update ingestion-spec.md

Added indexSpecForIntermediatePersists as a common configuration property.

* Update ingestion-spec.md

Amended to remove "below" and add link to the table.

* Update ingestion-spec.md

Removed passive.

3 months ago Update tutorial-compaction.md to change an unclear statement (#11988)
Alexandre BERTHIOT [Mon, 18 Apr 2022 05:25:09 +0000 (05:25 +0000)] 
Update tutorial-compaction.md to change an unclear statement (#11988)

* Update tutorial-compaction.md

Unclear statement on the explanation of tuningConfig section.

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
3 months ago Fix bug in auto compaction preserveExistingMetrics feature (#12438)
Maytas Monsereenusorn [Fri, 15 Apr 2022 22:47:47 +0000 (15:47 -0700)] 
Fix bug in auto compaction preserveExistingMetrics feature (#12438)

* fix bug

* fix test

* fix IT

3 months ago Make tombstones ingestible by having them return an empty result set. (#12392)
Agustin Gonzalez [Fri, 15 Apr 2022 16:08:06 +0000 (09:08 -0700)] 
Make tombstones ingestible by having them return an empty result set. (#12392)

* Make tombstones ingestible by having them return an empty result set.

* Spotbug

* Coverage

* Coverage

* Remove unnecessary exception (checkstyle)

* Fix integration test and add one more to test dropExisting set to false over tombstones

* Force dropExisting to true in auto-compaction when the interval contains only tombstones

* Checkstyle, fix unit test

* Changed flag by mistake, fixing it

* Remove method from interface since this method is specific only to DruidSegmentInputEntity

* Fix typo

* Adapt to latest code

* Update comments when only tombstones to compact

* Move empty iterator to a new DruidTombstoneSegmentReader

* Code review feedback

* Checkstyle

* Review feedback

* Coverage
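
As a rough illustration of the idea in the commit title (not Druid's actual `DruidTombstoneSegmentReader`), a reader can treat a tombstone as a segment that yields no rows. The dictionary layout and the `tombstone` and `rows` keys below are hypothetical.

```python
def read_rows(segment):
    """Return an iterator over a segment's rows; tombstones yield nothing."""
    if segment.get("tombstone", False):
        # A tombstone marks an interval as empty, so reading it gives no rows.
        return iter(())
    return iter(segment["rows"])


def ingest(segments):
    """Collect rows from all segments, tombstones included, without errors."""
    rows = []
    for segment in segments:
        rows.extend(read_rows(segment))
    return rows
```

With this shape, downstream code does not need a special case for tombstones: it simply sees an empty result for those intervals.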

3 months ago Use binary search to improve DimensionRangeShardSpec lookup (#12417)
hqx871 [Fri, 15 Apr 2022 16:07:06 +0000 (00:07 +0800)] 
Use binary search to improve DimensionRangeShardSpec lookup (#12417)

If there are many shards, the mapper of IndexGeneratorJob seems to spend a lot of time calling
DimensionRangeShardSpec.isInChunk to look up the target shard. This can be significantly improved
by using binary search instead of comparing an input row to every shardSpec.

Changes:
* Add `BaseDimensionRangeShardSpec` which provides a binary-search-based
   implementation for `createLookup`
* `DimensionRangeShardSpec`, `SingleDimensionShardSpec`, and
   `DimensionRangeBucketShardSpec` now extend `BaseDimensionRangeShardSpec`
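
The difference between the two lookup strategies can be sketched as below. This is a simplified Python illustration, not the `createLookup` code from the commit: it assumes the shard ranges are sorted, contiguous, and cover the whole key space (first start and last end unbounded, written as `None`), and the function names are hypothetical.

```python
from bisect import bisect_right


def linear_lookup(ranges, key):
    """Baseline: test the key against every (start, end) range in turn."""
    for i, (start, end) in enumerate(ranges):
        if (start is None or start <= key) and (end is None or key < end):
            return i
    raise KeyError(key)


def make_binary_lookup(ranges):
    """Build a lookup over sorted, contiguous ranges using binary search.

    `ranges` is a list of (start, end) tuples with inclusive start and
    exclusive end; None means unbounded.
    """
    # Every range except the first has a concrete start value.
    starts = [start for start, _ in ranges[1:]]

    def lookup(key):
        # The number of starts <= key is the index of the containing range.
        return bisect_right(starts, key)

    return lookup
```

For `n` shards, the linear version does up to `n` comparisons per row while the binary-search version does about `log2(n)`, which matters when the lookup runs once per input row.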

3 months ago Handling planning with alias for time for group by and order by (#12418)
somu-imply [Fri, 15 Apr 2022 04:59:17 +0000 (21:59 -0700)] 
Handling planning with alias for time for group by and order by (#12418)

An outer scan query that requires ordering on a column should be considered invalid.

3 months ago good stuff (#12435)
Vadim Ogievetsky [Thu, 14 Apr 2022 07:23:06 +0000 (00:23 -0700)] 
good stuff (#12435)

3 months ago fix issue with boolean expression input (#12429)
Clint Wylie [Wed, 13 Apr 2022 23:34:01 +0000 (16:34 -0700)] 
fix issue with boolean expression input (#12429)

3 months ago Add docs to metric spec for auto compaction (#12415)
Maytas Monsereenusorn [Wed, 13 Apr 2022 20:27:00 +0000 (13:27 -0700)] 
Add docs to metric spec for auto compaction (#12415)

* add docs

* Update docs/configuration/index.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
* Update index.md

* Update docs/configuration/index.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
3 months ago Fix indexMerger to respect the includeAllDimensions flag (#12428)
Jihoon Son [Wed, 13 Apr 2022 19:43:11 +0000 (12:43 -0700)] 
Fix indexMerger to respect the includeAllDimensions flag (#12428)

* Fix indexMerger to respect the includeAllDimensions flag; jsonInputFormat should set keepNullColumns if useFieldDiscovery is set

* address comments

3 months ago Add Kinesis ListShards permission (#12387)
Katya Macedo [Wed, 13 Apr 2022 09:59:56 +0000 (04:59 -0500)] 
Add Kinesis ListShards permission (#12387)

* add Kinesis permission

* List Kinesis IAM permissions

* Adopt review suggestions

* Fix merge conflicts

3 months ago Web console: Misc fixes and improvements (#12361)
Vadim Ogievetsky [Wed, 13 Apr 2022 05:20:28 +0000 (22:20 -0700)] 
Web console: Misc fixes and improvements (#12361)

* Misc fixes

* pad column numbers

* make shard_type filterable

3 months ago Copy of #11309 with fixes (#12402)
Parag Jain [Mon, 11 Apr 2022 15:35:24 +0000 (21:05 +0530)] 
Copy of #11309 with fixes (#12402)

* Optionally load segment index files into page cache on bootstrap and new segment download

* Fix unit test failure

* Fix test case

* fix spelling

* fix spelling

* fix test and test coverage issues

Co-authored-by: Jian Wang <wjhypo@gmail.com>
3 months ago Fix zulu8 set-up Dockerfile for hadoop and hadoop3 in hadoop ingestion tutorial ...
Tiffany Yeh [Mon, 11 Apr 2022 14:58:09 +0000 (10:58 -0400)] 
Fix zulu8 set-up Dockerfile for hadoop and hadoop3 in hadoop ingestion tutorial (#12248)

Fix errors related to zulu8 installation for building the Hadoop Docker image in the Load From Apache Hadoop tutorial.

The steps to download zulu8 in the Dockerfile and setup-zulu-repo.sh were replaced with the steps in the Dockerfile released by zulu-openjdk: https://github.com/zulu-openjdk/zulu-openjdk/blob/be45d20302e42df5aa95d2de078bb5e4214f5dba/centos/8u282-8.52.0.23/Dockerfile.

4 months ago Bump PostgreSQL JDBC driver to 42.3.3 (CVE-2022-21724) (#12410)
Jihoon Son [Sat, 9 Apr 2022 10:08:26 +0000 (03:08 -0700)] 
Bump PostgreSQL JDBC driver to 42.3.3 (CVE-2022-21724) (#12410)

* Bump PostgreSQL JDBC driver to 42.3.3 (CVE-2022-21724)

* update license file

4 months ago Make error messages for insert statements consistent with select statements (#12414)
Adarsh Sanjeev [Sat, 9 Apr 2022 06:51:40 +0000 (12:21 +0530)] 
Make error messages for insert statements consistent with select statements (#12414)

For a query like
`INSERT INTO tablename SELECT channel, added as count FROM wikipedia`
the error message is `Encountered "as count"`. However, the insert statement
`INSERT INTO t SELECT channel, added as count FROM wikipedia PARTITIONED BY ALL`
incorrectly returns `INSERT statements must specify PARTITIONED BY clause explictly`. This PR corrects this.

Add EOF to end of Druid SQL Insert statements
Rename SQL Insert statements in the parser to reflect the behaviour change

4 months ago Improve metrics for Auto Compaction (#12413)
Maytas Monsereenusorn [Sat, 9 Apr 2022 03:14:36 +0000 (20:14 -0700)] 
Improve metrics for Auto Compaction (#12413)

* add impl

* add docs

* fix

4 months ago Add a new flag for ingestion to preserve existing metrics (#12185)
Maytas Monsereenusorn [Fri, 8 Apr 2022 18:02:02 +0000 (11:02 -0700)] 
Add a new flag for ingestion to preserve existing metrics (#12185)

* add impl

* add impl

* fix checkstyle

* add impl

* add unit test

* fix stuff

* fix stuff

* fix stuff

* add unit test

* add more unit tests

* add more unit tests

* add IT

* add IT

* add IT

* add IT

* add ITs

* address comments

* fix test

* fix test

* fix test

* address comments

* address comments

* address comments

* fix conflict

* fix checkstyle

* address comments

* fix test

* fix checkstyle

* fix test

* fix test

* fix IT

4 months ago Update index.md (#12390)
mark-imply [Fri, 8 Apr 2022 12:31:54 +0000 (06:31 -0600)] 
Update index.md (#12390)

Added guidance on when to increase druid.indexer.storage.recentlyFinishedThreshold.

4 months ago Fix the other 2 python scripts that generates license. (#12340)
Didip Kerabat [Fri, 8 Apr 2022 11:13:17 +0000 (04:13 -0700)] 
Fix the other 2 python scripts that generates license. (#12340)

Fixes YAML.load_all issues on two of the Python scripts that generate license.

The broken Python files interfere with some of the Maven tasks.

4 months ago Update basic-cluster-tuning.md (#12412)
mark-imply [Fri, 8 Apr 2022 09:59:55 +0000 (03:59 -0600)] 
Update basic-cluster-tuning.md (#12412)

Changed "Other useful JVM flags" to "Other generally useful JVM flags" in order to align with the introduction to the doc.

4 months ago fix(docs): clarify what s3 permissions are needed based on the access management...
317brian [Thu, 7 Apr 2022 23:22:56 +0000 (16:22 -0700)] 
fix(docs): clarify what s3 permissions are needed based on the access management type (#12405)

* fix(docs): clarify what s3 permissions are needed based on the permissions model

* fix typo

* Update docs/development/extensions-core/s3.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>
Co-authored-by: Jihoon Son <jihoonson@apache.org>
4 months ago Bump minimist from 1.2.5 to 1.2.6 in /website (#12400)
dependabot[bot] [Thu, 7 Apr 2022 10:08:39 +0000 (03:08 -0700)] 
Bump minimist from 1.2.5 to 1.2.6 in /website (#12400)

Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6.
- [Release notes](https://github.com/substack/minimist/releases)
- [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6)

---
updated-dependencies:
- dependency-name: minimist
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4 months ago Bump minimist from 1.2.5 to 1.2.6 in /web-console (#12401)
dependabot[bot] [Wed, 6 Apr 2022 23:55:14 +0000 (16:55 -0700)] 
Bump minimist from 1.2.5 to 1.2.6 in /web-console (#12401)

Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6.
- [Release notes](https://github.com/substack/minimist/releases)
- [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6)

---
updated-dependencies:
- dependency-name: minimist
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4 months ago clean up some bp3 classes (#12403)
Vadim Ogievetsky [Wed, 6 Apr 2022 22:27:44 +0000 (15:27 -0700)] 
clean up some bp3 classes (#12403)

4 months ago Document data format and example for featureSpec (#12394)
Victoria Lim [Wed, 6 Apr 2022 22:17:15 +0000 (15:17 -0700)] 
Document data format and example for featureSpec (#12394)

* add data format and example for featureSpec

* add second feature in example

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
4 months ago docs(fix): add clarity around granularitySpec (#12362)
317brian [Wed, 6 Apr 2022 16:24:37 +0000 (09:24 -0700)] 
docs(fix): add clarity around granularitySpec (#12362)

* fix: add clarity around granularitySpec

* fix spacing

* Update docs/ingestion/compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
4 months ago Document config for ingesting null columns (#12389)
Victoria Lim [Tue, 5 Apr 2022 16:15:42 +0000 (09:15 -0700)] 
Document config for ingesting null columns (#12389)

* config for ingesting null columns

* add link

* edit .spelling

* what happens if storeEmptyColumns is disabled

4 months ago upgrade surefire 3.0.0-M6 (#12395)
aggarwalakshay [Tue, 5 Apr 2022 06:56:15 +0000 (23:56 -0700)] 
upgrade surefire 3.0.0-M6 (#12395)

* upgrade surefire 3.0.0-M6

* increasing memory

4 months ago Method to specify eternity in the scan query builder (#12223)
Paul Rogers [Mon, 4 Apr 2022 22:11:32 +0000 (15:11 -0700)] 
Method to specify eternity in the scan query builder (#12223)

* Method to specify eternity in the scan query builder

* Fix checkstyle issue

* Renamed eternity() to eternityInterval()

* Minor fixes

4 months ago Blueprint 4 (#12391)
John Gozde [Mon, 4 Apr 2022 17:34:22 +0000 (11:34 -0600)] 
Blueprint 4 (#12391)

* Update blueprint dependencies & LICENSES

* Switch to bp4 namespace; use bp-ns variable in overrides

* Add webpack alias for colors.scss

* Snapshots

* Update selectors in e2e tests

4 months ago Package kinesis client jar within the extension (#12370)
AmatyaAvadhanula [Mon, 4 Apr 2022 16:01:18 +0000 (21:31 +0530)] 
Package kinesis client jar within the extension (#12370)

amazon-kinesis-client was not covered under the Apache license and required separate insertion in the Kinesis extension.
This can now be avoided since it is covered, and including it within Druid helps prevent incompatibilities.

Allows enabling of deaggregation out of the box by packaging amazon-kinesis-client (1.14.4) with Druid for Kinesis ingestion.

4 months ago Increase default DatasourceCompactionConfig.inputSegmentSizeBytes to Long.MAX_VALUE...
Tejaswini Bandlamudi [Mon, 4 Apr 2022 10:58:53 +0000 (16:28 +0530)] 
Increase default DatasourceCompactionConfig.inputSegmentSizeBytes to Long.MAX_VALUE (#12381)

The current default value of inputSegmentSizeBytes is 400MB, which is pretty
low for most compaction use cases. Thus most users are forced to override the
default.

The default value is now increased to Long.MAX_VALUE.

4 months ago Add feature flag for Kinesis listShards API usage (#12383)
AmatyaAvadhanula [Mon, 4 Apr 2022 09:28:10 +0000 (14:58 +0530)] 
Add feature flag for Kinesis listShards API usage (#12383)

listShards API was used to get all the shards for kinesis ingestion to improve its resiliency as part of #12161.

However, this may require additional permissions in the IAM policy where the stream is present. (Please refer to: https://docs.aws.amazon.com/kinesis/latest/APIReference/API_ListShards.html).

A dynamic configuration useListShards has been added to KinesisSupervisorTuningConfig to control the usage of this API and prevent issues upon upgrade. It can be safely turned on (and is recommended when using kinesis ingestion) by setting this configuration to true.
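
A minimal sketch of where the flag would sit, built in Python for illustration. Only the `useListShards` name and its placement in the Kinesis supervisor tuning config come from the commit message; the helper function is hypothetical and the rest of the supervisor spec is omitted.

```python
import json


def kinesis_tuning_config(use_list_shards):
    """Return a minimal Kinesis supervisor tuningConfig fragment."""
    return {
        "type": "kinesis",
        # Flag named in the commit message; True opts in to the ListShards API.
        "useListShards": use_list_shards,
    }


# Render the fragment as it would appear inside a supervisor spec.
print(json.dumps({"tuningConfig": kinesis_tuning_config(True)}, indent=2))
```

Before enabling the flag, the IAM policy for the stream needs to allow the ListShards call, as noted in the commit message.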

4 months ago Introducing a new config to ignore nulls while computing String Cardinality (#12345)
somu-imply [Tue, 29 Mar 2022 21:31:36 +0000 (14:31 -0700)] 
Introducing a new config to ignore nulls while computing String Cardinality (#12345)

* Counting nulls in String cardinality with a config

* Adding tests for the new config

* Wrapping the vectorize part to allow backward compatibility

* Adding different tests, cleaning the code and putting the check at the proper position, handling hasRow() and hasValue() changes

* Updating testcase and code

* Adding null handling test to improve coverage

* Checkstyle fix

* Adding 1 more change in docs

* Making docs clearer
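
The effect of such a switch can be shown with a small exact-count sketch. This is an assumption-laden illustration: the function and the `ignore_nulls` parameter are hypothetical names, and a plain Python set stands in for whatever structure the real aggregator uses.

```python
def string_cardinality(values, ignore_nulls):
    """Count distinct string values, optionally leaving nulls out."""
    distinct = set(values)
    if ignore_nulls:
        # With the flag on, null does not count as a distinct value.
        distinct.discard(None)
    return len(distinct)
```

For the values `["a", None, "b", "a", None]` the count is 3 when nulls are counted and 2 when they are ignored.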

4 months ago Docs - S3 masking and nav update to S3 page (#11490)
Peter Marshall [Tue, 29 Mar 2022 16:13:05 +0000 (17:13 +0100)] 
Docs - S3 masking and nav update to S3 page (#11490)

* Docs: Masking S3 creds and some rewording

Knowledge transfer from https://groups.google.com/g/druid-user/c/FydcpFrA688

* Removed bold in one of the quote sections

* Update s3.md

* Update s3.md

Quick grammar change

* Update docs/development/extensions-core/s3.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/s3.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/s3.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/s3.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/s3.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update s3.md

Typo

* Update docs/development/extensions-core/s3.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update s3.md

Active lang

* Update s3.md

Lang nit

* Update native-batch.md

Lang nit

* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Grammar tidy-up and link fix

Corrected 2 x links to old page H2s, resolved the question around precedence, and some other grammatical changes.

* Update docs/development/extensions-core/s3.md

* Update s3.md

Removed an Erroneous E

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
4 months ago Docs – expressions link back and timestamp hint (#11674)
Peter Marshall [Tue, 29 Mar 2022 16:12:30 +0000 (17:12 +0100)] 
Docs – expressions link back and timestamp hint (#11674)

* Update math-expr.md

Link back to transformSpec

* Update ingestion-spec.md

Moved info about using the timestamp inside transforms into the actual timestamp section.

* Update ingestion-spec.md

Active language.

4 months ago Update ingestion-spec.md (#12371)
mark-imply [Tue, 29 Mar 2022 16:12:02 +0000 (10:12 -0600)] 
Update ingestion-spec.md (#12371)

* Update ingestion-spec.md

Added best practice point to dimensions description.

* Update docs/ingestion/ingestion-spec.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
4 months ago Add an integration test for null-only columns (#12365)
Jihoon Son [Mon, 28 Mar 2022 23:40:45 +0000 (16:40 -0700)] 
Add an integration test for null-only columns (#12365)

* integration test for null-only-columns

* metadata query

* fix test

4 months ago Docs for request logging (#12363)
Victoria Lim [Mon, 28 Mar 2022 21:09:41 +0000 (14:09 -0700)] 
Docs for request logging (#12363)

* add docs for request logging

* remove stray character

* Update docs/operations/request-logging.md

Co-authored-by: TSFenwick <tsfenwick@gmail.com>
* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: TSFenwick <tsfenwick@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
4 months ago fix messageGap metric (#12337)
Yuanli Han [Mon, 28 Mar 2022 16:21:06 +0000 (00:21 +0800)] 
fix messageGap metric (#12337)

4 months ago Use javaOptsArray provided in task context (#12326)
AmatyaAvadhanula [Mon, 28 Mar 2022 11:03:40 +0000 (16:33 +0530)] 
Use javaOptsArray provided in task context (#12326)

The `javaOpts` property is being read from task context but not `javaOptsArray`.
Changes:
- Read `javaOptsArray` from task context in `ForkingTaskRunner`.
- Add test to verify that `javaOptsArray` in task context takes precedence over `javaOpts`

4 months ago Bump java-dogstatsd-client from 2.13.0 to 4.0.0 (#12353)
dependabot[bot] [Sat, 26 Mar 2022 23:25:13 +0000 (16:25 -0700)] 
Bump java-dogstatsd-client from 2.13.0 to 4.0.0 (#12353)

* Bump java-dogstatsd-client from 2.13.0 to 4.0.0
Bumps [java-dogstatsd-client](https://github.com/DataDog/java-dogstatsd-client) from 2.13.0 to 4.0.0.
- [Release notes](https://github.com/DataDog/java-dogstatsd-client/releases)
- [Changelog](https://github.com/DataDog/java-dogstatsd-client/blob/master/CHANGELOG.md)
- [Commits](https://github.com/DataDog/java-dogstatsd-client/compare/v2.13.0...v4.0.0)

* migrate statsd-emitter tests from easymock to mockito
* add simple init test to make diff coverage happy

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xavier Léauté <xvrl@apache.org>
4 months ago Duties in Indexing group (such as Auto Compaction) do not report metrics (#12352)
Maytas Monsereenusorn [Thu, 24 Mar 2022 01:18:28 +0000 (18:18 -0700)] 
Duties in Indexing group (such as Auto Compaction) do not report metrics (#12352)

* add impl

* add unit tests

* fix checkstyle

* address comments

* fix checkstyle

4 months ago Store null columns in the segments (#12279)
Jihoon Son [Wed, 23 Mar 2022 23:54:04 +0000 (16:54 -0700)] 
Store null columns in the segments (#12279)

* Store null columns in the segments

* fix test

* remove NullNumericColumn and unused dependency

* fix compile failure

* use guava instead of apache commons

* split new tests

* unused imports

* address comments

4 months ago Added support in urls, and grouped metrics (#12296)
syacobovitz [Tue, 22 Mar 2022 18:22:05 +0000 (20:22 +0200)] 
Added support in urls, and grouped metrics (#12296)

4 months ago Fix OOM failures in dimension distribution phase of parallel indexing (#12331)
Kashif Faraz [Tue, 22 Mar 2022 13:58:15 +0000 (19:28 +0530)] 
Fix OOM failures in dimension distribution phase of parallel indexing (#12331)

Parallel indexing with range partitioning can often cause OOM in the
`ParallelIndexSupervisorTask` during the dimension distribution phase.
This typically happens because of too many `StringSketch` objects
obtained from the different `partial_dimension_distribution` sub-tasks.

We need not keep any of the sketches in memory until we need to compute
the PartitionBoundaries for the respective interval.

Changes
- Extract `StringDistribution` from `DimensionDistributionReport`s when they are received
  and write to disk inside the task/temp/distributions
- After all the subtasks have finished, iterate over all the intervals one by one
- For each interval, read the distributions from disk, merge them and create `PartitionBoundaries`.
- Cleanup task/temp/distributions directory when all `PartitionBoundaries` have been determined
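
The steps above can be sketched in Python as follows. This is a simplified illustration under several assumptions: intervals are identified by file-system-safe strings, each distribution is a plain list of sampled values rather than a `StringSketch`, and `DistributionSpool` with its methods is a hypothetical name, not a Druid class.

```python
import json
import shutil
import tempfile
from pathlib import Path


class DistributionSpool:
    """Spill per-interval distributions to disk, then merge one interval at a time."""

    def __init__(self):
        self.root = Path(tempfile.mkdtemp(prefix="distributions-"))
        self._counter = 0

    def add_report(self, report):
        """Write each interval's distribution from a sub-task report to its own file."""
        for interval, values in report.items():
            interval_dir = self.root / interval
            interval_dir.mkdir(exist_ok=True)
            self._counter += 1
            (interval_dir / f"{self._counter}.json").write_text(json.dumps(values))

    def boundaries(self, num_partitions):
        """Yield (interval, boundaries), holding only one interval in memory at a time."""
        for interval_dir in sorted(self.root.iterdir()):
            merged = []
            for path in interval_dir.iterdir():
                merged.extend(json.loads(path.read_text()))
            merged.sort()
            # Pick evenly spaced values as split points between partitions.
            step = max(1, len(merged) // num_partitions)
            yield interval_dir.name, merged[step::step][: num_partitions - 1]

    def cleanup(self):
        """Remove the temporary directory once all boundaries are determined."""
        shutil.rmtree(self.root)
```

Peak memory is then bounded by the largest single interval instead of the sum over all intervals and sub-tasks, at the cost of extra disk reads and writes.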

4 months ago Convert inQueryThreshold into query context parameter. (#12357)
Adarsh Sanjeev [Tue, 22 Mar 2022 13:03:57 +0000 (18:33 +0530)] 
Convert inQueryThreshold into query context parameter. (#12357)

Added Calcite's InQueryThreshold as a query context parameter. Setting this parameter appropriately reduces the time taken for queries with a large number of values in their IN conditions.

4 months ago fix use of deprecated initMocks method (#12351)
Xavier Léauté [Sat, 19 Mar 2022 17:19:02 +0000 (10:19 -0700)] 
fix use of deprecated initMocks method (#12351)

follow-up to #12341
- fix use of deprecated initMocks methods and properly close mocks on teardown

4 months ago upgrade maven-pmd-plugin to fix warning (#12349)
Xavier Léauté [Sat, 19 Mar 2022 17:18:26 +0000 (10:18 -0700)] 
upgrade maven-pmd-plugin to fix warning (#12349)

We sometimes see warnings similar to the one mentioned in
https://issues.apache.org/jira/browse/MPMD-325

Upgrading the plugin should hopefully reduce the occurrence of those.

4 months ago Bump slf4j.version from 1.7.12 to 1.7.36 (#11594)
dependabot[bot] [Fri, 18 Mar 2022 20:45:44 +0000 (13:45 -0700)] 
Bump slf4j.version from 1.7.12 to 1.7.36 (#11594)

Bump slf4j.version from 1.7.12 to 1.7.36

- [Release notes](https://www.slf4j.org/news.html)

Updates `jcl-over-slf4j` from 1.7.12 to 1.7.36
- [Commits](https://github.com/qos-ch/slf4j/compare/v_1.7.12...v_1.7.36)

Updates `slf4j-simple` from 1.7.12 to 1.7.36
- [Commits](https://github.com/qos-ch/slf4j/compare/v_1.7.12...v_1.7.36)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Suneet Saldanha <suneet@apache.org>
Co-authored-by: Xavier Léauté <xvrl@apache.org>
4 months ago Fix auto compaction by adjusting compaction task's interval to align with segmentGran...
Maytas Monsereenusorn [Fri, 18 Mar 2022 19:46:16 +0000 (12:46 -0700)] 
Fix auto compaction by adjusting compaction task's interval to align with segmentGranularity when segmentGranularity is set (#12334)

* add impl

* add ITs

* address comments

* address comments

* address comments

* fix failure

* fix checkstyle

* fix checkstyle

4 months ago update surefire plugin to 3.0.0-M4 (#12342)
Xavier Léauté [Fri, 18 Mar 2022 15:20:28 +0000 (08:20 -0700)] 
update surefire plugin to 3.0.0-M4 (#12342)

stay on surefire 3.0.0-M4 until we can upgrade to 3.0.0-M6
with a fix for https://issues.apache.org/jira/browse/SUREFIRE-1815
causing issues in RetryUtilsTest.

4 months ago improve test compatibility with Java 17 and remove deprecated methods (#12341)
Xavier Léauté [Fri, 18 Mar 2022 15:19:28 +0000 (08:19 -0700)] 
improve test compatibility with Java 17 and remove deprecated methods (#12341)

* remove use of reflection in EnvironmentVariableDynamicConfigProvider for Java 17 compatibility
* fix mock objects not getting closed properly, causing issues with Java 17
* remove use of deprecated methods and rules in tests

4 months ago Fix missing conversionFactor in prometheus emitter (#12338)
Aurélien Dunand [Fri, 18 Mar 2022 04:46:06 +0000 (05:46 +0100)] 
Fix missing conversionFactor in prometheus emitter (#12338)

query/node/ttfb metrics are in milliseconds.
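
A small sketch of what such a factor does, assuming the emitter divides the raw Druid value by the configured factor before reporting it. The function name and the factor of 1000 (milliseconds to seconds) are illustrative assumptions, not values taken from this commit.

```python
def to_reported_value(raw_value, conversion_factor):
    """Scale a raw metric value by dividing it by its conversion factor."""
    return raw_value / conversion_factor


# A time-to-first-byte of 250 ms becomes 0.25 when reported in seconds.
ttfb_seconds = to_reported_value(250, 1000.0)
```

Without the factor, a millisecond value would be reported as if it were already in the target unit, overstating it by three orders of magnitude.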

4 months ago fix build due to com.nimbusds:lang-tag update (#12348)
Xavier Léauté [Fri, 18 Mar 2022 00:44:08 +0000 (17:44 -0700)] 
fix build due to com.nimbusds:lang-tag update (#12348)

the version of com.nimbusds:oauth2-oidc-sdk we depend on does not
specify an exact version dependency for com.nimbusds:lang-tag, and
instead uses a version range (see
    https://search.maven.org/artifact/com.nimbusds/oauth2-oidc-sdk/6.5/jar)

Recently a new version of lang-tag was released requiring us to update
the license file accordingly.

4 months ago Bump maven-site-plugin from 3.1 to 3.11.0 (#12310)
dependabot[bot] [Thu, 17 Mar 2022 07:17:29 +0000 (15:17 +0800)] 
Bump maven-site-plugin from 3.1 to 3.11.0 (#12310)

Bumps [maven-site-plugin](https://github.com/apache/maven-site-plugin) from 3.1 to 3.11.0.
- [Release notes](https://github.com/apache/maven-site-plugin/releases)
- [Commits](https://github.com/apache/maven-site-plugin/compare/maven-site-plugin-3.1...maven-site-plugin-3.11.0)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-site-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4 months ago Fix a race condition in the '/tasks' Overlord API (#12330)
Jihoon Son [Thu, 17 Mar 2022 01:47:45 +0000 (10:47 +0900)] 
Fix a race condition in the '/tasks' Overlord API (#12330)

* finds complete and active tasks from the same snapshot

* overlord resource

* unit test

* integration test

* javadoc and cleanup

* more cleanup

* fix test and add more

4 months ago Add JDK 11 (#12333)
Frank Chen [Wed, 16 Mar 2022 22:03:04 +0000 (06:03 +0800)] 
Add JDK 11 (#12333)

4 months ago Adding k8s support for human readable parsing (#12316)
Dr. Sizzles [Wed, 16 Mar 2022 03:18:47 +0000 (20:18 -0700)] 
Adding k8s support for human readable parsing (#12316)

* Adding k8s support for human readable parsing

* Update docs/configuration/human-readable-byte.md

Co-authored-by: Frank Chen <frankchen@apache.org>
* Update docs/configuration/human-readable-byte.md

Co-authored-by: Frank Chen <frankchen@apache.org>
* Update core/src/main/java/org/apache/druid/java/util/common/HumanReadableBytes.java

Co-authored-by: Frank Chen <frankchen@apache.org>
* Changes per review

Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
Co-authored-by: Frank Chen <frankchen@apache.org>
4 months agoupgrade Error Prone to 2.11 (requires Java 11) (#12306)
Xavier Léauté [Tue, 15 Mar 2022 02:40:48 +0000 (19:40 -0700)] 
upgrade Error Prone to 2.11 (requires Java 11) (#12306)

The latest version of Error Prone now requires Java 11. Upgrading means we can
remove a lot of the maven profile complexity required to run checks with Java 8.
This also requires switching our strict build to use Java 11.

* update error-prone to 2.11
* remove need for specific maven profiles for Java 8 and Java 15
* fix additional Error Prone warnings with Java 11
* update strict build to use Java 11

4 months agoGraceful null handling and correctness in DoubleMean Aggregator (#12320)
somu-imply [Mon, 14 Mar 2022 23:52:47 +0000 (16:52 -0700)] 
Graceful null handling and correctness in DoubleMean Aggregator (#12320)

* Adding null handling for double mean aggregator

* Updating code to handle nulls in DoubleMean aggregator

* oops last one should have checkstyle issues. fixed

* Updating some code and test cases

* Checking on object is null in case of numeric aggregator

* Adding one more test to improve coverage

* Changing one test as asked in the review

* Changing one test as asked in the review for nulls

4 months agobug fix: merge results of group by limit push down (#11969)
mchades [Fri, 11 Mar 2022 17:04:34 +0000 (01:04 +0800)] 
bug fix: merge results of group by limit push down (#11969)

4 months agokubernetes: restart watch on null response (#12233)
Kyle Larose [Thu, 10 Mar 2022 20:56:40 +0000 (15:56 -0500)] 
kubernetes: restart watch on null response (#12233)

* kubernetes: restart watch on null response

Kubernetes watches allow a client to efficiently process changes to
resources. However, they have some idiosyncrasies. In particular, they
can error out for various reasons leading to what would normally be seen
as an invalid result.

The Druid kubernetes node discovery subsystem does not handle a certain
case properly. The watch can return an item with a null object. This
leads to a null pointer exception. When this happens, the provider needs
to restart the watch, because rerunning the watch from the same resource
version leads to the same result: yet another null pointer exception.

This commit changes the provider to handle null objects by restarting
the watch.

* review: add more coverage

This adds a bit more coverage to the K8sDruidNodeDiscoveryProvider watch
loop, and removes an unnecessary return.

* kubernetes: reduce logging verbosity

The log messages about items being NULL don't really deserve to be at a
level other than DEBUG since they are not actionable, particularly since
we automatically recover now. Move them to the DEBUG level.
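
The restart-on-null behaviour described in this commit can be sketched roughly as below. This is a minimal illustration, not the K8sDruidNodeDiscoveryProvider code: `WatchRestartSketch`, `Event`, `run`, and `maxRestarts` are hypothetical names, and a watch is modelled as a plain `Iterator` supplied by the caller.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names come from Druid or a Kubernetes client.
final class WatchRestartSketch {
    // A null payload models the "item with a null object" case.
    static final class Event {
        final String payload;
        Event(String payload) { this.payload = payload; }
    }

    /**
     * Consumes events from watches opened by openWatch. When an event has a null
     * payload, the current watch is abandoned and a fresh one is opened instead of
     * dereferencing the null. Stops when a watch ends normally or after maxRestarts.
     */
    static List<String> run(Supplier<Iterator<Event>> openWatch, int maxRestarts) {
        List<String> seen = new ArrayList<>();
        int restarts = 0;
        while (true) {
            Iterator<Event> watch = openWatch.get();
            boolean sawNull = false;
            while (watch.hasNext()) {
                Event e = watch.next();
                if (e == null || e.payload == null) {
                    sawNull = true; // not actionable; just restart the watch
                    break;
                }
                seen.add(e.payload);
            }
            if (!sawNull || restarts >= maxRestarts) {
                return seen;
            }
            restarts++;
        }
    }
}
```

The `maxRestarts` bound exists only so that the sketch terminates; whether a real watch loop should bound its restarts is a separate design decision.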

4 months agoFix error message for groupByEnableMultiValueUnnesting. (#12325)
Gian Merlino [Thu, 10 Mar 2022 19:37:24 +0000 (11:37 -0800)] 
Fix error message for groupByEnableMultiValueUnnesting. (#12325)

* Fix error message for groupByEnableMultiValueUnnesting.

It referred to the incorrect context parameter.

Also, create a dedicated exception class, to allow easier detection of this
specific error.

* Fix other test.

* More better error messages.

* Test getDimensionName method.

5 months agofix supervisor auto scaler config serde bug (#12317)
Parag Jain [Thu, 10 Mar 2022 00:17:12 +0000 (05:47 +0530)] 
fix supervisor auto scaler config serde bug (#12317)

5 months agoGit hooks should fail on errors; pass args to git hooks (#12322)
Jihoon Son [Thu, 10 Mar 2022 00:07:50 +0000 (09:07 +0900)] 
Git hooks should fail on errors; pass args to git hooks (#12322)

* Git hooks should fail on errors

* don't set shell to pass args

5 months agoReuse the InputEntityReader in SettableByteEntityReader (#12269)
Abhishek Agarwal [Wed, 9 Mar 2022 22:38:31 +0000 (04:08 +0530)] 
Reuse the InputEntityReader in SettableByteEntityReader (#12269)

* Reuse the InputEntityReader in SettableByteEntityReader

* Fix logic

* Fix kafka streaming ingestion

* Add Tests for kafka input format change

* Address review comments

5 months agopush value range and set index get operations into BitmapIndex (#12315)
Clint Wylie [Wed, 9 Mar 2022 21:30:58 +0000 (13:30 -0800)] 
push value range and set index get operations into BitmapIndex (#12315)

* push value range and set index get operations into BitmapIndex

* fix bug

* oops, fix better

* better like, fix test, javadocs

* fix checkstyle

* simplify and fixes

* cache

* fix tests

* move indexOf into GenericIndexed

* oops

* fix tests

5 months agoFix join query in case of filter explosion during CNF conversion (#12324)
Rohan Garg [Wed, 9 Mar 2022 20:43:09 +0000 (02:13 +0530)] 
Fix join query in case of filter explosion during CNF conversion (#12324)

5 months agoimprove FileWriteOutBytes.readFully (#12323)
Clint Wylie [Wed, 9 Mar 2022 19:45:45 +0000 (11:45 -0800)] 
improve FileWriteOutBytes.readFully (#12323)

* improve FileWriteOutBytes.readFully

* no need to flush if out of bounds

5 months agoFacilitate lazy initialization of connections to mitigate overwhelming of Coordinator...
AmatyaAvadhanula [Wed, 9 Mar 2022 17:47:43 +0000 (23:17 +0530)] 
Facilitate lazy initialization of connections to mitigate overwhelming of Coordinator (#12298)

Add config for eager / lazy connection initialization in ResourcePool

Description
Currently, when multiple tasks are launched, each of them eagerly initializes a full pool's worth of connections to the coordinator.

While this is acceptable when the parameter for number of eagerConnections (== maxSize) is small, this can be problematic in environments where it's a large value (say 1000) and multiple tasks are launched simultaneously, which can cause a large number of connections to be created to the coordinator, thereby overwhelming it.

Patch
Nodes like the broker may require eager initialization of resources and do not create connections with the Coordinator.
It is unnecessary to do this with other types of nodes.

A config parameter eagerInitialization is added, which when set to true, initializes the max permissible connections when ResourcePool is initialized.

If set to false, lazy initialization of connection resources takes place.

NOTE: All nodes except the broker have this new parameter set to false in the quickstart as part of this PR

Algorithm
The current implementation relies on the creation of maxSize resources eagerly.

The new implementation's behaviour is as follows:

1. If a resource has been previously created and is available, lend it.
2. Else if the number of created resources is less than the allowed parameter, create and lend it.
3. Else, wait for one of the lent resources to be returned.
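
The lending behaviour above can be sketched as follows. This is a simplified illustration under stated assumptions, not Druid's ResourcePool: the class `LazyPoolSketch` and its methods `take`, `giveBack`, and `createdCount` are hypothetical, and only the `eagerInitialization` flag and `maxSize` limit are taken from the description.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical, simplified pool; this is not Druid's ResourcePool.
final class LazyPoolSketch<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxSize;
    private int created = 0;

    LazyPoolSketch(Supplier<T> factory, int maxSize, boolean eagerInitialization) {
        this.factory = factory;
        this.maxSize = maxSize;
        if (eagerInitialization) {
            // Eager mode: create all maxSize resources up front.
            while (created < maxSize) {
                idle.push(factory.get());
                created++;
            }
        }
    }

    synchronized T take() {
        while (true) {
            if (!idle.isEmpty()) {
                return idle.pop();            // 1. a created resource is available: lend it
            }
            if (created < maxSize) {
                T resource = factory.get();   // 2. below the limit: create lazily and lend
                created++;
                return resource;
            }
            try {
                wait();                       // 3. at the limit: wait for a return
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for a resource", e);
            }
        }
    }

    synchronized void giveBack(T resource) {
        idle.push(resource);
        notifyAll(); // wake any thread blocked in take()
    }

    synchronized int createdCount() {
        return created;
    }
}
```

With `eagerInitialization` set to false, a freshly constructed pool holds no resources, so a process that never calls `take()` never opens a connection.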

5 months agoGuard against exponential increase of filters during CNF conversion (#12314)
Rohan Garg [Wed, 9 Mar 2022 07:49:52 +0000 (13:19 +0530)] 
Guard against exponential increase of filters during CNF conversion (#12314)

Currently, the CNF conversion of a filter is unbounded, which means that the number of filters it creates can grow exponentially, leading to OOMs in the Historical heap. We should throw an error or disable CNF conversion if the filter count starts getting out of hand. There are ways to do CNF conversion with only a linear increase in filters, but they have been left out of the scope of this change because those algorithms add new variables to the predicate, which can be contentious.
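
To illustrate why the filter count can explode and how a cap might be enforced, here is a minimal sketch. It assumes a toy filter tree with only AND, OR, and leaf nodes (no NOT); `BoundedCnfSketch`, `toCnf`, and `maxClauses` are hypothetical names, not Druid's filter classes or its actual limit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model of a filter tree; not Druid's Filter API.
final class BoundedCnfSketch {
    interface Node {}

    static final class Leaf implements Node {
        final String name;
        Leaf(String name) { this.name = name; }
    }

    static final class And implements Node {
        final List<Node> children;
        And(Node... children) { this.children = Arrays.asList(children); }
    }

    static final class Or implements Node {
        final List<Node> children;
        Or(Node... children) { this.children = Arrays.asList(children); }
    }

    private static void check(long clauses, int maxClauses) {
        if (clauses > maxClauses) {
            throw new IllegalStateException("CNF conversion exceeded " + maxClauses + " clauses");
        }
    }

    /** Returns the CNF as a list of clauses; each clause is a list of leaf names OR-ed together. */
    static List<List<String>> toCnf(Node node, int maxClauses) {
        List<List<String>> result = new ArrayList<>();
        if (node instanceof Leaf) {
            result.add(Collections.singletonList(((Leaf) node).name));
        } else if (node instanceof And) {
            // AND: clause lists are concatenated, so growth is additive.
            for (Node child : ((And) node).children) {
                result.addAll(toCnf(child, maxClauses));
                check(result.size(), maxClauses);
            }
        } else {
            // OR: distribute over AND, so clause counts multiply.
            result.add(new ArrayList<>()); // the empty clause is the identity for OR
            for (Node child : ((Or) node).children) {
                List<List<String>> childCnf = toCnf(child, maxClauses);
                check((long) result.size() * childCnf.size(), maxClauses); // fail before allocating
                List<List<String>> next = new ArrayList<>();
                for (List<String> left : result) {
                    for (List<String> right : childCnf) {
                        List<String> merged = new ArrayList<>(left);
                        merged.addAll(right);
                        next.add(merged);
                    }
                }
                result = next;
            }
        }
        return result;
    }
}
```

An OR of n two-way ANDs expands to 2^n clauses in this model, which is the kind of growth the guard is meant to catch. The check runs before the cross product is built, so the oversized result is never allocated.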

5 months agouse a non-concurrent map for lookups-cached-global unless incremental updates are...
Clint Wylie [Wed, 9 Mar 2022 05:54:25 +0000 (21:54 -0800)] 
use a non-concurrent map for lookups-cached-global unless incremental updates are actually required (#12293)

* use a non-concurrent map for lookups-cached-global unless incremental updates are actually required
* adjustments
* fix test

5 months agoBatch ingestion replace (#12137)
Agustin Gonzalez [Wed, 9 Mar 2022 03:07:02 +0000 (20:07 -0700)] 
Batch ingestion replace (#12137)

* Tombstone support for replace functionality

* A used segment interval is the interval of a current used segment that overlaps any of the input intervals for the spec

* Update compaction test to match replace behavior

* Adapt ITAutoCompactionTest to work with tombstones rather than dropping segments. Add support for tombstones in the broker.

* Style plus simple queriableindex test

* Add segment cache loader tombstone test

* Add more tests

* Add a method to the LogicalSegment to test whether it has any data

* Test filter with some empty logical segments

* Refactor more compaction/dropexisting tests

* Code coverage

* Support for all empty segments

* Skip tombstones when looking-up broker's timeline. Discard changes made to tool chest to avoid empty segments since they will no longer have empty segments after lookup because we are skipping over them.

* Fix null ptr when segment does not have a queriable index

* Add support for empty replace interval (all input data has been filtered out)

* Fixed coverage & style

* Find tombstone versions from lock versions

* Test failures & style

* Interner was making this fail since the two segments were considered equal due to their ids being equal

* Cleanup tombstone version code

* Force timeChunkLock whenever replace (i.e. dropExisting=true) is being used

* Reject replace spec when input intervals are empty

* Documentation

* Style and unit test

* Restore test code deleted by mistake

* Allocate forces TIME_CHUNK locking and uses lock versions. TombstoneShardSpec added.

* Unused imports. Dead code. Test coverage.

* Coverage.

* Prevent killer from throwing an exception for tombstones. This is the killer used in the peon for killing segments.

* Fix OmniKiller + more test coverage.

* Tombstones are now marked using a shard spec

* Drop a segment factory.json in the segment cache for tombstones

* Style

* Style + coverage

* style

* Add TombstoneLoadSpec.class to mapper in test

* Update core/src/main/java/org/apache/druid/segment/loading/TombstoneLoadSpec.java

Typo

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
* Update docs/configuration/index.md

Missing

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
* Typo

* Integrated replace with an existing test since the replace part was redundant and, more importantly, the test file was very close to or exceeding the 10 min default "no output" CI Travis threshold.

* Range does not work with multi-dim

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
5 months agoadjust topn heap operation when string is dictionary encoded, but not uniquely (...
Clint Wylie [Tue, 8 Mar 2022 22:32:40 +0000 (14:32 -0800)] 
adjust topn heap operation when string is dictionary encoded, but not uniquely (#12291)

* add topn heap optimization when string is dictionary encoded, but not uniquely

* use array instead

* is same

* fix javadoc

* fix

* Update StringTopNColumnAggregatesProcessor.java

5 months agoAdd git hooks that can run multiple scripts (#12300)
Jihoon Son [Tue, 8 Mar 2022 22:16:47 +0000 (07:16 +0900)] 
Add git hooks that can run multiple scripts (#12300)

* Add git hooks that can run multiple scripts

* scripts to install/uninstall hooks

* better message for uninstall; support pre-push params

5 months agoGroupBy: Cap dictionary-building selector memory usage. (#12309)
Gian Merlino [Tue, 8 Mar 2022 21:13:11 +0000 (13:13 -0800)] 
GroupBy: Cap dictionary-building selector memory usage. (#12309)

* GroupBy: Cap dictionary-building selector memory usage.

New context parameter "maxSelectorDictionarySize" controls when the
per-segment processing code should return early and trigger a trip
to the merge buffer.

Includes:

- Vectorized and nonvectorized implementations.
- Adjustments to GroupByQueryRunnerTest to exercise this code in
  the v2SmallDictionary suite. (Both the selector dictionary and
  the merging dictionary will be small in that suite.)
- Tests for the new config parameter.

* Fix issues from tests.

* Add "pre-existing" to dictionary.

* Simplify GroupByColumnSelectorStrategy interface by removing one of the writeToKeyBuffer methods.

* Adjustments from review comments.

5 months agoBreak up parallel indexing unit test to reduce test times (#12313)
Kashif Faraz [Mon, 7 Mar 2022 23:26:24 +0000 (04:56 +0530)] 
Break up parallel indexing unit test to reduce test times (#12313)

* Break up parallel indexing unit test to reduce test times

* Fix checkstyle

5 months agoAlways reopen stream in FileUtils.copyLarge, RetryingInputStream. (#12307)
Gian Merlino [Sat, 5 Mar 2022 22:39:14 +0000 (14:39 -0800)] 
Always reopen stream in FileUtils.copyLarge, RetryingInputStream. (#12307)

* Always reopen stream in FileUtils.copyLarge, RetryingInputStream.

When an InputStream throws an exception from one of its read methods,
we should assume it's bad and reopen it.

The main changes here are:

- In FileUtils.copyLarge, replace InputStream with InputStreamSupplier.
- In RetryingInputStream, collapse retryCondition and resetCondition
  into a single condition. Also, make it required, since every usage
  is passing in a specific condition anyway.

* Test fixes.

* Fix read impl.
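
The idea of treating a stream as bad after any read failure and reopening it can be sketched as below. This is an illustration only, not the FileUtils.copyLarge or RetryingInputStream implementation: `ReopenOnFailureSketch`, `StreamOpener`, `openAt`, and `maxTries` are hypothetical, and resuming at the already-copied offset is an assumption of this sketch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch; not the Druid FileUtils or RetryingInputStream code.
final class ReopenOnFailureSketch {
    /** Stand-in for a stream supplier: opens a fresh stream positioned at the given offset. */
    interface StreamOpener {
        InputStream openAt(long offset) throws IOException;
    }

    /** Copies everything to out, reopening the source whenever an IOException occurs. */
    static long copyWithReopen(StreamOpener opener, OutputStream out, int maxTries) {
        long copied = 0;
        byte[] buf = new byte[8192];
        IOException last = new IOException("maxTries must be positive");
        for (int attempt = 0; attempt < maxTries; attempt++) {
            // A new stream is opened on every attempt; a failed one is never reused.
            try (InputStream in = opener.openAt(copied)) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                    copied += n;
                }
                return copied;
            } catch (IOException e) {
                last = e; // assume the stream is bad; the next attempt reopens it
            }
        }
        throw new UncheckedIOException(last);
    }
}
```

For simplicity the sketch retries on every `IOException`, including failures from the output side; a real implementation would decide which exceptions are retryable through the required condition mentioned above.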

5 months agocorrect errors on compaction doc (#12308)
Victoria Lim [Fri, 4 Mar 2022 23:33:35 +0000 (15:33 -0800)] 
correct errors on compaction doc (#12308)

5 months agoOfficially support Java 11. (#12232)
Gian Merlino [Fri, 4 Mar 2022 22:15:45 +0000 (14:15 -0800)] 
Officially support Java 11. (#12232)

There aren't any changes in this patch that improve Java 11
compatibility; these changes have already been done separately. This
patch merely updates documentation and explicit Java version checks.

The log message adjustments in DruidProcessingConfig are there to make
things a little nicer when running in Java 11, where we can't measure
direct memory _directly_, and so we may auto-size processing buffers
incorrectly.

5 months agoRetain order in TaskReport. (#12005)
Gian Merlino [Fri, 4 Mar 2022 16:06:20 +0000 (08:06 -0800)] 
Retain order in TaskReport. (#12005)

5 months agoadd a new query laning metrics to visualize lane assignment (#12111)
Sandeep [Fri, 4 Mar 2022 07:21:17 +0000 (15:21 +0800)] 
add a new query laning metrics to visualize lane assignment (#12111)

* add a new query laning metrics to visualize lane assignment

* fixes :spotbugs check

* Update docs/operations/metrics.md

Co-authored-by: Benedict Jin <asdf2014@apache.org>
* Update server/src/main/java/org/apache/druid/server/QueryScheduler.java

Co-authored-by: Benedict Jin <asdf2014@apache.org>
* Update server/src/main/java/org/apache/druid/server/QueryScheduler.java

Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Benedict Jin <asdf2014@apache.org>
5 months agoSet Content-Type for String based response (#12295)
Frank Chen [Fri, 4 Mar 2022 07:17:03 +0000 (15:17 +0800)] 
Set Content-Type for String based response (#12295)

5 months agoFix ci (#12304)
Samarth Jain [Fri, 4 Mar 2022 07:05:50 +0000 (23:05 -0800)] 
Fix ci (#12304)

5 months agoBump jersey.version from 1.19.3 to 1.19.4 (#12290)
dependabot[bot] [Fri, 4 Mar 2022 01:57:20 +0000 (09:57 +0800)] 
Bump jersey.version from 1.19.3 to 1.19.4 (#12290)

* Bump jersey.version from 1.19.3 to 1.19.4

Bumps `jersey.version` from 1.19.3 to 1.19.4.

Updates `jersey-client` from 1.19.3 to 1.19.4

Updates `jersey-core` from 1.19.3 to 1.19.4

Updates `jersey-grizzly2` from 1.19.3 to 1.19.4

Updates `jersey-guice` from 1.19.3 to 1.19.4

Updates `jersey-server` from 1.19.3 to 1.19.4

Updates `jersey-servlet` from 1.19.3 to 1.19.4

Updates `jersey-json` from 1.19.3 to 1.19.4

Updates `jersey-test-framework-core` from 1.19.3 to 1.19.4

Updates `jersey-test-framework-grizzly2` from 1.19.3 to 1.19.4

---
updated-dependencies:
- dependency-name: com.sun.jersey:jersey-client
  dependency-type: direct:development
  update-type: version-update:semver-patch
- dependency-name: com.sun.jersey:jersey-core
  dependency-type: direct:development
  update-type: version-update:semver-patch
- dependency-name: com.sun.jersey:jersey-grizzly2
  dependency-type: direct:development
  update-type: version-update:semver-patch
- dependency-name: com.sun.jersey.contribs:jersey-guice
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: com.sun.jersey:jersey-server
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: com.sun.jersey:jersey-servlet
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: com.sun.jersey:jersey-json
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: com.sun.jersey.jersey-test-framework:jersey-test-framework-core
  dependency-type: direct:development
  update-type: version-update:semver-patch
- dependency-name: com.sun.jersey.jersey-test-framework:jersey-test-framework-grizzly2
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
* Update licenses.yaml

* Update licenses.yaml

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Clint Wylie <cwylie@apache.org>
5 months agouse virtual columns for sql simple aggregators instead of inline expressions (#12251)
Clint Wylie [Thu, 3 Mar 2022 23:05:28 +0000 (15:05 -0800)] 
use virtual columns for sql simple aggregators instead of inline expressions (#12251)

* use virtual columns for sql simple aggregators instead of inline expressions

* fixes

* always use virtual columns

* add more tests

5 months agoperf: eliminate expensive log construction in remote-task-runner shutdown (#12097)
Jason Koch [Thu, 3 Mar 2022 21:38:21 +0000 (13:38 -0800)] 
perf: eliminate expensive log construction in remote-task-runner shutdown (#12097)

5 months agoperf: improve RemoteTaskRunner task assignment loop performance (#12096)
Jason Koch [Wed, 2 Mar 2022 17:38:32 +0000 (09:38 -0800)] 
perf: improve RemoteTaskRunner task assignment loop performance (#12096)

* perf: improve ZkWorker task lookup performance

This improves the performance of the ZkWorker task lookup loop by
eliminating repeated calls to getRunningTasks() in toImmutable(),
and reduces the work performed in isRunningTask() by stream-parsing
only the id field instead of the entire JSON blob.

5 months agoDisplay row stats for multiphase parallel indexing tasks (#12280)
Tejaswini Bandlamudi [Wed, 2 Mar 2022 04:40:31 +0000 (10:10 +0530)] 
Display row stats for multiphase parallel indexing tasks (#12280)

Row stats are reported for single phase tasks in the `/liveReports` and `/rowStats` APIs
and are also a part of the overall task report. This commit adds changes to report
row stats for multiphase tasks too.

Changes:
- Add `TaskReport` in `GeneratedPartitionsReport` generated during hash and range partitioning
- Collect the reports for `index_generate` phase in `ParallelIndexSupervisorTask`

5 months agolatest datasketches-java-3.1.0 (#12224)
Alexander Saydakov [Wed, 2 Mar 2022 01:14:42 +0000 (17:14 -0800)] 
latest datasketches-java-3.1.0 (#12224)

These changes are to use the latest datasketches-java-3.1.0 and also to restore support for quantile and HLL4 sketches to be able to grow larger than a given buffer in a buffer aggregator and move to heap in rare cases. This was discussed in #11544.

Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
5 months agoMake ParseExceptions more informative (#12259)
Laksh Singla [Mon, 28 Feb 2022 17:01:15 +0000 (22:31 +0530)] 
Make ParseExceptions more informative (#12259)

This PR aims to make the ParseExceptions in Druid more informative by attaching additional information (metadata) about the exception to the ParseException. For example: the path of the file that caused the issue, and the line number where it can be easily fetched (as in CsvReader).

Following changes are addressed in this PR:

* A new class CloseableIteratorWithMetadata has been created, which is like CloseableIterator but also has a metadata method that returns a context Map<String, Object> about the current element returned by next().
* IntermediateRowParsingReader#read() now attaches the InputEntity and the "record number" that caused the exception (while parsing them), and IntermediateRowParsingReader#sample attaches the InputEntity (but not the "record number").
* TextReader (and its subclasses), which is a specific implementation of the IntermediateRowParsingReader, also includes the line number that caused the error.

This will also help in triaging issues when InputSourceReader generates a ParseException, because it can point to the specific InputEntity that caused the exception (while trying to read it).
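
A rough sketch of an iterator that exposes metadata about its current element is shown below. It is not the Druid class: `IteratorWithMetadataSketch`, `LineIteratorSketch`, the method name `currentMetadata()`, and the `lineNumber` key are assumptions chosen for illustration; only the `Map<String, Object>` return type comes from the description above.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the method name currentMetadata() is an assumption.
interface IteratorWithMetadataSketch<T> extends Iterator<T>, AutoCloseable {
    /** Context about the element most recently returned by next(). */
    Map<String, Object> currentMetadata();

    @Override
    void close(); // narrowed so callers need not handle a checked exception
}

final class LineIteratorSketch implements IteratorWithMetadataSketch<String> {
    private final Iterator<String> lines;
    private int lineNumber = 0;

    LineIteratorSketch(List<String> lines) {
        this.lines = lines.iterator();
    }

    @Override
    public boolean hasNext() {
        return lines.hasNext();
    }

    @Override
    public String next() {
        String line = lines.next();
        lineNumber++; // 1-based number of the line just returned
        return line;
    }

    @Override
    public Map<String, Object> currentMetadata() {
        return Collections.<String, Object>singletonMap("lineNumber", lineNumber);
    }

    @Override
    public void close() {
        // nothing to release in this in-memory sketch
    }

    /** Parses each line as an int; a failure reports where it happened via the metadata. */
    static int sum(IteratorWithMetadataSketch<String> it) {
        int total = 0;
        while (it.hasNext()) {
            String value = it.next();
            try {
                total += Integer.parseInt(value.trim());
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("Unparseable value at " + it.currentMetadata(), e);
            }
        }
        return total;
    }
}
```

Because the metadata travels with the iterator, the code that detects the parse failure does not need to track positions itself.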

5 months agoReplace use of PowerMock with Mockito (#12282)
Xavier Léauté [Mon, 28 Feb 2022 06:47:09 +0000 (22:47 -0800)] 
Replace use of PowerMock with Mockito (#12282)

Mockito now supports all our needs and plays much better with recent Java versions.
Migrating to Mockito also simplifies running the kind of tests that required PowerMock in the past.

* replace all uses of powermock with mockito-inline
* upgrade mockito to 4.3.1 and fix use of deprecated methods
* import mockito bom to align all our mockito dependencies
* add powermock to forbidden-apis to avoid accidentally reintroducing it in the future

5 months agoupdate airline dependency to 2.x (#12270)
Xavier Léauté [Sun, 27 Feb 2022 23:19:28 +0000 (15:19 -0800)] 
update airline dependency to 2.x (#12270)

* upgrade Airline to Airline 2
  https://github.com/airlift/airline is no longer maintained, updating to
  https://github.com/rvesse/airline (Airline 2) to use an actively
  maintained version, while minimizing breaking changes.

  Note, this is a backwards incompatible change, and extensions relying on
  the CliCommandCreator extension point will also need to be updated.

* fix dependency checks where jakarta.inject is now resolved first instead
  of javax.inject, due to Airline 2 using jakarta

5 months agoReduce use of mocking and simplify some tests (#12283)
Xavier Léauté [Sun, 27 Feb 2022 01:23:09 +0000 (17:23 -0800)] 
Reduce use of mocking and simplify some tests (#12283)

* remove use of mocks for ServiceMetricEvent
* simplify KafkaEmitterTests by moving to Mockito
* speed up KafkaEmitterTest by adjusting reporting frequency in tests
* remove unnecessary easymock and JUnitParams dependencies

5 months agoFixing hadoop 3 Dockerfile (#12284)
Karan Kumar [Sat, 26 Feb 2022 13:48:29 +0000 (19:18 +0530)] 
Fixing hadoop 3 Dockerfile (#12284)