Gian Merlino [Mon, 16 May 2022 16:42:31 +0000 (09:42 -0700)] 
Improved docs for range partitioning. (#12350)

* Improved docs for range partitioning.

1) Clarify the benefits of range partitioning.
2) Clarify which filters support pruning.
3) Include the fact that multi-value dimensions cannot be used for partitioning.

* Additional clarification.

* Update other section.

* Another adjustment.

* Updates from review.

Hellmar Becker [Mon, 16 May 2022 14:50:24 +0000 (16:50 +0200)] 
Clarify the use of the Lookup API (#12088)

* Update lookups.md

* Update docs/querying/lookups.md

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
* Update docs/querying/lookups.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

317brian [Mon, 16 May 2022 14:48:33 +0000 (07:48 -0700)]
docs(fix): clarify how worker.version and minWorkerVersion comparison works (#12459)

* docs(fix): clarify how worker.version and minWorkerVersion comparison works

* Revert "docs(fix): clarify how worker.version and minWorkerVersion comparison works"

This reverts commit cadd1fdc604de414379bffe9986ae64b9cf51fc6.

* docs(fix): clarify how worker.version and minWorkerVersion comparison works

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/configuration/index.md

fix spelling

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Gian Merlino [Mon, 16 May 2022 10:13:53 +0000 (03:13 -0700)]
Enable vectorized virtual column processing by default. (#12520)

In the majority of cases, this improves performance.

There's only one case I'm aware of where this may be a net negative: for time_floor(__time, <period>) where there are many repeated __time values. In nonvectorized processing, SingleLongInputCachingExpressionColumnValueSelector implements an optimization to avoid computing the time_floor function on every row. There is no such optimization in vectorized processing.

IMO, we shouldn't mention this in the docs. Rationale: it's too fiddly. It's not guaranteed that nonvectorized processing will be faster due to the optimization, because the optimization would have to overcome the inherent speed advantage of vectorization, so finding the best setting for a specific dataset would always require testing. It would be bad if users disabled vectorization thinking it would speed up their queries when it actually slowed them down. And even if users do their own testing, at some point in the future we'll implement the optimization for vectorized processing too, and users who explicitly disabled vectorization will likely keep it disabled. I'd like to avoid this outcome by encouraging all users to enable vectorization at all times. Really advanced users would be following development activity anyway, and can read this issue.
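The nonvectorized optimization mentioned above amounts to remembering the result for the last input value. A minimal sketch, assuming a single long input; `CachingLongFunction` and its simplified `timeFloor` helper are hypothetical stand-ins, not Druid's SingleLongInputCachingExpressionColumnValueSelector or its real time_floor implementation:

```java
import java.util.function.LongUnaryOperator;

// Hypothetical sketch: cache the output for the most recent input, so a run of
// repeated __time values only evaluates the wrapped function once.
final class CachingLongFunction {
  private final LongUnaryOperator fn;
  private boolean hasCached = false;
  private long lastInput;
  private long lastOutput;
  int evaluations = 0; // number of real calls to fn, for illustration only

  CachingLongFunction(LongUnaryOperator fn) {
    this.fn = fn;
  }

  long apply(long input) {
    if (!hasCached || input != lastInput) {
      lastInput = input;
      lastOutput = fn.applyAsLong(input);
      hasCached = true;
      evaluations++;
    }
    return lastOutput;
  }

  // Simplified time_floor for a fixed period in milliseconds
  // (ignores time zones and calendar periods such as P1M).
  static long timeFloor(long millis, long periodMillis) {
    return Math.floorDiv(millis, periodMillis) * periodMillis;
  }

  public static void main(String[] args) {
    long hour = 3_600_000L;
    CachingLongFunction f = new CachingLongFunction(t -> timeFloor(t, hour));
    long[] times = {1_000L, 1_000L, 1_000L, 7_200_500L, 7_200_500L};
    for (long t : times) {
      System.out.println(t + " -> " + f.apply(t));
    }
    System.out.println("evaluations: " + f.evaluations);
  }
}
```

With five rows and two distinct timestamps, the function runs twice. Whether that saving beats vectorized processing depends on the data, which is the reason given above for not documenting it.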

2 months agoEnforce console logging for peon process (#12067)
Frank Chen [Mon, 16 May 2022 09:37:21 +0000 (17:37 +0800)] 
Enforce console logging for peon process (#12067)

Currently, all Druid processes share the same log4j2 configuration file located in the _common directory. Since peon processes are spawned by the middle manager process, they inherit their environment variables from the middle manager. These include the variables in log4j2.xml that control which file the logger writes to.

However, the current task logging mechanism requires peon processes to write their logs to the console, so that the middle manager can redirect the console output to a file and upload that file to task log storage.

So this PR enforces that requirement for peon processes: whatever the configuration in the shared log4j2.xml, peon processes always write their logs to the console.

2 months agoAdd setProcessingThreadNames context parameter. (#12514)
Gian Merlino [Mon, 16 May 2022 08:12:00 +0000 (01:12 -0700)] 
Add setProcessingThreadNames context parameter. (#12514)

Setting thread names takes a measurable amount of time when segment scans are very quick. In high-QPS testing, we found a slight performance boost from turning off processing thread renaming. This option makes that possible.
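A minimal sketch of how such an option can work, assuming a plain boolean stands in for the setProcessingThreadNames context parameter; `runWithThreadName` is a hypothetical helper, not Druid's processing code:

```java
import java.util.function.Supplier;

// Hypothetical sketch: rename the worker thread for the duration of some work
// only when asked to, and always restore the original name afterwards.
final class ThreadNaming {
  static <T> T runWithThreadName(boolean setThreadName, String name, Supplier<T> work) {
    if (!setThreadName) {
      return work.get(); // skip both setName calls entirely
    }
    Thread current = Thread.currentThread();
    String originalName = current.getName();
    current.setName(name);
    try {
      return work.get();
    } finally {
      current.setName(originalName);
    }
  }

  public static void main(String[] args) {
    String seen = runWithThreadName(true, "processing-segment-1", () -> Thread.currentThread().getName());
    System.out.println(seen + " / restored: " + Thread.currentThread().getName());
  }
}
```

When the flag is off, the scan skips two setName calls per segment, which is the overhead the high-QPS testing above measured.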

Jason Koch [Sat, 14 May 2022 23:44:29 +0000 (16:44 -0700)] 
Task queue unblock (#12099)

* concurrency: introduce GuardedBy to TaskQueue

* perf: Introduce TaskQueueScaleTest to test performance of TaskQueue with large task counts

This introduces a test case to confirm how long it will take to launch and manage (aka shutdown)
a large number of threads in the TaskQueue.

h/t to @gianm for main implementation.

* perf: improve scalability of TaskQueue with large task counts

* linter fixes, expand test coverage

* pr feedback suggestion; swap to different linter

* swap to use SuppressWarnings

* Fix TaskQueueScaleTest.

Co-authored-by: Gian Merlino <gian@imply.io>

Kashif Faraz [Fri, 13 May 2022 05:58:15 +0000 (11:28 +0530)]
Use datasketches version 3.2.0 (#12509)

- Use apache datasketches version 3.2.0.
- Remove unsafe reflection-based usage of datasketches internals added in #12022

Adarsh Sanjeev [Fri, 13 May 2022 05:26:40 +0000 (10:56 +0530)] 
Add replace statement to sql parser (#12386)

Relevant Issue: #11929

- Add custom replace statement to Druid SQL parser.
- Edit DruidPlanner to convert relevant fields to Query Context.
- Refactor common code with INSERT statements to reuse them for REPLACE where possible.

Abhishek Radhakrishnan [Thu, 12 May 2022 05:06:20 +0000 (01:06 -0400)] 
Add IPAddress java library as dependency and migrate IPv4 functions to use the new library. (#11634)

* Add ipaddress library as dependency.

* IPv4 functions to use the inet.ipaddr package.

* Remove unused imports.

* Add new function.

* Minor rename.

* Add more unit tests.

* IPv4 address expr utils unit tests and address options.

* Adjust the IPv4Util functions.

* Move the UTs a bit around.

* Javadoc comments.

* Add license info for IPAddress.

* Fix groupId, artifact and version in license.yaml.

* Remove redundant subnet in messages - fixes UT.

* Remove unused commons-net dependency for /processing project.

* Make class and methods public so it can be accessed.

* Add initial version of benchmark

* Add subnetutils package for benchmarks.

* Auto generate ip addresses.

* Add more v4 address representations in setup to avoid bias.

* Use ThreadLocalRandom to avoid forbidden API usage.

* Adjust IPv4AddressBenchmark to adhere to codestyle rules.

* Update ipaddress library to latest 5.3.4

* Add ipaddress package dependency to benchmarks project.

Clint Wylie [Wed, 11 May 2022 06:27:08 +0000 (23:27 -0700)] 
remake column indexes and query processing of filters (#12388)

Following up on #12315, which pushed most of the logic of building ImmutableBitmap into BitmapIndex in order to hide the details of how column indexes are implemented from the Filter implementations, this PR totally refashions how Filters consume indexes. The end result, while a rather dramatic reshuffling of the existing code, should be extraordinarily flexible. It should eventually allow us to model any type of index we can imagine and provide the machinery to build the filters that use them, while also allowing other column implementations to implement the built-in index types, providing adapters so that the current set of filters Druid provides can make use of their indexing.

Lucas Capistrant [Wed, 11 May 2022 02:05:15 +0000 (21:05 -0500)] 
Allow coordinator to be configured to kill segments in future (#10877)

Allow a Druid cluster to kill segments whose interval_end is a date in the future. This can be done by setting druid.coordinator.kill.durationToRetain to a negative period. For example, PT-24H would allow segments to be killed if their interval_end date was 24 hours or less into the future at the time that the kill task is generated by the system.

A cluster operator can also disregard druid.coordinator.kill.durationToRetain entirely by setting a new configuration, druid.coordinator.kill.ignoreDurationToRetain=true. This ignores the interval_end date when looking for segments to kill, so any segment marked unused can be killed. This new configuration is off by default, and a cluster operator should fully understand and accept the risks before enabling it.
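The retention rule above can be modeled with java.time. This is a simplified sketch, assuming the segment is already marked unused and is eligible when its interval_end is at or before `now - durationToRetain`; `isKillEligible` is hypothetical, and Druid itself uses Joda-Time periods rather than java.time.Duration:

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical model of the kill-eligibility cutoff; not Druid's coordinator code.
final class KillCutoff {
  static boolean isKillEligible(
      Instant intervalEnd, Instant now, Duration durationToRetain, boolean ignoreDurationToRetain) {
    if (ignoreDurationToRetain) {
      return true; // interval_end is not consulted at all
    }
    // A negative duration such as PT-24H moves the cutoff into the future.
    Instant cutoff = now.minus(durationToRetain);
    return !intervalEnd.isAfter(cutoff);
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2022-05-11T00:00:00Z");
    Duration retain = Duration.parse("PT-24H");
    Instant in12h = now.plus(Duration.ofHours(12));
    Instant in48h = now.plus(Duration.ofHours(48));
    System.out.println(isKillEligible(in12h, now, retain, false)); // true
    System.out.println(isKillEligible(in48h, now, retain, false)); // false
    System.out.println(isKillEligible(in48h, now, retain, true));  // true
  }
}
```

With PT-24H, the cutoff is 24 hours after `now`, so a segment ending 12 hours in the future qualifies while one ending 48 hours out does not, unless ignoreDurationToRetain is set.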

Kashif Faraz [Tue, 10 May 2022 12:05:59 +0000 (05:05 -0700)] 
Docs: Fix column name in ingestion rollup doc (#12036)

Fix the referenced column name from "count" to "num_rows", as "count" vs. "COUNT(*)" might be a little confusing in this example.

Rohan Garg [Tue, 10 May 2022 09:53:42 +0000 (15:23 +0530)] 
Add feature flag for sql planning of TimeBoundary queries (#12491)

* Add feature flag for sql planning of TimeBoundary queries

* fixup! Add feature flag for sql planning of TimeBoundary queries

* Add documentation for enableTimeBoundaryPlanning

* fixup! Add documentation for enableTimeBoundaryPlanning

somu-imply [Tue, 10 May 2022 00:02:38 +0000 (17:02 -0700)] 
Vectorized version of string last aggregator (#12493)

* Vectorized version of string last aggregator

* Updating string last and adding testcases

* Updating code and adding testcases for serializable pairs

* Addressing review comments

Rohan Garg [Mon, 9 May 2022 17:40:17 +0000 (23:10 +0530)] 
Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation (#12484)

* Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation

* fixup! Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation

* Document vectorized dimension

Atul Mohan [Thu, 5 May 2022 22:31:21 +0000 (15:31 -0700)] 
Add daily stats to console (#12329)

Vadim Ogievetsky [Thu, 5 May 2022 22:06:59 +0000 (15:06 -0700)] 
Web console: add a button to get out of restricted mode, make capability detection more robust (#12503)

* allow unrestrict

* update tests

Victoria Lim [Tue, 3 May 2022 23:22:25 +0000 (16:22 -0700)] 
Update automatic compaction docs with consistent terminology (#12416)

* specify automatic compaction where applicable

* Apply suggestions from code review

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
* update for style and consistency

* implement suggested feedback

* remove duplicate example

* Apply suggestions from code review

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
* Update docs/ingestion/compaction.md

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
* Update docs/operations/api-reference.md

* update .spelling

* Adopt review suggestions

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>

Naya Chen [Tue, 3 May 2022 19:25:51 +0000 (12:25 -0700)]
add aws-java-sdk-sts to aws-common classpath (#12482)

Fixes #11303

WebIdentityTokenProvider in the defaultAWSCredentialsProviderChain cannot actually be used, because the aws-java-sdk-sts jar is not on the classpath of the S3 extension at runtime (each extension has its own classpath). As a result, the STS role cannot be assumed before the authentication token is generated.
The error message from getCredentials() is:

"Unable to load credentials from WebIdentityTokenCredentialsProvider: To use assume role profiles the aws-java-sdk-sts module must be on the class path"

This PR fixes multiple authentication modules that depend on WebIdentityTokenProvider, including AWS IAM-based RDS authentication and S3 authentication.

Vadim Ogievetsky [Tue, 3 May 2022 19:08:08 +0000 (12:08 -0700)] 
Web console: Misc table fixes (#12489)

* Misc table fixes

* extract default className

* table spacing updates

* fix e2e action selector

* try more times

* make the web console exist again

zachjsh [Tue, 3 May 2022 08:00:36 +0000 (04:00 -0400)] 
Fix broken ForkingTaskRunnerTest (#12499)

A recent commit broke this test. This PR fixes the test.

Rocky Chen [Tue, 3 May 2022 03:47:25 +0000 (20:47 -0700)] 
Add a metric for task duration in the pending queue (#12492)

This PR measures how long a task stays in the pending queue and emits the value as the metric task/pending/time. The metric is measured in RemoteTaskRunner and HttpRemoteTaskRunner.

An example of the metric:

2022-04-26T21:59:09,488 INFO [rtr-pending-tasks-runner-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2022-04-26T21:59:09.487Z","service":"druid/coordinator","host":"localhost:8081","version":"2022.02.0-iap-SNAPSHOT","metric":"task/pending/time","value":8,"dataSource":"wikipedia","taskId":"index_parallel_wikipedia_gecpcglg_2022-04-26T21:59:09.432Z","taskType":"index_parallel"}

Key changed/added classes in this PR

    Emit metric task/pending/time in classes RemoteTaskRunner and HttpRemoteTaskRunner.
    Update related factory classes and tests.
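The measurement itself boils down to recording when a task enters the pending queue and subtracting when it leaves. A minimal sketch, assuming a hypothetical `PendingTaskTimer` with an injectable clock; the real change lives in RemoteTaskRunner and HttpRemoteTaskRunner and emits through Druid's metric emitter, which is not modeled here:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch of measuring task/pending/time; not Druid's task runner code.
final class PendingTaskTimer {
  private final Map<String, Long> enqueuedAtMillis = new ConcurrentHashMap<>();
  private final LongSupplier clock;

  PendingTaskTimer(LongSupplier clock) {
    this.clock = clock;
  }

  /** Call when a task is added to the pending queue. */
  void onPending(String taskId) {
    enqueuedAtMillis.putIfAbsent(taskId, clock.getAsLong());
  }

  /** Call when a task leaves the pending queue; returns ms spent pending, or -1 if unknown. */
  long onAssigned(String taskId) {
    Long start = enqueuedAtMillis.remove(taskId);
    return start == null ? -1L : clock.getAsLong() - start;
  }

  public static void main(String[] args) {
    long[] fakeNow = {1_000L};
    PendingTaskTimer timer = new PendingTaskTimer(() -> fakeNow[0]);
    timer.onPending("example_task"); // hypothetical task ID
    fakeNow[0] += 8; // pretend 8 ms pass before a worker picks the task up
    System.out.println("task/pending/time=" + timer.onAssigned("example_task"));
  }
}
```

Injecting the clock keeps the sketch deterministic; a real runner would use System.currentTimeMillis() or similar.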

Nishant Bangarwa [Mon, 2 May 2022 16:43:19 +0000 (09:43 -0700)] 
Update maven assembly plugin for druid-benchmarks (#12487)

Lucas Capistrant [Mon, 2 May 2022 13:40:44 +0000 (08:40 -0500)] 
Add authentication call before cleaning up intermediate files in hadoop ingestions (#12030)

* Add authentication call before cleaning up intermediate files in hadoop ingestions

* fix checkstyle

* remove debug log

aggarwalakshay [Sun, 1 May 2022 14:45:58 +0000 (07:45 -0700)] 
Upgrade dependency-check-maven to 7.0.4 (#12441)

317brian [Sun, 1 May 2022 14:44:31 +0000 (07:44 -0700)] 
docs: fix typo (#12494)

MC-JY [Sun, 1 May 2022 14:43:11 +0000 (22:43 +0800)] 
Improve build performance of modules (#12486)

* improve build performance of modules

* improve build performance of modules

* Update pom.xml

* improve build performance of modules

Tejaswini Bandlamudi [Sun, 1 May 2022 05:56:16 +0000 (11:26 +0530)] 
Improve error messages when URI points to a file that doesn't exist (#12490)

Gian Merlino [Fri, 29 Apr 2022 06:21:13 +0000 (23:21 -0700)] 
GroupBy: Reduce allocations by reusing entry and key holders. (#12474)

* GroupBy: Reduce allocations by reusing entry and key holders.

Two main changes:

1) Reuse Entry objects returned by various implementations of

2) Reuse key objects contained within those Entry objects.

This is allowed by the contract, which states that entries must be
processed and immediately discarded. However, not all call sites
respected this, so this patch also updates those call sites.

One particularly sneaky way that the old code retained entries too long
was through Guava's MergingIterator and CombiningIterator. Internally,
these both advance to the next value prior to returning the current
value. So, this patch addresses that in two ways:

1) For merging, we have our own implementation MergeIterator already,
   although it had the same problem. So, this patch updates our
   implementation to return the current item prior to advancing to the
   next item. It also adds a forbidden-api entry to ensure that this
   safer implementation is used instead of Guava's.

2) For combining, we address the problem in a different way: by copying
   the key when creating the new, combined entry.
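The look-ahead hazard can be reproduced with plain java.util iterators. In this sketch, `reusing` hands out one mutable holder for every element (standing in for a reused Entry), and `advanceFirst` is a hypothetical wrapper that, like the iterators described above, fetches the next element before returning the current one. Neither is Druid's MergeIterator nor a Guava class:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class ReusedEntryHazard {
  /** Mutable holder that the source iterator reuses for every element. */
  static final class Entry {
    int key;
  }

  static Iterator<Entry> reusing(int... keys) {
    Entry holder = new Entry();
    return new Iterator<Entry>() {
      private int i = 0;

      @Override
      public boolean hasNext() {
        return i < keys.length;
      }

      @Override
      public Entry next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        holder.key = keys[i++]; // overwrite the shared holder in place
        return holder;
      }
    };
  }

  /** Wrapper that advances its delegate before returning the current element. */
  static <T> Iterator<T> advanceFirst(Iterator<T> delegate) {
    return new Iterator<T>() {
      private boolean hasPeeked = delegate.hasNext();
      private T peeked = hasPeeked ? delegate.next() : null;

      @Override
      public boolean hasNext() {
        return hasPeeked;
      }

      @Override
      public T next() {
        if (!hasPeeked) {
          throw new NoSuchElementException();
        }
        T current = peeked;
        hasPeeked = delegate.hasNext();
        peeked = hasPeeked ? delegate.next() : null; // mutates `current` when objects are reused
        return current;
      }
    };
  }

  static List<Integer> readKeys(Iterator<Entry> it) {
    List<Integer> keys = new ArrayList<>();
    while (it.hasNext()) {
      keys.add(it.next().key); // consumed immediately, as the contract requires
    }
    return keys;
  }

  public static void main(String[] args) {
    System.out.println(readKeys(reusing(1, 2, 3)));               // [1, 2, 3]
    System.out.println(readKeys(advanceFirst(reusing(1, 2, 3)))); // [2, 3, 3]
  }
}
```

Even a caller that reads each entry immediately sees shifted keys once a look-ahead wrapper sits in between, which is why returning the current item before advancing (or copying the key) matters.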

* Attempt to fix test.

* Remove unused import.

Charles Smith [Thu, 28 Apr 2022 23:36:54 +0000 (16:36 -0700)] 
remove arbitrary granularity spec from docs (#12460)

* remove arbitrary granularity spec from docs

* Update docs/ingestion/ingestion-spec.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

Frank Chen [Thu, 28 Apr 2022 02:20:16 +0000 (10:20 +0800)]
Improve exception message for native binary operators (#12335)

* Improve exception message

* Update message

Gian Merlino [Wed, 27 Apr 2022 21:20:35 +0000 (14:20 -0700)] 
DimensionRangeShardSpec speed boost. (#12477)

* DimensionRangeShardSpec speed boost.

Calling isEmpty() and equals() on RangeSets is expensive, because these
fall back on default implementations that call size(). And size() is
_also_ a default implementation that iterates the entire collection.
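The same cost pattern shows up with java.util.AbstractCollection, whose default isEmpty() is documented as returning size() == 0. In this sketch, `CountingCollection` is a hypothetical collection whose size() has to walk every element, and it counts how many elements each emptiness check touches. It only illustrates the cost described above; Guava's RangeSet is not used, and this is not the code from the patch:

```java
import java.util.AbstractCollection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

final class CountingCollection extends AbstractCollection<Integer> {
  private final List<Integer> backing;
  int elementsVisited = 0; // how many elements iteration has stepped over

  CountingCollection(List<Integer> backing) {
    this.backing = backing;
  }

  @Override
  public Iterator<Integer> iterator() {
    Iterator<Integer> delegate = backing.iterator();
    return new Iterator<Integer>() {
      @Override
      public boolean hasNext() {
        return delegate.hasNext();
      }

      @Override
      public Integer next() {
        elementsVisited++;
        return delegate.next();
      }
    };
  }

  @Override
  public int size() {
    // No cheap count available: walk the whole collection.
    int n = 0;
    for (Iterator<Integer> it = iterator(); it.hasNext(); it.next()) {
      n++;
    }
    return n;
  }

  /** Emptiness check that looks at the iterator instead of calling size(). */
  boolean isEmptyViaIterator() {
    return !iterator().hasNext();
  }

  public static void main(String[] args) {
    CountingCollection c = new CountingCollection(Collections.nCopies(100_000, 7));
    c.isEmpty(); // inherited default: size() == 0
    System.out.println("isEmpty() visited " + c.elementsVisited);
    c.elementsVisited = 0;
    c.isEmptyViaIterator();
    System.out.println("isEmptyViaIterator() visited " + c.elementsVisited);
  }
}
```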

* Fix and test from code review.

Gian Merlino [Wed, 27 Apr 2022 21:17:26 +0000 (14:17 -0700)] 
Reduce allocations due to Jackson serialization. (#12468)

* Reduce allocations due to Jackson serialization.

This patch attacks two sources of allocations during Jackson serialization:

1) ObjectMapper.writeValue and JsonGenerator.writeObject create a new
   DefaultSerializerProvider instance for each call. It has lots of
   fields and creates pressure on the garbage collector. So, this patch
   adds helper functions in JacksonUtils that enable reuse of
   SerializerProvider objects and updates various call sites to make
   use of this.

2) GroupByQueryToolChest copies the ObjectMapper for every query to
   install a special module that supports backwards compatibility with
   map-based rows. This isn't needed if resultAsArray is set and
   all servers are running Druid 0.16.0 or later. This release was a
   while ago. So, this patch disables backwards compatibility by default,
   which eliminates the need to copy the heavyweight ObjectMapper. The
   patch also introduces a configuration option that allows admins to
   explicitly enable backwards compatibility.

* Add test.

* Update additional call sites and add to forbidden APIs.

Gian Merlino [Wed, 27 Apr 2022 21:17:07 +0000 (14:17 -0700)] 
SQL: Create millisecond precision timestamp literals. (#12407)

* SQL: Create millisecond precision timestamp literals.

Fixes a bug where implicit casts of strings to timestamps would use seconds
precision rather than milliseconds. The new test case
exercises this.

* Update sql/src/main/java/org/apache/druid/sql/calcite/planner/Calcites.java

Co-authored-by: Frank Chen <frankchen@apache.org>
* Correct precision handling.

- Set default precision to 3 (millis) for things involving timestamps.
- Respect precision specified in types when available.
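What the precision difference costs can be seen with java.time: parse a timestamp string with a fractional part, then compare epoch milliseconds after truncating to seconds (the old behavior described above) versus milliseconds (precision 3). This models only the arithmetic, not Calcite's type system or Druid's Calcites helper, and assumes UTC:

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

final class TimestampPrecision {
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

  /** Epoch millis of a UTC timestamp string after truncating to the given unit. */
  static long toEpochMillis(String literal, ChronoUnit precision) {
    LocalDateTime parsed = LocalDateTime.parse(literal, FORMAT);
    return parsed.truncatedTo(precision).toInstant(ZoneOffset.UTC).toEpochMilli();
  }

  public static void main(String[] args) {
    String literal = "2000-01-01 00:00:00.123";
    System.out.println(toEpochMillis(literal, ChronoUnit.SECONDS)); // 946684800000
    System.out.println(toEpochMillis(literal, ChronoUnit.MILLIS));  // 946684800123
  }
}
```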

* Silence, checkstyle.

Co-authored-by: Frank Chen <frankchen@apache.org>

Gian Merlino [Wed, 27 Apr 2022 18:18:40 +0000 (11:18 -0700)]
JvmMonitor: Handle more generation and collector scenarios. (#12469)

* JvmMonitor: Handle more generation and collector scenarios.

ZGC on Java 11 only has a generation 1 (there is no 0). This causes
a NullPointerException when trying to extract the spacesCount for
generation 0. In addition, ZGC on Java 15 has a collector number 2
but no spaces in generation 2, which breaks the assumption that
collectors always have same-numbered spaces.

This patch adjusts things to be more robust, enabling the JvmMonitor
to work properly for ZGC on both Java 11 and 15.
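One way to avoid assuming that collector N pairs with generation N is to key everything by name with the standard java.lang.management API. A minimal sketch, not Druid's JvmMonitor (which the text above describes in terms of generations and spaces); the names it prints depend on the JVM version and the selected collector:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GcPools {
  /** Maps each collector's name to the memory pools it manages, without index assumptions. */
  static Map<String, List<String>> collectorPools() {
    Map<String, List<String>> result = new LinkedHashMap<>();
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      // The set of pools varies by collector and JVM version, so look them up by name.
      result.put(gc.getName(), Arrays.asList(gc.getMemoryPoolNames()));
    }
    return result;
  }

  public static void main(String[] args) {
    collectorPools().forEach((collector, pools) ->
        System.out.println(collector + " -> " + (pools.isEmpty() ? "(no pools)" : pools)));
  }
}
```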

* Test adjustments.

* Improve surefire arglines.

* Need a placeholder

Gian Merlino [Wed, 27 Apr 2022 17:52:20 +0000 (10:52 -0700)] 
For the various Yielder objects, don't create new Yielders and instead mutate state. (#12475)

Co-authored-by: imply-cheddar <86940447+imply-cheddar@users.noreply.github.com>

Abhishek Agarwal [Wed, 27 Apr 2022 08:58:20 +0000 (14:28 +0530)]
Bump up the versions (#12480)

Adarsh Sanjeev [Wed, 27 Apr 2022 06:55:49 +0000 (12:25 +0530)] 
Validate select columns for insert statement (#12431)

Unnamed columns in the SELECT part of INSERT SQL statements currently create a table with auto-generated column names such as "EXPR$3". This PR adds a validation check for such columns.
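A minimal sketch of such a check, assuming the output column names are available as a list and that unnamed expressions carry generated names of the form `EXPR$<n>`; `findUnnamedColumns` is a hypothetical helper, not the validation code from the PR:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class InsertColumnCheck {
  private static final Pattern GENERATED_NAME = Pattern.compile("EXPR\\$\\d+");

  /** Returns the column names that look auto-generated rather than user-supplied. */
  static List<String> findUnnamedColumns(List<String> columnNames) {
    return columnNames.stream()
        .filter(name -> GENERATED_NAME.matcher(name).matches())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Hypothetical output columns for a SELECT with one unaliased expression.
    List<String> columns = Arrays.asList("__time", "page", "EXPR$2");
    List<String> unnamed = findUnnamedColumns(columns);
    if (!unnamed.isEmpty()) {
      System.out.println("Unnamed columns " + unnamed + "; give them aliases with AS");
    }
  }
}
```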

somu-imply [Tue, 26 Apr 2022 18:33:08 +0000 (11:33 -0700)] 
Vectorize numeric latest aggregators (#12439)

* Vectorizing Latest aggregator Part 1

* Updating benchmark tests

* Changing appropriate logic for vectors for null handling

* Introducing an abstract class and moving the commonalities there

* Adding vectorization for StringLast aggregator (initial version)

* Updated bufferized version of numeric aggregators

* Adding some javadocs

* Making sure this PR vectorizes numeric latest agg only

* Adding another benchmarking test

* Fixing intellij inspections

* Adding tests for double

* Adding test cases for long and float

* Updating testcases

* Checkstyle oops..

* One tiny change in test case

* Fixing spotbug and rhs not being used

zachjsh [Tue, 26 Apr 2022 16:44:44 +0000 (12:44 -0400)] 
Worker level task metrics (#12446)

* * fix metric name inconsistency

* * add task slot metrics for middle managers

* * add new WorkerTaskCountStatsMonitor to report task count metrics
  from worker

* * more stuff

* * remove unused variable

* * more stuff

* * add javadocs

* * fix checkstyle

* * fix hadoop test failure

* * cleanup

* * add more code coverage in tests

* * fix test failure

* * add docs

* * increase code coverage

* * fix spelling

* * fix failing tests

* * remove dead code

* * fix spelling

Will Xu [Tue, 26 Apr 2022 14:44:40 +0000 (07:44 -0700)] 
Enable Arm builds (#12451)

This PR enables ARM builds on Travis. I've ported over the changes from @martin-g on reducing heap requirements for some of the tests to ensure they run well on Travis arm instances.

Rohan Garg [Mon, 25 Apr 2022 15:18:58 +0000 (20:48 +0530)] 
Convert simple min/max SQL queries on __time to timeBoundary queries (#12472)

* Support array based results in timeBoundary query

* Fix bug with query interval in timeBoundary

* Convert min(__time) and max(__time) SQL queries to timeBoundary

* Add tests for timeBoundary backed SQL queries

* Fix query plans for existing tests

* fixup! Convert min(__time) and max(__time) SQL queries to timeBoundary

* fixup! Add tests for timeBoundary backed SQL queries

* fixup! Fix bug with query interval in timeBoundary

Peter Marshall [Mon, 25 Apr 2022 13:44:17 +0000 (14:44 +0100)] 
Update native-batch.md (#12478)

Fixed indent on the Granularity Spec section and removed some superfluous tabbings.

Apoorv Gupta [Sat, 23 Apr 2022 03:35:08 +0000 (20:35 -0700)] 
Fix formatting in stats.md (#12470)

* Fix formatting in stats.md

* Update stats.md

* Update docs/development/extensions-core/stats.md

Co-authored-by: Frank Chen <frankchen@apache.org>
* Update docs/development/extensions-core/stats.md

Co-authored-by: Frank Chen <frankchen@apache.org>
Co-authored-by: Frank Chen <frankchen@apache.org>

Didip Kerabat [Fri, 22 Apr 2022 18:44:05 +0000 (11:44 -0700)]
Metrics for shenandoah based on this source code: https://github.com/openjdk/jdk/blob/554caf33a01ac9ca2e3e9170557e8348750f3971/src/hotspot/share/gc/shenandoah/shenandoahMonitoringSupport.cpp#L65 (#12369)

Co-authored-by: Didip Kerabat <didip@apple.com>

Gian Merlino [Fri, 22 Apr 2022 18:22:34 +0000 (11:22 -0700)]
QueryScheduler: Log per-query message at DEBUG level. (#12467)

We generally want to avoid having any routine per-query messages at
INFO level, because they pollute logs.

Victoria Lim [Fri, 22 Apr 2022 02:28:49 +0000 (19:28 -0700)] 
stringFirst and stringLast supported in ingestion (#12466)

Victoria Lim [Thu, 21 Apr 2022 18:19:39 +0000 (11:19 -0700)] 
updated docs for sql query context (#12406)

Tejaswini Bandlamudi [Thu, 21 Apr 2022 15:48:20 +0000 (21:18 +0530)] 
Supress CVE 2022 26612 (#12463)

* supress CVE-2022-26612

* adding packageUrl

* suppressing CVE-2022-26612

* adding packageUrl

* moving to hadoop section

Jihoon Son [Thu, 21 Apr 2022 08:51:16 +0000 (01:51 -0700)] 
Add support for authorizing query context params (#12396)

The query context is a way for the user to give hints to the Druid query engine, either to enforce a certain behavior or at least to make the query engine prefer a certain plan during query planning. Today, there are 3 types of query context params:

Default context params. They are set via druid.query.default.context in runtime properties. Any user context params can be default params.
User context params. They are set in the user query request. See https://druid.apache.org/docs/latest/querying/query-context.html for parameters.
System context params. They are set by the Druid query engine during query processing. These params override other context params.
Today, users are allowed to set any context params. This can cause
1) a bad UX if the context param is not mature yet, or
2) in the worst case, query failure or a system fault if a sensitive param is abused, e.g., maxSubqueryRows.

This PR adds the ability to limit context params per user role. This means a query will fail if it sets a context param that you are not allowed to use. To do that, this PR adds a new built-in resource type, QUERY_CONTEXT. The resource to authorize has the name of the context param (such as maxSubqueryRows) and the type QUERY_CONTEXT. To allow a certain context param for a user, grant the user WRITE permission on that context param resource. Here is an example of the permission.

  {
    "resourceAction" : {
      "resource" : {
        "name" : "maxSubqueryRows",
        "type" : "QUERY_CONTEXT"
      },
      "action" : "WRITE"
    },
    "resourceNamePattern" : "maxSubqueryRows"
  }
Each role can have multiple permissions for context params. Each permission should be set for different context params.

When a query is issued with a query context X, the query will fail if the user who issued the query does not have WRITE permission on the query context X. In this case,

HTTP endpoints will return 403 response code.
JDBC will throw ForbiddenException.
Note: there is a context param called brokerService that is used only by the router. This param is used to pin your query to a specific broker. Because authorization is done in the broker rather than in the router, if you set brokerService in your query without the proper permission, your query will fail in the broker after routing is done. Technically, this is not right, because authorization is checked after the context param has already taken effect. However, this should not cause any user-facing issue and thus should be OK. The query will still fail if the user doesn't have permission for brokerService.

The context param authorization can be enabled using druid.auth.authorizeQueryContextParams. It is disabled by default to avoid any hassle when someone upgrades their cluster without reading the release notes.
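The rule above amounts to: every context key the user sets needs a WRITE grant on the matching QUERY_CONTEXT resource. A minimal sketch, assuming the user's granted param names are already resolved into a set and that only user-supplied keys are passed in; `forbiddenContextParams` is hypothetical, and Druid's real authorizer chain, resource-name patterns and default/system params are not modeled beyond a boolean for druid.auth.authorizeQueryContextParams:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of per-param authorization; not Druid's actual authorizer code.
final class ContextParamCheck {
  /** Returns the context keys the user may not set; an empty set means the query is allowed. */
  static Set<String> forbiddenContextParams(
      Map<String, Object> userContext, Set<String> writableParams, boolean authorizeQueryContextParams) {
    Set<String> forbidden = new TreeSet<>();
    if (!authorizeQueryContextParams) {
      return forbidden; // check switched off (the default): every param is allowed
    }
    for (String key : userContext.keySet()) {
      if (!writableParams.contains(key)) {
        forbidden.add(key); // no WRITE grant on this QUERY_CONTEXT resource
      }
    }
    return forbidden;
  }

  public static void main(String[] args) {
    Map<String, Object> context = new HashMap<>();
    context.put("maxSubqueryRows", 200_000);
    context.put("useCache", false);
    Set<String> granted = new HashSet<>(Arrays.asList("useCache"));

    Set<String> denied = forbiddenContextParams(context, granted, true);
    // A non-empty result is where an HTTP endpoint would answer 403.
    System.out.println(denied.isEmpty() ? "allowed" : "forbidden: " + denied);
  }
}
```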

Rohan Garg [Thu, 21 Apr 2022 04:14:55 +0000 (09:44 +0530)] 
Emit vectorized metric dimension by default (#12464)

Tejaswini Bandlamudi [Thu, 21 Apr 2022 03:52:35 +0000 (09:22 +0530)] 
Fix GCS based ingestion if bucket name contains underscores (#12445)

GCP allows bucket names to contain underscores. When a location in such a bucket
is mapped to `java.net.URI`, `URI.getHost()` returns null. `URI.getHost()` is used as
the bucket name in `CloudObjectLocation`, leading to an NPE.

This commit uses `URI.getAuthority()` as the bucket name if `URI.getHost()` is null.
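The java.net.URI behavior behind this fix is easy to check directly: an underscore is not legal in a hostname, so the authority is not parsed as a server-based host. A minimal sketch; `bucketName` is a hypothetical helper mirroring the fallback described above, not Druid's CloudObjectLocation:

```java
import java.net.URI;

final class BucketFromUri {
  /** Uses the host when it parses, otherwise falls back to the raw authority. */
  static String bucketName(URI uri) {
    return uri.getHost() != null ? uri.getHost() : uri.getAuthority();
  }

  public static void main(String[] args) {
    URI plain = URI.create("gs://my-bucket/path/file.json");
    URI underscored = URI.create("gs://my_bucket/path/file.json");
    System.out.println(plain.getHost());            // my-bucket
    System.out.println(underscored.getHost());      // null
    System.out.println(underscored.getAuthority()); // my_bucket
    System.out.println(bucketName(underscored));    // my_bucket
  }
}
```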

PJ Fanning [Thu, 21 Apr 2022 02:12:19 +0000 (04:12 +0200)] 
update httpclient due to cve (#12422)


PJ Fanning [Thu, 21 Apr 2022 02:11:55 +0000 (04:11 +0200)] 
issue-12426 upgrade k8s client due to cve (#12427)

* issue-12426 upgrade k8s client due to cve

* compile issues

* try to fix license check

somu-imply [Wed, 20 Apr 2022 14:56:09 +0000 (07:56 -0700)] 
Updating an error msg (#12450)

* Updating an error msg

* Added an extra [] so removing it

Jihoon Son [Tue, 19 Apr 2022 03:00:06 +0000 (20:00 -0700)] 
Suppress CVE-2021-43138 (#12437)

* Suppress CVE-2021-43138

* revert netty 3.10.5.Final

jacobtolar [Tue, 19 Apr 2022 02:36:19 +0000 (21:36 -0500)] 
Document expression post-aggregators (#11896)

* Document expression post-aggregators

* Update docs/querying/post-aggregations.md

Co-authored-by: Frank Chen <frankchen@apache.org>
Co-authored-by: Frank Chen <frankchen@apache.org>

Frank Chen [Tue, 19 Apr 2022 02:25:17 +0000 (10:25 +0800)]
Remove h2 database from dependency (#12447)

TSFenwick [Tue, 19 Apr 2022 02:24:46 +0000 (19:24 -0700)] 
Document running it tests from intellij IDE (#12440)

* document running IT tests in intellij

* clean up unnecessary changes

* address comments

Victoria Lim [Mon, 18 Apr 2022 16:28:32 +0000 (09:28 -0700)] 
recommendation for comparing strings and numbers (#12442)

Peter Marshall [Mon, 18 Apr 2022 09:00:21 +0000 (10:00 +0100)] 
Docs - query caching (#11584)

* Update caching.md

Knowledge from https://the-asf.slack.com/archives/CJ8D1JTB8/p1597781107153900

Update caching.md

A few additional updates OTBO https://the-asf.slack.com/archives/CJ8D1JTB8/p1608669046041300

* Update caching.md


* Amendments on the segment cache

Significant updates on content around the segment cache, pull process, and in-memory cache

* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/operations/basic-cluster-tuning.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/operations/basic-cluster-tuning.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update basic-cluster-tuning.md


* Update docs/querying/caching.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Whole-query caching update

Made more succinct and removed specific config to change.

* Update docs/design/historical.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
3 months ago Fixes a small typo in ingestion spec doc (#12143)
Charles Smith [Mon, 18 Apr 2022 08:53:50 +0000 (01:53 -0700)] 
Fixes a small typo in ingestion spec doc (#12143)

* small typo

* Update docs/ingestion/ingestion-spec.md

Co-authored-by: sthetland <steve.hetland@imply.io>
3 months ago Fail fast when a lookup load fails (#12397)
Rohan Garg [Mon, 18 Apr 2022 07:44:02 +0000 (13:14 +0530)] 
Fail fast when a lookup load fails (#12397)

Currently, when a lookup is loaded for the first time, the loading thread blocks
for `waitForFirstRunMs` if the lookup fails to load. If `waitForFirstRunMs`
is long (for example, 10 minutes), this blocking can slow down the loading of other lookups.

This commit allows the thread to progress as soon as the loading of the lookup fails.
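
A minimal Python sketch of the fail-fast idea, not Druid's code: `start_lookup` and `load_fn` are hypothetical stand-ins for the lookup loader. The loader signals completion on success and on failure alike, so the waiting thread wakes up as soon as the load fails instead of sitting out the whole `waitForFirstRunMs` window.

```python
import threading


def start_lookup(load_fn, wait_for_first_run_ms):
    """Run a hypothetical lookup loader and wait for its first run.

    Returns True if the first load succeeded within the wait window and
    False if it failed or was still running when the window closed.
    """
    done = threading.Event()
    outcome = {"ok": False}

    def run():
        try:
            load_fn()
            outcome["ok"] = True
        except Exception:
            # Record the failure; the finally block still wakes the waiter.
            outcome["ok"] = False
        finally:
            # Signal on success AND on failure, so a failed load never
            # forces the waiter to block for the full timeout.
            done.set()

    threading.Thread(target=run, daemon=True).start()
    if not done.wait(timeout=wait_for_first_run_ms / 1000.0):
        return False  # still loading when the wait window closed
    return outcome["ok"]
```

With a 60-second window, a loader that raises immediately makes `start_lookup` return `False` right away rather than after 60 seconds.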

3 months ago Docs - added another common config property to tuningConfig (#11935)
Peter Marshall [Mon, 18 Apr 2022 05:41:39 +0000 (06:41 +0100)] 
Docs - added another common config property to tuningConfig (#11935)

* Update ingestion-spec.md

Added indexSpecForIntermediatePersists as a common configuration property.

* Update ingestion-spec.md

Amended to remove "below" and add link to the table.

* Update ingestion-spec.md

Removed passive voice.

3 months ago Update tutorial-compaction.md to change an unclear statement (#11988)
Alexandre BERTHIOT [Mon, 18 Apr 2022 05:25:09 +0000 (05:25 +0000)] 
Update tutorial-compaction.md to change an unclear statement (#11988)

* Update tutorial-compaction.md

Clarified an unclear statement in the explanation of the tuningConfig section.

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
3 months ago Fix bug in auto compaction preserveExistingMetrics feature (#12438)
Maytas Monsereenusorn [Fri, 15 Apr 2022 22:47:47 +0000 (15:47 -0700)] 
Fix bug in auto compaction preserveExistingMetrics feature (#12438)

* fix bug

* fix test

* fix IT

3 months ago Make tombstones ingestible by having them return an empty result set. (#12392)
Agustin Gonzalez [Fri, 15 Apr 2022 16:08:06 +0000 (09:08 -0700)] 
Make tombstones ingestible by having them return an empty result set. (#12392)

* Make tombstones ingestible by having them return an empty result set.

* Spotbug

* Coverage

* Coverage

* Remove unnecessary exception (checkstyle)

* Fix integration test and add one more to test dropExisting set to false over tombstones

* Force dropExisting to true in auto-compaction when the interval contains only tombstones

* Checkstyle, fix unit test

* Changed flag by mistake, fixing it

* Remove method from interface since it is specific to DruidSegmentInputEntity

* Fix typo

* Adapt to latest code

* Update comments for the case where there are only tombstones to compact

* Move empty iterator to a new DruidTombstoneSegmentReader

* Code review feedback

* Checkstyle

* Review feedback

* Coverage
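
A minimal Python sketch of what "return an empty result set" means for re-ingestion. It is loosely modeled on the idea behind DruidTombstoneSegmentReader but is not its code; `SegmentReader`, `TombstoneSegmentReader`, `reader_for`, `reingest`, and the dict-based segment shape are all hypothetical. A tombstone simply yields no rows, so reading a mix of tombstones and real segments succeeds.

```python
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


class SegmentReader:
    """Hypothetical reader that yields the rows stored in a segment."""

    def __init__(self, rows: Iterable[Row]):
        self._rows = list(rows)

    def read(self) -> Iterator[Row]:
        return iter(self._rows)


class TombstoneSegmentReader(SegmentReader):
    """A tombstone holds no data, so reading it yields an empty result set."""

    def __init__(self):
        super().__init__(rows=[])


def reader_for(segment: Dict[str, Any]) -> SegmentReader:
    # Use the empty reader for tombstones instead of raising an error.
    if segment.get("tombstone", False):
        return TombstoneSegmentReader()
    return SegmentReader(segment["rows"])


def reingest(segments: Iterable[Dict[str, Any]]) -> List[Row]:
    """Collect all rows from the input segments, e.g. for compaction."""
    rows: List[Row] = []
    for segment in segments:
        rows.extend(reader_for(segment).read())
    return rows
```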

3 months ago Use binary search to improve DimensionRangeShardSpec lookup (#12417)
hqx871 [Fri, 15 Apr 2022 16:07:06 +0000 (00:07 +0800)] 
Use binary search to improve DimensionRangeShardSpec lookup (#12417)

If there are many shards, the mapper of IndexGeneratorJob seems to spend a lot of time calling
DimensionRangeShardSpec.isInChunk to look up the target shard. This can be significantly improved
by using binary search instead of comparing each input row to every shardSpec.

* Add `BaseDimensionRangeShardSpec` which provides a binary-search-based
   implementation for `createLookup`
* `DimensionRangeShardSpec`, `SingleDimensionShardSpec`, and
   `DimensionRangeBucketShardSpec` now extend `BaseDimensionRangeShardSpec`
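
A minimal Python sketch contrasting the two lookup strategies for a single string dimension. It is not Druid's `BaseDimensionRangeShardSpec`; the `(start, end)` tuple shape and both function names are hypothetical. It assumes the shards are sorted, contiguous, and cover the whole value space, with `None` marking an unbounded end.

```python
from bisect import bisect_right
from typing import Callable, Optional, Sequence, Tuple

# (start inclusive, end exclusive); None means unbounded on that side.
Shard = Tuple[Optional[str], Optional[str]]


def linear_lookup(shards: Sequence[Shard], value: str) -> Optional[int]:
    """Old approach: test the row against every shard in turn, O(n)."""
    for i, (start, end) in enumerate(shards):
        if (start is None or value >= start) and (end is None or value < end):
            return i
    return None


def make_binary_lookup(shards: Sequence[Shard]) -> Callable[[str], int]:
    """Binary-search lookup, O(log n) per row."""
    # Lower bounds of every shard except the first, which is unbounded below.
    starts = [start for start, _ in shards[1:]]

    def lookup(value: str) -> int:
        # The count of lower bounds <= value is the index of the owning shard.
        return bisect_right(starts, value)

    return lookup


shards = [(None, "c"), ("c", "m"), ("m", None)]
lookup = make_binary_lookup(shards)
```

Because Python tuples compare lexicographically, the same `bisect_right` approach would carry over to multi-dimension ranges if the boundaries and values were tuples.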

3 months ago Handling planning with alias for time for group by and order by (#12418)
somu-imply [Fri, 15 Apr 2022 04:59:17 +0000 (21:59 -0700)] 
Handling planning with alias for time for group by and order by (#12418)

An outer scan query that requires ordering on a column should be considered invalid.

3 months ago good stuff (#12435)
Vadim Ogievetsky [Thu, 14 Apr 2022 07:23:06 +0000 (00:23 -0700)] 
good stuff (#12435)

3 months ago fix issue with boolean expression input (#12429)
Clint Wylie [Wed, 13 Apr 2022 23:34:01 +0000 (16:34 -0700)] 
fix issue with boolean expression input (#12429)

3 months ago Add docs to metric spec for auto compaction (#12415)
Maytas Monsereenusorn [Wed, 13 Apr 2022 20:27:00 +0000 (13:27 -0700)] 
Add docs to metric spec for auto compaction (#12415)

* add docs

* Update docs/configuration/index.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
* Update index.md

* Update docs/configuration/index.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
3 months ago Fix indexMerger to respect the includeAllDimensions flag (#12428)
Jihoon Son [Wed, 13 Apr 2022 19:43:11 +0000 (12:43 -0700)] 
Fix indexMerger to respect the includeAllDimensions flag (#12428)

* Fix indexMerger to respect the includeAllDimensions flag; jsonInputFormat should set keepNullColumns if useFieldDiscovery is set

* address comments

3 months ago Add Kinesis ListShards permission (#12387)
Katya Macedo [Wed, 13 Apr 2022 09:59:56 +0000 (04:59 -0500)] 
Add Kinesis ListShards permission (#12387)

* add Kinesis permission

* List Kinesis IAM permissions

* Adopt review suggestions

* Fix merge conflicts

3 months ago Web console: Misc fixes and improvements (#12361)
Vadim Ogievetsky [Wed, 13 Apr 2022 05:20:28 +0000 (22:20 -0700)] 
Web console: Misc fixes and improvements (#12361)

* Misc fixes

* pad column numbers

* make shard_type filterable

3 months ago Copy of #11309 with fixes (#12402)
Parag Jain [Mon, 11 Apr 2022 15:35:24 +0000 (21:05 +0530)] 
Copy of #11309 with fixes (#12402)

* Optionally load segment index files into page cache on bootstrap and new segment download

* Fix unit test failure

* Fix test case

* fix spelling

* fix spelling

* fix test and test coverage issues

Co-authored-by: Jian Wang <wjhypo@gmail.com>
3 months ago Fix zulu8 set-up Dockerfile for hadoop and hadoop3 in hadoop ingestion tutorial ...
Tiffany Yeh [Mon, 11 Apr 2022 14:58:09 +0000 (10:58 -0400)] 
Fix zulu8 set-up Dockerfile for hadoop and hadoop3 in hadoop ingestion tutorial (#12248)

Fix errors related to zulu8 installation for building the Hadoop Docker image in the Load From Apache Hadoop tutorial.

The steps to download zulu8 in the Dockerfile and setup-zulu-repo.sh were replaced with the steps in the Dockerfile released by zulu-openjdk: https://github.com/zulu-openjdk/zulu-openjdk/blob/be45d20302e42df5aa95d2de078bb5e4214f5dba/centos/8u282-

4 months ago Bump PostgreSQL JDBC driver to 42.3.3 (CVE-2022-21724) (#12410)
Jihoon Son [Sat, 9 Apr 2022 10:08:26 +0000 (03:08 -0700)] 
Bump PostgreSQL JDBC driver to 42.3.3 (CVE-2022-21724) (#12410)

* Bump PostgreSQL JDBC driver to 42.3.3 (CVE-2022-21724)

* update license file

4 months ago Make error messages for insert statements consistent with select statements (#12414)
Adarsh Sanjeev [Sat, 9 Apr 2022 06:51:40 +0000 (12:21 +0530)] 
Make error messages for insert statements consistent with select statements (#12414)

For a query like `INSERT INTO tablename SELECT channel, added as count FROM wikipedia`,
the error message is `Encountered "as count"`. However, the insert statement
`INSERT INTO t SELECT channel, added as count FROM wikipedia PARTITIONED BY ALL`
incorrectly returns `INSERT statements must specify PARTITIONED BY clause explictly`. This PR corrects this.

* Add EOF to end of Druid SQL Insert statements
* Rename SQL Insert statements in the parser to reflect the behaviour change

4 months ago Improve metrics for Auto Compaction (#12413)
Maytas Monsereenusorn [Sat, 9 Apr 2022 03:14:36 +0000 (20:14 -0700)] 
Improve metrics for Auto Compaction (#12413)

* add impl

* add docs

* fix

4 months ago Add a new flag for ingestion to preserve existing metrics (#12185)
Maytas Monsereenusorn [Fri, 8 Apr 2022 18:02:02 +0000 (11:02 -0700)] 
Add a new flag for ingestion to preserve existing metrics (#12185)

* add impl

* add impl

* fix checkstyle

* add impl

* add unit test

* fix stuff

* fix stuff

* fix stuff

* add unit test

* add more unit tests

* add more unit tests

* add IT

* add IT

* add IT

* add IT

* add ITs

* address comments

* fix test

* fix test

* fix test

* address comments

* address comments

* address comments

* fix conflict

* fix checkstyle

* address comments

* fix test

* fix checkstyle

* fix test

* fix test

* fix IT

4 months ago Update index.md (#12390)
mark-imply [Fri, 8 Apr 2022 12:31:54 +0000 (06:31 -0600)] 
Update index.md (#12390)

Added guidance on when to increase druid.indexer.storage.recentlyFinishedThreshold.

4 months ago Fix the other 2 python scripts that generate license. (#12340)
Didip Kerabat [Fri, 8 Apr 2022 11:13:17 +0000 (04:13 -0700)] 
Fix the other 2 python scripts that generate license. (#12340)

Fixes `yaml.load_all` issues in two of the Python scripts that generate the license.

The broken Python files interfere with some of the Maven tasks.

4 months ago Update basic-cluster-tuning.md (#12412)
mark-imply [Fri, 8 Apr 2022 09:59:55 +0000 (03:59 -0600)] 
Update basic-cluster-tuning.md (#12412)

Changed "Other useful JVM flags" to "Other generally useful JVM flags" in order to align with the introduction to the doc.

4 months ago fix(docs): clarify what s3 permissions are needed based on the access management...
317brian [Thu, 7 Apr 2022 23:22:56 +0000 (16:22 -0700)] 
fix(docs): clarify what s3 permissions are needed based on the access management type (#12405)

* fix(docs): clarify what s3 permissions are needed based on the permissions model

* fix typo

* Update docs/development/extensions-core/s3.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>
4 months ago Bump minimist from 1.2.5 to 1.2.6 in /website (#12400)
dependabot[bot] [Thu, 7 Apr 2022 10:08:39 +0000 (03:08 -0700)] 
Bump minimist from 1.2.5 to 1.2.6 in /website (#12400)

Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6.
- [Release notes](https://github.com/substack/minimist/releases)
- [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6)

- dependency-name: minimist
  dependency-type: indirect

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4 months ago Bump minimist from 1.2.5 to 1.2.6 in /web-console (#12401)
dependabot[bot] [Wed, 6 Apr 2022 23:55:14 +0000 (16:55 -0700)] 
Bump minimist from 1.2.5 to 1.2.6 in /web-console (#12401)

Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6.
- [Release notes](https://github.com/substack/minimist/releases)
- [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6)

- dependency-name: minimist
  dependency-type: indirect

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4 months ago clean up some bp3 classes (#12403)
Vadim Ogievetsky [Wed, 6 Apr 2022 22:27:44 +0000 (15:27 -0700)] 
clean up some bp3 classes (#12403)

4 months ago Document data format and example for featureSpec (#12394)
Victoria Lim [Wed, 6 Apr 2022 22:17:15 +0000 (15:17 -0700)] 
Document data format and example for featureSpec (#12394)

* add data format and example for featureSpec

* add second feature in example

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
4 months ago docs(fix): add clarity around granularitySpec (#12362)
317brian [Wed, 6 Apr 2022 16:24:37 +0000 (09:24 -0700)] 
docs(fix): add clarity around granularitySpec (#12362)

* fix: add clarity around granularitySpec

* fix spacing

* Update docs/ingestion/compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
4 months ago Document config for ingesting null columns (#12389)
Victoria Lim [Tue, 5 Apr 2022 16:15:42 +0000 (09:15 -0700)] 
Document config for ingesting null columns (#12389)

* config for ingesting null columns

* add link

* edit .spelling

* what happens if storeEmptyColumns is disabled
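
A minimal Python sketch of the behavior the storeEmptyColumns docs describe, assuming that an "empty" column is a dimension whose value is null in every ingested row. `columns_to_store` and its `store_empty_columns` parameter are hypothetical illustrations of the flag, not Druid's segment-writing code.

```python
from typing import Any, Dict, List


def columns_to_store(rows: List[Dict[str, Any]], dimensions: List[str],
                     store_empty_columns: bool) -> List[str]:
    """Return the dimension columns a hypothetical segment writer would keep.

    A column counts as empty when every row has a null (None or missing)
    value for it.
    """
    if store_empty_columns:
        # Keep every declared dimension, even if it holds only nulls.
        return list(dimensions)
    # Otherwise drop dimensions that never received a non-null value.
    return [d for d in dimensions if any(row.get(d) is not None for row in rows)]
```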

4 months ago upgrade surefire 3.0.0-M6 (#12395)
aggarwalakshay [Tue, 5 Apr 2022 06:56:15 +0000 (23:56 -0700)] 
upgrade surefire 3.0.0-M6 (#12395)

* upgrade surefire 3.0.0-M6

* increasing memory

4 months ago Method to specify eternity in the scan query builder (#12223)
Paul Rogers [Mon, 4 Apr 2022 22:11:32 +0000 (15:11 -0700)] 
Method to specify eternity in the scan query builder (#12223)

* Method to specify eternity in the scan query builder

* Fix checkstyle issue

* Renamed eternity() to eternityInterval()

* Minor fixes

4 months ago Blueprint 4 (#12391)
John Gozde [Mon, 4 Apr 2022 17:34:22 +0000 (11:34 -0600)] 
Blueprint 4 (#12391)

* Update blueprint dependencies & LICENSES

* Switch to bp4 namespace; use bp-ns variable in overrides

* Add webpack alias for colors.scss

* Snapshots

* Update selectors in e2e tests

4 months ago Package kinesis client jar within the extension (#12370)
AmatyaAvadhanula [Mon, 4 Apr 2022 16:01:18 +0000 (21:31 +0530)] 
Package kinesis client jar within the extension (#12370)

amazon-kinesis-client was previously not covered under the Apache License and had to be added to the Kinesis extension separately.
This is no longer necessary now that it is covered, and including it within Druid helps prevent incompatibilities.

Packaging amazon-kinesis-client (1.14.4) with Druid allows deaggregation to be enabled out of the box for Kinesis ingestion.

4 months ago Increase default DatasourceCompactionConfig.inputSegmentSizeBytes to Long.MAX_VALUE...
Tejaswini Bandlamudi [Mon, 4 Apr 2022 10:58:53 +0000 (16:28 +0530)] 
Increase default DatasourceCompactionConfig.inputSegmentSizeBytes to Long.MAX_VALUE (#12381)

Previously, the default value of inputSegmentSizeBytes was 400MB, which is low
for most compaction use cases, so most users were forced to override the default.

The default value is now increased to Long.MAX_VALUE.
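
A minimal Python sketch of why a 400MB limit bites, based on the documented rule that a time chunk is compacted as a whole and is skipped when its segments exceed inputSegmentSizeBytes in total. `compactible_intervals` and the `(interval, size)` pairs are hypothetical, and reading "400MB" as 400 * 1024 * 1024 bytes is an assumption.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

JAVA_LONG_MAX_VALUE = 2**63 - 1                          # new default (Long.MAX_VALUE)
OLD_DEFAULT_INPUT_SEGMENT_SIZE_BYTES = 400 * 1024 * 1024  # "400MB", assumed binary MB


def compactible_intervals(segments: Iterable[Tuple[str, int]],
                          input_segment_size_bytes: int = JAVA_LONG_MAX_VALUE) -> List[str]:
    """Return the time chunks whose total segment size fits under the limit.

    segments holds hypothetical (interval, size_in_bytes) pairs. A chunk is
    compacted as a whole, so one that exceeds the limit is skipped entirely.
    """
    totals = defaultdict(int)
    for interval, size in segments:
        totals[interval] += size
    return sorted(i for i, total in totals.items() if total <= input_segment_size_bytes)


MB = 1024 * 1024
segments = [("2022-04-01", 300 * MB), ("2022-04-01", 200 * MB), ("2022-04-02", 100 * MB)]
```

Under the old default, the 500MB chunk for 2022-04-01 is never compacted; with Long.MAX_VALUE, both chunks qualify.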

4 months ago Add feature flag for Kinesis listShards API usage (#12383)
AmatyaAvadhanula [Mon, 4 Apr 2022 09:28:10 +0000 (14:58 +0530)] 
Add feature flag for Kinesis listShards API usage (#12383)

The ListShards API was adopted in #12161 to get all the shards for Kinesis ingestion and improve its resiliency.

However, this may require additional permissions in the IAM policy for the stream. (Please refer to: https://docs.aws.amazon.com/kinesis/latest/APIReference/API_ListShards.html).

A dynamic configuration, useListShards, has been added to KinesisSupervisorTuningConfig to control the use of this API and prevent issues upon upgrade. It can be safely turned on by setting it to true, and doing so is recommended when using Kinesis ingestion.
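
A minimal Python sketch of gating shard discovery behind a flag like useListShards. `FakeKinesisClient`, its `list_shards` and `legacy_shard_ids` methods, and `get_shard_ids` are hypothetical stubs, not the AWS SDK or Druid's supervisor code; `legacy_shard_ids` stands in for whatever discovery path applies when the flag is off. The one real-API detail mirrored here is that ListShards results are paginated with a next token.

```python
from typing import Dict, List, Optional


class FakeKinesisClient:
    """Stub standing in for a Kinesis client; records which calls were made."""

    def __init__(self, pages: List[Dict], all_shards: List[str]):
        self._pages = pages
        self._all_shards = all_shards
        self.calls: List[str] = []

    def list_shards(self, stream: str, next_token: Optional[str]) -> Dict:
        self.calls.append("list_shards")
        return self._pages[0 if next_token is None else int(next_token)]

    def legacy_shard_ids(self, stream: str) -> List[str]:
        self.calls.append("legacy")
        return list(self._all_shards)


def get_shard_ids(client, stream: str, use_list_shards: bool = False) -> List[str]:
    if not use_list_shards:
        # Flag off: never call ListShards, so no extra IAM permission is needed.
        return client.legacy_shard_ids(stream)
    shard_ids: List[str] = []
    token: Optional[str] = None
    while True:
        # Follow the pagination token until the last page.
        page = client.list_shards(stream, token)
        shard_ids.extend(page["shards"])
        token = page.get("next_token")
        if token is None:
            return shard_ids


pages = [{"shards": ["shardId-0", "shardId-1"], "next_token": "1"}, {"shards": ["shardId-2"]}]
client = FakeKinesisClient(pages, ["shardId-0", "shardId-1", "shardId-2"])
```

Defaulting the flag to off keeps an upgraded cluster working with its existing IAM policy; operators opt in after granting the permission.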

4 months ago Introducing a new config to ignore nulls while computing String Cardinality (#12345)
somu-imply [Tue, 29 Mar 2022 21:31:36 +0000 (14:31 -0700)] 
Introducing a new config to ignore nulls while computing String Cardinality (#12345)

* Counting nulls in String cardinality with a config

* Adding tests for the new config

* Wrapping the vectorize part to allow backward compatibility

* Adding different tests, cleaning the code and putting the check at the proper position, handling hasRow() and hasValue() changes

* Updating testcase and code

* Adding null handling test to improve coverage

* Checkstyle fix

* Adding 1 more change in docs

* Making docs clearer
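
A minimal Python sketch of what toggling null handling does to a string cardinality count. `string_cardinality` and its `ignore_nulls` parameter are hypothetical; the commit does not name the actual config here. The sketch counts exactly with a set, whereas Druid's cardinality aggregator estimates with HyperLogLog.

```python
from typing import Iterable, Optional


def string_cardinality(values: Iterable[Optional[str]], ignore_nulls: bool) -> int:
    """Count distinct string values, optionally leaving nulls (None) out.

    Exact set-based count for illustration only.
    """
    distinct = set()
    for value in values:
        if value is None and ignore_nulls:
            continue  # skip nulls entirely when the option is on
        distinct.add(value)
    return len(distinct)
```

For `["en", "fr", None, "en", None]`, counting nulls gives 3 distinct values and ignoring them gives 2.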

4 months ago Docs - S3 masking and nav update to S3 page (#11490)
Peter Marshall [Tue, 29 Mar 2022 16:13:05 +0000 (17:13 +0100)] 
Docs - S3 masking and nav update to S3 page (#11490)

* Docs: Masking S3 creds and some rewording

Knowledge transfer from https://groups.google.com/g/druid-user/c/FydcpFrA688

* Removed bold in one of the quote sections

* Update s3.md

* Update s3.md

Quick grammar change

* Update docs/development/extensions-core/s3.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/s3.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/s3.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/s3.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/s3.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update s3.md


* Update docs/development/extensions-core/s3.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update s3.md

Active language

* Update s3.md

Language nit

* Update native-batch.md

Language nit

* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Grammar tidy-up and link fix

Corrected two links to old page H2s, resolved the question around precedence, and made some other grammatical changes.

* Update docs/development/extensions-core/s3.md

* Update s3.md

Removed an erroneous E

Co-authored-by: Charles Smith <techdocsmith@gmail.com>