druid.git
29 hours ago: Flakiness and exceptions during tests (#12705) [master]
Abhishek Agarwal [Tue, 28 Jun 2022 05:06:23 +0000 (10:36 +0530)] 
Flakiness and exceptions during tests (#12705)

3 days ago: Add IT-related changes pulled out of PR #12368 (#12673)
Paul Rogers [Sat, 25 Jun 2022 20:43:59 +0000 (13:43 -0700)] 
Add IT-related changes pulled out of PR #12368 (#12673)

This commit contains changes made to the existing ITs to support the new ITs.

Changes:
- Make the "custom node role" code usable by the new ITs.
- Use the flag `-DskipITs` to skip the integration tests but run unit tests.
- Use the flag `-DskipUTs` to skip unit tests but run the "new" integration tests.
- Expand the existing Druid profile `-P skip-tests` to skip both ITs and UTs.

4 days ago: Revert changes from #12672 (#12703)
Paul Rogers [Sat, 25 Jun 2022 03:40:44 +0000 (20:40 -0700)] 
Revert changes from #12672 (#12703)

* Revert changes from #12672

* Reverted more conflicting changes

Changes are not needed given previous reversions.

4 days ago: Revert "SqlSegmentsMetadataQuery: Fix OVERLAPS for wide target segments. (#12600...
Gian Merlino [Sat, 25 Jun 2022 03:38:26 +0000 (20:38 -0700)] 
Revert "SqlSegmentsMetadataQuery: Fix OVERLAPS for wide target segments. (#12600)" (#12679)

This reverts commit 8fbf92e047f792ff1c69bf67d14784ac55eee88f.

4 days ago: Update ORC to 1.7.5 (#12667)
William Hyun [Fri, 24 Jun 2022 23:08:42 +0000 (16:08 -0700)] 
Update ORC to 1.7.5 (#12667)

4 days ago: Fix flaky KafkaIndexTaskTest. (#12657)
Gian Merlino [Fri, 24 Jun 2022 20:53:51 +0000 (13:53 -0700)] 
Fix flaky KafkaIndexTaskTest. (#12657)

* Fix flaky KafkaIndexTaskTest.

The testRunTransactionModeRollback case had many race conditions. Most notably,
it would commit a transaction and then immediately check to see that the results
were *not* indexed. This is racy because it relied on the indexing thread being
slower than the test thread.

Now, the case waits for the transaction to be processed by the indexing thread
before checking the results.
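A minimal sketch of the "wait first, then check" pattern, assuming a hypothetical `waitUntil` helper and an `AtomicBoolean` that stands in for "the indexing thread has processed the transaction". It is not the actual `KafkaIndexTaskTest` change; it only shows polling with a deadline instead of asserting right after handing work to another thread.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

final class WaitUntil
{
  // Hypothetical helper: polls `condition` until it is true, or throws once the deadline passes.
  static void waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis)
      throws InterruptedException, TimeoutException
  {
    final long deadlineNanos = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadlineNanos >= 0) {
        throw new TimeoutException("Condition not met within " + timeoutMillis + " ms");
      }
      Thread.sleep(pollMillis);
    }
  }

  public static void main(String[] args) throws Exception
  {
    // Stand-in for "the indexing thread has processed the transaction".
    final AtomicBoolean processed = new AtomicBoolean(false);
    final Thread indexer = new Thread(() -> {
      try {
        Thread.sleep(50); // simulated indexing delay
      }
      catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      processed.set(true);
    });
    indexer.start();

    // Wait for the other thread instead of racing it, then check results.
    waitUntil(processed::get, 5_000, 10);
    System.out.println("processed = " + processed.get());
    indexer.join();
  }
}
```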

* Changes from review.

5 days ago: Able to filter Cloud objects with glob notation. (#12659)
Didip Kerabat [Fri, 24 Jun 2022 06:10:08 +0000 (23:10 -0700)] 
Able to filter Cloud objects with glob notation. (#12659)

In a heterogeneous environment, you sometimes don't control the input folder: upstream producers can put files in any folder they want. In this situation, S3InputSource is unusable.

Many people, myself included, worked around this by using Airflow to fetch the full list of Parquet files and pass it to Druid. But doing this bloats the JSON spec: in one case, a single JSON spec was 16MB, which is simply too much for the Overlord.

This patch allows users to pass {"filter": "*.parquet"} and let Druid perform the filtering of the input files.

I am using the glob notation to be consistent with the LocalFirehose syntax.
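A rough sketch of glob-based key filtering using the JDK's `PathMatcher` with the `glob:` syntax. The class and method names are hypothetical, and this is not Druid's implementation: it matches the pattern against the last path segment of each key, which may differ from how Druid applies `filter`.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class GlobKeyFilter
{
  // Keeps only object keys whose file name matches the glob, e.g. "*.parquet".
  static List<String> filter(List<String> objectKeys, String glob)
  {
    final PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
    final List<String> matched = new ArrayList<>();
    for (String key : objectKeys) {
      // Match the last path segment only, since "*" does not cross "/" in glob syntax.
      final Path fileName = Paths.get(key).getFileName();
      if (fileName != null && matcher.matches(fileName)) {
        matched.add(key);
      }
    }
    return matched;
  }

  public static void main(String[] args)
  {
    final List<String> keys = List.of(
        "upstream/2022/06/part-0000.parquet",
        "upstream/2022/06/_SUCCESS",
        "other/part-0001.parquet"
    );
    System.out.println(filter(keys, "*.parquet"));
  }
}
```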

5 days ago: Throw BadQueryContextException if context params cannot be parsed (#12680)
Tejaswini Bandlamudi [Fri, 24 Jun 2022 03:51:25 +0000 (09:21 +0530)] 
Throw BadQueryContextException if context params cannot be parsed (#12680)

5 days ago: Disable autokill of segments by default. (#12693)
Gian Merlino [Fri, 24 Jun 2022 00:17:11 +0000 (17:17 -0700)] 
Disable autokill of segments by default. (#12693)

Also add clarifying commentary to the documentation about how durationToRetain works.

5 days ago: Cleanup changes pulled out of PR #12368 (#12672)
Paul Rogers [Thu, 23 Jun 2022 17:49:50 +0000 (10:49 -0700)] 
Cleanup changes pulled out of PR #12368 (#12672)

This commit contains the cleanup needed for the new integration test framework.

Changes:
- Fix log lines, misspellings, docs, etc.
- Allow the use of some of Druid's "JSON config" objects in tests
- Fix minor bug in `BaseNodeRoleWatcher`

5 days ago: Fix hadoop library location for integration tests (#12497)
Jihoon Son [Thu, 23 Jun 2022 15:39:54 +0000 (08:39 -0700)] 
Fix hadoop library location for integration tests (#12497)

6 days ago: Fix thread-unsafe emitter usage in SeekableStreamSupervisorStateTest. (#12658)
Gian Merlino [Thu, 23 Jun 2022 05:29:16 +0000 (22:29 -0700)] 
Fix thread-unsafe emitter usage in SeekableStreamSupervisorStateTest. (#12658)

The TestEmitter is used from different threads without concurrency
control. This patch makes the emitter thread-safe.
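One way to make a test emitter thread-safe, sketched with a concurrent queue. `ThreadSafeTestEmitter` is a hypothetical class with a simplified `emit(String)` method, not Druid's `TestEmitter` or emitter interfaces, and the patch may use a different mechanism (for example, synchronization).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ThreadSafeTestEmitter
{
  // ConcurrentLinkedQueue supports concurrent add() calls without external locking.
  private final Queue<String> events = new ConcurrentLinkedQueue<>();

  void emit(String event)
  {
    events.add(event);
  }

  List<String> getEvents()
  {
    // Copy into a snapshot so callers never iterate the live, concurrently modified queue.
    return new ArrayList<>(events);
  }

  public static void main(String[] args) throws InterruptedException
  {
    final ThreadSafeTestEmitter emitter = new ThreadSafeTestEmitter();
    final ExecutorService pool = Executors.newFixedThreadPool(4);
    for (int i = 0; i < 1000; i++) {
      final int n = i;
      pool.execute(() -> emitter.emit("event-" + n));
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    System.out.println("events recorded: " + emitter.getEvents().size());
  }
}
```

With a plain `ArrayList` instead of the queue, concurrent `add` calls can drop events or throw, which is the kind of intermittent failure this change targets.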

7 days ago: Add query context param `forceExpressionVirtualColumns` to always use "expression...
Kashif Faraz [Wed, 22 Jun 2022 10:03:50 +0000 (15:33 +0530)] 
Add query context param `forceExpressionVirtualColumns` to always use "expression"-type virtual columns in query plan (#12583)

SQL expressions such as those containing `MV_FILTER_ONLY` and `MV_FILTER_NONE`
are planned as specialized virtual columns instead of the default `expression`-type virtual columns.
This commit adds a new context parameter to force the `expression`-type virtual columns.

Changes:
- Add query context param `forceExpressionVirtualColumns`
- Use context param to determine if specialized virtual columns should be used or not
- Moved some tests into `CalciteExplainQueryTest`
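A hedged sketch of how a boolean context flag like `forceExpressionVirtualColumns` might be read and applied. Only the key name comes from this commit; the helper methods, the default of `false`, and the "specialized"/"expression" labels are assumptions for illustration, not Druid's planner code.

```java
import java.util.Map;

final class VirtualColumnChoice
{
  static final String FORCE_EXPRESSION_VIRTUAL_COLUMNS = "forceExpressionVirtualColumns";

  // Accepts either a JSON boolean or a string such as "true"; a missing key means defaultValue.
  static boolean getBoolean(Map<String, Object> context, String key, boolean defaultValue)
  {
    final Object value = context.get(key);
    if (value == null) {
      return defaultValue;
    }
    if (value instanceof Boolean) {
      return (Boolean) value;
    }
    if (value instanceof String) {
      return Boolean.parseBoolean((String) value);
    }
    throw new IllegalArgumentException("Expected a boolean for context key [" + key + "], got: " + value);
  }

  // Uses a specialized virtual column only when one exists and the flag does not force "expression".
  static String chooseVirtualColumnType(Map<String, Object> context, boolean specializedAvailable)
  {
    final boolean forceExpression = getBoolean(context, FORCE_EXPRESSION_VIRTUAL_COLUMNS, false);
    return specializedAvailable && !forceExpression ? "specialized" : "expression";
  }

  public static void main(String[] args)
  {
    System.out.println(chooseVirtualColumnType(Map.of(), true));
    System.out.println(chooseVirtualColumnType(Map.of(FORCE_EXPRESSION_VIRTUAL_COLUMNS, true), true));
  }
}
```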

7 days ago: Add CVEs for Hadoop3 (#12336)
AmatyaAvadhanula [Wed, 22 Jun 2022 08:42:17 +0000 (14:12 +0530)] 
Add CVEs for Hadoop3 (#12336)

* Add CVEs

* Move CVEs under hadoop3 section

7 days ago: Update default value of `inputSegmentSizeBytes` in configuration docs (#12678)
Tejaswini Bandlamudi [Wed, 22 Jun 2022 03:35:03 +0000 (09:05 +0530)] 
Update default value of `inputSegmentSizeBytes` in configuration docs (#12678)

7 days ago: Add TIME_IN_INTERVAL SQL operator. (#12662)
Gian Merlino [Tue, 21 Jun 2022 20:05:37 +0000 (13:05 -0700)] 
Add TIME_IN_INTERVAL SQL operator. (#12662)

* Add TIME_IN_INTERVAL SQL operator.

The operator is implemented as a convertlet rather than an
OperatorConversion, because this allows it to be equivalent to using
the >= and < operators directly.
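The equivalence described above can be sketched with `java.time`: membership in a half-open interval is just `t >= start && t < end`. The class name is hypothetical, and the parser is a simplification that only accepts `<instant>/<instant>` with full ISO 8601 instants; real ISO 8601 intervals (periods, durations) are not handled.

```java
import java.time.Instant;

final class TimeInInterval
{
  static boolean timeInInterval(Instant t, String interval)
  {
    final String[] parts = interval.split("/");
    if (parts.length != 2) {
      throw new IllegalArgumentException("Expected <start>/<end>, got: " + interval);
    }
    final Instant start = Instant.parse(parts[0]);
    final Instant end = Instant.parse(parts[1]);
    // Same as the two comparisons t >= start AND t < end: start inclusive, end exclusive.
    return !t.isBefore(start) && t.isBefore(end);
  }

  public static void main(String[] args)
  {
    final String interval = "2022-01-01T00:00:00Z/2022-02-01T00:00:00Z";
    System.out.println(timeInInterval(Instant.parse("2022-01-01T00:00:00Z"), interval)); // true: start is inclusive
    System.out.println(timeInInterval(Instant.parse("2022-02-01T00:00:00Z"), interval)); // false: end is exclusive
  }
}
```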

* SqlParserPos cannot be null here.

* Remove unused import.

* Doc updates.

* Add words to dictionary.

7 days ago: Reduce interval creation cost for segment cost computation (#12670)
AmatyaAvadhanula [Tue, 21 Jun 2022 12:09:43 +0000 (17:39 +0530)] 
Reduce interval creation cost for segment cost computation (#12670)

Changes:
- Reuse created interval in `SegmentId.getInterval()`
- Intern intervals to save on memory footprint
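A simplified model of both changes, assuming a minimal interval class in place of Joda-Time's `Interval` and a hypothetical `SegmentIdSketch` in place of `SegmentId`: the interval is built once and cached, and equal intervals are interned so they share one instance.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class IntervalCaching
{
  // Minimal immutable interval; Druid itself uses Joda-Time's Interval.
  static final class SimpleInterval
  {
    final long startMillis;
    final long endMillis;

    SimpleInterval(long startMillis, long endMillis)
    {
      this.startMillis = startMillis;
      this.endMillis = endMillis;
    }

    @Override
    public boolean equals(Object o)
    {
      if (this == o) {
        return true;
      }
      if (!(o instanceof SimpleInterval)) {
        return false;
      }
      final SimpleInterval that = (SimpleInterval) o;
      return startMillis == that.startMillis && endMillis == that.endMillis;
    }

    @Override
    public int hashCode()
    {
      return Long.hashCode(startMillis) * 31 + Long.hashCode(endMillis);
    }
  }

  // Strong-reference interner for brevity; it never evicts. A production interner would
  // typically hold weak references (for example Guava's Interners.newWeakInterner()).
  private static final ConcurrentMap<SimpleInterval, SimpleInterval> INTERNED = new ConcurrentHashMap<>();

  static SimpleInterval intern(SimpleInterval interval)
  {
    final SimpleInterval existing = INTERNED.putIfAbsent(interval, interval);
    return existing == null ? interval : existing;
  }

  // Hypothetical segment id that builds its interval once and then reuses it.
  static final class SegmentIdSketch
  {
    private final long startMillis;
    private final long endMillis;
    private volatile SimpleInterval interval;

    SegmentIdSketch(long startMillis, long endMillis)
    {
      this.startMillis = startMillis;
      this.endMillis = endMillis;
    }

    SimpleInterval getInterval()
    {
      SimpleInterval result = interval;
      if (result == null) {
        // Racing threads may both get here, but interning hands them the same instance.
        result = intern(new SimpleInterval(startMillis, endMillis));
        interval = result;
      }
      return result;
    }
  }

  public static void main(String[] args)
  {
    final SegmentIdSketch a = new SegmentIdSketch(0L, 86_400_000L);
    final SegmentIdSketch b = new SegmentIdSketch(0L, 86_400_000L);
    System.out.println(a.getInterval() == a.getInterval()); // reused per segment id
    System.out.println(a.getInterval() == b.getInterval()); // shared via interning
  }
}
```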

8 days ago: Lazy Initialisation of Orc extensions module (#12663)
Tejaswini Bandlamudi [Tue, 21 Jun 2022 05:43:10 +0000 (11:13 +0530)] 
Lazy Initialisation of Orc extensions module (#12663)

* Lazy initialization of Orc extension

* nit

* moving initialize method to OrcInputFormat
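A generic sketch of lazy one-time initialization, using double-checked locking on a `volatile` flag. What the ORC module actually initializes is not described here, so `LazyInit` and the counter standing in for the setup work are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class LazyInit
{
  private final Runnable initializer;
  private volatile boolean initialized = false;

  LazyInit(Runnable initializer)
  {
    this.initializer = initializer;
  }

  // Runs the expensive setup on first use; later calls only read the volatile flag.
  void ensureInitialized()
  {
    if (!initialized) {
      synchronized (this) {
        if (!initialized) {
          initializer.run();
          initialized = true;
        }
      }
    }
  }

  public static void main(String[] args)
  {
    final AtomicInteger setupRuns = new AtomicInteger();
    final LazyInit setup = new LazyInit(setupRuns::incrementAndGet);
    System.out.println("after construction: " + setupRuns.get());
    // For example, called at the start of each code path that needs the library.
    setup.ensureInitialized();
    setup.ensureInitialized();
    System.out.println("after two uses: " + setupRuns.get());
  }
}
```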

10 days ago: ScanQuery: Fix JsonIgnore for isLegacy. (#12674)
Gian Merlino [Sat, 18 Jun 2022 22:55:54 +0000 (15:55 -0700)] 
ScanQuery: Fix JsonIgnore for isLegacy. (#12674)

True, false, and null have different meanings: true/false mean "legacy"
and "not legacy"; null means use the default set by ScanQueryConfig.
So, we need to respect this in the JsonIgnore setup.
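A small sketch of the tri-state semantics: a `null` flag falls back to the configured default, and it must be omitted from serialized JSON so that a round trip still means "use the default". The method names are hypothetical, and the property name `legacy` is an assumption.

```java
final class LegacyFlag
{
  // null means "not specified": fall back to the configured default.
  static boolean resolveLegacy(Boolean legacy, boolean configDefault)
  {
    return legacy != null ? legacy : configDefault;
  }

  // Omit the field when null, so re-reading the JSON still yields null ("use the default")
  // rather than an explicit false.
  static String legacyJsonFragment(Boolean legacy)
  {
    return legacy == null ? "" : "\"legacy\":" + legacy;
  }

  public static void main(String[] args)
  {
    System.out.println(resolveLegacy(null, true));
    System.out.println(resolveLegacy(false, true));
    System.out.println("[" + legacyJsonFragment(null) + "]");
  }
}
```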

11 days ago: Fix self-referential shape inspection in BaseExpressionColumnValueSelector. (#12669)
Gian Merlino [Fri, 17 Jun 2022 23:15:50 +0000 (16:15 -0700)] 
Fix self-referential shape inspection in BaseExpressionColumnValueSelector. (#12669)

* Fix self-referential shape inspection in BaseExpressionColumnValueSelector.

The new test would throw StackOverflowError on the old code.

* Restore prior test.

11 days ago: split out null value index (#12627)
Clint Wylie [Fri, 17 Jun 2022 22:29:23 +0000 (15:29 -0700)] 
split out null value index (#12627)

* split out null value index

* gg spotbugs

* fix stuff

12 days ago: Remove null and empty fields from native queries (#12634) [12668/head]
Paul Rogers [Thu, 16 Jun 2022 21:07:25 +0000 (14:07 -0700)] 
Remove null and empty fields from native queries (#12634)

* Remove null and empty fields from native queries

* Test fixes

* Attempted IT fix.

* Revisions from review comments

* Build fixes resulting from changes suggested by reviews

* IT fix for changed segment size

12 days ago: Segments doc update (#12344)
Jill Osborne [Thu, 16 Jun 2022 20:25:17 +0000 (21:25 +0100)] 
Segments doc update (#12344)

* Corrected heading levels in segments doc

* IMPLY-18394: Updated Segments doc

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update segments.md

* Updated links to changed headings in Segments doc

* Corrected spelling error

* Update segments.md

Incorporated suggestions from Paul Rogers.

* Update index.md

* Update segments.md

* Update segments.md

* Update segments.md

* Update compaction.md

* Update docs/design/segments.md

fix typo

* Update docs/ingestion/compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

12 days ago: Optimize overlord GET /tasks memory usage (#12404)
AmatyaAvadhanula [Thu, 16 Jun 2022 17:00:37 +0000 (22:30 +0530)] 
Optimize overlord GET /tasks memory usage (#12404)

The web console (indirectly) calls the Overlord's GET tasks API to fetch a summary of tasks, which in turn queries the metadata tasks table. This query fetches several columns, including the payload, for all rows at once. This introduces significant memory overhead and can make the Overlord unresponsive, or cause it to fail, when the ingestion tab is opened multiple times (due to several parallel calls to this API).

Another thing to note is that the task table (the payload column in particular) can be very large. Extracting large payloads from such tables can be very slow, leading to a slow UI. While fixing the memory pressure in the Overlord, we can also fix the UI slowness caused by fetching large payloads from the table. Fetching large payloads also puts pressure on the metadata store, as reported by the community (apache/druid issue #12318, "Metadata store query performance degrades as the tasks in druid_tasks table grows").

The task summaries returned by the API are several times smaller and fit comfortably in memory. So there is an opportunity to fix the memory usage, the slow ingestion tab, and the pressure on the metadata store by removing the need to handle large payloads in as many layers as we can. Of course, the solution becomes more complex as we try to fix more layers. With that in mind, this page captures two approaches, which vary in complexity and in the degree to which they fix the problems above.

12 days ago: Add a builder class for TestDruidCoordinatorConfig (#12624)
Lucas Capistrant [Thu, 16 Jun 2022 14:11:31 +0000 (09:11 -0500)] 
Add a builder class for TestDruidCoordinatorConfig (#12624)

* Add a builder class for TestDruidCoordinatorConfig

* updates after review

* Fix formatting

13 days ago: Update screenshots for Druid console doc (#12593)
Victoria Lim [Wed, 15 Jun 2022 23:42:20 +0000 (16:42 -0700)] 
Update screenshots for Druid console doc (#12593)

* druid console doc updates

* remove extra image

* Apply suggestions from code review

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* updated screenshot labels

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>

13 days ago: ForkingTaskRunner: Set ActiveProcessorCount for tasks. (#12592)
Gian Merlino [Wed, 15 Jun 2022 22:56:32 +0000 (15:56 -0700)] 
ForkingTaskRunner: Set ActiveProcessorCount for tasks. (#12592)

* ForkingTaskRunner: Set ActiveProcessorCount for tasks.

This prevents various automatically-sized thread pools from being unreasonably
large (we don't want each task to size its pools as if it is the only thing on
the entire machine).
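A sketch of how a forking runner could cap each task's view of the machine via the HotSpot flag `-XX:ActiveProcessorCount`, which overrides the processor count reported by `Runtime.availableProcessors()` in the forked JVM. The class is hypothetical, and the per-task policy (available processors divided by worker capacity, at least 1) is an assumption, not necessarily what `ForkingTaskRunner` does.

```java
import java.util.ArrayList;
import java.util.List;

final class ForkedTaskJvmArgs
{
  // Assumed policy for illustration: share the machine's processors evenly across task slots.
  static int processorsPerTask(int availableProcessors, int workerCapacity)
  {
    if (workerCapacity <= 0) {
      throw new IllegalArgumentException("workerCapacity must be positive");
    }
    return Math.max(1, availableProcessors / workerCapacity);
  }

  // Automatically sized thread pools in the child JVM then size themselves from this value.
  static List<String> jvmArgs(int availableProcessors, int workerCapacity)
  {
    final List<String> jvmArgs = new ArrayList<>();
    jvmArgs.add("-XX:ActiveProcessorCount=" + processorsPerTask(availableProcessors, workerCapacity));
    return jvmArgs;
  }

  public static void main(String[] args)
  {
    System.out.println(jvmArgs(Runtime.getRuntime().availableProcessors(), 4));
  }
}
```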

* Fix tests.

* Add missing LifecycleStart annotation.

* ForkingTaskRunner needs ManageLifecycle.

13 days ago: Clean up query contexts (#12633)
Paul Rogers [Wed, 15 Jun 2022 18:31:22 +0000 (11:31 -0700)] 
Clean up query contexts (#12633)

* Clean up query contexts

Uses constants in place of literal strings for context keys.
Moves some QueryContext methods to QueryContexts for reuse.

* Revisions from review comments

2 weeks ago: Support LoadScope for Peons + Access Modifier Updates (#12640)
Rohan Garg [Wed, 15 Jun 2022 04:52:50 +0000 (10:22 +0530)] 
Support LoadScope for Peons + Access Modifier Updates (#12640)

* Support LoadScope for Peons

* Update access modifiers for GroupByEngineV2

2 weeks ago: NettyHttpClient: Fix double-return on certain exceptions. (#12626)
Gian Merlino [Wed, 15 Jun 2022 04:40:47 +0000 (21:40 -0700)] 
NettyHttpClient: Fix double-return on certain exceptions. (#12626)

The "exceptionCaught" handler may get called multiple times. We should
only return the channel to the pool the first time. Returning it more
than once leads to a warning like "Resource at key[%s] was returned
multiple times?"
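A minimal sketch of a return-once guard using `AtomicBoolean.compareAndSet`. `ReturnOnceHandler` is hypothetical and does not use Netty's handler API; the `Runnable` stands in for returning the channel to the pool.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class ReturnOnceHandler
{
  private final AtomicBoolean returned = new AtomicBoolean(false);
  private final Runnable returnChannelToPool;

  ReturnOnceHandler(Runnable returnChannelToPool)
  {
    this.returnChannelToPool = returnChannelToPool;
  }

  // May be invoked several times for one request; only the first call wins the CAS.
  void exceptionCaught(Throwable cause)
  {
    if (returned.compareAndSet(false, true)) {
      returnChannelToPool.run();
    }
    // Later calls fall through without touching the pool again.
  }

  public static void main(String[] args)
  {
    final AtomicInteger returns = new AtomicInteger();
    final ReturnOnceHandler handler = new ReturnOnceHandler(returns::incrementAndGet);
    handler.exceptionCaught(new RuntimeException("first"));
    handler.exceptionCaught(new RuntimeException("second"));
    System.out.println("channel returned " + returns.get() + " time(s)");
  }
}
```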

2 weeks ago: Add QoSFilters first in the chain. (#12625)
Gian Merlino [Tue, 14 Jun 2022 20:37:00 +0000 (13:37 -0700)] 
Add QoSFilters first in the chain. (#12625)

* Add QoSFilters first in the chain.

When a request is suspended and later resumed due to QoS constraints,
its filter chain is restarted. Placing QoSFilters first in the chain
avoids double-execution of other filters.

Fixes an issue where requests deferred by QoS would report 403 Forbidden
due to double-execution of SecuritySanityCheckFilter.
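A toy simulation of why filter order matters when a suspended request restarts its chain. It does not use Jetty's filter API: filters are just names, the class is hypothetical, and the filter named "QoS" suspends the request on its first visit.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FilterOrderDemo
{
  // Runs the chain; on resume after a QoS suspension, the whole chain starts again from the top.
  static Map<String, Integer> run(List<String> chain)
  {
    final Map<String, Integer> executions = new HashMap<>();
    boolean suspendedOnce = false;
    boolean completed = false;
    while (!completed) {
      completed = true;
      for (String filter : chain) {
        executions.merge(filter, 1, Integer::sum);
        if ("QoS".equals(filter) && !suspendedOnce) {
          // Suspend this pass; the request will be resumed and the chain restarted.
          suspendedOnce = true;
          completed = false;
          break;
        }
      }
    }
    return executions;
  }

  public static void main(String[] args)
  {
    System.out.println(run(List.of("SecuritySanityCheck", "QoS", "Resource")));
    System.out.println(run(List.of("QoS", "SecuritySanityCheck", "Resource")));
  }
}
```

With QoS second, `SecuritySanityCheck` runs twice; with QoS first, every other filter runs once.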

* Smaller changes.

* Add QoS filters in BaseJettyTest.

* Remove unused parameter.

2 weeks ago: NettyHttpClient: Replace ReadTimeoutException with our own exception. (#12635)
Gian Merlino [Tue, 14 Jun 2022 20:34:46 +0000 (13:34 -0700)] 
NettyHttpClient: Replace ReadTimeoutException with our own exception. (#12635)

* NettyHttpClient: Replace ReadTimeoutException with our own exception.

* Replace exception with same type.

* Remove unused import.

2 weeks ago: Web console: totalNumMergeTasks can be set on range also (#12648)
Vadim Ogievetsky [Tue, 14 Jun 2022 18:18:17 +0000 (11:18 -0700)] 
Web console: totalNumMergeTasks can be set on range also (#12648)

* totalNumMergeTasks can be set on range also

* fix formatting

2 weeks ago: Fix version in master (#12644)
Atul Mohan [Tue, 14 Jun 2022 06:02:46 +0000 (23:02 -0700)] 
Fix version in master (#12644)

2 weeks ago: Push join build table values as filter in case of duplicates (#12225)
Rohan Garg [Tue, 14 Jun 2022 00:18:27 +0000 (05:48 +0530)] 
Push join build table values as filter in case of duplicates (#12225)

* Push join build table values as filter

* Add tests for JoinableFactoryWrapper

* fixup! Push join build table values as filter

* fixup! Add tests for JoinableFactoryWrapper

* fixup! Push join build table values as filter

2 weeks ago: fix: update footer copyright year (#12594)
317brian [Mon, 13 Jun 2022 23:29:58 +0000 (16:29 -0700)] 
fix: update footer copyright year (#12594)

2 weeks ago: Update node to 14.19.3. (#12632)
Gian Merlino [Fri, 10 Jun 2022 17:18:12 +0000 (10:18 -0700)] 
Update node to 14.19.3. (#12632)

2 weeks ago: Docs for automatic compaction (#12569)
Victoria Lim [Thu, 9 Jun 2022 21:55:12 +0000 (14:55 -0700)] 
Docs for automatic compaction (#12569)

* docs for auto-compaction

* fix broken links

* another link

* Apply suggestions from code review

Co-authored-by: Suneet Saldanha <suneet@apache.org>
* Apply suggestions from code review

Co-authored-by: Suneet Saldanha <suneet@apache.org>
* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Suneet Saldanha <suneet@apache.org>
* reorg content for skipOffset

* Update docs/ingestion/automatic-compaction.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Apply suggestions from code review

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Co-authored-by: Suneet Saldanha <suneet@apache.org>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

3 weeks ago: Use DefaultQueryConfig in SqlLifecycle to correctly populate request logs (#12613)
TSFenwick [Wed, 8 Jun 2022 07:22:50 +0000 (00:22 -0700)] 
Use DefaultQueryConfig in SqlLifecycle to correctly populate request logs (#12613)

Fixes an issue where SQL query request logs do not include the default query context
values set via `druid.query.default.context.xyz` runtime properties.

# Change summary
* Inject `DefaultQueryConfig` into `SqlLifecycleFactory`
* Add params from `DefaultQueryConfig` to the query context in `SqlLifecycle`

# Description
- This change does not affect query execution, because the
  `DefaultQueryConfig` was already being used in `QueryLifecycle`,
  which is initialized when the SQL is translated to a native query.
- This also handles any potential use case where a context parameter should be
  handled at the SQL stage itself.
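A sketch of the precedence implied above: start from the configured defaults and let keys the user supplied in the query override them. `ContextMerge` is a hypothetical helper and the keys are sample values, not the actual `SqlLifecycle` code.

```java
import java.util.HashMap;
import java.util.Map;

final class ContextMerge
{
  // `defaults` stands in for values configured via druid.query.default.context.* properties.
  static Map<String, Object> effectiveContext(Map<String, Object> defaults, Map<String, Object> userContext)
  {
    final Map<String, Object> merged = new HashMap<>(defaults);
    merged.putAll(userContext); // user-supplied values win over defaults
    return merged;
  }

  public static void main(String[] args)
  {
    final Map<String, Object> defaults = Map.of("priority", 0, "timeout", 300_000);
    final Map<String, Object> user = Map.of("priority", 10);
    System.out.println(effectiveContext(defaults, user));
  }
}
```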

3 weeks ago: SqlSegmentsMetadataQuery: Fix OVERLAPS for wide target segments. (#12600)
Gian Merlino [Tue, 7 Jun 2022 18:33:46 +0000 (11:33 -0700)] 
SqlSegmentsMetadataQuery: Fix OVERLAPS for wide target segments. (#12600)

* SqlSegmentsMetadataQuery: Fix OVERLAPS for wide target segments.

Segments with endpoints prior to year 0 or after year 9999 may overlap
the search intervals but not match the generated SQL conditions. So, we
need to add an additional OR condition to catch these.

I checked a real, live MySQL metadata store to confirm that the query
still uses metadata store indexes. It does.
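One way wide segments can slip past string-based conditions, assuming interval endpoints are stored as ISO 8601 text, compared as text, and years above 9999 are written with five digits: text order and chronological order then disagree. The class name is hypothetical.

```java
import java.time.LocalDate;

final class IsoStringOrdering
{
  // Plain lexicographic comparison, as a string-typed SQL condition would do.
  static boolean textuallyAfter(String a, String b)
  {
    return a.compareTo(b) > 0;
  }

  public static void main(String[] args)
  {
    final String wideSegmentEnd = "10000-01-01T00:00:00.000Z";
    final String searchStart = "2022-01-01T00:00:00.000Z";
    // Chronologically the segment ends after the search interval starts...
    System.out.println(LocalDate.of(10000, 1, 1).isAfter(LocalDate.of(2022, 1, 1))); // true
    // ...but a text condition like "end > :start" fails, because '1' sorts before '2'.
    System.out.println(textuallyAfter(wideSegmentEnd, searchStart)); // false
  }
}
```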

* Add comments.

3 weeks ago: Add remedial information in error message when type is unknown (#12612)
Abhishek Agarwal [Tue, 7 Jun 2022 14:52:45 +0000 (20:22 +0530)] 
Add remedial information in error message when type is unknown (#12612)

Users often submit queries and ingestion specs that work only if the relevant extension is loaded. When it isn't, the resulting error is too technical and doesn't suggest checking for missing extensions. This PR modifies the error message so users can at least check their settings before assuming that the error is caused by a bug.

3 weeks ago: Add validation for invalid partitioned by granularities (#12589)
Laksh Singla [Mon, 6 Jun 2022 16:30:29 +0000 (22:00 +0530)] 
Add validation for invalid partitioned by granularities (#12589)

* Add validation for invalid partitioned by granularities

* review comments

* improve error message, change location of the method

* remove imports

* use StringUtils.lowercase

Co-authored-by: Adarsh Sanjeev <adarshsanjeev@gmail.com>

3 weeks ago: Improve SQL validation error messages (#12611)
Adarsh Sanjeev [Mon, 6 Jun 2022 10:44:28 +0000 (16:14 +0530)] 
Improve SQL validation error messages (#12611)

Update the SQL validation error message to specify whether
the ingest is INSERT or REPLACE for better user experience.

3 weeks ago: CompressionStrategyTest: Fix thread-unsafe Closer usage. (#12605)
Gian Merlino [Sat, 4 Jun 2022 17:57:13 +0000 (10:57 -0700)] 
CompressionStrategyTest: Fix thread-unsafe Closer usage. (#12605)

Closer is not thread-safe, so we need one per thread in the
concurrency tests.
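A sketch of the "one closer per thread" approach, using a hypothetical `SimpleCloser` (not Guava's or Druid's `Closer`) backed by an unsynchronized `ArrayDeque`. Each worker task creates and closes its own instance, so no closer is ever touched by two threads.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class SimpleCloser implements Closeable
{
  private final Deque<Closeable> stack = new ArrayDeque<>(); // not thread-safe

  void register(Closeable resource)
  {
    stack.push(resource);
  }

  // Closes resources in reverse registration order, keeping the first failure.
  @Override
  public void close() throws IOException
  {
    IOException first = null;
    while (!stack.isEmpty()) {
      try {
        stack.pop().close();
      }
      catch (IOException e) {
        if (first == null) {
          first = e;
        } else {
          first.addSuppressed(e);
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }

  // Each concurrent task creates its own closer instead of sharing one across threads.
  static int runWorkers(int threads, int resourcesPerThread) throws Exception
  {
    final AtomicInteger closedCount = new AtomicInteger();
    final ExecutorService pool = Executors.newFixedThreadPool(threads);
    final List<Future<?>> futures = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      futures.add(pool.submit(() -> {
        try (SimpleCloser closer = new SimpleCloser()) {
          for (int i = 0; i < resourcesPerThread; i++) {
            closer.register(closedCount::incrementAndGet);
          }
        }
        return null; // a Callable, so close() may throw IOException
      }));
    }
    for (Future<?> future : futures) {
      future.get();
    }
    pool.shutdown();
    return closedCount.get();
  }

  public static void main(String[] args) throws Exception
  {
    System.out.println("closed resources: " + runWorkers(8, 100));
  }
}
```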

3 weeks ago: Add caching and CSP response headers. (#12609)
Gian Merlino [Sat, 4 Jun 2022 16:16:49 +0000 (09:16 -0700)] 
Add caching and CSP response headers. (#12609)

* Add caching and CSP response headers.

* Fix tests.

* Fix checkstyle issues

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

3 weeks ago: fix typo (#12607)
Victoria Lim [Sat, 4 Jun 2022 05:14:18 +0000 (22:14 -0700)] 
fix typo (#12607)

3 weeks ago: Service stdout log files, move logs to log/. (#12570)
Gian Merlino [Fri, 3 Jun 2022 05:14:29 +0000 (22:14 -0700)] 
Service stdout log files, move logs to log/. (#12570)

* Service stdout log files, move logs to log/.

Two changes that make log behavior cleaner:

1) Redirect messages from the Java runtime to their own log files.
   Otherwise, they would get jumbled up in the output of the all-in-one
   start command.

2) Use log/ instead of bin/log/ for the default log directory. This makes
   the logs easier to find.

Additionally, add documentation about how to avoid the reflective
access warnings in Java 11.

* Spelling.

* See if code formatting affects spelling.

3 weeks ago: Addition to Multitenancy considerations doc (#12567)
Jill Osborne [Thu, 2 Jun 2022 17:32:14 +0000 (18:32 +0100)] 
Addition to Multitenancy considerations doc (#12567)

* Small addition to Multitenancy considerations doc

* Update docs/querying/multitenancy.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update multitenancy.md

Edit suggested by @kfaraz

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

3 weeks ago: Bump eventsource from 1.1.0 to 1.1.1 in /web-console (#12595)
dependabot[bot] [Thu, 2 Jun 2022 05:04:30 +0000 (22:04 -0700)] 
Bump eventsource from 1.1.0 to 1.1.1 in /web-console (#12595)

Bumps [eventsource](https://github.com/EventSource/eventsource) from 1.1.0 to 1.1.1.
- [Release notes](https://github.com/EventSource/eventsource/releases)
- [Changelog](https://github.com/EventSource/eventsource/blob/master/HISTORY.md)
- [Commits](https://github.com/EventSource/eventsource/compare/v1.1.0...v1.1.1)

---
updated-dependencies:
- dependency-name: eventsource
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

3 weeks ago: Bump eventsource from 1.0.7 to 1.1.1 in /website (#12596)
dependabot[bot] [Thu, 2 Jun 2022 05:04:04 +0000 (22:04 -0700)] 
Bump eventsource from 1.0.7 to 1.1.1 in /website (#12596)

Bumps [eventsource](https://github.com/EventSource/eventsource) from 1.0.7 to 1.1.1.
- [Release notes](https://github.com/EventSource/eventsource/releases)
- [Changelog](https://github.com/EventSource/eventsource/blob/master/HISTORY.md)
- [Commits](https://github.com/EventSource/eventsource/compare/v1.0.7...v1.1.1)

---
updated-dependencies:
- dependency-name: eventsource
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

3 weeks ago: fix regression with ipv4_match and prefixes (#12542)
Clint Wylie [Wed, 1 Jun 2022 21:03:08 +0000 (14:03 -0700)] 
fix regression with ipv4_match and prefixes (#12542)

* fix issue with ipv4_match and prefixes

3 weeks ago: Bump lodash from 4.17.15 to 4.17.21 in /website (#12409)
dependabot[bot] [Wed, 1 Jun 2022 20:56:22 +0000 (13:56 -0700)] 
Bump lodash from 4.17.15 to 4.17.21 in /website (#12409)

Bumps [lodash](https://github.com/lodash/lodash) from 4.17.15 to 4.17.21.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/compare/4.17.15...4.17.21)

---
updated-dependencies:
- dependency-name: lodash
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

3 weeks ago: Bump opentelemetry-instrumentation-bom-alpha (#12531)
dependabot[bot] [Wed, 1 Jun 2022 20:51:39 +0000 (13:51 -0700)] 
Bump opentelemetry-instrumentation-bom-alpha (#12531)

Bumps [opentelemetry-instrumentation-bom-alpha](https://github.com/open-telemetry/opentelemetry-java-instrumentation) from 1.7.0-alpha to 1.14.0-alpha.
- [Release notes](https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-java-instrumentation/commits)

---
updated-dependencies:
- dependency-name: io.opentelemetry.instrumentation:opentelemetry-instrumentation-bom-alpha
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

3 weeks ago: fix backwards compatibility for explicit null columns (#12585)
Clint Wylie [Wed, 1 Jun 2022 19:39:48 +0000 (12:39 -0700)] 
fix backwards compatibility for explicit null columns (#12585)

3 weeks ago: Suppress CVEs (#12590)
AmatyaAvadhanula [Wed, 1 Jun 2022 15:52:32 +0000 (21:22 +0530)] 
Suppress CVEs (#12590)

4 weeks ago: fix test comment (#12584)
Clint Wylie [Tue, 31 May 2022 19:39:20 +0000 (12:39 -0700)] 
fix test comment (#12584)

4 weeks ago: fix compression-strategy-test (#12575)
Clint Wylie [Tue, 31 May 2022 18:48:32 +0000 (11:48 -0700)] 
fix compression-strategy-test (#12575)

fixes an issue caused by a test modification in #12408 that was closing buffers allocated by the compression strategy instead of allowing the closer to do it

4 weeks ago: RowBasedColumnSelectorFactory: Add "useStringValueOfNullInLists" parameter. (#12578)
Gian Merlino [Tue, 31 May 2022 18:38:56 +0000 (11:38 -0700)] 
RowBasedColumnSelectorFactory: Add "useStringValueOfNullInLists" parameter. (#12578)

RowBasedColumnSelectorFactory inherited strange behavior from
Rows.objectToStrings for nulls that appear in lists: instead of being
left as a null, it is replaced with the string "null". Some callers may
need compatibility with this strange behavior, but it should be opt-in.

Query-time call sites are changed to opt out of this behavior, since it
is not consistent with query-time expectations. The IncrementalIndex
ingestion-time call site retains the old behavior, as this is traditionally
when Rows.objectToStrings would be used.
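A sketch of the two behaviors, with a hypothetical `toStrings` helper whose flag mirrors the parameter name from this commit: `String.valueOf` turns a `null` element into the four-character string "null", while the other mode keeps the `null`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NullsInLists
{
  // When useStringValueOfNullInLists is true, a null element becomes "null" (the old behavior).
  static List<String> toStrings(List<?> values, boolean useStringValueOfNullInLists)
  {
    final List<String> out = new ArrayList<>(values.size());
    for (Object value : values) {
      if (value == null && !useStringValueOfNullInLists) {
        out.add(null);
      } else {
        out.add(String.valueOf(value)); // String.valueOf((Object) null) returns "null"
      }
    }
    return out;
  }

  public static void main(String[] args)
  {
    final List<Object> row = Arrays.asList("a", null, 3);
    System.out.println(toStrings(row, true));  // middle element is the string "null"
    System.out.println(toStrings(row, false)); // middle element is a real null
  }
}
```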

4 weeks ago: CompressionUtils: Increase gzip buffer size. (#12579)
Gian Merlino [Tue, 31 May 2022 18:38:13 +0000 (11:38 -0700)] 
CompressionUtils: Increase gzip buffer size. (#12579)

4 weeks ago: Add RowIdSupplier to ColumnSelectorFactory. (#12577)
Gian Merlino [Tue, 31 May 2022 18:38:03 +0000 (11:38 -0700)] 
Add RowIdSupplier to ColumnSelectorFactory. (#12577)

* Add RowIdSupplier to ColumnSelectorFactory.

This enables virtual columns to cache their outputs in case they are
called multiple times on the same underlying row. This is common for
numeric selectors, where the common pattern is to call isNull() and
then follow with getLong(), getFloat(), or getDouble(). Here, output
caching reduces the number of expression evals by half.
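A simplified model of per-row output caching. `RowIdSupplier` and `CachingDoubleSelector` here are hypothetical stand-ins, not Druid's interfaces, and `NaN` is a toy null marker. Calling `isNull()` and then `getDouble()` on the same row evaluates the expression once instead of twice.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongToDoubleFunction;

final class RowIdCaching
{
  interface RowIdSupplier
  {
    long getRowId();
  }

  static final class CachingDoubleSelector
  {
    private static final long NO_ROW = Long.MIN_VALUE; // assumes real row ids never use this value

    private final RowIdSupplier rowIdSupplier;
    private final LongToDoubleFunction expression; // stands in for evaluating the expression on a row
    private long cachedRowId = NO_ROW;
    private double cachedValue;
    int evaluations = 0;

    CachingDoubleSelector(RowIdSupplier rowIdSupplier, LongToDoubleFunction expression)
    {
      this.rowIdSupplier = rowIdSupplier;
      this.expression = expression;
    }

    private double compute()
    {
      final long rowId = rowIdSupplier.getRowId();
      if (rowId != cachedRowId) {
        // Only re-evaluate when the underlying row has changed.
        cachedValue = expression.applyAsDouble(rowId);
        cachedRowId = rowId;
        evaluations++;
      }
      return cachedValue;
    }

    boolean isNull()
    {
      return Double.isNaN(compute()); // toy convention: NaN represents null in this sketch
    }

    double getDouble()
    {
      return compute();
    }
  }

  public static void main(String[] args)
  {
    final AtomicLong currentRow = new AtomicLong();
    final CachingDoubleSelector selector = new CachingDoubleSelector(currentRow::get, id -> id * 1.5);
    double sum = 0;
    for (long row = 0; row < 4; row++) {
      currentRow.set(row);
      if (!selector.isNull()) {       // the usual pattern: null check first...
        sum += selector.getDouble();  // ...then read the value for the same row
      }
    }
    System.out.println("sum=" + sum + ", evaluations=" + selector.evaluations);
  }
}
```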

* Fix tests.

4 weeks ago: fix virtual column cycle bug, sql virtual column optimize bug (#12576)
Clint Wylie [Tue, 31 May 2022 06:51:21 +0000 (23:51 -0700)] 
fix virtual column cycle bug, sql virtual column optimize bug (#12576)

* fix virtual column cycle bug, sql virtual column optimize bug

* more test

4 weeks ago: Adding zstandard compression library (#12408)
Dr. Sizzles [Sun, 29 May 2022 00:01:44 +0000 (17:01 -0700)] 
Adding zstandard compression library (#12408)

* Adding zstandard compression library

* 1. Took @clintropolis's advice to have ZStandard decompressor use the byte array when the buffers are not direct.
2. Cleaned up checkstyle issues.

* Fixing zstandard version to latest stable version in POMs and updating license files

* Removing zstd from benchmarks and adding to processing (POMs)

* fix the IntelliJ inspection issue

* Removing the prefix v for the version in the license check for zstd

* Fixing license checks

Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>

4 weeks ago: Upgrade ORC to 1.7.4 (#12572)
Dongjoon Hyun [Sat, 28 May 2022 12:14:36 +0000 (05:14 -0700)] 
Upgrade ORC to 1.7.4 (#12572)

This commit upgrades Apache ORC library from 1.7.2 to 1.7.4.
Apache ORC 1.7.4 is the maintenance release with the following bug fixes.

https://orc.apache.org/news/2022/04/15/ORC-1.7.4/
https://github.com/apache/orc/releases/tag/v1.7.4

5 weeks ago: make query context changes backwards compatible (#12564)
Clint Wylie [Wed, 25 May 2022 09:54:41 +0000 (02:54 -0700)] 
make query context changes backwards compatible (#12564)

Adds a default implementation of getQueryContext, which was added to the Query interface in #12396. Query is marked with @ExtensionPoint, and lately we have been trying to be less volatile on these interfaces by providing default implementations to be more chill for extension writers.

The way this default implementation is done in this PR is a bit strange due to the way that getQueryContext is used (mutated with system default and system generated keys); the default implementation has a specific object that it returns, and I added another temporary default method isLegacyContext that checks if the getQueryContext returns that object or not. If not, callers fall back to using getContext and withOverriddenContext to set these default and system values.

I am open to other ideas as well, but this approach should at least work without exploding, and I added some tests to ensure that it is wired up correctly for QueryLifecycle, including the context authorization stuff.

The added test shows the strange behavior if query context authorization is enabled, mainly that the system default and system generated query context keys also need to be granted as permissions for things to function correctly. This is not great, so I mentioned it in the javadocs as well. Not sure if it needs to be called out anywhere else.
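A simplified, hypothetical model of the sentinel approach: the interface gains `getQueryContext()` with a default that returns a shared sentinel, `isLegacyContext()` detects that sentinel by identity, and callers fall back to `getContext()`/`withOverriddenContext()`. Contexts are plain maps here, whereas Druid uses a `QueryContext` class, and none of these type names are Druid's.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class DefaultsApplier
{
  // Adds system defaults without overwriting values the user set explicitly.
  static QuerySketch applyDefaults(QuerySketch query, Map<String, Object> systemDefaults)
  {
    if (query.isLegacyContext()) {
      // Fallback for implementations that predate getQueryContext().
      final Map<String, Object> merged = new HashMap<>(systemDefaults);
      merged.putAll(query.getContext());
      return query.withOverriddenContext(merged);
    }
    // New path: mutate the query context object in place.
    systemDefaults.forEach(query.getQueryContext()::putIfAbsent);
    return query;
  }

  public static void main(String[] args)
  {
    final Map<String, Object> defaults = Map.of("priority", 0, "timeout", 300_000);
    final QuerySketch oldQ = applyDefaults(new OldStyleQuery(Map.of("priority", 5)), defaults);
    final QuerySketch newQ = applyDefaults(new NewStyleQuery(Map.of("priority", 5)), defaults);
    System.out.println(oldQ.getContext() + " / " + newQ.getContext());
  }
}

interface QuerySketch
{
  // Shared sentinel returned by the default getQueryContext(); detected by identity.
  Map<String, Object> LEGACY_SENTINEL = Collections.unmodifiableMap(new HashMap<>());

  Map<String, Object> getContext();

  QuerySketch withOverriddenContext(Map<String, Object> overrides);

  // Added later with a default, so existing extension implementations keep compiling.
  default Map<String, Object> getQueryContext()
  {
    return LEGACY_SENTINEL;
  }

  // Temporary helper: true when the implementation did not override getQueryContext().
  default boolean isLegacyContext()
  {
    return getQueryContext() == LEGACY_SENTINEL;
  }
}

// An older extension query that implements only the original methods.
class OldStyleQuery implements QuerySketch
{
  final Map<String, Object> context;

  OldStyleQuery(Map<String, Object> context)
  {
    this.context = new HashMap<>(context);
  }

  @Override
  public Map<String, Object> getContext()
  {
    return context;
  }

  @Override
  public QuerySketch withOverriddenContext(Map<String, Object> overrides)
  {
    final Map<String, Object> merged = new HashMap<>(context);
    merged.putAll(overrides);
    return new OldStyleQuery(merged);
  }
}

// A newer query that exposes its mutable context through getQueryContext().
final class NewStyleQuery extends OldStyleQuery
{
  NewStyleQuery(Map<String, Object> context)
  {
    super(context);
  }

  @Override
  public QuerySketch withOverriddenContext(Map<String, Object> overrides)
  {
    final Map<String, Object> merged = new HashMap<>(context);
    merged.putAll(overrides);
    return new NewStyleQuery(merged);
  }

  @Override
  public Map<String, Object> getQueryContext()
  {
    return context;
  }
}
```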

5 weeks ago: object[] handling for DimensionHandlers for arrays (#12552)
Karan Kumar [Wed, 25 May 2022 09:54:18 +0000 (15:24 +0530)] 
object[] handling for DimensionHandlers for arrays (#12552)

Description
Fixes a bug when running queries like:

SELECT cntarray,
       Count(*)
FROM   (SELECT dim1,
               dim2,
               Array_agg(cnt) AS cntarray
        FROM   (SELECT dim1,
                       dim2,
                       dim3,
                       Count(*) AS cnt
                FROM   foo
                GROUP  BY 1,
                          2,
                          3)
        GROUP  BY 1,
                  2)
GROUP  BY 1

This generates an error:

org.apache.druid.java.util.common.ISE: Unable to convert type [Ljava.lang.Object; to org.apache.druid.segment.data.ComparableList
        at org.apache.druid.segment.DimensionHandlerUtils.convertToList(DimensionHandlerUtils.java:405) ~[druid-xx]

Because it is an array of numbers, it appears to go through the convertToList call, which looks like:

  @Nullable
  public static ComparableList convertToList(Object obj)
  {
    if (obj == null) {
      return null;
    }
    if (obj instanceof List) {
      return new ComparableList((List) obj);
    }
    if (obj instanceof ComparableList) {
      return (ComparableList) obj;
    }
    throw new ISE("Unable to convert type %s to %s", obj.getClass().getName(), ComparableList.class.getName());
  }

That is, it doesn't know about arrays. This PR adds the array handling.
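A sketch of the kind of array handling this PR adds, using a hypothetical `ComparableListSketch` in place of `ComparableList` and `IllegalStateException` in place of Druid's `ISE`. The exact change in Druid may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ConvertToListSketch
{
  // Mirrors the shape of convertToList above, plus the missing Object[] case.
  static ComparableListSketch convertToList(Object obj)
  {
    if (obj == null) {
      return null;
    }
    if (obj instanceof List) {
      return new ComparableListSketch((List<?>) obj);
    }
    if (obj instanceof ComparableListSketch) {
      return (ComparableListSketch) obj;
    }
    if (obj instanceof Object[]) {
      // Object[] values (as in the error above) are wrapped as a list.
      // Primitive arrays such as long[] are not Object[] and still fall through.
      return new ComparableListSketch(Arrays.asList((Object[]) obj));
    }
    throw new IllegalStateException(
        String.format("Unable to convert type %s to %s", obj.getClass().getName(), ComparableListSketch.class.getName())
    );
  }

  public static void main(String[] args)
  {
    System.out.println(convertToList(new Object[]{1L, 2L, 3L}).values);
  }
}

// Simplified stand-in for ComparableList: just wraps a copy of the values.
final class ComparableListSketch
{
  final List<Object> values;

  ComparableListSketch(List<?> values)
  {
    this.values = new ArrayList<>(values);
  }
}
```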

5 weeks ago: Suppress false CVE on druid-indexing-hadoop artifact (#12562)
Abhishek Agarwal [Tue, 24 May 2022 10:30:58 +0000 (16:00 +0530)] 
Suppress false CVE on druid-indexing-hadoop artifact (#12562)

5 weeks ago: Use a different repository to download sigar artifacts. (#12561)
Abhishek Agarwal [Tue, 24 May 2022 09:12:51 +0000 (14:42 +0530)] 
Use a different repository to download sigar artifacts. (#12561)

5 weeks ago: Emit state of replace and append for native batch tasks (#12488)
Agustin Gonzalez [Mon, 23 May 2022 19:32:47 +0000 (12:32 -0700)] 
Emit state of replace and append for native batch tasks (#12488)

* Emit state of replace and append for native batch tasks

* Emit count of one depending on batch ingestion mode (APPEND, OVERWRITE, REPLACE)

* Add metric to compaction job

* Avoid NullPointerException when the emitter is null

* Coverage

* Emit tombstone & segment counts

* Tasks need a type

* Spelling

* Integrate BatchIngestionMode in batch ingestion tasks functionality

* Typos

* Remove batch ingestion type from metric since it is already in a dimension. Move IngestionMode to AbstractTask to facilitate having mode as a dimension. Add metrics to streaming. Add missing coverage.

* Avoid inner class referenced by sub-class inspection. Refactor computation of IngestionMode to make it more robust to null IOConfig and fix test.

* Spelling

* Avoid polluting the Task interface

* Rename computeCompaction methods to avoid an ambiguous-method-call Java compiler error when they are passed null. Other minor cleanup.
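A sketch of how a null-safe ingestion-mode computation could look, assuming the mode is derived from the native batch ioConfig flags appendToExisting and dropExisting. The enum, class, and method names below are hypothetical and only mirror the APPEND / OVERWRITE / REPLACE modes named in the commit message; they are not Druid's actual code.

```java
public class IngestionModeSketch
{
  // Hypothetical enum mirroring the modes listed in the commit message,
  // plus NONE for tasks that have no ioConfig at all.
  public enum IngestionMode
  {
    APPEND, OVERWRITE, REPLACE, NONE
  }

  // Hypothetical minimal view of a batch ioConfig.
  public static final class IOConfig
  {
    private final boolean appendToExisting;
    private final boolean dropExisting;

    public IOConfig(boolean appendToExisting, boolean dropExisting)
    {
      this.appendToExisting = appendToExisting;
      this.dropExisting = dropExisting;
    }
  }

  // Null-safe: a missing ioConfig yields NONE instead of an NPE.
  // In this sketch appendToExisting takes precedence if both flags are set.
  public static IngestionMode computeIngestionMode(IOConfig ioConfig)
  {
    if (ioConfig == null) {
      return IngestionMode.NONE;
    }
    if (ioConfig.appendToExisting) {
      return IngestionMode.APPEND;
    }
    return ioConfig.dropExisting ? IngestionMode.REPLACE : IngestionMode.OVERWRITE;
  }

  public static void main(String[] args)
  {
    // The mode would be attached as a metric dimension; the metric value is a count of one.
    System.out.println(computeIngestionMode(new IOConfig(false, true)));  // REPLACE
    System.out.println(computeIngestionMode(null));                       // NONE
  }
}
```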

5 weeks ago: Add error message for incorrectly ordered clause in sql (#12558)
Adarsh Sanjeev [Mon, 23 May 2022 07:11:18 +0000 (12:41 +0530)] 
Add error message for incorrectly ordered clause in sql (#12558)

When CLUSTERED BY comes before PARTITIONED BY in a SQL query, the error message is a bit confusing.

insert into foo select * from bar clustered by dim1 partitioned by all

Error: SQL parse failed

Encountered "PARTITIONED" at line 1, column 88.

Was expecting one of: <EOF> "," ... "ASC" ... "DESC" ... "NULLS" ... "." ... "NOT" ... "IN" ... "<" ... "<=" ... ">" ... ">=" ... "=" ... "<>" ... "!=" ... "BETWEEN" ... "LIKE" ... "SIMILAR" ... "+" ... "-" ... "*" ... "/" ... "%" ... "||" ... "AND" ... "OR" ... "IS" ... "MEMBER" ... "SUBMULTISET" ... "CONTAINS" ... "OVERLAPS" ... "EQUALS" ... "PRECEDES" ... "SUCCEEDS" ... "IMMEDIATELY" ... "MULTISET" ... "[" ... "FORMAT" ... "(" ...

org.apache.calcite.sql.parser.SqlParseException
This is confusing. A check can be added to throw a more user-friendly message stating that the order of the two clauses should be reversed.

Add error message for incorrectly ordered clause in sql.
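A simplified illustration of the clause-order check, done at the string level. This is a hypothetical sketch, not the Druid parser change: it normalizes whitespace and searches for the keywords, so keywords inside string literals, comments, or quoted identifiers would fool it. The error message text is also hypothetical.

```java
import java.util.Locale;

public class ClauseOrderCheckSketch
{
  // Returns a friendly error message if CLUSTERED BY precedes PARTITIONED BY,
  // otherwise null. Whitespace runs are collapsed so "CLUSTERED   BY" still matches.
  public static String checkClauseOrder(String sql)
  {
    String normalized = sql.toUpperCase(Locale.ROOT).replaceAll("\\s+", " ");
    int clustered = normalized.indexOf("CLUSTERED BY");
    int partitioned = normalized.indexOf("PARTITIONED BY");
    if (clustered >= 0 && partitioned >= 0 && clustered < partitioned) {
      return "CLUSTERED BY found before PARTITIONED BY; specify PARTITIONED BY first";
    }
    return null;
  }

  public static void main(String[] args)
  {
    String bad = "insert into foo select * from bar clustered by dim1 partitioned by all";
    System.out.println(checkClauseOrder(bad));   // prints the error message
    String good = "insert into foo select * from bar partitioned by all clustered by dim1";
    System.out.println(checkClauseOrder(good));  // prints null
  }
}
```

Doing the check inside the grammar, as a real parser would, avoids the false matches that a substring search allows.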

5 weeks ago: Suppress CVEs (#12553)
AmatyaAvadhanula [Mon, 23 May 2022 07:05:23 +0000 (12:35 +0530)] 
Suppress CVEs (#12553)

5 weeks ago: ConcurrentGrouper: Add mergeThreadLocal option, fix bug around the switch to spilling...
Gian Merlino [Sat, 21 May 2022 17:28:54 +0000 (10:28 -0700)] 
ConcurrentGrouper: Add mergeThreadLocal option, fix bug around the switch to spilling. (#12513)

* ConcurrentGrouper: Add option to always slice up merge buffers thread-locally.

Normally, the ConcurrentGrouper shares merge buffers across processing
threads until spilling starts, and then switches to a thread-local model.
This minimizes memory use and reduces likelihood of spilling, which is
good, but it creates thread contention. The new mergeThreadLocal option
causes a query to start in thread-local mode immediately, and allows us
to experiment with the relative performance of the two modes.

* Fix grammar in docs.

* Fix race in ConcurrentGrouper.

* Fix issue with timeouts.

* Remove unused import.

* Add "tradeoff" to dictionary.
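The thread-local mode described for the mergeThreadLocal option comes down to giving each processing thread its own region of the merge buffer. The sketch below shows only that general idea with standard ByteBuffer calls; the class and method names are hypothetical and this is not ConcurrentGrouper's actual code.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class ThreadLocalSlicesSketch
{
  // Splits one buffer into numSlices non-overlapping slices that share its memory,
  // so each thread can write to its own slice without locking.
  // Any remainder bytes (capacity % numSlices) are left unused.
  public static List<ByteBuffer> sliceBuffer(ByteBuffer buffer, int numSlices)
  {
    if (numSlices <= 0) {
      throw new IllegalArgumentException("numSlices must be positive");
    }
    int sliceSize = buffer.capacity() / numSlices;
    List<ByteBuffer> slices = new ArrayList<>(numSlices);
    for (int i = 0; i < numSlices; i++) {
      ByteBuffer dup = buffer.duplicate(); // same content, independent position/limit
      dup.clear();                         // limit = capacity, position = 0
      dup.position(i * sliceSize);
      dup.limit((i + 1) * sliceSize);
      slices.add(dup.slice());             // new buffer covering [position, limit)
    }
    return slices;
  }

  public static void main(String[] args)
  {
    ByteBuffer mergeBuffer = ByteBuffer.allocate(1024);
    List<ByteBuffer> slices = sliceBuffer(mergeBuffer, 4);
    slices.get(1).putInt(0, 42);                 // absolute write at offset 0 of slice 1
    System.out.println(mergeBuffer.getInt(256)); // 42: slices share the parent's memory
  }
}
```

The tradeoff from the commit message shows up directly: each thread only ever sees capacity / numSlices bytes, so spilling can start earlier, but no thread contends with another for space.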

5 weeks ago: Fix zookeeper spelling (#12556)
Katya Macedo [Sat, 21 May 2022 08:14:02 +0000 (03:14 -0500)] 
Fix zookeeper spelling (#12556)

5 weeks ago: update to latest lz4 1.8.0 (#12557)
Clint Wylie [Sat, 21 May 2022 08:02:20 +0000 (01:02 -0700)] 
update to latest lz4 1.8.0 (#12557)

5 weeks ago: Deal with potential cardinality estimate being negative and add logging to hash deter...
Agustin Gonzalez [Fri, 20 May 2022 17:51:06 +0000 (10:51 -0700)] 
Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase (#12443)

* Deal with potential cardinality estimate being negative and add logging

* Fix typo in name

* Refine and minimize logging

* Make it info based on code review

* Create a named constant for the magic number

5 weeks ago: Fix usage of maxColumnsToMerge in auto-compaction tuning config (#12551)
superivaj [Fri, 20 May 2022 16:53:08 +0000 (18:53 +0200)] 
Fix usage of maxColumnsToMerge in auto-compaction tuning config (#12551)

Issue:
Even though `CompactionTuningConfig` allows a `maxColumnsToMerge` config
(to optimize memory usage, particularly for datasources with many dimensions),
the corresponding client object `ClientCompactionTaskQueryTuningConfig`
(used by the coordinator duty `CompactSegments` to trigger auto-compaction)
does not contain this field. Thus, the value of `maxColumnsToMerge` specified
in any datasource compaction config is ignored.

Changes:
- Add field `maxColumnsToMerge` in `ClientCompactionTaskQueryTuningConfig`
  and `UserCompactionTaskQueryTuningConfig`
- Fix tests

5 weeks ago: Direct UTF-8 access for "in" filters. (#12517)
Gian Merlino [Fri, 20 May 2022 08:51:28 +0000 (01:51 -0700)] 
Direct UTF-8 access for "in" filters. (#12517)

* Direct UTF-8 access for "in" filters.

Directly related:

1) InDimFilter: Store sorted Strings (in ValuesSet) plus sorted UTF-8
   ByteBuffers (in valuesUtf8). Use valuesUtf8 whenever possible. If
   necessary, the input set is copied into a ValuesSet. Much logic is
   simplified, because we always know what type the values set will be.
   I think that there won't even be an efficiency loss in most cases.
   InDimFilter is most frequently created by deserialization, and this
   patch updates the JsonCreator constructor to deserialize
   directly into a ValuesSet.

2) Add Utf8ValueSetIndex, which InDimFilter uses to avoid UTF-8 decodes
   during index lookups.

3) Add unsigned comparator to ByteBufferUtils and use it in
   GenericIndexed.BYTE_BUFFER_STRATEGY. This is important because UTF-8
   bytes can be compared as bytes if, and only if, the comparison
   is unsigned.

4) Add specialization to GenericIndexed.singleThreaded().indexOf that
   avoids needless ByteBuffer allocations.

5) Clarify that objects returned by ColumnIndexSupplier.as are not
   thread-safe. DictionaryEncodedStringIndexSupplier now calls
   singleThreaded() on all relevant GenericIndexed objects, saving
   a ByteBuffer allocation per access.

Also:

1) Fix performance regression in LikeFilter: since #12315, it applied
   the suffix matcher to all values in range even for type MATCH_ALL.

2) Add ObjectStrategy.canCompare() method. This fixes LikeFilterBenchmark,
   which was broken due to calls to strategy.compare in
   GenericIndexed.fromIterable.

* Add like-filter implementation tests.

* Add in-filter implementation tests.

* Add tests, fix issues.

* Fix style.

* Adjustments from review.
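Point 3 above (UTF-8 bytes compare correctly only when compared unsigned) and the sorted UTF-8 lookup in points 1 and 2 can be illustrated with standard Java. This is a hypothetical sketch, not Druid's ByteBufferUtils or GenericIndexed code: it defines an unsigned lexicographic comparator and uses it for a binary search over sorted UTF-8 values, with no decoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class Utf8UnsignedCompareSketch
{
  // Lexicographic comparison that treats each byte as unsigned (0..255).
  // For UTF-8, this order matches Unicode code point order.
  public static final Comparator<ByteBuffer> UNSIGNED = (a, b) -> {
    int n = Math.min(a.remaining(), b.remaining());
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(
          Byte.toUnsignedInt(a.get(a.position() + i)),
          Byte.toUnsignedInt(b.get(b.position() + i))
      );
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.remaining(), b.remaining());
  };

  public static ByteBuffer utf8(String s)
  {
    return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
  }

  // Membership test against sorted UTF-8 values, without decoding to String.
  public static boolean contains(List<ByteBuffer> sortedUtf8, String value)
  {
    return Collections.binarySearch(sortedUtf8, utf8(value), UNSIGNED) >= 0;
  }

  public static void main(String[] args)
  {
    // U+00E9 encodes as 0xC3 0xA9; "z" is 0x7A. Unsigned: 0xC3 > 0x7A, as in code point order.
    System.out.println(UNSIGNED.compare(utf8("\u00e9"), utf8("z")) > 0);  // true
    // Signed comparison gets it backwards: (byte) 0xC3 is -61.
    System.out.println(Byte.compare((byte) 0xC3, (byte) 0x7A) > 0);       // false

    List<ByteBuffer> values = new ArrayList<>();
    for (String s : new String[]{"z", "a", "\u00e9"}) {
      values.add(utf8(s));
    }
    values.sort(UNSIGNED);
    System.out.println(contains(values, "\u00e9"));  // true
  }
}
```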

5 weeks ago: Web console: fix go to segments not working (#12541)
Vadim Ogievetsky [Thu, 19 May 2022 21:34:03 +0000 (14:34 -0700)] 
Web console: fix go to segments not working (#12541)

* use correct filter syntax

* fix tests

5 weeks ago: RemoteTaskRunner: Fix NPE in streamTaskReports. (#12006)
Gian Merlino [Thu, 19 May 2022 21:23:55 +0000 (14:23 -0700)] 
RemoteTaskRunner: Fix NPE in streamTaskReports. (#12006)

* RemoteTaskRunner: Fix NPE in streamTaskReports.

It is possible for a work item to drop out of runningTasks after the
ZkWorker is retrieved. In this case, the current code would throw
an NPE.

* Additional tests and additional fixes.

* Fix import.
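The NPE described above is a check-then-act race: a task can leave runningTasks between two reads of the map. A common remedy is to read the concurrent map once and null-check the local copy. The sketch below shows that general pattern; WorkItem, workerHostFor, and the other names are hypothetical and not RemoteTaskRunner's real code.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class RaceSafeLookupSketch
{
  // Hypothetical stand-in for a running task's work item.
  public static final class WorkItem
  {
    private final String workerHost;

    public WorkItem(String workerHost)
    {
      this.workerHost = workerHost;
    }

    public String getWorkerHost()
    {
      return workerHost;
    }
  }

  private final Map<String, WorkItem> runningTasks = new ConcurrentHashMap<>();

  public void add(String taskId, String workerHost)
  {
    runningTasks.put(taskId, new WorkItem(workerHost));
  }

  public void remove(String taskId)
  {
    runningTasks.remove(taskId);
  }

  // Racy version: "if (runningTasks.containsKey(id)) runningTasks.get(id).getWorkerHost()"
  // can NPE if another thread removes the task between the two calls.
  public Optional<String> workerHostFor(String taskId)
  {
    WorkItem item = runningTasks.get(taskId); // single read
    if (item == null) {
      return Optional.empty(); // task already gone: report nothing instead of throwing
    }
    return Optional.of(item.getWorkerHost());
  }

  public static void main(String[] args)
  {
    RaceSafeLookupSketch runner = new RaceSafeLookupSketch();
    runner.add("task1", "worker1:8091");
    System.out.println(runner.workerHostFor("task1")); // Optional[worker1:8091]
    runner.remove("task1");
    System.out.println(runner.workerHostFor("task1")); // Optional.empty
  }
}
```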

5 weeks ago: SQL: Add is_active to sys.segments, update examples and docs. (#11550)
Gian Merlino [Thu, 19 May 2022 21:23:28 +0000 (14:23 -0700)] 
SQL: Add is_active to sys.segments, update examples and docs. (#11550)

* SQL: Add is_active to sys.segments, update examples and docs.

is_active is short for:

  (is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1

It's important because this represents "all the segments that should
be queryable, whether or not they actually are right now". Most of the
time, this is the set of segments that people will want to look at.

The web console already adds this filter to a lot of its queries,
proving its usefulness.

This patch also reworks the caveat at the bottom of the sys.segments
section, so its information is mixed into the description of each result
field. This should make it more likely for people to see the information.

* Wording updates.

* Adjustments for spellcheck.

* Adjust IT.
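The is_active definition above is a plain boolean formula. The hypothetical helper below restates it in Java and prints its truth table, which makes two consequences easy to see: a published but overshadowed segment is not active, and a realtime segment is always active.

```java
public class IsActiveSketch
{
  // is_active = (is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1
  public static boolean isActive(boolean isPublished, boolean isOvershadowed, boolean isRealtime)
  {
    return (isPublished && !isOvershadowed) || isRealtime;
  }

  public static void main(String[] args)
  {
    // Enumerate all 8 combinations of the three flags.
    for (int bits = 0; bits < 8; bits++) {
      boolean published = (bits & 4) != 0;
      boolean overshadowed = (bits & 2) != 0;
      boolean realtime = (bits & 1) != 0;
      System.out.printf(
          "is_published=%d is_overshadowed=%d is_realtime=%d -> is_active=%d%n",
          published ? 1 : 0,
          overshadowed ? 1 : 0,
          realtime ? 1 : 0,
          isActive(published, overshadowed, realtime) ? 1 : 0
      );
    }
  }
}
```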

5 weeks ago: Do not alter query timeout in ScanQueryEngine (#12271)
machine424 [Thu, 19 May 2022 16:24:42 +0000 (18:24 +0200)] 
Do not alter query timeout in ScanQueryEngine (#12271)

Add test to detect timeout mutability

5 weeks ago: upgrade core Apache Kafka dependencies to 3.2.0 (#12538)
Xavier Léauté [Thu, 19 May 2022 16:04:52 +0000 (09:04 -0700)] 
upgrade core Apache Kafka dependencies to 3.2.0 (#12538)

Announcement: https://blogs.apache.org/kafka/entry/what-s-new-in-apache8
Release notes: https://downloads.apache.org/kafka/3.2.0/RELEASE_NOTES.html

5 weeks ago: Slightly improve RTR log messages. (#12540)
Gian Merlino [Thu, 19 May 2022 14:43:55 +0000 (07:43 -0700)] 
Slightly improve RTR log messages. (#12540)

1) Align "Assigning task" log messages between RTR and HRTR.

2) Remove confusing reference to "Coordinator".

3) Move "Not assigning task" message from INFO to DEBUG. It's not super
   important to see this message: we mainly want to see what _does_ get
   assigned.

4) Reword "Task switched from pending to running" message to better
   match the structure of the "Assigning task" message from the same
   method.

5 weeks ago: Add builder for TaskToolbox. (#12539)
Gian Merlino [Thu, 19 May 2022 14:43:50 +0000 (07:43 -0700)] 
Add builder for TaskToolbox. (#12539)

* Add builder for TaskToolbox.

The main purpose of this change is to make it easier to create
TaskToolboxes in tests. However, the builder is used in production
too, by TaskToolboxFactory.

* Fix imports, adjust formatting.

* Fix import.
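The point of a builder for a class with many dependencies is that tests can set only the fields they care about and rely on defaults for the rest. The sketch below shows that pattern on a much-reduced, hypothetical toolbox; the fields and default values are illustrative only, and the real TaskToolbox has many more dependencies.

```java
public class TaskToolboxBuilderSketch
{
  // Hypothetical, much-reduced toolbox with only three fields.
  public static final class Toolbox
  {
    private final String taskWorkDir;
    private final int maxRetries;
    private final boolean emitMetrics;

    private Toolbox(Builder builder)
    {
      this.taskWorkDir = builder.taskWorkDir;
      this.maxRetries = builder.maxRetries;
      this.emitMetrics = builder.emitMetrics;
    }

    public static Builder builder()
    {
      return new Builder();
    }

    public String getTaskWorkDir()
    {
      return taskWorkDir;
    }

    public int getMaxRetries()
    {
      return maxRetries;
    }

    public boolean isEmitMetrics()
    {
      return emitMetrics;
    }

    public static final class Builder
    {
      // Hypothetical defaults, so callers only override what matters to them.
      private String taskWorkDir = "/tmp/task";
      private int maxRetries = 3;
      private boolean emitMetrics = false;

      private Builder()
      {
      }

      public Builder taskWorkDir(String taskWorkDir)
      {
        this.taskWorkDir = taskWorkDir;
        return this;
      }

      public Builder maxRetries(int maxRetries)
      {
        this.maxRetries = maxRetries;
        return this;
      }

      public Builder emitMetrics(boolean emitMetrics)
      {
        this.emitMetrics = emitMetrics;
        return this;
      }

      public Toolbox build()
      {
        return new Toolbox(this);
      }
    }
  }

  public static void main(String[] args)
  {
    Toolbox toolbox = Toolbox.builder().maxRetries(5).build();
    System.out.println(toolbox.getTaskWorkDir() + " " + toolbox.getMaxRetries()); // /tmp/task 5
  }
}
```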

5 weeks ago: Free ByteBuffers in tests and fix some bugs. (#12521)
Gian Merlino [Thu, 19 May 2022 14:42:29 +0000 (07:42 -0700)] 
Free ByteBuffers in tests and fix some bugs. (#12521)

* Ensure ByteBuffers allocated in tests get freed.

Many tests had problems where a direct ByteBuffer would be allocated
and then not freed. This is bad because it causes flaky tests.

To fix this:

1) Add ByteBufferUtils.allocateDirect(size), which returns a ResourceHolder.
   This makes it easy to free the direct buffer. Currently, it's only used
   in tests, because production code seems OK.

2) Update all usages of ByteBuffer.allocateDirect (off-heap) in tests either
   to ByteBuffer.allocate (on-heap, which are garbage-collected), or to
   ByteBufferUtils.allocateDirect (wherever it seemed like there was a good
   reason for the buffer to be off-heap). Make sure to close all direct
   holders when done.

* Changes based on CI results.

* A different approach.

* Roll back BitmapOperationTest stuff.

* Try additional surefire memory.

* Revert "Roll back BitmapOperationTest stuff."

This reverts commit 49f846d9e3d0904df6c685d403766c07531b15e5.

* Add TestBufferPool.

* Revert Xmx change in tests.

* Better behaved NestedQueryPushDownTest. Exit tests on OOME.

* Fix TestBufferPool.

* Remove T1C from ARM tests.

* Somewhat safer.

* Fix tests.

* Fix style stuff.

* Additional debugging.

* Reset null / expr configs better.

* ExpressionLambdaAggregatorFactory thread-safety.

* Alter forkNode to try to get better info when a JVM crashes.

* Fix buffer retention in ExpressionLambdaAggregatorFactory.

* Remove unused import.
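The ByteBufferUtils.allocateDirect(size) change above returns a holder object so that tests can release a direct buffer deterministically with try-with-resources. The sketch below shows that shape using a hypothetical Holder class, not Druid's ResourceHolder. Standard Java has no supported public API for freeing a direct buffer immediately, so this sketch's release callback only counts releases; a real implementation would free the off-heap memory there.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

public class DirectBufferHolderSketch
{
  // Counts release callbacks, standing in for actually freeing off-heap memory.
  public static final AtomicInteger RELEASED = new AtomicInteger();

  // Hypothetical minimal holder: owns a resource and runs a release action once on close().
  public static final class Holder<T> implements AutoCloseable
  {
    private final T resource;
    private final Runnable releaser;
    private boolean closed = false;

    public Holder(T resource, Runnable releaser)
    {
      this.resource = resource;
      this.releaser = releaser;
    }

    public T get()
    {
      if (closed) {
        throw new IllegalStateException("Resource already released");
      }
      return resource;
    }

    @Override
    public void close()
    {
      if (!closed) {
        closed = true;
        releaser.run();
      }
    }
  }

  // Hypothetical analogue of an allocateDirect(size) helper that returns a holder.
  public static Holder<ByteBuffer> allocateDirect(int size)
  {
    ByteBuffer buffer = ByteBuffer.allocateDirect(size);
    return new Holder<>(buffer, () -> RELEASED.incrementAndGet());
  }

  public static void main(String[] args)
  {
    try (Holder<ByteBuffer> holder = allocateDirect(1024)) {
      holder.get().putLong(0, 123L);
      System.out.println(holder.get().getLong(0)); // 123
    }
    System.out.println(RELEASED.get()); // 1: released when the try block exited
  }
}
```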

5 weeks ago: Updates default inputSegmentSizeBytes in Compaction config (#12534)
Tejaswini Bandlamudi [Thu, 19 May 2022 09:13:34 +0000 (14:43 +0530)] 
Updates default inputSegmentSizeBytes in Compaction config (#12534)

Fixes the "Cannot serialize BigInt value as JSON" error when loading the compaction config in the console.

5 weeks ago: CVE suppression (#12535)
AmatyaAvadhanula [Thu, 19 May 2022 05:51:48 +0000 (11:21 +0530)] 
CVE suppression (#12535)

6 weeks ago: Sql docs items (#12530)
Charles Smith [Tue, 17 May 2022 23:56:31 +0000 (16:56 -0700)] 
Sql docs items (#12530)

* touch up sql refactor

* brush up SQL refactor

* incorporate feedback

* reorder sql

* Update docs/querying/sql.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

6 weeks ago: Fix typo, add comma (#12529)
Katya Macedo [Tue, 17 May 2022 23:42:47 +0000 (18:42 -0500)] 
Fix typo, add comma (#12529)

6 weeks ago: Add cluster by support for replace syntax (#12524)
Adarsh Sanjeev [Tue, 17 May 2022 09:45:29 +0000 (15:15 +0530)] 
Add cluster by support for replace syntax (#12524)

* Add cluster by support for replace syntax

* Add unit test for with list

6 weeks ago: print replication levels in coordinator segment logs (#12511)
Clint Wylie [Tue, 17 May 2022 09:24:13 +0000 (02:24 -0700)] 
print replication levels in coordinator segment logs (#12511)

* print replication levels in coordinator segment logs

* add served segment count to stats

* also for drops

6 weeks ago: Improve error messages from SQL REPLACE syntax (#12523)
Adarsh Sanjeev [Tue, 17 May 2022 04:25:58 +0000 (09:55 +0530)] 
Improve error messages from SQL REPLACE syntax (#12523)

- Add user-friendly error messages for a missing or incorrect OVERWRITE clause in a REPLACE SQL query
- Move validation of a missing OVERWRITE clause from the parser into code, so that a custom error message can be returned

6 weeks ago: Improved docs for range partitioning. (#12350)
Gian Merlino [Mon, 16 May 2022 16:42:31 +0000 (09:42 -0700)] 
Improved docs for range partitioning. (#12350)

* Improved docs for range partitioning.

1) Clarify the benefits of range partitioning.
2) Clarify which filters support pruning.
3) Include the fact that multi-value dimensions cannot be used for partitioning.

* Additional clarification.

* Update other section.

* Another adjustment.

* Updates from review.

6 weeks ago: Clarify the use of the Lookup API (#12088)
Hellmar Becker [Mon, 16 May 2022 14:50:24 +0000 (16:50 +0200)] 
Clarify the use of the Lookup API (#12088)

* Update lookups.md

* Update docs/querying/lookups.md

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* Update docs/querying/lookups.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

6 weeks ago: docs(fix): clarify how worker.version and minWorkerVersion comparison works (#12459)
317brian [Mon, 16 May 2022 14:48:33 +0000 (07:48 -0700)] 
docs(fix): clarify how worker.version and minWorkerVersion comparison works (#12459)

* docs(fix): clarify how worker.version and minWorkerVersion comparison works

* Revert "docs(fix): clarify how worker.version and minWorkerVersion comparison works"

This reverts commit cadd1fdc604de414379bffe9986ae64b9cf51fc6.

* docs(fix): clarify how worker.version and minWorkerVersion comparison works

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/configuration/index.md

fix spelling

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

6 weeks ago: Enable vectorized virtual column processing by default. (#12520)
Gian Merlino [Mon, 16 May 2022 10:13:53 +0000 (03:13 -0700)] 
Enable vectorized virtual column processing by default. (#12520)

In the majority of cases, this improves performance.

There's only one case I'm aware of where this may be a net negative: for time_floor(__time, <period>) where there are many repeated __time values. In nonvectorized processing, SingleLongInputCachingExpressionColumnValueSelector implements an optimization to avoid computing the time_floor function on every row. There is no such optimization in vectorized processing.

IMO, we shouldn't mention this in the docs. Rationale: it's too fiddly. It's not guaranteed that nonvectorized processing will be faster due to the optimization, because it would have to overcome the inherent speed advantage of vectorization. So it would always require testing to determine the best setting for a specific dataset. It would be bad if users disabled vectorization thinking it would speed up their queries, and it actually slowed them down. And even if users do their own testing, at some point in the future we'll implement the optimization for vectorized processing too, and users who explicitly disabled vectorization are likely to leave it disabled. I'd like to avoid this outcome by encouraging all users to enable vectorization at all times. Really advanced users would be following development activity anyway, and can read this issue.
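The nonvectorized optimization mentioned above avoids recomputing time_floor when the same __time value repeats. The sketch below illustrates the idea with a one-entry cache keyed on the last input; it is loosely modeled on that description, not on SingleLongInputCachingExpressionColumnValueSelector itself, whose caching strategy may differ. It also floors only to fixed-width, epoch-aligned periods, whereas real time_floor handles calendar periods and time zones.

```java
public class CachingTimeFloorSketch
{
  private final long periodMillis;
  private boolean hasCached = false;
  private long lastInput;
  private long lastOutput;
  private int evaluations = 0; // counts real floor computations, for illustration

  public CachingTimeFloorSketch(long periodMillis)
  {
    this.periodMillis = periodMillis;
  }

  // Fixed-width, epoch-aligned floor; floorMod keeps negative timestamps correct.
  private long timeFloor(long t)
  {
    evaluations++;
    return t - Math.floorMod(t, periodMillis);
  }

  // Row-at-a-time evaluation: recompute only when the input changes.
  public long evaluate(long time)
  {
    if (!hasCached || time != lastInput) {
      lastInput = time;
      lastOutput = timeFloor(time);
      hasCached = true;
    }
    return lastOutput;
  }

  public int getEvaluations()
  {
    return evaluations;
  }

  public static void main(String[] args)
  {
    CachingTimeFloorSketch floor = new CachingTimeFloorSketch(3_600_000L); // one hour
    long[] times = {7_200_000L, 7_200_000L, 7_200_000L, 7_300_000L, 7_300_000L};
    for (long t : times) {
      System.out.println(floor.evaluate(t)); // 7200000 for every row
    }
    System.out.println(floor.getEvaluations()); // 2: one per run of repeated inputs
  }
}
```

The cache pays off only when consecutive rows share a timestamp, which is why, as argued above, whether it beats vectorized processing depends on the dataset.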

6 weeks ago: Enforce console logging for peon process (#12067)
Frank Chen [Mon, 16 May 2022 09:37:21 +0000 (17:37 +0800)] 
Enforce console logging for peon process (#12067)

Currently, all Druid processes share the same log4j2 configuration file located in the _common directory. Since peon processes are spawned by the middle manager process, they inherit its environment variables. These include the variables used in log4j2.xml to control which file the logger writes to.

However, the current task logging mechanism requires peon processes to write their logs to the console, so that the middle manager can redirect the console output to a file and upload that file to task log storage.

This PR therefore enforces that requirement for peon processes: whatever the configuration in the shared log4j2.xml, peon processes always write their logs to the console.

6 weeks ago: Add setProcessingThreadNames context parameter. (#12514)
Gian Merlino [Mon, 16 May 2022 08:12:00 +0000 (01:12 -0700)] 
Add setProcessingThreadNames context parameter. (#12514)

Setting thread names takes a measurable amount of time when segment scans are very quick. In high-QPS testing, we found a slight performance boost from turning off processing thread renaming. This option makes that possible.
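Thread renaming as described above usually means two Thread.setName calls per unit of work: one to set a descriptive name and one to restore the original. The sketch below shows how a boolean flag, in the spirit of setProcessingThreadNames, can skip both calls on the hot path. The helper and its name format are hypothetical, not Druid's implementation.

```java
import java.util.function.Supplier;

public class ThreadRenameSketch
{
  // Runs work, optionally renaming the current thread for its duration
  // (helpful in thread dumps) and always restoring the original name.
  public static <T> T runWithOptionalThreadName(boolean setThreadName, String name, Supplier<T> work)
  {
    if (!setThreadName) {
      return work.get(); // skip both setName calls
    }
    Thread current = Thread.currentThread();
    String originalName = current.getName();
    current.setName(name);
    try {
      return work.get();
    }
    finally {
      current.setName(originalName);
    }
  }

  public static void main(String[] args)
  {
    String seen = runWithOptionalThreadName(true, "processing-segment-1", () -> Thread.currentThread().getName());
    System.out.println(seen);                              // processing-segment-1
    System.out.println(Thread.currentThread().getName());  // original name restored
  }
}
```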

6 weeks ago: Task queue unblock (#12099)
Jason Koch [Sat, 14 May 2022 23:44:29 +0000 (16:44 -0700)] 
Task queue unblock (#12099)

* concurrency: introduce GuardedBy to TaskQueue

* perf: Introduce TaskQueueScaleTest to test performance of TaskQueue with large task counts

This introduces a test case to confirm how long it will take to launch and manage (aka shutdown)
a large number of threads in the TaskQueue.

h/t to @gianm for main implementation.

* perf: improve scalability of TaskQueue with large task counts

* linter fixes, expand test coverage

* pr feedback suggestion; swap to different linter

* swap to use SuppressWarnings

* Fix TaskQueueScaleTest.

Co-authored-by: Gian Merlino <gian@imply.io>

6 weeks ago: Use datasketches version 3.2.0 (#12509)
Kashif Faraz [Fri, 13 May 2022 05:58:15 +0000 (11:28 +0530)] 
Use datasketches version 3.2.0 (#12509)

Changes:
- Use apache datasketches version 3.2.0.
- Remove unsafe reflection-based usage of datasketch internals added in #12022

6 weeks ago: Add replace statement to sql parser (#12386)
Adarsh Sanjeev [Fri, 13 May 2022 05:26:40 +0000 (10:56 +0530)] 
Add replace statement to sql parser (#12386)

Relevant Issue: #11929

- Add custom replace statement to Druid SQL parser.
- Edit DruidPlanner to convert relevant fields to Query Context.
- Refactor common code with INSERT statements to reuse them for REPLACE where possible.