Charles Smith [Tue, 17 May 2022 23:56:31 +0000 (16:56 -0700)]
Sql docs items (#12530)
* touch up sql refactor
* brush up SQL refactor
* incorporate feedback
* reorder sql
* Update docs/querying/sql.md
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Katya Macedo [Tue, 17 May 2022 23:42:47 +0000 (18:42 -0500)]
Fix typo, add comma (#12529)
Adarsh Sanjeev [Tue, 17 May 2022 09:45:29 +0000 (15:15 +0530)]
Add cluster by support for replace syntax (#12524)
* Add cluster by support for replace syntax
* Add unit test for with list
Clint Wylie [Tue, 17 May 2022 09:24:13 +0000 (02:24 -0700)]
print replication levels in coordinator segment logs (#12511)
* print replication levels in coordinator segment logs
* add served segment count to stats
* also for drops
Adarsh Sanjeev [Tue, 17 May 2022 04:25:58 +0000 (09:55 +0530)]
Improve error messages from SQL REPLACE syntax (#12523)
- Add user friendly error messages for missing or incorrect OVERWRITE clause for REPLACE SQL query
- Move validation of missing OVERWRITE clause at code level instead of parser for custom error message
Gian Merlino [Mon, 16 May 2022 16:42:31 +0000 (09:42 -0700)]
Improved docs for range partitioning. (#12350)
* Improved docs for range partitioning.
1) Clarify the benefits of range partitioning.
2) Clarify which filters support pruning.
3) Include the fact that multi-value dimensions cannot be used for partitioning.
* Additional clarification.
* Update other section.
* Another adjustment.
* Updates from review.
Hellmar Becker [Mon, 16 May 2022 14:50:24 +0000 (16:50 +0200)]
Clarify the use of the Lookup API (#12088)
* Update lookups.md
* Update docs/querying/lookups.md
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
* Update docs/querying/lookups.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
317brian [Mon, 16 May 2022 14:48:33 +0000 (07:48 -0700)]
docs(fix): clarify how worker.version and minWorkerVersion comparison works (#12459)
* docs(fix): clarify how worker.version and minWorkerVersion comparison works
* Revert "docs(fix): clarify how worker.version and minWorkerVersion comparison works"
This reverts commit cadd1fdc604de414379bffe9986ae64b9cf51fc6.
* docs(fix): clarify how worker.version and minWorkerVersion comparison works
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/configuration/index.md
fix spelling
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Gian Merlino [Mon, 16 May 2022 10:13:53 +0000 (03:13 -0700)]
Enable vectorized virtual column processing by default. (#12520)
In the majority of cases, this improves performance.
There's only one case I'm aware of where this may be a net negative: for time_floor(__time, <period>) where there are many repeated __time values. In nonvectorized processing, SingleLongInputCachingExpressionColumnValueSelector implements an optimization to avoid computing the time_floor function on every row. There is no such optimization in vectorized processing.
IMO, we shouldn't mention this in the docs. Rationale: It's too fiddly of a thing: it's not guaranteed that nonvectorized processing will be faster due to the optimization, because it would have to overcome the inherent speed advantage of vectorization. So it'd always require testing to determine the best setting for a specific dataset. It would be bad if users disabled vectorization thinking it would speed up their queries, and it actually slowed them down. And even if users do their own testing, at some point in the future we'll implement the optimization for vectorized processing too, and it's likely that users that explicitly disabled vectorization will continue to have it disabled. I'd like to avoid this outcome by encouraging all users to enable vectorization at all times. Really advanced users would be following development activity anyway, and can read this issue
Frank Chen [Mon, 16 May 2022 09:37:21 +0000 (17:37 +0800)]
Enforce console logging for peon process (#12067)
Currently, all Druid processes share the same log4j2 configuration file located in the _common directory. Since peon processes are spawned by the middle manager process, they inherit the environment variables from the middle manager. These variables include those in log4j2.xml that control which file the logger writes to.
But the current task logging mechanism requires the peon processes to write their logs to the console so that the middle manager can redirect the console output to a file and upload this file to task log storage.
So, this PR enforces this requirement for peon processes: whatever the configuration in the shared log4j2.xml, peon processes always write their logs to the console.
Gian Merlino [Mon, 16 May 2022 08:12:00 +0000 (01:12 -0700)]
Add setProcessingThreadNames context parameter. (#12514)
Setting thread names takes a measurable amount of time in the case where segment scans are very quick. In high-QPS testing we found a slight performance boost from turning off processing thread renaming. This option makes that possible.
Jason Koch [Sat, 14 May 2022 23:44:29 +0000 (16:44 -0700)]
Task queue unblock (#12099)
* concurrency: introduce GuardedBy to TaskQueue
* perf: Introduce TaskQueueScaleTest to test performance of TaskQueue with large task counts
This introduces a test case to confirm how long it will take to launch and manage (aka shutdown)
a large number of threads in the TaskQueue.
h/t to @gianm for main implementation.
* perf: improve scalability of TaskQueue with large task counts
* linter fixes, expand test coverage
* pr feedback suggestion; swap to different linter
* swap to use SuppressWarnings
* Fix TaskQueueScaleTest.
Co-authored-by: Gian Merlino <gian@imply.io>
Kashif Faraz [Fri, 13 May 2022 05:58:15 +0000 (11:28 +0530)]
Use datasketches version 3.2.0 (#12509)
Changes:
- Use apache datasketches version 3.2.0.
- Remove unsafe reflection-based usage of datasketch internals added in #12022
Adarsh Sanjeev [Fri, 13 May 2022 05:26:40 +0000 (10:56 +0530)]
Add replace statement to sql parser (#12386)
Relevant Issue: #11929
- Add custom replace statement to Druid SQL parser.
- Edit DruidPlanner to convert relevant fields to Query Context.
- Refactor common code with INSERT statements to reuse them for REPLACE where possible.
Abhishek Radhakrishnan [Thu, 12 May 2022 05:06:20 +0000 (01:06 -0400)]
Add IPAddress java library as dependency and migrate IPv4 functions to use the new library. (#11634)
* Add ipaddress library as dependency.
* IPv4 functions to use the inet.ipaddr package.
* Remove unused imports.
* Add new function.
* Minor rename.
* Add more unit tests.
* IPv4 address expr utils unit tests and address options.
* Adjust the IPv4Util functions.
* Move the UTs a bit around.
* Javadoc comments.
* Add license info for IPAddress.
* Fix groupId, artifact and version in license.yaml.
* Remove redundant subnet in messages - fixes UT.
* Remove unused commons-net dependency for /processing project.
* Make class and methods public so it can be accessed.
* Add initial version of benchmark
* Add subnetutils package for benchmarks.
* Auto generate ip addresses.
* Add more v4 address representations in setup to avoid bias.
* Use ThreadLocalRandom to avoid forbidden API usage.
* Adjust IPv4AddressBenchmark to adhere to codestyle rules.
* Update ipaddress library to latest 5.3.4
* Add ipaddress package dependency to benchmarks project.
Clint Wylie [Wed, 11 May 2022 06:27:08 +0000 (23:27 -0700)]
remake column indexes and query processing of filters (#12388)
Following up on #12315, which pushed most of the logic of building ImmutableBitmap into BitmapIndex in order to hide the details of how column indexes are implemented from the Filter implementations, this PR totally refashions how Filters consume indexes. The end result, while a rather dramatic reshuffling of the existing code, should be extraordinarily flexible, eventually allowing us to model any type of index we can imagine and providing the machinery to build the filters that use them, while also allowing other column implementations to implement the built-in index types and so provide adapters that make use of indexing in the current set of filters that Druid provides.
Lucas Capistrant [Wed, 11 May 2022 02:05:15 +0000 (21:05 -0500)]
Allow coordinator to be configured to kill segments in future (#10877)
Allow a Druid cluster to kill segments whose interval_end is a date in the future. This can be done by setting druid.coordinator.kill.durationToRetain to a negative period. For example, PT-24H would allow segments to be killed if their interval_end date was 24 hours or less into the future at the time that the kill task is generated by the system.
A cluster operator can also disregard the druid.coordinator.kill.durationToRetain entirely by setting a new configuration, druid.coordinator.kill.ignoreDurationToRetain=true. This ignores interval_end date when looking for segments to kill, and instead is capable of killing any segment marked unused. This new configuration is off by default, and a cluster operator should fully understand and accept the risks if they enable it.
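A minimal sketch of the cutoff arithmetic described above, assuming a segment becomes eligible once its interval_end is at or before "now minus durationToRetain". The class and method names are hypothetical, and the sketch uses java.time rather than the coordinator's own types; it is not the actual kill-task code.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not Druid's coordinator code: shows the cutoff arithmetic only.
final class KillCutoffSketch {
  // Assumption: a segment is eligible when its interval end is at or before (now - durationToRetain).
  static boolean isKillable(Instant intervalEnd, Instant now, Duration durationToRetain) {
    Instant cutoff = now.minus(durationToRetain);
    return !intervalEnd.isAfter(cutoff);
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2022-05-11T00:00:00Z");
    // A negative period moves the cutoff into the future.
    Duration retain = Duration.parse("PT-24H");
    System.out.println(isKillable(Instant.parse("2022-05-11T12:00:00Z"), now, retain));
    System.out.println(isKillable(Instant.parse("2022-05-13T00:00:00Z"), now, retain));
  }
}
```

With PT-24H the cutoff lands 24 hours after "now", so a segment ending 12 hours from now qualifies while one ending two days from now does not.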
Kashif Faraz [Tue, 10 May 2022 12:05:59 +0000 (05:05 -0700)]
Docs: Fix column name in ingestion rollup doc (#12036)
Fix the referred column name from "count" to "num_rows" as "count" vs. "COUNT(*)" might be a little confusing in this example.
Rohan Garg [Tue, 10 May 2022 09:53:42 +0000 (15:23 +0530)]
Add feature flag for sql planning of TimeBoundary queries (#12491)
* Add feature flag for sql planning of TimeBoundary queries
* fixup! Add feature flag for sql planning of TimeBoundary queries
* Add documentation for enableTimeBoundaryPlanning
* fixup! Add documentation for enableTimeBoundaryPlanning
somu-imply [Tue, 10 May 2022 00:02:38 +0000 (17:02 -0700)]
Vectorized version of string last aggregator (#12493)
* Vectorized version of string last aggregator
* Updating string last and adding testcases
* Updating code and adding testcases for serializable pairs
* Addressing review comments
Rohan Garg [Mon, 9 May 2022 17:40:17 +0000 (23:10 +0530)]
Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation (#12484)
* Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation
* fixup! Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation
* Document vectorized dimension
Atul Mohan [Thu, 5 May 2022 22:31:21 +0000 (15:31 -0700)]
Add daily stats to console (#12329)
Vadim Ogievetsky [Thu, 5 May 2022 22:06:59 +0000 (15:06 -0700)]
Web console: add a button to get out of restricted mode, make capability detection more robust (#12503)
* allow unrestrict
* update tests
Victoria Lim [Tue, 3 May 2022 23:22:25 +0000 (16:22 -0700)]
Update automatic compaction docs with consistent terminology (#12416)
* specify automatic compaction where applicable
* Apply suggestions from code review
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
* update for style and consistency
* implement suggested feedback
* remove duplicate example
* Apply suggestions from code review
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
* Update docs/ingestion/compaction.md
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
* Update docs/operations/api-reference.md
* update .spelling
* Adopt review suggestions
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
Naya Chen [Tue, 3 May 2022 19:25:51 +0000 (12:25 -0700)]
add aws-java-sdk-sts to aws-common classpath (#12482)
Fixes #11303
WebIdentityTokenProvider in the defaultAWSCredentialsProviderChain can not actually be used because the aws-java-sdk-sts jar is not in the classpath of S3 extension at runtime, since each extension has its own classpath. This results in the inability to assume STS role before generating authentication token.
The error message from getCredentials() is:
"Unable to load credentials from WebIdentityTokenCredentialsProvider: To use assume role profiles the aws-java-sdk-sts module must be on the class path"
This PR will fix multiple authentication modules that are dependent on the WebIdentityTokenProvider, including AWS IAM based RDS authentication and S3 authentication.
Vadim Ogievetsky [Tue, 3 May 2022 19:08:08 +0000 (12:08 -0700)]
Web console: Misc table fixes (#12489)
* Misc table fixes
* extract default className
* table spacing updates
* fix e2e action selector
* try more times
* make the web console exist again
zachjsh [Tue, 3 May 2022 08:00:36 +0000 (04:00 -0400)]
Fix broken ForkingTaskRunnerTest (#12499)
A recent commit broke this test. This PR fixes the test.
Rocky Chen [Tue, 3 May 2022 03:47:25 +0000 (20:47 -0700)]
Add a metric for task duration in the pending queue (#12492)
This PR measures how long a task stays in the pending queue and emits the value as the metric task/pending/time. The metric is measured in RemoteTaskRunner and HttpRemoteTaskRunner.
An example of the metric:
```
2022-04-26T21:59:09,488 INFO [rtr-pending-tasks-runner-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2022-04-26T21:59:09.487Z","service":"druid/coordinator","host":"localhost:8081","version":"2022.02.0-iap-SNAPSHOT","metric":"task/pending/time","value":8,"dataSource":"wikipedia","taskId":"index_parallel_wikipedia_gecpcglg_2022-04-26T21:59:09.432Z","taskType":"index_parallel"}
```
------------------------------------------
Key changed/added classes in this PR
Emit metric task/pending/time in classes RemoteTaskRunner and HttpRemoteTaskRunner.
Update related factory classes and tests.
Nishant Bangarwa [Mon, 2 May 2022 16:43:19 +0000 (09:43 -0700)]
Update maven assembly plugin for druid-benchmarks (#12487)
Lucas Capistrant [Mon, 2 May 2022 13:40:44 +0000 (08:40 -0500)]
Add authentication call before cleaning up intermediate files in hadoop ingestions (#12030)
* Add authentication call before cleaning up intermediate files in hadoop ingestions
* fix checkstyle
* remove debug log
aggarwalakshay [Sun, 1 May 2022 14:45:58 +0000 (07:45 -0700)]
Upgrade dependency-check-maven to 7.0.4 (#12441)
317brian [Sun, 1 May 2022 14:44:31 +0000 (07:44 -0700)]
docs: fix typo (#12494)
MC-JY [Sun, 1 May 2022 14:43:11 +0000 (22:43 +0800)]
Improve build performance of modules (#12486)
* improve build performance of modules
* improve build performance of modules
* Update pom.xml
* improve build performance of modules
Tejaswini Bandlamudi [Sun, 1 May 2022 05:56:16 +0000 (11:26 +0530)]
Improve error messages when URI points to a file that doesn't exist (#12490)
Gian Merlino [Fri, 29 Apr 2022 06:21:13 +0000 (23:21 -0700)]
GroupBy: Reduce allocations by reusing entry and key holders. (#12474)
* GroupBy: Reduce allocations by reusing entry and key holders.
Two main changes:
1) Reuse Entry objects returned by various implementations of
Grouper.iterator.
2) Reuse key objects contained within those Entry objects.
This is allowed by the contract, which states that entries must be
processed and immediately discarded. However, not all call sites
respected this, so this patch also updates those call sites.
One particularly sneaky way that the old code retained entries too long
is due to Guava's MergingIterator and CombiningIterator. Internally,
these both advance to the next value prior to returning the current
value. So, this patch addresses that in two ways:
1) For merging, we have our own implementation MergeIterator already,
although it had the same problem. So, this patch updates our
implementation to return the current item prior to advancing to the
next item. It also adds a forbidden-api entry to ensure that this
safer implementation is used instead of Guava's.
2) For combining, we address the problem in a different way: by copying
the key when creating the new, combined entry.
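The hazard described above can be sketched in isolation. This is an illustrative stand-in with hypothetical names (Entry, reusingIterator), not Druid's Grouper code: an iterator hands out one shared holder, so a caller that retains references sees every retained entry change unless it copies first.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative only: a stand-in for an iterator that reuses its entry holder.
final class ReusedEntrySketch {
  static final class Entry {
    int key;

    Entry copy() {
      Entry e = new Entry();
      e.key = key;
      return e;
    }
  }

  // Returns the same Entry instance from every next() call, overwriting its key.
  static Iterator<Entry> reusingIterator(int[] keys) {
    Entry shared = new Entry();
    return new Iterator<Entry>() {
      private int i = 0;

      @Override
      public boolean hasNext() {
        return i < keys.length;
      }

      @Override
      public Entry next() {
        shared.key = keys[i++];
        return shared;
      }
    };
  }

  // Retains entries across next() calls, optionally copying each one first.
  static List<Integer> collectRetained(int[] keys, boolean copy) {
    List<Entry> kept = new ArrayList<>();
    Iterator<Entry> it = reusingIterator(keys);
    while (it.hasNext()) {
      Entry e = it.next();
      kept.add(copy ? e.copy() : e);
    }
    List<Integer> out = new ArrayList<>();
    for (Entry e : kept) {
      out.add(e.key);
    }
    return out;
  }
}
```

Without the copy, every retained reference points at the same holder and reports the last key; with the copy, the original sequence survives. This mirrors the "copy the key" approach mentioned for combining.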
* Attempt to fix test.
* Remove unused import.
Charles Smith [Thu, 28 Apr 2022 23:36:54 +0000 (16:36 -0700)]
remove arbitrary granularity spec from docs (#12460)
* remove arbitrary granularity spec from docs
* Update docs/ingestion/ingestion-spec.md
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Frank Chen [Thu, 28 Apr 2022 02:20:16 +0000 (10:20 +0800)]
Improve exception message for native binary operators (#12335)
* Improve exception message
* Update message
Gian Merlino [Wed, 27 Apr 2022 21:20:35 +0000 (14:20 -0700)]
DimensionRangeShardSpec speed boost. (#12477)
* DimensionRangeShardSpec speed boost.
Calling isEmpty() and equals() on RangeSets is expensive, because these
fall back on default implementations that call size(). And size() is
_also_ a default implementation that iterates the entire collection.
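The cost pattern described above can be reproduced with plain java.util classes. The sketch below is a stand-in, not Guava's RangeSet and not the actual patch: CountingView is a hypothetical collection whose size() walks every element, so the inherited isEmpty() does too, while asking the iterator for one element avoids the walk.

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.List;

// Stand-in for a collection view whose size() iterates everything. Not Guava's RangeSet.
final class CountingView extends AbstractCollection<Integer> {
  private final List<Integer> backing;
  int elementsVisited = 0;

  CountingView(List<Integer> backing) {
    this.backing = backing;
  }

  @Override
  public Iterator<Integer> iterator() {
    Iterator<Integer> it = backing.iterator();
    return new Iterator<Integer>() {
      @Override
      public boolean hasNext() {
        return it.hasNext();
      }

      @Override
      public Integer next() {
        elementsVisited++;
        return it.next();
      }
    };
  }

  // Deliberately expensive: counts by iterating the entire collection.
  @Override
  public int size() {
    int n = 0;
    for (Integer ignored : this) {
      n++;
    }
    return n;
  }

  // Cheaper emptiness check: only asks whether a first element exists.
  boolean isEmptyFast() {
    return !iterator().hasNext();
  }
}
```

AbstractCollection.isEmpty() is defined as size() == 0, so calling it on this view visits every element, whereas isEmptyFast() visits none.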
* Fix and test from code review.
Gian Merlino [Wed, 27 Apr 2022 21:17:26 +0000 (14:17 -0700)]
Reduce allocations due to Jackson serialization. (#12468)
* Reduce allocations due to Jackson serialization.
This patch attacks two sources of allocations during Jackson
serialization:
1) ObjectMapper.writeValue and JsonGenerator.writeObject create a new
DefaultSerializerProvider instance for each call. It has lots of
fields and creates pressure on the garbage collector. So, this patch
adds helper functions in JacksonUtils that enable reuse of
SerializerProvider objects and updates various call sites to make
use of this.
2) GroupByQueryToolChest copies the ObjectMapper for every query to
install a special module that supports backwards compatibility with
map-based rows. This isn't needed if resultAsArray is set and
all servers are running Druid 0.16.0 or later. This release was a
while ago. So, this patch disables backwards compatibility by default,
which eliminates the need to copy the heavyweight ObjectMapper. The
patch also introduces a configuration option that allows admins to
explicitly enable backwards compatibility.
* Add test.
* Update additional call sites and add to forbidden APIs.
Gian Merlino [Wed, 27 Apr 2022 21:17:07 +0000 (14:17 -0700)]
SQL: Create millisecond precision timestamp literals. (#12407)
* SQL: Create millisecond precision timestamp literals.
Fixes a bug where implicit casts of strings to timestamps would use seconds
precision rather than milliseconds. The new test case
testCountStarWithBetweenTimeFilterUsingMillisecondsInStringLiterals
exercises this.
* Update sql/src/main/java/org/apache/druid/sql/calcite/planner/Calcites.java
Co-authored-by: Frank Chen <frankchen@apache.org>
* Correct precision handling.
- Set default precision to 3 (millis) for things involving timestamps.
- Respect precision specified in types when available.
* Silence, checkstyle.
Co-authored-by: Frank Chen <frankchen@apache.org>
Gian Merlino [Wed, 27 Apr 2022 18:18:40 +0000 (11:18 -0700)]
JvmMonitor: Handle more generation and collector scenarios. (#12469)
* JvmMonitor: Handle more generation and collector scenarios.
ZGC on Java 11 only has a generation 1 (there is no 0). This causes
a NullPointerException when trying to extract the spacesCount for
generation 0. In addition, ZGC on Java 15 has a collector number 2
but no spaces in generation 2, which breaks the assumption that
collectors always have same-numbered spaces.
This patch adjusts things to be more robust, enabling the JvmMonitor
to work properly for ZGC on both Java 11 and 15.
* Test adjustments.
* Improve surefire arglines.
* Need a placeholder
Gian Merlino [Wed, 27 Apr 2022 17:52:20 +0000 (10:52 -0700)]
For the various Yielder objects, don't create new Yielders and instead mutate state. (#12475)
Co-authored-by: imply-cheddar <86940447+imply-cheddar@users.noreply.github.com>
Abhishek Agarwal [Wed, 27 Apr 2022 08:58:20 +0000 (14:28 +0530)]
Bump up the versions (#12480)
Adarsh Sanjeev [Wed, 27 Apr 2022 06:55:49 +0000 (12:25 +0530)]
Validate select columns for insert statement (#12431)
Unnamed columns in the select part of insert SQL statements currently create a table with a column name such as "EXPR$3". This PR adds a check for this.
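A rough sketch of such a check, assuming auto-generated names always follow the EXPR$ plus digits form shown above. The class name, method name, and error message are hypothetical and do not reflect the planner's actual validation code.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical validator: rejects column names that look auto-generated.
final class UnnamedColumnCheckSketch {
  // Assumption: unnamed expressions get names of the form EXPR$<number>, e.g. EXPR$3.
  private static final Pattern UNNAMED = Pattern.compile("EXPR\\$\\d+");

  static void validate(List<String> selectColumnNames) {
    for (String name : selectColumnNames) {
      if (UNNAMED.matcher(name).matches()) {
        throw new IllegalArgumentException(
            "Column [" + name + "] has no explicit name; give it an alias with AS");
      }
    }
  }
}
```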
somu-imply [Tue, 26 Apr 2022 18:33:08 +0000 (11:33 -0700)]
Vectorize numeric latest aggregators (#12439)
* Vectorizing Latest aggregator Part 1
* Updating benchmark tests
* Changing appropriate logic for vectors for null handling
* Introducing an abstract class and moving the commonalities there
* Adding vectorization for StringLast aggregator (initial version)
* Updated bufferized version of numeric aggregators
* Adding some javadocs
* Making sure this PR vectorizes numeric latest agg only
* Adding another benchmarking test
* Fixing intellij inspections
* Adding tests for double
* Adding test cases for long and float
* Updating testcases
* Checkstyle oops..
* One tiny change in test case
* Fixing spotbug and rhs not being used
zachjsh [Tue, 26 Apr 2022 16:44:44 +0000 (12:44 -0400)]
Worker level task metrics (#12446)
* fix metric name inconsistency
* add task slot metrics for middle managers
* add new WorkerTaskCountStatsMonitor to report task count metrics
from worker
* more stuff
* remove unused variable
* more stuff
* add javadocs
* fix checkstyle
* fix hadoop test failure
* cleanup
* add more code coverage in tests
* fix test failure
* add docs
* increase code coverage
* fix spelling
* fix failing tests
* remove dead code
* fix spelling
Will Xu [Tue, 26 Apr 2022 14:44:40 +0000 (07:44 -0700)]
Enable Arm builds (#12451)
This PR enables ARM builds on Travis. I've ported over the changes from @martin-g on reducing heap requirements for some of the tests to ensure they run well on Travis arm instances.
Rohan Garg [Mon, 25 Apr 2022 15:18:58 +0000 (20:48 +0530)]
Convert simple min/max SQL queries on __time to timeBoundary queries (#12472)
* Support array based results in timeBoundary query
* Fix bug with query interval in timeBoundary
* Convert min(__time) and max(__time) SQL queries to timeBoundary
* Add tests for timeBoundary backed SQL queries
* Fix query plans for existing tests
* fixup! Convert min(__time) and max(__time) SQL queries to timeBoundary
* fixup! Add tests for timeBoundary backed SQL queries
* fixup! Fix bug with query interval in timeBoundary
Peter Marshall [Mon, 25 Apr 2022 13:44:17 +0000 (14:44 +0100)]
Update native-batch.md (#12478)
Fixed indent on the Granularity Spec section and removed some superfluous tabbings.
Apoorv Gupta [Sat, 23 Apr 2022 03:35:08 +0000 (20:35 -0700)]
Fix formatting in stats.md (#12470)
* Fix formatting in stats.md
* Update stats.md
* Update docs/development/extensions-core/stats.md
Co-authored-by: Frank Chen <frankchen@apache.org>
* Update docs/development/extensions-core/stats.md
Co-authored-by: Frank Chen <frankchen@apache.org>
Didip Kerabat [Fri, 22 Apr 2022 18:44:05 +0000 (11:44 -0700)]
Metrics for shenandoah based on this source code: https://github.com/openjdk/jdk/blob/554caf33a01ac9ca2e3e9170557e8348750f3971/src/hotspot/share/gc/shenandoah/shenandoahMonitoringSupport.cpp#L65 (#12369)
Co-authored-by: Didip Kerabat <didip@apple.com>
Gian Merlino [Fri, 22 Apr 2022 18:22:34 +0000 (11:22 -0700)]
QueryScheduler: Log per-query message at DEBUG level. (#12467)
We generally want to avoid having any routine per-query messages at
INFO level, because they pollute logs.
Victoria Lim [Fri, 22 Apr 2022 02:28:49 +0000 (19:28 -0700)]
stringFirst and stringLast supported in ingestion (#12466)
Victoria Lim [Thu, 21 Apr 2022 18:19:39 +0000 (11:19 -0700)]
updated docs for sql query context (#12406)
Tejaswini Bandlamudi [Thu, 21 Apr 2022 15:48:20 +0000 (21:18 +0530)]
Suppress CVE 2022 26612 (#12463)
* suppress CVE-2022-26612
* adding packageUrl
* suppressing CVE-2022-26612
* adding packageUrl
* moving to hadoop section
Jihoon Son [Thu, 21 Apr 2022 08:51:16 +0000 (01:51 -0700)]
Add support for authorizing query context params (#12396)
The query context is a way for the user to give hints to the Druid query engine, either to enforce a certain behavior or at least to make the query engine prefer a certain plan during query planning. Today, there are 3 types of query context params:
- Default context params. They are set via druid.query.default.context in runtime properties. Any user context params can be default params.
- User context params. They are set in the user query request. See https://druid.apache.org/docs/latest/querying/query-context.html for parameters.
- System context params. They are set by the Druid query engine during query processing. These params override other context params.
Today, users are allowed to set any context param. This can cause
1) a bad UX if the context param is not mature yet or
2) even query failure or system fault in the worst case if a sensitive param is abused, for example, maxSubqueryRows.
This PR adds an ability to limit context params per user role. That means, a query will fail if you have a context param set in the query that is not allowed to you. To do that, this PR adds a new built-in resource type, QUERY_CONTEXT. The resource to authorize has a name of the context param (such as maxSubqueryRows) and the type of QUERY_CONTEXT. To allow a certain context param for a user, the user should be granted WRITE permission on the context param resource. Here is an example of the permission.
{
  "resourceAction" : {
    "resource" : {
      "name" : "maxSubqueryRows",
      "type" : "QUERY_CONTEXT"
    },
    "action" : "WRITE"
  },
  "resourceNamePattern" : "maxSubqueryRows"
}
Each role can have multiple permissions for context params. Each permission should be set for different context params.
When a query is issued with a query context X, the query will fail if the user who issued the query does not have WRITE permission on the query context X. In this case:
- HTTP endpoints will return a 403 response code.
- JDBC will throw ForbiddenException.
Note: there is a context param called brokerService that is used only by the router. This param is used to pin your query to run it in a specific broker. Because the authorization is done not in the router, but in the broker, if you have brokerService set in your query without a proper permission, your query will fail in the broker after routing is done. Technically, this is not right because the authorization is checked after the context param takes effect. However, this should not cause any user-facing issue and thus should be OK. The query will still fail if the user doesn’t have permission for brokerService.
The context param authorization can be enabled using druid.auth.authorizeQueryContextParams. This is disabled by default to avoid any hassle when someone upgrades their cluster blindly without reading release notes.
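The rule above (a query fails if it sets any context param the user lacks WRITE permission on) can be sketched as a set difference. This is an illustration with hypothetical names, not the authorizer's real API; SecurityException stands in for the ForbiddenException mentioned above.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of per-param query context authorization.
final class QueryContextAuthSketch {
  // Returns the context param names that the user has no WRITE permission on.
  static Set<String> unauthorizedParams(Map<String, Object> queryContext, Set<String> writableParams) {
    Set<String> denied = new TreeSet<>(queryContext.keySet());
    denied.removeAll(writableParams);
    return denied;
  }

  // Fails the query if any context param is not permitted.
  // SecurityException is a stand-in for the real exception type.
  static void authorize(Map<String, Object> queryContext, Set<String> writableParams) {
    Set<String> denied = unauthorizedParams(queryContext, writableParams);
    if (!denied.isEmpty()) {
      throw new SecurityException("Unauthorized query context params: " + denied);
    }
  }
}
```

A user granted only a hypothetical "timeout" param would be rejected when the query also sets maxSubqueryRows, and accepted once a maxSubqueryRows permission like the one shown earlier is added to the role.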
Rohan Garg [Thu, 21 Apr 2022 04:14:55 +0000 (09:44 +0530)]
Emit vectorized metric dimension by default (#12464)
Tejaswini Bandlamudi [Thu, 21 Apr 2022 03:52:35 +0000 (09:22 +0530)]
Fix GCS based ingestion if bucket name contains underscores (#12445)
GCP allows bucket names to contain underscores. When a location in such a bucket
is mapped to `java.net.URI`, `URI.getHost()` returns null. `URI.getHost()` is used as
the bucket name in `CloudObjectLocation`, leading to an NPE.
This commit uses `URI.getAuthority()` as the bucket name if `URI.getHost()` is null.
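The fallback can be illustrated with java.net.URI alone. The helper below is a hypothetical sketch, not the `CloudObjectLocation` code, and the bucket names are made up for the example.

```java
import java.net.URI;

// Hypothetical helper showing the getHost() -> getAuthority() fallback.
final class BucketNameSketch {
  static String bucketOf(URI uri) {
    // getHost() is null when the authority is not a valid hostname,
    // which is the case for names containing underscores.
    String host = uri.getHost();
    return host != null ? host : uri.getAuthority();
  }

  public static void main(String[] args) {
    System.out.println(bucketOf(URI.create("gs://my_bucket/path/to/file.json")));
    System.out.println(bucketOf(URI.create("gs://mybucket/path/to/file.json")));
  }
}
```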
PJ Fanning [Thu, 21 Apr 2022 02:12:19 +0000 (04:12 +0200)]
update httpclient due to cve (#12422)
https://github.com/apache/druid/issues/12421
PJ Fanning [Thu, 21 Apr 2022 02:11:55 +0000 (04:11 +0200)]
issue-12426 upgrade k8s client due to cve (#12427)
* issue-12426 upgrade k8s client due to cve
* compile issues
* try to fix license check
somu-imply [Wed, 20 Apr 2022 14:56:09 +0000 (07:56 -0700)]
Updating an error msg (#12450)
* Updating an error msg
* Added an extra [] so removing it
Jihoon Son [Tue, 19 Apr 2022 03:00:06 +0000 (20:00 -0700)]
Suppress CVE-2021-43138 (#12437)
* Suppress CVE-2021-43138
* revert netty 3.10.5.Final
jacobtolar [Tue, 19 Apr 2022 02:36:19 +0000 (21:36 -0500)]
Document expression post-aggregators (#11896)
* Document expression post-aggregators
* Update docs/querying/post-aggregations.md
Co-authored-by: Frank Chen <frankchen@apache.org>
Frank Chen [Tue, 19 Apr 2022 02:25:17 +0000 (10:25 +0800)]
Remove h2 database from dependency (#12447)
TSFenwick [Tue, 19 Apr 2022 02:24:46 +0000 (19:24 -0700)]
Document running IT tests from IntelliJ IDE (#12440)
* document running IT tests in intellij
* clean up unnecessary changes
* address comments
Victoria Lim [Mon, 18 Apr 2022 16:28:32 +0000 (09:28 -0700)]
recommendation for comparing strings and numbers (#12442)
Peter Marshall [Mon, 18 Apr 2022 09:00:21 +0000 (10:00 +0100)]
Docs - query caching (#11584)
* Update caching.md
Knowledge from https://the-asf.slack.com/archives/CJ8D1JTB8/p1597781107153900
Update caching.md
A few additional updates OTBO https://the-asf.slack.com/archives/CJ8D1JTB8/p1608669046041300
* Update caching.md
Typos
* Amendments on the segment cache
Significant updates on content around the segment cache, pull process, and in-memory cache
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/operations/basic-cluster-tuning.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/operations/basic-cluster-tuning.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update basic-cluster-tuning.md
typo
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Whole-query caching update
Made more succinct and removed specific config to change.
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Charles Smith [Mon, 18 Apr 2022 08:53:50 +0000 (01:53 -0700)]
Fixes a small typo in ingestion spec doc (#12143)
* small typo
* Update docs/ingestion/ingestion-spec.md
Co-authored-by: sthetland <steve.hetland@imply.io>
Co-authored-by: sthetland <steve.hetland@imply.io>
Rohan Garg [Mon, 18 Apr 2022 07:44:02 +0000 (13:14 +0530)]
Fail fast in case a lookup load fails (#12397)
Currently, while loading a lookup for the first time, the loading thread blocks
for `waitForFirstRunMs` if the lookup fails to load. If `waitForFirstRunMs`
is long (like 10 minutes), such blocking can slow down the loading of other lookups.
This commit allows the thread to progress as soon as the loading of the lookup fails.
Peter Marshall [Mon, 18 Apr 2022 05:41:39 +0000 (06:41 +0100)]
Docs - added another common config property to tuningConfig (#11935)
* Update ingestion-spec.md
Added indexSpecForIntermediatePersists as a common configuration property.
* Update ingestion-spec.md
Amended to remove "below" and add link to the table.
* Update ingestion-spec.md
Removed passive.
Alexandre BERTHIOT [Mon, 18 Apr 2022 05:25:09 +0000 (05:25 +0000)]
Update tutorial-compaction.md to change an unclear statement (#11988)
* Update tutorial-compaction.md
Unclear statement on the explanation of tuningConfig section.
* Update docs/tutorials/tutorial-compaction.md
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Maytas Monsereenusorn [Fri, 15 Apr 2022 22:47:47 +0000 (15:47 -0700)]
Fix bug in auto compaction preserveExistingMetrics feature (#12438)
* fix bug
* fix test
* fix IT
Agustin Gonzalez [Fri, 15 Apr 2022 16:08:06 +0000 (09:08 -0700)]
Make tombstones ingestible by having them return an empty result set. (#12392)
* Make tombstones ingestible by having them return an empty result set.
* Spotbug
* Coverage
* Coverage
* Remove unnecessary exception (checkstyle)
* Fix integration test and add one more to test dropExisting set to false over tombstones
* Force dropExisting to true in auto-compaction when the interval contains only tombstones
* Checkstyle, fix unit test
* Changed flag by mistake, fixing it
* Remove method from interface since this method is specific to only DruidSegmentInputEntity
* Fix typo
* Adapt to latest code
* Update comments when only tombstones to compact
* Move empty iterator to a new DruidTombstoneSegmentReader
* Code review feedback
* Checkstyle
* Review feedback
* Coverage
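As a rough illustration of the idea in this commit (a tombstone read yields an empty result set rather than an error), here is a minimal sketch. The `Segment` class and `read_rows` function are hypothetical stand-ins, not Druid types; the commit itself places the empty iterator in `DruidTombstoneSegmentReader`.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Segment:
    """Hypothetical stand-in for a segment; a tombstone carries no rows."""

    rows: List[dict] = field(default_factory=list)
    is_tombstone: bool = False


def read_rows(segment: Segment) -> Iterator[dict]:
    # A tombstone only marks an interval as empty, so reading it
    # produces an empty iterator instead of raising.
    if segment.is_tombstone:
        return iter(())
    return iter(segment.rows)
```

Downstream code can then treat tombstones like any other input and simply sees zero rows for them.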
hqx871 [Fri, 15 Apr 2022 16:07:06 +0000 (00:07 +0800)]
Use binary search to improve DimensionRangeShardSpec lookup (#12417)
If there are many shards, the mapper of IndexGeneratorJob seems to spend a lot of time calling
DimensionRangeShardSpec.isInChunk to look up the target shard. This can be significantly improved
by using binary search instead of comparing an input row to every shardSpec.
Changes:
* Add `BaseDimensionRangeShardSpec` which provides a binary-search-based
implementation for `createLookup`
* `DimensionRangeShardSpec`, `SingleDimensionShardSpec`, and
`DimensionRangeBucketShardSpec` now extend `BaseDimensionRangeShardSpec`
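The difference between the two lookup strategies can be sketched as follows. This is a simplified illustration, not the code in `BaseDimensionRangeShardSpec`: it assumes a single string key per row, and shards that are sorted, contiguous, and non-overlapping, with an inclusive start and an exclusive end (`None` meaning unbounded). The names `linear_lookup` and `build_lookup` are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, List, Optional, Tuple

# Hypothetical shard range: (start, end), start inclusive, end exclusive.
Range = Tuple[Optional[str], Optional[str]]


def linear_lookup(shards: List[Range], key: str) -> int:
    """Compare the key against every shard in turn: O(n) per row."""
    for i, (start, end) in enumerate(shards):
        if (start is None or key >= start) and (end is None or key < end):
            return i
    raise KeyError(key)


def build_lookup(shards: List[Range]) -> Callable[[str], int]:
    """Precompute the shard boundaries once, then binary-search: O(log n) per row.

    Assumes the shards are sorted and cover the whole key space without gaps.
    """
    # The start of every shard except the first is a boundary between shards.
    boundaries = [start for start, _ in shards[1:]]

    def lookup(key: str) -> int:
        # The number of boundaries <= key is the index of the owning shard.
        return bisect_right(boundaries, key)

    return lookup
```

Both functions return the same shard index for any key under the stated assumptions; only the cost per input row differs.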
somu-imply [Fri, 15 Apr 2022 04:59:17 +0000 (21:59 -0700)]
Handling planning with alias for time for group by and order by (#12418)
An outer scan query that requires ordering on a column should be considered an invalid query.
Vadim Ogievetsky [Thu, 14 Apr 2022 07:23:06 +0000 (00:23 -0700)]
good stuff (#12435)
Clint Wylie [Wed, 13 Apr 2022 23:34:01 +0000 (16:34 -0700)]
fix issue with boolean expression input (#12429)
Maytas Monsereenusorn [Wed, 13 Apr 2022 20:27:00 +0000 (13:27 -0700)]
Add docs to metric spec for auto compaction (#12415)
* add docs
* Update docs/configuration/index.md
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
* Update index.md
* Update docs/configuration/index.md
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Jihoon Son [Wed, 13 Apr 2022 19:43:11 +0000 (12:43 -0700)]
Fix indexMerger to respect the includeAllDimensions flag (#12428)
* Fix indexMerger to respect the includeAllDimensions flag; jsonInputFormat should set keepNullColumns if useFieldDiscovery is set
* address comments
Katya Macedo [Wed, 13 Apr 2022 09:59:56 +0000 (04:59 -0500)]
Add Kinesis ListShards permission (#12387)
* add Kinesis permission
* List Kinesis IAM permissions
* Adopt review suggestions
* Fix merge conflicts
Vadim Ogievetsky [Wed, 13 Apr 2022 05:20:28 +0000 (22:20 -0700)]
Web console: Misc fixes and improvements (#12361)
* Misc fixes
* pad column numbers
* make shard_type filterable
Parag Jain [Mon, 11 Apr 2022 15:35:24 +0000 (21:05 +0530)]
Copy of #11309 with fixes (#12402)
* Optionally load segment index files into page cache on bootstrap and new segment download
* Fix unit test failure
* Fix test case
* fix spelling
* fix spelling
* fix test and test coverage issues
Co-authored-by: Jian Wang <wjhypo@gmail.com>
Tiffany Yeh [Mon, 11 Apr 2022 14:58:09 +0000 (10:58 -0400)]
Fix zulu8 set-up Dockerfile for hadoop and hadoop3 in hadoop ingestion tutorial (#12248)
Fix errors related to zulu8 installation for building the Hadoop Docker image in the Load From Apache Hadoop tutorial.
The steps to download zulu8 in the Dockerfile and setup-zulu-repo.sh were replaced with the steps in the Dockerfile released by zulu-openjdk: https://github.com/zulu-openjdk/zulu-openjdk/blob/be45d20302e42df5aa95d2de078bb5e4214f5dba/centos/8u282-8.52.0.23/Dockerfile.
Jihoon Son [Sat, 9 Apr 2022 10:08:26 +0000 (03:08 -0700)]
Bump PostgreSQL JDBC driver to 42.3.3 (CVE-2022-21724) (#12410)
* Bump PostgreSQL JDBC driver to 42.3.3 (CVE-2022-21724)
* update license file
Adarsh Sanjeev [Sat, 9 Apr 2022 06:51:40 +0000 (12:21 +0530)]
Make error messages for insert statements consistent with select statements (#12414)
For a query like
INSERT INTO tablename SELECT channel, added as count FROM wikipedia
the error message is: Encountered "as count".
However, the insert statement
INSERT INTO t SELECT channel, added as count FROM wikipedia PARTITIONED BY ALL
incorrectly returns: INSERT statements must specify PARTITIONED BY clause explicitly.
This PR corrects this.
Add EOF to end of Druid SQL Insert statements
Rename SQL Insert statements in the parser to reflect the behaviour change
Maytas Monsereenusorn [Sat, 9 Apr 2022 03:14:36 +0000 (20:14 -0700)]
Improve metrics for Auto Compaction (#12413)
* add impl
* add docs
* fix
Maytas Monsereenusorn [Fri, 8 Apr 2022 18:02:02 +0000 (11:02 -0700)]
Add a new flag for ingestion to preserve existing metrics (#12185)
* add impl
* add impl
* fix checkstyle
* add impl
* add unit test
* fix stuff
* fix stuff
* fix stuff
* add unit test
* add more unit tests
* add more unit tests
* add IT
* add IT
* add IT
* add IT
* add ITs
* address comments
* fix test
* fix test
* fix test
* address comments
* address comments
* address comments
* fix conflict
* fix checkstyle
* address comments
* fix test
* fix checkstyle
* fix test
* fix test
* fix IT
mark-imply [Fri, 8 Apr 2022 12:31:54 +0000 (06:31 -0600)]
Update index.md (#12390)
Added guidance on when to increase druid.indexer.storage.recentlyFinishedThreshold.
Didip Kerabat [Fri, 8 Apr 2022 11:13:17 +0000 (04:13 -0700)]
Fix the other 2 Python scripts that generate license. (#12340)
Fixes YAML.load_all issues on two of the Python scripts that generate license.
The broken Python files interfere with some of the Maven tasks.
mark-imply [Fri, 8 Apr 2022 09:59:55 +0000 (03:59 -0600)]
Update basic-cluster-tuning.md (#12412)
Changed "Other useful JVM flags" to "Other generally useful JVM flags" in order to align with the introduction to the doc.
317brian [Thu, 7 Apr 2022 23:22:56 +0000 (16:22 -0700)]
fix(docs): clarify what s3 permissions are needed based on the access management type (#12405)
* fix(docs): clarify what s3 permissions are needed based on the permissions model
* fix typo
* Update docs/development/extensions-core/s3.md
Co-authored-by: Jihoon Son <jihoonson@apache.org>
Co-authored-by: Jihoon Son <jihoonson@apache.org>
dependabot[bot] [Thu, 7 Apr 2022 10:08:39 +0000 (03:08 -0700)]
Bump minimist from 1.2.5 to 1.2.6 in /website (#12400)
Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6.
- [Release notes](https://github.com/substack/minimist/releases)
- [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6)
---
updated-dependencies:
- dependency-name: minimist
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
dependabot[bot] [Wed, 6 Apr 2022 23:55:14 +0000 (16:55 -0700)]
Bump minimist from 1.2.5 to 1.2.6 in /web-console (#12401)
Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6.
- [Release notes](https://github.com/substack/minimist/releases)
- [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6)
---
updated-dependencies:
- dependency-name: minimist
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Vadim Ogievetsky [Wed, 6 Apr 2022 22:27:44 +0000 (15:27 -0700)]
clean up some bp3 classes (#12403)
Victoria Lim [Wed, 6 Apr 2022 22:17:15 +0000 (15:17 -0700)]
Document data format and example for featureSpec (#12394)
* add data format and example for featureSpec
* add second feature in example
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
317brian [Wed, 6 Apr 2022 16:24:37 +0000 (09:24 -0700)]
docs(fix): add clarity around granularitySpec (#12362)
* fix: add clarity around granularitySpec
* fix spacing
* Update docs/ingestion/compaction.md
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Victoria Lim [Tue, 5 Apr 2022 16:15:42 +0000 (09:15 -0700)]
Document config for ingesting null columns (#12389)
* config for ingesting null columns
* add link
* edit .spelling
* what happens if storeEmptyColumns is disabled
aggarwalakshay [Tue, 5 Apr 2022 06:56:15 +0000 (23:56 -0700)]
upgrade surefire 3.0.0-M6 (#12395)
* upgrade surefire 3.0.0-M6
* increasing memory
Paul Rogers [Mon, 4 Apr 2022 22:11:32 +0000 (15:11 -0700)]
Method to specify eternity in the scan query builder (#12223)
* Method to specify eternity in the scan query builder
* Fix checkstyle issue
* Renamed eternity() to eternityInterval()
* Minor fixes
John Gozde [Mon, 4 Apr 2022 17:34:22 +0000 (11:34 -0600)]
Blueprint 4 (#12391)
* Update blueprint dependencies & LICENSES
* Switch to bp4 namespace; use bp-ns variable in overrides
* Add webpack alias for colors.scss
* Snapshots
* Update selectors in e2e tests