ZhangJian He [Wed, 6 Apr 2022 15:37:28 +0000 (23:37 +0800)]
[BUILD] fix master branch broken http-core license check
### Changes
- `bkctl` and `bk-server` used different `http-core` versions
- unify the `http-core` version to 4.4.15
Reviewers: Andrey Yegorov <None>, Nicolò Boschi <boschi1997@gmail.com>, Enrico Olivelli <eolivelli@gmail.com>
This closes #3183 from Shoothzj/fix-broken-license-check
Nicolò Boschi [Wed, 6 Apr 2022 09:17:07 +0000 (11:17 +0200)]
[website] add CI checks to validate the website (#3164)
dependabot[bot] [Wed, 6 Apr 2022 09:07:50 +0000 (11:07 +0200)]
Bump minimist from 1.2.5 to 1.2.6 in /site3/website (#3179)
Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6.
- [Release notes](https://github.com/substack/minimist/releases)
- [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6)
---
updated-dependencies:
- dependency-name: minimist
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
dependabot[bot] [Wed, 6 Apr 2022 09:06:54 +0000 (11:06 +0200)]
Bump node-forge from 1.2.1 to 1.3.1 in /site3/website (#3180)
Bumps [node-forge](https://github.com/digitalbazaar/forge) from 1.2.1 to 1.3.1.
- [Release notes](https://github.com/digitalbazaar/forge/releases)
- [Changelog](https://github.com/digitalbazaar/forge/blob/main/CHANGELOG.md)
- [Commits](https://github.com/digitalbazaar/forge/compare/v1.2.1...v1.3.1)
---
updated-dependencies:
- dependency-name: node-forge
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
ZhangJian He [Wed, 6 Apr 2022 08:13:13 +0000 (16:13 +0800)]
Use netty maxDirectMemory instead of DirectMemoryUtils
### Motivation
Our `DirectMemoryUtils` has significant limitations and does not work well with other JVMs. Netty's `PlatformDependent.maxDirectMemory()` is more generic.
### Changes
Use `PlatformDependent.maxDirectMemory()` instead of `DirectMemoryUtils`
Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Andrey Yegorov <None>, Matteo Merli <mmerli@apache.org>, Nicolò Boschi <boschi1997@gmail.com>
This closes #2989 from Shoothzj/direct-memory
ZhangJian He [Wed, 6 Apr 2022 05:46:17 +0000 (13:46 +0800)]
[build] support apple m1 build (#3175)
Nicolò Boschi [Sun, 3 Apr 2022 16:09:49 +0000 (18:09 +0200)]
[netty] remove no longer used properties io.netty.recycler.linkCapacity and io.netty.recycler.maxCapacity.default (#3172)
* [netty] remove no longer used properties io.netty.recycler.linkCapacity and io.netty.recycler.maxCapacity.default
* gradle
ZhangJian He [Sun, 3 Apr 2022 08:00:21 +0000 (16:00 +0800)]
Bump netty version to 4.1.75.Final, grpc to 1.45.1 (#3163)
ZhangJian He [Sat, 2 Apr 2022 19:22:13 +0000 (03:22 +0800)]
[build] Complement missing maven plugin version (#3166)
ZhangJian He [Sat, 2 Apr 2022 19:21:18 +0000 (03:21 +0800)]
[security] Bump bc fips version from 1.0.2.1 to 1.0.2.3 (#3087)
Nicolò Boschi [Fri, 1 Apr 2022 19:54:24 +0000 (21:54 +0200)]
[WEBSITE] Update current stable version to 4.14.4
### Motivation
The current stable version is 4.14.4, but the website still shows 4.11.1
### Changes
* Edit variables to point to 4.14.4
* Cleanup staging website "download" page
Reviewers: Yong Zhang <zhangyong1025.zy@gmail.com>
This closes #3154 from nicoloboschi/update-stable-version
Nicolò Boschi [Fri, 1 Apr 2022 07:37:57 +0000 (09:37 +0200)]
[build] Fix various spotbugs warnings (#3160)
Hang Chen [Thu, 31 Mar 2022 13:39:47 +0000 (21:39 +0800)]
Fix region/rack aware placement policy replace bookie bug (#2642)
Nicolò Boschi [Thu, 31 Mar 2022 11:45:12 +0000 (13:45 +0200)]
[website] Update committers page
### Changes
Update committers info
Reviewers: Andrey Yegorov <None>
This closes #3161 from nicoloboschi/update-committers-nicoloboschi
Hang Chen [Thu, 31 Mar 2022 07:08:13 +0000 (15:08 +0800)]
Revert rocksdb compaction on checkpoint to reduce CPU usage (#3144)
Hang Chen [Thu, 31 Mar 2022 05:42:34 +0000 (13:42 +0800)]
catch onBookieRackChange exception (#3060)
### Motivation
When we update the bookie rack info, the whole bookie list is used to update the rack topology. However, if the update fails for one bookie and throws an exception, the exception propagates and the remaining bookies' info is not updated in the rack topology, which affects ledger ensemble selection.
### Changes
Catch the bookie topology update exception to ensure the remaining bookies' info can be updated into the rack topology.
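The per-bookie isolation described above can be sketched as follows. This is a minimal illustration, not the actual BookKeeper code: `RackUpdateSketch`, `updateAll`, and the `Consumer<String>` updater are hypothetical names, and bookies are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

final class RackUpdateSketch {
    /**
     * Applies the updater to every bookie. A failure for one bookie is
     * recorded and the loop continues, so the remaining bookies are still
     * updated. Returns the bookies whose update failed.
     */
    static List<String> updateAll(List<String> bookies, Consumer<String> updater) {
        List<String> failed = new ArrayList<>();
        for (String bookie : bookies) {
            try {
                updater.accept(bookie);
            } catch (RuntimeException e) {
                // Hypothetical handling: remember the failure, keep going.
                failed.add(bookie);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> failed = updateAll(Arrays.asList("bk1", "bk2", "bk3"), b -> {
            if (b.equals("bk2")) {
                throw new IllegalStateException("simulated rack update failure");
            }
        });
        System.out.println("failed updates: " + failed);
    }
}
```

Without the try/catch inside the loop, the failure on the second bookie would prevent the third one from ever being processed.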
ZhangJian He [Wed, 30 Mar 2022 23:10:25 +0000 (07:10 +0800)]
use mockito.any instead of deprecated mockito.anyObject
### Changes
use mockito.any instead of deprecated mockito.anyObject
Reviewers: Andrey Yegorov <None>, Nicolò Boschi <boschi1997@gmail.com>
This closes #3152 from Shoothzj/use-mockito-any-instead-of-anyObject
Enrico Olivelli [Wed, 30 Mar 2022 10:41:33 +0000 (12:41 +0200)]
Upgrade ZooKeeper to 3.8.0 (#3145)
ZhangJian He [Tue, 29 Mar 2022 06:29:00 +0000 (14:29 +0800)]
fix duplicate typeline for prometheus type (#3137)
Andrey Yegorov [Mon, 28 Mar 2022 23:01:51 +0000 (16:01 -0700)]
[maven-release-plugin] prepare for next development iteration
Andrey Yegorov [Mon, 28 Mar 2022 23:01:48 +0000 (16:01 -0700)]
[maven-release-plugin] prepare branch branch-4.15
Rajan Dhabalia [Mon, 28 Mar 2022 22:54:35 +0000 (15:54 -0700)]
Fix NPE while reordering read-sequence for local-bookie ensemble policy
### Motivation
When bookie sanity and autorecovery use the same conf file with the flag `reorderReadSequenceEnabled=true`, the bookie-sanity command throws an NPE because `LocalBookieEnsemblePlacementPolicy::reorderReadLACSequence` returns a null write set, which causes the sanity check to fail.
```
00:46:46.202 [BookKeeperClientWorker-OrderedExecutor-11-0] ERROR o.a.b.common.util.SafeRunnable - Unexpected throwable caught
java.lang.NullPointerException: null
at org.apache.bookkeeper.client.PendingReadOp$SequenceReadRequest.sendNextRead(PendingReadOp.java:399)
at org.apache.bookkeeper.client.PendingReadOp$SequenceReadRequest.read(PendingReadOp.java:385)
at org.apache.bookkeeper.client.PendingReadOp.initiate(PendingReadOp.java:529)
at org.apache.bookkeeper.client.LedgerRecoveryOp.doRecoveryRead(LedgerRecoveryOp.java:148)
at org.apache.bookkeeper.client.LedgerRecoveryOp.access$000(LedgerRecoveryOp.java:37)
at org.apache.bookkeeper.client.LedgerRecoveryOp$1.readLastConfirmedDataComplete(LedgerRecoveryOp.java:109)
at org.apache.bookkeeper.client.ReadLastConfirmedOp.readEntryComplete(ReadLastConfirmedOp.java:135)
at org.apache.bookkeeper.proto.PerChannelBookieClient$ReadCompletion$1.readEntryComplete(PerChannelBookieClient.java:1829)
at org.apache.bookkeeper.proto.PerChannelBookieClient$ReadCompletion.handleReadResponse(PerChannelBookieClient.java:1910)
at org.apache.bookkeeper.proto.PerChannelBookieClient$ReadCompletion.handleV3Response(PerChannelBookieClient.java:1885)
at org.apache.bookkeeper.proto.PerChannelBookieClient$3.safeRun(PerChannelBookieClient.java:1446)
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
```
### Modification
Fix NPE for local ensemble policy while reading entry with `reorderReadSequenceEnabled` flag enabled.
Reviewers: Andrey Yegorov <None>, Enrico Olivelli <eolivelli@gmail.com>, Nicolò Boschi <boschi1997@gmail.com>
This closes #3127 from rdhabalia/repl_seq
赵延 [Mon, 28 Mar 2022 22:23:14 +0000 (06:23 +0800)]
Fix doc code problem.
fix docs code demo problem.
Reviewers: Nicolò Boschi <boschi1997@gmail.com>, Andrey Yegorov <None>
This closes #3142 from horizonzy/fix-docs-code-problem
Enrico Olivelli [Mon, 28 Mar 2022 15:53:04 +0000 (17:53 +0200)]
BookieAutoRecoveryTest.testEmptyLedgerLosesQuorumEventually fix flaky test, ensure that the Auditor is alive (#3149)
Nicolò Boschi [Sun, 27 Mar 2022 12:37:16 +0000 (14:37 +0200)]
[security] Upgrade jackson-databind to get rid of CVE-2020-36518 (#3140)
* [security] Upgrade jackson-databind to get rid of CVE-2020-36518
Hang Chen [Sun, 27 Mar 2022 09:53:36 +0000 (17:53 +0800)]
Upgrade rocksdb to 6.29.4.1 (#3143)
Andrey Yegorov [Thu, 24 Mar 2022 16:15:09 +0000 (09:15 -0700)]
Bringing back maven build (#3130)
* Revert "[build] remove Maven POM files (#3009)"
This reverts commit e089b51ab5e1cf5f061c81463d27f33a21198271.
* rxjava: add maven dependency
(cherry picked from commit ac73541ce79953141a08c60642cd39c7984ade1e)
* Bring guava to the same version as gradle
* ignore deprecation warnings in tests
* mockito-inline, as in gradle + suppress warnings
* suppressed warning
* Exclude site3/ from RAT check
* CI to use (mostly) maven
* OWASP check with maven
* Up'd versions to match gradle, corrected license files: looks like gradle build didn't force versions consistently
* Removed current-version-image to match https://github.com/apache/bookkeeper/pull/3027
* Shading pattern to match gradle
* Fixed/suppressed CVEs
* Attempt to fix failing tests in CompactionByEntriesWithMetadataCacheTest
Co-authored-by: lushiji <lushiji@didiglobal.com>
congbo [Thu, 24 Mar 2022 08:22:02 +0000 (16:22 +0800)]
PendingReadOp: Fix ledgerEntryImpl reuse problem (#3110)
Kezhu Wang [Wed, 23 Mar 2022 08:01:53 +0000 (16:01 +0800)]
Set BOOKIE_HTTP_PORT to make it optional in docker run (#3096)
Fixes #3075.
Lari Hotari [Wed, 23 Mar 2022 08:01:00 +0000 (10:01 +0200)]
Issue #3105: Optimize OrderedExecutor performance by using GrowableArrayBlockingQueue (#3108)
Fixes #3105
Nicolò Boschi [Wed, 23 Mar 2022 08:00:34 +0000 (09:00 +0100)]
Remove unused site2 directory (#3116)
Andras Beni [Mon, 21 Mar 2022 15:06:01 +0000 (16:06 +0100)]
Log NoLedgerException on debug level (#3117)
NoLedgerException does not signify an error in the Bookie that needs
to be fixed. Instead it is - at most - a user error that the user is
notified about via the status code ENOLEDGER.
Logging this problem at error level introduces an odd difference
between the behavior of readLac using v2 versus v3 protocol version.
In the former case ReadEntryProcessor logs the same problem at debug
level. As a result, changing protocol version appears to be introducing
an error.
Nicolò Boschi [Fri, 18 Mar 2022 16:40:43 +0000 (17:40 +0100)]
[WEBSITE] Staging website: images are not visible
### Motivation
In the staging website image links are broken
### Changes
* Fixed the links with the correct syntax (I checked every `img` reference)
Reviewers: Andrey Yegorov <None>
This closes #3126 from nicoloboschi/website/fix-imgs
Nicolò Boschi [Fri, 18 Mar 2022 10:43:42 +0000 (11:43 +0100)]
[website] fix GITHUB_TOKEN used for deployment (#3125)
Nicolò Boschi [Fri, 18 Mar 2022 08:59:11 +0000 (09:59 +0100)]
[website] use GH_TOKEN to push to asf-site and asf-staging branches (#3124)
Nicolò Boschi [Thu, 17 Mar 2022 16:27:50 +0000 (17:27 +0100)]
[website] Fix deploy push to git - use ssh (#3123)
Nicolò Boschi [Thu, 17 Mar 2022 16:09:04 +0000 (17:09 +0100)]
[website] fix deploy staging script (#3122)
Nicolò Boschi [Thu, 17 Mar 2022 16:08:09 +0000 (17:08 +0100)]
[website] fix current site deployment (#3120)
Nicolò Boschi [Thu, 17 Mar 2022 11:40:36 +0000 (12:40 +0100)]
[website] deploy staging after every change (#3118)
Enrico Olivelli [Wed, 16 Mar 2022 18:15:49 +0000 (19:15 +0100)]
Gradle build: add mavenLocal() repository
### Motivation
In Gradle you need to add the `mavenLocal()` repository if you want to test local versions of third-party libraries built with Maven (like ZooKeeper, Curator...)
### Changes
Add `mavenLocal()` repository
Reviewers: Matteo Merli <mmerli@apache.org>, Nicolò Boschi <boschi1997@gmail.com>, Andrey Yegorov <None>
This closes #3114 from eolivelli/impl/gradle-maven-local
Nicolò Boschi [Wed, 16 Mar 2022 16:06:24 +0000 (17:06 +0100)]
[website] New Website built with Docusaurus v2 (#3088)
StevenLuMT [Wed, 16 Mar 2022 10:10:59 +0000 (18:10 +0800)]
add stats for throttled-write (#3102)
Descriptions of the changes in this PR:
### Motivation
The time spent in the method `triggerFlushAndAddEntry` varies, so add a stats metric that focuses on this method.
### Changes
1. The previous counter metric (`throttledWriteRequests`) is retained
2. Add `throttledWriteStats` to record the time cost and call count of the method `triggerFlushAndAddEntry`
ZhangJian He [Tue, 15 Mar 2022 23:42:56 +0000 (07:42 +0800)]
Bump testcontainers version to 1.16.3
### Motivation
- Bump the testcontainers version so that tests can run on the latest Docker on `Mac Intel Chip`
- Bump `docker-java` version to `3.2.13`
Reviewers: Nicolò Boschi <boschi1997@gmail.com>, Andrey Yegorov <None>
This closes #3101 from Shoothzj/bump-test-container-version
ZhangJian He [Tue, 15 Mar 2022 23:32:24 +0000 (07:32 +0800)]
bump lombok from 1.18.20 to 1.18.22 to support java17 compile
### Motivation
- required for compilation on JDK 17
- see https://projectlombok.org/changelog
Reviewers: Andrey Yegorov <None>, Nicolò Boschi <boschi1997@gmail.com>
This closes #3097 from Shoothzj/bump-lombok-1-18-22
Lari Hotari [Tue, 15 Mar 2022 16:42:49 +0000 (18:42 +0200)]
Run protobuf code generation automatically in IntelliJ and fix config (#3107)
LinChen [Tue, 15 Mar 2022 10:55:42 +0000 (18:55 +0800)]
ConcurrentLong map and set: add unit tests for reducing unnecessary expansions (#3092)
LinChen [Tue, 15 Mar 2022 10:28:29 +0000 (18:28 +0800)]
ConcurrentOpenHashSet: fix reduce unnecessary expansions (#3082)
Nicolò Boschi [Mon, 14 Mar 2022 11:10:35 +0000 (12:10 +0100)]
[website] update website every time there is a change (#3090)
Hang Chen [Sun, 13 Mar 2022 12:50:17 +0000 (20:50 +0800)]
fix bkperf message rate limit to 2GB/s (#3100)
Andrey Yegorov [Fri, 11 Mar 2022 23:07:12 +0000 (15:07 -0800)]
Preparing for the release 4.15
Descriptions of the changes in this PR:
Updated py client's version to 4.15
Motivation
Preparing for the release
https://bookkeeper.apache.org/community/release_guide/#change-python-client-version
Reviewers: Enrico Olivelli <eolivelli@gmail.com>
This closes #3095 from dlg99/python_client_rel_4.15
ken [Fri, 11 Mar 2022 20:13:16 +0000 (04:13 +0800)]
fix a metric error in bookieStats
Descriptions of the changes in this PR:
### Motivation
fix a metric error in bookieStats
### Changes
getReadEntryStats().registerFailedValue(entrySize) -> getReadBytesStats().registerFailedValue(entrySize)
Reviewers: Andrey Yegorov <None>, Nicolò Boschi <boschi1997@gmail.com>
This closes #3083 from TakaHiro0208/fix_bookieStats_metric_error
Yang Yang [Fri, 11 Mar 2022 19:57:51 +0000 (03:57 +0800)]
Add a REST API to get or update bookie readOnly state
### Motivation
This PR is a part of the work to improve the process of removing bookies from the cluster. Specifically, it implements the `readOnly` API described in the [mail](http://mail-archives.apache.org/mod_mbox/bookkeeper-dev/202109.mbox/raw/%3CCAJdLeK03g8K0h6swn%3D9yVP1Ze2zHxe8TDobK6a-zpTdABkeQEA%40mail.gmail.com%3E).
### Changes
- Add a REST API at `/api/v1/bookie/state/readonly`
- The `GET` method returns the current `readOnly` status
- The `PUT` method updates the `readOnly` status if needed.
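
A client could build the two requests roughly as below. This is a sketch under stated assumptions, not documented usage: the base URL `http://localhost:8000` and the JSON body `{"readOnly":true}` are assumptions that should be checked against the bookie HTTP server configuration and the final API documentation, and `ReadOnlyApiSketch` is a hypothetical helper. Only the path and the `GET`/`PUT` methods come from the change description. It uses the JDK 11+ `java.net.http` API and only constructs the requests without sending them.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class ReadOnlyApiSketch {
    // Path taken from the change description above.
    static final String PATH = "/api/v1/bookie/state/readonly";

    /** Builds the GET request that reads the current readOnly status. */
    static HttpRequest getState(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + PATH)).GET().build();
    }

    /** Builds the PUT request; the JSON body format is an assumption. */
    static HttpRequest putState(String baseUrl, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + PATH))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        String base = "http://localhost:8000"; // assumed host and port
        System.out.println(getState(base));
        System.out.println(putState(base, "{\"readOnly\":true}"));
    }
}
```

To actually send a request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.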
### TODOs
- Update the document once the PR is accepted.
- Update the `BookieStateManager` & `BookieImpl` to persist the information that the state change is triggered by the external API request and do not change the state based on the notification from the dirs monitoring service.
Reviewers: Yong Zhang <zhangyong1025.zy@gmail.com>, Enrico Olivelli <eolivelli@gmail.com>, Andrey Yegorov <None>
This closes #2799 from fantapsody/readonly-api
Nicolò Boschi [Thu, 10 Mar 2022 07:32:58 +0000 (08:32 +0100)]
[website] Fix deploy action (#3089)
Kezhu Wang [Thu, 10 Mar 2022 00:09:25 +0000 (08:09 +0800)]
Fix Journal.ForceWriteThread.forceWriteRequests.put deadlock
Descriptions of the changes in this PR:
### Motivation
`Journal.ForceWriteThread` could deadlock because it is the sole consumer of `Journal.forceWriteRequests`, yet it sends the group marker with the blocking `BlockingQueue.put`.
This PR tries to fix this.
### Changes
* Add test code that deadlocks `Journal.ForceWriteThread` on `forceWriteRequests.put`.
* Send the force-write group marker without blocking to avoid deadlocking `ForceWriteThread`.
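
The difference between the blocking and the non-blocking enqueue can be sketched with a plain `BlockingQueue`. This is an illustration of the general technique only; `NonBlockingMarkerSketch` and `trySendMarker` are hypothetical names, and how the real fix handles a full queue is not shown here.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class NonBlockingMarkerSketch {
    /**
     * Tries to enqueue a marker without blocking. Returns false when the
     * queue is full instead of waiting for space.
     */
    static <T> boolean trySendMarker(BlockingQueue<T> queue, T marker) {
        // offer() returns immediately. put() would block until space frees
        // up, and if the caller is the queue's only consumer, nobody else
        // will ever free that space.
        return queue.offer(marker);
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        System.out.println(trySendMarker(queue, "marker-1"));
        System.out.println(trySendMarker(queue, "marker-2"));
    }
}
```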
Master Issue: #2948
Reviewers: Andrey Yegorov <None>
This closes #2962 from kezhuw/fix-Journal.ForceWriteThread.forceWriteRequests.put-deadlock
StevenLuMT [Thu, 10 Mar 2022 00:05:59 +0000 (08:05 +0800)]
change rocksdb init: use OptionsUtil
Descriptions of the changes in this PR:
### Motivation
1. some old RocksDB parameters are not configurable
2. future RocksDB tuning should not require code changes or new bookie configuration
### Changes
1) make all the old RocksDB parameters configurable
2) use OptionsUtil to initialize all RocksDB parameters
The old PR #3006 had rebase errors, so this is a new PR.
Reviewers: Andrey Yegorov <None>, LinChen <None>
This closes #3056 from StevenLuMT/master_improveRocksDB
Hang Chen [Thu, 10 Mar 2022 00:04:20 +0000 (08:04 +0800)]
Add throttle for rebuild entryMetadataMap
### Motivation
When a bookie restarts, the garbageCollectorThread rebuilds entryMetadataMap from all the entry log files in the ledger directory. Normally it extracts the EntryLogMetadata from the index in each entry log file. However, if a file has no index, it falls back to scanning the whole entry log file.
In one user's production environment, about 4% of the entry log files had no index: 3000 out of 80000. With the default entry log file size of 2GB, the garbageCollectorThread reads 3000 * 2GB = 6TB of data without any speed limit, which drives ledger disk IO utilization high for dozens of minutes and hurts ledger read and write latency.
### Modification
1. Add read speed rate limiter for scanning entry log file in entryMetadataMap rebuild.
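
To make the trade-off concrete, the arithmetic behind a read rate limit can be sketched as below. This is not the limiter used by the change; `ScanThrottleSketch` is a hypothetical class, the 100 MiB/s limit is an illustrative value, and 2GB is treated as 2 GiB. A caller would sleep until the returned elapsed time has passed before reading further.

```java
final class ScanThrottleSketch {
    private final long bytesPerSecond;
    private long totalBytes;

    ScanThrottleSketch(long bytesPerSecond) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        this.bytesPerSecond = bytesPerSecond;
    }

    /**
     * Records that more bytes were scanned and returns the minimum number of
     * milliseconds that must have elapsed since the scan started for the
     * average read rate to stay within the limit.
     */
    long record(long bytes) {
        totalBytes += bytes;
        return totalBytes * 1000 / bytesPerSecond;
    }

    public static void main(String[] args) {
        long rate = 100L * 1024 * 1024;          // hypothetical limit: 100 MiB/s
        long fileSize = 2L * 1024 * 1024 * 1024; // 2 GiB per entry log file
        ScanThrottleSketch throttle = new ScanThrottleSketch(rate);
        long minMillis = 0;
        for (int i = 0; i < 3000; i++) {
            minMillis = throttle.record(fileSize);
        }
        System.out.println("minimum scan time (s): " + minMillis / 1000);
    }
}
```

With these illustrative numbers the rebuild is spread over roughly 17 hours instead of saturating the disk, which shows why the limit has to be chosen with restart time in mind.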
Reviewers: Nicolò Boschi <boschi1997@gmail.com>, Enrico Olivelli <eolivelli@gmail.com>
This closes #2963 from hangc0276/chenhang/add_throttle_for_build_entryMetadataMap
wuYin [Thu, 10 Mar 2022 00:03:16 +0000 (08:03 +0800)]
check all bookies of writeset are writable
### Motivation
#1088 introduced a check that the ensemble is writable before sending requests, but we should check the bookies of the write set instead of the first few bookies in the current ensemble.
### Changes
Get the write-set bookies from the ensemble and check that they are writable.
Related change: https://github.com/apache/bookkeeper/pull/1088/files#diff-1d893bb31553b5e1f55c8301d04ae15f38e0d35f531f9dd22475128b7972ddf9R1108
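The difference between "the first few bookies" and "the bookies of the write set" can be sketched as follows, assuming round-robin striping where entry `e` is written to ensemble indices `(e + i) % ensembleSize` for `i < writeQuorumSize`. `WriteSetCheckSketch` and its methods are hypothetical names, and bookies are plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class WriteSetCheckSketch {
    /** Ensemble indices that receive the entry under the assumed round-robin striping. */
    static List<Integer> writeSet(long entryId, int ensembleSize, int writeQuorumSize) {
        List<Integer> indices = new ArrayList<>(writeQuorumSize);
        for (int i = 0; i < writeQuorumSize; i++) {
            indices.add((int) ((entryId + i) % ensembleSize));
        }
        return indices;
    }

    /** True only if every bookie in the entry's write set is writable. */
    static boolean allWritable(List<String> ensemble, long entryId, int writeQuorumSize,
                               Predicate<String> isWritable) {
        for (int index : writeSet(entryId, ensemble.size(), writeQuorumSize)) {
            if (!isWritable.test(ensemble.get(index))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> ensemble = Arrays.asList("bk-a", "bk-b", "bk-c");
        Predicate<String> writable = b -> !b.equals("bk-c"); // bk-c is not writable
        System.out.println(allWritable(ensemble, 0, 2, writable));
        System.out.println(allWritable(ensemble, 1, 2, writable));
    }
}
```

In this example, checking only the first two bookies of the ensemble would report entry 1 as safe to send even though one bookie of its write set is not writable.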
Reviewers: Andrey Yegorov <None>
This closes #3055 from wuYin/writeset-writable
Hang Chen [Thu, 10 Mar 2022 00:02:00 +0000 (08:02 +0800)]
Add sizeInBytes interface for ConcurrentLong map and set
### Motivation
We provide some concurrent maps and sets for specific usages, along with `size()` and `capacity()` methods that return the actual item count and the maximum item count.
However, if users want to monitor how much memory those concurrent maps and sets have allocated, there is no interface that exposes this metric.
### Changes
Add a `sizeInBytes()` interface to expose the memory allocated for those concurrent maps and sets.
Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Nicolò Boschi <boschi1997@gmail.com>
This closes #3068 from hangc0276/chenhang/add_sizeInBytes_interface_for_concurrent_maps
Hang Chen [Thu, 10 Mar 2022 00:00:08 +0000 (08:00 +0800)]
Fix publish do not include test jar
### Motivation
When using the `./gradlew publishToMavenLocal` command to publish jars to the local Maven repository, the test jars are not included.
### Changes
Include the test jars when publishing.
Reviewers: Nicolò Boschi <boschi1997@gmail.com>, Enrico Olivelli <eolivelli@gmail.com>, ZhangJian He <shoothzj@gmail.com>, Yong Zhang <zhangyong1025.zy@gmail.com>
This closes #3071 from hangc0276/chenhang/fix_publish_do_not_include_test_jar
Nicolò Boschi [Tue, 8 Mar 2022 17:22:55 +0000 (18:22 +0100)]
[website] restore javadoc generation and automate website deployment (#3081)
* [website] restore javadoc generation and automate website deployment
lin chen [Thu, 3 Mar 2022 17:42:10 +0000 (01:42 +0800)]
support shrink for ConcurrentLong map or set (#3074)
* support shrink for ConcurrentLong map or set
* fix unit test
* check style
* add shrink unit test.
* fix unit test
Nicolò Boschi [Wed, 2 Mar 2022 08:05:11 +0000 (09:05 +0100)]
[ci] Move CI to JDK11 (#3027)
Nicolò Boschi [Tue, 1 Mar 2022 21:04:56 +0000 (22:04 +0100)]
[tests] Replace powermockito usages with mockito-inline - 'tools' submodule (#3077)
lin chen [Sun, 27 Feb 2022 21:14:48 +0000 (05:14 +0800)]
reduce unnecessary expansions for ConcurrentLong map and set (#3072)
Andrey Yegorov [Fri, 25 Feb 2022 17:26:37 +0000 (09:26 -0800)]
ISSUE #2898: DistributedLogManager can skip over a segment on read.
Descriptions of the changes in this PR:
### Motivation
DLM test suite was flaky.
Repro/troubleshooting shows that DLM can skip over a data segment on read.
### Changes
Test + fix (don't move segment if it moved already).
Master Issue: #2898
Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Nicolò Boschi <boschi1997@gmail.com>
This closes #3064 from dlg99/fix/dlm-issue2898, closes #2898
Nicolò Boschi [Fri, 25 Feb 2022 17:22:41 +0000 (18:22 +0100)]
[CI] remaining-tests is running all the tests
### Motivation
The 'remaining-tests' is supposed to exclude a lot of tests covered by the other checks but the syntax used is wrong
### Changes
* Correctly excluded all the tests not needed in 'remaining-tests' suite
Reviewers: Andrey Yegorov <None>, Enrico Olivelli <eolivelli@gmail.com>
This closes #3078 from nicoloboschi/ci/remove-duplicated-tests
ZhangJian He [Fri, 25 Feb 2022 06:36:05 +0000 (14:36 +0800)]
Allow enabling http tls (#2995)
Andrey Yegorov [Wed, 23 Feb 2022 21:03:11 +0000 (13:03 -0800)]
Issue 2974: better thread selection for the Ordered Executor (#3023)
Andrey Yegorov [Wed, 23 Feb 2022 21:01:36 +0000 (13:01 -0800)]
[ISSUE 3031] fixed test; chooseThread() uses orderingKey as a param, not a thread index (#3032)
lin chen [Wed, 23 Feb 2022 10:16:56 +0000 (18:16 +0800)]
fix checkAllLedgersDuration compute (#2970)
lin chen [Wed, 23 Feb 2022 10:15:14 +0000 (18:15 +0800)]
update doc RecoveryBookieService: bookie_dest has been removed (#2961)
Hang Chen [Tue, 22 Feb 2022 19:24:57 +0000 (03:24 +0800)]
fix gradle publishToMavenLocal failed (#3069)
Hang Chen [Tue, 22 Feb 2022 07:00:48 +0000 (15:00 +0800)]
Avoiding call fileChannelProvider init multiple times (#3046)
lin chen [Tue, 22 Feb 2022 05:01:03 +0000 (13:01 +0800)]
Optimize memory:Support shrinking in ConcurrentLongLongPairHashMap (#3061)
* support shrink
* Reduce unnecessary rehash
* check style
* fix: unnecessary rehash
* add unit test: testExpandAndShrink
* fix unit test: testExpandAndShrink
* fix test:
1.verify that the map is able to expand after shrink;
2.does not keep shrinking at every remove() operation;
* 1.add builder;
2.add config:
①MapFillFactor;②MapIdleFactor;③autoShrink;④expandFactor;⑤shrinkFactor
* check style
* 1.check style;
2.add check :
shrinkFactor>1
expandFactor>1
* check style
* keep and Deprecate all the public constructors.
* add final for autoShrink
* fix unit test testExpandAndShrink, set autoShrink true
* add method for update parameters value:
①setMapFillFactor
②setMapIdleFactor
③setExpandFactor
④setShrinkFactor
⑤setAutoShrink
* use lombok.Setter replace lombok.Data
* use public for getUsedBucketCount
* 1.check parameters;
2.fix the shrinkage condition:
①newCapacity > size: in order to prevent the infinite loop of rehash, newCapacity should be larger than the currently used size;
②newCapacity > resizeThresholdUp: in order to prevent continuous expansion and contraction, newCapacity should be greater than the expansion threshold;
* 1.update parameters check;
2.fix newCapacity calculation when shrinking :
rehash((int) Math.max(size / mapFillFactor, capacity / shrinkFactor));
* remove set methods:
①setMapFillFactor
②setMapIdleFactor
③setExpandFactor
④setShrinkFactor
⑤setAutoShrink
* Repair shrinkage conditions: ①newCapacity must be the nth power of 2; ②reduce unnecessary shrinkage;
* Repair shrinkage conditions
* add shrinkage when clear
* 1.add test for clear shrink
2. fix initCapacity value
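The shrink capacity calculation listed in the commit above can be sketched as below. It combines the quoted expression `Math.max(size / mapFillFactor, capacity / shrinkFactor)` with the power-of-two requirement; it is a simplified illustration, not the code from the PR. `ShrinkCapacitySketch` and its method names are hypothetical, and the factor values in `main` are illustrative.

```java
final class ShrinkCapacitySketch {
    /** Rounds n up to the next power of two (n itself if it already is one). */
    static int alignToPowerOfTwo(int n) {
        return n <= 1 ? 1 : Integer.highestOneBit(n - 1) << 1;
    }

    /**
     * Capacity to rehash to when shrinking: never below what the current
     * size needs at the given fill factor, never below capacity divided by
     * shrinkFactor, and always a power of two.
     */
    static int shrunkCapacity(int size, int capacity, float mapFillFactor, float shrinkFactor) {
        int target = (int) Math.max(size / mapFillFactor, capacity / shrinkFactor);
        return alignToPowerOfTwo(Math.max(target, 1));
    }

    public static void main(String[] args) {
        // Nearly empty map: capacity can be halved.
        System.out.println(shrunkCapacity(10, 1024, 0.66f, 2f));
        // Well-filled map: the result equals the current capacity, so no shrink.
        System.out.println(shrunkCapacity(600, 1024, 0.66f, 2f));
    }
}
```

A caller would skip the rehash when the returned value is not smaller than the current capacity, which avoids the repeated expand and shrink cycles the commit notes mention.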
Andrey Yegorov [Tue, 22 Feb 2022 02:17:38 +0000 (18:17 -0800)]
ISSUE 3044: ETCD tests hang. Added global timeout, fork tests jvm, fixed noop slf4j to see log in case of hang (#3051)
Descriptions of the changes in this PR:
### Motivation
ETCD tests flake or hang occasionally, causing CI job timeouts.
### Changes
Added global timeout - kill the test early if it hangs
fork tests jvm - I think it helped locally (no repro) but possibly just reduced frequency of hangs
fixed noop slf4j warning, also to see log in case of hang
Master Issue: #3044
StevenLuMT [Tue, 22 Feb 2022 00:42:13 +0000 (08:42 +0800)]
set Mod initial Delay time to simply avoid GarbageCollectorThread working at the same time (#3012)
Descriptions of the changes in this PR:
### Motivation
When there is more than one ledger directory, the GarbageCollectorThread of each directory does the same work at the same time,
especially:
1) deleting ledgers, after which SyncThread is scheduled to run RocksDB compaction
2) compacting entries, which costs CPU.
### Changes
set a Mod initial Delay time to simply avoid GarbageCollectorThread working at the same time
Andrey Yegorov [Mon, 21 Feb 2022 01:54:24 +0000 (17:54 -0800)]
ISSUE 3034: Fixed flaky test (#3065)
Descriptions of the changes in this PR:
### Motivation
MockExecutorControllerWithSchedulerTest is flaky
### Changes
testExecute flakes because runnable runs actually asynchronously in this case, modified the test.
Master Issue: #3034
Andrey Yegorov [Tue, 15 Feb 2022 20:05:44 +0000 (12:05 -0800)]
ISSUE #3034: Add extra checks in the mock to help with error troubleshooting on CI
### Motivation
Flaky test on CI
### Changes
Added extra check & logging to simplify troubleshooting of the flaky test on CI.
Cannot repro the failure locally after running 100+ times in a loop.
Master Issue: #3034
Reviewers: Yong Zhang <zhangyong1025.zy@gmail.com>, Enrico Olivelli <eolivelli@gmail.com>
This closes #3049 from dlg99/fix/issue3034, closes #3034
Andrey Yegorov [Tue, 15 Feb 2022 17:05:17 +0000 (09:05 -0800)]
ISSUE #3040: RocksDB segfaulted during CompactionTest
Descriptions of the changes in this PR:
### Motivation
RocksDB segfaulted during CompactionTest
### Changes
RocksDB can segfault if one tries to use it after close.
[Shutdown/compaction sequence](https://github.com/apache/bookkeeper/issues/3040#issuecomment-1036508397) can lead to such a situation. The fix prevents the segfault.
CompactionTests were updated at some point to use the metadata cache, so the non-cached case was not tested.
I added test suites for this case.
Master Issue: #3040
Reviewers: Yong Zhang <zhangyong1025.zy@gmail.com>, Nicolò Boschi <boschi1997@gmail.com>
This closes #3043 from dlg99/fix/issue3040, closes #3040
ZhangJian He [Tue, 15 Feb 2022 09:01:14 +0000 (17:01 +0800)]
Bump netty version to 4.1.74.Final (#3045)
### Motivation
Changelog: https://netty.io/news/2022/02/08/4-1-74-Final.html
Netty 4.1.74 fixes several DNS resolver bugs
### Modifications
* Upgrade Netty from 4.1.73.Final to 4.1.74.Final
* Netty 4.1.74.Final depends on netty-tcnative 2.0.48, so update it as well
StevenLuMT [Tue, 15 Feb 2022 08:55:57 +0000 (16:55 +0800)]
update metrics (#2999)
Descriptions of the changes in this PR:
### Motivation
Some metrics report incorrect values, so fix them.
This change is problem-driven; a comprehensive review will be done later.
### Changes
Update 2 metrics:
1. Bookie: ReadBytes uses entrySize
2. Journal: report the journal write error metric
Nicolò Boschi [Tue, 15 Feb 2022 01:43:28 +0000 (02:43 +0100)]
[CI] Dump stacktrace when a job is cancelled
### Motivation
Sometimes CI jobs fail due to timeout. It would be useful to understand what the latest test was doing before being interrupted.
### Changes
* Added a new script for dumping stacktrace.
* Added a step to all the jobs that runs when `cancelled()` is true.
Reviewers: Andrey Yegorov <None>
This closes #3042 from nicoloboschi/ci-thread-dump
wenbingshen [Mon, 14 Feb 2022 22:50:25 +0000 (06:50 +0800)]
Use OutOfMemoryPolicy when the direct memory is insufficient when reading the entry in ReadCache
### Motivation
Original PR: https://github.com/apache/bookkeeper/pull/1755,
That PR apparently forgot to update the memory allocation method.
When direct memory is insufficient, the allocation does not fall back to JVM heap memory, and the bookie hangs.


### Changes
Use `OutOfMemoryPolicy` when the direct memory is insufficient when reading the entry in `ReadCache`.
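The fallback behavior described above can be sketched with plain `java.nio` buffers. This is an illustration of the idea only, not BookKeeper's allocator: `FallbackAllocSketch` is a hypothetical class, and the direct allocator is passed in as a function so that the out-of-memory case can be simulated.

```java
import java.nio.ByteBuffer;
import java.util.function.IntFunction;

final class FallbackAllocSketch {
    /**
     * Tries the direct allocator first and falls back to a heap buffer when
     * direct memory is exhausted, instead of letting the error propagate.
     */
    static ByteBuffer allocate(int size, IntFunction<ByteBuffer> directAllocator) {
        try {
            return directAllocator.apply(size);
        } catch (OutOfMemoryError e) {
            // Direct memory is insufficient: serve the request from the heap.
            return ByteBuffer.allocate(size);
        }
    }

    public static void main(String[] args) {
        ByteBuffer normal = allocate(16, ByteBuffer::allocateDirect);
        ByteBuffer fallback = allocate(16, n -> {
            throw new OutOfMemoryError("simulated direct memory exhaustion");
        });
        System.out.println("direct: " + normal.isDirect() + ", fallback direct: " + fallback.isDirect());
    }
}
```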
Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Andrey Yegorov <None>
This closes #2836 from wenbingshen/useOutOfMemoryPolicyInReadCache
mauricebarnum [Mon, 14 Feb 2022 20:19:54 +0000 (12:19 -0800)]
fix gradle implicit dependency (#3029)
```
> Task :bookkeeper-tools-framework:compileTestJava
Execution optimizations have been disabled for task ':bookkeeper-tools-framework:compileTestJava' to ensure correctness due to the following reasons:
- Gradle detected a problem with the following location: '/Users/mbarnum/src/bookkeeper/tools/framework/build/classes/java/main'. Reason: Task ':bookkeeper-tools-framework:compileTestJava' uses this output of task ':tools:framework:compileJava' without declaring an explicit or implicit dependency. This can lead to incorrect results being produced, depending on what order the tasks are executed. Please refer to https://docs.gradle.org/7.3.3/userguide/validation_problems.html#implicit_dependency for more details about this problem.
```
Qiang Zhao [Mon, 14 Feb 2022 19:36:06 +0000 (03:36 +0800)]
Add flaky-test template to track many flaky-test.
### Motivation
I found many flaky tests, such as #3031 #3034 #3033.
Because many flaky tests are actually production code issues, I think adding a flaky-test template is a good way to track them.
### Changes
- Add flaky-test template.
Reviewers: Andrey Yegorov <None>
This closes #3035 from mattisonchao/template_flaky_test
Eric Shen [Mon, 14 Feb 2022 19:32:55 +0000 (03:32 +0800)]
fix(cli): incorrect description for autodiscovery
Signed-off-by: Eric Shen <ericshenyuhaooutlook.com>
Descriptions of the changes in this PR:
### Motivation
The description of `bin/bookkeeper autorecovery` is wrong: it does not start as a daemon.
### Changes
* Changed the description in bookkeeper shell
* Update the doc
Reviewers: Yong Zhang <zhangyong1025.zy@gmail.com>
This closes #2910 from ericsyh/fix-bk-cli
shustsud [Mon, 14 Feb 2022 19:20:49 +0000 (04:20 +0900)]
Explicit error message if an exception other than BKNoSuchLedgerExistsOnMetadataServerException occurs in over-replicated ledger GC
### Motivation
- Even if an exception other than BKNoSuchLedgerExistsOnMetadataServerException occurs in readLedgerMetadata during over-replicated ledger GC, nothing is written to the log.
(https://github.com/apache/bookkeeper/pull/2844#discussion_r735219876)
### Changes
- If an exception other than BKNoSuchLedgerExistsOnMetadataServerException occurs in readLedgerMetadata, write the information to the log.
Reviewers: Andrey Yegorov <None>, Nicolò Boschi <boschi1997@gmail.com>
This closes #2873 from shustsud/improved_error_handling
gaozhangmin [Mon, 14 Feb 2022 19:17:28 +0000 (03:17 +0800)]
Replication stat num-under-replicated-ledgers changed as with the process of replication
Motivation
Currently the ReplicationStats numUnderReplicatedLedger is registered in `publishSuspectedLedgersAsync`, but its value does not decrease as ledgers are replicated successfully, so we cannot know the replication progress from the stat.
Changes
Register a notifyUnderReplicationLedgerChanged when the auditor starts. The numUnderReplicatedLedger value decreases when an under-replicated ledger path is deleted.
Reviewers: Nicolò Boschi <boschi1997@gmail.com>, Enrico Olivelli <eolivelli@gmail.com>, Andrey Yegorov <None>
This closes #2805 from gaozhangmin/replication-stats-num-under-replicated-ledgers
Jack Vanlightly [Mon, 14 Feb 2022 19:11:38 +0000 (20:11 +0100)]
BP-46: Running without journal proposal
Includes the BP-46 design proposal markdown document.
Master Issue: #2705
Reviewers: Andrey Yegorov <None>, Enrico Olivelli <eolivelli@gmail.com>
This closes #2706 from Vanlightly/bp-44
gaozhangmin [Mon, 14 Feb 2022 19:01:22 +0000 (03:01 +0800)]
delete duplicated semicolon
As title, delete duplicated semicolon
Reviewers: Andrey Yegorov <None>
This closes #2810 from gaozhangmin/remove-duplicated-semicolon
Hang Chen [Mon, 14 Feb 2022 19:00:06 +0000 (03:00 +0800)]
make rocksdb format version configurable
### Motivation
Fix #2823
RocksDB supports several format versions, which use different data structures to implement key-value indexes and differ greatly in performance. https://rocksdb.org/blog/2019/03/08/format-version-4.html
https://github.com/facebook/rocksdb/blob/d52b520d5168de6be5f1494b2035b61ff0958c11/include/rocksdb/table.h#L368-L394
```C++
// We currently have five versions:
// 0 -- This version is currently written out by all RocksDB's versions by
// default. Can be read by really old RocksDB's. Doesn't support changing
// checksum (default is CRC32).
// 1 -- Can be read by RocksDB's versions since 3.0. Supports non-default
// checksum, like xxHash. It is written by RocksDB when
// BlockBasedTableOptions::checksum is something other than kCRC32c. (version
// 0 is silently upconverted)
// 2 -- Can be read by RocksDB's versions since 3.10. Changes the way we
// encode compressed blocks with LZ4, BZip2 and Zlib compression. If you
// don't plan to run RocksDB before version 3.10, you should probably use
// this.
// 3 -- Can be read by RocksDB's versions since 5.15. Changes the way we
// encode the keys in index blocks. If you don't plan to run RocksDB before
// version 5.15, you should probably use this.
// This option only affects newly written tables. When reading existing
// tables, the information about version is read from the footer.
// 4 -- Can be read by RocksDB's versions since 5.16. Changes the way we
// encode the values in index blocks. If you don't plan to run RocksDB before
// version 5.16 and you are using index_block_restart_interval > 1, you should
// probably use this as it would reduce the index size.
// This option only affects newly written tables. When reading existing
// tables, the information about version is read from the footer.
// 5 -- Can be read by RocksDB's versions since 6.6.0. Full and partitioned
// filters use a generally faster and more accurate Bloom filter
// implementation, with a different schema.
uint32_t format_version = 5;
```
Each format version requires a minimum RocksDB version, and the data cannot be rolled back once it has been upgraded to a newer format version.
Our current RocksDB storage code hard-codes format_version to 2, which makes it hard to upgrade format_version and benefit from the better performance of newer RocksDB versions.
### Changes
1. Make the format_version configurable.
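A minimal sketch of what a configurable format_version could look like, assuming a plain `java.util.Properties` source. The class name and the property key `rocksdb.format_version` are hypothetical and are not taken from the PR; the default of 2 is the previously hard-coded value mentioned above, and the 0 to 5 range comes from the quoted RocksDB comment.

```java
import java.util.Properties;

// Hypothetical helper, for illustration only; not the code from the PR.
final class FormatVersionConfig {
    // Assumed property key; the real configuration name may differ.
    static final String KEY = "rocksdb.format_version";
    // The value that used to be hard-coded, kept here as the fallback.
    static final int DEFAULT_FORMAT_VERSION = 2;

    // Returns the configured format_version, or the default when the key is unset.
    static int formatVersion(Properties conf) {
        String raw = conf.getProperty(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_FORMAT_VERSION;
        }
        int version = Integer.parseInt(raw.trim());
        // Versions 0..5 are the ones listed in the quoted RocksDB header comment.
        if (version < 0 || version > 5) {
            throw new IllegalArgumentException("Unsupported format_version: " + version);
        }
        return version;
    }
}
```

Keeping the old value as the fallback means existing deployments keep writing the same table format unless an operator opts in, which matters because a format upgrade cannot be rolled back.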
Reviewers: Matteo Merli <mmerli@apache.org>, Enrico Olivelli <eolivelli@gmail.com>
This closes #2824 from hangc0276/chenhang/make_rocksdb_format_version_configurable
Jack Vanlightly [Mon, 14 Feb 2022 18:56:19 +0000 (19:56 +0100)]
Ensure BookKeeper process receives sigterm in docker container
### Motivation
Current official docker images do not handle the SIGTERM sent by the docker runtime and so get killed after the timeout. No graceful shutdown occurs.
The reason is that the entrypoint does not use `exec` when executing the `bin/bookkeeper` shell script and so the BookKeeper process cannot receive signals from the docker runtime.
### Changes
Use `exec` when calling the `bin/bookkeeper` shell script.
Reviewers: Nicolò Boschi <boschi1997@gmail.com>, Enrico Olivelli <eolivelli@gmail.com>, Lari Hotari <None>, Matteo Merli <mmerli@apache.org>
This closes #2857 from Vanlightly/docker-image-handle-sigterm
Hang Chen [Mon, 14 Feb 2022 02:39:23 +0000 (10:39 +0800)]
change log level from error to warn when dns resolver initialize failed (#2856)
Descriptions of the changes in this PR:
### Motivation
When a bookie starts, it logs the following error message if the DNS resolver fails to initialize.
```
[main] ERROR org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to initialize DNS Resolver org.apache.bookkeeper.net.ScriptBasedMapping, used default subnet resolver : java.lang.RuntimeException: No network topology script is found when using script based DNS resolver.
```
It is confusing for users.
### Modification
1. change the log level from error to warn.
Andrey Yegorov [Fri, 11 Feb 2022 07:16:30 +0000 (23:16 -0800)]
[ISSUE 3038] Fixed flaky CompactionTest.testMinorCompactionWithMaxTimeMillis (#3039)
Hang Chen [Fri, 11 Feb 2022 03:12:27 +0000 (11:12 +0800)]
Support multi ledger directories for rocksdb backend entryMetadataMap (#2965)
### Motivation
When the RocksDB-backed entryMetadataMap is used with multiple ledger directories configured, bookie startup fails with the following exception.
```
12:24:28.530 [main] ERROR org.apache.pulsar.PulsarStandaloneStarter - Failed to start pulsar service.
java.io.IOException: Error open RocksDB database
at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:202) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:89) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.lambda$static$0(KeyValueStorageRocksDB.java:62) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
at org.apache.bookkeeper.bookie.storage.ldb.PersistentEntryLogMetadataMap.<init>(PersistentEntryLogMetadataMap.java:87) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
at org.apache.bookkeeper.bookie.GarbageCollectorThread.createEntryLogMetadataMap(GarbageCollectorThread.java:265) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
at org.apache.bookkeeper.bookie.GarbageCollectorThread.<init>(GarbageCollectorThread.java:154) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
at org.apache.bookkeeper.bookie.GarbageCollectorThread.<init>(GarbageCollectorThread.java:133) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.<init>(SingleDirectoryDbLedgerStorage.java:182) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.newSingleDirectoryDbLedgerStorage(DbLedgerStorage.java:190) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.initialize(DbLedgerStorage.java:150) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
at org.apache.bookkeeper.bookie.BookieResources.createLedgerStorage(BookieResources.java:110) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
at org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble.buildBookie(LocalBookkeeperEnsemble.java:328) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.8.1.jar:2.8.1]
at org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble.runBookies(LocalBookkeeperEnsemble.java:391) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.8.1.jar:2.8.1]
at org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble.startStandalone(LocalBookkeeperEnsemble.java:521) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.8.1.jar:2.8.1]
at org.apache.pulsar.PulsarStandalone.start(PulsarStandalone.java:264) ~[org.apache.pulsar-pulsar-broker-2.8.1.jar:2.8.1]
at org.apache.pulsar.PulsarStandaloneStarter.main(PulsarStandaloneStarter.java:121) [org.apache.pulsar-pulsar-broker-2.8.1.jar:2.8.1]
Caused by: org.rocksdb.RocksDBException: lock hold by current process, acquire time 1640492668 acquiring thread 123145515651072: data/standalone/bookkeeper00/entrylogIndexCache/metadata-cache/LOCK: No locks available
at org.rocksdb.RocksDB.open(Native Method) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
at org.rocksdb.RocksDB.open(RocksDB.java:239) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:199) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
... 15 more
```
The reason is that multiple GarbageCollectorThread instances try to open the same RocksDB database; one of them holds the LOCK, and the others fail with the exception above.
### Modification
1. Change the default GcEntryLogMetadataCachePath from `getLedgerDirNames()[0] + "/" + ENTRYLOG_INDEX_CACHE` to `null`. If it is `null`, each ledger's own directory is used.
2. Remove the internal directory `entrylogIndexCache`. The directory layout looks like:
```
└── current
    ├── lastMark
    ├── ledgers
    │   ├── 000003.log
    │   ├── CURRENT
    │   ├── IDENTITY
    │   ├── LOCK
    │   ├── LOG
    │   ├── MANIFEST-000001
    │   └── OPTIONS-000005
    ├── locations
    │   ├── 000003.log
    │   ├── CURRENT
    │   ├── IDENTITY
    │   ├── LOCK
    │   ├── LOG
    │   ├── MANIFEST-000001
    │   └── OPTIONS-000005
    └── metadata-cache
        ├── 000003.log
        ├── CURRENT
        ├── IDENTITY
        ├── LOCK
        ├── LOG
        ├── MANIFEST-000001
        └── OPTIONS-000005
```
3. If `GcEntryLogMetadataCachePath` is set in `bk_server.conf`, only one ledger directory may be configured in `ledgerDirectories`. Otherwise, the best practice is to keep the default.
4. This PR is best released together with #1949.
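The path selection described in this change can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: the class and method names are hypothetical, only the `metadata-cache` directory name is taken from the layout shown here, and rejecting an explicit path combined with several ledger directories is this sketch's own choice.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class MetadataCachePaths {
    // Directory name taken from the layout shown in this change.
    static final String METADATA_CACHE = "metadata-cache";

    // Returns one cache directory per ledger directory, so that no two
    // garbage collector threads open the same RocksDB instance.
    static List<File> resolve(String configuredPath, String[] ledgerDirs) {
        List<File> result = new ArrayList<>();
        if (configuredPath == null) {
            // Default: each ledger directory gets its own cache directory.
            for (String dir : ledgerDirs) {
                result.add(new File(dir, METADATA_CACHE));
            }
            return result;
        }
        // Assumption: an explicit path is only safe with a single ledger directory,
        // because every GC thread would otherwise contend for the same LOCK file.
        if (ledgerDirs.length > 1) {
            throw new IllegalArgumentException(
                "An explicit cache path supports only one ledger directory, got " + ledgerDirs.length);
        }
        result.add(new File(configuredPath, METADATA_CACHE));
        return result;
    }
}
```

With the default (`null`) every returned path is distinct, which is the property that avoids the `No locks available` failure described in the motivation.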
Hang Chen [Fri, 11 Feb 2022 03:11:03 +0000 (11:11 +0800)]
Add rack name invalid check (#2980)
### Motivation
When a region-aware or rack-aware placement policy is used but the region or rack name is set to `/` or an empty string, the following exception is thrown while handling bookies that joined.
```
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1841) ~[?:?]
at org.apache.bookkeeper.net.NetworkTopologyImpl$InnerNode.getNextAncestorName(NetworkTopologyImpl.java:144) ~[io.streamnative-bookkeeper-server-4.14.3.1.jar:4.14.3.1]
at org.apache.bookkeeper.net.NetworkTopologyImpl$InnerNode.add(NetworkTopologyImpl.java:180) ~[io.streamnative-bookkeeper-server-4.14.3.1.jar:4.14.3.1]
at org.apache.bookkeeper.net.NetworkTopologyImpl.add(NetworkTopologyImpl.java:425) ~[io.streamnative-bookkeeper-server-4.14.3.1.jar:4.14.3.1]
at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.handleBookiesThatJoined(TopologyAwareEnsemblePlacementPolicy.java:717) ~[io.streamnative-bookkeeper-server-4.14.3.1.jar:4.14.3.1]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.handleBookiesThatJoined(RackawareEnsemblePlacementPolicyImpl.java:80) ~[io.streamnative-bookkeeper-server-4.14.3.1.jar:4.14.3.1]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.handleBookiesThatJoined(RackawareEnsemblePlacementPolicy.java:249) ~[io.streamnative-bookkeeper-server-4.14.3.1.jar:4.14.3.1]
at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.onClusterChanged(TopologyAwareEnsemblePlacementPolicy.java:663) ~[io.streamnative-bookkeeper-server-4.14.3.1.jar:4.14.3.1]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.onClusterChanged(RackawareEnsemblePlacementPolicyImpl.java:80) ~[io.streamnative-bookkeeper-server-4.14.3.1.jar:4.14.3.1]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.onClusterChanged(RackawareEnsemblePlacementPolicy.java:92) ~[io.streamnative-bookkeeper-server-4.14.3.1.jar:4.14.3.1]
at org.apache.bookkeeper.client.BookieWatcherImpl.processWritableBookiesChanged(BookieWatcherImpl.java:197) ~[io.streamnative-bookkeeper-server-4.14.3.1.jar:4.14.3.1]
at org.apache.bookkeeper.client.BookieWatcherImpl.lambda$initialBlockingBookieRead$1(BookieWatcherImpl.java:233) ~[io.streamnative-bookkeeper-server-4.14.3.1.jar:4.14.3.1]
at org.apache.bookkeeper.discover.ZKRegistrationClient$WatchTask.accept(ZKRegistrationClient.java:147) [io.streamnative-bookkeeper-server-4.14.3.1.jar:4.14.3.1]
at org.apache.bookkeeper.discover.ZKRegistrationClient$WatchTask.accept(ZKRegistrationClient.java:70) [io.streamnative-bookkeeper-server-4.14.3.1.jar:4.14.3.1]
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) [?:?]
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) [?:?]
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.72.Final.jar:4.1.72.Final]
at java.lang.Thread.run(Thread.java:829) [?:?]
```
The root cause is that the node's networkLocation is an empty string, and calling `substring(1)` on it leads to `StringIndexOutOfBoundsException`.
### Modification
1. Add an empty check on `n.getNetworkLocation()` in the `isAncestor` method to make the exception clearer.
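A rough sketch of this kind of check, outside of BookKeeper's own classes. The class and method names are hypothetical, and the normalization step (dropping a trailing `/`, so that `/` becomes an empty string) is an assumption about how a lone `/` ends up empty; the actual fix lives in `isAncestor`.

```java
// Hypothetical validator, for illustration only; not the code from the PR.
final class RackNameCheck {
    // Assumption: a trailing separator is dropped, so "/" normalizes to "".
    static String normalize(String location) {
        if (location == null) {
            return "";
        }
        return location.endsWith("/")
            ? location.substring(0, location.length() - 1)
            : location;
    }

    // Fails fast with a descriptive message instead of letting a later
    // "".substring(1) call throw StringIndexOutOfBoundsException.
    static String requireValidLocation(String location) {
        String normalized = normalize(location);
        if (normalized.isEmpty()) {
            throw new IllegalArgumentException(
                "Invalid network location '" + location
                    + "': region or rack name must not be empty or \"/\"");
        }
        return normalized;
    }
}
```

The point of the guard is only to surface the misconfiguration by name; the same inputs would fail either way.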
Hang Chen [Fri, 11 Feb 2022 03:10:28 +0000 (11:10 +0800)]
Skip update entryLogMetaMap if not modified (#2964)
### Motivation
Now that the RocksDB-backed entryMetaMap is supported, we should avoid updating the entryMetaMap unnecessarily.
The `doGcEntryLogs` method iterates through the entryLogMetaMap and updates the meta if a ledger no longer exists. We should check whether the meta was modified in `removeIfLedgerNotExists`; if it was not, we can skip updating the entryLogMetaMap.
### Modification
1. Add a flag indicating whether the meta was modified in the `removeIfLedgerNotExists` method. If it was not, skip updating the entryLogMetaMap.
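The idea can be sketched as follows, with an in-memory map standing in for the RocksDB-backed store. All names other than `removeIfLedgerNotExists` are hypothetical, and the real method signatures in BookKeeper may differ; the sketch only shows the "report whether anything changed, write back only if so" pattern.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for an entry log's metadata; illustration only.
final class EntryLogMetaSketch {
    // ledgerId -> bytes that ledger occupies in this entry log
    private final Map<Long, Long> ledgerSizes = new HashMap<>();

    void add(long ledgerId, long size) {
        ledgerSizes.merge(ledgerId, size, Long::sum);
    }

    // Drops ledgers that no longer exist and reports whether anything was removed.
    boolean removeIfLedgerNotExists(Set<Long> liveLedgers) {
        return ledgerSizes.keySet().removeIf(id -> !liveLedgers.contains(id));
    }

    // One GC pass over all entry logs; returns how many metas were written back.
    static int gcPass(Map<Long, EntryLogMetaSketch> metaMap, Set<Long> liveLedgers) {
        int writeBacks = 0;
        for (Map.Entry<Long, EntryLogMetaSketch> entry : metaMap.entrySet()) {
            boolean modified = entry.getValue().removeIfLedgerNotExists(liveLedgers);
            if (modified) {
                // Stand-in for the (comparatively expensive) write to the persistent map.
                entry.setValue(entry.getValue());
                writeBacks++;
            }
        }
        return writeBacks;
    }
}
```

When most entry logs contain only live ledgers, a pass performs close to zero write-backs, which is where the saving comes from with a disk-backed map.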
Andrey Yegorov [Wed, 9 Feb 2022 21:13:44 +0000 (13:13 -0800)]
Upgrade RocksDB
Descriptions of the changes in this PR:
Dependency change
### Motivation
I encountered https://github.com/apache/bookkeeper/issues/3024 and noticed that newer versions of RocksDB include multiple fixes for concurrency issues with various side effects, as well as fixes for a few crashes.
I upgraded, ran the `org.apache.bookkeeper.bookie.BookieJournalTest` test in a loop, and have not reproduced the crash so far.
It is hard to say with certainty that it is fixed, given that the crash did not happen every time.
### Changes
Upgraded RocksDB
Master Issue: #3024
Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Nicolò Boschi <boschi1997@gmail.com>
This closes #3026 from dlg99/rocksdb-upgrade
Qiang Zhao [Wed, 9 Feb 2022 19:24:21 +0000 (03:24 +0800)]
Fix performance issue to avoid unnecessary loop. (#3030)
Yang Yang [Wed, 9 Feb 2022 01:54:54 +0000 (09:54 +0800)]
Support specifying bookie http port as a command argument (#2769)
### Motivation
I was trying to start multiple bookies locally and found it's a bit inconvenient to specify different http ports for different bookies.
### Changes
Add a command-line argument `httpport` to the bookie command to support specifying bookie http port from the command line.
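A minimal sketch of how such an argument might override a configured port. Only the argument name `httpport` comes from the description above; the class name, the accepted dash prefixes, and the rule that the command line takes precedence over the configuration file are assumptions, and the real bookie command may parse its options differently.

```java
// Hypothetical argument handling, for illustration only.
final class HttpPortArg {
    // Returns the port given on the command line, or the configured port if absent.
    static int resolve(String[] args, int configuredPort) {
        for (int i = 0; i < args.length - 1; i++) {
            // Assumption: the exact prefix is not stated, so accept one or two dashes.
            if (args[i].equals("-httpport") || args[i].equals("--httpport")) {
                int port = Integer.parseInt(args[i + 1]);
                if (port < 1 || port > 65535) {
                    throw new IllegalArgumentException("Invalid http port: " + port);
                }
                return port;
            }
        }
        return configuredPort;
    }
}
```

With this approach several local bookies can share one configuration file and differ only in the port passed at startup.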