huangxingbo [Thu, 19 May 2022 10:16:12 +0000 (18:16 +0800)]
[hotfix][python] move the method _from_java_type_wrapper from linalg.py to wrapper.py
huangxingbo [Thu, 19 May 2022 09:46:05 +0000 (17:46 +0800)]
Revert "[hotfix][python] Fix the _from_java_type_wrapper is not imported"
This reverts commit dd161698
huangxingbo [Thu, 19 May 2022 09:32:16 +0000 (17:32 +0800)]
[hotfix][python] Fix the _from_java_type_wrapper is not imported
yunfengzhou-hub [Wed, 18 May 2022 10:45:24 +0000 (18:45 +0800)]
[FLINK-27096] Add LabeledPointWithWeightGenerator and Benchmark Configuration for KMeans and NaiveBayes
This closes #100.
weibo [Tue, 17 May 2022 03:07:27 +0000 (11:07 +0800)]
[FLINK-27294] Add Transformer for BinaryClassificationEvaluator
This closes #86.
huangxingbo [Fri, 13 May 2022 08:57:12 +0000 (16:57 +0800)]
[FLINK-27404][python] Add algorithms completeness test in ML Python API
This closes #99.
huangxingbo [Wed, 11 May 2022 03:24:07 +0000 (11:24 +0800)]
[FLINK-27403][ml][python] Align existing feature engineering algorithms in ML Python API
This closes #98.
Mr-Mu [Tue, 10 May 2022 08:14:07 +0000 (03:14 -0500)]
[hotfix] Fix finding buckets error in Bucketizer (#96)
This closes #96.
Zhipeng Zhang [Sat, 7 May 2022 07:53:35 +0000 (15:53 +0800)]
[FLINK-27091] Add Transformer and Estimator of LinearSVC
This closes #93.
Dong Lin [Fri, 6 May 2022 14:17:49 +0000 (22:17 +0800)]
[hotfix] Bump dependency versions
This closes #94.
yunfengzhou-hub [Wed, 20 Apr 2022 06:26:25 +0000 (14:26 +0800)]
[FLINK-27096] Add a script to visualize benchmark results
This PR also enriches the benchmark result file's content with the input parameters.
This closes #87.
Zhipeng Zhang [Fri, 6 May 2022 08:02:55 +0000 (16:02 +0800)]
[FLINK-27093] Add Transformer and Estimator for LinearRegression
This closes #90.
huangxingbo [Mon, 25 Apr 2022 07:02:52 +0000 (15:02 +0800)]
[FLINK-26269][python] Add clustering algorithm support for KMeans in ML Python API
This closes #91.
Dong Lin [Wed, 27 Apr 2022 03:40:27 +0000 (11:40 +0800)]
[hotfix] Update .asf.yaml file to disable the merge button and update collaborators etc.
This closes #92.
Zhipeng Zhang [Sun, 24 Apr 2022 04:32:28 +0000 (12:32 +0800)]
Merge pull request #89 from lindong28/improve-github-action
[hotfix] Suppress the output of downloading messages in the github action log
This closes #89.
Dong Lin [Sun, 24 Apr 2022 03:59:03 +0000 (11:59 +0800)]
[hotfix] Suppress the output of downloading messages in the github action log
huangxingbo [Thu, 21 Apr 2022 03:01:10 +0000 (11:01 +0800)]
[FLINK-26268][ml][python] Add classification algorithm support for LogisticRegression, KNN and NaiveBayes in ML Python API
This closes #88.
dependabot[bot] [Thu, 14 Apr 2022 11:56:44 +0000 (19:56 +0800)]
Bump junit from 4.12 to 4.13.1 (#79)
Bumps [junit](https://github.com/junit-team/junit4) from 4.12 to 4.13.1.
Signed-off-by: dependabot[bot] <support@github.com>
zhangzp [Fri, 8 Apr 2022 11:15:48 +0000 (19:15 +0800)]
[FLINK-27072] Add Transformer for Bucketizer
zhangzp [Fri, 8 Apr 2022 11:15:34 +0000 (19:15 +0800)]
[hotfix] Reorganize HasHandleInvalid param in StringIndexer, OneHotEncoder and VectorAssembler
yunfengzhou-hub [Wed, 6 Apr 2022 14:38:00 +0000 (22:38 +0800)]
[FLINK-26443] Add benchmark framework
This closes #71.
zhangzp [Sat, 2 Apr 2022 07:20:48 +0000 (15:20 +0800)]
[hotfix] Use inputCol and outputCol in MinMaxScaler and StandardScaler
zhangzp [Sat, 2 Apr 2022 06:17:16 +0000 (14:17 +0800)]
[hotfix] Rename testFeaturePredictionParam to testOutputSchema in unit test
zhangzp [Wed, 23 Mar 2022 02:18:47 +0000 (10:18 +0800)]
[FLINK-26626] Add Transformer and Estimator of StandardScaler
weibo [Sat, 2 Apr 2022 07:32:36 +0000 (15:32 +0800)]
[FLINK-25616] Add Transformer for VectorAssembler
This closes #56.
yunfengzhou-hub [Wed, 30 Mar 2022 07:43:07 +0000 (15:43 +0800)]
[FLINK-26904] Update Stage::load to use StreamTableEnvironment
This closes #76.
yunfengzhou-hub [Wed, 30 Mar 2022 09:28:38 +0000 (17:28 +0800)]
[FLINK-26100] Add link to document website in README
This PR adds a link to Flink ML's document website in README.md. The added content is referenced from Flink repo's README.
This closes #75.
yunfengzhou-hub [Tue, 29 Mar 2022 02:57:19 +0000 (10:57 +0800)]
[FLINK-26313] Add Transformer and Estimator of OnlineKMeans
This closes #70.
Zhipeng Zhang [Wed, 23 Mar 2022 02:28:33 +0000 (10:28 +0800)]
[FLINK-25527] Add Transformer and Estimator for StringIndexer
This closes #52.
weibo [Mon, 21 Mar 2022 11:33:03 +0000 (19:33 +0800)]
[FLINK-25552] Add Estimator and Transformer for MinMaxScaler
This closes #54.
Jingsong Lee [Fri, 18 Mar 2022 14:17:16 +0000 (22:17 +0800)]
[hotfix] Fix a minor documentation error in docs.README.md
This closes #72.
Mr-Mu [Thu, 10 Mar 2022 05:05:20 +0000 (23:05 -0600)]
[FLINK-26404] Support non-local file systems
This closes #68.
zhangzp [Fri, 25 Feb 2022 03:18:46 +0000 (11:18 +0800)]
[FLINK-26263] (followup) Check data size in LogisticRegression
This closes #66.
huangxingbo [Wed, 23 Feb 2022 03:24:49 +0000 (11:24 +0800)]
[FLINK-26267][ml][python] Add common params interface in ML Python API
This closes #65.
huangxingbo [Tue, 22 Feb 2022 10:00:44 +0000 (18:00 +0800)]
[FLINK-26266][ml][python] Support Vector and Matrix in ML Python API
This closes #64.
yunfengzhou-hub [Mon, 21 Feb 2022 01:44:42 +0000 (09:44 +0800)]
[FLINK-26263] Check data size in LogisticRegression
This closes #63.
yunfengzhou-hub [Fri, 18 Feb 2022 09:55:04 +0000 (17:55 +0800)]
[FLINK-26100][docs] Add doc for ops & key concepts (#62)
This closes #62.
yunfengzhou-hub [Thu, 17 Feb 2022 06:08:01 +0000 (14:08 +0800)]
[FLINK-26100][docs] Set up Flink ML Document Website
This closes #58.
Dong Lin [Thu, 13 Jan 2022 07:17:40 +0000 (15:17 +0800)]
[hotfix] Remove redundant directories
This closes #55.
zhangzp [Wed, 5 Jan 2022 10:01:16 +0000 (18:01 +0800)]
[hotfix] Add scala suffix for flink-ml-uber dependency and flink-ml-lib as a dependency of flink-ml-uber
Dong Lin [Fri, 31 Dec 2021 01:53:03 +0000 (09:53 +0800)]
[hotfix][python] Update setup.py to throw error if python version > 3.8
This closes #51.
zhangzp [Thu, 30 Dec 2021 09:27:44 +0000 (17:27 +0800)]
[hotfix] Add scala version as suffix of all modules
zhangzp [Thu, 30 Dec 2021 05:59:40 +0000 (13:59 +0800)]
[hotfix] Update README.md, NOTICE and license of blas in flink-ml-uber
Yun Gao [Tue, 28 Dec 2021 02:16:41 +0000 (10:16 +0800)]
[release] Update version to 2.1-SNAPSHOT
Yun Gao [Mon, 27 Dec 2021 16:59:52 +0000 (00:59 +0800)]
[hotfix] Remove scala target and fix script errors
1. The release profile must be called apache-release to also deploy
source and javadoc jars.
2. Remove the command related to the Python whl package.
3. Always rely on Scala 2.12.
Dong Lin [Fri, 24 Dec 2021 14:04:10 +0000 (22:04 +0800)]
[hotfix] Add the NOTICE file
This closes #45.
Yun Gao [Mon, 27 Dec 2021 07:04:41 +0000 (15:04 +0800)]
Revert "[release] Update version to 2.1-SNAPSHOT"
This reverts commit ac7da66c0e30ff925d1465d9d1251a38c05ddc08.
Dong Lin [Fri, 24 Dec 2021 04:32:32 +0000 (12:32 +0800)]
[hotfix] Add scripts to build and release python artifacts
This closes #44.
Yun Gao [Thu, 23 Dec 2021 15:08:33 +0000 (23:08 +0800)]
[release] Update version to 2.1-SNAPSHOT
Dong Lin [Thu, 23 Dec 2021 08:48:35 +0000 (16:48 +0800)]
Prepares Flink ML for 2.0.0 release
This commit makes the following changes:
- Adds scripts under tools/releasing
- Updates Flink ML version to 2.0-SNAPSHOT
- Includes flink-ml-uber in the pom.xml
- Updates flink-ml-uber/pom.xml to include the core module and the iteration module
- Updates the release profile in pom.xml to follow flink-statefun's practice
- Removes flink-ml-examples
abdelrahman-ik [Tue, 21 Dec 2021 01:16:19 +0000 (20:16 -0500)]
[FLINK-25394] Upgrade log4j to 2.17.0
This closes #42.
Dong Lin [Tue, 21 Dec 2021 06:47:26 +0000 (14:47 +0800)]
[hotfix][iteration] Updates onEpochWatermarkIncremented() and onIterationTerminated() to throw Exception
This closes #41.
Dong Lin [Mon, 25 Oct 2021 09:08:56 +0000 (17:08 +0800)]
[FLINK-23959][FLIP-175] Compose Estimator/Model/AlgoOperator from DAG of Estimator/Model/AlgoOperator
This closes #20.
weibo [Tue, 14 Dec 2021 02:40:18 +0000 (10:40 +0800)]
[FLINK-24557] Add Estimator and Transformer for K-nearest neighbor
This closes #24.
zhangzp [Fri, 17 Dec 2021 09:36:41 +0000 (17:36 +0800)]
[FLINK-24556] Make model data pojo for naive bayes, kmeans and logistic regression
This closes #28.
zhangzp [Mon, 13 Dec 2021 08:56:01 +0000 (16:56 +0800)]
[hotfix] Reformat naive bayes, kmeans and logistic regression
zhangzp [Mon, 13 Dec 2021 08:33:42 +0000 (16:33 +0800)]
[FLINK-24556] Add Estimator and Transformer for logistic regression
zhangzp [Wed, 17 Nov 2021 06:31:27 +0000 (14:31 +0800)]
[FLINK-24845] Add allreduce utility function in FlinkML
This closes #30.
Yunfeng Zhou [Thu, 2 Dec 2021 07:32:17 +0000 (15:32 +0800)]
[FLINK-24955] Add Estimator and Transformer for One Hot Encoder
This closes #37.
zhangzp [Fri, 26 Nov 2021 10:11:39 +0000 (18:11 +0800)]
[hotfix] Move infra from flink-ml-lib to flink-ml-core
This closes #39.
Yunfeng Zhou [Wed, 17 Nov 2021 02:59:03 +0000 (10:59 +0800)]
[FLINK-24817] Add Estimator and Transformer for Naive Bayes
This closes #32.
huangxingbo [Wed, 1 Dec 2021 09:31:59 +0000 (17:31 +0800)]
[FLINK-25120][python] Add many kinds of checks in ML Python API
This closes #40.
huangxingbo [Wed, 1 Dec 2021 09:31:14 +0000 (17:31 +0800)]
[FLINK-25120][python] Fix some type annotations
This closes #40.
huangxingbo [Wed, 17 Nov 2021 07:58:13 +0000 (15:58 +0800)]
[FLINK-24933][python] Support ML Python API to implement FLIP-173 and FLIP-174
This closes #36.
Dong Lin [Thu, 25 Nov 2021 12:06:25 +0000 (20:06 +0800)]
[hotfix] Rename flink-ml-api to flink-ml-core
This closes #38.
zhangzp [Thu, 18 Nov 2021 09:29:56 +0000 (17:29 +0800)]
[hotfix] Cache records when processing elements rather than only in snapshot func
zhangzp [Wed, 17 Nov 2021 11:46:12 +0000 (19:46 +0800)]
[hotfix] Fix NullPointerException when broadcast variables are cleaned
Dong Lin [Tue, 26 Oct 2021 05:30:58 +0000 (13:30 +0800)]
[FLINK-24810] Add Estimator and Model for the k-means clustering algorithm
This closes #27.
Dong Lin [Sun, 14 Nov 2021 12:54:54 +0000 (20:54 +0800)]
[FLINK-22915][FLIP-173] Updates the static load(...) method of Stage subclasses to take StreamExecutionEnvironment as parameter
Dong Lin [Mon, 15 Nov 2021 08:12:57 +0000 (16:12 +0800)]
[FLINK-22915][FLIP-173] Updates Model::setModelData(...) to return the Model instance itself
This closes #33.
Dong Lin [Mon, 15 Nov 2021 08:25:29 +0000 (16:25 +0800)]
[FLINK-24354][FLIP-174] Updates WithParams::set(...) to throw Exception if the given param is not defined on this instance
This closes #35.
Yun Gao [Mon, 15 Nov 2021 03:36:30 +0000 (11:36 +0800)]
[hotfix] Mark BroadcastUtils as internal
Yun Gao [Mon, 15 Nov 2021 03:32:43 +0000 (11:32 +0800)]
[hotfix][iteration] Return more fine-grained operator class for the WrapperFactory
Yun Gao [Thu, 11 Nov 2021 08:04:23 +0000 (16:04 +0800)]
[FLINK-24842][iteration] Make outputs depend on tails for the iteration body
This closes #31.
Yun Gao [Sat, 6 Nov 2021 06:27:58 +0000 (14:27 +0800)]
[FLINK-24808][iteration] Support IterationListener for per-round operators
This closes #26.
Yun Gao [Thu, 11 Nov 2021 12:06:03 +0000 (20:06 +0800)]
[hotfix][iteration] Fix the bad compile due to changes of state
Yun Gao [Wed, 10 Nov 2021 14:27:49 +0000 (22:27 +0800)]
[FLINK-24807][iteration] Do not start logging at the head operator if the barrier is fed back first
This closes #25.
Yun Gao [Fri, 5 Nov 2021 17:41:02 +0000 (01:41 +0800)]
[FLINK-24807][iteration] Support raw operator state
Yun Gao [Fri, 5 Nov 2021 06:55:22 +0000 (14:55 +0800)]
[FLINK-24807][iteration] Do not emit CoordinatorCheckpointEvent after terminating
Currently the HeadCoordinator emits CoordinatorCheckpointEvent to
the tasks so that the GloballyAlignedEvent is not interleaved with
the checkpoint barrier. However, if the tasks have finished and we
continue emitting the event, the checkpoint fails because there
are failed operator events. To address this issue, we stop
emitting CoordinatorCheckpointEvent once the head operator is terminating,
namely once it has received the GloballyAlignedEvent that marks termination.
Yun Gao [Thu, 4 Nov 2021 16:39:54 +0000 (00:39 +0800)]
[FLINK-24807][iteration] Add per-round checkpoint IT case
Yun Gao [Thu, 4 Nov 2021 16:39:40 +0000 (00:39 +0800)]
[hotfix][iteration] Rename the all-round checkpoint test to be an IT case
Yun Gao [Wed, 3 Nov 2021 16:47:53 +0000 (00:47 +0800)]
[FLINK-24807][iteration] Stores the state for per-round wrapper
Yun Gao [Wed, 3 Nov 2021 07:16:23 +0000 (15:16 +0800)]
[FLINK-24807][iteration] Support snapshotting the ReplayOperator
Yun Gao [Tue, 2 Nov 2021 15:57:20 +0000 (23:57 +0800)]
[FLINK-24729][iteration] Support iteration with mixed operator life-cycle
This closes #23.
Yun Gao [Mon, 8 Nov 2021 03:10:04 +0000 (11:10 +0800)]
[hotfix][iteration] Fixes the wrong type for criteria merger
Yun Gao [Tue, 2 Nov 2021 09:06:12 +0000 (17:06 +0800)]
[FLINK-24722][iteration] Fix the issues in supporting keyed stream inside the iteration body
This closes #22.
Dong Lin [Sun, 26 Sep 2021 13:40:59 +0000 (21:40 +0800)]
[FLINK-24354][FLIP-174] Improve the WithParams interface
Dong Lin [Sun, 26 Sep 2021 13:37:32 +0000 (21:37 +0800)]
[FLINK-24354][FLIP-174] Remove old param-related classes
Dong Lin [Tue, 9 Nov 2021 09:21:20 +0000 (17:21 +0800)]
[hotfix] Remove those library infra classes that need to be revisited
This closes #19.
zhangzp [Fri, 5 Nov 2021 07:33:04 +0000 (15:33 +0800)]
[FLINK-24279] Support withBroadcast in DataStream
This closes #18.
Yun Gao [Mon, 1 Nov 2021 15:56:09 +0000 (23:56 +0800)]
[FLINK-24655][iteration] Add ITCase for the checkpoint and failover
This closes #17.
Yun Gao [Thu, 7 Oct 2021 02:47:30 +0000 (10:47 +0800)]
[FLINK-24655][iteration] Do not rely on the precedent tasks to insert epoch watermark
If the precedent tasks have finished and a failover occurs, they would be
skipped and would not insert the epoch watermark again.
Yun Gao [Wed, 6 Oct 2021 17:28:27 +0000 (01:28 +0800)]
[FLINK-24655][iteration] Skip the repeat round for all-round operator
Yun Gao [Wed, 6 Oct 2021 16:39:49 +0000 (00:39 +0800)]
[FLINK-24655][iteration] Support the checkpoints for the iteration
Yun Gao [Mon, 4 Oct 2021 13:21:51 +0000 (21:21 +0800)]
[FLINK-24655][iteration] Support snapshotting the feedback records on checkpoint
Yun Gao [Mon, 4 Oct 2021 13:20:49 +0000 (21:20 +0800)]
[FLINK-24655][iteration] Make head operator aligned with coordinator for each checkpoint
Yun Gao [Wed, 6 Oct 2021 10:36:29 +0000 (18:36 +0800)]
[hotfix][iteration] Simplify the head operator test
Yun Gao [Wed, 6 Oct 2021 08:19:21 +0000 (16:19 +0800)]
[FLINK-24655][iteration] HeadOperator waits for MAX_WATERMARK to iterate back before terminating.
This is a basis for checkpointing: for checkpoints
with feedback edges, we also need to include the
feedback records in the snapshot. Thus, if we want to make
sure that all the checkpoints before the terminating globally
aligned event are completed, we have to wait for one more round.
Yun Gao [Sat, 2 Oct 2021 18:26:38 +0000 (02:26 +0800)]
[FLINK-24653][iteration] Support per-round operators inside the iteration
Yun Gao [Mon, 1 Nov 2021 04:58:36 +0000 (12:58 +0800)]
[hotfix][iteration] Merge the IterationFactory and Iterations