incubator-hivemall.git
2 years ago Merge remote-tracking branch 'origin/v0.6.0'
Makoto Yui [Thu, 19 Dec 2019 08:06:44 +0000 (17:06 +0900)] 
Merge remote-tracking branch 'origin/v0.6.0'

2 years ago Updated copyrights holders
Makoto Yui [Thu, 19 Dec 2019 05:18:25 +0000 (14:18 +0900)] 
Updated copyrights holders

2 years ago [HIVEMALL-288] mf_predict throws SemanticException No matching method with (array...
Makoto Yui [Thu, 12 Dec 2019 08:32:27 +0000 (17:32 +0900)] 
[HIVEMALL-288] mf_predict throws SemanticException No matching method with (array<double>, array<double>, int)

## What changes were proposed in this pull request?

Fix `mf_predict` throwing `SemanticException` (No matching method) when called with `(array<double>, array<double>, int)`.

## What type of PR is it?

Bug Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-288

## How was this patch tested?

manual tests on EMR

```sql
select
  -- 3 arguments
  mf_predict(array(cast(1.0 as float),cast(2.0 as float),cast(3.0 as float)), array(cast(1.0 as float),cast(2.0 as float),cast(3.0 as float)), 1),
  mf_predict(array(1.0,2.0,3.0), array(1.0,2.0,3.0), 1),
  mf_predict(array(cast(1.0 as DOUBLE),cast(2.0 as DOUBLE),cast(3.0 as DOUBLE)), array(cast(1.0 as DOUBLE),cast(2.0 as DOUBLE),cast(3.0 as DOUBLE)), 1),
  -- 2 arguments
  mf_predict(array(1.0,2.0,3.0), array(1.0,2.0,3.0)),
  -- 4 arguments
  mf_predict(array(1.0,2.0,3.0), array(1.0,2.0,3.0), 0, 0),
  -- 5 arguments
  mf_predict(array(1.0,2.0,3.0), array(1.0,2.0,3.0), 0, 0, 1);
```

## Checklist

(Please remove this section if not needed; check `x` for YES, blank for NO)

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #224 from myui/HIVEMALL-288.

2 years ago Update date
Makoto Yui [Tue, 3 Dec 2019 06:21:35 +0000 (15:21 +0900)] 
Update date

2 years ago [DOC] update titanic random forest doc for decision_path
Makoto Yui [Mon, 2 Dec 2019 10:25:54 +0000 (19:25 +0900)] 
[DOC] update titanic random forest doc for decision_path

2 years ago Fixed release guide
Makoto Yui [Thu, 28 Nov 2019 18:26:51 +0000 (03:26 +0900)] 
Fixed release guide

2 years ago [maven-release-plugin] prepare for next development iteration (ref: v0.6.0)
Makoto Yui [Thu, 28 Nov 2019 16:43:53 +0000 (01:43 +0900)] 
[maven-release-plugin] prepare for next development iteration

2 years ago [maven-release-plugin] prepare release v0.6.0-rc1 (ref: v0.6.0-rc1)
Makoto Yui [Thu, 28 Nov 2019 16:43:43 +0000 (01:43 +0900)] 
[maven-release-plugin] prepare release v0.6.0-rc1

2 years ago Bumped version string to 0.6.0-incubating
Makoto Yui [Thu, 28 Nov 2019 16:41:45 +0000 (01:41 +0900)] 
Bumped version string to 0.6.0-incubating

2 years ago Minor refactoring and fixed function docs
Makoto Yui [Thu, 28 Nov 2019 07:46:02 +0000 (16:46 +0900)] 
Minor refactoring and fixed function docs

2 years ago [HIVEMALL-159][DOC] Add documentation about One-hot encoding
Makoto Yui [Thu, 28 Nov 2019 07:11:17 +0000 (16:11 +0900)] 
[HIVEMALL-159][DOC] Add documentation about One-hot encoding

## What changes were proposed in this pull request?

Add documentation about One-hot encoding

## What type of PR is it?

Documentation

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-159

## How to use this feature?

See userguide

Author: Makoto Yui <myui@apache.org>

Closes #223 from myui/onehot_docs.

2 years ago [HIVEMALL-56][DOC] Add documentation about Similarity/Distance functions
Makoto Yui [Wed, 27 Nov 2019 09:03:41 +0000 (18:03 +0900)] 
[HIVEMALL-56][DOC] Add documentation about Similarity/Distance functions

## What changes were proposed in this pull request?

Add documentation about Similarity/Distance functions

## What type of PR is it?

Documentation

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-56

## Checklist

Author: Makoto Yui <myui@apache.org>

Closes #222 from myui/HIVEMALL-56.

2 years ago [HIVEMALL-158][DOC] Refine deprecated userguide contents
Makoto Yui [Wed, 27 Nov 2019 07:42:34 +0000 (16:42 +0900)] 
[HIVEMALL-158][DOC] Refine deprecated userguide contents

## What changes were proposed in this pull request?

Refine deprecated userguide contents

## What type of PR is it?

Documentation

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-158

Author: Makoto Yui <myui@apache.org>

Closes #221 from myui/HIVEMALL-158.

2 years ago [HIVEMALL-285] Add -inspect_opts option to show hyperparameters
Makoto Yui [Wed, 27 Nov 2019 07:11:56 +0000 (16:11 +0900)] 
[HIVEMALL-285] Add -inspect_opts option to show hyperparameters

## What changes were proposed in this pull request?

Add `-inspect_opts` option to show hyperparameters

## What type of PR is it?

Improvement

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-285

## How was this patch tested?

manual tests on EMR

## How to use this feature?

```sql
select train_regressor(array(), 0, '-inspect_opts -optimizer adam -reg elasticnet');

FAILED: UDFArgumentException Inspected Optimizer options ...
{disable_cvtest=false, regularization=ElasticNet, loss_function=SquaredLoss, eps=1.0E-8, decay=0.0, iterations=10, eta0=0.1, l1_ratio=0.5, lambda=1.0E-4, eta=Invscaling, optimizer=adam, beta1=0.9, beta2=0.999, alpha=1.0, cv_rate=0.005, power_t=0.1}
```

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #220 from myui/HIVEMALL-285.

2 years ago Revised exception type
Makoto Yui [Tue, 26 Nov 2019 06:43:09 +0000 (15:43 +0900)] 
Revised exception type

2 years ago Minor refactoring
Makoto Yui [Tue, 26 Nov 2019 06:39:30 +0000 (15:39 +0900)] 
Minor refactoring

2 years ago [HIVEMALL-283] Bump up netty version to 4.1.42.Final
Makoto Yui [Tue, 26 Nov 2019 04:54:43 +0000 (13:54 +0900)] 
[HIVEMALL-283] Bump up netty version to 4.1.42.Final

## What changes were proposed in this pull request?

Bump up netty version to 4.1.42.Final

This closes #206 and closes #207

## What type of PR is it?

Hot Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-283

## How was this patch tested?

unit tests

## Checklist

(Please remove this section if not needed; check `x` for YES, blank for NO)

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #219 from myui/HIVEMALL-283.

2 years ago [HIVEMALL-226] Move hivemall.fm and hivemall.mf packages to under hivemall.factorization
Makoto Yui [Mon, 25 Nov 2019 18:58:42 +0000 (03:58 +0900)] 
[HIVEMALL-226] Move hivemall.fm and hivemall.mf packages to under hivemall.factorization

## What changes were proposed in this pull request?

Move hivemall.fm and hivemall.mf packages to under hivemall.factorization

## What type of PR is it?

Refactoring

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-226

## How was this patch tested?

unit tests and manual tests on EMR

## Checklist

(Please remove this section if not needed; check `x` for YES, blank for NO)

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #218 from myui/HIVEMALL-266.

2 years ago Update javadoc and applied formatter
Makoto Yui [Mon, 25 Nov 2019 17:05:56 +0000 (02:05 +0900)] 
Update javadoc and applied formatter

2 years ago [HIVEMALL-165] Fixed to accept any primitive
Makoto Yui [Mon, 25 Nov 2019 16:53:29 +0000 (01:53 +0900)] 
[HIVEMALL-165] Fixed to accept any primitive

## What changes were proposed in this pull request?

Fix a bug that `array_remove` UDF throws exception when the first argument is null

## What type of PR is it?

Bug Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-165

## How was this patch tested?

manual tests on EMR

## How to use this feature?

```sql
WITH data4 as (
  select false as n, array(2.0, 3.0, 4.0) as nums
  union all
  select true as n, array(2.0, 3.0, 4.0) as nums
)
select
  array_remove(if(n = true, null, nums), 2.0) as c1,
  array_remove(if(n = true, null, nums), array(3.0,2.0)) as c2,
  array_remove(if(n = false, null, nums), 2.0) as c3
from
  data4;
> c1      c2      c3
> [3,4]   [4]     NULL
> NULL    NULL    [3,4]

select array_remove(array(2.0,2.1,3.0,4.0,2.0),2), array_remove(array(2.0,3.0,4.0),array(3,2.0));
> [2.1,3,4]       [4]

SELECT array_remove(array(1,null,3),null);
> [1,3]

SELECT array_remove(array(1,null,3,null,5),null);
> [1,3,5]

SELECT array_remove(array(1,null,3),array(null));
> [1,3]

SELECT array_remove(array('aaa','bbb'),'bbb');
> ["aaa"]

SELECT array_remove(array('aaa','bbb','ccc','bbb'), array('bbb','ccc'));
> ["aaa"]

select array_remove(array(null),null);
> []

select array_remove(array(null,'bbb'),'aaa');
> [null,"bbb"]
```
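
The null handling shown above can be modeled roughly as follows. This is a Python sketch for illustration only, not Hivemall's Java implementation; the helper `_same` is hypothetical, and the equality rules (for example `2.0 == 2`) are Python's, assumed here to mirror the outputs listed above.

```python
from typing import Any, List, Optional


def _same(a: Any, b: Any) -> bool:
    """Hypothetical helper: NULL (None) only matches NULL."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def array_remove(arr: Optional[List[Any]], target: Any) -> Optional[List[Any]]:
    """Illustrative model of the semantics in the examples above."""
    if arr is None:
        # A NULL first argument yields NULL instead of raising an exception.
        return None
    # A list target removes every listed value; a scalar removes that value only.
    targets = target if isinstance(target, list) else [target]
    return [x for x in arr if not any(_same(x, t) for t in targets)]


print(array_remove([2.0, 3.0, 4.0], [3.0, 2.0]))
print(array_remove(None, 2.0))
print(array_remove([None, "bbb"], "aaa"))
```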

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #217 from myui/HIVEMALL-165.

2 years ago [HIVEMALL-121] Add -libsvm formatting option to feature_hashing UDF
Makoto Yui [Mon, 25 Nov 2019 10:03:15 +0000 (19:03 +0900)] 
[HIVEMALL-121] Add -libsvm formatting option to feature_hashing UDF

## What changes were proposed in this pull request?

Add `-libsvm` formatting option for `feature_hashing`

## What type of PR is it?

Improvement

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-121

## How was this patch tested?

unit tests, manual tests on EMR

## How to use this feature?

```sql
select feature_hashing(array('aaa:1.0','aaa','bbb:2.0'), '-libsvm');
> ["4063537:1.0","4063537:1","8459207:2.0"]

select feature_hashing(array('aaa:1.0','aaa','bbb:2.0'), '-features 10 -libsvm');
> ["1:2.0","7:1.0","7:1"]
```

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #216 from myui/HIVEMALL-121.

2 years ago [HIVEMALL-249] Fix fmeasure UDAF to support any integers
Makoto Yui [Mon, 25 Nov 2019 08:50:35 +0000 (17:50 +0900)] 
[HIVEMALL-249] Fix fmeasure UDAF to support any integers

## What changes were proposed in this pull request?

Fix fmeasure UDAF to support any integers

## What type of PR is it?

Hot Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-249

## How to use this feature?

```sql
create table data2 as
  select 1.1 as truth, 0 as predicted
union all
  select 0.0 as truth, 1 as predicted
union all
  select 0.0 as truth, 0 as predicted
union all
  select 1.0 as truth, 1 as predicted
union all
  select 0.0 as truth, 1 as predicted
union all
  select 0.0 as truth, 0 as predicted
;

select fmeasure(truth, predicted, '-average binary') from data2;
```
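
For reference, a binary F-measure can be computed as in the Python sketch below. It is an illustration under stated assumptions, not the UDAF's code: the labels are hypothetical integer 0/1 values (not the exact rows of the table above), label 1 is treated as the positive class, and `beta=1.0` is assumed as the default.

```python
def f_measure(truth, predicted, beta=1.0):
    """Binary F-beta score; label 1 is assumed to be the positive class."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t != 1 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p != 1)  # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical integer labels: precision = 1/3, recall = 1/2.
truth = [1, 0, 0, 1, 0, 0]
predicted = [0, 1, 0, 1, 1, 0]
print(f_measure(truth, predicted))
```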

## How was this patch tested?

manual tests on EMR

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #215 from myui/HIVEMALL-249.

2 years ago [HIVEMALL-276] Stable support for XGBoost v0.90
Makoto Yui [Fri, 22 Nov 2019 15:56:36 +0000 (00:56 +0900)] 
[HIVEMALL-276] Stable support for XGBoost v0.90

## What changes were proposed in this pull request?

- Fix xgboost module to create DMatrix from CSRMatrix
- Support xgboost v0.90 hyperparameters
- Replace xgboost4j with [xgboost-predictor](https://github.com/komiya-atsushi/xgboost-predictor-java) for prediction
- Add documentation about XGBoost

## What type of PR is it?

Refactoring, Improvement

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-276
https://issues.apache.org/jira/browse/HIVEMALL-275
https://issues.apache.org/jira/browse/HIVEMALL-279
https://issues.apache.org/jira/browse/HIVEMALL-272
https://issues.apache.org/jira/browse/HIVEMALL-27

## How to use this feature?

As described in the [user guide](http://hivemall.apache.org/userguide/index.html).

## How was this patch tested?

unit tests and manual tests on EMR

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #213 from myui/HIVEMALL-275-2.

2 years ago [HIVEMALL-281] Support max_by, min_by, majority_vote UDAFs
Makoto Yui [Fri, 22 Nov 2019 14:17:11 +0000 (23:17 +0900)] 
[HIVEMALL-281] Support max_by, min_by, majority_vote UDAFs

## What changes were proposed in this pull request?

Support max_by, min_by, majority_vote UDAFs

## What type of PR is it?

Feature

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-281

## How was this patch tested?

manual tests on EMR

## How to use this feature?

```sql

create table data1 as (
  select 'jake' as name, 18 as age
  union all
  select 'tom' as name, 64 as age
  union all
  select 'lisa' as name, 32 as age
);

select
  max_by(name, age) as max_name,
  min_by(name, age) as min_name
from
  data1;
> tom, jake

create table data2 as
  select
    explode(array('1', '2', '2', '2', '5', '4', '1', '2')) as k;

select
  majority_vote(k) as k
from
  data2;
> 2
```
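
The aggregate semantics in the queries above can be sketched in Python as follows. This is only an illustration, not the UDAF implementation; in particular, how ties are broken is an assumption (this sketch returns the first candidate encountered).

```python
from collections import Counter


def max_by(pairs):
    """Return x from the (x, y) pair with the largest y."""
    return max(pairs, key=lambda p: p[1])[0]


def min_by(pairs):
    """Return x from the (x, y) pair with the smallest y."""
    return min(pairs, key=lambda p: p[1])[0]


def majority_vote(values):
    """Return the most frequent value (tie-breaking here is an assumption)."""
    return Counter(values).most_common(1)[0][0]


data1 = [("jake", 18), ("tom", 64), ("lisa", 32)]
data2 = ["1", "2", "2", "2", "5", "4", "1", "2"]
print(max_by(data1), min_by(data1), majority_vote(data2))
```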

## Checklist

(Please remove this section if not needed; check `x` for YES, blank for NO)

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #214 from myui/HIVEMALL-281.

2 years ago [HOTFIX] bumped matrix4j version to 0.9.2
Makoto Yui [Mon, 11 Nov 2019 05:38:54 +0000 (14:38 +0900)] 
[HOTFIX] bumped matrix4j version to 0.9.2

2 years ago [HIVEMALL-278] Bumped matrix4j version to v0.9.1
Makoto Yui [Fri, 1 Nov 2019 09:27:53 +0000 (18:27 +0900)] 
[HIVEMALL-278] Bumped matrix4j version to v0.9.1

## What changes were proposed in this pull request?

Bumped matrix4j version to v0.9.1 because matrix4j v0.9.0 had a bug when constructing a CSRMatrix from entries given in unordered column order.
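
To illustrate why column order matters, the Python sketch below builds CSR arrays from rows whose entries arrive in arbitrary column order, sorting each row by column index first. It is a generic illustration of the CSR layout, not matrix4j's code or the actual fix; `build_csr` and its input shape are hypothetical.

```python
def build_csr(rows):
    """Build CSR arrays from rows of (column, value) pairs in any order.

    Returns (indptr, indices, data). Columns are sorted within each row,
    a common CSR convention that lookups such as binary search rely on.
    """
    indptr = [0]   # indptr[i]..indptr[i+1] delimits row i in indices/data
    indices = []   # column index of each stored value
    data = []      # stored non-zero values
    for row in rows:
        for col, val in sorted(row, key=lambda cv: cv[0]):
            indices.append(col)
            data.append(val)
        indptr.append(len(indices))
    return indptr, indices, data


# Row 0 lists column 2 before column 0; the builder reorders it.
print(build_csr([[(2, 1.0), (0, 3.0)], [(1, 4.0)]]))
```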

## What type of PR is it?

Bug Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-278

## How was this patch tested?

unit tests

## Checklist

(Please remove this section if not needed; check `x` for YES, blank for NO)

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #212 from myui/HIVEMALL-278.

2 years ago add missing junit dependency
Makoto Yui [Thu, 31 Oct 2019 10:58:20 +0000 (19:58 +0900)] 
add missing junit dependency

2 years ago Added SparseDMatrixBuilder (ref: 211/head)
Makoto Yui [Thu, 31 Oct 2019 10:17:54 +0000 (19:17 +0900)] 
Added SparseDMatrixBuilder

2 years ago Renamed XGBoostUDTF as XGBoostBaseUDTF
Makoto Yui [Thu, 31 Oct 2019 10:17:31 +0000 (19:17 +0900)] 
Renamed XGBoostUDTF as XGBoostBaseUDTF

2 years ago [HIVEMALL-274] Fix wrong column name of train_regressor() in tutorial
Aki Ariga [Thu, 31 Oct 2019 07:44:44 +0000 (16:44 +0900)] 
[HIVEMALL-274] Fix wrong column name of train_regressor() in tutorial

## What changes were proposed in this pull request?

Fix the documentation bug reported in HIVEMALL-274

## What type of PR is it?

Documentation

## What is the Jira issue?

https://issues.apache.org/jira/projects/HIVEMALL/issues/HIVEMALL-274

## How was this patch tested?

N/A

## Checklist

(Please remove this section if not needed; check `x` for YES, blank for NO)

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

Author: Aki Ariga <ariga@treasure-data.com>

Closes #210 from chezou/HIVEMALL-274.

2 years ago Added document about xgboost_version() UDF
Makoto Yui [Wed, 30 Oct 2019 08:59:49 +0000 (17:59 +0900)] 
Added document about xgboost_version() UDF

2 years ago [HIVEMALL-273] Support xgboost v0.90
Makoto Yui [Wed, 30 Oct 2019 07:41:21 +0000 (16:41 +0900)] 
[HIVEMALL-273] Support xgboost v0.90

## What changes were proposed in this pull request?

Support xgboost v0.90

## What type of PR is it?

Improvement

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-273

## How was this patch tested?

unit tests and manual tests on EMR

## How to use this feature?

https://gist.github.com/myui/aa6e142a95ca8f995cc8e49146dbe2eb

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #209 from myui/HIVEMALL-273.

2 years ago [HIVEMALL-260] Remove dependencies to Scala library in xgboost classifier
Makoto Yui [Tue, 29 Oct 2019 06:37:43 +0000 (15:37 +0900)] 
[HIVEMALL-260] Remove dependencies to Scala library in xgboost classifier

## What changes were proposed in this pull request?

Remove dependencies on the Scala library in the xgboost classifier

## What type of PR is it?

Bug Fix, Hot Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-260

## How was this patch tested?

manual tests on EMR

## How to use this feature?

to appear

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #205 from myui/HIVEMALL-260.

2 years ago Remove rand_gid/rand_gid2 macro
Makoto Yui [Wed, 23 Oct 2019 09:44:41 +0000 (18:44 +0900)] 
Remove rand_gid/rand_gid2 macro

## What changes were proposed in this pull request?

Remove rand_gid/rand_gid2 macro

## What type of PR is it?

Hot Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-270

Author: Makoto Yui <myui@apache.org>

Closes #204 from myui/HIVEMALL-270.

2 years ago [HIVEMALL-261][HIVEMALL-262] argmin/argmax/argsort UDF
Makoto Yui [Wed, 23 Oct 2019 09:01:51 +0000 (18:01 +0900)] 
[HIVEMALL-261][HIVEMALL-262] argmin/argmax/argsort UDF

## What changes were proposed in this pull request?

Introduce argmin/argmax/argsort UDF

## What type of PR is it?

Feature

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-261
https://issues.apache.org/jira/browse/HIVEMALL-262

## How was this patch tested?

unit tests, manual tests on EMR

## How to use this feature?

```sql
SELECT argmax(array(5,2,0,1));
> 0

SELECT array_slice(array(5,2,0,1), argmax(array(5,2,0,1)));
> 5

SELECT argmin(array(5,2,0,1));
> 2

SELECT argsort(array(5,2,0,1));
> 2, 3, 1, 0

SELECT array_slice(array(5,2,0,1), argsort(array(5,2,0,1)));
> 0, 1, 2, 5

SELECT argsort(argsort(array(5,2,0,1))), argrank(array(5,2,0,1));
> 3, 2, 0, 1

SELECT arange(5), arange(1, 5), arange(1, 5, 1), arange(0, 5, 1);
> [0,1,2,3,4]     [1,2,3,4]       [1,2,3,4]       [0,1,2,3,4]

SELECT arange(1, 6, 2);
> 1, 3, 5

SELECT arange(-1, -6, 2);
> -1, -3, -5

SELECT argsort(array(5, 2, 0, 1)), argrank(array(5, 2, 0, 1)), argsort(argsort(array(5, 2, 0, 1)));
> [2,3,1,0]       [3,2,0,1]       [3,2,0,1]
```
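
The relationship between these functions, including `argrank` being equivalent to applying `argsort` twice as the last query shows, can be sketched in Python. This is an illustration of the semantics only, not the UDF code; tie-breaking (first occurrence wins) is an assumption.

```python
def argsort(xs):
    """Indices that would sort xs in ascending order (stable)."""
    return sorted(range(len(xs)), key=lambda i: xs[i])


def argrank(xs):
    """Rank of each element; equivalent to argsort(argsort(xs))."""
    return argsort(argsort(xs))


def argmax(xs):
    """Index of the largest element (first occurrence on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def argmin(xs):
    """Index of the smallest element (first occurrence on ties)."""
    return min(range(len(xs)), key=lambda i: xs[i])


xs = [5, 2, 0, 1]
print(argmax(xs), argmin(xs), argsort(xs), argrank(xs))
```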

## Checklist

(Please remove this section if not needed; check `x` for YES, blank for NO)

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #197 from myui/argmax.

2 years ago [HIVEMALL-244] Support Java9, Java11(LTS)
Makoto Yui [Mon, 21 Oct 2019 07:22:05 +0000 (16:22 +0900)] 
[HIVEMALL-244] Support Java9, Java11(LTS)

## What changes were proposed in this pull request?

Support Java9, Java11(LTS)

## What type of PR is it?

Improvement | Hot Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-244

## How was this patch tested?

unit tests

## Checklist

(Please remove this section if not needed; check `x` for YES, blank for NO)

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #203 from myui/HIVEMALL-244.

2 years ago [HIVEMALL-269] Modified to use matrix4j for matrix module
Makoto Yui [Fri, 18 Oct 2019 08:42:16 +0000 (17:42 +0900)] 
[HIVEMALL-269] Modified to use matrix4j for matrix module

## What changes were proposed in this pull request?

Use matrix4j for matrix module

## What type of PR is it?

Hot Fix | Refactoring

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-269

## How was this patch tested?

unit tests

## Checklist

(Please remove this section if not needed; check `x` for YES, blank for NO)

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #202 from myui/HIVEMALL-269.

2 years ago Fixed annotations
Makoto Yui [Tue, 8 Oct 2019 07:15:24 +0000 (16:15 +0900)] 
Fixed annotations

2 years ago Moved matrix/random package to utils/random
Makoto Yui [Mon, 7 Oct 2019 07:16:19 +0000 (16:16 +0900)] 
Moved matrix/random package to utils/random

2 years ago Merged ArrayUtilsTest
Makoto Yui [Mon, 7 Oct 2019 05:44:39 +0000 (14:44 +0900)] 
Merged ArrayUtilsTest

2 years ago [HIVEMALL-267] Drop Spark Dataframe support (SparkSQL remain supported)
Makoto Yui [Fri, 4 Oct 2019 05:28:49 +0000 (14:28 +0900)] 
[HIVEMALL-267] Drop Spark Dataframe support (SparkSQL remain supported)

## What changes were proposed in this pull request?

Drop Spark DataFrame support (SparkSQL remains supported).

## What type of PR is it?

Hot Fix, Refactoring

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-267

## How was this patch tested?

unit tests, manual tests

## Checklist

(Please remove this section if not needed; check `x` for YES, blank for NO)

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #201 from myui/HIVEMALL-267.

2 years ago [HIVEMALL-268] Fix the default vInit, eta initialization bug in FactorizationMachines
Makoto Yui [Thu, 3 Oct 2019 08:34:10 +0000 (17:34 +0900)] 
[HIVEMALL-268] Fix the default vInit, eta initialization bug in FactorizationMachines

## What changes were proposed in this pull request?

Fix the default vInit, eta initialization bug in FactorizationMachines

## What type of PR is it?

Bug Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-268

## How was this patch tested?

unit tests, manual tests on EMR

## Checklist

(Please remove this section if not needed; check `x` for YES, blank for NO)

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #200 from myui/HIVEMALL-268.

3 years ago [HIVEMALL-171] Tracing functionality for prediction of DecisionTrees
Makoto Yui [Fri, 27 Sep 2019 18:39:01 +0000 (03:39 +0900)] 
[HIVEMALL-171] Tracing functionality for prediction of DecisionTrees

## What changes were proposed in this pull request?

Introduce `decision_path` UDF providing tracing of decision tree prediction paths

## What type of PR is it?

Feature

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-171

## How was this patch tested?

unit tests, manual tests on EMR

## How to use this feature?

to be described in the user guide

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #199 from myui/HIVEMALL-171.

3 years ago [HIVEMALL-245] Refactor RandomForest for Sparse Data handling
Makoto Yui [Fri, 13 Sep 2019 09:23:00 +0000 (18:23 +0900)] 
[HIVEMALL-245] Refactor RandomForest for Sparse Data handling

## What changes were proposed in this pull request?

Refactor RandomForest for Sparse Data handling

## What type of PR is it?

Refactoring

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-245
https://issues.apache.org/jira/browse/HIVEMALL-171

## How was this patch tested?

unit tests, manual tests on EMR

## Checklist

(Please remove this section if not needed; check `x` for YES, blank for NO)

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #198 from myui/HIVEMALL-245.

3 years ago Fixed a documentation bug
Makoto Yui [Fri, 26 Jul 2019 07:33:22 +0000 (16:33 +0900)] 
Fixed a documentation bug

3 years ago Add test of sparse input for randomforest classifier
Makoto Yui [Thu, 18 Jul 2019 07:51:33 +0000 (16:51 +0900)] 
Add test of sparse input for randomforest classifier

3 years ago Fixed a minor typo in doc
Makoto Yui [Sat, 13 Jul 2019 14:45:52 +0000 (23:45 +0900)] 
Fixed a minor typo in doc

3 years ago Added sanity checks for training data in RandomForest
Makoto Yui [Wed, 10 Jul 2019 07:17:20 +0000 (16:17 +0900)] 
Added sanity checks for training data in RandomForest

3 years ago Refactor Matrix module for NNZ and zero value handling
Makoto Yui [Wed, 10 Jul 2019 05:58:39 +0000 (14:58 +0900)] 
Refactor Matrix module for NNZ and zero value handling

## What changes were proposed in this pull request?

Refactor Matrix module for NNZ and zero value handling.

## What type of PR is it?

Hot Fix, Refactoring

## What is the Jira issue?

no JIRA issue

## How was this patch tested?

Unit tests

## Checklist

(Please remove this section if not needed; check `x` for YES, blank for NO)

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #196 from myui/refactor_randomforest.

3 years ago Fixed ToC
Makoto Yui [Fri, 28 Jun 2019 16:57:48 +0000 (01:57 +0900)] 
Fixed ToC

3 years ago Added usage for feature_binning UDF
Makoto Yui [Fri, 28 Jun 2019 16:55:39 +0000 (01:55 +0900)] 
Added usage for feature_binning UDF

3 years ago Fixed a doc
Makoto Yui [Fri, 28 Jun 2019 16:30:53 +0000 (01:30 +0900)] 
Fixed a doc

3 years ago Fixed feature binning documentation
Makoto Yui [Fri, 28 Jun 2019 06:43:05 +0000 (15:43 +0900)] 
Fixed feature binning documentation

3 years ago [HIVEMALL-259][DOC] Refactor feature_binning UDF
Makoto Yui [Thu, 27 Jun 2019 18:02:38 +0000 (03:02 +0900)] 
[HIVEMALL-259][DOC] Refactor feature_binning UDF

## What changes were proposed in this pull request?

Refactor feature_binning UDF and update the function usage

## What type of PR is it?

Documentation, Refactoring

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-259

## How was this patch tested?

unit tests, manual tests on EMR

## How to use this feature?

```sql
WITH extracted as (
  select
    extract_feature(feature) as index,
    extract_weight(feature) as value
  from
    input l
    LATERAL VIEW explode(features) r as feature
),
mapping as (
  select
    index,
    build_bins(value, 5, true) as quantiles -- 5 bins with auto bin shrinking
  from
    extracted
  group by
    index
),
bins as (
  select
    to_map(index, quantiles) as quantiles
  from
    mapping
)
select
  l.features as original,
  feature_binning(l.features, r.quantiles) as features
from
  input l
  cross join bins r
```

see https://gist.github.com/myui/f943fa3ce1a7e1ac3f2dd9a7f9fa703b

## Checklist

(Please remove this section if not needed; check `x` for YES, blank for NO)

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #195 from myui/HIVEMALL-259.

3 years ago Fixed imports
Makoto Yui [Tue, 25 Jun 2019 12:52:12 +0000 (21:52 +0900)] 
Fixed imports

3 years ago [HIVEMALL-253-2] map_roulette UDF
Solodye [Tue, 25 Jun 2019 10:31:02 +0000 (19:31 +0900)] 
[HIVEMALL-253-2] map_roulette UDF

revise #192

Author: Makoto Yui <myui@apache.org>

Closes #193 from myui/HIVEMALL-253-2.

3 years ago [HIVEMALL-258] Add UDF to convert feature/label in Libsvm format
Makoto Yui [Thu, 20 Jun 2019 10:35:42 +0000 (19:35 +0900)] 
[HIVEMALL-258] Add UDF to convert feature/label in Libsvm format

## What changes were proposed in this pull request?

Add UDF to convert feature/label in Libsvm format

## What type of PR is it?

Feature

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-258

## How was this patch tested?

unit tests and manual tests

## How to use this feature?

```sql
-- Usage:
select to_libsvm_format(array('apple:3.4','orange:2.1'));
> 6284535:3.4 8104713:2.1

select to_libsvm_format(array('apple:3.4','orange:2.1'), '-features 10');
> 3:2.1 7:3.4

select to_libsvm_format(array('7:3.4','3:2.1'), 5.0);
> 5.0 3:2.1 7:3.4
```
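
The output layout in the last example (optional label first, then `index:value` pairs sorted by index) can be sketched in Python as below. This is an illustration only: it handles features whose names are already integer indices and does not reproduce the feature hashing used in the first two examples.

```python
def to_libsvm_format(features, label=None):
    """Format 'index:value' strings as one LIBSVM-style line.

    Assumes every feature already has an integer index; hashing of
    string names such as 'apple' is intentionally not modeled here.
    """
    parsed = []
    for f in features:
        index, _, value = f.partition(":")
        parsed.append((int(index), value))
    parsed.sort(key=lambda iv: iv[0])  # ascending feature index
    body = " ".join(f"{i}:{v}" for i, v in parsed)
    return body if label is None else f"{label} {body}"


print(to_libsvm_format(["7:3.4", "3:2.1"], 5.0))
```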

## Checklist

(Please remove this section if not needed; check `x` for YES, blank for NO)

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #194 from myui/libsvm.

3 years ago Fixed a bug in document
Makoto Yui [Thu, 20 Jun 2019 07:09:16 +0000 (16:09 +0900)] 
Fixed a bug in document

3 years ago Fixed the usage of min-max scaling and zscore
Makoto Yui [Wed, 19 Jun 2019 10:12:03 +0000 (19:12 +0900)] 
Fixed the usage of min-max scaling and zscore

3 years ago Increased write buffer from 1MB to 2MB
Makoto Yui [Wed, 12 Jun 2019 08:27:24 +0000 (17:27 +0900)] 
Increased write buffer from 1MB to 2MB

3 years ago Update doc
Makoto Yui [Fri, 19 Apr 2019 07:16:32 +0000 (16:16 +0900)] 
Update doc

3 years ago [HIVEMALL-251] Add option to return PartOfSpeech information for tokenize_ja
Makoto Yui [Fri, 19 Apr 2019 07:04:01 +0000 (16:04 +0900)] 
[HIVEMALL-251] Add option to return PartOfSpeech information for tokenize_ja

## What changes were proposed in this pull request?

Add option to return PartOfSpeech information for `tokenize_ja` UDF.

## What type of PR is it?

Feature, Improvement

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-251

## How was this patch tested?

unit tests and manual tests on EMR

## How to use this feature?

```sql
WITH tmp as (
  select
    tokenize_ja('kuromojiを使った分かち書きのテストです。','-mode search -pos') as r
)
select
  r.tokens,
  r.pos,
  r.tokens[0] as token0,
  r.pos[0] as pos0
from
  tmp;
```

| tokens |pos | token0 | pos0 |
|:-:|:-:|:-:|:-:|
| ["kuromoji","使う","分かち書き","テスト"] | ["名詞-一般","動詞-自立","名詞-一般","名詞-サ変接続"] | kuromoji | 名詞-一般 |

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #191 from myui/HIVEMALL-251.

3 years ago [HIVEMALL-246] Add feature name validation in feature UDF
Makoto Yui [Sat, 13 Apr 2019 21:24:42 +0000 (06:24 +0900)] 
[HIVEMALL-246] Add feature name validation in feature UDF

## What changes were proposed in this pull request?

This PR adds feature name validation in feature UDF

`feature(name, value)` should validate that `name` does not include ":". Fail-fast behavior is preferable.
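
A minimal Python sketch of this fail-fast check follows. It is illustrative only, not the UDF's Java code; the exception type and message are assumptions.

```python
def feature(name, value):
    """Build a 'name:value' feature string, rejecting ':' in the name."""
    name = str(name)
    if ":" in name:
        # Fail fast: a ':' in the name would make 'name:value' ambiguous.
        raise ValueError(f"feature name must not contain ':': {name!r}")
    return f"{name}:{value}"


print(feature("apple", 3.4))
```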

## What type of PR is it?

Hot Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-246

## How was this patch tested?

unit tests

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #190 from myui/HIVEMALL-246.

3 years ago [HIVEMALL-237-1] Add usage in ML function reference page
Makoto Yui [Sat, 13 Apr 2019 20:37:14 +0000 (05:37 +0900)] 
[HIVEMALL-237-1] Add usage in ML function reference page

## What changes were proposed in this pull request?

Add usage in ML function reference page

## What type of PR is it?

Documentation

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-237

## How was this patch tested?

via CI

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?

Author: Makoto Yui <myui@apache.org>
Author: Makoto YUI <yuin405@gmail.com>

Closes #183 from myui/HIVEMALL-237.

3 years ago[HIVEMALL-248] UDF for Kuromoji stoptags
Makoto Yui [Sat, 13 Apr 2019 20:09:38 +0000 (05:09 +0900)] 
[HIVEMALL-248] UDF for Kuromoji stoptags

## What changes were proposed in this pull request?

In `tokenize_ja`, users need to provide stoptags, and tokens whose part-of-speech tag matches a stoptag are removed from the token stream. In other words, stoptags are an "exclusive" rule.
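
To make the "exclusive" semantics concrete, here is a minimal Java sketch. It is not the Hivemall implementation: the class and method names are hypothetical, and the assumption that excluding a tag also excludes its sub-tags (those starting with `tag-`) is inferred from the examples in this description rather than confirmed from the source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of stoptag semantics; not Hivemall's actual code.
final class StoptagSketch {
    private StoptagSketch() {}

    /** Drops every token whose part-of-speech tag is listed in stoptags. */
    static List<String> removeStopTokens(List<String> tokens, List<String> pos,
            Set<String> stoptags) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!stoptags.contains(pos.get(i))) {
                kept.add(tokens.get(i));
            }
        }
        return kept;
    }

    /** Builds a stoptag list: all known tags minus the excluded tags and their sub-tags. */
    static List<String> stoptagsExclude(List<String> allTags, List<String> excluded) {
        List<String> result = new ArrayList<>();
        for (String tag : allTags) {
            boolean isExcluded = false;
            for (String ex : excluded) {
                // Assumption: "noun" also covers "noun-general", "noun-proper", ...
                if (tag.equals(ex) || tag.startsWith(ex + "-")) {
                    isExcluded = true;
                    break;
                }
            }
            if (!isExcluded) {
                result.add(tag);
            }
        }
        return result;
    }
}
```

Because the rule is exclusive, keeping only nouns means listing every other tag as a stoptag; starting from the full tag list and subtracting the wanted tags is the more convenient direction.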

## What type of PR is it?

Feature

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-248

## How was this patch tested?

unit tests, functional test on EMR

## How to use this feature?

```sql
select tokenize_ja("kuromojiを使った分かち書きのテストです。", "normal", array("kuromoji"), stoptags_exclude(array("名詞")));
```
> ["分かち書き","テスト"]

`stoptags_exclude(array<string> tags [, const string lang='ja'])` is a UDF that returns the [stoptags](https://github.com/apache/lucene-solr/blob/master/lucene/analysis/kuromoji/src/resources/org/apache/lucene/analysis/ja/stoptags.txt) with the given part-of-speech tags excluded, as seen below:

```sql
select stoptags_exclude(array("名詞-固有名詞"));
```
> ["その他","その他-間投","フィラー","副詞","副詞-一般","副詞-助詞類接続","助動詞","助詞","助詞-並立助詞","助詞-係助詞","助詞-副助詞","助詞-副助詞/並立助詞/終助詞","助詞-副詞化","助詞-接続助詞","助詞-格助詞","助詞-格助詞-一般","助詞-格助詞-引用","助詞-格助詞-連語","助詞-特殊","助詞-終助詞","助詞-連体化","助詞-間投助詞","動詞","動詞-接尾","動詞-自立","動詞-非自立","名詞","名詞-サ変接続","名詞-ナイ形容詞語幹","名詞-一般","名詞-代名詞","名詞-代名詞-一般","名詞-代名詞-縮約","名詞-副詞可能","名詞-動詞非自立的","名詞-引用文字列","名詞-形容動詞語幹","名詞-接尾","名詞-接尾-サ変接続","名詞-接尾-一般","名詞-接尾-人名","名詞-接尾-副詞可能","名詞-接尾-助動詞語幹","名詞-接尾-助数詞","名詞-接尾-地域","名詞-接尾-形容動詞語幹","名詞-接尾-特殊","名詞-接続詞的","名詞-数","名詞-特殊","名詞-特殊-助動詞語幹","名詞-非自立","名詞-非自立-一般","名詞-非自立-副詞可能","名詞-非自立-助動詞語幹","名詞-非自立-形容動詞語幹","形容詞","形容詞-接尾","形容詞-自立","形容詞-非自立","感動詞","接続詞","接頭詞","接頭詞-動詞接続","接頭詞-名詞接続","接頭詞-形容詞接続","接頭詞-数接","未知語","記号","記号-アルファベット","記号-一般","記号-句点","記号-括弧閉","記号-括弧開","記号-空白","記号-読点","語断片","連体詞","非言語音"]

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #189 from myui/HIVEMALL-248.

3 years ago[HIVEMALL-247][DOC] Recommend hive.optimize.cte.materialize.threshold=2 in Hive tunin...
Makoto Yui [Fri, 12 Apr 2019 07:02:17 +0000 (16:02 +0900)] 
[HIVEMALL-247][DOC] Recommend hive.optimize.cte.materialize.threshold=2 in Hive tuning tips

## What changes were proposed in this pull request?

Recommend `hive.optimize.cte.materialize.threshold=2` in Hive tuning tips

## What type of PR is it?

Documentation

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-247

Author: Makoto Yui <myui@apache.org>

Closes #188 from myui/HIVEMALL-247.

3 years ago[HIVEMALL-250][DOC] Add tutorial for binarize_label
Makoto Yui [Fri, 12 Apr 2019 06:38:53 +0000 (15:38 +0900)] 
[HIVEMALL-250][DOC] Add tutorial for binarize_label

## What changes were proposed in this pull request?

Add tutorial for `binarize_label` UDTF

## What type of PR is it?

Documentation

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-250

## How to use this feature?

as described in tutorial

Author: Makoto Yui <myui@apache.org>

Closes #187 from myui/HIVEMALL-250.

3 years agoAdded a unit test for PA regression
Makoto Yui [Mon, 25 Mar 2019 08:27:09 +0000 (17:27 +0900)] 
Added a unit test for PA regression

3 years agoFixed links
Makoto Yui [Mon, 18 Mar 2019 09:43:50 +0000 (18:43 +0900)] 
Fixed links

3 years agoworkaround for maven-project-info-reports-plugin errors on building site
Makoto Yui [Mon, 18 Mar 2019 09:37:04 +0000 (18:37 +0900)] 
workaround for maven-project-info-reports-plugin errors on building site

3 years agoUpdated scm tag
Makoto Yui [Mon, 18 Mar 2019 07:22:20 +0000 (16:22 +0900)] 
Updated scm tag

3 years agoExcluded JDK's tools.jar from Bytecode Version enforcer
Makoto Yui [Mon, 18 Mar 2019 06:40:35 +0000 (15:40 +0900)] 
Excluded JDK's tools.jar from Bytecode Version enforcer

3 years agoAdded Java API compatibility checks
Makoto Yui [Mon, 18 Mar 2019 05:51:58 +0000 (14:51 +0900)] 
Added Java API compatibility checks

3 years ago[HIVEMALL-242][HIVEMALL-241] Drop support for Spark 2.1 and Deprecate Java7 for packaging
Makoto Yui [Mon, 18 Mar 2019 05:14:14 +0000 (14:14 +0900)] 
[HIVEMALL-242][HIVEMALL-241] Drop support for Spark 2.1 and Deprecate Java7 for packaging

## What changes were proposed in this pull request?

- Drop support for Spark 2.1
- Require Java8 for packaging, deprecating Java7 (class file compatibility is Java7 or later)

Runtime Java compatibility: Java7 or later
Packaging/Compile-time Java compatibility: Java8 or later

## What type of PR is it?

Hot Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-242
https://issues.apache.org/jira/browse/HIVEMALL-241

## How was this patch tested?

unit tests, manual tests

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #186 from myui/drop-spark2_1.

3 years ago[HIVEMALL-243] Fix nominal variable handling in DecisionTree and RegressionTree
Makoto Yui [Wed, 13 Mar 2019 07:56:17 +0000 (16:56 +0900)] 
[HIVEMALL-243] Fix nominal variable handling in DecisionTree and RegressionTree

## What changes were proposed in this pull request?

For a NOMINAL variable, the maximum attribute index 'm' is used for computing splits.

This causes performance issues for sparse nominal variables, so this PR revises the handling for better performance.

https://github.com/apache/incubator-hivemall/blob/master/core/src/main/java/hivemall/smile/classification/DecisionTree.java#L703
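
As an illustration of why the maximum index hurts sparse data, the sketch below counts class labels only for the nominal values that actually occur, so the cost depends on the number of distinct observed values and not on 'm'. This is one possible approach under that assumption; the actual change in `DecisionTree.java` may differ, and the class and method names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the code from DecisionTree.java.
final class SparseNominalCounts {
    private SparseNominalCounts() {}

    /**
     * Counts class labels per observed nominal value.
     * Values that never occur take no memory and are never scanned.
     */
    static Map<Integer, int[]> countByValue(int[] values, int[] labels, int numClasses) {
        Map<Integer, int[]> counts = new HashMap<>();
        for (int i = 0; i < values.length; i++) {
            int[] perClass = counts.computeIfAbsent(values[i], k -> new int[numClasses]);
            perClass[labels[i]]++;
        }
        return counts;
    }
}
```

With values such as `{3, 1000000, 3}`, a dense array indexed up to the maximum value would need a million slots, while the map holds two entries.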

## What type of PR is it?

Improvement

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-243

## How was this patch tested?

- [x] manual test on EMR

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #185 from myui/HIVEMALL-243.

3 years agoApplied refactoring
Makoto Yui [Thu, 21 Feb 2019 07:11:35 +0000 (16:11 +0900)] 
Applied refactoring

3 years agoApplied formatter
Makoto Yui [Thu, 21 Feb 2019 06:59:41 +0000 (15:59 +0900)] 
Applied formatter

3 years ago[HIVEMALL-238] Fixed from_json UDF to support top-level Map object
Makoto Yui [Thu, 21 Feb 2019 06:55:39 +0000 (15:55 +0900)] 
[HIVEMALL-238] Fixed from_json UDF to support top-level Map object

## What changes were proposed in this pull request?

Fixed from_json UDF to support top-level Map object

## What type of PR is it?

Bug Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-238

## How was this patch tested?

unit tests, manual tests

## How to use this feature?

```sql
select
  from_json(to_json(map('one',1,'two',2)), 'map<string,int>')
```

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #184 from myui/HIVEMALL-238.

3 years agoFixed scala test errors
Makoto Yui [Fri, 15 Feb 2019 06:10:56 +0000 (15:10 +0900)] 
Fixed scala test errors

3 years agoFixed CI error due to a bug in unit test
Makoto Yui [Thu, 14 Feb 2019 06:07:12 +0000 (15:07 +0900)] 
Fixed CI error due to a bug in unit test

3 years agoRefined tutorial documents
Makoto Yui [Fri, 8 Feb 2019 06:10:54 +0000 (15:10 +0900)] 
Refined tutorial documents

3 years agoApplied refactoring and documentation improvement
Makoto Yui [Fri, 8 Feb 2019 06:10:29 +0000 (15:10 +0900)] 
Applied refactoring and documentation improvement

3 years agoRenamed map_index UDF to map_get
Makoto Yui [Thu, 7 Feb 2019 06:12:39 +0000 (15:12 +0900)] 
Renamed map_index UDF to map_get

3 years agoAdded usages
Makoto Yui [Wed, 6 Feb 2019 08:16:24 +0000 (17:16 +0900)] 
Added usages

3 years agoModified to_string_array to be a generic UDF
Makoto Yui [Wed, 6 Feb 2019 08:15:47 +0000 (17:15 +0900)] 
Modified to_string_array to be a generic UDF

3 years ago[HIVEMALL-236] to_json/from_json cause KryoException/NullPointerException with ArrayL...
Makoto Yui [Tue, 5 Feb 2019 08:17:37 +0000 (17:17 +0900)] 
[HIVEMALL-236] to_json/from_json cause KryoException/NullPointerException with ArrayList due to Kryo bug

## What changes were proposed in this pull request?

Avoid NPE in Kryo serialization of List object created by `Arrays.asList`.
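
For context, `Arrays.asList` returns the private nested class `java.util.Arrays$ArrayList`, not `java.util.ArrayList`, which is a known source of trouble for serializers that instantiate collections reflectively. A common workaround, sketched below, is to copy the list into a plain `ArrayList` before it is serialized. Whether this PR uses exactly this approach is an assumption; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a defensive copy; not necessarily what the patch does.
final class AsListCopy {
    private AsListCopy() {}

    /** Copies any list into a plain, resizable java.util.ArrayList. */
    static <T> List<T> toArrayList(List<T> list) {
        return new ArrayList<>(list);
    }

    public static void main(String[] args) {
        List<String> fixedSize = Arrays.asList("a", "b");
        // Prints java.util.Arrays$ArrayList
        System.out.println(fixedSize.getClass().getName());
        // Prints java.util.ArrayList
        System.out.println(toArrayList(fixedSize).getClass().getName());
    }
}
```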

## What type of PR is it?

Bug Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-236

## How was this patch tested?

unit tests

## Checklist

(Please remove this section if not needed; check `x` for YES, blank for NO)

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #182 from myui/json_fix.

3 years ago[HIVEMALL-233-2] RandomForest regressor accepts sparse vector input
Takuya Kitazawa [Tue, 5 Feb 2019 04:55:55 +0000 (13:55 +0900)] 
[HIVEMALL-233-2] RandomForest regressor accepts sparse vector input

## What changes were proposed in this pull request?

Enable RandomForestRegressor to accept sparse vector input as RandomForestClassifier already does.

This closes #178

## What type of PR is it?

Improvement

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-233

## How was this patch tested?

manual tests on EMR

## How to use this feature?

```sql
with customers as (
  select 1 as id, "male" as gender, 23 as age, "Japan" as country, 12 as num_purchases
  union all
  select 2 as id, "female" as gender, 43 as age, "US" as country, 4 as num_purchases
  union all
  select 3 as id, "other" as gender, 19 as age, "UK" as country, 2 as num_purchases
  union all
  select 4 as id, "male" as gender, 31 as age, "US" as country, 20 as num_purchases
  union all
  select 5 as id, "female" as gender, 37 as age, "Australia" as country, 9 as num_purchases
),
training as (
  select
    array_concat(
      quantitative_features(
        array("age"),
        age
      ),
      categorical_features(
        array("country", "gender"),
        country, gender
      )
    ) as features,
    num_purchases
  from
    customers
)
select
  train_randomforest_regressor(
    feature_hashing(features), -- feature vector
    num_purchases, -- target value
    '-trees 40 -seed 31' -- hyper-parameters
  )
from
  training
;
```

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

Author: Takuya Kitazawa <k.takuti@gmail.com>
Author: Makoto Yui <myui@apache.org>

Closes #181 from myui/HIVEMALL-233-2.

3 years ago[HIVEMALL-234] Define `EtaEstimator` default values as constants
Takuya Kitazawa [Wed, 30 Jan 2019 05:01:35 +0000 (14:01 +0900)] 
[HIVEMALL-234] Define `EtaEstimator` default values as constants

## What changes were proposed in this pull request?

Fix mismatched default values declared in `getOptions()` and `EtaEstimator`.
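
The idea of declaring each default once can be sketched as follows. The constant names, the values, and the inverse-scaling formula are illustrative assumptions and are not copied from `EtaEstimator`; the point is only that the help text and the computation read the same constant, so they cannot drift apart.

```java
// Hypothetical sketch: names and values are illustrative, not copied from Hivemall.
final class EtaDefaultsSketch {
    private EtaDefaultsSketch() {}

    static final float DEFAULT_ETA0 = 0.1f;
    static final double DEFAULT_POWER_T = 0.1d;

    /** Help text for the option parser, built from the shared constant. */
    static String eta0Description() {
        return "The initial learning rate [default: " + DEFAULT_ETA0 + "]";
    }

    /** Inverse-scaling learning rate eta0 / t^power_t, using the same constants. */
    static float defaultEta(long t) {
        return (float) (DEFAULT_ETA0 / Math.pow(t, DEFAULT_POWER_T));
    }
}
```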

## What type of PR is it?

Improvement

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-234

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?

Author: Takuya Kitazawa <k.takuti@gmail.com>

Closes #179 from takuti/HIVEMALL-234.

3 years ago[HIVEMALL-235] Fix a bug in expansion of array where size is zero
Makoto Yui [Wed, 30 Jan 2019 04:50:25 +0000 (13:50 +0900)] 
[HIVEMALL-235] Fix a bug in expansion of array where size is zero

## What changes were proposed in this pull request?

Fix a bug in expansion of array where size is zero.

See the following commit for details:
https://github.com/apache/incubator-hivemall/pull/178/commits/d7695d461056b21eab25465e015c582edc2b57ce
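
The linked commit contains the actual fix. As a general illustration only, one typical way an empty array breaks expansion is growth by doubling: `0 * 2` stays `0`, so the array never becomes large enough. The hypothetical sketch below guards against that by also taking the required index into account.

```java
import java.util.Arrays;

// Hypothetical sketch of zero-size-safe growth; see the linked commit for the real fix.
final class GrowableArraySketch {
    private GrowableArraySketch() {}

    /** Returns an array that can hold the given index, growing it if needed. */
    static double[] ensureCapacity(double[] array, int index) {
        if (index < array.length) {
            return array;
        }
        // Doubling alone fails for length 0, so fall back to index + 1.
        int newSize = Math.max(array.length * 2, index + 1);
        return Arrays.copyOf(array, newSize);
    }
}
```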

## What type of PR is it?

Bug Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-235

## How was this patch tested?

unit tests

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #180 from myui/HIVEMALL-235.

3 years ago[HIVEMALL-232][DOC] Fix typo in the Top-K document
Kengo Seki [Thu, 10 Jan 2019 18:33:49 +0000 (03:33 +0900)] 
[HIVEMALL-232][DOC] Fix typo in the Top-K document

## What changes were proposed in this pull request?

`DISTRIBUTE BY x CLASS SORT BY x` in the Top-K document looks like a typo, so this PR fixes it.

## What type of PR is it?

Documentation

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-232

## How was this patch tested?

I think no test is needed since it's just a minor documentation fix.

Author: Kengo Seki <sekikn@apache.org>

Closes #177 from sekikn/HIVEMALL-232.

3 years agoFixed to update generic_func.md properly
Makoto Yui [Wed, 9 Jan 2019 07:00:53 +0000 (16:00 +0900)] 
Fixed to update generic_func.md properly

3 years ago[HIVEMALL-231] Replaced subarray UDF implementation with SubarrayUDF
Makoto Yui [Tue, 8 Jan 2019 11:02:07 +0000 (20:02 +0900)] 
[HIVEMALL-231] Replaced subarray UDF implementation with SubarrayUDF

## What changes were proposed in this pull request?

Replaced subarray UDF implementation with SubarrayUDF for backward compatibility.

## What type of PR is it?

Improvement

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-231

## How was this patch tested?

manual tests on EMR

## How to use this feature?

To be described in [userguide](http://hivemall.incubator.apache.org/userguide/misc/generic_funcs.html#array).

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #176 from myui/subarray.

3 years agoMoved git repos to Gitbox
Makoto Yui [Tue, 8 Jan 2019 06:21:59 +0000 (15:21 +0900)] 
Moved git repos to Gitbox

3 years ago[HIVEMALL-214][DOC] Update userguide for General Classifier/Regressor example
Makoto Yui [Wed, 26 Dec 2018 10:15:43 +0000 (19:15 +0900)] 
[HIVEMALL-214][DOC] Update userguide for General Classifier/Regressor example

## What changes were proposed in this pull request?

Refine the user guide for the generic classifier/regressor and related pages.

## What type of PR is it?

Documentation

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-214

## How to use this feature?

See user guide.

Author: Makoto Yui <myui@apache.org>

Closes #159 from myui/HIVEMALL-214.

3 years ago[HIVEMALL-230] Revise Optimizer Implementation
Makoto Yui [Wed, 26 Dec 2018 10:14:23 +0000 (19:14 +0900)] 
[HIVEMALL-230] Revise Optimizer Implementation

## What changes were proposed in this pull request?

Revise Optimizer implementation.

1. Revise default hyperparameters of AdaDelta and Adam.
2. Support AdamW, Amsgrad, AdamHD, Eve, and YellowFin optimizers.

- [x] Nesterov’s Accelerated Gradient
https://arxiv.org/abs/1212.0901
- [x] Rmsprop
Geoffrey Hinton, Nitish Srivastava, Kevin Swersky. 2014. Lecture 6e: Rmsprop: Divide the gradient by a running average of its recent magnitude
http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
- [x] RMSpropGraves - Generating Sequences With Recurrent Neural Networks
https://arxiv.org/abs/1308.0850
- [x] Fixing Weight Decay Regularization in Adam
https://openreview.net/forum?id=rk6qdGgCZ
- [x] On the Convergence of Adam and Beyond
https://openreview.net/forum?id=ryQu7f-RZ
- [x] AdamHD (Adam with Hypergradient descent)
https://arxiv.org/pdf/1703.04782.pdf
- [x] Eve: A Gradient Based Optimization Method with Locally and Globally Adaptive Learning Rates
https://openreview.net/forum?id=r1WUqIceg
- [x] nadam: Adam with Nesterov momentum
https://openreview.net/pdf?id=OM0jvwB8jIp57ZJjtNEZ
http://cs229.stanford.edu/proj2015/054_report.pdf
http://www.cs.toronto.edu/~fritz/absps/momentum.pdf
- [ ] ~YellowFin and the Art of Momentum Tuning~
https://openreview.net/forum?id=SyrGJYlRZ
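
For readers unfamiliar with these optimizers, here is a minimal single-parameter sketch of the Adam update with an optional AMSGrad-style maximum over the second-moment estimate. It follows one common textbook formulation and is not Hivemall's Optimizer code; the class name, field names, and any hyperparameter values passed in are assumptions, and the revised defaults from this PR are not reproduced here.

```java
// Hypothetical sketch of Adam / AMSGrad for one scalar weight; not Hivemall's code.
final class AdamSketch {
    private final double lr, beta1, beta2, eps;
    private final boolean amsgrad;
    private double m;    // first-moment (mean) estimate of the gradient
    private double v;    // second-moment (uncentered variance) estimate
    private double vMax; // running maximum of v, used only when amsgrad is true
    private long t;      // number of updates so far

    AdamSketch(double lr, double beta1, double beta2, double eps, boolean amsgrad) {
        this.lr = lr;
        this.beta1 = beta1;
        this.beta2 = beta2;
        this.eps = eps;
        this.amsgrad = amsgrad;
    }

    /** Applies one update step and returns the new weight. */
    double update(double weight, double gradient) {
        t++;
        m = beta1 * m + (1 - beta1) * gradient;
        v = beta2 * v + (1 - beta2) * gradient * gradient;
        double second = v;
        if (amsgrad) {
            // AMSGrad keeps the denominator from shrinking between steps.
            vMax = Math.max(vMax, v);
            second = vMax;
        }
        // Bias correction compensates for m and v starting at zero.
        double mHat = m / (1 - Math.pow(beta1, t));
        double vHat = second / (1 - Math.pow(beta2, t));
        return weight - lr * mHat / (Math.sqrt(vHat) + eps);
    }
}
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the weight moves by roughly `lr` regardless of the gradient's magnitude.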

## What type of PR is it?

Improvement, Feature

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-230

## How was this patch tested?

unit tests, tests on EMR

## How to use this feature?

Described in [tutorial](http://hivemall.incubator.apache.org/userguide/index.html)

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #175 from myui/adam_test.

3 years agoFixed ANN message and download page
Makoto Yui [Tue, 4 Dec 2018 07:13:25 +0000 (16:13 +0900)] 
Fixed ANN message and download page

3 years agoUpdate the project top page
Makoto Yui [Mon, 3 Dec 2018 09:27:36 +0000 (18:27 +0900)] 
Update the project top page

3 years agoUpdated release history
Makoto Yui [Mon, 3 Dec 2018 09:03:02 +0000 (18:03 +0900)] 
Updated release history

3 years agoMerge remote-tracking branch 'origin/v0.5.2'
Makoto Yui [Mon, 3 Dec 2018 07:32:03 +0000 (16:32 +0900)] 
Merge remote-tracking branch 'origin/v0.5.2'

3 years ago[DOC] Added workaround for a Surefire error
Makoto Yui [Wed, 21 Nov 2018 06:11:29 +0000 (15:11 +0900)] 
[DOC] Added workaround for a Surefire error