incubator-hivemall.git
5 days ago[HIVEMALL-232][DOC] Fix typo in the Top-K document master
Kengo Seki [Thu, 10 Jan 2019 18:33:49 +0000 (03:33 +0900)] 
[HIVEMALL-232][DOC] Fix typo in the Top-K document

## What changes were proposed in this pull request?

`DISTRIBUTE BY x CLASS SORT BY x` in the Top-K document looks like a typo, so this PR fixes it.

## What type of PR is it?

Documentation

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-232

## How was this patch tested?

I think no test is needed since it's just a minor documentation fix.

Author: Kengo Seki <sekikn@apache.org>

Closes #177 from sekikn/HIVEMALL-232.

6 days agoFixed to update generic_func.md properly
Makoto Yui [Wed, 9 Jan 2019 07:00:53 +0000 (16:00 +0900)] 
Fixed to update generic_func.md properly

7 days ago[HIVEMALL-231] Replaced subarray UDF implementation with SubarrayUDF
Makoto Yui [Tue, 8 Jan 2019 11:02:07 +0000 (20:02 +0900)] 
[HIVEMALL-231] Replaced subarray UDF implementation with SubarrayUDF

## What changes were proposed in this pull request?

Replaced subarray UDF implementation with SubarrayUDF for backward compatibility.

## What type of PR is it?

Improvement

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-231

## How was this patch tested?

manual tests on EMR

## How to use this feature?

To be described in [userguide](http://hivemall.incubator.apache.org/userguide/misc/generic_funcs.html#array).

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #176 from myui/subarray.

7 days agoMoved git repos to Gitbox
Makoto Yui [Tue, 8 Jan 2019 06:21:59 +0000 (15:21 +0900)] 
Moved git repos to Gitbox

2 weeks ago[HIVEMALL-214][DOC] Update userguide for General Classifier/Regressor example
Makoto Yui [Wed, 26 Dec 2018 10:15:43 +0000 (19:15 +0900)] 
[HIVEMALL-214][DOC] Update userguide for General Classifier/Regressor example

## What changes were proposed in this pull request?

Refine user guide for generic classifier/regressor and so on.

## What type of PR is it?

Documentation

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-214

## How to use this feature?

See user guide.

Author: Makoto Yui <myui@apache.org>

Closes #159 from myui/HIVEMALL-214.

2 weeks ago[HIVEMALL-230] Revise Optimizer Implementation
Makoto Yui [Wed, 26 Dec 2018 10:14:23 +0000 (19:14 +0900)] 
[HIVEMALL-230] Revise Optimizer Implementation

## What changes were proposed in this pull request?

Revise Optimizer implementation.

1. Revise default hyperparameters of AdaDelta and Adam.
2. Support AdamW, Amsgrad, AdamHD, Eve, and YellowFin optimizers.

- [x] Nesterov’s Accelerated Gradient
https://arxiv.org/abs/1212.0901
- [x] Rmsprop
Geoffrey Hinton, Nitish Srivastava, Kevin Swersky. 2014. Lecture 6e: Rmsprop: Divide the gradient by a running average of its recent magnitude
http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
- [x] RMSpropGraves - Generating Sequences With Recurrent Neural Networks
https://arxiv.org/abs/1308.0850
- [x] Fixing Weight Decay Regularization in Adam
https://openreview.net/forum?id=rk6qdGgCZ
- [x] On the Convergence of Adam and Beyond
https://openreview.net/forum?id=ryQu7f-RZ
- [x] AdamHD (Adam with Hypergradient descent)
https://arxiv.org/pdf/1703.04782.pdf
- [x] Eve: A Gradient Based Optimization Method with Locally and Globally Adaptive Learning Rates
https://openreview.net/forum?id=r1WUqIceg
- [x] nadam: Adam with Nesterov momentum
https://openreview.net/pdf?id=OM0jvwB8jIp57ZJjtNEZ
http://cs229.stanford.edu/proj2015/054_report.pdf
http://www.cs.toronto.edu/~fritz/absps/momentum.pdf
- [ ] ~YellowFin and the Art of Momentum Tuning~
https://openreview.net/forum?id=SyrGJYlRZ
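
As a rough illustration of the kind of update rule this PR touches, here is a minimal Python sketch of Adam with an optional AMSGrad switch. It is not Hivemall's Java implementation: the function names (`adam_step`, `new_state`, `minimize_quadratic`) are hypothetical, the hyperparameter defaults are those of the Adam paper rather than Hivemall's revised defaults, and the AMSGrad branch keeps Adam's bias correction, which the original AMSGrad paper omits.

```python
import math


def new_state():
    """Create the per-weight optimizer state (hypothetical layout)."""
    return {"t": 0, "m": 0.0, "v": 0.0, "v_max": 0.0}


def adam_step(w, g, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, amsgrad=False):
    """Apply one Adam (or AMSGrad) update to a scalar weight and return the new weight."""
    state["t"] += 1
    t = state["t"]
    # Exponential moving averages of the gradient and the squared gradient.
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * g
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * g * g
    v = state["v"]
    if amsgrad:
        # AMSGrad: never let the second-moment estimate shrink.
        state["v_max"] = max(state["v_max"], v)
        v = state["v_max"]
    # Bias correction for the zero-initialized moving averages.
    m_hat = state["m"] / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


def minimize_quadratic(steps=2000, amsgrad=False):
    """Minimize f(w) = (w - 3)^2 starting from w = 0."""
    w, state = 0.0, new_state()
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)
        w = adam_step(w, grad, state, lr=0.05, amsgrad=amsgrad)
    return w
```

Calling `minimize_quadratic()` and `minimize_quadratic(amsgrad=True)` should both end close to the minimizer `w = 3`; the two variants differ only in how the second-moment estimate is tracked.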

## What type of PR is it?

Improvement, Feature

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-230

## How was this patch tested?

unit tests, emr

## How to use this feature?

Described in [tutorial](http://hivemall.incubator.apache.org/userguide/index.html)

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #175 from myui/adam_test.

6 weeks agoFixed ANN message and download page
Makoto Yui [Tue, 4 Dec 2018 07:13:25 +0000 (16:13 +0900)] 
Fixed ANN message and download page

6 weeks agoUpdate the project top page
Makoto Yui [Mon, 3 Dec 2018 09:27:36 +0000 (18:27 +0900)] 
Update the project top page

6 weeks agoUpdated release history
Makoto Yui [Mon, 3 Dec 2018 09:03:02 +0000 (18:03 +0900)] 
Updated release history

6 weeks agoMerge remote-tracking branch 'origin/v0.5.2'
Makoto Yui [Mon, 3 Dec 2018 07:32:03 +0000 (16:32 +0900)] 
Merge remote-tracking branch 'origin/v0.5.2'

7 weeks ago[DOC] Added workaround for a Surefire error
Makoto Yui [Wed, 21 Nov 2018 06:11:29 +0000 (15:11 +0900)] 
[DOC] Added workaround for a Surefire error

8 weeks ago[HIVEMALL-227-2] Updated release guide to use SHA-512
Makoto Yui [Mon, 19 Nov 2018 10:29:28 +0000 (19:29 +0900)] 
[HIVEMALL-227-2] Updated release guide to use SHA-512

8 weeks ago[maven-release-plugin] prepare for next development iteration v0.5.2
Makoto Yui [Mon, 19 Nov 2018 08:44:42 +0000 (17:44 +0900)] 
[maven-release-plugin] prepare for next development iteration

8 weeks ago[maven-release-plugin] prepare release v0.5.2-rc2 v0.5.2 v0.5.2-rc2
Makoto Yui [Mon, 19 Nov 2018 08:44:31 +0000 (17:44 +0900)] 
[maven-release-plugin] prepare release v0.5.2-rc2

8 weeks agoBumped up ASF parent pom version to 21 to use SHA-512 instead of SHA-1
Makoto Yui [Mon, 19 Nov 2018 08:34:02 +0000 (17:34 +0900)] 
Bumped up ASF parent pom version to 21 to use SHA-512 instead of SHA-1

2 months ago[HIVEMALL-227][DOC] Removed md5 and replace sha1 with sha512 following new ASF policy
Makoto Yui [Thu, 15 Nov 2018 09:39:58 +0000 (18:39 +0900)] 
[HIVEMALL-227][DOC] Removed md5 and replace sha1 with sha512 following new ASF policy

## What changes were proposed in this pull request?

Removed md5 and replace sha1 with sha512 following new ASF policy
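
For context, a release checksum of the kind this policy change refers to can be produced and checked as follows. This is a minimal Python sketch, not part of the Hivemall release scripts; the helper names `sha512_hex` and `verify_sha512` are hypothetical, and it assumes the published checksum is a bare hex digest with no file name appended.

```python
import hashlib


def sha512_hex(path, chunk_size=1 << 20):
    """Return the SHA-512 digest of the file at `path` as a lowercase hex string."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # Read in chunks so large release artifacts do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha512(path, expected_hex):
    """Compare a file's digest with a published checksum, ignoring case and surrounding whitespace."""
    return sha512_hex(path) == expected_hex.strip().lower()
```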

## What type of PR is it?

Documentation

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-227

Author: Makoto Yui <myui@apache.org>

Closes #173 from myui/HIVEMALL-227.

2 months agoBumped version string to 0.5.2-incubating
Makoto Yui [Thu, 15 Nov 2018 06:54:44 +0000 (15:54 +0900)] 
Bumped version string to 0.5.2-incubating

2 months agoPrepare for the next Snapshot release of v0.5.2
Makoto Yui [Thu, 15 Nov 2018 06:16:28 +0000 (15:16 +0900)] 
Prepare for the next Snapshot release of v0.5.2

2 months ago[SPARK][HOTFIX] Fix the existing test failures in spark-2.3
Takeshi Yamamuro [Wed, 14 Nov 2018 17:33:01 +0000 (02:33 +0900)] 
[SPARK][HOTFIX] Fix the existing test failures in spark-2.3

## What changes were proposed in this pull request?
This PR fixes the test failures for spark-2.3.

## How was this patch tested?
Run the existing tests.

Author: Takeshi Yamamuro <yamamuro@apache.org>

Closes #171 from maropu/HOTFIX-20181114.

2 months agoFix typo
Vladimir Kroz [Wed, 14 Nov 2018 06:25:19 +0000 (15:25 +0900)] 
Fix typo

## What changes were proposed in this pull request?

Fix minor typo in documentation

## What type of PR is it?

Documentation

## What is the Jira issue?

n/a

## How was this patch tested?

n/a

## How to use this feature?

n/a

## Checklist

n/a

Author: Vladimir Kroz <vkroz@users.noreply.github.com>

Closes #172 from vkroz/patch-1.

2 months agoFixed tutorial docs
Makoto Yui [Tue, 13 Nov 2018 09:29:07 +0000 (18:29 +0900)] 
Fixed tutorial docs

2 months ago[HIVEMALL-223] Add -kv_map and -vk_map option to to_ordered_list UDAF
Makoto Yui [Tue, 13 Nov 2018 09:18:35 +0000 (18:18 +0900)] 
[HIVEMALL-223] Add -kv_map and -vk_map option to to_ordered_list UDAF

## What changes were proposed in this pull request?

Add `-kv_map` and `-vk_map` option to `to_ordered_list` UDAF.

## What type of PR is it?

Improvement

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-223

## How was this patch tested?

unit tests and manual tests on EMR

## How to use this feature?

Will be described in
http://hivemall.incubator.apache.org/userguide/misc/generic_funcs.html#array

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #170 from myui/HIVEMALL-223.

2 months agoFixed Travis CI bug [[: not found
Makoto Yui [Fri, 9 Nov 2018 06:39:35 +0000 (15:39 +0900)] 
Fixed Travis CI bug [[: not found

2 months agoFixed scalatest version used for Spark 2.3 to avoid scalatest version conflict
Makoto Yui [Wed, 7 Nov 2018 17:47:50 +0000 (02:47 +0900)] 
Fixed scalatest version used for Spark 2.3 to avoid scalatest version conflict

2 months agoFixed release guide for MAVEN_OPT
Makoto Yui [Wed, 7 Nov 2018 17:37:17 +0000 (02:37 +0900)] 
Fixed release guide for MAVEN_OPT

2 months agoUpdated Netty version to cope with NoSuchMethodError PooledByteBufAllocator.metric...
Makoto Yui [Wed, 7 Nov 2018 10:24:24 +0000 (19:24 +0900)] 
Updated Netty version to cope with NoSuchMethodError PooledByteBufAllocator.metric() for Spark v2.3

2 months agoFixed a bug introduced in the previous commit
Makoto Yui [Wed, 7 Nov 2018 09:42:09 +0000 (18:42 +0900)] 
Fixed a bug introduced in the previous commit

2 months agoRemoved unknown host
Makoto Yui [Wed, 7 Nov 2018 09:36:05 +0000 (18:36 +0900)] 
Removed unknown host

2 months agoFixed scala test for subarray UDF misusage
Makoto Yui [Wed, 7 Nov 2018 07:36:49 +0000 (16:36 +0900)] 
Fixed scala test for subarray UDF misusage

2 months agoFixed GeneralRegressorUDTFTest to cope with behavioral change where dloss is zero
Makoto Yui [Wed, 7 Nov 2018 06:41:24 +0000 (15:41 +0900)] 
Fixed GeneralRegressorUDTFTest to cope with behavioral change where dloss is zero

2 months agoFixed a bug in ArrayFlattenUDFTest
Makoto Yui [Tue, 6 Nov 2018 14:12:47 +0000 (23:12 +0900)] 
Fixed a bug in ArrayFlattenUDFTest

2 months agoFixed a possible Json deserialize bug caused by illegal Text use
Makoto Yui [Tue, 6 Nov 2018 10:42:28 +0000 (19:42 +0900)] 
Fixed a possible Json deserialize bug caused by illegal Text use

2 months agoFixed failing test
Makoto Yui [Tue, 6 Nov 2018 10:41:15 +0000 (19:41 +0900)] 
Fixed failing test

2 months agoUpdated release guide for SSL related workaround
Makoto Yui [Mon, 5 Nov 2018 08:51:13 +0000 (17:51 +0900)] 
Updated release guide for SSL related workaround

2 months agoAdded missing license header
Makoto Yui [Sat, 3 Nov 2018 07:54:04 +0000 (16:54 +0900)] 
Added missing license header

2 months agoAdded Koji to the Mentor list
Makoto Yui [Sat, 3 Nov 2018 07:46:08 +0000 (16:46 +0900)] 
Added Koji to the Mentor list

2 months agoFixed term vector space tutorial
Makoto Yui [Sat, 3 Nov 2018 07:38:47 +0000 (16:38 +0900)] 
Fixed term vector space tutorial

2 months agoFixed bm25() UDF for help message
Makoto Yui [Sat, 3 Nov 2018 07:38:13 +0000 (16:38 +0900)] 
Fixed bm25() UDF for help message

2 months ago[HIVEMALL-196] Support BM25 scoring
Jackson Huang [Fri, 2 Nov 2018 10:35:13 +0000 (19:35 +0900)] 
[HIVEMALL-196] Support BM25 scoring

## What changes were proposed in this pull request?

Adds the Okapi BM25 scoring function as a UDF.

## What type of PR is it?

Feature

## What is the Jira issue?

https://issues.apache.org/jira/projects/HIVEMALL/issues/HIVEMALL-196

## How was this patch tested?

1. Unit testing
2. Manual testing on Hive

## How to use this feature?
This new `okapi_bm25` function requires 5 mandatory arguments and 2 optional hyperparameters:

1. raw frequency count of a term in a given document
2. length of the given document
3. average length of a document in the corpus
4. number of documents in the corpus
5. number of documents containing the term, i.e. document frequency
6. (*optional*) k1 - a smoothing hyperparameter
7. (*optional*) b - a smoothing hyperparameter

### Step 1: Count frequency of terms
```sql
create or replace view frequency
as
select
  docid,
  word,
  count(*) as freq
from
  test_corpus_exploded
group by
  docid,
  word
;
```

### Step 2: Calculate document lengths
```sql
create or replace view doc_len
as
select
  docid, count(1) as cnt
from
  test_corpus_exploded
group by
  docid
;
```

### Step 3: Calculate document frequency
```sql
create or replace view document_frequency
as
select
  word,
  count(distinct docid) docs
from
  test_corpus_exploded
group by
  word
;
```

### Step 4: Set number of documents
```sql
set hivevar:n_docs=3;
```

### Step 5: Use `okapi_bm25`
```sql
create or replace view bm25
as
with tmp as (
select avg(cnt) as avgdl from doc_len
)
select
  f.docid,
  f.word,
  okapi_bm25(
    CAST(f.freq AS INT),
    dl.cnt,
    CAST(tmp.avgdl AS DOUBLE),
    ${n_docs},
    df.docs,
    '-k1 1.5 -b 0.75'
  ) as score
from frequency     f
JOIN document_frequency df ON (f.word = df.word)
JOIN doc_len            dl ON (f.docid = dl.docid)
CROSS JOIN tmp
ORDER BY
score desc;
```
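
To make the five mandatory arguments and the two hyperparameters concrete, the steps above can be mirrored in a few lines of Python. This is a sketch under stated assumptions, not the UDF's actual code: the IDF term uses the common `ln(1 + (N - df + 0.5) / (df + 0.5))` variant, which may differ from what `okapi_bm25` computes internally, the defaults `k1=1.5` and `b=0.75` simply copy the option string in Step 5, and `score_corpus` is a hypothetical helper.

```python
import math
from collections import Counter


def okapi_bm25(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.5, b=0.75):
    """Score one (document, term) pair; arguments follow the order listed above."""
    # Assumed IDF variant; always positive.
    idf = math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: documents longer than average are penalized.
    norm = 1.0 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1.0) / (tf + k1 * norm)


def score_corpus(docs):
    """`docs` maps docid -> list of words; returns {(docid, word): score}."""
    n_docs = len(docs)                                                    # Step 4
    avg_doc_len = sum(len(words) for words in docs.values()) / n_docs     # Step 2, averaged
    doc_freq = Counter(w for words in docs.values() for w in set(words))  # Step 3
    scores = {}
    for docid, words in docs.items():
        for word, tf in Counter(words).items():                           # Step 1
            scores[(docid, word)] = okapi_bm25(
                tf, len(words), avg_doc_len, n_docs, doc_freq[word]
            )
    return scores
```

With a toy corpus such as `{1: ["a", "b", "a"], 2: ["b", "c"], 3: ["b"]}`, a term that is frequent in one document but rare in the corpus scores higher than a term that appears everywhere.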

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Jackson Huang <huang.j@treasure-data.com>
Author: Makoto Yui <myui@apache.org>

Closes #163 from jaxony/feature/bm25.

2 months ago[HIVEMALL-222] Introduce Gradient Clipping to avoid exploding gradient to General...
Makoto Yui [Wed, 24 Oct 2018 08:20:56 +0000 (17:20 +0900)] 
[HIVEMALL-222] Introduce Gradient Clipping to avoid exploding gradient to General Classifier/Regressor

## What changes were proposed in this pull request?

Avoid [exploding gradients](http://www.cs.toronto.edu/~rgrosse/courses/csc321_2017/readings/L15%20Exploding%20and%20Vanishing%20Gradients.pdf) by gradient clipping (by value)
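
Clipping by value simply clamps each gradient component to a fixed range before the weight update. The following Python sketch shows the idea; the function names and the threshold of `1.0` are hypothetical and do not reflect the option name or default used by the General Classifier/Regressor.

```python
def clip_by_value(gradients, threshold):
    """Clamp every gradient component to the range [-threshold, threshold]."""
    if threshold <= 0.0:
        raise ValueError("threshold must be positive")
    return [max(-threshold, min(threshold, g)) for g in gradients]


def sgd_step(weights, gradients, learning_rate=0.1, threshold=1.0):
    """Apply one plain SGD update using the clipped gradients."""
    clipped = clip_by_value(gradients, threshold)
    return [w - learning_rate * g for w, g in zip(weights, clipped)]
```

However large a single gradient component is, the resulting weight change is bounded by `learning_rate * threshold`, which is what keeps one bad example from blowing up the model.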

## What type of PR is it?

Improvement

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-222

## How was this patch tested?

unit tests

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #169 from myui/clipping.

3 months agoRemoved unnecessary comment
Makoto Yui [Thu, 11 Oct 2018 08:13:50 +0000 (17:13 +0900)] 
Removed unnecessary comment

3 months agoTiny optimization for PassThrough regularization
Makoto Yui [Fri, 21 Sep 2018 07:52:47 +0000 (16:52 +0900)] 
Tiny optimization for PassThrough regularization

3 months agoStatic method should be called static way
Makoto Yui [Tue, 18 Sep 2018 10:51:42 +0000 (19:51 +0900)] 
Static method should be called static way

3 months ago[HIVEMALL-219] Fixed LDA bug for single update
Makoto Yui [Tue, 18 Sep 2018 10:46:18 +0000 (19:46 +0900)] 
[HIVEMALL-219] Fixed LDA bug for single update

## What changes were proposed in this pull request?

Fixed LDA bug for single update and added unit tests

## What type of PR is it?

Bug Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-219

## How was this patch tested?

unit tests and manual tests on EMR

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #166 from myui/HIVEMALL-219-2.

3 months ago[HIVEMALL-219][BUGFIX] Fixed NPE in finalizeTraining()
Makoto Yui [Tue, 18 Sep 2018 09:51:33 +0000 (18:51 +0900)] 
[HIVEMALL-219][BUGFIX] Fixed NPE in finalizeTraining()

## What changes were proposed in this pull request?

Fixed NPE in finalizeTraining() when there are no training examples

## What type of PR is it?

Bug Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-219

## How was this patch tested?

to appear

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #165 from myui/HIVEMALL-219.

3 months agoUpdated mentor list
Makoto Yui [Tue, 18 Sep 2018 05:52:09 +0000 (14:52 +0900)] 
Updated mentor list

4 months ago[HIVEMALL-218] Fixed train_lda NPE where input row is null
Makoto Yui [Fri, 7 Sep 2018 10:19:35 +0000 (19:19 +0900)] 
[HIVEMALL-218] Fixed train_lda NPE where input row is null

## What changes were proposed in this pull request?

Fixed a NegativeArraySizeException in `train_lda` when the input is NULL

## What type of PR is it?

Bug Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-218

## How was this patch tested?

manual tests

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #164 from myui/HIVEMALL-218.

4 months ago[HIVEMALL-217] Resolve missing links for user manual
Aki Ariga [Thu, 6 Sep 2018 09:46:56 +0000 (18:46 +0900)] 
[HIVEMALL-217] Resolve missing links for user manual

## What changes were proposed in this pull request?

Fix missing links and unintended redirects in the documentation.
- Resolve unintended redirects
- Use https instead of http where possible
- Add instructions for the KDD Cup 2012 evaluation code

Known issues that still need to be fixed:
- Changes to the Kaggle documentation removed the good references for [Log Loss](https://www.kaggle.com/wiki/LogarithmicLoss) and [Metrics](https://www.kaggle.com/wiki/Metrics) used by the evaluation page for regression
- Because of the EMR version upgrade, [this link](https://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-ami.html) redirects to an unintended page
- Mix server related documents are outdated: tips/mixserver, tips/hadoop_tuning

## What type of PR is it?

Improvement

## What is the Jira issue?

https://issues.apache.org/jira/projects/HIVEMALL/issues/HIVEMALL-217

## How was this patch tested?

manual tests

Author: Aki Ariga <ariga@treasure-data.com>

Closes #162 from chezou/resolve-missinglink.

4 months ago[HIVEMALL-163] Add IS_INFINITE, IS_FINITE, IS_NAN functions
Aki Ariga [Tue, 4 Sep 2018 06:59:17 +0000 (15:59 +0900)] 
[HIVEMALL-163] Add IS_INFINITE, IS_FINITE, IS_NAN functions

## What changes were proposed in this pull request?

Add Floating point functions: infinity, is_finite, is_infinite, is_nan, nan

## What type of PR is it?

Feature

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-163

## How was this patch tested?

Unit tests

## How to use this feature?

```sql
select is_infinite(infinity());
select is_infinite(1.0);

select is_finite(infinity());
select is_finite(1.0);

select nan();

select is_nan(nan());
select is_nan(10.0);
```
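
For readers who want to check the expected results without a Hive session, the same calls can be mirrored in Python. This sketch assumes the UDFs follow IEEE 754 semantics, as Java's `Double.isInfinite` and `Double.isNaN` do; the actual output of the Hive functions is not shown in this PR description, and the Python wrappers below are only stand-ins that reuse the UDF names.

```python
import math


def infinity():
    return float("inf")


def nan():
    return float("nan")


def is_infinite(x):
    return math.isinf(x)


def is_finite(x):
    # False for both infinities and for NaN.
    return math.isfinite(x)


def is_nan(x):
    return math.isnan(x)


# The predicate calls from the queries above, in the same order.
results = [
    is_infinite(infinity()),
    is_infinite(1.0),
    is_finite(infinity()),
    is_finite(1.0),
    is_nan(nan()),
    is_nan(10.0),
]
```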

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Aki Ariga <ariga@treasure-data.com>

Closes #160 from chezou/HIVEMALL-163.

4 months ago[HIVEMALL-216] Fix Docker image based on openjdk 8
Aki Ariga [Tue, 4 Sep 2018 06:30:14 +0000 (15:30 +0900)] 
[HIVEMALL-216] Fix Docker image based on openjdk 8

## What changes were proposed in this pull request?

This PR fixes building the Docker image from the Dockerfile.

## What type of PR is it?

Bug Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-216

## How was this patch tested?

manual tests

## How to use this feature?

See [documentation](https://hivemall.incubator.apache.org/userguide/docker/getting_started.html)

Author: Aki Ariga <ariga@treasure-data.com>

Closes #161 from chezou/fix-dockerfile.

4 months ago[HIVEMALL-215] Add step-by-step tutorial on Supervised Learning
Aki Ariga [Fri, 31 Aug 2018 06:01:48 +0000 (15:01 +0900)] 
[HIVEMALL-215] Add step-by-step tutorial on Supervised Learning

## What changes were proposed in this pull request?

This PR introduces a step-by-step tutorial.

## What type of PR is it?

Documentation

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-215

Author: Aki Ariga <ariga@treasure-data.com>

Closes #158 from chezou/tutorial.

4 months agoRe-organized logo files
Makoto Yui [Wed, 29 Aug 2018 09:57:02 +0000 (18:57 +0900)] 
Re-organized logo files

4 months agoApplied formatter
Makoto Yui [Tue, 28 Aug 2018 15:48:46 +0000 (00:48 +0900)] 
Applied formatter

4 months ago[HIVEMALL-212] Fix Classifier/Regressor not to forward zero weighted values
Makoto Yui [Tue, 28 Aug 2018 15:42:45 +0000 (00:42 +0900)] 
[HIVEMALL-212] Fix Classifier/Regressor not to forward zero weighted values

## What changes were proposed in this pull request?

Features with weight = 0.0 do not need to be saved in the prediction model, and dropping them reduces the size of the model. This PR therefore changes Classifier/Regressor so that zero-weighted values are not forwarded.
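
The idea can be sketched in Python as follows. This is an illustration only, not the Java code changed by this PR; `forward_model` and `predict` are hypothetical names, and the model is represented as a plain feature-to-weight dictionary.

```python
def forward_model(weights):
    """Yield (feature, weight) pairs, skipping weights that are exactly zero."""
    for feature, weight in weights.items():
        if weight == 0.0:  # also true for -0.0
            continue
        yield feature, weight


def predict(model, features):
    """Dot product of a sparse model with a sparse feature vector."""
    # A feature missing from the model contributes 0.0, exactly like a stored zero weight.
    return sum(model.get(name, 0.0) * value for name, value in features.items())
```

Because a missing feature and a zero-weighted feature contribute the same amount to the dot product, dropping zero weights shrinks the stored model without changing its predictions.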

## What type of PR is it?

Improvement

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-212

## How was this patch tested?

unit tests and manual tests

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #157 from myui/HIVEMALL-212.

4 months ago[HIVEMALL-211][BUGFIX] Fixed Optimizer for regularization updates
Makoto Yui [Fri, 24 Aug 2018 09:44:40 +0000 (18:44 +0900)] 
[HIVEMALL-211][BUGFIX] Fixed Optimizer for regularization updates

## What changes were proposed in this pull request?

This PR fixes a bug in the regularization scheme of Optimizer.

## What type of PR is it?

Bug Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-211

## How was this patch tested?

unit tests, manual tests on EMR

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #156 from myui/HIVEMALL-211.

4 months ago[HIVEMALL-201] Evaluate, fix and document FFM
Takuya Kitazawa [Thu, 23 Aug 2018 11:05:04 +0000 (20:05 +0900)] 
[HIVEMALL-201] Evaluate, fix and document FFM

## What changes were proposed in this pull request?

Applied some refactoring to #149
This PR closes #149

## What type of PR is it?

Hot Fix, Refactoring

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-201

## How was this patch tested?

unit tests, manual tests

## How to use this feature?

Will be published at: http://hivemall.incubator.apache.org/userguide/binaryclass/criteo_ffm.html

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Takuya Kitazawa <k.takuti@gmail.com>
Author: Makoto Yui <myui@apache.org>

Closes #155 from myui/HIVEMALL-201-2.

5 months ago[HIVEMALL-210][BUGFIX] Fix a bug in lda_predict/plsa_predict
Makoto Yui [Mon, 6 Aug 2018 07:42:20 +0000 (16:42 +0900)] 
[HIVEMALL-210][BUGFIX] Fix a bug in lda_predict/plsa_predict

## What changes were proposed in this pull request?

Fixed a bug in lda_predict/plsa_predict where the probability of a duplicated term is [unexpectedly replaced](https://github.com/apache/incubator-hivemall/blame/a8a97d6e873d5a8a30b06f92ddc14d1ec95c2738/core/src/main/java/hivemall/topicmodel/LDAPredictUDAF.java#L396)

## What type of PR is it?

Bug Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-210

## How was this patch tested?

unit tests and manual tests

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #154 from myui/HIVEMALL-210.

6 months ago[HIVEMALL-208] Upgrade to Lucene 5.5.5
iijima_satoshi [Thu, 5 Jul 2018 09:05:45 +0000 (18:05 +0900)] 
[HIVEMALL-208] Upgrade to Lucene 5.5.5

## What changes were proposed in this pull request?
`tokenize_ja` failed to analyze certain Japanese strings.
The cause is LUCENE-7279, which has already been fixed, so Lucene needs to be upgraded.

## What type of PR is it?
Bug Fix

## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-208

## How was this patch tested?
unit tests

## Checklist
- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?

Author: iijima_satoshi <iijima_satoshi@cyberagent.co.jp>

Closes #153 from iijima-satoshi/upgrade-lucene.

6 months ago[HIVEMALL-207] Remove ddl/*.td.hql files maintained for a specific company's use
Takuya Kitazawa [Fri, 22 Jun 2018 02:45:12 +0000 (11:45 +0900)] 
[HIVEMALL-207] Remove ddl/*.td.hql files maintained for a specific company's use

## What changes were proposed in this pull request?

Remove `resources/ddl/*.td.hql` files. This public OSS should not host such company-specific files.

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-207

Author: Takuya Kitazawa <k.takuti@gmail.com>

Closes #152 from takuti/HIVEMALL-207.

7 months agoFixed to include relocated HCatalog in hivemall-all.jar
Makoto Yui [Thu, 14 Jun 2018 05:45:26 +0000 (14:45 +0900)] 
Fixed to include relocated HCatalog in hivemall-all.jar

7 months ago[HIVEMALL-203] Relocated org.codehaus.jackson to hivemall.codehause.jackson in hivema...
Makoto Yui [Mon, 11 Jun 2018 07:22:00 +0000 (16:22 +0900)] 
[HIVEMALL-203] Relocated org.codehaus.jackson to hivemall.codehause.jackson in hivemall-all.jar

## What changes were proposed in this pull request?

Relocated `org.codehaus.jackson` to `hivemall.codehause.jackson` in hivemall-all.jar because Jackson can be missing in some Hadoop/Hive environments

## What type of PR is it?

Improvement

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-203

## How was this patch tested?

manual tests

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Makoto Yui <myui@apache.org>

Closes #151 from myui/relocate_jackson.

7 months ago[HIVEMALL-145] Merge Brickhouse functions
Makoto Yui [Wed, 6 Jun 2018 09:09:17 +0000 (18:09 +0900)] 
[HIVEMALL-145] Merge Brickhouse functions

## What changes were proposed in this pull request?

Merge [brickhouse](https://github.com/klout/brickhouse) functions.

## What type of PR is it?

Feature

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-145

## How was this patch tested?

unit tests and manual tests

## How to use this feature?

as described in [user guide](http://hivemall.incubator.apache.org/userguide/misc/generic_funcs.html).

## Checklist

- [x] Did you apply source code formatter, i.e., `mvn formatter:format`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?
- [x] Invite active/main Brickhouse developers as Hivemall PPMC members or committers.
https://github.com/klout/brickhouse/issues/149
- [x] +1 from Klout members to merge

Author: Makoto Yui <myui@apache.org>

Closes #135 from myui/merge_brickhouse.

7 months agoupdate conv.awk location
nono [Thu, 24 May 2018 09:58:17 +0000 (18:58 +0900)] 
update conv.awk location

http 404 before correction

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

Author: nono <aschor@users.noreply.github.com>

Closes #150 from aschor/patch-1.

7 months agoUpdate GitHub PR template for code formatter
Takuya Kitazawa [Thu, 24 May 2018 00:34:45 +0000 (09:34 +0900)] 
Update GitHub PR template for code formatter

8 months agoRequest contributors to use ./bin/format_code.sh
Takuya Kitazawa [Wed, 16 May 2018 08:51:36 +0000 (17:51 +0900)] 
Request contributors to use ./bin/format_code.sh

8 months agoUpdate README according to the change of formatter
Takuya Kitazawa [Wed, 16 May 2018 08:43:18 +0000 (17:43 +0900)] 
Update README according to the change of formatter

8 months agoRemoved unnecessary entry
Makoto Yui [Sat, 28 Apr 2018 01:52:05 +0000 (10:52 +0900)] 
Removed unnecessary entry

8 months agoRefactored to remove IDE warnings
Makoto Yui [Sat, 28 Apr 2018 01:36:10 +0000 (10:36 +0900)] 
Refactored to remove IDE warnings

8 months agoApplied formatter
Makoto Yui [Fri, 27 Apr 2018 06:36:44 +0000 (15:36 +0900)] 
Applied formatter

8 months agoChanged to use spotless-maven-plugin for the formatter
Makoto Yui [Fri, 27 Apr 2018 06:28:30 +0000 (15:28 +0900)] 
Changed to use spotless-maven-plugin for the formatter

8 months agoReverted formatter-maven-plugin version to 0.5.2
Makoto Yui [Fri, 27 Apr 2018 03:20:54 +0000 (12:20 +0900)] 
Reverted formatter-maven-plugin version to 0.5.2

8 months agoRemoved warnings about duplicate entry
Makoto Yui [Fri, 27 Apr 2018 02:52:06 +0000 (11:52 +0900)] 
Removed warnings about duplicate entry

8 months agoFixed formatting scheme for multi-module project
Makoto Yui [Thu, 26 Apr 2018 16:14:15 +0000 (01:14 +0900)] 
Fixed formatting scheme for multi-module project

8 months agoApplied refactoring for #145
Makoto Yui [Thu, 26 Apr 2018 06:46:07 +0000 (15:46 +0900)] 
Applied refactoring for #145

8 months agoRenamed a unit test class name
Makoto Yui [Thu, 26 Apr 2018 05:52:53 +0000 (14:52 +0900)] 
Renamed a unit test class name

8 months agoMake size of incubator logo smaller
Takuya Kitazawa [Thu, 26 Apr 2018 02:52:06 +0000 (11:52 +0900)] 
Make size of incubator logo smaller

8 months agoFix incubator logo at the bottom of site
Takuya Kitazawa [Thu, 26 Apr 2018 02:38:46 +0000 (11:38 +0900)] 
Fix incubator logo at the bottom of site

8 months ago[HIVEMALL-191] Add Kryo serialization test to existing workaround code
Takuya Kitazawa [Wed, 25 Apr 2018 08:04:02 +0000 (17:04 +0900)] 
[HIVEMALL-191] Add Kryo serialization test to existing workaround code

## What changes were proposed in this pull request?

Add Kryo serialization test to existing workaround code as: https://github.com/apache/incubator-hivemall/commit/f6765dff7be67e1a3327709bbb9bfdc6eba7b97f

To be more precise, two UDFs, `quantified_features` and `tokenize_ja`, currently carry the lazy-instantiation workaround code explicitly, so this PR makes their `transient` keyword unnecessary.

## What type of PR is it?

Improvement

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-191

## How was this patch tested?

Added some unit tests, and manually tested as well

## Checklist

- [x] Did you apply source code formatter, i.e., `mvn formatter:format`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?

Author: Takuya Kitazawa <k.takuti@gmail.com>

Closes #145 from takuti/HIVEMALL-191.

8 months ago[HIVEMALL-193] Implement a tool for generating a list of Hivemall UDFs
Takuya Kitazawa [Wed, 25 Apr 2018 07:46:01 +0000 (16:46 +0900)] 
[HIVEMALL-193] Implement a tool for generating a list of Hivemall UDFs

## What changes were proposed in this pull request?

Automatically generate a list of UDFs for:

- https://hivemall.incubator.apache.org/userguide/misc/funcs.html
- https://hivemall.incubator.apache.org/userguide/misc/generic_funcs.html

Initial mock implementation: https://github.com/takuti/hivemalldoc

## What type of PR is it?

Improvement, Documentation

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-193

## How was this patch tested?

See output: https://gist.github.com/takuti/312d3a11bf85fc4044399d7e97a06f13

## How to use this feature?

```
$ mvn clean package -DskipTests=true -Dmaven.test.skip=true
$ mvn org.apache.hivemall:hivemall-docs:generate-funcs-list
```

## Checklist

- [x] Did you apply source code formatter, i.e., `mvn formatter:format`, for your commit?

Author: Takuya Kitazawa <k.takuti@gmail.com>

Closes #148 from takuti/HIVEMALL-193.

8 months ago[HIVEMALL-197] Update Apache incubator logo
Takuya Kitazawa [Fri, 20 Apr 2018 07:04:56 +0000 (16:04 +0900)] 
[HIVEMALL-197] Update Apache incubator logo

## What changes were proposed in this pull request?

* Avoid direct link to ASF's file
* Update Apache incubator logo

## What type of PR is it?

Documentation

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-197

Author: Takuya Kitazawa <k.takuti@gmail.com>

Closes #147 from takuti/HIVEMALL-197.

8 months agoApplied refactoring
Makoto Yui [Thu, 19 Apr 2018 07:20:27 +0000 (16:20 +0900)] 
Applied refactoring

8 months ago[HIVEMALL-192] Fix typos: graphvis -> graphviz
Takuya Kitazawa [Wed, 18 Apr 2018 08:27:01 +0000 (17:27 +0900)] 
[HIVEMALL-192] Fix typos: graphvis -> graphviz

## What changes were proposed in this pull request?

Fix crucial typos.

## What type of PR is it?

Improvement

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-192

## Checklist

- [x] Did you apply source code formatter, i.e., `mvn formatter:format`, for your commit?

Author: Takuya Kitazawa <k.takuti@gmail.com>

Closes #146 from takuti/HIVEMALL-192.

9 months agoFormat
Takuya Kitazawa [Tue, 17 Apr 2018 02:28:53 +0000 (11:28 +0900)] 
Format

9 months agoUpdated randomforest hyperparameter description
Makoto Yui [Tue, 17 Apr 2018 02:05:01 +0000 (11:05 +0900)] 
Updated randomforest hyperparameter description

9 months agoUpdated description of RandomForest classifier option
Makoto Yui [Tue, 17 Apr 2018 01:55:12 +0000 (10:55 +0900)] 
Updated description of RandomForest classifier option

9 months agoFix typo in documentation: Whether, wheather -> weather
Takuya Kitazawa [Tue, 17 Apr 2018 00:42:47 +0000 (09:42 +0900)] 
Fix typo in documentation: Whether, wheather -> weather

9 months agoRemoved a link in user guide
Makoto Yui [Mon, 16 Apr 2018 14:12:54 +0000 (23:12 +0900)] 
Removed a link in user guide

9 months ago[HIVEMALL-189] Create a list of all functions
Takuya Kitazawa [Mon, 16 Apr 2018 13:46:36 +0000 (22:46 +0900)] 
[HIVEMALL-189] Create a list of all functions

## What changes were proposed in this pull request?

Create a list of all functions in the documentation. To keep maintenance simple, the list is generated automatically by reading the `Description` annotations in the code: [takuti/hivemalldoc](https://github.com/takuti/hivemalldoc).

If this list turns out to be insufficient, let's update the `Description` annotations themselves to make the code more informative.
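
As a rough illustration of generating a function list from annotations, the sketch below reads a class-level annotation via reflection and prints one Markdown bullet per function. It is not the hivemalldoc implementation: the nested `Description` annotation is a local stand-in for Hive's annotation, the example UDF classes are hypothetical, and the `_FUNC_` placeholder substitution is assumed to follow Hive's convention.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the actual hivemalldoc code.
class FuncListSketch {

    // Local stand-in for the real annotation so the example is self-contained.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Description {
        String name();
        String value();
    }

    // Hypothetical annotated classes standing in for UDF implementations.
    @Description(name = "example_add", value = "_FUNC_(a, b) - Returns a + b")
    static class ExampleAddUDF {}

    @Description(name = "example_len", value = "_FUNC_(s) - Returns the length of s")
    static class ExampleLenUDF {}

    // A class without the annotation is skipped.
    static class NotAUdf {}

    // Builds one Markdown bullet per annotated class, in the order given.
    static List<String> describe(Class<?>... classes) {
        List<String> lines = new ArrayList<>();
        for (Class<?> c : classes) {
            Description d = c.getAnnotation(Description.class);
            if (d == null) {
                continue;
            }
            // Assumed convention: _FUNC_ is replaced by the function name.
            lines.add("- `" + d.value().replace("_FUNC_", d.name()) + "`");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe(ExampleAddUDF.class, NotAUdf.class, ExampleLenUDF.class)) {
            System.out.println(line);
        }
    }
}
```

Because the list is derived from the annotations, improving a function's entry means editing its annotation rather than the generated page.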

## What type of PR is it?

Documentation

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-189

Author: Takuya Kitazawa <k.takuti@gmail.com>

Closes #143 from takuti/HIVEMALL-189.

9 months agoMoved a file used in a test to test/resources
Makoto Yui [Fri, 13 Apr 2018 07:50:05 +0000 (16:50 +0900)] 
Moved a file used in a test to test/resources

9 months agoClose #144: [HIVEMALL-190][HOTFIX] Fixed a bug in tree_predict_v1 on loading old...
Makoto Yui [Fri, 13 Apr 2018 07:20:09 +0000 (16:20 +0900)] 
Close #144: [HIVEMALL-190][HOTFIX] Fixed a bug in tree_predict_v1 on loading old prediction models

9 months agoFix typos
Takuya Kitazawa [Fri, 13 Apr 2018 06:41:16 +0000 (15:41 +0900)] 
Fix typos

9 months agoFixed array_concat to return List<Writable>
Makoto Yui [Thu, 12 Apr 2018 06:13:28 +0000 (15:13 +0900)] 
Fixed array_concat to return List<Writable>

9 months agoRemove deprecated sha1 UDF from define-udfs.td.hql
Takuya Kitazawa [Thu, 12 Apr 2018 00:00:27 +0000 (09:00 +0900)] 
Remove deprecated sha1 UDF from define-udfs.td.hql

9 months agoFixed a Kryo serialization error in select_k_best UDF
Makoto Yui [Wed, 11 Apr 2018 05:53:37 +0000 (14:53 +0900)] 
Fixed a Kryo serialization error in select_k_best UDF

9 months agoFix typo in define-udfs.td.hql
Takuya Kitazawa [Tue, 10 Apr 2018 07:14:03 +0000 (16:14 +0900)] 
Fix typo in define-udfs.td.hql

9 months ago[HIVEMALL-188] Avoid KryoException: java.lang.NullPointerException
Takuya Kitazawa [Tue, 10 Apr 2018 05:16:13 +0000 (14:16 +0900)] 
[HIVEMALL-188] Avoid KryoException: java.lang.NullPointerException

## What changes were proposed in this pull request?

Fix a bug in `tokenize_ja` that occasionally raises `KryoException: java.lang.NullPointerException`

## What type of PR is it?

Bug Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-188

## How was this patch tested?

Manual tests

## Checklist

- [x] Did you apply source code formatter, i.e., `mvn formatter:format`, for your commit?

Author: Takuya Kitazawa <k.takuti@gmail.com>

Closes #142 from takuti/HIVEMALL-188.

9 months agoMerged brickhouse functions #135
Makoto Yui [Mon, 9 Apr 2018 07:04:37 +0000 (16:04 +0900)] 
Merged brickhouse functions #135

9 months ago[HIVEMALL-117][SPARK] Update the installation guide for Spark
Takeshi Yamamuro [Thu, 5 Apr 2018 06:24:55 +0000 (15:24 +0900)] 
[HIVEMALL-117][SPARK] Update the installation guide for Spark

## What changes were proposed in this pull request?
This PR updates the installation guide for Spark.

## What type of PR is it?
Documentation

## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-117

## How was this patch tested?
N/A

Author: Takeshi Yamamuro <yamamuro@apache.org>

Closes #141 from maropu/HIVEMALL-117.

9 months ago[HIVEMALL-180][SPARK] Drop the Spark-2.0 support
Takeshi Yamamuro [Mon, 2 Apr 2018 05:19:30 +0000 (14:19 +0900)] 
[HIVEMALL-180][SPARK] Drop the Spark-2.0 support

## What changes were proposed in this pull request?
This PR drops the module for Spark-2.0.

## What type of PR is it?
Improvement

## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-180

## How was this patch tested?
Existing tests

Author: Takeshi Yamamuro <yamamuro@apache.org>

Closes #138 from maropu/HIVEMALL-180.

9 months agoChange the organization of nzw
Kento NOZAWA [Mon, 2 Apr 2018 05:17:07 +0000 (14:17 +0900)] 
Change the organization of nzw

## What changes were proposed in this pull request?

Update Kento Nozawa's organization.

## What type of PR is it?

[Documentation]

## What is the Jira issue?

This is a minor fix for the release, so I have not created a Jira ticket yet. Let me know if I should create one.

## How was this patch tested?

Compiled the source code to confirm that this change does not break the structure of pom.xml.

## Checklist

- [x] Did you apply source code formatter, i.e., `mvn formatter:format`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

Author: Kento NOZAWA <k_nzw@klis.tsukuba.ac.jp>

Closes #140 from nzw0301/change-nzw-org.