parquet-format.git
10 days ago PARQUET-1462: Allow specifying new development version in prepare-release.sh (#116) master
Zoltan Ivanfi [Tue, 4 Dec 2018 13:28:10 +0000 (14:28 +0100)] 
PARQUET-1462: Allow specifying new development version in prepare-release.sh (#116)

Before this change, prepare-release.sh took only the release version as a
parameter; the new development version was requested interactively for each
individual pom.xml file, which made the release process tedious.

6 weeks ago PARQUET-1437: Misleading comment in parquet.thrift (#115)
Zoltan Ivanfi [Tue, 30 Oct 2018 09:32:36 +0000 (10:32 +0100)] 
PARQUET-1437: Misleading comment in parquet.thrift (#115)

The documentation for list<ColumnOrder> column_orders stated that "Each
sort order corresponds to one column, determined by its position in the
list, matching the position of the column in the schema."

However, in reality, while the order of elements in these two
lists (schema and sort order) is the same, only leaf nodes are
represented in the list of sort orders, so the positions do not match.
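
The mismatch can be illustrated with a minimal sketch. The schema below is hypothetical: it assumes a flattened, depth-first list of (name, number of children) pairs, where a node with no children is a leaf column.

```python
# Hypothetical flattened schema in depth-first order: (name, num_children).
# A node with zero children is a leaf, i.e. an actual column.
schema = [
    ("root", 2),
    ("id", 0),
    ("address", 2),
    ("street", 0),
    ("city", 0),
]


def leaf_positions(schema):
    """Return the schema positions of the leaf nodes, in schema order."""
    return [i for i, (_, num_children) in enumerate(schema) if num_children == 0]


leaves = leaf_positions(schema)

# The index into column_orders counts leaves only, so it differs from the
# position of the same column in the schema list.
for order_index, schema_index in enumerate(leaves):
    name = schema[schema_index][0]
    print(f"column_orders[{order_index}] -> schema[{schema_index}] ({name})")
```

Both lists keep the same relative order, but the sort-order index skips the group nodes `root` and `address`.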

2 months ago PARQUET-41: Add Bloom filter (#112)
Chen, Junjie [Fri, 12 Oct 2018 00:55:08 +0000 (08:55 +0800)] 
PARQUET-41: Add Bloom filter (#112)

* PARQUET-41: Add Bloom filter

* Grammar and structure tweaking for Bloom filter prose.

2 months ago PARQUET-1433: Parquet-format doesn't compile with Thrift 0.10.0 (#111)
nandorKollar [Fri, 5 Oct 2018 11:24:57 +0000 (13:24 +0200)] 
PARQUET-1433: Parquet-format doesn't compile with Thrift 0.10.0 (#111)

2 months ago [maven-release-plugin] prepare for next development iteration
Nandor Kollar [Thu, 27 Sep 2018 14:31:24 +0000 (16:31 +0200)] 
[maven-release-plugin] prepare for next development iteration

2 months ago [maven-release-plugin] prepare release apache-parquet-format-2.6.0 apache-parquet-format-2.6.0
Nandor Kollar [Thu, 27 Sep 2018 14:31:14 +0000 (16:31 +0200)] 
[maven-release-plugin] prepare release apache-parquet-format-2.6.0

2 months ago PARQUET-1424: Update CHANGES.md
Nandor Kollar [Thu, 27 Sep 2018 13:38:21 +0000 (15:38 +0200)] 
PARQUET-1424: Update CHANGES.md

2 months ago PARQUET-1429: Turn off DocLint on parquet-format (#108)
nandorKollar [Thu, 27 Sep 2018 14:22:38 +0000 (16:22 +0200)] 
PARQUET-1429: Turn off DocLint on parquet-format (#108)

The code generated by Thrift had several issues found by DocLint, which caused the attach-javadocs goal to fail when using Java 8.

2 months ago PARQUET-1428: Move columnar encryption into its feature branch 107/head
Nandor Kollar [Wed, 26 Sep 2018 11:49:14 +0000 (13:49 +0200)] 
PARQUET-1428: Move columnar encryption into its feature branch

Revert "PARQUET-1227: Thrift crypto metadata structures (#94)"

This reverts commit 518e206c3e6586b76e8315d5f62a8666ed62fa90.

2 months ago PARQUET-1428: Move columnar encryption into its feature branch
Nandor Kollar [Wed, 26 Sep 2018 11:49:07 +0000 (13:49 +0200)] 
PARQUET-1428: Move columnar encryption into its feature branch

Revert "PARQUET-1398: move iv_prefix to Algorithms (#103)"

This reverts commit c4a4ef22c99435ae069eb41e2977844e57dcfc37.

2 months ago PARQUET-1428: Move columnar encryption into its feature branch
Nandor Kollar [Wed, 26 Sep 2018 11:48:58 +0000 (13:48 +0200)] 
PARQUET-1428: Move columnar encryption into its feature branch

Revert "PARQUET-1401: optional RowGroup fields for handling hidden columns (#104)"

This reverts commit 677ed8ea23c60e5e42a4c537a454544884525593.

2 months ago PARQUET-1400: Deprecate parquet-mr related code in parquet-format (#105)
Gabor Szadovszky [Mon, 24 Sep 2018 12:16:42 +0000 (14:16 +0200)] 
PARQUET-1400: Deprecate parquet-mr related code in parquet-format (#105)

3 months ago PARQUET-1387: Nanosecond precision time and timestamp - parquet-format (#102)
nandorKollar [Tue, 28 Aug 2018 12:57:19 +0000 (14:57 +0200)] 
PARQUET-1387: Nanosecond precision time and timestamp - parquet-format (#102)

3 months ago PARQUET-1401: optional RowGroup fields for handling hidden columns (#104)
ggershinsky [Tue, 28 Aug 2018 12:56:59 +0000 (15:56 +0300)] 
PARQUET-1401: optional RowGroup fields for handling hidden columns (#104)

3 months ago PARQUET-1398: move iv_prefix to Algorithms (#103)
ggershinsky [Tue, 28 Aug 2018 12:56:23 +0000 (15:56 +0300)] 
PARQUET-1398: move iv_prefix to Algorithms (#103)

4 months ago PARQUET-1227: Thrift crypto metadata structures (#94)
ggershinsky [Mon, 23 Jul 2018 12:48:06 +0000 (15:48 +0300)] 
PARQUET-1227: Thrift crypto metadata structures (#94)

New Thrift structures for Parquet modular encryption.

4 months ago PARQUET-1351: Fix Travis builds by using trusty without thrift NodeJS and PHP (#100)
nandorKollar [Wed, 18 Jul 2018 17:01:12 +0000 (19:01 +0200)] 
PARQUET-1351: Fix Travis builds by using trusty without thrift NodeJS and PHP (#100)

* Use trusty image for Travis CI
* Compile Thrift without NodeJS and PHP. These do not appear to be present in the Travis VM and are not needed for Parquet.

5 months ago PARQUET-1312: Improve logical types documentation (#98)
nandorKollar [Mon, 25 Jun 2018 06:27:55 +0000 (08:27 +0200)] 
PARQUET-1312: Improve logical types documentation (#98)

7 months ago PARQUET-1266: LogicalTypes union in parquet-format doesn't include UUID
Nandor Kollar [Thu, 5 Apr 2018 08:01:42 +0000 (10:01 +0200)] 
PARQUET-1266: LogicalTypes union in parquet-format doesn't include UUID

7 months ago PARQUET-1294: Update release scripts for the new Apache policy
Gabor Szadovszky [Thu, 10 May 2018 14:22:24 +0000 (16:22 +0200)] 
PARQUET-1294: Update release scripts for the new Apache policy

7 months ago PARQUET-1290: clarify run lengths for RLE encoding (#96)
Tim Armstrong [Mon, 7 May 2018 16:51:05 +0000 (09:51 -0700)] 
PARQUET-1290: clarify run lengths for RLE encoding (#96)

8 months ago [maven-release-plugin] prepare for next development iteration
Zoltan Ivanfi [Thu, 29 Mar 2018 13:47:20 +0000 (15:47 +0200)] 
[maven-release-plugin] prepare for next development iteration

8 months ago [maven-release-plugin] prepare release apache-parquet-format-2.5.0 apache-parquet-format-2.5.0
Zoltan Ivanfi [Thu, 29 Mar 2018 13:47:01 +0000 (15:47 +0200)] 
[maven-release-plugin] prepare release apache-parquet-format-2.5.0

8 months ago Revert "[maven-release-plugin] prepare release apache-parquet-format-2.5.0"
Zoltan Ivanfi [Thu, 29 Mar 2018 13:44:08 +0000 (15:44 +0200)] 
Revert "[maven-release-plugin] prepare release apache-parquet-format-2.5.0"

This reverts commit a5b842613309a60b59d07af5d02a76c00e9ef2ac.

8 months ago [maven-release-plugin] prepare release apache-parquet-format-2.5.0
Zoltan Ivanfi [Thu, 29 Mar 2018 13:24:10 +0000 (15:24 +0200)] 
[maven-release-plugin] prepare release apache-parquet-format-2.5.0

8 months ago PARQUET-1234: Update CHANGES.md.
Gabor Szadovszky [Mon, 26 Mar 2018 13:27:06 +0000 (15:27 +0200)] 
PARQUET-1234: Update CHANGES.md.

8 months ago PARQUET-1260: Add Zoltan Ivanfi's code signing key to the KEYS file (#91)
Zoltan Ivanfi [Thu, 29 Mar 2018 13:22:22 +0000 (15:22 +0200)] 
PARQUET-1260: Add Zoltan Ivanfi's code signing key to the KEYS file (#91)

8 months ago PARQUET-1258: Update scm developer connection to github (#90)
Gabor Szadovszky [Wed, 28 Mar 2018 13:57:37 +0000 (15:57 +0200)] 
PARQUET-1258: Update scm developer connection to github (#90)

After moving to gitbox the old apache repo is not working anymore.
The pom.xml had to be updated accordingly.

8 months ago PARQUET-1251: Clarify ambiguous min/max stats for FLOAT/DOUBLE (#88)
Gabor Szadovszky [Mon, 26 Mar 2018 13:00:04 +0000 (15:00 +0200)] 
PARQUET-1251: Clarify ambiguous min/max stats for FLOAT/DOUBLE (#88)

Describe handling of the ambiguous min/max statistics for FLOAT/DOUBLE.

8 months ago PARQUET-1242: parquet.thrift refers to wrong releases for the new compressions
Zoltan Ivanfi [Fri, 23 Mar 2018 13:55:52 +0000 (14:55 +0100)] 
PARQUET-1242: parquet.thrift refers to wrong releases for the new compressions

Author: Zoltan Ivanfi <zi@cloudera.com>

Closes #87 from zivanfi/PARQUET-1242 and squashes the following commits:

33cb102 [Zoltan Ivanfi] PARQUET-1242: parquet.thrift refers to wrong releases for the new compressions

8 months ago Merge pull request #89 from timarmstrong/master
Lars Volker [Fri, 23 Mar 2018 00:00:06 +0000 (17:00 -0700)] 
Merge pull request #89 from timarmstrong/master

Update Encodings.md with RLE_DICTIONARY

8 months ago Merge pull request #86 from lekv/p323
Lars Volker [Thu, 22 Mar 2018 22:24:24 +0000 (15:24 -0700)] 
Merge pull request #86 from lekv/p323

PARQUET-323: Mark INT96 as deprecated

8 months ago Update Encodings.md with RLE_DICTIONARY 89/head
Tim Armstrong [Thu, 22 Mar 2018 21:40:47 +0000 (14:40 -0700)] 
Update Encodings.md with RLE_DICTIONARY

RLE_DICTIONARY is never mentioned in Encodings.md yet is the recommended
enum value to use in Parquet 2.0.

8 months ago PARQUET-1236: Align version of slf4j-api
1028332163 [Wed, 21 Mar 2018 15:26:58 +0000 (16:26 +0100)] 
PARQUET-1236: Align version of slf4j-api

https://issues.apache.org/jira/browse/PARQUET-1236

Author: 1028332163 <1028332163@qq.com>

Closes #85 from PandaMonkey/master and squashes the following commits:

158f082 [1028332163] align version of slf4j-api

9 months ago PARQUET-323: Mark INT96 as deprecated 86/head
Lars Volker [Tue, 13 Mar 2018 00:33:30 +0000 (17:33 -0700)] 
PARQUET-323: Mark INT96 as deprecated

Closes #49

10 months ago PARQUET-1201: Implement page indexes
Gabor Szadovszky [Tue, 13 Feb 2018 16:08:44 +0000 (17:08 +0100)] 
PARQUET-1201: Implement page indexes

Added helper methods to read/write ColumnIndex and OffsetIndex objects.

Author: Gabor Szadovszky <gabor.szadovszky@cloudera.com>

Closes #81 from gszadovszky/PARQUET-1201 and squashes the following commits:

573dada [Gabor Szadovszky] PARQUET-1201: Implement page indexes

10 months ago PARQUET-1197: Log rat failures
Gabor Szadovszky [Thu, 18 Jan 2018 16:05:11 +0000 (17:05 +0100)] 
PARQUET-1197: Log rat failures

Author: Gabor Szadovszky <gabor.szadovszky@cloudera.com>

Closes #80 from gszadovszky/PARQUET-1197 and squashes the following commits:

c97db9d [Gabor Szadovszky] PARQUET-1197: Log rat failures

11 months ago PARQUET-1065: Deprecate type-defined sort ordering for INT96 type
Zoltan Ivanfi [Thu, 11 Jan 2018 14:08:45 +0000 (15:08 +0100)] 
PARQUET-1065: Deprecate type-defined sort ordering for INT96 type

Author: Zoltan Ivanfi <zi@cloudera.com>

Closes #77 from zivanfi/PARQUET-1065 and squashes the following commits:

b5a2117 [Zoltan Ivanfi] PARQUET-1065: Deprecate type-defined sort ordering for INT96 type

11 months ago PARQUET-1171: Clarify scope of usage for RLE, BIT_PACKED encodings
Wes McKinney [Wed, 10 Jan 2018 03:04:57 +0000 (22:04 -0500)] 
PARQUET-1171: Clarify scope of usage for RLE, BIT_PACKED encodings

See related discussions on mailing list, JIRA

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #79 from wesm/PARQUET-1171 and squashes the following commits:

185348e [Wes McKinney] Fix typo
f29b38c [Wes McKinney] Add notes to indicate scope of usage for RLE, BIT_PACKED encodings

11 months ago PARQUET-1064: Deprecate type-defined sort ordering for INTERVAL type.
Zoltan Ivanfi [Tue, 9 Jan 2018 14:48:00 +0000 (15:48 +0100)] 
PARQUET-1064: Deprecate type-defined sort ordering for INTERVAL type.

Author: Zoltan Ivanfi <zi@cloudera.com>

Closes #76 from zivanfi/PARQUET-1064 and squashes the following commits:

0ff7b14 [Zoltan Ivanfi] PARQUET-1064: Fixed typo.
5599951 [Zoltan Ivanfi] PARQUET-1064: Deprecate type-defined sort ordering for INTERVAL type.

11 months ago PARQUET-1156: Address dev/merge_parquet_pr.py problems.
Zoltan Ivanfi [Tue, 9 Jan 2018 14:44:48 +0000 (15:44 +0100)] 
PARQUET-1156: Address dev/merge_parquet_pr.py problems.

Identical to my change in parquet-mr, which already got approved and merged.

Author: Zoltan Ivanfi <zi@cloudera.com>

Closes #78 from zivanfi/PARQUET-1156 and squashes the following commits:

518faef [Zoltan Ivanfi] PARQUET-1156: Address dev/merge_parquet_pr.py problems.

13 months ago PARQUET-1145: Add license to .gitignore
Lars Volker [Mon, 13 Nov 2017 12:56:08 +0000 (13:56 +0100)] 
PARQUET-1145: Add license to .gitignore

Also removes .gitignore from the RAT whitelist.

Author: Lars Volker <lv@cloudera.com>

Closes #75 from lekv/license and squashes the following commits:

04523ef [Lars Volker] Also add license to .travis.yml
ce471fd [Lars Volker] PARQUET-1145: Add license to .gitignore

13 months ago [maven-release-plugin] prepare for next development iteration
Ryan Blue [Tue, 17 Oct 2017 19:25:34 +0000 (12:25 -0700)] 
[maven-release-plugin] prepare for next development iteration

13 months ago [maven-release-plugin] prepare release apache-parquet-format-2.4.0 apache-parquet-format-2.4.0
Ryan Blue [Tue, 17 Oct 2017 19:25:18 +0000 (12:25 -0700)] 
[maven-release-plugin] prepare release apache-parquet-format-2.4.0

13 months ago PARQUET-1144: Remove slf4j-nop.
Ryan Blue [Tue, 17 Oct 2017 19:21:05 +0000 (12:21 -0700)] 
PARQUET-1144: Remove slf4j-nop.

Author: Ryan Blue <blue@apache.org>

Closes #74 from rdblue/PARQUET-1144-remove-slf4j-nop and squashes the following commits:

d5d5639 [Ryan Blue] PARQUET-1144: Remove slf4j-nop.

13 months ago [maven-release-plugin] prepare for next development iteration
Ryan Blue [Tue, 17 Oct 2017 00:07:13 +0000 (17:07 -0700)] 
[maven-release-plugin] prepare for next development iteration

13 months ago [maven-release-plugin] prepare release apache-parquet-format-2.4.0
Ryan Blue [Tue, 17 Oct 2017 00:06:58 +0000 (17:06 -0700)] 
[maven-release-plugin] prepare release apache-parquet-format-2.4.0

13 months ago PARQUET-1134: Update CHANGES.md.
Ryan Blue [Tue, 17 Oct 2017 00:01:33 +0000 (17:01 -0700)] 
PARQUET-1134: Update CHANGES.md.

Also cleaning up old PRs:
Closes #37

13 months ago PARQUET-922: Add column indexes to parquet.thrift
Lars Volker [Mon, 16 Oct 2017 23:47:12 +0000 (16:47 -0700)] 
PARQUET-922: Add column indexes to parquet.thrift

I moved the design doc to a .md file and addressed the first round of review comments.

closes #63

This is based on work done by @mkornacker and @lekv who wrote the initial proposal and @poojanilangekar who evolved the design, wrote a prototypical implementation, and evaluated its performance.

Author: Lars Volker <lv@cloudera.com>
Author: poojanilangekar <nilangekar.pooja@gmail.com>
Author: Lars Volker <lvolker@gmail.com>

Closes #72 from lekv/index and squashes the following commits:

babb356 [Lars Volker] Address comments from Marcel and Zoltan.
6897c2b [Lars Volker] Address Marcel's comments.
bbb3670 [Lars Volker] Reinstate PageIndex.md
ebcb33f [Lars Volker] Revert "Extend comments in parquet.thrift, remove PageIndex.md"
877e14c [Lars Volker] Revert "Remove picture"
5df2bbc [Lars Volker] Remove picture
a39bf49 [Lars Volker] Extend comments in parquet.thrift, remove PageIndex.md
9ea100a [Lars Volker] Address comments from Zoltan.
9f79d72 [Lars Volker] Merge branch 'master' into index
5e8ea1c [Lars Volker] Fix Typo
da6f648 [Lars Volker] Addressing more comments
8541da7 [Lars Volker] Addressing review comments from the Parquet sync meeting
8e3c533 [Lars Volker] More review comments
109b20d [Lars Volker] Address more review comments, clarify the description of ColumnIndex
f5bfe55 [Lars Volker] Address review comments on parquet.thrift.
700cc00 [Lars Volker] PARQUET-922: Add documentation on page indexes
f983794 [poojanilangekar] PARQUET-922: ColumnIndex Layout to Support Page Skipping

14 months ago PARQUET-1136: Fix path to parquet.thrift in Makefile
Lars Volker [Thu, 12 Oct 2017 16:02:58 +0000 (09:02 -0700)] 
PARQUET-1136: Fix path to parquet.thrift in Makefile

Author: Lars Volker <lv@cloudera.com>

Closes #73 from lekv/makefile and squashes the following commits:

f6c5569 [Lars Volker] PARQUET-1136: Fix path to parquet.thrift in Makefile

14 months ago PARQUET-1031: Fix spelling errors, whitespace, GitHub urls
Fabrizio (Misto) Milo [Wed, 11 Oct 2017 15:53:21 +0000 (08:53 -0700)] 
PARQUET-1031: Fix spelling errors, whitespace, GitHub urls

Rebased pull request https://github.com/apache/parquet-format/pull/39, fixed a minor spelling mistake and the travis-ci URLs (which also pointed to the Parquet/parquet-format one).

@Mistobaan please let me know if you would like to reclaim the original pull request.

Author: Fabrizio (Misto) Milo <mistobaan@gmail.com>
Author: Anna Szonyi <szonyi@cloudera.com>

Closes #59 from commanderofthegrey/parquet-1031 and squashes the following commits:

e61c3b5 [Anna Szonyi] add back uncompressed_page_size
1cb8163 [Anna Szonyi] PARQUET-1031: Fix spelling errors, whitespace, GitHub urls
67f0064 [Fabrizio (Misto) Milo] explicit that the length has no sign
e901ded [Fabrizio (Misto) Milo] fix misspells
ceda268 [Fabrizio (Misto) Milo] remove spaces
e1f9479 [Fabrizio (Misto) Milo] fix mispell

14 months ago PARQUET-1124: Add LZ4 and Zstd compression codecs.
Ryan Blue [Tue, 10 Oct 2017 19:55:27 +0000 (12:55 -0700)] 
PARQUET-1124: Add LZ4 and Zstd compression codecs.

This adds LZ4 and Zstd compression codecs to the format spec. From recent tests, Zstd appears to out-perform other codecs (including brotli on reads). LZ4 is widely available because it is built into Hadoop, making it a good successor to snappy for fast compression and decompression when speed is more important than compression ratio.

Author: Ryan Blue <blue@apache.org>

Closes #70 from rdblue/PARQUET-1124-add-compression-codecs and squashes the following commits:

939328e [Ryan Blue] PARQUET-1124: Add warning about external codec dependencies.
affad3d [Ryan Blue] PARQUET-1124: Add lz4 and zstd compression codecs.

14 months ago PARQUET-1125: Add UUID logical type.
Ryan Blue [Tue, 10 Oct 2017 19:53:19 +0000 (12:53 -0700)] 
PARQUET-1125: Add UUID logical type.

UUIDs are commonly used as unique identifiers. A binary representation will reduce memory when writing or building bloom filters and will reduce cycles needed to compare values.
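
As a rough illustration of the size difference, the sketch below compares the canonical text form of a UUID with its 16-byte binary form using Python's standard `uuid` module. The UUID value is an arbitrary example, and this says nothing about how any Parquet implementation stores the type internally.

```python
import uuid

# Arbitrary example value, used only for illustration.
u = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")

as_text = str(u).encode("ascii")  # canonical hyphenated hex form
as_binary = u.bytes               # 16 raw bytes, big-endian

print(len(as_text), len(as_binary))

# The binary form round-trips losslessly, and comparing two values
# touches 16 bytes instead of 36 characters.
assert uuid.UUID(bytes=as_binary) == u
```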

This commit is based on PARQUET-906 / PR #51.

Author: Ryan Blue <blue@apache.org>

Closes #71 from rdblue/PARQUET-1125-add-uuid-logical-type and squashes the following commits:

dc01707 [Ryan Blue] PARQUET-1125: Add UUID logical type.

14 months ago PARQUET-906: Add LogicalType annotation.
Ryan Blue [Tue, 10 Oct 2017 19:37:15 +0000 (12:37 -0700)] 
PARQUET-906: Add LogicalType annotation.

This commit adds a `LogicalType` union and a field for this logical type to `SchemaElement`. Adding a new structure for logical types is needed for a few reasons:

1. Adding to the ConvertedType enum is not forward-compatible. Adding new types to the `LogicalType` union is forward-compatible.
2. Using a struct for each type allows additional metadata, like `isAdjustedToUTC`, without adding more fields to `SchemaElement` that don't apply to all types.
3. Types without additional metadata can be updated later. For example, adding an `encoding` field to `StringType` when it is needed.
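
The second and third points can be sketched as follows. This is a Python model, not the Thrift definition: the class names and the `describe` helper are illustrative assumptions, and only `isAdjustedToUTC` and `StringType` are taken from the text above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class StringType:
    """Carries no metadata yet; a field could be added later if needed."""


@dataclass(frozen=True)
class TimestampType:
    # Metadata that applies only to this type lives here, not on the
    # schema element shared by all types.
    is_adjusted_to_utc: bool


# Stand-in for the union: exactly one of the per-type structs is set.
LogicalType = Union[StringType, TimestampType]


def describe(logical_type: LogicalType) -> str:
    """Render a logical type; anything unrecognized is reported as UNKNOWN."""
    if isinstance(logical_type, TimestampType):
        return f"TIMESTAMP(isAdjustedToUTC={logical_type.is_adjusted_to_utc})"
    if isinstance(logical_type, StringType):
        return "STRING"
    return "UNKNOWN"


print(describe(TimestampType(is_adjusted_to_utc=True)))
print(describe(StringType()))
```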

Author: Ryan Blue <blue@apache.org>

Closes #51 from rdblue/PARQUET-906-add-timestamp-adjustment-metadata and squashes the following commits:

ad8e91d [Ryan Blue] PARQUET-906: Clarify the use of NullType.
7cc29f7 [Ryan Blue] PARQUET-906: Rename NULL to UNKNOWN.
02f3868 [Ryan Blue] PARQUET-906: Update from comments on the PR.
c0386e9 [Ryan Blue] PARQUET-906: Remove NULL ConvertedType.
190bd8a [Ryan Blue] PARQUET-906: Update for review comments.
8203b21 [Ryan Blue] PARQUET-906: Add copyright header to LogicalTypes.
993102e [Ryan Blue] PARQUET-906: Remove the unreleased NULL ConvertedType.
86a22b4 [Ryan Blue] PARQUET-906: Add LogicalType annotation.

14 months ago PARQUET-322 Document ENUM as a logical type.
Jakub Kukul [Fri, 6 Oct 2017 23:57:21 +0000 (16:57 -0700)] 
PARQUET-322 Document ENUM as a logical type.

Author: Jakub Kukul <jakub@mbr-targeting.com>

Closes #54 from jkukul/master and squashes the following commits:

a2490b2 [Jakub Kukul] PARQUET-322 Document ENUM as a logical type.

14 months ago PARQUET-1050 fix the comments mistake of struct DataPageHeaderV2
LynnYuan [Fri, 6 Oct 2017 23:54:00 +0000 (16:54 -0700)] 
PARQUET-1050 fix the comments mistake of struct DataPageHeaderV2

Author: LynnYuan <yuanxiaolong@inspur.com>

Closes #58 from LynnYuanInspur/lynn and squashes the following commits:

2001d05 [LynnYuan] PARQUET-1050 fix the comments mistake of struct DataPageHeaderV2

14 months ago PARQUET-1024: Allow case-insensitive parquet-xxx prefix in PR title.
Ryan Blue [Fri, 6 Oct 2017 23:49:55 +0000 (16:49 -0700)] 
PARQUET-1024: Allow case-insensitive parquet-xxx prefix in PR title.

This merges changes from PARQUET-1024 in parquet-mr into parquet-format.

Also cleaning up old PRs:

Closes #29
Closes #60

14 months ago PARQUET-686: Clarifications about min-max stats.
Zoltan Ivanfi [Fri, 6 Oct 2017 23:38:53 +0000 (16:38 -0700)] 
PARQUET-686: Clarifications about min-max stats.

Changed some descriptions to reflect code changes that happened during code review without updating the corresponding comments and documentation:

* Removed references to the `SIGNED` and `UNSIGNED` sort orders, which were removed in favour of a single `TYPE_ORDER`.

* Removed obsolete references to `column_orders`'s effect on the `min` and `max` values, since those were declared obsolete instead and `column_orders` only affects the new `min_value` and `max_value` fields.

* Clarified `ColumnOrder`'s purpose, since the purpose of a union containing a single empty struct was hard to grasp.

Author: Zoltan Ivanfi <zi@cloudera.com>

Closes #55 from zivanfi/master and squashes the following commits:

a499d86 [Zoltan Ivanfi] Comparison rules updates.
0c973f7 [Zoltan Ivanfi] PARQUET-686: Further clarifications.
f8fab0b [Zoltan Ivanfi] PARQUET-686: Minor improvements in Thrift comments.
c86090d [Zoltan Ivanfi] PARQUET-686: Clarifications about min-max stats.

14 months ago PARQUET-1076: Use long key ids in KEYS file
Lars Volker [Fri, 6 Oct 2017 23:24:45 +0000 (16:24 -0700)] 
PARQUET-1076: Use long key ids in KEYS file

Created like so:

gpg --import < KEYS 2>&1 | grep key | sed -e 's/.*"\(.*\)".*/\1/' | \
while read k; do gpg --list-sigs --keyid-format long $k; gpg --export \
--armor $k; done > newkeys

Author: Lars Volker <lv@cloudera.com>

Closes #61 from lekv/full_keys and squashes the following commits:

89ac932 [Lars Volker] PARQUET-1076: Use long key ids in KEYS file

14 months ago PARQUET-1032: fix varint-encode() encoding algorithm link
kostya-sh [Fri, 6 Oct 2017 23:21:49 +0000 (16:21 -0700)] 
PARQUET-1032: fix varint-encode() encoding algorithm link

The spec says that varint-encode() is ULEB-128 encoding but links to VLQ algorithm that is slightly different from ULEB-128.
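
The difference can be seen in a small sketch. It assumes the VLQ in question is the common big-endian variant (most significant 7-bit group first); both function names are made up for this illustration.

```python
def uleb128_encode(n: int) -> bytes:
    """Unsigned LEB128: 7-bit groups, least significant group first."""
    if n < 0:
        raise ValueError("ULEB128 encodes unsigned integers only")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # continuation bit: more groups follow
        else:
            out.append(group)
            return bytes(out)


def vlq_encode(n: int) -> bytes:
    """Big-endian VLQ: the same 7-bit groups, most significant group first."""
    if n < 0:
        raise ValueError("VLQ encodes unsigned integers only")
    groups = [n & 0x7F]  # final byte carries no continuation bit
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(groups))


print(uleb128_encode(300).hex())  # ac02
print(vlq_encode(300).hex())      # 822c
```

Values below 128 encode to the same single byte in both schemes; the byte order differs only once a second group is needed.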

Author: kostya-sh <kostya-sh@users.noreply.github.com>

Closes #69 from kostya-sh/patch-1 and squashes the following commits:

f128603 [kostya-sh] PARQUET-1032: fix varint-encode() encoding algorithm link

15 months ago PARQUET-1091: Fix README links
Cheng Lian [Wed, 13 Sep 2017 02:02:39 +0000 (19:02 -0700)] 
PARQUET-1091: Fix README links

Multiple links in the code base, including images and build status in README.md, are still pointing to the old `Parquet/parquet-format` GitHub repository, which is now removed. This PR tries to fix them.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #65 from liancheng/fix-readme-links and squashes the following commits:

7d88f32 [Cheng Lian] Fix README links

15 months ago PARQUET-1102: Fix Travis CI builds for pull requests
Cheng Lian [Wed, 13 Sep 2017 01:44:02 +0000 (18:44 -0700)] 
PARQUET-1102: Fix Travis CI builds for pull requests

Travis CI migrated the default Ubuntu image version from precise to trusty on Sep 1st, 2017. This is probably the reason why all PR builds sent after Sep 1st failed. This PR tries to work around this issue by sticking to Ubuntu precise. We should migrate the build to Ubuntu trusty later, though.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #66 from liancheng/fix-pr-builds and squashes the following commits:

3cfce8e [Cheng Lian] Stick to Ubuntu precise on Travis CI

16 months ago PARQUET-1049: Make thrift version a property in pom.xml
Zoltan Ivanfi [Mon, 31 Jul 2017 16:23:20 +0000 (09:23 -0700)] 
PARQUET-1049: Make thrift version a property in pom.xml

Author: Zoltan Ivanfi <zi@cloudera.com>

Closes #57 from zivanfi/PARQUET-1049 and squashes the following commits:

8efc7a3 [Zoltan Ivanfi] PARQUET-1049: Make thrift version a property in pom.xml

16 months ago PARQUET-371: update thrift dependency to 0.9.3; do not shade slf4j
Julien Le Dem [Sat, 29 Jul 2017 23:30:22 +0000 (16:30 -0700)] 
PARQUET-371: update thrift dependency to 0.9.3; do not shade slf4j

Author: Julien Le Dem <julien@dremio.com>
Author: Julien Le Dem <julien@apache.org>

Closes #50 from julienledem/update_thrift and squashes the following commits:

f5db375 [Julien Le Dem] update travis
e30ad8f [Julien Le Dem] update thrift dependency; do not shade slf4j

19 months ago PARQUET-975: Add missing word in README.md
Lars Volker [Tue, 9 May 2017 19:27:00 +0000 (12:27 -0700)] 
PARQUET-975: Add missing word in README.md

Author: Lars Volker <lvolker@gmail.com>

Closes #52 from lekv/patch-1 and squashes the following commits:

90f693d [Lars Volker] PARQUET-975: Add missing word in README.md

19 months ago PARQUET-686: Add Order to store the order used for min/max stats.
Ryan Blue [Mon, 17 Apr 2017 18:23:41 +0000 (11:23 -0700)] 
PARQUET-686: Add Order to store the order used for min/max stats.

This adds a new enum, `Order`, that will be set to the order used to produce the min and max values in all `Statistics` objects (at the page level). `Order` has 8 symbols: `SIGNED`, `UNSIGNED`, and 6 symbols for custom orderings. This also adds a `CustomOrder` struct that is used to map the custom order symbols to string descriptors, such as [order keywords used by ICU collating sequences](http://userguide.icu-project.org/collation/api#TOC-Instantiating-the-Predefined-Collators). `CustomOrder` mappings are stored in the file footer.

Author: Ryan Blue <blue@apache.org>

Closes #46 from rdblue/PARQUET-686-add-stats-ordering and squashes the following commits:

f878c34 [Ryan Blue] PARQUET-686: Remove Order enum.
9447fb8 [Ryan Blue] PARQUET-686: Use "is" instead of "must be".
ffbb60b [Ryan Blue] PARQUET-686: Store ColumnOrder as a union.
c6e43b0 [Ryan Blue] PARQUET-686: Add new min_value and max_value stats.
eed4d47 [Ryan Blue] PARQUET-686: Add clarifications from review comments.
9962df8 [Ryan Blue] PARQUET-686: Remove is_ascending and number columns starting with 1.
faa9edb [Ryan Blue] PARQUET-686: Add order specs to logical types.
4534062 [Ryan Blue] PARQUET-686: Add ColumnOrders to FileMetaData.

23 months ago PARQUET-804: Fix link to mailing list in README.md
Lars Volker [Wed, 28 Dec 2016 19:59:29 +0000 (11:59 -0800)] 
PARQUET-804: Fix link to mailing list in README.md

Author: Lars Volker <lv@cloudera.com>

Closes #47 from lekv/readme and squashes the following commits:

7f8e835 [Lars Volker] Review feedback
8ab9060 [Lars Volker] PARQUET-804: Fix link to mailing list in README.md

2 years ago PARQUET-757: Add NULL type to Bring Parquet logical types to par with Arrow
Julien Le Dem [Fri, 4 Nov 2016 17:35:08 +0000 (10:35 -0700)] 
PARQUET-757: Add NULL type to Bring Parquet logical types to par with Arrow

Author: Julien Le Dem <julien@dremio.com>

Closes #45 from julienledem/types and squashes the following commits:

2956b63 [Julien Le Dem] review feedback
94236c4 [Julien Le Dem] PARQUET-757: Bring Parquet logical types to par with Arrow

2 years ago PARQUET-655: Fixes LogicalTypes.md link
Cheng Lian [Thu, 8 Sep 2016 21:25:53 +0000 (14:25 -0700)] 
PARQUET-655: Fixes LogicalTypes.md link

The current LogicalTypes.md link in README.md points to the file in the old https://github.com/Parquet/parquet-format repository.

This PR replaces the stale link with a relative path so that it always points to the file in the right repository.

Author: Cheng Lian <lian@databricks.com>

Closes #41 from liancheng/parquet-655-logical-types-link and squashes the following commits:

f98ddb7 [Cheng Lian] Fixes LogicalTypes.md link

2 years ago PARQUET-609: Add Brotli to parquet's thrift definition
Ryan Blue [Mon, 11 Jul 2016 18:00:45 +0000 (11:00 -0700)] 
PARQUET-609: Add Brotli to parquet's thrift definition

Author: Ryan Blue <blue@apache.org>

Closes #40 from rdblue/PARQUET-609-add-brotli and squashes the following commits:

061dcbc [Ryan Blue] PARQUET-609: Add Brotli compression to the format.
4eb1ff0 [Ryan Blue] PARQUET-608: Add thrift.executable property.

2 years ago PARQUET-255: Fixes a typo in decimal type specification
Cheng Lian [Wed, 24 Feb 2016 09:39:00 +0000 (17:39 +0800)] 
PARQUET-255: Fixes a typo in decimal type specification

I believe the mentioned warning should be produced when decimal precision is less than (rather than less than or equal to) 10 when an `int64` is used to represent a decimal.
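
The boundary at 10 follows from the range of the signed integer types, as the sketch below works out. The helper name is hypothetical, and the only assumption is that a decimal's unscaled value is stored in a two's-complement integer of the given width.

```python
def max_decimal_precision(num_bytes: int) -> int:
    """Largest precision whose every unscaled value fits in a signed integer."""
    max_value = 2 ** (8 * num_bytes - 1) - 1
    # max_value has d digits but is not all nines, so only values with
    # d - 1 digits are guaranteed to fit.
    return len(str(max_value)) - 1


print(max_decimal_precision(4))  # int32
print(max_decimal_precision(8))  # int64
```

Under that assumption, a precision of 9 or less already fits in an `int32`, so an `int64` is only needed from precision 10 upward, which is why the warning applies to precisions strictly less than 10.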

Author: Cheng Lian <lian@databricks.com>

Closes #26 from liancheng/fix-decimal-doc and squashes the following commits:

c4f2cd3 [Cheng Lian] Fixes a typo in LogicalTypes.md

2 years ago PARQUET-450: Fix several typos in Parquet format documentation
Laurent Goujon [Fri, 29 Jan 2016 18:30:06 +0000 (10:30 -0800)] 
PARQUET-450: Fix several typos in Parquet format documentation

It also changes the parquet.thrift location to conform to the Maven layout and adds a link to it from README.md.

Author: Laurent Goujon <lgoujon@twitter.com>

Closes #36 from laurentgo/update-format-specification and squashes the following commits:

244c119 [Laurent Goujon] Fix several typos/errors in Parquet documentation
90a2be4 [Laurent Goujon] Fix thrift source path to match maven layout

2 years ago PARQUET-407: Incorrect delta-encoding example
socialpercon [Wed, 6 Jan 2016 23:48:03 +0000 (17:48 -0600)] 
PARQUET-407: Incorrect delta-encoding example

https://issues.apache.org/jira/browse/PARQUET-407

The minimum delta and the number of packed bits are incorrect in delta-encoding Example 2 in `Encodings.md`.
In the example,

```
Example 2

7, 5, 3, 1, 2, 3, 4, 5, the deltas would be

-2, -2, -2, 1, 1, 1, 1
The minimum is -2, so the relative deltas are:

0, 0, 0, 3, 3, 3, 3

The encoded data is

header: 8 (block size), 1 (miniblock count), 8 (value count), 7 (first value)

block 0 (minimum delta), 2 (bitwidth), 000000111111b (0,0,0,3,3,3 packed on 2 bits)
```

The minimum is -2 and the relative deltas are 0, 0, 0, 3, 3, 3, 3. So, this should be corrected as below:

```
block -2 (minimum delta), 2 (bitwidth), 00000011111111b (0,0,0,3,3,3,3 packed on 2 bits)
```
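
The corrected figures can be checked with a short sketch that recomputes the example from its input values. The function name is made up here, and the bit string only mirrors the textual notation of the example; it does not model the actual byte layout of the encoding.

```python
def delta_block(values):
    """Compute the quantities used in the delta-encoding example."""
    first_value = values[0]
    deltas = [b - a for a, b in zip(values, values[1:])]
    min_delta = min(deltas)
    relative = [d - min_delta for d in deltas]
    bit_width = max(relative).bit_length()
    return first_value, deltas, min_delta, relative, bit_width


first_value, deltas, min_delta, relative, bit_width = delta_block(
    [7, 5, 3, 1, 2, 3, 4, 5]
)

# Textual rendering of the relative deltas, each on bit_width bits.
packed = "".join(format(r, f"0{bit_width}b") for r in relative)

print(deltas)     # [-2, -2, -2, 1, 1, 1, 1]
print(min_delta)  # -2
print(relative)   # [0, 0, 0, 3, 3, 3, 3]
print(bit_width)  # 2
print(packed)     # 00000011111111
```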

Author: socialpercon <socialpercon@gmail.com>

Closes #35 from socialpercon/master and squashes the following commits:

3d5886a [socialpercon] Change incorrect delta-encoding example

3 years ago [maven-release-plugin] prepare for next development iteration
Ryan Blue [Mon, 14 Dec 2015 17:34:38 +0000 (09:34 -0800)] 
[maven-release-plugin] prepare for next development iteration

3 years ago [maven-release-plugin] prepare release apache-parquet-format-2.3.1 apache-parquet-format-2.3.1
Ryan Blue [Mon, 14 Dec 2015 17:34:24 +0000 (09:34 -0800)] 
[maven-release-plugin] prepare release apache-parquet-format-2.3.1

3 years ago PARQUET-403: Remove remaining references to incubation.
Ryan Blue [Mon, 14 Dec 2015 17:23:39 +0000 (09:23 -0800)] 
PARQUET-403: Remove remaining references to incubation.

Author: Ryan Blue <blue@apache.org>

Closes #34 from rdblue/PARQUET-403-fix-release-scripts and squashes the following commits:

5354bd3 [Ryan Blue] PARQUET-403: Remove remaining references to incubation.

3 years ago PARQUET-369: Add shaded SLF4J NOP binding.
Ryan Blue [Tue, 27 Oct 2015 22:08:17 +0000 (15:08 -0700)] 
PARQUET-369: Add shaded SLF4J NOP binding.

This silences the complaint that no logger implementation could be
found. No logger implementation was possible because the class that
SLF4J was trying to load had been relocated. The only options are to
relocate an implementation along with SLF4J or not shade SLF4J. This
adds the NOP logger to silence the warning.

Author: Ryan Blue <blue@apache.org>

Closes #32 from rdblue/PARQUET-369-fix-slf4j-binding and squashes the following commits:

f993a91 [Ryan Blue] PARQUET-369: Add shaded SLF4J NOP binding.

3 years agoPARQUET-200: Add microsecond-precision time specs.
Ryan Blue [Tue, 16 Jun 2015 16:53:17 +0000 (09:53 -0700)] 
PARQUET-200: Add microsecond-precision time specs.

Author: Ryan Blue <blue@apache.org>

Closes #23 from rdblue/PARQUET-200-nanosecond-times and squashes the following commits:

2b1e423 [Ryan Blue] PARQUET-200: Add microsecond-precision time specs.

3 years agoPARQUET-178: Remove SLF4J META-INF from binary artifacts.
Ryan Blue [Tue, 16 Jun 2015 16:49:32 +0000 (09:49 -0700)] 
PARQUET-178: Remove SLF4J META-INF from binary artifacts.

Author: Ryan Blue <blue@apache.org>

Closes #24 from rdblue/PARQUET-178-remove-slf4j-meta-inf and squashes the following commits:

379fa0e [Ryan Blue] PARQUET-178: Remove SLF4J META-INF from binary artifacts.

3 years agoPARQUET-265: Update POM for Parquet TLP. 27/head
Ryan Blue [Wed, 29 Apr 2015 00:16:28 +0000 (17:16 -0700)] 
PARQUET-265: Update POM for Parquet TLP.

3 years agoPARQUET-240: Fix typo in the LIST description
Colin Marc [Sat, 4 Apr 2015 17:39:43 +0000 (10:39 -0700)] 
PARQUET-240: Fix typo in the LIST description

The description mistakenly refers to the repeated element being named "array"
instead of "list". This also fixes one grammatical error.

Author: Colin Marc <colinmarc@gmail.com>

Closes #25 from colinmarc/fix-list-typo and squashes the following commits:

5c36271 [Colin Marc] Fix typo in the LIST description

3 years agoPARQUET-188: Add ColumnChunk metadata order to the spec.
Ryan Blue [Mon, 9 Mar 2015 20:10:51 +0000 (13:10 -0700)] 
PARQUET-188: Add ColumnChunk metadata order to the spec.

As discussed on PARQUET-188, this updates the file format spec to state that the column metadata order should match the schema column order.

Author: Ryan Blue <blue@apache.org>

Closes #22 from rdblue/PARQUET-188-specify-column-metadata-order and squashes the following commits:

af8334f [Ryan Blue] PARQUET-188: Add ColumnChunk metadata order to the spec.

3 years agoPARQUET-113: Add specs for LIST and MAP annotations.
Ryan Blue [Wed, 4 Mar 2015 20:08:49 +0000 (12:08 -0800)] 
PARQUET-113: Add specs for LIST and MAP annotations.

Draft specs for using `MAP` and `LIST` annotations.

Please help verify that this can read all existing map and list data correctly!

Author: Ryan Blue <blue@apache.org>

Closes #17 from rdblue/PARQUET-113-add-list-and-map-spec and squashes the following commits:

7c50699 [Ryan Blue] PARQUET-113: Clarify LIST and MAP annotations.
eb627c7 [Ryan Blue] PARQUET-113: Add rules for maps written with Hive.
2515ffc [Ryan Blue] PARQUET-113: Clarify rules after working on implementations.
969a71e [Ryan Blue] PARQUET-113: Remove requirement for annotated repeated types.
3135c61 [Ryan Blue] PARQUET-113: Add specs for LIST and MAP annotations.

3 years ago[maven-release-plugin] prepare for next development iteration
Ryan Blue [Wed, 11 Feb 2015 22:59:20 +0000 (14:59 -0800)] 
[maven-release-plugin] prepare for next development iteration

3 years ago[maven-release-plugin] prepare release apache-parquet-format-2.3.0-incubating apache-parquet-format-2.3.0-incubating
Ryan Blue [Wed, 11 Feb 2015 22:59:06 +0000 (14:59 -0800)] 
[maven-release-plugin] prepare release apache-parquet-format-2.3.0-incubating

3 years agoPARQUET-185: Update release scripts and POM.
Ryan Blue [Wed, 11 Feb 2015 22:57:42 +0000 (14:57 -0800)] 
PARQUET-185: Update release scripts and POM.

* Disable source zip generation in POM
* Update scripts to use -incubating version

Author: Ryan Blue <blue@apache.org>

Closes #21 from rdblue/PARQUET-185-update-for-release and squashes the following commits:

056f966 [Ryan Blue] PARQUET-185: Update release scripts and POM.

3 years agoPARQUET-184: Add release scripts.
Ryan Blue [Mon, 9 Feb 2015 20:21:08 +0000 (12:21 -0800)] 
PARQUET-184: Add release scripts.

These help with the release process, which is roughly:
* `sh dev/release-prepare.sh <version>`
* `mvn release:perform`
* `sh dev/source-release.sh <version> <rc-num>`
* Send a vote e-mail

The documentation is posted here: https://github.com/rdblue/incubator-parquet-site/blob/release-docs/site/source/how-to-release.html.md

Author: Ryan Blue <blue@apache.org>

Closes #20 from rdblue/PARQUET-184-add-release-scripts and squashes the following commits:

e17a447 [Ryan Blue] PARQUET-184: Update tags based on review feedback.
cb5637e [Ryan Blue] PARQUET-184: Add release scripts.

3 years ago[maven-release-plugin] prepare for next development iteration
Ryan Blue [Sat, 7 Feb 2015 00:34:16 +0000 (16:34 -0800)] 
[maven-release-plugin] prepare for next development iteration

3 years ago[maven-release-plugin] prepare release apache-parquet-format-2.3.0 apache-parquet-format-2.3.0
Ryan Blue [Sat, 7 Feb 2015 00:34:01 +0000 (16:34 -0800)] 
[maven-release-plugin] prepare release apache-parquet-format-2.3.0

3 years ago[maven-release-plugin] prepare for next development iteration
Ryan Blue [Fri, 6 Feb 2015 22:38:53 +0000 (14:38 -0800)] 
[maven-release-plugin] prepare for next development iteration

3 years ago[maven-release-plugin] prepare release apache-parquet-format-2.2.0 apache-parquet-format-2.2.0
Ryan Blue [Fri, 6 Feb 2015 22:38:39 +0000 (14:38 -0800)] 
[maven-release-plugin] prepare release apache-parquet-format-2.2.0

3 years agoRevert "[maven-release-plugin] prepare release parquet-format-2.2.0"
Ryan Blue [Fri, 6 Feb 2015 22:37:43 +0000 (14:37 -0800)] 
Revert "[maven-release-plugin] prepare release parquet-format-2.2.0"

This reverts commit fd55aaa98982ad0d8f75cbfed90a3b9374595ba5.

3 years ago[maven-release-plugin] prepare release parquet-format-2.2.0
Ryan Blue [Fri, 6 Feb 2015 22:34:34 +0000 (14:34 -0800)] 
[maven-release-plugin] prepare release parquet-format-2.2.0

3 years agoPARQUET-111: Update LICENSE and pom.
Ryan Blue [Fri, 6 Feb 2015 22:12:02 +0000 (14:12 -0800)] 
PARQUET-111: Update LICENSE and pom.

This updates the LICENSE and shading configuration in the pom for
problems found while working on PARQUET-111 in parquet-mr.

Author: Ryan Blue <blue@apache.org>

Closes #19 from rdblue/PARQUET-111-fixes-from-mr and squashes the following commits:

ed98c6a [Ryan Blue] PARQUET-111: Update LICENSE and pom.

3 years agoPARQUET-23: Refactor parquet-format to org.apache names.
Ryan Blue [Thu, 18 Dec 2014 23:33:56 +0000 (15:33 -0800)] 
PARQUET-23: Refactor parquet-format to org.apache names.

This updates parquet-format to use org.apache names. Still need to:
* Validate that parquet-mr works as expected when relying on these changes

Author: Ryan Blue <blue@apache.org>

Closes #18 from rdblue/PARQUET-23-rename-to-org-apache and squashes the following commits:

ddcd50e [Ryan Blue] PARQUET-23: Update changelog for org.apache parquet-format 2.2.0.
5c339d4 [Ryan Blue] PARQUET-23: Update POM to use Apache maven release config.
ac982ca [Ryan Blue] PARQUET-23: Refactor parquet-format to org.apache names.

4 years agoPARQUET-119: add data_encodings to ColumnMetaData to enable dictionary based predicat...
julien [Thu, 30 Oct 2014 20:49:26 +0000 (13:49 -0700)] 
PARQUET-119: add data_encodings to ColumnMetaData to enable dictionary based predicate push down

To implement predicate push down based on the dictionary, we need to know whether fallback happened.
If all data pages are dictionary encoded, we can use the dictionary for predicate push down.
If not, we cannot.
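
The rule above can be sketched as follows. This is only an illustration: the function name and the list-of-strings input are hypothetical stand-ins for whatever a reader derives from the column metadata, and the set of dictionary encodings is an assumption based on the encoding names used by the format.

```python
# Encodings treated as "dictionary encoded" in this sketch (assumption).
DICTIONARY_ENCODINGS = {"PLAIN_DICTIONARY", "RLE_DICTIONARY"}


def can_filter_with_dictionary(data_page_encodings):
    """Return True only if every data page is dictionary encoded.

    A single page with another encoding means the writer fell back, so the
    dictionary no longer lists every value in the column chunk.
    """
    encodings = set(data_page_encodings)
    # An empty list gives no evidence either way, so stay conservative.
    return bool(encodings) and encodings <= DICTIONARY_ENCODINGS


print(can_filter_with_dictionary(["PLAIN_DICTIONARY", "PLAIN_DICTIONARY"]))
print(can_filter_with_dictionary(["PLAIN_DICTIONARY", "PLAIN"]))
```

The first call reports that the dictionary is usable; the second reports a fallback, so the predicate has to be evaluated against the data pages instead.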

CC @nongli @rdblue @isnotinvain @tsdeng

Author: julien <julien@twitter.com>

Closes #16 from julienledem/data_encodings and squashes the following commits:

3a60c6c [julien] typo
46f7b7a [julien] update to stats based on feedback
6474f58 [julien] Merge branch 'master' into data_encodings
3529ccf [julien] make data_encodings optional
709dd7c [julien] add data_encodings to ColumnMetaData to enable dictionary based predicate push down

4 years agoPARQUET-24: port changes from parquet_mr
julien [Fri, 24 Oct 2014 08:51:28 +0000 (01:51 -0700)] 
PARQUET-24: port changes from parquet_mr

Author: julien <julien@twitter.com>

Closes #14 from julienledem/PARQUET_24 and squashes the following commits:

a0efb8f [julien] port changes from parquet_mr

4 years agoPARQUET-109: Update NOTICE, add binary LICENSE.
Ryan Blue [Wed, 1 Oct 2014 22:40:52 +0000 (15:40 -0700)] 
PARQUET-109: Update NOTICE, add binary LICENSE.

This makes a minor change to NOTICE to match the maven-generated project
name (Apache Parquet => Apache Parquet Format (Incubating)). There
should be no more additions to NOTICE needed. The Thrift NOTICE has no
additions beyond the standard Apache boilerplate and SLF4J has no NOTICE
file. **Please double-check this**

This also adds a LICENSE file in the resources that is included in the
binary distribution. This LICENSE represents the contents of the binary
distribution and includes the QOS.ch BSD license for SLF4J. There are no
other licenses required. Although the LICENSE.txt in the thrift binary
that is shaded includes other licenses, the files that they apply to are
not included in either source or binary form because the shaded jar
contains only the Thrift Java library. **Please double-check this**

This also removes the Thrift NOTICE.txt and LICENSE.txt files from the
binary distribution. The parquet-format NOTICE and LICENSE files
represent the contents of the jar.

Author: Ryan Blue <rblue@cloudera.com>

Closes #15 from rdblue/PARQUET-109-licensing-fixes and squashes the following commits:

3b34771 [Ryan Blue] PARQUET-109: Update NOTICE, add binary LICENSE.

4 years agoPARQUET-72: Fix NOTICE
Ryan Blue [Tue, 16 Sep 2014 21:15:15 +0000 (14:15 -0700)] 
PARQUET-72: Fix NOTICE

Author: Ryan Blue <rblue@cloudera.com>

Closes #13 from rdblue/PARQUET-72-fix-notice and squashes the following commits:

92d25c2 [Ryan Blue] PARQUET-72: Remove unnecessary entries in NOTICE.

4 years agoPARQUET-72: Add PGP keys to KEYS file.
Ryan Blue [Wed, 10 Sep 2014 21:50:33 +0000 (14:50 -0700)] 
PARQUET-72: Add PGP keys to KEYS file.

Author: Ryan Blue <rblue@cloudera.com>

Closes #12 from rdblue/add-pgp-keys and squashes the following commits:

2683b4e [Ryan Blue] PARQUET-72: Add PGP keys to KEYS file.