Nic Crane [Sun, 16 Jan 2022 17:57:48 +0000 (17:57 +0000)]
Add note in CONTRIBUTING.MD about typos
Will Jones [Sun, 16 Jan 2022 17:53:37 +0000 (09:53 -0800)]
Describe recipes and differentiate from user guides (#118)
* Describe recipes and differentiate from user guides
* Update CONTRIBUTING.md
Co-authored-by: Nic Crane <thisisnic@gmail.com>
Nic Crane [Tue, 16 Nov 2021 15:09:12 +0000 (15:09 +0000)]
Add link to the README (#102)
Nic Crane [Wed, 3 Nov 2021 21:03:25 +0000 (21:03 +0000)]
Fix broken build and aesthetic improvements (#103)
* Add arrow logo
* Add missing solution headings
* Shorten section titles
* Delete redundant content
* Add link to R docs
* Rename intro sections
Nic Crane [Wed, 3 Nov 2021 12:49:32 +0000 (12:49 +0000)]
Aesthetic improvements to intro page (#101)
* Add arrow logo
* Use includegraphics and add link to dplyr
Nic Crane [Wed, 3 Nov 2021 10:25:22 +0000 (10:25 +0000)]
Update preface (#100)
Nic Crane [Wed, 3 Nov 2021 10:06:05 +0000 (10:06 +0000)]
ARROW-13749: [Doc][Cookbook] Work with functions from other packages via dplyr bindings - R (#95)
* Add content on using functions from other packages
* Add links to packages
Nic Crane [Wed, 3 Nov 2021 09:55:47 +0000 (09:55 +0000)]
ARROW-13714: [Doc][Cookbook] Sharing data between R and Python - R (#99)
* Initial file
* Add to bookdown file
* Add examples
* Add example of PyArrow functions
* Initial file
* Add to bookdown
* Add examples
* Add example of PyArrow functions
* Capitalisation
* Update dependencies
* Add pyarrow installation
* Consistent capitalisation
Nic Crane [Wed, 3 Nov 2021 09:54:45 +0000 (09:54 +0000)]
ARROW-13752 [Doc][Cookbook] Searching for values matching a predicate in Arrays - R (#96)
* Add in extra recipes
* Rephrase to make more idiomatic
Nic Crane [Mon, 1 Nov 2021 05:43:49 +0000 (05:43 +0000)]
ARROW-13713: [Doc][Cookbook] Reading and Writing Compressed Data - R (#91)
* Add initial recipes
* Add to bookdown
* Move "compressed data" content to the "read and write data" chapter
* Add "Solution" headings
* Write parquet not feather!
* Add .gz ending note
* Add note about defaults
* Add note in see also section
* Add comment about default compression
* Add to comment
Alessandro Molina [Thu, 28 Oct 2021 12:18:59 +0000 (14:18 +0200)]
Update python tests for version 6.0.0 (#98)
* Update python tests for version 6.0.0
* Adapt to new partitioning convention
Nic Crane [Thu, 28 Oct 2021 09:23:46 +0000 (12:23 +0300)]
[R] - Flight recipes (#90)
* Flight recipes
* Add code to make examples self-contained
* "discussion" -> "see also"
* Add in solution headings
* Use sentence case and remove "ing" from first verb in title
* Mention it's a pyarrow thing
* Update r/content/flight.Rmd
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Alessandro Molina [Thu, 28 Oct 2021 09:22:05 +0000 (11:22 +0200)]
ARROW-13712: Reading and Writing Compressed Data (#87)
* ARROW-13712: Reading and Writing Compressed Data
* Apply suggestions from code review
Co-authored-by: Nic <thisisnic@gmail.com>
* Rewording
* Rewording
* Update python/source/io.rst
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Nic <thisisnic@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Weston Pace [Wed, 27 Oct 2021 20:46:20 +0000 (10:46 -1000)]
Add a basic dataset reading example (#85)
* Added a basic dataset reading example
* Remove cmake debugging
* Adding newline to end of common.cc
* Apply suggestions from code review
Co-authored-by: Nic Crane <thisisnic@gmail.com>
Nic [Tue, 26 Oct 2021 10:47:52 +0000 (13:47 +0300)]
[R] 93 - dplyr chapter feedback (#94)
* Fix bullet points
* Ensure it's obvious arrow is doing the work
* chunks
Alessandro Molina [Thu, 21 Oct 2021 08:45:54 +0000 (10:45 +0200)]
ARROW-13730: Adding a column to an existing Table (#81)
Nic [Thu, 21 Oct 2021 08:37:44 +0000 (09:37 +0100)]
ARROW-13732: [Doc][Cookbook] Manipulating and analyze Arrow data with dplyr verbs - R (#78)
* Update chapter to follow problem/solution/discussion format
* Split data manipulation chapter into tables/arrays and add initial content
* Remove assignment and have simpler chains
* Shorten line
* Add content on using compute functions not implemented in the R package
* Remove the word tidyverse as it's inaccurate
* Add heading
* Entirely refactor
* Add comment with current missing content
* Actually use Arrow
* Add "what you should know" section and do loads of rephrasing and adding examples
* Add "what you should know before", and content on calling functions directly
* Add test
* Rename files
* Add tests to code chunks
* Add note about collect/create
* Fix bad link
* Update r/content/arrays.Rmd
Co-authored-by: Weston Pace <weston.pace@gmail.com>
* Update r/content/arrays.Rmd
Co-authored-by: Weston Pace <weston.pace@gmail.com>
* Update r/content/arrays.Rmd
Co-authored-by: Weston Pace <weston.pace@gmail.com>
* Remove "where possible"
* Rephrase to clarify
* restyle
* More rephrasing
* Add some simpler intro content to the dplyr chapter
* Add tests for intro code
* Put dataset in Table$create() call
* Reduce whitespace
* Use arrow_table instead of Table$create
* Use Table$create for the moment while the next version isn't on CRAN yet
* Erroneous renaming
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Alessandro Molina [Wed, 13 Oct 2021 12:03:52 +0000 (14:03 +0200)]
ARROW-13710: Arrow Flight recipe (#84)
* Flight RPC
* Create datasets directory
* comment
* Oops, forgot to create repo for real
* Size can change depending on the system
* Apply suggestions from code review
Co-authored-by: David Li <li.davidm96@gmail.com>
* Address code review feedback
Co-authored-by: David Li <li.davidm96@gmail.com>
Alessandro Molina [Fri, 8 Oct 2021 08:57:50 +0000 (10:57 +0200)]
ARROW-13753: Filtering Arrays for values matching a mask filter (#80)
* filtering arrays recipe
* Wrong heading
* Apply suggestions from code review
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Alessandro Molina [Tue, 5 Oct 2021 09:31:32 +0000 (11:31 +0200)]
ARROW-13751: Recipe for searching for values matching a predicate (#79)
* Recipe for searching for values matching a predicate
* Apply suggestions from code review
Co-authored-by: Weston Pace <weston.pace@gmail.com>
* Wrong heading
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Nic [Fri, 1 Oct 2021 17:27:35 +0000 (17:27 +0000)]
[R] - Schemas recipes (#67)
* Add the creating schemas recipe
* Add in content on combining schemas, and specifying schemas when reading in files
* Delete unnecessary files, and stop showing test chunks
* Rephrase the bit about converting from R to Arrow
* Remove extraneous word
* Also mention reading in data
* Extra clarity
* missing word
* Add appendices
* Add section on casting, remove "problem" headings, update dataset, move tables to appendix, show example of incompatible data types
* Link between incompatible data types and appendix table
* Add content on combining schemas
* Rephrase
* Add context
* Reorder items in table
* Add recipe for schemas where match or don't match
* Rephrase
* Update code which causes an error to not run
* Relegate unify_schemas to discussion
* Fix rebase
* Remove appendix and link to vignette instead
* Remove examples of everything that could go wrong, as not relevant
* Fix failing test
Alessandro Molina [Thu, 30 Sep 2021 14:19:08 +0000 (16:19 +0200)]
ARROW-13727: Recipe to concatenate two tables (#76)
* Recipe to concatenate two tables
* Apply suggestions from code review
Co-authored-by: Nic <thisisnic@gmail.com>
Alessandro Molina [Thu, 30 Sep 2021 12:43:42 +0000 (14:43 +0200)]
Unify schemas recipe (#75)
Nic [Mon, 20 Sep 2021 09:54:09 +0000 (09:54 +0000)]
Add missing word (#77)
Nic [Wed, 15 Sep 2021 11:28:56 +0000 (11:28 +0000)]
Use as.data.frame instead of dplyr::collect (#71)
* Use as.data.frame instead of dplyr::collect
* Typo
Alessandro Molina [Wed, 15 Sep 2021 11:26:16 +0000 (13:26 +0200)]
ARROW-13716: Add RecordBatch recipe (#66)
* Add RecordBatch recipe
* Apply suggestions from code review
Co-authored-by: Weston Pace <weston.pace@gmail.com>
* Make example obvious
* Apply suggestions from code review
Co-authored-by: Nic <thisisnic@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Nic <thisisnic@gmail.com>
Alessandro Molina [Wed, 15 Sep 2021 11:25:08 +0000 (13:25 +0200)]
Specifying schemas for arrays and tables (#73)
Tomek Drabas [Fri, 10 Sep 2021 23:46:18 +0000 (16:46 -0700)]
Adding anonymous flag to s3 (#70)
* Adding anonymous flag to s3
* Fixing missing comma
* Info about s3 credentials
Weston Pace [Thu, 9 Sep 2021 19:04:56 +0000 (09:04 -1000)]
Committers automatically have all the permissions that collaborators do so this is no longer needed (congrats). (#69)
Nic [Thu, 9 Sep 2021 16:38:34 +0000 (16:38 +0000)]
ARROW-13718: [Doc][Cookbook] Creating Arrays - R (#65)
* Add recipe for creating an Array
Nic [Thu, 9 Sep 2021 16:38:09 +0000 (16:38 +0000)]
ARROW-13709: Reading JSON in R recipe (#64)
* Ensure that test chunks are not rendered
* Add code to delete any temporarily generated files, add recipe for reading JSON
* Rephrase
Alessandro Molina [Tue, 7 Sep 2021 18:02:15 +0000 (20:02 +0200)]
ARROW-13717: Creating arrays recipe (#63)
* Creating arrays recipe
* shorten pandas too for consistency
Weston Pace [Wed, 1 Sep 2021 21:25:06 +0000 (11:25 -1000)]
Added clang-tools (which includes clang-tidy) to environment.yml (#61)
* Added clang-tools (which includes clang-tidy) to environment.yml
* Added libstdc++ to the conda environment
* Only add clang flags if the compiler is clang
Alessandro Molina [Wed, 1 Sep 2021 14:38:35 +0000 (16:38 +0200)]
Creating tables recipe (#51)
https://issues.apache.org/jira/browse/ARROW-13715
Alessandro Molina [Wed, 1 Sep 2021 14:37:54 +0000 (16:37 +0200)]
Recipe to read line delimited json as of ARROW-13708 (#49)
* Recipe to read json
* rename pj to pa.json
* Add colon
Weston Pace [Wed, 1 Sep 2021 02:18:26 +0000 (16:18 -1000)]
Added clang-format and clang-tidy files. Added clang-tidy to the build. (#54)
* Added clang-format and clang-tidy files. Added clang-tidy to the build.
* Added copyright to .clang-tidy
* Wrapped duplicated cmake code into helper function. Got rid of defunct lint target
Weston Pace [Tue, 31 Aug 2021 18:35:56 +0000 (08:35 -1000)]
Add a workflow for running C++ tests on PRs (#53)
Alessandro Molina [Tue, 31 Aug 2021 18:27:28 +0000 (20:27 +0200)]
Update CSV recipe to use pyarrow.csv instead of pandas (#50)
* Switch CSV writing to the arrow provided one
* Add incremental recipe
* Update python/source/io.rst
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
* Update python/source/io.rst
Co-authored-by: Weston Pace <weston.pace@gmail.com>
* Update python/source/io.rst
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Alessandro Molina [Tue, 24 Aug 2021 13:05:05 +0000 (15:05 +0200)]
Writing Partitioned Datasets recipe for Python (#47)
Weston Pace [Mon, 23 Aug 2021 19:44:26 +0000 (09:44 -1000)]
Initial C++ cookbook (#22)
* Initial C++ cookbook
* Addressing PR feedback. Converted contributing doc from RST to MD (since the extension was MD).
* Update cpp/CONTRIBUTING.md
Co-authored-by: David Li <li.davidm96@gmail.com>
* Creating standalone section for code of conduct
* Update cpp/CONTRIBUTING.md
Co-authored-by: David Li <li.davidm96@gmail.com>
* Addressing PR comments
Co-authored-by: David Li <li.davidm96@gmail.com>
Neal Richardson [Sat, 21 Aug 2021 00:52:11 +0000 (20:52 -0400)]
[R] Remove unnecessary head()s (#41)
* [R] Remove unnecessary head()s
* Update r/content/reading_and_writing_data.Rmd
Neal Richardson [Thu, 19 Aug 2021 02:35:58 +0000 (22:35 -0400)]
[R] Fix broken edit link (#39)
* [R] Fix broken edit link
* Update r/content/_bookdown.yml
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Nathanaël Leaute [Wed, 18 Aug 2021 04:18:46 +0000 (06:18 +0200)]
Explicit array creation (#2)
Weston Pace [Tue, 17 Aug 2021 13:24:02 +0000 (03:24 -1000)]
Copy .asf.yaml to asf-site branch (#38)
Daniel Gruno [Mon, 16 Aug 2021 12:11:15 +0000 (14:11 +0200)]
more debugging
Daniel Gruno [Mon, 16 Aug 2021 11:50:01 +0000 (13:50 +0200)]
another whitespace trigger
Daniel Gruno [Mon, 16 Aug 2021 11:47:25 +0000 (13:47 +0200)]
whitespace trigger for debugging .asf.yaml issue
Daniel Gruno [Mon, 16 Aug 2021 11:40:19 +0000 (13:40 +0200)]
[asf infra] trigger resort of .asf.yaml
Weston Pace [Mon, 16 Aug 2021 11:19:15 +0000 (01:19 -1000)]
Temporarily removing collaborators ending in - (#37)
ASF Infra has a regex to test if a username is a valid GH username. GH does not allow usernames to end in hyphen. However, they did allow this at one point and they grandfathered in some names. ASF Infra is rejecting these grandfathered in names.
Weston Pace [Mon, 16 Aug 2021 10:54:24 +0000 (00:54 -1000)]
Add Apache publishing / hosting (#30)
* Add Apache publishing / hosting
* Adding asf-site to the destinations we push to
Nic [Fri, 13 Aug 2021 05:52:47 +0000 (05:52 +0000)]
Fix C++ version (#36)
Nic [Thu, 12 Aug 2021 22:28:43 +0000 (22:28 +0000)]
Move tests into implementation-specific folders and workflows (#32)
* Move tests into implementation-specific folders and workflows
* Move workflows up a level so they run
* Improve naming
* Disallow multiple concurrent jobs
* Prevent concurrency
* Check concurrency change
Nic [Wed, 11 Aug 2021 19:54:47 +0000 (19:54 +0000)]
#13 install latest release (#20)
* install latest release
* Add Apache license, and simplify non-release builds for now
* Refactor and style code
* Fix typo
* Install binaries for dependencies too
* Run styler and update install_arrow_version logic to have better defaults
* Fix issue with NA var
* Less ambiguous CI
* Got mixed up with RSPM having Linux binaries
* Typofix
* For loops aren't illegal
* Add issue number to TODO
* test empty commit
Weston Pace [Wed, 11 Aug 2021 19:53:59 +0000 (09:53 -1000)]
Changing the gh-pages deploy so that the gh-pages branch is always a single commit instead of something that keeps history (#26)
Weston Pace [Wed, 11 Aug 2021 19:53:39 +0000 (09:53 -1000)]
Setting up notifications via the ML of git activity (#31)
Weston Pace [Tue, 10 Aug 2021 02:09:17 +0000 (16:09 -1000)]
Adding thisisnic and amol- as collaborators (#14)
Weston Pace [Sun, 8 Aug 2021 06:56:12 +0000 (20:56 -1000)]
Created a .nojekyll file to tell github not to run jekyll on the gh-pages (#24)
Weston Pace [Fri, 6 Aug 2021 22:49:10 +0000 (12:49 -1000)]
Fixing depend target in deploy workflow to updated name of build target
Nic [Fri, 6 Aug 2021 22:46:06 +0000 (22:46 +0000)]
Also run CI on pull requests to the main branch (#21)
* Also run CI on pull requests to the main branch
* Separate out test/deploy scripts
* add tests back into deploy stage
* rename jobs for consistency
Nic [Fri, 6 Aug 2021 21:35:04 +0000 (21:35 +0000)]
#17 no pacman (#18)
* remove redundant dependencies
* another redundant dependency
Weston Pace [Fri, 6 Aug 2021 07:53:27 +0000 (21:53 -1000)]
Prevent force pushes to main (#15)
Weston Pace [Fri, 6 Aug 2021 07:52:21 +0000 (21:52 -1000)]
The build is dropping a lot of csv/feather/etc files which it's annoying to avoid so I'm adding ignore rules (#16)
Nic [Thu, 5 Aug 2021 21:42:28 +0000 (21:42 +0000)]
remove duplicated sections and add ASF license (#7)
Weston Pace [Thu, 5 Aug 2021 08:43:09 +0000 (22:43 -1000)]
Two sections were using the same name across two different Rmd files and that was causing the combined Rmd to fail (#4)
* Two sections were using the same name across two different Rmd files and that was causing the combined Rmd to fail
* Update r/content/reading_and_writing_data.Rmd
Co-authored-by: Nic <thisisnic@gmail.com>
Nic [Thu, 5 Aug 2021 07:49:11 +0000 (07:49 +0000)]
Enable issues & gh-pages branch on Github repo (#6)
Nic [Thu, 5 Aug 2021 04:32:27 +0000 (04:32 +0000)]
Refactor GH Actions job that uses third party action (#3)
* remove third-party GH actions
* re-enable some stages
Neal Richardson [Thu, 29 Jul 2021 17:05:47 +0000 (13:05 -0400)]
Trivial change to see if I can commit
Alessandro Molina [Wed, 28 Jul 2021 14:38:20 +0000 (16:38 +0200)]
Initial content for Arrow Cookbook for Python and R (#1)
* Initial Import
* R cookbook initial commit (#1)
* R Cookbook skeleton and initial chapter
* Move r test script to a separate directory
* Add Apache 2 license
* Add parquet section
* Delete files used to demonstrate failing tests in CI
* Licensing
* Add content for different formats and rearrange headings
* Small change to make the tests run on macOS
* Completed the IO section and added intersphinx with PyArrow
* Add workflow to deploy to GH pages
* Update path
* Rename chapters and fill in section titles
* Commit whitespace to trigger build
* Update bookdown job
* try new job config
* Install nightly Arrow
* Evaluate all relevant bits!
* Deploy to r dir
* Try new workflow
* update build path
* Add email and update paths
* Update job to build all cookbooks
* Delete whitespace to trigger build
* Swap order to see if this fixes build
* Install system dependencies
* Put it back on Mac so it's faster
* Separate steps to diagnose issue
* Brew not sudo
* Switching to ubuntu as I don't understand why python 2
* Don't put results in r directory
* Capitalise 'C'
* Update bookdown link so can click to fork/edit
* Add CI stage that runs tests
* Add examples of manually creating Arrow objects and writing to various formats
* Add S3 parquet
* Partitioned data
* Partitioned Data from S3
* Rename record_batch_create chunk
* CSV recipe requires pandas
* Filter parquet data on read
* Reading/Writing feather files
* remove duplicated chunk name
* tweak create
* Categorical data
* Speed up compiling
* Fix tests
* tests pass
* Data manipulation functions
* Link to compute functions
* Tweak naming
* Add contribution file
* landing page style tweak
* Improve contribution documentation
* Explicitly reference the contribution docs
* ignore build directory
* Change branch name
* Update contents
* Update CONTRIBUTING.md
* Suggestions from Grammarly
* Rename initial chapter
* Update Makefile to allow Arrow version to be specified
* Truncate license file to relevant part
* typo
* Apply suggestions from code review
Co-authored-by: Weston Pace <weston.pace@gmail.com>
* Add link to code of conduct
Co-authored-by: Ian Cook <ianmcook@gmail.com>
* Capitalise "Array"
* Update r/CONTRIBUTING.md
Co-authored-by: Ian Cook <ianmcook@gmail.com>
* Update r/content/manipulating_data.Rmd
Co-authored-by: Weston Pace <weston.pace@gmail.com>
* Update r/content/manipulating_data.Rmd
Co-authored-by: Weston Pace <weston.pace@gmail.com>
* Update r/content/manipulating_data.Rmd
Co-authored-by: Weston Pace <weston.pace@gmail.com>
* Update r/content/reading_and_writing_data.Rmd
Co-authored-by: Weston Pace <weston.pace@gmail.com>
* Update r/content/creating_arrow_objects.Rmd
Co-authored-by: Ian Cook <ianmcook@gmail.com>
* Update r/content/manipulating_data.Rmd
Co-authored-by: Ian Cook <ianmcook@gmail.com>
* Update r/content/manipulating_data.Rmd
Co-authored-by: Ian Cook <ianmcook@gmail.com>
* Apply suggestions from code review
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Ian Cook <ianmcook@gmail.com>
* Mention dependencies
* Mention that this is not the documentation
* rewording
* Add -jauto by default and indent a print
* The Apache Software Foundation
* reword
* Correct ambiguous and incorrect phrasing
* Update r/content/reading_and_writing_data.Rmd
Co-authored-by: Weston Pace <weston.pace@gmail.com>
* Update r/content/reading_and_writing_data.Rmd
Co-authored-by: Weston Pace <weston.pace@gmail.com>
* Reorder sections
* Update r/content/manipulating_data.Rmd
Co-authored-by: Ian Cook <ianmcook@gmail.com>
* Remove redundant code snippet
* Update reading CSVs
* Add in section on converting from/to Arrow Tables and tibbles
* rephrase list of numbers
* rephrase list of numbers
* Add missing bracket
* Rephrase about parquet containing multiple cols
* rephrased
* Adapt to Arrow 5.0 output
Co-authored-by: Nic <thisisnic@gmail.com>
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Ian Cook <ianmcook@gmail.com>
Wes McKinney [Wed, 14 Jul 2021 21:42:28 +0000 (16:42 -0500)]
Initial commit