opennlp-addons.git
Koji Sekiguchi [Thu, 27 Sep 2018 01:58:12 +0000 (10:58 +0900)] 
Merge pull request #3 from kojisekig/OPENNLP-1221

OPENNLP-1221: FeatureGeneratorUtil.tokenFeature() is too specific for…

koji [Thu, 27 Sep 2018 01:56:13 +0000 (10:56 +0900)] 
OPENNLP-1221: FeatureGeneratorUtil.tokenFeature() is too specific for some languages

Koji Sekiguchi [Wed, 26 Sep 2018 01:37:59 +0000 (10:37 +0900)] 
Merge pull request #2 from kojisekig/addBuildXml

OPENNLP-1201: temporarily add an Ant build.xml because this project …

koji [Wed, 26 Sep 2018 01:35:19 +0000 (10:35 +0900)] 
OPENNLP-1201: temporarily add an Ant build.xml because this project depends on opennlp-tools-1.9.1-SNAPSHOT.jar, which hasn't been released yet

Koji Sekiguchi [Tue, 25 Sep 2018 06:40:51 +0000 (15:40 +0900)] 
Merge pull request #1 from kojisekig/OPENNLP-1201

OPENNLP-1201: add auxiliary info support to token in TokenNameFinder.…

koji [Tue, 25 Sep 2018 06:38:15 +0000 (15:38 +0900)] 
OPENNLP-1201: add auxiliary info support to tokens in TokenNameFinder. This is useful for Japanese, as users can utilize POS tags

Jörn Kottmann [Tue, 18 Oct 2016 22:02:04 +0000 (00:02 +0200)] 
OPENNLP-860 Add .gitignore file

William Colen [Thu, 14 Jul 2016 22:09:05 +0000 (22:09 +0000)] 
OPENNLP-622 Added Morfologik license

William Colen [Thu, 14 Jul 2016 21:36:48 +0000 (21:36 +0000)] 
OPENNLP-622 Added a different OpenNLP CLI loader that includes all jars in lib folder to classpath.

William Colen [Thu, 14 Jul 2016 16:27:40 +0000 (16:27 +0000)] 
OPENNLP-622 Included transitive dependencies

William Colen [Thu, 14 Jul 2016 16:26:43 +0000 (16:26 +0000)] 
OPENNLP-622 Fixed CLI launcher

William Colen [Fri, 8 Jul 2016 19:18:54 +0000 (19:18 +0000)] 
OPENNLP-622 Fixed issues related to command line.

William Colen [Fri, 8 Jul 2016 03:53:06 +0000 (03:53 +0000)] 
OPENNLP-622 Added distribution assembly files

William Colen [Fri, 8 Jul 2016 03:52:14 +0000 (03:52 +0000)] 
OPENNLP-622 Fixed PosTaggerFactory and restored test.

William Colen [Thu, 7 Jul 2016 05:19:18 +0000 (05:19 +0000)] 
OPENNLP-622 Refactored to remove usage of main methods of Morfologik.

William Colen [Wed, 6 Jul 2016 21:22:38 +0000 (21:22 +0000)] 
OPENNLP-622 Updated to OpenNLP 1.6.0 and Morfologik 2.1.0

Mark Giaconia [Thu, 9 Jun 2016 20:09:01 +0000 (20:09 +0000)] 
OPENNLP-756
OPENNLP-750
Improved Regex handling in scorers and country context generator.
Upgraded Lucene dependency to 6.0.0
Fixed ProvinceProximityScorer and CountryProximityScorer
Fixed num rows returned bug
Added regex support to Country and Province in the countrycontext file, and added headers for easier editing in tools like Excel
Cleaned up some other code; will post the new CountryContext file to OPENNLP-756
All indexes should be rebuilt because of new country context file format returned from the gazetteerIndexer class

Jörn Kottmann [Thu, 6 Aug 2015 13:22:59 +0000 (13:22 +0000)] 
OPENNLP-805 Added arguments to main for hard coded paths.

Jörn Kottmann [Mon, 3 Aug 2015 08:50:44 +0000 (08:50 +0000)] 
No jira, javadoc clean up. Removed author tags.

Jörn Kottmann [Mon, 3 Aug 2015 08:45:56 +0000 (08:45 +0000)] 
Added missing AL 2.0 header.

Jörn Kottmann [Mon, 3 Aug 2015 08:44:20 +0000 (08:44 +0000)] 
OPENNLP-803 Updated the OpenNLP Tools dependency to the latest release version.

Mark Giaconia [Mon, 2 Feb 2015 22:07:45 +0000 (22:07 +0000)] 
OPENNLP-756
Many small changes in a few classes due to the REGEX support in the country context file. The country context file is now capable of regex. A bug was also fixed in the AdminBoundaryContextGenerator which improved the performance of the ProvinceProximityScorer.

Jörn Kottmann [Mon, 2 Feb 2015 07:58:44 +0000 (07:58 +0000)] 
OPENNLP-754 Corrected spelling of context in a println statement.

Mark Giaconia [Sat, 31 Jan 2015 18:00:43 +0000 (18:00 +0000)] 
OPENNLP-750
Now the constructor of the AdminBoundaryContextGenerator will throw an IOException out to the GeoEntityLinker's init method if any of the following conditions are met:
the path to the file is empty or null
the file specified is not there
the file has no data in it resulting in an empty set of AdminBoundary data
This forces the EntityLinkerFactory to throw an IOException when it calls the init method while instantiating the GeoEntityLinker
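The three failure conditions listed above could be checked roughly as in this sketch. It is not the actual AdminBoundaryContextGenerator code: the class and method names (`ContextFileLoader`, `load`) are hypothetical, and "an empty set of AdminBoundary data" is approximated as "no non-blank lines in the file".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical loader illustrating fail-fast validation of a context file.
class ContextFileLoader {

  static List<String> load(String path) throws IOException {
    // Condition 1: the path is null or empty.
    if (path == null || path.trim().isEmpty()) {
      throw new IOException("context file path is null or empty");
    }
    // Condition 2: the file does not exist (or is not a regular file).
    Path file = Paths.get(path);
    if (!Files.isRegularFile(file)) {
      throw new IOException("context file not found: " + path);
    }
    // Condition 3: the file yields no usable (non-blank) lines.
    List<String> lines = Files.readAllLines(file).stream()
        .filter(line -> !line.trim().isEmpty())
        .collect(Collectors.toList());
    if (lines.isEmpty()) {
      throw new IOException("context file contains no data: " + path);
    }
    return lines;
  }
}
```

Throwing a checked `IOException` rather than returning an empty list lets a caller's init method propagate the failure instead of silently continuing with no boundary data.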

Jörn Kottmann [Wed, 29 Oct 2014 19:15:23 +0000 (19:15 +0000)] 
OPENNLP-579 Updated the Entity Linker interface

Jörn Kottmann [Wed, 29 Oct 2014 18:46:28 +0000 (18:46 +0000)] 
OPENNLP-728 Fixed javadoc errors which caused build failures when built on Java 8

Mark Giaconia [Sun, 24 Aug 2014 01:45:48 +0000 (01:45 +0000)] 
OPENNLP-637
There was an invalid comparison in equals and toHashCode inside GazetteerEntry. Fixed. Also added better checks inside the geoentitylinker to ensure no dupes are added across where clauses.
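A common way to keep `equals` and `hashCode` consistent, so that a `HashSet` drops duplicate hits, is sketched below. `GazetteerEntrySketch` and its fields are hypothetical; this log does not state which fields the real GazetteerEntry compares.

```java
import java.util.Objects;

// Hypothetical stand-in for a gazetteer hit; the real class's fields may differ.
final class GazetteerEntrySketch {
  private final String source;  // which gazetteer the hit came from
  private final String itemId;  // identifier within that gazetteer
  private final double score;   // ranking score, deliberately excluded from equality

  GazetteerEntrySketch(String source, String itemId, double score) {
    this.source = source;
    this.itemId = itemId;
    this.score = score;
  }

  double getScore() {
    return score;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof GazetteerEntrySketch)) {
      return false;
    }
    GazetteerEntrySketch other = (GazetteerEntrySketch) o;
    // Compare exactly the fields that hashCode() uses, so equal objects hash alike.
    return Objects.equals(source, other.source) && Objects.equals(itemId, other.itemId);
  }

  @Override
  public int hashCode() {
    return Objects.hash(source, itemId);
  }
}
```

Because the score is excluded, the same place retrieved by two different where clauses with different scores still collapses to a single element in a `HashSet`.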

Mark Giaconia [Sun, 24 Aug 2014 00:51:39 +0000 (00:51 +0000)] 
OPENNLP-706
Added score normalization for all gazetteerEntries across all where clauses for each name; this score is now part of the sort. Also improved the PlacetypeScorer to include the two main USGS gazetteer types, Populated Place and CIVIL. Seems to be performing better on test data.

Mark Giaconia [Mon, 18 Aug 2014 14:49:42 +0000 (14:49 +0000)] 
OPENNLP-706
Fixed caching, ensured indexing and searching use the same analyzer wrapper, and included the ProvinceProximityScorer

Mark Giaconia [Fri, 15 Aug 2014 19:59:04 +0000 (19:59 +0000)] 
OPENNLP-706
Significant fix to the USGS indexing so that state names are properly discovered and weighted; added a placename Dice coefficient over bigrams to the descending sort.
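A placename Dice coefficient over bigrams can be computed as in this generic sketch. It assumes character bigrams of lowercased names counted as multisets; the commit message does not specify those details, and the class name `DiceBigram` is hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative Dice coefficient over character bigrams; not the GeoEntityLinker code.
class DiceBigram {

  // Counts each character bigram of the lowercased string (a multiset).
  static Map<String, Integer> bigrams(String s) {
    Map<String, Integer> counts = new HashMap<>();
    String t = s.toLowerCase(Locale.ROOT);
    for (int i = 0; i + 1 < t.length(); i++) {
      counts.merge(t.substring(i, i + 2), 1, Integer::sum);
    }
    return counts;
  }

  // Dice = 2 * overlap / (|A| + |B|), where A and B are the bigram multisets.
  static double dice(String a, String b) {
    Map<String, Integer> ba = bigrams(a);
    Map<String, Integer> bb = bigrams(b);
    int sizeA = ba.values().stream().mapToInt(Integer::intValue).sum();
    int sizeB = bb.values().stream().mapToInt(Integer::intValue).sum();
    if (sizeA + sizeB == 0) {
      // Both strings are shorter than two characters: fall back to direct comparison.
      return a.equalsIgnoreCase(b) ? 1.0 : 0.0;
    }
    int overlap = 0;
    for (Map.Entry<String, Integer> e : ba.entrySet()) {
      overlap += Math.min(e.getValue(), bb.getOrDefault(e.getKey(), 0));
    }
    return 2.0 * overlap / (sizeA + sizeB);
  }
}
```

For example, "Paris" has 4 bigrams and "Parish" has 5, all 4 of which are shared, giving 2 × 4 / (4 + 5) = 8/9.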

Mark Giaconia [Fri, 15 Aug 2014 18:10:51 +0000 (18:10 +0000)] 
OPENNLP-706
Significant fix to the indexing so that country names are properly discovered. Added a type-boosting scorer, and added descending sort to the output of each call to the GeoEntityLinker. Also did some general cleanup. Made the number of matches returned from the gazetteer configurable via a property.

Mark Giaconia [Wed, 13 Aug 2014 12:28:23 +0000 (12:28 +0000)] 
OPENNLP-706
Addressed issues from Joern's code review, also made use of hierarchy configurable, as well as added boosting at index time to administrative boundary types and populated place types so that these hits are more heavily weighted in the index.

Mark Giaconia [Fri, 11 Jul 2014 01:04:58 +0000 (01:04 +0000)] 
OPENNLP-706
OPENNLP-707
OPENNLP-708
OPENNLP-709
OPENNLP-710
Addressed each ticket. Also adjusted the package structure a bit to separate responsibility better.

Mark Giaconia [Tue, 20 May 2014 17:18:03 +0000 (17:18 +0000)] 
OPENNLP-699
Due to the move of the MarkableFileInputStreamFactory and MarkableFileInputStream classes to utils, the import changed in GenericModelableImpl

Mark Giaconia [Mon, 19 May 2014 14:26:39 +0000 (14:26 +0000)] 
OPENNLP-698
Fixed cleanInput() method so it handles multi-token names. There is now a property that can be added to the entitylinker.properties file to define whether or not names are wrapped in double quotes.
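One way to sanitize a multi-token name and optionally quote it, driven by a property, is sketched below. The property key and the choice to replace Lucene query-syntax characters with spaces are assumptions for illustration; the commit message does not specify either.

```java
import java.util.Properties;

// Hypothetical cleanInput-style sanitizer; not the actual GeoEntityLinker code.
class CleanInputSketch {

  // Hypothetical key; the real entitylinker.properties key may differ.
  static final String USE_QUOTES_KEY = "example.gazetteer.usequotes";

  // Characters with special meaning in Lucene's classic query syntax.
  private static final String LUCENE_SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

  static String cleanInput(String name, Properties props) {
    StringBuilder sb = new StringBuilder(name.length());
    for (char c : name.toCharArray()) {
      // Replace query-syntax characters with spaces instead of escaping them.
      sb.append(LUCENE_SPECIAL.indexOf(c) >= 0 ? ' ' : c);
    }
    // Collapse whitespace so a multi-token name becomes one single-spaced phrase.
    String cleaned = sb.toString().trim().replaceAll("\\s+", " ");
    boolean useQuotes = Boolean.parseBoolean(props.getProperty(USE_QUOTES_KEY, "false"));
    // In the classic query parser, double quotes turn the tokens into a phrase query.
    return useQuotes ? "\"" + cleaned + "\"" : cleaned;
  }
}
```

With the property unset, `"  New   York! "` becomes `New York`; with it set to `true`, the result is `"New York"` in quotes.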

Mark Giaconia [Mon, 19 May 2014 13:18:06 +0000 (13:18 +0000)] 
OPENNLP-698
Fixed cleanInput() method so it handles multi token names. Names are now returned in double quotes.

Mark Giaconia [Mon, 12 May 2014 19:30:21 +0000 (19:30 +0000)] 
OPENNLP-693
OPENNLP-694
OPENNLP-692
Added log4j logging. Added Lucene spatial. Removed the optional tags from the pom for the Lucene dependency. Also added string sanitizing to the Gazetteer searcher so Lucene will stop logging syntax problems on noisy NER results.

Mark Giaconia [Fri, 7 Mar 2014 12:15:35 +0000 (12:15 +0000)] 
OPENNLP-664
Fixed, now country codes are no longer ignored.

Mark Giaconia [Wed, 5 Mar 2014 14:21:54 +0000 (14:21 +0000)] 
OPENNLP-630
Fixed toString() in LinkedSpan and BaseLink to be more friendly to the CLI tool (and others).

Mark Giaconia [Fri, 28 Feb 2014 12:21:43 +0000 (12:21 +0000)] 
OPENNLP-630
Fixed println to be more friendly to the cli tool (and others). Also did some general cleanup like spelling errors and indexing changes

Jörn Kottmann [Fri, 21 Feb 2014 15:39:10 +0000 (15:39 +0000)] 
No jira, fixed compile error by reverting back to deprecated API. Let's update it later.

Jörn Kottmann [Thu, 20 Feb 2014 09:58:33 +0000 (09:58 +0000)] 
OPENNLP-636 Trainer now uses init method instead of constructor to initialize the component.

Jörn Kottmann [Tue, 18 Feb 2014 13:39:36 +0000 (13:39 +0000)] 
OPENNLP-574 Moved from addons to sandbox to mature there.

Mark Giaconia [Sun, 16 Feb 2014 21:51:41 +0000 (21:51 +0000)] 
OPENNLP-615
Greatly simplified fuzzy string match scoring by normalizing the Levenshtein distance output by Lucene, and fixed a bug in the filtering of hits below the threshold. Refined deduping logic a bit, and made the default bag-of-words radius for doccat larger, which improved scores in testing.
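Normalizing an edit distance into a score in [0, 1] and filtering hits below a threshold could look like this sketch. It computes the Levenshtein distance directly instead of taking it from Lucene, and the class name and threshold parameter are hypothetical.

```java
// Illustrative normalized edit distance; computes Levenshtein directly rather than via Lucene.
class NormalizedLevenshtein {

  // Classic dynamic-programming edit distance: insert, delete, substitute each cost 1.
  static int distance(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j;
    }
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      // Reuse the two rows instead of allocating a full matrix.
      int[] tmp = prev;
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }

  // Maps the distance to a similarity in [0, 1]; 1.0 means the strings are identical.
  static double similarity(String a, String b) {
    int maxLen = Math.max(a.length(), b.length());
    if (maxLen == 0) {
      return 1.0;
    }
    return 1.0 - (double) distance(a, b) / maxLen;
  }

  // Keeps a hit only if its normalized score reaches the (hypothetical) threshold.
  static boolean passes(String query, String hit, double threshold) {
    return similarity(query, hit) >= threshold;
  }
}
```

For instance, "kitten" and "sitting" are 3 edits apart, so their similarity is 1 - 3/7 ≈ 0.57, and the hit would be dropped under an arbitrary threshold of 0.75.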

Mark Giaconia [Thu, 13 Feb 2014 20:23:28 +0000 (20:23 +0000)] 
OPENNLP-637
Greatly simplified point clustering, and ensured no duplication is created.

Mark Giaconia [Sat, 8 Feb 2014 18:50:36 +0000 (18:50 +0000)] 
OPENNLP-615
Fixed doccat training portion due to recent changes using inputStreamFactory

Mark Giaconia [Sat, 8 Feb 2014 18:23:54 +0000 (18:23 +0000)] 
OPENNLP-607
Fixed training portion due to recent changes using inputStreamFactory

Mark Giaconia [Sun, 2 Feb 2014 14:39:41 +0000 (14:39 +0000)] 
OPENNLP-600
Changed to MockInputStreamFactory for training

Mark Giaconia [Sun, 2 Feb 2014 14:39:13 +0000 (14:39 +0000)] 
OPENNLP-600
Changed to MockInputStreamFactory for doccat model training

Mark Giaconia [Sat, 18 Jan 2014 20:03:54 +0000 (20:03 +0000)] 
OPENNLP-637
OPENNLP-639
Fixed and optimized GazateerSearcher to cache properly. Added hashCode and equals to the gazetteer entry and ensured no duplicates are returned.

Mark Giaconia [Sun, 12 Jan 2014 14:44:54 +0000 (14:44 +0000)] 
OPENNLP-579
Many efficiency improvements. Fails gracefully if any resources are missing (gazetteers, countrycontext data, etc.)
Updated javadocs and comments

Mark Giaconia [Fri, 10 Jan 2014 14:15:08 +0000 (14:15 +0000)] 
OPENNLP-615
Improved logic for when the ModelBasedScorer property is missing, has no configured model, or the model file does not exist

Mark Giaconia [Fri, 10 Jan 2014 13:12:02 +0000 (13:12 +0000)] 
moved from sandbox

Mark Giaconia [Fri, 10 Jan 2014 13:11:24 +0000 (13:11 +0000)] 
moved from sandbox

Jörn Kottmann [Wed, 8 Jan 2014 20:39:08 +0000 (20:39 +0000)] 
OPENNLP-574 Initial work to integrate Mahout's Logistic Regression Classifiers

Jörn Kottmann [Thu, 2 Jan 2014 15:13:27 +0000 (15:13 +0000)] 
OPENNLP-624 Removed test code

Jörn Kottmann [Thu, 2 Jan 2014 11:12:02 +0000 (11:12 +0000)] 
Added LICENSE and NOTICE files for addons

Jörn Kottmann [Thu, 2 Jan 2014 10:46:51 +0000 (10:46 +0000)] 
OPENNLP-624 Training parameters can now be adjusted

Jörn Kottmann [Thu, 2 Jan 2014 09:36:15 +0000 (09:36 +0000)] 
OPENNLP-624 Updated liblinear from 1.92 to 1.94

Jörn Kottmann [Wed, 1 Jan 2014 18:16:26 +0000 (18:16 +0000)] 
OPENNLP-624 Fixed a compilation error, and model can now be loaded and serialized

Jörn Kottmann [Tue, 3 Dec 2013 15:09:57 +0000 (15:09 +0000)] 
OPENNLP-624 Initial check in of the liblinear integration

William Colen [Mon, 2 Dec 2013 13:23:04 +0000 (13:23 +0000)] 
OPENNLP-622 Added code to create Morfologik data from TSV or OpenNLP XML tag dictionaries. Created a TagDictionary implementation using Morfologik. Added a POSTaggerFactory to bundle the Morfologik dictionaries in POS Tagger models.

Jörn Kottmann [Wed, 20 Nov 2013 10:15:36 +0000 (10:15 +0000)] 
OPENNLP-582 Added JWNL based lemmatizer implementation. Thanks to Rodrigo Agerri for providing a patch

Jörn Kottmann [Thu, 14 Nov 2013 21:24:13 +0000 (21:24 +0000)] 
OPENNLP-582 Added morfologik addon. Thanks to Rodrigo Agerri for providing a patch.