Discussion:
[POLL] What should happen to PyLucene now?
Jan Høydahl
2016-07-01 09:32:23 UTC
Permalink
Hi

As you all know not much has happened with PyLucene lately.
So I’m throwing out this poll to check the sentiment of the community.

Question: What should happen to PyLucene now?

[ ] I’m happy with the last 4.x release, no need for new releases
[ ] Please, a new 6.x release (but I can’t contribute)
[ ] I’ll help make a new release happen, if I get some help!
[ ] Only care about the JCC part
[ ] Close down the sub project
[ ] Don’t care. I’m no longer a user
[ ] Other: ______________

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Lucene committer & PMC member
Mark Csaba
2016-07-01 10:12:32 UTC
Permalink
Hello :)

-----Original Message-----
From: Jan Høydahl [mailto:***@cominvent.com]
Sent: Friday, July 01, 2016 11:32 AM
To: pylucene-***@lucene.apache.org
Subject: [POLL] What should happen to PyLucene now?

Hi

As you all know not much has happened with PyLucene lately.
So I’m throwing out this poll to check the sentiment of the community.

Question: What should happen to PyLucene now?

[ ] I’m happy with the last 4.x release, no need for new releases

[X] Please, a new 6.x release (but I can’t contribute)

[ ] I’ll help make a new release happen, if I get some help!
[ ] Only care about the JCC part
[ ] Close down the sub project
[ ] Don’t care. I’m no longer a user
[ ] Other: ______________

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Lucene committer & PMC member
Dirk Rothe
2016-07-01 12:41:32 UTC
Permalink
Post by Jan Høydahl
Hi
As you all know not much has happened with PyLucene lately.
So I’m throwing out this poll to check the sentiment of the community.
Question: What should happen to PyLucene now?
[X] Still mostly happy with the 3.6 release, pondering for at least 2
years whether to migrate to 4.x or elasticsearch.

And really grateful for the excellent job of Andi and the other
contributors. Thanx!
Post by Jan Høydahl
[ ] I’m happy with the last 4.x release, no need for new releases
[ ] Please, a new 6.x release (but I can’t contribute)
[ ] I’ll help make a new release happen, if I get some help!
[ ] Only care about the JCC part
[ ] Close down the sub project
[ ] Don’t care. I’m no longer a user
[ ] Other: ______________
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Lucene committer & PMC member
Joe Cabrera
2016-07-01 13:12:57 UTC
Permalink
[X] I’ll help make a new release happen, if I get some help!
--
Joe Cabrera,
eminorlabs.com
Andi Vajda
2016-07-01 15:22:26 UTC
Permalink
[X] I’ll help make a new release happen, if I get some help!
The tests need porting :-)

Andi..
Jeff Breidenbach
2016-07-01 16:30:20 UTC
Permalink
[x] I’m happy with the last 4.x release, no need for new releases

With emphasis on the "happy" part. Not meant to be discouraging in any way.
Marc Jeurissen
2016-07-01 10:23:24 UTC
Permalink
Post by Jan Høydahl
Hi
As you all know not much has happened with PyLucene lately.
So I’m throwing out this poll to check the sentiment of the community.
Question: What should happen to PyLucene now?
[ ] I’m happy with the last 4.x release, no need for new releases
[X] Please, a new 6.x release (but I can’t contribute)
[ ] I’ll help make a new release happen, if I get some help!
[ ] Only care about the JCC part
[ ] Close down the sub project
[ ] Don’t care. I’m no longer a user
[ ] Other: ______________
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Lucene committer & PMC member
--
Kind regards,

Marc Jeurissen

Bibliotheek UAntwerpen
Stadscampus - S.A.085
Prinsstraat 9 - 2000 Antwerpen
***@uantwerpen.be
T +32 3 265 49 71
<http://anet.be>
Alexander Yaworsky
2016-07-01 16:35:43 UTC
Permalink
Well, this bothered me (I'm not a dev, but I fixed some of your bugs locally
long, long ago; why I didn't send patches is another story). Here's my
opinion, as a user. 1. Being in sync with Lucene is a must. 2. Being in sync
with Python is a must. Therefore,
Post by Jan Høydahl
Question: What should happen to PyLucene now?
[ ] I’m happy with the last 4.x release, no need for new releases
[ ] Please, a new 6.x release (but I can’t contribute)
[ ] I’ll help make a new release happen, if I get some help!
[X] Only care about the JCC part
[X] Close down the sub project -- IF YOU ARE UNABLE TO MAINTAIN
[ ] Don’t care. I’m no longer a user
[X] Other: Move JCC to P3
Actually, the brilliant part of this project is JCC. At a company I
work for we still use it to call Java libraries from Python. This
is the fastest solution, and this sub-project must exist separately,
imo. We haven't used Lucene since the '00s, btw.

Thanks.

Alexander.
Greg Bowyer
2016-07-01 16:42:27 UTC
Permalink
Would it make more sense to move to py4j, like Spark uses? JCC is awesome
tech, but installing it is a pain.
Alexander Yaworsky
2016-07-01 16:56:18 UTC
Permalink
No, it wouldn't. I haven't tried py4j, just looked at the docs, and the key
word I see is 'dynamic'. We used dynamic wrappers (I don't remember the
exact names, that was too long ago), but the problem with using JNI that way
is poor performance. JCC, in contrast, generates wrappers that provide
outstanding performance. This is what I love it for.

It's not a pain, I should say. Quite easy if you clearly understand
what it does :)

I would move it to P3 myself, but I don't have enough time, and P2 support
still gives 3.5 years for that :)

Alexander.
Aric Coady
2016-07-02 18:30:55 UTC
Permalink
[X] I’ll help make a new release happen, if I get some help!
Post by Alexander Yaworsky
Well, this bothered me (not a dev but fixed some of your bugs locally
long long ago, why didn't send patches is another story). Here's my
opinion, as a user. 1. Be in sync with lucene is a must. 2. Be in sync
with python is a must. Therefore,
And +1 on staying current with lucene and python.
Andi Vajda
2016-07-10 21:42:16 UTC
Permalink
Thank you Jan for starting this thread !

Of the nine people that responded, three were interested in a new 6.x
release, with two offering to help make a new release happen.

A couple of others showed interest in JCC only.

Here is what I can propose:
1. I can make sure PyLucene can be built from Lucene 6.x and runs.
2. Volunteers should then help in porting old 4.x tests, if they still
apply, and import new tests from the current Lucene suite as they see
fit.
3. Once everyone involved is happy with test coverage (which was never
exhaustive and need not be), a new release can be rolled and put to the
Lucene PMC for a vote again.

If any of these steps ends up stalling, no new release happens and the
PyLucene subproject gets shut down, eventually.

As for JCC, regardless of what happens to PyLucene itself, I'd very much
like to port it to Python 3. I've already done this once, the port is
available in a branch [1]. It 'just' needs to be refreshed. I intend to
eventually get to this, unless someone with a stronger itch beats me to it.

Andi..

[1] http://svn.apache.org/repos/asf/lucene/pylucene/branches/python_3/jcc/
Andi Vajda
2016-08-22 15:37:24 UTC
Permalink
Post by Andi Vajda
Thank you Jan for starting this thread !
Of the nine people that responded, three were interested in a new 6.x
release, with two offering to help make a new release happen.
A couple of others showed interest in JCC only.
1. I can make sure PyLucene can be built from Lucene 6.x and runs.
PyLucene can now be built from Lucene's branch 6.x, on Mac OS X.
It builds, loads, and can run a couple of simple tests such as
test_Binary.py and test_BinaryDocument.py.

Here is how one can reproduce what I just did:
- cd ~/apache
- git clone --branch branch_6x https://github.com/apache/lucene-solr.git lucene.6x
- cd <pylucene dir>
- svn update
make sure you have a modern setuptools (if you are on linux, the
setuptools patching done by JCC to be able to build a plain shared
library most likely needs to be refreshed or maybe even eliminated).
- _install/bin/pip uninstall setuptools
- _install/bin/pip install setuptools
- cd jcc
- ../_install/bin/python setup.py build install
- cd ..
- make sources (this copies the lucene tree from the github tree cloned)
- make compile install
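The steps above can also be sketched as a small driver script. This is only an illustration, not part of the PyLucene build: the function names build_steps and run, the dry_run switch and the directory arguments are assumptions made for the example, and it needs Python 3.5 or later for subprocess.run. By default it only prints the commands. Note that "pip uninstall" asks for confirmation when actually executed.

```python
import os
import subprocess


def build_steps(pylucene_dir, apache_dir="~/apache"):
    """Return (working directory, argv) pairs mirroring the steps listed above.

    Both directory arguments are placeholders to adapt to the local checkout.
    """
    apache = os.path.expanduser(apache_dir)
    root = os.path.abspath(os.path.expanduser(pylucene_dir))
    bin_dir = os.path.join(root, "_install", "bin")  # virtual env inside the checkout
    pip = os.path.join(bin_dir, "pip")
    python = os.path.join(bin_dir, "python")
    return [
        (apache, ["git", "clone", "--branch", "branch_6x",
                  "https://github.com/apache/lucene-solr.git", "lucene.6x"]),
        (root, ["svn", "update"]),
        (root, [pip, "uninstall", "setuptools"]),
        (root, [pip, "install", "setuptools"]),
        (os.path.join(root, "jcc"), [python, "setup.py", "build", "install"]),
        (root, ["make", "sources"]),
        (root, ["make", "compile", "install"]),
    ]


def run(steps, dry_run=True):
    """Print every step; execute it only when dry_run is False."""
    for cwd, argv in steps:
        print("(cd %s && %s)" % (cwd, " ".join(argv)))
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)  # stop at the first failure


if __name__ == "__main__":
    run(build_steps("pylucene"))  # dry run: only prints the commands
```

Keeping each step as a (directory, argv) pair avoids relying on "cd" between commands, so a failed step cannot leave later steps running in the wrong directory.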

If all worked, you can then:
- _install/bin/python
  >>> import lucene
  >>> lucene.initVM()
- _install/bin/python test/test_Binary.py

I have a Python virtual env installed in pylucene/_install; this helps with
keeping different versions of software separate.
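The interactive check above can be wrapped in a function, for instance to run right after a build. A minimal sketch: the name lucene_smoke_test is made up for this example, and it assumes the lucene module exposes a VERSION string alongside initVM(). It returns None when PyLucene cannot be imported, so it is safe to run on a machine where the build has not happened yet.

```python
def lucene_smoke_test():
    """Return lucene.VERSION after starting the JVM, or None if PyLucene is absent."""
    try:
        import lucene  # only importable once "make compile install" has succeeded
    except ImportError:
        return None
    lucene.initVM()  # starts the embedded JVM, as in the interactive session above
    return lucene.VERSION


if __name__ == "__main__":
    version = lucene_smoke_test()
    print("PyLucene not installed" if version is None else "PyLucene %s" % version)
```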
Post by Andi Vajda
2. Volunteers should then help in porting old 4.x tests, if they still
apply, and import new tests from the current Lucene suite as they see
fit.
All other tests need to be carefully ported to match all the numerous API
changes and disappeared classes. For similar reasons, the extensions jar
does not build and is not currently included in the build. Its source java
classes need to be refreshed as tests get refreshed to 6.x.

Andi..
Andi Vajda
2016-09-03 16:34:58 UTC
Permalink
PyLucene now builds and passes all its tests on Mac OS X and Linux.
It is thus in a state where a release candidate could be built and submitted
for review.

A volunteer is requested to build and test PyLucene's trunk on Windows. If
no one comes forward, I still intend to try to release PyLucene 6.2 in a few
weeks.

Thanks !

Andi..
Dirk Rothe
2016-09-05 15:42:02 UTC
Permalink
Post by Andi Vajda
PyLucene now builds and passes all its tests on Mac OS X and Linux.
It is thus in a state where a release candidate could be built and
submitted for review.
A volunteer is requested to build and test PyLucene's trunk on Windows.
If no one comes forward, I intend to try to release PyLucene 6.2 in a few
weeks, still.
Nice Job!

I've successfully built PyLucene 6.2 on Windows. Most tests pass:
* skipped the three test_ICU* due to missing "import icu"
* fixed test_PyLucene.py by ignoring open file handles (os.error) in
shutil.rmtree() in Test_PyLuceneWithFSStore.tearDown()

* then stuff like these in test_PythonDirectory.py
======================================================================
ERROR: test_FieldEnumeration (__main__.PythonDirectoryTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\vls-lucene43\misc\pylucene_62\test\test_PyLucene.py", line 234, in test_FieldEnumeration
    self.test_indexDocument()
  File "D:\vls-lucene43\misc\pylucene_62\test\test_PyLucene.py", line 86, in test_indexDocument
    self.closeStore(store, writer)
  File "test_PythonDirectory.py", line 254, in closeStore
    arg.close()
JavaError: <super: <class 'JavaError'>, <JavaError object>>
Java stacktrace:
java.lang.RuntimeException: InvalidArgsError
    at org.apache.pylucene.store.PythonDirectory.deleteFile(Native Method)
    at org.apache.lucene.store.LockValidatingDirectoryWrapper.deleteFile(LockValidatingDirectoryWrapper.java:38)
    at org.apache.lucene.index.IndexFileDeleter.deleteFile(IndexFileDeleter.java:721)
    at org.apache.lucene.index.IndexFileDeleter.deleteFiles(IndexFileDeleter.java:715)
    at org.apache.lucene.index.IndexFileDeleter.deleteNewFiles(IndexFileDeleter.java:691)
    at org.apache.lucene.index.IndexWriter.flushFailed(IndexWriter.java:4929)
    at org.apache.lucene.index.DocumentsWriter$FlushFailedEvent.process(DocumentsWriter.java:758)
    at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4991)
    at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4982)
    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3372)
    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3333)
    at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1117)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1162)

* and this one in test_PythonException.py
======================================================================
ERROR: testThroughLayerException (__main__.PythonExceptionTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_PythonException.py", line 34, in testThroughLayerException
    qp.parse("foo bar")
JavaError: <super: <class 'JavaError'>, <JavaError object>>
Java stacktrace:
java.lang.RuntimeException: TestException
    at org.apache.pylucene.queryparser.classic.PythonQueryParser.getFieldQuery_quoted(Native Method)
    at org.apache.pylucene.queryparser.classic.PythonQueryParser.getFieldQuery(Unknown Source)
    at org.apache.lucene.queryparser.classic.QueryParser.MultiTerm(QueryParser.java:585)
    at org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:198)
    at org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:187)
    at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:111)
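
The tearDown() workaround mentioned in the list above (ignoring os.error raised by shutil.rmtree() when a file handle is still open) might look roughly like the following. This is a sketch of the idea only, not the actual change made to test_PyLucene.py: the class name StoreCleanupSketch, the store_dir attribute and the test method are invented for the example.

```python
import os
import shutil
import tempfile
import unittest


class StoreCleanupSketch(unittest.TestCase):
    """Hypothetical stand-in for a test case that owns an on-disk index directory."""

    def setUp(self):
        self.store_dir = tempfile.mkdtemp(prefix="store-sketch-")

    def tearDown(self):
        # On Windows a file that is still open cannot be deleted, so rmtree()
        # raises os.error (an alias of OSError). Ignore it instead of failing.
        try:
            shutil.rmtree(self.store_dir)
        except os.error:
            pass

    def test_directory_exists(self):
        self.assertTrue(os.path.isdir(self.store_dir))
```

The trade-off is that a directory may be left behind in the temp folder when cleanup fails, which is usually acceptable for a test run. Saved as a module, the sketch can be run with "python -m unittest".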

--dirk
Andi Vajda
2016-09-05 19:27:46 UTC
Permalink
Post by Dirk Rothe
Nice Job!
* skipped the three test_ICU* due to missing "import icu"
Yes, for this you need to install PyICU: https://github.com/ovalhub/pyicu
Post by Dirk Rothe
* fixed test_PyLucene.py by ignoring open file handles (os.error) in
shutil.rmtree() in Test_PyLuceneWithFSStore.tearDown()
Do you have a patch for me to apply ?
Post by Dirk Rothe
* then stuff like these in test_PythonDirectory.py
[..]
Can't make sense of this one, sorry.
Post by Dirk Rothe
* and this one in test_PythonException.py
[..]
This one could be because you may not have built JCC in shared mode ?
I vaguely remember there being a problem with proper cross-boundary
exception propagation requiring JCC to be built in shared mode.
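
One way to check how JCC was built is to look at the SHARED flag on the jcc module, the same attribute reported later in this thread ("jcc.SHARED reports True"). A minimal sketch: the function name jcc_shared_mode is made up for the example, and it returns None when JCC is not importable or does not expose the flag.

```python
def jcc_shared_mode():
    """Return jcc.SHARED (True or False), or None when it cannot be determined."""
    try:
        import jcc
    except ImportError:
        return None  # JCC is not installed in this interpreter
    return getattr(jcc, "SHARED", None)


if __name__ == "__main__":
    print("jcc.SHARED:", jcc_shared_mode())
```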

Thank you for doing this!

Andi..
Post by Dirk Rothe
--dirk
Andi Vajda
2016-09-08 09:10:14 UTC
Permalink
Post by Andi Vajda
Post by Dirk Rothe
Post by Andi Vajda
A volunteer is requested to build and test PyLucene's trunk on Windows.
If no one comes forward, I intend to try to release PyLucene 6.2 in a few
weeks, still.
Nice Job!
* skipped the three test_ICU* due to missing "import icu"
Yes, for this you need to install PyICU: https://github.com/ovalhub/pyicu
I'm going to assume this would work for now.
Post by Andi Vajda
Post by Dirk Rothe
* fixed test_PyLucene.py by ignoring open file handles (os.error) in
shutil.rmtree() in Test_PyLuceneWithFSStore.tearDown()
Do you have a patch for me to apply ?
Yes, attached.
Thanks, applied.
Post by Andi Vajda
Post by Dirk Rothe
* then stuff like these in test_PythonDirectory.py
[..]
Post by Andi Vajda
Can't make sense of this one, sorry.
Post by Dirk Rothe
* and this one in test_PythonException.py
[..]
Post by Andi Vajda
This one could be because you may not have built JCC in shared mode ?
I vaguely remember there being a problem with proper cross-boundary
exception propagation requiring JCC to be built in shared mode.
jcc.SHARED reports True, so seems OK.
I don't think these Windows glitches are really problematic, and our
production code runs only in linux environments anyway.
And I'm more interested in whether porting around 3kloc lucene-interfaces
from v3.6 goes smoothly.
I've hit the first problematic case with a custom
PythonAnalyzer/PythonTokenizer where I don't see how to pass the input to the
Tokenizer implementation.
I thought maybe like this, but PythonTokenizer does not accept an INPUT
anymore (available in v4.10 and v3.6).
super(_Tokenizer, self).__init__(INPUT)
# prepare INPUT
# stuff into termAtt/offsetAtt/posIncrAtt
return Analyzer.TokenStreamComponents(_Tokenizer())
The PositionIncrementTestCase is pretty similar but initialized with static
input. Would be a nice place for an example with dynamic input, I think.
data = data_from_reader(reader)
super(_tokenStream, self).__init__()
# prepare termAtt/offsetAtt/posIncrAtt
# stuff from data into termAtt/offsetAtt/posIncrAtt
return _tokenStream()
Any hints how to get Analyzer6 working?
I've lost track of the countless API changes since 3.x.

The Lucene project does a good job at tracking them in the CHANGES.txt file,
usually pointing at the issue that tracked it, often with examples about how
to accomplish the same in the new way and the rationale behind the change.

You can also look at the PyLucene tests I just ported to 6.x. For example,
in test_Analyzers.py, you can see that Tokenizer no longer takes a reader in
its constructor; instead, one is set with setReader() after construction.

Andi..
--dirk
Dirk Rothe
2016-09-08 13:42:38 UTC
Post by Andi Vajda
super(_Tokenizer, self).__init__(INPUT)
# prepare INPUT
# stuff into termAtt/offsetAtt/posIncrAtt
return Analyzer.TokenStreamComponents(_Tokenizer())
The PositionIncrementTestCase is pretty similar but initialized with
static input. Would be a nice place for an example with dynamic input,
I think.
data = data_from_reader(reader)
super(_tokenStream, self).__init__()
# prepare termAtt/offsetAtt/posIncrAtt
# stuff from data into termAtt/offsetAtt/posIncrAtt
return _tokenStream()
Any hints how to get Analyzer6 working?
I've lost track of the countless API changes since 3.x.
The Lucene project does a good job at tracking them in the CHANGES.txt
file, usually pointing at the issue that tracked it, often with examples
about how to accomplish the same in the new way and the rationale behind
the change.
I guess we are here:
https://issues.apache.org/jira/browse/LUCENE-5388
https://svn.apache.org/viewvc?view=revision&revision=1556801
Post by Andi Vajda
You can also look at the PyLucene tests I just ported to 6.x. For
example, in test_Analyzers.py, you can see that Tokenizer no longer
takes a reader but can be set one with setReader() after construction.
Yes, I've done that pretty carefully. I think this quote points in the
right direction: "The tokenStream method takes a String or Reader and will
pass this to Tokenizer#setReader()."
from:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/201502.mbox/%3C021701d04f86$55331f10$ff995d30$@thetaphi.de%3E

I've checked the Lucene source: this happens automatically and cannot be
overridden.

So I've hacked something ugly together which seems to work.

class _Tokenizer(PythonTokenizer):
    def __init__(self, getReader):
        super(_Tokenizer, self).__init__()
        self.getReader = getReader
        self.i = 0
        self.data = []

    def incrementToken(self):
        if self.i == 0:
            self.data = data_from_reader(self.getReader())
        if self.i == len(self.data):
            # we are reused - reset
            self.i = 0
            return False
        # stuff from self.data into termAtt/offsetAtt/posIncrAtt
        self.i += 1
        return True

class Analyzer6(PythonAnalyzer):
    def createComponents(self, fieldName):
        return Analyzer.TokenStreamComponents(_Tokenizer(lambda: self._reader))

    def initReader(self, fieldName, reader):
        # capture reader
        self._reader = reader
        return reader
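
For anyone who wants to try the reuse logic of the tokenizer above without a Lucene build, here is a minimal plain-Python sketch of it. It is only a model: LazyTokenizerModel, tokens() and the whitespace split standing in for data_from_reader() are made up for illustration and are not part of PyLucene.

```python
class LazyTokenizerModel:
    """Plain-Python stand-in for the _Tokenizer above; no Lucene involved."""

    def __init__(self, get_reader):
        self.get_reader = get_reader  # callable returning the current input
        self.i = 0
        self.data = []
        self.term = None  # stands in for termAtt

    def incrementToken(self):
        if self.i == 0:
            # first call for this input: read everything up front
            # (str.split() is a hypothetical stand-in for data_from_reader())
            self.data = self.get_reader().split()
        if self.i == len(self.data):
            # exhausted: reset so that a reused instance re-reads its input
            self.i = 0
            return False
        self.term = self.data[self.i]
        self.i += 1
        return True


def tokens(tokenizer):
    """Drain the tokenizer the way a consumer loop would."""
    out = []
    while tokenizer.incrementToken():
        out.append(tokenizer.term)
    return out


# The same instance is used twice with different input, as in analyzer reuse.
current = {"reader": "a b c"}
tok = LazyTokenizerModel(lambda: current["reader"])
first = tokens(tok)
current["reader"] = "d e"
second = tokens(tok)
```

The point of the sketch is only that resetting the counter on exhaustion lets one instance pick up whatever input is current on its next use; it says nothing about whether capturing the reader in initReader() is the intended Lucene way.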

I've made initReader() python-overridable (see patch). What do you think?

--dirk
Andi Vajda
2016-09-08 13:56:58 UTC
Post by Dirk Rothe
Post by Andi Vajda
super(_Tokenizer, self).__init__(INPUT)
# prepare INPUT
# stuff into termAtt/offsetAtt/posIncrAtt
return Analyzer.TokenStreamComponents(_Tokenizer())
The PositionIncrementTestCase is pretty similar but initialized with
static input. Would be a nice place for an example with dynamic input, I
think.
data = data_from_reader(reader)
super(_tokenStream, self).__init__()
# prepare termAtt/offsetAtt/posIncrAtt
# stuff from data into termAtt/offsetAtt/posIncrAtt
return _tokenStream()
Any hints how to get Analyzer6 working?
I've lost track of the countless API changes since 3.x.
The Lucene project does a good job at tracking them in the CHANGES.txt
file, usually pointing at the issue that tracked it, often with examples
about how to accomplish the same in the new way and the rationale behind
the change.
https://issues.apache.org/jira/browse/LUCENE-5388
https://svn.apache.org/viewvc?view=revision&revision=1556801
Post by Andi Vajda
You can also look at the PyLucene tests I just ported to 6.x. For example,
in test_Analyzers.py, you can see that Tokenizer no longer takes a reader
but can be set one with setReader() after construction.
Yes, I've done that pretty carefully. I think, this quote points in the right
direction: "The tokenStream method takes a String or Reader and will pass
this to Tokenizer#setReader()."
I've checked the Lucene source: this happens automatically and cannot be
overridden.
So I've hacked something ugly together which seems to work.
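class _Tokenizer(PythonTokenizer):
    def __init__(self, getReader):
        super(_Tokenizer, self).__init__()
        self.getReader = getReader
        self.i = 0
        self.data = []

    def incrementToken(self):
        if self.i == 0:
            self.data = data_from_reader(self.getReader())
        if self.i == len(self.data):
            # we are reused - reset
            self.i = 0
            return False
        # stuff from self.data into termAtt/offsetAtt/posIncrAtt
        self.i += 1
        return True

class Analyzer6(PythonAnalyzer):
    def createComponents(self, fieldName):
        return Analyzer.TokenStreamComponents(_Tokenizer(lambda: self._reader))

    def initReader(self, fieldName, reader):
        # capture reader
        self._reader = reader
        return reader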
I've made initReader() python-overridable (see patch). What do you think?
Not sure what to think. While your change looks fine, if Lucene decided to
make this 'hard', it may be a sign that you're doing something wrong or
going the wrong way about it.

I suggest you ask on the java-***@lucene.apache.org list as you're probably
not the first one to transition from 3.x to something more recent.

Please let pylucene-dev@ know what you find out...

Andi..
Post by Dirk Rothe
--dirk
Dirk Rothe
2016-09-08 20:17:46 UTC
Post by Andi Vajda
Post by Dirk Rothe
I've made initReader() python-overridable (see patch). What do you think?
Not sure what to think. While your change looks fine, if Lucene decided
to make this 'hard', it may be a sign that you're doing something wrong
or going the wrong way about it.
probably not the first one to transition from 3.x to something more
recent.
OK.

Making Analyzer.initReader() python-overridable is also important for
use-cases like this: http://stackoverflow.com/a/10290635
So the patch should be fine independently of my usage/hack.

--dirk
Andi Vajda
2016-09-08 22:29:36 UTC
Post by Dirk Rothe
Post by Andi Vajda
Post by Dirk Rothe
I've made initReader() python-overridable (see patch). What do you think?
Not sure what to think. While your change looks fine, if Lucene decided to
make this 'hard', it may be a sign that you're doing something wrong or
going the wrong way about it.
probably not the first one to transition from 3.x to something more recent.
OK.
Making Analyzer.initReader() python-overridable is also important for
use-cases like this: http://stackoverflow.com/a/10290635
So the patch should be fine independently of my usage/hack.
Actually, your patch is not good enough. You need to add an implementation
for initReader() in all the tests that make a subclass of PythonAnalyzer
(search for createComponents() implementations). Otherwise, when initReader()
gets called from Java, you'll get a stack overflow (it'd be good, as an
aside, if I could make a better error out of that...).

Thanks !

Andi..
Post by Dirk Rothe
--dirk
Dirk Rothe
2016-09-09 07:05:18 UTC
Post by Andi Vajda
Post by Dirk Rothe
Post by Andi Vajda
Post by Dirk Rothe
I've made initReader() python-overridable (see patch). What do you think?
Not sure what to think. While your change looks fine, if Lucene
decided to make this 'hard', it may be a sign that you're doing
something wrong or going the wrong way about it.
probably not the first one to transition from 3.x to something more recent.
OK.
Making Analyzer.initReader() python-overridable is also important for
use-cases like this: http://stackoverflow.com/a/10290635
So the patch should be fine independently of my usage/hack.
Actually, your patch is not good enough. You need to add an
implementation for initReader() in all the tests that make a subclass of
PythonAnalyzer (search for createComponents() implementations)
otherwise, when initReader() gets called from Java, you'll get a stack
overflow (it'd be good, as an aside, if I could make a better error out
of that...).
OK, I see the effect in samples/PorterStemmerAnalyzer.py (I fixed some
imports there; please use the patch).

Shouldn't providing an implementation be optional? I don't understand.

--dirk
Andi Vajda
2016-09-09 08:51:48 UTC
Post by Andi Vajda
Post by Dirk Rothe
Post by Andi Vajda
Post by Dirk Rothe
I've made initReader() python-overridable (see patch). What do you think?
Not sure what to think. While your change looks fine, if Lucene decided
to make this 'hard', it may be a sign that you're doing something wrong
or going the wrong way about it.
probably not the first one to transition from 3.x to something more recent.
OK.
Making Analyzer.initReader() python-overridable is also important for
use-cases like this: http://stackoverflow.com/a/10290635
So the patch should be fine independently of my usage/hack.
Actually, your patch is not good enough. You need to add an implementation
for initReader() in all the tests that make a subclass of PythonAnalyzer
(search for createComponents() implementations) otherwise, when
initReader() gets called from Java, you'll get a stack overflow (it'd be
good, as an aside, if I could make a better error out of that...).
OK, I see the effect in samples/PorterStemmerAnalyzer.py (fixed some imports
there, use patch).
Shouldn't the need for implementation be optional? I don't understand.
Once you define a native method on a class, a native method implementation
must be provided. JCC does that but that native implementation just invokes
the python implementation on the python subclass instance. If that python
subclass has no method implementation, the inherited method is invoked
again, which in turn calls the native method again, and so on until the
stack overflows.

This could maybe be improved at the JCC level but until it is, a Python
implementation must be provided. The default initReader() method just
returns 'reader' and so should the python default implementation.
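
The loop described above can be reproduced in plain Python. This is only a toy model of the dispatch, not JCC's actual generated code: ExtensionModel, _native_initReader() and call_from_java() are hypothetical names, and Python's RecursionError stands in for the Java stack overflow.

```python
class ExtensionModel:
    """Toy stand-in for a JCC-generated extension class (not the real PythonAnalyzer)."""

    def _native_initReader(self, fieldName, reader):
        # The "native" stub just dispatches to whatever initReader()
        # the Python instance resolves to.
        return self.initReader(fieldName, reader)

    def initReader(self, fieldName, reader):
        # Inherited wrapper: forwards straight back to the native stub.
        return self._native_initReader(fieldName, reader)


class NoOverride(ExtensionModel):
    # No Python initReader(): the inherited wrapper and the stub call each other.
    pass


class WithOverride(ExtensionModel):
    def initReader(self, fieldName, reader):
        # Same behaviour as the default: hand the reader back unchanged.
        return reader


def call_from_java(analyzer, reader):
    """Model the Java side invoking the native method on the extension object."""
    try:
        return analyzer._native_initReader("field", reader)
    except RecursionError:
        return "stack overflow"


without = call_from_java(NoOverride(), "reader")
with_override = call_from_java(WithOverride(), "reader")
```

In this model the subclass without an override never terminates on its own, while the one-line override that returns 'reader' breaks the cycle, which is why every PythonAnalyzer subclass needs it for now.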

Andi..
--dirk
Andi Vajda
2016-09-09 09:51:32 UTC
Post by Andi Vajda
Post by Dirk Rothe
Post by Andi Vajda
Post by Dirk Rothe
Post by Andi Vajda
Post by Dirk Rothe
I've made initReader() python-overridable (see patch). What do you think?
Not sure what to think. While your change looks fine, if Lucene decided
to make this 'hard', it may be a sign that you're doing something wrong
or going the wrong way about it.
probably not the first one to transition from 3.x to something more recent.
OK.
Making Analyzer.initReader() python-overridable is also important for
use-cases like this: http://stackoverflow.com/a/10290635
So the patch should be fine independently of my usage/hack.
Actually, your patch is not good enough. You need to add an implementation
for initReader() in all the tests that make a subclass of PythonAnalyzer
(search for createComponents() implementations) otherwise, when
initReader() gets called from Java, you'll get a stack overflow (it'd be
good, as an aside, if I could make a better error out of that...).
OK, I see the effect in samples/PorterStemmerAnalyzer.py (fixed some
imports there, use patch).
Shouldn't the need for implementation be optional? I don't understand.
Once you define a native method on a class, a native method implementation
must be provided. JCC does that but that native implementation just invokes
the python implementation on the python subclass instance. If that python
subclass has no method implementation, the inherited method is invoked again,
which in turn calls the native method again, and so on until the stack
overflows.
This could maybe be improved at the JCC level but until it is, a Python
implementation must be provided. The default initReader() method just returns
'reader' and so should the python default implementation.
I just did this so that I could produce a new RC and restart the voting
process. I added initReader() and all needed implementations in tests and
samples.

Thanks !

Andi..
Post by Andi Vajda
Andi..
Post by Dirk Rothe
--dirk
Andi Vajda
2016-09-09 08:52:50 UTC
Post by Andi Vajda
Post by Dirk Rothe
Post by Andi Vajda
Post by Dirk Rothe
I've made initReader() python-overridable (see patch). What do you think?
Not sure what to think. While your change looks fine, if Lucene decided
to make this 'hard', it may be a sign that you're doing something wrong
or going the wrong way about it.
probably not the first one to transition from 3.x to something more recent.
OK.
Making Analyzer.initReader() python-overridable is also important for
use-cases like this: http://stackoverflow.com/a/10290635
So the patch should be fine independently of my usage/hack.
Actually, your patch is not good enough. You need to add an implementation
for initReader() in all the tests that make a subclass of PythonAnalyzer
(search for createComponents() implementations) otherwise, when
initReader() gets called from Java, you'll get a stack overflow (it'd be
good, as an aside, if I could make a better error out of that...).
OK, I see the effect in samples/PorterStemmerAnalyzer.py (fixed some imports
there, use patch).
Ooh, I forgot about samples. I need to check them and produce a new release
candidate. Thanks for pointing this out.

Andi..
Shouldn't the need for implementation be optional? I don't understand.
--dirk
Andi Vajda
2016-09-09 08:54:37 UTC
Post by Andi Vajda
Post by Dirk Rothe
Post by Andi Vajda
Post by Dirk Rothe
I've made initReader() python-overridable (see patch). What do you think?
Not sure what to think. While your change looks fine, if Lucene decided
to make this 'hard', it may be a sign that you're doing something wrong
or going the wrong way about it.
probably not the first one to transition from 3.x to something more recent.
OK.
Making Analyzer.initReader() python-overridable is also important for
use-cases like this: http://stackoverflow.com/a/10290635
So the patch should be fine independently of my usage/hack.
Actually, your patch is not good enough. You need to add an implementation
for initReader() in all the tests that make a subclass of PythonAnalyzer
(search for createComponents() implementations) otherwise, when
initReader() gets called from Java, you'll get a stack overflow (it'd be
good, as an aside, if I could make a better error out of that...).
OK, I see the effect in samples/PorterStemmerAnalyzer.py (fixed some imports
there, use patch).
I see no patch attached, probably because the Apache mail server strips
attachments sent to mailing lists. Either include the patch inline or
send it to me directly as an attachment.

Thanks !

Andi..
Shouldn't the need for implementation be optional? I don't understand.
--dirk
Petrus Hyvönen
2016-07-02 06:07:57 UTC
Hi,

I'm a user of JCC only. I would like to see a Python 3 version, but
unfortunately I do not have the skills or time to do it myself. I can help
with testing.

I have looked at other alternatives to JCC, but none seems to provide such
good integration; the "import" statement is also difficult with a
dynamic integration.

Many Thanks
/Petrus
Post by Jan Høydahl
Hi
As you all know not much has happened with PyLucene lately.
So I’m throwing out this poll to check the sentiment of the community.
Question: What should happen to PyLucene now?
[ ] I’m happy with the last 4.x release, no need for new releases
[ ] Please, a new 6.x release (but I can’t contribute)
[ ] I’ll help make a new release happen, if I get some help!
[X] Only care about the JCC part
[ ] Close down the sub project
[ ] Don’t care. I’m no longer a user
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Lucene committer & PMC member
--
_____________________________________________
Petrus Hyvönen, Uppsala, Sweden
Mobile Phone/SMS:+46 73 803 19 00
Oliver Frietsch
2016-07-11 13:50:40 UTC
Hi,

(Sorry, just joined the list and so I'm unable to use a "real reply" in
my mail client)

As for my company, our answers are:

[X] Still mostly happy with the 3.6 release, planning to upgrade to 4.x
some time in the future (please don't ask when; it also requires a lot of
changes in our product)...

[X] Have developed/updated a Python 3 port of PyLucene 3.6 and would be
happy to contribute it

Please note: two tests are still failing; these seem to be encoding issues.

[X] Please, a new 6.x release (but we can’t contribute and are uncertain
if we'll upgrade soon)

Our PyLucene Python 3 port is currently being pushed forward by an
"almost-retired" partner of ours (who still has fun coding!), so there
is not much manpower behind it and not much more to expect in the near
future. But if it helps, we're very happy to contribute the existing
patch.

Also, a huge thank you to all PyLucene committers so far! You did a
great job!

Oliver