pylucene-6.4.1: Missing/Can't unzip jars Under lucene-java-6.4.1 Directory
Junjie Wei
2017-03-15 21:25:16 UTC
Hi,

When I tried to build pylucene-6.4.1 in Cygwin on Windows, "$ make
build" exited with errors complaining that some jar files could not be
opened. It seems this is because some of the jars under lucene-java-6.4.1
are symbolic links with a size of 1k instead of real files. Here is the
list I located with the find command:

$ find ./lucene-java-6.4.1/ -name '*.jar' -size 1k
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/icu/lib/icu4j-56.1.jar
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/morfologik/lib/morfologik-fsa-2.1.1.jar
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/morfologik/lib/morfologik-polish-2.1.1.jar
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/morfologik/lib/morfologik-stemming-2.1.1.jar
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/phonetic/lib/commons-codec-1.10.jar
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/uima/lib/Tagger-2.3.1.jar
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/uima/lib/uimaj-core-2.3.1.jar
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/analysis/uima/lib/WhitespaceTokenizer-2.3.1.jar
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/expressions/lib/antlr4-runtime-4.5.1-1.jar
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/expressions/lib/asm-5.1.jar
./test/pylucene-6.4.1/lucene-java-6.4.1/lucene/expressions/lib/asm-commons-5.1.jar


After I downloaded lucene-java-6.4.1 from
https://archive.apache.org/dist/lucene/java/6.4.1/ and replaced the
bundled copy with it, everything went fine.

Is this an issue in the release, or did I miss something before building?

Thanks,

Junjie
Ruediger Meier
2017-03-15 21:53:16 UTC
Post by Junjie Wei
Is this an issue in the release, or did I miss something before building?
Yes, this is a minor but annoying issue with this release. Some dead
links got packaged that point into Andi's home directory, like this one:

./lucene-java-6.4.1/lucene/analysis/morfologik/lib/morfologik-stemming-2.1.1.jar -> /Users/vajda/.ivy2/cache/org.carrot2/morfologik-stemming/bundles/morfologik-stemming-2.1.1.jar

Maybe the "make release/distrib" target has a bug, or these links were
committed to svn by mistake.

BTW this is no real issue on a real POSIX system. Cygwin seems to make
this worse as it has to emulate symlinks somehow. I guess instead of
downloading lucene manually you could have fixed it by just removing
all the bad links.
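
A minimal sketch of that cleanup, assuming a POSIX shell and a find that supports -exec (the function names are made up for illustration, and this is untested under Cygwin, where symlink emulation may behave differently):

```shell
#!/bin/sh
# Hypothetical helpers, not part of the PyLucene Makefile.

# Print every symlink under directory $1 whose target does not exist.
# `test -e` follows the link, so it fails exactly for dangling links.
list_dangling_links() {
    find "$1" -type l ! -exec test -e {} \; -print
}

# Delete every dangling symlink under directory $1.
# Links whose targets exist, and regular files, are left alone.
remove_dangling_links() {
    find "$1" -type l ! -exec test -e {} \; -exec rm -f {} \;
}
```

Running list_dangling_links ./lucene-java-6.4.1 first shows what would be deleted; after remove_dangling_links the build can fetch the missing jars again.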

cu,
Rudi
Andi Vajda
2017-03-16 19:43:42 UTC
Indeed, this is a bug of mine.
What would you prefer:
- include the actual .jar files in the distribution archive (tell tar to
follow the symlinks when I build the PyLucene distribution)
- or exclude the symlinks (tell tar to exclude symlinks); running the
build would then use ivy to fetch them
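
For reference, the two options could look roughly like this with GNU tar. This is only a sketch: the function names are made up for illustration, and it is not necessarily how the PyLucene release target builds its archive.

```shell
#!/bin/sh
# Hypothetical helpers; $1 = archive to create, $2 = directory to pack.

# Option 1: follow symlinks (-h), so the archive holds the real .jar
# contents. The link targets must exist on the packaging machine.
pack_following_links() {
    tar -czhf "$1" "$2"
}

# Option 2: leave symlinks out. tar has no flag for that here, so find
# selects everything except symlinks and tar reads the names from stdin.
# --no-recursion keeps tar from re-adding links through their directories.
pack_without_links() {
    find "$2" ! -type l -print0 |
        tar --null --no-recursion -czf "$1" -T -
}
```

With option 2 the extracted tree simply lacks those jars, so the build has to fetch them itself.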

Andi..
Ruediger Meier
2017-03-16 22:01:31 UTC
Usually my opinion is that tarballs should have the fewest possible
dependencies. But in this case, where all the deps are hosted on the
same source (apache.org), I would not include them but download them at
build time (if the user has not already downloaded them manually).

Maybe we could even enhance the Makefile to automatically find an
already installed lucene or download the latest minor version. IMO it
makes no sense that pylucene users by default always use a non-bugfixed
outdated lucene. And I saw on this mailing list how difficult it can be
to get enough votes for a pylucene minor update.

The same goes for the jcc python package, which the user has to install
manually anyway. We don't need to ship it with pylucene. I guess jcc
would be far better known if it were hosted separately from pylucene.
IMO jcc is a really amazing thing that works well. pylucene is just a nice
example of how easily you can use java libs from python.

cheers,
Rudi
Andi Vajda
2017-03-17 02:06:08 UTC
Post by Ruediger Meier
Usually my opinion is that tarballs should have the fewest possible
dependencies. But in this case, where all the deps are hosted on the
same source (apache.org), I would not include them but download them at
build time (if the user has not already downloaded them manually).
+1, I'm leaning towards not including these .jar files as well.
It saves about 20 MB in the pylucene distribution tar file and they can be
obtained via ivy anyway.
Post by Ruediger Meier
Maybe we could even enhance the Makefile to automatically find an
already installed lucene or download the latest minor version. IMO it
makes no sense that pylucene users by default always use a non-bugfixed
outdated lucene. And I saw on this mailing list how difficult it can be
to get enough votes for a pylucene minor update.
There is no such thing as a bugfixed Lucene. Each Lucene release has new bug
fixes but also new bugs; such is software development. Lucene also breaks
things on a regular basis in spite of being quite careful about backwards
compatibility, thus PyLucene unit tests have to be checked for each release.

The problem you're referring to would not be much of an issue if it was
easier to garner votes for a PyLucene release. A new release would happen in
lock step with each Lucene release, as was the case in the past, a few years
ago. There is a Lucene 6.5 release being talked about and I intend to
release a PyLucene 6.5 shortly thereafter.
Post by Ruediger Meier
The same goes for the jcc python package, which the user has to install
manually anyway. We don't need to ship it with pylucene. I guess jcc
would be far better known if it were hosted separately from pylucene.
IMO jcc is a really amazing thing that works well. pylucene is just a nice
example of how easily you can use java libs from python.
Thank you for the kind words. JCC is already available without PyLucene from
Python's PyPI: https://pypi.python.org/pypi/JCC/2.23
JCC gets released on PyPI at the same time as the main Apache PyLucene release.

I agree that PyLucene is just an example of JCC usage but it's the main one
and PyLucene has been driving the features of JCC.

Andi..
Ruediger Meier
2017-03-17 03:34:49 UTC
Post by Andi Vajda
There is no such thing as a bugfixed Lucene. Each Lucene release has
new bug fixes but also new bugs, such is software development. Lucene
also breaks things on a regular basis in spite of being quite careful
about backwards compatibility, thus PyLucene unit tests have to be
checked for each release.
The problem you're referring to would not be much of an issue if it
was easier to garner votes for a PyLucene release. A new release
would happen in lock step with each Lucene release, as was the case
in the past, a few years ago. There is a Lucene 6.5 release being
talked about and I intend to release a PyLucene 6.5 shortly
thereafter.
Well, I was speaking about the minor maintenance updates like 6.4.2, but
you surely know better about the quality of lucene updates.
Post by Andi Vajda
Thank you for the kind words. JCC is already available without
PyLucene from Python's PyPI: https://pypi.python.org/pypi/JCC/2.23
JCC gets released on PyPI at the same time as the main Apache
PyLucene release.
I agree that PyLucene is just an example of JCC usage but it's the
main one and PyLucene has been driving the features of JCC.
Yep, jcc only exists because of pylucene. And it's good that pylucene's
development and user base guarantee that jcc will be well maintained
in the future too. On the other hand, pylucene may be something of a show
stopper for jcc. Why wasn't the old experimental jcc/py3 port released
quickly on PyPI 7 years ago? Is there any chance of getting the recent
jcc/py3 port released soon, even though pylucene still only cares about
stable py2? I mean, releasing jcc for py3 cannot break any existing project.
No need to wait for the right time to test it more carefully.

Cheers,
Rudi
Andi Vajda
2017-03-17 04:23:24 UTC
Post by Ruediger Meier
Yep, jcc only exists because of pylucene. And it's good that pylucene's
development and user base guarantee that jcc will be well maintained
in the future too. On the other hand, pylucene may be something of a show
stopper for jcc. Why wasn't the old experimental jcc/py3 port released
quickly on PyPI 7 years ago?
Because it was an experimental branch that was never finished.
Post by Ruediger Meier
Is there any chance of getting the recent
jcc/py3 port released soon, even though pylucene still only cares about
stable py2?
I don't think PyLucene cares either way. I have not had enough time in a long while to do a releasable version of jcc with python 3 support. Now, several people, including yourself, have proposed python 3 ports. I still have to figure a way to package this all up into a release that works with both.
I need some time to integrate the three python 3 ports, update it to do proper string conversions and package it in a way that it works both with python 2 and 3 (can be different sets of sources, with possible overlaps, but in the same source egg).

Andi..
Ruediger Meier
2017-03-17 18:52:48 UTC
Post by Andi Vajda
Now, several people, including yourself, have proposed
python 3 ports. I still have to figure a way to package this all up
into a release that works with both. I need some time to integrate
the three python 3 ports,
FYI, I have also imported the other two ports into my GitHub repo, which
makes it easy to compare them.

$ git ls-remote https://github.com/rudimeier/jcc |cut -f2

refs/heads/master <<< my final one, works for py2 and py3
refs/heads/py3-old-orig <<< old svn, pylucene/branches/python_3
refs/heads/py3-tommykoch <<< from https://gist.github.com/tommykoch
refs/tags/v2.23


cu,
Rudi
Andi Vajda
2017-03-17 21:29:46 UTC
Thank you. I got started on this today and I'm now starting to look at the
three ports. So far, I've got jcc split into two parts (still one module,
one egg) to work with both python2 and python3 but keeping the code
separate. It's too much of a mess to keep both versions together in the same
file and I don't expect the python2 version to change too much since jcc has
been quite stable...

Andi..