Pylucene jvm CharArraySet Error

Discussion:

Alexander Alex

2014-10-17 10:31:27 UTC

I added a customized Lucene analyzer class to Lucene core in PyLucene. This
class has Google Guava as a dependency because it uses the array handling
functions available in com.google.common.collect.Iterables in Guava. When I
tried to index using this analyzer, I got the following error:

Traceback (most recent call last):
  File "C:\IndexFiles.py", line 78, in
    lucene.initVM()
JavaError: java.lang.NoClassDefFoundError: org/apache/lucene/analysis/CharArraySet
Java stacktrace:
java.lang.NoClassDefFoundError: org/apache/lucene/analysis/CharArraySet
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.analysis.CharArraySet
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

Even the example indexing code from Lucene in Action, which worked when I
tried it earlier, now returns the same error when I rerun it after adding
this class. I am not too familiar with the CharArraySet class, but as far as
I can see the problem comes from it. How do I handle this? Attached are the
Java files whose classes were added to Lucene core in PyLucene. Thanks.

Alexander Alex

2014-10-17 10:40:51 UTC

Meanwhile, I am using Lucene version 3.6.2. The problem is that JVM
instantiation now fails from any Python code using Lucene, as a result of the
classes I added to Lucene core.

---------- Forwarded message ----------
From: Alexander Alex <***@gmail.com>
Date: Fri, Oct 17, 2014 at 12:31 PM
Subject: Pylucene jvm CharArraySet Error
To: pylucene-***@lucene.apache.org


Andi Vajda

2014-10-17 17:23:11 UTC

Post by Alexander Alex
I added a customized lucene analyzer class to lucene core in Pylucene.

Please explain in _detail_ the steps you followed to accomplish this.
A log of all the commands you ran would be ideal.

Thanks !

Andi..


Alexander Alex

2014-10-17 22:10:52 UTC

OK. I built the class files for the attached Java files and added them to
lucene-core-3.6.2.jar at org.apache.lucene.analysis and to
lucene-analyzers-3.6.2.jar at org.apache.lucene.analysis. I then added the
path of the dependencies to the classpath in the init.py file. I ran the
typical index file using this customized analyzer through PythonAnalyzer
and got the above error. I had earlier run the index file using the
standard analyzer, before adding the classes, and it worked. After the run
with the customized analyzer failed, I tried again with the standard
analyzer, and this time it failed with the same error message as above. I
guess the problem has to do with array compatibility between Java and
Python, but I don't really know. Thanks.
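
For anyone retracing these steps, one quick check after repacking a JAR is whether the class named in the error is still present in it. The sketch below assumes, without confirmation from this thread, that the NoClassDefFoundError comes from a class entry that went missing when the JAR was modified. The helper name jar_has_class is hypothetical; it relies only on Python's standard zipfile module.

```python
import zipfile


def jar_has_class(jar_path, class_name):
    """Return True if the JAR at jar_path has an entry for class_name.

    class_name is a dotted Java name such as
    "org.apache.lucene.analysis.CharArraySet".
    """
    # A JAR is a ZIP archive; classes are stored as path/to/Name.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

It could be called as jar_has_class("lucene-core-3.6.2.jar", "org.apache.lucene.analysis.CharArraySet"), with the path adjusted to wherever the JAR sits inside the PyLucene egg.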


Andi Vajda

2014-10-17 22:29:54 UTC

Post by Alexander Alex
ok. I built the class files for the java files attached herein, add them to
lucene-core-3.6.2.jar at org.apache.lucene.analysis and
lucene-analyzers-3.6.2.jar at org.apache.lucene.analysis. I then added the
path of the dependencies to classpath in the init.py file.

What init.py file ?
Can you paste the contents of that file here, please ?

Andi..


Alexander Alex

2014-10-18 10:13:48 UTC

The init file in the pylucene egg. Here it is:

import os, sys

if sys.platform == 'win32':
    from jcc.windows import add_jvm_dll_directory_to_path
    add_jvm_dll_directory_to_path()
    import jcc, _lucene
else:
    import _lucene

__dir__ = os.path.abspath(os.path.dirname(__file__))

class JavaError(Exception):
    def getJavaException(self):
        return self.args[0]
    def __str__(self):
        writer = StringWriter()
        self.getJavaException().printStackTrace(PrintWriter(writer))
        return "\n".join((super(JavaError, self).__str__(),
                          " Java stacktrace:", str(writer)))

class InvalidArgsError(Exception):
    pass

_lucene._set_exception_types(JavaError, InvalidArgsError)

VERSION = "3.6.2"
CLASSPATH = [os.path.join(__dir__, "lucene-core-3.6.2.jar"),
             os.path.join(__dir__, "lucene-analyzers-3.6.2.jar"),
             os.path.join(__dir__, "lucene-memory-3.6.2.jar"),
             os.path.join(__dir__, "lucene-highlighter-3.6.2.jar"),
             os.path.join(__dir__, "extensions.jar"),
             os.path.join(__dir__, "lucene-queries-3.6.2.jar"),
             os.path.join(__dir__, "lucene-grouping-3.6.2.jar"),
             os.path.join(__dir__, "lucene-join-3.6.2.jar"),
             os.path.join(__dir__, "lucene-facet-3.6.2.jar"),
             os.path.join(__dir__, "lucene-spellchecker-3.6.2.jar")]
CLASSPATH = os.pathsep.join(CLASSPATH)
_lucene.CLASSPATH = CLASSPATH
_lucene._set_function_self(_lucene.initVM, _lucene)

from _lucene import *
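
The CLASSPATH handling in this file boils down to joining absolute JAR paths with the platform's path separator. The sketch below isolates that step so it can be inspected on its own. The names build_classpath and my-analyzers.jar are hypothetical, and whether adding a JAR this way is enough for a given PyLucene build is not established in this thread.

```python
import os


def build_classpath(base_dir, jar_names):
    """Join JAR file names under base_dir into one classpath string."""
    return os.pathsep.join(os.path.join(base_dir, name) for name in jar_names)


# my-analyzers.jar is a made-up name standing in for a separately built JAR.
jars = ["lucene-core-3.6.2.jar", "lucene-analyzers-3.6.2.jar", "my-analyzers.jar"]
classpath = build_classpath("egg-dir", jars)
# Entries are separated by ';' on Windows and ':' elsewhere.
print(classpath)
```

Printing the resulting string is a simple way to confirm which JAR files the JVM will actually be asked to load.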


Andi Vajda

2014-10-18 19:55:26 UTC

Thanks. This looks like the vanilla __init__.py file in the pylucene egg.
I see no modifications from you for, I quote, "path of the dependencies to
classpath in the init.py file".

To be sure there is no misunderstanding here, this is what I understand from
you so far:
  - you downloaded, built and installed PyLucene 3.6.2
    (with what Python version and what Java version?)
  - you then compiled a new class and added it to two JAR files,
    lucene-core-3.6.2.jar and lucene-analyzers-3.6.2.jar
    (with that Java version? why did you modify two JAR files?
    why not create your own JAR file with your extra stuff?)
  - you then edited __init__.py to reflect this change, but I don't see
    any change in the file you pasted, nor why the change is needed if you
    just modified existing JAR files (in the right location, inside the
    PyLucene egg, right?)
  - you did not rebuild PyLucene itself after making any of these changes

If this mental picture is correct then this is not the right way to go about
it. The proper way to modify Lucene Core and then PyLucene is to:
  - compile and build your new classes using the same version of Java (and
    Lucene)
  - create a new JAR file containing your extra stuff
  - test that it all works with a simple Java program that uses Lucene core
    and your new code together
  - _then_ rebuild PyLucene including your new JAR file, either by:
      - adding it to the list of JAR files being wrapped by JCC via --jar
        in the PyLucene Makefile
      - OR passing it to JCC via --include instead, so that it just becomes
        part of the new PyLucene egg (ensuring it is inside the egg and on
        the classpath, but no Python wrappers are generated for it)

To get command line argument help from JCC, run python -m jcc --help (or
whatever the correct invocation is for your version of Python).

Andi..
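
The two rebuild options in this message differ only in which flag carries the new JAR. As a rough sketch, the helper below assembles that part of a JCC argument list. The names jcc_jar_args and my-analyzers.jar are hypothetical, only the --jar and --include flags are taken from the message, and a real PyLucene build passes many more arguments through its Makefile.

```python
def jcc_jar_args(wrapped_jars, included_jars=()):
    """Build the --jar / --include portion of a JCC command line.

    wrapped_jars get Python wrappers generated for them (--jar);
    included_jars are only bundled into the egg and put on the
    classpath (--include).
    """
    args = []
    for jar in wrapped_jars:
        args += ["--jar", jar]
    for jar in included_jars:
        args += ["--include", jar]
    return args


# Option 1: wrap the new JAR alongside Lucene core.
print(jcc_jar_args(["lucene-core-3.6.2.jar", "my-analyzers.jar"]))
# Option 2: only ship the new JAR inside the egg, without wrappers.
print(jcc_jar_args(["lucene-core-3.6.2.jar"], ["my-analyzers.jar"]))
```

Keeping the extra classes in their own JAR, as suggested, means the stock Lucene JARs stay untouched under either option.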


Alexander Alex

2014-10-18 20:11:50 UTC

Thanks Andi. I am going to try these suggestions out.


Alexander Alex

2014-11-09 09:53:58 UTC

Traceback (most recent call last):
  File "C:\index1.py", line 94, in <module>
    IndexFiles(sys.argv[1], os.path.join(base_dir, INDEX_DIR), EnglishLemmaAnalyzer("english-bidirectional-distsim.tagger"))
  File "C:\index1.py", line 48, in __init__
    self.indexDocs(root, writer)
  File "C:\index1.py", line 81, in indexDocs
    writer.addDocument(doc)
JavaError: org.apache.jcc.PythonException: ('while calling', 'tokenStream', <class '__main__.EnglishLemmaTokenizer'>)
TypeError: ('while calling', 'tokenStream', <class '__main__.EnglishLemmaTokenizer'>)

Java stacktrace:
org.apache.jcc.PythonException: ('while calling', 'tokenStream', <class '__main__.EnglishLemmaTokenizer'>)
TypeError: ('while calling', 'tokenStream', <class '__main__.EnglishLemmaTokenizer'>)
    at org.apache.pylucene.analysis.PythonAnalyzer.tokenStream(Native Method)
    at org.apache.lucene.analysis.Analyzer.reusableTokenStream(Analyzer.java:80)
    at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:137)
    at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:278)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2060)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2034)

Then I tried changing the returned object and ran it as index2.py; again I
got the following errors:

Traceback (most recent call last):
  File "C:\newIndexfiles.py", line 94, in <module>
    IndexFiles(sys.argv[1], os.path.join(base_dir, INDEX_DIR), EnglishLemmaAnalyzer("english-bidirectional-distsim.tagger"))
  File "C:\newIndexfiles.py", line 48, in __init__
    self.indexDocs(root, writer)
  File "C:\newIndexfiles.py", line 81, in indexDocs
    writer.addDocument(doc)
JavaError: java.lang.NullPointerException
Java stacktrace:
java.lang.NullPointerException
    at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:141)
    at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:278)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2060)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2034)

I cannot figure out what the issue is here. Thanks.


Andi Vajda

2014-11-10 01:45:52 UTC

Post by Alexander Alex
I cannot figure out the issues here. Thanks

Me neither. If you wrote new Java code you need to make sure it runs before
you wrap it with JCC. It's a lot easier to debug this way.

Andi..
