Discussion:
Strange JCC issues on armel/armhf (and other arches)
Dmitry Nezhevenko
2016-08-08 11:01:26 UTC
Hi,

I'm trying to figure out why pylucene fails to build on Debian on a lot of
architectures:

https://buildd.debian.org/status/package.php?p=pylucene

I've debugged it as far as I can and found that JCC crashes in the
native initVM call, right inside JNI_CreateJavaVM.

Under certain conditions it works, but it fails when called from the
Debian package build script.

I've found that whether it crashes depends on the number and content of
the environment variables or command-line options (even after I commented
out all access to sys.argv and os.environ).
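If the crash really tracks the combined size of the arguments and environment, it may help to put a number on that size. On Linux, execve() copies every argument and environment string, NUL-terminated, to the top of the new process's initial stack, so their total length shifts where the main thread's stack contents begin. The following is a minimal sketch under that assumption: `string_area_bytes` is a hypothetical helper, not part of JCC, and it ignores the pointer arrays, the auxiliary vector and alignment padding. Python's sys.argv also omits the interpreter path, and lengths are counted in characters, which equal bytes only for ASCII.

```python
import os
import sys


def string_area_bytes(argv, environ):
    """Hypothetical helper: approximate bytes taken by argv/envp strings.

    Each argument is stored as "arg\\0" and each environment entry as
    "KEY=VALUE\\0". Pointer arrays, auxv and alignment padding are ignored,
    and lengths are in characters (equal to bytes only for ASCII).
    """
    total = sum(len(arg) + 1 for arg in argv)
    total += sum(len(key) + 1 + len(value) + 1 for key, value in environ.items())
    return total


if __name__ == '__main__':
    size = string_area_bytes(sys.argv, os.environ)
    # Offset within a page, assuming 4 KiB pages; relevant if the crash
    # turns out to be tied to where the strings end relative to a page boundary
    print('argv+env strings: %d bytes (%d mod 4096)' % (size, size % 4096))
```

Comparing this figure between a crashing and a non-crashing invocation would show whether a single byte of difference is really what separates them.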

Finally, I found that I don't need any JCC Python code at all to
reproduce it. The following is enough:

echo 'import jcc; jcc.initVM()' | python2.7 - 1 2 3 4 5 6 7 8 9 10 11 12
13 14 15 16 17 18 19 20 21 22 23
+0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
+1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901
+2345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012
+3456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
+4567890123456789012345678901234567890123456789012345678901
NATIVE: initVM
NATIVE: initVM: 1
NATIVE: initVM: 3
NATIVE: initVM: 4
NATIVE: initVM: 6
NATIVE: initVM: 7!!!
NATIVE: initVM: 7. Calling JNI_CreateJavaVM
Segmentation fault

[ The NATIVE: lines are just printf() calls I added to the native initVM
function. ]

The magic is in the arguments: just removing the final '1' from the last
argument makes the crash go away. So does removing a single character from
any environment variable.
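Because a one-character change flips the result, the boundary could be located automatically rather than by hand. This is a sketch, not part of JCC: `crashes` and `find_crash_threshold` are hypothetical helpers that rerun a command with one extra padding argument and treat a negative return code (death by a signal, e.g. -11 for SIGSEGV) as a crash. The bisection assumes a single threshold between a known-good and a known-bad length; if the crash is alignment-dependent and recurs periodically, a linear scan over a small window would be more informative.

```python
import subprocess


def crashes(cmd, pad_len):
    """Hypothetical helper: run cmd plus one padding argument of pad_len
    'x' characters and report whether the child was killed by a signal."""
    proc = subprocess.Popen(list(cmd) + ['x' * pad_len])
    # On POSIX, a negative return code means death by signal (-11 == SIGSEGV)
    return proc.wait() < 0


def find_crash_threshold(cmd, good_len, bad_len):
    """Hypothetical helper: bisect for the smallest padding length that crashes.

    Assumes crashes(cmd, good_len) is False, crashes(cmd, bad_len) is True,
    and that there is a single threshold in between.
    """
    while bad_len - good_len > 1:
        mid = (good_len + bad_len) // 2
        if crashes(cmd, mid):
            bad_len = mid
        else:
            good_len = mid
    return bad_len
```

For example, `find_crash_threshold(['python2.7', '-c', 'import jcc; jcc.initVM()'], 0, 4096)` would, under those assumptions, report the first padding length that segfaults in the current environment (the `-c` form is assumed to behave like the piped `echo` version above).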

Unfortunately, it crashes in such a way that I can't get a call stack.

ltrace suggests that the crash happens just after /proc/self/maps is
read, probably in pthread_getattr_np().
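As far as I know, for the main thread glibc's pthread_getattr_np() locates the stack by scanning /proc/self/maps for the mapping that holds the initial stack and then sizes it using RLIMIT_STACK, which would fit the ltrace observation. To compare what that scan sees in a crashing versus a non-crashing environment, a Linux-only sketch like the following could print the `[stack]` mapping. `stack_region` is a hypothetical helper, and it assumes the kernel labels the main stack `[stack]` in the pathname column.

```python
import os


def stack_region(maps_text):
    """Hypothetical helper: return (start, end) of the '[stack]' mapping
    in /proc/<pid>/maps text, or None if there is no such line."""
    for line in maps_text.splitlines():
        fields = line.split()
        # Line format: address perms offset dev inode [pathname]
        if len(fields) >= 6 and fields[5] == '[stack]':
            start, end = (int(part, 16) for part in fields[0].split('-'))
            return start, end
    return None


if __name__ == '__main__' and os.path.exists('/proc/self/maps'):
    with open('/proc/self/maps') as f:
        region = stack_region(f.read())
    if region is None:
        print('no [stack] mapping found')
    else:
        start, end = region
        print('[stack] %#x-%#x (%d KiB)' % (start, end, (end - start) // 1024))
```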

I tried to debug it step by step, with no luck so far (it works if I
just step through after breaking at the pthread_getattr_np call).

I also tried removing all logic from initVM() and just copy/pasting the
JNI_CreateJavaVM usage from the docs:

https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html

and it still crashes. Playing with -Xmx, -Xms and -Xss doesn't help at all.

Any suggestions about the cause of this and a possible solution?
--
WBR, Dmitry
Andi Vajda
2016-08-13 06:59:00 UTC
No one has replied to your question so far. I don't have much to add since I
have no access to your platform and can't reproduce any of this to debug it.
The symptoms you describe could hint at a mismatch between the headers you
compile with (and the libraries you link against) and the ones used at
runtime: version skew, 32-bit vs 64-bit (?), or different frameworks.
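One quick way to rule the 32-bit vs 64-bit part of this in or out is to compare the ELF class of the JVM library with the running interpreter's pointer size. This is a minimal sketch: `elf_bits` and `interpreter_bits` are hypothetical helpers, the libjvm.so path you would pass in differs between JDK packages and architectures, and the check says nothing about armel vs armhf float-ABI differences.

```python
import struct


def elf_bits(path):
    """Hypothetical helper: return 32 or 64 from an ELF file's EI_CLASS byte."""
    with open(path, 'rb') as f:
        ident = bytearray(f.read(5))
    if len(ident) < 5 or ident[:4] != bytearray(b'\x7fELF'):
        raise ValueError('%s is not an ELF file' % path)
    if ident[4] == 1:  # ELFCLASS32
        return 32
    if ident[4] == 2:  # ELFCLASS64
        return 64
    raise ValueError('%s has unknown ELF class %d' % (path, ident[4]))


def interpreter_bits():
    """Hypothetical helper: pointer size of the running Python, in bits."""
    return struct.calcsize('P') * 8
```

Usage would be along the lines of `elf_bits('/path/to/libjvm.so') == interpreter_bits()`, run with the same python2.7 that the build uses.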
Sorry for not having anything more useful to say.

Andi..
Dmitry Nezhevenko
2016-08-13 12:57:37 UTC
Well, I'm pretty sure this is not a header/version mismatch, because it
happens in an isolated environment with only one JDK installed.

I replaced the jcc.__main__ entry point with a wrapper that creates a child
process and passes the arguments to it via stdin instead of sys.argv, and it works.

I was able to build pylucene with this wrapper, and it passes all tests. So
I'm pretty sure it's not a header/binary mismatch; if it were, the code
should crash somewhere else as well.

My guess is that it's a stack size limit or something similar.
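If it is a stack-size problem, rerunning the crashing command under different RLIMIT_STACK values (what `ulimit -s` sets) is a cheap test. -Xss mainly governs stacks of threads the JVM creates itself, so it possibly has no effect on the main thread that calls JNI_CreateJavaVM, which might explain why it made no difference. `run_with_stack_limit` is a hypothetical helper; this is a POSIX-only sketch.

```python
import resource
import subprocess


def run_with_stack_limit(cmd, soft_bytes):
    """Hypothetical helper: run cmd with its soft RLIMIT_STACK set to soft_bytes.

    The soft limit is capped at the current hard limit, since an unprivileged
    process cannot raise it past that. Returns (returncode, captured stdout).
    """
    def set_stack_limit():
        # Runs in the child between fork() and exec(); rlimits survive exec
        _, hard = resource.getrlimit(resource.RLIMIT_STACK)
        soft = soft_bytes
        if hard != resource.RLIM_INFINITY:
            soft = min(soft, hard)
        resource.setrlimit(resource.RLIMIT_STACK, (soft, hard))

    proc = subprocess.Popen(cmd, preexec_fn=set_stack_limit, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return proc.returncode, out
```

Comparing, say, `run_with_stack_limit(['python2.7', '-c', 'import jcc; jcc.initVM()'], 8 * 1024 * 1024)` against a larger limit would show whether the limit changes the outcome; a return code of -11 still means SIGSEGV.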

In any case, thanks for your reply. I have a working workaround. I know that
pylucene currently lacks development effort, so this probably won't be
fixed.

If you have time and want to take a look, I can prepare a QEMU VM
that reproduces it.

The code I used:

# Usage: replace jcc calls with a wrapper call:
#
# Before:
# python2.7 -m jcc.__main__ --jar file1.jar ... --build
#
# After:
# debian/jcc_wrapper python2.7 -m jcc.__main__ --jar file1.jar ... --build
#
# jcc_wrapper reads all arguments from sys.argv, invokes itself with the --stdin
# argument and passes the parameters to JCC without touching sys.argv.

import os
import sys
import subprocess

if len(sys.argv) > 1 and sys.argv[1] == '--stdin':
    # Child process. Read options from stdin and call jcc
    args = sys.stdin.read().split('\n')

    # Get rid of "-m jcc.__main__" (expected right after the interpreter name)
    if len(args) > 2 and args[1] == '-m':
        del args[1:3]

    # "python -m jcc" fills sys.argv[0] with the path to the module. Emulate this
    import jcc
    args[0] = os.path.join(jcc.__path__[0], '__main__.py')

    from jcc import cpp
    cpp.jcc(args)
else:
    # Parent process. Read options and pass them to the child via stdin
    # Expected args: jcc_wrapper python2.7 -m jcc.__main__ ...
    proc = subprocess.Popen([sys.argv[1], sys.argv[0], '--stdin'], stdin=subprocess.PIPE)
    proc.stdin.write('\n'.join(sys.argv[1:]))
    proc.stdin.close()
    sys.exit(proc.wait())
--
WBR, Dmitry
Andi Vajda
2016-08-13 14:50:48 UTC
Post by Dmitry Nezhevenko
In any case, thanks for your reply. I have a working workaround. I know that
pylucene currently lacks development effort, so this probably won't be
fixed.
If you send in a patch that fixes it, it might be :-)

Andi..