I've been struggling with this problem for more than a week now and still can't find a solution.
I'm trying to cross-compile Qt 4.7 embedded open-source version for an ARM device. The build process itself completes without problems, but the generated binaries seem to contain instructions that the processor does not understand.
Build host is Debian 5 (Etch) on i386 (running on a virtual PC)
The device is a Trimble Nomad handheld with an ARM processor (see full cpuinfo and kernel configuration)
I use the original build toolchain that was made for the device and that worked fine to date (even could build Gnash successfully) - see compiler settings and version
I'm using a custom qmake.conf based on linux-arm-gnueabi-g++ and adapted to use the correct toolchain - see source code here
I had a partial improvement by adding -msoft-float -D__GCC_FLOAT_NOT_NEEDED to the compiler flags but I still get "Illegal instruction" errors in some situations (but at least this was a big improvement)
The binaries themselves basically work, but in certain situations the program crashes with the "Illegal instruction" error. I believe this happens during certain floating point operations while doing graphics stuff.
Adding -mcpu=xscale, -march=armv4, -O0, or -mtune=arm920t (not all at the same time) did not help in any way.
Building Qt with the --debug flag appears to resolve all problems but adding the -O2 flag reintroduces them. Strangely the -O0 setting without --debug does not help.
The complete configure and make output can be seen here. There are lots of alignment warnings, but they are said to be false warnings from the compiler.
There must have been some change in Qt 4.7.2, because earlier versions (4.7.1, 4.7.0) run fine.
configure settings:
./configure \
-embedded arm \
-xplatform qws/linux-arm-angstrom-gnueabi-g++ \
-debug \
-no-largefile \
-no-multimedia \
-no-audio-backend \
-no-phonon \
-no-phonon-backend \
-webkit \
-javascript-jit \
-no-xshape \
-no-xvideo \
-no-xsync \
-no-xinerama \
-no-xcursor \
-no-xfixes \
-no-xrandr \
-no-xrender \
-no-xinput \
-no-xkb \
-no-opengl \
-nomake docs \
-nomake examples \
-nomake tools \
-nomake demos \
-nomake translations \
-opensource \
-qt-mouse-tslib \
-qt-libjpeg \
-qt-gif
strace before the crash:
$ LD_LIBRARY_PATH=. QT_QWS_FONTDIR=$PWD/fonts QT_PLUGIN_PATH=$PWD/plugins QWS_MOUSE_PROTO=tslib:/dev/input/touchscreen0 strace ./digitalclock -qws test.htm
...
lseek(15, 0, SEEK_END) = 16998
write(15, "\t\n\f\0\367\t", 6) = 6
write(15, "\0\0+\234\325\343\306{\3\0\0\0\0J\370\377\351\301\336\377"..., 120) = 120
lseek(15, 0, SEEK_END) = 17124
write(15, "\10\10\10\0\371\10", 6) = 6
write(15, "\0\6j\251\260\201\27\0\2\276\377\351\334\377\346\32K\377"..., 64) = 64
lseek(15, 0, SEEK_END) = 17194
write(15, "\7\10\10\0\371\7", 6) = 6
write(15, "\0\4c\245\263\224 \0\1\271\377\367\315\356P\0I\377\364"..., 64) = 64
lseek(15, 0, SEEK_END) = 17264
write(15, "\10\n\10\1\366\10", 6) = 6
write(15, "\37 \3\0\0\0\0\0\374\377\34\0\0\0\0\0\374\377\34\0\0\0"..., 80) = 80
fcntl64(15, F_SETLK, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
lseek(15, 0, SEEK_END) = 17350
mremap(0x415f5000, 16552, 17350, MREMAP_MAYMOVE) = 0x415f5000
--- SIGILL (Illegal instruction) @ 0 (0) ---
rt_sigaction(SIGILL, {SIG_DFL}, {0x401b7d34, [ILL], SA_RESTART|0x4000000}, 8) = 0
socket_subcall(0x1f8004, 0, 0x100, 0, 0, 0x18844, 0x18840, 0x12c) = 0
ioctl(12, KDSKBMODE, 0x2) = 0
ioctl(12, SNDCTL_TMR_START or TCSETS, {B38400 -opost -isig -icanon -echo ...}) = 0
close(12) = 0
ioctl(10, KDSETMODE, 0x1) = 0
write(10, "\33[9;15]\33[?33h\33[?25h\33[?0c\0", 25) = 25
close(10) = 0
statfs64(umovestr: Input/output error
0x6d4f, 27983, {???}) = 0
sigreturn() = ? (mask now [ILL ABRT BUS FPE USR1 SEGV USR2 PIPE STKFLT CHLD CONT STOP TTOU URG XCPU VTALRM PROF WINCH IO PWR RTMIN])
--- SIGILL (Illegal instruction) @ 0 (0) ---
+++ killed by SIGILL +++
Process 27983 detached
gdb backtrace of the crash (I'm missing debug symbols since compiling with debug information resolves the problem):
(gdb) run -qws
Starting program: /home/.qt-test2/digitalclock -qws
Program received signal SIGILL, Illegal instruction.
0x4130268c in __sigsetjmp () from /lib/libc.so.6
(gdb) bt
#0 0x4130268c in __sigsetjmp () from /lib/libc.so.6
#1 0x4046ee5c in ?? () from ./libQtGui.so.4
(gdb)
Note the device comes with Qtopia 4.3 preinstalled and the vendor can't explain the problem with my build either.
Update
With help from Igor Skochinsky I could find the exact assembler instruction that is causing the SIGILL. For some reason the instruction works fine 47 times before causing the error. See the gdb output below (note that I'm not familiar with ARM assembler at all):
$ LD_LIBRARY_PATH=. QT_QWS_FONTDIR=$PWD/fonts QT_PLUGIN_PATH=$PWD/plugins QWS_MOUSE_PROTO=tslib:/dev/input/touchscreen0 gdb ./digitalclock
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "arm-angstrom-linux-gnueabi"...
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) start -qws
Breakpoint 1 at 0xaa58: file main.cpp, line 47.
Starting program: /home/.qt-test2/digitalclock -qws
[Thread debugging using libthread_db enabled]
[New Thread 1073870720 (LWP 2799)]
[Switching to Thread 1073870720 (LWP 2799)]
main (argc=2, argv=0xbea17d04) at main.cpp:47
47 main.cpp: No such file or directory.
in main.cpp
(gdb) display/i $pc
1: x/i $pc 0xaa58 <main+24>: sub r3, r11, #28 ; 0x1c
(gdb) display/x $r2
2: /x $r2 = 0xbea17d10
(gdb) display/x $f2
3: /x $f2 = 0x0
(gdb) b *0x41302684
Breakpoint 2 at 0x41302684
(gdb) continue
Continuing.
---> no problem here:
Breakpoint 2, 0x41302684 in __sigsetjmp () from /lib/libc.so.6
3: /x $f2 = 0x0
2: /x $r2 = 0x293
1: x/i $pc 0x41302684 <__sigsetjmp+52>: beq 0x413026a0 <Lno_iwmmxt>
(gdb) si
0x41302688 in __sigsetjmp () from /lib/libc.so.6
3: /x $f2 = 0x0
2: /x $r2 = 0x293
1: x/i $pc 0x41302688 <__sigsetjmp+56>: stfp f2, [r12], #8
(gdb) si
0x4130268c in __sigsetjmp () from /lib/libc.so.6
3: /x $f2 = 0x0
2: /x $r2 = 0x293
1: x/i $pc 0x4130268c <__sigsetjmp+60>: stfp f3, [r12], #8
(gdb) si
0x41302690 in __sigsetjmp () from /lib/libc.so.6
3: /x $f2 = 0x0
2: /x $r2 = 0x293
1: x/i $pc 0x41302690 <__sigsetjmp+64>: stfp f4, [r12], #8
(gdb) continue
Continuing.
Breakpoint 2, 0x41302684 in __sigsetjmp () from /lib/libc.so.6
3: /x $f2 = 0x0
2: /x $r2 = 0x293
1: x/i $pc 0x41302684 <__sigsetjmp+52>: beq 0x413026a0 <Lno_iwmmxt>
(gdb) continue 46
Will ignore next 45 crossings of breakpoint 2. Continuing.
---> __sigsetjmp still working fine, but then:
Breakpoint 2, 0x41302684 in __sigsetjmp () from /lib/libc.so.6
3: /x $f2 = 0x0
2: /x $r2 = 0x293
1: x/i $pc 0x41302684 <__sigsetjmp+52>: beq 0x413026a0 <Lno_iwmmxt>
(gdb) si
0x41302688 in __sigsetjmp () from /lib/libc.so.6
3: /x $f2 = 0x0
2: /x $r2 = 0x293
1: x/i $pc 0x41302688 <__sigsetjmp+56>: stfp f2, [r12], #8
(gdb) si
Program received signal SIGILL, Illegal instruction.
0x4130268c in __sigsetjmp () from /lib/libc.so.6
3: /x $f2 = 0x0
2: /x $r2 = 0x293
1: x/i $pc 0x4130268c <__sigsetjmp+60>: stfp f3, [r12], #8
Any suggestions what I could try next?
The posted disassembly is quite interesting.
0x41302678 <__sigsetjmp+40>: fmrx r2, fpscr
0x4130267c <__sigsetjmp+44>: str r2, [r12], #4
0x41302680 <__sigsetjmp+48>: tst r2, #512 ; 0x200
0x41302684 <__sigsetjmp+52>: beq 0x413026a0 <__sigsetjmp+80>
0x41302688 <__sigsetjmp+56>: stfp f2, [r12], #8
*0x4130268c <__sigsetjmp+60>: stfp f3, [r12], #8*
0x41302690 <__sigsetjmp+64>: stfp f4, [r12], #8
0x41302694 <__sigsetjmp+68>: stfp f5, [r12], #8
0x41302698 <__sigsetjmp+72>: stfp f6, [r12], #8
0x4130269c <__sigsetjmp+76>: stfp f7, [r12], #8
The code checks for bit 9 in fpscr, and, if set, tries to save registers f2-f7. What are those? I've never seen them in recent processors, but I think those are FPA ("Floating Point Accelerator") registers, implemented in a few old cores, and used for soft FP before VFP appeared.
So, here's what I think happens:
The libc on your device was compiled with FPA support, probably by mistake.
In FPA processors bit 9 meant "FPA enabled" or something similar.
In the debug version of Qt the bit 9 of FPSCR (DZE = Division by Zero exception enable bit) is not set, so they don't try to save FPA registers. However, it gets set in the release version.
I see here two options:
Rebuild libc without FPA support
Find where DZE gets set in the release ver (not sure how to do that)
Update: I was wrong. The gdb disassembly confused me. I found the source of setjmp.S, here's the relevant part:
tst a3, #HWCAP_ARM_VFP
beq Lno_vfp
/* Store the VFP registers. */
/* Following instruction is fstmiax ip!, {d8-d15}. */
stc p11, cr8, [r12], #68
/* Store the floating-point status register. */
/* Following instruction is fmrx r2, fpscr. */
mrc p10, 7, r2, cr1, cr0, 0
str r2, [ip], #4
Lno_vfp:
tst a3, #HWCAP_ARM_IWMMXT
beq Lno_iwmmxt
/* Save the call-preserved iWMMXt registers. */
/* Following instructions are wstrd wr10, [ip], #8 (etc.) */
stcl p1, cr10, [r12], #8
stcl p1, cr11, [r12], #8
stcl p1, cr12, [r12], #8
stcl p1, cr13, [r12], #8
stcl p1, cr14, [r12], #8
stcl p1, cr15, [r12], #8
Lno_iwmmxt:
So, it's trying to store iWMMXt registers, not FPA. However, there is a bug here. It's using r2 to temporarily store fpscr, but that overwrites the previously loaded hwcap value in a3 (a3 is the APCS name for r2). Maybe the author meant to use a2, not r2, or maybe the two parts were done by different people. In either case, somehow the release version of Qt changes FPSCR (which is most likely emulated by the kernel) and the code storing the iWMMXt regs is triggered.
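The register reuse described here can be modelled in a few lines of C++. This is only a sketch of the suspected logic, not the glibc code: the function names are hypothetical, and the bit values are the 32-bit ARM Linux hwcap bits as I understand them (bit 9 matches the tst r2, #512 in the disassembly). It shows how a set DZE bit in FPSCR could be mistaken for the iWMMXt capability once the hwcap value has been overwritten; whether that path is actually taken on this device is a separate question.

```cpp
#include <cstdint>

// Bit values assumed from the 32-bit ARM Linux hwcap definitions; bit 9
// matches the "tst r2, #512" in the disassembly. Names are hypothetical.
constexpr std::uint32_t kHwcapVfp    = 1u << 6;  // HWCAP_ARM_VFP
constexpr std::uint32_t kHwcapIwmmxt = 1u << 9;  // HWCAP_ARM_IWMMXT

// Mirrors the suspected bug: one register holds hwcap and is then reused
// as scratch space for FPSCR before the iWMMXt test.
bool savesIwmmxtBuggy(std::uint32_t hwcap, std::uint32_t fpscr) {
    std::uint32_t r2 = hwcap;
    if (r2 & kHwcapVfp) {
        r2 = fpscr;  // clobbers the hwcap value
    }
    return (r2 & kHwcapIwmmxt) != 0;  // may now be testing FPSCR bit 9 (DZE)
}

// Same flow with a separate scratch variable, so the test still sees hwcap.
bool savesIwmmxtFixed(std::uint32_t hwcap, std::uint32_t fpscr) {
    if (hwcap & kHwcapVfp) {
        std::uint32_t scratch = fpscr;  // separate register; hwcap stays intact
        (void)scratch;
    }
    return (hwcap & kHwcapIwmmxt) != 0;
}
```

With hwcap set to VFP only and FPSCR set to 0x200, the buggy variant reports that the iWMMXt registers should be saved while the fixed variant does not.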
Still, that's not the whole story. The hwcaps you pasted claim that the CPU does support iWMMXt, so I'm not sure why those instructions would be giving trouble. Maybe the reported PC value is wrong somehow. I think you should try putting a breakpoint on __sigsetjmp and stepping through it instruction by instruction (stepi), to see where exactly it crashes.
Hello, I had a similar problem a few days ago, but I run Qt Creator 5.7 on Slackware Linux in VMware Player (not on an ARM device).
After a successful installation I could not start Qt Creator. I tried to run it with the terminal command /opt/Qt5.7.0/Tools/QtCreator/bin/qtcreator and it gave me the error Illegal instruction.
After a few hours spent with Google, I tried running Qt Creator with the terminal command /opt/Qt5.7.0/Tools/QtCreator/bin/qtcreator -noload Welcome, and this worked for me.
Hope this helps someone. Sorry for the late response.
I am trying to debug Rcpp compiled code at run-time. I have been trying to get this to work unsuccessfully, for a very long time.
A very similar question was asked here: Debugging (line by line) of Rcpp-generated DLL under Windows which asks the same question, but both the question and the answer are far beyond my understanding.
Here is what I have:
Windows 7 Pro SP1
R 3.5
Rstudio 1.1.463 with Rcpp.
Rbuild Tools from Rstudio. (c++ compiler)
Procedure:
In Rstudio File->New File->C++ File (creates a sample file with a timesTwo function.)
I added a new function in this file:
// [[Rcpp::export]]
NumericVector timesTwo2(NumericVector x) {
for(int ii = 0; ii <= x.size(); ii++)
{
x.at(ii) = x.at(ii) * 2;
}
return x;
}
I checked Source on Save and saved the file as RcppTest.cpp, which sources (compiles) the file successfully.
Run code in Rstudio:
data = c(1:10)
data
[1] 1 2 3 4 5 6 7 8 9 10
timesTwo2(data)
Error in timesTwo2(data) : Index out of bounds: [index=10; extent=10].
The error occurs because the loop condition is ii <= x.size(), so the last iteration reads one element past the end and fails at run time.
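For reference, the off-by-one can be reproduced outside R. The sketch below is a standalone analogue, not Rcpp code: std::vector stands in for Rcpp::NumericVector (an assumption made so the example compiles on its own), and the function names are hypothetical. The first function keeps the <= bug and reports the failing index and size from the caught exception; the second shows the corrected condition.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Buggy loop from the question, with std::vector standing in for
// Rcpp::NumericVector (assumption). Returns a message describing the
// failing index, or an empty string if no exception was thrown.
std::string timesTwoBuggy(std::vector<double>& x) {
    std::size_t ii = 0;
    try {
        for (ii = 0; ii <= x.size(); ii++) {  // bug: <= runs one past the end
            x.at(ii) = x.at(ii) * 2;
        }
    } catch (const std::out_of_range&) {
        std::ostringstream msg;
        msg << "out of range at ii=" << ii << ", size=" << x.size();
        return msg.str();
    }
    return "";
}

// Corrected loop: strict < keeps ii within [0, size).
void timesTwoFixed(std::vector<double>& x) {
    for (std::size_t ii = 0; ii < x.size(); ii++) {
        x.at(ii) = x.at(ii) * 2;
    }
}
```

Declaring the loop index outside the try block is what makes it available in the handler, so the message can name the exact index that failed.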
The question is: how can I get debug output about this error that reasonably tells me what happened?
At the very least I would like to know the line in the code that triggered the exception and with which parameters.
Furthermore, I would really like to execute the code line-by-line to just before the exception so I can monitor exactly what is happening.
I can install any additional programs or apply any other settings as long as I can find precise details on how to do it. For now I am starting from scratch just to get it working. Thank you.
Update:
I found this site: Debugging Rcpp c++ code using gdb
I installed the latest gcc 8.1 with gdb
I found the CXXFLAGS in the makeconf file located in C:\Program Files\R\R-3.5.1\etc\x64
Then I started the Rgui as suggested, but when I try Rcpp:::sourceCpp I get an error:
> library(Rcpp)
> Rcpp::sourceCpp('Rcpptest.cpp')
C:/PROGRA~1/R/R-35~1.1/etc/x64/Makeconf:230: warning: overriding recipe for target '.m.o'
C:/PROGRA~1/R/R-35~1.1/etc/x64/Makeconf:223: warning: ignoring old recipe for target '.m.o'
c:/Rtools/mingw_64/bin/g++ -I"C:/PROGRA~1/R/R-35~1.1/include" -DNDEBUG -I"C:/Users/Michael/Documents/R/win-library/3.5/Rcpp/include" -I"C:/PROGRA~1/R/R-35~1.1/bin/x64" -ggdb -O0 -Wall -gdwarf-2 -mtune=generic -c Rcpptest.cpp -o Rcpptest.o
process_begin: CreateProcess(NULL, c:/Rtools/mingw_64/bin/g++ -IC:/PROGRA~1/R/R-35~1.1/include -DNDEBUG -IC:/Users/Michael/Documents/R/win-library/3.5/Rcpp/include -IC:/PROGRA~1/R/R-35~1.1/bin/x64 -ggdb -O0 -Wall -gdwarf-2 -mtune=generic -c Rcpptest.cpp -o Rcpptest.o, ...) failed.
make (e=2): The system cannot find the file specified.
make: *** [C:/PROGRA~1/R/R-35~1.1/etc/x64/Makeconf:215: Rcpptest.o] Error 2
Error in Rcpp::sourceCpp("Rcpptest.cpp") :
Error 1 occurred building shared library.
WARNING: The tools required to build C++ code for R were not found.
Please download and install the appropriate version of Rtools:
http://cran.r-project.org/bin/windows/Rtools/
It looks like it is loading the new CXXFLAGS and it is using DEBUG, but it seems that it still cannot compile. Anybody know why from the error?
I tried running Rstudio the same way as Rgui and it started with many threads showing in the gdb window, but everything in Rstudio ran exactly as before with no additional debug information from Rstudio or gdb.
Update 2:
As the error above states, Rgui did not have Rtools for compiling, so I installed Rtools from the provided link. It installed in C:\Rtools while Rstudio installed in C:\RBuildTools. So I now have 3 compilers: Rtools, RBuildTools and gcc with gdb.
It compiles now, but still gives the same error as I did in Rstudio. I would like to at least get better error output, like the line and value passed.
The instructions say Rgui should have a spot for a break-point, but I cannot find such an option.
Update 3
I was finally able to set up and run a Linux install (Ubuntu 16.04.05).
First here are my CXXFLAGS:
$ R CMD config CXXFLAGS
-g -O0 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g
I had to create a .R folder in my home directory and a Makevars file in it with just the line
CXXFLAGS = -g -O0 -Wall -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2
This alone took hours as nowhere did it actually say make the folder and file.
Then I executed the commands as Ralf posted, at the break point:
> timesTwo2(d1)
Thread 1 "R" hit Breakpoint 1, timesTwo2 (x=...) at RcppTest.cpp:19
19 NumericVector timesTwo2(NumericVector x) {
(gdb) n
20 for (int ii = 0; ii <= x.size(); ii++)
(gdb) n
22 x.at(ii) = x.at(ii) * 2;
(gdb) display ii
1: ii = 0
(gdb) n
20 for (int ii = 0; ii <= x.size(); ii++)
1: ii = 0
(gdb) n
22 x.at(ii) = x.at(ii) * 2;
1: ii = 1
(gdb) n
20 for (int ii = 0; ii <= x.size(); ii++)
1: ii = 1
(gdb) display x.at(ii)
2: x.at(ii) = <error: Attempt to take address of value not located in memory.>
(gdb) n
22 x.at(ii) = x.at(ii) * 2;
1: ii = 2
2: x.at(ii) = <error: Attempt to take address of value not located in memory.>
(gdb)
And finally at ii = 10:
1: ii = 10
2: x.at(ii) = <error: Attempt to take address of value not located in memory.>
(gdb) n
0x00007ffff792d762 in Rf_applyClosure () from /usr/lib/R/lib/libR.so
(gdb)
This is definitely the furthest I have come to debugging, but this is a very basic function and the debug output and even the error output was not very useful. It gave me the line it was executing and it could display ii, but I could not display the array value or the entire array. Is it possible to create a more specific break point such that it only breaks when ii == 10?
Ideally I would like this in Rstudio or some other GUI that can display the entire vector. Still doing more testing.
The usual approach R -d gdb, which I also suggested in my original answer below, does not work on Windows:
--debugger=name
-d name
(UNIX only) Run R through debugger name. For most debuggers (the exceptions are valgrind and recent versions of gdb), further command line options are disregarded, and should instead be given when starting the R executable from inside the debugger.
https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Invoking-R-from-the-command-line
Alternative:
Start R in debugger: gdb.exe Rgui.exe
Set break point: break timesTwo2
Run R: run
Source file: Rcpp::sourceCpp("debug.cpp")
Use next, print, display to step through the code.
An alternative to step 1. would be to start R, get the PID with Sys.getpid(), and attach the debugger with gdb -p <pid>. You will then have to use continue instead of run.
I don't have a Windows machine right now, so the following was done on Linux. I hope it is transferable, though. Let's start with a simple cpp file (debug.cpp in my case) that contains your code:
#include <Rcpp.h>
using Rcpp::NumericVector;
// [[Rcpp::export]]
NumericVector timesTwo2(NumericVector x) {
for(int ii = 0; ii <= x.size(); ii++)
{
x.at(ii) = x.at(ii) * 2;
}
return x;
}
/*** R
data = c(1:10)
data
timesTwo2(data)
*/
I can reproduce the error by calling R on the command line:
$ R -e "Rcpp::sourceCpp('debug.cpp')"
R version 3.5.1 (2018-07-02) -- "Feather Spray"
[...]
> Rcpp::sourceCpp('debug.cpp')
> data = c(1:10)
> data
[1] 1 2 3 4 5 6 7 8 9 10
> timesTwo2(data)
Error in timesTwo2(data) : Index out of bounds: [index=10; extent=10].
Calls: <Anonymous> ... source -> withVisible -> eval -> eval -> timesTwo2 -> .Call
Execution halted
Next we can start R with gdb as debugger (c.f. Writing R Extensions as Dirk said):
$ R -d gdb -e "Rcpp::sourceCpp('debug.cpp')"
GNU gdb (Debian 8.2-1) 8.2
[...]
(gdb) break timesTwo2
Function "timesTwo2" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (timesTwo2) pending.
(gdb) run
[...]
> Rcpp::sourceCpp('debug.cpp')
[Thread 0xb40d3b40 (LWP 31793) exited]
[Detaching after fork from child process 31795]
> data = c(1:10)
> data
[1] 1 2 3 4 5 6 7 8 9 10
> timesTwo2(data)
Thread 1 "R" hit Breakpoint 1, 0xb34f3310 in timesTwo2(Rcpp::Vector<14, Rcpp::PreserveStorage>)@plt ()
from /tmp/RtmphgrjLg/sourceCpp-i686-pc-linux-gnu-1.0.0/sourcecpp_7c2d7f56744b/sourceCpp_2.so
(gdb)
At this point you can single step through the program using next (or just n) and output variables using print (or just p). A useful command is also display:
Thread 1 "R" hit Breakpoint 1, timesTwo2 (x=...) at debug.cpp:5
5 NumericVector timesTwo2(NumericVector x) {
(gdb) n
6 for(int ii = 0; ii <= x.size(); ii++)
(gdb) n
8 x.at(ii) = x.at(ii) * 2;
(gdb) display ii
2: ii = 0
(gdb) n
8 x.at(ii) = x.at(ii) * 2;
2: ii = 0
[...]
2: ii = 9
(gdb)
46 inline proxy ref(R_xlen_t i) { return start[i] ; }
2: ii = 9
(gdb)
6 for(int ii = 0; ii <= x.size(); ii++)
2: ii = 10
(gdb)
8 x.at(ii) = x.at(ii) * 2;
2: ii = 10
(gdb)
Error in timesTwo2(data) : Index out of bounds: [index=10; extent=10].
Calls: <Anonymous> ... source -> withVisible -> eval -> eval -> timesTwo2 -> .Call
Execution halted
[Detaching after fork from child process 32698]
[Inferior 1 (process 32654) exited with code 01]
BTW, I used the following compile flags:
$ R CMD config CXXFLAGS
-g -O2 -Wall -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2
You might want to switch to -O0.
It can be done with Visual Studio Code, since it can handle both R and C++. This allows you to step through your Rcpp code one line at a time in a GUI environment.
See this demo to get started.
I find that breakpad sometimes does not handle SIGSEGV.
I wrote a simple example to reproduce it:
#include <vector>
#include <breakpad/client/linux/handler/exception_handler.h>
int InitBreakpad()
{
char core_file_folder[] = "/tmp/cores/";
google_breakpad::MinidumpDescriptor descriptor(core_file_folder);
auto exception_handler_ =
new google_breakpad::ExceptionHandler(descriptor,
nullptr,
nullptr,
nullptr,
true,
-1);
}
int main()
{
InitBreakpad();
// int* ptr = nullptr;
// *ptr = 1;
std::vector<int> sum;
sum.push_back(1);
auto it = sum.begin();
sum.erase(it);
sum.erase(it);
return 0;
}
gcc is 4.8.5 and my compile command is
g++ test_breakpad.cpp -I./include -I./include/breakpad -L./lib -lbreakpad -lbreakpad_client -std=c++11 -lpthread
Running a.out gives "Segmentation fault" but no minidump is generated.
If I uncomment the nullptr write, breakpad works!
What should I do to correct it?
GDB debug output:
(gdb) b google_breakpad::ExceptionHandler::~ExceptionHandler()
Breakpoint 2 at 0x402ed0: file src/client/linux/handler/exception_handler.cc, line 264.
(gdb) c
The program is not being run.
(gdb) r
Starting program: /home/zen/tmp/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Breakpoint 1, google_breakpad::ExceptionHandler::ExceptionHandler (this=0x619040, descriptor=..., filter=0x0, callback=0x0, callback_context=0x0, install_handler=true, server_fd=-1) at src/client/linux/handler/exception_handler.cc:224
224 ExceptionHandler::ExceptionHandler(const MinidumpDescriptor& descriptor,
Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.1.x86_64 libgcc-4.8.5-11.el7.x86_64 libstdc++-4.8.5-11.el7.x86_64
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff712f19d in __memmove_ssse3_back () from /lib64/libc.so.6
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff712f19d in __memmove_ssse3_back () from /lib64/libc.so.6
(gdb) c
Continuing.
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
I also tried breakpad's out-of-process dump, but still got nothing (the nullptr write works).
After some debugging I think that the reason that the sum.erase(it) does not create a minidump in your example is due to stack corruption.
While debugging you can see that the variable g_handler_stack_ in src/client/linux/handler/exception_handler.cc is correctly initialized and the google_breakpad::ExceptionHandler instance is correctly added to the vector. However when google_breakpad::ExceptionHandler::SignalHandler is called the vector is reported empty despite no calls to google_breakpad::ExceptionHandler::~ExceptionHandler or any of the std::vector methods that would change the vector.
A further data point that suggests stack corruption is that the code works with clang++. Additionally, as soon as we change std::vector<int> sum; to a std::vector<int>* sum, which ensures that we don't corrupt the stack, the minidump is written to disk.
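Independently of what breakpad does afterwards, the second sum.erase(it) in the example reuses an iterator that the first erase() has already invalidated, on a vector that is by then empty, which is undefined behaviour. A minimal sketch of a well-defined version of that sequence follows; the helper name eraseFront is hypothetical and not part of breakpad or the original program.

```cpp
#include <cstddef>
#include <vector>

// Erases up to `count` elements from the front of `v`, using the iterator
// returned by erase() instead of reusing one that has been invalidated.
// Returns how many elements were actually removed. (Hypothetical helper.)
std::size_t eraseFront(std::vector<int>& v, std::size_t count) {
    std::size_t removed = 0;
    auto it = v.begin();
    while (removed < count && it != v.end()) {
        it = v.erase(it);  // erase() returns a valid iterator to the next element
        ++removed;
    }
    return removed;
}
```

Called with a one-element vector and a count of 2, this removes the single element and then stops, rather than touching memory the vector no longer owns.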
Can someone explain why the following code yields different results on the second printf if I comment the first printf line or not, in 64 bits?
/* gcc -O0 -o test test.c */
#include <stdio.h>
#include <stdlib.h>
int main() {
char a[20] = {0};
char b = 'a';
int count=-1;
// printf("%.16llx %.16llx\n", a, &b);
printf("%x\n", *(a+count));
return 0;
}
I get the following results for the second printf:
commented: 0
uncommented: 61
Thanks in advance!
iansus
Can someone explain why the following code yields different results on the second printf if I comment the first printf line or not
Your program uses a[-1], and thus exhibits undefined behavior. Anything can happen, and figuring out exactly why one or the other thing happens is pointless.
The precise reason is that you are reading memory that gets written to by the first printf (when commented in).
I get a different result (which is expected with undefined behavior):
// with first `printf` commented out:
ffffffff
// with it commented in:
00007fffffffdd20 00007fffffffdd1b
ffffffff
You could see where that memory is written to by setting a GDB watchpoint on it:
(gdb) p a[-1]
$1 = 0 '\000'
(gdb) p &a[-1]
$2 = 0x7fffffffdd1f ""
(gdb) watch *(int*)0x7fffffffdd1f
Hardware watchpoint 4: *(int*)0x7fffffffdd1f
(gdb) c
Continuing.
Hardware watchpoint 4: *(int*)0x7fffffffdd1f
Old value = 0
New value = 255
main () at t.c:12
12 printf("%.16llx %.16llx\n", a, &b);
In my case above, the value is written as part of initializing count=-1. That is, with my version of gcc, count is located just before a[0]. But this may depend on compiler version, exactly how this compiler was built, etc. etc.
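If the index can be negative or otherwise untrusted, the read has to be guarded explicitly, since nothing in the language does it for you. The following is a minimal sketch in C++ (the question's code is C, but the same check works there with a pointer instead of a reference); the helper name checkedRead is hypothetical.

```cpp
#include <cstddef>

// Copies data[index] into `out` and returns true when the index is inside
// [0, size); returns false and leaves `out` untouched otherwise.
// (Hypothetical helper, not taken from the question.)
bool checkedRead(const char* data, std::size_t size, long index, char& out) {
    if (index < 0 || static_cast<std::size_t>(index) >= size) {
        return false;  // covers a[-1] as well as reads past the end
    }
    out = data[index];
    return true;
}
```

With a 20-byte array, an index of -1 is rejected before any memory is touched, so the result no longer depends on where the compiler happened to place count or b.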
I wanted to use Commonqt using Clozure CL on OS X Lion.
But it was not working...
Commonqt
Commonqt is a Common Lisp binding to the smoke library for Qt.
http://common-lisp.net/project/commonqt/
My setup is:
OS X Lion 10.7.4
Xcode 4.3.3
Clozure CL version 1.8
qt 4.8.2 (git clone git://gitorious.org/qt/qt.git) "./configure && make install && make"
smoke
smokegen (git clone git://anongit.kde.org/smokegen) "cmake . && make install"
smokeqt (git clone git://anongit.kde.org/smokeqt) "cmake . && make install"
quicklisp
local-projects commonqt (git clone git://gitorious.org/commonqt/commonqt.git) "qmake && make clean && make"
cf: http://kvardek-du.kerno.org/2011/12/setting-up-commonqt-on-osx.html
qt.asd(in Commonqt)
(defmethod output-files ((operation compile-op) (c cpp->so)) (values
(loop for filename in '("libcommonqt.so" "libcommonqt.so.1"
"libcommonqt.so.1.0" "libcommonqt.so.1.0.0")
collect (merge-pathnames filename (component-pathname c)))
;; libcommonqt.so* files are never moved to separate FASL directory
t))
So I changed it (so -> dylib) to:
(defmethod output-files ((operation compile-op) (c cpp->so)) (values
(loop for filename in '("libcommonqt.dylib" "libcommonqt.1.dylib"
"libcommonqt.1.0.dylib" "libcommonqt.1.0.0.dylib")
collect (merge-pathnames filename (component-pathname c)))
;; libcommonqt.so* files are never moved to separate FASL directory
t))
But an error occurred. For example:
Source code
http://pleasegodno.wordpress.com/common-lisp-tutorials/common-lisp-gui-programming-with-commonqt/2-classes-and-methods/
? (main)
MAIN
? > Error: Unable to load foreign library (LIBCOMMONQT.DYLIB-31136).
> Error opening shared library /Users/jk/.quicklisp/local-projects/commonqt/libcommonqt.dylib : dlopen(/Users/jk/.quicklisp/local-projects/commonqt/libcommonqt.dylib, 10): no suitable image found. Did find:
> /Users/jk/.quicklisp/local-projects/commonqt/libcommonqt.dylib: mach-o, but wrong architecture.
> While executing: CFFI::FL-ERROR, in process listener(1).
> Type :POP to abort, :R for a list of available restarts.
> Type :? for other options.
1 >
Is my setup wrong, or does Commonqt's source not work with Clozure CL on OS X Lion?
What should I do next?
Following user1234192's advice (ccl -> ccl64), I retried it.
> (qt-conv:main)
2012-07-12 21:46:11.630 dx86cl64[93621:c403] *** Assertion failure in +[NSUndoManager _endTopLevelGroupings], /SourceCache/Foundation/Foundation-833.25/Misc.subproj/NSUndoManager.m:324
2012-07-12 21:46:11.631 dx86cl64[93621:c403] +[NSUndoManager(NSInternal) _endTopLevelGroupings] is only safe to invoke on the main thread.
2012-07-12 21:46:11.633 dx86cl64[93621:c403] (
0 CoreFoundation 0x00007fff8dc49f56 __exceptionPreprocess + 198
1 libobjc.A.dylib 0x00007fff9600dd5e objc_exception_throw + 43
2 CoreFoundation 0x00007fff8dc49d8a +[NSException raise:format:arguments:] + 106
3 Foundation 0x00007fff8de4671f -[NSAssertionHandler handleFailureInMethod:object:file:lineNumber:description:] + 169
4 Foundation 0x00007fff8ddb595f +[NSUndoManager(NSPrivate) _endTopLevelGroupings] + 144
5 AppKit 0x00007fff8b15a0ef -[NSApplication run] + 596
6 QtGui 0x000000000346cc60 _ZN19QEventDispatcherMac13processEventsE6QFlagsIN10QEventLoop17ProcessEventsFlagEE + 840
7 QtCore 0x0000000004122838 _ZN10QEventLoop4execE6QFlagsINS_17ProcessEventsFlagEE + 394
8 QtCore 0x0000000004125b0b _ZN16QCoreApplication4execEv + 175
9 libsmokeqtgui.dylib 0x00000000043718e1 _ZN12__smokeqtgui14x_QApplication4x_84EPN5Smoke9StackItemE + 17
10 libsmokeqtgui.dylib 0x00000000043579d8 _ZN12__smokeqtgui18xcall_QApplicationEsPvPN5Smoke9StackItemE + 1304
11 dx86cl64 0x000000000001b0db SPffcall + 99
12 ??? 0x0000000001219328 0x0 + 18977576
)
2012-07-12 21:46:11.634 dx86cl64[93621:c403] *** Assertion failure in +[NSUndoManager _endTopLevelGroupings], /SourceCache/Foundation/Foundation-833.25/Misc.subproj/NSUndoManager.m:324
Qt has caught an exception thrown from an event handler. Throwing
exceptions from an event handler is not supported in Qt. You must
reimplement QApplication::notify() and catch all exceptions there.
Unhandled exception 10 at 0x0, context->regs at #xb029af30
Exception occurred while executing foreign code
received signal 10; faulting address: 0x0
? for help
[93621] Clozure CL kernel debugger: help
[93621] Clozure CL kernel debugger: [93621] Clozure CL kernel debugger: [93621] Clozure CL kernel debugger: Segmentation fault: 11
http://kvardek-du.kerno.org/2011/12/setting-up-commonqt-on-osx.html
I succeeded with Clozure CL on Lion.
As for your error message, "libcommonqt.dylib: mach-o, but wrong architecture":
your ccl and libcommonqt.dylib probably have different architectures.
If your ccl is x86 (32-bit), libcommonqt.dylib must be x86.
If your ccl is x86-64 (64-bit), libcommonqt.dylib must be x86-64.
I recommend 64-bit.
My English is very ugly. sorry.......
Try:
(ql:quickload :trivial-main-thread)
(trivial-main-thread:call-in-main-thread #'qt-conv:main)
Hi
I am running a bi-di 'iperf' test on an interface using my driver.
Steps to repro would be to run bi-di I/O on one interface(other interface is not active):
Run iperf -c -P 8 -t 100000 -I 10 on DUT
iperf -c with same params as above from peer almost immediately ( after 1st 10s of above 'iperf send' are over)
With 'iperf -s -w 256K' on both
The crash is not happening as such in the driver but in the 'iperf' context. I am going to copy-paste the stack trace:
PID: 8855 TASK: f7036550 CPU: 0 COMMAND: "iperf"
#0 [c074bed0] crash_kexec at c0443233
#1 [c074bf14] die at c04064d3
#2 [c074bf44] do_page_fault at c062134b
#3 [c074bf94] error_code (via page_fault) at c0405abb
EAX: f5888100 EBX: 00000000 ECX: 00100100 EDX: 00200200 EBP: 00000001
DS: 007b ESI: f5888000 ES: 007b EDI: cb614000
CS: 0060 EIP: c05c4e94 ERR: ffffffff EFLAGS: 00010046
#4 [c074bfc8] net_rx_action at c05c4e94
#5 [c074bfe4] __do_softirq at c042aa65
--- <soft IRQ> ---
#0 [f281ac4c] do_softirq at c04073e5
#1 [f281ac58] do_IRQ at c04074d9
#2 [f281ac70] common_interrupt at c0405975
EAX: 39383736 EBX: f281af4c ECX: 00000428 EDX: 31303938 EBP: f378b042
DS: 007b ESI: f378b1c2 ES: 007b EDI: 09fdb448
CS: 0060 EIP: c04f1c07 ERR: ffffffba EFLAGS: 00000202
#3 [f281aca4] __copy_to_user_ll at c04f1c07
#4 [f281acb0] memcpy_toiovec at c05bfecc
#5 [f281acc4] skb_copy_datagram_iovec at c05c059b
#6 [f281acf4] tcp_rcv_established at c05ef40a
#7 [f281ad20] tcp_v4_do_rcv at c05f48c5
#8 [f281ad54] tcp_prequeue_process at c05e6bdd
#9 [f281ad5c] tcp_recvmsg at c05e90e2
#10 [f281ad9c] sock_common_recvmsg at c05bb1c4
#11 [f281adc0] sock_recvmsg at c05b8dc6
#12 [f281aea0] sys_recvfrom at c05ba6ab
#13 [f281af64] sys_recv at c05ba727
#14 [f281af80] sys_socketcall at c05bab52
#15 [f281afb8] system_call at c0404f44
EAX: ffffffda EBX: 0000000a ECX: b6ba2340 EDX: 00014268
DS: 007b ESI: 00000000 ES: 007b EDI: 09fbe630
SS: 007b ESP: b6ba2328 EBP: b6ba2378
CS: 0073 EIP: 004ad410 ERR: 00000066 EFLAGS: 00000293
crash>
The EIP at the time of the crash is net_rx_action:0xdd/19ca. I have compiled the kernel-2.6.18-238 sources (the source version of the OS on which the DUT is running) and ran 'objdump -S ./net/core/dev.o > dev_o_dmp' on the object file for ./net/core/dev.c, which has the definition of net_rx_action(). In the 'dev_o_dmp' file, net_rx_action() has lots of inlined definitions and hence does not exactly mirror the flow in the source file. In such a scenario, is it safe to add 0xdd to the base address of net_rx_action (say 32FF), giving 33DC? That is, would 33DC be the offending location that gives rise to the 'kernel paging request' error?
Any tips /recommendations on how to go about debugging this problem would be of great help
Unfortunately, or fortunately depending on your perspective, at high optimization levels the compiler can generate assembly for which the debug format cannot provide a reasonable mapping from C source lines to assembly instructions. Which cases trigger this problem depends on the compiler, optimization level, debug symbol format, debug symbol level, and the code itself.
You have to assume that line numbers gained via this technique could be wrong. That being said, I use this technique frequently in my own kernel work and I have not had any problems yet (knocks on wood). Just remember that if you are faced with something that just makes no sense, you could have a bad line number.
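The address arithmetic behind this technique is just the symbol's start address in the objdump listing plus the offset reported in the oops. A small sketch is below; both helper names are hypothetical, and the inputs are values you read by hand from the crash output and from the listing. With the example numbers from the question, 0x32FF + 0xdd works out to 0x33dc.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Adds the offset reported in the oops ("symbol+0xdd") to the symbol's
// start address taken from the objdump listing. (Hypothetical helper.)
std::uint64_t faultAddress(std::uint64_t symbolStart, std::uint64_t offset) {
    return symbolStart + offset;
}

// Formats an address as lowercase hex without a prefix, which is how
// addresses appear in the left-hand column of an objdump listing.
std::string toObjdumpHex(std::uint64_t address) {
    std::ostringstream out;
    out << std::hex << address;
    return out.str();
}
```

The result identifies an instruction in the listing; which source line that instruction belongs to is still subject to the inlining and optimization caveats above.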