On Raspbian, I can't run a program linked with libcrypto from gdb. It doesn't matter what the program contains. Example:
(gdb) r
Starting program: /home/pi/test/test
Program received signal SIGILL, Illegal instruction.
0x400844c0 in ?? () from /usr/lib/arm-linux-gnueabihf/libcrypto.so.1.0.0
(gdb) signal SIGILL
Continuing with signal SIGILL.
Cannot access memory at address 0x0
Program received signal SIGILL, Illegal instruction.
0x400844c8 in ?? () from /usr/lib/arm-linux-gnueabihf/libcrypto.so.1.0.0
(gdb) signal SIGILL
Continuing with signal SIGILL.
Cannot access memory at address 0x0
[Inferior 1 (process 1566) exited normally]
(gdb)
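A likely cause (my assumption; the question doesn't confirm it) is that OpenSSL's ARM startup code probes CPU capabilities by executing instructions and catching the resulting SIGILL itself, so the signal is harmless unless gdb intercepts it. If that is the case, gdb can be told to pass SIGILL straight through to the program without stopping:
(gdb) handle SIGILL nostop noprint pass
(gdb) r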
I have compiled this code:
program mpisimple
  implicit none
  integer ierr
  include 'mpif.h'
  call mpi_init(ierr)
  write(6,*) 'Hello World!'
  call mpi_finalize(ierr)
end
using the command: mpif90 -o helloworld simplempi.f90
When I run with this command:
$ mpiexec -np 1 ./helloworld
Hello World!
it works fine, as you can see. But when I run with any other number of processes (here 4), I get the errors below and basically have to Ctrl+C to kill it:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(805).....: fail failed
MPID_Init(1859)...........: channel initialization failed
MPIDI_CH3_Init(126).......: fail failed
MPID_nem_init_ckpt(858)...: fail failed
MPIDI_CH3I_Seg_commit(427): PMI_KVS_Get returned 4
In: PMI_Abort(69777679, Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(805).....: fail failed
MPID_Init(1859)...........: channel initialization failed
MPIDI_CH3_Init(126).......: fail failed
MPID_nem_init_ckpt(858)...: fail failed
MPIDI_CH3I_Seg_commit(427): PMI_KVS_Get returned 4)
forrtl: severe (174): SIGSEGV, segmentation fault occurred
What could be the problem? I am doing this on a Linux HPC system.
I figured out why this happened. The system I am using does not require users to submit single-core jobs through the scheduler, but does require it for multi-core jobs. Once the mpiexec command was submitted through a PBS batch script, the errors went away and the output was as expected.
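For reference, a minimal PBS script for such a job might look like the sketch below; the job name, node/core counts, and walltime are placeholders that should match your site's policies:
#!/bin/bash
#PBS -N helloworld
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:05:00
cd $PBS_O_WORKDIR
mpiexec -np 4 ./helloworld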
I'm testing simple MPI programs locally on my Ubuntu Focal server (Open MPI 4.0.3). However, whatever I run with mpirun produces an annoying message: No protocol specified. The problem appears to be related to the fact that mpirun is trying to connect to the X server. How can I disable this behavior so I can use mpirun without an X server ready? I primarily work over SSH (text-only, with tmux).
An example of what I'm doing:
ubuntu@iBug-Server:~$ cat test.c
#include <mpi.h>
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    // This is a stub program
    MPI_Finalize();
    return 0;
}
ubuntu@iBug-Server:~$ mpicc test.c
ubuntu@iBug-Server:~$ mpirun -np 2 a.out
No protocol specified
ubuntu@iBug-Server:~$
Update 1: It appears to be related to LightDM and Xorg. The unwanted message goes away after systemctl stop lightdm. Alternatively, running Open MPI in a graphical terminal (connected via VNC or RDP (xrdp), both work) also eliminates the message, as strace shows that the connection to the X server is successful.
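Assuming the message indeed comes from a failed X connection attempt driven by the DISPLAY variable (consistent with the strace observation above, but not verified), unsetting DISPLAY for the mpirun invocation should suppress it without stopping LightDM:
ubuntu@iBug-Server:~$ unset DISPLAY
ubuntu@iBug-Server:~$ mpirun -np 2 a.out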
I have a custom nRF52 chip on a PCB with the SWD pins exposed. I have cloned and installed the latest OpenOCD from https://github.com/ntfreak/openocd. The latest version includes all the latest patches for the nRF52 chip, so there is no need for any additional changes as suggested in many older guides online. I am able to connect to the chip using an ST-Link V2. When connected, I can read and write memory locations using mdw and mdb. I can also run some basic OpenOCD commands like dump_image etc., which confirms that the setup is good. But the halt and program commands always lead to errors like:
JTAG failure -4
JTAG failure -4
JTAG failure -4
JTAG failure -4
JTAG failure -4
JTAG failure -4
target halted due to debug-request, current mode: Thread
xPSR: 00000000 pc: 00000000 msp: 00000000
jtag status contains invalid mode value - communication failure
Polling target nrf52.cpu failed, trying to reexamine
Examination failed, GDB will be halted. Polling again in 100ms
Previous state query failed, trying to reconnect
jtag status contains invalid mode value - communication failure
Polling target nrf52.cpu failed, trying to reexamine
If I try to use flash image_write, I get the error:
JTAG failure
Error setting register
error starting target flash write algorithm
Failed to enable read-only operation
Failed to write to nrf52 flash
error writing to flash at address 0x00000000 at offset 0x00000000
in procedure 'dap'
jtag status contains invalid mode value - communication failure
Polling target nrf52.cpu failed, trying to reexamine
I have read different guides online, and one of the possible solutions involves the APPROTECT register, which has to be disabled to enable any writes to flash. But the dap command, which is supposed to help us access this bit,
dap apreg 1 0x04 0x01
returns an error:
invalid subcommand apreg 1 0x04 0x01
I would like to know if anyone has had success programming a new, empty nRF52 chip with the ST-Link V2 and the steps that are necessary, or if anyone has encountered similar problems. Thanks.
Here is my config file:
#nRF52832 Target
source [find interface/stlink.cfg]
transport select hla_swd
source [find target/nrf52.cfg]
#reset_config srst_nogate connect_assert_srst
I solved the "protected nRF52" chip problem this way, on Windows, using a Particle.io Debugger (https://store.particle.io/products/particle-debugger) set up to program nRF52 chips from Arduino, as described at https://www.forward.com.au/pfod/BLE/LowPower/index.html
Note: the recovery process described here does NOT need Arduino to be installed.
Download OpenOCD-20181130.7z, a pre-compiled OpenOCD for Windows, from
http://gnutoolchains.com/arm-eabi/openocd/
The latest version of the OpenOCD source at https://github.com/ntfreak/openocd should also work, as it includes the apreg command in target\arm_adi_v5.c.
Unzip it, open a command prompt in the unzip directory, and enter the command:
bin\openocd.exe -d2 -f interface/cmsis-dap.cfg -f target/nrf52.cfg
The response:
Info : auto-selecting first available session transport "swd". To override use '
transport select <transport>'.
adapter speed: 1000 kHz
cortex_m reset_config sysresetreq
Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
Info : CMSIS-DAP: SWD Supported
Info : CMSIS-DAP: FW Version = 1.10
Info : CMSIS-DAP: Interface Initialised (SWD)
Info : SWCLK/TCK = 1 SWDIO/TMS = 1 TDI = 0 TDO = 0 nTRST = 0 nRESET = 1
Info : CMSIS-DAP: Interface ready
Info : clock speed 1000 kHz
Info : SWD DPIDR 0x2ba01477
Error: Could not find MEM-AP to control the core
Info : Listening on port 3333 for gdb connections
Open a telnet program, e.g. TeraTerm, and connect to localhost on port 4444 (i.e. 127.0.0.1, telnet port 4444).
The cmd window shows:
Info : accepting 'telnet' connection on tcp/4444
In telnet (i.e. TeraTerm), type:
nrf52.dap apreg 1 0x04
It returns 0 <<< chip protected
Then:
nrf52.dap apreg 1 0x04 0x01
Then:
nrf52.dap apreg 1 0x04
It returns 1 <<< chip un-protected
Then power-cycle the board.
You can now use the Arduino IDE to flash the softdevice and code low-power BLE.
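The same unlock sequence can also be run non-interactively, a sketch assuming the same CMSIS-DAP setup as above, by passing the commands on the OpenOCD command line:
bin\openocd.exe -f interface/cmsis-dap.cfg -f target/nrf52.cfg -c "init" -c "nrf52.dap apreg 1 0x04 0x01" -c "shutdown"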
Even though the dap command is listed by OpenOCD's help, it isn't implemented for the hla_swd transport that you have to use with ST-Link.
If the ST-Link is a generic type from China, it can be upgraded to CMSIS-DAP, which uses the swd transport and supports the nrf52.dap apreg 1 0x04 0x01 command to disable the readback protection and erase the flash. You'll need another ST-Link to do that, or you can instead install CMSIS-DAP on a generic STM32F103C8T6 board.
After that you can either use the ST-Link to program the nRF52 or continue using CMSIS-DAP, which can also be used to program STM32 MCUs.
The embedded ST-Links on Nucleo boards can also be upgraded to J-Link, which allows the use of the "recover" option in nRFgo Studio to erase the flash; it should also work with "nrfjprog --recover" or OpenOCD.
If anyone encounters this problem: I solved it by getting an original J-Link EDU. I also had to pull the reset pin of the microcontroller high to get the J-Link working.
There are lots of JTAG messages.
I think you might be missing the
transport select hla_swd
line in your (board) cfg file. The nRF5x chips only work properly with SWD, and ST-Link uses the hla_swd variant.
I am running an MPI application on a cluster, using 4 nodes, each with 64 cores.
The application performs an all-to-all communication pattern.
Executing the application with the following command runs fine:
$: mpirun -npernode 36 ./Application
Adding one further process per node makes the application crash:
$: mpirun -npernode 37 ./Application
--------------------------------------------------------------------------
A process failed to create a queue pair. This usually means either
the device has run out of queue pairs (too many connections) or
there are insufficient resources available to allocate a queue pair
(out of memory). The latter can happen if either 1) insufficient
memory is available, or 2) no more physical memory can be registered
with the device.
For more information on memory registration see the Open MPI FAQs at:
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
Local host: laser045
Local device: qib0
Queue pair type: Reliable connected (RC)
--------------------------------------------------------------------------
[laser045:15359] *** An error occurred in MPI_Issend
[laser045:15359] *** on communicator MPI_COMM_WORLD
[laser045:15359] *** MPI_ERR_OTHER: known error not in list
[laser045:15359] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[laser040:49950] [[53382,0],0]->[[53382,1],30] mca_oob_tcp_msg_send_handler: writev failed: Connection reset by peer (104) [sd = 163]
[laser040:49950] [[53382,0],0]->[[53382,1],21] mca_oob_tcp_msg_send_handler: writev failed: Connection reset by peer (104) [sd = 154]
--------------------------------------------------------------------------
mpirun has exited due to process rank 128 with PID 15358 on
node laser045 exiting improperly. There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[laser040:49950] 4 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / ibv_create_qp failed
[laser040:49950] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[laser040:49950] 4 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
EDIT: added some source code of the all-to-all communication pattern:
// Assumed context (not shown in the question): rank and size come from
// MPI_Comm_rank and MPI_Comm_size; data/dataSize and recvData/recvDataSize
// are the send/receive buffers and their element counts.
std::vector<MPI_Request> requests;

// Send data to all other ranks
for(unsigned i = 0; i < (unsigned)size; ++i){
    if((unsigned)rank == i){
        continue;
    }
    MPI_Request request;
    MPI_Issend(&data, dataSize, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &request);
    requests.push_back(request);
}

// Recv data from all other ranks
for(unsigned i = 0; i < (unsigned)size; ++i){
    if((unsigned)rank == i){
        continue;
    }
    MPI_Status status;
    MPI_Recv(&recvData, recvDataSize, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &status);
}

// Finish communication operations
for(MPI_Request &r: requests){
    MPI_Status status;
    MPI_Wait(&r, &status);
}
Is there something I can do as a cluster user, or is there some advice I can give the cluster admin?
The mca_oob_tcp_msg_send_handler error line may indicate that the node corresponding to a receiving rank died (ran out of memory or received a SIGSEGV):
http://www.open-mpi.org/faq/?category=tcp#tcp-connection-errors
The OOB (out-of-band) framework in Open MPI is used for control messages, not for the messages of your applications. Indeed, messages typically go through byte transfer layers (BTLs) such as self, sm, vader, openib (InfiniBand), and so on.
The output of 'ompi_info -a' is useful in that regard.
Finally, the question does not specify whether the InfiniBand hardware vendor is Mellanox, so the XRC option may not work (for instance, Intel/QLogic InfiniBand does not support this option).
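For example, the receive-queue settings in effect can be inspected like this (a sketch; the parameter name assumes the openib BTL is built in):
$ ompi_info -a | grep btl_openib_receive_queues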
The error is connected to the buffer size of the MPI message queues, discussed here:
http://www.open-mpi.org/faq/?category=openfabrics#ib-xrc
The following environment setting solved my problem:
$ export OMPI_MCA_btl_openib_receive_queues="P,128,256,192,128:S,65536,256,192,128"
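Equivalently (assuming Open MPI's usual MCA parameter handling; not verified on this particular cluster), the setting can be passed on the mpirun command line instead of through the environment:
$ mpirun --mca btl_openib_receive_queues "P,128,256,192,128:S,65536,256,192,128" -npernode 37 ./Application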
I'm a beginner in using MPI. I wrote a very simple program to test whether MPI can run. Here is my hello.c:
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[]) {
    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);
    MPI_Barrier(MPI_COMM_WORLD);
    printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);
    MPI_Finalize();
    return 0;
}
I use two nodes to test. The hostfile is:
node1
node2
So I have two machines, named node1 and node2. I can ssh from each to the other without a password.
I launch the program by typing:
mpirun -np 2 -f hostfile ./hello
The executable hello is in the same directory on both machines.
Then after I run, I get an error:
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(425).........: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(331)....: Failure during collective
MPIR_Barrier_impl(313)....: MPIR_Barrier_intra(83)....:
dequeue_and_set_error(596): Communication error with rank 0 Fatal
error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(425).........: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(331)....: Failure during collective
MPIR_Barrier_impl(313)....: MPIR_Barrier_intra(83)....:
dequeue_and_set_error(596): Communication error with rank 1
If I comment out the MPI_Barrier(), it works properly. It seems the communication between the machines has a problem? Or did I not install openmpi correctly? Any ideas?
I'm using Ubuntu 12.10
I got some hints: this doesn't work well in MPICH2; if I use Open MPI, then it works. I installed MPICH2 just by sudo apt-get install mpich2. Am I missing something? The mpich2 package is much smaller than openmpi.
In /etc/hosts, newer versions of some Linux distros add the following types of lines at the top of the file:
127.0.0.1 localhost
127.0.0.1 [hostname]
This should be changed so that the hostname line contains your actual IP address. The MPI Hydra process manager will abort if you do not make this change, with errors like:
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(425)...........: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(292)......:
MPIR_Barrier_or_coll_fn(121):
MPIR_Barrier_intra(83)......:
dequeue_and_set_error(596)..: Communication error with rank 0
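For example, a corrected /etc/hosts on node1 might look like this, where 192.168.1.101 is a placeholder for the machine's real IP address:
127.0.0.1 localhost
192.168.1.101 node1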