PIN Assert- AddVmThread: 155: assertion failed: m_sysIdMap.find(sysThreadId) == m_sysIdMap.end() - intel-pin

I'm using the latest version of Pin (3.24) on a proprietary version of CentOS (I'm not sure of the version; I'm checking) inside a container, running on x86-64.
I am getting the following assert when running my pintool, which adds a sleep at a particular line number in the code.
The application is our proprietary OS.
Assert:
A: /tmp_proj/pinjen/workspace/pypl-pin-nightly/GitPin/Source/pin/vm/vm_threaddb.cpp: AddVmThread: 155: assertion failed: m_sysIdMap.find(sysThreadId) == m_sysIdMap.end()
Can you please explain what this assert means and how to avoid it? I'm running inside a container. The stack is not dumped. The instrumentation (a sleep at the correct line number) is added, but after 10 minutes of running it crashes.
Sometimes it crashes after executing the sleep (analysis) function 50k times and sometimes after 150k times.
I'm not sure what is happening. Please help.
Later I tried simply running without my Pin tool, and it still hit the same assert.
$ <pin> -- <my custom OS executable>
[Note that there is no Pin tool in the command above, but I still got the assert below.]
A: /tmp_proj/pinjen/workspace/pypl-pin-nightly/GitPin/Source/pin/vm/vm_threaddb.cpp: AddVmThread: 155: assertion failed: m_sysIdMap.find(sysThreadId) == m_sysIdMap.end()
#############################################################
## STACK TRACE
#############################################################
Pin must be run with tool in order to generate Pin stack trace
Detach Service Count: 20269696
Pin: pin-3.24-98612-6bd5931f2
Could you please explain how I can avoid the above assert?
Thanks


SLURM: how to disable automatic job cleanup when one PE crashes

I distribute an OpenMPI-based application using the SLURM launcher srun. When one of the processes crashes, I would like to detect that in the other PEs and take some actions. I am aware that OpenMPI does not have fault tolerance, but I still need to perform a graceful exit in the other PEs.
To do this, every PE has to be able:
To continue running despite the crash of another PE.
To detect that one of the PEs crashed.
Currently I'm focusing on the first task. According to the manual, srun has --no-kill flag. However, it does not seem to work for me. I see the following log messages:
srun: error: node0: task 0: Aborted // this is where I crash the PE deliberately
slurmstepd: error: node0: [0] pmixp_client_v2.c:210 [_errhandler] mpi/pmix: ERROR: Error handler invoked: status = -25: Interrupted system call (4)
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: ***STEP 12123.0 ON node0 CANCELLED AT 2020-12-02 ***
srun: error: node0: task 1: Killed // WHY?!
Why does it happen? Is there any other relevant flag or environment variable, or any configuration option that might help?
To reproduce the problem, one can use the following program (it uses Boost.MPI for brevity, but has the same effect without Boost as well):
#include <boost/mpi.hpp>

int main() {
    using namespace boost::mpi;
    environment env;
    communicator comm;
    comm.barrier();
    if (comm.rank() == 0) {
        throw 0;
    }
    while (true) {}
}
According to the documentation that you linked, the --no-kill flag only affects the behaviour in case of node failure.
In your case you should use the --kill-on-bad-exit=0 option, which prevents the rest of the tasks from being killed if one of them exits with a non-zero exit code.

Pintos - UserProg all tests fail is_kernel_vaddr()

I am doing the Pintos project on the side to learn more about operating systems. I had tons of devops trouble at first with it not running well on an 18.04 Ubuntu droplet. I am now running it on the VirtualBox image that UCCS tells students to download for pintos.
I finished project 1 and started to map out my solution to project 2. Following the instructions to create a file I ran
pintos-mkdisk filesys.dsk --filesys-size=2
pintos -- -f -q
but am getting error
Kernel PANIC at ../../threads/vaddr.h:87 in vtop(): assertion
`is_kernel_vaddr (vaddr)' failed.
I then tried running make check (all the tests). They are all failing for the same reason.
Am I missing something? Is there something I need to implement to fix this? I reread the instructions and didn't see anything.
Would appreciate help!
Thanks
I had a similar problem. My code for Project 1 ran fine, but I could not format the filesystem for Project 2.
The failure for me came from the following call chain:
thread_init() -> ... -> thread_schedule_tail() -> process_activate() -> pagedir_activate() -> vtop()
The problem is that init_page_dir is still NULL when pagedir_activate() is called. init_page_dir should have been initialized in paging_init() but this is called after thread_init().
The root cause was that my scheduler was being called too early, i.e. before the call to thread_start(). The reason for my problem was that I had built in a call to thread_yield() upon completion of every call to lock_release() which makes sense from a priority donation standpoint. Unfortunately, locks are used prior to the scheduler being ready! To fix this, I installed a flag called threading_started that bails in the first line of my thread_block() and thread_yield() functions if thread_start() has not yet been called.
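The guard described above can be sketched roughly as follows. This is a self-contained model rather than Pintos code: schedule_stub(), schedule_calls and lock_release_stub() are hypothetical stand-ins for the real scheduler and lock code, and only the threading_started flag and the early return follow the description.

```cpp
#include <cassert>

// Flag from the answer: false until thread_start() has been called.
static bool threading_started = false;

// Hypothetical stand-in for the real scheduler; counts how often it runs.
static int schedule_calls = 0;

static void schedule_stub() {
    // The scheduler must never run before thread_start().
    assert(threading_started);
    ++schedule_calls;
}

void thread_start() {
    threading_started = true;
}

void thread_yield() {
    if (!threading_started) return;  // bail out: scheduler not ready yet
    schedule_stub();
}

void thread_block() {
    if (!threading_started) return;  // bail out: scheduler not ready yet
    schedule_stub();
}

// Hypothetical model of a lock_release() that yields on completion,
// as in the priority-donation design described above.
void lock_release_stub() {
    thread_yield();
}
```

In this model, a lock_release_stub() call made before thread_start() returns without touching the scheduler, and the same call made afterwards reaches it once.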
Good luck!

Why can an Rsession process continue after SIGSEGV and what does it mean

I'm developing an R package for myself that interacts with both Java code using rJava and C++ code using Rcpp. While trying to debug Rsession crashes when working under Rstudio using lldb, I noticed that lldb outputs the following message when I try to load the package I'm developing:
(lldb) Process 19030 stopped
* thread #1, name = 'rsession', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
frame #0: 0x00007fe6c7b872b4
-> 0x7fe6c7b872b4: movl (%rsi), %eax
0x7fe6c7b872b6: leaq 0xf8(%rbp), %rsi
0x7fe6c7b872bd: vmovdqu %ymm0, (%rsi)
0x7fe6c7b872c1: vmovdqu %ymm7, 0x20(%rsi)
(where 19030 is the pid of rsession). At this point, Rstudio stops, waiting for lldb to resume execution, but instead of getting the dreaded "R session aborted" popup, entering the 'c' command in lldb resumes the rsession process and Rstudio continues chugging along just fine, and I can use the loaded package with no problems, i.e.:
c
Process 19030 resuming
What is going on here? Why is Rstudio's rsession not crashing if lldb says it has "stopped"? Is this due to R's (or Rstudio's?) SIGSEGV handling mechanism? Does that mean that the original SIGSEGV is spurious and should not be a cause of concern? And of course (but probably off topic in this question): how do I make sense of lldb's output in order to ascertain if this SIGSEGV on loading my package should be debugged further?
The SIGSEGV does not occur in Rsession's process, but in the JVM process launched by rJava on package load. This behaviour is known and due to JVM's memory management, as stated here:
Java uses speculative loads. If a pointer points to addressable
memory, the load succeeds. Rarely the pointer does not point to
addressable memory, and the attempted load generates SIGSEGV ... which
java runtime intercepts, makes the memory addressable again, and
restarts the load instruction.
The proposed workaround for gdb works fine:
(gdb) handle SIGSEGV nostop noprint pass
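To illustrate why a process can keep running after a SIGSEGV, here is a minimal Linux sketch of the mechanism the quote describes: a handler makes the faulting memory addressable and returns, and the kernel restarts the load. It is not the JVM's code. All names (on_segv, load_with_recovery, g_page, g_faults) are made up for the example, and it assumes that calling mprotect() from a signal handler is acceptable on the target system, which POSIX does not guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

static char* g_page = nullptr;              // page that starts out inaccessible
static std::size_t g_page_size = 0;
static volatile sig_atomic_t g_faults = 0;  // number of faults we recovered from

// SIGSEGV handler: if the fault is inside our page, make the page accessible
// and return, so the faulting load instruction is restarted.
static void on_segv(int, siginfo_t* info, void*) {
    char* addr = static_cast<char*>(info->si_addr);
    if (addr < g_page || addr >= g_page + g_page_size) {
        // Not ours: restore the default action so the retried access crashes.
        signal(SIGSEGV, SIG_DFL);
        return;
    }
    g_faults = g_faults + 1;
    mprotect(g_page, g_page_size, PROT_READ | PROT_WRITE);
}

// Reads an int from a PROT_NONE page; the first attempt faults, the handler
// fixes the mapping, and the retried load then succeeds.
int load_with_recovery() {
    g_faults = 0;
    g_page_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    void* mem = mmap(nullptr, g_page_size, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(mem != MAP_FAILED);
    g_page = static_cast<char*>(mem);

    struct sigaction sa = {};
    struct sigaction old_sa = {};
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_sa);

    int value = *reinterpret_cast<volatile int*>(g_page);  // faults once

    sigaction(SIGSEGV, &old_sa, nullptr);  // restore the previous handler
    munmap(g_page, g_page_size);
    return value;
}
```

Run under a debugger, a call to load_with_recovery() would show the same pattern as above: the debugger stops on SIGSEGV, and continuing lets the handler run and the program carry on.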

ESP8266 : WPA ENTERPRISE LOGIN WITHOUT CA CERTIFICATE

Thanks a lot for giving your time and reading this,
I have gone through the WPA enterprise question posted in the ESP8266 forum and the related links, but they did not help me, so I'm starting another topic.
I am trying to connect my ESP to my office network.
To connect from a mobile phone, these are usually the settings:
SSID:PanLAN
EAP method : PEAP
Phase 2 authentication: None
Ca Certificate: None
Identity: Asia-Pacific\SLAIK
password: Badonkadong
I took the edurom.ino from joostd's github and tried to run it.
https://github.com/genomics-admin/esp8266-eduroam/tree/master/Arduino
I got the following error.
error: 'wifi_station_set_username' was not declared in this scope
wifi_station_set_username(identity, sizeof(identity));
^
ESP8266_PEAPAuth:323: error: 'wifi_station_set_cert_key' was not declared in this scope
if( wifi_station_set_cert_key(testuser_cert_pem, testuser_cert_pem_len, testuser_key_pem, testuser_key_pem_len, NULL, 0) == 0 ) {
^
ESP8266_PEAPAuth:337: error: 'wifi_station_clear_cert_key' was not declared in this scope
wifi_station_clear_cert_key();
^
ESP8266_PEAPAuth:338: error: 'wifi_station_clear_username' was not declared in this scope
wifi_station_clear_username();
^
exit status 1
'wifi_station_set_username' was not declared in this scope.
I'm new to ESP/Arduino; however, to my eyes it looks like I'm missing the user_interface.h and c_type.h files.
The GitHub repository doesn't include them, so where can I get them from?
I know user_interface.h is available on GitHub from other projects, but can I use those copies?
If yes, where should I put user_interface.h after downloading it?
I found ctype.h on my computer at "C:\Program Files (x86)\Arduino\hardware\tools\avr\avr\include". Is it the same as the c_type.h mentioned in the code? Can I rename ctype.h and use it?
Is there any other or better example I can follow to get my ESP connected to the network?

Riak: "Failed to read test value: {error,{insufficient_vnodes,0,need,1}}" after running "riak-admin test"

Getting this error soon after running riak start despite a config file that should be working correctly.
Turns out that this is a limit of Riak's error messaging: you will get the above message if you try to do a riak-admin test on your setup before the configuration has finished loading.
I encountered the same problem while starting new Riak clusters over and over again during automated testing. My solution was, in my test fixture setup, to execute code that keeps trying to put an object into a Riak bucket until it eventually succeeds.
Granted, my solution is an Erlang snippet, but it solves this problem in the absence of any Riak-supplied admin/wait functions. I've used it with a number of different Riak versions and it seems to work for all of them.
%% Block until Riak accepts a put.
wait_for_riak() ->
    {ok, C} = riak:local_client(),
    io:format("Waiting for Riak..."),
    wait_for_riak(C),
    io:format("and had a successful put.~n").

%% Retry the put once per second until it succeeds.
wait_for_riak(C) ->
    Strawman = riak_object:new(<<"test">>, <<"strawman">>, []),
    case C:put(Strawman, 1) of
        ok ->
            ok;
        _Error ->
            receive after 1000 -> ok end,
            wait_for_riak(C)
    end.
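The same poll-until-ready idea, outside Erlang, might look like the C++ sketch below. wait_until_ready is a hypothetical helper, not part of any Riak client, and the probe is whatever cheap operation signals readiness (for example a test put). Unlike the Erlang version, this one gives up after max_attempts instead of retrying forever.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: calls `probe` until it returns true or until
// `max_attempts` tries have been made, sleeping `delay` after each failure.
// Returns true if the probe eventually succeeded.
bool wait_until_ready(const std::function<bool()>& probe,
                      int max_attempts,
                      std::chrono::milliseconds delay) {
    assert(max_attempts >= 0);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (probe()) {
            return true;  // service answered; stop polling
        }
        std::this_thread::sleep_for(delay);  // back off before the next try
    }
    return false;  // never became ready within the attempt budget
}
```

Bounding the number of attempts lets a test fixture fail with a clear error when the service never comes up, rather than hanging.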
Adding sleep 4 like so should help:
brew install riak
riak start
sleep 4
riak-admin test
