All processes have my_rank = 0, how to fix? - mpi

I have a problem running MPI programs: all processes claim that their rank is 0.
I have searched around a lot and found out that this is caused by a mix-up between the Open MPI version and something else, and that I have to check which MPI I invoke. But no one explains properly, in a way that works, how to fix this.
Do I have to uninstall something? In that case, what should I uninstall and how do I do it?
Do I have to install something? What and how?
If the answer is no to the previous questions, how can I then fix it?
How could this problem occur when, as far as I can tell, I only did what my lecturer told me?

This typically occurs when you are mixing two MPI libraries.
For example, you are using mpirun from MPICH but your app is using the libraries from Open MPI.
First double-check which MPI you are actually using, for example:
$ which mpirun
$ mpirun -np 1 ldd a.out
The mpirun path and the MPI libraries listed by ldd should both belong to the same installation (i.e. the same vendor and version).
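The same check can be scripted. Below is a minimal Python sketch, not a definitive tool: it filters ldd output for MPI-looking libraries so that their directories can be compared with the one reported by which mpirun. The function name mpi_libraries, the library-name prefixes, and the sample paths are assumptions made for illustration.

```python
import os
import re

# Library-name prefixes treated as "MPI-related" (an assumption; extend as needed).
MPI_PREFIXES = ("libmpi", "libopen-")


def mpi_libraries(ldd_output):
    """Return {soname: directory} for MPI-looking libraries in ldd output."""
    found = {}
    for line in ldd_output.splitlines():
        # Typical line: "libmpi.so.40 => /opt/openmpi/lib/libmpi.so.40 (0x...)"
        match = re.match(r"\s*(\S+)\s+=>\s+(\S+)", line)
        if match and match.group(1).startswith(MPI_PREFIXES):
            path = match.group(2)
            # ldd prints "not found" when a library cannot be resolved.
            found[match.group(1)] = None if path == "not" else os.path.dirname(path)
    return found


# Hypothetical ldd output, for illustration only.
SAMPLE = """\
\tlinux-vdso.so.1 (0x00007ffc1a5f2000)
\tlibmpi.so.40 => /opt/openmpi/lib/libmpi.so.40 (0x00007f3a5c000000)
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3a5bc00000)
"""

print(mpi_libraries(SAMPLE))
```

Feed it the real output of mpirun -np 1 ldd a.out. If the directory it reports does not sit under the same installation prefix as the mpirun found by which, the launcher and the library probably come from different MPI packages.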

Related

Program made with PyInstaller now seen as a Trojan Horse by AVG

About a month ago, I used PyInstaller and Inno Setup to produce an installer for my Python 3 script. My AVG Business Edition AntiVirus just started complaining with today's update that the program has an SCGeneric Trojan Horse in the main .exe file used to start the program (in the folder created by PyInstaller that has all of the Python "guts"). At first I just thought it was a false positive in AVG, but submitting the .exe file to VirusTotal I get this analysis:
https://virustotal.com/en/file/9b0c24a5a90d8e3a12d2e07e3f5e5224869c01732b2c79fd88a8986b8cf30406/analysis/1493881088/
Which shows that 11 out of 61 scanners detect a problem:
TheHacker: Trojan/Agent.am
NANO-Antivirus: Trojan.Win32.Agent.elyxeb
DrWeb: Trojan.Starter.7246
Yandex: Trojan.Crypren!52N9f3NgRrY
Jiangmin: Trojan.Agent.asnd
SentinelOne (Static ML): static engine - malicious
AVG: SCGeneric.KTO
Rising: Malware.Generic.5!tfe (thunder:5:ujHAaqkyw6C)
CrowdStrike Falcon (ML): malicious_confidence_93% (D)
Endgame: malicious (high confidence) 20170503
Zillya: Dropper.Sysn.Win32.5954
Now I can't say that these other scanners are ones that I have heard of before... but still I'm concerned that it is not just AVG giving a false positive.
I have submitted the .exe file in question to AVG for their analysis. Hopefully they will back off on whatever it is that they thought they were trying to detect.
Is there anything else I can do with PyInstaller to make it so that the .exe launcher that it created won't be considered a Trojan?
I was always getting some false positives with PyInstaller from VirusTotal. This is how I fixed it:
PyInstaller comes with pre-compiled bootloader binaries for different OSs. I suggest compiling them yourself on your own machine. Make sure everything on your machine is consistent: for 64-bit Windows, install 64-bit Python and download PyInstaller for 64-bit Windows. Make sure the Visual Studio (VS) version corresponding to your Python is installed; see:
https://wiki.python.org/moin/WindowsCompilers
Compile the bootloader of PyInstaller on your machine with VS. It automatically updates the run.exe, runw.exe, run_d.exe, runw_d.exe in DownloadedPyinstallerFolder\PyInstaller\bootloader\Windows-64bit. Check below for more info on how to compile the bootloader:
https://pyinstaller.readthedocs.io/en/stable/bootloader-building.html
At the end, install PyInstaller. Within the PyInstaller directory, run
python setup.py install
I was able to submit the file in question to AVG's "Report a false detection" page, at https://secure.avg.com/submit-sample. I received a response back fairly quickly (I can't remember exactly how long, but it was less than a day) that they had analyzed my file and determined that it did not have a virus. They said that they had adjusted their virus definitions so that it would not trigger a false positive anymore. I updated my definitions and it was still triggering, so I contacted them again with my virus definition version, and I heard back that the version I had wasn't high enough - I think there was some delay on my definitions because I get them from a local server. But within a day I had the right version of the definitions and the false positive didn't trigger anymore.
So if you have a false positive with AVG, I would recommend this solution - fairly quick and easy to get a resolution to the problem.
I puzzled over this question for two days and finally found a problem with my application. The issue was with the application's icon.
Example for tkinter:
root.iconbitmap('./icon.ico')
When I removed this line of code, the false-positive Trojan was gone.
Also, make sure not to use the --icon option when you convert your .py file into an .exe. Otherwise, it will cause the same false-positive Trojan detection.
I faced the same issue with the code for my small document register project.
My temporary solution was to allow the app in Windows Defender. The other solution was to use the command pyinstaller filename.py instead of pyinstaller --onefile filename.py.
I don't know if it is the correct approach, but it worked for me.
I searched many blogs for weeks but found nothing.
Today I found a way to convert a .py file to an .exe without any virus warnings.
Virus Total Report
With this method you do not need to send any reports. It is actually very simple.
You need to install a module named Nuitka.
python -m pip install nuitka
Then open a command prompt in the folder that contains the file and run:
python -m nuitka --mingw64 filename.py
And that's all.
To see the available options, you can use the command
nuitka --help
You can find more in the Nuitka Guide.
I had this same problem using python 3.8.5 and pyinstaller 4.5.1
In my case the first exe build was accepted by the antivirus (Windows Defender) but subsequent builds were flagged as having a trojan.
I solved it by using the pyinstaller --clean option every time I built the executable
Reverting back to PyInstaller 3.1.1 from 3.4 resolved similar issues on my end (at least temporarily).
As @boogie_bullfrog said, reverting to a previous version can be a solution. However, I use a *.spec file to bundle some data (such as pictures and icons). I had the latest version, 3.5 (August 2019), and moving to 3.1.1 caused an error when the app was compiled (probably because of Python 3.7 support).
So right now the easiest solution is to downgrade to 3.4.
It supports spec files from PyInstaller 3.5, and the onefile app wasn't flagged by the Windows 10 built-in firewall.
What I did to solve this (to stop exe files from being detected as viruses) was to downgrade PyInstaller by typing in cmd: pip install pyinstaller==4.1.0
By the way, it didn't work with 3.4.0, so I just picked that version (4.1) at random, and it looks pretty good so far :>
I'm pretty sure it works with more than just that one version, but that is the one I have tried personally.
Recompile and then reinstall your Pyinstaller bootloader manually.
This was a problem I had for a while, and my friend and I figured out this resolution with the help of many others. It almost always works to resolve the issue.
I posted the specific steps on my Medium blog (link below), but the basic steps are as follows:
Purge Pyinstaller Files within your Project and Rebuild
Uninstall Pyinstaller
Build a Pyinstaller Bootloader with your Compiler
Install the newly compiled Pyinstaller
Re-build your EXE with Pyinstaller, and make sure it's not being flagged as a virus
How to Resolve the Python Pyinstaller False Positive Trojan Virus
Part 1. Manually Compile your Pyinstaller Bootloader
Part 2. Working with Anti-Virus Developer(s)
I had a similar problem with a pyinstaller exe under Windows. Avira put that file into quarantine since it was considered potentially dangerous (due to heuristics, which means that some segments look typical for a virus, but no virus is actually found).
Keep in mind that the exe files you generate yourself are unique (as a consequence, the Avast scanner usually returns a message "you have found a rare file, we are doing a quick test", and delays execution for 15 seconds to perform a more thorough test).
My solution consists of some steps:
I have uploaded the exe to https://www.virustotal.com/gui/home/upload to check it with many scanners. If just one or two are detecting a virus, you should be on the safe side.
In order to make your local virus scanner accept the file, you can manually accept it for your computer, but this does not solve the underlying problem, so on other computers it would still be flagged as a virus.
Therefore I reported the file as a false positive to Avira, which can be done simply by sending it by email. Other scanners have similar feedback channels. Within one day I got a reply by email saying that the file is OK, and the scanner on my PC now agrees. I hope this also helps with the next iterations of my exe, so that it stays clean.
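When reporting a file, it helps to be able to say exactly which build is meant. The VirusTotal link quoted in the question contains a 64-character hexadecimal string, which looks like the SHA-256 digest of the uploaded file, so computing that digest locally is a simple way to identify a build. A minimal sketch using only the standard library; the function name sha256_of and the path in the usage note are made up for illustration.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, sha256_of("dist/myapp.exe") returns a 64-character string that can be quoted in a false-positive report or compared between two builds.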
Had the same problem today. Win8.1 would keep flagging .exe as virus. Updated to pyinstaller 5.7.0 but the issue persisted. Uninstalled pyinstaller 5.7.0 and did a fresh install. Strangely, Win8.1 isn't complaining anymore!

weird "*** stack smashing detected ***" issue

I have a simple batch program that works on 2 CentOS 6.6 machines ( a 32 bit machine and a 64 bit machine ), but not on a third CentOS 6.6 machine ( a 64 bit machine ). So how can the exact same executable work on 2 machines, but not work on the third machine?
Note that I am not asking how to fix this issue; I am asking how the exact same executable can behave differently on 3 different machines. I actually have three or four different C programs that show this behavior, but I am choosing the simplest one to troubleshoot the issue. My theory is that something is set up differently at the OS level between the 3 machines (maybe I forgot to install some library or set some environment variable). I just need help narrowing down where to look and which OS-level things to check.
This probably should go onto serverfault, as it is more a server related question, but I was afraid people there would see my reference to C programs and ask me to come here, so I am going to start here.
Note that valgrind does not help. I would just fix the issue in my code if that were the case, but it revealed no memory issues. When I say it is simple, I mean it. It just reads some records from the database, massages them, and then prints them to the screen.
Thanks for any help you can provide.
Generally the stack smashing warnings / errors are caused by buffer overflow type issues.
I don't know enough about CentOS / Linux to know the exact way this gets configured (my main experience with this type of issue is when running on OpenBSD). Usually this stack smashing detection feature is enabled at compile time.
GDB may be able to help here if you compile your program with the debugging symbols enabled (-g) and load up the resulting core file to look at the backtrace.
For instance, in a simple test program on OpenBSD I see the following backtrace in GDB:
(gdb) bt
#0 0x00001e13837081ea in kill () at <stdin>:2
#1 0x00001e1383745b2c in __stack_smash_handler (func=0x1e117f400ebf "test_smash", damaged=Variable "damaged" is not available.
) at /usr/src/lib/libc/sys/stack_protector.c:61
#2 0x00001e117f300e91 in test_smash () at test.c:10
#3 0x0000000000000000 in ?? ()
where test_smash() is a function that intentionally overflows the stack.
Using this method should allow you to quickly determine which function is causing the stack buffer overflow, and allow you to fix it in the source code.
OK, it was sort of an OS-related issue, peripherally. Basically, when I installed the shared libraries I needed, I installed the wrong version of my ODBC library (a newer version than the one my code is used to and was compiled against). Once I got the older version installed correctly, the error went away. So for anyone else who sees the same executable fail on one machine but not on others: check your shared libraries and make sure the versions match. Your executable may not like newer or older versions of a given library, for whatever reason.
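One way to do that comparison is to save the output of ldd for the executable on a working machine and on the failing one, then diff the two. A rough Python sketch follows; the function names parse_ldd and diff_libs are made up for illustration, and the approach assumes the usual "name => path (address)" layout of ldd output. Note that ldd prints the path as it was found, so a symlink may still hide the real library version.

```python
import re


def parse_ldd(ldd_output):
    """Map each library name to the path it resolves to (None if not found)."""
    resolved = {}
    for line in ldd_output.splitlines():
        # Assumed layout: "libodbc.so.2 => /usr/lib64/libodbc.so.2 (0x...)"
        match = re.match(
            r"\s*(\S+)\s+=>\s+(.+?)(?:\s+\(0x[0-9a-fA-F]+\))?\s*$", line
        )
        if match:
            target = match.group(2)
            resolved[match.group(1)] = None if target == "not found" else target
    return resolved


def diff_libs(good_output, bad_output):
    """Return {name: (path on good machine, path on bad machine)} where they differ."""
    good, bad = parse_ldd(good_output), parse_ldd(bad_output)
    return {
        name: (good.get(name), bad.get(name))
        for name in sorted(set(good) | set(bad))
        if good.get(name) != bad.get(name)
    }
```

Any entry that diff_libs returns is a library resolved from a different location (or missing) on the failing machine. For those entries, following the symlink with readlink -f or checking the installed package version shows whether the versions really differ.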

Torque + mpirun + resources allocation

I'm running Torque with Open MPI on a single machine with 24 cores. Why is it possible to specify, for instance, nodes=1:ppn=2 in my job.sh and still run a job launched with mpirun -np 12 WhatEverCommand? In that case the job is executed on 12 cores, even though the "nodes" line asks for 2 CPUs.
Doesn't specifying the "nodes" option make any restrictions on the resources to be used by the submitted job? If it doesn't, then how to prevent users from violating the server rules by overriding the declared resources?
On the other hand - specifying the nodes=1:ppn=8 and mpirun without "-np" option, gives me only 1 cpu running the job.
Am I that bad and missing something fundamental here?
By default, OpenMPI doesn't integrate with Torque at all. You have to compile OpenMPI using the --with-tm configure option, which doesn't seem to be enabled in most distro packages. The OpenMPI project mentions Torque integration in its FAQs on building and running OpenMPI.
Similarly, Torque doesn't actually restrict access to CPUs unless cpuset support is enabled. Again, this seems absent in most distro packages. This is why your OpenMPI app, when compiled without Torque integration, can hit all the cores without restriction.
Building both packages from source is not too difficult, so it's worth researching the configure options and building the support that makes sense for you.
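Until such integration is in place, a job script can at least stay within its own allocation voluntarily. Torque exports PBS_NODEFILE, the path to a file that lists one hostname per allocated processor, so the number of lines gives the process count the job asked for. The Python sketch below builds an mpirun command line from that file; the helper names and the ./a.out program are placeholders. This enforces nothing: a user can still pass any -np by hand.

```python
import os
from collections import Counter


def allocated_slots(nodefile_path):
    """Count allocated processors per host from a Torque node file."""
    with open(nodefile_path) as handle:
        # One hostname per allocated processor; blank lines are ignored.
        return Counter(line.strip() for line in handle if line.strip())


def mpirun_command(program, nodefile_path):
    """Build an mpirun argument list sized to the Torque allocation."""
    total = sum(allocated_slots(nodefile_path).values())
    return ["mpirun", "-np", str(total), "--hostfile", nodefile_path, program]


if __name__ == "__main__":
    nodefile = os.environ.get("PBS_NODEFILE")  # set by Torque inside a job
    if nodefile:
        print(" ".join(mpirun_command("./a.out", nodefile)))
```

With nodes=1:ppn=8 the node file has eight lines, so the generated command uses -np 8 instead of the single process seen when mpirun is started without -np and without Torque support.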

How can my program detect whether it was launched via mpirun

How can my MPI program detect whether it was launched as a standalone application or via mpirun?
Considering the answer and comments by semiuseless and Hristo Iliev, there is no general and portable way to do this. As a workaround, you can check for environment variables that are set by mpirun. See e.g.:
http://www.open-mpi.org/faq/?category=running#mpi-environmental-variables
There is no MPI standard way to tell the difference between an MPI application that is launched directly, or as a single rank with mpirun. See "Singleton MPI_Init" for more on this kind of MPI job.
The environment variable checking answer from Douglas is a reasonable hack...but is not portable to any other MPI implementation.
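For illustration, the environment-variable hack could look like the Python sketch below. The variable names are assumptions about what particular launchers export: OMPI_COMM_WORLD_SIZE and OMPI_COMM_WORLD_RANK are the ones documented in the Open MPI FAQ linked above, while PMI_SIZE and PMI_RANK are assumed here for MPICH's Hydra launcher. None of them is guaranteed by the MPI standard, and the function name detect_launcher is made up.

```python
import os

# Variables assumed to be exported by specific launchers (not portable).
LAUNCHER_VARS = {
    "Open MPI": ("OMPI_COMM_WORLD_SIZE", "OMPI_COMM_WORLD_RANK"),
    "MPICH/Hydra (assumed)": ("PMI_SIZE", "PMI_RANK"),
}


def detect_launcher(environ=None):
    """Return the name of the launcher that seems to have started us, or None."""
    env = os.environ if environ is None else environ
    for name, variables in LAUNCHER_VARS.items():
        if any(variable in env for variable in variables):
            return name
    return None


if __name__ == "__main__":
    launcher = detect_launcher()
    print(launcher or "no known launcher variables found")
```

A None result only means that none of the listed variables was found. It does not prove the program was started standalone, which is exactly the portability problem described above.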

Cannot step into system call source code

I have compiled my FreeBSD libc source with the -g option, so that I can now step into libc functions.
But I am having trouble stepping into system call code. I have compiled the FreeBSD kernel source with -g. When I set the breakpoint, gdb reports a breakpoint in .S files. When the breakpoint is hit, gdb is unable to step into the syscall source code.
I have also tried: gdb$ catch syscall open
but this does not work either.
Can you please suggest something?
Thanks.
You appear to have a fundamental lack of understanding of how UNIX systems work.
Think about it. Suppose you were able to step into the kernel function that implements a system call, say sys_open. So now you are looking at the kernel source for sys_open in the debugger. The question is: is the kernel running at that point, or is it stopped? Since you will want to do something like next in the debugger, let's assume the kernel is stopped.
So now you press the n key, and what happens?
Normally, the kernel will react to an interrupt raised by the keyboard, figure out which key was pressed, and send that key to the right process (the one that is blocked in read(2) from the terminal that has control of the keyboard).
But your kernel is stopped, so no key press for you.
Conclusion: debugging the kernel via a debugger that is running on that same machine is impossible.
In fact, when people debug the kernel, they usually do it by running the debugger on another machine (this is called remote debugging).
If you really want to step into kernel, the easiest way to do that is with UML.
After you've played with UML and understand how the userspace/kernel interface works and interacts, you can try kgdb, though the setup is usually a bit more complicated. You don't actually need a separate machine for this; you could use VMware, Virtual PC, or VirtualBox.
As Employed Russian already stated, gdb being in userland cannot inspect anything running in the kernel.
However, nothing prevents implementing a debugger in the kernel itself. In that case, it is possible to set breakpoints and run kernel code step by step from a local debugging session (console). On FreeBSD, such a debugger is available as ddb.
Some limitations would be the lack of connection between your gdb and ddb sessions, and I'm unsure whether source-level debugging (-g) is available for kernel code under FreeBSD/ddb.
An alternate and much less intrusive way to 'debug' the kernel from userland would be to use dtrace.

Resources