I'm really just looking to pick the community's brain for some leads in figuring out what is going on with the issue I'm having.
I'm writing a MR job with RHadoop (rmr2, v3.0.0) and things are great -- IO with HDFS, mapping, reducing. No problems. Life is great.
I'm trying to schedule the job with Apache Oozie, and am running into some issues:
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 1
I've read the rmr2 debugging guide, but nothing is really getting to the stderr because the job fails before anything even gets scheduled.
In my head, everything points to a difference in environments. However, Oozie is running the job as the same user that I'm able to run everything with via cli, and all of the R environment variables (fetched with Sys.getenv()) are the same, excepting there's some additional class path stuff set with Oozie.
I can post more of the OS or Hadoop versions and config details, but sleuthing some version-specific bugs seems like a bit of a red herring as everything runs fine at the command line.
Anybody have any thoughts what might be some helpful next steps in hunting this beast down?
UPDATE:
I overwrote the system function in the base package to log the user, the host name of the node, and the command being executed before the internal call to system. So before any system call is actually executed, I get something like the following in the stderr:
user#host.name
/usr/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.2.0.2.0.6.0-102.jar ...
When ran with Oozie, the command printed in the stderr fails with an exit status of 1. When I run the command on user#host.name, it runs successfully. So essentially the EXACT same command with the SAME user on the SAME node fails with Oozie, but runs successfully from cli.
Related
I am running a python code on a remote machine. When I run it on the head node of the computer, it executes with no problem.
But when I use Slurm workload manager:
sbatch --wrap="python mycode.py" -N 1 --cpus-per-task=8 -o mycode.o
Then the code fails with the following error (only showing the end of the error):
.
.
line 91, in open
"available".format(result))
dbm.error: db type is dbm.gnu, but the module is not available
I'm just confused how a code could run fine without submitting through Slurm, but fail when I do use Slurm.
The compute (remote) nodes probably don't have the same software installed as the head node, or you may need to do some configuration steps before running. Check with the administrator of the cluster.
I am doing the Pintos project on the side to learn more about operating systems. I had tons of devops trouble at first with it not running well on an 18.04 Ubuntu droplet. I am now running it on the VirtualBox image that UCCS tells students to download for pintos.
I finished project 1 and started to map out my solution to project 2. Following the instructions to create a file I ran
pintos-mkdisk filesys.dsk --filesys-size=2
pintos -- -f -q
but am getting error
Kernel PANIC at ../../threads/vaddr.h:87 in vtop(): assertion
`is_kernel_vaddr (vaddr)' failed.
I then tried running make check (all the tests). They are all failing for the same reason.
Am I missing something? Is there something I need to implement to fix this? I reread the instructions and didnt see anything?
Would appreciate help!
Thanks
I had a similar problem. My code for Project 1 ran fine, but I could not format the filesystem for Project 2.
The failure for me came from the following call chain:
thread_init() -> ... -> thread_schedule_tail() -> process_activate() -> pagedir_activate() -> vtop()
The problem is that init_page_dir is still NULL when pagedir_activate() is called. init_page_dir should have been initialized in paging_init() but this is called after thread_init().
The root cause was that my scheduler was being called too early, i.e. before the call to thread_start(). The reason for my problem was that I had built in a call to thread_yield() upon completion of every call to lock_release() which makes sense from a priority donation standpoint. Unfortunately, locks are used prior to the scheduler being ready! To fix this, I installed a flag called threading_started that bails in the first line of my thread_block() and thread_yield() functions if thread_start() has not yet been called.
Good luck!
This is the first time I have tried to connect R and Tableau.
I have downloaded and installed Rserve successfully but every time I try to start Rserve is get this warning:
Starting Rserve...
"C:\Users\SIMON~1.HAR\DOCUME~1\R\WIN-LI~1\3.1\Rserve\libs\x64\Rserve.exe"
Warning message:
running command '"C:\Users\SIMON~1.HAR\DOCUME~1\R\WIN-LI~1\3.1\Rserve\libs\x64\Rserve.exe" ' had status 127
I have been searching for days and couldn't find any fix.
The Rserve() function is trying to start an application (Rserve.exe) and failed. There's a couple of things you can do.
Go to the "C:\Users\SIMON~1.HAR\DOCUME~1\R\WIN-LI~1\3.1\Rserve\libs\x64\" file directory and try loading the exe yourself, troubleshoot from there.
use the run.Rserve() function instead of Rserve(). This will use your current R session to start the Rserve server. This means the exe doesn't need to be run. This worked well for me because I am working in an environment where I don't enough privileges to run the exe. It does mean that your R session can't do anything else while the server is running, but you can always load up 2 sessions at the same time.
I work with Rscript on a cluster. qmake (a specialized version of GNU-make for the cluster) is used to parallelize jobs on several nodes. But Rscript seems to need to write a Xauthority file and it creates an error when every nodes work in the same time. In this way, my makefile-bases pipeline stops after the first group of parallelized tasks and don't start the next group of tasks. But the results of the first group are ok.
I'm also invoking usr/bin/xvfb-run ( https://en.wikipedia.org/wiki/Xvfb) when runnning RScript.
I've already changed the ssh-config (FORWARD X11 yes) but the problem persists.
I also tried to change the name of Xauthority file for each job but it didn't work (option -f in Rscript).
Here is the error which appear at the beginning of the process
/usr/bin/xauth: error in locking authority file .Xauthority
Here is the error which appears before the process stops :
/usr/bin/xvfb-run: line 171: kill: (44402) - No such process
qmake: *** [Data/Median/median_B00FTXC.tmp] Error 1
I'm supposed to run some jbehave(automated) tests in bamboo. Once the tests run I'll generate some junit compatible xml files so that bamboo could understand the same. All the jbehave tests are ran as part of a script, because I need to run the jbehave tests in a separate display screen(remember these are automated browser tests). Example script is as follows.
Ex:
export DISPLAY=:0 && xvfb-run --server-args="-screen 0, 1024x768x24"
mvn clean integration-test -DskipTests -P integration-test -Dtest=*
I have one more junit parser task which points to the generated junit compatible xml files. So, once the bamboo build runs and even if all the tests pass, I get red build with the message "No failed tests found, a possible compilation error occurred."
Can somone please help me on this regard.
Your build script might be producing successful test reports, but one (or both, possibly) of your tasks is failing. That means that the failure is probably* occurring after your tests complete. Check your build logs for errors. You might also try logging in to your Bamboo server (as the bamboo user) and running the commands by hand.
I've seen this message in the past when our test task was crashing halfway through the test run, resulting in one malformed report that Bamboo ignored and a bunch of successful reports.
*Check the build log to make sure that your tests are indeed running. If mvn clean doesn't clean out the test report directory, Bamboo might just be parsing stale test reports.
EDIT: (in response to Kishore's links)
It looks like your task to kill Xvfb is what is causing the build to fail.
18-Jul-2012 09:50:18 Starting task 'Kill Xvfb' of type 'com.atlassian.bamboo.plugins.scripttask:task.builder.script'
18-Jul-2012 09:50:18
Beginning to execute external process for build 'Functional Tests - Application Release Test - Default Job'
... running command line:
/bin/sh
/tmp/FUNC-APPTEST-JOB1-91-ScriptBuildTask-4153769009554485085.sh
... in: /opt/bamboo-home/xml-data/build-dir/FUNC-APPTEST-JOB1
... using extra environment variables:
<..snip (no meaningful output)..>
18-Jul-2012 09:50:18 Failing task since return code was 1 while expected 0
18-Jul-2012 09:50:18 Finished task 'Kill Xvfb'
What does your "Kill Xvfb" script do? Are you trying something like pkill -f "[x]vfb"? pkill -f silently returns non-zero if it can't match the expression to any processes.
My solution was to make a 'script' task:
#!/bin/bash
/usr/local/bin/phpcs --report=checkstyle --report-file=build/logs/checkstyle.xml --standard=PSR2 ./lib | exit 0
Which always exits with status 0.
This is because PHP code sniffer return exit status 1 when only 1 coding violation (warning / error) is found which causes the built to fail.
Turns out to be a simple fix.
General bamboo behavior is to scan the entire log and see for any failure codes(1). For this specific configuration i had some 6 scripts out of which one of them was to kill the xvfb(frame buffer). For some reason server is not able to kill xvfb and that task was returning a failure code. Because of this, though all the tests passed, bamboo got one of this error codes from previous tasks and build was failing.
Current fix is to remove the task which kills xvfb and the build went green! \o/.