I am following the examples given in the “Using MPI” book and am working on the example that turns on MPE logging (pmatmatlog.c - ported it from the Fortran example). It runs and produces a log file called “pmatmat.log.clog2”. I would like to visualize this log file.
I started by trying to use “jumpshot-4”, because that is what is installed and I cannot find a way to download jumpshot-2 (which appears to favor log files in the clog2 format?). Jumpshot-4, however, wants files in slog2 format and gives an error about not finding “clog2TOslog2” in the TAU directory tree.
Looking at the head of the clog2 file, it appears to be correct as far as I can tell:
$ head pmatmat.log.clog2
CLOG-02.44is_big_endian=TRUE is_finalzed=TRUE block_size=65536num_buffered_blocks=128max_comm_world_size=4max_thread_count=1known_eventID_start=0user_eventID_start=600known_solo_eventID_start=-10user_solo_eventID_start=5000known_stateID_count=300user_stateID_count=4known_solo_eventID_count=0user_solo_eventID_count=0commtable_fptr=0107374182466560>? … <and on into a lot of unreadable binary>
When I try to convert my clog2 file by calling “clog2TOslog2” on the command line, I get the following:
$ Clog2ToSlog2 ./pmatmat.log.clog2
GUI_LIBDIR is set. GUI_LIBDIR = /Users/markrbower/mpi/lib
**** Error! State!=State
**** Error! State!=State
**** Error! State!=State
**** Error! State!=State
java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at logformat.clog2TOdrawable.InputLog$ContentIterator.hasNext(InputLog.java:481)
at logformat.clog2TOdrawable.InputLog.peekNextKind(InputLog.java:58)
at logformat.slog2.output.Clog2ToSlog2.main(Clog2ToSlog2.java:77)
Caused by: logformat.clog2TOdrawable.NoMatchingEventException: No matching State end-event for Record RecHeader[ time=1.9073486328125E-5, icomm=0, rank=1, thread=0, rectype=6 ], RecCargo[ etype=1, bytes=bstartbendcomputecomputedpma ]
at logformat.clog2TOdrawable.Topo_State.matchFinalEvent(Topo_State.java:95)
... 7 more
java.lang.reflect.InvocationTargetException
After a lot of errors that all look to be “InvocationTargetException” as above, the tail says:
…
SLOG-2 Header:
version = SLOG 2.0.6
NumOfChildrenPerNode = 2
TreeLeafByteSize = 65536
MaxTreeDepth = 0
MaxBufferByteSize = 1346
Categories is FBinfo(157 # 1454)
MethodDefs is FBinfo(0 # 0)
LineIDMaps is FBinfo(232 # 1611)
TreeRoot is FBinfo(1346 # 108)
TreeDir is FBinfo(38 # 1843)
Annotations is FBinfo(0 # 0)
Postamble is FBinfo(0 # 0)
Number of Drawables = 20
Number of Unmatched Events = 0
Total ByteSize of the logfile = 8320
timeElapsed between 1 & 2 = 13 msec
timeElapsed between 2 & 3 = 79 msec
$
There are several routes I could see to finally get to visualizing the log file:
1. re-install TAU and hope that allows jumpshot-4 to find the converter
2. install MPI2 with MPE2 and try the newer version of everything
3. find a way to download and install Jumpshot-2 and hope that reads clog2 files
4. find some other way to convert clog2 to slog2
Why isn’t the conversion working and which is the best option to pursue?
Long story short: I have been given some Python code for a custom OpenAI Gym environment. I can successfully run the code via ExperimentGrid from the command line, but I would like to be able to run the entire experiment from within a Jupyter notebook rather than calling scripts. This would be more convenient for some experiments that I will be doing farther down the road.
My question: Is it possible to execute an experiment on a custom OpenAI gym environment entirely from within Jupyter Notebook and if so, how? I've seen plenty of examples of people executing gym's standard environments (like SpaceInvaders-v0 or CartPole-v0) from Jupyter but even then, they are calling the environment with
env=gym.make('SpaceInvaders-v0')
and essentially executing that environment's script behind the scenes.
Below is a basic description of how my code is set up to run from the command line and the errors that I'm getting in Jupyter.
Any advice would be appreciated. I am admittedly rather new to Gym, Python and Linux.
My basic environment code is structured like this in, say, envs/mygames/Custom_Env.py:
various import statements (numpy, gym, pyglet, copy)
class Entity()
class State()
class The_Custom_Env(core.Env) # This is the main environment class
class Shell_Class # This class calls The_Custom_Env and provides some arguments
In mygames/__init__.py, I import Shell_Class:
from gym.envs.mygames.Custom_Env import Shell_Class
In envs/__init__.py, I register the environment:
register(
id='TEST-v0',
entry_point='gym.envs.mygames:Shell_Class',
max_episode_steps=200,
reward_threshold=25.0,)
Finally, if I execute a script containing this code from the command line, the experiment works without issue:
from spinup.utils.run_utils import ExperimentGrid
from spinup import ppo_pytorch
import torch
if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('--cpu', type=int, default=4)
    parser.add_argument('--num_runs', type=int, default=1)
    args = parser.parse_args()
    eg = ExperimentGrid(name='super-cool-test')
    eg.add('env_name', 'TEST-v0', '', True)
    eg.add('seed', [10*i for i in range(args.num_runs)])
    eg.add('epochs', [10])
    eg.add('steps_per_epoch', 4000)
    eg.add('ac_kwargs:hidden_sizes', [(32, 32)], 'hid')
    eg.add('ac_kwargs:activation', [torch.nn.ReLU], '')
    eg.add('pi_lr', [0.001])
    eg.add('clip_ratio', 0.3)
    eg.run(ppo_pytorch, num_cpu=args.cpu)
My Jupyter Attempt
I put all of the code from Custom_Env.py in cell #1.
I then registered the environment in cell #2:
gym.register(
id='TEST-v1',
entry_point='__main__:Shell_Class',
max_episode_steps=200,
reward_threshold=25.0,)
I based this on the Q/A “Register gym environment that is defined inside a jupyter notebook cell”. I then make the environment in cell #3:
gym.make('TEST-v1')
and get this non-descriptive output:
<TimeLimit<Shell_Class< TEST-v1 >>>
In cell #4, I tried to execute ExperimentGrid code directly within Jupyter like so:
from spinup.utils.run_utils import ExperimentGrid
from spinup import ppo_pytorch
import torch
num_runs=1
cpu=4
env_name='TEST-v1'
eg = ExperimentGrid(name='Jupyter-test')
eg.add('env_name', env_name, '', True)
eg.add('seed', [10*i for i in range(num_runs)])
eg.add('epochs', 500)
eg.add('steps_per_epoch', 4000)
eg.add('ac_kwargs:hidden_sizes', [(32, 32)], 'hid')
eg.add('ac_kwargs:activation', [torch.nn.ReLU], '')
eg.add('pi_lr', 0.001)
eg.add('clip_ratio', 0.3)
eg.run(ppo_pytorch, num_cpu=cpu)
The experiment starts up as usual but then runs into some kind of error:
> ================================================================================
ExperimentGrid [Jupyter-test] runs over parameters:
env_name []
TEST-v1
seed [see]
0
epochs [epo]
500
steps_per_epoch [ste]
4000
ac_kwargs:hidden_sizes [hid]
(32, 32)
ac_kwargs:activation []
ReLU
pi_lr [pi]
0.001
clip_ratio [cli]
0.3
Variants, counting seeds: 1
Variants, not counting seeds: 1
================================================================================
Preparing to run the following experiments...
Jupyter-test_test-v1
================================================================================
Launch delayed to give you a few seconds to review your experiments.
To customize or disable this behavior, change WAIT_BEFORE_LAUNCH in
spinup/user_config.py.
================================================================================
Running experiment:
Jupyter-test_test-v1
with kwargs:
{
"ac_kwargs": {
"activation": "ReLU",
"hidden_sizes": [
32,
32
]
},
"clip_ratio": 0.3,
"env_name": "TEST-v1",
"epochs": 500,
"pi_lr": 0.001,
"seed": 0,
"steps_per_epoch": 4000
}
================================================================================
There appears to have been an error in your experiment.
Check the traceback above to see what actually went wrong. The
traceback below, included for completeness (but probably not useful
for diagnosing the error), shows the stack leading up to the
experiment launch.
================================================================================
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
<ipython-input-14-de843fd528cf> in <module>
15 eg.add('pi_lr', 0.001)
16 eg.add('clip_ratio', 0.3)
---> 17 eg.run(ppo_pytorch, num_cpu=cpu)
~/Downloads/spinningup/spinup/utils/run_utils.py in run(self, thunk, num_cpu, data_dir, datestamp)
544
545 call_experiment(exp_name, thunk_, num_cpu=num_cpu,
--> 546 data_dir=data_dir, datestamp=datestamp, **var)
547
548
~/Downloads/spinningup/spinup/utils/run_utils.py in call_experiment(exp_name, thunk, seed, num_cpu, data_dir, datestamp, **kwargs)
169 cmd = [sys.executable if sys.executable else 'python', entrypoint, encoded_thunk]
170 try:
--> 171 subprocess.check_call(cmd, env=os.environ)
172 except CalledProcessError:
173 err_msg = '\n'*3 + '='*DIV_LINE_WIDTH + '\n' + dedent("""
~/anaconda3/envs/spinningup/lib/python3.6/subprocess.py in check_call(*popenargs, **kwargs)
309 if cmd is None:
310 cmd = popenargs[0]
--> 311 raise CalledProcessError(retcode, cmd)
312 return 0
313
CLIPS version: 6.31.
Language: C++, using the CLIPS C API.
I get a core dump when I execute ProfileInfoCommand after EnvRun.
Why is this error happening? Is this a usage error on my part, or is it a bug?
C++ code 1:
#define PROFILING_FUNCTIONS 1
// ...
EnvReset(clips);
// ...
EnvLoadFactsFromString(clips, facts.str().c_str(), -1);
// ...
EnvRun(clips, 100000);
ProfileInfoCommand(clips);
As I understand it, if PROFILING_FUNCTIONS is defined as 1, EnvRun starts profiling automatically, so I call ProfileInfoCommand after EnvRun, but the process dumps core.
I also tried another method, but the process again generated a core dump (with the same backtrace as for C++ code 1).
C++ code 2:
EnvReset(clips);
Profile(clips, "constructs");
// ...
EnvLoadFactsFromString(clips, facts.str().c_str(), -1);
// ...
EnvRun(clips, max_iters);
Profile(clips, "off");
ProfileInfoCommand(clips);
The core dump is as follows:
Core was generated by `/mnt/home/worker/project/ae-arbiter/dist/bin/arbiter-8003 --flagfile=flags.'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000bc6b80 in EnvRtnArgCount (theEnv=Cannot access memory at address 0x7f879c3f6af8
) at /mnt/home/worker/project/ae-arbiter/src/clips/argacces.cc:306
306 for (argPtr = EvaluationData(theEnv)->CurrentExpression->argList;
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.212.el6.x86_64
(gdb) bt
#0 0x0000000000bc6b80 in EnvRtnArgCount (theEnv=0x7f85e6454f70) at /mnt/home/worker/project/ae-arbiter/src/clips/argacces.cc:306
#1 0x0000000000bc6bcd in EnvArgCountCheck (theEnv=0x7f85e6454f70, functionName=0xda1188 "profile", countRelation=2, expectedNumber=1) at /mnt/home/worker/project/ae-arbiter/src/clips/argacces.cc:337
#2 0x0000000000c40803 in ProfileInfoCommand (theEnv=0x7f85e6454f70) at /mnt/home/worker/project/ae-arbiter/src/clips/proflfun.cc:245
#3 0x0000000000b62d12 in arbiter::lib::ClipsModuleExecute (clips=0x7f85e6454f70, features=..., max_iters=100000, result_func=..., module_name=..., halt=#0x7f879c3f6fdc)
at /mnt/home/worker/project/ae-arbiter/src/lib/clips-utils.cc:357
...
...
Setting PROFILING_FUNCTIONS to 1 does not automatically profile code when EnvRun is called; it determines whether the profiling functions are included in the CLIPS executable. See Section 2.2 of the Advanced Programming Guide. The functions available for profiling from the CLIPS command prompt are documented in Section 13.15, Profiling Commands, of the Basic Programming Guide. ProfileInfoCommand can't be called directly: as the backtrace shows, it checks its argument count through EvaluationData(theEnv)->CurrentExpression, which is set up only while CLIPS is evaluating an expression. If you want to call a CLIPS command/function that isn't part of the API described in the Advanced Programming Guide, use the EnvEval API:
DATA_OBJECT returnValue;
EnvEval(theEnv,"(profile-info)",&returnValue);
As suggested by Dirk Eddelbuettel in this talk and this answer, I tried to profile compiled R code using gperftools. Here is what I did.
I used Dirk's profilingSmall.R as the script to profile. I repeat it here:
## R Extensions manual, section 3.2 'Profiling R for speed'
## 'N' reduced to 99 here
suppressMessages(library(MASS))
suppressMessages(library(boot))
storm.fm <- nls(Time ~ b*Viscosity/(Wt - c), stormer, start = c(b=29.401, c=2.2183))
st <- cbind(stormer, fit=fitted(storm.fm))
storm.bf <- function(rs, i) {
    st$Time <- st$fit + rs[i]
    tmp <- nls(Time ~ (b * Viscosity)/(Wt - c), st, start = coef(storm.fm))
    tmp$m$getAllPars()
}
rs <- scale(resid(storm.fm), scale = FALSE) # remove the mean
Rprof("boot.out")
storm.boot <- boot(rs, storm.bf, R = 99) # pretty slow
Rprof(NULL)
To profile it, I run the following script:
LD_PRELOAD="/usr/lib/libprofiler.so.0" \
CPUPROFILE=sample.log \
Rscript profilingSmall.R
Then I tried to parse the log file using
pprof /usr/bin/R sample.log
This returned the following error
Using local file /usr/bin/R.
Using local file sample.log.
substr outside of string at /usr/local/bin/pprof line 3618.
Use of uninitialized value in string eq at /usr/local/bin/pprof line 3618.
substr outside of string at /usr/local/bin/pprof line 3620.
Use of uninitialized value in string eq at /usr/local/bin/pprof line 3620.
sample.log: header size >= 2**16
sample.log is empty. However, a bunch of sample.log_digit were created that contain information that looks reasonable.
I had the same problem and eventually worked out the cause. I had done:
export CPUPROFILE=test.prof
export LD_PRELOAD="/usr/local/lib/libprofiler.so"
testprog ...
pprof --web `which testprog` test.prof
If I stopped after running testprog, the prof file wasn't empty, but after running pprof it was. pprof crashed with the substr error.
What I realized later was that, because I had set and exported LD_PRELOAD, libprofiler.so was also loaded for pprof itself, which overwrote test.prof.
You just need to ensure LD_PRELOAD is not set when you run pprof.
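As a rough sketch of that last point (the helper names below are made up for illustration and are not part of gperftools), one way to make sure pprof never inherits LD_PRELOAD is to launch it with a scrubbed copy of the environment:

```python
import os
import subprocess


def env_without_preload(env=None):
    """Return a copy of the environment with LD_PRELOAD removed (hypothetical helper)."""
    cleaned = dict(os.environ if env is None else env)
    cleaned.pop("LD_PRELOAD", None)  # no error if it was never set
    return cleaned


def run_pprof(binary, profile, mode="--text"):
    """Run pprof on an existing profile without libprofiler preloaded into pprof itself."""
    return subprocess.run(["pprof", mode, binary, profile], env=env_without_preload())
```

Calling run_pprof("testprog", "test.prof") from a shell that still has LD_PRELOAD exported should then leave test.prof untouched, assuming pprof is on the PATH. In an interactive shell, running unset LD_PRELOAD before pprof has the same effect.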
I'm using gperftools-2.5, and I also encountered the same problem:
[root@localhost ivrserver]# pprof --text ./IvrServer ivr.prof
Using local file ./IvrServer.
Using local file ivr.prof.
substr outside of string at /usr/local/bin/pprof line 3695.
Use of uninitialized value in string eq at /usr/local/bin/pprof line 3695.
substr outside of string at /usr/local/bin/pprof line 3697.
Use of uninitialized value in string eq at /usr/local/bin/pprof line 3697.
ivr.prof: header size >= 2**16
I found that this is because the prof file (ivr.prof in my example) is empty.
Every time the profiler starts and stops, it creates a new prof file, so you should use xxx.prof.0, xxx.prof.1, ... to get the right result.
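If it is unclear which of those files actually hold data, a small helper can list them. This is only a sketch: the function name is invented here, and it simply matches any file whose name starts with the base profile name, so it covers both the xxx.prof.0 and the sample.log_digit style of naming.

```python
import glob
import os


def nonempty_profiles(base):
    """List files named `base` or `base<suffix>` that contain data, sorted by name."""
    candidates = glob.glob(glob.escape(base) + "*")
    return sorted(
        path for path in candidates
        if os.path.isfile(path) and os.path.getsize(path) > 0
    )
```

Each path returned can then be passed to pprof in place of the empty base file, for example pprof --text ./IvrServer ivr.prof.0.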
Using Spark 1.1
I have 2 datasets. One is very large and the other was reduced (using some 1:100 filtering) to much smaller scale. I need to reduce the large dataset to the same scale, by joining only those items from the smaller list with their corresponding counterparts in the larger list (those lists contain elements that have a mutual join field).
I am doing that using the following code:
The "if(joinKeys != null)" part is the relevant part
Smaller list is "joinKeys", larger list is "keyedEvents"
private static JavaRDD<ObjectNode> createOutputType(JavaRDD jsonsList, final String type, String outputPath, JavaPairRDD<String, String> joinKeys) {
    outputPath = outputPath + "/" + type;
    JavaRDD events = jsonsList.filter(new TypeFilter(type));
    // This is in case we need to narrow the list to match some other list of ids... Recommendation List, for example... :)
    if(joinKeys != null) {
        JavaPairRDD<String, ObjectNode> keyedEvents = events.mapToPair(new KeyAdder("requestId"));
        JavaRDD<ObjectNode> joinedEvents = joinKeys.join(keyedEvents).values().map(new PairToSecond());
        events = joinedEvents;
    }
    JavaPairRDD<String, Iterable<ObjectNode>> groupedEvents = events.mapToPair(new KeyAdder("sliceKey")).groupByKey();
    // Convert JSONs to strings and add "\n" at the end of each
    JavaPairRDD<String, String> groupedStrings = groupedEvents.mapToPair(new JsonsToStrings());
    groupedStrings.saveAsHadoopFile(outputPath, String.class, String.class, KeyBasedMultipleTextOutputFormat.class);
    return events;
}
The problem is that when I run this job, I always get the same error:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2757 in stage 13.0 failed 4 times, most recent failure: Lost task 2757.3 in stage 13.0 (TID 47681, hadoop-w-175.c.taboola-qa-01.internal): java.io.FileNotFoundException: /hadoop/spark/tmp/spark-local-20141201184944-ba09/36/shuffle_6_2757_2762 (Too many open files)
java.io.FileOutputStream.open(Native Method)
java.io.FileOutputStream.<init>(FileOutputStream.java:221)
org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:123)
org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:192)
org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
scala.collection.Iterator$class.foreach(Iterator.scala:727)
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
I already increased my ulimits, by doing the following on all cluster machines:
echo "* soft nofile 900000" >> /etc/security/limits.conf
echo "root soft nofile 900000" >> /etc/security/limits.conf
echo "* hard nofile 990000" >> /etc/security/limits.conf
echo "root hard nofile 990000" >> /etc/security/limits.conf
echo "session required pam_limits.so" >> /etc/pam.d/common-session
echo "session required pam_limits.so" >> /etc/pam.d/common-session-noninteractive
But that doesn't fix my problem...
The bdutil framework works in such a way that the user "hadoop" is the one running the job. The script that deploys the cluster created a file, /etc/security/limits.d/hadoop.conf, that overrode the ulimit settings for the "hadoop" user, which I wasn't aware of. By deleting this file, or alternatively setting the desired ulimits there, I was able to resolve the problem.
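To check which limit actually applies, it can help to ask from inside a process started as the job's user instead of trusting the edited config files. A minimal sketch, assuming a Linux host; both function names are invented for illustration, and the parser only understands the plain "domain type item value" lines used in limits.conf-style files:

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def nofile_entries(text, user):
    """Extract nofile rules that target `user` or the '*' wildcard from limits.conf-style text."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        fields = line.split()
        if len(fields) != 4:
            continue
        domain, limit_type, item, value = fields
        if item == "nofile" and domain in (user, "*"):
            entries.append((domain, limit_type, value))
    return entries
```

Running open_file_limits() as the "hadoop" user shows what the job will really get, and feeding the contents of /etc/security/limits.conf and of each file under /etc/security/limits.d/ to nofile_entries(text, "hadoop") shows which rules compete for that user.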
I'm a newbie trying to learn PIC. I downloaded MPLAB and MPLAB X IDE. I have tried this around 100 times and searched the web extensively before asking this question, but my code does not compile; the build always fails.
Here is what I did:
Created a new project using the project wizard,
Edited the code,
Copied the 16F871.H library header into both folders (the ones I created the project in) and added it to the header files in MPLAB IDE.
Here's my code:
// IFIN.C Tests an input
#include " 16F877A.h "
void main()
{
int x; // Declare variable
output_D(0); // Clear all outputs
while(1) //
{
x = input(PIN_C0); // Get input state
if(x = = 1)
output_high(PIN_D0); // Change output
}
}
But on compiling the code, I'm getting the following error:
Executing:
"C:\Program Files\PICC\Ccsc.exe" +FM "NEW.c" #__DEBUG=1 +ICD +DF +LN
+T +A +M +Z +Y=9 +EA #__16F877A=TRUE
*** Error 18 "NEW.c" Line 2(10,23): File can not be opened
Not in project "C:\Users\jatin\Desktop\DHAKKAN PIC\ 16F877A.h "
Not in "C:\Program Files\PICC\devices\ 16F877A.h "
Not in "C:\Program Files\PICC\drivers\ 16F877A.h "
*** Error 128 "NEW.c" Line 2(10,17): A #DEVICE required before this line
*** Error 12 "NEW.c" Line 6(9,10): Undefined identifier -- output_D
*** Error 12 "NEW.c" Line 9(10,11): Undefined identifier -- input
*** Error 51 "NEW.c" Line 10(8,9): A numeric expression must appear here
5 Errors, 0 Warnings. Build Failed. Halting build on first failure as requested.
BUILD FAILED: Mon Jul 08 15:09:17 2013
I would be grateful if you could help me.
The header file is not found because you have extra spaces in the header name. In other words, this:
#include " 16F877A.h "
should be:
#include "16F877A.h"
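Most of the other errors are probably a result of this and should go away once the header is properly included. The exception is Error 51 on line 10, which points at if(x = = 1): the equality operator must be written == with no space between the two characters.
Note that the compiler takes the string inside "" or <> literally as the file name of the header and doesn't trim whitespace for you.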
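If you want to catch this kind of mistake across several source files, a quick check along the following lines may help. It is a sketch only: the function name is made up, and the regular expression handles simple one-line #include directives, not every form the preprocessor accepts.

```python
import re

# Matches: #include "name" or #include <name>, capturing the raw name.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*(["<])(.*?)([">])')


def padded_includes(source):
    """Return (line number, raw name, trimmed name) for includes with stray whitespace."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = INCLUDE_RE.match(line)
        if match:
            raw = match.group(2)
            if raw != raw.strip():
                hits.append((lineno, raw, raw.strip()))
    return hits
```

For the code in the question, this reports line 2 with the raw name " 16F877A.h " and the trimmed name "16F877A.h", which is the name the compiler should have been given.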