Julia shell mode in a .jl file

In REPL mode, Julia lets you type a semicolon and run shell commands, e.g.
;
cd ~
And then press backspace to return to the Julia REPL.
Is there a way to do something similar in a .jl file? The closest I found was run(…), and that has many caveats. This is a Linux environment, so I'm not concerned about the caveats of shell mode on Windows machines.
The broader topic of interest is doing this for other REPL modes, like the R mode provided by RCall.

As you mentioned, the default way to do this is via the run command. If you haven't already, check out the docs on this at https://docs.julialang.org/en/v1/manual/running-external-programs/#Running-External-Programs, which go into some of the caveats.
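For instance, a minimal sketch of the shell-mode session above expressed with run (note that a directory change has to happen in-process, since a cd run in a child shell would not persist):
run(`ls -la`)                    # backtick commands run directly, with no shell in between
cd(expanduser("~"))              # in-process equivalent of `;cd ~`; run(`cd ~`) would change nothing
run(`bash -c "ls *.jl | wc -l"`) # spawn a shell explicitly when you need pipes or globbing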
I am not sure I follow what you are getting at with RCall, but it may be worth opening a separate question for that.

You can find the code for this at https://github.com/JuliaLang/julia/tree/master/stdlib/REPL/test.
Seems there is no public API for this, just lots of typing.
Here is a minimal working example (the code is mostly copied from different places in the folder above):
using REPL

# A minimal terminal implementation, adapted from the REPL test suite
mutable struct FakeTerminal <: REPL.Terminals.UnixTerminal
    in_stream::Base.IO
    out_stream::Base.IO
    err_stream::Base.IO
    hascolor::Bool
    raw::Bool
    FakeTerminal(stdin, stdout, stderr, hascolor=true) =
        new(stdin, stdout, stderr, hascolor, false)
end
REPL.Terminals.hascolor(t::FakeTerminal) = t.hascolor
REPL.Terminals.raw!(t::FakeTerminal, raw::Bool) = t.raw = raw
REPL.Terminals.size(t::FakeTerminal) = (24, 80)

# Pipes that stand in for the terminal's stdin/stdout/stderr
input = Pipe()
output = Pipe()
err = Pipe()
Base.link_pipe!(input, reader_supports_async=true, writer_supports_async=true)
Base.link_pipe!(output, reader_supports_async=true, writer_supports_async=true)
Base.link_pipe!(err, reader_supports_async=true, writer_supports_async=true)

repl = REPL.LineEditREPL(FakeTerminal(input.out, output.in, err.in, false), false)
repltask = @async REPL.run_repl(repl)
Now you can do:
julia> println(input,";ls -la *.jld2")
-rw-r--r-- 1 pszufe 197121 5506 Jul 5 2020 file.jld2
-rw-r--r-- 1 pszufe 197121 5506 Jul 5 2020 myfile.jld2
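Since the fake terminal's stdout is the output pipe, you can also read back what the embedded REPL printed, e.g. (a sketch; readavailable blocks until some data is available):
println(input, ";ls -la *.jld2")     # send a shell-mode line to the embedded REPL
sleep(1)                             # give the REPL task a moment to execute it
print(String(readavailable(output))) # drain and print whatever has been written so far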

Related

system2("bash", "ls") -> cannot execute binary file

Anyone got an idea why
system2("bash", "ls")
would result in (both in the R GUI and the RStudio console)
/usr/bin/ls: /usr/bin/ls: cannot execute binary file
while other shells work without issues (see example below) and bash in the RStudio terminal also works without problems? And is there a remedy for that?
Working system2 example:
system2("powershell", "dir")
Running R 3.6.3 on Windows 10 here, with Git Bash executing as "bash".
PS: just in case someone wonders, I also tried the full path, i.e. /usr/bin/ls, with the same result.
Ok, this does the trick:
system2("bash", input = "ls")
So instead of args = ... as with the powershell command, one needs (or, more interestingly, sometimes 'can') use the input = ... parameter with bash. The reason: given a first positional argument, bash searches the PATH for a script file of that name and tries to execute it as a script, so it finds the ls binary and fails with "cannot execute binary file"; input = ... feeds the command on bash's stdin instead.
Or, as mentioned in the comments by @oguzismail, this will also work:
system2("bash", "-c ls")
as will, as pointed out by @Christoph,
system2("ls")
which works sometimes (i.e. in some RStudio[?] setups it will not return results, in some it will).
But I have come to notice that different combinations of R versions and RStudio versions [or, more probably, locales] will not always behave consistently, of course also depending on whether bash commands like ls (more generally, /usr/bin) are on the PATH or not.
So choose what suits you best!
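To recap, each working variant avoids handing ls to bash as a script file (same Git Bash setup assumed):
system2("bash", input = "ls") # "ls" is fed to bash on stdin
system2("bash", "-c ls")      # -c makes bash treat "ls" as a command string, not a file
system2("ls")                 # bypasses bash entirely; works when ls is on the PATH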

Julia on Windows. How to pass command line options to an executable file

I wish to call an executable file from Julia via Base.run and pass command line options to that executable, but I can't figure out how to do that. In my specific example the executable is Notepad++ and the command line options are
-alwaysOnTop -nosession
This example code works, but doesn't pass the command line options:
function open_file_in_notepadpp()
    exepath = "C:/Program Files (x86)/notepad++/notepad++.exe" # Default location on 64 bit Windows
    command_line_options = "-alwaysOnTop -nosession "
    filetoopen = "c:/temp/foo.txt"
    Base.run(`$exepath $filetoopen`, wait = false)
end
I've tried incorporating command_line_options a fair number of ways using backticks, double quotes etc., to no avail; for example, neither of the lines below works:
Base.run(`$exepath $filetoopen`, `$command_line_options`,wait = false)
Base.run(`$exepath $command_line_options $filetoopen`,wait = false)
In the Windows Command Prompt the following works correctly:
"C:/Program Files (x86)/notepad++/notepad++.exe" -alwaysOnTop -nosession "c:/temp/foo.txt"
Could someone explain what I'm missing?
If you interpolate a string that contains spaces into a command, it gets quoted as a single argument. Hence your command line options are quoted together and you get
julia> `$exepath $filetoopen $command_line_options`
`'C:/Program Files (x86)/notepad++/notepad++.exe' c:/temp/foo.txt '-alwaysOnTop -nosession '`
I guess what you really need is
julia> command_line_options = ["-alwaysOnTop", "-nosession"]
2-element Array{String,1}:
"-alwaysOnTop"
"-nosession"
julia> `$exepath $filetoopen $command_line_options`
`'C:/Program Files (x86)/notepad++/notepad++.exe' c:/temp/foo.txt -alwaysOnTop -nosession`
Running the latter with run should work. Unfortunately I can't test it on my machine.
crstnbr's answer was correct, but he was unable to test on his machine. Here is the corrected code:
function open_file_in_notepadpp()
    exepath = "C:/Program Files (x86)/notepad++/notepad++.exe" # Location if one follows the defaults in the notepad++ installer on 64 bit Windows
    command_line_options = ["-alwaysOnTop", "-nosession"] # Use an array to prevent the options being quoted as a single argument
    filetoopen = "c:/temp/foo.txt"
    Base.run(`$exepath $filetoopen $command_line_options`, wait = false)
end
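An equivalent sketch that sidesteps interpolation quoting entirely is to build the whole command as a vector of strings, one element per argument (same paths as above):
function open_file_in_notepadpp()
    exepath = "C:/Program Files (x86)/notepad++/notepad++.exe"
    filetoopen = "c:/temp/foo.txt"
    # Each element of the vector becomes exactly one argument; nothing is split or re-quoted
    run(Cmd([exepath, "-alwaysOnTop", "-nosession", filetoopen]), wait = false)
end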

How to run multiple julia files in parallel?

I have a pretty simple question but I can't seem to find an answer anywhere. I want to run two .jl files parallel using the Julia Terminal.
I tried include("file1.jl" & "file2.jl") and include("file1.jl") & include("file2.jl"), but neither works.
I'm not sure exactly what you want to do, but if you wanted to run these two files on two different workers from the julia terminal you could do e.g.
using Distributed # where addprocs/pmap live on Julia >= 0.7

addprocs(2)                             # add two workers, one per file
pmap(include, ["file1.jl", "file2.jl"]) # apply include to each element
                                        # of the array in parallel
But I'm pretty sure there will be a better way of doing whatever you want to accomplish.
While you can probably wrangle your code into Julia's parallel computing paradigms, it seems like the simplest solution is to execute your Julia scripts from the command line. Here I assume that you are comfortable letting the operating system handle the task scheduling, which may or may not result in parallel execution.
What follows below is a skeleton pipeline to get you started. Replace task.jl with your file1.jl, file2.jl, etc.
task.jl
println("running like a cheetah")
run_script.sh
#!/bin/bash
echo `date`
julia task.jl
julia task.jl
echo `date`
run_script_parallel.sh
#!/bin/bash
echo `date`
julia task.jl &
julia task.jl &
wait # do not return before background tasks are complete
echo `date`
From the command line, ensure that your Bash scripts are executable:
chmod +rwx run_script.sh run_script_parallel.sh
Try running the scripts now. Note that my example Julia script task.jl returns almost immediately, so this particular comparison is a little silly:
./run_script.sh
./run_script_parallel.sh
My output
Thu Jan 5 14:24:57 PST 2017
running like a cheetah
running like a cheetah
Thu Jan 5 14:24:57 PST 2017
Thu Jan 5 14:25:05 PST 2017
running like a cheetahrunning like a cheetah
Thu Jan 5 14:25:06 PST 2017
The first run orders the print statements in a clean serial order. But observe in the second case that the text runs together; that is common behavior for parallel print statements.
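If you would rather stay inside a single Julia session than write a shell script, the same fork-and-wait pattern can be sketched with asynchronous run (using the file names from the question):
p1 = run(`julia file1.jl`, wait = false) # start both scripts as background processes
p2 = run(`julia file2.jl`, wait = false)
wait(p1); wait(p2)                       # block until both have finished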

Opening a new instance of R and sourcing a script within that instance

Background/Motivation:
I am running a bioinformatics pipeline that, if executed linearly from beginning to end, takes several days to finish. Fortunately, some of the tasks don't depend upon each other, so they can be performed individually. For example, Tasks 2, 3, and 4 all depend upon the output from Task 1, but do not need information from each other. Task 5 uses the output of 2, 3, and 4 as input.
I'm trying to write a script that will open new instances of R for each of the three tasks and run them simultaneously. Once all three are complete I can continue with the remaining pipeline.
What I've done in the past, for more linear workflows, is have one "master" script that sources (source()) each task's subscript in turn.
I've scoured SO and google and haven't been able to find a solution for this particular problem. Hopefully you guys can help.
From within R, you can use system() to invoke commands in a terminal and open (on macOS) to open a file. For example, the following will open a new Terminal instance:
system("open -a Terminal .",wait=FALSE)
Similarly, I can start a new R session by using
system("open -a r .")
What I can't figure out for the life of me is how to set the input argument so that it sources one of my scripts. For example, I would expect the following to open a new Terminal instance, call R within the new instance, and then source the script:
system("open -a Terminal .",wait=FALSE,input=paste0("r; source(\"/path/to/script/M_01-A.R\",verbose=TRUE,max.deparse.length=Inf)"))
Answering my own question in the event someone else is interested down the road.
After a couple of days of working on this, I think the best way to carry out this workflow is not to limit myself to working just in R. Writing a bash script offers more flexibility and is probably a more direct solution. The following example was suggested to me on another website.
#!/bin/bash
# Run task 1
Rscript Task1.R
# now run the three jobs that use Task1's output
# we can fork these using '&' to run in the background in parallel
Rscript Task2.R &
Rscript Task3.R &
Rscript Task4.R &
# wait until background processes have finished
wait %1 %2 %3
Rscript Task5.R
You might be interested in the future package (I'm the author). It allows you to write your code as:
library("future")
v1 %<-% task1(args_1)
v2 %<-% task2(v1, args_2)
v3 %<-% task3(v1, args_3)
v4 %<-% task4(v1, args_4)
v5 %<-% task5(v2, v3, v4, args_5)
Each of those v %<-% expr statements creates a future based on the R expression expr (and all of its dependencies) and assigns it to a promise v. Only when v is used does it block and wait for the value of v to become available.
How and where these futures are resolved is decided by the user of the above code. For instance, by specifying:
library("future")
plan(multiprocess)
at the top, then the futures (= the different tasks) are resolved in parallel on your local machine. If you use,
plan(cluster, workers = c("n1", "n3", "n3", "n5"))
they're resolved on those machines (where n3 accepts two concurrent jobs, since it is listed twice).
This works on all operating systems (including Windows).
If you have access to an HPC cluster with a scheduler such as Slurm, SGE, or TORQUE / PBS, you can use the future.BatchJobs package, e.g.
plan(future.BatchJobs::batchjobs_torque)
PS. One reason for creating future was to do large-scale bioinformatics in parallel / distributed fashion.

Run R script in parallel sessions in the background

I have a script test.R that takes arguments arg1 and arg2 and outputs an arg1-arg2.csv file.
I would like to run test.R in 6 parallel sessions (I am on a 6-core CPU) and in the background. How can I do it?
I am on Linux.
I suggest using the doParallel backend for the foreach package. The foreach package provides a nice syntax to write loops and takes care of combining the results. doParallel connects it to the parallel package included since R 2.14. On other setups (older versions of R, clusters, whatever) you could simply change the backend without touching any of your foreach loops. The foreach package in particular has excellent documentation, so it is really easy to use.
If you are going to write the results to individual files, then the result-combining features of foreach won't be of much use to you. So people might argue that direct use of parallel would be better suited to your application. Personally I find the way foreach expresses looping concepts much easier to use.
You did not provide a reproducible example, so I am making one up. As you are on Linux, I am also switching to littler, which was after all written for the very purpose of scripting with R.
#!/usr/bin/env r
#
# a simple example: write a small csv file named after the two arguments
if (is.null(argv) | length(argv) != 2) {
    cat("Usage: myscript.r arg1 arg2\n")
    q()
}
filename <- sprintf("%s-%s.csv", argv[1], argv[2])
Sys.sleep(60) # do some real work here instead
write.csv(matrix(rnorm(9), 3, 3), file=filename)
You can then launch this either from the command line, as I do here, or from another (shell) script. The key is the & at the end, which sends the job to the background:
edd@max:/tmp/tempdir$ ../myscript.r a b &
[1] 19575
edd@max:/tmp/tempdir$ ../myscript.r c d &
[2] 19590
edd@max:/tmp/tempdir$ ../myscript.r e f &
[3] 19607
edd@max:/tmp/tempdir$
The [n] indicates which job has been launched in the background; the number that follows is the process id, which you can use to monitor or kill the process. After a little while we get the results:
edd@max:/tmp/tempdir$
[1] Done ../myscript.r a b
[2]- Done ../myscript.r c d
[3]+ Done ../myscript.r e f
edd@max:/tmp/tempdir$ ls -ltr
total 12
-rw-rw-r-- 1 edd edd 192 Jun 24 09:39 a-b.csv
-rw-rw-r-- 1 edd edd 193 Jun 24 09:40 c-d.csv
-rw-rw-r-- 1 edd edd 193 Jun 24 09:40 e-f.csv
edd@max:/tmp/tempdir$
You may want to read up on Unix shells to learn more about &, fg, bg, and background jobs in general.
Lastly, all this can a) also be done with Rscript, though argument handling is slightly different, and b) there are the CRAN packages getopt and optparse to facilitate working with command-line arguments.
The state of the art would be to use the parallel package, but when I am lazy I simply start 6 batch files (cmd, assuming Windows) that call Rscript.
You can set parameters in the cmd file
SET ARG1="myfile"
Rscript test.R
and read them back inside the script from
Sys.getenv("ARG1")
Using 6 batch files, I can also chain multiple runs in one batch to be sure that the cores are always busy.
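For reference, the parallel-package route mentioned at the start could look roughly like this on Linux (a sketch; run_one is a hypothetical function wrapping the work test.R does for one argument pair):
library(parallel)
arg_pairs <- list(c("a", "b"), c("c", "d"), c("e", "f"))
# run_one is hypothetical: whatever test.R does for a single arg1/arg2 pair
# mclapply forks up to mc.cores processes at a time (fork-based, so not on Windows)
mclapply(arg_pairs, function(a) run_one(a[1], a[2]), mc.cores = 6)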
