real-time printing to console with R in jupyter - r

Using an R GUI or just R from a command line, this code results in integers being printed 0.2 seconds apart.
In contrast when I use R in a jupyter notebook, all of the printing happens only after the loop is complete.
for(x in 1:10){
print(x)
Sys.sleep(0.2)
}
I tried to force real-time printing inside of Jupyter with
for(x in 1:10){
print(x)
flush.console()
Sys.sleep(0.2)
}
...to no effect. The results were the same -- printing from within a for loop in jupyter always seems to be delayed until after the loop.
Is there a way to ensure the notebook outputs the results of print statements in a real time way?

Currently, the only way to "trigger" processing of printed output is either using message("text") (instead of print("text") or cat("text")) or not writing it into the loop but in a statement of it's own.
The underlying problem is in https://github.com/IRkernel/IRkernel/issues/3 and a proposed fix is in https://github.com/hadley/evaluate/pull/62 -> It needs a change in evaluate to allow flush.console() and friends to work. The gist of the problem: we use evaluate to execute the code and evaluate process the output one statement at a time and handles the output after the statement completes. Unfortunately, in this case, a for-loop is just one statement (as is everything in {...} blocks), so printed output only appears in the client after the for-loop is done.
A workaround is using the IRdisplay package and the display_...() functions instead of print()/cat() (or plots...). But this needs full control over the printed stuff: It's either everything is using print (and it gets delayed until after the complete statement is finished) or nothing should print (or plot) in that statement. If a called functions prints something, the output would be in the wrong order ({print("a"); display_text("b"); print("c")} would end up as b a c). Using capture.output() might get you around this limitation, if you really have to... If you use plots, there are currently no workarounds apart from writing the plot to disc and sending it via display_png(..) and friends.

This is not an issue any more. The following code works in jupterLab now
for(x in 1:10){
print(x)
flush.console()
Sys.sleep(0.2)
}

R version of the answer for Python Flush output in for loop in Jupyter notebook works for me:
cat(paste0('Your text', '\r'))
Apparently \r will trigger a flush.

Related

Restart R session without interrupting the for loop

In my for loop, I need to remove RAM. So I delete some objects with rm() command. Then, I do gc() but the RAM still the same
So I use .rs.restartR() instead of gc() and it works: a sufficient part of my RAM is removed after the restart of the R session.
My problem is the for loop which is interrupted after the R restart. Do you have an idea to automatically go on the for loop after the .rs.restartR() command ?
I just stumbled across this post as I'm having a similar problem with rm() not clearing memory as expected. Like you, if I kill the script, remove everything using rm(list=ls(all.names=TRUE)) and restart, the script takes longer than it did initially. However, restarting the session using .rs.restartR() then sourcing again works as expected. As you say, there is no way to 'refresh' the session while inside a loop.
My solution was to write a simple bash script which calls my .r file.
Say you have a loop in R that runs from 1 to 3 and you want to restart the session after each iteration. My bash script 'runR.sh' could read as follows:
#!/bin/bash
for i in {1..3}
do
echo "Rscript myRcode.r $i" #check call to r script is as expected
Rscript myRcode.r $i
done
Then at the top of 'myRcode.r':
args <- commandArgs()
print(args) #list the command line arguments.
myvar <- as.numeric(args[6])
and remove your for (myvar in...){}, keeping just the contents of the loop.
You'll see from print(args) that your input from your shell script is the 6th element of the array, hence args[6] in the following line when assigning your variable. If you're passing in a string, e.g. a file name, then you don't need as.numeric of course.
Running ./runR.sh will then call your script and hopefully solve your memory problem. The only minor issue is that you have to reload your packages each time, unlike when using .rs.restartR(), and may have to repeat other bits that ordinarily would only be run once.
It works in my case, I would be interested to hear from other more seasoned R/bash users whether there are any issues with this solution...
By saving the iteration as an external file, and writing an rscript which calls itself, the session can be restarted within a for loop from within rstudio. This example requires the following steps.
#Save an the iteration as a separate .RData file in the working directory.
iter <- 1
save(iter, file="iter.RData")
Create a script which calls itself for a certain number of iterations. Save the following script as "test_script.R"
###load iteration
library(rstudioapi)
load("iter.RData")
###insert function here.
time_now <- Sys.time()
###save output of function to a file.
save(time_now, file=paste0("time_", iter, ".Rdata"))
###update iteration
iter <- iter+1
save(iter, file="iter.RData")
###restart session calling the script again
if(iter < 5){
restartSession(command='source("test_script.R")')
}
Do you have an idea to automatically go on the for loop after the .rs.restartR() command ?
It is not possible.
Okay, you could configure your R system to do something like this, but it sounds like a bad idea. I'm not really sure if you want to restart the for loop from the beginning or pick it up where left off. (I'm also very confused that you seem to have been able to enter commands in the R console while a for loop was executing. I think there's more than you are not telling us.)
You can use your rprofile.site file to automatically run commands when R starts. You could set it up to automatically run your for loop code whenever R starts. But this seems like a bad idea. I think you should find a different sort of fix for your problem.
Some of the things you could do to help the situation: have your for loop write output for each iteration to disk and also write some sort of log to disk so you know where you left off. Maybe write a function around your for loop that takes an argument of where to start, so that you can "jump in" at any point.
With this approach, rather than "restarting R and automatically picking up the loop", a better bet would be to use Rscript (or similar) and use R or the command line to sequentially run each iteration (or batch of iterations) in its own R session.
The best fix would be to solve the memory issue without restarting. There are several questions on SO about memory management - try the answers out and if they don't work, make a reproducible example and ask a new question.
You can make your script recursive by sourcing itself after restarting session.
Make sure the script will take into account the initial status of the loop. So you might have to save the current status of the loop in a .rds file before restarting session. Then call the .rds file from inside the loop after restarting session. This would help you start the loop where it was before restarting r session.
I just found out about this command 'restartSession'. I'm using it because I was also running into memory consumption issues as the garbage collector will not give back the RAM to the OS (Linux).
library(rstudioapi)
restartSession(command = "print('x')")
An approach independent from Rstudio:
If you want to run this in Rstudio, don't use R console, but terminal, otherwise use rstudioapi::restartSession() as in other answers - not recommended (crashes) -.
create iterator and load script (in system terminal would be:)
R -e 'saveRDS(1,"i.rds"); source("Script.R")'
Script.R file:
# read files and iterator
i<-readRDS("i.rds")
print(i)
# open process id of previous loop to kill it
tryCatch(pid <- readRDS(file="pid.rds"), error=function(e){NA} )
if (exists("pid")){
library(tools)
tools::pskill(pid, SIGKILL)
}
# update objects and iterator
i <- i+1
# process
pid <- Sys.getpid()
# save files and iterator
saveRDS(i, file="i.rds")
# process ID to close it in next loop
saveRDS(pid, file="pid.rds")
### restart session calling the script again
if(i <= 20 ) {
print(paste("Processing of", i-1,"ended, restarting") )
assign('.Last', function() {system('Rscript Script.R')} )
q(save = 'no')
}

Analysis by subset does not work [duplicate]

I tried using a for loop to print out a few rows. here is the code.
Weird thing is that it doesn't work for head() function. It works if I replaced head() with print().
kw_id=c('a','b')
keyword_text=data.frame(col=c('a','b'), col2=c(1,2), row.names=(c('r1','r2')))
for (i in 1:2) {
plot_data<-subset(keyword_text,col==kw_id[i])
print(plot_data)
head(plot_data)
}
Could someone help? I suspect it has something to do with head() function.
This is a relatively common class of problem that newcomers to R run into. The issue here is that R serves two mistresses: interactive console work and "true programming".
When you type a command at the console that returns a value, the console automatically calls a print method in order to display the results. When running a script, this doesn't happen unless you tell it to.
So if you changed it to print(head(plot_data)) it should work.
These are discussed in FAQ 7.16 and 7.22
Addendum lifted from the comments:
As Josh points out, copy+pasting the for loop directly to the console also fails to print any output. What's going on in that case is that for loops (like most everything in R) is actually a function, and it's return value (NULL) is returned invisibly, which means no printing. (This is mentioned in ?Control.)

Make an R script write lines to the console faster (Rgui)

I've written quite a large R script (1000+ lines). Currently there is a rm(list=ls()) statement at the top of the script, as I need to test how it runs cleanly.
I run the code by ctrl + A, ctrl + R
The problem is is that this seems to take a long time in the Rgui to write each line to the console before running it. I feel R should be able to write to the the console faster than this and was wondering if there is a faster way to run a script.
(ie hide the lines written to the console and just run the script)
Best way is to simply source the document. If you have it saved then just type source("FullPathOfmyfile.R") and it will run without printing the commands, it will only print the output and print statements. Alternatively you can set echo = FALSE.

Printing from mclapply in R Studio

I am using mclapply from within RStudio and would like to have an output to the console from each process but this seems to be suppressed somehow (as mentioned for example here: Is mclapply guaranteed to return its results in order?).
How could I get R Studio to print something like
x <- mclapply(1:20, function(i) cat(i, "\n"))
to the console?
I've tried print(), cat(), write() but they all seem not to work. I also tried to set mc.silent = FALSE explicitly without an effect.
Parallel processing with GUI's is problematic. I write a lot of parallel code and it's constantly crashing my colleague's computer because he insists on using Rstudio instead of console R.
From what I read, RStudio "does not propagate the output of forked processes to the RStudio console. If you are doing this, it is best to start R via a shell."
This makes sense as a workaround for the RStudio people because parallel processing typically breaks GUI's when people try to output to the GUI from a bunch of different processes. It works in the console (albeit often not in order) but parallel processing gurus will pinch their noses when they hear about any I/O from a forked thread.
If you must have output from forked threads, save them in a string and return it. Then collect and output from the main process. Or just use a console for your parallel runs. What I tell my colleague is to do all his debugging and development in RStudio using lapply(), then switch to a console for the real run.
Here's a workaround which uses shell echo to print to R console in Rstudio:
#' Function which prints a message using shell echo; useful for printing messages from inside mclapply when running in Rstudio
message_parallel <- function(...){
system(sprintf('echo "\n%s\n"', paste0(..., collapse="")))
}
Just expanding a little on the solution used by the asker, i.e. writing to a file to check progress:
write.file = '/temp_output/R_progress'
time1 = proc.time()[3]
outstuff = unlist(mclapply(1:1000000, function(i){
if (i %% 1000 == 0 ){
file.create(write.file)
fileConn<-file(write.file)
writeLines(paste0(i,'/',nrow(loc),' ',(i/nrow(loc)*100)), fileConn)
close(fileConn)
}
#do your stuff here
}, mc.cores=6))
print(proc.time()[3] - time1)
And then you can monitor from a console with
tail -c +0 -f '/temp_output/R_progress'

Switch R script from non-interactive to interactive

I've an R script, that takes commandline arguments, where the top line is:
#!/usr/bin/Rscript --slave
I wanted to interrupt execution in a function (so I can interactively use the data variables that have been loaded by that point to work out the next bit of code I need to write). I added this inside the function in question:
browser()
but it gets ignored. A bit of searching suggests it might be because the program is running in non-interactive mode. But even more searching has not tracked down how I switch the script out non-interactive mode so that browser() will work. Something like a browser_yes_I_really_mean_it() function.
P.S. I want to avoid altering the rest of the script if at all possible. My current approach is to copy and paste the code chunks, needed to prepare the data, into an interactive session; but as the script gets more and more complex this is getting more and more unreasonable.
UPDATE: for anyone else with the same question, it appears the answer to the actual question is that it is impossible. Once you start R in a non-interactive mode the die is cast. The given answers are therefore workarounds: either you hack your code (remembering to unhack it afterwards), or you refactor to make debugging easier. (This comment is not intended as a criticism of the answers; the suggested refactoring makes the code cleaner anyway.)
Can you just fire up R and source the file instead?
R
source("script.R")
Following mdsumner's answer, I edited my script like this:
if(!exists("argv")){
argv=commandArgs(TRUE)
if(length(argv)!=4)usage_and_exit()
}else{
if(length(argv)!=4){
stop("Must set argv as a 4 element vector. E.g. argv=c(...)")
}
}
Then no other change was needed, and I was able to do:
R
> argv=c('a','b','c','d')
> source("script.R")
In addition to the previous answer, I'd create a toplevel function (e.g. doStuff) which performs the analysis you want to perform in batch. The function takes the cmd line options as input. In the batch script you source the script that contains this function and call it. In this way you can easily run the function in interactive mode and use e.g. browser().
In some cases, the suggested solution (workaround) may not work - for example, when the R code needs to be run as a part of an existing bash script. For those cases, I suggest to write in your R script into the bash script using here document:
#!/bin/bash
R --interactive << EOT
# R code starts here
argv=c('a','b','c','d')
print(interactive())
# Rest of script contents
quit("no")
# R code ends here
EOT
This way, print(interactive()) above will yield TRUE.
Sidenote: Make sure to avoid the $ character in your R code, as this would not be processed correctly - for example, retrieve a column from a data.frame() by using df[["X1"]] instead of df$X1.

Resources