How to append to an existing file in R without overwriting it?

I would like to write to a file and then append to it several times in a loop (on a Windows machine). After each append I want to close the connection, because I want the file to sync to a Dropbox account so that I can open it on other computers while the code is running and check the log file's status (this requirement makes the question different from the existing SO questions about sink, writeLines, write, cat, etc.). I've tried
#set up writing
logFile = file("log_file.txt")
write("This is a log file for ... ", file=logFile, append=FALSE)
for(i in 1:10){
  write(i, file=logFile, append=TRUE)
}
I've also tried sink(file=logFile, append=TRUE); print(i); sink() in the loop, and also cat. Neither works: the file only displays i=10, the last iteration of the loop. I noticed the following sentence in the documentation for write:
"if TRUE the data x are appended to the connection."
Does the above mean that it won't append to an existing file?

The following works with cat because it is given a file name rather than a connection; append= is only honored when file is a file name:
#set up writing
logFile = "log_file.txt"
cat("This is a log file for ... ", file=logFile, append=FALSE, sep = "\n")
for(i in 1:10){
  cat(i, file=logFile, append=TRUE, sep = "\n")
}
The output looks like the following, so it does append each value:
This is a log file for ...
1
2
3
4
5
6
7
8
9
10
Which I think is what you want. If you are on a Mac or using Linux, you can keep track of progress in the file using:
tail -f log_file.txt
I am not sure how this would work with Dropbox, however. Can you log in to the computer running the code (e.g., on Mac or Linux)?
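On Windows itself, PowerShell can follow the file in much the same way (a shell one-liner, not R):
Get-Content log_file.txt -Wait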

What about explicitly closing the file after every iteration?
#set up writing
file.text <- "log_file.txt"
logFile = file(file.text)
write("This is a log file for ... ", file=logFile, append=FALSE)
close(logFile)
for(i in 1:10){
  logFile <- file(file.text)
  write(i, file=logFile, append=TRUE)
  close(logFile)
}
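Note that even this may overwrite on each pass, because append= is ignored when file is a connection. Opening the connection explicitly in append mode avoids that; a minimal sketch:
#set up writing: "w" truncates, so the log starts fresh
logFile <- file("log_file.txt", open = "w")
writeLines("This is a log file for ... ", logFile)
close(logFile)
for (i in 1:10) {
  logFile <- file("log_file.txt", open = "a")  # "a" reopens in append mode
  writeLines(as.character(i), logFile)
  close(logFile)  # closing flushes the write, so the file can sync
}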


How can I pass the names of a list of files from bash to an R program?

I have a long list of files with names like: file-typeX-sectorY.tsv, where X and Y get values from 0-100. I process each of those files with an R program, but read them one by one like this:
data <- read.table(file='my_info.tsv', sep = '\t', header = TRUE, fill = TRUE)
This is impractical. I want to build a bash script that does something like
#!/bin/bash
for i in {0..100..1}
do
for j in {1..100..1)
do
Rscript program.R < file-type$i-sector$j.tsv
done
done
My problem is not with the bash script but with the R program. How can I receive the files one by one? I have googled and tried instructions like:
args <- commandArgs(TRUE)
or
data <- commandArgs(trailingOnly = TRUE)
but I can't find the right way. Could you please help me?
At the simplest level, your problem may be the (possibly accidental?) redirect you have -- so remove the <.
Then a minimal R 'program' to take a command-line argument and do something with it would be
#!/usr/bin/env Rscript
args <- commandArgs(trailingOnly = TRUE)
stopifnot("require at least one arg" = length(args) > 0)
cat("We were called with '", args[1], "'\n", sep="")
We use a 'shebang' line and chmod 0755 basicScript.R to make it runnable. Then your shell double loop, reduced here (and correcting one typo), becomes
#!/bin/bash
for i in {0..2..1}; do
for j in {1..2..1}; do
./basicScript.R file-type${i}-sector${j}.tsv
done
done
and this works as we hope with the inner program reflecting the argument:
$ ./basicCaller.sh
We were called with 'file-type0-sector1.tsv'
We were called with 'file-type0-sector2.tsv'
We were called with 'file-type1-sector1.tsv'
We were called with 'file-type1-sector2.tsv'
We were called with 'file-type2-sector1.tsv'
We were called with 'file-type2-sector2.tsv'
$
Of course, this is horribly inefficient as you have N x M external processes. The two outer loops could be written in R, and instead of calling the script you would call your script-turned-function.
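For illustration, a rough sketch of that all-in-R variant (the read.table() call is taken from the question; the processing step is a placeholder):
# Loop over the files in R itself instead of launching N x M processes
for (i in 0:100) {
  for (j in 1:100) {
    path <- sprintf("file-type%d-sector%d.tsv", i, j)
    if (!file.exists(path)) next  # skip missing combinations
    data <- read.table(path, sep = "\t", header = TRUE, fill = TRUE)
    # ... process 'data' here ...
  }
}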

Ask for user multiple-line input during R-script execution [duplicate]

I am trying to use
var <- as.numeric(readline(prompt="Enter a number: "))
and later use this in a calculation.
It works fine when run in RStudio, but I need to be able to pass this input from the command line in Windows 10.
I am using a batch file with a single line
Rscript.exe "C:\My Files\R_scripts\my_script.R"
When it gets to the user-input part, it freezes and doesn't produce the expected output.
From the documentation of readline():
This can only be used in an interactive session. [...] In non-interactive use the result is as if the response was RETURN and the value is "".
For non-interactive use - when calling R from the command line - I think you've got two options:
Use readLines(con = "stdin", n = 1) to read user input from the terminal.
Use commandArgs(trailingOnly = TRUE) to supply the input as an argument from the command line when calling the script instead.
More information on both is below.
1. Using readLines()
readLines() looks very similar to the readline() you're using, but it is meant to read files line by line. If we point it at standard input instead of a file (con = "stdin"), it reads user input from the terminal. We set n = 1 so that it stops reading when you press Enter (that is, it reads only one line).
Example
Use readLines() in an R script:
# some-r-file.R
# This is our prompt, since readLines doesn't provide one
cat("Please write something: ")
args <- readLines(con = "stdin", n = 1)
writeLines(args[[1]], "output.txt")
Call the script:
Rscript.exe "some-r-file.R"
It will now ask you for your input. Running it from PowerShell, I supplied "Any text!".
Then the output.txt will contain:
Any text!
2. Using commandArgs()
When calling an Rscript.exe from the terminal, you can add extra arguments. With commandArgs() you can capture these arguments and use them in your code.
Example:
Use commandArgs() in an R script:
# some-r-file.R
args <- commandArgs(trailingOnly = TRUE)
writeLines(args[[1]], "output.txt")
Call the script:
Rscript.exe "some-r-file.R" "Any text!"
Then the output.txt will contain:
Any text!
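Since the question title mentions multiple-line input, note that readLines() can also collect several lines from stdin; here is a sketch that reads until an empty line (the stopping rule is my assumption):
# some-r-file.R
cat("Enter lines (finish with an empty line):\n")
lines <- character()
repeat {
  l <- readLines(con = "stdin", n = 1)
  if (length(l) == 0 || !nzchar(l)) break  # stop on EOF or an empty line
  lines <- c(lines, l)
}
writeLines(lines, "output.txt")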

How do I use the shell function (Windows OS) in R to write data to a SQLite3 database?

I have a lengthy bit of code that basically loops through a series of functions for numerous sites. The output data for each site is appended and saved to a .sql file within the loop.
Once the loop is done, I want to insert the data from the .sql file into a pre-constructed sqlite database.
On Linux my working code looks like this:
settings <- c("PRAGMA cache_size = 400000;","PRAGMA synchronous = 1;","PRAGMA locking_mode = EXCLUSIVE;","PRAGMA temp_store = MEMORY;","PRAGMA auto_vacuum = NONE;")
command1 <- paste(paste(settings,collapse="\n"),"BEGIN;",paste(".read ", theFile.sql, sep=""),"COMMIT;",sep="\n")
command2 <- paste(" | sqlite3 ", shQuote(name.OutputDB))
sqlCommand1 <- paste("echo ",shQuote(command1),command2)
system(sqlCommand1)
On Windows I can get this to work, but I lose the PRAGMA settings I want:
simplecommand <- paste0('cat ', (normalizePath(theFile.sql)), " | sqlite3 ", normalizePath((name.OutputDB)))
shell(simplecommand)
I have tried these following options on Windows that don't work:
command1 <- paste(paste(settings,collapse="\n"),"BEGIN;",paste(".read ", theFile.sql, sep=""),"COMMIT;",sep="\n")
command3 <- paste("sqlite3\n.open", name.OutputDB)
command4 <- paste0(".open ", name.OutputDB)
sqlCommand2 <- paste("echo ",shQuote(paste(command3,command1,sep="\n")))
shell(sqlCommand2)#Does echo work on Windows? Returns '"sqlite3'
sqlCommand3 <- shQuote(paste(command3,command1,sep="\n"))
shell(sqlCommand3)#Only seems to run the first line of code and open the database
sqlCommand4 <- shQuote(paste(command4,command1,sep="\n"))
shell(sqlCommand4,shell='c:/SQLIte/sqlite3.exe') # No error message, but no results written
So, what am I doing wrong here? Am I able to write multiple lines of SQL code using the shell command?
Thank you.
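One possible direction, offered only as an untested sketch: write the multi-line commands to a temporary script and let cmd.exe redirect it into sqlite3, which avoids echoing a quoted multi-line string:
# Write the PRAGMA settings and .read command to a temp file, then redirect it into sqlite3
tmp <- tempfile(fileext = ".sql")
writeLines(c(settings, "BEGIN;", paste0(".read ", theFile.sql), "COMMIT;"), tmp)
shell(paste0('"c:/SQLIte/sqlite3.exe" "', name.OutputDB, '" < "', normalizePath(tmp), '"'))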

R - Connect Scripts via Pipes

I have a number of R scripts that I would like to chain together using a UNIX-style pipeline. Each script would take as input a data frame and provide a data frame as output. For example, I am imagining something like this that would run in R's batch mode.
cat raw-input.Rds | step1.R | step2.R | step3.R | step4.R > result.Rds
Any thoughts on how this could be done?
Writing executable scripts is not the hard part; what is tricky is making the scripts read from files and/or pipes. I wrote a somewhat general function here: https://stackoverflow.com/a/15785789/1201032
Here is an example where the I/O takes the form of csv files:
Your step?.R files should look like this:
#!/usr/bin/Rscript
OpenRead <- function(arg) {
  if (arg %in% c("-", "/dev/stdin")) {
    file("stdin", open = "r")
  } else if (grepl("^/dev/fd/", arg)) {
    fifo(arg, open = "r")
  } else {
    file(arg, open = "r")
  }
}
args <- commandArgs(TRUE)
file <- args[1]
fh.in <- OpenRead(file)
df.in <- read.csv(fh.in)
close(fh.in)
# do something
df.out <- df.in
# print output
write.csv(df.out, file = stdout(), row.names = FALSE, quote = FALSE)
and your csv input file should look like:
col1,col2
a,1
b,2
Now this should work:
cat in.csv | ./step1.R - | ./step2.R -
The - are annoying but necessary. Also make sure to run something like chmod +x ./step?.R to make your scripts executable. Finally, you could store them (without the extension) inside a directory on your PATH, so that you can run them like this:
cat in.csv | step1 - | step2 -
Why on earth you want to cram your workflow into pipes when you have the whole R environment available is beyond me.
Make a main.r containing the following:
source("step1.r")
source("step2.r")
source("step3.r")
source("step4.r")
That's it. You don't have to convert the output of each step into a serialised format; instead you can just leave all your R objects (datasets, fitted models, predicted values, lattice/ggplot graphics, etc) as they are, ready for the next step to process. If memory is a problem, you can rm any unneeded objects at the end of each step; alternatively, each step can work with an environment which it deletes when done, first exporting any required objects to the global environment.
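A sketch of that last idea (the object name fitted_model is hypothetical):
# Run a step inside its own environment, exporting only what later steps need
step_env <- new.env()
sys.source("step1.r", envir = step_env)  # evaluates step1.r inside step_env
fitted_model <- get("fitted_model", envir = step_env)  # hypothetical object created by step1.r
rm(step_env)  # discards everything else the step created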
If modular code is desired, you can recast your workflow as follows. Encapsulate the work done by each file into one or more functions. Then call these functions in your main.r with the appropriate arguments.
source("step1.r") # defines step1_read_input, step1_f2
source("step2.r") # defines step2_f2
source("step3.r") # defines step3_f1, step3_f2, step3_f3
source("step4.r") # defines step4_write_output
step1_read_input(...)
step1_f2(...)
...
step4_write_output(...)
You'll need to add a line at the top of each script to read in from stdin. Via this answer:
in_data <- readLines(file("stdin"),1)
You'll also need to write the output of each script to stdout().
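Put together, a minimal step in this style might look like the following sketch (the transformation is a placeholder):
#!/usr/bin/Rscript
in_data <- readLines(file("stdin"))  # read all lines piped in
out_data <- toupper(in_data)         # placeholder transformation
writeLines(out_data, stdout())       # pass the result down the pipe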

R: While loop input

I am a bit new to R and have a question about a program I am trying to write. I am hoping to take in files (as many as the user pleases) with a while loop (eventually using read.table on each), but it keeps breaking on me.
Here is what I have so far:
cat("Please enter the full path for your files, if you have no more files to add enter 'X': ")
fil<-readLines(con="stdin", 1)
cat(fil, "\n")
while (!input=='X' | !input=='x'){
inputfile=input
input<- readline("Please enter the full path for your files, if you have no more files to add enter 'X': ")
}
if(input=='X' | input=='x'){
exit -1
}
When I run it (from the commandline (UNIX)) I get these results:
> library("lattice")
>
> cat("Please enter the full path for your files, if you have no more files to add enter 'X': ")
Please enter the full path for your files, if you have no more files to add enter 'X': > fil<-readLines(con="stdin", 1)
x
> cat(fil, "\n")
x
> while (!input=='X' | !input=='x'){
+ inputfile=input
+ input<- readline("Please enter the full path for your files, if you have no more files to add enter 'X': ")
+ }
Error: object 'input' not found
Execution halted
I am not quite sure how to fix the problem, but it is probably a simple one.
Any suggestions?
Thanks!
When you first run the script, input doesn't exist. Assign
input <- c()
before your while statement, or put
inputfile = input
below the input <- readline(...) line.
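Note also that the condition !input=='X' | !input=='x' is always TRUE (whatever input is, one of the two sides holds), so even with input initialised the loop would never exit; it needs input != 'X' && input != 'x'. Since readline() does not work non-interactively, here is a corrected sketch using readLines() on stdin (the read.table() call is a placeholder):
input <- ""
repeat {
  cat("Please enter the full path for your files, or 'X' if you have no more files to add: ")
  input <- readLines(con = "stdin", n = 1)
  if (length(input) == 0 || input %in% c("X", "x")) break  # EOF or sentinel value
  inputfile <- input
  # e.g. data <- read.table(inputfile, header = TRUE)
}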
I'm not exactly sure what the underlying problem is; it may be that you're entering the directory path incorrectly.
Here's a solution I've used a few times; it makes things much easier for the user. Basically, your code won't require user input at all; all it requires is a certain naming convention for your files.
setwd("Your/Working/Directory") #This doesn't change
filecontents <- 1
i <- 1
while (filecontents != 0) {
  mydata.csv <- try(read.csv(paste("CSV_file_", i, ".csv", sep = ""), header = FALSE), silent = TRUE)
  if (typeof(mydata.csv) != "list") { #checks to see if the imported data is a list
    filecontents <- 0
  } else {
    assign(paste('dataset', i, sep = ''), mydata.csv)
    #Whatever operations you want to do on the files.
    i <- i + 1
  }
}
As you can see, the naming convention for the files is CSV_file_n, where n is any number of input files (I took this code out of one of my programs, in which I load CSVs). One problem I kept having was error messages popping up when my code looked for a file that wasn't there. With this loop, those messages won't arise: if it assigns the contents of a non-existent file to mydata.csv, it merely checks whether mydata.csv is a list. If it is, it continues operating; if not, it stops. If you're worried about differentiating between data from different files within the code, just insert any relevant information about the file in a constant location within the file itself. For example, in my CSVs, the 3rd column always contained the name of the image from which I gathered the information in the rest of the CSV.
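A slightly more direct variant of the same idea (just a sketch) tests for the file's existence up front instead of inspecting what try() returned:
i <- 1
while (file.exists(paste0("CSV_file_", i, ".csv"))) {
  assign(paste0("dataset", i),
         read.csv(paste0("CSV_file_", i, ".csv"), header = FALSE))
  i <- i + 1
}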
Hope this helps you a bit, even though I see you've already got a solution :-). It's really just an option if you want your program to be more autonomous.
