After running my R script in the terminal I get two output data files: a.dat and b.dat. My goal is to directly divert these output files into a new folder.
Is there any way to do something like this:
Rscript myscript.R > folder
Note: For writing the output file I simply use this:
write(t(result1), file = "a.dat", ncolumns = 5, append=TRUE)
I solved my problem by doing the following:
I created an output folder 'output'
I added the full path of the output folder in myscript.R:
write(t(result1), file = "~/Documents/output/a.dat", ncolumns = 5, append=TRUE)
Solved! :)
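Here is a slightly more robust variant of the same idea, offered as a sketch under assumptions. The script creates the output folder itself if it is missing, and it builds the path with file.path() instead of a hard-coded string. The folder variable out_dir and the placeholder result1 matrix are made up for the example.

```r
# Hypothetical output folder; tempfile() is used only so the example is
# self-contained. In a real script this could be "~/Documents/output".
out_dir <- tempfile("output")
# create the folder (and any missing parents); no warning if it already exists
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# placeholder data standing in for the script's real result1
result1 <- matrix(1:10, nrow = 2)

# file.path() joins folder and file name with the platform's separator
write(t(result1), file = file.path(out_dir, "a.dat"), ncolumns = 5, append = TRUE)
```

Because the folder is created from inside the script, there is no need to make it by hand first. The Rscript call stays the same.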
You could simply use write.table to create two CSV files like this:
A minimal working example:
It uses an R script called "Rfile.r" in the directory "adir" inside my "Dokumente" folder. The script reads the first two command-line arguments: a number, used as the input argument for the function, and a character string giving the output (target) directory. (You could also pass file names etc., of course.)
Rfile.r:
# read the arguments that are specified later in the terminal:
# one numeric and one target directory
arg <- commandArgs(trailingOnly = TRUE)
n <- as.numeric(arg[1])
path <- as.character(arg[2])
## a toy function to create two data frames (one per CSV file)
fun <- function(n) {
  data.a <- data.frame(rep("Some Data", n))
  data.b <- data.frame(rnorm(n))
  data <- list(data.a, data.b)
  return(data)
}
# create data using input arg[1], aka 'n'
data <- fun(n)
# now the important part: write.table with arg[2], aka 'path';
# [[ ]] extracts the data frame itself, and file.path() joins directory and file name
write.table(data[[1]], file = file.path(path, "data_a.csv"), sep = ",", row.names = FALSE)
write.table(data[[2]], file = file.path(path, "data_b.csv"), sep = ",", row.names = FALSE)
## write a terminal output message using cat()
cat(paste("Your input was:", arg[1], sep = "\t"),
    paste("Your target path was:", arg[2], sep = "\t"), sep = "\n")
Then run it in a terminal:
$ Rscript ~/Dokumente/adir/Rfile.r 3 ~/Dokumente/bdir
This creates two CSV files, "data_a.csv" and "data_b.csv", in the directory "bdir" (which must already exist), with 3 as the numeric input for the function in Rfile.r.
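Rfile.r trusts its arguments as given. Below is a minimal sketch of validating them first and creating the target directory if needed. The helper name parse_cli_args and the fallback to the current working directory are assumptions for illustration, not part of the original answer.

```r
# Hypothetical helper: turn the raw command-line strings into checked values.
parse_cli_args <- function(args) {
  n <- suppressWarnings(as.integer(args[1]))  # NA if missing or not a number
  if (length(args) < 1 || is.na(n) || n < 1) {
    stop("usage: Rscript Rfile.r <n> [target_dir]", call. = FALSE)
  }
  # assumed default: write into the current working directory
  path <- if (length(args) >= 2) args[2] else getwd()
  # create the target directory (and parents) if it does not exist yet
  dir.create(path, showWarnings = FALSE, recursive = TRUE)
  list(n = n, path = path)
}
# In Rfile.r itself: opts <- parse_cli_args(commandArgs(trailingOnly = TRUE))
```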
I understand how to read a CSV file that is stored on disk, but I don't know how to stream in CSV content via CLI using R.
For example, this is how I read a CSV file from disk with a simple CLI:
library(optparse)
option_list <- list(
# Absolute filepath to CSV file.
make_option(c("-c","--csv"),type="character",default=NULL,
help="CSV filepath",metavar="character")
);
opt_parser <- OptionParser(option_list=option_list)
opt <- parse_args(opt_parser)
csv_filepath <- opt$csv
csv <- read.csv(csv_filepath)
How would I do this if I'm working with a data stream?
R always reads from connections. A connection can be a file, a URL, in-memory text, and so on.
So, if you want to read CSV-format data from content that is already in memory, just use the text= argument instead of a file name.
Like this:
my_stream = "name;age\nJulie;25\nJohn;26"
read.csv(text = my_stream, sep = ";", header = TRUE)
The output will be:
   name age
1 Julie  25
2  John  26
You can pass any other read.csv() arguments as usual, of course.
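The question also asks about data streamed in on the command line. Since read.csv() accepts any connection, one way to read CSV text piped into Rscript is the connection file("stdin"), which refers to the standard input of the R process. Here is a minimal sketch; the function name read_csv_stream is made up for illustration.

```r
# Read CSV data from any readable connection. The default, file("stdin"),
# is the standard input of the R process (e.g. data piped into Rscript).
read_csv_stream <- function(con = file("stdin"), ...) {
  # read.csv() opens an unopened connection itself and closes it afterwards
  read.csv(con, ...)
}
```

From a shell, the same idea works as a one-liner, assuming a Unix-like shell: cat test.csv | Rscript -e 'str(read.csv(file("stdin")))'. Inside a script, csv <- read_csv_stream() reads whatever is piped into it.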
R source and package optparse.
First, write an R source file "example.R", such as the following.
#!/usr/bin/env Rscript
#
# R source: example.R
# options: -c --csv
#
library(optparse)
option_list <- list(
# Absolute filepath to CSV file.
make_option(c("-c","--csv"),type="character",default=NULL,
help="CSV filepath",metavar="character")
)
opt_parser <- OptionParser(option_list=option_list)
opt <- parse_args(opt_parser)
csv_filepath <- opt$csv
csv <- read.csv(csv_filepath)
message(paste("\nfile read:", csv_filepath, "\n"))
str(csv)
Then make the file executable, so that the shell recognizes the #! shebang line and runs Rscript on the file.
Here I only add execute permission for the file's owner, not for its group.
bash$ chmod u+x example.R
The test.
I have tested the above script with this data.frame:
df1 <- data.frame(id=1:5, name=letters[1:5])
write.csv(df1, "test.csv", row.names=FALSE)
Then, on Ubuntu 20.04 LTS, I ran ./example.R, passing it the CSV filename in the --csv argument. The command and its output were:
bash$ ./example.R --csv=test.csv
file read: test.csv
'data.frame': 5 obs. of 2 variables:
$ id : int 1 2 3 4 5
$ name: chr "a" "b" "c" "d" ...
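With default=NULL, running ./example.R without --csv passes NULL to read.csv(), which stops with an error that does not mention the missing option. A small guard can give a clearer message. The helper name read_csv_arg is hypothetical.

```r
# Fail early, with a message that names the option, instead of letting
# read.csv() complain about a NULL or non-existent path.
read_csv_arg <- function(csv_filepath) {
  if (is.null(csv_filepath)) {
    stop("please supply a CSV file, e.g. --csv=test.csv", call. = FALSE)
  }
  if (!file.exists(csv_filepath)) {
    stop("file not found: ", csv_filepath, call. = FALSE)
  }
  read.csv(csv_filepath)
}
# In example.R, replace read.csv(csv_filepath) with read_csv_arg(csv_filepath).
```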
I have the following piece of code to write to a file from R, one line at a time.
for (i in c(1:10)){
writeLines(as.character(i),file("output.csv"))
}
It just writes 10, presumably overwriting the previous lines. How do I make R append each new line to the existing output? append = TRUE does not work.
append = TRUE does work when using the function cat (instead of writeLines), but only if you give cat a file name, not a file object. Whether a file is appended to or overwritten is a property of the file object itself, i.e. it needs to be specified when the file is opened.
Thus both of these work:
f = file('filename', open = 'a') # open in “a”ppend mode
for (i in 1 : 10) writeLines(as.character(i), f) # writeLines() only accepts character vectors
close(f)
for (i in 1 : 10) cat(i, '\n', file = 'filename', sep = '', append = TRUE)
Calling file manually is almost never necessary in R.
… but as the other answer shows, you can (and should!) avoid the loop anyway.
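The overwriting in the question happens because file("output.csv") creates a new connection on every iteration, and writeLines() opens an unopened connection in "wt" mode, which truncates the file each time. If lines really do have to be written one at a time, opening the connection once before the loop is enough, even in ordinary "w" mode. Here is a sketch with a made-up helper name.

```r
# Write values one line at a time through a single open connection.
# The file is truncated once, when it is opened with "w"; every later
# writeLines() call continues where the previous one stopped.
write_lines_incrementally <- function(path, values) {
  con <- file(path, open = "w")
  on.exit(close(con))  # close the connection even if a write fails
  for (v in values) {
    writeLines(as.character(v), con)
  }
  invisible(path)
}
```

Use "a" instead of "w" if the file's existing content should be kept.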
You don't need a loop. Write the whole vector at once and use the newline escape character \n as the separator:
vec <- 1:10
writeLines(as.character(vec), "output.csv", sep = "\n")
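Suppose the goal is instead to add a whole vector to the end of a file that already has content, still without a loop. In that case write(), a thin wrapper around cat(), takes a file name plus append = TRUE. Here is a minimal sketch; append_lines is a hypothetical name.

```r
# Append each element of `values` as its own line at the end of `path`.
# ncolumns = 1 puts one value per line; append = TRUE keeps existing content.
append_lines <- function(values, path) {
  write(values, file = path, ncolumns = 1, append = TRUE)
}
```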
I'm processing files through an application using R. The application requires an input file name and an output file name as parameters. Using the code below, this works fine.
input <- "\"7374.txt\""
output <- "\"7374_cleaned.txt\""
system2("DataCleaner", args = c(input, output))
However, I wish to process a folder of .txt files rather than having to do each one individually. If I had access to the source code I would simply alter the application to accept a folder rather than an individual file, but unfortunately I don't. Is it possible to somehow do this in R? I had tried starting to create a loop:
input <- dir(pattern=".txt")
but I don't know how to pass a vector as an argument without the regex being included as part of it. Also, I would then need to append '_cleaned' to the end of the output file names. Many thanks in advance.
Obviously, I can't test it because I don't have your DataCleaner program, but how about this...
# make some dummy files to try this out
dir.create('folder')
x = sapply(1:5, function(f) {t = tempfile(tmpdir = 'folder', fileext = '.txt'); file.create(t); t})
# find the files (anchor the pattern so only names ending in .txt match)
inputfiles = list.files(path = 'folder', pattern = '\\.txt$', full.names = TRUE)
# remove the extension
base = tools::file_path_sans_ext(inputfiles)
# make the output file names
outputfiles = paste0(base, '_cleaned.txt')
mysystem <- function(input, output) {
  # system2() does not quote its arguments; shQuote() protects paths with spaces
  system2('DataCleaner', args = shQuote(c(input, output)))
}
lapply(seq_along(inputfiles), function(f) mysystem(inputfiles[f], outputfiles[f]))
It uses lapply to iterate over the positions of the input and output file vectors and calls system2 once for each pair.
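If the script may be run more than once, the "_cleaned.txt" outputs from the first run would themselves match a .txt pattern. Here is a sketch that skips them and collects DataCleaner's exit status for each file, using mapply() to walk the two name vectors in parallel. DataCleaner is the asker's program; the helper names cleaning_jobs and run_cleaner are hypothetical.

```r
# Pair each input .txt file with its "_cleaned" output name, skipping files
# that are themselves outputs of an earlier run.
cleaning_jobs <- function(dir) {
  inputs <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  inputs <- inputs[!grepl("_cleaned\\.txt$", inputs)]
  data.frame(input = inputs,
             output = paste0(tools::file_path_sans_ext(inputs), "_cleaned.txt"),
             stringsAsFactors = FALSE)
}

# Run the external program once per pair. system2() returns the exit status
# when stdout/stderr are not captured, so a non-zero value flags a failure.
run_cleaner <- function(jobs, cmd = "DataCleaner") {
  status <- mapply(function(input, output) system2(cmd, args = shQuote(c(input, output))),
                   jobs$input, jobs$output)
  invisible(status)
}
```

For example, status <- run_cleaner(jobs) followed by jobs$input[status != 0] would list the files the program failed on.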
I have a bunch of ZIP archives that each contain a bunch of text files. I want to read all the text into memory, one string per file, and with each file tagged with the corresponding filename, but without removing the original ZIP files or writing all the contents to disk. (If writing temporary files is a must, they should be deleted once we're done reading them, or if processing is interrupted.)
For example, suppose you create a simple ZIP like this:
$ echo 'contents1' > file1
$ echo 'contents2' > file2
$ zip files.zip file1 file2
Then calling myfunction("files.zip") should return the same thing as list(file1 = "contents1\n", file2 = "contents2\n").
I currently use the following function, which uses Info-ZIP unzip. It works fine, except that its code to detect the end of one file and the beginning of another might trigger on file contents instead.
library(stringr)
slurp.zip = function(path)
# Extracts each file in the zip file at `path` as a single
# string. The names of the resulting list are set to the inner
# file names.
{lines = system2("unzip", c("-c", path), stdout = T)
is.sep = str_detect(lines, "^ (?: inflating|extracting): ")
chunks = lapply(
split(lines[!is.sep], cumsum(is.sep)[!is.sep])[-1],
function(chunk) paste(chunk, collapse = "\n"))
fnames = str_match(lines[is.sep], "^ (?: inflating|extracting): (.+) $")
stopifnot(!anyNA(fnames))
names(chunks) = fnames[,2]
chunks}
We can use unzip(..., list = TRUE) to get the file names in the archive, without actually extracting them. Then we can use unz to create connections to the files, which can be read using e.g. readLines or scan:
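slurp.zip = function(path) {
  # list the archive's members without extracting anything to disk
  sapply(unzip(path, list = TRUE)$Name, function(x) {
    con = unz(path, x)  # read-only connection to one member of the archive
    on.exit(close(con))
    # readLines() drops line endings, so put "\n" back after every line
    paste0(readLines(con), "\n", collapse = '')
  }, simplify = FALSE, USE.NAMES = TRUE)
}
dput(slurp.zip('files.zip'))
# list(file1 = "contents1\n", file2 = "contents2\n")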
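readLines() splits the text on line endings, so any version built on it has to reconstruct them and cannot tell whether the last line had a terminator. If the contents should come back byte-for-byte, the sketch below reads each member as raw bytes, using the uncompressed sizes reported by unzip(list = TRUE). The name slurp.zip.exact is made up, and it assumes the members are text without embedded NUL bytes.

```r
slurp.zip.exact = function(path) {
  # Name and uncompressed Length of every member, without extracting anything
  info = unzip(path, list = TRUE)
  out = lapply(seq_len(nrow(info)), function(i) {
    con = unz(path, info$Name[i], open = "rb")
    on.exit(close(con))
    # read exactly Length bytes and turn them into one string
    rawToChar(readBin(con, what = "raw", n = info$Length[i]))
  })
  names(out) = info$Name
  out
}
```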
I have used R to download about 200 zip files. The zipped files are in mmyy.dat format. The next step is to use R to unzip all the files and rename them to yymm.txt. I know the function unzip can unpack the files, but I am not sure which of its arguments can also change the name and format of the unzipped files.
And when I unzip the files using
for (i in 1:length(destfile)){
unzip(destfile[i],exdir='C:/data/cps1')
}
The extracted files are named like jan94pub.cps, which is supposed to be jan94pub.dat. The code I use to download the files is here:
month_vec <- c('jan','feb','mar','apr','may','jun','jul','aug','sep','oct','nov','dec')
year_vec <- c('94','95','96','97','98','99','00','01','02','03','04','05','06','07','08','09','10','11','12','13','14')
url <- "http://www.nber.org/cps-basic/"
month_year_vec <- apply(expand.grid(month_vec, year_vec), 1, paste, collapse="")
bab <-'pub.zip'
url1 <- paste(url,month_year_vec,bab,sep='')
for (i in 1:length(url1)){
destfile <- paste('C:/data/cps1/',month_year_vec,bab,sep='')
download.file(url1[i],destfile[i])
}
for (i in 1:length(destfile)){
unzip(destfile[i],exdir='C:/data/cps1')
}
When I use str(destfile), the filenames are correct, jan94pub.dat. I don't see where my code goes wrong.
I'd do something like:
file_list = list.files(pattern = '\\.zip$')
lapply(file_list, unzip)
Next you want to use the same kind of lapply trick in combination with strptime to convert the name of the file to a date:
t = strptime('010101.txt', format = '%d%m%y.txt') # Note: I prepended 01 (the day) to the name; you can use paste0 for this
[1] "2001-01-01"
You will need to tweak the filename a bit to get a reliable date, as the month and year alone are not enough. Next, you can use strftime to transform it back to your desired yymm.txt format:
strftime(t, format = '%y%m.txt')
[1] "0101.txt"
Then you can use file.rename to perform the actual moving. To get this functionality into one function call, create a function which performs all the steps:
unzip_and_move = function(path) {
# - Get file list
# - Unzip files
# - create output file list
# - Move files
}
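The outline above can be filled in roughly as follows. This is a sketch, not the answer's code. Going by the question, it assumes the extracted files are named like jan94pub.cps (a three-letter English month plus a two-digit year), not mmyy, so it maps names with month.abb instead of strptime. That choice also avoids %b, which depends on the locale. The helper cps_new_name is hypothetical.

```r
# Hypothetical helper: "jan94pub.cps" -> "9401.txt" (yymm.txt).
# Assumes each name starts with a three-letter English month abbreviation
# followed by a two-digit year; anything else maps to NA.
cps_new_name <- function(fname) {
  mon <- match(tolower(substr(fname, 1, 3)), tolower(month.abb))
  yy <- substr(fname, 4, 5)
  ifelse(is.na(mon), NA_character_, sprintf("%s%02d.txt", yy, mon))
}

unzip_and_move <- function(zip_dir, exdir = zip_dir) {
  # - Get file list
  zips <- list.files(zip_dir, pattern = "\\.zip$", full.names = TRUE)
  # - Unzip files (unzip() returns the paths of the files it extracted)
  extracted <- as.character(unlist(lapply(zips, unzip, exdir = exdir)))
  # - Create output file list, dropping names that don't fit the pattern
  new_names <- cps_new_name(basename(extracted))
  keep <- !is.na(new_names)
  # - Move files; file.rename() returns TRUE/FALSE for each file
  file.rename(extracted[keep], file.path(exdir, new_names[keep]))
}
```

For the question's setup, this would be called as unzip_and_move('C:/data/cps1').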