I have two laptops, and both have RStudio installed. When I use the fread() function from the data.table library I need to specify the file (a .csv in this case) as the first argument, and I usually just copy the path with Ctrl+C and paste it into the function call with Ctrl+V. On one of these laptops this works perfectly, but on the other nothing gets pasted at all. Do I need to activate a module or something in my RStudio?
library(data.table)
test <- fread("does not paste anything", sep=";", dec=",")
I have an R script that makes a Bash call to execute a perl script. However, I need to fit everything into a single .R file. Hence, I want to add the Perl script into the body of my R code, so that the .pl file will be generated from within R.
I have so far tried to do this with writeLines, but this turns out to be very tricky, as I cannot simply paste the entire script into one object and have to break it into pieces with paste. This is mainly because my Perl script contains many " and ' characters that confuse R.
Is there an easy way to achieve this with the least amount of usage of paste?
I think the cat() function could be of help. If I understand correctly, what you want is an R script that will generate a Perl script. If so, you can use cat like so:
cat("piece of script", file = "path/to/script.pl")
Use append=TRUE to keep appending to the script:
cat("piece of script", file = "path/to/script.pl", append = TRUE)
Then call the script with system() (I guess you already know that):
system("perl path/to/script.pl")
Regarding the quotation marks, you have to escape or nest them properly in your R strings, e.g.:
cat("'piece of script'", file = "path/to/script.pl", append = TRUE)
for single quotation marks and
cat("\"piece of script\"", file = "path/to/script.pl", append = TRUE)
for double ones.
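Putting it together, a minimal sketch (hello.pl is just a placeholder name) that writes a tiny Perl script from R and runs it:
# build the Perl script piece by piece
cat("#!/usr/bin/perl\n", file = "hello.pl")
cat("print 'single quotes need no escaping in R strings';\n", file = "hello.pl", append = TRUE)
cat("print \"\\ndouble quotes do: Hello from Perl\\n\";\n", file = "hello.pl", append = TRUE)
# execute it
system("perl hello.pl")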
Hope this helps.
Writing a fresh .Rda file to save a data.frame is easy:
df <- data.frame(a=c(1,2,3,4), b=c(5,6,7,8))
save(df,file="data.Rda")
But is it possible to append more data afterwards? There is no append=TRUE option for save().
Similarly, writing new lines to a text file is easy using:
write.table(df, file = 'data.txt', append=T)
However, for large data.frames the resulting text file is much larger than the binary .Rda.
If you use Microsoft R, you might want to check the RevoScaleR package, the rxImport function in particular. It allows you to store a compressed data.frame in a file, and it also allows you to append new rows to an existing file without loading it into the environment.
Hope this helps. Link to the function documentation below.
https://learn.microsoft.com/en-us/machine-learning-server/r-reference/revoscaler/rximport
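For illustration, a minimal sketch assuming Microsoft R with RevoScaleR available; data.xdf is a placeholder file name:
library(RevoScaleR)
df <- data.frame(a = c(1, 2, 3, 4), b = c(5, 6, 7, 8))
# write df into a compressed .xdf file
rxImport(inData = df, outFile = "data.xdf", overwrite = TRUE)
# later: append new rows without loading the existing file
more <- data.frame(a = c(9, 10), b = c(11, 12))
rxImport(inData = more, outFile = "data.xdf", append = "rows")
# read everything back into a data.frame when needed
df2 <- rxDataStep(inData = "data.xdf")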
I would like to know the recommended way of reading a data.table from an archived file (a zip archive in my case). One obvious option is to unzip it to a temporary file and then fread() it as usual. I don't want to bother with creating a new file, so instead I use read.table() on an unz() connection and then convert the result with data.table():
mydt <- data.table(read.table(unz(myzipfilename, myfilename)))
This works fine but read.table() is slow for big files while fread() can't read unz() connection directly. I'm wondering if there is any better solution.
Look at: Read Zipped CSV File with fread
To avoid temp files you can use unzip with the -p flag, which extracts files to a pipe without any messages.
You can pass such a command directly to fread():
x = fread('unzip -p test/allRequests.csv.zip')
Or with gunzip
x = fread('gunzip -cq test/allRequests.csv.gz')
You can also pipe the output through grep or other tools.
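For example, a hypothetical pre-filter with grep before the data reach fread():
# keep only lines containing "2017"; note this also drops the header
# row unless it happens to match the pattern
x = fread('unzip -p test/allRequests.csv.zip | grep "2017"')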
I am trying to make my current project reproducible, and so am creating a master document (eventually a .Rmd file) that will be used to call and execute several other documents. This way, my fellow investigators and I only need to open and run one file.
There are three layers to the current setup: master file, 2 read-in files, 2 databases. The master file calls the read-in files using source(), and the read-in files parse the .csv databases and apply labels.
The read-in files and the databases are generated automatically with the data management software I'm currently using (REDCap) each time I download the updated data.
However, the read-in files have a line of code that removes all of the objects in my environment. I would like to edit the read-in files directly from the master file so that I do not have to open the read-in files individually each time I run my report. Specifically, since all the read-in files are the same, I would like to remove line #2 in each.
I've tried searching Google, and tried file.edit(), but have been unable to find anything. Not even sure it is possible, but figured I would ask. Let me know if I can improve this question or if you need any additional code to answer it. Thanks!
Current relevant master code (edited for generality):
source("read-in1")
source("read-in2")
Current relevant read-in file code (same in each file, except for the database name):
#Clear existing data and graphics
rm(list=ls())
graphics.off()
#Load Hmisc library
library(Hmisc)
#Read Data
data=read.csv('database.csv')
#Setting Labels
[read-in code truncated]
Additional details:
OS: Windows 7 Professional x86
R version: 3.1.3
RStudio version: 0.99.441
You might try readLines() and something like the following (which was simplified greatly by a suggestion from #Hong Ooi below):
eval(parse(text = readLines("read-in1.R")[-2]))
My original solution, which was much more pedantic:
f <- file("read-in1.R", open="r")
t <- readLines(f)
close(f)
for (l in t[-2]) { eval(parse(text=l)) }
The for() loop just parses and evaluates each line from the text file except for the second one (that's what the -2 index value does). Note that evaluating line by line will fail on expressions that span multiple lines. If you're reading and writing longer files, then the following will be much faster than the second option, though still less preferable than #Hong Ooi's:
f <- file("read-in1.R", open="r")
t <- readLines(f)
close(f)
f <- file("out.R", open="w")
writeLines(t[-2], f)
close(f)
source("out.R")
Sorry I'm so late in noticing this question, but you may want to investigate getting access to the REDCap API and using either the redcapAPI package or the REDCapR package. Both of those packages will allow you to export the data from REDCap directly into R without having to use the download scripts. redcapAPI will even apply all the formats and dates (REDCapR might do this now too; it was in the plan, but I haven't used it in a while).
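For illustration, a minimal sketch using redcapAPI; the URL and token are placeholders you would get from your REDCap administrator:
library(redcapAPI)
# connect to the REDCap API and export all records,
# with labels and formats applied automatically
rcon <- redcapConnection(url = "https://redcap.example.edu/api/", token = "YOUR_API_TOKEN")
data <- exportRecords(rcon)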
You could try this. It just calls some shell commands: (1) rename the file, then (2) copy all lines not containing rm(list=ls()) to a new file with the original name, then (3) remove the old renamed file.
files_to_change <- c("read-in1.R", "read-in2.R")
for (f in files_to_change) {
old <- paste0(f, ".old")
system(paste("cmd.exe /c ren", f, old))
system(paste("cmd.exe /c findstr /v rm(list=ls())", old, ">", f))
system(paste("cmd.exe /c rm", old))
}
After calling this loop you should have
#Clear existing data and graphics
graphics.off()
#Load Hmisc library
library(Hmisc)
#Read Data
data=read.csv('database.csv')
#Setting Labels
in your read-in*.R files. You could put this in a batch script
#echo off
ren "%~f1" "%~nx1.old"
findstr /v "rm(list=ls())" "%~f1.old" > "%~f1"
rm "%~nx1.old"
say, "example.bat", and call that in the same way using system.
Situation
I wrote an R program which I split up into multiple R-files for the sake of keeping a good code structure.
There is a Main.R file which references all the other R-files with the 'source()' command, like this:
source(paste(getwd(), dirname1, 'otherfile1.R', sep="/"))
source(paste(getwd(), dirname3, 'otherfile2.R', sep="/"))
...
As you can see, the working directory needs to be set correctly in advance, otherwise, this could go wrong.
Now, if I want to share this R program with someone else, I have to pass along all the R files and folders, preserving their relative directory structure, for things to work. Hence my next question.
Question
Is there a way to replace all the 'source' commands with the actual R script code which it refers to? That way, I have a SINGLE R script file, which I can simply pass along without having to worry about setting the working directory.
I'm not looking for a solution which is an 'R package' (which, by the way, is one single directory, so I would lose my own directory structure). I am simply wondering if there is an easy way to combine these self-referencing R files into one single file.
Thanks,
OK, I think you could scan all the files and then write them again into a single new one. This can be done using readLines and sink:
sink("mynewRfile.R")
for(i in Nfiles){
current_file = readLines(filedir[i])
cat("\n\n#### Current file:",filedir[i],"\n\n")
cat(current_file, sep ="\n")
}
sink()
Here I have assumed all your file paths are stored in a vector filedir of length Nfiles; I guess you can adapt that to your setup.
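For instance, a sketch of building that vector by reusing the dirname1/dirname3 variables from your source() calls:
filedir <- c(file.path(getwd(), dirname1, "otherfile1.R"),
             file.path(getwd(), dirname3, "otherfile2.R"))
Nfiles <- length(filedir)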