Reading zip files using fread - r

I tried to read a zip file using fread like this:
data<-("www/608.zip")
test<- fread('gunzip -cq data')
It failed with an error saying the file does not exist or is non-readable.
But it works if I call:
test<- fread('gunzip -cq www/608.zip')
In my script the value of data changes each time, so I used an if statement to choose it, like this:
data<-reactive({
if (input$list == 'all')
{
"www/6.zip"
}
else{
if (input$list == 'hkj')
{
"www/6.zip"
}

I think it should work as follows:
data <- "www/608.zip"
test <- fread(cmd = paste("gunzip -cq", data))
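That is, you first have to build the command string with paste() and then pass it to fread() as the cmd argument. In 'gunzip -cq data', the word data is just literal text inside the string, not the variable.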

If the file path is stored in a variable, you can use paste0 to build the command string:
data <- "www/608.zip"
test <- fread(cmd = paste0("gunzip -cq ", data))
fread suggests using the cmd argument for security reasons.

We can also use glue:
data <- "www/608.zip"
fread(cmd = glue::glue("gunzip -cq {data}"))
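
A minimal sketch of the same idea with the path quoted for the shell. It also shows why the original call failed: R does not substitute variables inside a string literal. The helper name build_cmd is hypothetical, and the fread calls are left as comments because they need data.table and gunzip to be available. In a Shiny app where data is a reactive, the path would be obtained by calling data().

```r
# Hypothetical helper: build the shell command for a given zip path.
build_cmd <- function(path) {
  # shQuote() protects paths that contain spaces or shell metacharacters
  paste("gunzip -cq", shQuote(path, type = "sh"))
}

data <- "www/608.zip"

literal_cmd <- "gunzip -cq data"  # 'data' here is plain text, not the variable
real_cmd <- build_cmd(data)       # the value of data is pasted into the command

print(literal_cmd)
print(real_cmd)

# With data.table installed, the file could then be read with:
# test <- data.table::fread(cmd = real_cmd)
# Inside a Shiny reactive context, assuming data is a reactive:
# test <- data.table::fread(cmd = build_cmd(data()))
```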

Related

R script output values into another folder or directory

After running my R script in the terminal I get two output data files: a.dat and b.dat. My goal is to directly divert these output files into a new folder.
Is there any way to do something like this:
Rscript myscript.R > folder
Note: For writing the output file I simply use this:
write(t(result1), file = "a.dat", ncolumns = 5, append=TRUE)
I solved my problem as follows. I created an output folder named 'output', then used the full path of the output file in myscript.R:
write(t(result1), file = "home/Documents/output/a.dat", ncolumns = 5, append=TRUE)
Solved! :)
You could simply use write.table to create two CSV files like this.
A minimal working example:
It uses an R script called "Rfile.r" in the directory "adir" in my "Dokumente" folder. The script reads its first two command-line arguments: a number, which is passed to the function, and a character string giving the output target directory. (You could also pass file names and so on, of course.)
Rfile.r:
# Read the arguments given in the terminal:
# one numeric value and one target directory
arg <- commandArgs(trailingOnly = TRUE)
n <- as.numeric(arg[1])
path <- as.character(arg[2])
## An arbitrary function that creates two data frames
fun <- function(n) {
  data.a <- data.frame(rep("Some Data", n))
  data.b <- data.frame(rnorm(n))
  data <- list(data.a, data.b)
  return(data)
}
# create data using input arg[1], aka 'n'
data <- fun(n)
# now the important part: use write.table with arg[2], aka 'path',
# so that each data frame is written into the target directory
write.table(data[[1]], file = paste(path, "/data_a.csv", sep = ""))
write.table(data[[2]], file = paste(path, "/data_b.csv", sep = ""))
## write a terminal output message using cat()
cat(paste("Your input was:", arg[1], sep = "\t"),
    paste("Your target path was:", arg[2], sep = "\t"), sep = "\n")
Then run in a terminal:
$ Rscript ~/Dokumente/adir/Rfile.r 3 ~/Dokumente/bdir
This creates two files, "data_a.csv" and "data_b.csv", in the directory "bdir" (which must already exist), where 3 is the numeric input for the function in Rfile.r.
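
Another option, sketched here under assumptions: create the output folder from inside the script and build the file names with file.path(), so nothing has to be redirected in the terminal. The folder location (a temporary directory) and the sample matrix result1 are placeholders for illustration only.

```r
# Placeholder output folder; in a real script this could come from commandArgs().
out_dir <- file.path(tempdir(), "output")

# Create the folder if it does not exist yet (no warning if it already does).
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Placeholder result; stands in for the real result1 of the script.
result1 <- matrix(1:10, nrow = 2)

# Same write() call as in the question, but pointing into the output folder.
out_file <- file.path(out_dir, "a.dat")
write(t(result1), file = out_file, ncolumns = 5, append = TRUE)

print(out_file)
```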

Dynamically dropping files while reading using lapply

I've got the following code to read several files and append them separately to a single list along with the file name
foo <- function(fname) {
  fread(fname, skip = 5, header = TRUE, sep = " ") %>%
    mutate(fn = fname)
}
all <- lapply(files, FUN = foo)
After a file is read, I would like to add a condition to the function that checks some properties of the file and, if the check fails, drops the file along with its file name.
This is not strictly about reading tables; it applies to other file types as well.
Edit:
I also use the following efficient method of doing it:
all <- setNames(lapply(files, foo), files)
I tried the following fairly simple option using Filter, with the condition inside the Filter call:
all <- setNames(lapply(myFiles, function(x) {readLAS(x)}), myFiles)
all <- Filter(function(x) {area(x) >= 640}, all)
This does not drop the files while lapply is running, but it needs only one extra line of code.
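
If the check should happen inside the reading function itself, one possible sketch is to return NULL for files that fail the check and then discard the NULL entries. The function name read_or_drop, the row-count condition, and the two temporary CSV files are assumptions made up for this example; base read.csv stands in for fread or readLAS.

```r
# Hypothetical reader: returns NULL when the file fails the check.
read_or_drop <- function(fname, min_rows = 2) {
  dat <- read.csv(fname)
  if (nrow(dat) < min_rows) {
    return(NULL)  # file is dropped
  }
  dat$fn <- fname  # keep the file name alongside the data
  dat
}

# Two small demo files: one fails the check, one passes.
small <- tempfile(fileext = ".csv")
big <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1), small, row.names = FALSE)
write.csv(data.frame(x = 1:3), big, row.names = FALSE)
files <- c(small, big)

# Read everything, name the elements by file, then drop the NULL entries.
kept <- Filter(Negate(is.null), setNames(lapply(files, read_or_drop), files))

print(names(kept))
```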

looping over all files in the same directory in R

I want to run the following code in R on all the files. I actually wrote a for loop for that, but when I run it, it is applied to only one file, not all of them. By the way, my files do not have a header.
You use [[ to subset something from peaks. However, after the file has been read by its name, peaks is a data frame that no longer has any reference to the file name. Thus, you just have to get rid of the [[i]]:
for (i in filelist.coverages) {
  peaks <- read.delim(i, sep = '', header = FALSE)
  PeakSizes <- c(PeakSizes, peaks$V3 - peaks$V2)
}
By using the iterator i within read.delim() which holds a new file name each time, every time R goes through the loop, peaks will have the content of a new file.
In your code, i refers to a file name. Use indices instead.
And, by the way, don't use setwd; use the full.names = TRUE option in list.files. Also preallocate PeakSizes. Because each file yields one size per row, a list with one slot per file works: PeakSizes <- vector("list", length(filelist.coverages)).
So do:
filelist.coverages <- list.files('K:/prostate_cancer_porto/H3K27me3_ChIPseq/',
                                 pattern = 'island.bed', full.names = TRUE)
## all 97 bed files
PeakSizes <- vector("list", length(filelist.coverages))
for (i in seq_along(filelist.coverages)) {
  peaks <- read.delim(filelist.coverages[i], sep = '', header = FALSE)
  PeakSizes[[i]] <- peaks$V3 - peaks$V2  # one value per row of this file
}
PeakSizes <- unlist(PeakSizes)
Or you could simply use lapply or purrr::map and flatten the result:
PeakSizes <- unlist(lapply(filelist.coverages, function(file) {
  peaks <- read.delim(file, sep = '', header = FALSE)
  peaks$V3 - peaks$V2
}))
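
A self-contained sketch of looping over every file in a directory and collecting all peak sizes, for trying the approach without the real data. The directory, the two tiny headerless files, and their contents are invented for the demonstration; only the column layout (start in V2, end in V3) follows the question.

```r
# Demo directory with two small headerless, whitespace-separated files.
bed_dir <- file.path(tempdir(), "bed_demo")
dir.create(bed_dir, showWarnings = FALSE)
writeLines(c("chr1 100 250", "chr1 400 900"), file.path(bed_dir, "a_island.bed"))
writeLines("chr2 10 60", file.path(bed_dir, "b_island.bed"))

# full.names = TRUE returns complete paths, so setwd() is not needed.
bed_files <- list.files(bed_dir, pattern = "island\\.bed$", full.names = TRUE)

# One numeric vector of peak sizes per file.
sizes_per_file <- lapply(bed_files, function(f) {
  peaks <- read.delim(f, sep = "", header = FALSE)
  peaks$V3 - peaks$V2
})
names(sizes_per_file) <- basename(bed_files)

# Flatten into a single vector covering all files.
PeakSizes <- unlist(sizes_per_file, use.names = FALSE)
print(PeakSizes)
```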

R : name of an object stored in a variable

I have this little problem in R: I loaded a dataset, modified it and stored it in the variable "mean". Then I used another variable, "dataset", that also contains this dataset:
data<-read.table()
[...modification on data...]
mean<-data
dataset<-mean
I used the variable "dataset" in some other functions of my script, etc., and at the end I want to store the result in a file named "table_mean.csv".
Of course, neither the command write.csv(tabCorr,file=paste("table_",dataset,".csv",sep=""))
nor the one with ...,quote(dataset)... does what I want.
Does anyone know how I can retrieve "mean" (as a string) from "dataset"?
(The aim is that I could reuse this script for other purposes simply by changing, e.g., dataset<-variance.)
Thank you in advance!
I think you are trying to do something like the following code does:
data1 <- 1:4
data2 <- 4:8
## Configuration ###
useThisDataSet <- "data2" # Change to "data1" to use other dataset.
currentDataSet <- get(x = useThisDataSet)
## Your data analysis.
result <- fivenum(currentDataSet)
## Save results.
write.csv(x = result, file = paste0("table_", useThisDataSet, ".csv"))
However, a better alternative would be to wrap your code into a function and pass in your data:
doAnalysis <- function(data, name) {
  result <- fivenum(data)
  write.csv(x = result, file = paste0("table_", name, ".csv"))
}
doAnalysis(data1, "data1")
If you always want to use the name of the object passed into the function as part of the filename, we can use non-standard evaluation to save some typing:
doAnalysisShort <- function(data) {
  result <- fivenum(data)
  # deparse(substitute(data)) turns the argument as typed by the caller into a string
  write.csv(x = result, file = paste0("table_", deparse(substitute(data)), ".csv"))
}
doAnalysisShort(data1)
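
Another way to keep the name available as a string, sketched under the assumption that the datasets can be stored in a named list: select the dataset by name, so the same string is used for the lookup and for the file name. The list datasets, its values, and the temporary output location are made up for this example.

```r
# Hypothetical named list holding the candidate datasets.
datasets <- list(
  mean = c(2, 4, 6, 8),
  variance = c(1, 1, 4, 9)
)

# Configuration: change to "variance" to analyse the other dataset.
dataset_name <- "mean"
current <- datasets[[dataset_name]]

# Analysis and output; the name is reused in the file name.
result <- fivenum(current)
out_file <- file.path(tempdir(), paste0("table_", dataset_name, ".csv"))
write.csv(x = result, file = out_file)

print(basename(out_file))
```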

R, Rscript, Works when variables hard coded, but not when passed as argument

I built the following R script to take a .csv generated by an automated report and split it into several .csv files.
This code works perfectly, and outputs a .csv file for each unique value of "facility" in "todays_data.csv":
disps <- read.csv("/Users/me/Downloads/todays_data.csv", header = TRUE, sep=",")
for (facility in levels(disps$Facility)) {
  temp <- subset(disps, disps$Facility == facility & disps$Alert.End == "")
  temp <- temp[order(temp$Unit, temp$Area),]
  fn <- paste("/Users/me/Documents/information/", facility, "_todays_data.csv", sep = "")
  write.csv(temp, fn, row.names=FALSE)
}
But this does not output anything:
args <- commandArgs(trailingOnly = TRUE)
file <- args[1]
disps <- read.csv(file, header = TRUE, sep=",")
for (facility in levels(disps$Facility)) {
  temp <- subset(disps, disps$Facility == facility & disps$Alert.End == "")
  temp <- temp[order(temp$Unit, temp$Area),]
  fn <- paste("/Users/me/Documents/information/", facility, "_todays_data.csv", sep = "")
  write.csv(temp, fn, row.names=FALSE)
}
The only difference between the two files is that the first hardcodes the path to the .csv file to be split, while the second one has it passed as an argument in the command line using Rscript.
The read.csv() command works with the passed file path, because I can successfully run commands like head(disps) while running the script via Rscript.
Nothing within the for-loop will execute when run via Rscript, but things before and after it will.
Does anyone have any clues as to what I've missed? Thank you.
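
A hedged guess rather than a confirmed diagnosis: the symptom (code before and after the loop runs, the loop body never does) is what happens when levels(disps$Facility) returns NULL, which is the case when Facility is a character column rather than a factor. Since R 4.0.0, read.csv no longer converts strings to factors by default, so this could occur if Rscript picks up a different R version or different options than the interactive session; the question does not say whether that applies. The toy data frame below is made up for illustration.

```r
# Toy stand-in for the data read from the CSV (hypothetical values).
disps <- data.frame(Facility = c("North", "South", "North"),
                    stringsAsFactors = FALSE)

# levels() of a character column is NULL, so this loop body never runs.
n_levels_iter <- 0
for (facility in levels(disps$Facility)) {
  n_levels_iter <- n_levels_iter + 1
}

# unique() gives the distinct values for character and factor columns alike.
n_unique_iter <- 0
for (facility in unique(as.character(disps$Facility))) {
  n_unique_iter <- n_unique_iter + 1
}

print(c(levels = n_levels_iter, unique = n_unique_iter))
```

If this turns out to be the cause, looping over unique(as.character(disps$Facility)) would make the script independent of how the column was read.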
