Converting argument to partial string - r

I'm sure this is pretty basic, but I haven't been able to find an answer on stackoverflow.
The basics of what I'm working with is
f1 <- function(x) {
setwd("~/Rdir/x")
col1 <- f2(...)
col2 <- f3(...)
genelist <- data.frame(col1, col2)
write.csv(genelist, file="x.csv")
}
Essentially what I want is for x to be replaced by whatever I input for example
f1(test) would save a file called test.csv into the directory Rdir/test.
I would post a more complete code sample of what I'm working with, but it is very long.

You can use ?paste:
setwd(paste("~/Rdir/", x, sep=""));
and
write.csv(genelist, file=paste(x, ".csv", sep=""))
in your example. However, it might be more straightforward not to change the working directory and instead to specify the full path when saving:
write.csv(genelist, file=paste("~/Rdir/", x, "/", x, ".csv", sep=""))
but be aware that this will fail with an error if the directory does not exist. You could have a look at ?dir.create to create the directory first, in case it does not exist.

You can create the filename with paste0 and the path with file.path:
x <- "test"
file.path("~/Rdir", x, paste0(x, ".csv"))
# "~/Rdir/test/test.csv"
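Putting the two answers together, a minimal sketch of f1 might look like the following. It assumes x is passed as a string (f1("test"), not f1(test)). The base_dir argument and the placeholder data frame are illustrative additions that are not in the question, and tempdir() stands in for ~/Rdir so the example can be run anywhere.

```r
f1 <- function(x, base_dir = "~/Rdir") {
  # Build <base_dir>/<x> and create it if it does not exist yet
  out_dir <- file.path(base_dir, x)
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

  # Placeholder data; in the question these columns come from f2() and f3()
  genelist <- data.frame(col1 = 1:3, col2 = c("a", "b", "c"))

  # Build <base_dir>/<x>/<x>.csv and write the file
  out_file <- file.path(out_dir, paste0(x, ".csv"))
  write.csv(genelist, file = out_file, row.names = FALSE)
  invisible(out_file)
}

# Usage: write into a temporary directory instead of ~/Rdir
saved <- f1("test", base_dir = tempdir())
file.exists(saved)
```

Returning the path invisibly is a design choice made here so the caller can check or reuse the file location.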

Related

Config file in a csv (or txt) format

I want to create a config file. In an R file it would look like the following:
#file:config.R
min_birthday_year <- 1920
max_birthday <- Sys.Date() %m+% months(9)
min_startdate_year <- 2010
max_startdate_year <- 2022
And in the main script I would do: source("config.R") .
However, now I want to source the config data from a .csv file. Does anyone have any idea how to? The file could also be in a .txt format
First thing I would suggest is looking into the config package.
It allows you to specify variables in a yaml text file. I haven't used it but it seems pretty neat and looks like it may be a good solution.
If you don't want to use that, then if your csv is something like this, with var names in one column and values in the next:
min_birthday_year,1920
max_birthday,Sys.Date() %m+% months(9)
min_startdate_year,2010
max_startdate_year,2022
then you could do something like this:
# Read in the file
# assuming that names are in one column and values in another
# will create vars using vals from second col with names from first
config <- read.table("config.csv", sep = ",", stringsAsFactors = FALSE)
# mapply with assign, with var names in one vector and values in the other
# eval(parse()) call to evaluate value as an expression - needed to evaluate the Sys.Date() thing
# (lubridate must be loaded for %m+% to work).
# tryCatch in case you add a string value to the csv at some point, which will throw an error in the `eval` call
mapply(
function(x, y) {
z <- tryCatch(
eval(parse(text = y)),
error = function(e) y
)
assign(x, z, inherits = TRUE)
},
config[[1]],
config[[2]]
)
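If creating global variables with assign feels too implicit, one variation is to collect the values in a named list. The sketch below is an illustration under assumptions, not part of the answer above: the inline config_text stands in for config.csv, the label row is made up to show the string fallback, and the Sys.Date() row is left out so that no extra package is needed. Keep in mind that eval(parse()) runs whatever code is in the file, so it only suits config files you trust.

```r
# Hypothetical config contents; in practice read.table("config.csv", ...) would be used
config_text <- c(
  "min_birthday_year,1920",
  "min_startdate_year,2010",
  "max_startdate_year,2022",
  "label,some text"
)
config <- read.table(text = config_text, sep = ",",
                     col.names = c("name", "value"),
                     stringsAsFactors = FALSE)

# Evaluate a value as R code; fall back to the raw string if that fails
parse_value <- function(v) {
  tryCatch(eval(parse(text = v)), error = function(e) v)
}

# Named list: one element per config row
cfg <- setNames(lapply(config$value, parse_value), config$name)
cfg$min_birthday_year
cfg$label
```

Values are then accessed as cfg$min_startdate_year and so on, which keeps the configuration separate from other variables in the workspace.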

Importing files into R if filename matches specified criteria

Using R I am trying to loop the import of csv files iff the filename contains a specific string
For example, I have a list of files with names 'file01042016_abc.csv', 'file020142016_abc.csv', 'file03042016_abc.csv'...'file26092019_abc.csv' and I have a list of specific values in the format '01042016', '05042016', '09042016', etc.
I would like to only import the files if the filename contains the string value in the second list.
I can import them altogether (shown below) but there are several thousand files and takes a considerable amount of time so would like to reduce it by importing only the files needed based on condition mentioned above.
files <- list.files(path)
for (i in 1:length(files)) {
assign(paste("Df", files[i], sep = "_"), read.csv(paste(path, files[i], sep='')))
}
Any help/suggestions would be greatly appreciated. Thank you.
Using regex along with grepl:
files <- list.files(path)
formats <- c("01042016", "05042016", "09042016")
regex <- paste(formats, collapse="|")
sapply(files, function(x) {
if (grepl(regex, x)) {
assign(paste("Df", x, sep = "_"), read.csv(paste(path, x, sep='')), envir = .GlobalEnv)
}
})
The strategy here is to generate a single regex alternation containing all numeric filename fragments which would whitelist a file as a candidate to be read. For the sample data given above, regex would become:
01042016|05042016|09042016
Then, we call grepl on each file to see if it matches one of the whitelisted patterns. Note that I switched to using sapply as list.files returns a character vector of filenames.
We can just prefilter the files vector, and then loop as normal.
files0 <- c('file01042016_abc.csv', 'file020142016_abc.csv',
'file03042016_abc.csv', 'file26092019_abc.csv',
'file09042016_abc.csv')
k <- c('01042016', '05042016', '09042016')
pat <- paste(k, collapse="|")
files <- grep(pat, files0, value=TRUE)
files
# [1] "file01042016_abc.csv" "file09042016_abc.csv"
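Once the file names are filtered, the matching files can be read into a single named list instead of separate variables. The following is a rough sketch under assumptions: the demo directory, the tiny CSV contents, and the variable names wanted and dfs are made up for illustration, and list.files does the filtering directly through its pattern argument.

```r
# Create a few small CSV files in a temporary directory (illustrative data)
path <- file.path(tempdir(), "import_demo")
dir.create(path, showWarnings = FALSE)
all_names <- c("file01042016_abc.csv", "file03042016_abc.csv",
               "file09042016_abc.csv")
for (nm in all_names) {
  write.csv(data.frame(id = 1:2, src = nm), file.path(path, nm),
            row.names = FALSE)
}

# Keep only the files whose names contain one of the wanted dates
k <- c("01042016", "05042016", "09042016")
pat <- paste(k, collapse = "|")
wanted <- list.files(path, pattern = pat, full.names = TRUE)

# Read them into one list, named after the files
dfs <- lapply(wanted, read.csv)
names(dfs) <- basename(wanted)
names(dfs)
```

A list keeps the imported data frames together, so they can be processed with lapply later rather than looked up by constructed variable names.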

How to insert text at a specific position in a directory path in R

I am looking for an elegant way to insert a character string (name) into a directory path and create a .csv file. I found one possible solution, but I am looking for another that works by "inserting" text between specific characters rather than "replacing" it.
#lets start
df <-data.frame()
name <- c("John Johnson")
dir <- c("C:/Users/uzytkownik/Desktop/.csv")
#how to insert "name" vector between "Desktop/" and "." to get:
dir <- c("C:/Users/uzytkownik/Desktop/John Johnson.csv")
write.csv(df, file=dir)
#???
#I found the answer but it is not very elegant in my opinion
library(qdapRegex)
dir2 <- c("C:/Users/uzytkownik/Desktop/ab.csv")
dir2<-rm_between(dir2,'a','b', replacement = name)
> dir2
[1] "C:/Users/uzytkownik/Desktop/John Johnson.csv"
write.csv(df, file=dir2)
I like sprintf syntax for "fill-in-the-blank" style string construction:
name <- c("John Johnson")
sprintf("C:/Users/uzytkownik/Desktop/%s.csv", name)
# [1] "C:/Users/uzytkownik/Desktop/John Johnson.csv"
Another option, if you can't put the %s in the directory string, is to use sub. This is replacing, but it replaces .csv with <name>.csv.
dir <- c("C:/Users/uzytkownik/Desktop/.csv")
sub(".csv", paste0(name, ".csv"), dir, fixed = TRUE)
# [1] "C:/Users/uzytkownik/Desktop/John Johnson.csv"
This should get you what you need.
dir <- "C:/Users/uzytkownik/Desktop/.csv"
name <- "joe depp"
dirsplit <- strsplit(dir,"\\/\\.")
paste0(dirsplit[[1]][1],"/",name,".",dirsplit[[1]][2])
[1] "C:/Users/uzytkownik/Desktop/joe depp.csv"
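A base R variation on the same split-and-rejoin idea is to let dirname() and basename() do the splitting. This is only a sketch; it assumes the template path always ends in a bare extension such as ".csv", and new_path is a name chosen here for illustration.

```r
dir <- "C:/Users/uzytkownik/Desktop/.csv"
name <- "John Johnson"

# dirname() gives the folder, basename() gives the trailing ".csv"
new_path <- file.path(dirname(dir), paste0(name, basename(dir)))
new_path
```

This avoids writing a regular expression, at the cost of relying on the template having nothing before the dot in its last component.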
I find that paste0() is the way to go, so long as you store your directory and extension separately:
path <- "some/path/"
file <- "file"
ext <- ".csv"
write.csv(myobj, file = paste0(path, file, ext))
For those unfamiliar, paste0() is shorthand for paste( , sep="").
Let's suppose you have a vector with the desired names for some data structures you want to save, for instance:
names <- c("file_1", "file_2", "file_3")
Now, you want to build the path where each file will be saved by adding the name plus the extension:
path <- "/Users/Documents/Test_Folder/"
extension <- ".csv"
A simple way to achieve this is to use paste0() to create the full path as input for write.csv() inside an lapply, as follows:
lapply(names, function(x) {
# data is a placeholder for the object you want to save
write.csv(x = data,
file = paste0(path, x, extension))
}
)
The advantage of this approach is that you can iterate over the vector of file names and the final path is built automatically. Note that paste() with its default separator would insert spaces into the path, which is why paste0() is used. One possible extension is to define a vector of extensions and update the path accordingly.
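To make the iteration concrete, here is a small sketch in which each name has its own data frame. The datasets list and its contents are hypothetical, and tempdir() stands in for the folder above so the example can be run as is.

```r
# Hypothetical data: one data frame per desired file name
datasets <- list(
  file_1 = data.frame(a = 1:2),
  file_2 = data.frame(a = 3:4),
  file_3 = data.frame(a = 5:6)
)
path <- tempdir()       # stands in for "/Users/Documents/Test_Folder/"
extension <- ".csv"

# Write each data frame to <path>/<name>.csv and collect the paths
written <- vapply(names(datasets), function(nm) {
  out <- file.path(path, paste0(nm, extension))
  write.csv(datasets[[nm]], file = out, row.names = FALSE)
  out
}, character(1))
written
```

Using vapply with character(1) is a choice made here so that the result is a named character vector of the files written.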

looping over all files in the same directory in R

I want to run the following code in R on all the files. I wrote a for loop for that, but when I run it, it is applied to only one file rather than all of them. By the way, my files do not have a header.
You use [[ to subset something from peaks. However, once a file has been read using its name, peaks is a plain data frame that no longer has any reference to the file name. Thus, you just have to get rid of the [[i]].
for (i in filelist.coverages) {
peaks <- read.delim(i, sep='', header=F)
PeakSizes <- c(PeakSizes, peaks$V3 - peaks$V2)
}
Because i holds a new file name on each pass through the loop, read.delim(i, ...) fills peaks with the contents of a different file each time.
In your code, i refers to a file name. Use indices instead.
And, by the way, don't use setwd; use the full.names = TRUE option in list.files. Also preallocate the result. Since each file yields one peak size per row, a list with one slot per file works: PeakSizes <- vector("list", length(filelist.coverages)).
So do:
filelist.coverages <- list.files('K:/prostate_cancer_porto/H3K27me3_ChIPseq/',
pattern = 'island.bed', full.names = TRUE)
##all 97 bed files
PeakSizes <- vector("list", length(filelist.coverages))
for (i in seq_along(filelist.coverages)) {
peaks <- read.delim(filelist.coverages[i], sep = '', header = FALSE)
PeakSizes[[i]] <- peaks$V3 - peaks$V2
}
PeakSizes <- unlist(PeakSizes)
Or you could simply use lapply (or purrr::map) and flatten the result:
PeakSizes <- unlist(lapply(filelist.coverages, function(file) {
peaks <- read.delim(file, sep = '', header = FALSE)
peaks$V3 - peaks$V2
}))
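The approach can be checked on a couple of made-up files. In this sketch the demo directory, the file contents, and the names demo_dir, filelist and sizes_per_file are all assumptions for illustration. Each file contributes one peak size per row, so the per-file results are collected in a list and flattened at the end.

```r
# Write two small header-less, whitespace-separated files (made-up values)
demo_dir <- file.path(tempdir(), "bed_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("chr1 100 250", "chr1 400 900"),
           file.path(demo_dir, "a_island.bed"))
writeLines("chr2 10 40", file.path(demo_dir, "b_island.bed"))

filelist <- list.files(demo_dir, pattern = "island.bed", full.names = TRUE)

# One vector of peak sizes per file, then flatten into a single vector
sizes_per_file <- lapply(filelist, function(file) {
  peaks <- read.delim(file, sep = "", header = FALSE)
  peaks$V3 - peaks$V2
})
PeakSizes <- unlist(sizes_per_file)
PeakSizes
```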

R file inputs and histogram

I am a bit new to R and trying to learn but I am confused as to how to fix a problem that I have stumbled upon. I am trying to input multiple files so that I may make one histogram per file. The code works well, especially with just one file, but I have encountered a problem when I enter multiple files.
EDIT: Ending code
library("scales")
library("tcltk")
File.names<-(tk_choose.files(default="", caption="Choose your files", multi=TRUE, filters=NULL, index=1))
Num.Files<-NROW(File.names)
dat <- lapply(File.names,read.table,header = TRUE)
names(dat) <- paste("f", seq_along(dat), sep="")
tmp <- stack(lapply(dat,function(x) x[,14]))
require(ggplot2)
ggplot(tmp,aes(x = values)) +
facet_wrap(~ind) +
geom_histogram(aes(y=..count../sum(..count..)))
Well, here's something to get you started (but I can't be sure it will work exactly for you, since your code isn't reproducible):
dat <- lapply(File.names,read.table,header = TRUE)
names(dat) <- paste("f", seq_along(dat), sep="")
tmp <- stack(lapply(dat,function(x) x[,14]))
require(ggplot2)
ggplot(tmp,aes(x = values)) +
facet_wrap(~ind) +
geom_histogram()
Ditch everything you wrote after this line:
File.names<-(tk_choose.files(default="", caption="Choose your files", multi=TRUE, filters=NULL, index=1))
and use the above code instead.
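To see what stack() produces and why the list needs one name per file, here is a small sketch on made-up data. The two data frames and the score column are hypothetical stand-ins for the tables read with read.table, and column 1 is used in place of column 14.

```r
# Two made-up data frames standing in for the files read with read.table
dat <- list(
  data.frame(score = c(1.5, 2.0, 2.5)),
  data.frame(score = c(3.0, 3.5))
)
names(dat) <- paste0("f", seq_along(dat))

# stack() turns a named list of vectors into a long data frame
# with a 'values' column and an 'ind' column holding the list names
tmp <- stack(lapply(dat, function(x) x[, 1]))
str(tmp)
table(tmp$ind)
```

The ind column is what facet_wrap(~ind) uses to draw one histogram per file, so every list element needs its own name.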
A few other explanations (BlueTrin explained the first error):
for (i in Num.Files){
f<- read.table(File.names[i],header=TRUE)
}
Because Num.Files is a single number, for (i in Num.Files) runs only once, with i equal to the number of files, so only the last file is read. Even if you loop over 1:Num.Files, each pass overwrites f, so you would still be left with only the last file stored in f.
colnames(f) <- c(1:18)
histoCol <- c(f$'14')
You don't need the c() function here. Just 1:18 is sufficient. But numbers as column names are generally awkward, and should probably be avoided.
f(Num.Files) <- paste("f", 1:length(Num.Files), sep = "") : could not find function "f<-"
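This specific error happens because f(Num.Files) <- ... asks R to call a replacement function named `f<-`, which does not exist. You cannot assign a string to the result of a function call this way.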
This should load the values into a list:
library("lattice");
library("tcltk");
File.names<-(tk_choose.files(default="", caption="Choose your files", multi=TRUE, filters=NULL, index=1));
Num.Files<-NROW(File.names);
result_list = list();
#f(Num.Files)<-paste("f", 1:length(Num.Files), sep="");
#ls();
for (i in seq_along(File.names)) {
full_path = File.names[i];
short_name = basename(full_path);
result_list[[short_name]] = read.table(full_path,header=TRUE);
}
Once you run this program, you can type 'result_list$' without the quotes and press TAB for completion. Alternatively you can use result_list[[1]] for example to access the first table.
result_list is a variable of type list. It is a container that supports indexing by a label, which is the filename in this case. (I replaced the full filename with the short filename because the full filename is a bit ugly as a list label, but feel free to change it back.)
Note that f is not a reserved word in R, so using it as a variable name is allowed. The earlier error came from writing f(Num.Files) <- ..., which R treats as a call to a replacement function `f<-`. A descriptive name such as result_list is still easier to read than f.
I hope it is enough, with the other solution, to get you started !
