With the code below, I have imported all .txt files from the working directory.
temp <- list.files(pattern = "\\.txt$")
for (i in seq_along(temp)) { assign(temp[i], read.delim(temp[i])) }
But all of the resulting objects kept the .txt extension in their names.
How can I remove all .txt extensions from data names?
You can strip the extension inside your for loop itself:
for (i in seq_along(temp)) { assign(sub("\\.txt$", "", temp[i]), read.delim(temp[i])) }
Or, if you have already imported the variables, change their names afterwards:
vals <- ls(pattern = "\\.txt$")
for (i in vals) { assign(sub("\\.txt$", "", i), get(i)) }
and then clean up the old names:
rm(list = vals)
On a side note, using assign this way is generally discouraged. It is worth reading up on its potential dangers and side effects; a named list is usually the safer choice.
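As a minimal sketch of the list-based alternative to assign, the files can be read into one named list whose names have the extension stripped. The scratch directory, the file names and the `tables` variable are made up for illustration; `tools::file_path_sans_ext()` is part of the tools package that ships with R.

```r
# Hypothetical setup: two small tab-delimited files in a scratch directory.
demo_dir <- file.path(tempdir(), "txt_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.table(data.frame(a = 1:2), file.path(demo_dir, "first.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(a = 3:4), file.path(demo_dir, "second.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)

# "\\.txt$" is a regular expression: a literal dot, then "txt" at the end.
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Read every file into one list and name each element after its file,
# with the directory and the extension stripped.
tables <- lapply(files, read.delim)
names(tables) <- tools::file_path_sans_ext(basename(files))

names(tables)
```

Each table is then available as `tables$first`, `tables[["second"]]` and so on, without creating one global variable per file.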
I have multiple similarly named but different folders, each containing similarly named but different csv files.
For example, I have three folders named "output", each containing "image.csv" and "cells.csv".
How do I loop through each "output" folder, read each csv file in the folder, and apply a function to these files?
Here's what I tried :
Firstly, I list the folders named "output":
dirs<-list.dirs()
dirs<-dirs[grepl("output",dirs)]
Then I want to set up a function to join both csv files, something like below (the code is incomplete, so please help me correct it):
object_extraction<-function(x){ image<-read.csv(image.csv, header=T, sep=",")
cells<-read.csv(cells.csv, header=T, sep=",")
object<-dplyr::inner_join(cells,image,by="ImageNumber")
return(object)}
Finally I want to loop the function above through the "output" folders
object<-list()
for(i in 1:length(dirs)){
object[[i]]<-object_extraction(dirs[i])
}
Thank you
Build the path passed to read.csv from the function argument, so that it changes with each folder:
object_extraction<-function(x){
image<-read.csv(paste0(x, '/image.csv'), header=TRUE, sep=",")
#header = TRUE and sep = "," are the defaults in read.csv, so this
#also works without specifying them.
cells<-read.csv(paste0(x, '/cells.csv'))
object<-dplyr::inner_join(cells,image,by="ImageNumber")
return(object)
}
and then apply the function to each folder.
dirs <- list.dirs(recursive=FALSE)
dirs <- grep('output', dirs, value = TRUE)
result <- lapply(dirs, object_extraction)
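To make the whole flow concrete, here is a self-contained sketch that builds two throwaway "output" folders and joins the files in each. The folder layout, the column names other than ImageNumber, and the helper name `join_folder` are invented for the example, and base R's `merge()` is swapped in for `dplyr::inner_join()` so the sketch runs without extra packages.

```r
# Hypothetical setup: two "output" folders, each with image.csv and cells.csv.
root <- file.path(tempdir(), "join_demo")
for (d in file.path(root, c("run1", "run2"), "output")) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(ImageNumber = 1:2, Width = c(10, 20)),
            file.path(d, "image.csv"), row.names = FALSE)
  write.csv(data.frame(ImageNumber = c(1, 1, 2), Area = c(5, 6, 7)),
            file.path(d, "cells.csv"), row.names = FALSE)
}

join_folder <- function(folder) {
  image <- read.csv(file.path(folder, "image.csv"))
  cells <- read.csv(file.path(folder, "cells.csv"))
  # Base-R inner join on the shared key (stand-in for dplyr::inner_join).
  merge(cells, image, by = "ImageNumber")
}

# Keep only directories whose last path component is exactly "output".
dirs <- list.dirs(root, recursive = TRUE)
dirs <- dirs[basename(dirs) == "output"]

# Naming the list by folder keeps track of where each result came from.
result <- lapply(dirs, join_folder)
names(result) <- dirs
```

Matching on `basename(dirs) == "output"` is a slightly stricter filter than `grep('output', dirs)`, which would also pick up folders such as "output_old"; which one is appropriate depends on the actual folder names.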
There are two errors I can spot in your code.
First, you need to use the directory name from the dirs variable, e.g.:
object_extraction<-function(x){
image<-read.csv(file.path(x, "image.csv"), header=T, sep=",")
cells<-read.csv(file.path(x, "cells.csv"), header=T, sep=",")
object<-dplyr::inner_join(cells,image,by="ImageNumber")
return(object)
}
And the file names should be strings, "image.csv" and "cells.csv"
HTH
I have some data that I would like to write to a temporary CSV file in R.
Users have the option to specify a filename of their choice, which is stored in an environment (called 'envr') separate from .GlobalEnv
if (!is.null(envr$filename)) {
write.csv(df, file = paste(envr$filename, ".csv", sep = ""))
}
In order to do this successfully, I need to create a temporary file that is assigned to the filename chosen by the user.
if (!is.null(envr$filename)) {
file.name <- get("filename", envir = envr)
tempfile(fileext = ".csv")
write.csv(df, file = file.name)
}
The above if statement however does not do the job, as a CSV file is not saved in $TMPDIR.
How can I easily integrate tempfile() into the first if statement above without having to assign it to a variable name (file.name)?
You can concatenate the session's temporary folder (returned by tempdir()), the file name (the filename variable stored in the envr environment), and the .csv extension, as follows:
if (!is.null(envr$filename)) {
write.csv(df, file = paste0(tempdir(), "/", get("filename", envir = envr), ".csv"))
}
Let me know if it answers your question or if you need any further help.
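A small runnable sketch of the same idea, with `file.path()` supplying the separator. The `envr` and `df` objects below are stand-ins for the ones in the question, and `out_file` and `auto_file` are illustrative names only.

```r
# Hypothetical stand-ins for the objects in the question.
envr <- new.env()
envr$filename <- "my_results"
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

if (!is.null(envr$filename)) {
  # file.path() inserts the separator; tempdir() is the session's temp folder.
  out_file <- file.path(tempdir(), paste0(envr$filename, ".csv"))
  write.csv(df, file = out_file, row.names = FALSE)
}

file.exists(out_file)

# tempfile() only returns a unique path; it does not create the file.
# Its result has to be captured (or passed inline) to write.csv().
auto_file <- tempfile(pattern = "my_results_", fileext = ".csv")
write.csv(df, file = auto_file, row.names = FALSE)
```

If no intermediate variable is wanted, the `file.path(...)` expression can be passed directly as the `file` argument; the variable is used here only so the result can be checked afterwards.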
I have some .vcf files. I have selected those files from my directory and want to convert them to two other formats.
I am a bit confused about using if and else if here. I want to do it like this: if there is no .bgz file for the i-th .vcf file, I want to convert it to a .bgz file while keeping the original file.
If there is already a .bgz file but no .bgz.tbi file for the i-th .bgz file, then I want to create the .bgz.tbi file while keeping the .bgz file that I got from the .vcf file.
Can someone please help me finish this loop? It works for the if condition, but I don't know how to proceed from there.
path.file<-"/mypath/for/files/"
all.files <- list.files("/mypath/for/files")
all.files <- all.files[grepl(".vcf$",all.files)]
for (i in 1:length(all.files)){
if(!exists(paste0(all.files[i],".bgz"))){
bgzip(paste0(path.file,all.files[i]), overwrite=FALSE)
}else{(!exists(paste0(all.files[i],".bgz",".tbi"))){
#if(!exists(paste0(all.files[i],".bgz",".tbi"))){
indexTabix(paste0(paste0(path.file,all.files[i]),".bgz"), format="vcf")
}
}
Try this (not tested):
library(Rsamtools) #assumed source of bgzip() and indexTabix()
#get VCF files with path
all.files <- list.files("/mypath/for/files", pattern = "\\.vcf$",
full.names = TRUE)
for (i in all.files) {
#make output names, so we don't mess about with paste
file_bgz <- paste0(i, ".bgz")
file_bgz_tbi <- paste0(i, ".bgz.tbi")
#compress only if the .bgz file is not on disk yet
#(file.exists() checks files; exists() checks R objects)
if(!file.exists(file_bgz))
bgzip(i, file_bgz)
#index only if the .bgz.tbi file is not on disk yet
if(!file.exists(file_bgz_tbi))
indexTabix(file_bgz, format = "vcf")
}
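One detail worth checking in loops like this: `exists()` tests whether an R object with that name is defined, while `file.exists()` tests whether a file is present on disk. The sketch below shows the difference on a scratch file and wraps the two checks in a hypothetical helper, `pending_steps()`, which is not part of any package.

```r
# A scratch file that exists on disk but not as an R object.
vcf_path <- tempfile(fileext = ".vcf")
writeLines("##fileformat=VCFv4.2", vcf_path)

exists(vcf_path)       # looks for an R object with that name
file.exists(vcf_path)  # looks on the file system

# Hypothetical helper: report which steps are still pending for one VCF file.
pending_steps <- function(vcf) {
  bgz <- paste0(vcf, ".bgz")
  tbi <- paste0(bgz, ".tbi")
  c(compress = !file.exists(bgz), index = !file.exists(tbi))
}

pending_steps(vcf_path)
```

Two independent `if` checks, rather than `if`/`else`, let a single pass both compress and index a file that has neither output yet.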
I need to create a function in R that reads all the files in a folder (let's assume that all files are tables in tab-delimited format) and creates objects with the same names in the global environment. I did something similar to this (see code below): I was able to write a function that reads all the files in the folder, makes some changes to the first column of each file, and writes it back into the folder. But I couldn't find out how to assign the files I read to objects that will stay in the global environment.
changeCol1 <- function () {
filesInfolder <- list.files()
for (i in 1:length(filesInfolder)){
wrkngFile <- read.table(filesInfolder[i])
wrkngFile[,1] <- gsub(0,1,wrkngFile[,1])
write.table(wrkngFile, file = filesInfolder[i], quote = F, sep = "\t")
}
}
You are much better off assigning them all to elements of a named list (and it's pretty easy to do, too):
changeCol1 <- function () {
filesInfolder <- list.files()
lapply(filesInfolder, function(fname) {
wrkngFile <- read.table(fname)
wrkngFile[,1] <- gsub(0, 1, wrkngFile[,1])
write.table(wrkngFile, file=fname, quote=FALSE, sep="\t")
wrkngFile
}) -> data
names(data) <- filesInfolder
data
}
a_list_full_of_data <- changeCol1()
Also, F will come back to haunt you some day: F and T are ordinary variables that can be reassigned, whereas FALSE and TRUE are reserved words and cannot.
Add this to your loop after making the changes:
assign(filesInfolder[i], wrkngFile, envir=globalenv())
If you want to put them into a list instead, declare an empty list outside your loop:
mylist = list()
Then, within your loop, store each table under its file name:
mylist[[filesInfolder[i]]] = wrkngFile
And then you can access each object by looking at:
mylist[[filename]]
from the global env.
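For comparison, here is a minimal sketch of going from a named list to separate objects in one step with `list2env()`. The `tables` list and the `target` environment are invented for the example; a fresh environment is used so that the sketch does not touch the global environment.

```r
# Hypothetical named list, standing in for tables read from files.
tables <- list(first = data.frame(a = 1:2), second = data.frame(a = 3:4))

# Access by name without creating any extra variables.
tables[["second"]]$a

# If separate objects are really needed, list2env() copies every list
# element into an environment in one call. A fresh environment is used
# here; passing envir = globalenv() would target the global environment.
target <- new.env()
list2env(tables, envir = target)

ls(target)
```

Keeping the data in the list and only exporting it when necessary tends to make later steps (looping, combining, saving) simpler than working with many loose variables.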
I would like to scan multiple files for strings in R and know which file names have that string.
Is there a way to do this with something like grep, cat, readLines in a function maybe?
If I scan the files using:
fileNames <- Sys.glob("*.csv")
then maybe something like:
for (f in fileNames) {
stuff <- read.csv(fileName, sep = ",")
grep("string")
}
names(res) <- substr(filenames, 1, 30)
Or maybe even better, a loop like this:
for( f in filenames ){
cat("string", file=f)
}
for( f in filenames) {
cat(readLines(f), sep="\n")
}
This code doesn't work; I'm just trying to think this through. I'm certain there is a better way to do this. It sounds simple, but I can't get it right.
I want to scan files for strings and then have the output of the filenames where the string was found. I have not found an example to do this in R.
Suggestions?
Note that in your first code example you use f as the loop variable, while inside the loop you use fileName instead (R is also case sensitive, so fileNames and filenames are different objects).
If it's unlikely that your search string contains the CSV delimiter, you can indeed use readLines(..) together with grep(..). grep(..) then returns a vector with the indices of the lines in which the string occurs, so a non-empty result means the file contains a match. Try the following code:
fileNames <- Sys.glob("*.csv")
for (fileName in fileNames) {
if (length(grep("string", readLines(fileName))) > 0) { print(fileName)}
}
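If the matching file names should be collected rather than printed, a variant along these lines may help. It is a sketch only: the scratch files, the search term "needle" and the names `has_match` and `matching_files` are made up for the example.

```r
# Hypothetical setup: three small CSV files, two of which contain "needle".
demo_dir <- file.path(tempdir(), "grep_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("id,label", "1,needle"), file.path(demo_dir, "a.csv"))
writeLines(c("id,label", "1,hay"), file.path(demo_dir, "b.csv"))
writeLines(c("id,label", "1,hay", "2,a needle here"),
           file.path(demo_dir, "c.csv"))

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# TRUE for each file with at least one line containing the string.
# fixed = TRUE treats the search term as literal text, not as a regex.
has_match <- vapply(
  files,
  function(f) any(grepl("needle", readLines(f), fixed = TRUE)),
  logical(1)
)

matching_files <- basename(files[has_match])
matching_files
```

Using `fixed = TRUE` avoids surprises when the search string contains characters such as `.` or `(` that have a special meaning in regular expressions.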