I have several files in a directory called data.
These files are named like this:
file_file_sd_daf_800_800_log-(3-got)_20100101_20121012
All files share every part of the name except the sd part.
I want to extract only this part of each file name as one column and write it out to a text file.
I list all the files like this:
dir<- list.files("C:\\data", "*.txt", full.names = TRUE)
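Okay, this should work (using regular expressions). Note that the pattern argument of list.files() is a regular expression, not a glob, so "\\.txt$" is the safer way to match names ending in .txt:
dir_ <- list.files("C:\\data", "\\.txt$", full.names = TRUE)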
tmp <- regmatches(dir_, regexec("file_file_(.+)_daf.+", dir_))
sapply(tmp, "[", 2)
A little test with your example:
x <- "file_file_sd_daf_800_800_log-(3-got)_20100101_20121012"
regmatches(x, regexec("file_file_(.+)_daf.+", x))[[1]][2]
# [1] "sd"
You can then write the extracted parts to a file using write().
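As a minimal sketch of that last step, the extracted parts can be written one per line with writeLines(). The second file name and the output name parts.txt are made up for illustration, and the file goes to a temporary folder rather than C:\data.

```r
# Hypothetical file names that differ only in the part after "file_file_"
files <- c("file_file_sd_daf_800_800_log-(3-got)_20100101_20121012.txt",
           "file_file_mean_daf_800_800_log-(3-got)_20100101_20121012.txt")

# Capture the text between "file_file_" and "_daf"
matches <- regmatches(files, regexec("file_file_(.+)_daf.+", files))
parts <- sapply(matches, "[", 2)

# Write one extracted part per line; "parts.txt" is an assumed output name
out_file <- file.path(tempdir(), "parts.txt")
writeLines(parts, out_file)
```

With real data, replace files with basename(dir_) so that the regular expression only sees the file names and not the directory part.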
I have the following example:
# Vector of names
test <- c("banana", "maca")
# Directories
from.dir <- "C:/Users/Windows 10/Documents/teste"
to.dir <- "C:/Users/Windows 10/Documents/teste2"
# Listing files and copy
files <- list.files(path = from.dir, pattern = test, recursive = T)
for (f in files) file.copy(from = f, to = to.dir)
I have a vector with two names (banana and maca).
I have a directory named "teste" that contains two folders. The first folder holds an image named "banana" and the second holds an image named "maca".
I want to copy these two images to another directory named "teste2".
The problem is in list.files(): it only returns the first name, "banana", from the first folder. It does not return "maca" from the second folder.
Because of this, I can't use the for() loop to copy the files.
Thanks, I appreciate any help.
I think you need an additional loop to iterate over each element of test. list.files() probably expects a single string (e.g. "banana"), but you passed a vector instead. It also helps to set full.names = TRUE so that file.copy() receives paths it can find regardless of the working directory:
for (pattern in test) {
  files <- list.files(path = from.dir, pattern = pattern, recursive = TRUE, full.names = TRUE)
  for (f in files) file.copy(from = f, to = to.dir)
}
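An alternative to the extra loop, sketched here under the assumption that the names contain no regex metacharacters, is to collapse the vector into a single regular expression with "|". The folder names folder1 and folder2, the .png extension, and the temporary directories are made up so that the example can run anywhere.

```r
# Set up a throwaway directory tree mirroring the question (hypothetical names)
base <- tempfile("copy_demo_")
from.dir <- file.path(base, "teste")
to.dir <- file.path(base, "teste2")
dir.create(file.path(from.dir, "folder1"), recursive = TRUE)
dir.create(file.path(from.dir, "folder2"), recursive = TRUE)
dir.create(to.dir)
file.create(file.path(from.dir, "folder1", "banana.png"))
file.create(file.path(from.dir, "folder2", "maca.png"))

test <- c("banana", "maca")

# Collapse the names into one regular expression: "banana|maca"
pattern <- paste(test, collapse = "|")
files <- list.files(path = from.dir, pattern = pattern,
                    recursive = TRUE, full.names = TRUE)

# file.copy() is vectorised over 'from', so no loop is needed
copied <- file.copy(from = files, to = to.dir)
```

file.copy() returns one logical value per file, which makes it easy to check afterwards whether every copy succeeded.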
I want to read the sheet whose name contains the word "All" or "all" from an Excel workbook in every subdirectory, selecting the workbooks by a specific pattern.
I have tried list.files(), but it does not work properly.
files_to_read = list.files(
path = common_path, # directory to search within
pattern = "X - GEN", # regex pattern, some explanation below
recursive = TRUE, # search subdirectories
full.names = TRUE # return the full path
)
data_lst = lapply(files_to_read, read.xlsx)
I am assuming your sub-directories have similar names that make them identifiable.
Let's assume that:
your sub-directories start with 'this',
the files saved in each sub-directory start with the file name 'my_file', and
the tab that you are trying to read contains the word 'all'.
If the tab you are reading is in the same position in every file (e.g. the 2nd tab), things are easier, because you can specify it within read.xlsx as sheet = 2. If that is not the case, one way is to write your own function that handles it:
library(openxlsx)
# getting the name of subdirectories starting with the word 'this'
my_dir <- list.files(pattern = "^this", full.names = TRUE)
# getting the name of the files starting with 'my_file', e.g. my_file.xlsx, my_file2.xlsx
my_files <- list.files(my_dir, pattern = "^my_file", full.names = TRUE)
my_read_xlsx <- function(files_to_read, sheets_to_read) {
  # load the workbook to import
  wb <- loadWorkbook(files_to_read)
  # sheet names that contain 'all' (or any other string you specify);
  # ignore.case = TRUE makes the match case-insensitive
  ws <- names(wb)[grepl(sheets_to_read, names(wb), ignore.case = TRUE)]
  # read the matching tab (the first one, if several match)
  xl_data <- read.xlsx(wb, ws[1])
  return(xl_data)
}
# Use the function above to import the tabs containing 'all'
my_list <- lapply(my_files, FUN = function(x) my_read_xlsx(x, sheets_to_read = "all"))
# Converting the list into a data.frame
my_data <- do.call("rbind", my_list)
I want to create a program where I select files with a user-defined prefix in list.files().
My folder has files beginning with various characters. I want to define a variable or function at the beginning of the program that I can then use in list.files().
List of files:
MP201901 MP201902 MP201903 SG201901 SG201902 SG201903 XY201901 XY202001 XY202002
If I use
inpfiles1 <- list.files(path =Input, pattern = "*SG.*.csv", full.names = TRUE)
it gives the correct output, but I want to store the prefix somewhere so that we only have to change the prefix.
I am currently using this code:
A<-"SG"
inpfiles2 <- list.files(path =Input, pattern = "*A*.*.csv", full.names = TRUE)
but this gives an empty result.
With your current code, R doesn't know that A is a variable name, and so it's ignoring your variable and literally using the letter A.
You can use paste0 instead:
A <- "SG"
pattern <- paste0(A, '.*.csv')
You have to concatenate the user-supplied prefix in A with your own suffix, i.e.
A <- "SG"
pattern <- paste0(A, ".*.csv")
inpfiles2 <- list.files(path=Input, pattern=pattern, full.names=TRUE)
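To keep the prefix in one place, the pattern can be built inside a small helper. This is only a sketch: list_by_prefix is a hypothetical name, the demo files are made up, and the pattern is anchored ("^" and "\\.csv$") more strictly than in the answers above. It also assumes the prefix contains no regex metacharacters.

```r
# Hypothetical helper: list CSV files in 'dir' whose names start with 'prefix'
list_by_prefix <- function(dir, prefix) {
  # "^" anchors the prefix at the start; "\\.csv$" matches a literal ".csv" ending
  pattern <- paste0("^", prefix, ".*\\.csv$")
  list.files(path = dir, pattern = pattern, full.names = TRUE)
}

# Demo in a temporary folder with made-up file names
Input <- tempfile("input_")
dir.create(Input)
file.create(file.path(Input, c("MP201901.csv", "SG201901.csv",
                               "SG201902.csv", "XY201901.csv")))

A <- "SG"
inpfiles2 <- list_by_prefix(Input, A)
basename(inpfiles2)
```

Changing A to "MP" or "XY" is then the only edit needed to select a different group of files.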
I am looking for an elegant way to insert a character string (a name) into a file path and create a .csv file. I found one possible solution, but I am looking for another one that works by "inserting" text between specific characters rather than "replacing" it.
#lets start
df <-data.frame()
name <- c("John Johnson")
dir <- c("C:/Users/uzytkownik/Desktop/.csv")
#how to insert "name" vector between "Desktop/" and "." to get:
dir <- c("C:/Users/uzytkownik/Desktop/John Johnson.csv")
write.csv(df, file=dir)
#???
#I found the answer but it is not very elegant in my opinion
library(qdapRegex)
dir2 <- c("C:/Users/uzytkownik/Desktop/ab.csv")
dir2<-rm_between(dir2,'a','b', replacement = name)
> dir2
[1] "C:/Users/uzytkownik/Desktop/John Johnson.csv"
write.csv(df, file=dir2)
I like sprintf syntax for "fill-in-the-blank" style string construction:
name <- c("John Johnson")
sprintf("C:/Users/uzytkownik/Desktop/%s.csv", name)
# [1] "C:/Users/uzytkownik/Desktop/John Johnson.csv"
Another option, if you can't put the %s in the directory string, is to use sub. This is replacing, but it replaces .csv with <name>.csv.
dir <- c("C:/Users/uzytkownik/Desktop/.csv")
sub(".csv", paste0(name, ".csv"), dir, fixed = TRUE)
# [1] "C:/Users/uzytkownik/Desktop/John Johnson.csv"
This should get you what you need.
dir <- "C:/Users/uzytkownik/Desktop/.csv"
name <- "joe depp"
dirsplit <- strsplit(dir,"\\/\\.")
paste0(dirsplit[[1]][1],"/",name,".",dirsplit[[1]][2])
[1] "C:/Users/uzytkownik/Desktop/joe depp.csv"
I find that paste0() is the way to go, so long as you store your directory and extension separately:
path <- "some/path/"
file <- "file"
ext <- ".csv"
write.csv(myobj, file = paste0(path, file, ext))
For those unfamiliar, paste0() is shorthand for paste( , sep="").
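A small variation on the same idea, assuming the directory is stored without a trailing slash, is to let file.path() add the separator and use paste0() only for the extension. The data frame and the use of tempdir() below are placeholders for illustration.

```r
# All names below are placeholders for illustration
out_dir <- tempdir()          # stands in for a real output folder
file <- "John Johnson"
ext <- ".csv"

# file.path() joins path components with "/"; paste0() appends the extension
out_path <- file.path(out_dir, paste0(file, ext))

df <- data.frame(x = 1:3)
write.csv(df, file = out_path, row.names = FALSE)
```

This avoids having to remember whether the stored directory string already ends in "/".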
Let's suppose you have a vector with the desired names for some data structures you want to save, for instance:
names <- c("file_1", "file_2", "file_3")
Now you want to build the path where each file will be saved by adding the name and the extension:
path <- "/Users/Documents/Test_Folder/"
extension <- ".csv"
A simple way to achieve this is to use paste0() to build the full path passed to write.csv() inside an lapply(), as follows (data stands for the object you want to save; paste0() is used because paste() would insert spaces between the parts):
lapply(names, function(x) {
  write.csv(x = data,
            file = paste0(path, x, extension))
})
The good thing about this approach is that you iterate over the vector that contains the file names and the final path is updated automatically. One possible extension is to define a vector of extensions and update the path accordingly.
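For a version that can be run as is, here is a sketch that assumes the objects to save are data frames held in a named list. The list data_list, its contents, and the temporary output folder are all made up for illustration.

```r
# Hypothetical data: one small data frame per file name
data_list <- list(file_1 = data.frame(a = 1:2),
                  file_2 = data.frame(a = 3:4),
                  file_3 = data.frame(a = 5:6))

path <- tempfile("Test_Folder_")   # stands in for the real output folder
dir.create(path)
extension <- ".csv"

# Write each data frame to <path>/<name><extension> and keep the paths
written <- lapply(names(data_list), function(x) {
  out <- file.path(path, paste0(x, extension))
  write.csv(data_list[[x]], file = out, row.names = FALSE)
  out
})
```

Iterating over names(data_list) keeps each file name tied to the object it belongs to.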
I have a folder full of .txt files that I want to loop through and combine into one data frame. Each .txt file holds the data for one subject, and no column in the text files indicates the subject number or the time point in the study (e.g. 1-5). I need to add a line or two of code to my loop that looks for the four-digit string in the file name (each file is labeled something like "4325.5_ERN_No_Startle") and creates one column with 4325 and another column with 5, repeated for every data point for that subject until the loop gets to the next file. I have been looking for a while but am still coming up empty. Any suggestions?
I also have not quite gotten the loop to work:
path = "/Users/me/Desktop/Event Codes/ERN task/ERN text files transferred"
out.file <- ""
file <- ""
file.names <- dir(path, pattern =".txt")
for(i in 1:length(file.names)){
file <- read.table(file.names[i],header=FALSE, fill = TRUE)
out.file <- rbind(out.file, file)
}
which runs okay until I get this error message part way through:
Error in read.table(file.names[i], header = FALSE, fill = TRUE) :
no lines available in input
Consider using a regex to parse the subject and study period from the file name, then bind both to the data inside an lapply() over list.files():
path <- "path/to/text/files"
# Any .txt file whose name starts with four digits, a period, and one digit
file.names <- list.files(path, pattern = "^[0-9]{4}\\.[0-9].*\\.txt$", full.names = TRUE)
# Import all files into a list of data frames and bind the regex extracts
dfList <- lapply(file.names, function(x) {
  # Skip empty files, which make read.table() fail with "no lines available in input"
  if (file.size(x) > 0) {
    # Parse the file name only, so digits in the directory path cannot interfere
    parts <- regmatches(basename(x), regexec("^([0-9]{4})\\.([0-9])", basename(x)))[[1]]
    data.frame(subject = parts[2],
               period = parts[3],
               read.table(x, header = FALSE, fill = TRUE),
               stringsAsFactors = FALSE)
  }
})
# Combine the data frames into one (NULL entries from skipped files are dropped)
df <- do.call(rbind, dfList)
You can try tryCatch(), which gives you NULL instead of stopping with an error:
file <- tryCatch(read.table(file.names[i], header = FALSE, fill = TRUE), error = function(e) NULL)
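Putting the pieces together, the following sketch reads every file with tryCatch(), skips the ones that cannot be read, and adds the two values parsed from the file name. The temporary folder, the file contents, and the column names subject and time_point are assumptions made for the demo.

```r
# Throwaway folder with one valid file and one empty file (made-up contents)
path <- tempfile("ern_")
dir.create(path)
writeLines(c("1 10", "2 20"), file.path(path, "4325.5_ERN_No_Startle.txt"))
file.create(file.path(path, "4326.1_ERN_No_Startle.txt"))  # empty file

file.names <- list.files(path, pattern = "\\.txt$", full.names = TRUE)

dfList <- lapply(file.names, function(f) {
  # Return NULL instead of stopping when a file cannot be read
  dat <- tryCatch(read.table(f, header = FALSE, fill = TRUE),
                  error = function(e) NULL)
  if (is.null(dat)) return(NULL)
  # Subject and time point come from the "4325.5" part of the file name
  parts <- regmatches(basename(f),
                      regexec("^([0-9]{4})\\.([0-9])", basename(f)))[[1]]
  data.frame(subject = parts[2], time_point = parts[3], dat,
             stringsAsFactors = FALSE)
})

# Drop the NULL entries left by unreadable files, then stack the rest
dfList <- Filter(Negate(is.null), dfList)
out.file <- do.call(rbind, dfList)
```

Using full.names = TRUE matters here: without it, read.table() looks for the files in the working directory rather than in path.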