How to create external folder in a loop in R? - r

Ok, actually I have a loop of 50 iterations and need an output file for each of these iterations. What happens is that with my current code I only obtain the output file corresponding to the last iteration, so could you give me code that writes all the files to my current folder? Thank you.
part is a vector of length 50 (really a list, but that does not matter).

Use
for (i in seq_along(part)) {
  write.table(part[[i]], paste(i, "txt", sep = "."))
}
This writes one file per iteration (1.txt, 2.txt, ...). Note that the file name depends on i, so each pass writes a new file instead of overwriting the same one; a loop like 1:(length(vec) - 1) would also have skipped the last element.

How about using list.files()?
That lists all the files in the current directory, or you can specify a directory as the first argument of the function.

Related

How to loop first n folders and their subfolders to read the file in R?

I have a folder with, let's say, 50 folders, each of which contains some number of subfolders (only one level deep). I need to loop through the first 10 folders, then loop through every subfolder and read the single file that each one contains. I tried looping with list.dirs(), but it returns all the subfolders as well.
A general approach would look something like this:
First, use top_folders = list.dirs(recursive = FALSE) to get the main folders.
To get the 10 first, use e.g. head(top_folders, n = 10).
Pass this to list.files() with recursive = TRUE.
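The three steps above can be sketched as follows. The example builds a toy tree under tempdir() so it is self-contained; swap root (an assumed name) for your real top-level folder.

```r
# Toy tree: root/folderXX/sub1/data.txt for 15 top-level folders
root <- file.path(tempdir(), "tree-demo")
for (i in 1:15) {
  sub <- file.path(root, sprintf("folder%02d", i), "sub1")
  dir.create(sub, recursive = TRUE, showWarnings = FALSE)
  writeLines(as.character(i), file.path(sub, "data.txt"))
}

top_folders <- list.dirs(root, recursive = FALSE)  # main folders only
first_ten <- head(top_folders, n = 10)             # keep the first 10

# Each subfolder holds a single file; recurse to reach them.
# list.files() accepts a vector of paths, so one call covers all ten.
files <- list.files(first_ten, recursive = TRUE, full.names = TRUE)

for (f in files) {
  contents <- readLines(f)  # replace with read.table()/read.csv() as needed
}
```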

R: Deleting Files in R based on their Names

I am working with the R programming language.
I found the following related question on Stack Overflow (how to delete a file with R?) which shows how to delete a file with a specific name from the working directory:
#Define the file name that will be deleted
fn <- "foo.txt"
#Check its existence
if (file.exists(fn)) {
  #Delete file if it exists
  file.remove(fn)
}
[1] TRUE
My Question: Is it possible to delete files based on whether the file name contains a specific combination of letters (i.e. LIKE 'fo%' )? This way, all files in the working directory starting with the letters "fo" will be deleted.
What I tried so far:
I thought of a way where I could first create a list of all files in the working directory that I want to delete based on their names:
# create list of all files in working directory
a = getwd()
path.to.csv <- a
files<-list.files(path.to.csv)
my_list = print(files) ## list all files in path
#identify files that match the condition
to_be_deleted = my_list[grepl("fo",unlist(my_list))]
Then, I tried to deleted this file using the command used earlier:
if (file.exists(to_be_deleted)) {
#Delete file if it exists
file.remove(to_be_deleted)
}
This returned the following message:
[1] TRUE TRUE TRUE TRUE TRUE TRUE
Warning message:
In if (file.exists(to_be_deleted)) { :
the condition has length > 1 and only the first element will be used
Does anyone know if I have done this correctly? Suppose if there were multiple files in the working directory where the names of these files started with "fo" - would all of these files have been deleted? Or only the first file in this list?
Can someone please show me how to do this correctly?
Thanks!
file.remove accepts a vector of files to delete.
As for file.exists, it also accepts a vector, but it returns one logical value per file, and that won't work with if, which requires a single logical value.
However, you don't need to check the existence of files that you get from list.files: they obviously exist.
So the simplest fix is to drop the if test and just call file.remove:
files <- list.files(path)
to_be_deleted <- grep("fo", files, value = TRUE)
file.remove(to_be_deleted)
Or even simpler:
to_be_deleted <- list.files(path, pattern = "fo")
file.remove(to_be_deleted)
A few notes, however:
You don't know in advance whether you have the right to delete these files.
You also don't know whether the names are really files or directories (or something else). It's tempting to believe that file.exists answers that question, i.e. that it tells you a name is a real file, but it does not: file.exists(path) returns TRUE also when path is a directory. However, you can detect directories with dir.exists(path). Depending on your specific case, it may or may not be necessary to check for this (for instance, if you know the pattern passed to grep only ever matches files, it's fine).
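Putting those notes together, a defensive sketch can filter out directories with dir.exists() before deleting. The example below works in a throwaway directory under tempdir() so nothing real is removed; point path at your working directory to use it for real.

```r
# Sandbox: a throwaway directory with matching files and a matching subdirectory
path <- file.path(tempdir(), "delete-demo")
dir.create(path, showWarnings = FALSE)
file.create(file.path(path, c("foo.txt", "fob.csv", "bar.txt")))
dir.create(file.path(path, "fodir"), showWarnings = FALSE)

# Names starting with "fo": matches foo.txt, fob.csv, and the directory fodir
candidates <- list.files(path, pattern = "^fo", full.names = TRUE)
to_be_deleted <- candidates[!dir.exists(candidates)]  # keep files, skip directories
file.remove(to_be_deleted)
```

After this runs, only bar.txt and the fodir directory remain; the two matching files are gone.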

list.files taking account of list number in R?

I have a large number of files (>50,000) to analyze. I can get a list of these files with;
myfiles <- list.files(pattern="*output*")
and then loop with
for (file in myfiles) {
"code"
}
The problem is that sometimes my system freezes due to RAM overload, so the only option left is to kill the R session and restart the loop with the same files. How can I modify the list.files call so that I can select only a certain range of files, like 100:200 or 3500:5000? Basically, I would like to skip the files that were already analyzed before the last system freeze.
Any help would be appreciated.
Thanks.
The 'myfiles' object is a vector, so we can create a sequence (:) of positions to subset the object when we loop:
for (file in myfiles[100:200]) {
...code...
}
Also, the files can be split into a list where each element has length 100:
lst1 <- split(myfiles, as.integer(gl(length(myfiles), 100, length(myfiles))))
Then the idea is to loop over the chunks (in parallel or sequentially), remove (rm) the temporary objects, and call gc() to release memory between chunks.
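Assuming the per-file analysis is what eats the memory, the chunked loop can be sketched as below (the file names are stand-ins for the real list.files() result):

```r
myfiles <- sprintf("output_%03d.txt", 1:250)  # stand-in for list.files()

# Chunks of 100: gl() builds the grouping factor, split() cuts the vector.
# With 250 files this gives chunks of 100, 100, and 50.
chunks <- split(myfiles, as.integer(gl(length(myfiles), 100, length(myfiles))))

for (chunk in chunks) {
  for (file in chunk) {
    # dat <- read.table(file)   # the real per-file analysis goes here
  }
  gc()  # release memory before moving on to the next chunk
}
```

If the session dies partway through, restarting from chunk k is just for (chunk in chunks[k:length(chunks)]).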

How could I use lapply with two parameters?

I have a list with the names of my files and an array of integers I want to use each time I open a file. When I open file 1, I want to use numbers[1]; when I open file 14, I want to use numbers[14].
I tried creating n inside the function I would pass to lapply, but without a way to know the index of the file being read, I discarded that idea. Then I tried mapply, but it creates twice as many elements as I want.
I want to execute my function so that each time it uses the element at index n of fnames and the element at index n of my array numbers, and I want to save the results in a list.
My function opens a file and processes the data in that file based on the value of n corresponding to that file (inside a while loop). That's why I need to use the same index for fnames as for numbers.
The function returns a data frame; the intention is for lapply to collect into a list the data frame produced for each file with its corresponding number.
In this way I create the list of names
x<-list.files(pattern=".txt")
This is the array of numbers:
n<-c(4,4,12,6,3,6,8,32,4,4,9,5,5,6,8,3,6,7,3,6,5,3,5)
I do not know how to execute the function with those two parameters so as to get a list with all the results, as if I were running lapply.
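One standard way to do this is Map() (equivalently, mapply(..., SIMPLIFY = FALSE)), which walks two vectors in lockstep and returns a list. In the sketch below, process_file() is a hypothetical stand-in for the poster's routine; only the pairing of x[i] with n[i] is the point.

```r
# Hypothetical stand-in for the real routine: takes one file name and its
# matching number, and returns a data frame.
process_file <- function(fname, n) {
  data.frame(file = fname, n = n, stringsAsFactors = FALSE)
}

x <- c("a.txt", "b.txt", "c.txt")  # stand-in for list.files(pattern = ".txt")
n <- c(4, 4, 12)                   # one number per file

# Map() pairs x[1] with n[1], x[2] with n[2], ... and returns a list
results <- Map(process_file, x, n)
```

The index-based alternative is lapply(seq_along(x), function(i) process_file(x[i], n[i])), which produces the same list.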

gzgrep help in multiple large archives - solaris

In Solaris I need to perform a gzgrep of archives, but I need to filter so I'm not searching ALL the archives: maybe just files with '09.30-12' in the name. Then I want to search IN that particular file or files for a particular expression. I have this close, but it takes WAY too long, as it searches unnecessary files first and matches on those before moving on to the October archives and finding what I need in them. Basically, I need to search any files whose filename contains 'x', then look in those files for text 'y', and output to > fileoutput. Perhaps just change the *.gz to match on a set of files? I cannot figure out how, though. Any help is MUCH appreciated.
Something like this works, but I get way too much output and it takes way too long:
gzgrep 'firstexpression' *.gz > /fileoutput.file
maybe just files with '09.30-12' in the name..
You could say:
gzgrep 'firstexpression' *09.30-12*.gz > fileoutput.file
or
gzgrep pattern_to_search *filename_pattern*.gz > outfile