make file.exists() case insensitive - r

I have a line of code in my script that checks if a file exists (actually, many files, this one line gets looped for a bunch of different files):
file.exists(Sys.glob(file.path(getwd(), "files", "*name*")))
This looks for any file in the directory /files/ that has "name" in it, e.g. "filename.csv". However, some of my files are named "fileName.csv" or "thisfileNAME.csv", and they do not get recognized. How can I make file.exists perform this check in a case-insensitive way?
In my other code I usually convert any imported names or lists to lowercase immediately with the tolower function, but I don't see any option to include that in the file.exists function.

Suggested solution using list.files:
If we have many files, we might want to list the directory only once; otherwise we can put the list.files call in the function (and pass path_to_root_directory instead of found_files to the function):
found_files <- list.files(path_to_root_directory, recursive=FALSE)
Behaviour as file.exists (return value is boolean):
fileExIsTs <- function(file_path, found_files) {
return(tolower(file_path) %in% tolower(found_files))
}
Return value is file with spelling as found in directory or character(0) if no match:
fileExIsTs <- function(file_path, found_files) {
return(found_files[tolower(found_files) %in% tolower(file_path)])
}
Edit:
New solution to fit new requirements:
keywordExists <- function(keyword, found_files) {
return(any(grepl(keyword, found_files, ignore.case=TRUE)))
}
keywordExists("NaMe", found_files=c("filename.csv", "morefilenames.csv"))
Returns:
[1] TRUE
Or
Return value is the matching files with spelling as found in the directory, or character(0) if no match:
keywordExists2 <- function(keyword, found_files) {
return(found_files[grepl(keyword, found_files, ignore.case=TRUE)])
}
keywordExists2("NaMe", found_files=c("filename.csv", "morefilenames.csv"))
Returns:
[1] "filename.csv" "morefilenames.csv"

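The following returns 1 if any file name in the working directory contains "name" in any case, and 0 if none does. Note that ignore.case has to be passed to grepl rather than to max, and that grepl expects a regular expression, not a glob such as "*name*":
max(grepl("name", list.files(), ignore.case = TRUE))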
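Another option, sketched here rather than taken from the answers above, is the ignore.case argument of list.files itself. The helper name any_file_matches and the temporary demo directory are assumptions made for illustration; the default dir mirrors the "files" subdirectory from the question. The pattern is a regular expression, so "name" takes the place of the glob "*name*".

```r
# Hypothetical helper: TRUE if any file name in `dir` matches `pattern`,
# ignoring case. `pattern` is a regular expression, not a glob.
any_file_matches <- function(pattern, dir = file.path(getwd(), "files")) {
  length(list.files(dir, pattern = pattern, ignore.case = TRUE)) > 0
}

# Demonstration in a temporary directory (an assumption for this example).
demo_dir <- file.path(tempdir(), "case_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("fileName.csv", "thisfileNAME.csv")))

any_file_matches("name", demo_dir)   # TRUE
any_file_matches("other", demo_dir)  # FALSE
```

Because the check is a single call, it can be looped over many keywords without listing the directory by hand each time.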

Related

Create a series of new folders in for loop in R

I have created a small script that passes a vector through a loop. In this loop I am using an if/else statement to check whether a folder exists and, if not, to create it. However, I am getting the error: Error in file.exists(i) : invalid 'file' argument. This has to do with file.exists(). I don't understand why this isn't OK. I checked the manual using help, and it seems like this should be working.
folders<- c("RawData", "Output", "BCV", "DEplots", "DEtables", "PathwayOuts", "VolcanoPLots")
for(i in 1:length(folders)){
if (file.exists(i)){
cat(paste0(i, "already exists"))
} else {
cat(paste0(i, "does not exists"))
dir.create(i)
}
}
You are looping over an index: 1:length(folders) is just the vector 1:7, not the values of the folders vector itself. The easiest solution is to loop over the vector itself:
for (i in folders) {
Or, if you still want to loop over the index:
for (i in 1:length(folders)) {
  if (file.exists(folders[i])) {
    # leading space and newline keep the messages readable
    cat(paste0(folders[i], " already exists\n"))
  } else {
    cat(paste0(folders[i], " does not exist\n"))
    dir.create(folders[i])
  }
}
A quick tip: if you are debugging a for-loop, the place to start is to add print(i) at the start of the loop. You would have immediately seen the problem: i was an integer, not the first value of the vector.
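As a minimal sketch of the first suggestion (looping over the values directly), the following uses dir.exists, which tests specifically for directories. The shortened folders vector and the base_dir location under tempdir() are assumptions made so the example can run anywhere without touching the working directory.

```r
folders <- c("RawData", "Output", "BCV")

# Assumed location for the demo; the question creates folders in the
# working directory instead.
base_dir <- file.path(tempdir(), "folder_demo")
dir.create(base_dir, showWarnings = FALSE)

for (folder in folders) {
  target <- file.path(base_dir, folder)
  if (dir.exists(target)) {
    cat(folder, "already exists\n")
  } else {
    cat(folder, "does not exist, creating it\n")
    dir.create(target)
  }
}
```

Running the loop a second time reports every folder as already existing, since the first pass created them.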

How to Change Part of URL With a Function Input in R?

Let's say we have a url in R like:
url <- 'http://google.com/maps'
And the objective is to change the 'maps' part of it. I'd like to write a function where basically I can just input something (e.g. 'maps', 'images'), etc., and the relevant part of the url will automatically change to reflect what I'm typing in.
Is there a way to do this in R, where part of the url can be changed by typing something into a function?
Thanks!
You have to store the part you type into a variable and paste this to the base URL:
base_url <- "http://google.com/"
your_extension <- "maps"
paste0(base_url, your_extension)
[1] "http://google.com/maps"
If you have to start with a fixed URL, use sub to replace the last part:
sub("\\w+$", 'foo', url)
# "http://google.com/foo"
You can use dirname to remove the last part of the URL and paste it with additional custom string.
change_url_part <- function(base_url, string) {
paste(dirname(base_url), string, sep = '/')
}
change_url_part('http://google.com/maps', 'images')
#[1] "http://google.com/images"
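The sub approach can also be wrapped in a function. This is only a sketch: the name replace_last_segment is hypothetical, and it assumes the URL does not end in a slash. It uses "[^/]+$" instead of "\\w+$" so that a final segment containing hyphens or dots is replaced as a whole.

```r
# Hypothetical helper: swap the final path segment of a URL.
# "[^/]+$" matches everything after the last slash.
replace_last_segment <- function(url, segment) {
  sub("[^/]+$", segment, url)
}

replace_last_segment("http://google.com/maps", "images")
# [1] "http://google.com/images"
```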

Converting the argument name of a function into string

I have developed a function which takes a list of files, does some statistical tests and generates an Excel file. In the last line of the function I want it to write an Excel file with the same name as the input object. In my example it gives list_file.xlsx. If I pass another object, let's say tslist_file, it should automatically write tslist_file.xlsx. The function is working properly. Please suggest how I should code the last line of the function so that I can generalise it.
newey<-function(list_files){
tsmom<-do.call(cbind,lapply(list_files,function(x) read_excel(x)[,2]))
tsmom<-xts(tsmom[,1:5],order.by = seq(as.Date("2005-02-01"),length=183,by="months")-1)
names(tsmom)<-c("tsmom121","tsmom123","tsmom126","tsmom129","tsmom1212")
## newey west
newey_west<-function(x){
model<-lm(x~1)
newey_west<-coeftest(model,vcov=NeweyWest(model,verbose=T))
newey_west[c(1,3,4)]
}
## running newey west
cs_nw_full<-do.call(cbind,lapply(tsmom,newey_west))
library(gtools)
p_values<-cs_nw_full[3,]
cs_nw_full[2,]<-paste0(cs_nw_full[2,],stars.pval(p_values))
write.xlsx(cs_nw_full,"list_file.xlsx")
}
Try:
write.xlsx(cs_nw_full, paste0(eval(substitute(list_files)), ".xlsx"))
Edit:
@jeetkamal is absolutely right - you need to use
write.xlsx(cs_nw_full, paste0(deparse(substitute(list_files)), ".xlsx"))
here.
I apologize for the mistake. eval would only work if list_files was e.g. the name of a file, not a list object.
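To see what deparse(substitute(...)) does in isolation, here is a small sketch. The helper output_name and the contents of tslist_file are made up for illustration; only the deparse/substitute idiom comes from the answer.

```r
# Hypothetical helper: build an output file name from the name of the
# variable passed in, not from its value.
output_name <- function(list_files) {
  paste0(deparse(substitute(list_files)), ".xlsx")
}

tslist_file <- c("a.xlsx", "b.xlsx")  # made-up input for illustration
output_name(tslist_file)
# [1] "tslist_file.xlsx"
```

substitute captures the unevaluated argument expression and deparse turns it into a string, so this works as intended when the argument is a plain variable name; passing a longer expression would yield the text of that expression instead.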

Loop works outside function but in functions it doesn't.

Been going around for hours with this. My 1st question online on R. I am trying to create a function that contains a loop. The function takes a vector that the user submits, as in pollutantmean(4:6), and then loads a bunch of csv files (in the directory mentioned) and binds them. What is strange (to me) is that if I assign the variable id and then run the loop without using a function, it works! When I put it inside a function so that the user can supply the id vector, it does nothing. Can someone help? Thank you!!!
pollutantmean<-function(id=1:332)
{
#read files
allfiles<-data.frame()
id<-str_pad(id,3,pad = "0")
direct<-"/Users/ped/Documents/LearningR/"
for (i in id) {
path<-paste(direct,"/",i,".csv",sep="")
file<-read.csv(path)
allfiles<-rbind(allfiles,file)
}
}
Your function is missing a return value. (@Roland)
pollutantmean<-function(id=1:332) {
#read files
allfiles<-data.frame()
id<-str_pad(id,3,pad = "0")
direct<-"/Users/ped/Documents/LearningR/"
for (i in id) {
path<-paste(direct,"/",i,".csv",sep="")
file<-read.csv(path)
allfiles<-rbind(allfiles,file)
}
return(allfiles)
}
Edit:
Your mistake was that you did not specify what the function should give back. In R, objects created inside a function live in the function's own environment (you can think of it as a separate workspace), so you have to specify which object the function should return.
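A minimal sketch of that point, using a made-up function sum_values and a made-up local variable running_sum_local: the local object is not visible after the call, but the returned value can be stored.

```r
# Hypothetical function: `running_sum_local` exists only while it runs.
sum_values <- function(values) {
  running_sum_local <- sum(values)
  running_sum_local  # the last evaluated expression is the return value
}

result <- sum_values(1:10)
result
# [1] 55

# FALSE, assuming no object of this name was defined outside the function
exists("running_sum_local")
```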
Consider also lapply with do.call, which needs no explicit return because a function returns the value of its last expression:
pollutantmean <- function(id=1:332) {
id <- str_pad(id, 3, pad = "0")  # str_pad comes from the stringr package
direct_files <- paste0("/Users/ped/Documents/LearningR/", id, ".csv")
# READ FILES INTO LIST AND ROW BIND
do.call(rbind, lapply(direct_files, read.csv))
}
OK, I got it. I was expecting the objects built inside the function to show up in the R environment, but they don't. R still does all the calculations, though. Thanks a lot for the replies!
pollutantmean<-function(directory,pollutant,id)
{
#read files
allfiles<-data.frame()
id2<-str_pad(id,3,pad = "0")
direct<-paste("/Users/pedroalbuquerque/Documents/Learning R/",directory,sep="")
for (i in id2) {
path<-paste(direct,"/",i,".csv",sep="")
file<-read.csv(path)
allfiles<-rbind(allfiles,file)
}
#averaging polutants
mean(allfiles[,pollutant],na.rm = TRUE)
}
pollutantmean("specdata","nitrate",23:35)

To list files based on unique part of the filename in Unix

I've a directory with below files in it -
111-xxx-typec_2015-10-13.csv.gz
111-xxx-typec_2015-10-14.csv.gz
222-yyy-typec_2015-10-13.csv.gz
222-yyy-typec_2015-10-14.csv.gz
333-zzz-typec_2015-10-13.csv.gz
333-zzz-typec_2015-10-14.csv.gz
444-ppp-typec_2015-10-13.csv.gz
444-ppp-typec_2015-10-14.csv.gz
444-ppp-typec_2015-10-15.csv.gz
I want to see the oldest file of each type (xxx, yyy, etc) only, i.e. the output should be,
111-xxx-typec_2015-10-13.csv.gz
222-yyy-typec_2015-10-13.csv.gz
333-zzz-typec_2015-10-13.csv.gz
444-ppp-typec_2015-10-13.csv.gz
Is there a way to do this?
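You could run ls and pipe its output through an awk script that extracts the type prefix and checks it against a dictionary: if the prefix has already been seen, the line is ignored; otherwise it is printed and the prefix is recorded. Because ls sorts names alphabetically and the dates are in YYYY-MM-DD form, the first file seen for each type is the oldest.
Something like this awk script (the three-argument form of match is a GNU awk extension, so run it with gawk rather than nawk):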
{
match($0, /(.*)-typec/, m);
if (matches[m[1]] == "")
{
print ;
matches[m[1]] = m[1];
}
}

Resources