Comparing files in R using a for loop when files are missing from a series

I used the code below, which compares two text files in a for loop and logs the differences to a log file. The file names are in a series, for example File_1, File_2, and so on. When a file in the series is missing, the code stops with the error "No such file or directory".
I then added an if condition to check whether the files exist, but now I get the error shown below.
Please help me skip the comparison for a nonexistent file.
Code:
for(i in 1:length){
prod_file_res_name <- sprintf("path/Query_Prod_%s.txt", i)
beta_file_res_name <- sprintf("path/Query_Beta_%s.txt", i)
if (exists('prod_file_res_name' && 'beta_file_res_name')){
res <- tools::Rdiff(prod_file_res_name, beta_file_res_name, Log = TRUE)
if(res[2] != "character(0)"){
write(toString(res[2]), file = "LogFile.txt",append=TRUE)
}
else{
elsevar <- sprintf("No difference found between prod and beta responses for query %s", i)
print(elsevar)
}
}
}
Error:
Error in "prod_file_res_name" && "beta_file_res_name" :
invalid 'x' type in 'x && y'

exists checks whether an R object with the given name is defined in your environment; it says nothing about files on disk. It expects a single character string, and && cannot combine two character strings, which is what caused the "invalid 'x' type" error. Even with that fixed, you create the R objects prod_file_res_name and beta_file_res_name before you check them, so the exists call would always return TRUE. What you are looking for is the file.exists function, which checks whether a file exists at the given path (relative paths are resolved against your working directory):
file.exists(prod_file_res_name) && file.exists(beta_file_res_name)
The original "No such file or directory" error occurred because the R objects holding the file names existed, but the files they point to did not.
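A minimal sketch of the loop with the file.exists check, skipping any index whose files are missing. The temporary directory, the file contents, and the use of identical(readLines(...)) as a stand-in for the tools::Rdiff call are assumptions made only so the example runs on its own:

```r
# Hypothetical setup: files 1 and 3 exist, file 2 is missing from the series.
query_dir <- tempfile("queries_")
dir.create(query_dir)
for (i in c(1, 3)) {
  writeLines(paste("result", i), file.path(query_dir, sprintf("Query_Prod_%s.txt", i)))
  writeLines(paste("result", i), file.path(query_dir, sprintf("Query_Beta_%s.txt", i)))
}

compared <- integer(0)
skipped <- integer(0)
for (i in 1:3) {
  prod_file <- file.path(query_dir, sprintf("Query_Prod_%s.txt", i))
  beta_file <- file.path(query_dir, sprintf("Query_Beta_%s.txt", i))

  # Skip this index unless both files are actually on disk.
  if (!(file.exists(prod_file) && file.exists(beta_file))) {
    skipped <- c(skipped, i)
    next
  }

  # Stand-in for the real comparison (tools::Rdiff in the question).
  same <- identical(readLines(prod_file), readLines(beta_file))
  compared <- c(compared, i)
}
```

Using next keeps the comparison code at one indentation level; wrapping the comparison in the if block, as in the question, works equally well.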

Since exists() looks up single objects (from Documentation):
Is an Object Defined?
Description
Look for an R object of the given name and possibly return it
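'prod_file_res_name' && 'beta_file_res_name' doesn't work as its argument.
Rewrite
exists("prod_file_res_name" && "beta_file_res_name")
to:
exists("prod_file_res_name") && exists("beta_file_res_name")
Note that this only fixes the type error: it tests whether the two variables are defined, not whether the files are on disk. To test the files, call file.exists() on the variables' values.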
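The difference between the two functions can be seen in a small sketch. The variable name missing_path and the file name are made up for illustration:

```r
# A path to a file that is assumed not to exist (hypothetical name).
missing_path <- file.path(tempdir(), "no_such_file_for_this_sketch.txt")

variable_defined <- exists("missing_path")   # TRUE: the variable is defined
file_on_disk <- file.exists(missing_path)    # FALSE: nothing at that path
```

exists() answers a question about the R session, file.exists() answers a question about the file system.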

Related

Create a series of new folders in for loop in R

I have created a small script that passes a vector through a loop. In the loop I use an if/else statement to check whether a folder exists and, if not, to create it. However, I get the error: Error in file.exists(i) : invalid 'file' argument. This has to do with file.exists(). I don't understand why this isn't OK. I checked the manual using help, and it seems like this should work.
folders<- c("RawData", "Output", "BCV", "DEplots", "DEtables", "PathwayOuts", "VolcanoPLots")
for(i in 1:length(folders)){
if (file.exists(i)){
cat(paste0(i, "already exists"))
} else {
cat(paste0(i, "does not exists"))
dir.create(i)
}
}
You are looping over an index: 1:length(folders) is just the vector 1:7, not the values of the folders vector itself, so file.exists() receives an integer instead of a path. The easiest solution is to loop over the vector itself:
for (i in folders) {
Or, if you still want to loop over the index:
for (i in 1:length(folders)) {
if (file.exists(folders[i])){
# leading space and newline keep the messages readable
cat(paste0(folders[i], " already exists\n"))
}
else {
cat(paste0(folders[i], " does not exist\n"))
dir.create(folders[i])
}
}
A quick tip: if you are debugging a for-loop, the place to start is to add print(i) at the start of the loop. You would have immediately seen the problem: i was an integer, not the first value of the vector.
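Because dir.exists() is vectorised, the check can also be done for all folders at once. The following is only a sketch: the sandbox under tempdir() and the shortened folder list are assumptions so that the example does not touch a real working directory.

```r
# Hypothetical sandbox directory for the example.
root <- file.path(tempdir(), "project_sketch")
dir.create(root, showWarnings = FALSE)

folders <- file.path(root, c("RawData", "Output", "BCV"))
dir.create(folders[1], showWarnings = FALSE)  # pretend one folder already exists

already <- dir.exists(folders)   # one logical value per path
for (d in folders[!already]) {   # loop over the values, not the indices
  dir.create(d)
}

all_present <- all(dir.exists(folders))
```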

make file.exists() case insensitive

I have a line of code in my script that checks if a file exists (actually, many files, this one line gets looped for a bunch of different files):
file.exists(Sys.glob(file.path(getwd(), "files", "*name*")))
This looks for any file in the directory /files/ that has "name" in it, e.g. "filename.csv". However, some of my files are named "fileName.csv" or "thisfileNAME.csv", and they are not recognized. How can I make file.exists treat this check in a case-insensitive way?
In my other code I usually convert any imported names or lists to lowercase immediately with the tolower function, but I don't see any option to include that in the file.exists function.
Suggested solution using list.files:
If we have many files, we might want to list them only once. Otherwise we can move the list.files call into the function (and pass path_to_root_directory instead of found_files to it).
found_files <- list.files(path_to_root_directory, recursive=FALSE)
Behaviour as file.exists (return value is boolean):
fileExIsTs <- function(file_path, found_files) {
return(tolower(file_path) %in% tolower(found_files))
}
Return value is file with spelling as found in directory or character(0) if no match:
fileExIsTs <- function(file_path, found_files) {
return(found_files[tolower(found_files) %in% tolower(file_path)])
}
Edit:
New solution to fit new requirements:
keywordExists <- function(keyword, found_files) {
return(any(grepl(keyword, found_files, ignore.case=TRUE)))
}
keywordExists("NaMe", found_files=c("filename.csv", "morefilenames.csv"))
Returns:
[1] TRUE
Or
The return value is the matching files, spelled as found in the directory, or character(0) if there is no match:
keywordExists2 <- function(keyword, found_files) {
return(found_files[grepl(keyword, found_files, ignore.case=TRUE)])
}
keywordExists2("NaMe", found_files=c("filename.csv", "morefilenames.csv"))
Returns:
[1] "filename.csv" "morefilenames.csv"
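The following should return 1 if any file name matches, regardless of case, and 0 if none does. Note that grepl takes a regular expression rather than a glob, and that ignore.case is an argument of grepl:
as.integer(any(grepl("name", list.files(), ignore.case=TRUE)))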
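Another option, sketched here under the assumption that a regular expression is an acceptable replacement for the glob, is to let list.files() do the case-insensitive matching through its own pattern and ignore.case arguments. The temporary directory and file names are made up for the example:

```r
# Hypothetical directory with mixed-case file names.
files_dir <- tempfile("files_")
dir.create(files_dir)
file.create(file.path(files_dir, c("fileName.csv", "thisfileNAME.csv", "other.txt")))

# pattern is a regular expression; ignore.case makes it case-insensitive.
matches <- list.files(files_dir, pattern = "name", ignore.case = TRUE)
has_match <- length(matches) > 0
```

matches holds the names as spelled on disk, so they can be passed on to read functions without guessing the case.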

Workaround for case-sensitive input to dir

I am using Octave 5.1.0 on Windows 10 (x64). I am parsing a series of directories looking for an Excel spreadsheet in each directory with "logbook" in its filename. The problem is these files are created by hand and the filenaming isn't consistent: sometimes it's "LogBook", other times it's "logbook", etc...
It looks like the string passed as input to the dir function is case-sensitive so if I don't have the correct case, dir returns an empty struct. Currently, I am using the following workaround, but I wondered if there was a better way of doing this (for a start I haven't captured all possible upper/lower case combinations):
logbook = dir('*LogBook.xls*');
if isempty(logbook)
logbook = dir('*logbook.xls*');
if isempty(logbook)
logbook = dir('*Logbook.xls*');
if isempty(logbook)
logbook = dir('*logBook.xls*');
if isempty(logbook)
error(['Could not find logbook spreadsheet in ' dir_name '.'])
end
end
end
end
You need to get the list of filenames (either via readdir, dir, ls), and then search for the string in that list. If you use readdir, it can be done like this:
[files, err, msg] = readdir ('.'); # read current directory
if (err != 0)
error ("failed to readdir (error code %d): %s", err, msg);
endif
# regexpi is case-insensitive; keep entries with at least one match
logbook_indices = find (cellfun (@any, regexpi (files, 'logbook')));
logbook_filenames = files(logbook_indices);
A much less standard approach could be:
glob ('*[lL][oO][gG][bB][oO][oO][kK]*')

R error: dims do not match the length of an object

I am currently trying to run some code (if you need to know the purpose to help me, ask me, but I'm trying to keep this question short). This is the code:
par<-c(a=.5,b=rep(1.3,4))
est<-rep(TRUE,length(par))
ncat<-5
Theta<-matrix(c(-6,-5.8,-5.6,-5.4,-5.2,-5,-4.8,-4.6,-4.4,-4.2,-4,-3.8,-3.6,-3.4,-3.2,-3,-2.8,-2.6,-2.4,-2.2,-2,-1.8,-1.6,-1.4,-1.2,-1,-0.8,-0.6,-0.4,-0.2,0,0.2,0.4,0.6,0.8,1,1.2,1.4,1.6,1.8,2,2.2,2.4,2.6,2.8,3,3.2,3.4,3.6,3.8,4,4.2,4.4,4.6,4.8,5,5.2,5.4,5.6,5.8,6))
p.grm<-function(par,Theta,ncat){
a<-par[1]
b<-par[2:length(par)]
z<-matrix(0,nrow(Theta),ncat)
y<-matrix(0,nrow(Theta),ncat)
y[,1]<-1
for(i in 1:ncat-1){
y[,i+1]<-(exp(a*(Theta-b[i])))/(1+exp(a*(Theta-b[i])))
}
for(i in 1:ncat-1){
z[,i]<-y[,i]-y[,i+1]
}
z[,ncat]<-y[,ncat]
z
}
However, when I try to run the code:
p.grm(par=par,Theta=Theta,ncat=ncat)
I get the following error:
Error: dims [product 61] do not match the length of object [0]
Traceback tells me that the error is occurring in the first for loop in the line:
y[,i+1]<-(exp(a*(Theta-b[i])))/(1+exp(a*(Theta-b[i])))
Could someone point me to what I'm doing wrong? When I try to run this code step by step outside of the custom p.grm function, everything seems to work fine.
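It is a common mistake. The : operator binds more tightly than -, so 1:ncat-1 is parsed as (1:ncat)-1, which yields 0, 1, ..., ncat-1. In the first iteration i is 0, b[0] has length zero, and the assignment fails. When you want a loop from 1 to ncat-1, write for (i in 1:(ncat-1)) instead of for (i in 1:ncat-1); they are completely different.
You may also add an explicit return(z) at the end of the function. Here is the corrected code: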
par<-c(a=.5,b=rep(1.3,4))
est<-rep(TRUE,length(par))
ncat<-5
Theta<-matrix(c(-6,-5.8,-5.6,-5.4,-5.2,-5,-4.8,-4.6,-4.4,-4.2,-4,-3.8,-3.6,-3.4,-3.2,-3,-2.8,-2.6,-2.4,-2.2,-2,-1.8,-1.6,-1.4,-1.2,-1,-0.8,-0.6,-0.4,-0.2,0,0.2,0.4,0.6,0.8,1,1.2,1.4,1.6,1.8,2,2.2,2.4,2.6,2.8,3,3.2,3.4,3.6,3.8,4,4.2,4.4,4.6,4.8,5,5.2,5.4,5.6,5.8,6))
p.grm<-function(par,Theta,ncat){
a<-par[1]
b<-par[2:length(par)]
z<-matrix(0,nrow(Theta),ncat)
y<-matrix(0,nrow(Theta),ncat)
y[,1]<-1
for(i in 1:(ncat-1)){
y[,i+1]<-(exp(a*(Theta-b[i])))/(1+exp(a*(Theta-b[i])))
}
for(i in 1:(ncat-1)){
z[,i]<-y[,i]-y[,i+1]
}
z[,ncat]<-y[,ncat]
return(z)
}
p.grm(par=par,Theta=Theta,ncat=ncat)
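The precedence difference behind this fix can be checked directly. This is a small sketch using the same ncat and b values as the question; the variable names wrong, right and first_b are introduced only for illustration:

```r
ncat <- 5
b <- rep(1.3, 4)

wrong <- 1:ncat - 1     # parsed as (1:ncat) - 1, so it starts at 0
right <- 1:(ncat - 1)   # the intended sequence 1, 2, 3, 4

first_b <- b[wrong[1]]  # b[0]: a zero-length vector, the source of the error
safe <- seq_len(ncat - 1)  # also behaves sensibly when ncat is 1
```

seq_len() is worth considering in loops like this because 1:(ncat-1) counts down to 0 when ncat is 1, while seq_len(0) is empty.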

Need to Print Value of a Variable using Paste in R

I am trying to create a data frame of various error messages based on data to be cross-checked between two data frames, storing each message in a vector iteratively. I am using the following snippet for this purpose:
for(j in 1:nrow(MySQL_Data)){
  date_mysql= paste("MySQL_Data[",j,",1]")
  date_red= paste("RED_Data[",j,",1]")
  body= c()
  if(!date_mysql == date_red) {
    body<- append(body,paste("'There is data missing for date",date_mysql,"in",table2))
  }else {
    NULL
  }
}
My date_mysql variable prints as MySQL_Data[ 2 ,1] instead of the actual value, which is a date.
The following is the output:
"'There is data missing for date MySQL_Data[ 2 ,1] in Dream11_UserRegistration"
Can someone help me with the mistake I am making here?
Thanks in advance!
Your use of paste in the definitions of date_mysql and date_red builds the text of the indexing expression as a string instead of evaluating it. I'm assuming that what you actually want is this:
date_mysql = MySQL_Data[j, 1]
date_red = RED_Data[j, 1]
Furthermore, you're resetting body in every loop iteration, so it will only ever hold a single element. Initialise it once, before the loop.
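Putting both points together, a minimal sketch could look like this. The two small data frames and their dates are invented for the example; only the names MySQL_Data, RED_Data, table2 and the message text come from the question:

```r
# Hypothetical data: the second date differs between the two sources.
MySQL_Data <- data.frame(date = c("2020-01-01", "2020-01-02", "2020-01-03"),
                         stringsAsFactors = FALSE)
RED_Data <- data.frame(date = c("2020-01-01", "2020-01-05", "2020-01-03"),
                       stringsAsFactors = FALSE)
table2 <- "Dream11_UserRegistration"

body <- character(0)   # initialised once, outside the loop
for (j in seq_len(nrow(MySQL_Data))) {
  date_mysql <- MySQL_Data[j, 1]   # the value itself, not its name as text
  date_red <- RED_Data[j, 1]
  if (date_mysql != date_red) {
    body <- c(body, paste("There is data missing for date", date_mysql, "in", table2))
  }
}
```

With this data, body ends up with one message, for 2020-01-02.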

Resources