Constructing a return in R - r

I have a function in R that I run on lists of files. After the following code I apply the function to files and have R put the values in a new file. An equation in this function outputs a value based on water density and depth. I found that the top row of the depth is not always 1 in these files, but I need it to be for the equation. I would like to output NA into the new sheet when it is not.
stratindex=function(file){
ctd=read.csv(file,header=T)
x=ctd$Density..sigma.t..kg.m.3..
y=ctd$Depth..salt.water..m...lat...60
if(y[1]!=1){return(NA)} else
((x[30]-x[1]) / 29)
}
This all reads in fine. Then I get to the next part, where it takes the individual files and applies the function to them.
the.files <- choose.files()
index <- sapply(the.files, stratindex)
After this, R prompts:
"Error in if (y[1] != 1) { : missing value where TRUE/FALSE needed"
Where did I go wrong? What can I do to nest other stipulations if I find them? For instance, if the depth at row 30 is not 30.
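The error message means the condition itself evaluated to NA, which most likely happens when the first depth value in one of the files is missing. Below is a minimal sketch of an NA-safe version, assuming the data have already been read into a data frame; the function name strat_index and the short column names depth and density are hypothetical stand-ins for the real ones. isTRUE() turns NA (or a zero-length result) into FALSE, and further stipulations can be chained with &&.

```r
# Sketch: return NA unless the depth column looks as expected.
# `depth` and `density` are hypothetical column names.
strat_index <- function(ctd) {
  x <- ctd$density
  y <- ctd$depth
  # isTRUE() is FALSE for NA and for zero-length comparisons,
  # so a missing depth value no longer breaks the if().
  ok <- nrow(ctd) >= 30 && isTRUE(y[1] == 1) && isTRUE(y[30] == 30)
  if (!ok) return(NA_real_)
  (x[30] - x[1]) / 29
}

# Usage with made-up data
good <- data.frame(depth = 1:30, density = seq(20, 49))
bad  <- data.frame(depth = c(NA, 2:30), density = seq(20, 49))
strat_index(good)
strat_index(bad)
```

Returning NA_real_ rather than plain NA keeps the result of sapply() numeric even when every file fails the check.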

Related

selecting small vif variables in r

I am trying to write an R function to select covariates with small VIF.
Here is my code:
ea=read.csv("ea.csv")
library(car)
fullm<-lm(appEuse~.,data=ea)
cov<-names(ea)
ncov<-length(cov)
vifs<-rep(NA,ncov)
include<-rep(NA,ncov)
for (i in 1:ncov){
vifs[i]<-vif(fullm)[i]
if (vifs[i]<10){
include[i]<-cov[i+1]
}
}
Error in if (vifs[i] < 10) { : missing value where TRUE/FALSE needed
I also tried running the for loop from 1 to ncov-1, but then I got "argument is of length zero".
Is there a way to go around it?
Could be wrong, but looks like you're trying to loop an if statement over a list of NAs:
vifs<-rep(NA,ncov)
is later referenced at
if (vifs[i]<10){
Could this be your issue?
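A likely contributing factor is that vif() returns one value per predictor, while names(ea) also includes the response, so indexing up to ncov runs one past the end and yields NA. The loop can be avoided entirely with a vectorised selection. The sketch below uses a made-up named vector in place of the real vif(fullm) output; the predictor names and values are purely illustrative.

```r
# Assumed stand-in for vif(fullm): a named numeric vector, one entry per predictor.
vifs <- c(temp = 3.2, humidity = 12.7, wind = 1.8)

# Keep the names of predictors whose VIF is below 10, ignoring any NA values.
include <- names(vifs)[!is.na(vifs) & vifs < 10]
include
```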

Accessing API with for-loop randomly has encoding error, which breaks loop in R

I'm trying to access an API from iNaturalist to download some citizen science data. I'm using the package rinat to get this done (see vignette). The loop below is, essentially, pulling all observations for one species, in one state, in one year iteratively on a per-month basis, then summing the number of observations for that year (input parameters subset from my actual script for convenience).
require(rinat)
state_ids <- c(18, 14)
bird_ids <- c(14886,1409)
months <- c(1:12)
final_nums <- vector()
for(i in 1:length(state_ids)){
total_count <- vector()
for(j in 1:length(months)){
monthly <- get_inat_obs(place_id=state_ids[i],
taxon_id=bird_ids[i],
year=2019,
month = months[j])
total_count <- append(total_count, length(monthly$scientific_name))
print(paste("done with month", months[j], "in state", state_ids[i]))
}
final_nums <- append(final_nums, sum(total_count))
print(paste("done with state", state_ids[i]))
}
Occasionally, and seemingly randomly, I get the following error:
No encoding supplied: defaulting to UTF-8.
Error in if (!x$headers$`content-type` == "text/csv; charset=utf-8") { :
argument is of length zero
This ends up breaking the loop or makes the loop run without actually pulling any real data. Is this an issue with my script, or the API, or something else? I've tried manually supplying encoding information to the get_inat_obs() function, but it doesn't accept that as an argument. Thank you in advance!
I don't believe this is an error in your script. The issue is with the api most likely.
The error "argument is of length zero" is common when the condition passed to if() has length zero, for example when one side of a comparison is empty or NULL. For example:
if(logical(0) == "TEST") print("WORKED!!")
#Error in if (logical(0) == "TEST") print("WORKED!!") :
# argument is of length zero
I ran a few greps on their source code to see where this if statement is, and it seems to be within inat_handle, line 211 in get_inat_obs.R
This would suggest that the authors did not expect for
!x$headers$`content-type` == 'text/csv; charset=utf-8'
to evaluate to logical(0), but more specifically
x$headers$`content-type`
to be NULL.
I would suggest making a bug report on their GitHub and recommend they change the specified line to:
if(is.null(x$headers$`content-type`) || !x$headers$`content-type` == 'text/csv; charset=utf-8'){
Suggesting a bug is usually more well received if you have a reproducible example.
Also, you could make this change yourself locally by cloning the git repository, editing the file, rebuilding the package, and then confirming that you no longer get the error in your code.
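Until the package is fixed, one workaround is to wrap the call in tryCatch() so that a single failed request does not break the loop. This is only a sketch: with_retry is a hypothetical helper, not part of rinat, and the flaky function below merely simulates an API call that fails twice before succeeding.

```r
# Hypothetical helper: call f() up to `attempts` times and
# return NULL if every attempt raises an error.
with_retry <- function(f, attempts = 3, wait = 0) {
  for (k in seq_len(attempts)) {
    result <- tryCatch(f(), error = function(e) NULL)
    if (!is.null(result)) return(result)
    Sys.sleep(wait)  # pause before retrying
  }
  NULL
}

# Simulated flaky call standing in for get_inat_obs(...)
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("argument is of length zero")
  "ok"
}
res <- with_retry(flaky)
```

In the original loop, the call would look like with_retry(function() get_inat_obs(...)), and a NULL result can be counted as zero observations or recorded as NA, depending on what the analysis needs.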

How do I create a loop function to apply acoustic indices from "soundecology" to specific sections of .wav files using R

I have a large quantity of .wav files that I need to analyze using the acoustic indices from the "soundecology" package in R. However, the recordings do not have uniform start times and I need to analyze specific periods of time within the files. I want to create a function and loop for automating the process.
I have created a spreadsheet for each folder of recordings (each folder is a different location) that lays out the recordings and the times within each recording that I need to analyze. Basically, a row contains: the sound file name, the time when the sample should start (e.g. 09:00:00), the number of seconds from the start of the file at which that time occurs, and the number of seconds from the start of the file at which the sample should end.
That data looks like this:
Spread sheet of data
I am using the packages "tuneR" and "warbleR" to select the specific portion of a sound file that I want to analyze. Here is the code and the output that I would like to loop across all the sound files:
wavrow1 <-read_wave(mvb$sound.files[1], from = mvb$start[1], to = mvb$end[1])
wavrow1.aci <- acoustic_complexity(wavrow1, j=10)
which yields
max_freq not set, using value of: 22050
min_freq not set, using value of: 0
This is a mono file.
Calculating index. Please wait...
Acoustic Complexity Index (total): 934.568
However, when I put this into a function in order to then put it into a loop I get a different output.
acianalyzeFUN <- function(mvb, i){
r <- read_wave(mvb$sound.files[i], mvb$start[i], mvb$end[i])
soundfile.aci <- acoustic_complexity(r, j=10)
}
row1.test <- acianalyzeFUN(mvb, 1)
This gives the output:
max_freq not set, using value of: 22050
min_freq not set, using value of: 0
This is a mono file.
Calculating index. Please wait...
Acoustic Complexity Index (total): 19183.03
Acoustic Complexity Index (by minute): 931.98
Which is different.
So I need to fix this function and put it into a loop so that I can apply it across all the files and save the results into a data frame or ultimately another spread sheet.
I was thinking a loop like the following might work but I am also getting errors with it:
output <- vector("logical", length(97))
for (i in seq_along(mvb$sound.files)) {
output[[i]] <- acianalyzeFUN(mvb, i)
}
Which returns this error:
max_freq not set, using value of: 22050
min_freq not set, using value of: 0
This is a mono file.
Calculating index. Please wait...
Acoustic Complexity Index (total): 19183.03
Acoustic Complexity Index (by minute): 931.98
Error in output[[i]] <- acianalyzeFUN(mvb, i) :
more elements supplied than there are to replace
Thanks for any help and advice on this. Please let me know if there are any other pieces of information that would be helpful.
the read_wave function takes following arguments :
read_wave(X, index, from = X$start[index], to = X$end[index], channel = NULL,
header = FALSE, path = NULL)
In the manual test, you name the arguments: from = mvb$start[1], to = mvb$end[1]
In the function you created, you don't name the arguments:
r <- read_wave(mvb$sound.files[i], mvb$start[i], mvb$end[i])
so they are matched by position: mvb$start[i] gets assigned to index and mvb$end[i] to from.
You should write:
acianalyzeFUN <- function(mvb, i){
r <- read_wave(mvb$sound.files[i], from = mvb$start[i], to = mvb$end[i])
soundfile.aci <- acoustic_complexity(r, j=10)
}
This should explain the difference you observe.
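Regarding the error, you create a logical vector to collect the results, but acianalyzeFUN ends with an assignment, so it only returns the value of soundfile.aci invisibly, and that value is a list with several elements rather than a single logical. A list cannot be stored in one element of a logical vector, hence "more elements supplied than there are to replace". Return the value explicitly and collect the results in a list instead.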
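A sketch of the collection loop follows, assuming the analysis function returns a list of values per file. fake_index is a hypothetical stand-in for acianalyzeFUN so that the example runs without any sound files, and its element names total and by_minute are invented for illustration.

```r
# Hypothetical stand-in for acianalyzeFUN(mvb, i): returns a list of results.
fake_index <- function(i) list(total = i * 10, by_minute = i)

n <- 3  # in practice: nrow(mvb)

# Pre-allocate a list, which can hold one multi-element result per slot.
output <- vector("list", n)
for (i in seq_len(n)) {
  output[[i]] <- fake_index(i)
}

# Bind the per-file results into one data frame, one row per file.
results <- do.call(rbind, lapply(output, as.data.frame))
results
```

The resulting data frame can then be written out with write.csv() if a spreadsheet is needed.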

R / Rstudio : is there a watch function to continuously monitor some values?

I have a lot of different operations running on quite a big dataframe. It starts to be a pain for maintenance, especially with some data being improperly formatted, and I'm looking at some options to make my life easier.
The problem is that at some point in the flow of operations NAs are introduced in several rows, including in the id column (almost certainly due to some bad subsetting). I cannot find the culprit easily because each time I have to str() it or view() it in RStudio... This takes time, and I already went through it once without finding the bad operation...
So I'm curious if there is some package answering to this problem or a way to program something "daemon-like", to pop up a warning message when a specific value appears.
A while loop doesn't help, because it evaluates all the statements, and of course at one point the condition is not true and it doesn't print when it stops ...
while(nrow(df[is.na(df$id),]) > 0){
statements OK
breaking statement
other OK statements
}
I'll look for other options but I wanted to ask before...
EDIT: thanks for the useful comments, I will definitely look more into those functions. However, I also tried to build a watch function myself (see my answer).
Ok, I guess I have finally built something quite like it :
This is a function to source a file line by line until a given condition is met:
watchIt <- function(file,watchexpression,startwatchline){
line <- 1
sourceList <- scan(file = file, what="character", sep="\n", blank.lines.skip = FALSE)
maxLines <- length(sourceList)
while(startwatchline > line && maxLines >= line){
cat("l")
eval(parse(text=sourceList[line]))
line <- line+1
cat(line)
cat(" ")
}
while(eval(parse(text=watchexpression)) == FALSE && maxLines >= line){
cat(" L")
eval(parse(text=sourceList[line]))
line <- line+1
cat(line)
cat(" ")
}
if(maxLines <= line) {
cat("End of file reached without condition getting TRUE")
}
else{
cat("Condition evaluated to TRUE on line :")
cat(line)
cat("\n")
cat(sourceList[line])
}
}
So this is how I use it :
watchIt("source_test.R","nrow(df[is.na(df$id),]) > 0",10)
This puts "source_test.R" in a character vector, one element per line, and, starting from line 10, I test whether the resulting dataframe has NAs in the id field. The execution stops either when the condition evaluates to TRUE or when the end of the file is reached.
Still, I'm waiting for some other/better answers... Also, this is only about the fourth function I have managed to create in R, so I guess there are improvements to be made to it...
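A lighter-weight alternative to sourcing the script line by line is to drop a small assertion after each suspect operation, so the script stops at the first step that introduces the problem. The sketch below is only an illustration: check_ids is a hypothetical helper, and the id column name follows the question.

```r
# Hypothetical helper: stop with a descriptive message as soon as
# the id column contains an NA; otherwise pass the data frame through.
check_ids <- function(df, step) {
  if (anyNA(df$id)) {
    stop("NA ids introduced at step: ", step, call. = FALSE)
  }
  invisible(df)
}

# Usage: call it after each operation that might corrupt the ids.
df <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.0))
df <- check_ids(df, "after loading")
```

Because the helper returns its input invisibly, it can also be placed inside a chain of operations without changing the result.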

How to automatically skip over errors in r

I'm trying to create new variables from some output from a number of two-piece segmented regression models that I'm running. The code for my new variable is:
initial1=c(fmod$psi[1],fmod2$psi[1], fmod3$psi[1], fmod4$psi[1], fmod5$psi[1], fmod6$psi[1], fmod7$psi[1], fmod8$psi[1], fmod9$psi[1], fmod10$psi[1], fmod11$psi[1],fmod12$psi[1], fmod13$psi[1], fmod14$psi[1], fmod15$psi[1], fmod16$psi[1], fmod17$psi[1], fmod18$psi[1], fmod19$psi[1], fmod20$psi[1], fmod21$psi[1],fmod22$psi[1], fmod23$psi[1], fmod24$psi[1], fmod25$psi[1], fmod26$psi[1], fmod27$psi[1], fmod28$psi[1], fmod29$psi[1], fmod30$psi[1], fmod31$psi[1],fmod32$psi[1], fmod33$psi[1], fmod34$psi[1], fmod35$psi[1], fmod36$psi[1], fmod37$psi[1], fmod38$psi[1], fmod39$psi[1], fmod40$psi[1], fmod41$psi[1],fmod42$psi[1], fmod43$psi[1], fmod44$psi[1], fmod45$psi[1], fmod46$psi[1], fmod47$psi[1], fmod48$psi[1], fmod49$psi[1], fmod50$psi[1], fmod51$psi[1],fmod52$psi[1], fmod53$psi[1], fmod54$psi[1], fmod55$psi[1], fmod56$psi[1], fmod57$psi[1], fmod58$psi[1], fmod59$psi[1], fmod60$psi[1], fmod61$psi[1], fmod62$psi[1], fmod63$psi[1], fmod64$psi[1])
where fmod, fmod2, fmod3, etc. are my regression models. Some of the regression models have errors and do not produce output (because the initial breakpoint estimates are too close to each other). Because of that, when I try to make my 'initial1' variable, I get error messages such as:
Error: object 'fmod12' not found
and the 'initial1' variable is not created. I would like the models that don't have output associated with them to be automatically skipped over, or to be replaced with 'NA'. Does anyone know how to do this?
You're creating many different models and giving them numbered names. Why not put them in a list instead?
At model creation time:
for (i in 1:lots) fmod[[i]] <- my_segmented_reg(...)
where my_segmented_reg presumably returns either a model, or NULL or NA.
Then you have a list fmod which you can start using straight away.
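The list approach can be sketched as follows, with tryCatch() turning a failed fit into NULL. fit_model is a hypothetical stand-in for the real segmented regression call; it fails for even i only to simulate the breakpoint error, and its psi values are made up.

```r
# Hypothetical stand-in for a model fit that sometimes fails.
fit_model <- function(i) {
  if (i %% 2 == 0) stop("breakpoint estimates too close")
  list(psi = c(i + 0.5, 0.1))
}

# Fit every model, storing NULL where the fit raised an error.
fmod <- lapply(1:4, function(i) {
  tryCatch(fit_model(i), error = function(e) NULL)
})

# Extract the first psi value, or NA for models that failed.
initial1 <- sapply(fmod, function(m) if (is.null(m)) NA_real_ else m$psi[1])
initial1
```

Failed models then show up as NA in initial1 instead of stopping the whole assignment.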
