selecting small vif variables in r - r

I am trying to write an r function to select covariates with small VIF.
Here is my code:
ea=read.csv("ea.csv")
library(car)
fullm<-lm(appEuse~.,data=ea)
cov<-names(ea)
ncov<-length(cov)
vifs<-rep(NA,ncov)
include<-rep(NA,ncov)
for (i in 1:ncov){
vifs[i]<-vif(fullm)[i]
if (vifs[i]<10){
include[i]<-cov[i+1]
}
}
Error in if (vifs[i] < 10) { : missing value where TRUE/FALSE needed
I was trying to set for loop from 1 to ncov-1, then got argument is of length zero.
Is there a way to go around it?

Could be wrong, but looks like you're trying to loop an if statement over a list of NAs:
vifs<-rep(NA,ncov)
is later referenced at
if (vifs[i]<10){
Could this be your issue?

Related

How do I create a loop function to apply acoustic indices from "soundecology" to specific sections of .wav files using R

I have a large quantity of .wav files that I need to analyze using the acoustic indices from the "soundecology" package in R. However, the recordings do not have uniform start times and I need to analyze specific periods of time within the files. I want to create a function and loop for automating the process.
I have created a spread sheet for each folder of recordings (each folder is a different location) that lays out the recording and the times within each recording that I need to analyze. Basically, a row contains: the sound file name, the time when the sample should start (eg. 09:00:00, the number of seconds from the start of the file that that time occurs, and the munber of seconds from the start time of the file that the end of the sample should occur.
That data looks like this:
Spread sheet of data
I am using the package "tuneR" and "warbleR" to select the specific portion of a sound file that I want to analyze. Here is the the code and the output that I would like to loop across all the sound files:
wavrow1 <-read_wave(mvb$sound.files[1], from = mvb$start[1], to = mvb$end[1])
wavrow1.aci <- acoustic_complexity(wavrow1, j=10)
which yeilds
max_freq not set, using value of: 22050
min_freq not set, using value of: 0
This is a mono file.
Calculating index. Please wait...
Acoustic Complexity Index (total): 934.568
However, when I put this into a function in order to then put it into a loop I get a different output.
acianalyzeFUN <- function(mvb, i){
r <- read_wave(mvb$sound.files[i], mvb$start[i], mvb$end[i])
soundfile.aci <- acoustic_complexity(r, j=10)
}
row1.test <- acianalyzeFUN(mvb, 1)
This gives the output:
max_freq not set, using value of: 22050
min_freq not set, using value of: 0
This is a mono file.
Calculating index. Please wait...
Acoustic Complexity Index (total): 19183.03
Acoustic Complexity Index (by minute): 931.98
Which is different.
So I need to fix this function and put it into a loop so that I can apply it across all the files and save the results into a data frame or ultimately another spread sheet.
I was thinking a loop like the following might work but I am also getting errors with it:
output <- vector("logical", length(97))
for (i in seq_along(mvb$sound.files)) {
output[[i]] <- acianalyzeFUN(mvb, i)
}
Which returns this error:
max_freq not set, using value of: 22050
min_freq not set, using value of: 0
This is a mono file.
Calculating index. Please wait...
Acoustic Complexity Index (total): 19183.03
Acoustic Complexity Index (by minute): 931.98
Error in output[[i]] <- acianalyzeFUN(mvb, i) :
more elements supplied than there are to replace
Thanks for any help and advice on this. Please let me know if there are any other pieces of information that would be helpful.
the read_wave function takes following arguments :
read_wave(X, index, from = X$start[index], to = X$end[index], channel = NULL,
header = FALSE, path = NULL)
In the manual test, you specify from = mvb$start[1], to = mvb$end[1]
In the function you created, you dont specify the arguments :
r <- read_wave(mvb$sound.files[i], mvb$start[i], mvb$end[i])
so that mvb$start[i] gets affected to index and mvb$end[i] to from.
You should write:
acianalyzeFUN <- function(mvb, i){
r <- read_wave(mvb$sound.files[i], from = mvb$start[i], to = mvb$end[i])
soundfile.aci <- acoustic_complexity(r, j=10)
}
This should explain the difference you observe.
Regarding the error, you create a vector of logical to collect the result, but acianalyzeFUN returns nothing : it just sets two variables r and soundfileaci without returning anything.

Resolving "Error: subscript out of bounds" in a code loop

I am trying to run an R loop on an individual based model. This includes two lists referring to grid cells, which I originally ran into difficulties with because they returned the error: Error: (list) object cannot be coerced to type 'double'. I think I have resolved this error by typing "as.numeric(unlist(x))."
Example from code:
List 1:
dredg<-list(c(943,944,945,946,947,948,949...1744,1745))
dredging<-as.numeric(unlist(dredg)). I refer to 'dredging' in my code, not 'dredg.'
List 2:
nodredg<-list(c(612,613,614,615,616,617,618,619,620,621,622,623,624,625,626,627,628,629,630,631))
dcells<-as.numeric(unlist(nodredg)) I refer to 'dcells' in my code, not 'nodredg.'
However, now when I use these two number arrays (if that's what they are) my code returns the error
Error in dcells[[dredging[whichD]]] : subscript out of bounds
I believe this error is referring to the following lines of code:
if(julday>=dstart[whichD] & julday<=dend[whichD] & dredging[whichD]!=0){
whichcells<-which(gridd$inds[gridd$julday==julday] %in% dcells[[ dredging[whichD] ]]) #identify cells where dredging is occurring
}
where:
whichD<-1
julday<-1+day
'dstart=c(1,25,75,100)dend=c(20,60,80,117)`
Here is the full block of code:
for (i in 1:time.steps){
qday <- qday + 1
for (p in 1:pop){
if (is.na(dolphin.q[p,1,i]!=dolphin.q[p,2,i])) {
dolphin.q[p,6,i]<-which.max(c(dolphin.q[p,1,i],dolphin.q[p,2,i]))
}else{
dolphin.q[p,6,i]<-rmulti(c(0.5,0.5))
}
usP<-usage[p,]
if(julday>=dstart[whichD] & julday<=dend[whichD] & dredging[whichD]!=0){
whichcells<-which(gridd$inds[gridd$julday==julday] %in% dcells[[ dredging[whichD] ]])
usP[whichcells]<-0
usP<-usP/sum(usP) #rescale the home range surface to 1
}
I was wondering if anyone could show me what is going wrong? I apologize if this is a very simple mistake I am making, I am a novice learner that has been scouring the internet, manuals, and Stack for days with no luck!
Thanks in advance!

Constructing a return in R

I have a function in R that I run on lists of flies. After the following code I apply the function to files and have R put the values in a new file. An equation in this function outputs a value based on water density and depth. I found that the top row of the depth is not always 1 in these files, but I need it to be for the equation. I would like to be able to output an "NA" message into the new sheet for when it is not.
stratindex=function(file){
ctd=read.csv(file,header=T)
x=ctd$Density..sigma.t..kg.m.3..
y=ctd$Depth..salt.water..m...lat...60
if(y[1]!=1){return(NA)} else
((x[30]-x[1]) / 29)
}
This all reads in fine. Then I get to the next part, where it takes the individual files and applies the function to them.
the.files <- choose.files()
index <- sapply(the.files, stratindex)
After this, R prompts:
"Error in if (y[1] != 1) { : missing value where TRUE/FALSE needed"
Where did I go wrong? What can I do to nest other stipulations if I find them? For instance, if the depth at row 30 is not 30.

window function R code

fine people of stackoverflow. I have become trapped on a rather simple part of my program and was wondering if you guys could help me.
library(nonlinearTseries)
tt<-c(0,500,1000)
mm<-rep(0,2)
for (j in 1:2){mm[j]=estimateEmbeddingDim(window(rnorm(1000), start=tt[j],end=tt[j+1]), number.points=(tt[j+1]-tt[j]),do.plot=FALSE)}
Warning message:
In window.default(rnorm(1000), start = tt[j], end = tt[j + 1]) :
'start' value not changed
If I plug in the values directly (tt[1], tt[2], tt[3]), it works but I also get a warning
estimateEmbeddingDim(window(rnorm(1000), start=tt[1],end=tt[2]), number.points=(tt[2]-tt[1]),do.plot=FALSE)
[1] 9
Warning message:
In window.default(rnorm(1000), start = tt[1], end = tt[2]) :
'start' value not changed
Thanks, Matt.
The problem seems to be with the
window(rnorm(1000), start=tt[j],end=tt[j+1])
lines. First of all, window is only meant to be used with a time series object (class=="ts"). In this case, rnorm(1000) simply returns a numeric vector, there are no dates associated with this object. So i'm not sure what you think this function does. Did you only want to extract the values that were between 0-500 and 500-1000? If so that seems a bit because with a standard normal variable, the max of 1000 samples isn't likely to be much over 4 let alone 500.
So be sure to use a proper "ts" object with dates and everything to get this to work.

glm probit "cannot find valid starting values" error message

I am trying to use R for the first time to do some probit analaysis.
I get the following error:
Error in if (!(validmu(mu) && valideta(eta))) stop("cannot find valid starting values: please specify some", :
missing value where TRUE/FALSE needed
This is the command I am using:
m1=glm(Good~Stg.Days+Dev.Deployments+Check.Ins+NoOfDevelopers,family=poisson(link = "probit"),data=deploy[1:4,])
My data deploy[1:4,] are as loaded in from a CSV file follows:
Good,Application Type,Project,Start Date,End Date,Stg Days,Dev Deployments,Check Ins,NoOfDevelopers
1,DocumentPlatform,ZCP,11/08/2010,11/11/2010,0.6,0,12,4
1,DocumentPlatform,ZCP,11/11/2010,09/12/2010,0.4,0,4,1
0,DocumentPlatform,ZCP,09/12/2010,07/03/2011,10,0,7,3
1,FactsheetPlatform,Process.ARCH,28/06/2010,09/03/2011,7.1,0,18,2
deploy is in reality a much bigger vector than 1:4 I am just using a subset of the data to help determine the problem.
Any ideas what is wrong?
As i commented: Using ?glm I found tha the poisoon family supports the following links: log, identity, and sqrt.
Testing on another link:
test <- data.frame('Good'=c(1,1,0,1),'Stg Days'=c(0.6,0.4,10,7.1),'Dev Deployments'=c(0,0,0,0),'Check Ins'=c(12,4,7,18),'NoOfDevelopers'=c(4,1,3,2))
m1=glm(Good~ . ,family=poisson(link = "log"),data=test)
Gives no errors. So I think your link = "probit" is the problem.

Resources