I have used the specaccum() command to develop species accumulation curves for my samples.
Here is some example data:
site1<-c(0,8,9,7,0,0,0,8,0,7,8,0)
site2<-c(5,0,9,0,5,0,0,0,0,0,0,0)
site3<-c(5,0,9,0,0,0,0,0,0,6,0,0)
site4<-c(5,0,9,0,0,0,0,0,0,0,0,0)
site5<-c(5,0,9,0,0,6,6,0,0,0,0,0)
site6<-c(5,0,9,0,0,0,6,6,0,0,0,0)
site7<-c(5,0,9,0,0,0,0,0,7,0,0,3)
site8<-c(5,0,9,0,0,0,0,0,0,0,1,0)
site9<-c(5,0,9,0,0,0,0,0,0,0,1,0)
site10<-c(5,0,9,0,0,0,0,0,0,0,1,6)
site11<-c(5,0,9,0,0,0,5,0,0,0,0,0)
site12<-c(5,0,9,0,0,0,0,0,0,0,0,0)
site13<-c(5,1,9,0,0,0,0,0,0,0,0,0)
species_counts<-rbind(site1,site2,site3,site4,site5,site6,site7,site8,site9,site10,site11,site12,site13)
accum <- specaccum(species_counts, method="random", permutations=100)
plot(accum)
To ensure I have sampled sufficiently, I need to check that the species accumulation curve reaches an asymptote, defined here as a slope of < 0.3 between the last two points (i.e. between sites 12 and 13).
results <- with(accum, data.frame(sites, richness, sd))
Produces this:
sites richness sd
1 1 3.46 0.9991916
2 2 4.94 1.6625403
3 3 5.94 1.7513054
4 4 7.05 1.6779918
5 5 8.03 1.6542263
6 6 8.74 1.6794660
7 7 9.32 1.5497149
8 8 9.92 1.3534841
9 9 10.51 1.0492422
10 10 11.00 0.8408750
11 11 11.35 0.7017295
12 12 11.67 0.4725816
13 13 12.00 0.0000000
I feel like I'm getting there. I could fit an lm() of richness on sites and extract the slope (tangent?) between sites 12 and 13. Going to search a bit longer here.
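For two consecutive points no regression is needed: the slope is just the change in richness divided by the change in sites. A minimal sketch, using only the last two rows of the table above, that checks an lm() fit on those two points gives the same number:

```r
# Richness values for sites 12 and 13, copied from the table printed above
last_two <- data.frame(sites = c(12, 13), richness = c(11.67, 12.00))

# Slope from a straight-line fit through the two points
fit <- lm(richness ~ sites, data = last_two)
slope_lm <- unname(coef(fit)["sites"])

# Slope as a simple finite difference (rise over run)
slope_diff <- diff(last_two$richness) / diff(last_two$sites)

slope_lm
slope_diff
slope_diff < 0.3  # the asymptote criterion from the question
```

With these values the last slope comes out at 0.33, just above the 0.3 cutoff.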
Streamlining your data generation process a little bit:
species_counts <- matrix(c(0,8,9,7,0,0,0,8,0,7,8,0,
5,0,9,0,5,0,0,0,0,0,0,0, 5,0,9,0,0,0,0,0,0,6,0,0,
5,0,9,0,0,0,0,0,0,0,0,0, 5,0,9,0,0,6,6,0,0,0,0,0,
5,0,9,0,0,0,6,6,0,0,0,0, 5,0,9,0,0,0,0,0,7,0,0,3,
5,0,9,0,0,0,0,0,0,0,1,0, 5,0,9,0,0,0,0,0,0,0,1,0,
5,0,9,0,0,0,0,0,0,0,1,6, 5,0,9,0,0,0,5,0,0,0,0,0,
5,0,9,0,0,0,0,0,0,0,0,0, 5,1,9,0,0,0,0,0,0,0,0,0),
byrow=TRUE,nrow=13)
It's always a good idea to call set.seed() before running randomization tests (and please mention that specaccum() is in the vegan package):
set.seed(101)
library(vegan)
accum <- specaccum(species_counts, method="random", permutations=100)
Extract the richness and sites components from the returned object and compute d(richness)/d(sites). Note that the slope vector is one element shorter than the original sites/richness vectors, so be careful if you're trying to match up slopes with particular numbers of sites:
(slopes <- with(accum,diff(richness)/diff(sites)))
## [1] 1.45 1.07 0.93 0.91 0.86 0.66 0.65 0.45 0.54 0.39 0.32 0.31
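One way to avoid mismatching slopes and site counts is to name each slope after the interval it spans. A sketch using the richness values from the question's table (a different random run from the one above, so the numbers differ slightly):

```r
# Richness values from the table printed in the question (one random run)
sites <- 1:13
richness <- c(3.46, 4.94, 5.94, 7.05, 8.03, 8.74, 9.32,
              9.92, 10.51, 11.00, 11.35, 11.67, 12.00)

# diff() returns one fewer element than its input
slopes <- diff(richness) / diff(sites)

# Label each slope with the pair of site counts it spans, e.g. "12-13"
names(slopes) <- paste(head(sites, -1), tail(sites, -1), sep = "-")
round(slopes, 2)
```

Indexing by name (slopes[["12-13"]]) then makes it explicit which pair of points a slope refers to.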
In this case, the slope never actually goes below 0.3, so this code for finding the first time that the slope falls below 0.3:
which(slopes<0.3)[1]
returns NA.
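Note that the question's criterion concerns only the last slope, whereas which(slopes < 0.3)[1] finds the first slope below the cutoff anywhere along the curve. A small sketch of both checks, using the rounded slopes printed above and handling the NA case explicitly:

```r
# Slopes as printed above (set.seed(101), 100 permutations), rounded to 2 dp
slopes <- c(1.45, 1.07, 0.93, 0.91, 0.86, 0.66, 0.65, 0.45, 0.54, 0.39, 0.32, 0.31)
threshold <- 0.3

# First position where the slope drops below the threshold (NA if never)
first_below <- which(slopes < threshold)[1]

# The question's criterion: only the slope between the last two points
last_slope_ok <- tail(slopes, 1) < threshold

if (is.na(first_below)) {
  message("No slope falls below ", threshold)
}
last_slope_ok
```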
I'm having a few issues I'd appreciate some help with.
head(new.data)
WSZ_Code Treatment_Code Year Month TTHM CL2_FREE BrO3 Colour PH TURB seasons
1 2 3 1996 1 30.7 0.35 0.5000750 0.75 7.4 0.055 winter
2 6 1 1996 2 24.8 0.25 0.5001375 0.75 6.9 0.200 winter
3 7 4 1996 2 60.4 0.05 0.5001375 0.75 7.1 0.055 winter
4 7 4 1996 2 58.1 0.15 0.5001570 0.75 7.5 0.055 winter
5 7 4 1996 3 62.2 0.20 0.5003881 2.00 7.6 0.055 spring
6 5 2 1996 3 40.3 0.15 0.5003500 2.00 7.7 0.055 spring
library(nlme)
> mod3 <- lme(TTHM ~ CL2_FREE, random= ~ 1| Treatment_Code/WSZ_Code, data=new.data, method ="ML")
> mod3
Linear mixed-effects model fit by maximum likelihood
Data: new.data
Log-likelihood: -1401.529
Fixed: TTHM ~ CL2_FREE
(Intercept) CL2_FREE
54.45240 -40.15033
Random effects:
Formula: ~1 | Treatment_Code
(Intercept)
StdDev: 0.004156934
Formula: ~1 | WSZ_Code %in% Treatment_Code
(Intercept) Residual
StdDev: 10.90637 13.52372
Number of Observations: 345
Number of Groups:
Treatment_Code WSZ_Code %in% Treatment_Code
4 8
> plot(augPred(mod3))
Error in plot(augPred(mod3)) :
error in evaluating the argument 'x' in selecting a method for function 'plot': Error in sprintf(gettext(fmt, domain = domain), ...) :
invalid type of argument[1]: 'symbol'
I'm not sure why I get this error. The ranef plot seems OK:
plot(ranef(mod3))
But that only gives the values of the random intercepts, not TTHM predictions.
I'm looking for a way to plot the predictions as in a typical augPred() plot, showing the random effects for each zone. Hope that makes sense.
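You need a groupedData object to use augPred() without a primary argument. The cryptic sprintf() error appears to be augPred() failing while formatting its own message about exactly this requirement. I hope this helps.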
Best wishes #CSJCampbell
con <- textConnection("
WSZ_Code Treatment_Code Year Month TTHM CL2_FREE BrO3 Colour PH TURB seasons
2 3 1996 1 30.7 0.35 0.5000750 0.75 7.4 0.055 winter
6 1 1996 2 24.8 0.25 0.5001375 0.75 6.9 0.200 winter
7 4 1996 2 60.4 0.05 0.5001375 0.75 7.1 0.055 winter
7 4 1996 2 58.1 0.15 0.5001570 0.75 7.5 0.055 winter
7 4 1996 3 62.2 0.20 0.5003881 2.00 7.6 0.055 spring
5 2 1996 3 40.3 0.15 0.5003500 2.00 7.7 0.055 spring
")
new.data <- read.table(con, header = TRUE)
library(nlme)
new.data.grp <- groupedData(TTHM ~ CL2_FREE | Treatment_Code/WSZ_Code, data = new.data)
mod3 <- lme(TTHM ~ CL2_FREE, random= ~ 1| Treatment_Code/WSZ_Code, data=new.data.grp, method ="ML")
mod3
ap3 <- augPred(mod3)
plot(ap3)
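If converting to groupedData is inconvenient, augPred() also accepts a primary argument (a one-sided formula naming the covariate to predict over), which works with a plain data frame. A sketch using nlme's built-in Orthodont data as a stand-in, since the six example rows are too few for a meaningful fit; for the question's model the analogous call would be something like augPred(mod3, primary = ~CL2_FREE):

```r
library(nlme)

# Stand-in data: Orthodont ships with nlme as a groupedData object;
# as.data.frame() strips that class, leaving a plain data frame
ortho <- as.data.frame(Orthodont)

fit <- lme(distance ~ age, random = ~ 1 | Subject, data = ortho, method = "ML")

# Without primary = ~age this would stop, because ortho is not groupedData
ap <- augPred(fit, primary = ~ age)
plot(ap)
```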
I realize most are probably using ggplot2 and lme4 at this point, but I'm a bit crufty.
Here are a couple of things that I've found working with lists of response variables that are fit using lme().
So, I've been working with a number of response variables that I want to fit to a particular set of inputs. In short, my code looks something like this:
mymodels = list()
for(resp in my_response_vars){
f = as.formula(paste(resp,paste(my_input_vars,collapse='+'),sep='~'))
mymodels[[resp]] = lme(fixed=f,random=~wave|group,method="ML",
data=mydata, na.action=na.exclude)
}
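Base R's reformulate() builds the same formula as the paste()/as.formula() combination without string gluing. A sketch with hypothetical variable names (score, age and dose are placeholders, not from the original data):

```r
# Hypothetical names standing in for my_response_vars / my_input_vars
my_input_vars <- c("age", "dose")
resp <- "score"

# The paste() approach used in the loop above
f1 <- as.formula(paste(resp, paste(my_input_vars, collapse = "+"), sep = "~"))

# Base R's reformulate() builds the same formula directly
f2 <- reformulate(my_input_vars, response = resp)

f1
f2
all.vars(f1)  # response and input names, as used later for subsetting
```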
I've been successful in treating the entries in the resulting list as normal lme() objects. The problem comes when I want to plot predictions via augPred(). Specifically, I get the following error:
Error in tapply(object[[nm]], groups, FUN[["numeric"]], ...) :
arguments must have same length
So, after much searching, I decided to have a look under the hood of augPred() via debug(). Here are some of the insights I came to... I'm not sure that these qualify as bugs or if they would require a patch, but I hope they can help others with similar problems.
When augPred() is called, it looks up the data by the name that was used in the original lme() call and re-evaluates that name via eval() (relative to parent.frame()). I'm not sure whether this resolves in the object's frame or the global environment, but when I change it to data = object$data in the debugger, things work. So, ostensibly, if you have used a subset of these data in your model, augPred() will pull in the full set of data.
This causes problems if one response variable has missing values and the one you are modelling does not. Because everything in the data.frame ends up in an eventual call to gsummary(), missing values in a non-response variable throw a wrench into things.
So, missing values mess things up. My workaround is to make a temporary data.frame with only the columns of interest, then run complete.cases() on it before fitting the lme() model.
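The lookup-by-name behaviour is easy to reproduce with any model that stores its call. A minimal sketch using lm() as an analogue (not augPred() itself): the fitted object keeps the symbol mydata, so re-evaluating it after the name is rebound returns different rows from those that were fitted:

```r
mydata <- data.frame(y = c(1, 3, 2, 5, 4), x = c(1, 2, 3, 4, 5))
fit <- lm(y ~ x, data = mydata)

# The fitted object stores the *name* of its data, not a copy of it
fit$call$data

# Rebind the name to a subset of the rows
mydata <- mydata[1:3, ]

# Re-evaluating the stored argument now finds the 3-row version,
# while the model itself was fitted to 5 rows
nrow(eval(fit$call$data))
length(fitted(fit))
```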
library(nlme)
mymods = list()
for(resp in my_response_vars){
  f = as.formula(paste(resp,paste(my_input_vars,collapse='+'),sep='~'))
  v2keep = unique(c(all.vars(f), 'wave', 'group'))  # model terms plus the random-effect variables
  smdat = mydata[,v2keep]                  # keep only the columns this model uses
  smdat = smdat[complete.cases(smdat),]    # scrub rows with missing values
  tmpmod = lme(fixed=f, random=~wave|group,
               method='ML', data=smdat)
  mymods[[resp]] = tmpmod
  # include augPred() call here, while smdat still holds this model's data
}
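Why scrub on the model's own columns rather than the whole data frame? A toy example with hypothetical data where one response (y1) has a missing value and the modelled response (y2) does not:

```r
# Hypothetical toy data: y1 has a missing value, y2 does not
mydata <- data.frame(
  y1    = c(1.2, NA, 2.3, 3.1),
  y2    = c(0.5, 0.7, 0.9, 1.4),
  x     = c(1, 2, 3, 4),
  wave  = c(1, 2, 1, 2),
  group = c("a", "a", "b", "b")
)

f <- y2 ~ x
keep <- unique(c(all.vars(f), "wave", "group"))

# Subset to this model's columns first, then drop incomplete rows
smdat <- mydata[, keep]
smdat <- smdat[complete.cases(smdat), ]
nrow(smdat)                             # y1's NA does not cost any rows here

# Scrubbing the whole frame would drop a row that y2 ~ x could have used
nrow(mydata[complete.cases(mydata), ])
```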
If you don't pass a primary argument to augPred(), your data.frame must be a groupedData object.
So, if you are running into the "arguments must have same length" error, try subsetting your data first under a different name and explicitly removing rows with missing values before fitting your model.