Looking at my time trend plot, how can I test the statistical significance of the trend shown here, given this simple "years vs rate" ecological data, using R? I tried ANOVA, which returned p < 0.05 when treating the year variable as a factor, but I'm not satisfied with that approach. Also, an article I reviewed suggested Wald statistics to test the time trend, but I haven't found any guiding examples on Google yet.
My data:
> head(yrrace)
year racecat rate outcome pop
1 1995 1 14.2 1585 11170482
2 1995 2 8.7 268 3070363
3 1996 1 14.1 1574 11170482
4 1996 2 7.5 230 3070363
5 1997 1 13.3 1482 11170482
6 1997 2 8.3 254 3070363
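A hedged sketch of the Wald approach on the rows shown above (an assumption on my part, not from the article mentioned in the question): enter year as a continuous covariate in a Poisson model with log(pop) as an offset, so its single coefficient captures the linear trend and the z value reported by summary() is a Wald statistic.

```r
# Poisson trend model: rates are modelled via counts with log(pop) offset.
# The "z value" that summary() reports for `year` is a Wald test of trend.
yrrace <- data.frame(
  year    = c(1995, 1995, 1996, 1996, 1997, 1997),
  racecat = factor(c(1, 2, 1, 2, 1, 2)),
  outcome = c(1585, 268, 1574, 230, 1482, 254),
  pop     = c(11170482, 3070363, 11170482, 3070363, 11170482, 3070363)
)

fit <- glm(outcome ~ year + racecat + offset(log(pop)),
           family = poisson, data = yrrace)
coef(summary(fit))["year", ]   # estimate, std. error, Wald z, p-value
```

With only the six rows shown this is purely illustrative; the full data would be used the same way.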
I was wondering, does anyone know how to print the lmer or lme summary output for a group within a data set in R? For example, if this is what the header of my data (df) looks like:
SubjectID week group weight
        1    1     1   12.5
        1    2     1   10.6
        2    1     3    6.4
        2    2     3    6.3
        3    1     4   23.5
        3    2     4   15.2
And I want to get the specific intercept and slope for the subjects in group 3 only. I would use the lmer function in the code below:
fit.coef <- lmer(weight ~ week*group + (week|SubjectID),
data = df,
control = lmerControl(optimizer = "bobyqa"))
I can get statistics for an individual in the data set, or the intercept and slopes across the entire data set, but I can't figure out how to calculate these specific values for all the subjects within a group (e.g. all subjects within group 3). I know this is easy in SAS, but I can't figure out any way to do it in R despite googling for hours.
You haven't given us a reproducible example, but I think you're looking for coef()?
It gives the predicted effects for each random-effect term, for each level of the grouping variable; in the example below, the intercept and slope for each subject.
library(lme4)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
coef(fm1)
$Subject
(Intercept) Days
308 253.6637 19.6662617
309 211.0064 1.8476053
310 212.4447 5.0184295
330 275.0957 5.6529356
331 273.6654 7.3973743
332 260.4447 10.1951090
333 268.2456 10.2436499
334 244.1725 11.5418676
335 251.0714 -0.2848792
337 286.2956 19.0955511
349 226.1949 11.6407181
350 238.3351 17.0815038
351 255.9830 7.4520239
352 272.2688 14.0032871
369 254.6806 11.3395008
370 225.7921 15.2897709
371 252.2122 9.4791297
372 263.7197 11.7513080
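To get the per-group subset the question asks for, here is one hedged sketch, reusing sleepstudy with a made-up `group` column standing in for the question's grouping variable (in the real data, `df$group` would be used directly):

```r
library(lme4)

# Invent a grouping variable purely for illustration; in the question's
# data this would be the existing `group` column of df.
sleepstudy$group <- as.integer(sleepstudy$Subject) %% 3 + 1

fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

subj_coefs <- coef(fm1)$Subject                     # one row per subject
group3_ids <- unique(sleepstudy$Subject[sleepstudy$group == 3])
subj_coefs[rownames(subj_coefs) %in% group3_ids, ]  # group-3 subjects only
```

The rownames of the coef() data frame are the levels of the grouping factor, which is what makes the subsetting by subject ID work.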
I'm working on forecasting the monthly average precipitation of a geographical region in India (the Assam and Meghalaya subdivision). For this purpose, I'm using monthly average air temperature data and monthly averaged relative humidity data (which I extracted and averaged spatially from the netCDF4 file for this region on the NOAA website) as the independent variables (predictors).
For the forecasting, I want to fit a linear regression with precipitation as the dependent variable and air temperature and relative humidity as the independent variables, such that they have a time-lagged effect in the regression.
The Linear regression equation should look like:
Please follow this link for the equation
Here, "Y" is Precipitation, "X" is Air Temperature and "Z" is Relative Humidity.
The sample "Training data" is as follows:
ID Time Precipitation Air_Temperature Relative_Humidity
1 1 1948-01-01 105 20.31194 81.64137
2 2 1948-02-01 397 21.21052 80.20120
3 3 1948-03-01 594 22.14363 81.94274
4 4 1948-04-01 2653 20.79417 78.89908
5 5 1948-05-01 7058 20.43589 82.99959
6 6 1948-06-01 5328 18.10059 77.91983
7 7 1948-07-01 4882 16.63936 76.25758
8 8 1948-08-01 3979 16.56065 76.89210
9 9 1948-09-01 2625 16.95542 76.80116
10 10 1948-10-01 2578 17.13323 75.62411
And a segment of "Test data" is as follows:
ID Time Precipitation Air_Temperature Relative_Humidity
1 663 2003-03-01 862 21.27210 79.77419
2 664 2003-04-01 1812 20.44042 79.42500
3 665 2003-05-01 1941 19.24267 79.57057
4 666 2003-06-01 4981 18.53784 80.67292
5 667 2003-07-01 4263 17.21581 79.97178
6 668 2003-08-01 2436 16.88686 81.37097
7 669 2003-09-01 2322 16.23134 77.63333
8 670 2003-10-01 2220 17.40589 81.14516
9 671 2003-11-01 131 19.01159 79.15000
10 672 2003-12-01 241 20.86234 79.05847
Any help would be highly appreciated. Thanks!
Reacting to your clarification in the comments, here is one of many ways to produce lagged variables, using the lag function within dplyr (I am also adding a new row here for later forecasting):
library(dplyr)

df %>%
  add_row(ID = 11, Time = "1948-11-01") %>%
  mutate(Air_Temperature_lagged   = dplyr::lag(Air_Temperature, 1),
         Relative_Humidity_lagged = dplyr::lag(Relative_Humidity, 1)) -> df.withlags
You can then fit a straightforward linear regression using lm, with Precipitation as your dependent variable and the lagged versions of the two other variables as the predictors:
precip.model <- lm(data = df.withlags, Precipitation ~ Air_Temperature_lagged + Relative_Humidity_lagged)
You could then apply your coefficients to your most recent values in Air_Temperature and Relative_Humidity to forecast the precipitation for November of 1948 using the predict function.
predict(precip.model, newdata = df.withlags)
1 2 3 4 5 6 7 8 9 10 11
NA 2929.566 3512.551 3236.421 3778.742 2586.012 3473.482 3615.884 3426.378 3534.965 3893.255
The model's prediction is 3893.255.
Note that this model will only allow you to forecast one time period into the future, since you don't have more information in your predictors.
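For completeness, here is a self-contained version of the steps above, retyping only the ten sample training rows from the question (so the fitted coefficients, and hence the forecast, will differ from the numbers shown above, which used the full series):

```r
library(dplyr)

# Sample training rows from the question
df <- data.frame(
  ID   = 1:10,
  Time = as.Date(c("1948-01-01", "1948-02-01", "1948-03-01", "1948-04-01",
                   "1948-05-01", "1948-06-01", "1948-07-01", "1948-08-01",
                   "1948-09-01", "1948-10-01")),
  Precipitation     = c(105, 397, 594, 2653, 7058, 5328, 4882, 3979, 2625, 2578),
  Air_Temperature   = c(20.31194, 21.21052, 22.14363, 20.79417, 20.43589,
                        18.10059, 16.63936, 16.56065, 16.95542, 17.13323),
  Relative_Humidity = c(81.64137, 80.20120, 81.94274, 78.89908, 82.99959,
                        77.91983, 76.25758, 76.89210, 76.80116, 75.62411)
)

# Add the row to forecast, create one-step lags, fit, and predict
df.withlags <- df %>%
  add_row(ID = 11, Time = as.Date("1948-11-01")) %>%
  mutate(Air_Temperature_lagged   = dplyr::lag(Air_Temperature, 1),
         Relative_Humidity_lagged = dplyr::lag(Relative_Humidity, 1))

precip.model <- lm(Precipitation ~ Air_Temperature_lagged + Relative_Humidity_lagged,
                   data = df.withlags)
predict(precip.model, newdata = tail(df.withlags, 1))  # forecast for 1948-11-01
```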
I have count data and need to do a time series analysis using dynamic negative binomial regression, as the data has autocorrelation and overdispersion issues.
I did an online search for any R package that I can use but I was not able to find one.
I would appreciate any help.
An example of my data:
>St1
[1] 17 9 28 7 23 16 17 12 11 16 19 29 5 40 13 27 13 11 10 14 13 23 21 24 9 42 14 22 17 9
>Years
[1] 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006
[23] 2007 2008 2009 2010 2011 2012 2013 2014
>library(AER)
>library(stats)
>rd <- glm(St1 ~ Years, family = poisson)
>dispersiontest(rd)
Overdispersion test
data: rd
z = 2.6479, p-value = 0.00405
alternative hypothesis: true dispersion is greater than 1
sample estimates:
dispersion
4.305539
#Autocorrelation
>Box.test (St1, lag=ceiling(log(length(St1))), type = "Ljung")
Box-Ljung test
data: St1
X-squared = 13.612, df = 4, p-value = 0.008641
So this is basically a request to find a package (and such requests are considered off-topic). So I'm going to see if I can convert it to a question that has a coding flavor. As I said in my comment, trying to use "dynamic" as a search term is often disappointing since everybody seems to want to use the word for a bunch of disconnected purposes. Witness the functions that come up with this search from the console:
install.packages("sos")
sos::findFn(" dynamic negative binomial")
found 20 matches
Downloaded 20 links in 13 packages.
Nothing that appeared useful. But looking at your citation, it appeared that all the models had an autoregression component, so this search:
sos::findFn(" autoregressive negative binomial")
found 28 matches; retrieving 2 pages
2
Downloaded 27 links in 16 packages.
Finds: "Fitting Longitudinal Data with Negative Binomial Marginal..." and "Generalized Linear Autoregressive Moving Average Models with...". So consider this rather my answer to an "implicit question": How to do effective searching from the R console with the sos-package?
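Not something the searches above turned up, but as a hedged baseline while package-hunting: one simple way to get autoregressive behaviour into a negative binomial fit is to include the lagged count as a covariate in MASS::glm.nb. This is a crude stand-in for a true dynamic NB model, not a replacement for one. St1 and Years are retyped from the question.

```r
library(MASS)

# Data from the question
St1   <- c(17, 9, 28, 7, 23, 16, 17, 12, 11, 16, 19, 29, 5, 40, 13, 27,
           13, 11, 10, 14, 13, 23, 21, 24, 9, 42, 14, 22, 17, 9)
Years <- 1985:2014

# Negative binomial regression with a one-year lagged count as predictor;
# the first year is lost to the lag.
d <- data.frame(y = St1[-1], y.lag = St1[-length(St1)], year = Years[-1])
fit.nb <- glm.nb(y ~ year + log(y.lag + 1), data = d)
summary(fit.nb)
```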
I am new to coding as well as posting on forums, but I will do my best to explain the problem and give enough background so that you're able to help me work through it. I have done a lot of searching for solutions to similar errors, but all of the code that produces them, and the format of the data behind it, are very different.
I am working with biological data that consist of various growth categories, but all I am interested in is length (SCL in my code) and age (Age in my code). I have many length and age estimates for each individual through time, and I am fitting a linear nlme model to the juvenile ages and a von Bertalanffy curve to the mature ages. My juvenile model works just fine, and I extracted h (slope of the line) and t (x-intercept). I now need to use those parameters, as well as T (known age at maturity), to fit the mature stage. The mature model will estimate K (my only unknown). I have included a subset of my data for one individual (ID50). This is information for only the mature years, with the h and t from its juvenile fit appended in the rightmost columns.
Subset of my data:
This didn't format very well but I'm not sure how else to display it
Grouped Data: SCL ~ Age | ID
ID SCL Age Sex Location MeanSCL Growth Year Status T h t
50 86.8 27.75 Female VA 86.8 0.2 1994 Mature 27.75 1.807394 -19.83368
50 86.9 28.75 Female VA 87.1 0.4 1995 Mature 27.75 1.807394 -19.83368
50 87.3 29.75 Female VA 87.5 0.5 1996 Mature 27.75 1.807394 -19.83368
50 87.8 30.75 Female VA 88 0.4 1997 Mature 27.75 1.807394 -19.83368
50 88.1 31.75 Female VA 88.1 0 1998 Mature 27.75 1.807394 -19.83368
50 88.1 32.75 Female VA 88.2 0 1999 Mature 27.75 1.807394 -19.83368
50 88.2 33.75 Female VA 88.3 0.2 2000 Mature 27.75 1.807394 -19.83368
50 88.4 34.75 Female VA 88.4 0.1 2001 Mature 27.75 1.807394 -19.83368
50 88.4 35.75 Female VA 88.4 0 2002 Mature 27.75 1.807394 -19.83368
50 88.5 36.75 Female VA 88.5 0 2003 Mature 27.75 1.807394 -19.83368
This is the growth function:
vbBiphasic <- function(Age, h, T, t, K) {
  (h / (exp(K) - 1)) *
    (1 - exp(K * ((T + log(1 - (exp(K) - 1) * (T - t)) / K) - Age)))
}
This is the original growth model that SHOULD have fit:
ID50 refers to my subsetted dataset with only individual 50
VB_mat <- nlme(SCL~vbBiphasic(Age,h,T,t,K),
data = ID50,
fixed = list(K~1),
random = K~1,
start = list(fixed=c(K~.01))
)
However this model produces the error:
Error in pars[, nm] : incorrect number of dimensions
Which tells me that it's trying to estimate a different number of parameters than I have (I think). Originally I was fitting it to all mature individuals (but for the sake of simplicity I'm now trying to fit just one). Here are all of my variations on the model code; ALL of them produced the same error:
- Inputting averaged values of (Age, h, T, t, K) for the whole population instead of the variables.
- Using a subset of 5 individuals, with both (Age, h, T, t, K) and the averaged values of those variables for those individuals.
- Using 5 different individuals separately, with both (Age, h, T, t, K) and their actual values for those variables (all run individually, i.e. 10 different strings of code, in case some worked and others didn't... but none did).
- Telling the model to estimate all parameters, not just K.
- Eliminating all parameters except K.
- Turning all values into vectors (that's what one forum post with a similar error suggested).
Most of these were in an effort to change the number of parameters that R thought it needed to estimate, however none have worked for me.
I'm no expert on nlme and often have similar problems when fitting models, especially when you cannot use nlsList to get started. My guess is that you have 4 parameters in your function (h, T, t, K), but you are only estimating one of them, both as a fixed effect and with a random effect. I believe this constrains the other parameters to zero, which would in effect eliminate them from the model (but you still have them in the model!). Usually you include all the parameters as fixed effects, and then decide how many of them should also have a random effect. So I would include all 4 in the fixed argument and the start argument. Since you have 4 parameters, each one has to be either fixed or random, or both; otherwise, how can they be in the model?
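A further hedged observation: the posted subset has a single individual, so there is no grouping left for a random effect anyway, and a plain nls() fit of K is a useful first check before moving to nlme. The sketch below retypes the ID50 rows from the question and holds h, T, t at the juvenile-fit values shown there; the bound on K is my addition, needed because the log() term requires (exp(K) - 1) * (T - t) < 1.

```r
# ID50 rows and juvenile-fit parameters retyped from the question
ID50 <- data.frame(
  SCL = c(86.8, 86.9, 87.3, 87.8, 88.1, 88.1, 88.2, 88.4, 88.4, 88.5),
  Age = c(27.75, 28.75, 29.75, 30.75, 31.75, 32.75, 33.75, 34.75, 35.75, 36.75)
)
h <- 1.807394; T <- 27.75; t <- -19.83368

vbBiphasic <- function(Age, h, T, t, K) {
  (h / (exp(K) - 1)) *
    (1 - exp(K * ((T + log(1 - (exp(K) - 1) * (T - t)) / K) - Age)))
}

# Bounded fit of the single unknown K; warnOnly avoids a hard stop if
# convergence is marginal on so few points.
fit.K <- nls(SCL ~ vbBiphasic(Age, h, T, t, K), data = ID50,
             start = list(K = 0.01), algorithm = "port",
             lower = c(K = 1e-4), upper = c(K = 0.0207),
             control = nls.control(maxiter = 200, warnOnly = TRUE))
coef(fit.K)
```

With several individuals, the suggestion above translates to declaring all four parameters in nlme, e.g. fixed = h + T + t + K ~ 1 with a matching named start vector, and keeping random = K ~ 1 | ID.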
Apologies for what is likely to be a very basic question; I am very new to R.
I am looking to read values off my augPred plot in order to average them and provide a prediction over a time period.
> head(tthm.groupeddata)
Grouped Data: TTHM ~ Yearmon | WSZ_Code
WSZ_Code Treatment_Code Year Month TTHM CL2_FREE BrO3 Colour PH TURB Yearmon
1 2 3 1996 1 30.7 0.35 0.00030 0.75 7.4 0.055 Jan 1996
2 6 1 1996 2 24.8 0.25 0.00055 0.75 6.9 0.200 Feb 1996
3 7 4 1996 2 60.4 0.05 0.00055 0.75 7.1 0.055 Feb 1996
4 7 4 1996 2 58.1 0.15 NA 0.75 7.5 0.055 Feb 1996
5 7 4 1996 3 62.2 0.20 NA 2.00 7.6 0.055 Mar 1996
6 5 2 1996 3 40.3 0.15 0.00140 2.00 7.7 0.055 Mar 1996
This is my model:
modellme<- lme(TTHM ~ Yearmon, random = ~ 1|WSZ_Code, data=tthm.groupeddata)
and my current plot:
plot(augPred(modellme, order.groups=T),xlab="Date", ylab="TTHM concentration", main="TTHM Concentration with Time for all Water Supply Zones")
I would like a way to read off the graph, perhaps by placing lines at the boundaries of a specific time period within a specific WSZ_Code (my group) and averaging the values between this period...
Of course any other way/help or guidance would be much appreciated!
Thanks in advance
I don't think we can tell whether it is "entirely incorrect", since you have not described the question and have not included any data. (The plotting question is close to being entirely incorrect, though.) I can tell you that the answer is NOT to use abline, since augPred objects are plotted with plot.augPred which returns (and plots) a lattice object. abline is a base graphic function and does not share a coordinate system with the lattice device. Lattice objects are lists that can be modified. Your plot probably had different panels at different levels of WSZ_Code, but the location of the desired lines is entirely unclear especially since you trail off with an ellipsis. You refer to "times" but there is no "times" variable.
There are lattice functions such as trellis.focus and update.trellis that allow one to apply modifications to lattice objects. You would first assign the plot object to a named variable, make mods to it and then plot() it again.
help(package='lattice')
?Lattice
(If this is a rush job, you might be better off making any calculations by hand and using ImageMagick to edit pdf or png output.)
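A minimal sketch of the update() route on a built-in dataset (the real trellis object would come from plot(augPred(...)), and the line position v = 3 here is arbitrary, standing in for the period boundaries):

```r
library(lattice)

# Build a panelled lattice plot, assign it to a variable, then redraw it
# with a panel function that adds a dashed vertical reference line.
p  <- xyplot(Sepal.Length ~ Sepal.Width | Species, data = iris)
p2 <- update(p, panel = function(...) {
  panel.xyplot(...)
  panel.abline(v = 3, lty = 2)   # reference line; position is arbitrary here
})
plot(p2)
```

The same update() call works on the object returned by augPred plotting, since it is an ordinary "trellis" object.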