I ran a multilevel regression and now have a coefficient matrix consisting of a value plus standard error for each group (a factor variable) in the regression. For example, my matrix (for the intercept plus one variable called Beta1) looks like this:
Group Intercept Beta1 Intercept.se Beta1.se
11 0.044357458 0.4381340 0.08358735 0.1572632
12 -0.007072542 0.1242737 0.09317142 0.1643544
21 0.021075871 0.3727055 0.12050036 0.2459456
22 0.023895981 0.6786013 0.11207848 0.3188887
31 -0.115713481 0.3547718 0.09760681 0.1454787
32 -0.004081244 -0.1954594 0.09993201 0.1953406
What I would like to achieve is to draw a diagram showing possible regression lines for each group. I came up with the following code, which draws six lines for each group (coef.mtx is the matrix mentioned above):
for (i in 1:6) { # we have 6 groups
x = coef.mtx[i,]
lines[(i*6-5):(i*6),] =  # six rows per group
list(Group = replicate(6, x["Group"]),
int = replicate(6, rnorm(1, x["Intercept"], x["Intercept.se"])),
slo = replicate(6, rnorm(1, x["Beta1"], x["Beta1.se"])))
}
This produces a dataframe like this:
Group int slo
1 11 0.09484568 0.3005997
2 11 0.12364749 0.5758899
3 11 -0.02942938 0.4821841
4 11 0.17226587 0.2413752
5 11 0.02923023 0.4251419
6 11 0.14650632 0.4541752
7 12 0.06784996 0.0356669
8 12 -0.02832304 0.2214471
...
And then I can draw those lines with ggplot like this:
ggplot(myData, aes(x=Beta1, y=Outcome)) +
geom_jitter() +
facet_wrap(~ Group) +
geom_abline(aes(intercept=int, slope=slo), data=lines)
The final result looks like this:
Is there a better way to transform the coefficient matrix instead of using this loop? I was unable to think of a better way... Alternatively: how would you visualize possible regression lines (and not just the point-estimate)?
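For what it's worth, the manual row-index bookkeeping can be avoided. A sketch (assuming coef.mtx is the matrix shown above; n.draws is the number of sampled lines per group):
# build one small data frame per group and stack them;
# rnorm() is vectorised, so no replicate() is needed
n.draws <- 6
lines <- do.call(rbind, lapply(seq_len(nrow(coef.mtx)), function(i) {
  x <- coef.mtx[i, ]
  data.frame(Group = x[["Group"]],
             int = rnorm(n.draws, x[["Intercept"]], x[["Intercept.se"]]),
             slo = rnorm(n.draws, x[["Beta1"]], x[["Beta1.se"]]))
}))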
I'm trying to calculate point slopes from a series of x,y data. Because some of the x data repeats (...8, 12, 12, 16...), there will be a division-by-zero issue when using slope = (y2-y1)/(x2-x1).
My solution is to create a polynomial regression equation from the data, then plug in a new set of x values (xx) that increases monotonically between the limits of x. This eliminates the problem of equal x data points. As a result, (x) and (xx) have the same limits, but (xx) is always longer.
The problem I am having is that the fitted values for xx are limited to the length of x. When I try to use the polynomial equation with (xx) of length 20, the fitted yy results provide data for the first 10 points and then give NA for the next 10 points. What is wrong here?
x <- c(1,2,2,5,8,12,12,16,17,20)
y <- c(2,4,5,6,8,11,12,15,16,20)
df <- data.frame(x,y)
my_mod <- lm(y ~ poly(x,2,raw=T), data=df) # This creates the polynomial equation
xx <- x[1]:x[length(x)] # Creates monotonically increasing x using boundaries of original x
yy <- fitted(my_mod)[order(xx)]
plot(x,y)
lines(xx,yy)
If you look at
fitted(my_mod)
it outputs:
# 1 2 3 4 5 6 7 8 9 10
#3.241032 3.846112 3.846112 5.831986 8.073808 11.461047 11.461047 15.303305 16.334967 19.600584
Meaning the name of the output matches the position of x, not the value of x, so fitted(my_mod)[order(xx)] doesn't quite make sense.
You want to use predict here:
yy <- predict(my_mod, newdata = data.frame(x = xx))
plot(xx, yy)
# 1 2 3 4 5 6 7 8 9 10
# 3.241032 3.846112 4.479631 5.141589 5.831986 6.550821 7.298095 8.073808 8.877959 9.710550
# 11 12 13 14 15 16 17 18 19 20
# 10.571579 11.461047 12.378953 13.325299 14.300083 15.303305 16.334967 17.395067 18.483606 19.600584
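With yy computed via predict(), the plotting code from the question also works as intended, since xx and yy now have the same length:
plot(x, y)     # original data points
lines(xx, yy)  # fitted polynomial evaluated at xx = 1:20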
I want to do a lagrange multiplier test on a panel dataset of the following type:
UGA Date Sales Nb_AM Nb_BX ......
A 01/2017 1 4 14
A 02/2017 8 5 17
A 03/2017 26 2 24
B 01/2017 3 3 35
B 02/2017 5 10 42
B 03/2017 8 24 2
I want to use the following command: lm.LMtests()
However, according to the R documentation, I need to pass an argument of type "listw" to lm.LMtests, but I have no idea what to use in my case. Could you help me?
For the moment my code is the following:
fusion2 <- read_excel("C:/Users/david/OneDrive/Bureau/Master data/Mémoire data analyst/Bases de données/Fusion/fusion.xlsx")
modeleam <- Sales ~ Nb_AM + Nb_BX +
Total_PdS_sensibilisés_aux_événement_AM + Mails_AM_ouvert +
Mails_AM_non_ouvert + Total_PdS_sensibilisés_aux_RP_AM +
Total_PdS_sensibilisés_aux_Staff_AM + Total_PdS_sensibilisés_aux_Congrés_AM +
Total_PdS_sensibilisés_aux_Opportunités_AM
mcoam <-lm(modeleam, data=fusion2)
lagrangeam <- lm.LMtests(mcoam, ,test="all")
Thanks in advance
Once you're into the subject matter, it's pretty basic. This test was created for spatial statistics, and a listw object is nothing more than a description of neighbour dependence, that is, how strongly one value could potentially be influenced by neighbouring values.
For that you need, for example, a simple-features object with the geometries of a landscape or a city, so that you can assign each value to a specific polygon. From this pattern you can create a neighbourhood and then neighbourhood weights (the listw object).
Small tutorial:
library(spdep); library(sf)
#Get your data
shape_and_data <- st_read("your/shape")
#Create your neighbourhood, nb-object
data_nb <- poly2nb(shape_and_data)
#Create the neighbour weights, listw-object
data_listw <- nb2listw(data_nb)
#Calculate
lm.LMtests(lm(...), listw = data_listw, test = "all")
This is a really basic example. For creating the neighbourhood (nb-object) you can choose different methods and for the weights (listw) there are also several methods.
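For example, a k-nearest-neighbour sketch (my assumptions, not part of the answer above: the units are polygons whose centroids are a reasonable stand-in for location, and k = 4 is an arbitrary choice):
library(spdep); library(sf)
#Distance-based neighbourhood instead of polygon contiguity
coords <- st_centroid(st_geometry(shape_and_data))
data_nb2 <- knn2nb(knearneigh(coords, k = 4))  #every unit gets 4 neighbours
data_listw2 <- nb2listw(data_nb2, style = "W") #row-standardised weights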
Hope it helped a bit,
Loubert
I'm trying to translate my SAS code for a random-effects ANOVA to R.
Here is my code:
proc glm data=A;
class group;
model common = group;
random group;
run;
'group' is group membership, and 'common' is the dependent variable.
Please translate this code into R.
my data looks like this:
id common group
1 4 A
2 2 A
3 3 A
4 2 B
5 2 B
6 3 C
7 4 C
8 3 C
I think you are looking for lme and the code can be written as:
library(nlme)
#let's say A (a dataframe) has the sample data
A$group <- as.factor(A$group)
#model: a random-effects ANOVA, i.e. an intercept-only fixed part
#plus a random intercept for each group
fit <- lme(common ~ 1, random = ~ 1 | group, data = A, na.action = na.omit)
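If you prefer lme4 (my addition, not part of the original answer), an equivalent model would be:
library(lme4)
#intercept-only fixed part with a random intercept per group
fit2 <- lmer(common ~ 1 + (1 | group), data = A)
summary(fit2) #reports the group and residual variance components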
This question is related to: Selecting Percentile curves using gamlss::lms in R
I can get centile curve from following data and code:
age = sample(5:15, 500, replace=T)
yvar = rnorm(500, age, 20)
mydata = data.frame(age, yvar)
head(mydata)
age yvar
1 12 13.12974
2 14 -18.97290
3 10 42.11045
4 12 27.89088
5 11 48.03861
6 5 24.68591
h = lms(yvar, age, data=mydata, n.cyc=30)
centiles(h,xvar=mydata$age, cent=c(90), points=FALSE)
How can I now get the yvar value on the curve for each x value (5:15), i.e. the 90th percentile for the data after smoothing?
I tried to read the help pages and found fitted(h) and fv(h), which give fitted values for the entire data. But how do I get the value of the 90th-centile curve at each age level? Thanks for your help.
Edit: The following figure shows what I need:
I tried the following, but it is not correct, as the values come out wrong:
mydata$fitted = fitted(h)
aggregate(fitted~age, mydata, function(x) quantile(x,.9))
age fitted
1 5 6.459680
2 6 6.280579
3 7 6.290599
4 8 6.556999
5 9 7.048602
6 10 7.817276
7 11 8.931219
8 12 10.388048
9 13 12.138104
10 14 14.106250
11 15 16.125688
The values are very different from 90th quantile directly from data:
> aggregate(yvar~age, mydata, function(x) quantile(x,.9))
age yvar
1 5 39.22938
2 6 35.69294
3 7 25.40390
4 8 26.20388
5 9 29.07670
6 10 32.43151
7 11 24.96861
8 12 37.98292
9 13 28.28686
10 14 43.33678
11 15 44.46269
See if this makes sense. The 90th percentile of a normal distribution with mean 'smn' and sd 'ssd' is qnorm(.9, smn, ssd), so this seems to deliver (somewhat) sensible results, albeit not the full hack of centiles that I suggested:
plot(h$xvar, qnorm(.9, fitted(h), h$sigma.fv))
(Note the massive overplotting from only a few distinct xvars but 500 points. And you may want to set the ylim so that the full range can be appreciated.)
The caveat here is that you need to check the other parts of the model to see if it is really just an ordinary Normal model. In this case it seems to be:
> h$mu.formula
y ~ pb(x)
<environment: 0x10275cfb8>
> h$sigma.formula
~1
<environment: 0x10275cfb8>
> h$nu.formula
NULL
> h$tau.formula
NULL
So the model is just a mean estimate with a fixed variance (the ~1) across the range of the xvar, and there are no complications from higher-order parameters like a Box-Cox model. (And I'm unable to explain why this is not the same as the plotted centiles. For that you probably need to correspond with the package authors.)
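A possibly more direct route (my suggestion, not something the answer above verified) is gamlss's centiles.pred(), which evaluates a chosen centile curve at new x values. Note from the mu.formula output above that lms() stores the predictor internally as x, so xname = "x" may be needed:
# 90th-centile values of the fitted curve at ages 5 through 15
centiles.pred(h, xname = "x", xvalues = 5:15, cent = 90)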
I've conducted a psychometric test on some subjects, and I'm trying to create a multivariate probit model.
The test was conducted as follows:
Subject 1 was given a certain stimulus under 11 different conditions, 10 times for each condition. Answers (correct = 1, incorrect = 0) were recorded.
So for subject 1, I have the following table of results:
# Subj 1
correct
cnt 1 0
1 0 10
2 0 10
3 1 9
4 5 5
5 7 3
6 10 0
7 10 0
8 10 0
9 9 1
10 10 0
11 10 0
This means that Subj1 answered incorrectly all 10 times under conditions 1 and 2, and correctly all 10 times under conditions 10 and 11. For the other conditions, the number of correct responses increases from condition 3 to condition 9.
I hope I was clear.
I usually analyze the data using the following code:
prob.glm <- glm(resp.mat1 ~ cnt, family = binomial(link = "probit"))
Here resp.mat1 is the response table, while cnt is the contrast c(1:11). So I'm able to draw the sigmoid curve using the predict() function. The graph for subject 1 is the following.
Now suppose I've conducted the same test on 20 subjects. I have now 20 tables, organized like the first one.
What I want to do is to compare subgroups, for example: male vs. female; young vs. older and so on. But I want to keep the inter-individual variability, so simply "adding" the 20 tables will be wrong.
How can I organize the data in order to use the glm() function?
I want to be able to write a command like:
prob.glm <- glm(resp.matTOT ~ cnt + sex, family = binomial(link = "probit"))
And then graphing the curve for sex=M, and sex=F.
I tried using the rbind() function to create a single table, then adding columns for Subj (1 to 20), Sex, and Age. But that seems like a bad solution to me, so any alternatives would be really appreciated.
Looks like you are using the wrong function for the job. Check the first example of glmer in package lme4; it comes quite close to what you want. herd should be replaced by the subject number, but make sure that you do something like
mydata$subject = as.factor(mydata$subject)
when you have numerical subject numbers.
# Stolen from lme4
library(lattice)
library(lme4)
xyplot(incidence/size ~ period|herd, cbpp, type=c('g','p','l'),
layout=c(3,5), index.cond = function(x,y)max(y))
(gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
data = cbpp, family = binomial))
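Adapted to the question's setup (a sketch; alldata, correct, incorrect, cnt, sex, and subject are hypothetical names for the stacked table the asker describes, and the probit link is carried over from the original glm call):
library(lme4)
# one row per subject x condition, with counts of correct/incorrect answers
probit.glmer <- glmer(cbind(correct, incorrect) ~ cnt + sex + (1 | subject),
                      data = alldata, family = binomial(link = "probit"))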
There's a multivariate probit command in the mlogit library of all things. You can see an example of the data structure required here:
https://stats.stackexchange.com/questions/28776/multinomial-probit-for-varying-choice-set