I am trying to run a regression in R, using the plm package, on an unbalanced, relatively large data set (111738x66). However, the output displays only the coefficients; the standard errors, p-values, etc. are missing.
Further, there are 21 variables in the regression.
This is essentially the code that I am using:
year1logyearC <- plm(formula = eddata.lfsato ~ eddata.sex + eddata.agesq + eddata.wav1 + eddata.wav2 + eddata.wav3 + eddata.wav4 + eddata.wav5 + eddata.wav6 + eddata.wav7 + eddata.wav8 + eddata.wav9 + eddata.wav10 + eddata.wav11 + eddata.wav12 + eddata.wav13 + eddata.wav14 + eddata.wav15 + eddata.wav16 + eddata.wav17 + eddata.eth1 + eddata.eth2 + cycle + eddata.logfiyr + cycle:eddata.logfiyr, model = "pooling", data = PMCombFinNewRef1)
Is there a way to get the full output displayed?
Further, if I pass the output to the following commands
kable(tidy(year1logyearC), digits=5,
caption="Pooled model")
I get a nice table with most of the things I need (apart from the degrees of freedom and a few other things).
Any reasons why that might be the case?
I suspect it has more to do with preferences or some other setting.
I know that I should add a reproducible example, but it is really difficult: not only is the data sensitive, but there are also 200 lines of code before this point. (Yes, I have tested those 200 lines and they work just fine! I ran a bunch of regressions using lm, and I get the complete output with lm.)
Using summary() I am able to get the results I need:
summary(year1logyearC)
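As a hedged illustration of the difference: printing a fitted plm object shows only the coefficients, while summary() adds standard errors, test statistics and p-values. The sketch below assumes the plm package is installed and uses its bundled Grunfeld data as a stand-in for the sensitive data set; the formula and object names are illustrative only.

```r
library(plm)

# Grunfeld ships with plm; its first two columns (firm, year) serve as the panel index.
data("Grunfeld", package = "plm")

# Pooled OLS. Note that the argument is `model`, not `method`.
pooled <- plm(inv ~ value + capital, data = Grunfeld, model = "pooling")

# Printing the fitted object shows the call and the coefficients only.
print(pooled)

# summary() adds standard errors, test statistics and p-values.
pooled_summary <- summary(pooled)
print(pooled_summary)

# The coefficient table can also be extracted as a matrix for further use.
coef_table <- pooled_summary$coefficients
```

The extracted matrix can then be passed to a table formatter if a compact report is wanted.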
I am unable to create a list of constraints in Gurobi with a list comprehension.
I got this to work:
m.addConstr(0_decision[0] + 1_decision[0] + 2_decision[0] + 3_decision[0] + 4_decision[0] + 5_decision[0] + 6_decision[0] , GRB.EQUAL, test_list[0])
However, I have too much data to enter manually, and substituting i for the numbers does not work:
m.addConstrs([0_decision[i] + 1_decision[i] + 2_decision[i] + 3_decision[i] + 4_decision[i] + 5_decision[i] + 6_decision[i] for i in range(500)], GRB.EQUAL, test_list[i])
I get this error:
TypeError: addConstrs() takes at most 3 positional arguments (4 given)
I tried different forms of comprehension without success. Adding the constraints one at a time works, but I have too many to do that by hand hundreds of times.
Use a generator expression across the entire expression:
m.addConstrs(0_decision[i] + 1_decision[i] + 2_decision[i] +
3_decision[i] + 4_decision[i] + 5_decision[i] +
6_decision[i] == test_list[i] for i in range(500))
Better yet, refactor the *_decision variables using Model.addVars() so that you can write:
m.addConstrs(decision.sum('*',i) == test_list[i] for i in range(500))
Using the mammal dataset in R, I am trying to fit a hierarchical model with the danger variable. However, with the code below I get a jagged line rather than a linear one. It seems that the model is not generalising very well: it is fitting the data too closely, and I need a linear relationship here. Does anyone know how to solve this? The code is below.
FYI, body and gestation have already been log-transformed here, and the model won't work without this.
library(lme4)
library(ggplot2)
# random intercept model (the intercept varies by danger)
hie3 = lmer(dream ~ body + gestation + (1|danger), data = mammal)
summary(hie3)
mammal$hie3_predictions = predict(hie3)
hie_plot3 = ggplot(aes(x = body + gestation, y = hie3_predictions,
color = danger), data = mammal) +
geom_line(size=1) +
geom_point(aes(y = dream))
hie_plot3
The data set used is available in R as standard.
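One possible reason for the jagged line, offered as a sketch rather than a diagnosis: the plot maps x to body + gestation, but the prediction depends on each predictor through its own coefficient, so two animals with the same sum can have different predicted values. Predicting over a grid that varies one predictor and holds the other fixed gives a straight line per group. The data below is simulated (an assumption, since the mammal data is not reproduced here), and the names `d`, `grid` and `line_plot` are hypothetical.

```r
library(lme4)
library(ggplot2)

set.seed(1)

# Simulated stand-in for the mammal data: two predictors and a 5-level grouping factor.
d <- data.frame(
  body      = rnorm(60),
  gestation = rnorm(60),
  danger    = factor(rep(1:5, each = 12))
)
group_effect <- rnorm(5)
d$dream <- 2 - 0.3 * d$body - 0.2 * d$gestation +
  group_effect[as.integer(d$danger)] + rnorm(60, sd = 0.3)

# Random intercept model, same structure as in the question.
fit <- lmer(dream ~ body + gestation + (1 | danger), data = d)

# Prediction grid: vary body, hold gestation at its mean, one set of rows per group.
grid <- expand.grid(
  body      = seq(min(d$body), max(d$body), length.out = 20),
  gestation = mean(d$gestation),
  danger    = levels(d$danger)
)
grid$pred <- predict(fit, newdata = grid)

# One straight, parallel line per danger level, with the raw points on top.
line_plot <- ggplot(grid, aes(x = body, y = pred, colour = danger)) +
  geom_line() +
  geom_point(aes(y = dream), data = d)
```

Printing `line_plot` draws the figure. The same idea applies with gestation on the x axis and body held fixed.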
I would like to get the marginal means of the groups using the get_contrasts() function in the psycho package.
In the psycho blog (https://neuropsychology.github.io/psycho.R/2018/05/01/repeated_measure_anovas.html), they show how to get the differences between groups using get_contrasts. But I could not get the marginal means of the groups as they show (see image below).
I tried:
# same results between them
get_contrasts(fit, "Emotion_Condition")
get_contrasts(fit)
# quite obvious it won't work, but at least I tried
get_contrasts(fit, "Emotion_Condition*Subjective_Valence")
# print results
print(results$contrasts) # only shows two levels of the factor
print(results) # without `$contrasts`; shows the same outcome as theirs
# Here is the code that they provide
library(psycho)
library(tidyverse)
df <- psycho::emotion %>%
select(Participant_ID,
Participant_Sex,
Emotion_Condition,
Subjective_Valence,
Recall)
library(lmerTest)
fit <- lmer(Subjective_Valence ~ Emotion_Condition + (1|Participant_ID), data=df)
anova(fit)
results <- get_contrasts(fit, "Emotion_Condition")
print(results$contrasts)
# ERROR! I can't find the means (marginal means of the groups), so it is not possible to plot.
# How can I get the means from results here?
ggplot(results$means, aes(x=Emotion_Condition, y=Mean, group=1)) +
geom_line() +
geom_pointrange(aes(ymin=CI_lower, ymax=CI_higher)) +
ylab("Subjective Valence") +
xlab("Emotion Condition") +
theme_bw()
It would be much appreciated if anyone could help me with this.
Use get_means to get the estimated marginal means.
I have a dataset comprising the following variables (fruit, prices, country, organic/non-organic, location).
I would like a plot like the one here but with one thing added - a line of best fit that runs through the points for each grouping of organic/non-organic, location, and fruit.
plot -> https://dl.dropboxusercontent.com/u/3803117/stackoverflow.jpeg
For example, in the "Organic, City" square, I would like 4 lines of best fit, one centered on each of Apples, Bananas, Cherries, Dates, etc.
Here's the code I used to generate the plot.
p <- ggplot(data,aes(factor(fruit),price)) +
geom_violin(aes(fill=Country),trim=FALSE) +
geom_boxplot(aes(fill=Country),position=position_dodge(0.9),width=.1) +
geom_jitter(alpha=0.5) +
facet_wrap(organic~location) +
xlab("Fruit") +
ylab("Price") +
labs(fill="Country")
Here's a sample dataset if it might help -> https://dl.dropboxusercontent.com/u/3803117/stackoverflow.csv
Thanks in advance so much for all the help!
Doesn't the geom_abline documentation specify exactly what you are looking for? See the part "# Slopes and intercepts from linear model"
http://docs.ggplot2.org/current/geom_abline.html
EDIT: Just checked and realized that there are no examples without the SE bands, but you can easily disable them by setting se=FALSE:
p <- qplot(wt, mpg, data = mtcars)
p <- p + geom_smooth(aes(group=cyl), method="lm", se=FALSE)
p <- p + facet_grid(cyl~.)
print(p)
If you provided a sample dataset it would be even easier to help you.
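For readers on recent ggplot2 releases, where qplot() is deprecated, the same idea can be sketched with ggplot() directly. This is an equivalent under the assumption that the built-in mtcars data is an acceptable stand-in; the object name `fit_lines` is hypothetical.

```r
library(ggplot2)

# One linear fit per cylinder group, without the standard-error band,
# with each group drawn in its own facet row.
fit_lines <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(aes(group = cyl), method = "lm", formula = y ~ x, se = FALSE) +
  facet_grid(cyl ~ .)
```

Printing `fit_lines` draws the plot.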
EDIT2:
The following might more closely resemble what the OP envisioned. However, I hasten to add that it is not meaningful, as the ordering of country (or fruit, or type, or anything) typically cannot be used to formulate a useful linear relationship:
p <- ggplot(data,aes(factor(country),price)) +
geom_violin(aes(fill=country),trim=FALSE) +
geom_boxplot(aes(fill=country),position=position_dodge(0.9),width=.1) +
geom_jitter(alpha=0.5) +
facet_wrap(organic~location+fruit) +
xlab("Country") +
ylab("Price") +
labs(fill="country")
p <- p + geom_smooth(aes(group=1,color=country), method="lm", se=FALSE)
p
I'm using R and ggplot2 to analyze some statistics from basketball games. I'm new to R and ggplot, and I like the results I'm getting, given my limited experience. But as I go along, I find that my code gets repetitive, which I dislike.
I created several plots similar to this one:
Code:
efgPlot <- ggplot(gmStats, aes(EFGpct, Nrtg)) +
stat_smooth(method = "lm") +
geom_point(aes(colour=plg_ShortName, shape=plg_ShortName)) +
scale_shape_manual(values=as.numeric(gmStats$plg_ShortName))
The only difference between the plots is the x-value; the next plot would be:
orPlot <- ggplot(gmStats, aes(ORpct, Nrtg)) +
stat_smooth(method = "lm") + ... # from here all is the same
How could I refactor this, such that I could do something like:
efgPlot <- getPlot(gmStats, EFGpct, Nrtg)
orPlot <- getPlot(gmStats, ORpct, Nrtg)
Update
I think my way of refactoring this isn't really "R-ish" (or ggplot-ish if you will); based on baptiste's comment below, I solved this without refactoring anything into a function; see my answer below.
The key to this sort of thing is using aes_string rather than aes (untested, of course):
getPlot <- function(data,xvar,yvar){
p <- ggplot(data, aes_string(x = xvar, y = yvar)) +
stat_smooth(method = "lm") +
geom_point(aes(colour=plg_ShortName, shape=plg_ShortName)) +
scale_shape_manual(values=as.numeric(data$plg_ShortName))
print(p)
invisible(p)
}
aes_string allows you to pass variable names as strings, rather than expressions, which is more convenient when writing functions. Of course, you may not want to hard code the color and shape scales, in which case you could use aes_string again for those.
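In newer ggplot2 releases aes_string has been deprecated, and the same pattern is usually written with the .data pronoun inside aes(). The following is a minimal sketch of that variant, assuming a ggplot2 version that supports .data (3.0.0 or later); `get_plot` is a hypothetical name and mtcars stands in for gmStats.

```r
library(ggplot2)

# Column names arrive as strings and are looked up in the plot data via .data[[...]].
get_plot <- function(data, xvar, yvar) {
  ggplot(data, aes(x = .data[[xvar]], y = .data[[yvar]])) +
    geom_point() +
    stat_smooth(method = "lm", formula = y ~ x) +
    labs(x = xvar, y = yvar)
}

# Usage: only the x variable changes between the two plots.
wt_plot <- get_plot(mtcars, "wt", "mpg")
hp_plot <- get_plot(mtcars, "hp", "mpg")
```

Returning the plot object rather than printing it inside the function leaves the caller free to add further layers or scales.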
Although Joran's answer helped me a lot (and it accurately answers my question), I eventually solved this according to baptiste's suggestion:
# get the variables I need from the stats data frame:
forPlot <- gmStats[c("wed_ID","Nrtg","EFGpct","ORpct","TOpct","FTTpct",
"plg_ShortName","Home")]
# melt to long format (melt() comes from the reshape2 package):
forPlot.m <- melt(forPlot, id=c("wed_ID", "plg_ShortName", "Home","Nrtg"))
# use facet_wrap to create 4 plots:
p <- ggplot(forPlot.m, aes(value, Nrtg)) +
geom_point(aes(shape=plg_ShortName, colour=plg_ShortName)) +
scale_shape_manual(values=as.numeric(forPlot.m$plg_ShortName)) +
stat_smooth(method="lm") +
facet_wrap(~variable,scales="free")
Which gives me:
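The same reshape-then-facet approach can be sketched on a built-in data set so that it runs without gmStats. This version swaps in tidyr's pivot_longer() for melt(), which is an assumption about available packages rather than what was used above; the column choices and object names are illustrative only.

```r
library(tidyr)
library(ggplot2)

# Long format: one row per (car, predictor) pair, keeping mpg as the common response.
long <- pivot_longer(
  mtcars,
  cols      = c(wt, hp, disp),
  names_to  = "variable",
  values_to = "value"
)

# One panel per predictor, each with its own x scale and a linear fit.
facet_plot <- ggplot(long, aes(x = value, y = mpg)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x) +
  facet_wrap(~variable, scales = "free")
```

Adding another predictor then only means adding its name to `cols`.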