I have two correlation results and would like to visualize them in one plot using geom_smooth(). So far I have working code with one regression line, but I don't know how to add a second regression line to the same plot. This is my code so far:
cor(opg6wave1$godimportant, opg6wave1$aj, use = 'complete.obs') # -0.309117
cor(opg6wave6$godimportant, opg6wave6$aj, use = 'complete.obs') # -0.4321519
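library(ggplot2)
ggplot(opg6wave1, aes(x= godimportant, y= aj))+
geom_smooth()+
# Norwegian labels: "Religion and abortion over time", "Religiosity", "Attitude towards abortion"
labs(title = "Religion og abort over tid", x='Religiøsitet', y= 'Holdning til abort')+
theme_classic()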
Thank y'all:)
I don't have access to your dataset (you might want to share it), so I'm using the diamonds dataset, which ships with ggplot2 and is loaded by tidyverse. When you put a dataset in the ggplot(...) call, it is inherited by every geom_*() layer underneath. Here you want to specify the data for each regression line separately, so we can add two geom_smooth() layers, each with its own data argument.
library(tidyverse)
ggplot()+
geom_smooth(data = diamonds %>% filter(color=="E"),
mapping=aes(x=depth, y=price))+
geom_smooth(data = diamonds %>% filter(color=="J"),
mapping=aes(x=depth, y=price)) +
theme_classic()
The same plot with a linear-model smooth (method = lm):
ggplot()+
geom_smooth(data = diamonds %>% filter(color=="E"),
mapping=aes(x=depth, y=price),
method=lm)+
geom_smooth(data = diamonds %>% filter(color=="J"),
mapping=aes(x=depth, y=price),
method=lm) +
theme_classic()
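If, as in the question, the two waves live in two separate data frames, another option is to stack them into one data frame with a column recording the source. You can then map that column to colour, and ggplot2 draws one regression line per wave and adds a legend. This is a minimal sketch that assumes the waves look like opg6wave1/opg6wave6 with columns godimportant and aj. The data below is simulated stand-in data, not the real survey.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
# Hypothetical stand-ins for opg6wave1 and opg6wave6 (simulated, not real data)
wave1 <- data.frame(godimportant = sample(1:10, 200, replace = TRUE))
wave1$aj <- 7 - 0.2 * wave1$godimportant + rnorm(200)
wave6 <- data.frame(godimportant = sample(1:10, 200, replace = TRUE))
wave6$aj <- 8 - 0.4 * wave6$godimportant + rnorm(200)

# Stack the two data frames; .id adds a "wave" column naming each source
both <- bind_rows(wave1 = wave1, wave6 = wave6, .id = "wave")

# Mapping wave to colour gives one linear fit per wave, plus a legend
p <- ggplot(both, aes(x = godimportant, y = aj, colour = wave)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  theme_classic()
p
```

With the real data, you would pass opg6wave1 and opg6wave6 to bind_rows() instead and keep the labs() call from the question.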
I want to display two histograms of scores on a survey, one for female and one for male respondents, and add vertical lines showing their respective means. My problem is that with my first approach, both means are displayed in each facet. So I looked up solutions, and what I have found so far is that I should create a second data frame containing the means. I reused the code I originally wrote to calculate the means in the first place:
demo_joined%>%
filter(!is.na(overall_score))%>%
group_by(gender)%>%
summarize(mean_score = mean(overall_score)) -> means
Plugging this into the rest of my code for the plot looked like this for me:
demo_joined%>%
filter(!is.na(overall_score))%>%
ggplot + geom_histogram(aes(overall_score)) + facet_grid(.~gender) +
geom_vline(data = means, xintercept = mean_score)
However, geom_vline() doesn't seem to understand that I want it to look at my means data frame. The error message is:
Error in new_data_frame(list(xintercept = xintercept)) :
object 'mean_score' not found
In addition: Warning message:
geom_vline(): Ignoring `data` because `xintercept` was provided.
Thanks a lot in advance for any help
Okay, sorry, this is my first Stack Overflow post and I had already written everything into the other section. I hope this is fine.
Using an additional data frame for the means is not advised, as it complicates things unnecessarily. Instead, you can calculate the mean with group_by() and mutate(), so that every observation also carries its group mean. Then you can plot the mean lines within the same command. I made an example with the iris dataset; this should work for you as well.
library(dplyr)
library(ggplot2)
iris %>%
group_by(Species) %>%
mutate(mean_length = mean(Sepal.Length)) %>% # each row gets its species' mean
ungroup() %>%
ggplot(aes(x=Sepal.Length))+
geom_histogram(bins = 30)+
geom_vline(aes(xintercept = mean_length), color="red")+
facet_wrap(~Species)
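Here is a version of what Nick made that computes the means with base R's aggregate() and passes them to geom_vline() as a separate data frame: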
library(ggplot2)
species.means <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
ggplot(iris, aes(x = Sepal.Length)) +
theme_bw() +
geom_histogram() +
geom_vline(data = species.means, mapping = aes(xintercept = Sepal.Length), color = "red") +
facet_wrap(~ Species)
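The error in the question comes from passing xintercept = mean_score as a fixed parameter outside aes(). In that case ggplot2 looks for an object called mean_score in your workspace rather than a column of means, and it ignores data, which is what the warning says. Moving xintercept inside aes() makes the separate means data frame work, and because means has a gender column, each line lands only in its own facet. This is a minimal sketch with simulated data standing in for demo_joined. The column names gender and overall_score come from the question, but the values are made up.

```r
library(dplyr)
library(ggplot2)

set.seed(42)
# Simulated stand-in for demo_joined (made-up values)
demo_joined <- data.frame(
  gender = rep(c("female", "male"), each = 100),
  overall_score = c(rnorm(100, mean = 60, sd = 10), rnorm(100, mean = 55, sd = 10))
)

means <- demo_joined %>%
  filter(!is.na(overall_score)) %>%
  group_by(gender) %>%
  summarize(mean_score = mean(overall_score))

p <- demo_joined %>%
  filter(!is.na(overall_score)) %>%
  ggplot() +
  geom_histogram(aes(overall_score), bins = 30) +
  facet_grid(. ~ gender) +
  # Inside aes(), mean_score is looked up as a column of `means`;
  # its gender column assigns each line to the matching facet
  geom_vline(data = means, aes(xintercept = mean_score), colour = "red")
p
```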
I am struggling to get significance values for my replicate experiment data. The experiment was done in duplicate for each species, and I want to test whether the values differ significantly between species at each time point. I am trying to do a two-way ANOVA...
library(ggplot2)
library(reshape)
library(dplyr)
library(tidyr) # separate() comes from tidyr
abs2.melt<-melt(abs2,
id.vars='Time',
measure.vars=c('WT','WT.1','DsigB','DsigB.1','DrsbR','DrsbR.1'))
print(abs2.melt)
abs2.melt.mod<-abs2.melt %>%
separate(col=variable,into=c('Species'),sep='\\.')
print(abs2.melt.mod)
ggplot(abs2.melt.mod,aes(x=Time,y=value,group=Species))+
stat_summary(
fun =mean,
geom="line",
aes(color=Species))+
stat_summary(
fun=mean,
geom="point")+
stat_summary(
fun.data=mean_cl_boot,
geom='errorbar',
width=2)+
theme_bw()+
xlab("Time")+
ylab("OD600")+
labs(title="Growth Curve of Mutant Strains")
summary(abs2.melt.mod)
print(abs2.melt.mod)
###SD and mean values
as.data.frame<-abs2.melt.mod %>% group_by(Species,Time) %>%
summarize(mean.val=mean(value), sd.val=sd(value))
anova1<-aov(value~Species,data=abs2.melt.mod)
##statistical significance?
print(as.data.frame)
anova1<-aov(Time~Species+value,data=abs2.melt.mod)
summary(anova1)
Simulate something that looks like your data:
set.seed(111)
df = expand.grid(rep=1:3,Time=1:5,Species=letters[1:3])
df$value = 0.5*df$Time + rnorm(nrow(df))
df$Time = factor(df$Time)
Then we plot, allowing comparison for each time point:
library(ggplot2)
# mean_sdl (mean +/- 2 SD by default) wraps Hmisc::smean.sdl, so Hmisc must be installed
ggplot(df,aes(x=Time,y=value,col=Species)) +
stat_summary(fun.data="mean_sdl",position=position_dodge(width=0.5))
Or with error bars, which I think look worse:
ggplot(df,aes(x=Time,y=value,col=Species))+
stat_summary(fun.data="mean_sdl",position=position_dodge(width=0.5),
geom="errorbar",width=0.4)
Since you only have a few data points per group, there is little point in a boxplot, so you can try something like the above.
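The plots above don't cover the two-way ANOVA the question asks about. Here is a minimal sketch on the same simulated df. Time is treated as a factor so that the model tests Species, Time, and their interaction. The per-time-point one-way ANOVAs with a Holm adjustment are just one possible way to compare species at each time point; post hoc tests such as TukeyHSD() are another. With only duplicates per species, as in the real data, expect low power.

```r
set.seed(111)
# Same simulated layout as above: 3 replicates x 5 time points x 3 species
df <- expand.grid(rep = 1:3, Time = 1:5, Species = letters[1:3])
df$value <- 0.5 * df$Time + rnorm(nrow(df))
df$Time <- factor(df$Time)

# Two-way ANOVA with interaction: does the Species effect depend on Time?
fit <- aov(value ~ Species * Time, data = df)
anova_tab <- summary(fit)[[1]]
print(anova_tab)

# One-way ANOVA of Species within each time point, returning the Species p-value
p_by_time <- sapply(split(df, df$Time), function(d) {
  summary(aov(value ~ Species, data = d))[[1]][["Pr(>F)"]][1]
})

# Adjust for testing five time points
p_adjusted <- p.adjust(p_by_time, method = "holm")
print(p_adjusted)
```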
My data is the Boston Housing dataset, and I want to produce a graphic like the one in this Kaggle notebook. The code for that plot is in Python (unfortunately, I only know R, not Python): kaggle.com/prasadperera/the-boston-housing-dataset .
However, instead of y='medv' I need y='crim' against each of the other variables, in order to find predictors that have an interesting association with the crime rate.
I have tried to do this in R; my code is:
Very appreciative of the help, Thanks!
One way to do something similar in ggplot is with faceting. Here's what that might look like:
library(ggplot2)
library(dplyr)
library(tidyr)
data(Boston, package="MASS")
Boston %>%
select(crim, lstat, indus, nox, ptratio, rm, tax, dis, age) %>%
gather(obs, val, -crim) %>%
ggplot(aes(val, crim, color=obs, fill=obs)) +
geom_smooth(method="lm", se=TRUE) +
geom_point() +
facet_wrap(~obs, scales="free_x") +
scale_color_discrete(guide="none") +
scale_fill_discrete(guide="none")
The key is to reshape the data into long format so you can draw just one plot and facet it, rather than drawing many plots and arranging them.
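Here is a minimal sketch of that reshaping step on its own. It uses tidyr's pivot_longer(), the newer replacement for gather(), and assumes tidyr 1.0 or later. Each of the eight predictors becomes one row per observation, keyed by the obs column that facet_wrap() splits on.

```r
library(dplyr)
library(tidyr)

data(Boston, package = "MASS")

long <- Boston %>%
  select(crim, lstat, indus, nox, ptratio, rm, tax, dis, age) %>%
  # Every column except crim is stacked into (obs, val) pairs
  pivot_longer(-crim, names_to = "obs", values_to = "val")

# One row per original observation per predictor
head(long)
```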
This is gonna be a very basic and naive question, but my R and programming skills are very limited and I have no idea how to solve it. I would really appreciate it if you guys could help me with this.
I want to make multiple correlation plots comparing a fixed x-axis (Sepal.Length, in the example below) with each column in my dataset as the y-axis (Sepal.Width, Petal.Length and Petal.Width). I suspect I might need to use apply, but I don't know how to build it into a function.
Right now I am able to do it manually one by one, but that is not helpful at all. Below is the piece of code I would like to apply to every column in my dataset.
library(ggplot2)
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
geom_smooth(aes(group = 1), method=lm) +
geom_point(size=4, shape=20, alpha=0.6) + theme(legend.position="none") +
annotate(x=min(iris$Sepal.Length),y=min(iris$Sepal.Width),hjust=.2,
label=paste("R = ", round(cor(iris$Sepal.Length, iris$Sepal.Width),2)),
geom="text", size=4)
After generating all the plots, my idea is to display them side by side using grid.arrange() from the gridExtra package.
Are you looking for something like this?
library(tidyr)
library(dplyr)
library(ggplot2)
iris %>% select(-Species) %>%
gather(YCol, YValue, -Sepal.Length) %>%
ggplot(aes(x=Sepal.Length, y=YValue)) +
geom_point() +
facet_grid(YCol~.)
All facets share the same y-axis; if you don't want that, you can use facet_grid(YCol~., scales="free_y").
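If you prefer the function-based approach described in the question, here is a minimal sketch. It wraps the question's plot in a function of the y column name, builds one plot per column with lapply(), and then arranges them with gridExtra::grid.arrange() when that package is installed. The helper name plot_vs_sepal_length is hypothetical, not from the question. The .data[[ycol]] pronoun lets aes() take a column name stored in a string.

```r
library(ggplot2)

# Hypothetical helper: scatterplot of Sepal.Length against the column named
# `ycol`, with a pooled regression line and the Pearson correlation as a label
plot_vs_sepal_length <- function(ycol, data = iris) {
  r <- cor(data$Sepal.Length, data[[ycol]])
  ggplot(data, aes(x = Sepal.Length, y = .data[[ycol]], color = Species)) +
    geom_smooth(aes(group = 1), method = "lm", formula = y ~ x) +
    geom_point(size = 4, shape = 20, alpha = 0.6) +
    theme(legend.position = "none") +
    annotate("text", x = min(data$Sepal.Length), y = min(data[[ycol]]),
             hjust = 0.2, size = 4, label = paste("R =", round(r, 2))) +
    labs(y = ycol)
}

ycols <- c("Sepal.Width", "Petal.Length", "Petal.Width")
plots <- lapply(ycols, plot_vs_sepal_length)

# Arrange the plots side by side, only if gridExtra is available
if (requireNamespace("gridExtra", quietly = TRUE)) {
  do.call(gridExtra::grid.arrange, c(plots, nrow = 1))
}
```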
My question has to do with facetting. In my example code below, I look at some facetted scatterplots, then try to overlay information (in this case, mean lines) on a per-facet basis.
The tl;dr version is that my attempts fail. Either my added mean lines compute across all data (disrespecting the facet variable), or I try to write a formula and R throws an error, followed by incisive and particularly disparaging comments about my mother.
library(ggplot2)
# Let's pretend we're exploring the relationship between a car's weight and its
# horsepower, using some sample data
p <- ggplot()
p <- p + geom_point(aes(x = wt, y = hp), data = mtcars)
print(p)
# Hmm. A quick check of the data reveals that car weights can differ wildly, by almost
# a thousand pounds.
head(mtcars)
# Does the difference matter? It might, especially if most 8-cylinder cars are heavy,
# and most 4-cylinder cars are light. ColorBrewer to the rescue!
p <- p + aes(color = factor(cyl))
p <- p + scale_color_brewer(palette = "Set1")
print(p)
# At this point, what would be great is if we could more strongly visually separate
# the cars out by their engine blocks.
p <- p + facet_grid(~ cyl)
print(p)
# Ah! Now we can see (given the fixed scales) that the 4-cylinder cars flock to the
# left on weight measures, while the 8-cylinder cars flock right. But you know what
# would be REALLY awesome? If we could visually compare the means of the car groups.
p.with.means <- p + geom_hline(
aes(yintercept = mean(hp)),
data = mtcars
)
print(p.with.means)
# Wait, that's not right. That's not right at all. The green (8-cylinder) cars are all above the
# average for their group. Are they somehow made in an auto plant in Lake Wobegon, MN? Obviously,
# I meant to draw mean lines factored by GROUP. Except also obviously, since the code below will
# print an error, I don't know how.
p.with.non.lake.wobegon.means <- p + geom_hline(
aes(yintercept = mean(hp) ~ cyl),
data = mtcars
)
print(p.with.non.lake.wobegon.means)
There must be some simple solution I'm missing.
Do you mean something like this?
library(plyr) # ddply() and .() come from plyr
rs <- ddply(mtcars,.(cyl),summarise,mn = mean(hp))
p + geom_hline(data=rs,aes(yintercept=mn))
It might be possible to do this within the ggplot call using stat_*, but I'd have to go back and tinker a bit. But generally if I'm adding summaries to a faceted plot I calculate the summaries separately and then add them with their own geom.
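plyr is retired in favour of dplyr. Here is a minimal sketch of the same "summarise separately, then add it with its own geom" approach written with dplyr. The faceted plot is rebuilt here so the block runs on its own, and the column name mn follows the answer. Because rs keeps the cyl column, each horizontal line is drawn only in its own facet.

```r
library(dplyr)
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = hp, color = factor(cyl))) +
  geom_point() +
  facet_grid(~ cyl)

# One row per cylinder group; the cyl column matches the facet variable
rs <- mtcars %>%
  group_by(cyl) %>%
  summarise(mn = mean(hp))

p_means <- p + geom_hline(data = rs, aes(yintercept = mn))
p_means
```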
EDIT
Just a few expanded notes on your original attempt. Generally, it's a good idea to put the data and the aes() mappings that persist throughout the plot in the ggplot() call, and then specify different data sets or aesthetics only in the geoms that differ from the 'base' plot. Then you don't need to keep specifying data = ... in each geom.
Finally, I came up with a kind of clever use of geom_smooth to do something similar to what you're asking:
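library(ggplot2)
p <- ggplot(data = mtcars,aes(x = wt, y = hp, colour = factor(cyl))) +
facet_grid(~cyl) +
geom_point() +
# formula = y ~ 1 fits an intercept-only model: a flat line at each group's mean
geom_smooth(se=FALSE,method="lm",formula=y~1,colour="black")
print(p)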
The horizontal line (i.e. a constant regression equation) will only extend to the limits of the data in each facet, but it skips the separate data-summary step.
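Here is why formula = y ~ 1 lands on the group mean. An intercept-only least-squares fit picks the constant that minimises the sum of squared deviations, and that constant is the sample mean. The quick check below compares the two for each cylinder group, using base R only.

```r
# Intercept-only fit per cylinder group vs. the plain group mean
by_cyl <- split(mtcars, mtcars$cyl)
intercepts <- sapply(by_cyl, function(d) unname(coef(lm(hp ~ 1, data = d))))
group_means <- sapply(by_cyl, function(d) mean(d$hp))
all.equal(intercepts, group_means)
```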