That is gonna be a very basic and naive question, but as my R and programming skills are very limited, I have no idea how to solve it. I would really appreciate if you guys could help me on this.
I want to plot multiple correlation plots comparing a fixed x-axis (Sepal.Length, in the example below) with each column in my dataset as y-axis (Sepal.Width, Petal.Length and Petal.Width). I suspect I might need to use apply, but I don't know how to build it into a function.
Right now I am able to do it manually one by one, but that is not helpful at all. Below, I am sharing the piece of code I would like to apply to every column in my dataset.
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_smooth(aes(group = 1), method=lm) +
  geom_point(size=4, shape=20, alpha=0.6) + theme(legend.position="none") +
  # place the label at the lower-left corner of the data; R is the x/y correlation
  annotate(x=min(iris$Sepal.Length),y=min(iris$Sepal.Width),hjust=.2,
           label=paste("R = ", round(cor(iris$Sepal.Length, iris$Sepal.Width),2)),
           geom="text", size=4)
After generating all plots, my idea is to plot all of them side by side using grid.arrange from the gridExtra package.
Are you looking for something like this?
library(tidyr)
library(dplyr)
library(ggplot2)
iris %>% select(-Species) %>%
gather(YCol, YValue, -Sepal.Length) %>%
ggplot(aes(x=Sepal.Length, y=YValue)) +
geom_point() +
facet_grid(YCol~.)
All panels share the same y-axis scale; if you do not want that, you could pass scales="free_y" to facet_grid.
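If you would rather keep the original plot (colours, trend line and the R label) and get one plot per column to hand to grid.arrange, a rough sketch is to wrap the code in a function and lapply over the column names. plot_vs_sepal_length is a made-up helper name, the colour mapping to Species is hard-coded for iris, and the sketch assumes ggplot2 >= 3.0 (for the .data pronoun) and that gridExtra is installed for the last step.

```r
library(ggplot2)

# Hypothetical helper: scatter plot of `xcol` against the column named in `ycol`,
# annotated with the Pearson correlation between the two columns.
plot_vs_sepal_length <- function(ycol, data = iris, xcol = "Sepal.Length") {
  r <- round(cor(data[[xcol]], data[[ycol]]), 2)
  ggplot(data, aes(x = .data[[xcol]], y = .data[[ycol]], color = Species)) +
    geom_smooth(aes(group = 1), method = lm, formula = y ~ x) +
    geom_point(size = 4, shape = 20, alpha = 0.6) +
    theme(legend.position = "none") +
    annotate("text", x = min(data[[xcol]]), y = min(data[[ycol]]),
             hjust = 0.2, size = 4, label = paste("R =", r))
}

# Every column except the fixed x-axis and the grouping variable
ycols <- setdiff(names(iris), c("Sepal.Length", "Species"))
plots <- lapply(ycols, plot_vs_sepal_length)
names(plots) <- ycols

# Side-by-side layout, only attempted if gridExtra is available
if (requireNamespace("gridExtra", quietly = TRUE)) {
  gridExtra::grid.arrange(grobs = plots, nrow = 1)
}
```

Passing column names as strings keeps the function usable with lapply; for another dataset you would change data, xcol and the colour mapping.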
I have two cor.test results that I would like to visualize in one plot using geom_smooth. So far, I have working code with one regression line, but I don't know how to add a second regression line to the same plot. This is my code so far:
cor(opg6wave1$godimportant, opg6wave1$aj, use = 'complete.obs') # -0.309117
cor(opg6wave6$godimportant, opg6wave6$aj, use = 'complete.obs') # -0.4321519
ggplot(opg6wave1, aes(x= godimportant, y= aj))+
geom_smooth()+
labs(title = "Religion and abortion over time", x='Religiosity', y= 'Attitude towards abortion')+
theme_classic()
Thank y'all:)
I don't have access to your dataset, you might want to share it? I'm using the diamonds dataset, which ships with ggplot2 (loaded here via tidyverse). Data passed to the ggplot(...) call is inherited by every geom_... layer added to it. Since you want each regression line to use its own data, leave ggplot() empty and pass the data to each geom_smooth() separately.
library(tidyverse)
ggplot()+
  geom_smooth(data=diamonds %>% filter(color=="E"),
              mapping=aes(x=depth, y=price))+
  geom_smooth(data=diamonds %>% filter(color=="J"),
              mapping=aes(x=depth, y=price)) +
  theme_classic()
The same plot with a linear model smooth:
ggplot()+
  geom_smooth(data=diamonds %>% filter(color=="E"),
              mapping=aes(x=depth, y=price),
              method=lm)+
  geom_smooth(data=diamonds %>% filter(color=="J"),
              mapping=aes(x=depth, y=price),
              method=lm) +
  theme_classic()
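An alternative that also gives you a legend is to stack the two data frames and map colour to the wave, so a single geom_smooth() draws both lines. The sketch below uses made-up stand-ins (wave1, wave6) for opg6wave1 and opg6wave6, since the real data is not available; only the column names godimportant and aj are taken from the question.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-ins for the two survey waves (not the asker's data)
set.seed(1)
wave1 <- data.frame(godimportant = sample(1:10, 200, replace = TRUE))
wave1$aj <- 8 - 0.3 * wave1$godimportant + rnorm(200)
wave6 <- data.frame(godimportant = sample(1:10, 200, replace = TRUE))
wave6$aj <- 9 - 0.5 * wave6$godimportant + rnorm(200)

# Stack the waves; .id stores the list name of the source data frame in `wave`
both <- bind_rows(list(wave1 = wave1, wave6 = wave6), .id = "wave")

# One layer, one regression line per wave, legend generated from `wave`
p_waves <- ggplot(both, aes(x = godimportant, y = aj, colour = wave)) +
  geom_smooth(method = lm, formula = y ~ x) +
  theme_classic()
p_waves
```

With your own data you would replace the two simulated data frames with opg6wave1 and opg6wave6 in the bind_rows() call.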
I'm teaching myself R right now using this online resource: https://r4ds.had.co.nz/data-visualisation.html#facets
This particular section is going over the use of facet_wrap and facet_grid. It's clear to me that facet_grid is primarily used when wanting to visualize a plot along two additional dimensions, rather than just one. What I don't understand is why you can use facet_grid(.~variable) or facet_grid(variable~.) to basically achieve the same result as facet_wrap. Putting a "." in place of a variable results in just not faceting along the row or column dimension, or in other words showing 1 additional variable just as facet_wrap would do.
If anyone can shed some light on this, thank you!
If you use facet_grid, the facets will always be in one row/column. They will never wrap to make a rectangle. But really if you just have one variable with few levels, it doesn't much matter.
You can also see that facet_grid(.~variable) and facet_grid(variable~.) will put the facet labels in different places (row headings vs column headings):
mg <- ggplot(mtcars, aes(x = mpg, y = wt)) + geom_point()
mg + facet_grid(vs~ .) + labs(title="facet_grid(vs~ .)")
mg + facet_grid(.~ vs) + labs(title="facet_grid(.~ vs)")
So in the most simple of cases, there's nothing that different between them. The main reason to use facet_grid is to have a single, common axis for all facets so you can easily scan across all panels to make a direct comparison of data.
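For what it's worth, the same two layouts can be written without the formula and the dot, using the rows and cols arguments of facet_grid together with vars(). This is a small sketch assuming ggplot2 >= 3.0; p_rows and p_cols are just illustrative names.

```r
library(ggplot2)

mg <- ggplot(mtcars, aes(x = mpg, y = wt)) + geom_point()

# Equivalent to facet_grid(vs ~ .): one panel per row, labels as row headings
p_rows <- mg + facet_grid(rows = vars(vs))

# Equivalent to facet_grid(. ~ vs): one panel per column, labels as column headings
p_cols <- mg + facet_grid(cols = vars(vs))

p_rows
p_cols
```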
Actually, the same result is not produced all the time...
The number of facets which appear across the graphics pane is fixed with facet_grid (always the number of unique values in the variable), whereas facet_wrap, like its name suggests, wraps the facets around the graphics pane. In this way the functions only result in the same graph when the number of facets produced is small.
Both facet_grid and facet_wrap accept a formula (rows~columns in the case of facet_grid), and nowadays we don't need to use the dot with facet_grid.
In order to compare their differences let's add a new variable with 8 unique values to the mtcars data set:
library(tidyverse)
mtcars$example <- rep(1:8, length.out = 32)
ggplot()+
geom_point(data = mtcars, aes(x = mpg, y = wt))+
facet_grid(~example, labeller = label_both)
Which results in a cluttered plot, with all eight panels squeezed into a single row.
Compared to:
ggplot()+
geom_point(data = mtcars, aes(x = mpg, y = wt))+
facet_wrap(~example, labeller = label_both)
Which wraps the eight panels over several rows.
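To see the difference without looking at the images, you can inspect the panel layout that each facet function computes. This is only a sketch: it relies on ggplot_build() exposing a layout data frame with ROW and COL columns (an internal detail that could change), and cars8, p_grid and p_wrap are illustrative names. ncol = 4 is an arbitrary choice for the wrapped version.

```r
library(ggplot2)

# Work on a copy so mtcars itself is left untouched
cars8 <- mtcars
cars8$example <- rep(1:8, length.out = 32)

p_base <- ggplot(cars8, aes(x = mpg, y = wt)) + geom_point()
p_grid <- p_base + facet_grid(cols = vars(example), labeller = label_both)
p_wrap <- p_base + facet_wrap(~example, ncol = 4, labeller = label_both)

# Panel positions computed by each facet specification
grid_layout <- ggplot_build(p_grid)$layout$layout
wrap_layout <- ggplot_build(p_wrap)$layout$layout

c(rows = max(grid_layout$ROW), cols = max(grid_layout$COL))  # facet_grid
c(rows = max(wrap_layout$ROW), cols = max(wrap_layout$COL))  # facet_wrap
```

facet_grid keeps all eight panels in one row, while facet_wrap with ncol = 4 spreads them over two rows of four.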
I want to compare two histograms in one graph in R, but I couldn't work out how to implement it.
My histograms are based on two sub-dataframes, which are split from the dataset according to a type (Action, Adventure Family)
My two histograms are:
split_action <- split(df, df$type)
dataset_action <- split_action$Action
hist(dataset_action$year)
split_adventure <- split(df, df$type)
dataset_adventure <- split_adventure$Adventure
hist(dataset_adventure$year)
I want to see how much they overlap, comparing them by year in the same histogram. Thank you in advance.
Using the iris dataset, suppose you want to make a histogram of sepal length for each species. First, you can make a separate data frame for each of the 3 species by subsetting.
irissetosa<-subset(iris,Species=='setosa',select=c('Sepal.Length','Species'))
irisversi<-subset(iris,Species=='versicolor',select=c('Sepal.Length','Species'))
irisvirgin<-subset(iris,Species=='virginica',select=c('Sepal.Length','Species'))
Then make the histogram for each of these 3 data frames. Don't forget to set the argument "add" to TRUE (for the second and third histogram), because you want to combine the histograms.
hist(irissetosa$Sepal.Length,col='red')
hist(irisversi$Sepal.Length,col='blue',add=TRUE)
hist(irisvirgin$Sepal.Length,col='green',add=TRUE)
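You will get the three histograms drawn on top of each other in one plot.
Then you can see which part is overlapping...
But, I know, it's not so good, because the opaque bars hide whatever is drawn behind them.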
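If you want to stay in base graphics, one possible improvement is to use the same breaks for every group, semi-transparent fill colours, and axis limits that fit all groups. The sketch below is just one way to do it; the bin width of 0.25 and the colours are arbitrary choices.

```r
# Common breaks covering the whole range of Sepal.Length
x_all  <- iris$Sepal.Length
breaks <- seq(floor(min(x_all)), ceiling(max(x_all)), by = 0.25)

# One numeric vector per species, and one semi-transparent colour per species
by_species <- split(iris$Sepal.Length, iris$Species)
fills <- c(setosa     = rgb(1, 0, 0, 0.4),
           versicolor = rgb(0, 0, 1, 0.4),
           virginica  = rgb(0, 1, 0, 0.4))

# Compute the histograms first (no drawing) so ylim can fit every group
h    <- lapply(by_species, hist, breaks = breaks, plot = FALSE)
ymax <- max(sapply(h, function(hh) max(hh$counts)))

plot(h[[1]], col = fills[[1]], xlim = range(breaks), ylim = c(0, ymax),
     main = "Sepal length by species", xlab = "Sepal.Length")
for (i in 2:3) plot(h[[i]], col = fills[[i]], add = TRUE)
legend("topright", legend = names(h), fill = fills)
```

Because the bars are transparent, overlapping regions show up as blended colours instead of being hidden.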
Another way to see which part is overlapping is to use the density function.
plot(density(irissetosa$Sepal.Length),col='red')
lines(density(irisversi$Sepal.Length),col='blue')
lines(density(irisvirgin$Sepal.Length),col='green')
This draws the three density curves on the same axes.
Hope it helps!!
You don't need to split the data if using ggplot. The key is to use transparency ("alpha") and change the value of the "position" argument to "identity" since the default is "stack".
Using the iris dataset:
library(ggplot2)
ggplot(data=iris, aes(x=Sepal.Length, fill=Species)) +
geom_histogram(binwidth=0.2, alpha=0.5, position="identity") +
theme_minimal()
It's not easy to see the overlap, so a density plot may be a better choice if that's the main objective. Again, use transparency to avoid obscuring overlapping plots.
ggplot(data=iris, aes(x=Sepal.Length, fill=Species)) +
geom_density(alpha=0.5) +
xlim(3.9,8.5) +
theme_minimal()
So for your data, the command would be something like this:
ggplot(data=df, aes(x=year, fill=type)) +
geom_histogram(alpha=0.5, position="identity")
Using ggplot2 I have made faceted histograms using the following code.
library(ggplot2)
library(plyr)
df1 <- data.frame(monthNo = rep(month.abb[1:5],20),
classifier = c(rep("a",50),rep("b",50)),
values = c(seq(1,10,length.out=50),seq(11,20,length.out=50))
)
means <- ddply (df1,
c(.(monthNo),.(classifier)),
summarize,
Mean=mean(values)
)
ggplot(df1,
aes(x=values, colour=as.factor(classifier))) +
geom_histogram() +
facet_wrap(~monthNo,ncol=1) +
geom_vline(data=means, aes(xintercept=Mean, colour=as.factor(classifier)),
linetype="dashed", size=1)
The vertical line showing means per month is to stay.
But I want to also add text over these vertical lines displaying the mean values for each month. These means are from the 'means' data frame.
I have looked at geom_text and I can add text to plots, but my circumstance appears to be a little different and not so easy. It's a lot simpler to add text when you just label the plotted data points with their own values. In a case like this, where I want to show the mean rather than the values of the histogram bars, I just can't find the solution.
Please help. Thanks.
Having noted the possible duplicate (another answer of mine), the solution here might not be as (initially/intuitively) obvious. You can do what you need if you split the geom_text call into two (for each classifier):
ggplot(df1, aes(x=values, fill=as.factor(classifier))) +
geom_histogram() +
facet_wrap(~monthNo, ncol=1) +
geom_vline(data=means, aes(xintercept=Mean, colour=as.factor(classifier)),
linetype="dashed", size=1) +
geom_text(y=0.5, aes(x=Mean, label=Mean),
data=means[means$classifier=="a",]) +
geom_text(y=0.5, aes(x=Mean, label=Mean),
data=means[means$classifier=="b",])
I'm assuming you can format the numbers to the appropriate precision and place them on the y-axis where you need to with this code.
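As a rough sketch of that last step: you can pre-format the labels in the means data frame and pin them to the top of each panel with y = Inf. Here the means are computed with base aggregate() instead of plyr, a single geom_text layer is used for both classifiers, and linewidth assumes ggplot2 >= 3.4 (older versions use size for line widths). The label column and the one-decimal format are my own choices.

```r
library(ggplot2)

df1 <- data.frame(monthNo = rep(month.abb[1:5], 20),
                  classifier = c(rep("a", 50), rep("b", 50)),
                  values = c(seq(1, 10, length.out = 50), seq(11, 20, length.out = 50)))

# Mean of `values` per month and classifier (base R stand-in for the ddply call)
means <- aggregate(values ~ monthNo + classifier, data = df1, FUN = mean)
names(means)[names(means) == "values"] <- "Mean"

# Pre-formatted label text, one decimal place
means$label <- sprintf("%.1f", means$Mean)

p_means <- ggplot(df1, aes(x = values, fill = classifier)) +
  geom_histogram(bins = 30) +
  facet_wrap(~monthNo, ncol = 1) +
  geom_vline(data = means, aes(xintercept = Mean, colour = classifier),
             linetype = "dashed", linewidth = 1) +
  # y = Inf puts the text at the top edge of each panel; vjust nudges it inside
  geom_text(data = means, aes(x = Mean, y = Inf, label = label),
            vjust = 1.5, inherit.aes = FALSE)
p_means
```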
Summary: I want to choose the colors for a ggplot2 density distribution plot without losing the automatically generated legend.
Details: I have a dataframe created with the following code (I realize it is not elegant but I am only learning R):
cands<-scan("human.i.cands.degnums")
non<-scan("human.i.non.degnums")
df<-data.frame(grp=factor(c(rep("1. Candidates", each=length(cands)),
rep("2. NonCands",each=length(non)))), val=c(cands,non))
I then plot their density distribution like so:
library(ggplot2)
ggplot(df, aes(x=val,color=grp)) + geom_density()
This produces one density line per group in the default colors, together with an automatically generated legend.
I would like to choose the colors the lines appear in and cannot for the life of me figure out how. I have read various other posts on the site but to no avail. The most relevant are:
Changing color of density plots in ggplot2
Overlapped density plots in ggplot2
After searching around for a while I have tried:
## This one gives an error
ggplot(df, aes(x=val,colour=c("red","blue"))) + geom_density()
Error: Aesthetics must either be length one, or the same length as the data
Problems:c("red", "blue")
## This one produces a single, black line
ggplot(df, aes(x=val),colour=c("red","green")) + geom_density()
The best I've come up with is this:
ggplot() + geom_density(aes(x=cands),colour="blue") + geom_density(aes(x=non),colour="red")
That last command correctly changes the colors of the lines but it removes the legend. I like ggplot2's legend system. It is nice and simple, and I don't want to have to fiddle about with recreating something that ggplot is clearly capable of doing. On top of which, the syntax is very, very ugly. My actual data frame consists of 7 different groups of data. I cannot believe that writing + geom_density(aes(x=FOO),colour="BAR") 7 times is the most elegant way of coding this.
So, if all else fails, I will accept an answer that tells me how to get the legend back on to the 2nd plot. However, if someone can tell me how to do it properly I will be very happy.
set.seed(45)
df <- data.frame(x=c(rnorm(100), rnorm(100, mean=2, sd=2)), grp=rep(1:2, each=100))
ggplot(data = df, aes(x=x, color=factor(grp))) + geom_density() +
scale_color_brewer(palette = "Set1")
ggplot(data = df, aes(x=x, color=factor(grp))) + geom_density() +
scale_color_brewer(palette = "Set3")
gives me the same plot with two different sets of colors.
Provide a vector of colours to the "values" argument of scale_color_manual to map the discrete group values to manually chosen colours:
ggplot(df, aes(x=val,color=grp)) +
geom_density() +
scale_color_manual(values=c("red", "blue"))
To choose any colour you wish, enter the hex code for it instead:
ggplot(df, aes(x=val,color=grp)) +
geom_density() +
scale_color_manual(values=c("#f5d142", "#2bd63f")) # yellow/green
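With more groups (you mention 7), it may be safer to name the colour vector by group level, so each colour stays attached to its group regardless of level order. The sketch below uses simulated data in place of the files read with scan(); df_demo, grp_colours and the legend title "Group" are made-up for illustration.

```r
library(ggplot2)

# Simulated stand-in for the question's data frame (same column names: grp, val)
set.seed(45)
df_demo <- data.frame(
  grp = factor(rep(c("1. Candidates", "2. NonCands"), each = 100)),
  val = c(rnorm(100), rnorm(100, mean = 2, sd = 2))
)

# Names must match the factor levels of grp; values are the colours to use
grp_colours <- c("1. Candidates" = "red", "2. NonCands" = "blue")

p_named <- ggplot(df_demo, aes(x = val, colour = grp)) +
  geom_density() +
  scale_colour_manual(values = grp_colours, name = "Group")
p_named
```

The legend is kept because colour is still mapped inside aes(); the scale only decides which colour each level gets.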