When I use the following code to generate a density plot:
require(ggplot2)
set.seed(seed=10)
n <- 10000
s.data <- data.frame(score = rnorm(n,500,100),
gender = sample(c("Male","Female","No Response"),size=n,replace=T,prob=c(.4,.55,.05)),
major = sample(c("A","B","C","D"),size=n,replace=T,prob=c(.02,.25,.05,.68)))
ggplot(s.data, aes(major,..density..,fill=major,group=1)) +
geom_histogram() + facet_wrap(~ gender)
I cannot distinguish between the categories of "major" by color.
What I want is a density plot that resembles this frequency plot in its colors and legend:
ggplot(s.data, aes(major,fill=major)) +
geom_histogram() + facet_wrap(~ gender)
This question follows up on an earlier question of mine, which has already been answered.
You can still try a frequency plot with the facet argument scales = "free_y":
ggplot(s.data, aes(major,..count..,fill=major)) +
geom_histogram() + facet_wrap(~ gender, scales="free_y")
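If the goal is the share of each major within every facet, with one fill color per major, one option is to compute the proportions before plotting. The following is a minimal sketch and not part of the original answer: the helper data frame counts and its prop column are names introduced here for illustration, and the data are simulated as in the question.

```r
library(ggplot2)

set.seed(10)
n <- 10000
s.data <- data.frame(
  score  = rnorm(n, 500, 100),
  gender = sample(c("Male", "Female", "No Response"), size = n, replace = TRUE,
                  prob = c(.4, .55, .05)),
  major  = sample(c("A", "B", "C", "D"), size = n, replace = TRUE,
                  prob = c(.02, .25, .05, .68))
)

# Count every gender/major combination, then divide by the gender total
counts <- as.data.frame(table(gender = s.data$gender, major = s.data$major))
counts$prop <- ave(as.numeric(counts$Freq), counts$gender,
                   FUN = function(x) x / sum(x))

# geom_col() draws the precomputed proportions, so fill = major keeps its legend
p <- ggplot(counts, aes(major, prop, fill = major)) +
  geom_col() +
  facet_wrap(~ gender)
p
```

Because the proportions are computed per gender, the bars in each facet sum to 1, and no grouping override is needed that would collapse the fill colors.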
I want to draw scatter plots of some numeric data. I used the following code to plot the different variables in facets that share the same axes.
library(car)       # provides the Anscombe data set
library(reshape2)  # melt()
library(ggplot2)
data("Anscombe")
mydat <- melt(Anscombe,"urban")
ggplot(mydat,aes(value,urban ))+geom_point() +
facet_grid(.~variable)+geom_smooth(method="lm", se=F)
Here is the plot. The x-axis range is the same for all three variables, so I cannot see the points for the variable education very well.
So I tried to change the range of the x-axis. Here is the code:
ggplot(mydat,aes(value,urban ))+geom_point() +
facet_grid(.~variable)+ geom_smooth(method="lm", se=F)+
coord_cartesian(xlim = c(0,450), ylim = NULL, expand = TRUE)
Now I can see the values of the variable education, but the values of income are gone, because they are greater than 450.
How can I change the x-axis range for each variable separately instead of for all of them at once? I would be grateful for any help.
What you need is the additional argument scales = "free":
ggplot(mydat, aes(value, urban)) +
geom_point() +
facet_grid(. ~ variable, scales = "free") +
geom_smooth(method = "lm", se = FALSE)
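To see the effect in isolation, here is a small self-contained sketch. The data frame toy and its columns are made up for illustration and are not the Anscombe data. With a single row of panels, facet_grid() shares the y-axis across the row anyway, so scales = "free_x" should give the same picture as scales = "free".

```r
library(ggplot2)

# Hypothetical data: two variables measured on very different scales
set.seed(42)
toy <- data.frame(
  variable = rep(c("small", "large"), each = 50),
  value    = c(runif(50, 0, 10), runif(50, 100, 1000)),
  y        = rnorm(100)
)

# Each column of panels gets its own x range; the y-axis stays shared
p <- ggplot(toy, aes(value, y)) +
  geom_point() +
  facet_grid(. ~ variable, scales = "free_x")
p
```

Unlike coord_cartesian(), which applies one set of limits to every panel, free scales let each panel fit its own data.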
How do I draw a horizontal line indicating the Highest (Posterior) Density interval for faceted density plots in ggplot2? This is what I have tried:
# Functions to calculate lower and upper part of HPD.
hpd_lower = function(x) coda::HPDinterval(coda::as.mcmc(x))[1]
hpd_upper = function(x) coda::HPDinterval(coda::as.mcmc(x))[2]
# Data: two groups with different means
df = data.frame(value=c(rnorm(500), rnorm(500, mean=5)), group=rep(c('A', 'B'), each=500))
# Plot it
ggplot(df, aes(x=value)) +
geom_density() +
facet_wrap(~group) +
geom_segment(aes(x=hpd_lower(value), xend=hpd_upper(value), y=0, yend=0), size=3)
As you can see, geom_segment computes on all data for both facets whereas I would like it to respect the faceting. I would also like a solution where HPDinterval is only run once per facet.
Pre-calculate the HPD intervals. ggplot evaluates the expressions inside aes() over the entire data frame, even when the data are grouped or faceted.
# Plot it
library(dplyr)
df_hpd <- group_by(df, group) %>% summarize(x=hpd_lower(value), xend=hpd_upper(value))
ggplot(df, aes(x=value)) +
geom_density() +
facet_wrap(~group) +
geom_segment(data = df_hpd, aes(x=x, xend=xend, y=0, yend=0), size=3)
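The question also asks for HPDinterval to run only once per facet, whereas separate lower and upper helpers call it twice per group. A possible variation, sketched here with base R's split() and lapply() rather than dplyr, keeps both bounds from a single call. The name hpd_list is introduced for illustration, and the data are simulated as in the question.

```r
library(coda)

set.seed(1)
df <- data.frame(value = c(rnorm(500), rnorm(500, mean = 5)),
                 group = rep(c("A", "B"), each = 500))

# One HPDinterval() call per group; each call returns a 1 x 2 matrix
# with columns "lower" and "upper"
hpd_list <- lapply(split(df$value, df$group),
                   function(v) HPDinterval(as.mcmc(v)))

# Collect both bounds into one row per group
df_hpd <- data.frame(
  group = names(hpd_list),
  x     = sapply(hpd_list, function(m) m[1, "lower"]),
  xend  = sapply(hpd_list, function(m) m[1, "upper"])
)
df_hpd
```

The resulting df_hpd has the same columns (group, x, xend) as the dplyr version, so it can be passed to the same geom_segment(data = df_hpd, ...) layer.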
library(alr4)
par(mfrow = c(2,2))
ggplot(walleye, aes(x= age)) + geom_histogram() + facet_grid(~age)
I would like to create four histograms from the walleye data set, showing the length of the walleye. Each of the four histograms should cover a single age, and I would like to restrict the ages to 1 through 4. How can I do that with ggplot?
If I understand what you are trying to do correctly, this should help:
library(alr4)
library(ggplot2)
ggplot(subset(walleye, age<5), aes(x=length)) + geom_histogram() + facet_grid(~age)
This way you are only plotting the subset of the data where age is 1-4, and you are actually plotting histograms of length.
You could try this too (adding another line of code on top of your code):
library(alr4)
library(ggplot2)
p <- ggplot(walleye, aes(x= age)) + geom_histogram() + facet_grid(~age)
p %+% subset(walleye, age %in% 1:4)
I am plotting a continuous variable on the x-axis against the corresponding counts (not the density) on the y-axis using ggplot2.
This is my code:
p <- ggplot(matched.frame, aes(x = AGE, color = as.factor(DRUG_KEY))) + geom_freqpoly(binwidth=5)
p1 <- p + theme_minimal()
plot(p1)
This produces a graph like this:
I want the areas under these lines to be filled with color and a little transparency. I know how to do this for density plots in ggplot2, but I am stuck with this frequency polygon.
Also, how do I change the legend on the right side? For example, I want 'Cases' instead of '26' and 'Controls' instead of '27'. Instead of as.factor(DRUG_KEY), I want the legend title to read 'Colors'.
Sample data
matched.frame <- data.frame("AGE"=c(18,19,20,21,22,23,24,25,26,26,27,18,19,20,24,23,23,23,22,30,28,89,30,20,23))
matched.frame$DRUG_KEY <- 26
matched.frame$DRUG_KEY[11:25] <- 27
You can use geom_ribbon to fill the area under the curves and scale_fill_discrete (fill color) as well as scale_color_discrete (line color) to change the legend labels:
library(ggplot2)
set.seed(1)
df <- data.frame(x = 1:10, y = runif(20), f = gl(2, 10))
ggplot(df, aes(x=x, ymin=0, ymax=y, fill=f)) +
geom_ribbon(alpha=.5) +
scale_fill_discrete(labels = c("1"="foo", "2"="bar"), name = "Labels")
With regard to your edit:
ggplot(matched.frame, aes(x=AGE, fill=as.factor(DRUG_KEY), color=as.factor(DRUG_KEY))) +
stat_bin(aes(ymax=..count..), alpha=.5, ymin=0, geom="ribbon", binwidth =5, position="identity", pad=TRUE) +
geom_freqpoly(binwidth=5, size=2) +
scale_fill_discrete(labels = c("26"="foo", "27"="bar"), name = "Labels") +
scale_color_discrete(labels = c("26"="foo", "27"="bar"), name = "Labels")
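For the 'Cases' and 'Controls' labels specifically, an alternative is to recode the key as a labelled factor before plotting, so that both legends pick up the names without per-scale label vectors. This is only a sketch under assumptions: the GROUP column is a name introduced here, the filled area uses geom_area() with stat = "bin" rather than the ribbon approach above, and after_stat() assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

matched.frame <- data.frame(AGE = c(18, 19, 20, 21, 22, 23, 24, 25, 26, 26, 27,
                                    18, 19, 20, 24, 23, 23, 23, 22, 30, 28, 89,
                                    30, 20, 23))
matched.frame$DRUG_KEY <- 26
matched.frame$DRUG_KEY[11:25] <- 27

# Recode the keys once so that every legend shows the readable labels
# (GROUP is a hypothetical column name)
matched.frame$GROUP <- factor(matched.frame$DRUG_KEY,
                              levels = c(26, 27),
                              labels = c("Cases", "Controls"))

p <- ggplot(matched.frame, aes(x = AGE, fill = GROUP, colour = GROUP)) +
  # Binned counts drawn as a semi-transparent filled area per group
  geom_area(aes(y = after_stat(count)), stat = "bin", binwidth = 5,
            position = "identity", alpha = 0.5) +
  # The same title for both aesthetics merges them into a single legend
  labs(fill = "Colors", colour = "Colors") +
  theme_minimal()
p
```

position = "identity" keeps the two areas overlapping instead of stacking them, which matches how the frequency polygons are drawn.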
I have this simple data frame holding three replicates (value) for each factor level (CT). I would like to plot the values with geom_point and then the means of the points with geom_line.
gene <- c("Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5")
value <- c(0.86443, 0.79032, 0.86517, 0.79782, 0.79439, 0.89221, 0.93071, 0.87170, 0.86488, 0.91133, 0.87202, 0.84028, 0.83242, 0.74016, 0.86656)
CT <- c("ET","ET","ET", "HP","HP","HP","HT","HT","HT", "LT","LT","LT","P","P","P")
# data.frame() keeps value numeric; cbind() would coerce every column to character
df <- data.frame(gene, value, CT, stringsAsFactors = TRUE)
So, I can make the scatter plot.
ggplot(df, aes(x=CT, y=value)) + geom_point()
How do I get a geom_line representing the means for each factor? I have tried stat_summary:
ggplot(df, aes(x=CT, y=value)) + geom_point() +
stat_summary(aes(y = value,group = CT), fun.y=mean, colour="red", geom="line")
But it does not work.
"geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?"
But each group has three observations. What is wrong?
P.S. I am also interested in a smooth line.
You should set the group aes to 1:
ggplot(df, aes(x=CT, y=value)) + geom_point() +
stat_summary(aes(group=1), fun.y=mean, colour="red", geom="line")
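A self-contained version of the same idea is sketched below. It rests on two assumptions: ggplot2 3.3.0 or later, where stat_summary() takes fun in place of the older fun.y, and a numeric value column (the data frame is built with data.frame() here so that it is).

```r
library(ggplot2)

value <- c(0.86443, 0.79032, 0.86517, 0.79782, 0.79439, 0.89221, 0.93071,
           0.87170, 0.86488, 0.91133, 0.87202, 0.84028, 0.83242, 0.74016,
           0.86656)
CT <- rep(c("ET", "HP", "HT", "LT", "P"), each = 3)
df <- data.frame(gene = "Ckap5", value = value, CT = CT)

p <- ggplot(df, aes(x = CT, y = value)) +
  geom_point() +
  # group = 1 joins the per-level means into a single line
  stat_summary(aes(group = 1), fun = mean, geom = "line", colour = "red")
p

# The means that the line connects, for reference
tapply(df$value, df$CT, mean)
```

With group = CT, each group contributes exactly one summarised point, so there is nothing for a line to connect. Putting all levels in one group is what allows the means to be joined.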
You can use the dplyr package to get the means of each factor.
library(dplyr)
group_means <- df %>%
group_by(CT) %>%
summarise(mean = mean(value))
Then you will need to convert the factor to numeric so that you can draw the lines with geom_segment. In addition, scale_x_continuous lets you set the labels for the x-axis.
ggplot(df, aes(x=as.numeric(CT), y=value)) + geom_point() +
geom_segment(aes(x=as.numeric(CT)-0.4, xend=as.numeric(CT)+0.4, y=mean, yend=mean),
data=group_means, colour="red") +
scale_x_continuous("name", labels=as.character(df$CT), breaks=as.numeric(df$CT))
Following on from hrbrmstr's comment you can add the smooth line using the following:
ggplot(df, aes(x=as.numeric(CT), y=value, group=1)) + geom_point() +
geom_segment(aes(x=as.numeric(CT)-0.4, xend=as.numeric(CT)+0.4, y=mean, yend=mean),
data=group_means, colour="red") +
scale_x_continuous("name", labels=as.character(df$CT), breaks=as.numeric(df$CT)) +
geom_smooth()