My goal is to produce two overlapping PMFs of binomial distributions using ggplot2, color-coded according to colors that I specify, with a legend at the bottom.
So far, I think I have set up the data frame right.
successes <- c(seq(0,10,1),seq(0,10,1))
freq <- c(dbinom(seq(0,10,1),10,0.2),dbinom(seq(0,10,1),10,0.8))
class <- c(rep(' A ',11),rep(' B ',11))
df1 <- data.frame(cbind(successes,freq,class))
However, this gives the wrong result.
library(ggplot2)
g <- ggplot(df1, aes(successes),y=freq)
g + geom_bar(aes(fill = class))
I feel like I'm following an example yet getting a totally different result. This (almost) does what I want: it would be exact if it gave relative frequencies.
g <- ggplot(mpg, aes(class))
g + geom_bar(aes(fill = drv))
A couple of questions:
1) Where am I going wrong in my block of code?
2) Is there a better way to show two PMFs in one graph? I'm not determined to use a histogram or bar chart.
3) How can I set this up to give me the ability to choose the colors?
4) How do I order the values on the x-axis? They aren't categories. They are the numbers 0-10 and have a natural order that I want to preserve.
Thanks!
UPDATE
The following two blocks worked.
successes <- c(seq(0,10,1),seq(0,10,1))
freq <- c(dbinom(seq(0,10,1),10,0.2),dbinom(seq(0,10,1),10,0.8))
class <- c(rep(' A ',11),rep(' B ',11))
df1 <- data.frame(successes,freq,class)
ggplot(df1, aes(successes ,y=freq, fill = class)) +
geom_bar(stat = "identity") +
scale_x_continuous(breaks = seq(0,10,1)) +
scale_fill_manual(values = c("blue", "green")) + theme_bw()
AND
successes <- c(seq(0,10,1),seq(0,10,1))
freq <- c(dbinom(seq(0,10,1),10,0.2),dbinom(seq(0,10,1),10,0.8))
class <- c(rep(' A ',11),rep(' B ',11))
df1 <- data.frame(successes,freq,class)
ggplot(df1, aes(x=successes,y=freq)) +
geom_col(aes(fill = class)) +
scale_x_continuous(breaks = seq(0,10,1)) +
scale_fill_manual(values = c("blue", "green")) + theme_bw()
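Both blocks above stack the two distributions on top of each other, since stacking is the default position for bar geoms. If truly overlapping PMFs with the legend at the bottom are wanted, as stated at the start of the question, one possible sketch follows. The alpha value and the column name grp are illustrative choices, not something from the original post.

```r
library(ggplot2)

# Same binomial PMFs as in the question: n = 10, p = 0.2 and p = 0.8
k <- 0:10
df1 <- data.frame(
  successes = rep(k, times = 2),
  freq = c(dbinom(k, 10, 0.2), dbinom(k, 10, 0.8)),
  grp = rep(c("A", "B"), each = length(k))  # grp is an assumed column name
)

# position = "identity" draws the bars over each other instead of stacking;
# alpha (an arbitrary choice here) keeps the overlap visible
p <- ggplot(df1, aes(x = successes, y = freq, fill = grp)) +
  geom_col(position = "identity", alpha = 0.5) +
  scale_x_continuous(breaks = k) +
  scale_fill_manual(values = c(A = "blue", B = "green")) +
  theme_bw() +
  theme(legend.position = "bottom")
print(p)
```

Naming the colours by group (A = ..., B = ...) ties each colour to a level explicitly, so the mapping does not depend on the order of the levels.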
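I think your issue is that successes and freq are being converted to character (or to factors, in R versions before 4.0) when you create df1: cbind() builds a matrix, which can hold only one type, so the numbers are coerced to text to match the class labels.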
Maybe this is what you're thinking of?
successes <- c(seq(0,10,1),seq(0,10,1))
freq <- c(dbinom(seq(0,10,1),10,0.2),dbinom(seq(0,10,1),10,0.8))
class <- c(rep(' A ',11),rep(' B ',11))
df1 <- data.frame(successes = as.numeric(successes), freq = as.numeric(freq), class)
ggplot(df1, aes(x = successes, y = freq)) +
geom_bar(stat = "identity", aes(fill = class))
If not, happy to answer any further questions!
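The coercion described in this answer can be checked directly. The snippet below is a small sketch that rebuilds the question's vectors (with the label column renamed to grp, an assumed name, to avoid confusion with the class() function) and compares the two ways of constructing the data frame.

```r
successes <- rep(0:10, 2)
freq <- c(dbinom(0:10, 10, 0.2), dbinom(0:10, 10, 0.8))
grp <- rep(c("A", "B"), each = 11)

# cbind() returns a matrix, and a matrix holds a single type
m <- cbind(successes, freq, grp)
typeof(m)               # "character"

# Going through the matrix loses the numeric columns
bad <- data.frame(m)
is.numeric(bad$freq)    # FALSE

# Passing the vectors straight to data.frame() keeps their types
good <- data.frame(successes, freq, grp)
is.numeric(good$freq)   # TRUE
```

With non-numeric columns, ggplot2 treats successes and freq as discrete, which is also why the x-axis loses its natural numeric order.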
Is this what you're looking for?
library(ggplot2)
g <- ggplot(df1, aes(successes ,y=freq, fill = class))
g + geom_bar(stat = "identity") +
scale_fill_manual(values = c("blue", "green"))
Of course, keeping in mind you'd indeed change your dataframe creation to:
successes <- c(seq(0,10,1),seq(0,10,1))
freq <- c(dbinom(seq(0,10,1),10,0.2),dbinom(seq(0,10,1),10,0.8))
class <- c(rep(' A ',11),rep(' B ',11))
df1 <- data.frame(successes,freq,class)
as suggested in the comments.
Suppose I have some code like the following, generating a lineplot with a considerable number of lines (example taken from here)
library(ggplot2)
library(reshape2)
n = 1000
set.seed(123)
mat = matrix(rnorm(n^2), ncol=n)
cmat = apply(mat, 2, cumsum)
cmat = t(cmat)
rownames(cmat) = paste("trial", seq(n), sep="")
colnames(cmat) = paste("time", seq(n), sep="")
dat = as.data.frame(cmat)
dat$trial = rownames(dat)
mdat = melt(dat, id.vars="trial")
mdat$time = as.numeric(gsub("time", "", mdat$variable))
p = ggplot(mdat, aes(x=time, y=value, group=trial)) +
theme_bw() +
theme(panel.grid=element_blank()) +
geom_line(size=0.2, alpha=0.1)
So here, "trial number" is my group producing all of these lines, and there are 1000 trials.
Suppose I want to "group my grouping variable" now - that is, I want to see the exact same lines in this plot, but I want the first 500 trial lines to be one color and the next 500 trial lines to be another. How can I do this with ggplot? I've been poking around for some time and I can't figure out how to manually set the colors per group.
Add a variable that splits the data into two groups, then use it to color the lines in ggplot:
dat = as.data.frame(cmat)
dat$trial = rownames(dat)
dat$group = rep(c("a","b"), each = n/2)
mdat = melt(dat, id.vars=c("trial", "group"))
mdat$time = as.numeric(gsub("time", "", mdat$variable))
p = ggplot(mdat, aes(x=time, y=value, group=trial, color = group)) +
theme_bw() +
theme(panel.grid=element_blank()) +
geom_line(size=0.2, alpha=0.1)
One possible solution is to create a new column holding the trial number, use ifelse() to assign a group based on that number, and pass the grouping variable as color in aes():
library(dplyr)
mdat %>% mutate(Trial = as.numeric(sub("trial","",trial))) %>%
mutate(Group = ifelse(Trial < 51,"A","B")) %>%
ggplot(aes(x=time, y=value, group=trial, color = Group)) +
theme_bw() +
theme(panel.grid=element_blank()) +
geom_line(size=0.2, alpha=0.8)
Is this what you are looking for?
NB: I used only n = 100 to get a smaller data frame.
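Since the question asks how to set the colours per group manually, here is a minimal self-contained sketch that combines the grouping idea from the answers with scale_colour_manual(). It uses a small n, builds the long data frame with base R instead of melt(), and the names long and half, as well as the colour choices, are assumptions made for illustration.

```r
library(ggplot2)

set.seed(123)
n <- 20                                    # small n to keep the example light
steps <- matrix(rnorm(n * n), ncol = n)
walks <- apply(steps, 2, cumsum)           # each column is one trial

# Long format: as.vector() reads the matrix column by column,
# so the first n values belong to trial 1, the next n to trial 2, ...
long <- data.frame(
  trial = rep(seq_len(n), each = n),
  time  = rep(seq_len(n), times = n),
  value = as.vector(walks)
)
long$half <- ifelse(long$trial <= n / 2, "first half", "second half")

# group keeps one line per trial; colour is set per half
p <- ggplot(long, aes(x = time, y = value, group = trial, colour = half)) +
  geom_line(alpha = 0.5) +
  scale_colour_manual(values = c("first half" = "steelblue",
                                 "second half" = "firebrick")) +
  theme_bw()
print(p)
```

Keeping group = trial is what preserves the individual lines; the colour aesthetic alone would merge all trials of a half into a single path.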
I am creating a faceted plot using facet_wrap. I want text labels to be included inside each bubble. Instead, it seems the total is used as the label, i.e. all panels show the same numbers but different bubble sizes (the sizes are correct).
My code:
Category1 <- c('A','B','C','A','B','C','A','B','C','A','B','C','A','B','C','A','B','C','A','B')
Category2 <- c('W','V','W','V','W','V','W','V','W','V','W','V','W','V','W','V','W','V','W','V')
Class <- c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4)
df <- data.frame(Category1, Category2, Class)
g <- ggplot(df, aes(Category1, Category2))
g <- g + facet_wrap(Class ~ ., nrow = 3) + geom_count(col="tomato3", show.legend=F) + scale_size_continuous(range = c(5, 10)) +
labs(subtitle="Count Plot", y="Category2", x="Category1", title="Cat1 vs Cat2")
g
g2 <- g + geom_text(data=ggplot_build(g)$data[[1]], aes(x, y, label=n), size=2) #+ scale_size(range = c(5, 15))
g2
I expect the text inside each bubble to indicate its size. But in the actual result, all panels show the same numbers. I want a small bubble to show a small number, proportional to its size.
The problem is that the data taken from ggplot_build does not have the same columns as the original data (the facet variable Class is missing), so the labels cannot be matched to the panels. You need to create the count data beforehand and use it for plotting.
Create count data
library(tidyverse)
df_count <- df %>%
count(Class, Category1, Category2)
Plot
There are two ways to incorporate this new data.
Method 1
The first example I show is to use both df and df_count. This method will modify your code minimally:
g <- ggplot(df, aes(Category1, Category2))
g <- g + facet_wrap(Class ~ ., nrow = 3) + geom_count(col="tomato3", show.legend=F) +
geom_text(data = df_count, aes(Category1, Category2, label=n), size=2) +
scale_size_continuous(range = c(5, 10)) +
labs(subtitle="Count Plot", y="Category2", x="Category1", title="Cat1 vs Cat2")
g
The line geom_text(data = df_count, aes(Category1, Category2, label=n), size=2) + is added.
Method 2
This method uses only the count data. It uses geom_point() instead of geom_count() and sets the size from the variable n. This method is probably better in terms of code readability.
g_alternative <- ggplot(df_count, aes(Category1, Category2, label = n)) +
facet_wrap(Class ~ ., nrow = 3) +
geom_point(col="tomato3", aes(size = n), show.legend=F) +
geom_text() +
scale_size_continuous(range = c(5, 10)) +
labs(subtitle="Count Plot", y="Category2", x="Category1", title="Cat1 vs Cat2")
g_alternative
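If loading the whole tidyverse is not desirable, the same count table can also be built with base R. This is a sketch of an alternative and not part of the original answer; the helper column n is added only so that aggregate() has something to sum.

```r
# Same data as in the question, written with rep() for brevity
Category1 <- rep(c("A", "B", "C"), length.out = 20)
Category2 <- rep(c("W", "V"), length.out = 20)
Class <- rep(1:4, length.out = 20)
df <- data.frame(Category1, Category2, Class)

# One row per observed (Class, Category1, Category2) combination
df$n <- 1L
df_count <- aggregate(n ~ Class + Category1 + Category2, data = df, FUN = sum)

nrow(df_count)    # 12 distinct combinations
sum(df_count$n)   # 20, the number of rows in df
```

The resulting df_count has the columns Class, Category1, Category2 and n, so it can be passed to geom_text(data = df_count, ...) in the same way as in Method 1.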
I printed a matrix and I have two columns which I wish to get into a bar graph and I don't know how. When I tried it, it used the whole matrix as the X value. Here is my code
smoke <- matrix(c(53,42,40,40,39,34,34,30,28,24,22,21,20,16,'A','A','A','B','A','A','A','A','B','A','C','B','B', 'B'),nrow=14, ncol = 2)
colnames(smoke) <- c("NumberofBooks","Grade")
smoke <- as.table(smoke)
smoke
Any help into plotting the numbers against the letters would be greatly appreciated!
library(ggplot2)
smoke <-
matrix(c(53,42,40,40,39,34,34,30,28,24,22,21,20,16,'A','A','A','B','A','A','A','A','B','A','C','B','B', 'B'),nrow=14, ncol = 2)
colnames(smoke) <- c("NumberofBooks","Grade")
smoke <- as.data.frame(smoke)
# The matrix stored everything as text, so convert the counts back to numbers
smoke$NumberofBooks <- as.numeric(as.character(smoke$NumberofBooks))
p <- ggplot(data = smoke, aes(y = NumberofBooks, x = Grade))
p <- (p
+ geom_bar(stat = 'identity', position = 'dodge', color = 'white')
)
print(p)
Not exactly sure what you're looking for, but this can get you started. You'll need to convert the matrix to a data.frame to use ggplot.
In my opinion, the easiest way to put those values into a bar graph is to replace the matrix with a data frame, like this:
number_of_books <- c(53,42,40,40,39,34,34,30,28,24,22,21,20,16)
grade <- c('A','A','A','B','A','A','A','A','B','A','C','B','B', 'B')
smoke <- data.frame(number_of_books,grade)
bar_graph <- ggplot(smoke, aes(number_of_books,grade)) + geom_bar(stat="identity")
Hope it'll help you.
Does this do what you are looking for?
library(dplyr)
library(ggplot2)
# install.packages("dplyr"); install.packages("ggplot2")
smoke <- matrix(c(53,42,40,40,39,34,34,30,28,24,22,21,20,16,'A','A','A','B','A','A','A','A','B','A','C','B','B', 'B'),nrow=14, ncol = 2)
colnames(smoke) <- c("NumberofBooks","Grade")
smoke %>%
as.data.frame() %>%
group_by(Grade) %>%
summarise(NumberofBooks = mean(as.numeric(NumberofBooks))) %>%
ggplot(aes(x = Grade, y = NumberofBooks)) +
geom_bar(stat = "identity") +
xlab("Grade") +
ylab("Average Number of Books") +
ggtitle("Average number of books by Grade")
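The root cause in this thread is that a matrix can hold only one type, so mixing numbers and letters turns everything into text. The sketch below shows that coercion and then computes the per-grade averages with base R from a data frame built directly from two vectors. The object name avg is an assumption for illustration.

```r
library(ggplot2)

# Mixing numbers and letters in a matrix coerces everything to character
m <- matrix(c(53, 42, "A", "B"), nrow = 2)
typeof(m)   # "character"

# Building a data frame from separate vectors keeps the counts numeric
smoke <- data.frame(
  NumberofBooks = c(53, 42, 40, 40, 39, 34, 34, 30, 28, 24, 22, 21, 20, 16),
  Grade = c("A", "A", "A", "B", "A", "A", "A", "A", "B", "A", "C", "B", "B", "B")
)

# Mean number of books per grade: A = 37, B = 25, C = 22
avg <- aggregate(NumberofBooks ~ Grade, data = smoke, FUN = mean)

p <- ggplot(avg, aes(x = Grade, y = NumberofBooks)) +
  geom_col()
print(p)
```

geom_col() is equivalent to geom_bar(stat = "identity") and saves spelling out the stat.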
I have a data frame with five columns and five rows. The data frame looks like this:
df <- data.frame(
day=c("m","t","w","t","f"),
V1=c(5,10,20,15,20),
V2=c(0.1,0.2,0.6,0.5,0.8),
V3=c(120,100,110,120,100),
V4=c(1,10,6,8,8)
)
I want to do some plots so I used the ggplot and in particular the geom_bar:
ggplot(df, aes(x = day, y = V1, group = 1)) + ylim(0,20)+ geom_bar(stat = "identity")
ggplot(df, aes(x = day, y = V2, group = 1)) + ylim(0,1)+ geom_bar(stat = "identity")
ggplot(df, aes(x = day, y = V3, group = 1)) + ylim(50,200)+ geom_bar(stat = "identity")
ggplot(df, aes(x = day, y = V4, group = 1)) + ylim(0,15)+ geom_bar(stat = "identity")
My question is: how can I make a grouped plot with geom_bar and multiple y-axes? I want the day on the x-axis, and for each day I want to plot four bars (V1, V2, V3, V4), each with a different range and color. Is that possible?
EDIT
I want the y axis to look like this:
require(reshape)
data.m <- melt(df, id.vars='day')
ggplot(data.m, aes(day, value)) +
geom_bar(aes(fill = variable), position = "dodge", stat="identity") +
facet_grid(variable ~ .)
You can also change the y-axis limits if you like (here's an example).
Alternatively, you may have meant grouped like this:
require(reshape)
data.m <- melt(df, id.vars='day')
ggplot(data.m, aes(day, value)) +
geom_bar(aes(fill = variable), position = "dodge", stat="identity")
For the latter examples, if you want two y-axes, create the plot twice (once with a left y-axis and once with a right y-axis) and then combine the two with this function:
library(ggplot2)
library(gtable) # gtable_add_grob(), gtable_add_cols()
library(grid) # unit()
double_axis_graph <- function(graf1,graf2){
gtable1 <- ggplot_gtable(ggplot_build(graf1))
gtable2 <- ggplot_gtable(ggplot_build(graf2))
par <- c(subset(gtable1[['layout']], name=='panel', select=t:r))
graf <- gtable_add_grob(gtable1, gtable2[['grobs']][[which(gtable2[['layout']][['name']]=='panel')]],
par['t'],par['l'],par['b'],par['r'])
ia <- which(gtable2[['layout']][['name']]=='axis-l')
ga <- gtable2[['grobs']][[ia]]
ax <- ga[['children']][[2]]
ax[['widths']] <- rev(ax[['widths']])
ax[['grobs']] <- rev(ax[['grobs']])
ax[['grobs']][[1]][['x']] <- ax[['grobs']][[1]][['x']] - unit(1,'npc') + unit(0.15,'cm')
graf <- gtable_add_cols(graf, gtable2[['widths']][gtable2[['layout']][ia, ][['l']]], length(graf[['widths']])-1)
graf <- gtable_add_grob(graf, ax, par['t'], length(graf[['widths']])-1, par['b'])
return(graf)
}
I believe there's also a package or convenience function that does the same thing.
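One built-in option I am aware of is sec_axis() in ggplot2, used through the sec.axis argument of scale_y_continuous(). It is not the same technique as the gtable function above: the secondary axis is only a rescaled view of the primary one, so the second series has to be rescaled by hand. In the sketch below, the scale factor 6 is an assumption chosen so that V3 fits into the range of V1, and the day labels are made unique ("tu", "th") purely for illustration.

```r
library(ggplot2)

df <- data.frame(
  day = factor(c("m", "tu", "w", "th", "f"),
               levels = c("m", "tu", "w", "th", "f")),
  V1 = c(5, 10, 20, 15, 20),
  V3 = c(120, 100, 110, 120, 100)
)

# V1 is drawn on the primary axis; V3 is divided by 6 (assumed factor)
# so it shares that axis, and the secondary axis multiplies by 6 again
p <- ggplot(df, aes(x = day)) +
  geom_col(aes(y = V1), fill = "grey70") +
  geom_point(aes(y = V3 / 6), colour = "red", size = 3) +
  scale_y_continuous(name = "V1",
                     sec.axis = sec_axis(~ . * 6, name = "V3"))
print(p)
```

Because both series share one coordinate system, the scale factor has to be picked per data set; with four variables on very different ranges, faceting is usually easier to read.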
First I reshaped as described in the documentation in the link below the question.
In general, ggplot2 does not support multiple y-axes. I think it is a philosophical choice. But maybe faceting will work for you.
df <- read.table(text = "day V1 V2 V3 V4
m 5 0.1 120 1
t 10 0.2 100 10
w 20 0.6 110 6
t 15 0.5 120 8
f 20 0.8 100 8", header = TRUE)
library(reshape2)
df <- melt(df, id.vars = 'day')
ggplot(df, aes(x = variable, y = value, fill = variable)) + geom_bar(stat = "identity") + facet_grid(.~day)
If I understand correctly you want to include facets in your plot. You have to use reshape2 to get the data in the right format. Here's an example with your data:
df <- data.frame(
day=c("m","t","w","t","f"),
V1=c(5,10,20,15,20),
V2=c(0.1,0.2,0.6,0.5,0.8),
V3=c(120,100,110,120,100),
V4=c(1,10,6,8,8)
)
library(reshape2)
df <- melt(df, "day")
Then plot and include facet_grid:
ggplot(df, aes(x=day, y=value)) + geom_bar(stat="identity", aes(fill=variable)) +
facet_grid(variable ~ .)
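Because the question asks for a different range per variable, the faceted version can be combined with scales = "free_y", which gives each panel its own y-axis. The sketch below is a hedged variant that reshapes with base R's stack() instead of reshape2; the unique day labels ("tu", "th") and the name long are assumptions, introduced so that the two "t" days do not end up on the same x position.

```r
library(ggplot2)

df <- data.frame(
  day = c("m", "tu", "w", "th", "f"),
  V1 = c(5, 10, 20, 15, 20),
  V2 = c(0.1, 0.2, 0.6, 0.5, 0.8),
  V3 = c(120, 100, 110, 120, 100),
  V4 = c(1, 10, 6, 8, 8)
)

# stack() returns the columns "values" and "ind" (the source column name)
vars <- c("V1", "V2", "V3", "V4")
long <- data.frame(
  day = factor(rep(df$day, times = length(vars)), levels = df$day),
  stack(df[vars])
)

# One panel per variable, each with its own y range
p <- ggplot(long, aes(x = day, y = values, fill = ind)) +
  geom_col() +
  facet_grid(ind ~ ., scales = "free_y")
print(p)
```

Setting the factor levels from df$day keeps the days in weekday order instead of alphabetical order.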
If you run the code below you will get a line graph. How can I change the color of the point at x = 2 to red and increase its size?
In this case, the point at x = 2 (value 0.6) would be highlighted in red and made bigger.
Here is my code:
library("ggplot2")
data<-data.frame(time= c(1,2,3), value = c(.4,.6,.7))
ggplot(data, aes( x = time, y=value) ) + geom_line() + geom_point(shape = 7,size = 1)
Thank you!
If your dataset is small you could do this:
library("ggplot2")
data<-data.frame(time= c(1,2,3), value = c(.4,.6,.7),point_size=c(1,10,1),cols=c('black','red','black'))
ggplot(data, aes( x = time, y=value) ) + geom_line() + geom_point(shape = 7,size = data$point_size, colour=data$cols)
Also, I would not advise calling your data frame data, since that is also the name of a built-in R function.
In addition to @Harpal's solution, you can add two more columns to your data frame in which the point size and colour are specified according to particular conditions:
df <- data.frame(time= c(1,2,3), value = c(.4,.6,.7))
# specify condition and pointsize here
df$pointsize <- ifelse(df$value==0.6, 5, 1)
# specify condition and pointcolour here
df$pointcol <- ifelse(df$value==0.6, "red", "black")
ggplot(df, aes(x=time, y=value)) + geom_line() + geom_point(shape=7, size=df$pointsize, colour=df$pointcol)
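You may change ifelse(df$value==0.6, 5, 1) to meet any criteria you like, or use a more complex approach to specify more conditions:
df <- data.frame(time= c(1,2,3), value = c(.4,.6,.7))
# create the columns first so the indexed assignments below are safe
df$pointsize <- NA_real_
df$pointcol <- NA_character_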
df$pointsize[which(df$value<0.6)] <- 1
df$pointsize[which(df$value>0.6)] <- 8
df$pointsize[which(df$value==0.6)] <- 5
df$pointcol[which(df$value<0.6)] <- "black"
df$pointcol[which(df$value>0.6)] <- "green"
df$pointcol[which(df$value==0.6)] <- "red"
ggplot(df, aes(x=time, y=value)) + geom_line() + geom_point(shape=7, size=df$pointsize, colour=df$pointcol)
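Another way to get the same effect, without adding size and colour columns, is to draw the highlighted point as its own layer. This is a sketch of an alternative rather than either answerer's method; the name highlight and the size 5 are illustrative choices.

```r
library(ggplot2)

df <- data.frame(time = c(1, 2, 3), value = c(0.4, 0.6, 0.7))

# Rows to emphasise: here, the single point at time == 2
highlight <- subset(df, time == 2)

p <- ggplot(df, aes(x = time, y = value)) +
  geom_line() +
  geom_point(shape = 7, size = 1) +
  # second point layer drawn on top, using only the highlighted rows
  geom_point(data = highlight, shape = 7, size = 5, colour = "red")
print(p)
```

The extra layer inherits the x and y mapping from ggplot(), so only the data and the fixed appearance need to be specified.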