boxplot ggplot2::qplot() ungrouped and grouped data in same plot R - r

My data set features a factor(TypeOfCat) and a numeric (AgeOfCat).
I've made the below box plot. In addition to a box representing each type of cat, I've also tried to add a box representing the ungrouped data (ie the entire cohort of cats and their ages). What I've got is not quite what I'm after though, as sum() of course won't provide all the information needed to create such a plot. Any help would be much appreciated.
Data set and current code:
Df1 <- data.frame(TypeOfCat=c("A","B","B","C","C","A","B","C","A","B","A","C"),
AgeOfCat=c(14,2,5,8,4,5,2,6,3,6,12,7))
Df2 <- data.frame(TypeOfCat=c("AllCats"),
AgeOfCat=sum(Df1$AgeOfCat)))
Df1 <- rbind(Df1, Df2)
qplot(Df1$TypeOfCat,Df1$AgeOfCat, geom = "boxplot") + coord_flip()

No need for sum. Just take all the values individually for AllCats:
# Your original code:
library(ggplot2)
Df1 <- data.frame(TypeOfCat=c("A","B","B","C","C","A","B","C","A","B","A","C"),
AgeOfCat=c(14,2,5,8,4,5,2,6,3,6,12,7))
# this is the different part:
Df2 <- data.frame(TypeOfCat=c("AllCats"),
AgeOfCat=Df1$AgeOfCat)
Df1 <- rbind(Df1, Df2)
qplot(Df1$TypeOfCat,Df1$AgeOfCat, geom = "boxplot") + coord_flip()
You can see you have all the observations if you add geom_point to the boxplot:
ggplot(Df1, aes(TypeOfCat, AgeOfCat)) +
geom_boxplot() +
geom_point(color='red') +
coord_flip()

Like this?
library(ggplot2)
# first double your data frame, but change "TypeOfCat", since it contains all:
df <- rbind(Df1, transform(Df1, TypeOfCat = "AllCats"))
# then plot it:
ggplot(data = df, mapping = aes(x = TypeOfCat, y = AgeOfCat)) +
geom_boxplot() + coord_flip()

Related

Represent dataset in column bar in R using ggplot [duplicate]

I have a csv file which looks like the following:
Name,Count1,Count2,Count3
application_name1,x1,x2,x3
application_name2,x4,x5,x6
The x variables represent numbers and the applications_name variables represent names of different applications.
Now I would like to make a barplot for each row by using ggplot2. The barplot should have the application_name as title. The x axis should show Count1, Count2, Count3 and the y axis should show the corresponding values (x1, x2, x3).
I would like to have a single barplot for each row, because I have to store the different plots in different files. So I guess I cannot use "melt".
I would like to have something like:
for each row in rows {
print barplot in file
}
Thanks for your help.
You can use melt to rearrange your data and then use either facet_wrap or facet_grid to get a separate plot for each application name
library(ggplot2)
library(reshape2)
# example data
mydf <- data.frame(name = paste0("name",1:4), replicate(5,rpois(4,30)))
names(mydf)[2:6] <- paste0("count",1:5)
# rearrange data
m <- melt(mydf)
# if you are wanting to export each plot separately
# I used facet_wrap as a quick way to add the application name as a plot title
for(i in levels(m$name)) {
p <- ggplot(subset(m, name==i), aes(variable, value, fill = variable)) +
facet_wrap(~ name) +
geom_bar(stat="identity", show_guide=FALSE)
ggsave(paste0("figure_",i,".pdf"), p)
}
# or all plots in one window
ggplot(m, aes(variable, value, fill = variable)) +
facet_wrap(~ name) +
geom_bar(stat="identity", show_guide=FALSE)
I didn't see #user20650's nice answer before preparing this. It's almost identical, except that I use plyr::d_ply to save things instead of a loop. I believe dplyr::do() is another good option (you'd group_by(Name) first).
yourData <- data.frame(Name = sample(letters, 10),
Count1 = rpois(10, 20),
Count2 = rpois(10, 10),
Count3 = rpois(10, 8))
library(reshape2)
yourMelt <- melt(yourData, id.vars = "Name")
library(ggplot2)
# Test a function on one piece to develope graph
ggplot(subset(yourMelt, Name == "a"), aes(x = variable, y = value)) +
geom_bar(stat = "identity") +
labs(title = subset(yourMelt, Name == 'a')$Name)
# Wrap it up, with saving to file
bp <- function(dat) {
myPlot <- ggplot(dat, aes(x = variable, y = value)) +
geom_bar(stat = "identity") +
labs(title = dat$Name)
ggsave(filname = paste0("path/to/save/", dat$Name, "_plot.pdf"),
myPlot)
}
library(plyr)
d_ply(yourMelt, .variables = "Name", .fun = bp)

Plot multiple distributions by year using ggplot Boxplot

I'm trying to evaluate the above data in a boxplot similar to this: https://www.r-graph-gallery.com/89-box-and-scatter-plot-with-ggplot2.html
I want the x axis to reflect my "Year" variable and each boxplot to evaluate the 8 methods as a distribution. Eventually I'd like to pinpoint the "Selected" variable in relation to that distribution but currently I just want this thing to render!
I figure out how to code my y variable and I get various errors no matter what I try. I think the PY needs to be as.factor but I've tried some code that way and I just get other errors.
anyway here is my code (Send Help):
# Libraries
library(tidyverse)
library(hrbrthemes)
library(viridis)
library(ggplot2)
library(readxl) # For reading in Excel files
library(lubridate) # For handling dates
library(dplyr) # for mutate and pipe functions
# Path to current and prior data folders
DataPath_Current <- "C:/R Projects/Box Plot Test"
Ult_sum <- read_excel(path = paste0(DataPath_Current, "/estimate.XLSX"),
sheet = "Sheet1",
range = "A2:J12",
guess_max = 100)
# just want to see what my table looks like
Ult_sum
# create a dataset - the below is code I commented out
# data <- data.frame(
# name=c(Ult_sum[,1]),
# value=c(Ult_sum[1:11,2:8])
#)
value <- Ult_sum[2,]
# Plot
Ult_sum %>%
ggplot( aes(x= Year, y= value, fill=Year)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.6) +
geom_jitter(color="black", size=0.4, alpha=0.9) +
theme_ipsum() +
theme(
legend.position="none",
plot.title = element_text(size=11)
) +
ggtitle("A boxplot with jitter") +
xlab("")
I do not see how your code matches the screenshot of your dataset. However, just a general hint: ggplot likes data in long format. I suggest you reshape your data using tidyr::reshape_long oder data.table::melt. This way you get 3 columns: year, method, value, of which the first two should be a factor. The resulting dataset can then be neatly used in aes() as aes(x=year, y=value, fill=method).
Edit: Added an example. Does this do what you want?
library(data.table)
library(magrittr)
library(ggplot2)
DT <- data.table(year = factor(rep(2010:2014, 10)),
method1 = rnorm(50),
method2 = rnorm(50),
method3 = rnorm(50))
DT_long <- DT %>% melt(id.vars = "year")
ggplot(DT_long, aes(x = year, y = value, fill = variable)) +
geom_boxplot()

Plotting a bar graph in R

Here is a snapshot of data:
restaurant_change_sales = c(3330.443, 3122.534)
restaurant_change_labor = c(696.592, 624.841)
restaurant_change_POS = c(155.48, 139.27)
rest_change = data.frame(restaurant_change_sales, restaurant_change_labor, restaurant_change_POS)
I want two bars for each of the columns indicating the change. One graph for each of the columns.
I tried:
ggplot(aes(x = rest_change$restaurant_change_sales), data = rest_change) + geom_bar()
This is not giving the result the way I want. Please help!!
So ... something like:
library(ggplot2)
library(dplyr)
library(tidyr)
restaurant_change_sales = c(3330.443, 3122.534)
restaurant_change_labor = c(696.592, 624.841)
restaurant_change_POS = c(155.48, 139.27)
rest_change = data.frame(restaurant_change_sales,
restaurant_change_labor,
restaurant_change_POS)
cbind(rest_change,
change = c("Before", "After")) %>%
gather(key,value,-change) %>%
ggplot(aes(x = change,
y = value)) +
geom_bar(stat="identity") +
facet_grid(~key)
Which will produce:
Edit:
To be extra fancy e.g. make it so that the order of x-axis labels goes from "Before" to "After", you can add this line: scale_x_discrete(limits = c("Before", "After")) to the end of the ggplot function
Your data are not formatted properly to work well with ggplot2, or really any of the plotting packages in R. So we'll fix your data up first, and then use ggplot2 to plot it.
library(tidyr)
library(dplyr)
library(ggplot2)
# We need to differentiate between the values in the rows for them to make sense.
rest_change$category <- c('first val', 'second val')
# Now we use tidyr to reshape the data to the format that ggplot2 expects.
rc2 <- rest_change %>% gather(variable, value, -category)
rc2
# Now we can plot it.
# The category that we added goes along the x-axis, the values go along the y-axis.
# We want a bar chart and the value column contains absolute values, so no summation
# necessary, hence we use 'identity'.
# facet_grid() gives three miniplots within the image for each of the variables.
ggplot2(rc2, aes(x=category, y=value, facet=variable)) +
geom_bar(stat='identity') +
facet_grid(~variable)
You have to melt your data:
library(reshape2) # or library(data.table)
rest_change$rowN <- 1:nrow(rest_change)
rest_change <- melt(rest_change, id.var = "rowN")
ggplot(rest_change,aes(x = rowN, y = value)) + geom_bar(stat = "identity") + facet_wrap(~ variable)

ggplot2-line plotting with TIME series and multi-spline

This question's theme is simple but drives me crazy:
1. how to use melt()
2. how to deal with multi-lines in single one image?
Here is my raw data:
a 4.17125 41.33875 29.674375 8.551875 5.5
b 4.101875 29.49875 50.191875 13.780625 4.90375
c 3.1575 29.621875 78.411875 25.174375 7.8012
Q1:
I've learn from this post Plotting two variables as lines using ggplot2 on the same graph to know how to draw the multi-lines for multi-variables, just like this:
The following codes can get the above plot. However, the x-axis is indeed time-series.
df <- read.delim("~/Desktop/df.b", header=F)
colnames(df)<-c("sample",0,15,30,60,120)
df2<-melt(df,id="sample")
ggplot(data = df2, aes(x=variable, y= value, group = sample, colour=sample)) + geom_line() + geom_point()
I wish it could treat 0 15 30 60 120 as real number to show the time series, rather than name_characteristics. Even having tried this, I failed.
row.names(df)<-df$sample
df<-df[,-1]
df<-as.matrix(df)
df2 <- data.frame(sample = factor(rep(row.names(df),each=5)), Time = factor(rep(c(0,15,30,60,120),3)),Values = c(df[1,],df[2,],df[3,]))
ggplot(data = df2, aes(x=Time, y= Values, group = sample, colour=sample))
+ geom_line()
+ geom_point()
Loooooooooking forward to your help.
Q2:
I've learnt that the following script can add the spline() function for single one line, what about I wish to apply spline() for all the three lines in single one image?
n <-10
d <- data.frame(x =1:n, y = rnorm(n))
ggplot(d,aes(x,y))+ geom_point()+geom_line(data=data.frame(spline(d, n=n*10)))
Your variable column is a factor (you can verify by calling str(df2)). Just convert it back to numeric:
df2$variable <- as.numeric(as.character(df2$variable))
For your other question, you might want to stick with using geom_smooth or stat_smooth, something like this:
p <- ggplot(data = df2, aes(x=variable, y= value, group = sample, colour=sample)) +
geom_line() +
geom_point()
library(splines)
p + geom_smooth(aes(group = sample),method = "lm",formula = y~bs(x),se = FALSE)
which gives me something like this:

How can a line be overlaid on a bar plot using ggplot2?

I'm looking for a way to plot a bar chart containing two different series, hide the bars for one of the series and instead have a line (smooth if possible) go through the top of where bars for the hidden series would have been (similar to how one might overlay a freq polynomial on a histogram). I've tried the example below but appear to be running into two problems.
First, I need to summarize (total) the data by group, and second, I'd like to convert one of the series (df2) to a line.
df <- data.frame(grp=c("A","A","B","B","C","C"),val=c(1,1,2,2,3,3))
df2 <- data.frame(grp=c("A","A","B","B","C","C"),val=c(1,4,3,5,1,2))
ggplot(df, aes(x=grp, y=val)) +
geom_bar(stat="identity", alpha=0.75) +
geom_bar(data=df2, aes(x=grp, y=val), stat="identity", position="dodge")
You can get group totals in many ways. One of them is
with(df, tapply(val, grp, sum))
For simplicity, you can combine bar and line data into a single dataset.
df_all <- data.frame(grp = factor(levels(df$grp)))
df_all$bar_heights <- with(df, tapply(val, grp, sum))
df_all$line_y <- with(df2, tapply(val, grp, sum))
Bar charts use a categorical x-axis. To overlay a line you will need to convert the axis to be numeric.
ggplot(df_all) +
geom_bar(aes(x = grp, weight = bar_heights)) +
geom_line(aes(x = as.numeric(grp), y = line_y))
Perhaps your sample data aren't representative of the real data you are working with, but there are no lines to be drawn for df2. There is only one value for each x and y value. Here's a modifed version of your df2 with enough data points to construct lines:
df <- data.frame(grp=c("A","A","B","B","C","C"),val=c(1,2,3,1,2,3))
df2 <- data.frame(grp=c("A","A","B","B","C","C"),val=c(1,4,3,5,0,2))
p <- ggplot(df, aes(x=grp, y=val))
p <- p + geom_bar(stat="identity", alpha=0.75)
p + geom_line(data=df2, aes(x=grp, y=val), colour="blue")
Alternatively, if your example data above is correct, you can plot this information as a point with geom_point(data = df2, aes(x = grp, y = val), colour = "red", size = 6). You can obviously change the color and size to your liking.
EDIT: In response to comment
I'm not entirely sure what the visual for a freq polynomial over a histogram is supposed to look like. Are the x-values supposed to be connected to one another? Secondly, you keep referring to wanting lines but your code shows geom_bar() which I assume isn't what you want? If you want lines, use geom_lines(). If the two assumptions above are correct, then here's an approach to do that:
#First let's summarise df2 by group
df3 <- ddply(df2, .(grp), summarise, total = sum(val))
> df3
grp total
1 A 5
2 B 8
3 C 3
#Second, let's plot df3 as a line while treating the grp variable as numeric
p <- ggplot(df, aes(x=grp, y=val))
p <- p + geom_bar(alpha=0.75, stat = "identity")
p + geom_line(data=df3, aes(x=as.numeric(grp), y=total), colour = "red")

Resources