Boxplots in ggplot2 (R)

My goal is to visualize some data frames with ggplot2.
I have several data.frames looking like this
And my goal is a boxplot looking like this, just nicer.
I managed to get single boxplots using
plt <- ggplot(data, aes(RF, data$RF)) +
geom_boxplot()
plt
But that's not what I want.

library(ggplot2)
library(reshape)
airquality_m = melt(airquality)
ggplot(airquality_m, aes(variable, value )) + geom_boxplot()
I did not beautify the plot but I guess you get the idea here.

The boxplot you showed was created with base R graphics. A single command,
boxplot(data), will do it.
If you want to use ggplot2, you first have to melt the data frame into long format and then plot it.
library(ggplot2)
library(reshape2)
datPlot <- melt(data)  # stacks all columns into variable/value pairs
ggplot(datPlot,aes(variable,value)) + geom_boxplot()
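As a self-contained sketch of the melt-then-plot step, here is the same idea on a made-up data frame. The model names (RF, SVM, GLM) and the random values are assumptions for illustration only, not the asker's data.

```r
library(ggplot2)
library(reshape2)

# Hypothetical wide data: one column per model, one row per run
set.seed(1)
data <- data.frame(RF  = runif(20, 0.60, 0.80),
                   SVM = runif(20, 0.55, 0.75),
                   GLM = runif(20, 0.50, 0.70))

# With only numeric columns, melt() treats every column as a measure
# variable and returns two columns: variable (old column name) and value
datPlot <- melt(data)

# One box per original column
p <- ggplot(datPlot, aes(variable, value)) + geom_boxplot()
print(p)
```

Each of the three columns contributes 20 rows, so the long data frame has 60 rows and the plot shows three boxes side by side.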

I guess this is what you want:
library(ggplot2)
library(reshape)
myddt_m = melt(mydata)
names(myddt_m)=c("Models","CI")
ggplot(myddt_m, aes(Models, CI,fill=Models )) + geom_boxplot()+guides(fill="none")+labs( x="", y="C-Index")

Related

How can I use column labels as Y axis in ggplot?

Hello,
I have a dataset structured as shown in the link above. I am extremely new to R, and this is probably super easy to do, but I cannot figure out how to plot this dataset using ggplot...
Could anyone guide me and give me hints?
I basically want to color the lines according to socioeconomic level and visualize each year's value...
You need to reshape your data into long format before plotting it with ggplot.
library(reshape)
library(dplyr)
library(ggplot2)
df_long <- melt(df) # reshape the dataframe to a long format
df_long %>%
ggplot( aes(x=variable, y=value, group=group, color=group)) +
geom_line()
Note: You will get better answers if you post your code with a reproducible dataset.
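Since the original dataset is only available through the link, here is a minimal sketch on made-up data. The column names (group, 2010, 2011, 2012) and the values are assumptions; it uses reshape2 rather than reshape so that the long columns can be named directly.

```r
library(ggplot2)
library(reshape2)

# Hypothetical wide data: one row per socioeconomic group, one column per year
df <- data.frame(group  = c("low", "middle", "high"),
                 `2010` = c(1.0, 2.0, 3.0),
                 `2011` = c(1.5, 2.2, 3.1),
                 `2012` = c(1.7, 2.9, 3.6),
                 check.names = FALSE)

# Keep group as the id column; the year columns become rows
df_long <- melt(df, id.vars = "group",
                variable.name = "year", value.name = "value")

# One line per group, colored by group, with years on the x-axis
p <- ggplot(df_long, aes(x = year, y = value, group = group, color = group)) +
  geom_line()
print(p)
```

The group aesthetic matters here: year is a factor after melting, and without group ggplot2 would not know which points to connect.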

ggplot bar chart for time series

I'm reading Hadley Wickham's book about ggplot, but I have trouble plotting certain weights over time in a bar chart. Here is some sample data:
dates <- c("20040101","20050101","20060101")
dates.f <- strptime(dates,format="%Y%m%d")
m <- rbind(c(0.2,0.5,0.15,0.1,0.05),c(0.5,0.1,0.1,0.2,0.1),c(0.2,0.2,0.2,0.2,0.2))
m <- cbind(dates.f,as.data.frame(m))
This data.frame has in the first column the dates and each row the corresponding weights. I would like to plot the weights for each year in a bar chart using the "fill" argument.
I'm able to plot the weights as bars using:
p <- ggplot(m,aes(dates.f))
p+geom_bar()
However, this is not exactly what I want. I would like to see in each bar the contribution of each weight. Moreover, I don't understand why I have the strange format on the x-axis, i.e. why there is "2004-07" and "2005-07" displayed.
Thanks for the help
Hope this is what you are looking for:
ggplot2 requires data in a long format.
require(reshape2)
m_molten <- melt(m, "dates.f")
Plotting itself is done by
ggplot(m_molten, aes(x=dates.f, y=value, fill=variable)) +
geom_bar(stat="identity")
You can add position="dodge" to geom_bar if you want them side by side.
EDIT
If you want yearly breaks only: convert m_molten$dates.f to date.
require(scales)
m_molten$dates.f <- as.Date(m_molten$dates.f)
ggplot(m_molten, aes(x=dates.f, y=value, fill=variable)) +
geom_bar(stat="identity") +
scale_x_date(labels = date_format("%y"), breaks = date_breaks("year"))
P.S.: See http://vita.had.co.nz/papers/tidy-data.pdf for Hadley's philosophy of tidy data.
To create the plot you need, you have to reshape your data from "wide" to "tall". There are many ways of doing this, including the reshape() function in base R (not recommended), reshape2 and tidyr.
In the tidyr package you have two functions to reshape data, gather() and spread().
The function gather() transforms from wide to tall. In this case, you have to gather your columns V1:V5.
Try this:
library("tidyr")
tidy_m <- gather(m, var, value, V1:V5)
ggplot(tidy_m,aes(x = dates.f, y=value, fill=var)) +
geom_bar(stat="identity")
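In current tidyr, pivot_longer() supersedes gather(). A minimal sketch of the same reshape with it follows. Two choices here are assumptions rather than part of the answers above: the dates are built with as.Date(), and the year is mapped as a factor so the x-axis shows one label per bar.

```r
library(ggplot2)
library(tidyr)

# Rebuild the sample data, using Date instead of the strptime() result
dates.f <- as.Date(c("20040101", "20050101", "20060101"), format = "%Y%m%d")
w <- rbind(c(0.2, 0.5, 0.15, 0.1, 0.05),
           c(0.5, 0.1, 0.1, 0.2, 0.1),
           c(0.2, 0.2, 0.2, 0.2, 0.2))
m <- data.frame(dates.f = dates.f, as.data.frame(w))  # columns dates.f, V1..V5

# Wide to long: V1..V5 become rows, labeled in var
tidy_m <- pivot_longer(m, cols = V1:V5, names_to = "var", values_to = "value")

# Stacked bars, one per year; geom_col() is geom_bar(stat = "identity")
p <- ggplot(tidy_m, aes(x = factor(format(dates.f, "%Y")), y = value, fill = var)) +
  geom_col() +
  labs(x = "year")
print(p)
```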

Plot results from dist_tab() function from qdap library

I am interested in plotting the results from the following code which produces a frequency distribution table. I would like to graph the Freq column as a bar with the cum.Freq as a line both sharing the interval column as the x-axis.
library("qdap")
x <- c(1,2,3,2,4,2,5,4,6,7,8,9)
dist_tab(x)
I have been able to get the bar chart built using ggplot, but I want to take it further with the cum.Freq added as a secondary axis. I also want to add the percent and cum.percent values added as data labels. Any help is appreciated.
library("ggplot2")
ggplot(dist_tab(x), aes(x=interval)) + geom_bar(aes(y=Freq))
Not sure if I understand your question. Is this what you are looking for?
library(reshape2)
df <- dist_tab(x)
# long format: one row per interval and measure (Freq or cum.Freq)
df.melt <- melt(df, id.vars="interval", measure.vars=c("Freq", "cum.Freq"))
ggplot(df.melt, aes(x=interval, y=value, fill=variable)) +
geom_bar(stat="identity", position="dodge")
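For the bar-plus-line layout with a secondary axis that the question asks about, one possible sketch uses sec_axis() from ggplot2. To keep it runnable without qdap, the frequency table is built by hand with table(); it is a stand-in that reuses the column names interval, Freq and cum.Freq but does not reproduce dist_tab()'s own binning. The name scale_factor is made up for this example.

```r
library(ggplot2)

x <- c(1, 2, 3, 2, 4, 2, 5, 4, 6, 7, 8, 9)

# Stand-in frequency table (assumption: one interval per distinct value)
tab <- table(x)
df <- data.frame(interval = factor(names(tab), levels = names(tab)),
                 Freq     = as.vector(tab))
df$cum.Freq <- cumsum(df$Freq)

# ggplot2 secondary axes are a rescaling of the primary axis, so the line
# is divided by a factor and the axis labels are multiplied back
scale_factor <- max(df$cum.Freq) / max(df$Freq)

p <- ggplot(df, aes(x = interval)) +
  geom_col(aes(y = Freq)) +
  geom_line(aes(y = cum.Freq / scale_factor, group = 1)) +
  scale_y_continuous(name = "Freq",
                     sec.axis = sec_axis(~ . * scale_factor, name = "cum.Freq"))
print(p)
```

Percent labels could be added in the same way with geom_text(), once the percent columns are present in the data frame.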

ggplot2 continuous bar plot with multiple factors

I'm going to use the diamond data set that comes standard with the ggplot2 package to illustrate what I'm looking for.
I want to build a graph that is like this:
library(ggplot2)
ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar(position="dodge")
However, instead of having a count, I would like to return the mean of a continuous variable. I'd like to return cut and color and get the mean carat. If I put in this code:
ggplot(diamonds, aes(carat, fill=cut)) + geom_bar(position="dodge")
My output is a count of the number of carats vs the cut.
Anyone know how to do this?
You can get a new data frame with mean(carat) grouped by cut and color and then plot:
library(plyr)
data <- ddply(diamonds, .(cut, color), summarise, mean_carat = mean(carat))
ggplot(data, aes(color, mean_carat,fill=cut))+geom_bar(stat="identity", position="dodge")
If you want faster solutions you can use either dplyr or data.table
With dplyr:
library(dplyr)
data <- group_by(diamonds, cut, color) %>% summarise(mean_carat=mean(carat))
With data.table:
library(data.table)
data <- data.table(diamonds)[,list(mean_carat=mean(carat)), by=c('cut', 'color')]
The code for the plot is the same for both.
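If you would rather skip the intermediate data frame, ggplot2 can compute the means while plotting. This is an alternative to the answers above, not what they do: stat_summary() applies mean to carat within each color and cut combination. The fun argument assumes ggplot2 3.3.0 or later (older versions called it fun.y).

```r
library(ggplot2)

# Mean carat per color/cut, computed by ggplot2 at draw time
p <- ggplot(diamonds, aes(color, carat, fill = cut)) +
  stat_summary(fun = mean, geom = "bar", position = "dodge")
print(p)
```

The precomputed version is still the better choice when you need the summary table itself, for example to print the values or reuse them in another plot.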

ggplot How to scatter plot one column and line plot another

I'm trying to make a plot in R from a data frame with several columns and I'd like to have ggplot plot one of the columns as points, and the other several as lines of different colors.
I can find examples about how to make each of these plots separately, but I can't seem to find the command to combine the plots...
Thanks for any help you can provide.
Like this:
dat <- data.frame(points.x = c(1:10), points.y = c(1:10),
lines.x = c(10:1), lines.y = c(1:10))
ggplot(dat, aes(points.x, points.y)) + geom_point() +
geom_line(aes(lines.x,lines.y))
In order to plot several different columns as lines of different colors, use the melt function from the reshape2 package.
For example:
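library(ggplot2)
library(reshape2)
df <- data.frame(A=1:10, B=rnorm(10), C=rnorm(10), D=rnorm(10))
melted <- melt(df, id.vars="A")  # long format: A, variable, value
# lines for every column except B, points for B only
ggplot(melted[melted$variable!="B",], aes(A, value, color=variable)) + geom_line() +
geom_point(data=melted[melted$variable=="B",])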
