Adding legend to ggplot curves plotted on the same axis [duplicate] - r

I have a graph that I'm trying to add a legend to but I can't find any answers.
Here's what the graph looks like
I made a dataframe containing my x-axis as a column and several other columns containing y values that I graphed against x (fixed) in order to get these curves. I want a legend to appear on the side saying column 1, ..., column 11, with each entry matching the color of its curve.
How do I do this? I feel like I'm missing something obvious
Here's what my code looks like:(sorry for the pic. I keep getting errors that my code is not formatted correctly even though I'm using the code button)
interval is just 2:100 and aaaa etc... is a vector the same length as interval.

As Peter says, you will need to convert your data into "long" format. Here is an example using reshape2::melt:
library(reshape2)
library(ggplot2)
n <- 20
df <- data.frame(x = seq(n))
tmp <- as.data.frame(do.call("cbind", lapply(seq(5), FUN = function(x){rnorm(n)})))
names(tmp) <- paste0("aaaa", letters[1:5])
df <- cbind(df, tmp)
head(df)
df2 <- melt(df, id.vars = "x")
head(df2)
ggplot(data = df2) + aes(x = x, y = value, color = variable) +
geom_point() +
geom_line()
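The same reshaping can also be sketched with tidyr::pivot_longer. This is a swapped-in alternative, not what the answer above used; the column names and the legend title "Column" are placeholders chosen for illustration.

```r
library(tidyr)
library(ggplot2)

set.seed(1)
n <- 20
# Wide data: one x column plus five y columns (placeholder names)
df <- data.frame(x = seq_len(n))
for (nm in paste0("aaaa", letters[1:5])) df[[nm]] <- rnorm(n)

# Long format: one row per (x, column) pair
df_long <- pivot_longer(df, cols = -x, names_to = "variable", values_to = "value")

# Mapping colour to `variable` is what makes ggplot draw a legend
p <- ggplot(df_long, aes(x = x, y = value, colour = variable)) +
  geom_line() +
  labs(colour = "Column")
print(p)
```

The legend appears because colour is mapped inside aes(); setting a fixed colour per geom_line() call outside aes() would not produce one.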

Related

How to connect lines when x are strings? [duplicate]

library(ggplot2)
x=letters[1:3]
y=1:3
qplot(x, y)
qplot(x, y, geom=c('point', 'line'))
geom_path: Each group consists of only one observation. Do you need to adjust
the group aesthetic?
I want to connect lines between the points. But when the x is a string, the above commands won't work. It works when the x is numeric. I'd think qplot should be made more user-friendly in this case.
How to make it connect the points with lines when x is a string?
One solution is provided by @stefan. Another one could be the following.
Sample data:
x=letters[1:3]
y=1:3
Sample code:
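library(dplyr)
library(ggplot2)
# Turn x into a factor whose levels keep the original order
d <- data.frame(x, y) %>%
  mutate(x = factor(x, levels = x))
# group = 1 treats all points as one series so geom_line() can connect them
ggplot(data = d, aes(x = x, y = y, group = 1)) +
  geom_line() +
  scale_x_discrete(labels = x, breaks = x)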
Plot:
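If the factor conversion is not needed, a minimal sketch of the same idea is to leave x as a character vector and only set group = 1, so that all points belong to one series. This is a variation under that assumption, not the answer's exact code.

```r
library(ggplot2)

d <- data.frame(x = letters[1:3], y = 1:3)

# group = 1 puts every row in a single group, so geom_line() can connect them
p <- ggplot(d, aes(x = x, y = y, group = 1)) +
  geom_point() +
  geom_line()
print(p)

# Inspect the computed data of the line layer (layer 2)
line_data <- layer_data(p, 2)
```

With a character x the categories are ordered alphabetically, so the factor conversion shown above is still the way to control their order on the axis.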

boxplots with missing values in R - ggplot

I am trying to make boxplots for a matrix (athTp) with 6 variables (columns) but with many missing values:
ggplot(athTp)+geom_boxplot()
But maybe I am doing something wrong...
I also tried to make several boxplots and then arrange them in a grid, but the final plot was very small (at the desired dimensions), losing many of the details.
q1 <- ggplot(athTp,aes(x="V1", y=athTp[,1]))+ geom_boxplot()
..continue with other 5 columns
grid.arrange(q1,q2,q3,q4,q5,q6, ncol=6)
ggsave("plot.pdf",plot = qq, width = 8, height = 8, units = "cm")
Do you have any ideas?
Thanks in advance!
# ok so your data has 6 columns like this
set.seed(666)
dat <- data.frame(matrix(runif(60,1,20),ncol=6))
names(dat) <- letters[1:6]
head(dat)
# so let's get in long format like ggplot likes
library(reshape2)
longdat <- melt(dat)
head(longdat)
# and try your plot call again specifying that we want a box plot per column
# which is now indicated by the "variable" column
# [remember you should specify the x and y axes with `aes()`]
library(ggplot2)
ggplot(longdat, aes(x=variable, y=value)) + geom_boxplot(aes(colour = variable))
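Since the question is about missing values, here is a hedged sketch of how NAs can be handled in this workflow. The NA positions are made up for illustration; passing na.rm = TRUE to melt() drops the missing rows up front, before they reach geom_boxplot().

```r
library(reshape2)
library(ggplot2)

set.seed(666)
dat <- data.frame(matrix(runif(60, 1, 20), ncol = 6))
names(dat) <- letters[1:6]
# Inject some missing values (positions chosen arbitrarily for the example)
dat[c(2, 5), "a"] <- NA
dat[7, "d"] <- NA

# Stack all six columns and drop rows whose value is NA
longdat <- melt(dat, measure.vars = names(dat), na.rm = TRUE)

# One box per original column, all in a single panel
p <- ggplot(longdat, aes(x = variable, y = value)) +
  geom_boxplot(aes(colour = variable))
print(p)
```

Because all six boxes share one panel, the result can be saved at a small size with ggsave() without arranging six separate plots in a grid.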

R stacked area chart - ignore NA and retain full x-axis

I have a decadal time series from 1700 to 1900 (21 time slices), and for each decade I have 7 categories that each represent a quantity; see here
As you can see, only 5 of the decades actually have data.
I can plot a nice little stacked area chart in R, with the help of this very nice example, which retains only the 5 time slices that have data.
My problem is that I want an x-axis that retains all 21 time slices but still plots a stacked area chart using only the 5 time slices that have data. The idea is that the stacked areas are still plotted against the correct year but simply connect to the next point, 10 ticks down the x-axis, ignoring the missing data in between. I can achieve something like this in Excel, but I don't like it.
My reasoning is that I want to plot lines on top of the stacked area that are much more complete, for example from 1700 to 1850, or 1800 to 1900, for visual comparison purposes.
This post suggests how to connect dots in a line chart when you want to ignore NAs, but it doesn't work for me in this instance.
a <- 1700:1900
b <- a[seq(1, length(a), 10)]
df <- data.frame("Year"=b,replicate(7,sample(1:21)))
rows <- c(2:10,11:15,17,19,21)
df[rows,2:8] <- NA
df
thanks a lot
If you are happy to convert Year to a factor, you can use code along these lines:
# Transform the data to long
library(reshape2)
df <- melt(data = df, na.rm = FALSE, id.vars = "Year")
df$Year <- as.factor(df$Year)
# Chart
require(ggplot2)
ggplot(df, aes(Year, value)) +
geom_area(aes(colour = variable, fill= variable), position = 'stack')
It will generate the chart below:
I wasn't sure whether you wanted to map all of the x values. I assumed so, which is why I reshaped your data. It is probably wiser not to convert Year to a factor. The code below:
a <- 1700:1900
b <- a[seq(1, length(a), 10)]
df <- data.frame("Year"=b,replicate(7,sample(1:21)))
rows <- c(2:10,11:15,17,19,21)
df[rows,2:8] <- NA
# Transform the data to long
library(reshape2)
df <- melt(data = df, na.rm = FALSE, id.vars = "Year")
# Leave it as int.
# df$Year <- as.factor(df$Year)
# Chart
require(ggplot2)
ggplot(df, aes(Year, value)) +
geom_area(aes(colour = variable, fill= variable), position = 'stack')
would generate a much more meaningful chart:
If you do decide to use years as factors, you could group them and use a single category for a run of missing years, so that the x-axis is more readable. To a great extent, it is a matter of presentation.
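One possible way to get what the question describes (areas drawn only from the decades that have data, on an x-axis that still covers 1700 to 1900) is to drop the NA rows before plotting and set the axis limits explicitly. This is a sketch under that reading of the question; the seed and the break spacing are arbitrary choices.

```r
library(reshape2)
library(ggplot2)

set.seed(42)
df <- data.frame(Year = seq(1700, 1900, by = 10), replicate(7, sample(1:21)))
rows <- c(2:10, 11:15, 17, 19, 21)
df[rows, 2:8] <- NA

# na.rm = TRUE drops the empty decades, so the areas connect across the gaps
df_long <- melt(df, id.vars = "Year", na.rm = TRUE)

p <- ggplot(df_long, aes(Year, value)) +
  geom_area(aes(fill = variable), position = "stack") +
  # Keep the full 1700-1900 range on the axis regardless of where the data sit
  scale_x_continuous(limits = c(1700, 1900), breaks = seq(1700, 1900, by = 50))
print(p)
```

Because Year stays numeric, extra geom_line() layers covering other year ranges can be added on the same axis for comparison.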

Dynamically Set X limits on time plot

I am wondering how to dynamically set the x axis limits of a time series plot containing two time series with different dates. I have developed the following code to provide a reproducible example of my problem.
#Dummy Data
Data1 <- data.frame(Date = c("4/24/1995","6/23/1995","2/12/1996","4/14/1997","9/13/1998"), Area_2D = c(20,11,5,25,50))
Data2 <- data.frame(Date = c("6/23/1995","4/14/1996","11/3/1997","11/6/1997","4/15/1998"), Area_2D = c(13,15,18,25,19))
Data3 <- data.frame(Date = c("4/24/1995","6/23/1995","2/12/1996","4/14/1996","9/13/1998"), Area_2D = c(20,25,28,30,35))
Data4 <- data.frame(Date = c("6/23/1995","4/14/1996","11/3/1997","11/6/1997","4/15/1998"), Area_2D = c(13,15,18,25,19))
#Convert date column as date
Data1$Date <- as.Date(Data1$Date,"%m/%d/%Y")
Data2$Date <- as.Date(Data2$Date,"%m/%d/%Y")
Data3$Date <- as.Date(Data3$Date,"%m/%d/%Y")
Data4$Date <- as.Date(Data4$Date,"%m/%d/%Y")
#PLOT THE DATA
max_y1 <- max(Data1$Area_2D)
# Define colors to be used for the two series
plot_colors <- c("blue","red")
plot(Data1$Date,Data1$Area_2D, col=plot_colors[1],
ylim=c(0,max_y1), xlim=c(min_x1,max_x1),pch=16, xlab="Date",ylab="Area", type="o")
par(new=T)
plot(Data2$Date,Data2$Area_2D, col=plot_colors[2],
ylim=c(0,max_y1), xlim=c(min_x1,max_x1),pch=16, xlab="Date",ylab="Area", type="o")
The main problem I see with the code above is there are two different x axis on the plot, one for Data1 and another for Data2. I want to have a single x axis spanning the date range determined by the dates in Data1 and Data2.
My question is:
How do I dynamically create an x axis for both series? (i.e. select the minimum and maximum date from the data frames 'Data1' and 'Data2')
The solution is to combine the data into one data.frame, and base the x-axis on that. This approach works very well with the ggplot2 plotting package. First we merge the data and add an ID column, which specifies to which dataset it belongs. I use letters here:
Data1$ID = 'A'
Data2$ID = 'B'
merged_data = rbind(Data1, Data2)
And then create the plot using ggplot2, where the color denotes which dataset it belongs to (can easily be changed to different colors):
library(ggplot2)
ggplot(merged_data, aes(x = Date, y = Area_2D, color = ID)) +
geom_point() + geom_line()
Note that you get one uniform x-axis here. In this case this is fine, but if the timeseries do not overlap, this might be problematic. In that case we can use multiple sub-plots, known as facets in ggplot2:
ggplot(merged_data, aes(x = Date, y = Area_2D)) +
geom_point() + geom_line() + facet_wrap(~ ID, scales = 'free_x')
Now each facet has its own x-axis, i.e. one for each sub-dataset. Which approach is most appropriate depends on the specific situation.
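For completeness, the limits can also be computed dynamically in base graphics, closer to the code in the question. This is a minimal sketch rather than part of the answer above; x_range and y_max are names introduced here for illustration.

```r
Data1 <- data.frame(
  Date = as.Date(c("4/24/1995", "6/23/1995", "2/12/1996", "4/14/1997", "9/13/1998"), "%m/%d/%Y"),
  Area_2D = c(20, 11, 5, 25, 50)
)
Data2 <- data.frame(
  Date = as.Date(c("6/23/1995", "4/14/1996", "11/3/1997", "11/6/1997", "4/15/1998"), "%m/%d/%Y"),
  Area_2D = c(13, 15, 18, 25, 19)
)

# Shared limits computed from both series
x_range <- range(c(Data1$Date, Data2$Date))
y_max <- max(Data1$Area_2D, Data2$Area_2D)

plot(Data1$Date, Data1$Area_2D, type = "o", pch = 16, col = "blue",
     xlim = x_range, ylim = c(0, y_max), xlab = "Date", ylab = "Area")
# lines() draws onto the existing axes, so par(new = TRUE) is not needed
lines(Data2$Date, Data2$Area_2D, type = "o", pch = 16, col = "red")
```

Using lines() for the second series avoids overlaying a second set of axes, which is what produced the two different x axes in the original code.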

Plotting multiple columns with ggplot2 [duplicate]

I need to plot the following dataset in the same graph.
Bin1,Bin2,Bin3,Cat
4,3,5,S
6,4,5,M
3,5,4,M
1,4,5,M
,5, ,M
In each bin, first data point belongs to a different category than the rest. (So I added the Cat column)
I need to plot these as points (different colors for the different categories)
Following lines of code achieve what I need for a single bin
p <- ggplot(data,aes(Bin1,1))
p + geom_point(aes(color=Cat, size=Cat))
How do I do this for the entire dataset ?
Here is a related question:
What if I need to use a bunch of columns to color the points? For example, color the Bin1 points according to Cat1, and so on.
Bin1,Cat1,Bin2,Cat2
4,S,5,S
6,L,5,M
3,M,4,L
1,M,5,L
3,M
How do I do this??
library(reshape2)
library(ggplot2)
ggplot(melt(df, id.vars = "Cat"), aes(value, variable, colour = Cat)) +
geom_point(size = 4)
Just melt the data.frame and plot it.
library(reshape2)
dataM <- melt(data, id.vars = "Cat")
p <- ggplot(dataM, aes(value, variable, colour = Cat, size = Cat)) + geom_point()
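For the follow-up case where each bin has its own category column (Bin1/Cat1, Bin2/Cat2), a single melt() call is not enough because the values and categories have to stay paired. One possible sketch, using only base R to stack the pairs, is shown below; the names wide, long and bin are introduced here for illustration.

```r
library(ggplot2)

# Data from the question; the last row has no Bin2/Cat2 entry
wide <- data.frame(
  Bin1 = c(4, 6, 3, 1, 3),
  Cat1 = c("S", "L", "M", "M", "M"),
  Bin2 = c(5, 5, 4, 5, NA),
  Cat2 = c("S", "M", "L", "L", NA)
)

# Stack each (BinN, CatN) pair so every value keeps its own category
long <- rbind(
  data.frame(bin = "Bin1", value = wide$Bin1, Cat = wide$Cat1),
  data.frame(bin = "Bin2", value = wide$Bin2, Cat = wide$Cat2)
)
# Drop the incomplete pair
long <- long[!is.na(long$value), ]

p <- ggplot(long, aes(value, bin, colour = Cat)) +
  geom_point(size = 4)
print(p)
```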
