How to plot lines for count data in R?

I have a data frame like this:
frame <- data.frame("AGE" = seq(18,44,1),
"GROUP1"= c(83,101,159,185,212,276,330,293,330,356,370,325,264,274,214,229,227,154,132,121,83,69,57,32,16,17,8),
"GROUP2"= c(144,210,259,329,391,421,453,358,338,318,270,258,207,186,173,135,106,92,74,56,41,31,25,13,16,5,8))
I want to plot AGE on the x-axis and the values of GROUP1 and GROUP2 on the y-axis in the same plot with different colors, with the values joined by a smoothed line.
As a first step, I melted the data frame and plotted it:
library(reshape2)  # provides melt()
melt <- melt(frame, id.vars = "AGE")
melt <- melt[order(melt$AGE),]
plot(melt$AGE, melt$value)

Here is an alternative solution using the dplyr, tidyr and ggplot2 packages.
library(dplyr)
library(tidyr)
library(ggplot2)
newframe <- frame %>% gather("variable","value",-AGE)
ggplot(newframe, aes(x=AGE, y=value, color=variable)) +
geom_point() +
geom_smooth()
You could use geom_line() to join the points with straight lines, but geom_smooth() fits the request for a smoothed line better. geom_area() gives you a shaded area under the lines, but the color aesthetic has to be changed to fill.
ggplot(newframe, aes(x=AGE, y=value, fill=variable)) + geom_area()
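One thing to keep in mind with the area plot: geom_area() stacks the groups by default, so the upper edge shows the sum of GROUP1 and GROUP2 rather than each group's own counts. The sketch below is one possible variation, not part of the original answer. It assumes that overlapping, semi-transparent areas are wanted, and it uses pivot_longer(), which tidyr now recommends in place of gather(). The name p_area is only illustrative.

```r
library(tidyr)
library(ggplot2)

frame <- data.frame(
  AGE = 18:44,
  GROUP1 = c(83, 101, 159, 185, 212, 276, 330, 293, 330, 356, 370, 325, 264, 274,
             214, 229, 227, 154, 132, 121, 83, 69, 57, 32, 16, 17, 8),
  GROUP2 = c(144, 210, 259, 329, 391, 421, 453, 358, 338, 318, 270, 258, 207, 186,
             173, 135, 106, 92, 74, 56, 41, 31, 25, 13, 16, 5, 8)
)

# Long format: one row per AGE/group combination
newframe <- pivot_longer(frame, cols = -AGE,
                         names_to = "variable", values_to = "value")

# position = "identity" draws each group from zero instead of stacking them;
# alpha keeps the group in the background visible
p_area <- ggplot(newframe, aes(x = AGE, y = value, fill = variable)) +
  geom_area(position = "identity", alpha = 0.4)
print(p_area)
```

With the default stacking, the same call without position = "identity" would peak near the combined count of both groups.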

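We can use matplot:
# pass AGE as x explicitly; given only a matrix, matplot() plots against the row index
matplot(frame$AGE, as.matrix(frame[-1]),
ylab = "value", type = "l", xlab = "AGE", col = c("red", "blue"), lty = 1)
# the plot draws lines, so the legend uses lty rather than pch
legend("topright", inset = .05, legend = c("GROUP1", "GROUP2"),
lty = 1, col = c("red", "blue"), horiz = TRUE)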
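The matplot approach joins the raw counts with straight lines. If a smoothed line is wanted in base R as well, one option is to fit a loess curve per group and overlay it with matlines(). This is only a sketch: the span of 0.5 and the helper name smooth_one are assumptions chosen for illustration, not something from the original answer.

```r
frame <- data.frame(
  AGE = 18:44,
  GROUP1 = c(83, 101, 159, 185, 212, 276, 330, 293, 330, 356, 370, 325, 264, 274,
             214, 229, 227, 154, 132, 121, 83, 69, 57, 32, 16, 17, 8),
  GROUP2 = c(144, 210, 259, 329, 391, 421, 453, 358, 338, 318, 270, 258, 207, 186,
             173, 135, 106, 92, 74, 56, 41, 31, 25, 13, 16, 5, 8)
)

# Hypothetical helper: fit a loess curve to one group and return the
# fitted values at the observed ages (span = 0.5 is an arbitrary choice)
smooth_one <- function(y) {
  fit <- loess(y ~ age, data = data.frame(age = frame$AGE, y = y), span = 0.5)
  predict(fit)
}
smoothed <- sapply(frame[c("GROUP1", "GROUP2")], smooth_one)

# Raw counts as points, smoothed curves as lines
matplot(frame$AGE, as.matrix(frame[-1]), type = "p", pch = 1,
        col = c("red", "blue"), xlab = "AGE", ylab = "value")
matlines(frame$AGE, smoothed, lty = 1, col = c("red", "blue"))
legend("topright", inset = .05, legend = c("GROUP1", "GROUP2"),
       lty = 1, pch = 1, col = c("red", "blue"))
```

A smaller span follows the points more closely; a larger one gives a flatter curve.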

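Try:
library(ggplot2)
library(reshape2)
meltdf <- melt(frame, id.vars = "AGE")  # long format: AGE, variable, value
ggplot(meltdf, aes(x = AGE, y = value, colour = variable, group = variable)) + geom_line()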

Related

Can you translate this into ggplot?

So basically, I would like to use ggplot function geom_line + geom_point to create the same plots but with fancier graphics.
> a
V1 V2 V3
1 0.8224887 0.7882316 0.7596440
2 0.7892779 0.7604186 0.7409430
3 0.8254516 0.8257800 0.8014778
4 0.8268519 0.7887464 0.7887322
5 0.8226651 0.7981079 0.7934783
plot(6:10, a$V1, type="l", xlab="Folds", ylab="Accuracy", col="Blue",ylim=c(0.7,0.9))
par(new=TRUE)
plot(6:10, a$V2, type="l", xlab="Folds", ylab="Accuracy", col="Orange",ylim=c(0.7,0.9))
par(new=TRUE)
plot(6:10, a$V3, type="l", xlab="Folds", ylab="Accuracy", col="Green",ylim=c(0.7,0.9))
My main goal is to get a legend that helps to distinguish each variable.
I tried to plot just the first line:
ggplot(data = a)+
theme_classic()+
geom_line(aes(x=6:10, y = a$V1, color = "blue"))
The problem is that I don't even get the color I want.
Thanks for reading and helping!
library(dplyr)
library(ggplot2)
a <- data.frame(
V1=rnorm(5),
V2=rnorm(5),
V3=rnorm(5),
Folds = 6:10) # make some example data
a %>%
tidyr::gather(key, value, -Folds) %>% # get data in long format for ggplot
ggplot(., aes(x = Folds, y = value, col = key)) +
geom_line() + # add line
geom_point() + # add points
scale_color_manual("My Variables",values = c("blue","orange","green")) + #change colours
theme_classic()
library(tidyverse)
originalData <- tibble(
V1=c(0.8224887, 0.7892779, 0.8254516, 0.8268519, 0.8226651),
V2=c(0.7882316, 0.7604186, 0.8257800, 0.7887464, 0.7981079),
V3=c(0.7596440, 0.7409430, 0.8014778, 0.7887322, 0.7934783)
)
# ggplot works best if your data is 'tidy'
tidyData <- originalData %>%
pivot_longer(cols=c(V1, V2, V3), names_to="Variable") %>%
add_column(X=rep(6:10, each=3))
tidyData
tidyData %>%
ggplot(aes(x=X, y=value, colour=Variable)) +
geom_line() +
theme_classic()
This gives a line plot with one coloured line per variable and a legend.
You can customise your plot from here as you like.
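On the colour problem in the question: writing color = "blue" inside aes() tells ggplot2 to treat "blue" as a data value, so the line gets the default palette colour and a legend entry labelled "blue". A fixed colour has to be set outside aes(). The sketch below contrasts the two; the object names p_mapped and p_fixed are illustrative only.

```r
library(ggplot2)

a <- data.frame(
  Folds = 6:10,
  V1 = c(0.8224887, 0.7892779, 0.8254516, 0.8268519, 0.8226651)
)

# Inside aes(): "blue" is mapped through the colour scale like any category
p_mapped <- ggplot(a, aes(x = Folds, y = V1)) +
  geom_line(aes(colour = "blue")) +
  theme_classic()

# Outside aes(): the line is drawn in blue, and no legend is created
p_fixed <- ggplot(a, aes(x = Folds, y = V1)) +
  geom_line(colour = "blue") +
  theme_classic()

print(p_fixed)
```

For several lines with a legend, the long-format approach in the answers above together with scale_color_manual() is the usual way to control which colour each variable gets.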

Histogram with different colours using the abline function in R

I would like to plot a histogram with different colours and legend.
Assuming the following data:
df1<- rnorm(300,60,5)
I have used the following codes to get the histogram plot and the lines using the abline function:
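df1 <- data.frame(M = df1)  # name the column M so that hist(M) finds it after attach()
attach(df1)
hist(M, breaks = seq(0, 100, 2))  # hist() takes breaks, not at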
abline(v=80, col="blue")
abline(v=77, col="red")
abline(v=71, col="red")
abline(v=68, col="blue")
abline(v=63, col="blue")
abline(v=58, col="blue")
abline(v=54, col="blue")
abline(v=51, col="blue")
abline(v=457, col="blue")
Now I want to get the following plot. I do not need the vertical lines, but I was unable to remove them.
Here's one way of doing that with ggplot2, dplyr and tidyr.
First you need to set the colors. I do that with mutate and case_when. For the plot itself, it's important to remember that if histogram bins are not aligned, you can get different colors on the same bar. To avoid this, you can use binwidth=1.
library(ggplot2)
library(dplyr)
library(tidyr)
df1 <- data.frame(data1=rnorm(300,60,5))
df1 <- df1 %>%
mutate(color_name=case_when(data1<60 ~ "red",
data1>=60 & data1 <63 ~ "blue",
TRUE ~ "cyan"))
ggplot(df1,aes(x=data1, fill=color_name)) +
geom_histogram(binwidth = 1, boundary = 0, position="dodge") +
scale_fill_identity(guide = "legend")
Additional request in comment
Using case_when with four colors:
df1 <- data.frame(data1=rnorm(300,60,5))
df1 <- df1 %>%
mutate(color_name=case_when(data1<60 ~ "red",
data1>=60 & data1 <63 ~ "blue",
data1>=63 & data1 <65 ~ "orange",
TRUE ~ "cyan"))
ggplot(df1,aes(x=data1, fill=color_name)) +
geom_histogram(binwidth = 1, boundary = 0, position="dodge") +
scale_fill_identity(guide = "legend")
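For comparison, the same idea can be sketched in base R without abline(): compute the histogram first, then assign one colour per bar from the bin midpoints. This is an assumption-laden illustration rather than part of the answer above. The cut points 60 and 63 are taken from the three-colour example, and the names h and bar_cols are made up here.

```r
set.seed(1)
x <- rnorm(300, 60, 5)

# Integer-aligned bins of width 1, so no bar straddles a colour boundary
breaks <- seq(floor(min(x)), ceiling(max(x)), by = 1)
h <- hist(x, breaks = breaks, plot = FALSE)

# One colour per bar, decided by the bin midpoint
bar_cols <- as.character(cut(h$mids,
                             breaks = c(-Inf, 60, 63, Inf),
                             labels = c("red", "blue", "cyan"),
                             right = FALSE))

plot(h, col = bar_cols, main = "", xlab = "x")
legend("topright", legend = c("< 60", "60 to 63", ">= 63"),
       fill = c("red", "blue", "cyan"))
```

Because the bins start on whole numbers, each bar lies entirely inside one colour range, which is the same alignment concern the answer addresses with binwidth and boundary.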

Plot different parts of a vector with different colors on the same graph

As the title says, suppose we have this vector and plot:
plot(rnorm(200,5,2),type="l")
This returns a line plot in a single color.
What I would like to know is whether there is a way to draw the first half of it in blue (col="blue") and the rest of it in red (col="red").
A similar question exists for Matlab, but not for R.
You could simply use lines for the second half:
dat <- rnorm(200, 5, 2)
plot(1:100, dat[1:100], col = "blue", type = "l", xlim = c(0, 200), ylim = c(min(dat), max(dat)))
lines(101:200, dat[101:200], col = "red")
Not a base R solution, but I think this is how to plot it using ggplot2. It is necessary to prepare a data frame to plot the data.
set.seed(1234)
vec <- rnorm(200,5,2)
dat <- data.frame(Value = vec)
dat$Group <- as.character(rep(c(1, 2), each = 100))
dat$Index <- 1:200
library(ggplot2)
ggplot(dat, aes(x = Index, y = Value)) +
geom_line(aes(color = Group)) +
scale_color_manual(values = c("blue", "red")) +
theme_classic()
We can also use the lattice package with the same data frame.
library(lattice)
xyplot(Value ~ Index, data = dat, type = 'l', groups = Group, col = c("blue", "red"))
Notice that the blue line and red line are disconnected. Not sure if this is important, but if you want to plot a continuous line, here is a workaround in ggplot2. The idea is to subset the data frame for the second half, plot the entire data frame with color as blue, and then plot the second data frame with color as red.
dat2 <- dat[dat$Index %in% 101:200, ]
ggplot(dat, aes(x = Index, y = Value)) +
geom_line(color = "blue") +
geom_line(data = dat2, aes(x = Index, y = Value), color = "red") +
theme_classic()
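A continuous two-colour line can also be drawn in base R. One way, sketched here under the assumption that the colour should change at index 100, is to draw each piece between neighbouring points with segments() and give every piece its own colour. The name seg_col is illustrative.

```r
set.seed(1234)
dat <- rnorm(200, 5, 2)
n <- length(dat)

# Segment i joins point i to point i + 1; the first 99 segments end at
# or before index 100 and are blue, the remaining 100 are red
seg_col <- ifelse(seq_len(n - 1) < 100, "blue", "red")

# Empty plot to set up the axes, then draw the coloured segments
plot(seq_len(n), dat, type = "n", xlab = "Index", ylab = "Value")
segments(x0 = seq_len(n - 1), y0 = dat[-n],
         x1 = 2:n, y1 = dat[-1],
         col = seg_col)
```

Since the red part starts at point 100, where the blue part ends, there is no gap between the two halves.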

How to plot a filled.contour plot using ggplot2?

I have some data and have tried a filled.contour plot, which looks nice. However, the legend is hard to control, so I am thinking of using ggplot2. But I have no clue how to draw a filled contour plot with ggplot2.
The data contains 840 rows (which stand for the dates), and 12 columns (which stand for 12 time scales). Here is an example
set.seed(66)
Mydata <- sample(x=(-3:3),size = 840*12,replace = T)
Mydata <- matrix(data=Mydata,nrow=840,ncol=12)
Dates <- seq(from=1948+1/24, to= 2018,by=1/12)
data.breaks <- c(-3.5,-2.5,-1.5,0,1.5,2.5,3.5)
filled.contour(Dates,seq(1:12),Mydata,col=cols(11),xlab="",ylab="time-scale",levels=data.breaks)
As we can see, the legend intervals are not what I want. I want to show -3.5, -2.5, -1.5, 0, 1.5, 2.5, 3.5 on the legend, and I believe it is much easier to do this with ggplot2. Thanks for any help.
A ggplot2 alternative to filled.contour is stat_contour.
library(ggplot2)
library(reshape2)
set.seed(66)
Mydata <- sample(x=(-3:3),size = 840*12,replace = T)
Mydata <- matrix(data=Mydata,nrow=840,ncol=12)
Dates <- seq(from=1948+1/24, to= 2018,by=1/12)
data.breaks <- c(-3.5,-2.5,-1.5,0,1.5,2.5,3.5)
rownames(Mydata) <- Dates
d <- melt(Mydata)
colfunc = colorRampPalette(c("brown", "red", "yellow", "white"))
ggplot(d, aes(Var1, Var2, z=value, fill = value)) +
stat_contour(geom="polygon", aes(fill=after_stat(level))) + # after_stat() replaces the deprecated ..level.. notation (ggplot2 >= 3.3.0)
scale_fill_gradientn(colours = colfunc(7), breaks=data.breaks, limits=c(-4,4),
values=scales::rescale(data.breaks))+
theme_bw() +
scale_x_continuous(name="", breaks=seq(1950,2010,20), expand=c(0,0)) +
scale_y_continuous(name="time-scale", expand=c(0,0))+
guides(fill = guide_colorbar(barwidth = 2, barheight = 15))
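Newer ggplot2 versions (3.3.0 and later) also provide geom_contour_filled(), which accepts the break points directly and labels the legend with one key per interval. The following is a hedged sketch of that route, not the method used in the answer above. It builds the long-format data with expand.grid() instead of reshape2, and the column names Date and Scale are chosen here for illustration.

```r
library(ggplot2)

set.seed(66)
Mydata <- matrix(sample(-3:3, size = 840 * 12, replace = TRUE),
                 nrow = 840, ncol = 12)
Dates <- seq(from = 1948 + 1/24, to = 2018, by = 1/12)
data.breaks <- c(-3.5, -2.5, -1.5, 0, 1.5, 2.5, 3.5)

# Long format: Date varies fastest, matching the column-major order
# in which as.vector() unrolls the matrix
d <- expand.grid(Date = Dates, Scale = 1:12)
d$value <- as.vector(Mydata)

# Filled contour bands between consecutive entries of data.breaks
p <- ggplot(d, aes(x = Date, y = Scale, z = value)) +
  geom_contour_filled(breaks = data.breaks) +
  scale_x_continuous(name = "", expand = c(0, 0)) +
  scale_y_continuous(name = "time-scale", breaks = 1:12, expand = c(0, 0)) +
  labs(fill = "value") +
  theme_bw()
print(p)
```

The fill scale here is discrete, so the colours would be changed with a discrete scale such as scale_fill_manual() rather than scale_fill_gradientn().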

Add arbitrary series with legend in ggplot2?

I have a bunch of data - three timeseries (model group means), coloured by group, with standard deviation represented by geom_ribbon. By default they have a nice legend on the side. I also have a single timeseries of observations, that I want to overlay over the plot (without the geom_ribbon), like this:
df <- data.frame(year=1991:2010, group=c(rep('group1',20), rep('group2',20), rep('group3',20)), mean=c(cumsum(abs(rnorm(20))),cumsum(abs(rnorm(20))),cumsum(abs(rnorm(20)))),sd=3+rnorm(60))
obs_df <- data.frame(year=1991:2010, value=cumsum(abs(rnorm(20))))
ggplot(df, aes(x=year, y=mean)) + geom_line(aes(colour=group)) + geom_ribbon(aes(ymax=mean+sd, ymin=mean-sd, fill=group), alpha = 0.2) +geom_line(data=obs_df, aes(x=year, y=value))
But the observations do not appear in the legend, because the line is not mapped to a colour (I want it black). How can I add the observations to the legend?
First, create a combined data frame of df and obs_df:
dat <- rbind(df, data.frame(year = obs_df$year,
group = "obs", mean = obs_df$value, sd = 0))
Plot:
ggplot(dat, aes(x=year, y=mean)) +
geom_line(aes(colour=group)) +
geom_ribbon(aes(ymax=mean+sd, ymin=mean-sd, fill=group), alpha = 0.2) +
scale_colour_manual(values = c("red", "green", "blue", "black")) +
scale_fill_manual(values = c("red", "green", "blue", NA))
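An alternative that leaves the two data frames separate is to map a constant label inside aes() for the observation layer and give it a colour in the manual scale. This is a sketch of that variation, not the approach from the answer above. The label "obs" and the explicit naming of the colour vector are assumptions made for the example.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  year = rep(1991:2010, 3),
  group = rep(c("group1", "group2", "group3"), each = 20),
  mean = c(cumsum(abs(rnorm(20))), cumsum(abs(rnorm(20))), cumsum(abs(rnorm(20)))),
  sd = 3 + rnorm(60)
)
obs_df <- data.frame(year = 1991:2010, value = cumsum(abs(rnorm(20))))

p <- ggplot(df, aes(x = year, y = mean)) +
  geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd, fill = group), alpha = 0.2) +
  geom_line(aes(colour = group)) +
  # The constant "obs" becomes an extra level of the colour scale,
  # so the observation line gets its own legend entry
  geom_line(data = obs_df, aes(y = value, colour = "obs")) +
  scale_colour_manual(values = c(group1 = "red", group2 = "green",
                                 group3 = "blue", obs = "black"))
print(p)
```

Because obs_df has no fill mapping, the observation line is drawn without a ribbon, and the fill legend lists only the three model groups.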
I'm guessing you made an error with your construction of 'obs_df'. If you create it with year = 1991:2010 it makes more sense in the context of the rest of the data and it gives you the plot you are hoping for with the ggplot call unchanged.
