Graphs for Comparison in R

I have a dataset like the one below and would like to create a graph like the one shown. I have tried several ways to present the data clearly. Is there a way to plot this dataset as in the graph?
d <- read.table(text="
MONTH ACTUAL PREDICTED
1 January -18.0521472 -0.05621367
2 February 5.7084035 2.06652079
3 March 1.5226629 -2.13900349
4 April -6.2783397 -1.4275986
", stringsAsFactors=FALSE)
d$MONTH <- factor(d$MONTH, levels=unique(d$MONTH))
Graph:
[Figure: X axis is the Month and Y axis is Values of Actual and Predicted]
Here the x-axis is the month and the y-axis shows the Actual and Predicted values. I would also like to show the value labels. Thanks in advance.

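library(tidyverse)
# Reshape to long format (one row per month and series),
# then draw one dashed line with points per series
d %>%
  gather(var, value, -MONTH) %>%
  ggplot(aes(MONTH, value, col = var, group = var)) +
  geom_line(linetype = "dashed") +
  geom_point()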
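The question also asks for value labels, which the plot above does not draw. A minimal sketch of one way to add them, assuming geom_text() with rounded values is acceptable; the rounding to two decimals and the vjust offset are illustrative choices, not part of the original answer. It uses pivot_longer(), the current tidyr replacement for gather().

```r
library(tidyr)
library(ggplot2)

# Same data as in the question
d <- data.frame(
  MONTH     = c("January", "February", "March", "April"),
  ACTUAL    = c(-18.0521472, 5.7084035, 1.5226629, -6.2783397),
  PREDICTED = c(-0.05621367, 2.06652079, -2.13900349, -1.4275986)
)
d$MONTH <- factor(d$MONTH, levels = unique(d$MONTH))

# One row per month/series pair
d_long <- pivot_longer(d, cols = c(ACTUAL, PREDICTED),
                       names_to = "var", values_to = "value")

p <- ggplot(d_long, aes(MONTH, value, colour = var, group = var)) +
  geom_line(linetype = "dashed") +
  geom_point() +
  # Print each value, rounded to two decimals, just above its point
  geom_text(aes(label = round(value, 2)), vjust = -0.8, show.legend = FALSE)
p
```

If labels overlap where the two series are close, adjusting vjust per series is one option.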

Related

geom_bar: compare years per month

I have two datasets, one for 2020 and one for 2019. Each is divided into 5 groups, and each month has its own value.
I want to create a graph that compares, for each group and each month, the 2020 figure with the 2019 figure.
Both years have the same layout, and I combined them into a single table.
The problem is that all the examples I found online have either a single column of values or no division into months.
How can I create one graph that compares each month between 2019 and 2020?

library(tidyverse)
library(ggplot2)
# bring table in long format
longerTable <- tibble(month = 1:12, value_2020 = rnorm(12), value_2019 = rnorm(12)) %>%
  pivot_longer(cols = starts_with("value"), names_to = "year", values_to = "value")
# plot with ggplot.
ggplot(longerTable, aes(x = month, y = value, fill = year)) +
  # stat = identity -> plot numbers as they are
  # position = dodge -> show bars next to each other
  geom_bar(stat = "identity", position = "dodge")
Created on 2020-10-01 by the reprex package (v0.3.0)
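The question mentions two separate yearly tables and 5 groups, which the example above does not cover. A hedged sketch of one way to handle both, assuming each table has columns named group, month and value (these names, the make_year() helper, and the random values are hypothetical stand-ins, since the original tables were only shown as images):

```r
library(dplyr)
library(ggplot2)

set.seed(1)
# Hypothetical stand-ins for the two yearly tables: one row per group and month
make_year <- function() {
  expand.grid(group = paste0("group", 1:5), month = 1:12) %>%
    mutate(value = runif(n(), 10, 100))
}
data_2019 <- make_year()
data_2020 <- make_year()

# Stack both years and record which table each row came from
both <- bind_rows(`2019` = data_2019, `2020` = data_2020, .id = "year")

p <- ggplot(both, aes(x = factor(month), y = value, fill = year)) +
  geom_col(position = "dodge") +   # geom_col() is geom_bar(stat = "identity")
  facet_wrap(~group, ncol = 1) +   # one panel per group
  labs(x = "month")
p
```

Faceting by group keeps each panel to 12 pairs of bars; mapping group to another aesthetic instead would put 120 bars in one panel.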

ggplot2 - How to plot length of time using geom_bar?

I am trying to show different growing season lengths by displaying crop planting and harvest dates at multiple regions.
My final goal is a graph like one taken from an answer to another question. Note that the dates are in Julian days (day of year).
My first attempt to reproduce a similar plot is:
library(data.table)
library(ggplot2)
mydat <- "Region\tCrop\tPlanting.Begin\tPlanting.End\tHarvest.Begin\tHarvest.End\nCenter-West\tSoybean\t245\t275\t1\t92\nCenter-West\tCorn\t245\t336\t32\t153\nSouth\tSoybean\t245\t1\t1\t122\nSouth\tCorn\t183\t336\t1\t153\nSoutheast\tSoybean\t275\t336\t1\t122\nSoutheast\tCorn\t214\t336\t32\t122"
# read data as data table
mydat <- setDT(read.table(textConnection(mydat), sep = "\t", header = TRUE))
# melt data table
m <- melt(mydat, id.vars = c("Region", "Crop"), variable.name = "Period", value.name = "value")
# plot stacked bars
ggplot(m, aes(x = Crop, y = value, fill = Period, colour = Period)) +
  geom_bar(stat = "identity") +
  facet_wrap(~Region, nrow = 3) +
  coord_flip() +
  theme_bw(base_size = 18) +
  scale_colour_manual(values = c("Planting.Begin" = "black", "Planting.End" = "black",
                                 "Harvest.Begin" = "black", "Harvest.End" = "black"),
                      guide = "none")
However, there are a few issues with this plot:
Because the bars are stacked, the values on the x-axis are summed and end up too high, outside the 1-365 range that represents day of year.
I need to combine Planting.Begin and Planting.End in the same color, and do the same for Harvest.Begin and Harvest.End.
Also, a "void" (or a completely uncolored bar) needs to be created between Planting.Begin and Harvest.End.
Perhaps the graph could be achieved with geom_rect or geom_segment, but I really want to stick to geom_bar since it's more customizable (for example, it accepts scale_colour_manual in order to add black borders to the bars).
Any hints on how to create such a graph?

I don't think this is something you can do with geom_bar or geom_col. A more general approach is to use geom_rect to draw rectangles. To do this, we need to reshape the data a bit:
library(dplyr)  # provides the %>% pipe
plotdata <- mydat %>%
  dplyr::mutate(Crop = factor(Crop)) %>%
  tidyr::pivot_longer(Planting.Begin:Harvest.End, names_to = "period") %>%
  tidyr::separate(period, c("Type", "Event")) %>%
  tidyr::pivot_wider(names_from = Event, values_from = value)
# Region Crop Type Begin End
# <chr> <fct> <chr> <int> <int>
# 1 Center-West Soybean Planting 245 275
# 2 Center-West Soybean Harvest 1 92
# 3 Center-West Corn Planting 245 336
# 4 Center-West Corn Harvest 32 153
# 5 South Soybean Planting 245 1
# ...
We've used tidyr to reshape the data so that we have one row per rectangle that we want to draw, and we've also made Crop a factor. We can then plot it like this:
ggplot(plotdata) +
  aes(ymin = as.numeric(Crop) - .45, ymax = as.numeric(Crop) + .45,
      xmin = Begin, xmax = End, fill = Type) +
  geom_rect(color = "black") +
  facet_wrap(~Region, nrow = 3) +
  theme_bw(base_size = 18) +
  scale_y_continuous(breaks = seq_along(levels(plotdata$Crop)), labels = levels(plotdata$Crop))
The part that's a bit messy here is that we want a discrete scale for y, but geom_rect prefers numeric values. Since Crop is now a factor, we use its numeric level codes to create the ymin and ymax positions, and then relabel the y axis with the names of the factor levels.
If you also wanted to get the month names on the x axis, you could do something like:
dateticks <- seq.Date(as.Date("2020-01-01"), as.Date("2020-12-01"), by = "month")
# then add this to your plot
... +
  scale_x_continuous(breaks = lubridate::yday(dateticks),
                     labels = lubridate::month(dateticks, label = TRUE, abbr = TRUE))
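One case the reshaped table shows but the rectangles do not address is a period whose End is smaller than its Begin, such as South / Soybean / Planting (245 to 1). A rough sketch of one possible treatment, assuming such rows are meant to wrap past the end of the year and that a 365-day year is acceptable; split_wrapped() is a hypothetical helper, not part of the original answer:

```r
library(dplyr)

# Hypothetical helper: split intervals that wrap past the end of the year
# into two pieces, so that every rectangle has Begin <= End
split_wrapped <- function(df, year_end = 365) {
  wrapped <- df$End < df$Begin
  bind_rows(
    df[!wrapped, ],
    mutate(df[wrapped, ], End = year_end),  # Begin .. end of year
    mutate(df[wrapped, ], Begin = 1)        # start of year .. End
  )
}

# Two rows in the same shape as plotdata
periods <- data.frame(
  Region = c("South", "Center-West"),
  Crop   = c("Soybean", "Soybean"),
  Type   = c("Planting", "Planting"),
  Begin  = c(245, 245),
  End    = c(1, 275)
)
fixed <- split_wrapped(periods)
fixed
```

The result could be passed to the geom_rect call in place of plotdata. Whether wrapping is the right reading of those rows depends on how the source data was recorded.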

plotting two categorical vectors in ggridges

I have a dataset with a few organisms, which I would like to plot on the y-axis, against date on the x-axis. However, I want the height of each curve to represent the abundance of the organism. That is, I would like to plot a time series of relative abundance, separated by organism, to show similar patterns over time.
Of course, plotting just date against organism does not yield any information on abundance. So my question is: is there a way to make the curve represent abundance using ggridges?
Here is my code for an example dataset:
library(ggplot2)
library(ggridges)
set.seed(1)
Data <- data.frame(
  Abundance = sample(1:100),
  Organism = sample(c("organism1", "organism2"), 100, replace = TRUE)
)
Date <- rep(seq(from = as.Date("2016-01-01"), to = as.Date("2016-10-01"), by = "month"),
            times = 10)
Data <- cbind(Date, Data)
ggplot(Data, aes(x = Abundance, y = Organism)) +
  geom_density_ridges(scale = 1.15, alpha = 0.6, color = "grey90")
This produces a plot with the two organisms, but I want date on the x-axis, not abundance, and simply swapping the variable doesn't work. I have read that you need to specify group=Date or convert the date to Julian day, but that doesn't change the fact that I cannot incorporate abundance into the plot.
Does anyone have an example of a plot with date vs. a categorical variable (i.e. organism) plotted against a continuous variable in ggridges?
I really like the output from ggridges and would like to be able to use it for these visualizations. Thank you in advance for your help!
Cheers,
Anni

To use geom_density_ridges, it helps to reshape the data so that each observation is its own row, rather than being summarized as a count in Abundance.
library(ggplot2); library(ggridges); library(dplyr)
# uncount() repeats each row "Abundance" number of times
Data_sum <- Data %>%
  tidyr::uncount(Abundance)
ggplot(Data_sum, aes(x = Date, y = Organism)) +
  ggridges::geom_density_ridges(scale = 1, alpha = 0.6, color = "grey90")
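To make the reshaping step concrete, here is a small illustration of what tidyr::uncount() does, using a made-up two-row table (the values are invented for the example):

```r
library(tidyr)

counts <- data.frame(
  Date      = as.Date(c("2016-01-01", "2016-02-01")),
  Organism  = c("organism1", "organism1"),
  Abundance = c(3, 1)
)

# Each row is repeated Abundance times; the Abundance column itself is dropped
expanded <- uncount(counts, Abundance)
expanded
```

After this step the density of rows along Date reflects abundance, which is what geom_density_ridges estimates.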

ggplot: Plotting timeseries data with missing values

I have been trying to plot one column of a data frame against another. The first column, named "Time", holds daily dates (format YYYY-MM-DD), and the second column, named "data1", holds the precipitation magnitude as a numeric value.
The data comes from an Excel file, "St Lucia3", which has a total of 11598 data points and stores daily precipitation data from 1981 to 2018 in two columns:
YearMonthDay (format- "YYYYMMDD", example "19810501")
Rainfall (mm)
The code for importing data into R:
library(readxl)
StLucia <- read_excel("C:/Users/hp/Desktop/St Lucia3.xlsx")
The code for time data "Time" :
Time <- as.Date(as.character(StLucia$YearMonthDay), format= "%Y%m%d")
The code for precipitation data "data1" :
library("imputeTS")
data1 <- na_ma(StLucia$`Rainfall (mm)`, k = 4, weighting = "exponential")
The code for data frame "Precip1" :
Precip1 <- data.frame(Time, data1, check.rows=TRUE)
The code for ggplot is:
ggplot(data = Precip1, mapping= aes(x= Time, y= data1)) + geom_line()
Plotting "data1" against "Time" with ggplot produces a graph with an unexpected feature.
Can someone please explain why there is an "unusual kink" at the right end of the graph, even though there are no such values in the column "data1"?
For comparison, I also plotted "data1" against its index.
The code for this plot is:
plot(data1, type = "l")
Any help would be highly appreciated. Thanks!

By using pad() from the padr package we can insert the missing dates and assign them NA, so that nothing is plotted in the region of missing data.
library(padr)
library(zoo)
YearMonthDay <- c(19810501, 19810502, 19810504, 19810505)
Data <- c(1, 2, 3, 4)
StLucia <- data.frame(YearMonthDay, Data)
StLucia$YearMonthDay <- as.Date(as.character(StLucia$YearMonthDay), format = "%Y%m%d")
> StLucia
YearMonthDay Data
1 1981-05-01 1
2 1981-05-02 2
3 1981-05-04 3
4 1981-05-05 4
Note: one date is missing, but there is no gap between positions 2 and 3, so when plotting against the index you would not see a gap.
So let's add the missing date:
StLucia <- pad(StLucia, interval = "day")
> StLucia
YearMonthDay Data
1 1981-05-01 1
2 1981-05-02 2
3 1981-05-03 NA
4 1981-05-04 3
5 1981-05-05 4
plot(StLucia, type = "l")
If you want to fill in those NA values, use na.locf() from the zoo package.
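A short sketch of that fill step on the padded example, rebuilt here by hand so the snippet runs on its own. The Data_filled column name is an illustrative choice, and na.rm = FALSE is passed so the result keeps the same length as the input:

```r
library(zoo)

# The padded data from above, recreated directly
rain <- data.frame(
  YearMonthDay = as.Date("1981-05-01") + 0:4,
  Data         = c(1, 2, NA, 3, 4)
)

# Last observation carried forward: the NA on 1981-05-03
# takes the value recorded on 1981-05-02
rain$Data_filled <- na.locf(rain$Data, na.rm = FALSE)
rain
```

Carrying a rainfall value forward is itself an assumption about the data; leaving the NA in place keeps the gap visible in the plot.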

Here is a reproducible example - change the names to match your data.
library(ggplot2)
# create sample data
set.seed(47)
dd = data.frame(t = Sys.Date() + c(0:5, 30:32), y = runif(9))
# demonstrate problem
ggplot(dd, aes(t, y)) +
geom_point() +
geom_line()
The easiest solution, as Tung points out, is to use a more appropriate geom, like geom_col:
ggplot(dd, aes(t, y)) +
geom_col()
If you really want to use lines, you should fill in the missing dates with NA for rainfall:
# calculate all days
all_days = data.frame(t = seq.Date(from = min(dd$t), to = max(dd$t), by = "day"))
# join to original data
library(dplyr)
dd_complete = left_join(all_days, dd, by = "t")
# ggplot won't connect lines across missing values
ggplot(dd_complete, aes(t, y)) +
geom_point() +
geom_line()
Alternatively, you could replace the missing values with 0s so that the line runs along the axis, but I think it's nicer not to plot the line at all, which implies missing data, rather than to plot 0s, which implies no rainfall.
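For completeness, the zero-fill alternative could look roughly like the following base-R sketch. It uses a fixed start date instead of Sys.Date() so the result is repeatable, and merge(..., all.x = TRUE) in place of left_join(); both are substitutions made for this illustration.

```r
set.seed(47)
# Sample data with a gap of several weeks between observations
dd <- data.frame(t = as.Date("2020-01-01") + c(0:5, 30:32), y = runif(9))

# Every calendar day between the first and last observation
all_days <- data.frame(t = seq(min(dd$t), max(dd$t), by = "day"))

# Base-R left join: days without an observation get y = NA
dd_complete <- merge(all_days, dd, by = "t", all.x = TRUE)

# Treat missing days as zero rainfall (an assumption about the data)
dd_zero <- dd_complete
dd_zero$y[is.na(dd_zero$y)] <- 0
```

Plotting dd_zero with geom_line() then draws a flat segment at 0 across the gap, while dd_complete leaves the gap empty.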

R stacked area chart - ignore NA and retain full x-axis

I have a decadal time series from 1700 to 1900 (21 time slices), and for each decade I have 7 categories that each represent a quantity.
Only 5 of the decades actually have data.
I can plot a nice little stacked area chart in R, following an example I found, which retains only the 5 time slices that have data.
My problem is that I want an x-axis that retains all 21 time slices but still plots the stacked area chart using only the 5 time slices. The idea is that the stacked areas are still plotted against the correct year but simply connect to the next point, 10 ticks down the x-axis, ignoring the missing data in between. I can achieve something similar in Excel, but I don't like it.
My reason is that I want to plot lines on top of the stacked area that are much more complete, for example from 1700 to 1850, or 1800 to 1900, for visual comparison.
Another post suggests how to connect dots in a line chart when you want to ignore NAs, but it doesn't work for me in this instance.
a <- 1700:1900
b <- a[seq(1, length(a), 10)]
df <- data.frame("Year"=b,replicate(7,sample(1:21)))
rows <- c(2:10,11:15,17,19,21)
df[rows,2:8] <- NA
df
Thanks a lot

If you convert Year to a factor, along the lines of the code below:
# Transform the data to long
library(reshape2)
df <- melt(data = df, na.rm = FALSE, id.vars = "Year")
df$Year <- as.factor(df$Year)
# Chart
require(ggplot2)
ggplot(df, aes(Year, value)) +
geom_area(aes(colour = variable, fill= variable), position = 'stack')
It will generate the chart below:
I wasn't sure whether you want all of the x values shown on the axis. I assumed so, and reshaped your data accordingly. It is probably wiser not to convert Year to a factor. The code below:
a <- 1700:1900
b <- a[seq(1, length(a), 10)]
df <- data.frame("Year"=b,replicate(7,sample(1:21)))
rows <- c(2:10,11:15,17,19,21)
df[rows,2:8] <- NA
# Transform the data to long
library(reshape2)
df <- melt(data = df, na.rm = FALSE, id.vars = "Year")
# Leave it as int.
# df$Year <- as.factor(df$Year)
# Chart
require(ggplot2)
ggplot(df, aes(Year, value)) +
geom_area(aes(colour = variable, fill= variable), position = 'stack')
would generate a much more meaningful chart:
If you do decide to use years as factors, you could group them and use a single category for each run of missing years, so that the x-axis is more readable. To a great extent it's a matter of presentation.
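A hedged sketch of one way to keep every decade on a numeric x-axis while drawing the areas only from the decades that have data: drop the NA rows before plotting, then set the axis breaks and limits explicitly. The base-R reshaping and the set.seed() call are choices made for this illustration, not part of the answer above.

```r
library(ggplot2)

set.seed(1)
years <- seq(1700, 1900, by = 10)             # all 21 decades
df <- data.frame(Year = years, replicate(7, sample(1:21)))
df[c(2:15, 17, 19, 21), 2:8] <- NA            # same rows blanked as in the question

# Long format in base R: one row per Year/category pair
long <- data.frame(
  Year     = rep(df$Year, times = 7),
  variable = rep(names(df)[2:8], each = nrow(df)),
  value    = unlist(df[2:8], use.names = FALSE)
)
long <- long[!is.na(long$value), ]            # keep only decades with data

p <- ggplot(long, aes(Year, value, fill = variable)) +
  geom_area(position = "stack") +
  # Show every decade on the axis, including those without data
  scale_x_continuous(breaks = years, limits = range(years))
p
```

Because Year stays numeric, additional lines covering other year ranges can be layered on the same axis with geom_line().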
