Grouped barplot with ggplot2: grouping two numeric variables under each year - r

My problem is simple but I have not been able to find a post that solves it.
Here is my data set DF:
Year CO2Seq CO2Seq2
1 2000 1135704 1107400
2 2003 3407111 3444508
3 2010 1703555 1661100
4 2015 2271407 2296339
I would like to create a barplot where the bars CO2Seq and CO2Seq2 are next to each other for each year.
For the moment, I have only been able to create a simple barplot for CO2Seq with this script
ggplot(DF,aes(x=factor(Year), y=CO2Seq))+geom_bar(stat="identity")
Could you help me?
Thanks a lot

ggplot has generally been designed for use with long rather than wide data, so the first step is to reshape your data, then plotting is straightforward.
library(ggplot2)
library(tidyr)
DF %>%
pivot_longer(cols = -Year) %>%
ggplot(aes(x = factor(Year), y = value, fill = name)) +
geom_bar(stat = "identity", position = "dodge")
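For a version that can be run as-is, here is a minimal sketch that rebuilds the data set from the question by hand and applies the same reshape-then-dodge approach. The values are copied from the post; the object names `long` and `p` are only illustrative choices.

```r
library(ggplot2)
library(tidyr)

# Data copied from the question
DF <- data.frame(
  Year    = c(2000, 2003, 2010, 2015),
  CO2Seq  = c(1135704, 3407111, 1703555, 2271407),
  CO2Seq2 = c(1107400, 3444508, 1661100, 2296339)
)

# Wide -> long: one row per Year/variable pair
long <- pivot_longer(DF, cols = -Year, names_to = "name", values_to = "value")

# geom_col() is shorthand for geom_bar(stat = "identity")
p <- ggplot(long, aes(x = factor(Year), y = value, fill = name)) +
  geom_col(position = "dodge")
p
```

After the reshape, `long` has two rows per year, one for each of CO2Seq and CO2Seq2, which is what lets `fill = name` split the bars.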

Related

Why does my line plot (ggplot2) look vertical?

I am new to coding in R. When I use ggplot2 to make a line graph, I get vertical lines. This is my code:
all_trips_v2 %>%
group_by(Month_Name, member_casual) %>%
summarise(average_duration = mean(length_of_ride))%>%
ggplot(aes(x = Month_Name, y = average_duration)) + geom_line()
And I'm getting something like this:
This is a sample of my data:
(Not all the cells in the Month_Name is August, it's just sorted)
Any help will be greatly appreciated! Thank you.
I added a bit more code just for the sake of the example. The data I chose is probably not the best choice for displaying a proper time series.
I hope the ggplot features I show here will be beneficial for you in the future.
library(tidyverse)
library(lubridate)
mydat <- sample_frac(storms,.4)
# setting the month of interest as the current system's month
month_of_interest <- month(Sys.Date(),label = TRUE)
mydat %>% group_by(year,month) %>%
summarise(avg_pressure = mean(pressure)) %>%
mutate(month = month(month,label = TRUE),
current_month = month == month_of_interest) %>%
# the mutate code is just for my example.
ggplot(aes(x=year, y=avg_pressure,
color=current_month,
group=month,
size=current_month
))+geom_line(show.legend = FALSE)+
## From here its not really important,
## just ideas for your next plots
scale_color_manual(values=c("grey","red"))+
scale_size_manual(values = c(.4,1))+
ggtitle(paste("Average yearly pressure,\nwith special interest in", month_of_interest))+
theme_minimal()
## Most important is that you notice the group argument. Also,
# in most cases you will want to color your different lines.
# I added a logical variable so only the current month will be colored,
# but that is not mandatory.
You should add a grouping argument.
See further info here:
https://ggplot2.tidyverse.org/reference/aes_group_order.html
# Multiple groups with one aesthetic
p <- ggplot(nlme::Oxboys, aes(age, height))
# The default is not sufficient here. A single line tries to connect all
# the observations.
p + geom_line()
# To fix this, use the group aesthetic to map a different line for each
# subject.
p + geom_line(aes(group = Subject))
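Applied to the question, a rough sketch could look like the following. The data frame `trips` and its values are made up for illustration (the real summary table is not shown in the post); only the column names are taken from the question's code. With a discrete x and no group aesthetic, ggplot2 treats each month as its own group, so the two rider types within a month get joined by a vertical line.

```r
library(ggplot2)

# Hypothetical summary table: one average per month and rider type
trips <- data.frame(
  Month_Name       = factor(rep(c("Jun", "Jul", "Aug"), each = 2),
                            levels = c("Jun", "Jul", "Aug")),
  member_casual    = rep(c("member", "casual"), times = 3),
  average_duration = c(14, 32, 15, 35, 13, 30)
)

# group = member_casual draws one line per rider type across the months;
# colour is optional but makes the two lines distinguishable
p <- ggplot(trips, aes(x = Month_Name, y = average_duration,
                       group = member_casual, colour = member_casual)) +
  geom_line()
p
```

Storing Month_Name as a factor with explicit levels keeps the months in calendar order rather than alphabetical order.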

Plot multicolor vertical lines by using ggplot to show average time taken for each type as facet. Each type will have different vertical lines

I want to plot a chart in R where it will show me vertical lines for each type in facet.
df is the dataframe with person X takes time in minutes to reach from A to B and so on.
I have tried below code but not able to get the result.
df<-data.frame(type =c("X","Y","Z"), "A_to_B"= c(20,56,57), "B_to_C"= c(10,35,50), "C_to_D"= c(53,20,58))
ggplot(df, aes(x = 1,y = df$type)) + geom_line() + facet_grid(type~.)
I have attached image from excel which is desired output but I need only vertical lines where there are joins instead of entire horizontal bar.
I would not use facets in your case, because there are only 3 variables.
So, to get a similar plot in R using ggplot2, you first need to reformat the dataframe using gather() from tidyr (part of the tidyverse). Then it's in long, or tidy, format.
To my knowledge, there is no geom that does what you want in standard ggplot2, so some fiddling is necessary.
However, it's possible to produce the plot using geom_segment() and cumsum():
library(tidyverse)
# First reformat and calculate cumulative sums by type.
# This works because the route names begin with A, B, C
# and are thus ordered correctly.
df <- df %>%
gather(-type, key = "route", value = "time") %>%
group_by(type) %>%
mutate(cumulative_time = cumsum(time))
segment_length <- 0.2
df %>%
mutate(route = fct_rev(route)) %>%
ggplot(aes(color = route)) +
# factor() is needed because type is a character column in R >= 4.0;
# its levels X, Y, Z map to positions 1, 2, 3
geom_segment(aes(x = as.numeric(factor(type)) + segment_length, xend = as.numeric(factor(type)) - segment_length, y = cumulative_time, yend = cumulative_time)) +
scale_x_discrete(limits=c("1","2","3"), labels=c("X", "Y","Z"))+
coord_flip() +
ylim(0,max(df$cumulative_time)) +
labs(x = "type")
EDIT
This solution works because it assigns values to X, Y, Z in scale_x_discrete. Be careful to assign the correct labels! Also compare this answer.

bar plot in r with multiple bars per x variable

How do I plot a bar plot so that every variable (treatment group) on the x-axis displays two bars, representing avgRDM and avgSDM? I would like the bars to be colored by avgRDM and avgSDM.
The data for the plot is in the following image:
Thank you
I'm a big fan of ggplot, so here is an option in that vein. It's easiest (and tidiest) to reshape the data from wide to long and then map the fill aesthetic to the key.
library(tidyverse)
df %>%
gather(key, val, -trt) %>%
ggplot(aes(trt, val, fill = key)) +
geom_col(position = "dodge2")
PS. For future posts, please share data in a reproducible way using e.g. dput; screenshots are never a good idea as it requires respondents to manually type out your sample data.
Sample data
df <- read.table(text =
"trt avgRDM avgSDM
F10 49.5 108.333
NH4Cl 12.583 50.25
NH4NO3 17.333 73.33
'F10 + ANU843' 6.0 7.333", header = TRUE)

Plotting a line graph with multiple lines

I am trying to plot a line graph with multiple lines in different colors, but not having much luck. My data set consists of 10 states and the voting turnout rates for each state from 9 elections (so the states are listed in the left column, and each subsequent column is an election year from 1980-2012 with the voting turnout rate for each of the 10 states). I would like to have a graph with the year on the X axis and the voting turnout rate on the Y axis, with a line for each state.
I found this previous answer (Plotting multiple lines from a data frame in R) to a similar question but cannot seem to replicate it using my data. Any ideas/suggestions would be immensely appreciated!
Use tidyr::gather or reshape::melt to transform the data to a long form.
## Simulate data
d <- data.frame(state=letters[1:10],
'1980'=runif(10,0,100),
'1981'=runif(10,0,100),
'1982'=runif(10,0,100))
library(dplyr)
library(tidyr)
library(ggplot2)
## Transform to a long df
e <- d %>% gather(., key, value, -state) %>%
mutate(year = as.numeric(substr(as.character(key), 2, 5))) %>%
select(-key)
## Plot
ggplot(data=e,aes(x=year,y=value,color=state)) +
geom_point() +
geom_line()
Please include your data, or sample data, in your question so that we can answer your question directly and help you get to the root of the problem. Pasting your data is simplified by using dput().
Here's another solution to your problem, using scoa's sample data and the reshape2 package instead of the tidyr package:
# Sample data
d <- data.frame(state = letters[1:10],
'1980' = runif(10,0,100),
'1981' = runif(10,0,100),
'1982' = runif(10,0,100))
library(reshape2)
library(ggplot2)
# Melt data and remove X introduced into year name
melt.d <- melt(d, id = "state")
melt.d[["variable"]] <- gsub("X", "", melt.d[["variable"]])
# Plot melted data
ggplot(data = melt.d,
aes(x = variable,
y = value,
group = state,
color = state)) +
geom_point() +
geom_line()
Produces:
Note that I left out the as.numeric() conversion for year from scoa's example, and this is why the graph above does not include the extra x-axis ticks that scoa's does.

How to melt a dataframe into multiple factors

I have been trying to plot a line plot with ggplot.
My data looks something like this:
I04 F04 I05 F05 I06 F06
CAT 3 12 2 6 6 20
DOG 0 0 0 0 0 0
BIEBER 1 0 0 1 0 0
and can be found here.
Basically, we have a certain number of CATs (or other creatures) initially in a year (this is I04), and a certain number of CATs at the end of the year (this is F04). This goes on for some time.
I can plot something like this fairly simply using the code below, and get this:
This is fantastic, but doesn't work very well for me. After all, I have these starting and ending inventories for each year. So I am interested in seeing how the initial values (I04, I05, I06) change over time. So, for each animal, I would like to create two different lines, one for initial quantity and one for final quantity (F04, F05, F06). This seems to me like now I have to consider two factors.
This is really difficult given the way my data is set up. I'm not sure how to tell ggplot that all the I prefixed years are one factor, and all the F prefixed years are another factor. When the dataframe gets melted, it's too late. I'm not sure how to control this situation.
Any advice on how I can separate these values or perhaps another, better way to tackle this situation?
Here is the code I have:
library(ggplot2)
library(reshape2)
DF <- read.csv("mydata.csv", stringsAsFactors=FALSE)
## cleaning up, converting factors to numeric, etc
text_names <- data.frame(as.character(DF$animals))
names(text_names) <- c("animals")
numeric_cols <- DF[, -c(1)]
numeric_cols <- sapply(numeric_cols, as.numeric)
plot_me <- data.frame(cbind(text_names, numeric_cols))
plot_me$animals <- as.factor(plot_me$animals)
meltedDF <- melt(plot_me)
p <- ggplot()
p <- p + geom_line(aes(seq(1:36), meltedDF$value, group=meltedDF$animals, color=meltedDF$animals))
p
Using your original data from the link:
nd <- reshape(mydata, idvar = "animals", direction = "long", varying = names(mydata)[-1], sep = "")
ggplot(nd, aes(x = time, y = I, group = animals, colour = animals)) + geom_line() + ggtitle("Development of initial inventories")
ggplot(nd, aes(x = time, y = F, group = animals, colour = animals)) + geom_line() + ggtitle("Development of final inventories")
I think from a data analyst perspective the following approach might provide better insight.
For each animal we visualize the initial and the final quantity in a separate panel. Moreover, each subplot has its own y scale because the values of the different animal types are radically different. Like this, differences within and across animal types are easier to spot.
Given the current structure of your data, we do not need two different factors. After the gather call the indicator column includes data like I04, F04, etc. We just need to separate the first character from the rest resulting in two columns type and time. We can use type as the argument for color in the ggplot call. time provides a unified x-axis across all animal types.
library(tidyr)
library(dplyr)
library(ggplot2)
data %>% gather(indicator, value, -animals) %>%
separate(indicator, c('type', 'time'), sep = 1) %>%
mutate(
time = as.numeric(time)
) %>% ggplot(aes(time, value, color = type)) +
geom_line() +
facet_grid(animals ~ ., scales = "free_y")
Of course, you might also do it the other way round, namely using a subplot for the initial and the final quantities like this:
data %>% gather(indicator, value, -animals) %>%
separate(indicator, c('type', 'time'), sep=1) %>%
mutate(
time = as.numeric(time)
) %>% ggplot(aes(time, value, color = animals)) +
geom_line() +
facet_grid(type ~ ., scales = "free_y")
But as described above, I would not recommend that because the y scale varies too much across animal types.
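To make the reshaping step concrete, here is a small sketch built from the three sample rows shown in the question. It assumes the first column is called `animals` (as in the asker's code) and uses `pivot_longer()`, the current tidyr replacement for `gather()`; the object names `long` and `p` are illustrative.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# The three sample rows from the question
data <- data.frame(
  animals = c("CAT", "DOG", "BIEBER"),
  I04 = c(3, 0, 1), F04 = c(12, 0, 0),
  I05 = c(2, 0, 0), F05 = c(6, 0, 1),
  I06 = c(6, 0, 0), F06 = c(20, 0, 0)
)

long <- data %>%
  pivot_longer(-animals, names_to = "indicator", values_to = "value") %>%
  # Split e.g. "I04" after the first character into type = "I", time = "04"
  separate(indicator, c("type", "time"), sep = 1) %>%
  mutate(time = as.numeric(time))

# One panel per animal, one line each for initial (I) and final (F) quantities
p <- ggplot(long, aes(time, value, color = type)) +
  geom_line() +
  facet_grid(animals ~ ., scales = "free_y")
p
```

Each animal contributes six rows to `long` (three years times two types), so `type` can be mapped directly to color without defining a second factor by hand.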
