R ggplot2 multiple column stacked histogram, separate bar for each column

I am trying to make a histogram of percentages for multiple columns of data in one graph. Is there a way to do this without transforming the data into an even longer format? Basically, I want to combine multiple histograms on one plot with the same y axis. I can't get facet_grid and facet_wrap to work because everything is in different columns. Here is some sample data:
data <- data.frame("participant"=c(1,2,3,4,5),
"metric1"=c(0,1,2,0,1),
"metric2"=c(1,2,0,1,2),
"metric3"=c(2,0,1,2,0),
"date"=rep("8/14/2021",5))
Ideally, I would have a stacked bar for metric 1, next to that a stacked bar for metric 2, and finally a stacked bar for metric 3. I can generate one stacked bar at a time with the following code:
ggplot(data = data,
aes(x = date, group = factor(metric1), fill=factor(metric1))) +
geom_bar(position = "fill") +
scale_y_continuous(labels = scales::percent)
How do I combine this graph with the graphs for metric 2 and 3 so that they are all on the same graph with the same axes? Can it be done without making the data long? My real data is more complicated than the test data, and I'd like to avoid transforming it. Thank you for reading and any help you can offer.

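Reshape to 'long' format with pivot_longer and create the bar plot, with one facet per metric. By default pivot_longer stores the original column names in a column called name, which is what facet_wrap(~ name) refers to: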
library(dplyr)
library(ggplot2)
library(tidyr)
data %>%
pivot_longer(cols = starts_with('metric'), values_to = 'metric') %>%
ggplot(aes(x = date, group = factor(metric),fill = factor(metric))) +
geom_bar() +
facet_wrap(~ name)
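If the goal is one 100% stacked bar per metric on a shared axis, rather than one facet per metric, a minimal sketch follows. It assumes the sample data from the question, and that reshaping a copy of the data just for plotting is acceptable. It puts the metric name on the x axis and keeps position = "fill" from the question's own code. The object names long and p are illustrative only.

```r
library(ggplot2)
library(tidyr)

# Sample data from the question
data <- data.frame(participant = 1:5,
                   metric1 = c(0, 1, 2, 0, 1),
                   metric2 = c(1, 2, 0, 1, 2),
                   metric3 = c(2, 0, 1, 2, 0),
                   date = rep("8/14/2021", 5))

# One row per participant and metric; the original data frame is left untouched
long <- pivot_longer(data,
                     cols = starts_with("metric"),
                     names_to = "name",
                     values_to = "metric")

# One bar per metric; position = "fill" rescales every bar to 100%
p <- ggplot(long, aes(x = name, fill = factor(metric))) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = NULL, y = "share of participants", fill = "value")

p
```

Because the reshaped copy is only passed to ggplot, the wide data frame can stay as it is for the rest of the analysis.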

Related

Scatter plot with ggplot2

I am trying to make scatter plot with ggplot2. Below you can see data and my code.
data=data.frame(
gross_i.2019=seq(1,101),
Prediction=seq(21,121))
ggplot(data=data, aes(x=gross_i.2019, y=Prediction, group=1)) +
geom_point()
This code produces the chart below.
Now I want the values on the scatter plot in two different colors, one for gross_i.2019 and one for Prediction. I tried the code below, but it only changes the color of all the points to a single new color.
scatter <- ggplot(data=data, aes(x=gross_i.2019, y=Prediction))
scatter + geom_point(color = "#00AFBB")
Can anybody help me make this plot with two different colors (e.g. black and red), one for gross_i.2019 and the other for Prediction?
I may be misunderstanding what you are trying to accomplish, but it doesn't seem like you have two groups of data to color differently. You have one dependent variable (Prediction) and one independent variable (gross_i.2019), and you are plotting the relationship between them. If Prediction and gross_i.2019 are both supposed to be dependent variables from different groups, you need a common independent variable (time, for example) to plot them against. Then you can do something like geom_point(aes(color = group)), where group is a column identifying the series.
Edit 1: If you wanted the index (the row number in the dataset) to be your independent x axis, then you could do the following:
library(tidyverse)
data=data.frame(gross_i.2019=seq(1,101),Prediction=seq(21,121))
#create a column for the index numbers
data$index <- c(1:101)
#using tidyr pivot your dataset to a tidy dataset (long not wide)
data <- data %>% pivot_longer(!index, names_to="group",values_to="count")
#map the groups to colors
p<- ggplot(data=data, aes(x=index, y=count, color=group))
p1<- p + geom_point()
p1
This type of problem generally has to do with reshaping the data. ggplot2 expects the long format, and the data is in wide format. See this post on how to reshape the data from wide to long format.
long <- reshape(data,
ids = row.names(data),
varying = c("gross_i.2019", "Prediction"),
v.names = "line",
direction = "long")
long$time <- names(data)[long$time]
long$id <- as.numeric(long$id)
library(ggplot2)
ggplot(long, aes(id, line, color = time)) +
geom_point() +
scale_color_manual(values = c("#000000", "#00AFBB"))

Plot multicolor vertical lines by using ggplot to show average time taken for each type as facet. Each type will have different vertical lines

I want to plot a chart in R where it will show me vertical lines for each type in facet.
df is the dataframe with person X takes time in minutes to reach from A to B and so on.
I have tried below code but not able to get the result.
df<-data.frame(type =c("X","Y","Z"), "A_to_B"= c(20,56,57), "B_to_C"= c(10,35,50), "C_to_D"= c(53,20,58))
ggplot(df, aes(x = 1,y = df$type)) + geom_line() + facet_grid(type~.)
I have attached image from excel which is desired output but I need only vertical lines where there are joins instead of entire horizontal bar.
I would not use facets in your case, because there are only 3 variables.
So, to get a similar plot in R using ggplot2, you first need to reshape the dataframe using gather() from tidyr (part of the tidyverse). The data is then in long, or tidy, format.
To my knowledge, there is no geom that does what you want in standard ggplot2, so some fiddling is necessary.
However, it's possible to produce the plot using geom_segment() and cumsum():
library(tidyverse)
# First reshape and calculate cumulative sums by type.
# This works because the column names begin with A, B, C
# and are thus ordered correctly.
df <- df %>%
gather(-type, key = "route", value = "time") %>%
group_by(type) %>%
mutate(cummulative_time = cumsum(time))
segment_length <- 0.2
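df %>%
ungroup() %>%
# type is a character column in R >= 4.0, so as.numeric(type) would give NA;
# convert to a factor and reverse it so that Z = 1, Y = 2, X = 3, matching the axis labels below
mutate(route = fct_rev(route), type_pos = as.numeric(fct_rev(factor(type)))) %>%
ggplot(aes(color = route)) +
geom_segment(aes(x = type_pos + segment_length, xend = type_pos - segment_length, y = cummulative_time, yend = cummulative_time)) +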
scale_x_discrete(limits=c("1","2","3"), labels=c("Z", "Y","X"))+
coord_flip() +
ylim(0,max(df$cummulative_time)) +
labs(x = "type")
EDIT
This solution works because it assigns the labels for X, Y, Z in scale_x_discrete. Be careful to assign the correct labels! Also compare this answer.

bar plot in r with multiple bars per x variable

How do I plot a bar plot so that every variable (treatment group) on the x-axis displays two bars, representing avgRDM and avgSDM? I would like the bars to be colored by avgRDM and avgSDM.
The data for the plot is in the following image:
Thank you
I'm a big fan of ggplot, so here is an option in that vein. It's easiest (and tidiest) to reshape the data from wide to long and then map the fill aesthetic to the key:
library(tidyverse)
df %>%
gather(key, val, -trt) %>%
ggplot(aes(trt, val, fill = key)) +
geom_col(position = "dodge2")
PS. For future posts, please share data in a reproducible way using e.g. dput; screenshots are never a good idea as it requires respondents to manually type out your sample data.
Sample data
df <- read.table(text =
"trt avgRDM avgSDM
F10 49.5 108.333
NH4Cl 12.583 50.25
NH4NO3 17.333 73.33
'F10 + ANU843' 6.0 7.333", header = TRUE)
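Since gather() is superseded in current tidyr, the same plot can also be sketched with pivot_longer(). This is an equivalent variant under the assumption that the sample data above matches the asker's data. The names long and p are illustrative, and "dodge" is used here in place of "dodge2" for simplicity.

```r
library(ggplot2)
library(tidyr)

# Sample data from the answer above
df <- read.table(text = "trt avgRDM avgSDM
F10 49.5 108.333
NH4Cl 12.583 50.25
NH4NO3 17.333 73.33
'F10 + ANU843' 6.0 7.333", header = TRUE)

# Wide to long: one row per treatment and measure
long <- pivot_longer(df, cols = -trt, names_to = "key", values_to = "val")

# Two side-by-side bars per treatment, filled by measure
p <- ggplot(long, aes(trt, val, fill = key)) +
  geom_col(position = "dodge")

p
```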

R: How do I make a one-variable scatterplot?

I have five categories and the first category has 100x more records than the fifth one.
I want to show a comparison between categories, but bar charts wouldn't make sense.
I also don't want to take the log, since I want to communicate the absolute values.
I have a category, x, called number of records. The idea is that y is an arbitrary axis and x is the categorical records. It's like a bar chart with dots instead of bars or a histogram with dots.
Is this something I can do with ggplot?
Check out geom_jitter()
library(dplyr)
library(ggplot2)
data = data.frame(records = c(rep("a",1000),rep("b",500),rep("c",100),rep("d",10)))%>%
mutate(y = 0)
data%>%
ggplot(aes(x = records,y = y))+
geom_jitter()
Reference: https://ggplot2.tidyverse.org/reference/position_jitter.html

Reordering categories in stacked bar chart based on count

I want to produce a stacked bar chart in ggplot2 where the bars in the stack are ordered according to the count of that category. When I attempt this using the code below, ggplot2 appears to arrange the bars in the stack in alphabetical order. Other answers on Stack Overflow suggest that ggplot2 orders the bars according to the order in which R reads the data; however, in the 'a' dataframe the Appliance column is in the order 'Radio', 'Laptop', 'TV', 'Fridge' (the first 4 rows), which isn't how it is shown in the graph either.
library(ggplot2)
library(tidyr)
#some data
SalesData<-data.frame(Appliance=c("Radio", "Laptop", "TV", "Fridge"), ThisYear=c(5,25,5,8), LastYear=c(6,20,5,8))
#transform the data into 'long format' for ggplot2
a<- gather(SalesData, Sales, Total, ThisYear, LastYear)
#Produce the bar chart
p<-ggplot(a, aes(fill=Appliance, y=Total, x=Sales)) +
geom_bar( stat="identity")
p
What I want to happen is for the largest counts to be at the bottom of the graph, so I need a way to order the data in this way. So in this example it would be 'Laptop' at the bottom, then 'Fridge', 'Radio' and 'TV' and for the legend to match this order.
Does anyone have any suggestions?
You need to reorder the factor levels before you plot the stacked bar chart. For this, there are several possibilities:
With base R
order_appliance <- unique(a$Appliance[order(a$Total)])
a$Appliance <- factor(a$Appliance, levels = order_appliance)
With dplyr
library(dplyr)
a <- a %>%
arrange(Total) %>%
mutate(Appliance = factor(Appliance, levels = unique(Appliance)))
With forcats
library(forcats)
a$Appliance <- fct_reorder(a$Appliance, a$Total)
For the plot you can use `geom_col` instead of `geom_bar(stat = "identity")`:
ggplot(a, aes(fill = Appliance, y = Total, x = Sales)) +
geom_col()
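fct_reorder() summarises Total with the median by default. If the order should instead follow the combined total of each appliance across both years, one option is to pass .fun = sum. The sketch below assumes the question's SalesData and uses pivot_longer() in place of gather(); the object name p is illustrative.

```r
library(ggplot2)
library(tidyr)
library(forcats)

# Data from the question
SalesData <- data.frame(Appliance = c("Radio", "Laptop", "TV", "Fridge"),
                        ThisYear = c(5, 25, 5, 8),
                        LastYear = c(6, 20, 5, 8))

# Wide to long
a <- pivot_longer(SalesData, cols = c(ThisYear, LastYear),
                  names_to = "Sales", values_to = "Total")

# Order the levels by each appliance's summed Total, smallest first
a$Appliance <- fct_reorder(a$Appliance, a$Total, .fun = sum)
levels(a$Appliance)
# "TV" "Radio" "Fridge" "Laptop"

# With default stacking the last level (Laptop) ends up at the bottom of each bar
p <- ggplot(a, aes(x = Sales, y = Total, fill = Appliance)) +
  geom_col()

p
```

The legend follows the level order too, so it lists TV first and Laptop last, mirroring the stack from top to bottom.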
geom_bar uses factors to create the stacks. You can see the levels present in your data with levels(factor(a$Appliance)). By default, these levels are sorted in alphabetical order. However, you can manually set the order of the levels as follows:
a$Appliance = factor(a$Appliance, levels=c("TV", "Radio", "Fridge", "Laptop"))
If you do this before creating your ggplot, you will have your desired order.
We could re-order factors based on sum, then plot, see example:
library(dplyr)  # for mutate(); not loaded in the question's code
# order the appliance names by their row sums (smallest first)
myFac <- SalesData$Appliance[ order(rowSums(SalesData[, 2:3])) ]
# wide-to-long, then set the factor levels in that order
a <- gather(SalesData, Sales, Total, ThisYear, LastYear) %>%
mutate(Appliance = factor(Appliance, levels = myFac))
# then plot
ggplot(a, aes(fill = Appliance, y = Total, x = Sales)) +
geom_bar(stat = "identity")
