I would like to visualise proportions of quantities such as the ones below.
The four values in each vector are vote counts for great/good/moderate/bad:
data1 <- c(4, 6, 0, 1)
data2 <- c(2, 0, 1, 15)
I want to draw these in R as stacked horizontal barplots, where each of the two bars (data1 and data2) spans the whole width of the chart and great / good / moderate / bad are shown in different colours / patterns, like:
XXXXXXXXOOOOOOOOOOOO%%
XX*%%%%%%%%%%%%%%%%%%%
I am using lots of other charts in R (besides automation, another reason to use it!), but I can't work out how to do this one.
Perhaps something like this:
dat <-data.frame(data1,data2)
barplot(prop.table(as.matrix(dat), margin = 2), horiz = TRUE)
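A slightly fuller sketch of the same base R idea, adding rating labels, colours and a legend. The names ratings, counts and props and the colour choices are only illustrative assumptions, not part of the original answer:

```r
data1 <- c(4, 6, 0, 1)
data2 <- c(2, 0, 1, 15)
ratings <- c("great", "good", "moderate", "bad")

# one column per bar, one row per rating
counts <- cbind(data1, data2)
rownames(counts) <- ratings

# margin = 2 scales each column so that it sums to 1
props <- prop.table(counts, margin = 2)

barplot(props, horiz = TRUE,
        col = c("darkgreen", "lightgreen", "orange", "red"),
        legend.text = rownames(props),
        xlab = "Proportion of votes")
```

Because every column of props sums to 1, both bars span the full width of the plot regardless of how many votes each one received.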
Here's a ggplot2 answer:
library(ggplot2)
data1 <- c(4, 6, 0, 1)
data2 <- c(2, 0, 1, 15)
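MyData <- data.frame(DataSource = rep(c("data1", "data2"), each = 4),
                     quality = rep(c("great", "good", "moderate", "bad"), 2),
                     Value = c(data1 / sum(data1), data2 / sum(data2)))
# coord_flip() turns the stacked columns into horizontal bars, as the question asks
ggplot(data = MyData, aes(DataSource, Value, fill = quality)) + geom_col() + coord_flip()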
I hope this can point you in the right direction:
data1 <- c(4, 6, 0, 1)
data2 <- c(2, 0, 1, 15)
data3 <- c("great","good","moderate","bad")
df <- data.frame(group1 = data1,group2 = data2, class = data3)
library(reshape2)
library(dplyr)
library(ggplot2)
df<- melt(df,"class")
df <- df %>% group_by(variable) %>% mutate(perc = value/sum(value))
ggplot(df, aes(x = variable, y = perc,fill=class)) +
geom_bar(stat='identity') + coord_flip()
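Both ggplot2 answers compute the proportions by hand. As a possible alternative, position = "fill" lets ggplot2 rescale each stacked bar to full width from the raw counts. This is a minimal sketch; the data frame name votes and its column names are assumptions made for illustration:

```r
library(ggplot2)

rating_levels <- c("great", "good", "moderate", "bad")

# raw vote counts in long format, one row per bar and rating
votes <- data.frame(
  source  = rep(c("data1", "data2"), each = 4),
  quality = factor(rep(rating_levels, 2), levels = rating_levels),
  count   = c(4, 6, 0, 1, 2, 0, 1, 15)
)

p <- ggplot(votes, aes(x = source, y = count, fill = quality)) +
  geom_col(position = "fill") +   # stack and rescale each bar to sum to 1
  coord_flip() +                  # horizontal bars
  labs(x = NULL, y = "Proportion of votes")
print(p)
```

Storing quality as a factor with explicit levels keeps the ratings in their natural order instead of alphabetical order.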
I have a question regarding multiple boxplots. Assume we have data structures like this:
a <- rnorm(100, 0, 1)
b <- rnorm(100, 0, 1)
c <- rbinom(100, 1, 0.5)
My task is to create a boxplot of a and b for each group of c. However, it needs to be in the same plot. Ideally: Boxplot for a and b side by side for group 0 and next to it boxplot for a and b for group 1 and all together in one graphic.
I tried several things, but only the separate plots work:
boxplot(a~as.factor(c))
boxplot(b~as.factor(c))
But that's not what I'm looking for, as it has to be one plot.
You can use the tidyverse package for this. Transform your data into long format so that you get three variables: "names", "values" and "group". After that you can plot your boxplots with ggplot():
value_a <- rnorm(100, 0, 1)
value_b <- rnorm(100, 0, 1)
group <- as.factor(rbinom(100, 1, 0.5))
data <- data.frame(value_a,value_b,group)
library(tidyverse)
data %>%
pivot_longer(value_a:value_b, names_to = "names", values_to = "values") %>%
ggplot(aes(y = values, x = group, fill = names))+
geom_boxplot()
Created on 2022-08-19 with reprex v2.0.2
Another option using lattice package with bwplot function:
library(tidyr)
a <- rnorm(100, 0, 1)
b <- rnorm(100, 0, 1)
c <- rbinom(100, 1, 0.5)
df <- data.frame(a = a,
b = b,
c = c)
# make longer dataframe
df_long <- pivot_longer(df, cols = -c)
library(lattice)
bwplot(value ~ name | as.factor(c), df_long)
Created on 2022-08-19 with reprex v2.0.2
Noah has already given the ggplot2 answer that would also be my go to option. As you used the boxplot function in the question, this is how to approach it with boxplot. You should probably stay consistently within base or within ggplot2 for your publication/presentation.
First we transform the data to a long format (here an option without additional packages):
a <- rnorm(100, 0, 1)
b <- rnorm(100, 0, 1)
c <- rbinom(100, 1, 0.5)
d <- data.frame(a, b, c)
d <- cbind(stack(d, select = c("a", "b")), c)
giving us
> head(d)
values ind c
1 -0.66905293 a 0
2 -0.28778381 a 0
3 0.29148347 a 1
4 0.81380406 a 0
5 -0.85681913 a 0
6 -0.02566758 a 0
With which we can then call boxplot:
boxplot(values ~ ind + c, data = d, at = c(1, 2, 4, 5))
The at argument controls the grouping and placement of the boxes. Unlike in ggplot2, you need to choose the placement manually, but in return you get very fine control over spacing.
Slightly refined version of the plot:
boxplot(values ~ ind + c, data = d, at = c(1, 2, 4, 5),
col = c(2, 4), show.names = FALSE,
xlab = "")
axis(1, labels= c("c = 0", "c = 1"), at = c(1.5, 4.5))
legend("topright", fill = c(2, 4), legend = c("a", "b"))
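The at positions c(1, 2, 4, 5) above are typed by hand. As a rough sketch, they could also be computed so that the same call works for other numbers of variables or groups. The names grp, n_inner, n_outer and centres are assumptions introduced here, and the one-slot gap between groups is an arbitrary choice:

```r
set.seed(1)
a <- rnorm(100, 0, 1)
b <- rnorm(100, 0, 1)
grp <- rbinom(100, 1, 0.5)

# long format: columns values, ind (a/b) and grp (recycled to 200 rows)
d <- cbind(stack(data.frame(a, b)), grp)

n_inner <- nlevels(d$ind)          # boxes per group (a and b)
n_outer <- length(unique(d$grp))   # number of groups (0 and 1)

# positions 1..n_inner within each group, leaving one empty slot between groups
at <- as.vector(outer(seq_len(n_inner),
                      (seq_len(n_outer) - 1) * (n_inner + 1), "+"))

boxplot(values ~ ind + grp, data = d, at = at,
        col = c(2, 4), show.names = FALSE, xlab = "")

# put one axis label at the centre of each group of boxes
centres <- tapply(at, rep(seq_len(n_outer), each = n_inner), mean)
axis(1, at = centres, labels = paste("c =", sort(unique(d$grp))))
legend("topright", fill = c(2, 4), legend = levels(d$ind))
```

With two variables and two groups this reproduces the manual positions 1, 2, 4, 5 and the label positions 1.5 and 4.5.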
I have this data frame and want to draw line charts from it using ggplot2. lb is what I want as the label on the x-axis, while each of the other variables (x0.6, x0.8, x0.9, x0.95, x0.99, and x0.999) is plotted against lb on the y-axis.
# my data
lb <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
x0.6 <- c(0.9200795, 0.9315084, 0.9099002, 0.9160192, 0.9121120, 0.9134098, 0.9130619, 0.9128494, 0.9144164)
x0.8 <- c(0.9804872, 1.0144678, 0.9856382, 0.9730490, 1.0032707, 1.0036311, 0.9726198, 0.9986403, 1.0022643)
x0.9 <- c(1.055256, 1.016159, 1.067242, 1.089894, 1.043502, 1.041497, 1.037738, 1.023274, 1.040536)
x0.95 <- c(1.058024, 1.105353, 1.069076, 1.061077, 1.095764, 1.096789, 1.096670, 1.121497, 1.109918)
x0.99 <- c(1.107258, 1.098061, 1.118248, 1.101253, 1.083208, 1.109715, 1.083704, 1.083704, 1.118057)
x0.999 <- c(1.110732, 1.119625, 1.121221, 1.087423, 1.093228, 1.094003, 1.108910, 1.112413, 1.096734)
# my data frame
pos11 <- data.frame(lb, x0.6, x0.8, x0.9, x0.95, x0.99, x0.999)
#load packages
library("reshape2")
library("ggplot2")
# this `R` CODE reshapes the data
long_pos11 <- melt(pos11, id="lb")
# Here is the `R` code that produces the `line-chart`
pos_line <- ggplot(data = long_pos11,
aes(x=lb, y=value, colour=variable)) +
geom_line()
I want the line chart to show the elements of the vector lb (1, 2, 3, 4, 5, 6, 7, 8, 9) as labels on the x-axis, just as date is shown in "Plotting two variables as lines using ggplot2 on the same graph".
Try this. As your variable is numeric, you need to convert it to a factor and then also add group to your aes() call. Here is the code:
library("reshape2")
library("ggplot2")
# this `R` CODE reshapes the data
long_pos11 <- melt(pos11, id="lb")
# Here is the `R` code that produces the `line-chart`
pos_line <- ggplot(data = long_pos11,
aes(x=factor(lb), y=value, colour=variable,group=variable)) +
geom_line()+xlab('lb')
We can also use pivot_longer
library(ggplot2)
library(tidyr)
library(dplyr)
pos11 %>%
pivot_longer(cols = -lb) %>%
mutate(lb = factor(lb)) %>%
ggplot(aes(x = lb, y = value, color = name, group = name)) +
geom_line() +
xlab('lb')
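Another option, sketched here as an alternative to the factor conversion used in the answers above, is to keep lb numeric and request one axis break per value with scale_x_continuous(). To keep the example short it uses only two of the original columns (x0.6 and x0.999):

```r
library(ggplot2)
library(tidyr)

lb <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
pos11 <- data.frame(
  lb,
  x0.6   = c(0.9200795, 0.9315084, 0.9099002, 0.9160192, 0.9121120,
             0.9134098, 0.9130619, 0.9128494, 0.9144164),
  x0.999 = c(1.110732, 1.119625, 1.121221, 1.087423, 1.093228,
             1.094003, 1.108910, 1.112413, 1.096734)
)

# long format: columns lb, name and value
long_pos11 <- pivot_longer(pos11, cols = -lb)

p <- ggplot(long_pos11, aes(x = lb, y = value, colour = name)) +
  geom_line() +
  scale_x_continuous(breaks = pos11$lb)   # one tick label per lb value
print(p)
```

With a numeric x, no group aesthetic is needed because colour already splits the data into one line per variable, and the spacing between ticks stays proportional if the lb values are ever unevenly spaced.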
Thank you for this wonderful community. I am trying to create a loop to make lots of graphs. However, only the last graph is being shown.
Here is my dataset. I am essentially subsetting by gene and then trying to create graphs.
Here is my code.
data_before_heart <- subset(data_before, Organ == 0)
uniq <- unique(unlist(data_before_heart$Gene))
for (i in 1:length(uniq)){
data_1 <- subset(data_before_heart, Gene == uniq[i])
print(ggplot(data_1, aes(x=Drug, y=Expression)) +
geom_bar(stat="identity"))
}
Unfortunately this only generates the last graph. (There are 6 genes: 0, 1, 2, 3, 4, 5.)
Your code works fine for me with 'fake' data and generates the expected number of bar plots without any issues:
library(ggplot2)
library(dplyr)
data_before_heart <- data.frame(
Gene = as.vector(sapply(seq(1:10), function(x) x*rep(1, 5))) - 1,
Drug = c(0, seq(1:50) %% 5)[1:50],
Expression = runif(n = 50, min = 0, max = 1),
stringsAsFactors = FALSE
)
#data_before_heart <- subset(data_before, Organ == 0)
uniq <- unique(unlist(data_before_heart$Gene))
for (i in 1:length(uniq)){
data_1 <- subset(data_before_heart, Gene == uniq[i])
print(ggplot(data_1, aes(x=Drug, y=Expression)) +
geom_bar(stat="identity"))
}
Perhaps your data set has issues? Are there any error messages?
facet_wrap provides an elegant alternative:
data_before_heart %>%
ggplot(aes(x = Drug, y = Expression)) +
geom_bar(stat = "identity") +
facet_wrap(~ Gene, nrow = 5, ncol = 2, scales = "fixed")
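The loop in the question looks correct, so one possible explanation (an assumption, since the question does not say where the plots are viewed) is that each plot simply replaces the previous one in the viewer. A sketch that sidesteps this by keeping the plots in a list and writing them to a multi-page PDF, one page per gene. The data below is a hypothetical stand-in, and plots and out_file are illustrative names:

```r
library(ggplot2)

set.seed(42)
# hypothetical stand-in for the real data: 6 genes, 5 drugs each
data_before_heart <- data.frame(
  Gene = rep(0:5, each = 5),
  Drug = rep(0:4, times = 6),
  Expression = runif(30)
)

# build one plot per gene and keep them all in a named list
plots <- lapply(split(data_before_heart, data_before_heart$Gene), function(d) {
  ggplot(d, aes(x = Drug, y = Expression)) +
    geom_col() +
    ggtitle(paste("Gene", d$Gene[1]))
})

# write every plot to its own page of a single PDF file
out_file <- file.path(tempdir(), "gene_plots.pdf")
pdf(out_file, width = 6, height = 4)
for (p in plots) print(p)
dev.off()
```

Keeping the plots in a list also makes it possible to print or save an individual one later, for example plots[["3"]].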
I have a data frame.
id <- c(1:5)
count_big <- c(15, 25, 7, 0, 12)
count_small <- c(15, 9, 22, 11, 14)
count_black <- c(7, 12, 5, 2, 6)
count_yellow <- c(2, 0, 7, 4, 3)
count_red <- c(8, 4, 4, 2, 5)
count_blue <- c(5, 9, 6, 1, 7)
count_green <- c(8, 9, 7, 2, 5)
df <- data.frame(id, count_big, count_small, count_black, count_yellow, count_red, count_blue, count_green)
How can I display the following in ggplot2 and which geom should I use:
a breakdown of big and small variable by id
a breakdown of colors by id
This is just a subset of the data set that has around 1000 rows.
Can I use this df in ggplot2, or do I need to transform it into tidy data with tidyr? (don't know data.table yet)
You need to first restructure the data from wide to long with tidyr.
library(tidyr)
library(ggplot2)
df <- gather(df, var, value, starts_with("count"))
# remove count_
df$var <- sub("count_", "", df$var)
# plot big vs small
df_size <- subset(df, var %in% c("big", "small"))
ggplot(df_size, aes(x = id, y = value, fill = var)) +
geom_bar(stat = "identity", position = position_dodge())
# same routine for colors
df_color <- subset(df, !(var %in% c("big", "small")))
ggplot(df_color, aes(x = id, y = value, fill = var)) +
geom_bar(stat = "identity", position = position_dodge())
Use stat = "identity" to prevent it from doing a row count. position = position_dodge() is used to place the bars next to each other rather than stacked.
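In current tidyr, gather() is superseded by pivot_longer(). A minimal sketch of the same reshaping with pivot_longer(), where names_prefix strips "count_" in the same step; df_long is an illustrative name not used in the answer above:

```r
library(tidyr)
library(ggplot2)

df <- data.frame(
  id           = 1:5,
  count_big    = c(15, 25, 7, 0, 12),
  count_small  = c(15, 9, 22, 11, 14),
  count_black  = c(7, 12, 5, 2, 6),
  count_yellow = c(2, 0, 7, 4, 3),
  count_red    = c(8, 4, 4, 2, 5),
  count_blue   = c(5, 9, 6, 1, 7),
  count_green  = c(8, 9, 7, 2, 5)
)

# wide to long, dropping the "count_" prefix from the new var column
df_long <- pivot_longer(df, cols = starts_with("count_"),
                        names_to = "var", values_to = "value",
                        names_prefix = "count_")

# big vs small, bars side by side
df_size <- subset(df_long, var %in% c("big", "small"))
p <- ggplot(df_size, aes(x = id, y = value, fill = var)) +
  geom_col(position = "dodge")
print(p)
```

geom_col() is equivalent to geom_bar(stat = "identity"), so the plot matches the one in the answer.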
Using matplot, I can plot a line for each row of a dataframe at given x values. For example
set.seed(1)
df <- matrix(runif(20, 0, 1), nrow = 5)
matplot(t(df), type = "l", x = c(1, 3, 7, 9)) # c(1, 3, 7, 9) are the x-axis positions I'd like to plot along
# the line colours are not important
I'd like to use ggplot2 instead, but I'm not sure how best to replicate the outcome. Using melt I can rename the columns to the desired x values, as below. But is there a 'cleaner' approach that I'm missing?
df1 <- as.data.frame(df)
names(df1) <- c(1, 3, 7, 9) # rename columns to the desired x-axis values
df1$id <- 1:nrow(df1)
df1_melt <- melt(df1, id.var = "id")
df1_melt$variable <- as.numeric(as.character(df1_melt$variable)) # convert x-axis values from factor to numeric
ggplot(df1_melt, aes(x = variable, y = value)) + geom_line(aes(group = id))
Any help would be much appreciated. Thanks
Since ggplot2 is increasingly used as part of the tidyverse family of packages, I thought I would post a tidy approach.
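# load packages: %>%, rownames_to_column(), gather() and left_join() all come from the tidyverse
library(tidyverse)
# generate data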
set.seed(1)
df <- matrix(runif(20, 0, 1), nrow = 5) %>% as.data.frame
# put x-values into a data.frame
x_df <- data.frame(col=c('V1', 'V2', 'V3', 'V4'),
x=c(1, 3, 7, 9))
# make a tidy version of the data and graph
df %>%
rownames_to_column %>%
gather(col, value, -rowname) %>%
left_join(x_df, by='col') %>%
ggplot(aes(x=x, y=value, color=rowname)) +
geom_line()
The key idea is to gather() the data into tidy format, so that instead of being 5 rows × 4 columns, the data is 20 rows × 1 value column, along with a few identifier columns (col, rowname and eventually x in this particular case).
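A variation on the same tidy idea, sketched with pivot_longer() and a named lookup vector in place of the join. The names m, x_pos and df_long are assumptions for illustration, and the code relies on as.data.frame() naming the columns of an unnamed matrix V1 to V4:

```r
library(tidyr)
library(ggplot2)

set.seed(1)
m <- matrix(runif(20, 0, 1), nrow = 5)

# map the default column names to the desired x positions
x_pos <- c(V1 = 1, V2 = 3, V3 = 7, V4 = 9)

df <- as.data.frame(m)               # columns V1..V4
df$id <- factor(seq_len(nrow(df)))   # one id per original row

# 5 x 4 wide data becomes 20 rows of (id, col, value)
df_long <- pivot_longer(df, cols = -id, names_to = "col", values_to = "value")
df_long$x <- unname(x_pos[df_long$col])   # look up the x position by column name

p <- ggplot(df_long, aes(x = x, y = value, group = id, colour = id)) +
  geom_line()
print(p)
```

The lookup vector does the same job as x_df in the answer; a join may be preferable when the x values already live in a data frame of their own.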
autoplot.zoo can do ggplot graphics of matrix data. Omit the facet argument if you want separate panels. The inputs are defined in the Note at the end.
library(ggplot2)
library(zoo)
z <- zoo(t(m), x) # use t so that series are columns
autoplot(z, facet = NULL) + xlab("x")
Note: The inputs used:
set.seed(1)
m <- matrix(runif(20, 0, 1), nrow = 5)
rownames(m) <- c("a", "b", "c", "d", "e")
x <- c(1, 3, 7, 9)