R: Visualize Multiple Columns with their Sum

I have a data frame with the following columns: product_id, ..., p1, p2, p3, etc. The p-columns contain only 0 or 1.
I want a bar chart (with ggplot) that sums (or counts) p1, p2, etc. and shows each p-column as a bar whose height is that sum.
Additionally, I want to fill the bars by product_id.
It seems that reshaping the data into long format could be helpful, but I am still stuck.
Here's the minimal data set, already reshaped:
product_id <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)
p1 <- c(0, 0, 1, 1, 0, 0, 1, 0, 0)
p2 <- c(1, 0, 1, 0, 1, 0, 1, 1, 0)
p3 <- c(0, 0, 1, 1, 0, 1, 0, 1, 1)
df1 <- data.frame(product_id, p1, p2, p3)
library(reshape2)
df2 <- melt(df1, id.vars = "product_id",
            measure.vars = grep("^p[0-9]", names(df1), value = TRUE),
            variable.name = "p",
            value.name = "p-active")

There are dozens of ggplot2 tutorials, but I'm feeling generous:
library(ggplot2)
ggplot(df2,
       # map columns to aesthetics:
       aes(x = p, y = `p-active`,
           # important to use a factor for discrete values:
           fill = factor(product_id),
           color = factor(product_id))) +
  # summarize data (`fun` replaces `fun.y`, deprecated since ggplot2 3.3.0):
  stat_summary(fun = sum,
               # the geom:
               geom = "bar",
               # positioning:
               position = "dodge")

I'm not sure I understood exactly what you want, but I'll give it a try:
I changed the reshaping a bit, because it is not a good idea to use - in the name of a data frame column:
df2 <- melt(df1, id.vars = "product_id",
measure.vars = grep("^p[0-9]", names(df1), value = TRUE),
variable.name = "p",
value.name = "p_active")
The next step is to sum up the values in p_active per value for p and product_id:
library(dplyr)
df2_summed <- group_by(df2, product_id, p) %>%
summarise(p_active_summed = sum(p_active))
And finally, I create the plot:
library(ggplot2)
ggplot(df2_summed, aes(x = p, y = p_active_summed, fill = as.factor(product_id))) +
geom_col()
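As a quick sanity check on the bar heights, the same sums can be computed without any plotting. This is only a sketch in base R using the question's `df1`; the names `sums` and `totals` are introduced here for illustration.

```r
product_id <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)
p1 <- c(0, 0, 1, 1, 0, 0, 1, 0, 0)
p2 <- c(1, 0, 1, 0, 1, 0, 1, 1, 0)
p3 <- c(0, 0, 1, 1, 0, 1, 0, 1, 1)
df1 <- data.frame(product_id, p1, p2, p3)

# Sum every p-column within each product_id: these are the individual
# segments (stacked plot) or individual bars (dodged plot)
sums <- aggregate(cbind(p1, p2, p3) ~ product_id, data = df1, FUN = sum)
sums

# Sum every p-column over all products: total height of each stacked bar
totals <- colSums(df1[, c("p1", "p2", "p3")])
totals
```

If the plot and these tables disagree, the reshaping step is the first place to look.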

Related

R multiple boxplots in one plot

I have a question regarding multiple boxplots. Assume we have data structures like this:
a <- rnorm(100, 0, 1)
b <- rnorm(100, 0, 1)
c <- rbinom(100, 1, 0.5)
My task is to create a boxplot of a and b for each group of c. However, it needs to be in the same plot. Ideally: Boxplot for a and b side by side for group 0 and next to it boxplot for a and b for group 1 and all together in one graphic.
I tried several things, but only the separate plots work:
boxplot(a~as.factor(c))
boxplot(b~as.factor(c))
But that's not what I'm looking for, as it has to be one plot.
You can use the tidyverse package for this. Transform your data into long format so that you get three variables: "names", "values" and "group". After that you can plot your boxplots with ggplot():
value_a <- rnorm(100, 0, 1)
value_b <- rnorm(100, 0, 1)
group <- as.factor(rbinom(100, 1, 0.5))
data <- data.frame(value_a,value_b,group)
library(tidyverse)
data %>%
pivot_longer(value_a:value_b, names_to = "names", values_to = "values") %>%
ggplot(aes(y = values, x = group, fill = names))+
geom_boxplot()
Created on 2022-08-19 with reprex v2.0.2
Another option using lattice package with bwplot function:
library(tidyr)
a <- rnorm(100, 0, 1)
b <- rnorm(100, 0, 1)
c <- rbinom(100, 1, 0.5)
df <- data.frame(a = a,
b = b,
c = c)
# make longer dataframe
df_long <- pivot_longer(df, cols = -c)
library(lattice)
bwplot(value ~ name | as.factor(c), df_long)
Created on 2022-08-19 with reprex v2.0.2
Noah has already given the ggplot2 answer, which would also be my go-to option. As you used the boxplot function in the question, this is how to approach it with boxplot. You should probably stay consistently within base graphics or within ggplot2 for your publication/presentation.
First we transform the data to a long format (here an option without additional packages):
a <- rnorm(100, 0, 1)
b <- rnorm(100, 0, 1)
c <- rbinom(100, 1, 0.5)
d <- data.frame(a, b, c)
d <- cbind(stack(d, select = c("a", "b")), c)
giving us
> head(d)
values ind c
1 -0.66905293 a 0
2 -0.28778381 a 0
3 0.29148347 a 1
4 0.81380406 a 0
5 -0.85681913 a 0
6 -0.02566758 a 0
With which we can then call boxplot:
boxplot(values ~ ind + c, data = d, at = c(1, 2, 4, 5))
The at argument controls the grouping and placement of the boxes. Unlike in ggplot2, you need to choose the placement manually, but in return you get fine control over the spacing.
Slightly refined version of the plot:
boxplot(values ~ ind + c, data = d, at = c(1, 2, 4, 5),
col = c(2, 4), show.names = FALSE,
xlab = "")
axis(1, labels= c("c = 0", "c = 1"), at = c(1.5, 4.5))
legend("topright", fill = c(2, 4), legend = c("a", "b"))
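With more variables or groups, writing the `at` vector by hand gets tedious. The sketch below computes the positions instead; `box_positions` is a hypothetical helper made up for this example (it is not part of base R), and the data are simulated as in the question.

```r
# Hypothetical helper: x positions for `n_box` boxes in each of `n_grp`
# groups, leaving `gap` empty slots between neighbouring groups
box_positions <- function(n_box, n_grp, gap = 1) {
  offsets <- (seq_len(n_grp) - 1) * (n_box + gap)
  as.vector(outer(seq_len(n_box), offsets, "+"))
}

pos <- box_positions(2, 2)  # same positions as at = c(1, 2, 4, 5)

# Simulated data in the same long layout as above
set.seed(1)
d <- data.frame(a = rnorm(100), b = rnorm(100), c = rbinom(100, 1, 0.5))
d <- cbind(stack(d, select = c("a", "b")), c = d$c)

boxplot(values ~ ind + c, data = d, at = pos)
```

Increasing `gap` widens the space between the groups without touching the spacing within a group.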

Plot linegraph of dataset with variable number of column

Depending on the input of my code, a varying number of columns are populated. I am attempting to create a loop which will plot only the columns that are populated. However, I am struggling, as the loop keeps overwriting and only the last line plotted is retained on the graph. I thought printing the ggplot would help, but sadly not!
plot <- ggplot(plottable, aes(x = Date))
####for (i in 2:ncol(plottable)) {
for (i in 2:ncol(plottable)) {
Exposure <- assign(colnames(plottable)[i],plottable[,i])
plot <- plot +
geom_line(aes(y=Exposure, color = colnames(plottable)[i]))
print(plot)
}
plot
Data
structure(list(Date = structure(c(18078, 18079, 18080, 18081, 18082), class = "Date"), Zone9 = c(0, 0, 0, 0, 0), Zone6 = c(0, 0, 0, 0, 0), Zone4 = c(0, 0, 0, 0, 0), Zone3 = c(0, 969698.444, 969698.444, 969698.444, 969698.444), Zone2 = c(0, 0, 0, 0, 0), Zone11 = c(0, 15560719.2483794, 15560719.2483794, 15560719.2483794, 15560719.2483794), Zone10 = c(0, 2208064.625714, 2208064.625714, 2208064.625714, 2208064.625714), Zone1 = c(0, 0, 0, 0, 0)), row.names = c(NA, 5L), class = "data.frame")
Personally I would follow the approach using tidyr::pivot_longer as outlined by @AndyEggers. Nonetheless, if you don't want to reshape your dataset, you could add multiple geom layers to a plot using e.g. lapply or purrr::map like so:
ggplot(plottable, aes(x = Date)) +
lapply(names(plottable)[!names(plottable) %in% "Date"], function(x) {
geom_line(aes(y=.data[[x]], color = x))
})
Making use of ggplot2::economics as example data:
library(ggplot2)
ggplot(economics, aes(x = date)) +
lapply(names(economics)[!names(economics) %in% "date"], function(x) {
geom_line(aes(y=.data[[x]], color = x))
})
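The loop in the question most likely fails because aes() does not evaluate its arguments when the layer is created: `Exposure` and `colnames(plottable)[i]` are only looked up when the plot is drawn, by which time they hold the values from the last iteration. lapply avoids this because every call of the function gets its own `x`. If a for loop is preferred, one option is to inject the current value with `!!`. This is a sketch; the small `plottable` below is a made-up stand-in with the same shape as the question's data.

```r
library(ggplot2)

# Made-up stand-in for `plottable` (same layout as in the question)
plottable <- data.frame(
  Date  = as.Date("2019-07-01") + 0:4,
  Zone1 = c(0, 1, 2, 3, 4),
  Zone2 = c(4, 3, 2, 1, 0)
)

p <- ggplot(plottable, aes(x = Date))
for (col in setdiff(names(plottable), "Date")) {
  # `!!col` inserts the current column name into the aesthetic immediately,
  # so each layer keeps its own column instead of the last value of `col`
  p <- p + geom_line(aes(y = .data[[!!col]], color = !!col))
}
p
```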
Your approach looks like something I would have tried before I got comfortable with tidyr and ggplot. I suggest a different approach that makes better use of these tools:
library(tidyverse)
plottable %>%
  pivot_longer(cols = -Date) %>%
  ggplot(aes(x = Date, y = value, col = name)) +
  geom_line()
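If "populated" is taken to mean columns with at least one non-zero value (an assumption; the question does not define it), those columns could be selected before reshaping or plotting. A base R sketch on a cut-down, made-up version of the data; `value_cols`, `populated` and `plottable_populated` are names introduced here:

```r
# Cut-down stand-in for `plottable`: one all-zero and one populated column
plottable <- data.frame(
  Date  = as.Date("2019-07-01") + 0:4,
  Zone9 = c(0, 0, 0, 0, 0),
  Zone3 = c(0, 969698.444, 969698.444, 969698.444, 969698.444)
)

value_cols <- setdiff(names(plottable), "Date")

# Keep a column only if it contains at least one non-zero value
is_populated <- vapply(plottable[value_cols],
                       function(x) any(x != 0, na.rm = TRUE),
                       logical(1))
populated <- value_cols[is_populated]

plottable_populated <- plottable[c("Date", populated)]
names(plottable_populated)
```

The filtered data frame can then go through the same pivot_longer and ggplot steps.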

Multiple horizontal barplots in one chart

I want to have two charts containing multiple horizontal bar graphs, each showing mean values of one of the two variables: fear and expectation. The bar graphs should be grouped by the dummies.
I have created single bar graphs with the mean values of fear and expectation grouped by each of the dummies but I don't know how to combine them properly.
x = data.frame(
id = c(1, 2, 3, 4, 5),
sex = c(1, 0, 1, 0, 1),
migration = c(0, 1, 0, 1, 0),
handicap = c(0, 1, 1, 1, 0),
east = c(0, 1, 1, 1, 0),
fear = c(1, 3, 4, 6, 3),
expectation = c(2, 3, 2, 5, 4))
I want to have it look like this basically:
https://ibb.co/3fz0GQ4
Any help would be greatly appreciated.
To get to the plot you show, you will need to reshape your data a bit:
library(tidyverse)
x2 <- x%>%
gather(fear, expectation, key = "group", value = "value")%>%
gather(sex, migration, handicap, east, key = "dummies", value = "dum_value")%>%
group_by(group, dummies, dum_value)%>%
summarize(prop = mean(value))
Then you can easily get to the plot:
x2%>%
ggplot(aes(y= prop, x = dummies, fill = factor(dum_value)))+
geom_bar(stat = "identity", position = "dodge")+
coord_flip()+
facet_wrap(~group)
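gather() still works, but tidyr now recommends pivot_longer() for new code. The same reshaping could be sketched as follows, assuming tidyr 1.0 or later and dplyr 1.0 or later; the column names match the answer above.

```r
library(dplyr)
library(tidyr)

x <- data.frame(
  id = c(1, 2, 3, 4, 5),
  sex = c(1, 0, 1, 0, 1),
  migration = c(0, 1, 0, 1, 0),
  handicap = c(0, 1, 1, 1, 0),
  east = c(0, 1, 1, 1, 0),
  fear = c(1, 3, 4, 6, 3),
  expectation = c(2, 3, 2, 5, 4))

x2 <- x %>%
  # stack the two outcome variables
  pivot_longer(c(fear, expectation),
               names_to = "group", values_to = "value") %>%
  # stack the four dummies
  pivot_longer(c(sex, migration, handicap, east),
               names_to = "dummies", values_to = "dum_value") %>%
  group_by(group, dummies, dum_value) %>%
  # mean of each outcome per dummy level; drop the grouping afterwards
  summarise(prop = mean(value), .groups = "drop")
x2
```

The result has one row per outcome, dummy and dummy level, so it can be passed to the plotting code unchanged.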

Project R - Barplot of occurrences of levels

I am struggling with some plots. I have a really big data.frame with some entries. To get an overview I will work with some test data.
Let's assume the following data:
Sender <- c("ARD", "ZDF", "ARD", "ARD", "ZDF", "ZDF", "ARD")
Akz <- as.factor(c(0, 1, 1, 0, 0, 1, 1))
NAkz <- as.factor(c(1, 1, 1, 0, 0, 0, 0))
data <- data.frame(Sender, Akz, NAkz)
I want to get a (stacked) barplot grouped by the column "Person". So for each person I want to show the occurrences in the columns "A" and "NA". That is, one bar represents the column "A" with 3 "0"s and 4 "1"s, and next to this bar I want the column "NA" with 4 "0"s and 3 "1"s. It would be great if there were a way to add a legend and the total count of each level.
Thanks and all the best
Peter
PS: I found a picture which illustrates a nice barplot, but I am not able to recreate it since it works with integers and total amounts.
Your data is a bit messed up, I trust this is what you wanted to post:
data:
Person <- c("ARD", "ZDF", "ARD", "ARD", "ZDF", "ZDF", "ARD")
Akzept <- as.factor(c(0, 1, 1, 0, 0, 1, 1))
NAkzept <- as.factor(c(1, 1, 1, 0, 0, 0, 0))
df <- data.frame(Person, Akzept, NAkzept)
The key to plotting in ggplot2 is to arrange the data in long format achieved by the function gather:
library(tidyverse)
df %>%
gather(var, val, Akzept:NAkzept) %>%
ggplot()+
geom_bar(aes(x = interaction(var, Person), fill = val))
or perhaps:
df %>%
gather(var, val, Akzept:NAkzept) %>%
ggplot()+
geom_bar(aes(x = Person, fill = val))+
facet_wrap(~var)
with text:
df %>%
gather(var, val, Akzept:NAkzept) %>%
ggplot()+
geom_bar(aes(x = Person, fill = val))+
geom_text(stat = "count", aes(label = after_stat(count), x = Person, group = val), position = "stack", vjust = 2, hjust = 0.5)+
facet_wrap(~var)
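The numbers printed by geom_text can be cross-checked with plain count tables. A base R sketch using the corrected data from this answer; `counts` and `totals` are names introduced here.

```r
Person  <- c("ARD", "ZDF", "ARD", "ARD", "ZDF", "ZDF", "ARD")
Akzept  <- as.factor(c(0, 1, 1, 0, 0, 1, 1))
NAkzept <- as.factor(c(1, 1, 1, 0, 0, 0, 0))
df <- data.frame(Person, Akzept, NAkzept)

# One Person x level count table per variable: these are the bar segments
counts <- lapply(df[c("Akzept", "NAkzept")],
                 function(v) table(Person = df$Person, level = v))
counts$Akzept
counts$NAkzept

# Totals per level across all persons (rows "0" and "1", one column per variable)
totals <- sapply(df[c("Akzept", "NAkzept")], table)
totals
```

The totals reproduce the 3/4 and 4/3 split described in the question.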

Remove inner margins from lattice plot

Thanks to the excellent answer in "Combine a ggplot2 object with a lattice object in one plot" and some further thoughts I could plot a lattice plot next to a ggplot:
library(ggplot2)
library(lattice)
library(gtools)
library(plyr)
library(grid)
library(gridExtra)
set.seed(1)
mdat <- data.frame(x = rnorm(100), y = rnorm(100), veryLongName = rnorm(100),
cluster = factor(sample(5, 100, TRUE)))
cols <- c("x", "y", "veryLongName")
allS <- adply(combinations(3, 2, cols), 1, function(r)
data.frame(cluster = mdat$cluster,
var.x = r[1],
x = mdat[[r[1]]],
var.y = r[2],
y = mdat[[r[2]]]))
sc <- ggplot(allS, aes(x = x, y = y, color = cluster)) + geom_point() +
facet_grid(var.x ~ var.y) + theme(legend.position = "top")
sc3d <- cloud(veryLongName ~ x + y, data = mdat, groups = cluster)
scG <- ggplotGrob(sc)
sc3dG <- gridExtra:::latticeGrob(sc3d)
ids <- grep("axis-(l|b)-(1|2)|panel", scG$layout$name)
scG$grobs[ids[c(2, 5, 8)]] <- list(nullGrob(), nullGrob(), nullGrob())
grid.newpage()
grid.draw(scG)
pushViewport(viewport(0, 0, width = .515, height = .46,
just = c("left", "bottom")))
grid.rect()
grid.draw(sc3dG)
As you can see in the picture, there is quite some margin around the lattice plot and, on top of that, the axis label for the z-axis is cut off (which is not the case if I plot the lattice plot alone).
So how can I get rid of this behaviour, i.e. how can I solve the following two problems:
Get rid of the inner margin between the viewport and the lattice plot
Avoid that the label in the lattice plot is cut.
I tried to play with the clip option of the viewport but without success. So, what to do?
Update 2020
Edited code and answer to reflect new naming convention in the grob.
Those settings are probably somewhere in ?xyplot, but I find it's faster to read the internet:
theme.novpadding <-
list(layout.heights =
list(top.padding = 0,
main.key.padding = 0,
key.axis.padding = 0,
axis.xlab.padding = 0,
xlab.key.padding = 0,
key.sub.padding = 0,
bottom.padding = 0),
axis.line = list(col = 0),
clip =list(panel="off"),
layout.widths =
list(left.padding = 0,
key.ylab.padding = 0,
ylab.axis.padding = 0,
axis.key.padding = 0,
right.padding = 0))
sc3d <- cloud(veryLongName ~ x + y, data = mdat, groups = cluster,
par.settings = theme.novpadding )
scG <- ggplotGrob(sc)
sc3dG <- grobTree(gridExtra:::latticeGrob(sc3d),
rectGrob(gp=gpar(fill=NA,lwd=1.2)))
ids <- grep("axis-(l|b)-(1|2)|panel", scG$layout$name)
scG$grobs[ids[c(5, 2, 8)]] <- list(nullGrob(), sc3dG, nullGrob())
grid.newpage()
grid.draw(scG)
