ggplot2: unique() does not work properly with dplyr piping

I have a problem with the unique() function when piping with dplyr. With my simple example code, this works fine:
category <- as.factor(c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4))
quality <- as.factor(c(0, 1, 2, 3, 3, 0, 0, 1, 3, 2, 2, 2, 1, 0, 3, 2, 3, 3, 1, 0, 2, 1))
mydata <- data.frame(category, quality)
This adjusts my data frame so that it is easier to work with and to produce a nice plot:
library(dplyr)
mydata2 <- mydata %>%
  group_by(category, quality) %>%
  mutate(count_q = n()) %>%
  ungroup() %>%
  group_by(category) %>%
  mutate(tot_q = n(), pc = count_q * 100 / tot_q) %>%
  unique() %>%
  arrange(category)
library(ggplot2)
myplot <- ggplot(mydata2, aes(x = category, y = pc, fill = quality)) +
  geom_col() +
  geom_text(aes(
    x = category,
    y = pc,
    label = round(pc, digits = 1),
    group = quality),
    position = position_stack(vjust = .5)) +
  ggtitle("test") +
  xlab("cat") +
  ylab("%") +
  labs(fill = "quality")   # labs() needs a named argument; fill sets the legend title
myplot
The result looks exactly like I want. [plot not included]
However, with my actual data, the same code produces a mess. [plot not included]
I did find a workaround: when I add the following line and use the new mydata.unique as the basis for my ggplot, it works exactly as with my example data. For some reason this is not needed with the example data, whereas with my actual data the unique() inside the pipe seems to do nothing.
mydata.unique <- unique(mydata2[c("quality","category", "count_q", "tot_q", "pc")])
What I don't understand is why I need to add the above line. Obviously I can't share my actual data, but maybe someone understands what is going on anyway. Could it have to do with other (irrelevant) columns in the data that unique() can't handle?
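To see why the piped unique() can appear to do nothing, here is a minimal base R sketch. It assumes the real data has at least one extra column whose values differ from row to row; the hypothetical row_id column stands in for it. unique() on a data frame compares entire rows, so that one column keeps every row distinct, while restricting the comparison to the plotted columns (as the mydata.unique line does) lets the duplicates collapse. The statistics are computed with ave() instead of dplyr only so the sketch runs without extra packages.

```r
category <- as.factor(c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4))
quality  <- as.factor(c(0, 1, 2, 3, 3, 0, 0, 1, 3, 2, 2, 2, 1, 0, 3, 2, 3, 3, 1, 0, 2, 1))
mydata <- data.frame(category, quality)

# Hypothetical extra column: a per-row ID standing in for the
# "irrelevant" columns of the real data
mydata$row_id <- seq_len(nrow(mydata))

# Same statistics as the two mutate() steps, computed with base R ave()
rows <- seq_len(nrow(mydata))
mydata$count_q <- ave(rows, mydata$category, mydata$quality, FUN = length)
mydata$tot_q   <- ave(rows, mydata$category, FUN = length)
mydata$pc      <- mydata$count_q * 100 / mydata$tot_q

plot_cols <- c("category", "quality", "count_q", "tot_q", "pc")

n_all <- nrow(unique(mydata))             # 22: row_id makes every row unique
n_sub <- nrow(unique(mydata[plot_cols]))  # 14: one row per category/quality pair

# Closer to distinct(..., .keep_all = TRUE): deduplicate on the plotted
# columns but keep every column, taking the first matching row
kept <- mydata[!duplicated(mydata[plot_cols]), ]

print(c(n_all = n_all, n_sub = n_sub, n_kept = nrow(kept)))
```

If the real data behaves like this, any column outside the plotted ones is enough to defeat unique(), which would explain why the example data (with only category and quality) works and the full data does not.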

Try distinct() instead of unique(). In this case, you probably also want summarise() instead of mutate() + distinct().

If your original df has more variables, try this:
mydata2 <- mydata %>%
  group_by(category, quality) %>%
  mutate(count_q = n()) %>%
  ungroup() %>%
  group_by(category) %>%
  mutate(tot_q = n(), pc = count_q * 100 / tot_q) %>%
  distinct(category, quality, count_q, tot_q, pc, .keep_all = TRUE) %>%
  arrange(category)
Or, as @adalvarez mentioned, replace mutate() with summarise().
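With the summarise() route, the aggregation yields one row per category/quality pair directly, so no deduplication step is needed. One detail to watch: after summarising by category and quality, the per-category total has to be the sum of the counts, because n() at that point would count quality levels rather than original rows. Below is a base R sketch of that aggregation, swapped in for dplyr so it runs without packages: table() does the counting and ave() the per-category totals (tab is an illustrative name).

```r
category <- as.factor(c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4))
quality  <- as.factor(c(0, 1, 2, 3, 3, 0, 0, 1, 3, 2, 2, 2, 1, 0, 3, 2, 3, 3, 1, 0, 2, 1))

# One row per category/quality combination, with its count
tab <- as.data.frame(table(category = category, quality = quality),
                     responseName = "count_q")
tab <- tab[tab$count_q > 0, ]   # drop combinations that never occur

# Per-category total = sum of the group counts, then the percentage
tab$tot_q <- ave(tab$count_q, tab$category, FUN = sum)
tab$pc    <- tab$count_q * 100 / tab$tot_q

tab <- tab[order(tab$category, tab$quality), ]
print(tab)
```

The resulting columns have the same names as mydata2, so the ggplot call from the question should accept it unchanged.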

Related

How to include bar with NAs in geom_histogram?

I am trying to create a histogram of a continuous variable (1-10) with a bar a little to the side that says how many NAs are in the vector. I am using geom_histogram() from ggplot2. Here is an example:
v <- data.frame(x=c(1, 2, 3, 4, 3, 2, 3, 4, 5, 3, 2, 1, NA, NA, NA, NA))
ggplot(v, aes(x=x)) +
geom_histogram()
I have looked through the features of the function, but there doesn't seem to be a way to include NAs, and I haven't found an elegant way of doing it in other questions. Thanks for the help.
I don't know if it is a perfect solution, but you can get the count of NAs with dplyr before plotting your data:
library(tidyverse)
v %>%
  count(x) %>%
  ggplot(aes(x = as.factor(x), y = n)) +
  geom_bar(stat = "identity")
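A base R alternative for getting the same counts, swapped in here for illustration: table() with useNA = "ifany" adds an extra cell for missing values. Like the answer above, this treats each distinct value as its own bar instead of binning, which only works because x takes a few whole-number values. The NA cell is relabelled by hand so it prints and plots with a visible name.

```r
v <- data.frame(x = c(1, 2, 3, 4, 3, 2, 3, 4, 5, 3, 2, 1, NA, NA, NA, NA))

# One cell per observed value, plus one for NA when any are present
counts <- table(v$x, useNA = "ifany")

# Replace the missing label with the string "NA" so it shows on an axis
labels <- dimnames(counts)[[1]]
labels[is.na(labels)] <- "NA"
dimnames(counts)[[1]] <- labels

print(counts)
```

Calling barplot(counts) on the result would then draw one bar per value with the NA bar last.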

Sum variables in a dataframe and plot the sum in ggplot

I have a data frame that contains data on the number of TVs and radios owned by survey respondents now and before:
DF <- data.frame(TV_now = as.numeric(c(4, 9, 1, 0, 4, NA)),
                 TV_before = as.numeric(c(4, 1, 2, 4, 5, 2)),
                 Radio_now = as.numeric(c(4, 5, 1, 5, 6, 9)),
                 Radio_before = as.numeric(c(6, 5, 3, 6, 7, 10)))
I want to sum each variable and then create a barplot that shows the number of TVs and radios owned by survey respondents now and before.
I can manually create a new data frame that contains just the sum of each variable in the original DF:
DFsum <- data.frame(TV_now = as.numeric(c(sum(DF$TV_now, na.rm = TRUE))),
                    TV_before = as.numeric(c(sum(DF$TV_before, na.rm = TRUE))),
                    Radio_now = as.numeric(c(sum(DF$Radio_now, na.rm = TRUE))),
                    Radio_before = as.numeric(c(sum(DF$Radio_before, na.rm = TRUE))))
and then use tidyr to do the following:
library(tidyr)
library(ggplot2)
DFsum %>%
  gather(key = Device, value = Number) %>%
  ggplot(aes(x = Number, fill = Device)) +
  geom_bar(aes(x = Device, y = Number), position = "dodge", stat = "identity")
This gives me the result I want, but seems unnecessarily complicated for what should be easy to achieve. Is there an easier way to plot this?
You can simplify your code with dplyr::summarise_all(), since you are summarizing all your columns:
library(tidyverse)
DF %>%
  summarise_all(sum, na.rm = TRUE) %>%
  gather(key = Device, value = Number) %>%
  ggplot(aes(x = Device, y = Number, fill = Device)) +
  geom_bar(position = "dodge", stat = "identity")
Simplify data creation. R knows that 4, 9, 1, etc., are numbers; you don't need as.numeric():
DF <- data.frame(TV_now = c(4, 9, 1, 0, 4, NA),
                 TV_before = c(4, 1, 2, 4, 5, 2),
                 Radio_now = c(4, 5, 1, 5, 6, 9),
                 Radio_before = c(6, 5, 3, 6, 7, 10))
Simplify the data manipulation. Tidy your data (convert it to long format) first, then do other things:
DF_long = gather(DF, key = "device") %>%
  group_by(device) %>%
  summarize(number = sum(value, na.rm = TRUE))
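For comparison, the same "reshape to long, then sum per device" step can be sketched in base R, with stack() standing in for gather() and aggregate() for group_by() + summarize(). The output columns are renamed to device and number to match the tidyverse version; totals is an illustrative name.

```r
DF <- data.frame(TV_now = c(4, 9, 1, 0, 4, NA),
                 TV_before = c(4, 1, 2, 4, 5, 2),
                 Radio_now = c(4, 5, 1, 5, 6, 9),
                 Radio_before = c(6, 5, 3, 6, 7, 10))

# Long format: a 'values' column plus an 'ind' factor naming the source column
long <- stack(DF)

# Sum per device; the formula interface drops NA rows (na.action = na.omit)
totals <- aggregate(values ~ ind, data = long, FUN = sum)
names(totals) <- c("device", "number")
print(totals)
```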
Simplify the plotting. Aesthetics are inherited - you don't need to specify them multiple times. geom_col is preferred to geom_bar with stat = "identity". position = "dodge" does nothing when there is one group per x index.
ggplot(DF_long, aes(x = device, y = number, fill = device)) +
  geom_col()
I generally prefer to do my own data manipulation, but we can also lean on ggplot's stacking of bars to replace the summing, making the entire code:
gather(DF, key = "device", value = "number") %>%
  ggplot(aes(x = device, y = number, fill = device)) +
  geom_col()
Base R approach:
dev = colSums(DF, na.rm = TRUE)
barplot(dev, col = factor(names(dev)))

Which ggplot2 geom should I use?

I have a data frame:
id <- c(1:5)
count_big <- c(15, 25, 7, 0, 12)
count_small <- c(15, 9, 22, 11, 14)
count_black <- c(7, 12, 5, 2, 6)
count_yellow <- c(2, 0, 7, 4, 3)
count_red <- c(8, 4, 4, 2, 5)
count_blue <- c(5, 9, 6, 1, 7)
count_green <- c(8, 9, 7, 2, 5)
df <- data.frame(id, count_big, count_small, count_black, count_yellow, count_red, count_blue, count_green)
How can I display the following in ggplot2 and which geom should I use:
a breakdown of big and small variable by id
a breakdown of colors by id
This is just a subset of the data set that has around 1000 rows.
Can I use this df in ggplot2 as it is, or do I need to transform it into tidy data with tidyr? (I don't know data.table yet.)
You need to first restructure the data from wide to long with tidyr.
library(tidyr)
library(ggplot2)
df <- gather(df, var, value, starts_with("count"))
# remove count_
df$var <- sub("count_", "", df$var)
# plot big vs small
df_size <- subset(df, var %in% c("big", "small"))
ggplot(df_size, aes(x = id, y = value, fill = var)) +
  geom_bar(stat = "identity", position = position_dodge())
# same routine for colors
df_color <- subset(df, !(var %in% c("big", "small")))
ggplot(df_color, aes(x = id, y = value, fill = var)) +
  geom_bar(stat = "identity", position = position_dodge())
stat = "identity" makes geom_bar() use the value column as the bar heights instead of counting rows. position = position_dodge() places the bars next to each other rather than stacking them.
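If you would rather not depend on tidyr, the same wide-to-long reshape (including stripping the count_ prefix) can be sketched in base R. This assumes, as in the example, that every measurement column starts with count_; long is an illustrative name, and the resulting df_size and df_color should work with the same ggplot() calls.

```r
id <- c(1:5)
count_big <- c(15, 25, 7, 0, 12)
count_small <- c(15, 9, 22, 11, 14)
count_black <- c(7, 12, 5, 2, 6)
count_yellow <- c(2, 0, 7, 4, 3)
count_red <- c(8, 4, 4, 2, 5)
count_blue <- c(5, 9, 6, 1, 7)
count_green <- c(8, 9, 7, 2, 5)
df <- data.frame(id, count_big, count_small, count_black, count_yellow,
                 count_red, count_blue, count_green)

count_cols <- grep("^count_", names(df), value = TRUE)

# unlist() walks the columns in order, so ids repeat once per column
# and each variable name repeats once per row
long <- data.frame(
  id    = rep(df$id, times = length(count_cols)),
  var   = rep(sub("count_", "", count_cols), each = nrow(df)),
  value = unlist(df[count_cols], use.names = FALSE)
)

df_size  <- subset(long, var %in% c("big", "small"))
df_color <- subset(long, !(var %in% c("big", "small")))
print(head(long))
```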

Project R - Barplot of occurrences of levels

I am struggling with some plots. I have a really big data.frame with many entries. To illustrate, I will work with some test data.
Let's assume the following data:
Sender <- c("ARD", "ZDF", "ARD", "ARD", "ZDF", "ZDF", "ARD")
Akz <- as.factor(c(0, 1, 1, 0, 0, 1, 1))
NAkz <- as.factor(c(1, 1, 1, 0, 0, 0, 0))
data <- data.frame(Sender, Akz, NAkz)
I want to get a (stacked) barplot grouped by the column "Person". So for each person I want to illustrate the occurrences of the values in the columns "A" and "NA". That means one bar represents the column "A" with 3 "0"s and 4 "1"s, and next to this bar I want the column "NA" with 4 "0"s and 3 "1"s. It would be great to also have a legend and the total count of each level.
Thanks and all the best
Peter
PS: I found a picture that illustrates a cool barplot, but I am not able to create it since it works with integers and total amounts.
Your data is a bit messed up; I assume this is what you wanted to post:
data:
Person <- c("ARD", "ZDF", "ARD", "ARD", "ZDF", "ZDF", "ARD")
Akzept <- as.factor(c(0, 1, 1, 0, 0, 1, 1))
NAkzept <- as.factor(c(1, 1, 1, 0, 0, 0, 0))
df <- data.frame(Person, Akzept, NAkzept)
The key to plotting in ggplot2 is to arrange the data in long format, which the function gather() does:
library(tidyverse)
df %>%
  gather(var, val, Akzept:NAkzept) %>%
  ggplot() +
  geom_bar(aes(x = interaction(var, Person), fill = val))
or perhaps:
df %>%
  gather(var, val, Akzept:NAkzept) %>%
  ggplot() +
  geom_bar(aes(x = Person, fill = val)) +
  facet_wrap(~var)
With count labels:
df %>%
  gather(var, val, Akzept:NAkzept) %>%
  ggplot() +
  geom_bar(aes(x = Person, fill = val)) +
  geom_text(stat = "count", aes(label = after_stat(count), x = Person, group = val),
            position = "stack", vjust = 2, hjust = 0.5) +
  facet_wrap(~var)
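To check the numbers the count labels should show, the same long-format reshape and count can be sketched in base R with xtabs(). This is only a verification aid alongside the ggplot2 answer; long and counts are illustrative names.

```r
Person <- c("ARD", "ZDF", "ARD", "ARD", "ZDF", "ZDF", "ARD")
Akzept <- as.factor(c(0, 1, 1, 0, 0, 1, 1))
NAkzept <- as.factor(c(1, 1, 1, 0, 0, 0, 0))
df <- data.frame(Person, Akzept, NAkzept)

# Long format by hand: one row per (Person, variable, value)
long <- data.frame(
  Person = rep(df$Person, times = 2),
  var    = rep(c("Akzept", "NAkzept"), each = nrow(df)),
  val    = c(as.character(df$Akzept), as.character(df$NAkzept))
)

# Three-way count table: variable x value x Person
counts <- xtabs(~ var + val + Person, data = long)
print(ftable(counts))
```

Summing over Person reproduces the totals from the question: Akzept has three 0s and four 1s, NAkzept four 0s and three 1s.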

Stacked 100% barplots

I would like to visualise proportions of quantities like:
Four values are votes for great/good/moderate/bad
data1 <- c(4, 6, 0, 1)
data2 <- c(2, 0, 1, 15)
using R, as stacked horizontal barplots where each of the two bars (data1 and data2) spans the whole width of the chart and great / good / moderate / bad are in different colours / patterns, like:
XXXXXXXXOOOOOOOOOOOO%%
XX*%%%%%%%%%%%%%%%%%%%
I am using lots of other charts in R (besides automation, another reason to use it!), but I can't grasp how to do this one.
Perhaps something like this:
dat <- data.frame(data1, data2)
barplot(prop.table(as.matrix(dat), margin = 2), horiz = TRUE)
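The prop.table() call is what makes each bar span the full width: with margin = 2, every column is divided by its own total, so each column sums to 1. A small base R sketch; the row names are an assumption added here so the categories have labels.

```r
data1 <- c(4, 6, 0, 1)
data2 <- c(2, 0, 1, 15)

# Row names label the four vote categories
dat <- data.frame(data1, data2,
                  row.names = c("great", "good", "moderate", "bad"))

# Column-wise proportions: each column now sums to 1
props <- prop.table(as.matrix(dat), margin = 2)
print(round(props, 3))
```

With the row names in place, barplot(props, horiz = TRUE, legend.text = TRUE) should also draw a legend, since legend.text = TRUE takes its labels from the row names of the matrix.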
Here's a ggplot2 answer:
library(ggplot2)
data1 <- c(4, 6, 0, 1)
data2 <- c(2, 0, 1, 15)
MyData <- data.frame(DataSource = c(rep("data1", 4), rep("data2", 4)),
                     quality = rep(c("great", "good", "moderate", "bad"), 2),
                     Value = c(data1 / sum(data1), data2 / sum(data2)))
ggplot(data = MyData, aes(DataSource, Value, fill = quality)) + geom_col()
I hope this can point you in the right direction:
data1 <- c(4, 6, 0, 1)
data2 <- c(2, 0, 1, 15)
data3 <- c("great","good","moderate","bad")
df <- data.frame(group1 = data1,group2 = data2, class = data3)
library(reshape2)
library(dplyr)
library(ggplot2)
df <- melt(df, "class")
df <- df %>% group_by(variable) %>% mutate(perc = value / sum(value))
ggplot(df, aes(x = variable, y = perc, fill = class)) +
  geom_bar(stat = "identity") + coord_flip()
