Summarise a column and add its common values in R

I have a dataframe something like this and my end goal is to make a bar chart.
Here is the data frame.
a 5
a 7
b 23
b 12
c 21
c 21
c 27
I want to group the data frame by the first column, add up the values of the 2nd column within each group, and make a bar chart of those sums. The resulting data frame should be:
a 12
b 35
c 69
I tried something like this but it does not work:
d %>%
group_by(V1) %>%
summarise(V2) %>%
ggplot(aes(x = V1, y = V2)) + geom_col()+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

A simple base R option using barplot + aggregate
barplot(SumValue ~ ., aggregate(cbind(SumValue = Value) ~ ., df, sum))

Seems to be pretty straightforward. Let me know if this helps.
library(dplyr)
library(ggplot2)
#Converting your values into a dataframe
data <- data.frame("Key" = c("a","a","b","b","c","c","c"), "Value" = c(5,7,23,12,21,21,27))
data <- data %>%
group_by(Key) %>%
summarise(Value = sum(Value))
#Plot
ggplot(data, aes(x=Key, y=Value))+
geom_bar(stat="identity")
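For comparison, here is a minimal sketch of the same idea applied to the pipeline from the question. It assumes the data frame is called d and has columns V1 and V2 (names taken from the question's attempt; the construction of d below is my own). The likely missing piece is the aggregation function inside summarise().

```r
library(dplyr)
library(ggplot2)

# Assumed reconstruction of the question's data with the column names V1 and V2
d <- data.frame(V1 = c("a", "a", "b", "b", "c", "c", "c"),
                V2 = c(5, 7, 23, 12, 21, 21, 27))

# Sum V2 within each level of V1; summarise() needs an aggregating expression
totals <- d %>%
  group_by(V1) %>%
  summarise(V2 = sum(V2))

# Same plotting layers as in the question
p <- ggplot(totals, aes(x = V1, y = V2)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
```

Printing totals should show one row per key, and print(p) draws the chart.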

Related

Adding character values of a column in R

I have two columns, square_id and Smart_Nsmart, as given below.
I want to count the N's and S's for each square_id and plot the result with ggplot, i.e. plot square_id vs Smart_Nsmart.
square_id: 1 1 2 2 2 2 3 3 3 3
Smart_Nsmart: S N N N S S N S S S
We can use count and then use ggplot to plot the frequency. Here, we are plotting it with geom_bar (as it is not clear from the OP's post)
library(dplyr)
library(ggplot2)
df %>%
count(square_id, Smart_Nsmart) %>%
ggplot(., aes(x= square_id, y = n, fill = Smart_Nsmart)) +
geom_bar(stat = 'identity')
The above answer is very smart. However, instead of the count function, you can use group_by and summarise, in case you later want to apply some other function in the same step.
library(dplyr)
library(ggplot2)
dff <- data.frame(a=c(1,1,1,1,2,1,2),b=c("C","C","N","N","N","C","N"))
dff %>%
group_by(a,b) %>%
summarise(n = length(b) ) %>%
ggplot(., aes(x= a, y = n, fill = b)) +
geom_bar(stat = 'identity')
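As a rough check that the two approaches agree, here is a small sketch on the question's data. It assumes the two value lists in the question pair up position by position (that pairing is my reading of the post, not something it states), and the object names by_count and by_summarise are only for illustration.

```r
library(dplyr)

# Assumed pairing: the i-th square_id goes with the i-th Smart_Nsmart value
df <- data.frame(
  square_id    = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  Smart_Nsmart = c("S", "N", "N", "N", "S", "S", "N", "S", "S", "S")
)

# Approach 1: count() tallies rows per combination
by_count <- df %>% count(square_id, Smart_Nsmart)

# Approach 2: group_by() + summarise(), easy to extend with other summaries
by_summarise <- df %>%
  group_by(square_id, Smart_Nsmart) %>%
  summarise(n = n(), .groups = "drop")
```

Either result can be piped into the ggplot call shown above.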

Plot divergent stacked bar chart with ggplot2

Is there a way to use ggplot2 to create divergent stacked bar charts like the one on the right-hand side of the image below?
Data for reproducible example
library(ggplot2)
library(scales)
library(reshape)
dat <- read.table(text = " ONE TWO THREE
1 23 234 324
2 34 534 12
3 56 324 124
4 34 234 124
5 123 534 654",sep = "",header = TRUE)
# reshape data
datm <- melt(cbind(dat, ind = rownames(dat)), id.vars = c('ind'))
# plot
ggplot(datm,aes(x = variable, y = value,fill = ind)) +
geom_bar(position = "fill",stat = "identity") +
coord_flip()
Sure, positive values stack positively, negative values stack negatively. Don't use position fill. Just define what you want as negative values, and actually make them negative. Your example only has positive scores. E.g.
ggplot(datm, aes(x = variable, y = ifelse(ind %in% 1:2, -value, value), fill = ind)) +
geom_col() +
coord_flip()
If you want to also scale to 1, you need some preprocessing:
library(dplyr)
datm %>%
group_by(variable) %>%
mutate(value = value / sum(value)) %>%
ggplot(aes(x = variable, y = ifelse(ind %in% 1:2, -value, value), fill = ind)) +
geom_col() +
coord_flip()
An extreme approach might be to calculate the boxes yourself. Here's one method
dd <- datm %>% group_by(variable) %>%
arrange(desc(ind)) %>%
mutate(pct = value/sum(value), right = cumsum(pct), left=lag(right, default=0))
then you can plot with
ggplot(dd) +
geom_rect(aes(xmin=right, xmax=left, ymin=as.numeric(variable)-.4, ymax=as.numeric(variable)+.4, fill=ind)) +
scale_y_continuous(labels=levels(dd$variable), breaks=1:nlevels(dd$variable))
to get the left plot. To get the right plot, you just shift the boxes a bit. This will line up all the right edges of the ind 3 boxes.
ggplot(dd %>% group_by(variable) %>% mutate(left=left-right[ind==3], right=right-right[ind==3])) +
geom_rect(aes(xmin=right, xmax=left, ymin=as.numeric(variable)-.4, ymax=as.numeric(variable)+.4, fill=ind)) +
scale_y_continuous(labels=levels(dd$variable), breaks=1:nlevels(dd$variable))
So maybe overkill here, but you have a lot of control this way.
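To make the "scale to 1" preprocessing easier to inspect, here is a small sketch that computes the signed shares as a separate column before plotting. It rebuilds the example data by hand and reshapes it with tidyr::pivot_longer instead of reshape::melt; the column names share and signed_share are my own, and treating ind 1 and 2 as the negative side follows the answer above.

```r
library(dplyr)
library(tidyr)

# Example data from the question, with the row names kept as an explicit column
dat <- data.frame(ind   = as.character(1:5),
                  ONE   = c(23, 34, 56, 34, 123),
                  TWO   = c(234, 534, 324, 234, 534),
                  THREE = c(324, 12, 124, 124, 654))

signed <- dat %>%
  pivot_longer(-ind, names_to = "variable", values_to = "value") %>%
  group_by(variable) %>%
  mutate(share = value / sum(value),                               # scale each bar to 1
         signed_share = ifelse(ind %in% c("1", "2"), -share, share)) %>%  # flip the chosen groups
  ungroup()

# Each bar still has total width 1; only its position relative to zero changes
check <- signed %>%
  group_by(variable) %>%
  summarise(total_width = sum(abs(signed_share)),
            left_extent = sum(signed_share[signed_share < 0]))
```

Mapping y = signed_share in geom_col() should then give the same picture as the ifelse() inside aes().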

R barplot with grouping

I checked the ggplot documentation, and it looks like I need to transpose my data to build a bar plot. Or is there still a way to use this df as it is, so that I can use the column names as groupings? I mocked up an image for the demo below. In ggplot I still can't see where the groupings are specified; can they be listed with commas? Thanks, all.
df1 <- data.frame(yy=2017, F1=23, F2=40, F3=4)
df2 <- data.frame(yy=2018, F1=16, F2=90, F3=8)
df <- rbind(df1,df2)
df
yy F1 F2 F3
1 2017 23 40 4
2 2018 16 90 8
ggplot(df, aes(F1, yy)) + ## this is just bad sample
geom_bar(aes(fill = yy), stat = "identity", position = "dodge")
library(tidyverse)
df1 <- data.frame(yy=2017, F1=23, F2=40, F3=4)
df2 <- data.frame(yy=2018, F1=16, F2=90, F3=8)
df <- rbind(df1,df2)
df %>%
gather(type,value,-yy) %>% # reshape data
mutate(yy = factor(yy)) %>% # update variable to a factor
ggplot(aes(type, value, fill=yy)) +
geom_bar(stat = "identity", position = "dodge")
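The same reshape can also be written with tidyr::pivot_longer, which the tidyr documentation recommends over gather for new code. This is only a sketch of that variant; the object name long is my own.

```r
library(tidyr)
library(ggplot2)

df <- data.frame(yy = c(2017, 2018),
                 F1 = c(23, 16),
                 F2 = c(40, 90),
                 F3 = c(4, 8))

# One row per (yy, type) pair; the old column names become the values of "type"
long <- pivot_longer(df, cols = -yy, names_to = "type", values_to = "value")
long$yy <- factor(long$yy)  # discrete fill colours instead of a gradient

p <- ggplot(long, aes(type, value, fill = yy)) +
  geom_col(position = "dodge")  # geom_col() is geom_bar(stat = "identity")
```

The column names F1, F2, F3 end up on the x axis, with one dodged bar per year.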

Is the merging of data frames necessary here?

I have a data frame and I would like to plot 3 lines, all from the "value" vector. The first two lines are the value vector grouped by "group", and the 3rd line is the ungrouped value vector. The way I am currently doing it is by making 2 calls to dplyr and creating 2 data frames, then merging them and plotting the merged data frame. Is there an easier way that avoids 2 calls to dplyr?
d = data.frame(ym = rep(c(20011,20012,20023),3), group = c(0,0,1,0,1,0,1,0,1), value = c(1,2,3,4,2,1,3,3,2))
############### 1st call to dplyr to create plot with 2 lines grouped by "group"
d2 = d %>%
group_by(ym,group) %>%
summarise(
Value = mean(value)
)
d2= as.data.frame(d2)
d2
ggplot(data=d2 , aes(x=ym, y=Value, group=as.factor(group), colour = as.factor(group))) +
geom_line() + geom_point()
###second call to dplyr to create a second data frame just for the UNGROUPED data
d3 = d %>%
group_by(ym) %>%
summarise(
Value = mean(value)
)
#### merge the two data frames
d3 =as.data.frame(d3)
d3$group=2
d4 = rbind(d2,d3)
### plot all 3 lines
ggplot(data=d4 , aes(x=ym, y=Value, group=as.factor(group), colour = as.factor(group))) +
geom_line() + geom_point()
You could do it in a single dplyr chain, but (AFAIK) it still requires two separate operations:
d2 = bind_rows(
d %>%
group_by(ym, group=as.character(group)) %>%
summarise(Value = mean(value)),
d %>%
group_by(ym) %>%
summarise(Value = mean(value),
group = "All"))
The code group=as.character(group) is necessary to avoid an error when you add group="All", because bind_rows won't automatically coerce group from numeric to character. (This step is of course unnecessary in cases where the grouping column is already factor or character.)
Then, for plotting you can highlight the average line so that it's separate from the individual groups. We map to shape solely to be able to remove the point markers for the All line:
ggplot(d2 , aes(x=ym, y=Value, colour=group)) +
geom_line(aes(size=group)) +
geom_point(aes(shape=group)) +
scale_color_manual(values=c(hcl(c(15,195),100,65), "black")) +
scale_shape_manual(values=c(16,16,NA)) +
scale_size_manual(values=c(0.7,0.7,1.5))
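If the goal is a single summarise call, one possible variant (a sketch, not necessarily simpler) is to stack a copy of the raw data labelled "All" under the original and then group once. The names stacked and d_all are my own.

```r
library(dplyr)

d <- data.frame(ym    = rep(c(20011, 20012, 20023), 3),
                group = c(0, 0, 1, 0, 1, 0, 1, 0, 1),
                value = c(1, 2, 3, 4, 2, 1, 3, 3, 2))

# Duplicate the rows: once with their own group, once relabelled as "All"
stacked <- bind_rows(
  mutate(d, group = as.character(group)),
  mutate(d, group = "All")
)

# A single grouped summary now yields the per-group and the overall means
d_all <- stacked %>%
  group_by(ym, group) %>%
  summarise(Value = mean(value), .groups = "drop")
```

This doubles the number of rows before summarising, so for large data the two-summary version above may be preferable.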

Sort stacked bar plot by cumulative value in R

I am pretty new to R and I'm trying to get a stacked bar plot. My data looks like this:
name value1 value2
1 A 1118 239
2 B 647 31
3 C 316 1275
4 D 2064 230
5 E 231 85
I need a horizontal bar graph with stacked values. This is as far as I can get with my limited R skills (and most of that is also copy-pasted):
melted <- melt(data, id.vars=c("name"))
melted$name <- factor(
melted$name,
levels=rev(sort(unique(melted$name))),
ordered=TRUE
)
melted2 <- melted[order(melted$value),]
ggplot(melted2, aes(x= name, y = value, fill = variable)) +
geom_bar(stat = "identity") +
coord_flip()
It even took me several hours to get to this point, with which I am pretty content as far as looks go. This is the produced output:
What I now want to do is to get the bars ordered by summed up value (D is first, followed by C, A, B, E). I googled and tried some reorder and order stuff, but I simply can't get it to behave like I want it to. I'm sure the solution has to be pretty simple, so I hope you guys can help me with this.
Thanks in advance!
Well, I am not keeping up with all the latest changes in ggplot, but here is one way you could remedy this.
I used your idea to set up the factor levels of name, but based on the grouped sums. You might also find order = variable useful at some point, which will order the bar colors based on the variable, but it is not needed here.
data <- read.table(header = TRUE, text = "name value1 value2
1 A 1118 239
2 B 647 31
3 C 316 1275
4 D 2064 230
5 E 231 85")
library('reshape2')
library('ggplot2')
melted <- melt(data, id.vars=c("name"))
melted <- within(melted, {
name <- factor(name, levels = names(sort(tapply(value, name, sum))))
})
levels(melted$name)
# [1] "E" "B" "A" "C" "D"
ggplot(melted, aes(x= name, y = value, fill = variable, order = variable)) +
geom_bar(stat = "identity") +
coord_flip()
Another option would be to use the dplyr package to set up a total column in your data frame and use that to sort.
The approach would look something like this.
m <- melted %>% group_by(name) %>%
mutate(total = sum(value) ) %>%
ungroup() %>%
arrange(total) %>%
mutate(name = factor(name, levels = unique(as.character(name))) )
ggplot(m, aes(x = name, y = value, fill = variable)) + geom_bar(stat = 'identity') + coord_flip()
You can also try the code below, which uses the tidyr package instead of the reshape2 package.
library(ggplot2)
library(dplyr)
library(tidyr)
data <- read.table(text = "
class value1 value2
A 1118 239
B 647 31
C 316 1275
D 2064 230
E 231 85", header = TRUE)
pd <- gather(data, key, value, -class) %>%
mutate(class = factor(class, levels = tapply(value, class, sum) %>% sort %>% names))
pd %>% ggplot(aes(x = class, y = value, fill = key, order = class)) +
geom_bar(stat = "identity") +
coord_flip()
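The answers above all set the factor levels by hand. A shorter sketch of the same idea uses base R's reorder() with FUN = sum directly inside aes(). The long-format data frame is rebuilt by hand here to keep the example self-contained, and the names melted and ordered_name are only for illustration.

```r
library(ggplot2)

data <- data.frame(name   = c("A", "B", "C", "D", "E"),
                   value1 = c(1118, 647, 316, 2064, 231),
                   value2 = c(239, 31, 1275, 230, 85))

# Long format equivalent to melt(data, id.vars = "name")
melted <- data.frame(name     = rep(data$name, 2),
                     variable = rep(c("value1", "value2"), each = nrow(data)),
                     value    = c(data$value1, data$value2))

# reorder() sorts the levels of name by the per-name sum of value (ascending)
ordered_name <- reorder(melted$name, melted$value, FUN = sum)
levels(ordered_name)
# [1] "E" "B" "A" "C" "D"

p <- ggplot(melted, aes(x = reorder(name, value, FUN = sum), y = value, fill = variable)) +
  geom_col() +
  coord_flip() +
  labs(x = "name")
```

With coord_flip(), the last level is drawn at the top, so D should appear first, followed by C, A, B, E.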
