Converting bar chart to pie chart in R - r

I have the following data and code:
dd
grp categ condition value
1 A X P 2
2 B X P 5
3 A Y P 9
4 B Y P 6
5 A X Q 4
6 B X Q 5
7 A Y Q 8
8 B Y Q 2
dput(dd)
structure(list(grp = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L), .Label = c("A", "B"), class = "factor"), categ = structure(c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("X", "Y"), class = "factor"),
condition = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("P",
"Q"), class = "factor"), value = c(2, 5, 9, 6, 4, 5, 8, 2
)), .Names = c("grp", "categ", "condition", "value"), out.attrs = structure(list(
dim = structure(c(2L, 2L, 2L), .Names = c("grp", "categ",
"condition")), dimnames = structure(list(grp = c("grp=A",
"grp=B"), categ = c("categ=X", "categ=Y"), condition = c("condition=P",
"condition=Q")), .Names = c("grp", "categ", "condition"))), .Names = c("dim",
"dimnames")), row.names = c(NA, -8L), class = "data.frame")
ggplot(dd, aes(grp,value, fill=condition))+geom_bar(stat='identity')+facet_grid(~categ)
How can I convert this bar chart to pie charts? I want 4 pies, with the size of each pie corresponding to the height of the respective bar. I tried the following, but neither worked:
ggplot(dd, aes(grp,value, fill=condition))+geom_bar(stat='identity')+facet_grid(~categ)+coord_polar()
ggplot(dd, aes(grp,value, fill=condition))+geom_bar(stat='identity')+facet_grid(~categ)+coord_polar('y')
I also tried to make a pie chart similar to the one in "Pie charts in ggplot2 with variable pie sizes", but I could not get it to work with my data. Thanks for your help.

Using the same idea as in the link you posted, you could add a column size to your data frame that holds the sum of the values for each group, and use that as the width aesthetic:
library(dplyr)
library(ggplot2)
dd <- dd %>% group_by(categ, grp) %>% mutate(size = sum(value))
ggplot(dd, aes(x = size/2, y = value, fill = condition, width = size)) +
  geom_bar(position = "fill", stat = "identity") +
  facet_grid(grp ~ categ) + coord_polar("y")
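As a quick check of what the size column holds, the same per-pie totals can be computed in base R with ave(). This is only a sketch: the data frame is rebuilt by hand from the table in the question, and the names dd_check and pie_sizes are made up for the example.

```r
# Rebuild the question's data by hand (values copied from the printed table).
dd_check <- data.frame(
  grp       = c("A", "B", "A", "B", "A", "B", "A", "B"),
  categ     = c("X", "X", "Y", "Y", "X", "X", "Y", "Y"),
  condition = c("P", "P", "P", "P", "Q", "Q", "Q", "Q"),
  value     = c(2, 5, 9, 6, 4, 5, 8, 2)
)

# Total value per (categ, grp) cell: this is the height of each stacked bar,
# and it is the number the width aesthetic uses to scale each pie.
dd_check$size <- ave(dd_check$value, dd_check$categ, dd_check$grp, FUN = sum)

# One row per pie, showing its size.
pie_sizes <- unique(dd_check[, c("grp", "categ", "size")])
print(pie_sizes)
```

The four sizes should match the heights of the four stacked bars in the original bar chart.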

You want both the group and the category to be faceting variables for the grid, not variables inside any single plot. Here are two different layouts. The x aesthetic can be any single constant, such as factor(1) or a string.
ggplot(dd, aes(x=factor(1),y=value,
fill=condition))+geom_bar(stat='identity')+
facet_grid(~grp+categ)+coord_polar("x")
ggplot(dd, aes(x=factor(1),y=value,
fill=condition))+geom_bar(stat='identity')+
facet_grid(grp~categ)+coord_polar("x")
Something strange happened with the opening at the top here; maybe it's just my interface. This should be enough to get you going, though!

Related

Function for referencing values associated with specific factor values

I have a fairly large list that looks something like this, where the first two variables are stored as factors:
Product Vendor Sales Product sales share
a x 100
b y 200
a y 250
c y 700
a z 150
Ideally, I'd like to create a new column containing each vendor's share of that product's total sales, i.e. Share_{p=a,v=x} = 100/(100+250+150).
I figure lapply() would be viable, but I am not sure how to write the function.
> dput(list)
list(structure(list(Product = structure(c(1L, 2L, 1L, 3L, 1L), .Label = c("a",
"b", "c"), class = "factor"), Vendor = structure(c(1L, 2L, 2L,
2L, 3L), .Label = c("x", "y", "z"), class = "factor"), Sales = c(100,
200, 250, 700, 150)), class = "data.frame", row.names = c(NA,
-5L)))
Using the dplyr package, you could calculate the total sales for each product, then calculate each vendor's share as its sales divided by that total.
library(dplyr)
df %>%
group_by(Product) %>%
mutate(Total_Sales = sum(Sales),
Vendor_Share = Sales/Total_Sales)
A base R approach could use prop.table as an alternative:
df$Vendor_Share <- with(df, ave(Sales, Product, FUN = prop.table))
Output
Product Vendor Sales Vendor_Share
1 a x 100 0.2
2 b y 200 1.0
3 a y 250 0.5
4 c y 700 1.0
5 a z 150 0.3
Data
df <- structure(list(Product = structure(c(1L, 2L, 1L, 3L, 1L), .Label = c("a",
"b", "c"), class = "factor"), Vendor = structure(c(1L, 2L, 2L,
2L, 3L), .Label = c("x", "y", "z"), class = "factor"), Sales = c(100,
200, 250, 700, 150), Vendor_Share = c(0.2, 1, 0.5, 1, 0.3)), row.names = c(NA,
-5L), class = "data.frame")
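Note that the object in the question is a list wrapping a single data frame, while the answers work on a plain data frame called df. A minimal sketch of bridging the two, assuming the list has exactly one element as in the dput; the names sales_list, sales_df and share_totals are invented for the example:

```r
# The object in the question is a list holding a single data frame.
# sales_list is a made-up name, chosen to avoid masking base::list().
sales_list <- list(data.frame(
  Product = factor(c("a", "b", "a", "c", "a")),
  Vendor  = factor(c("x", "y", "y", "y", "z")),
  Sales   = c(100, 200, 250, 700, 150)
))

# Pull the data frame out of the list before computing anything on it.
sales_df <- sales_list[[1]]

# Share of each row's sales within its product's total sales.
sales_df$Vendor_Share <- with(sales_df, ave(Sales, Product, FUN = prop.table))

# Sanity check: the shares within each product should add up to 1.
share_totals <- tapply(sales_df$Vendor_Share, sales_df$Product, sum)
print(sales_df)
print(share_totals)
```

Summing the shares per product is a cheap way to confirm that the grouping variable was the intended one.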

How to make a color bar using three columns?

I have a table like following:
ID type group
A3EP 1 M
A3MA 2 M
A459 3 M
A3I1 5 M
A9D2 7 M
A3M9 4 M
A7XP 6 M
A4ZP 8 M
I want to make a color bar like the following: the red bar represents "group", below it each color represents a "type", and below that I want the "ID" names.
Can anyone please tell me how to do this? Thank you.
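# df is defined under "Input data" below
mypalette <- rainbow(8)
# eight unit-width bars, one colour per bar
barplot(rep(0.5, 8), width = 1, space = 0, col = mypalette, axes = FALSE)
# ID labels, rotated 90 degrees and centred on each bar
text(df$type - 0.5, 0.2, df$ID, srt = 90)
# red strip for the group, with its label
rect(0, 0.4, 8, 0.5, col = "red")
text(4, 0.45, "M")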
Input data:
df <- structure(list(ID = structure(c(1L, 4L, 5L, 2L, 8L, 3L, 7L, 6L
), .Label = c("A3EP", "A3I1", "A3M9", "A3MA", "A459", "A4ZP",
"A7XP", "A9D2"), class = "factor"), type = 1:8, group = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), class = "factor", .Label = "M")), .Names = c("ID",
"type", "group"), row.names = c(NA, -8L), class = "data.frame")
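In the input data above, type runs from 1 to 8 in row order, so colouring the bars by position and by type gives the same picture. In the table printed in the question the types are not in row order, so a hedged variant is to index the palette by type. The sketch below rebuilds that table by hand; bars and type_colours are names invented for the example.

```r
# Data as printed in the question's table (there, type is not in row order).
bars <- data.frame(
  ID    = c("A3EP", "A3MA", "A459", "A3I1", "A9D2", "A3M9", "A7XP", "A4ZP"),
  type  = c(1, 2, 3, 5, 7, 4, 6, 8),
  group = "M",
  stringsAsFactors = FALSE
)
type_colours <- rainbow(max(bars$type))

# One unit-width bar per row, coloured by that row's type rather than its position.
barplot(rep(0.5, nrow(bars)), width = 1, space = 0,
        col = type_colours[bars$type], axes = FALSE)
# ID labels centred on each bar, written vertically.
text(seq_len(nrow(bars)) - 0.5, 0.2, bars$ID, srt = 90)
# Group strip across the top, labelled with the group name.
rect(0, 0.4, nrow(bars), 0.5, col = "red")
text(nrow(bars) / 2, 0.45, unique(bars$group))
```

With this indexing, two rows that share a type would also share a colour, which positional colouring cannot guarantee.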

R - individual categorical plot [duplicate]

I would simply like to represent a sequence of categorical states with different colours.
This kind of plot is also known as an individual sequence plot (TraMineR).
I would like to use ggplot2.
My data simply look like this:
> head(dta)
V1 V2 V3 V4 V5 id
1 b a e d c 1
2 d b a e c 2
3 b c a e d 3
4 c b a e d 4
5 b c e a d 5
with the personal id in the last column.
The plot looks like this.
Each letter (state) is represented by a colour. Basically, this plot visualises the successive states for each individual.
Blue is a, Red is b, Purple is c, Yellow is d and Brown is e.
Any idea how I could do this with ggplot2?
dta = structure(list(V1 = structure(c(1L, 3L, 1L, 2L, 1L), .Label = c("b",
"c", "d"), class = "factor"), V2 = structure(c(1L, 2L, 3L, 2L,
3L), .Label = c("a", "b", "c"), class = "factor"), V3 = structure(c(2L,
1L, 1L, 1L, 2L), .Label = c("a", "e"), class = "factor"), V4 = structure(c(2L,
3L, 3L, 3L, 1L), .Label = c("a", "d", "e"), class = "factor"),
V5 = structure(c(1L, 1L, 2L, 2L, 2L), .Label = c("c", "d"
), class = "factor"), id = 1:5), .Names = c("V1", "V2", "V3",
"V4", "V5", "id"), row.names = c(NA, -5L), class = "data.frame")
What I have tried so far:
# assumption: dta3 (not defined in the post) holds only the state columns of dta
dta3 = dta[, paste0("V", 1:5)]
nr = nrow(dta3)
nc = ncol(dta3)
# space
m = 0.8
n = 1 # do not touch this one
plot(0, xlim = c(1,nc*n), ylim = c(1, nr), type = 'n', axes = FALSE, ylab = 'individual sequences', xlab = 'Time')
axis(1, at = c(1:nc*m), labels = c(1:nc))
axis(2, at = c(1:nr), labels = c(1:nr) )
for(i in 1:nc){
  points(x = rep(i*m,nr) , y = 1:nr, col = dta3[,i], pch = 15)
}
But it does not use ggplot2 and is not very satisfying.
Here you go:
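library(reshape2)
library(ggplot2)
# reshape to long format: one row per (id, time point)
m_dta <- melt(dta, id.vars = "id")
m_dta
# one tile per (time point, individual), filled by state
p1 <- ggplot(m_dta, aes(x = variable, y = id, fill = value)) +
  geom_tile()
p1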
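To see what the long format passed to geom_tile() looks like without installing reshape2, here is a base R sketch of the same reshaping. The data are retyped from the head(dta) output in the question as plain character columns, and dta_demo, state_cols and long are names made up for the example.

```r
# The five sequences from the question, retyped as character columns.
dta_demo <- data.frame(
  V1 = c("b", "d", "b", "c", "b"),
  V2 = c("a", "b", "c", "b", "c"),
  V3 = c("e", "a", "a", "a", "e"),
  V4 = c("d", "e", "e", "e", "a"),
  V5 = c("c", "c", "d", "d", "d"),
  id = 1:5,
  stringsAsFactors = FALSE
)

# Every column except id is a time point.
state_cols <- setdiff(names(dta_demo), "id")

# Stack the state columns on top of each other: one row per (id, time point).
long <- data.frame(
  id       = rep(dta_demo$id, times = length(state_cols)),
  variable = rep(state_cols, each = nrow(dta_demo)),
  value    = unlist(dta_demo[state_cols], use.names = FALSE)
)
print(head(long))
```

Each row of long corresponds to one tile: variable gives the x position, id the y position, and value the fill colour.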

R program, ?count, rename "freq" to something else

I am studying this webpage and cannot figure out how to rename freq to something else, say "number of times imbibed".
Here is the dput output:
structure(list(name = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L), .Label = c("Bill", "Llib"), class = "factor"), drink = structure(c(2L,
3L, 1L, 4L, 2L, 3L, 1L, 4L), .Label = c("cocoa", "coffee", "tea",
"water"), class = "factor"), cost = 1:8), .Names = c("name",
"drink", "cost"), row.names = c(NA, -8L), class = "data.frame")
And this is working code with output. Again, I'd like to rename the freq column. Thanks!
library(plyr)
bevs$cost <- as.integer(bevs$cost)
count(bevs, "name")
Output
name freq
1 Bill 4
2 Llib 4
Are you trying to do this?
counts <- count(bevs, "name")
names(counts) <- c("name", "number of times imbibed")
counts
The count() function returns a data.frame. Just rename it like any other data.frame:
counts <- count(bevs, "name")
names(counts)[which(names(counts) == "freq")] <- "number of times imbibed"
print(counts)
# name number of times imbibed
# 1 Bill 4
# 2 Llib 4
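If plyr is not a requirement, a base R sketch of the same count lets the column be named at creation time through the responseName argument of as.data.frame(). The data are retyped from the dput above; bevs_demo and counts_base are made-up names, and a syntactic column name (times_imbibed) is used here instead of one with spaces.

```r
# The drinks data from the question, retyped by hand.
bevs_demo <- data.frame(
  name  = rep(c("Bill", "Llib"), times = 4),
  drink = c("coffee", "tea", "cocoa", "water", "coffee", "tea", "cocoa", "water"),
  cost  = 1:8
)

# Count rows per name; responseName sets the name of the count column directly.
counts_base <- as.data.frame(table(name = bevs_demo$name),
                             responseName = "times_imbibed")
print(counts_base)
```

Naming the column up front avoids a separate renaming step, at the cost of not going through plyr's count().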

Calculating ratios by group with dplyr

Using the following dataframe I would like to group the data by replicate and group and then calculate a ratio of treatment values to control values.
structure(list(group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L), .Label = c("case", "controls"), class = "factor"), treatment = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "EPA", class = "factor"),
replicate = structure(c(2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L), .Label = c("four",
"one", "three", "two"), class = "factor"), fatty_acid_family = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "saturated", class = "factor"),
fatty_acid = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "14:0", class = "factor"),
quant = c(6.16, 6.415, 4.02, 4.05, 4.62, 4.435, 3.755, 3.755
)), .Names = c("group", "treatment", "replicate", "fatty_acid_family",
"fatty_acid", "quant"), class = "data.frame", row.names = c(NA,
-8L))
I have tried using dplyr as follows:
group_by(dataIn, replicate, group) %>% transmute(ratio = quant[group=="case"]/quant[group=="controls"])
but this results in Error: incompatible size (%d), expecting %d (the group size) or 1
Initially I thought this might be because I was trying to create 4 ratios from a data frame with 8 rows, so I thought summarise might be the answer (collapsing each group to one ratio), but that doesn't work either (the shortcoming is in my understanding).
group_by(dataIn, replicate, group) %>% summarise(ratio = quant[group=="case"]/quant[group=="controls"])
replicate group ratio
1 four case NA
2 four controls NA
3 one case NA
4 one controls NA
5 three case NA
6 three controls NA
7 two case NA
8 two controls NA
I would appreciate some advice on where I'm going wrong or even if this can be done with dplyr.
Thanks.
You can try:
group_by(dataIn, replicate) %>%
summarise(ratio = quant[group=="case"]/quant[group=="controls"])
#Source: local data frame [4 x 2]
#
# replicate ratio
#1 four 1.078562
#2 one 1.333333
#3 three 1.070573
#4 two 1.446449
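Because you grouped by both replicate and group, each group contained only a case row or only a controls row, so the two values could never be accessed together. Grouping by replicate alone keeps both rows in the same group.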
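The same pairing can be sketched in base R, which may make the grouping issue easier to see. Only the three columns that matter are retyped from the question's dput; dataIn_demo, cases, controls and paired are names invented for the example.

```r
# The relevant columns of the question's data, retyped by hand.
dataIn_demo <- data.frame(
  group     = rep(c("case", "controls"), each = 4),
  replicate = rep(c("one", "two", "three", "four"), times = 2),
  quant     = c(6.16, 6.415, 4.02, 4.05, 4.62, 4.435, 3.755, 3.755)
)

# Split into one table per group and line the rows up by replicate.
cases    <- dataIn_demo[dataIn_demo$group == "case", c("replicate", "quant")]
controls <- dataIn_demo[dataIn_demo$group == "controls", c("replicate", "quant")]
paired   <- merge(cases, controls, by = "replicate",
                  suffixes = c("_case", "_control"))
paired$ratio <- paired$quant_case / paired$quant_control
print(paired)

# Within a single (replicate, group) cell only one of the two subsets exists,
# e.g. the "controls" subset of a "case" cell is empty.
one_cell <- dataIn_demo[dataIn_demo$replicate == "one" &
                        dataIn_demo$group == "case", ]
empty_subset <- one_cell$quant[one_cell$group == "controls"]
print(length(empty_subset))
```

The empty subset in the last step is what the two-variable grouping hands to the division in every group, which is why no ratio can be computed there.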
@talat's answer solved it for me. I created a minimal reproducible example to help my own understanding:
df <- structure(list(a = c("a", "a", "b", "b", "c", "c", "d", "d"),
b = c(1, 2, 1, 2, 1, 2, 1, 2), c = c(22, 15, 5, 0.2, 107,
6, 0.2, 4)), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))
# a b c
# 1 a 1 22.0
# 2 a 2 15.0
# 3 b 1 5.0
# 4 b 2 0.2
# 5 c 1 107.0
# 6 c 2 6.0
# 7 d 1 0.2
# 8 d 2 4.0
library(dplyr)
df %>%
group_by(a) %>%
summarise(prop = c[b == 1] / c[b == 2])
# a prop
# 1 a 1.466667
# 2 b 25.000000
# 3 c 17.833333
# 4 d 0.050000
