I have a data frame with two variables: attendance (4 5 6 7 2 5 7 8) and treatment (A B B A A B B A). How do I make a pie chart comparing each treatment's percentage of the total attendance (A versus B) in RStudio?
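Using dplyr and the base pie function, we first group by treatment and compute the total attendance per group, then plot each group's share of the overall total.
library(dplyr)
a = data.frame(attendance=c(4,5,6,7,2,5,7,8),
               treatment=c("A","B","B","A","A","B","B","A"),
               stringsAsFactors = FALSE)
# total attendance per treatment
A = a %>% group_by(treatment) %>% summarise(tot=sum(attendance))
# slice sizes are each group's share of the grand total; labels show that share
pie(A$tot/sum(A$tot), labels=paste(A$treatment, round(A$tot/sum(A$tot),2)), main="Pie")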
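If loading dplyr is not desirable, the same sums and percentages can be computed in base R. The following is only a sketch under that assumption; the names tot and pct and the chart title are illustrative and not part of the original answer.

```r
# Same example data as in the question
a <- data.frame(attendance = c(4, 5, 6, 7, 2, 5, 7, 8),
                treatment  = c("A", "B", "B", "A", "A", "B", "B", "A"))

# Sum attendance per treatment, then convert the sums to percentages of the grand total
tot <- tapply(a$attendance, a$treatment, sum)
pct <- round(100 * prop.table(tot), 1)

# Label each slice with its treatment and its percentage
pie(tot, labels = paste0(names(tot), " (", pct, "%)"), main = "Attendance by treatment")
```

With this data the totals are 21 for A and 23 for B, so the slices are close to equal.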
Related
I have two columns: one is age and the other is a percentage. I need to draw a graph that shows the sum of the percentages for each 5-year age interval.
df$group <- cut(df$age, breaks = seq(0,120,by=5), right = TRUE)
I used the code above to bin age into 5-year intervals, and then tried group_by on the binned age with summarise(sum = sum(percentage)) to total the percentages in each interval. However, I could not get this to work; it seems that group_by cannot work on a categorical variable. Do you know a better way?
For example, if df is:
df <- data.frame(age=c(2,4,6,8), percentage=c(2,3,6,7))
and grouping and summing transforms it to:
age (0-5, 5-10), sum percentage (5, 13)
what I actually need is:
age (5, 10), sum percentage (5, 13)
You can create a new variable with groups and then use aggregate to aggregate the percentage values by group and sum them up:
df = data.frame(age=c(2,4,6,8), percentage=c(2,3,6,7))
df$age.group = cut(df$age,seq(0,120,5))
sums = aggregate(percentage ~ age.group,FUN=sum,data=df)
The result will be:
> df
  age percentage age.group
1   2          2     (0,5]
2   4          3     (0,5]
3   6          6    (5,10]
4   8          7    (5,10]
> sums
  age.group percentage
1     (0,5]          5
2    (5,10]         13
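The question asks for the groups to be labelled 5 and 10 rather than (0,5] and (5,10]. One possible way to get there, sketched here as an assumption about what the asker wants, is to pass the upper bound of each interval to the labels argument of cut; the variable breaks is introduced only for this illustration.

```r
df <- data.frame(age = c(2, 4, 6, 8), percentage = c(2, 3, 6, 7))

breaks <- seq(0, 120, by = 5)
# Label each interval (lo, hi] by its upper bound hi: 5, 10, 15, ...
df$age.group <- cut(df$age, breaks = breaks, labels = breaks[-1])

# Sum the percentages within each labelled group
sums <- aggregate(percentage ~ age.group, data = df, FUN = sum)

# cut() returns a factor; convert the labels back to numbers if numeric ages are needed
sums$age.group <- as.numeric(as.character(sums$age.group))
print(sums)
```

Going through as.character before as.numeric matters here: calling as.numeric directly on a factor returns the internal level codes, not the labels.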
I have a data frame with 3 variables and 260 rows. (Sample below)
HouseID<-c(1:10)
Town<-c("D","A","B","C","A","B","C","C","C","A")
Occupants<-c(5,3,2,4,5,2,3,8,1,3)
df<-data.frame(HouseID,Town,Occupants)
HouseID Town Occupants
1 D 5
2 A 3
3 B 2
4 C 4
5 A 5
6 B 2
7 C 3
8 C 8
9 C 1
10 A 3
I want to create a box plot of the distribution of Occupants, with the x-axis ordered by descending frequency of Town:
Town Freq
A 3
B 2
C 4
D 1
(A sample image was shown here.)
I tried sorting the data frame, but the box plot x-axis is still displayed in alphabetical order by default. Is there a way to do this?
You simply have to use factor to set the levels of df$Town in descending order of their counts, table(df$Town). This works whether Town is stored as a character vector or as a factor:
df$Town <- factor(df$Town, levels = names(sort(table(df$Town), decreasing = TRUE)))
library(plotly)
plot_ly(df, x=~Town, y=~Occupants, type="box")
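To check the ordering without plotly, the reordering step can be tried with base R's boxplot, which also follows factor level order. This is a minimal sketch using the sample data from the question; the intermediate variable counts is introduced only for illustration.

```r
df <- data.frame(HouseID   = 1:10,
                 Town      = c("D", "A", "B", "C", "A", "B", "C", "C", "C", "A"),
                 Occupants = c(5, 3, 2, 4, 5, 2, 3, 8, 1, 3))

# Count houses per town and sort from most to least frequent
counts <- sort(table(df$Town), decreasing = TRUE)

# Use that order as the factor levels; plotting functions follow level order
df$Town <- factor(df$Town, levels = names(counts))
print(levels(df$Town))

boxplot(Occupants ~ Town, data = df)
```

For the sample data the towns come out in the order C, A, B, D, matching the frequencies 4, 3, 2, 1.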
I have the following matrix:
group,value
a,2
b,4
a,3
a,2
b,5
I want to aggregate it by group and visualize it in a barplot:
(sketch of the desired plot: two bars on an axis from 1 to 9, one for group a reaching 7 and one for group b reaching 9)
With
barplot(as.matrix(aggregate(csv[2], csv[1], sum)))
I get a plot in which both groups end up on a single bar. How can I display two bars, one for each group?
Setting the group as the row names produces two bars:
barplot(t(as.matrix((data.frame(aggregate(csv[2],csv[1],sum),row.names=1)))))
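A shorter route to the same two bars, offered here as an alternative sketch rather than the answerer's method, is tapply, which returns a named vector that barplot draws as one bar per name. The data frame csv is rebuilt from the values in the question, and the axis labels are illustrative.

```r
# Data from the question, assumed to have been read into a data frame called csv
csv <- data.frame(group = c("a", "b", "a", "a", "b"),
                  value = c(2, 4, 3, 2, 5))

# Sum value within each group; the result is named by group
sums <- tapply(csv$value, csv$group, sum)

# One bar per group, with the group names on the x-axis
barplot(sums, xlab = "group", ylab = "sum of value")
```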
I have a data frame that looks like the one below. I have three variables per observation, and I would like to create a bar graph per observation showing each of these three variables. However, ggplot2 doesn't appear to have a way to specify multiple columns from the same data frame. What is the correct way to graph this data?
Aiming for something similar to the image below from Wikimedia (with a graph for each observation). Source: https://commons.wikimedia.org/wiki/File:Article_count_(en-de-fr).png
x         English  German  French
Sample 1        5      10      14
Sample 2        4       4      14
Sample 3        5      10      53
Don't know why there are 2 rows per x-value.
This makes no sense. What do you want to plot? The sum per A, B, C? The mean?
Assuming you want the mean, just do:
dat <- read.table(textConnection(
"x A B C
1 5 10 14
1 4 4 14
2 5 10 14
2 4 4 14
3 5 10 14
3 4 4 14
"), header=TRUE)
dat <- aggregate(. ~ x, data=dat, mean) # instead of mean you can take your function
library(reshape2)
dat_molten <- melt(dat, "x")
library(ggplot2)
ggplot(dat_molten, aes(x=variable, y=value)) +
  geom_bar(stat="identity") +
  facet_grid(.~x)
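Applied to the table from the question, which has one row per sample and so needs no aggregation, the same melt-then-facet approach might look like the sketch below. The column names language and count, and the use of geom_col and facet_wrap, are choices made for this illustration.

```r
library(reshape2)
library(ggplot2)

# Data from the question: one row per sample, one column per language
dat <- data.frame(x       = c("Sample 1", "Sample 2", "Sample 3"),
                  English = c(5, 4, 5),
                  German  = c(10, 4, 10),
                  French  = c(14, 14, 53))

# Long format: one row per (sample, language) pair
dat_long <- melt(dat, id.vars = "x", variable.name = "language", value.name = "count")

# One panel per sample, one bar per language
p <- ggplot(dat_long, aes(x = language, y = count, fill = language)) +
  geom_col() +
  facet_wrap(~ x)
print(p)
```

The key point is that ggplot2 expects one column for the values and one for the category, so the three language columns are stacked into a single column before plotting.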
For example, I have the two data sets shown below. Using position as X and count as Y, how can I plot them as lines of different colours within a single plot using ggplot2's geom_line?
dataset a:
position count
1 3
2 9
3 10
4 15
5 19
6 28
7 15
8 13
9 11
10 5
dataset b:
position count
1 4
2 8
3 16
4 17
5 19
6 10
The trick is to combine your two data frames into a single data frame. First, we create a new identifier column on each data frame:
a$dataset = "a"
b$dataset = "b"
Then we combine them:
dd = rbind(a, b)
All that's left is to add geom_line, mapping colour to the dataset identifier:
library(ggplot2)
ggplot(dd) + geom_line(aes(position, count, colour=dataset))
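The labelling and combining steps can also be done in a single call with dplyr's bind_rows, which takes a named list and writes the names into an identifier column. This is a sketch of that alternative, not the answerer's method, with the two data sets rebuilt from the values in the question.

```r
library(dplyr)
library(ggplot2)

# Data sets from the question
a <- data.frame(position = 1:10, count = c(3, 9, 10, 15, 19, 28, 15, 13, 11, 5))
b <- data.frame(position = 1:6,  count = c(4, 8, 16, 17, 19, 10))

# Stack the data frames; the list names ("a", "b") fill the new dataset column
dd <- bind_rows(list(a = a, b = b), .id = "dataset")

# One line per dataset, distinguished by colour
p <- ggplot(dd, aes(position, count, colour = dataset)) + geom_line()
print(p)
```

Unlike rbind, bind_rows does not require the identifier column to be added to each data frame beforehand.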