Aggregate values and display in barplot - r

I have the following data, stored in a data frame called csv:
group,value
a,2
b,4
a,3
a,2
b,5
I want to aggregate it by group and visualize it in a barplot:
(Desired plot: one bar per group, with a at 7 and b at 9.)
With
barplot(as.matrix(aggregate(csv[2], csv[1], sum)))
I get a plot in which both groups are stacked on a single bar. How can I display 2 bars (one per group)?

Setting the group as the row names produces 2 bars:
barplot(t(as.matrix(data.frame(aggregate(csv[2], csv[1], sum), row.names = 1))))
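barplot() draws one bar per element of a vector, or one bar per column of a matrix with that column's rows stacked inside it. That is why moving the groups into the columns fixes the plot. The sketch below rebuilds the question's data by hand (the name csv follows the question) and compares the answer's approach with a base-R alternative that uses tapply() to produce a vector of sums named by group. It assumes a graphics device is available.

```r
# Example data from the question; the name `csv` follows the question
csv <- data.frame(group = c("a", "b", "a", "a", "b"),
                  value = c(2, 4, 3, 2, 5))

# Answer's approach: use the group column as row names, then transpose so that
# each group becomes its own column (and therefore its own bar)
m <- t(as.matrix(data.frame(aggregate(csv[2], csv[1], sum), row.names = 1)))

# Base-R alternative: tapply() returns the sums as a vector named by group
sums <- tapply(csv$value, csv$group, sum)

barplot(m)     # two bars, labelled a and b
barplot(sums)  # the same two bars
```

The tapply() version skips the row-name and transpose steps because a named vector already maps one element to one bar.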

Related

Draw pie Chart from a data frame

I have a data frame with two variables: attendance (4 5 6 7 2 5 7 8) and treatment (A B B A A B B A). How do I make a pie chart in RStudio comparing the percentage of total attendance for A and B?
Using dplyr and the base pie() function: first group by treatment, then compute the total for each group.
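library(dplyr)
a = data.frame(attendance = c(4,5,6,7,2,5,7,8),
               treatment = c("A","B","B","A","A","B","B","A"),
               stringsAsFactors = FALSE)
# total attendance per treatment
A = a %>% group_by(treatment) %>% summarise(tot = sum(attendance))
# slice sizes are shares of the grand total; labels show the rounded share
pie(A$tot / sum(A$tot), labels = paste(A$treatment, round(A$tot / sum(A$tot), 2)), main = "Pie")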
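If dplyr is not available, the same totals can be computed in base R. This is a sketch, not the answer's method: tapply() sums attendance per treatment, prop.table() turns the totals into shares, and sprintf() builds percentage labels. The label format is an arbitrary choice.

```r
# Example data from the question
a <- data.frame(attendance = c(4, 5, 6, 7, 2, 5, 7, 8),
                treatment  = c("A", "B", "B", "A", "A", "B", "B", "A"))

# Total attendance per treatment, as a vector named A and B
tot <- tapply(a$attendance, a$treatment, sum)

# Shares of the grand total (they sum to 1)
share <- prop.table(tot)

# Percentage labels such as "A 48%"
labs <- sprintf("%s %.0f%%", names(share), 100 * as.numeric(share))

pie(as.numeric(share), labels = labs, main = "Share of total attendance")
```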

R plotly: Customize x-axis values in box plot

I have a data frame with 3 variables and 260 rows. (Sample below)
HouseID<-c(1:10)
Town<-c("D","A","B","C","A","B","C","C","C","A")
Occupants<-c(5,3,2,4,5,2,3,8,1,3)
df<-data.frame(HouseID,Town,Occupants)
HouseID Town Occupants
1 D 5
2 A 3
3 B 2
4 C 4
5 A 5
6 B 2
7 C 3
8 C 8
9 C 1
10 A 3
I want to create a box plot of the distribution of Occupants, with the x-axis ordered by descending frequency of Town:
Town Freq
A 3
B 2
C 4
D 1
I tried sorting the data frame, but the box plot's x-axis is still displayed in alphabetical order. Is there a way to do this?
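You simply have to use factor() to reorder the levels of df$Town by their counts. Since R 4.0, data.frame() keeps strings as character vectors by default, so levels(df$Town) would be NULL; count with table() instead of summary():
library(plotly)
df$Town <- factor(df$Town, levels = names(sort(table(df$Town), decreasing = TRUE)))
plot_ly(df, x = ~Town, y = ~Occupants, type = "box")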
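The reordering itself does not depend on plotly. Here is a base-R sketch using the sample data from the question: count the towns with table(), sort the counts in decreasing order, and use the sorted names as the factor levels. Base boxplot() stands in for plot_ly() here only so that the example needs no extra packages. It also draws the boxes in level order.

```r
# Sample data from the question
HouseID   <- 1:10
Town      <- c("D", "A", "B", "C", "A", "B", "C", "C", "C", "A")
Occupants <- c(5, 3, 2, 4, 5, 2, 3, 8, 1, 3)
df <- data.frame(HouseID, Town, Occupants)

# Count houses per town, most frequent first
freq <- sort(table(df$Town), decreasing = TRUE)

# Use the sorted town names as the factor levels
df$Town <- factor(df$Town, levels = names(freq))

# Stand-in for plot_ly(): base boxplot() also draws the boxes in level order
boxplot(Occupants ~ Town, data = df)
```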

ggplot2 - Pie/Bar Chart from Multiple Columns in Data Frame

I have a data frame that looks like the one below. There are three variables per observation, and I would like to create a bar graph per observation showing these three variables. However, ggplot2 doesn't appear to have a way to specify multiple columns from the same data frame. What is the correct way to graph this data?
I'm aiming for something similar to this image from Wikimedia (with one graph per observation). Source: https://commons.wikimedia.org/wiki/File:Article_count_(en-de-fr).png
x English German French
Sample 1 5 10 14
Sample 2 4 4 14
Sample 3 5 10 53
I don't know why there are 2 rows per x value; this makes no sense. What do you want to plot: the sum per A, B, C? The mean?
Assuming you want the mean, just do:
dat <- read.table(textConnection(
"x A B C
1 5 10 14
1 4 4 14
2 5 10 14
2 4 4 14
3 5 10 14
3 4 4 14
"), header=TRUE)
dat <- aggregate(. ~ x, data=dat, mean) # replace mean with any other summary function
library(reshape2)
dat_molten <- melt(dat, "x")
library(ggplot2)
ggplot(dat_molten, aes(x=variable, y=value)) +
  geom_bar(stat="identity") +
  facet_grid(.~x)
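For the question's original layout (one row per sample), a base-R sketch can skip the aggregation step. barplot() with beside = TRUE draws one group of bars per matrix column, so transposing the three language columns gives one group per sample. The stack() call is shown only for illustration and is not needed by barplot(). It is a base-R counterpart of the melt() step and produces the long format (one row per bar) that ggplot2 expects.

```r
# The question's data: one row per sample, one column per language
dat <- data.frame(x       = c("Sample 1", "Sample 2", "Sample 3"),
                  English = c(5, 4, 5),
                  German  = c(10, 4, 10),
                  French  = c(14, 14, 53))

# Rows = languages, columns = samples, so each sample becomes one group of bars
m <- t(as.matrix(dat[, c("English", "German", "French")]))
colnames(m) <- as.character(dat$x)

# beside = TRUE puts the three language bars side by side within each sample
barplot(m, beside = TRUE, legend.text = rownames(m))

# Base-R counterpart of the melt() step: one row per (sample, language) pair
long <- data.frame(x = dat$x, stack(dat[, c("English", "German", "French")]))
```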

R ggplot2 number of rows of the same values in a column

I'm new to R and to plotting in R, so this might be a very simple question.
Suppose I have a data frame like this:
a b c d
1 5 6 7
2 3 5 7
1 4 6 2
2 3 5 NA
1 4 4 2
2 2 4 2
1 2 5 1
2 3 4 NA
Here a, b, c and d are column names. I want to plot a bar chart with the values of column d on the x axis and the number of rows with each value on the y axis. So 7 has 2 rows, 1 has 1 and 2 has 3. It's not important to include the missing values in between (3, 4, 5, 6).
So the result would be something like a histogram. I know I can count the values in column d first and then plot them, but I feel there must be a better way to do this.
Here's an approach. If I understand your question, columns a, b and c are immaterial to what you are doing, which is plotting the frequencies of column d.
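library(ggplot2)
## example data from the question
test <- data.frame(a = c(1,2,1,2,1,2,1,2),
                   b = c(5,3,4,3,4,2,2,3),
                   c = c(6,5,6,5,4,4,5,4),
                   d = c(7,7,2,NA,2,2,1,NA))
## get frequencies of col d (table() drops NAs by default)
test.summary <- table(test$d)
## re-shape into a data frame with columns Var1 (value of d) and Freq (count)
test.summary.m <- as.data.frame(test.summary)
ggplot(test.summary.m, aes(x = as.factor(Var1), y = Freq)) +
  geom_bar(stat = 'identity')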
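The "better way" the question asks about can be a single table() call. In base R, barplot() accepts the table directly and draws one bar per observed value of d. Below is a minimal sketch with the question's sample data rebuilt by hand (the name test follows the answer). In ggplot2, geom_bar() without a y aesthetic also counts rows itself, because its default stat is "count". An example is ggplot(test, aes(x = factor(d))) + geom_bar(), although the NA rows would then appear as an extra bar unless they are filtered out first.

```r
# The question's sample data
test <- data.frame(a = c(1, 2, 1, 2, 1, 2, 1, 2),
                   b = c(5, 3, 4, 3, 4, 2, 2, 3),
                   c = c(6, 5, 6, 5, 4, 4, 5, 4),
                   d = c(7, 7, 2, NA, 2, 2, 1, NA))

# table() counts each distinct value of d; NA values are dropped by default
counts <- table(test$d)

# One bar per observed value (1, 2 and 7); values that never occur get no bar
barplot(counts, xlab = "d", ylab = "number of rows")
```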

how to plot overlay multiple time series given condition(s) in lattice?

Suppose I have a data frame, df, that looks like:
f t1 t2 t3
h 1 3 4
h 2 4 3
t 3 4 5
t 5 6 8
where f is a factor and the t columns are numeric values for time-ordered events.
I could overlay the time series t1 to t3 using par(new=T) and split them by factor manually.
But I wonder if there is a way to do this with lattice, where the overlaid time series are conditioned on the factor. We would then have two panels, each showing the overlaid time series for one level of f. Most examples I've seen only use one time series (vector) per factor. I also thought about using a parallel plot, but the time information is lost.
I've also tried something like
xyplot(df$t1+df$t2+df$t3 ~seq(3) | factor(df$f))
but it loses the connections between values from the same row. Does anyone know if this is possible?
Here's a very crude illustration using a non-lattice approach:
x<-matrix(seq(12),4,3)
f<-c('a','a','b','b')
df<-data.frame(f,x)
layout(1:2); yr<-c(0,12); xr<-c(1,3);
plot(as.numeric(df[1,2:4])~seq(3),type='o',ylim=yr,xlim=xr,ylab='A')
par(new=T)
plot(as.numeric(df[2,2:4])~seq(3),type='o',ylim=yr,xlim=xr,ylab='A')
plot(as.numeric(df[3,2:4])~seq(3), type='o',ylim=yr,xlim=xr,ylab='B')
par(new=T)
plot(as.numeric(df[4,2:4])~seq(3),type='o',ylim=yr,xlim=xr,ylab='B')
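The repeated par(new = TRUE) calls can be condensed with base matplot(), which draws one line per matrix column. This sketch uses the same x and f as the illustration. It loops over the factor levels, transposes the rows belonging to each level so they become columns, and draws one panel per level. It is still a non-lattice approach.

```r
# Same example data as the illustration
x <- matrix(seq(12), 4, 3)
f <- c("a", "a", "b", "b")

layout(1:2)  # one panel per factor level, stacked vertically
for (lev in unique(f)) {
  # matplot() draws one line per column, so transpose: each selected row of x
  # becomes a column, i.e. one overlaid series per original row
  series <- t(x[f == lev, , drop = FALSE])
  matplot(1:3, series, type = "o", ylim = c(0, 12),
          xlab = "time", ylab = toupper(lev))
}
```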
I added an ID variable and melted the data with the reshape2 package:
dat$ID <- seq_len(nrow(dat))  # one ID per row, i.e. per time series
dat
f t1 t2 t3 ID
1 h 1 3 4 1
2 h 2 4 3 2
3 t 3 4 5 3
4 t 5 6 8 4
library(reshape2)
datm <- melt(dat, id.vars=c("ID","f"), measure.vars=c("t1", "t2", "t3"))
> datm
ID f variable value
1 1 h t1 1
2 2 h t1 2
3 3 t t1 3
4 4 t t1 5
5 1 h t2 3
6 2 h t2 4
7 3 t t2 4
8 4 t t2 6
9 1 h t3 4
10 2 h t3 3
11 3 t t3 5
12 4 t t3 8
Since you asked for the series to be overlaid, I used the groups argument to keep the IDs separate and the "|" operator to get two panels, one for "h" and one for "t":
library(lattice)
xyplot(value ~ variable | f, groups = ID, data = datm, type = "b")
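If reshape2 is not installed, base stack() can produce the same long format. This sketch assumes the data from the question. f is made a factor explicitly, and ID is added as in the answer. The column names values and ind come from stack(), and ind is renamed to variable to match the answer's formula. lattice itself is a recommended package that ships with most R installations.

```r
library(lattice)

# The question's data, with f as a factor and an ID per row as in the answer
df <- data.frame(f  = factor(c("h", "h", "t", "t")),
                 t1 = c(1, 2, 3, 5),
                 t2 = c(3, 4, 4, 6),
                 t3 = c(4, 3, 5, 8))
df$ID <- seq_len(nrow(df))

# Base-R stand-in for reshape2::melt(): stack() puts t1, t2, t3 in one column
long <- data.frame(ID = df$ID, f = df$f, stack(df[c("t1", "t2", "t3")]))
names(long)[names(long) == "ind"] <- "variable"

# groups = ID overlays one line per original row; "| f" gives one panel per level
p <- xyplot(values ~ variable | f, groups = ID, data = long, type = "b")
print(p)
```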
(1) This can be done compactly with xyplot.zoo. After the library() calls, the first statement converts the data frame to a zoo series (zoo objects store series in columns). The second plots it, with the screen argument defining the panel in which each series is shown:
library(zoo)
library(lattice)
z <- zoo(t(df[-1]))
xyplot(z, screen = df$f, type = "o")
(2) Alternatively, to show df's column names on the x axis, define z as follows and then issue the same xyplot command:
z <- zoo(t(df[-1]), factor(names(df[-1])))
With z from (1), xyplot draws one panel per level of f. The plot for (2) is the same except for the x-axis labels.
EDIT: simplified (2)
