R plotly: Customize x-axis values in box plot

I have a data frame with 3 variables and 260 rows. (Sample below)
HouseID<-c(1:10)
Town<-c("D","A","B","C","A","B","C","C","C","A")
Occupants<-c(5,3,2,4,5,2,3,8,1,3)
df<-data.frame(HouseID,Town,Occupants)
HouseID Town Occupants
1 D 5
2 A 3
3 B 2
4 C 4
5 A 5
6 B 2
7 C 3
8 C 8
9 C 1
10 A 3
I want to create a box plot of the distribution of Occupants, with the x-axis ordered by the descending frequency of each Town:
Town Freq
A 3
B 2
C 4
D 1
(Shown a sample image)
I tried sorting the data frame, but still, the box plot x-axis is displayed based on alphabetical order by default. Is there a way I could do this?

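Use factor() to reorder the levels of df$Town by their counts. table(df$Town) is used for the counts so that this also works when Town is a character column, which is the data.frame() default since R 4.0:
library(plotly)
# levels sorted by frequency, most frequent town first
df$Town <- factor(df$Town, levels = names(sort(table(df$Town), decreasing = TRUE)))
plot_ly(df, x=~Town, y=~Occupants, type="box")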
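If you would rather leave df$Town untouched, the order can also be handed to the axis itself. This is only a sketch: it relies on the categoryorder = "array" and categoryarray axis options of plotly's layout(), and town_order is a name introduced here for illustration.

```r
library(plotly)

df <- data.frame(
  HouseID   = 1:10,
  Town      = c("D", "A", "B", "C", "A", "B", "C", "C", "C", "A"),
  Occupants = c(5, 3, 2, 4, 5, 2, 3, 8, 1, 3)
)

# Town names sorted by how often they occur, most frequent first
town_order <- names(sort(table(df$Town), decreasing = TRUE))

# Box plot whose x-axis follows town_order instead of alphabetical order
p <- plot_ly(df, x = ~Town, y = ~Occupants, type = "box")
p <- layout(p, xaxis = list(categoryorder = "array", categoryarray = town_order))
# print(p) to display the plot
```

For the sample data, town_order comes out as C, A, B, D, matching the frequency table in the question.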

Related

Insert missing rows by factor level

I'm sure there's a simple solution to this problem, but I'm having trouble figuring it out. I have a data frame in the following format:
Number Category Type Count
1 X A 10
2 X B 14
3 Y B 3
4 Z A 14
"Type" is a factor with two levels, {A,B}, and each level gets at least one "Category" entry, (for simplicity, they are denoted XYZ here, but in my actual dataset there are too many to list). I would like the number of rows each Type has to match by Category:
Number Category Type Count
1 X A 10
2 X B 14
3 Y A <NA>
4 Y B 3
5 Z A 14
6 Z B <NA>
For instance, if Type A is listed in four rows of Category X, but Type B has no Category X listings, then four new rows of Category X, Type B should be created (with Count=NA). Similarly, if Type A gets four rows of Category X and Type B has two, then two new rows should be created.
I was able to find numerous answers on how to do this for missing dates in time series data using seq(), expand.grid(), and merge(), but I can't quite see how to do it in this case. I hope this is clear... Grateful for any help!
# stringsAsFactors = TRUE so that levels() below works in R >= 4.0
dat <- read.table(header = TRUE, stringsAsFactors = TRUE, text =
"Number Category Type Count
1 X A 10
2 X B 14
3 Y B 3
4 Z A 14")
Use expand.grid to make a master list and then merge:
alllevs <- do.call(expand.grid, lapply(dat[c("Type","Category")], levels))
merge(dat, alllevs, all.y=TRUE)
# Category Type Number Count
#1 X A 1 10
#2 X B 2 14
#3 Y A NA NA
#4 Y B 3 3
#5 Z A 4 14
#6 Z B NA NA
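To see why the merge fills the gaps, it can help to look at the grid on its own. A small sketch on the same sample data; it uses unique() rather than levels() so the columns may be plain character, and the names grid and filled are introduced here for illustration.

```r
dat <- read.table(header = TRUE, text = "
Number Category Type Count
1 X A 10
2 X B 14
3 Y B 3
4 Z A 14")

# Every Type/Category combination: 2 types x 3 categories = 6 rows
grid <- expand.grid(Type = unique(dat$Type),
                    Category = unique(dat$Category),
                    stringsAsFactors = FALSE)
nrow(grid)

# all.y = TRUE keeps the grid rows that have no match in dat;
# Number and Count are NA for those rows
filled <- merge(dat, grid, all.y = TRUE)

# The combinations that were missing from the original data
filled[is.na(filled$Count), c("Category", "Type")]
```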

Aggregate values and display in barplot

I have the following matrix:
group,value
a,2
b,4
a,3
a,2
b,5
I want to aggregate it by group and visualize it in a barplot:
(ASCII sketch of the desired plot: two bars on an axis running from 1 to 9, one for group a reaching 7 and one for group b reaching 9)
With
barplot(as.matrix(aggregate(csv[2], csv[1], sum)))
I get the following plot:
So both groups end up in a single bar. How can I display two bars (one for each group)?
Setting the group as the row names produces two bars:
barplot(t(as.matrix((data.frame(aggregate(csv[2],csv[1],sum),row.names=1)))))
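A shorter route to the same two bars, offered as a sketch rather than a replacement for the line above: tapply() returns a named vector of sums, which barplot() draws as one bar per name. The csv data frame is rebuilt here from the values in the question, and totals is a name introduced for illustration.

```r
# Sample data from the question
csv <- data.frame(
  group = c("a", "b", "a", "a", "b"),
  value = c(2, 4, 3, 2, 5)
)

# Sum of value within each group, as a named vector
totals <- tapply(csv$value, csv$group, sum)
totals

# One bar per group, labelled with the group name
barplot(totals, xlab = "group", ylab = "sum of value")
```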

R ggplot2 number of rows of the same values in a column

I'm new to R and plotting in R. This might be a very simple question but here it is,
Suppose I have a data frame like this:
a b c d
1 5 6 7
2 3 5 7
1 4 6 2
2 3 5 NA
1 4 4 2
2 2 4 2
1 2 5 1
2 3 4 NA
Here a, b, c, d are column names. I want to plot a bar chart that has the values in column d on the x axis, and the number of rows with that value on the y axis. So 7 has 2 rows, 1 has 1 and 2 has 3. It's not important to include the missing values in between (3, 4, 5, 6).
So the result would be something like a histogram. I know I can do counting on column d and then do the plotting but I feel there must be a better way to do this.
Here's an approach. If I understand your question, columns a, b, and c are immaterial to what you are doing, which is plotting frequencies of column d.
library(ggplot2)
library(reshape)
##get frequencies of col d
test.summary<-table(test$d)
## re-shape the data
test.summary.m<-melt(test.summary)
ggplot(test.summary.m,aes(x=as.factor(Var.1),y=value))+
geom_bar(stat='identity')
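The counting step can also be left to ggplot2, since geom_bar() counts rows per x value by default. A minimal sketch, assuming the data frame is called test as in the answer above and that rows with a missing d should be dropped:

```r
library(ggplot2)

# Sample data from the question
test <- data.frame(
  a = c(1, 2, 1, 2, 1, 2, 1, 2),
  b = c(5, 3, 4, 3, 4, 2, 2, 3),
  c = c(6, 5, 6, 5, 4, 4, 5, 4),
  d = c(7, 7, 2, NA, 2, 2, 1, NA)
)

# Drop rows where d is NA, treat d as categorical so only observed
# values appear on the x axis, and let geom_bar() count the rows
p <- ggplot(test[!is.na(test$d), ], aes(x = factor(d))) +
  geom_bar() +
  labs(x = "d", y = "number of rows")
p
```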

R combine nx4 into nx2

I have a dataset that has one factor (4 levels). However, each factor level and its data is currently in its own column, with the factor level label at the top (a matrix of n by 4).
To do an anova I want to change this to an n by 2 layout, with all the factor labels in column A and all the data in column B.
I could easily cut and paste this in Excel, then back into a csv, but I assume there is a way to do this with cbind.
Sample data:
A B C D
2 4 6 8
3 5 7 9
What I require:
A 2
A 3
B 4
B 5
C 6
C 7
D 8
D 9
You should use stack:
stack(df) # where `df` is your data.frame
stack is better here but also:
library(reshape2)
melt(df)
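As a sketch of how stack() feeds into the anova mentioned in the question, using the sample data (the column names value and group, and the object names long and fit, are choices made here for illustration):

```r
# Sample data from the question: one column per factor level
df <- data.frame(A = c(2, 3), B = c(4, 5), C = c(6, 7), D = c(8, 9))

# stack() returns two columns: `values` (the data) and `ind` (the source column)
long <- stack(df)
names(long) <- c("value", "group")
long

# One-way anova on the long format
fit <- aov(value ~ group, data = long)
summary(fit)
```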

How to overlay multiple time series given condition(s) in lattice?

Suppose I have a data frame, df, that looks like:
f t1 t2 t3
h 1 3 4
h 2 4 3
t 3 4 5
t 5 6 8
with f being a factor and $t attributes being numerical values related to time ordered events.
I could overlay time series t1 to t3 using par(new=T) and isolate by factor manually.
But I wonder if there is some way to do this with lattice, where the overlaid time series
are conditioned by the factor. So we would have two panels, with overlaid time series corresponding to conditional factors, f. Most examples I've seen only use one time series (vector) per factor. I also thought about using a parallel plot, but time information is lost.
I've also tried something like
xyplot(df$t1+df$t2+df$t3 ~seq(3) | factor(df$f))
, but it loses row sequence connections. Anyone know if this is possible?
Here's a very crude illustration using a non-lattice approach.
x<-matrix(seq(12),4,3)
f<-c('a','a','b','b')
df<-data.frame(f,x)
layout(1:2); yr<-c(0,12); xr<-c(1,3);
plot(as.numeric(df[1,2:4])~seq(3),type='o',ylim=yr,xlim=xr,ylab='A')
par(new=T)
plot(as.numeric(df[2,2:4])~seq(3),type='o',ylim=yr,xlim=xr,ylab='A')
plot(as.numeric(df[3,2:4])~seq(3), type='o',ylim=yr,xlim=xr,ylab='B')
par(new=T)
plot(as.numeric(df[4,2:4])~seq(3),type='o',ylim=yr,xlim=xr,ylab='B')
I added an ID variable and melted with package:reshape2
dat
f t1 t2 t3 ID
1 h 1 3 4 1
2 h 2 4 3 2
3 t 3 4 5 3
4 t 5 6 8 4
library(reshape2)
datm <- melt(dat, id.vars=c("ID","f"), measure.vars=c("t1", "t2", "t3"))
datm
ID f variable value
1 1 h t1 1
2 2 h t1 2
3 3 t t1 3
4 4 t t1 5
5 1 h t2 3
6 2 h t2 4
7 3 t t2 4
8 4 t t2 6
9 1 h t3 4
10 2 h t3 3
11 3 t t3 5
12 4 t t3 8
Since you asked to have it "overlaid", I used the groups parameter to keep the IDs separate and the "|" operator to give you the two panels for "h" and "t":
library(lattice)
xyplot(value~variable|f, groups=ID, data=datm, type="b")
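For anyone who wants to run this end to end without reshape2, here is a self-contained sketch that builds the same long layout in base R from the question's sample data. The name long is introduced here for illustration, and the ID column is simply the row number.

```r
library(lattice)

# Sample data from the question, plus a row ID
dat <- data.frame(
  f  = c("h", "h", "t", "t"),
  t1 = c(1, 2, 3, 5),
  t2 = c(3, 4, 4, 6),
  t3 = c(4, 3, 5, 8)
)
dat$ID <- seq_len(nrow(dat))

# Long layout: one row per (ID, time point), same shape as the melted data
long <- data.frame(
  ID       = rep(dat$ID, times = 3),
  f        = rep(dat$f, times = 3),
  variable = factor(rep(c("t1", "t2", "t3"), each = nrow(dat))),
  value    = c(dat$t1, dat$t2, dat$t3)
)

# One panel per level of f; groups = ID draws each original row as its own line
p <- xyplot(value ~ variable | f, groups = ID, data = long, type = "b")
print(p)
```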
(1) This can be done compactly using xyplot.zoo . The first statement converts the data frame to a zoo series (series are stored in columns in zoo objects) and the second statement plots it such that the screen argument defines which panel each series is shown in:
library(zoo)
library(lattice)
z <- zoo(t(df[-1]))
xyplot(z, screen = df$f, type = "o")
(2) or, if it is desired to show df's column names on the X axis instead, define z as follows (and then issue the xyplot command above):
z <- zoo(t(df[-1]), factor(names(df[-1])))
xyplot using the z in the first point looks like this (and the second is the same except for the X axis labels):
EDIT: simplified (2)
