How to overlay multiple time series, conditioned on a factor, in lattice? - r

Suppose I have a data frame, df, that looks like:
f t1 t2 t3
h 1 3 4
h 2 4 3
t 3 4 5
t 5 6 8
where f is a factor and t1, t2 and t3 are numeric values for time-ordered events.
I could overlay the time series t1 to t3 using par(new=TRUE) and split by factor manually.
But I wonder if there is some way to do this with lattice, where the overlaid time series are conditioned on the factor. So we would have two panels, each overlaying the time series for one level of f. Most examples I've seen only use one time series (vector) per factor level. I also thought about using a parallel plot, but the time information is lost.
I've also tried something like
xyplot(df$t1+df$t2+df$t3 ~seq(3) | factor(df$f))
but it loses the connections between values from the same row. Does anyone know if this is possible?
Here's a very crude illustration using a non-lattice (base graphics) approach:
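x <- matrix(seq(12), 4, 3)      # 4 rows (series) x 3 time points
f <- c('a', 'a', 'b', 'b')      # factor: rows 1-2 go in panel A, rows 3-4 in panel B
df <- data.frame(f, x)
layout(1:2)                     # two panels stacked vertically
yr <- c(0, 12); xr <- c(1, 3)   # fixed axis ranges so the overlaid plots line up
# Panel A: draw row 1, then overlay row 2 on the same axes
plot(as.numeric(df[1, 2:4]) ~ seq(3), type = 'o', ylim = yr, xlim = xr, ylab = 'A')
par(new = TRUE)
plot(as.numeric(df[2, 2:4]) ~ seq(3), type = 'o', ylim = yr, xlim = xr, ylab = 'A')
# Panel B: the same for rows 3 and 4
plot(as.numeric(df[3, 2:4]) ~ seq(3), type = 'o', ylim = yr, xlim = xr, ylab = 'B')
par(new = TRUE)
plot(as.numeric(df[4, 2:4]) ~ seq(3), type = 'o', ylim = yr, xlim = xr, ylab = 'B')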
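The par(new=TRUE) overlay above can be tightened with base R's matplot(), which draws each column of a matrix as its own line on one set of axes. A minimal sketch, assuming (as in the illustration) that the first column holds the factor and the remaining columns are the ordered time points; plot_overlaid() is a hypothetical helper name, not an existing function:

```r
# Hypothetical helper: one panel per level of the first column, with every
# row of that level drawn as a separate series (no par(new = TRUE) needed).
# Assumes column 1 is the factor and columns 2..n are ordered time points.
plot_overlaid <- function(df) {
  levs <- unique(df[[1]])
  old <- par(mfrow = c(length(levs), 1))    # stack the panels vertically
  on.exit(par(old))                         # restore graphics settings
  yr <- range(as.matrix(df[-1]))            # shared y range across panels
  for (lev in levs) {
    # transpose so that each column of m is one row (series) of df
    m <- t(as.matrix(df[df[[1]] == lev, -1]))
    matplot(seq_len(nrow(m)), m, type = "o", pch = 1, lty = 1,
            ylim = yr, xlab = "time", ylab = as.character(lev))
  }
  invisible(levs)
}

# Same toy data as the crude illustration
x <- matrix(seq(12), 4, 3)
f <- c("a", "a", "b", "b")
df <- data.frame(f, x)
```

Calling plot_overlaid(df) draws panel a with rows 1 and 2 and panel b with rows 3 and 4, each series in its own colour.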

I added an ID variable and melted the data with the reshape2 package:
> dat
f t1 t2 t3 ID
1 h 1 3 4 1
2 h 2 4 3 2
3 t 3 4 5 3
4 t 5 6 8 4
datm <- melt(dat, id.vars=c("ID","f"), measure.vars=c("t1", "t2", "t3"))
> datm
ID f variable value
1 1 h t1 1
2 2 h t1 2
3 3 t t1 3
4 4 t t1 5
5 1 h t2 3
6 2 h t2 4
7 3 t t2 4
8 4 t t2 6
9 1 h t3 4
10 2 h t3 3
11 3 t t3 5
12 4 t t3 8
Since you asked to have it "overlaid", I used the groups argument to keep the IDs separate and the "|" operator to give you the two panels for "h" and "t":
xyplot(value ~ variable | f, groups = ID, data = datm, type = "b")
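If reshape2 is not installed, the same long layout can be built with base R's stack(). A minimal sketch, assuming the example data from the question; note that stack() names its output columns values and ind rather than melt()'s value and variable:

```r
library(lattice)  # recommended package, shipped with standard R installs

# Example data from the question, plus one ID per row (= one series per row)
dat <- data.frame(f  = factor(c("h", "h", "t", "t")),
                  t1 = c(1, 2, 3, 5),
                  t2 = c(3, 4, 4, 6),
                  t3 = c(4, 3, 5, 8))
dat$ID <- seq_len(nrow(dat))

# Base-R stand-in for melt(): stack the time columns into long form.
# stack() returns the columns 'values' and 'ind' (ind = t1, t2 or t3).
tcols <- c("t1", "t2", "t3")
long <- data.frame(ID = rep(dat$ID, times = length(tcols)),
                   f  = rep(dat$f,  times = length(tcols)),
                   stack(dat[tcols]))

# One panel per level of f; groups = ID draws each row as its own line
p <- xyplot(values ~ ind | f, groups = ID, data = long, type = "b")
```

Printing p (or typing p at the console) draws the two panels, with each ID connected across t1, t2 and t3.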

(1) This can be done compactly using xyplot.zoo. The first statement converts the data frame to a zoo series (zoo objects store each series in a column) and the second statement plots it, with the screen argument defining which panel each series is shown in:
library(zoo)
library(lattice)
z <- zoo(t(df[-1]))
xyplot(z, screen = df$f, type = "o")
(2) Alternatively, to show df's column names on the X axis, define z as follows (and then issue the xyplot command above):
z <- zoo(t(df[-1]), factor(names(df[-1])))
[Figure: xyplot of z from (1); the plot for (2) is the same except for the X axis labels]

Related

R plotly: Customize x-axis values in box plot

I have a data frame with 3 variables and 260 rows (sample below):
HouseID<-c(1:10)
Town<-c("D","A","B","C","A","B","C","C","C","A")
Occupants<-c(5,3,2,4,5,2,3,8,1,3)
df<-data.frame(HouseID,Town,Occupants)
HouseID Town Occupants
1 D 5
2 A 3
3 B 2
4 C 4
5 A 5
6 B 2
7 C 3
8 C 8
9 C 1
10 A 3
I want to create a box plot of the distribution of Occupants, with the x-axis ordered by the descending frequency of Towns:
Town Freq
A 3
B 2
C 4
D 1
I tried sorting the data frame, but the box plot's x-axis is still ordered alphabetically by default. Is there a way I could do this?
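You simply have to use factor to reorder the levels of df$Town by their counts. Since R 4.0, data.frame() keeps Town as character (so levels() returns NULL), so take the level order from table() instead:
library(plotly)
df$Town <- factor(df$Town, levels = names(sort(table(df$Town), decreasing = TRUE)))
plot_ly(df, x = ~Town, y = ~Occupants, type = "box")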
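Base R's reorder() is another way to get the same level order: it sorts the levels of a factor by a summary statistic of another vector, computed per level. A minimal sketch using the sample data from the question, with the per-town row count negated so that the most frequent town comes first:

```r
# Sample data from the question
HouseID   <- 1:10
Town      <- c("D", "A", "B", "C", "A", "B", "C", "C", "C", "A")
Occupants <- c(5, 3, 2, 4, 5, 2, 3, 8, 1, 3)
df <- data.frame(HouseID, Town, Occupants)

# reorder() sorts levels by increasing FUN value, so negating the
# per-town row count puts the most frequent town first
df$Town <- reorder(df$Town, df$Occupants, FUN = function(v) -length(v))
levels(df$Town)
# [1] "C" "A" "B" "D"
```

As in the answer above, plot_ly() then draws the boxes in level order; base boxplot(Occupants ~ Town, data = df) follows the level order too.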

Reformatting data in order to plot 2D continuous heatmap

I have data stored in a data.frame that I would like to plot as a continuous heat map. I have tried using the interp function from the akima package, but the data can be very large (2 million rows), so I would like to avoid it if possible, as it takes a very long time. Here is the format of my data:
l1 <- c(1,2,3)
grid1 <- expand.grid(l1, l1)
lprobdens <- c(0,2,4,2,8,10,4,8,2)
df <- cbind(grid1, lprobdens)
colnames(df) <- c("age1", "age2", "probdens")
age1 age2 probdens
1 1 0
2 1 2
3 1 4
1 2 2
2 2 8
3 2 10
1 3 4
2 3 8
3 3 2
I would like to format it as a length(unique(df$age1)) x length(unique(df$age2)) matrix. I gather that once it is in this form I could use basic functions such as image to plot a continuous 2D heat map similar to the one created with the akima package. Here is how I think the transformed data should look; please correct me if I am wrong:
1 2 3
1 0 2 4
2 2 8 8
3 4 10 2
It seems as though ldply could do this, but I can't work out how it works.
I forgot to mention: the age information is always continuous and regular, such that the values of age1 are the same as those of age2, but age1 >= age2. I guess this means the data may be classed as continuous as it stands and doesn't require the interp function.
OK, I think I get what you want. It's just a matter of reshaping the data with reshape2's dcast function. The value.var argument only avoids the message saying that R had to guess which column to use as the value; the result does not change if you omit it.
library(reshape2)
as.matrix(dcast(df, age1 ~ age2, value.var = "probdens")[-1])
1 2 3
[1,] 0 2 4
[2,] 2 8 8
[3,] 4 10 2
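If reshape2 is not available, base R's tapply() produces the same matrix and does not depend on the row order of the input. A minimal sketch using the sample data from the question; the choice of sum() as the summary function is an assumption that only matters if an (age1, age2) pair could occur more than once:

```r
# Sample data from the question
l1 <- c(1, 2, 3)
grid1 <- expand.grid(age1 = l1, age2 = l1)
df <- data.frame(grid1, probdens = c(0, 2, 4, 2, 8, 10, 4, 8, 2))

# Rows indexed by age1, columns by age2, cells = probdens.
# sum() is only a placeholder: each (age1, age2) pair occurs once here.
m <- tapply(df$probdens, df[c("age1", "age2")], sum)
m
```

image(as.numeric(rownames(m)), as.numeric(colnames(m)), m) then draws m as a heat map with the ages on the axes.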

R ggplot2 number of rows of the same values in a column

I'm new to R and plotting in R. This might be a very simple question, but here it is.
Suppose I have a data frame like this:
a b c d
1 5 6 7
2 3 5 7
1 4 6 2
2 3 5 NA
1 4 4 2
2 2 4 2
1 2 5 1
2 3 4 NA
Here a, b, c and d are column names. I want to plot a bar chart with the values of column d on the x axis and the number of rows with that value on the y axis. So 7 has 2 rows, 1 has 1 and 2 has 3. There is no need to include the values in between that don't occur (3, 4, 5, 6).
So the result would be something like a histogram. I know I can count the values in column d first and then plot them, but I feel there must be a better way to do this.
Here's an approach. If I understand your question, columns a, b and c are irrelevant to what you are doing, which is plotting the frequencies of column d (the data frame is assumed to be named test):
library(ggplot2)
library(reshape)
##get frequencies of col d
test.summary<-table(test$d)
## re-shape the data
test.summary.m<-melt(test.summary)
ggplot(test.summary.m,aes(x=as.factor(Var.1),y=value))+
geom_bar(stat='identity')
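A base R sketch of the counting step, assuming the data frame is named test as in the answer: table() does the counting (and drops NA by default), and as.data.frame() turns the counts into a Var1/Freq data frame that geom_bar(stat = 'identity') can use directly.

```r
# Data from the question
test <- data.frame(a = c(1, 2, 1, 2, 1, 2, 1, 2),
                   b = c(5, 3, 4, 3, 4, 2, 2, 3),
                   c = c(6, 5, 6, 5, 4, 4, 5, 4),
                   d = c(7, 7, 2, NA, 2, 2, 1, NA))

counts <- table(test$d)          # NA values are excluded by default
freq <- as.data.frame(counts)    # columns: Var1 (value of d), Freq (row count)
freq
```

barplot(counts) draws the chart in base graphics. In ggplot2, geom_bar() with its default stat = "count" does the counting itself, e.g. ggplot(subset(test, !is.na(d)), aes(x = factor(d))) + geom_bar().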

R combine nx4 into nx2

I have a dataset with one factor (4 levels). However, each factor level's data is currently in its own column, with the level label at the top (an n by 4 matrix).
To do an ANOVA I want to change this to an n by 2 layout, with all the factor labels in column A and all the data in column B.
I could easily cut and paste this in Excel and then save it back to a CSV, but I assume there is a way to do this with cbind.
Sample data:
A B C D
2 4 6 8
3 5 7 9
What I require:
A 2
A 3
B 4
B 5
C 6
C 7
D 8
D 9
You should use stack:
stack(df) # where `df` is your data.frame
stack is better here, but this also works:
library(reshape2)
melt(df)
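Since the goal is an ANOVA, here is a minimal sketch of the whole round trip with the sample data: stack() yields a values column and a factor ind holding the original column names, which can go straight into aov(). Putting the label column first is only cosmetic.

```r
# Sample data from the question: one column per factor level
df <- data.frame(A = c(2, 3), B = c(4, 5), C = c(6, 7), D = c(8, 9))

long <- stack(df)                 # columns: values, ind (factor A-D)
long <- long[c("ind", "values")]  # label first, as in the desired layout

# One-way ANOVA of the values across the four levels
fit <- aov(values ~ ind, data = long)
summary(fit)
```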

sort and number within levels of a factor in r

If I have the following data frame G:
z type x
1 a 4
2 a 5
3 a 6
4 b 1
5 b 0.9
6 c 4
I am trying to get:
z type x y
3 a 6 3
2 a 5 2
1 a 4 1
4 b 1 2
5 b 0.9 1
6 c 4 1
I.e. I want to sort the whole data frame by x within each level of the factor type, get the length of each level (a = 3, b = 2, c = 1), and then number the rows in decreasing order in a new vector y.
My starting point is currently sort():
tapply(G$x, G$type, sort)
Would it be best to split everything first, e.g. with sapply?
There are many ways to skin this cat. Here is one solution using base R and vectorized code in two steps (without any apply):
1. Sort the data using order and xtfrm.
2. Use rle and sequence to generate the sequence.
Replicate your data:
dat <- read.table(text="
z type x
1 a 4
2 a 5
3 a 6
4 b 1
5 b 0.9
6 c 4
", header=TRUE, stringsAsFactors=FALSE)
Two lines of code:
r <- dat[order(dat$type, -xtfrm(dat$x)), ]
r$y <- sequence(rle(r$type)$lengths)
Results in:
r
z type x y
3 3 a 6.0 1
2 2 a 5.0 2
1 1 a 4.0 3
4 4 b 1.0 1
5 5 b 0.9 2
6 6 c 4.0 1
The call to order is slightly complicated. Since you are sorting one column in ascending order and a second in descending order, use the helper function xtfrm. See ?xtfrm for details, but it is also described in ?order.
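Note that the result above numbers each group from 1 at its largest x, whereas the question asked for y to count down from the group size (3, 2, 1 for type a). A minimal sketch that matches the requested output, swapping the rle()/sequence() step for base R's ave() and using -x directly, which works because x is numeric:

```r
dat <- read.table(text = "
z type x
1 a 4
2 a 5
3 a 6
4 b 1
5 b 0.9
6 c 4
", header = TRUE, stringsAsFactors = FALSE)

# Sort by type (ascending), then by x (descending) within each type
r <- dat[order(dat$type, -dat$x), ]

# Within each type, number the rows from the group size down to 1
r$y <- ave(seq_along(r$type), r$type, FUN = function(i) rev(seq_along(i)))
r
```

Keeping the rle() approach instead, rev(sequence(rev(rle(r$type)$lengths))) gives the same 3, 2, 1, 2, 1, 1.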
I like Andrie's answer better, but here is another way:
dat <- read.table(text="z type x
1 a 4
2 a 5
3 a 6
4 b 1
5 b 0.9
6 c 4", header=TRUE)
Three lines of code:
dat <- dat[order(dat$type, -dat$x), ]  # sort by type, then by x descending
x <- by(dat, dat$type, nrow)
dat$y <- unlist(sapply(x, function(z) z:1))
I edited my response to address the comments Andrie mentioned. This works, but if you went this route instead of Andrie's, you're crazy.
