I have a matrix:
mat<-matrix(NA, ncol=7,nrow=9)
mat[,1]<-c(0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9)
mat[,2]<-c(2,4,5,6,7,7,7,8,9)
mat[,3]<-c(2,48,63,72,81,100,100,100,100)
mat[,4]<-c(1,2,3,3,4,4,5,5,6)
mat[,5]<-c(1,2,6,7,8,8,9,10,10)
mat[,6]<-c(1,1,1,2,3,3,4,4,4)
mat[,7]<-c(1,1,1,3,4,4,4,5,5)
colnames(mat)<-c("facet","A1","A2","B1","B2","C1","C2")
facet A1 A2 B1 B2 C1 C2
[1,] 0.1 2 2 1 1 1 1
[2,] 0.2 4 48 2 2 1 1
[3,] 0.3 5 63 3 6 1 1
[4,] 0.4 6 72 3 7 2 3
[5,] 0.5 7 81 4 8 3 4
[6,] 0.6 7 100 4 8 3 4
[7,] 0.7 7 100 5 9 4 4
[8,] 0.8 8 100 5 10 4 5
[9,] 0.9 9 100 6 10 4 5
I would like to create the following plot:
Create 9 separate plots faceted by "facet".
Each plot should contain the following:
on the same position on the x axis plot A1 and A2 using points, i.e. (X=1, y=A1) and (X=1,y=A2)
on the same position on the x axis plot B1 and B2 using points, i.e. (X=2, y=B1) and (X=2,y=B2)
on the same position on the x axis plot C1 and C2 using points, i.e. (X=3, y=C1) and (X=3,y=C2)
How can this be done? I understand how to do faceting, but I'm struggling with plotting the two values at the same position on the x axis and repeating this for each of A, B and C. Can someone help?
First, reshape your matrix to a data frame in the long format:
library(reshape2)
dat <- melt(as.data.frame(mat), id.vars = "facet")
> head(dat)
# facet variable value
# 1 0.1 A1 2
# 2 0.2 A1 4
# 3 0.3 A1 5
# 4 0.4 A1 6
# 5 0.5 A1 7
# 6 0.6 A1 7
Then, create two variables based on the information in the column variable:
dat2 <- transform(dat, fac = substr(variable, 2, 2),
variable = substr(variable, 1, 1))
> head(dat2)
# facet variable value fac
# 1 0.1 A 2 1
# 2 0.2 A 4 1
# 3 0.3 A 5 1
# 4 0.4 A 6 1
# 5 0.5 A 7 1
# 6 0.6 A 7 1
Plot:
library(ggplot2)
ggplot(dat2, aes(x = variable, y = value)) +
geom_point(aes(colour = fac)) +
facet_wrap( ~ facet)
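As a cross-check on the reshaping step, the same long table can be built with base R only. This is a minimal sketch, not part of the answer above: the names value_cols, long, group and series are made up for illustration, and the matrix is rebuilt from the question so the block runs on its own.

```r
# Rebuild the matrix from the question.
mat <- matrix(NA, ncol = 7, nrow = 9)
mat[, 1] <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
mat[, 2] <- c(2, 4, 5, 6, 7, 7, 7, 8, 9)
mat[, 3] <- c(2, 48, 63, 72, 81, 100, 100, 100, 100)
mat[, 4] <- c(1, 2, 3, 3, 4, 4, 5, 5, 6)
mat[, 5] <- c(1, 2, 6, 7, 8, 8, 9, 10, 10)
mat[, 6] <- c(1, 1, 1, 2, 3, 3, 4, 4, 4)
mat[, 7] <- c(1, 1, 1, 3, 4, 4, 4, 5, 5)
colnames(mat) <- c("facet", "A1", "A2", "B1", "B2", "C1", "C2")

# Stack the six value columns on top of each other (column-major order).
value_cols <- colnames(mat)[-1]
long <- data.frame(
  facet = rep(mat[, "facet"], times = length(value_cols)),
  name  = rep(value_cols, each = nrow(mat)),
  value = c(mat[, value_cols]),
  stringsAsFactors = FALSE
)

# Split "A1" into the x position ("A") and the series ("1").
long$group  <- substr(long$name, 1, 1)
long$series <- substr(long$name, 2, 2)
```

The resulting long data frame has one row per point, so group can be mapped to x and series to colour in the same way as variable and fac above.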
a <- cbind(mat[, 1], mat[, 2], 1, 1)
b <- cbind(mat[, 1], mat[, 3], 1, 2)
c <- cbind(mat[, 1], mat[, 4], 2, 1)
d <- cbind(mat[, 1], mat[, 5], 2, 2)
e <- cbind(mat[, 1], mat[, 6], 3, 1)
f <- cbind(mat[, 1], mat[, 7], 3, 2)
data <- as.data.frame(rbind(a, b, c, d, e, f))
colnames(data) <- c("facet", "value", "type", "time")
data$type <- factor(data$type, labels = c("A", "B", "C"))
ggplot(data, aes(y = value, x = type, fill = factor(time))) +
geom_point(aes(color = factor(time)),
position = position_jitter(width = 0.1, height = 0)) +
facet_wrap(~facet)
I would like to create a function in R that rounds numeric dataframes (or columns in a dataframe) depending on the number. If the number is less than 1, round to 1 decimal, but if it is greater than 1, round to 0 decimals.
This is what I have
data <- data.frame(x = c(1.111, 0.809, 5.55555, 0.567), y = c(0.235, 0.777, 4.55555555, 393.55))
round0 <- function(x) format(round(x, digits=0), nsmall = 0, trim = TRUE)
round0(data)
x y
1 1 0
2 1 1
3 6 5
4 1 394
# What I want
x y
1 1 0.2
2 0.8 0.8
3 6 5
4 0.6 394
> round0 <- function(x) ifelse(x<1,round(x,1),round(x))
> sapply(data,round0)
x y
[1,] 1.0 0.2
[2,] 0.8 0.8
[3,] 6.0 5.0
[4,] 0.6 394.0
You can use :
round0 <- function(x) ifelse(x < 1, format(round(x, 1), nsmall = 1, trim = TRUE), round(x))
data[] <- lapply(data, round0)
data
# x y
#1 1 0.2
#2 0.8 0.8
#3 6 5
#4 0.6 394
Note that this is for display purposes only: the resulting columns are of class character. If you want to perform any mathematical calculation on them, you need to convert them back to numeric first.
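If the columns need to stay numeric, one option is to skip format() entirely and apply the rounding rule column by column. This is a small sketch under that assumption; round_num and rounded are names invented here, and trailing zeros are then left to the default printing rather than controlled explicitly.

```r
data <- data.frame(x = c(1.111, 0.809, 5.55555, 0.567),
                   y = c(0.235, 0.777, 4.55555555, 393.55))

# Round to 1 decimal below 1, to 0 decimals otherwise; the result stays numeric.
round_num <- function(x) ifelse(x < 1, round(x, 1), round(x))

rounded <- data
rounded[] <- lapply(data, round_num)  # `[]` keeps the data frame structure

sapply(rounded, class)
```

Because the columns are still numeric, sums or means can be computed on rounded directly, and format() can be applied later only where a fixed display is needed.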
Recently, I have found that I am using the following pattern over and over again. The process is:
cross-tabulate numeric variable by factor using table
create data frame from created table
add original numeric values to data frame (from row names (!))
remove row names
reorder columns of aggregated data frame
In R, it looks like this:
# Sample data
df <- data.frame(x = round(runif(100), 1),
y = factor(ifelse(runif(100) > .5, 1, 0),
labels = c('failure', 'success'))
)
# Get frequencies
dfSummary <- as.data.frame.matrix(table(df$x, df$y))
# Add column of original values from rownames
dfSummary$x <- as.numeric(rownames(dfSummary))
# Remove rownames
rownames(dfSummary) <- NULL
# Reorder columns
dfSummary <- dfSummary[, c(3, 1, 2)]
Is there anything more elegant in R, preferably using base functions? I know I can use SQL to do this in a single command, so I think it must be possible to achieve similar behavior in R.
sqldf solution:
library(sqldf)
dfSummary <- sqldf("select
x,
sum(y = 'failure') as failure,
sum(y = 'success') as success
from df group by x")
An alternative with base R could be:
aggregate(. ~ x, transform(df, success = y == "success",
failure = y == "failure", y = NULL), sum)
# x success failure
#1 0.0 2 4
#2 0.1 6 8
#3 0.2 1 7
#4 0.3 5 4
#5 0.4 6 6
#6 0.5 3 3
#7 0.6 4 6
#8 0.7 6 6
#9 0.8 4 5
#10 0.9 6 7
#11 1.0 1 0
Your code, wrapped in a function, would be efficient compared to the other base R solutions (so far). If you want the code on one line, a "reshape/table" combination from base R could be used.
reshape(as.data.frame(table(df)), idvar='x', timevar='y',
direction='wide')
# x Freq.failure Freq.success
#1 0 3 2
#2 0.1 3 9
#3 0.2 5 5
#4 0.3 8 7
#5 0.4 5 3
#6 0.5 9 4
#7 0.6 3 6
#8 0.7 7 6
#9 0.8 3 1
#10 0.9 4 3
#11 1 0 4
In case you want to try data.table
library(data.table)
dcast.data.table(setDT(df), x~y)
# x failure success
# 1: 0.0 3 2
# 2: 0.1 3 9
# 3: 0.2 5 5
# 4: 0.3 8 7
# 5: 0.4 5 3
# 6: 0.5 9 4
# 7: 0.6 3 6
# 8: 0.7 7 6
# 9: 0.8 3 1
#10: 0.9 4 3
#11: 1.0 0 4
Update
I didn't notice that as.data.frame(table(...)) converts the columns to factors (thanks to @Hadley's comment). A workaround is:
res <- transform(reshape(as.data.frame(table(df), stringsAsFactors=FALSE),
idvar='x', timevar='y', direction='wide'), x= as.numeric(x))
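The reason the conversion matters can be seen in a short sketch. It uses the same sample data as this answer; the names tab, codes and values are made up for illustration.

```r
set.seed(24)
df <- data.frame(x = round(runif(100), 1),
                 y = factor(ifelse(runif(100) > .5, 1, 0),
                            labels = c("failure", "success")))

tab <- as.data.frame(table(df))
class(tab$x)  # "factor"

# as.numeric() on a factor returns the internal level codes (1, 2, 3, ...),
# not the values the levels stand for.
codes <- as.numeric(tab$x)

# Going through character recovers the original numbers.
values <- as.numeric(as.character(tab$x))
```

Passing stringsAsFactors = FALSE, as in the workaround above, avoids the factor in the first place, so a plain as.numeric(x) is enough there.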
data
set.seed(24)
df <- data.frame(x = round(runif(100), 1),
y = factor(ifelse(runif(100) > .5, 1, 0),
labels = c('failure', 'success'))
)
Benchmarks
set.seed(24)
df <- data.frame(x = round(runif(1e6), 1),
y = factor(ifelse(runif(1e6) > .5, 1, 0),
labels = c('failure', 'success'))
)
tomas <- function(){
dfSummary <- as.data.frame.matrix(table(df$x, df$y))
dfSummary$x <- as.numeric(rownames(dfSummary))
dfSummary <- dfSummary[, c(3, 1, 2)]}
doc <- function(){aggregate(. ~ x, transform(df,
success = y == "success", failure = y == "failure",
y = NULL), sum)}
akrun <- function(){reshape(as.data.frame(table(df)),
idvar='x', timevar='y', direction='wide')}
library(microbenchmark)
microbenchmark(tomas(), doc(), akrun(), unit='relative', times=20L)
Unit: relative
#expr min lq mean median uq max neval cld
#tomas() 1.000000 1.0000000 1.000000 1.000000 1.0000000 1.000000 20 a
#doc() 13.451037 11.5050997 13.082074 13.043584 12.8048306 19.715535 20 b
#akrun() 1.019977 0.9522809 1.012332 1.007569 0.9993835 1.533191 20 a
Updated with dcast.data.table
df1 <- copy(df)
akrun2 <- function() {dcast.data.table(setDT(df1), x~y)}
microbenchmark(tomas(), akrun2(), unit='relative', times=20L)
# Unit: relative
# expr min lq mean median uq max neval cld
# tomas() 6.493231 6.345752 6.410853 6.51594 6.502044 5.591753 20 b
# akrun2() 1.000000 1.000000 1.000000 1.00000 1.000000 1.000000 20 a
This should be relatively efficient. You cannot really suppress row names in a data frame, since they are a requirement of a valid data frame:
X <- table(df$x,df$y)
cbind( data.frame(x=rownames(X)), unclass(X) )
x failure success
0 0 5 3
0.1 0.1 6 1
0.2 0.2 7 8
0.3 0.3 7 3
0.4 0.4 6 6
0.5 0.5 6 4
0.6 0.6 2 5
0.7 0.7 2 7
0.8 0.8 3 7
0.9 0.9 4 6
1 1 2 0
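A possible variation, offered only as a sketch: building the result with data.frame() and row.names = NULL replaces the table's row names with the default integer sequence, and as.numeric() restores x as a number. It assumes the same sample data as the other answers (seed 24); dfSummary is the asker's name, the rest follows the code above.

```r
set.seed(24)
df <- data.frame(x = round(runif(100), 1),
                 y = factor(ifelse(runif(100) > .5, 1, 0),
                            labels = c("failure", "success")))

X <- table(df$x, df$y)

# unclass() turns the table into a plain integer matrix whose columns
# become the failure/success columns; row.names = NULL drops the labels.
dfSummary <- data.frame(x = as.numeric(rownames(X)),
                        unclass(X),
                        row.names = NULL)
```

The row names still exist, as noted above, but they are now the automatic 1, 2, 3, ... sequence rather than copies of x.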
I want to create a matrix of similarities based on two identifiers. Consider the following data frame:
x1 <- c(2,2,2,3,1,2,4,6,4)
y1 <- c(5,4,3,3,4,2,1,6,3)
x2 <- c(8,2,7,3,1,2,2,2,6)
y2 <- c(1,3,3,3,1,2,4,3,8)
x3 <- c(4,4,1,2,4,6,3,2,9)
y3 <- c(1,2,3,3,1,2,4,6,1)
id1 <- c("a","a","a","a","b","b","b","b","b")
id2 <- c(2002,2002,2003,2003,2002,2002,2003,2003,2003)
dat <- data.frame(x1,y1,x2,y2,x3,y3,id1,id2)
For the groups marked by id1 and id2, I want to compute the Euclidean distance (sqrt((x1a-x1b)^2+(y1a-y1b)^2 + ... + (y3a-y3b)^2)) between the rows of the dataset. In the best case, there would be a new variable that gives the distance from each row to each other row with the same id1 and id2. Please note that groups can have different numbers of members; for instance, the b-group has three cases in 2003.
Any advice would be great!!!
I think it would be a good idea first to distinguish the lines whose distances you want to calculate. For example, for id1 == b and id2 == 2003 you have 3 lines, and you want to calculate 3 different distances (between each possible pair). So let's first assign each of these a unique id.
f <- function(n) {
# Returns a vector
# 1, 2, 1, 3, ..., 1, n, 2, 3, 2, 4, ..., 2, n, ..., (n-1), n
m <- matrix(ncol = 2, nrow = n * (n-1) / 2)
m[, 1] <- rep(1:(n-1), (n-1):1)
m[, 2] <- unlist(lapply(2:n, function(x) x:n))
as.numeric(t(m))
}
# Alternatively,
# f <- function(n) {
# d <- expand.grid(a = 1:n, b = 1:n)
# d <- d[d$a < d$b, ]
# as.numeric(t(d))
# }
# but this is slower
# Using plyr...
library(plyr)
dat <- ddply(dat, .(id1, id2), function(d) {
d <- d[f(nrow(d)), ]
d$id3 <- paste0(d$id1, rep(1:(nrow(d) / 2), each = 2))
d
})
# ...or using base R
dat <- do.call(rbind,
by(dat, list(dat$id1, dat$id2), function(d) {
d <- d[f(nrow(d)), ]
d$id3 <- paste0(d$id1, rep(1:(nrow(d) / 2), each = 2))
d
}))
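The pair index that f(n) builds by hand can also be obtained from combn(), which enumerates all 2-element combinations. This is a sketch of an alternative, not the answer's method; pair_index is a hypothetical name, and like f() it assumes every group has at least two rows.

```r
# All pairs (i, j) with i < j, flattened as i1, j1, i2, j2, ...
# combn(n, 2) returns a 2 x choose(n, 2) matrix, one pair per column,
# so reading it column by column gives the interleaved vector.
pair_index <- function(n) as.numeric(utils::combn(n, 2))

pair_index(3)
pair_index(4)
```

Used in place of f(nrow(d)), this duplicates the rows of each group so that consecutive rows form one pair, which is what the id3 construction above relies on.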
Now there will only be two lines for each (id3, id2) pair and you can calculate the differences as follows
# Using plyr
result <- ddply(dat, .(id3, id2), function(d) {
d <- d[paste0(rep(c("x", "y"), 3), 1:3)]
d$dist <- sqrt(sum((d[1, ] - d[2, ])^2))
d
})
# Base R
result <- do.call(rbind,
by(dat[paste0(rep(c("x", "y"), 3), 1:3)],
list(dat$id3, dat$id2),
function(d){
d$dist <- sqrt(sum((d[1, ] - d[2, ])^2))
d
}
))
result[c("id3", "id2")] <- dat[c("id3", "id2")]
result
# x1 y2 x3 y1 x2 y3 dist id3 id2
# 1 2 1 4 5 8 1 6.480741 a1 2002
# 2 2 3 4 4 2 2 6.480741 a1 2002
# 5 1 1 4 4 1 1 3.464102 b1 2002
# 6 2 2 6 2 2 2 3.464102 b1 2002
# 3 2 3 1 3 7 3 4.242641 a1 2003
# 4 3 3 2 3 3 3 4.242641 a1 2003
# 7 4 4 3 1 2 4 5.916080 b1 2003
# 8 6 3 2 6 2 6 5.916080 b1 2003
# 7.1 4 4 3 1 2 4 9.000000 b2 2003
# 9 4 8 9 3 6 1 9.000000 b2 2003
# 8.1 6 3 2 6 2 6 11.313708 b3 2003
# 9.1 4 8 9 3 6 1 11.313708 b3 2003
Maybe this could be helpful. Only the six numeric columns are passed to dist, since the id columns are not coordinates:
dist(dat[dat$id1 == "a" & dat$id2 == 2002, 1:6], method = "euclidean")
dist(dat[dat$id1 == "b" & dat$id2 == 2003, 1:6], method = "euclidean")
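To avoid writing one dist() call per group, the numeric columns can be split by both identifiers first. The following is a sketch under the assumption that dat is the original data frame from the question (rebuilt here so the block is self-contained); num_cols, groups and dists are names chosen for illustration.

```r
dat <- data.frame(
  x1 = c(2, 2, 2, 3, 1, 2, 4, 6, 4), y1 = c(5, 4, 3, 3, 4, 2, 1, 6, 3),
  x2 = c(8, 2, 7, 3, 1, 2, 2, 2, 6), y2 = c(1, 3, 3, 3, 1, 2, 4, 3, 8),
  x3 = c(4, 4, 1, 2, 4, 6, 3, 2, 9), y3 = c(1, 2, 3, 3, 1, 2, 4, 6, 1),
  id1 = c("a", "a", "a", "a", "b", "b", "b", "b", "b"),
  id2 = c(2002, 2002, 2003, 2003, 2002, 2002, 2003, 2003, 2003)
)

num_cols <- c("x1", "y1", "x2", "y2", "x3", "y3")

# One data frame of coordinates per (id1, id2) combination;
# drop = TRUE discards combinations that do not occur.
groups <- split(dat[num_cols], list(dat$id1, dat$id2), drop = TRUE)

# dist() uses the Euclidean method by default and returns every
# pairwise distance within a group.
dists <- lapply(groups, dist)

dists[["b.2003"]]
```

Each element of dists is a "dist" object labelled by the original row numbers, so a group with three members yields three distances and a group with two members yields one.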
I have two large matrices in this format:
row.names 1 2 3 ... row.names 1 2 3 ....
A 0.1 0.2 0.3 A 1 1 1
B 0.4 0.9 0.3 B 2 3 1
C 0.9 0.9 0.4 C 1 3 1
.
And I want to obtain something like this:
X S CONF P
1 A 0.1 1
1 B 0.4 2
1 C 0.9 1
2 A 0.2 1
2 B ......
That is, the column names should go into one column, with the row names and the corresponding values repeated for each column name.
Thank you so much
You can do this pretty easily with some rep and c work:
out <- data.frame(X = rep(colnames(conf), each = nrow(conf)),
S = rep(rownames(conf), ncol(conf)),
CONF = c(conf), P = c(P))
out
# X S CONF P
# 1 1 A 0.1 1
# 2 1 B 0.2 1
# 3 1 C 0.3 1
# 4 2 A 0.4 2
# 5 2 B 0.9 3
# 6 2 C 0.3 1
# 7 3 A 0.9 1
# 8 3 B 0.9 3
# 9 3 C 0.4 1
@Thomas had a similar approach (but one which matches the answer you show in your question). His answer looked like this:
cbind.data.frame(X = rep(colnames(conf), each=nrow(conf)),
S = rep(rownames(conf), times=ncol(conf)),
CONF = matrix(t(conf), ncol=1),
P = matrix(t(P), ncol=1))
Assuming we're talking about matrices, I would convert to a data frame, add the rownames as a column and then "melt" each data.frame...
conf <- matrix(
c(0.1, 0.4, 0.9,
0.2, 0.9, 0.9,
0.3, 0.3, 0.4),
ncol=3, byrow=T
)
rownames(conf) <- c("A", "B", "C")
colnames(conf) <- 1:3
P <- matrix(
c(1, 2, 1,
1, 3, 3,
1, 1, 1),
ncol=3, byrow=T
)
rownames(P) <- c("A", "B", "C")
colnames(P) <- 1:3
library(reshape)
conf <- cbind(as.data.frame(conf), "S"=rownames(conf))
P <- cbind(as.data.frame(P), "S"=rownames(P))
out <- merge(melt(conf, id="S"), melt(P, id="S"), by=c("variable", "S"))
colnames(out) <- c("X", "S", "CONF", "P")
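Another base R route, sketched here as an assumption rather than as any of the answers above, is to let as.table() and as.data.frame() do the unrolling. The matrices below follow the layout shown in the question (rows A, B, C; columns 1, 2, 3); long is a name invented for this example.

```r
conf <- matrix(c(0.1, 0.2, 0.3,
                 0.4, 0.9, 0.3,
                 0.9, 0.9, 0.4),
               ncol = 3, byrow = TRUE,
               dimnames = list(c("A", "B", "C"), c("1", "2", "3")))
P <- matrix(c(1, 1, 1,
              2, 3, 1,
              1, 3, 1),
            ncol = 3, byrow = TRUE,
            dimnames = dimnames(conf))

# as.data.frame() on a table gives one row per cell, in column-major order,
# with columns Var1 (row name), Var2 (column name) and Freq (cell value).
long <- as.data.frame(as.table(conf), stringsAsFactors = FALSE)

# c(P) flattens P in the same column-major order, so the rows line up.
out <- data.frame(X = long$Var2, S = long$Var1, CONF = long$Freq, P = c(P))
out
```

This relies on both matrices having identical dimensions and dimnames; if they might differ, merging on X and S as in the melt approach is the safer choice.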
I have a bunch of lengths and want to trace a path with them. Starting at the origin (pretend I'm facing the positive end of the y axis), I make a right turn and move in the positive direction along the x axis for a distance of length_1. Then I make another right turn, walk a distance of length_2, and repeat this n times. I can do this, but I think there's a more efficient way, and I lack a math background:
## Fake Data
set.seed(11)
dat <- data.frame(id = LETTERS[1:6], lens=sample(2:9, 6),
x1=NA, y1=NA, x2=NA, y2=NA)
## id lens x1 y1 x2 y2
## 1 A 4 NA NA NA NA
## 2 B 2 NA NA NA NA
## 3 C 5 NA NA NA NA
## 4 D 8 NA NA NA NA
## 5 E 6 NA NA NA NA
## 6 F 9 NA NA NA NA
## Add a cycle of 4 column
dat[, "cycle"] <- rep(1:4, ceiling(nrow(dat)/4))[1:nrow(dat)]
##For loop to use the information from cycle column
for(i in 1:nrow(dat)) {
## set x1, y1
if (i == 1) {
dat[1, c("x1", "y1")] <- 0
} else {
dat[i, c("x1", "y1")] <- dat[(i - 1), c("x2", "y2")]
}
col1 <- ifelse(dat[i, "cycle"] %% 2 == 0, "x1", "y1")
col2 <- ifelse(dat[i, "cycle"] %% 2 == 0, "x2", "y2")
dat[i, col2] <- dat[i, col1]
col3 <- ifelse(dat[i, "cycle"] %% 2 != 0, "x2", "y2")
col4 <- ifelse(dat[i, "cycle"] %% 2 != 0, "x1", "y1")
mag <- ifelse(dat[i, "cycle"] %in% c(1, 4), 1, -1)
dat[i, col3] <- dat[i, col4] + (dat[i, "lens"] * mag)
}
This gives the desired result:
> dat
id lens x1 y1 x2 y2 cycle
1 A 4 0 0 4 0 1
2 B 2 4 0 4 -2 2
3 C 5 4 -2 -1 -2 3
4 D 8 -1 -2 -1 6 4
5 E 6 -1 6 5 6 1
6 F 9 5 6 5 -3 2
Here it is as a plot:
library(ggplot2); library(grid)
ggplot(dat, aes(x = x1, y = y1, xend = x2, yend = y2)) +
geom_segment(aes(color=id), size=3, arrow = arrow(length = unit(0.5, "cm"))) +
ylim(c(-10, 10)) + xlim(c(-10, 10))
This seems slow and clunky. I'm guessing there's a better way to do this than the steps in my for loop. What's a more efficient way to keep making right turns programmatically?
(As suggested by @DWin) Here is a solution using complex numbers, which is flexible enough for any kind of turn, not just 90-degree right turns (-pi/2 radians). Everything is vectorized:
set.seed(11)
dat <- data.frame(id = LETTERS[1:6], lens = sample(2:9, 6),
turn = -pi/2)
dat <- within(dat, { facing <- pi/2 + cumsum(turn)
move <- lens * exp(1i * facing)
position <- cumsum(move)
x2 <- Re(position)
y2 <- Im(position)
x1 <- c(0, head(x2, -1))
y1 <- c(0, head(y2, -1))
})
dat[c("id", "lens", "x1", "y1", "x2", "y2")]
# id lens x1 y1 x2 y2
# 1 A 4 0 0 4 0
# 2 B 2 4 0 4 -2
# 3 C 5 4 -2 -1 -2
# 4 D 8 -1 -2 -1 6
# 5 E 6 -1 6 5 6
# 6 F 9 5 6 5 -3
The turn variable should really be considered as an input together with lens. Right now all turns are -pi/2 radians but you can set each one of them to whatever you want. All other variables are outputs.
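To make the "turn is an input" point concrete, the same calculation can be wrapped in a function that returns the segment coordinates instead of modifying dat. This is a sketch; path_coords is a hypothetical helper, and the lengths in the first call are simply the lens values printed in the table above.

```r
# Segment start and end points for a walk that starts at the origin facing +y.
# `turn` is the angle (radians, counter-clockwise positive) applied before
# each step, so -pi/2 is a right turn.
path_coords <- function(lens, turn) {
  facing   <- pi / 2 + cumsum(turn)
  position <- cumsum(lens * exp(1i * facing))
  x2 <- Re(position)
  y2 <- Im(position)
  data.frame(x1 = c(0, head(x2, -1)), y1 = c(0, head(y2, -1)),
             x2 = x2, y2 = y2)
}

# All right turns: reproduces the table above (up to floating-point noise).
right <- path_coords(c(4, 2, 5, 8, 6, 9), rep(-pi / 2, 6))

# Mixed turns: one right turn, then two 45-degree left turns.
mixed <- path_coords(c(1, 1, 1), c(-pi / 2, pi / 4, pi / 4))
```

Values that should be exactly zero may print as something like 1e-16 because cos and sin of multiples of pi/2 are not exact in floating point; round() can tidy them for display.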
Now having a little fun with it:
trace.path <- function(lens, turn) {
facing <- pi/2 + cumsum(turn)
move <- lens * exp(1i * facing)
position <- cumsum(move)
x <- c(0, Re(position))
y <- c(0, Im(position))
plot.new()
plot.window(range(x), range(y))
lines(x, y)
}
trace.path(lens = seq(0, 1, length.out = 200),
turn = rep(pi/2 * (-1 + 1/200), 200))
(My attempt at replicating the graph here: http://en.wikipedia.org/wiki/Turtle_graphics)
You can also try these:
trace.path(lens = seq(1, 10, length.out = 1000),
turn = rep(2 * pi / 10, 1000))
trace.path(lens = seq(0, 1, length.out = 500),
turn = seq(0, pi, length.out = 500))
trace.path(lens = seq(0, 1, length.out = 600) * c(1, -1),
turn = seq(0, 8*pi, length.out = 600) * seq(-1, 1, length.out = 200))
Feel free to add yours!
This is yet another method using complex numbers. You can rotate a vector "to the right" in the complex plane by multiplying by -1i. The code below makes the first traversal go in the positive X direction (the Re()-al axis), and each subsequent traversal is rotated to the "right". Here lengths is the vector of step lengths:
imVecs <- lengths*c(0-1i)^(0:3)
imVecs
# [1] 9+0i 0-5i -9+0i 0+9i 8+0i 0-5i -8+0i 0+7i 8+0i 0-1i -5+0i 0+3i 4+0i 0-7i -4+0i 0+2i
#[17] 3+0i 0-7i -5+0i 0+8i
cumsum(imVecs)
# [1] 9+0i 9-5i 0-5i 0+4i 8+4i 8-1i 0-1i 0+6i 8+6i 8+5i 3+5i 3+8i 7+8i 7+1i 3+1i 3+3i 6+3i 6-4i 1-4i
#[20] 1+4i
plot(cumsum(imVecs))
lines(cumsum(imVecs))
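The rotation trick generalizes: multiplying by exp(-1i * theta) turns a step clockwise by theta radians, and -1i is just the special case theta = pi/2. A small sketch, with rot90, rot45, turn_right and steps as made-up names and four arbitrary step lengths:

```r
rot90 <- -1i                  # quarter turn to the right
rot45 <- exp(-1i * pi / 4)    # eighth turn to the right

# Hypothetical helper: unit complex number for a clockwise turn of theta.
turn_right <- function(theta) exp(-1i * theta)

# Powers 0, 1, 2, 3 of the rotation give the directions +x, -y, -x, +y.
steps <- c(4, 2, 5, 8)
path  <- cumsum(steps * rot90^(0:3))
path
```

Because the powers of the rotation are recycled against the step lengths, a vector whose length is not a multiple of the cycle length still works but triggers R's recycling warning.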
The same approach can use complex-plane rotations to make 45-degree turns to the right:
> sqrt(-1i)
[1] 0.7071068-0.7071068i
> imVecs <- lengths*sqrt(0-1i)^(0:7)
Warning message:
In lengths * sqrt(0 - (0+1i))^(0:7) :
longer object length is not a multiple of shorter object length
> plot(cumsum(imVecs))
> lines(cumsum(imVecs))
This isn't a pretty plot, but I've included it to show that this 'vectorized' coordinate calculation produces correct results which shouldn't be too hard to adapt to your needs:
xx <- c(1,0,-1,0)
yy <- c(0,-1,0,1)
coords <- suppressWarnings(cbind(x = cumsum(c(0,xx*dat$lens)),
y = cumsum(c(0,yy*dat$lens))))
plot(coords, type="l", xlim=c(-10,10), ylim=c(-10,10))
It might be useful to think about this in terms of distance and bearing. Distance is given by dat$lens, and bearing is the angle of movement relative to some arbitrary reference line (say, the x-axis). Then, at each step,
x.new = x.old + distance * cos(bearing)
y.new = y.old + distance * sin(bearing)
bearing = bearing + increment
Here, since we start at the origin and move in the +x direction, (x,y)=(0,0) and bearing starts at 0 degrees. A right turn is simply a bearing increment of -90 degrees (-pi/2 radians). So in R code, using your definition of dat:
x <- 0
y <- 0
bearing <- 0
for (i in 1:nrow(dat)){
dat[i,c(3,4)] <- c(x,y)
length <- dat[i,2]
x <- x + length * cos(bearing)
y <- y + length * sin(bearing)
dat[i,c(5,6)] <- c(x,y)
bearing <- bearing - pi/2
}
This produces what you had and has the advantage that you can update it very simply to make left turns, or 45 degree turns, or whatever. You can even add a bearing.increment column to dat to create a random walk.
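The loop can also be written without iteration, because each step's bearing is just the starting bearing plus the sum of all earlier increments. The sketch below assumes that reading of the update rule; walk_path, increment and start_bearing are names introduced here, and the step lengths in the example are the lens values from the question's table.

```r
# Vectorized distance-and-bearing walk starting at the origin.
# increment[i] is the change in bearing applied after step i.
walk_path <- function(distance, increment, start_bearing = 0) {
  n <- length(distance)
  # Bearing used for step i: start bearing plus increments 1..(i - 1).
  bearing <- start_bearing + c(0, cumsum(increment))[seq_len(n)]
  x2 <- cumsum(distance * cos(bearing))
  y2 <- cumsum(distance * sin(bearing))
  data.frame(x1 = c(0, x2[-n]), y1 = c(0, y2[-n]), x2 = x2, y2 = y2)
}

# Right turns only: a bearing increment of -pi/2 after every step.
p <- walk_path(c(4, 2, 5, 8, 6, 9), rep(-pi / 2, 6))
round(p, 10)
```

Supplying a vector of different increments, for example random ones, gives the random walk mentioned above without changing the function.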
Very similar to Josh's solution:
lengths <- sample(1:10, 20, replace=TRUE)
x=cumsum(lengths*c(1,0,-1,0))
y=cumsum(lengths*c(0,1,0,-1))
cbind(x,y)
x y
[1,] 9 0
[2,] 9 5
[3,] 0 5
[4,] 0 -4
[5,] 8 -4
[6,] 8 1
[7,] 0 1
[8,] 0 -6
[9,] 8 -6
[10,] 8 -5
[11,] 3 -5
[12,] 3 -8
[13,] 7 -8
[14,] 7 -1
[15,] 3 -1
[16,] 3 -3
[17,] 6 -3
[18,] 6 4
[19,] 1 4
[20,] 1 -4
Base graphics:
plot(cbind(x,y))
arrows(cbind(x,y)[-20,1],cbind(x,y)[-20,2], cbind(x,y)[-1,1], cbind(x,y)[-1,2] )
This does highlight the fact that both Josh's and my solutions are "turning the wrong way", so you need to change the signs in our "transition matrices". We probably should also have started at (0,0), but you should have no trouble adapting this to your needs.