Within a data frame, I want to compare today's value against a look-back period of n days.
I know how to do it in Excel, where this formula checks whether today's value is higher than the previous 10 days:
=IF(A11>MAX(A1:A10),1,0)
How can I do the same logic within a function in R?
The output would look like this below:
Column Output
1 12 NA
2 13 NA
3 14 NA
4 15 NA
5 9 NA
6 9 NA
7 7 NA
8 8 NA
9 16 NA
10 17 NA
11 20 1
12 14 0
13 9 0
14 8 0
15 6 0
16 5 0
17 28 1
In row 11, the value 20 is higher than each of the previous 10 days, so the output is 1.
In row 12, the value 14 is not higher than the previous 10 days, so it receives a 0.
The 10-day window then rolls forward one row at a time.
P Lapointe's answer is great but anytime I'm doing a 'rolling' calculation my first instinct is to think of rollapply from the zoo package.
library(zoo)
is_last_greatest <- function(x){
#' Given an input vector this returns
#' 1 if the last element is greater than
#' all of the other elements and 0 otherwise
ifelse(all(tail(x,1) > head(x,-1)), 1, 0)
}
# We want to compare to the previous 10 values but the function
# I wrote requires us to include the value we're using as
# comparison so I set the width to 11.
# dat is assumed to be the numeric vector of values (e.g. df$Column)
output <- rollapply(dat,
width = 11,
FUN = is_last_greatest,
fill = NA,
align = "right")
cbind(dat, output)
which gives
dat output
[1,] 12 NA
[2,] 13 NA
[3,] 14 NA
[4,] 15 NA
[5,] 9 NA
[6,] 9 NA
[7,] 7 NA
[8,] 8 NA
[9,] 16 NA
[10,] 17 NA
[11,] 20 1
[12,] 14 0
[13,] 9 0
[14,] 8 0
[15,] 6 0
[16,] 5 0
[17,] 28 1
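For readers who would rather avoid a package dependency, the same rolling comparison can be sketched in base R. This is only an illustration: the helper name higher_than_prev and its n argument are made up here, and the vector is the Column data from the question.

```r
# Hypothetical helper: 1 if x[i] is strictly greater than each of the
# previous n values, 0 otherwise, and NA while fewer than n previous
# values are available.
higher_than_prev <- function(x, n = 10) {
  vapply(seq_along(x), function(i) {
    if (i <= n) return(NA_integer_)
    as.integer(x[i] > max(x[(i - n):(i - 1)]))
  }, integer(1))
}

dat <- c(12, 13, 14, 15, 9, 9, 7, 8, 16, 17, 20, 14, 9, 8, 6, 5, 28)
higher_than_prev(dat)
```

On long vectors an explicit loop like this will likely be slower than rollapply or a compiled rolling maximum, so treat it as a reference implementation for checking results.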
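Here's how to do that with roll_maxr from RcppRoll. The lag used here is dplyr::lag, which shifts the vector by one position. The window over the lagged series is 10 wide, so each value is compared with the previous 10 values only, and the default na.rm = FALSE keeps the first 10 results NA.
library(RcppRoll)
df$Output2 <- ifelse(df$Column > roll_maxr(dplyr::lag(df$Column), 10), 1, 0)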
Column Output Output2
1 12 NA NA
2 13 NA NA
3 14 NA NA
4 15 NA NA
5 9 NA NA
6 9 NA NA
7 7 NA NA
8 8 NA NA
9 16 NA NA
10 17 NA NA
11 20 1 1
12 14 0 0
13 9 0 0
14 8 0 0
15 6 0 0
16 5 0 0
17 28 1 1
data
df <- read.table(text=" Column Output
1 12 NA
2 13 NA
3 14 NA
4 15 NA
5 9 NA
6 9 NA
7 7 NA
8 8 NA
9 16 NA
10 17 NA
11 20 1
12 14 0
13 9 0
14 8 0
15 6 0
16 5 0
17 28 1",header=TRUE,stringsAsFactors=FALSE)
Here is a base R method using embed to construct the comparison sets and performing the comparisons with apply.
# get a matrix for comparisons
myMat <- embed(df$Column, 11)
Here, this returns
myMat
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] 20 17 16 8 7 9 9 15 14 13 12
[2,] 14 20 17 16 8 7 9 9 15 14 13
[3,] 9 14 20 17 16 8 7 9 9 15 14
[4,] 8 9 14 20 17 16 8 7 9 9 15
[5,] 6 8 9 14 20 17 16 8 7 9 9
[6,] 5 6 8 9 14 20 17 16 8 7 9
[7,] 28 5 6 8 9 14 20 17 16 8 7
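So the goal is to compare the value in the first column with those in the remaining columns for each row. max.col breaks ties at random by default, so ties.method = "last" is set here: column 1 is then returned only when it is the unique maximum.
as.integer(max.col(myMat, ties.method = "last") == 1L)
[1] 1 0 0 0 0 0 1
Now, tack on the appropriate number of NA values, which is the number of columns in myMat minus one.
df$output2 <- c(rep(NA, ncol(myMat) - 1), as.integer(max.col(myMat, ties.method = "last") == 1L))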
this returns
df
Column Output output2
1 12 NA NA
2 13 NA NA
3 14 NA NA
4 15 NA NA
5 9 NA NA
6 9 NA NA
7 7 NA NA
8 8 NA NA
9 16 NA NA
10 17 NA NA
11 20 1 1
12 14 0 0
13 9 0 0
14 8 0 0
15 6 0 0
16 5 0 0
17 28 1 1
An advantage of max.col is that it is quite fast. One of its biggest drawbacks is that it does not have an na.rm argument to remove missing values. In the case that there are missing values, here is a method using apply on myMat instead of max.col; all() accepts na.rm = TRUE if missing values in the window should be ignored.
apply(myMat, 1, function(x) as.integer(all(head(x, 1) > tail(x, -1))))
The operating comparison function here is
all(head(x, 1) > tail(x, -1))
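When the maximum is unique, the following produce the same result (on a tie for the maximum they return TRUE, whereas the strict comparison above returns FALSE)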
head(x, 1) == max(x) # or
x[1] == max(x)
and
1L == which.max(x)
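The difference between the strict comparison and the max-based alternatives only shows up when the first value ties with another for the maximum. A small sketch of that edge case follows; the helper names strict, by_max and by_which and the two toy vectors are made up for illustration.

```r
x_unique <- c(20, 17, 16)  # first element is the strict maximum
x_tied   <- c(20, 20, 16)  # first element ties for the maximum

strict   <- function(x) all(x[1] > x[-1])    # strictly greater than all others
by_max   <- function(x) x[1] == max(x)       # equal to the maximum
by_which <- function(x) which.max(x) == 1L   # first position of the maximum

c(strict(x_unique), by_max(x_unique), by_which(x_unique))  # TRUE TRUE TRUE
c(strict(x_tied),   by_max(x_tied),   by_which(x_tied))    # FALSE TRUE TRUE
```

If the Excel logic A11>MAX(A1:A10) is the target, the strict version is the one that matches it.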
My sample data looks like this
DF
n a b c d
1 NA NA NA NA
2 1 2 3 4
3 5 6 7 8
4 9 NA 11 12
5 NA NA NA NA
6 4 5 6 NA
7 8 9 10 11
8 12 13 15 16
9 NA NA NA NA
I need to subtract row 2 from row 3 and from row 4.
Similarly, I need to subtract row 6 from row 7 and from row 8.
My real data is huge; is there a way of doing this automatically? It seems it could be done with a for loop, but as I am a novice R user my attempts were not successful.
Thank you for any help and tips.
UPDATE
I want to achieve something like this
DF2
rowN1<-DF$row3-DF$row2
rowN2<-DF$row4-DF$row2
rowN3<-DF$row7-DF$row6 # there is NA in row 6 so after subtracting there should be NA also
rowN4<-DF$row8-DF$row6
Here's one idea
set.seed(1)
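# Note: sample() changed its default algorithm in R 3.6.0, so newer versions
# draw different numbers from those shown below
(m <- matrix(sample(c(1:9, NA), 60, TRUE), ncol=5))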
# [,1] [,2] [,3] [,4] [,5]
# [1,] 3 7 3 8 8
# [2,] 4 4 4 2 7
# [3,] 6 8 1 8 5
# [4,] NA 5 4 5 9
# [5,] 3 8 9 9 5
# [6,] 9 NA 4 7 3
# [7,] NA 4 5 8 1
# [8,] 7 8 6 6 1
# [9,] 7 NA 5 6 4
# [10,] 1 3 2 8 6
# [11,] 3 7 9 1 7
# [12,] 2 2 7 5 5
idx <- seq(2, nrow(m)-2, 4)
do.call(rbind, lapply(idx, function(x) {
rbind(m[x+1, ]-m[x, ], m[x+2, ]-m[x, ])
}))
# [1,] 2 4 -3 6 -2
# [2,] NA 1 0 3 2
# [3,] NA NA 1 1 -2
# [4,] -2 NA 2 -1 -2
# [5,] 2 4 7 -7 1
# [6,] 1 -1 5 -3 -1
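If the blocks are not always exactly four rows long, the all-NA rows themselves can be used as separators. The sketch below rebuilds the sample data from the question (assuming the n column is just a row index, so only a to d are kept) and subtracts the first row of each block from the remaining rows of that block.

```r
DF <- data.frame(
  a = c(NA, 1, 5, 9, NA, 4, 8, 12, NA),
  b = c(NA, 2, 6, NA, NA, 5, 9, 13, NA),
  c = c(NA, 3, 7, 11, NA, 6, 10, 15, NA),
  d = c(NA, 4, 8, 12, NA, NA, 11, 16, NA)
)
m <- as.matrix(DF)

sep   <- rowSums(!is.na(m)) == 0   # TRUE for all-NA separator rows
block <- cumsum(sep)               # block id for every row
keep  <- !sep                      # rows that hold data

res <- do.call(rbind, lapply(split(which(keep), block[keep]), function(i) {
  base <- m[i[1], ]                        # first data row of the block
  sweep(m[i[-1], , drop = FALSE], 2, base) # remaining rows minus that row
}))
res
```

Any NA in a base row (such as d in row 6) propagates to the differences, which is the behaviour asked for in the update.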
I have a matrix, but want to extend it with the same pattern. Note that it may be extended for any given number of rows and columns, and is not normally square.
04/06/2012 11/06/2012 18/06/2012 25/06/2012 02/07/2012
26/03/2012 10 11 12 13 14
02/04/2012 9 10 11 12 13
09/04/2012 8 9 10 11 12
16/04/2012 7 8 9 10 11
23/04/2012 6 7 8 9 10
30/04/2012 5 6 7 8 9
07/05/2012 4 5 6 7 8
14/05/2012 3 4 5 6 7
21/05/2012 2 3 4 5 6
28/05/2012 1 2 3 4 5
I.e. I want to extend it to something like this:
04/06/2012 11/06/2012 18/06/2012 25/06/2012 02/07/2012
26/03/2012 10 11 12 13 14
02/04/2012 9 10 11 12 13
09/04/2012 8 9 10 11 12
16/04/2012 7 8 9 10 11
23/04/2012 6 7 8 9 10
30/04/2012 5 6 7 8 9
07/05/2012 4 5 6 7 8
14/05/2012 3 4 5 6 7
21/05/2012 2 3 4 5 6
28/05/2012 1 2 3 4 5
04/06/2012 0 1 2 3 4
11/06/2012 NA 0 1 2 3
18/06/2012 NA NA 0 1 2
25/06/2012 NA NA NA 0 1
02/07/2012 NA NA NA NA 0
I'm sure there's a clever way to do this with Reduce or something, but this is what came to mind:
lengthOut <- 6 ## Set to one less than the number of columns you want to create
startAt <- 10 ## Set the maximum value of the FIRST column
vapply(c(0, sequence(lengthOut)), function(x) {
x <- (startAt + x):0 # Create a sequence in the normal manner
length(x) <- startAt + lengthOut + 1 # Extend the length of that sequence
x
}, numeric(startAt + lengthOut + 1)) # Specify what to return
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 10 11 12 13 14 15 16
# [2,] 9 10 11 12 13 14 15
# [3,] 8 9 10 11 12 13 14
# [4,] 7 8 9 10 11 12 13
# [5,] 6 7 8 9 10 11 12
# [6,] 5 6 7 8 9 10 11
# [7,] 4 5 6 7 8 9 10
# [8,] 3 4 5 6 7 8 9
# [9,] 2 3 4 5 6 7 8
# [10,] 1 2 3 4 5 6 7
# [11,] 0 1 2 3 4 5 6
# [12,] NA 0 1 2 3 4 5
# [13,] NA NA 0 1 2 3 4
# [14,] NA NA NA 0 1 2 3
# [15,] NA NA NA NA 0 1 2
# [16,] NA NA NA NA NA 0 1
# [17,] NA NA NA NA NA NA 0
Here's another approach
x <- 16:0
matrix(c(sapply(6:1, function(z) dplyr::lead(x, z)), x), ncol=7)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#[1,] 10 11 12 13 14 15 16
#[2,] 9 10 11 12 13 14 15
#[3,] 8 9 10 11 12 13 14
#[4,] 7 8 9 10 11 12 13
#[5,] 6 7 8 9 10 11 12
#[6,] 5 6 7 8 9 10 11
#[7,] 4 5 6 7 8 9 10
#[8,] 3 4 5 6 7 8 9
#[9,] 2 3 4 5 6 7 8
#[10,] 1 2 3 4 5 6 7
#[11,] 0 1 2 3 4 5 6
#[12,] NA 0 1 2 3 4 5
#[13,] NA NA 0 1 2 3 4
#[14,] NA NA NA 0 1 2 3
#[15,] NA NA NA NA 0 1 2
#[16,] NA NA NA NA NA 0 1
#[17,] NA NA NA NA NA NA 0
Edit: forgot to mention that I used dplyr::lead
Not sure if this helps:
m1 <- matrix(rep(10:1,each=7)+0:6,ncol=7,byrow=TRUE)
m2 <- matrix(NA,ncol=7,nrow=7)
indx <- 0:6+rep(c(0:-6),each=7)
m2[lower.tri(m2, diag=TRUE)] <- indx[indx>=0]
rbind(m1,t(m2))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 10 11 12 13 14 15 16
# [2,] 9 10 11 12 13 14 15
# [3,] 8 9 10 11 12 13 14
# [4,] 7 8 9 10 11 12 13
# [5,] 6 7 8 9 10 11 12
# [6,] 5 6 7 8 9 10 11
# [7,] 4 5 6 7 8 9 10
# [8,] 3 4 5 6 7 8 9
# [9,] 2 3 4 5 6 7 8
# [10,] 1 2 3 4 5 6 7
# [11,] 0 1 2 3 4 5 6
# [12,] NA 0 1 2 3 4 5
# [13,] NA NA 0 1 2 3 4
# [14,] NA NA NA 0 1 2 3
# [15,] NA NA NA NA 0 1 2
# [16,] NA NA NA NA NA 0 1
# [17,] NA NA NA NA NA NA 0
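Since every cell depends only on its row and column position, the same matrix can also be sketched with outer(). The names n_rows, n_cols and startAt below are illustrative parameters, chosen to reproduce the 17 x 7 output shown above.

```r
n_rows  <- 17   # assumed target number of rows
n_cols  <- 7    # assumed target number of columns
startAt <- 10   # value in the top-left cell

# Each cell is startAt, minus one per row down, plus one per column across
m <- outer(seq_len(n_rows), seq_len(n_cols),
           function(i, j) startAt - (i - 1) + (j - 1))
m[m < 0] <- NA  # cells that would drop below zero are left empty
m
```

This works for any non-square shape; the date labels from the question would still have to be attached separately with dimnames().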
I have 2 data frames with different numbers of rows (A has 55 and B has 41). I would like to take the Py values from data frame B and put them into A$Py corresponding to the "Link".
I tried
link.list <- A$Link
for(i in 1:length(link.list)){
A$Py[i] <- B[which(B$Link==link.list[i]), "Py"]
}
But get:
Error in A$Py[i] <- B[which(B$Link == link.list[i]), "Py"] :
replacement has length zero
I assume this error is triggered when there is an A$Link that is not in B. Any ideas for solving this problem?
Thanks
data frame A:
Link VU Py
1 DVH1-1 1 NA
2 DVH1-10 9 NA
3 DVH1-2 1 NA
4 DVH1-3 1 NA
5 DVH1-4 9 NA
6 DVH1-5 9 NA
7 DVH1-6 1 NA
8 DVH1-7 1 NA
9 DVH1-8 10 NA
10 DVH1-9 10 NA
11 DVH2-1 2 NA
12 DVH2-2 1 NA
13 DVH2-3 9 NA
14 DVH2-4 9 NA
15 DVH2-5 10 NA
16 DVH2-6 9 NA
17 DVH2-7 4 NA
18 DVH2-8 9 NA
19 DVH3-1 1 NA
20 DVH3-2 12 NA
21 DVH3-3 12 NA
22 DWH1-1 4 NA
23 DWH1-10 8 NA
24 DWH1-2 4 NA
25 DWH1-3 4 NA
26 DWH1-4 8 NA
27 DWH1-5 8 NA
28 DWH1-6 4 NA
29 DWH1-7 4 NA
30 DWH1-8 9 NA
31 DWH1-9 9 NA
32 DWH2-1 4 NA
33 DWH2-2 4 NA
34 DWH2-3 8 NA
35 DWH2-4 8 NA
36 DWH2-5 8 NA
37 DWH2-6 8 NA
38 DWH2-7 7 NA
39 DWH2-8 5 NA
40 DWH3-1 3 NA
41 DWH3-2 49 NA
42 DWH3-3 0 NA
43 MH1-1 0 NA
44 MH1-2 1 NA
45 MH1-3 1 NA
46 MH1-4 1 NA
47 MH1-5 1 NA
48 UH1-1 17 NA
49 UH1-2 17 NA
50 UH1-3 17 NA
51 UH1-4 19 NA
52 UH2-1 4 NA
53 UH2-2 15 NA
54 UH3-1 24 NA
55 UH3-2 25 NA
data frame B:
Link Py
1 DVH1-1 0
2 DVH1-10 4
3 DVH1-2 0
4 DVH1-3 14
5 DVH1-4 0
6 DVH1-5 2
7 DVH1-6 12
8 DVH1-7 11
9 DVH1-8 9
10 DVH1-9 9
11 DVH2-1 0
12 DVH2-2 14
13 DVH2-3 3
14 DVH2-4 0
15 DVH2-5 10
16 DVH2-6 0
17 DVH2-7 2
18 DVH2-8 4
19 DVH3-1 16
20 DVH3-3 8
21 DWH1-1 6
22 DWH1-10 2
23 DWH1-2 0
24 DWH1-3 7
25 DWH1-5 0
26 DWH1-6 12
27 DWH1-7 10
28 DWH1-8 0
29 DWH1-9 3
30 DWH2-1 0
31 DWH2-2 10
32 DWH2-7 0
33 DWH2-8 9
34 DWH3-1 0
35 DWH3-2 0
36 MH1-1 0
37 UH1-3 6
38 UH1-4 4
39 UH2-1 0
40 UH2-2 9
41 UH3-2 4
Use merge with by = 'Link'; all.x = TRUE keeps all rows of x (in your case x = A) and fills Py with NA where B has no matching Link.
I've only passed the first two columns of A, as A$Py in your example was all NA.
merge(A[,1:2],B,by='Link', all.x = TRUE)
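Note that merge sorts the result by the key by default. If the original row order of A matters, a lookup with match() is a possible alternative. The sketch below uses a three-row subset of the question's data (DVH3-2 is one of the links missing from B); it is an illustration, not the full data set.

```r
A <- data.frame(Link = c("DVH1-1", "DVH1-10", "DVH3-2"),
                VU   = c(1, 9, 12),
                Py   = NA)
B <- data.frame(Link = c("DVH1-10", "DVH1-1"),
                Py   = c(4, 0))

# match() gives, for each A$Link, the row of B with the same Link,
# or NA when the link does not occur in B
A$Py <- B$Py[match(A$Link, B$Link)]
A
```

Links that are absent from B simply stay NA, which avoids the "replacement has length zero" error from the loop.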
> head(a)
X Link VU Py
1 1 DVH1-1 1 NA
2 2 DVH1-10 9 NA
3 3 DVH1-2 1 NA
4 4 DVH1-3 1 NA
5 5 DVH1-4 9 NA
6 6 DVH1-5 9 NA
> head(b)
X Link Py
1 1 DVH1-1 0
2 2 DVH1-10 4
3 3 DVH1-2 0
4 4 DVH1-3 14
5 5 DVH1-4 0
6 6 DVH1-5 2
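# match() returns, for each a$Link, the row of b with the same Link (NA if absent),
# so the values are aligned by Link rather than by row position
a$Py1 <- b$Py[match(a$Link, b$Link)]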
> head(a)
X Link VU Py Py1
1 1 DVH1-1 1 NA 0
2 2 DVH1-10 9 NA 4
3 3 DVH1-2 1 NA 0
4 4 DVH1-3 1 NA 14
5 5 DVH1-4 9 NA 0
6 6 DVH1-5 9 NA 2
Q. I have an Erdős-Rényi graph. I infect a vertex and want to see what sequence of vertices the disease would follow. igraph has helpful functions like get.adjacency() and neighbors().
Details. This is the adjacency matrix with vertex names instead of 0/1 flags, and I'm trying to get the contagion chain out of it, i.e. the flow/sequence of an epidemic through a graph if a certain vertex is infected. Let's not worry about infection probabilities here (assume all vertices hit are infected with probability 1).
So suppose I hit vertex 1 (which is row 1 here). We see that it has outgoing links to vertex 4,5,18,22,23,24,25. So then the next vertices will be those connected to 4,5,18...25 i.e. those values in row4, row5, row18,... row25. Then, according to the model, the disease will travel through these and so forth.
I understand that I can pass a string to order the matrix rows. My problem is, I cannot figure out how to generate that sequence.
The matrix looks like this.
> channel
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 4 5 18 22 23 24 25 NA
[2,] 6 10 11 18 25 NA NA NA
[3,] 7 11 18 20 NA NA NA NA
[4,] 24 NA NA NA NA NA NA NA
[5,] 1 3 9 13 14 NA NA NA
[6,] 3 8 9 14 19 23 NA NA
[7,] 3 4 8 15 20 22 NA NA
[8,] 2 3 25 NA NA NA NA NA
[9,] 3 4 11 13 20 NA NA NA
[10,] 4 5 8 15 19 20 21 22
[11,] 3 13 15 18 19 23 NA NA
[12,] 11 13 16 NA NA NA NA NA
[13,] 4 6 14 15 16 17 19 21
[14,] 2 6 13 NA NA NA NA NA
[15,] 3 17 20 NA NA NA NA NA
[16,] 6 15 18 23 NA NA NA NA
[17,] 2 25 NA NA NA NA NA NA
[18,] 2 5 NA NA NA NA NA NA
[19,] 3 11 NA NA NA NA NA NA
[20,] 1 4 7 10 12 21 22 25
[21,] 2 4 6 13 14 16 18 NA
[22,] 1 3 4 15 23 NA NA NA
[23,] 1 16 24 NA NA NA NA NA
[24,] 7 8 19 20 22 NA NA NA
[25,] 7 12 13 17 NA NA NA NA
I want to reorder this matrix based on a selection criterion, as follows.
An answer in R would be most helpful (but I'm interested in the algorithm, so Python, Ruby, etc. would also be great). The resulting vector will have a length of 115 (8 x 25 = 200 cells, minus 85 NAs = 115) and would look like this, which is basically how the disease would spread if vertex 1 becomes infected.
4,5,18,22,23,24,25,24,1,3,9,13,14,2,5,1,3,4,15,23,1,16,24,7,8,19,20,22,7,12,13,17,7,8,19,20,22, 4,5,18,22,23,24,25,7,11,18,20...
What I know so far:
1. R has a package **igraph** which lets me calculate neighbors(graph, vertex, "out")
2. The same package can also generate get.adjlist(graph...), get.adjacency
Finding a "contagion chain" like this is equivalent to a breadth-first search through the graph, e.g.:
library(igraph)
set.seed(50)
g = erdos.renyi.game(20, 0.1)
plot(g)
order = graph.bfs(g, root=14, order=TRUE, unreachable=FALSE)$order
Output:
> order
[1] 14 1 2 11 16 18 4 19 12 17 20 7 8 15 5 13 9 NaN NaN NaN
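To make the algorithm itself visible, here is a minimal base R sketch of breadth-first search over an adjacency list. The function name bfs_order and the six-vertex toy graph are made up for illustration. Unlike the 115-element vector in the question, this lists each vertex once, at the step where it is first infected.

```r
# adj: list where adj[[v]] holds the out-neighbours of vertex v (NAs allowed)
# root: the initially infected vertex
bfs_order <- function(adj, root) {
  visited  <- root
  frontier <- root
  while (length(frontier) > 0) {
    nxt <- unlist(adj[frontier], use.names = FALSE)  # neighbours of the frontier
    nxt <- unique(nxt[!is.na(nxt)])                  # drop NA padding and repeats
    nxt <- setdiff(nxt, visited)                     # keep newly reached vertices
    visited  <- c(visited, nxt)
    frontier <- nxt
  }
  visited
}

# Toy directed graph: 1 -> 2,3; 2 -> 4; 3 -> 4,5; 4 -> 1; 6 -> 1
adj <- list(c(2, 3), 4, c(4, 5), 1, integer(0), 1)
bfs_order(adj, 1)
```

A matrix such as channel could be turned into this form with lapply(seq_len(nrow(channel)), function(i) channel[i, ]), since the NA padding is filtered out inside the function.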
It's not clear how you define the ordering of the rows, so... just a few hints:
You can select a permutation/combination of rows by passing an index vector:
> (m <- matrix(data=1:9, nrow=3))
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> m[c(2,3,1),]
[,1] [,2] [,3]
[1,] 2 5 8
[2,] 3 6 9
[3,] 1 4 7
The function t() transposes a matrix.
The matrix is stored in columns-first (or column-major) order:
> as.vector(m)
[1] 1 2 3 4 5 6 7 8 9
NA values can be removed by subsetting:
> qq <- c(1,2,NA,5,7,NA,3,NA,NA)
> qq[!is.na(qq)]
[1] 1 2 5 7 3
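Put together, these hints give a way to read the rows of a matrix in a chosen order and drop the padding. The sketch below uses a made-up 3 x 3 matrix and a made-up row order ord, purely to show the combination of indexing, t(), as.vector() and is.na().

```r
m <- matrix(c(4,  5, NA,
              1, NA, NA,
              2,  3,  6), nrow = 3, byrow = TRUE)
ord <- c(2, 3, 1)            # hypothetical order in which to read the rows

v <- as.vector(t(m[ord, ]))  # transpose so that values come out row by row
v <- v[!is.na(v)]            # remove the NA padding
v
```

The transpose is needed because as.vector() reads a matrix column by column.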
Also, graph algorithms are provided by Bioconductor's graph or CRAN's igraph packages.