Related
I want to create a sequence of numbers like this:
X=22+1
Y=x+2
Z=x+3
A=x+4
B=X+5
1,2,X,3,4,Y,5,6,Z,7,8,A,10,11,B #and so on...
1,2,23,3,4,25,5,6,26,7,8,27,10,11,28 #and so on...
How do this with R? there's a function to do this?
We can do
unlist(Map(c, split(v1, as.integer(gl(length(v1), 2,
length(v1)))), c(X, Y, Z, A, B)), use.names = FALSE)
#[1] 1 2 23 3 4 25 5 6 26 7 8 27 9 10 28
data
v1 <- 1:10
X <- 23
Y <- X + 2
Z <- X + 3
A <- X + 4
B <- X + 5
You can create a duplicated record at specific position and replace them with another sequence.
seq1 <- 1:10
seq2 <- c(23, 25:28)
seq3 <- sort(c(seq1, seq(2, 10, 2)))
seq3[duplicated(seq3)] <- seq2
seq3
#[1] 1 2 23 3 4 25 5 6 26 7 8 27 9 10 28
I'm looking for a way to identify a growing season which consists of a number of days greater than say 60 between the last frost day of spring and the first frost day in the fall. A general version of this problem is this. If I have a vector of numbers like testVec, I want the item numbers of the beginning and end range of values where the number of items is 5 or greater and all of them are greater than 0.
testVec <- c(1,3,4,0, 1, -5, 6, 0, 1,3,4,6,7,5,9, 0)
In this example, the relevant range is 1,3,4,6,7,5,9 which is testVec[9] to testVec[15]
One option could be:
testVec[with(rle(testVec > 0), rep(lengths * values >= 5, lengths))]
[1] 1 3 4 6 7 5 9
Here, the idea is to, first, create runs of values that are smaller or equal to zero and bigger than zero. Second, it checks whether the runs of values bigger than zero are of length 5 or more. Finally, it subsets the original vector for the runs of values bigger than zero with length 5 or more.
1) rleid This also handles any number of sequences including zero. rleid(ok) is a vector the same length as ok such that the first run of identical elements is replaced with 1, the second run with 2 and so on. The result is a list of vectors where each vector has its positions in the original input as its names.
library(data.table)
getSeq <- function(x) {
names(x) <- seq_along(x)
ok <- x > 0
s <- split(x[ok], rleid(ok)[ok])
unname(s)[lengths(s) >= 5]
}
getSeq(testVec)
## [[1]]
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
getSeq(numeric(16))
## list()
getSeq(c(testVec, 10 * testVec))
## [[1]]
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
##
## [[2]]
## 25 26 27 28 29 30 31
## 10 30 40 60 70 50 90
If a data frame were desired then following gives the values and which sequence the row came from. The row names indicate the positions in the original input.
gs <- getSeq(c(testVec, 10 * testVec))
names(gs) <- seq_along(gs)
if (length(gs)) stack(gs) else gs
## values ind
## 9 1 1
## 10 3 1
## 11 4 1
## 12 6 1
## 13 7 1
## 14 5 1
## 15 9 1
## 25 10 2
## 26 30 2
## 27 40 2
## 28 60 2
## 29 70 2
## 30 50 2
## 31 90 2
2) gregexpr Replace each element that is > 0 with 1 and each other element with 0 pasting the 0's and 1's into a single character string. Then use gregexpr to look for sequences of 1's at least 5 long and for the ith such nonoverlapping sequence return the first positions, g, and lengths, attr(g, "match.length"). Define a function vals which extracts the values at the required positions from testVec of the ith such nonoverlapping sequence returning a list such that the ith component of the list is the ith such sequence. The names in the output vector are its positions in the input.
getSeq2 <- function(x) {
g <- gregexpr("1{5,}", paste(+(x > 0), collapse = ""))[[1]]
vals <- function(i) {
ix <- seq(g[i], length = attr(g, "match.length")[i])
setNames(x[ix], ix)
}
if (length(g) == 1 && g == -1) list() else lapply(seq_along(g), vals)
}
getSeq2(testVec)
## [[1]]
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
The above handles any number of sequences including 0 but if we knew there were exactly one sequence (which is the case for the example in the question) then it could be simplified to the following where the return value is just that vector:
g <- gregexpr("1{5,}", paste(+(testVec > 0), collapse = ""))[[1]]
ix <- seq(g, length = attr(g, "match.length"))
setNames(testVec[ix], ix)
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
You could "fix" #tmfmnk's solution like this:
f1 <- function(x, threshold, n) {
range(which(with(rle(x > threshold), rep(lengths * values >= n, lengths))))
}
x <- c(1, 3, 4, 0, 1, -5, 6, 0, 1,3,4,6,7,5,9, 0)
f1(x, 0, 5)
#[1] 9 15
But that does not work well when there are multiple runs
xx <- c(x, x)
f1(xx, 0, 5)
#[1] 9 31
Here is another, not so concise approach that returns the start and end of the longest run (the first one if there are ties).
f2 <- function(x, threshold, n) {
y <- x > threshold
y[is.na(y)] <- FALSE
a <- ave(y, cumsum(!y), FUN=cumsum)
m <- max(a)
if (m < n) return (c(NA, NA))
i <- which(a == m)[1]
c(i-m+1, i)
}
f2(x, 0, 5)
#[1] 9 15
f2(xx, 0, 5)
#[1] 9 15
or with rle
f3 <- function(x, threshold, n) {
y <- x > threshold
r <- rle(y)
m <- max(r$lengths)
if (m < n) return (c(NA, NA))
i <- sum(r$lengths[1:which.max(r$lengths)[1]])
c(i-max(r$lengths)+1, i)
}
f3(x, 0, 5)
#[1] 9 15
f3(xx, 0, 5)
#[1] 9 15
If you wanted the first run that is at least n, that is you do not want a next run, even if it is longer, you could do
f4 <- function(x, threshold, n) {
y <- with(rle(x > threshold), rep(lengths * values >= n, lengths))
i <- which(y)[1]
j <- i + which(!y[-c(1:i)])[1] - 1
c(i, j)
}
df <- data.frame(x = seq(1:10))
I want this:
df$y <- c(1, 2, 3, 4, 5, 15, 20 , 25, 30, 35)
i.e. each y is the sum of previous five x values. This implies the first
five y will be same as x
What I get is this:
df$y1 <- c(df$x[1:4], RcppRoll::roll_sum(df$x, 5))
x y y1
1 1 1
2 2 2
3 3 3
4 4 4
5 5 15
6 15 20
7 20 25
8 25 30
9 30 35
10 35 40
In summary, I need y but I am only able to achieve y1
1) enhanced sum function Define a function Sum which sums its first 5 values if it receives 6 values and returns the last value otherwise. Then use it with partial=TRUE in rollapplyr:
Sum <- function(x) if (length(x) < 6) tail(x, 1) else sum(head(x, -1))
rollapplyr(x, 6, Sum, partial = TRUE)
## [1] 1 2 3 4 5 15 20 25 30 35
2) sum 6 and subtract off original Another possibility is to take the running sum of 6 elements filling in the first 5 elements with NA and subtracting off the original vector. Finally fill in the first 5.
replace(rollsumr(x, 6, fill = NA) - x, 1:5, head(x, 5))
## [1] 1 2 3 4 5 15 20 25 30 35
3) specify offsets A third possibility is to use the offset form of width to specify the prior 5 elements:
c(head(x, 5), rollapplyr(x, list(-(1:5)), sum))
## [1] 1 2 3 4 5 15 20 25 30 35
4) alternative specification of offsets In this alternative we specify an offset of 0 for each of the first 5 elements and offsets of -(1:5) for the rest.
width <- replace(rep(list(-(1:5)), length(x)), 1:5, list(0))
rollapply(x, width, sum)
## [1] 1 2 3 4 5 15 20 25 30 35
Note
The scheme for filling in the first 5 elements seems quite unusual and you might consider using partial sums for the first 5 with NA or 0 for the first one since there are no prior elements fir that one:
rollapplyr(x, list(-(1:5)), sum, partial = TRUE, fill = NA)
## [1] NA 1 3 6 10 15 20 25 30 35
rollapplyr(x, list(-(1:5)), sum, partial = TRUE, fill = 0)
## [1] 0 1 3 6 10 15 20 25 30 35
rollapplyr(x, 6, sum, partial = TRUE) - x
## [1] 0 1 3 6 10 15 20 25 30 35
A simple approach would be:
df <- data.frame(x = seq(1:10))
mysum <- function(x, k = 5) {
res <- rep(NA, length(x))
for (i in seq_along(x)) {
if (i <= k) { # edited ;-)
res[i] <- x[i]
} else {
res[i] <- sum(x[(i-k):(i-1)])
}
}
res
}
mysum(df$x)
# [1] 1 2 3 4 5 15 20 25 30 35
mysum <- function(x, k = 5) {
res <- x[1:k]
append<-sapply(2:(len(x)+1-k),function(i) sum(x[i:(i+k-1)]))
return(c(res,append))
}
mysum(df$x)
now I have a lot of matrices with the different number of rows. And I want to sum the odd-number rows and even number rows element respectivelylike below:
o <- matrix(rep(c(1,2,3,4,5,6),6),ncol = 6)
o2 <- matrix(rep(c(1,2,3,4,5,6),12),ncol = 6)
#I want to sum the odd-number rows and even number rows element respectively
i=1
kg <- NULL
while(i <= 2){
op<-unlist(Map(sum,o[i,],o[i+2,],o[i+4,]))
kg <- c(kg,op)
i=i+1
}
i=1
kg2 <- NULL
while(i <= 2){
op2<-unlist(Map(sum,o2[i,],o2[i+2,],o2[i+4,],o2[i+6],o2[i+8],o2[i+10]))
kg2 <- c(kg2,op2)
i=i+1
}
kg
kg2 #the result should be a vector sequence like kg and kg2
> kg2
[1] 18 18 18 18 18 18 24 24 24 24 24 24
It is what I can do know. But my data have a lot of different length of columns. Is that any method I can do it quickly?
And how can I generate a sring like "o2[i,],o2[i+2,],o2[i+4,],o2[i+6],o2[i+8],o2[i+10])" automatically according to the input number? Thank you for your help :)
Perhaps something like this?
o <- matrix(rep(c(1,2,3,4,5,6),6),ncol = 6)
o2 <- matrix(rep(c(1,2,3,4,5,6),12),ncol = 6)
even <- function(x) 2 * seq(1, nrow(x) / 2);
odd <- function(x) 2 * seq(1, nrow(x) / 2) - 1;
colSums(o[even(o), ]);
#[1] 12 12 12 12 12 12
colSums(o[odd(o), ]);
#[1] 9 9 9 9 9 9
colSums(o2[even(o2), ]);
#[1] 24 24 24 24 24 24
colSums(o2[odd(o2), ]);
#[1] 18 18 18 18 18 18
Explanation: even/odd return even/odd row indices of a matrix/data.frame; we can then use colSums to sum entries by column.
Update
To sum entries from rows 3, 6, 9, 12 (or any other sequence) you just need to define a corresponding function, e.g.
another_seq <- function(x) 3 * seq(1, nrow(x) / 3)
colSums(o2[another_seq(o2), ]);
#[1] 18 18 18 18 18 18
In the OP's loop, if we want to change the Map to make it more automatic
unlist(do.call(Map, c(f = sum, as.data.frame(t(o2[seq(i, i+10, by = 2),])))))
Using the full code
o <- matrix(rep(c(1,2,3,4,5,6),6),ncol = 6)
o2 <- matrix(rep(c(1,2,3,4,5,6),12),ncol = 6)
#I want to sum the odd-number rows and even number rows
i=1
kg <- NULL
while(i <= 2){
#op<-unlist(Map(sum,o[i,],o[i+2,],o[i+4,]))
op <- unlist(do.call(Map, c(f = sum,
as.data.frame(t(o[seq(i, i+4, by = 2),]))))) # change here
kg <- c(kg,op)
i=i+1
}
i=1
kg2 <- NULL
while(i <= 2){
#op2<-unlist(Map(sum,o2[i,],o2[i+2,],o2[i+4,],o2[i+6],o2[i+8],o2[i+10]))
op2 <- unlist(do.call(Map, c(f = sum,
s.data.frame(t(o2[seq(i, i+10, by = 2),]))))) # change here
kg2 <- c(kg2,op2)
i=i+1
}
kg
#[1] 9 9 9 9 9 9 12 12 12 12 12 12
kg2
#[1] 18 18 18 18 18 18 24 24 24 24 24 24
In the OP's code, if we analyze the individual arguments of Map with just two arguments i.e. the first and 3rd row of 'o'
i <- 1
Map(function(x, y) c(x, y), o[i,], o[i+2,])
#[[1]]
#[1] 1 3
#[[2]]
#[1] 1 3
#[[3]]
#[1] 1 3
#[[4]]
#[1] 1 3
#[[5]]
#[1] 1 3
#[[6]]
#[1] 1 3
Here, each element of the list is the column values concatenated (c). If we need to get a similar structure, by subsetting the odd rows, we transpose the subset of rows, convert it to data.frame, so that each individual block is a column (that corresponds to the original rows subsetted)
do.call(Map, c(f=c, as.data.frame(t(o[c(i, i+2),]))))
#[[1]]
#V1 V2
# 1 3
#[[2]]
#V1 V2
# 1 3
#[[3]]
#V1 V2
# 1 3
#[[4]]
#V1 V2
# 1 3
#[[5]]
#V1 V2
# 1 3
#[[6]]
#V1 V2
# 1 3
Keeping it as a matrix will not solve it as it take the whole matrix as a single cell (a matrix is a vector with dimension attribute)
do.call(Map, c(f=c, o[c(i, i+2),]))
#[[1]]
#[1] 1 3 1 3 1 3 1 3 1 3 1 3
while using Map directly will loop through each element of the matrix (vector) instead of each column
Map(c, o[c(i, i+2),]) # check the output
Another option would be to split the object by col and then do the sum
onew <- o[seq(i, i+4, by = 2),]
Map(sum, split(onew, col(onew)))
The above approach is loopy, but we can also use vectorized approach (just like in the #Maurits Evers post). Instead of seq, here we are using the recycling of logical vector to subset the rows and then do the colSums
i1 <- c(TRUE, FALSE)
colSums(cbind(o[i1,], o[!i1,]))
#[1] 9 9 9 9 9 9 12 12 12 12 12 12
colSums(cbind(o2[i1,], o2[!i1,]))
#[1] 18 18 18 18 18 18 24 24 24 24 24 24
Given two columns (perhaps from a data frame) of equal length N, how can I produce a column of length 2N with the odd entries from the first column and the even entries from the second column?
Suppose I have the following data frame
df.1 <- data.frame(X = LETTERS[1:10], Y = 2*(1:10)-1, Z = 2*(1:10))
How can I produce this data frame df.2?
i <- 1
j <- 0
XX <- NA
while (i <= 10){
XX[i+j] <- LETTERS[i]
XX[i+j+1]<- LETTERS[i]
i <- i+1
j <- i-1
}
df.2 <- data.frame(X.X = XX, Y.Z = c(1:20))
ggplot2 has an unexported function interleave which does this.
Whilst unexported it does have a help page (?ggplot2:::interleave)
with(df.1, ggplot2:::interleave(Y,Z))
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
If I understand you right, you want to create a new vector twice the length of the vectors X, Y and Z in your data frame and then want all the elements of X to occupy the odd indices of this new vector and all the elements of Y the even indices. If so, then the code below should do the trick:
foo<-vector(length=2*nrow(df.1), mode='character')
foo[seq(from = 1, to = 2*length(df.1$X), by=2)]<-as.character(df.1$X)
foo[seq(from = 2, to = 2*length(df.1$X), by=2)]<-df.1$Y
Note, I first create an empty vector foo of length 20, then fill it in with elements of df.1$X and df.1$Y.
Cheers,
Danny
You can use melt from reshape2:
library(reshape2)
foo <- melt(df.1, id.vars='X')
> foo
X variable value
1 A Y 1
2 B Y 3
3 C Y 5
4 D Y 7
5 E Y 9
6 F Y 11
7 G Y 13
8 H Y 15
9 I Y 17
10 J Y 19
11 A Z 2
12 B Z 4
13 C Z 6
14 D Z 8
15 E Z 10
16 F Z 12
17 G Z 14
18 H Z 16
19 I Z 18
20 J Z 20
Then you can sort and pick the columns you want:
foo[order(foo$X), c('X', 'value')]
Another solution using base R.
First index the character vector of the data.frame using the vector [1,1,2,2 ... 10,10] and store as X.X. Next, rbind the data.frame vectors Y & Z effectively "zipping" them and store in Y.X.
> res <- data.frame(
+ X.X = df.1$X[c(rbind(1:10, 1:10))],
+ Y.Z = c(rbind(df.1$Y, df.1$Z))
+ )
> head(res)
X.X Y.Z
1 A 1
2 A 2
3 B 3
4 B 4
5 C 5
6 C 6
A one two liner in base R:
test <- data.frame(X.X=df.1$X,Y.Z=unlist(df.1[c("Y","Z")]))
test[order(test$X.X),]
Assuming that you want what you asked for in the first paragraph, and the rest of what you posted is your attempt at solving it.
a=df.1[df.1$Y%%2>0,1:2]
b=df.1[df.1$Z%%2==0,c(1,3)]
names(a)=c("X.X","Y.Z")
names(b)=names(a)
df.2=rbind(a, b)
If you want to group them by X.X as shown in your example, you can do:
library(plyr)
arrange(df.2, X.X)