I have a dataframe containing columns with 0's and 1's. I want to mutate the last sequence of 1's into zeros like this:
# data
a <- c(0,1,1,0,1,1,1)
b <- c(0,1,1,1,0,1,1)
c <- data.frame(cbind(a,b))
head(c,7)
# desired output
a_desired <- c(0,1,1,0,0,0,0)
b_desired <- c(0,1,1,1,0,0,0)
c_desired <- data.frame(cbind(a_desired,b_desired))
head(c_desired,7)
such that I end up with the same sequence, except that the last sequence of 1's has been changed into 0's. I've tried using tail() but haven't found a solution so far.
You may try using rle:
apply(c, 2, function(x){
  r <- rle(x == 1)
  y <- max(which(r$values))        # index of the last run of 1s
  ends <- cumsum(r$lengths)        # last position of each run
  x[(ends[y] - r$lengths[y] + 1):ends[y]] <- 0
  x
})
a b
[1,] 0 0
[2,] 1 1
[3,] 1 1
[4,] 0 1
[5,] 0 0
[6,] 0 0
[7,] 0 0
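A purrr::map variant:
library(purrr)
map(c, function(x){
  last1 <- max(which(x == 1))          # position of the last 1
  last0 <- which(x[1:last1] == 0)      # positions of the 0s before it
  # keep everything up to the last 0 before the final run, then pad with 0s
  c(x[seq_len(max(last0))], rep(0, length(x) - max(last0)))
})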
You can try a cumsum that counts the starts of the runs of 1s (the positions where diff(c(0,x)==1)==1) and then replace the values where this count is equal to its max, i.e. the last run.
sapply(c, function(x) {
. <- cumsum(diff(c(0,x)==1)==1)
`[<-`(x, . == max(.), 0L)
#replace(x, . == max(.), 0L) #Alternative to [<-
})
# a b
#[1,] 0 0
#[2,] 1 1
#[3,] 1 1
#[4,] 0 1
#[5,] 0 0
#[6,] 0 0
#[7,] 0 0
Or the same, written in a different way (thanks to @thelatemail):
sapply(c, function(x) {
cs <- cumsum(diff(c(0,x)==1)==1)
x[cs == max(cs)] <- 0L
x
})
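For illustration, here are the intermediate values of the run counter used in the two cumsum versions above, computed for column a of the example. Positions after the last run share the maximum count too, but they are already 0.

```r
# Run counter for column a of the example data
a <- c(0, 1, 1, 0, 1, 1, 1)
starts <- diff(c(0, a) == 1) == 1  # TRUE where a run of 1s begins
cs <- cumsum(starts)               # which run of 1s each position belongs to
print(cs)                          # 0 1 1 1 2 2 2
a[cs == max(cs)] <- 0              # zero the last run (and anything after it)
print(a)                           # 0 1 1 0 0 0 0
```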
Or another variant, iterating backwards from the last element: first skip any trailing 0s, then walk back over the last run of 1s until a 0 is found.
sapply(c, function(x) {
n <- length(x)
i <- n
while(x[i] != 1 & i>1L) i <- i-1L
while(x[i] != 0 & i>1L) i <- i-1L
x[i:n] <- 0L
x
})
You can write your own function:
fun <- function(x){
  y <- rle(x)
  # set the value of the last run of 1s to 0 (also works if x ends in 0s)
  y$values[max(which(y$values == 1))] <- 0
  inverse.rle(y)
}
Now run:
data.frame(sapply(c, fun))
a b
1 0 0
2 1 1
3 1 1
4 0 1
5 0 0
6 0 0
7 0 0
If your sequences always end with 1s, you can try the following (given df <- data.frame(a,b)):
> df * sapply(df, function(x) rev(cumsum(rev(x != 1)) != 0))
a b
1 0 0
2 1 1
3 1 1
4 0 1
5 0 0
6 0 0
7 0 0
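If the columns may end in 0s, one way to adapt the reversed-cumsum idea (a sketch of my own, not part of the answer above; drop_last_run is a hypothetical name) is to count, from the right, how many runs of 1s end at or after each position. Everything in the last run has a count of 1, so keeping only positions with a count of at least 2 drops exactly that run.

```r
# Hypothetical helper: zero the last run of 1s, even if x ends in 0s
drop_last_run <- function(x) {
  run_end <- diff(c(x, 0)) == -1           # TRUE where a run of 1s ends
  ends_right <- rev(cumsum(rev(run_end)))  # run ends at or after each position
  x * (ends_right >= 2)                    # keep only what precedes the last run
}
df <- data.frame(a = c(0, 1, 1, 0, 1, 1, 1),
                 b = c(1, 1, 0, 1, 0, 0, 0))  # b ends in 0s
res <- sapply(df, drop_last_run)
res
```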
I have a binary vector x = (1,0,0,1). The lower-order terms of this vector, including itself, are taken to be (0,0,0,0), (0,0,0,1), (1,0,0,0) and (1,0,0,1). How do I find these lower-order vectors in R?
What I understand so far: basically we want to find subsets by replacing 1s with 0s. But how do I do that in R? I am clueless.
Here is what I have tried so far:
a<-c(1,0,0,1)
M<-length(a)
for(i in 1:M){
ifelse(a[i]==1, a[i]<-0, next)
print(a)
}
[1] 0 0 0 1
[1] 0 0 0 0
What I am looking for in detail: for example, I have 4 factors A, B, C, D. Here (1,0,0,1) means AD.
Now I want the subsets of (1,0,0,1), i.e. of AD. My subsets cannot contain B or C. The result will be {}, {A}, {D}, {AD}, in binary form (0,0,0,0), (1,0,0,0), (0,0,0,1), (1,0,0,1).
Here's a method relying on expand.grid to do the heavy lifting:
vecs = lapply(a, seq, 0) # keep 0s as 0, make 1s c(1, 0)
do.call(expand.grid, vecs) # generate all combinations
# Var1 Var2 Var3 Var4
# 1 1 0 0 1
# 2 0 0 0 1
# 3 1 0 0 0
# 4 0 0 0 0
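To map the rows back to the factor notation from the question, the same expand.grid call can be wrapped in a small helper (lower_order and the default A, B, C, ... labels are assumptions for illustration):

```r
# Hypothetical wrapper around the expand.grid idea
lower_order <- function(a, labels = LETTERS[seq_along(a)]) {
  vecs <- lapply(a, seq, 0)                  # 1 -> c(1, 0), 0 -> 0
  m <- as.matrix(do.call(expand.grid, vecs)) # all combinations, one per row
  dimnames(m) <- list(NULL, labels)
  m
}
m <- lower_order(c(1, 0, 0, 1))
subsets <- apply(m, 1, function(r) paste(names(r)[r == 1], collapse = ""))
subsets  # "AD" "D" "A" ""  (the empty string is the empty set)
```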
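Using RcppAlgos::permuteGeneral (note that each permutation zeroes sum(a) positions at once, so this reaches every subset only when at most half of the entries are 1):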
library(RcppAlgos)
A <- t(apply(permuteGeneral(length(a), sum(a)), 1, function(x) {a[x] <- 0; a}))
A[!duplicated(A), ]
# [,1] [,2] [,3] [,4]
# [1,] 0 0 0 1
# [2,] 0 0 0 0
# [3,] 1 0 0 1
# [4,] 1 0 0 0
We can use the which, combn, and *apply functions to perform this operation. Since this is a step-by-step operation, it may be helpful to look at the results line-by-line.
Here it is wrapped in a function called find_binary_subsets:
find_binary_subsets <- function(x){
# where does x equal 1
x_eq_1 <- which(x == 1)
# combinations of indexes where x == 1
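# (index via combn(length(x_eq_1), l) so that a single 1 at position n is not read as combn(1:n, l))
l_w_x <- lapply(length(x_eq_1):1,
FUN = function(l) matrix(x_eq_1[combn(length(x_eq_1), l)], nrow = l))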
# loop over the combinations of indexes where x == 1, replace by 0, return vector
# apply(., 2) loops over the columns of a matrix, which is what we want
combs <- lapply(l_w_x,
FUN = function(d)
apply(d, 2, FUN = function(i){x[i] <- 0; x}))
# cbind results, then transpose to arrange by row
t(cbind(do.call("cbind", combs), x))
}
find_binary_subsets(a)
[,1] [,2] [,3] [,4]
0 0 0 0
0 0 0 1
1 0 0 0
x 1 0 0 1
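Another base-R option (a sketch, not one of the original answers; subsets_by_bits is a hypothetical name) enumerates the subsets of the 1-positions as the integers 0 to 2^k - 1, where bit j of the integer says whether the j-th 1 is kept. This avoids combn() entirely and also returns the all-zero row when x has no 1s.

```r
# Hypothetical alternative: enumerate subsets of the 1-positions as bitmasks
subsets_by_bits <- function(x) {
  ones <- which(x == 1)
  k <- length(ones)
  rows <- vapply(0:(2^k - 1), function(mask) {
    keep <- (mask %/% 2^(seq_len(k) - 1)) %% 2 == 1  # bit j set -> keep j-th 1
    y <- numeric(length(x))
    y[ones[keep]] <- 1
    y
  }, numeric(length(x)))
  t(rows)  # one subset per row
}
res <- subsets_by_bits(c(1, 0, 0, 1))
res
```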
I would like to find the pattern of a 0 or 1 followed by a 2, occurring three or more times in a row. I would like to find this pattern and transform the 2s in this pattern into 1s (and everything else into 0s), for example:
Input:
Y <- c(0,1,0,3,2,5,2,1,2,0,2,1,2,0,1,2,1,3,1,2,1)
A function findPattern that finds the pattern:
findPattern(Y)
and outputs the following:
[1] 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0
I have tried the following:
as.numeric(Y == 2 & lead(Y) %in% 1:2)
1. Find the 2s preceded by a 0 or 1
findPattern<-function(Y){
as.numeric(Y==2 & (c(NA,Y[-length(Y)])==0 |c(NA,Y[-length(Y)])==1 ))
}
I add an NA at the start and remove the last item, which shifts the vector by one position while keeping the same length. This way you avoid for loops.
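A small illustration of the shift on a toy vector (the values are made up for the example):

```r
Y <- c(0, 1, 2, 5, 2)
prev <- c(NA, Y[-length(Y)])  # element before each position; NA for the first
prev                          # NA 0 1 2 5
hit <- as.numeric(Y == 2 & (prev == 0 | prev == 1))
hit                           # 0 0 1 0 0 (only the 2 after a 1 qualifies)
```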
If you want to use %in%, which avoids a second comparison:
findPattern<-function(Y){
as.numeric(Y==2 & (c(NA,Y[-length(Y)]) %in% c(0,1)))
}
2. Keep only those that form a run of at least three, at every other position
findPattern<-function(Y){
w <- which(Y==2 & (c(NA,Y[-length(Y)]) %in% c(0,1)))
centers<- w[((w - 2) %in% w) & ((w+2) %in% w)]
result<-rep(0, times = length(Y))
result[c(centers,centers-2,centers+2)]<-1
return(result)
}
Testing:
findPattern(c(0,1,0,3,2,5,2,1,2,0,2,1,2,0,1,2,1,3,1,2,1))
[1] 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0
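If the minimum run length may change, one possible generalisation (a sketch; findPatternRuns and min_run are hypothetical names) groups the candidate positions into stretches spaced exactly 2 apart and keeps every stretch of at least min_run, instead of looking for centres:

```r
# Hypothetical generalisation: min_run is an assumed parameter (3 in the question)
findPatternRuns <- function(Y, min_run = 3) {
  prev <- c(NA, Y[-length(Y)])
  w <- which(Y == 2 & prev %in% c(0, 1))  # 2s preceded by a 0 or 1
  result <- numeric(length(Y))
  if (length(w) == 0) return(result)
  grp <- cumsum(c(TRUE, diff(w) != 2))    # new group when the spacing breaks
  keep <- tabulate(grp)[grp] >= min_run   # size of each hit's group
  result[w[keep]] <- 1
  result
}
res <- findPatternRuns(c(0,1,0,3,2,5,2,1,2,0,2,1,2,0,1,2,1,3,1,2,1))
res
```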
Here is a possible approach that combines this with a regular expression to find the pattern.
Starting vector:
> Y
[1] 0 2 0 3 2 5 2 1 2 0 2 1 2 0 1
1) Find out all the 2s preceded by 0 or 1;
> ind <- as.integer(lag(Y %in% c(0, 1)) & (Y == 2) )
> ind
[1] 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0
2) Paste the resulting vector into a string and use a regular expression to find the location and length of the required pattern, i.e., 01 repeated three or more times;
> id <- gregexpr("(01){3,}", paste0(ind, collapse = ""))
> id
[[1]]
[1] 8
attr(,"match.length")
[1] 6
attr(,"useBytes")
[1] TRUE
3) Extract the location and length from the regular-expression result and convert them into an index vector;
> start <- as.numeric(id[[1]])
> end <- start + attr(id[[1]], "match.length") - 1
> indArray <- unlist(Map(`:`, start, end))
> indArray
[1] 8 9 10 11 12 13
4) Set every value outside the matched pattern (01 repeated fewer than three times) to 0
> ind[-indArray] <- 0
> ind
[1] 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0
Wrap them into a function:
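library(dplyr)
findPattern <- function(Y) {
# default = FALSE so the first element is FALSE rather than NA
ind <- as.integer(lag(Y %in% c(0, 1), default = FALSE) & (Y == 2) )
id <- gregexpr("(01){3,}", paste0(ind, collapse = ""))
# gregexpr returns -1 when there is no match: then nothing qualifies
if (id[[1]][1] == -1) return(ind * 0L)
start <- as.numeric(id[[1]])
end <- start + attr(id[[1]], "match.length") - 1
indArray <- unlist(Map(`:`, start, end))
ind[-indArray] <- 0
ind
}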
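Note that lag() here has to be dplyr::lag(); without dplyr attached it resolves to stats::lag(), which does not shift the values of a plain vector. A base-R sketch of the same regex idea (findPatternRegex and min_run are hypothetical names) replaces lag() with an explicit shift and also handles the no-match case:

```r
# Base-R sketch of the regex approach; the shift replaces dplyr::lag()
findPatternRegex <- function(Y, min_run = 3) {
  prev01 <- c(FALSE, head(Y %in% c(0, 1), -1))  # is the previous element 0 or 1?
  ind <- as.integer(prev01 & Y == 2)
  m <- gregexpr(sprintf("(01){%d,}", min_run), paste0(ind, collapse = ""))[[1]]
  out <- integer(length(Y))
  if (m[1] == -1) return(out)                   # no match: everything stays 0
  len <- attr(m, "match.length")
  for (k in seq_along(m)) {
    span <- m[k]:(m[k] + len[k] - 1)
    out[span] <- ind[span]                      # copy the hits inside each match
  }
  out
}
res <- findPatternRegex(c(0,1,0,3,2,5,2,1,2,0,2,1,2,0,1))
res
```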
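Using the stringi package (note: this flags every 2 from the third 2 onwards and does not check the preceding 0/1 or the spacing, so it only reproduces the expected output for this particular Y):
library(stringi)
Y <- c(0,1,0,3,2,5,2,1,2,0,2,1,2,0,1)
matchVec = stri_count(Y,fixed=2)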
remapVec = as.integer(matchVec & (cumsum(matchVec)>=3))
remapVec
#[1] 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0
I am trying to recode a set of data in a way that cannot easily be done with split() or ifelse(). How would I recode the following data?
1 --> 1
2 --> 0
3 --> 0
4 --> 1
5 --> 1
7 --> 0
8 --> 1
Thank you for your time!
Another approach:
x <- +(x %in% c(1,4,5,8))
#[1] 1 0 0 1 1 0 1
The +(...) notation coerces a logical vector to integer, the same way as.integer(...) would.
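A quick check of that equivalence on the data from the question:

```r
x <- c(1, 2, 3, 4, 5, 7, 8)
flag <- x %in% c(1, 4, 5, 8)        # logical vector
identical(+flag, as.integer(flag))  # TRUE: unary + turns TRUE/FALSE into 1L/0L
+flag                               # 1 0 0 1 1 0 1
```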
You could try:
library(car)
v <- c(1,2,3,4,5,6,7,8)
recode(v, "c(1,4,5,8) = 1; else = 0")
Or, as mentioned by @zx8754, you could use ifelse():
ifelse(v %in% c(1,4,5,8), 1, 0)
Which gives:
#[1] 1 0 0 1 1 0 0 1
Maybe try this? (Although a numeric-to-numeric recode usually implies some function that could be applied instead.)
x <- c(1:5, 7:8)
x
# [1] 1 2 3 4 5 7 8
x[x == 1] <- 1
x[x == 2] <- 0
x[x == 3] <- 0
x[x == 4] <- 1
x[x == 5] <- 1
x[x == 7] <- 0
x[x == 8] <- 1
x
# [1] 1 0 0 1 1 0 1
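A lookup vector is another option for arbitrary recodes (an alternative, not taken from the answers above): the old codes are the names, the new codes are the values, and any code missing from the table comes back as NA.

```r
x <- c(1, 2, 3, 4, 5, 7, 8)
# Lookup vector: names are the old codes, values are the new codes
lookup <- c("1" = 1, "2" = 0, "3" = 0, "4" = 1, "5" = 1, "7" = 0, "8" = 1)
recoded <- unname(lookup[as.character(x)])
recoded  # 1 0 0 1 1 0 1
```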
I am working with a matrix containing a large number of NAs. I would like to record the length of each sequence of NAs in a new matrix.
The following example should make this clearer.
#Generating a random 5x5 population matrix with 15 NA
M=matrix(sample(1:9,25,T),5)
M[sample(1:length(M),15,F)]=NA
dimnames(M)=list(paste(rep("City",dim(M)[1]),1:dim(M)[1],sep=""),paste(rep("Year",dim(M)[2]),1:dim(M)[2],sep=""))
M
Year1 Year2 Year3 Year4 Year5
City1 2 NA NA NA NA
City2 NA NA NA 6 8
City3 1 NA NA 6 NA
City4 NA 5 NA NA 1
City5 8 NA 1 NA 2
The desired output is the following, where e.g. 4 4 4 4 denotes a sequence of 4 consecutive NAs:
Year1 Year2 Year3 Year4 Year5
City1 0 4 4 4 4
City2 3 3 3 0 0
City3 0 2 2 0 1
City4 1 0 2 2 0
City5 0 1 0 1 0
Do you have an idea of how I could go about that?
Not the most efficient code ever:
r1=c(1,1,NA,1,1)
r2=c(1,NA,NA,1,1)
r3=c(1,NA,NA,NA,1)
r4=c(NA,NA,1,1,1)
r5=c(1,1,1,NA,NA)
M=rbind(r1,r2,r3,r4,r5)
As @Pascal pointed out, your approach will convert the entire matrix to characters, so you can set the 1s to 0s instead and do this:
M[M == 1] <- 0
(xx <- t(apply(M, 1, function(x) {
s <- sum(is.na(x))
if (is.na(x[1])) x[is.na(x)] <- rep(4, s) else
if (is.na(tail(x, 1))) x[is.na(x)] <- rep(5, s) else
x[is.na(x)] <- s
x
})))
# [,1] [,2] [,3] [,4] [,5]
# r1 0 0 1 0 0
# r2 0 2 2 0 0
# r3 0 3 3 3 0
# r4 4 4 0 0 0
# r5 0 0 0 5 5
This is your desired output. If you don't believe me, convert the 0s back to 1s and assign the letters based on the integers:
xx[xx > 0] <- letters[xx[xx > 0]]
xx[xx == '0'] <- 1
r1=c(1,1,"a",1,1)
r2=c(1,"b","b",1,1)
r3=c(1,"c","c","c",1)
r4=c("d","d",1,1,1)
r5=c(1,1,1,"e","e")
R=rbind(r1,r2,r3,r4,r5)
identical(R, xx)
# [1] TRUE
This is another basis for a function that would be applied over each row. I tried, but couldn't avoid a for loop:
x = c(1,NA,1,NA,NA,1,NA,NA,NA,1,NA,NA,NA,NA)
#Find the Start and End of each sequence of NA's (Vectorized)
(start <- is.na(x) * c(T,!is.na(x[-length(x)])))
#> [1] 0 1 0 1 0 0 1 0 0 0 1 0 0 0
(end <- is.na(x) * c(!is.na(x[-1]),T))
#> [1] 0 1 0 0 1 0 0 0 1 0 0 0 0 1
# The difference between the start and end of each sequence, plus 1, is the sequence length
wStart <- which(!!start)
wEnd <- which(!!end)
sequenceLength <- wEnd - wStart + 1
# replace each sequence of NA's with its class
for(i in seq_along(wStart))
x[`:`(wStart[i],wEnd[i])] <- letters[sequenceLength[i]]
x
#> [1] "1" "a" "1" "b" "b" "1" "c" "c" "c" "1" "d" "d" "d" "d"
Applied to each row of M:
(xx <- t(apply(M, 1, function(x) {
wStart <- which(!!(is.na(x) * c(T,!is.na(x[-length(x)]))))
wEnd <- which(!!is.na(x) * c(!is.na(x[-1]),T))
for(i in seq_along(wStart))
x[`:`(wStart[i],wEnd[i])] <- letters[wEnd[i] - wStart[i] + 1]
return(x)
})))
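For the numeric output asked for in the question, a sketch using rle() (the same run-length idea used earlier in this thread; na_run_lengths is a hypothetical name) needs no explicit for loop, although apply() still iterates over the rows:

```r
# Sketch: each NA becomes the length of its NA run, every other cell becomes 0
na_run_lengths <- function(M) {
  out <- t(apply(M, 1, function(row) {
    r <- rle(is.na(row))                             # runs of NA / non-NA
    rep(ifelse(r$values, r$lengths, 0L), r$lengths)  # expand back to full length
  }))
  dimnames(out) <- dimnames(M)                       # keep City/Year labels if present
  out
}
M <- matrix(c( 2, NA, NA, NA, NA,
              NA, NA, NA,  6,  8,
               1, NA, NA,  6, NA,
              NA,  5, NA, NA,  1,
               8, NA,  1, NA,  2), nrow = 5, byrow = TRUE)
na_run_lengths(M)
```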
I have a list of data frames with some overlapping columns. The number of data frames in the list is unknown. How can I efficiently rbind the data frames together in base R and fill in the non-overlapping columns with zeros?
Example data:
x <- data.frame(a=1:2, b=1:2, c=1:2)
y <- data.frame(a=1:2, r=1:2, f=1:2)
z <- data.frame(b=1:3, c=1:3, v=1:3, t=c("A", "A", "D"))
L1 <- list(x, y, z)
Desired output:
a b c f r t v
1 1 1 1 0 0 0 0
2 2 2 2 0 0 0 0
3 1 0 0 1 1 0 0
4 2 0 0 2 2 0 0
5 0 1 1 0 0 A 1
6 0 2 2 0 0 A 2
7 0 3 3 0 0 D 3
Pad out each data frame with the missing columns, then rbind them:
allnames <- unique(unlist(lapply(L1, names)))
do.call(rbind, lapply(L1, function(df) {
not <- allnames[!allnames %in% names(df)]
df[, not] <- 0
df
}))
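If you also want the columns in the alphabetical order shown in the desired output, a small variant of the padding idea could look like this (rbind_pad is a made-up name, and sorting the names is an assumption based on the desired output). The length check skips the assignment for data frames that already have every column.

```r
# Hypothetical helper: pad missing columns with `fill`, align order, then rbind
rbind_pad <- function(dfs, fill = 0) {
  allnames <- sort(unique(unlist(lapply(dfs, names))))
  padded <- lapply(dfs, function(df) {
    missing_cols <- setdiff(allnames, names(df))
    if (length(missing_cols) > 0) df[missing_cols] <- fill
    df[allnames]  # same column order in every piece
  })
  do.call(rbind, padded)
}
x <- data.frame(a = 1:2, b = 1:2, c = 1:2)
y <- data.frame(a = 1:2, r = 1:2, f = 1:2)
z <- data.frame(b = 1:3, c = 1:3, v = 1:3, t = c("A", "A", "D"))
res <- rbind_pad(list(x, y, z))
res
```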
I have an old (and probably inefficient) function that does this. I've made one modification here to allow the fill to be specified.
RBIND <- function(datalist, keep.rownames = TRUE, fill = NA) {
Len <- sapply(datalist, ncol)
if (all(diff(Len) == 0)) {
temp <- names(datalist[[1]])
if (all(sapply(datalist, function(x) names(x) %in% temp))) tryme <- "basic"
else tryme <- "complex"
}
else tryme <- "complex"
almost <- switch(
tryme,
basic = { do.call("rbind", datalist) },
complex = {
Names <- unique(unlist(lapply(datalist, names)))
NROWS <- c(0, cumsum(sapply(datalist, nrow)))
NROWS <- paste(NROWS[-length(NROWS)]+1, NROWS[-1], sep=":")
out <- lapply(1:length(datalist), function(x) {
emptyMat <- matrix(fill, nrow = nrow(datalist[[x]]), ncol = length(Names))
colnames(emptyMat) <- Names
emptyMat[, match(names(datalist[[x]]),
colnames(emptyMat))] <- as.matrix(datalist[[x]])
emptyMat
})
do.call("rbind", out)
})
Final <- as.data.frame(almost, row.names = 1:nrow(almost))
Final <- data.frame(lapply(Final, function(x) type.convert(as.character(x), as.is = TRUE)))
if (isTRUE(keep.rownames)) {
row.names(Final) <- make.unique(unlist(lapply(datalist, row.names)))
}
Final
}
Here it is on your sample data.
RBIND(L1, fill = 0)
# a b c r f v t
# 1 1 1 1 0 0 0 0
# 2 2 2 2 0 0 0 0
# 1.1 1 0 0 1 1 0 0
# 2.1 2 0 0 2 2 0 0
# 1.2 0 1 1 0 0 1 A
# 2.2 0 2 2 0 0 2 A
# 3 0 3 3 0 0 3 D