Recode continuous into continuous variables (without split function) in R

I am trying to recode a set of data that cannot be easily done with the split function or ifelse function. How would I recode the following data?
1 --> 1
2 --> 0
3 --> 0
4 --> 1
5 --> 1
7 --> 0
8 --> 1
Thank you for your time!

Another approach:
x <- +(x %in% c(1,4,5,8))
#[1] 1 0 0 1 1 0 1
The +(..) idiom (unary plus) coerces a logical vector to integer, the same way that as.integer(..) would.
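To make the coercion concrete, here is a minimal sketch using the values from the question. The names keep, via_plus and via_as_integer are only illustrative and do not appear in the original answer.

```r
# Values from the question (6 does not occur there)
x <- c(1, 2, 3, 4, 5, 7, 8)

# Logical vector: TRUE where the old value should be recoded to 1
keep <- x %in% c(1, 4, 5, 8)

# Unary plus and as.integer() both turn TRUE/FALSE into 1L/0L
via_plus <- +keep
via_as_integer <- as.integer(keep)

identical(via_plus, via_as_integer)
```

Either form works; as.integer() is arguably easier to read for someone who has not seen the unary-plus trick before.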

You could try:
library(car)
v <- c(1,2,3,4,5,6,7,8)
recode(v, "c(1,4,5,8) = 1; else = 0")
Or, as mentioned by @zx8754, you could use ifelse():
ifelse(v %in% c(1,4,5,8), 1, 0)
Which gives:
#[1] 1 0 0 1 1 0 0 1

Maybe try this? Although continuous-to-continuous usually implies some type of function that could be applied.
x <- c(1:5, 7:8)
x
# [1] 1 2 3 4 5 7 8
x[x == 1] <- 1
x[x == 2] <- 0
x[x == 3] <- 0
x[x == 4] <- 1
x[x == 5] <- 1
x[x == 7] <- 0
x[x == 8] <- 1
x
# [1] 1 0 0 1 1 0 1
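Replacing values one at a time works here, but in general a later assignment can hit values that an earlier assignment has already recoded. A lookup table avoids that, because every value is translated from the original vector in one step. The following is only a sketch of that alternative; the names old, new and recoded are assumptions for illustration.

```r
x <- c(1:5, 7:8)

# Lookup table taken from the question: old[i] is recoded to new[i]
old <- c(1, 2, 3, 4, 5, 7, 8)
new <- c(1, 0, 0, 1, 1, 0, 1)

# match() returns each value's position in `old`;
# values that are not listed in `old` would come back as NA
recoded <- new[match(x, old)]
recoded
```

Getting NA for unlisted values can be useful, since it exposes values that the recoding table forgot to cover.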

Related

R: Mutate last sequence of specific values

I have a dataframe containing columns with 0's and 1's. I want to mutate the last sequence of 1's into zeros like this:
# data
a <- c(0,1,1,0,1,1,1)
b <- c(0,1,1,1,0,1,1)
c <- data.frame(cbind(a,b))
head(c,7)
# desired output
a_desired <- c(0,1,1,0,0,0,0)
b_desired <- c(0,1,1,1,0,0,0)
c_desired <- data.frame(cbind(a_desired,b_desired))
head(c_desired,7)
such that I end up with the same sequence except that the last sequence of 1's has been changed into 0's. I've tried using tail() but haven't found a solution so far.
You may try using rle
apply(c, 2, function(x){
y <- max(which(rle(x == 1)$values))
x[(sum(rle(x == 1)$lengths[1:(y-1)]) + 1): sum(rle(x == 1)$lengths[1:y])] <- 0
x
})
a b
[1,] 0 0
[2,] 1 1
[3,] 1 1
[4,] 0 1
[5,] 0 0
[6,] 0 0
[7,] 0 0
purrr::map variant
library(purrr)
map(c, function(x){
last1 <- max(which(x == 1))
last0 <- which(x[1:last1] == 0)
c(x[seq_len(max(last0))], rep(0, length(x) - max(last0)))
})
You can number the runs of 1s with cumsum (counting each position where a run of 1s starts) and replace the values where this counter equals its maximum.
sapply(c, function(x) {
. <- cumsum(diff(c(0,x)==1)==1)
`[<-`(x, . == max(.), 0L)
#replace(x, . == max(.), 0L) #Alternative to [<-
})
# a b
#[1,] 0 0
#[2,] 1 1
#[3,] 1 1
#[4,] 0 1
#[5,] 0 0
#[6,] 0 0
#[7,] 0 0
Or the same but written in a different way (thanks to @thelatemail):
sapply(c, function(x) {
cs <- cumsum(diff(c(0,x)==1)==1)
x[cs == max(cs)] <- 0L
x
})
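The cumsum/diff one-liner is easier to follow when the intermediate vectors are printed. This sketch walks through column a from the question; the names is_one, run_start and run_id are illustrative only.

```r
x <- c(0, 1, 1, 0, 1, 1, 1)     # column a from the question

# Pad with a leading 0 so that a run starting at position 1 is detected too
is_one <- c(0, x) == 1

# TRUE exactly where the value switches from "not 1" to 1
run_start <- diff(is_one) == 1

# 0 before the first run of 1s, then 1, 2, ... from each run start onwards
run_id <- cumsum(run_start)
run_id

# Everything from the start of the last run of 1s to the end becomes 0
x[run_id == max(run_id)] <- 0
x
```

Zeros that follow the last run of 1s share its run_id, which is harmless because they are already 0.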
Or another variant iterating from the last element to the beginning until 0 is found.
sapply(c, function(x) {
n <- length(x)
i <- n
while(x[i] != 1 & i>1L) i <- i-1L
while(x[i] != 0 & i>1L) i <- i-1L
x[i:n] <- 0L
x
})
You can write your own function:
fun <- function(x){
y <- rle(x)
y$values[length(y$values)] <- 0
inverse.rle(y)
}
Now run:
data.frame(sapply(c, fun))
a b
1 0 0
2 1 1
3 1 1
4 0 1
5 0 0
6 0 0
7 0 0
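Note that fun() above zeroes the final run whatever its value, so it only does the job when the vector ends in 1s. If a column might end in 0s, a possible variant is to look for the last run whose value is 1. This is a sketch under that assumption, and the name zero_last_run_of_ones is hypothetical.

```r
zero_last_run_of_ones <- function(x) {
  r <- rle(x)
  ones <- which(r$values == 1)      # indices of the runs that consist of 1s
  if (length(ones) > 0) {
    r$values[max(ones)] <- 0        # overwrite only the last such run
  }
  inverse.rle(r)
}

zero_last_run_of_ones(c(0, 1, 1, 0, 1, 1, 1))  # vector ending in 1s
zero_last_run_of_ones(c(1, 1, 0, 1, 0, 0))     # vector ending in 0s
```

A vector without any 1s is returned unchanged.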
If your sequences always end with 1s, you can try (given df <- data.frame(a,b)):
> df * sapply(df, function(x) rev(cumsum(rev(x != 1)) != 0))
a b
1 0 0
2 1 1
3 1 1
4 0 1
5 0 0
6 0 0
7 0 0
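The rev/cumsum expression is compact, so here is a step-by-step sketch of what it computes for a single column. The intermediate names not_one, seen_from_end and mask are illustrative.

```r
x <- c(0, 1, 1, 0, 1, 1, 1)            # column a from the question

not_one <- x != 1                       # TRUE at every value that is not 1

# Walking from the end: how many non-1 values have been seen so far
seen_from_end <- cumsum(rev(not_one))

# Back in the original order: TRUE at or before the last non-1 value
mask <- rev(seen_from_end != 0)

# Multiplying by the mask zeroes the trailing run of 1s
x * mask
```

This relies on the trailing run being the run of 1s to remove, which is why the answer restricts itself to sequences that end in 1s.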

Compare the present value with the previous value in R while in ifelse()

I am working on a project in which I want to compare the present value with the previous value and return 1 if true and 0 if false.
I tried
brv_trx1$'first' <- ifelse(brv_trx1$`Total TRx` != lag(brv_trx1$`Total TRx`),1,0)
This code did not work as expected.
x= c(1,2,2,2,3,4,5,5,5,5,6,7)
I wanted an output similar to this:
x y
1 1
2 1
2 0
2 0
3 1
4 1
5 1
5 0
5 0
After this step I have a decile function
brv_trx1$decvar <- ifelse(brv_trx1$cum != 0 & brv_trx1$first == 1, (11 - ceiling(round((brv_trx1$cum/total) * 10, 4))),
ifelse(brv_trx1$cum != 0 & brv_trx1$first == 0 , lag(brv_trx1$decvar), 0))
For this function, I was getting a lot of NAs.
The output expected was :
Y Dec
1 10
1 10
1 9
0 9
0 9
1 8
0 8
1 8
1 8
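Because dplyr's lag() will produce NA for the first entry, consider the following:
library(dplyr) # lag() here is dplyr::lag(); stats::lag() does not shift the values of a plain vector
x <- c(1,2,2,2,3,4,5,5,5,5,6,7)
x <- data.frame(x = x)
x$y <- ifelse((x$x == lag(x$x)) %in% c(NA, FALSE), 1, 0)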
If the comparison of x == lag(x) is FALSE or NA (because it's the first comparison of the lag), flag 1, else flag 0 per your example above.
You can use indexes. Here I've made vectors that go from 1-9, and 2-10. Then you compare the elements of your original vector by using the "shifted by 1" indexes (1 compares to 2, 2 compares to 3, etc).
x <- c(1,2,2,3,4,4,4,5,6,7)
length(x)
#[1] 10
i.1 <- 1:(length(x)-1)
i.2 <- 2:length(x)
x[i.1] == x[i.2]
#[1] FALSE TRUE FALSE FALSE TRUE TRUE FALSE FALSE FALSE
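The shifted comparison can be turned into the 0/1 flag the question asks for. This is a minimal base R sketch using the vector from the question; treating the first element as "changed" (flag 1) is an assumption taken from the desired output.

```r
x <- c(1, 2, 2, 2, 3, 4, 5, 5, 5, 5, 6, 7)  # vector from the question
n <- length(x)

# Compare elements 2..n with elements 1..(n-1); the first element has
# no predecessor, so it is flagged 1 by assumption
y <- c(1L, as.integer(x[-1] != x[-n]))
data.frame(x, y)
```

Negative indices (x[-1], x[-n]) do the same job as the explicit i.1 and i.2 index vectors.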
Using diff
ifelse(c(1,diff(x))==0,0,1)
[1] 1 1 0 0 1 1 1 0 0 0 1 1
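On the second part of the question: ifelse() is vectorised, so lag(brv_trx1$decvar) on the right-hand side reads the column as it exists before the assignment rather than row by row, which is a likely source of the NAs. One possible way to carry the decile forward is to compute it only on the rows where first == 1 and then index by a running count of those rows. The sketch below uses the Y and Dec values from the question's expected output; dec_if_first and run_id are hypothetical names, and it assumes the first row has first == 1.

```r
# Flag column copied from the question's expected output
first <- c(1, 1, 1, 0, 0, 1, 0, 1, 1)

# Hypothetical: decile computed only where first == 1, NA elsewhere
dec_if_first <- c(10, 10, 9, NA, NA, 8, NA, 8, 8)

# run_id[i] = number of first == 1 rows seen up to and including row i
run_id <- cumsum(first)

# For every row, look up the decile of the most recent first == 1 row
dec <- dec_if_first[first == 1][run_id]
dec
```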

removing columns equal to 0 from multiple data frames in a list; lapply not actually removing columns when applying function to a list

I have a list of three data frames that are similar (same number of columns but different number of rows), and were split from a larger data set.
Here is some example code to make three data frames and put them in a list. It is really hard to make an exact replicate of my data since the files are so large (over 400 columns and the first 6 columns are not numerical)
a <- c(0,1,0,1,0,0,0,0,0,1,0,1)
b <- c(0,0,0,0,0,0,0,0,0,0,0,0)
c <- c(1,0,1,1,1,1,1,1,1,1,0,1)
d <- c(0,0,0,0,0,0,0,0,0,0,0,0)
e <- c(1,1,1,1,0,1,0,1,0,1,1,1)
f <- c(0,0,0,0,0,0,0,0,0,0,0,0)
g <- c(1,0,1,0,1,1,1,1,1,1)
h <- c(0,0,0,0,0,0,0,0,0,0)
i <- c(1,0,0,0,0,0,0,0,0,0)
j <- c(0,0,0,0,1,1,1,1,1,0)
k <- c(0,0,0,0,0)
l <- c(1,0,1,0,1)
m <- c(1,0,1,0,0)
n <- c(0,0,0,0,0)
o <- c(1,0,1,0,1)
df1 <- data.frame(a,b,c,d,e,f)
df2 <- data.frame(g,h,i,j)
df3 <- data.frame(k,l,m,n,o)
my.list <- list(df1,df2,df3)
I am looking to remove all the columns in each data frame whose total == 0. The code is below:
list2 <- lapply(my.list, function(x) {x[, colSums(x) != 0];x})
list2 <- lapply(my.list, function(x) {x[, colSums(x != 0) > 0];x})
Both of the above lines run, but neither actually removes the all-zero columns.
I am not sure why that is; any tips are greatly appreciated.
The OP found a solution by exchanging comments with me, but I want to leave an explanation here. In lapply(my.list, function(x) {x[, colSums(x) != 0];x}), the function asks R to do two things. The first is to subset each data frame in my.list; the result of that subsetting is never assigned, so it is discarded. The second is to return x, the unchanged data frame. The OP presumably expected x to be updated by the subsetting, but the function simply returns each data frame as it was, so on the surface nothing changes. Following the OP's approach, I would do something like this:
lapply(my.list, function(x) {foo <- x[, colSums(x) != 0]; foo})
That is, create a temporary object in the anonymous function and return it. Alternatively, return the subset directly:
lapply(my.list, function(x) x[, colSums(x) != 0])
For each data frame in my.list, run a logical check for each column. If colSums(x) != 0 is TRUE, keep the column. Otherwise remove it. Hope this will help future readers.
[[1]]
a c e
1 0 1 1
2 1 0 1
3 0 1 1
4 1 1 1
5 0 1 0
6 0 1 1
7 0 1 0
8 0 1 1
9 0 1 0
10 1 1 1
11 0 0 1
12 1 1 1
[[2]]
g i j
1 1 1 0
2 0 0 0
3 1 0 0
4 0 0 0
5 1 0 1
6 1 0 1
7 1 0 1
8 1 0 1
9 1 0 1
10 1 0 0
[[3]]
l m o
1 1 1 1
2 0 0 0
3 1 1 1
4 0 0 0
5 1 0 1
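The question mentions that the real data has non-numeric leading columns, on which colSums() would fail. A possible extension is to test only the numeric columns and to use drop = FALSE so a single surviving column stays a data frame. This is a sketch under those assumptions; drop_zero_cols and the small example data are hypothetical, not the OP's data.

```r
drop_zero_cols <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  all_zero <- is_num                      # non-numeric columns stay FALSE
  all_zero[is_num] <- colSums(df[, is_num, drop = FALSE]) == 0
  df[, !all_zero, drop = FALSE]           # keep everything that is not all zero
}

# Hypothetical example data with one non-numeric column per data frame
my.list <- list(
  data.frame(id = c("p", "q", "r"), a = c(0, 1, 0), b = c(0, 0, 0)),
  data.frame(id = c("s", "t"), g = c(0, 0), h = c(1, 0))
)

cleaned <- lapply(my.list, drop_zero_cols)
lapply(cleaned, names)
```

Note that a column containing positive and negative values that cancel out would also sum to 0; colSums(x != 0) > 0, as in the question's second attempt, avoids that.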

R for loop: For all groups of rows with the same value in column, do

I would love some help understanding the syntax needed to do a certain calculation in R.
I have a dataframe like this:
a b c
1 1 0
2 1 1
3 1 0
4 2 0
5 2 0
6 3 1
7 3 0
8 3 0
9 4 0
and I want to create a new column "d" that has a value of 1 if (and only if) any of the values in column "c" equal 1 for each group of rows that have the same value in column "b." Otherwise (see rows 4,5 and 9) column "d" gives 0.
a b c d
1 1 0 1
2 1 1 1
3 1 0 1
4 2 0 0
5 2 0 0
6 3 1 1
7 3 0 1
8 3 0 1
9 4 0 0
Can this be done with a for loop? If so, any advice on how to write that would be greatly appreciated.
Using data.table:
library(data.table)
setDT(df)
df[, d := as.integer(any(c == 1L)), by = b]
Since you asked for a loop:
# adding the result col
dat <- data.frame(dat, d = rep(NA, nrow(dat)))
# iterate over group
for(i in unique(dat$b)){
# check whether the group contains a 1
if(any(dat$c[dat$b == i] == 1))
dat$d[dat$b == i] <- 1
else
dat$d[dat$b == i] <- 0
}
Of course, the data.table solution is more elegant ;)
To do this in base R (using the same general function as the data.table method, any), you can use ave:
df$d <- ave(df$c, df$b, FUN = function(i) as.integer(any(i == 1)))
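Another base R option, shown here only as a sketch, avoids grouping functions altogether: collect the values of b that have a 1 in c, then test membership with %in%. The data frame is rebuilt from the table in the question.

```r
# Data frame rebuilt from the question
df <- data.frame(
  a = 1:9,
  b = c(1, 1, 1, 2, 2, 3, 3, 3, 4),
  c = c(0, 1, 0, 0, 0, 1, 0, 0, 0)
)

# Groups (values of b) in which c equals 1 at least once
groups_with_one <- unique(df$b[df$c == 1])

# 1 for every row whose group is in that set, 0 otherwise
df$d <- as.integer(df$b %in% groups_with_one)
df
```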

Consecutive value after and new level of factor in R

I have the following sample
id <- c("a","b","a","b","a","a","a","a","b","b","c")
SOG <- c(4,4,0,0,0,0,0,0,0,0,9)
data <- data.frame(id,SOG)
I would like a new column holding a cumulative counter for the rows where SOG == 0.
with the following code
tmp <- rle(SOG) #run length encoding:
tmp$values <- tmp$values == 0 #turn values into logicals
tmp$values[tmp$values] <- cumsum(tmp$values[tmp$values]) #cumulative sum of TRUE values
inverse.rle(tmp) #inverse the run length encoding
I create the column "Stops":
data$Stops <- inverse.rle(tmp)
which gives:
[1] 0 0 1 1 1 1 1 1 1 1 0
But I would like to have instead
[1] 0 0 1 2 3 3 3 3 4 4 0
I mean that when the level of the factor "id" is different from the previous row, I want to jump to the next "stop" (i+1).
Have a look at the dplyr package:
library(dplyr)
data %>%
mutate(
Stops = ifelse(
SOG > 0,
0,
cumsum(SOG == 0 & lag(id) != id)
)
)
We can try
library(data.table)
setDT(data1)[, v1 := if(all(!SOG)) c(TRUE, id[-1]!= id[-.N]) else
rep(FALSE, .N), .(grp = rleid(SOG))][,cumsum(v1)*(!SOG)]
#[1] 0 0 1 2 3 3 3 3 4 4 0 0 0 0 5 5 0 6 6 0
Using the old data
setDT(data)[, v1 := if(all(!SOG)) c(TRUE, id[-1]!= id[-.N])
else rep(FALSE, .N), .(grp = rleid(SOG))][,cumsum(v1)*(!SOG)]
#[1] 0 0 1 2 3 3 3 3 4 4 0
data
id <- c("a","b","a","b","a","a","a","a","b","b","c","a","a","a","a","a","a","a","a", "a")
SOG <- c(4,4,0,0,0,0,0,0,0,0,9,1,5,3,0,0,4,0,0,1)
data1 <- data.frame(id, SOG, stringsAsFactors=FALSE)
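For comparison, the same rule can be sketched in base R: a new stop begins on a row with SOG == 0 that is the first row, follows a non-zero SOG, or follows a different id. The names prev_id, prev_sog and new_stop are illustrative, and the vectors are the extended example data shown above.

```r
id  <- c("a","b","a","b","a","a","a","a","b","b","c",
         "a","a","a","a","a","a","a","a","a")
SOG <- c(4,4,0,0,0,0,0,0,0,0,9,1,5,3,0,0,4,0,0,1)
n <- length(SOG)

# Previous row's values; the first row has no predecessor, hence NA
prev_id  <- c(NA, id[-n])
prev_sog <- c(NA, SOG[-n])

# TRUE on zero rows that open a new stop
new_stop <- SOG == 0 & (is.na(prev_sog) | prev_sog != 0 | id != prev_id)

# Running stop number on zero rows, 0 elsewhere
Stops <- cumsum(new_stop) * (SOG == 0)
Stops
```

Unlike a rule that looks only at id changes, this also starts a new stop when a run of zeros resumes with the same id after a non-zero SOG.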

Resources