Assume I have the following list:
list(c(1:5,NA,NA),NA,c(NA,6:10))
[[1]]
[1] 1 2 3 4 5 NA NA
[[2]]
[1] NA
[[3]]
[1] NA 6 7 8 9 10
I want to replace all NAs with 0:
[[1]]
[1] 1 2 3 4 5 0 0
[[2]]
[1] 0
[[3]]
[1] 0 6 7 8 9 10
I was originally thinking is.na would be involved, but couldn't get it to affect all list elements. I learned from the related question (Remove NA from list of lists), that using lapply would allow me to apply is.na to each element, but that post demonstrates how to remove (not replace) NA values.
How do I replace NA values from multiple list elements?
I've tried for loops and ifelse approaches, but everything I've tried is either slow, doesn't work or just plain clunky. There's got to be a simple way to do this with an apply function...
And there is!
Here's a simple lapply approach using the replace function:
L1 <-list(c(1:5,NA,NA),NA,c(NA,6:10))
lapply(L1, function(x) replace(x,is.na(x),0))
With the desired result:
[[1]]
[1] 1 2 3 4 5 0 0
[[2]]
[1] 0
[[3]]
[1] 0 6 7 8 9 10
There are multiple ways to do this:
Using map from the purrr package:
library(purrr)
lt <- list(c(1:5,NA,NA),NA,c(NA,6:10))
lt %>%
map(~replace(., is.na(.), 0))
#output
[[1]]
[1] 1 2 3 4 5 0 0
[[2]]
[1] 0
[[3]]
[1] 0 6 7 8 9 10
kk<- list(c(1:5,NA,NA),NA,c(1,6:10))
lapply(kk, function(i)
{ p<- which(is.na(i)==TRUE)
i[p] <- 0
i
})
Edited upon Gregor's comment:
lapply(kk, function(i) {i[is.na(i)] <- 0; i})
I've decided to benchmark the various lapply approaches mentioned:
lapply(Lt, function(x) replace(x,is.na(x),0))
lapply(Lt, function(x) {x[is.na(x)] <- 0; x})
lapply(Lt, function(x) ifelse(is.na(x), 0, x))
Benchmarking code:
Lt <- lapply(1:10000, function(x) sample(c(1:10000,rep(NA,1000))) ) ##Sample list
elapsed.time <- data.frame(
m1 = mean(replicate(25,system.time(lapply(Lt, function(x) replace(x,is.na(x),0)))[3])),
m2 = mean(replicate(25,system.time(lapply(Lt, function(x) {x[is.na(x)] <- 0; x}))[3])),
m3 = mean(replicate(25,system.time(lapply(Lt, function(x) ifelse(is.na(x), 0, x)))[3]))
)
Results:
Function Average Elapsed Time
lapply(Lt, function(x) replace(x,is.na(x),0)) 0.8684
lapply(Lt, function(x) {x[is.na(x)] <- 0; x}) 0.8936
lapply(Lt, function(x) ifelse(is.na(x), 0, x)) 8.3176
The replace approach is marginally the fastest, followed closely by the [] approach; the gap between the two is small enough that it may be noise. The ifelse approach is roughly 10x slower.
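Before reading too much into the timings, it is worth confirming that the three variants return the same thing. A minimal sketch on a small list (the names small, r1, r2 and r3 are made up for illustration):

```r
# Small illustrative input; not the benchmark list
small <- list(c(1:5, NA, NA), NA, c(NA, 6:10))

r1 <- lapply(small, function(x) replace(x, is.na(x), 0))
r2 <- lapply(small, function(x) { x[is.na(x)] <- 0; x })
r3 <- lapply(small, function(x) ifelse(is.na(x), 0, x))

# Compare the results pairwise
identical(r1, r2)
identical(r1, r3)
```

If both comparisons hold, the choice between the three comes down to speed and readability.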
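This will deal with any list depth and structure, where a is the input list. Note that it substitutes text, so it also rewrites any "NA" that appears inside character strings, names, or typed constants such as NA_integer_: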
x <- eval(parse(text=gsub("NA","0",capture.output(dput(a)))))
# [[1]]
# [1] 1 2 3 4 5 0 0
#
# [[2]]
# [1] 0
#
# [[3]]
# [1] 0 6 7 8 9 10
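For nested lists, a less fragile option than text substitution may be base R's rapply, which visits every non-list leaf. A minimal sketch, assuming all leaves are atomic vectors (the names nested and filled are made up for illustration):

```r
# A list with an extra level of nesting
nested <- list(c(1:5, NA, NA), NA, list(c(NA, 6:10), list(NA)))

# how = "replace" keeps the list structure and swaps each leaf
# for the value returned by the function
filled <- rapply(nested, function(v) replace(v, is.na(v), 0), how = "replace")

str(filled)
```

Because only values are touched, element names and character data that happen to contain the letters "NA" are left alone.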
If you want to drop the NAs rather than replace them (which is not what the question asks for), try this:
lapply(enlist, function(x) { x[!is.na(x)]})
where:
enlist <- list(c(1:5,NA,NA),NA,c(NA,6:10))
This yields:
[[1]]
[1] 1 2 3 4 5
[[2]]
logical(0)
[[3]]
[1] 6 7 8 9 10
This will most likely expose that I am new to R, but in SPSS, running lags is very easy. Obviously this is user error, but what am I missing?
x <- sample(c(1:9), 10, replace = T)
y <- lag(x, 1)
ds <- cbind(x, y)
ds
Results in:
x y
[1,] 4 4
[2,] 6 6
[3,] 3 3
[4,] 4 4
[5,] 3 3
[6,] 5 5
[7,] 8 8
[8,] 9 9
[9,] 3 3
[10,] 7 7
I figured I would see:
x y
[1,] 4
[2,] 6 4
[3,] 3 6
[4,] 4 3
[5,] 3 4
[6,] 5 3
[7,] 8 5
[8,] 9 8
[9,] 3 9
[10,] 7 3
Any guidance will be much appreciated.
I had the same problem, but I didn't want to use zoo or xts, so I wrote a simple lag function for vectors (such as data frame columns):
lagpad <- function(x, k) {
if (k>0) {
return (c(rep(NA, k), x)[1 : length(x)] );
}
else {
return (c(x[(-k+1) : length(x)], rep(NA, -k)));
}
}
This can lag forward or backwards:
x<-1:3;
(cbind(x, lagpad(x, 1), lagpad(x,-1)))
x
[1,] 1 NA 2
[2,] 2 1 3
[3,] 3 2 NA
Another way to deal with this is using the zoo package, which has a lag method that will pad the result with NA:
require(zoo)
> set.seed(123)
> x <- zoo(sample(c(1:9), 10, replace = T))
> y <- lag(x, -1, na.pad = TRUE)
> cbind(x, y)
x y
1 3 NA
2 8 3
3 4 8
4 8 4
5 9 8
6 1 9
7 5 1
8 9 5
9 5 9
10 5 5
The result is a multivariate zoo object (which is an enhanced matrix), but easily converted to a data.frame via
> data.frame(cbind(x, y))
lag does not shift the data, it only shifts the "time-base". x has no "time base", so cbind does not work as you expected. Try cbind(as.ts(x),lag(x)) and notice that a "lag" of 1 shifts the periods forward.
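The point about the time base can be checked directly by inspecting tsp, the start/end/frequency attribute of a time series. A minimal sketch with made-up values (the name shifted is hypothetical):

```r
x <- ts(c(4, 6, 3, 4, 3))        # a plain series starting at time 1
shifted <- stats::lag(x, 1)      # base R lag, called explicitly

tsp(x)        # start, end, frequency of the original series
tsp(shifted)  # start and end moved one period earlier

# The underlying values are untouched
identical(as.numeric(x), as.numeric(shifted))

# cbind on ts objects aligns by time, so the shift becomes visible
cbind(x, shifted)
```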
I would suggest using zoo / xts for time series. The zoo vignettes are particularly helpful.
Using just standard R functions this can be achieved in a much simpler way:
x <- sample(c(1:9), 10, replace = T)
y <- c(NA, head(x, -1))
ds <- cbind(x, y)
ds
lag() works with time series, whereas you are trying to use bare matrices. This old question suggests using embed instead, like so:
lagmatrix <- function(x,max.lag) embed(c(rep(NA,max.lag), x), max.lag+1)
for instance
> x
[1] 8 2 3 9 8 5 6 8 5 8
> lagmatrix(x, 1)
[,1] [,2]
[1,] 8 NA
[2,] 2 8
[3,] 3 2
[4,] 9 3
[5,] 8 9
[6,] 5 8
[7,] 6 5
[8,] 8 6
[9,] 5 8
[10,] 8 5
The easiest way to me now appears to be the following:
require(dplyr)
df <- data.frame(x = sample(c(1:9), 10, replace = T))
df <- df %>% mutate(y = lag(x))
tmp<-rnorm(10)
tmp2<-c(NA,tmp[-length(tmp)]) # drop the last element, pad the front with NA
tmp
tmp2
This should accommodate vectors or matrices as well as negative lags:
lagpad <- function(x, k=1) {
i<-is.vector(x)
if(is.vector(x)) x<-matrix(x) else x<-matrix(x,nrow(x))
if(k>0) {
x <- rbind(matrix(rep(NA, k*ncol(x)),ncol=ncol(x)), matrix(x[1:(nrow(x)-k),], ncol=ncol(x)))
}
else {
x <- rbind(matrix(x[(-k+1):(nrow(x)),], ncol=ncol(x)),matrix(rep(NA, -k*ncol(x)),ncol=ncol(x)))
}
if(i) x[1:length(x)] else x
}
Using data.table:
> x <- sample(c(1:9), 10, replace = T)
> y <- data.table::shift(x)
> ds <- cbind(x, y)
> ds
x y
[1,] 5 NA
[2,] 4 5
[3,] 3 4
[4,] 3 3
[5,] 4 3
[6,] 8 4
[7,] 1 8
[8,] 7 1
[9,] 9 7
[10,] 7 9
A simple way to do the same may be to copy the data to a new data frame and shift its row names by one. Make sure the original table is indexed sequentially with no gaps, e.g.
tempData <- originalData
rownames(tempData) <- 2:(nrow(tempData)+1)
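If you want it in the same data frame as the original, join the two by row name, for example with merge and by = "row.names". A plain cbind binds by position and ignores the shifted row names.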
Two options, in base R and with data.table:
baseShiftBy1 <- function(x) c(NA, x[-length(x)])
baseShiftBy1(x)
[1] NA 3 8 4 8 9 1 5 9 5
data.table::shift(x)
[1] NA 3 8 4 8 9 1 5 9 5
Data:
set.seed(123)
(x <- sample(c(1:9), 10, replace = T))
[1] 3 8 4 8 9 1 5 9 5 5
I went with a similar solution to Andrew's (dedicated function instead of xts or zoo), but with a terser formulation that I find easier to reason about:
lagpad <- function(x, k) {
if (k == 0) { return(x) }
k.pos <- max(0, k)
k.neg <- max(0, -k)
c(rep(NA, k.pos), head(x, -k.pos), # empty if k<0, else lagging x
tail(x, -k.neg), rep(NA, k.neg)) # empty if k>0, else leading x
}
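A quick check of the terser lagpad in both directions. The function is repeated here only so the snippet runs on its own; the inputs and the names lagged and led are made up for illustration:

```r
lagpad <- function(x, k) {
  if (k == 0) { return(x) }
  k.pos <- max(0, k)
  k.neg <- max(0, -k)
  c(rep(NA, k.pos), head(x, -k.pos),   # empty if k<0, else lagging x
    tail(x, -k.neg), rep(NA, k.neg))   # empty if k>0, else leading x
}

lagged <- lagpad(1:5, 2)    # NA NA 1 2 3
led    <- lagpad(1:5, -2)   # 3 4 5 NA NA
cbind(x = 1:5, lagged, led)
```

The trick relies on head(x, 0) and tail(x, 0) both returning an empty vector, so the unused half of the expression contributes nothing.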
Just get rid of lag. Change your line for y to:
y <- c(NA, x[-length(x)])
What is the most elegant way to split a vector into n-Elements based on a condition?
Every separate true-block should go into its own list element. All the false elements get thrown away.
example1:
vec <- c(1:3,NA,NA,NA,4:6,NA,NA,NA,7:9,NA)
cond <- !is.na(vec)
result = list(1:3,4:6,7:9)
example2:
vec_2 <- c(3:1,11:13,6:4,14:16,9:7,20)
cond_2 <- vec_2 < 10
results_2 = list(3:1,6:4,9:7)
It would be great to have a general solution for a vector vec and a relating condition cond.
My best try:
res <- split(vec,data.table::rleidv(cond))
odd <- as.logical(seq_along(res)%%2)
res[if(cond[1])odd else !odd]
I guess this should work generally:
> split(vec[cond], data.table::rleid(cond)[cond])
$`1`
[1] 1 2 3
$`3`
[1] 4 5 6
$`5`
[1] 7 8 9
Let's make it a function:
> f <- function(vec, cond) split(vec[cond], data.table::rleid(cond)[cond])
> f(vec_2, cond_2)
$`1`
[1] 3 2 1
$`3`
[1] 6 5 4
$`5`
[1] 9 8 7
Here is a base R option with rle
grp <- with(rle(cond), rep(seq_along(values) * NA^ !values, lengths))
split(vec[cond], grp[cond])
#$`1`
#[1] 1 2 3
#$`3`
#[1] 4 5 6
#$`5`
#[1] 7 8 9
Similarly with 'vec_2'
grp <- with(rle(cond_2), rep(seq_along(values) * NA^ !values, lengths))
split(vec_2[cond_2], grp[cond_2])
#$`1`
#[1] 3 2 1
#$`3`
#[1] 6 5 4
#$`5`
#[1] 9 8 7
Or create a grouping variable with cumsum and diff
grp <- cumsum(c(TRUE, diff(cond) < 0)) * NA^ !cond
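For completeness, the cumsum and diff idea can be wrapped into a small function and run on both examples. This is a sketch only; the function name split_runs is made up, and it masks the group ids with the condition itself so that it is not tied to NA checks:

```r
split_runs <- function(vec, cond) {
  # A new group id starts wherever cond drops from TRUE to FALSE
  grp <- cumsum(c(TRUE, diff(cond) < 0))
  # Blank out the ids of rejected elements; split() drops NA groups
  grp[!cond] <- NA
  split(vec, grp)
}

vec   <- c(1:3, NA, NA, NA, 4:6, NA, NA, NA, 7:9, NA)
vec_2 <- c(3:1, 11:13, 6:4, 14:16, 9:7, 20)

res   <- split_runs(vec, !is.na(vec))
res_2 <- split_runs(vec_2, vec_2 < 10)
```

Unlike the rleid variants, the list names here come out as consecutive integers, since the counter only advances at the end of each TRUE block.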
Suppose we have a value y = 4 and a list of vectors. I want to check whether this value belongs to each vector in the list and, if it does, add the value to every element of that vector. Vectors that do not contain y should be dropped.
y<-4
M<- list( c(1,3,4,6) , c(2,3,5), c(1,3,6) ,c(1,4,5,6))
> M
[[1]]
[1] 1 3 4 6
[[2]]
[1] 2 3 5
[[3]]
[1] 1 3 6
[[4]]
[1] 1 4 5 6
The expected outcome is:
> R
[[1]]
[1] 5 7 8 10
[[2]]
[1] 5 8 9 10
We can use keep which only keeps elements that satisfy a predicate. In this case, it is only keeping the vectors that contain y.
We then add y to each of the vectors.
library('tidyverse')
keep(M, ~y %in% .) %>%
map(~. + y)
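The same keep-then-transform idea can be written in base R with Filter, which plays the role of keep here. A minimal sketch using the data from the question (the name kept is made up for illustration):

```r
y <- 4
M <- list(c(1, 3, 4, 6), c(2, 3, 5), c(1, 3, 6), c(1, 4, 5, 6))

# Keep only the vectors that contain y
kept <- Filter(function(v) y %in% v, M)

# Add y to every element of each remaining vector
R <- lapply(kept, function(v) v + y)
R
```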
Here is a simple hacky way to do this:
lapply(M[sapply(M, function(x){y %in% x})],function(x){x+y})
returning:
[[1]]
[1] 5 7 8 10
[[2]]
[1] 5 8 9 10
Logic: use sapply to work out which parts of M have a 4 in, then add 4 to those with lapply
You can do this with...
lapply(M[sapply(M, `%in%`, x=y)], `+`, y)
[[1]]
[1] 5 7 8 10
[[2]]
[1] 5 8 9 10
Here is a method with lapply and set functions.
# loop through M, check length of intersect
myList <- lapply(M, function(x) if(length(intersect(y, x)) > 0) x + y else NULL)
# now subset, dropping the NULL elements
myList <- myList[lengths(myList) > 0]
this returns
myList
[[1]]
[1] 5 7 8 10
[[2]]
[1] 5 8 9 10
Wow! Everyone has given great answers; I am just adding an approach that uses Map.
Map("+",M[unlist(Map("%in%", y,M))],y)
[[1]]
[1] 5 7 8 10
[[2]]
[1] 5 8 9 10
I have the following list in R and I want to replace all NULL in the list with zero. Is there a better way of doing this rather than iterating through the list?
$`2014-06-15`
NULL
$`2014-06-16`
[1] 7
$`2014-06-17`
[1] 17
$`2014-06-18`
[1] 24
$`2014-06-19`
[1] 8
$`2014-06-20`
[1] 11
$`2014-06-21`
NULL
$`2014-06-22`
[1] 1
$`2014-06-23`
[1] 20
$`2014-06-24`
[1] 21
In reference to your solution, this way is easier and faster than replacing with a for loop and if statement. Here's a short example.
> ( temp <- list(A = NULL, B = 1:5) )
# $A
# NULL
#
# $B
# [1] 1 2 3 4 5
> temp[sapply(temp, is.null)] <- 0
> temp
# $A
# [1] 0
#
# $B
# [1] 1 2 3 4 5
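Applied to a date-keyed list like the one in the question, the same idea might look as follows. This is a sketch with a shortened, made-up subset of the dates; vapply is used instead of sapply only to guarantee a logical index, and the names is_missing and counts are hypothetical:

```r
temp <- list(`2014-06-15` = NULL, `2014-06-16` = 7, `2014-06-17` = 17)

# TRUE for every element that is NULL
is_missing <- vapply(temp, is.null, logical(1))

# Replace all NULL entries in one assignment
temp[is_missing] <- 0

# With no NULLs left, the list flattens to a named vector of full length
counts <- unlist(temp)
counts
```

Filling the NULLs first matters if the list is later flattened, because unlist silently drops NULL elements.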
Never mind, solved it.
Here temp is my list of dates above:
allDates <- names(temp)
for (i in allDates) {
if (is.null(temp[[i]]))
temp[[i]] <- 0
}