I have a function that takes a nested list (or list of lists) of integers as the input and assigns values of NA randomly according to some probability, p1. I would like to extend this function to remove the NAs from the list.
I know removing NAs is a common question on the internet and have reviewed the questions on Stack Overflow and elsewhere, but none of the solutions work. In general, the questions posed do not refer to an actual list of lists.
I have tried:
#Example data
d<-list(1,3,c(0,NA,0),c(0,0))
e<-list(1,6,c(0,3,NA,0,NA,0),c(0,NA,0,1,0,0),1,NA,c(0,0))
f<-list(1,0)
L.miss<-list(d,e,f)
#Tests
test1<-lapply(L.miss,function(x) x[!is.na(x)]) #Doesn't work
test2<-lapply(L.miss,Filter,f=Negate(is.na)) #Doesn't work
test3<-lapply(L.miss,na.omit) #Doesn't work
Below is the function I am using to assign the NA values (also, don't laugh if it's clunky, I am likely nowhere near as experienced in coding as you!). I am also adding code that generates a sample list of lists of length three, with nested lists of various lengths, similar to my actual input data (though that has length 2000).
imperfect.passive<-function(z,p1){
  z.imp<-z
  obs.surv<-integer()
  for (i in 1:length(z)){
    for (j in 1:length(z[[i]])){
      for (k in 1:length(z[[i]][[j]])){
        for (l in 1:length(z[[i]][[j]][[k]])){
          obs.surv[l]<-rbinom(1,1,p1)
          if (obs.surv[l]==0){
            z.imp[[i]][[j]][[k]]<-NA
          }
        }
        #######################################
        ##'#TODO -> Remove NA values from list
        #######################################
      }
    }
  }
  return(z.imp)
}
#####for small example
a<-list(1,3,c(0,2,0),c(0,0))
b<-list(1,6,c(0,3,2,0,1,0),c(0,0,0,1,0,0),1,2,c(0,0))
c<-list(1,0)
L.full<-list(a,b,c)
#assign NA with p=0.5
example<-imperfect.passive(L.full,0.5)
Any advice would be appreciated, and I apologize if this is answered elsewhere - I could not find it.
Use rapply:
rapply(L.miss, na.omit, how = "replace")
As an alternative to na.omit in the rapply call, which attaches an extra "na.action" attribute to every vector it shortens, I found that the following code works better for my purposes:
rapply(L.miss,function(x) x[!is.na(x)], how="replace")
Of course, either would work, just another option!
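To make the difference between the two rapply calls concrete, here is a minimal sketch using a shortened version of the question's example data. The helper name drop_na is made up for illustration. One thing to be aware of: a leaf that consists only of NA is not deleted by either call, it becomes a zero-length vector, so the optional second pass at the end (which assumes exactly two levels of nesting, as in the question) drops those empty leaves.

```r
# Shortened version of the question's example data
d <- list(1, 3, c(0, NA, 0), c(0, 0))
f <- list(1, NA)
L.miss <- list(d, f)

# drop_na is a hypothetical helper name, not part of base R
drop_na <- function(x) x[!is.na(x)]

with_omit  <- rapply(L.miss, na.omit, how = "replace")
with_index <- rapply(L.miss, drop_na, how = "replace")

# na.omit records which positions it dropped in an "na.action" attribute
attributes(with_omit[[1]][[3]])
# plain logical indexing leaves a bare vector with no attributes
attributes(with_index[[1]][[3]])

# Optional second pass: drop leaves that became empty
# (assumes two levels of nesting, as in the example data)
no_empty <- lapply(with_index, function(sub) {
  Filter(function(x) length(x) > 0, sub)
})
```

If the empty leaves should stay as placeholders, simply skip the last step.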
I'm terrible at apply functions, and every answer I looked up on here is somehow hard for me to apply to this problem. I've tried as hard as I can not to post here.
I have a list of column names called "log_fields"
I want to go through each of these columns in my data frame "df" and replace the infinite values with 0.
This is the code I'm currently trying to use. There must be a syntax error in my function argument, because I'm being told that the argument "values" is missing.
sapply(df[log_fields], function(x) replace(is.infinite(x),0))
I'm incredibly grateful for the help!
lapply(df[log_fields], function(x) ifelse(is.infinite(x), 0, x)) as 李哲源 suggested.
lapply(df[log_fields], function(x) {x[is.infinite(x)] <- 0; x}) as dww suggested.
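As a small self-contained sketch of the second suggestion: the data frame below is made up for illustration, and the result of lapply has to be assigned back into df, otherwise the original columns stay unchanged. The error in the question comes from replace() expecting three arguments (x, list, values), so replace(x, is.infinite(x), 0) would also work.

```r
# Made-up example data; only the columns named in log_fields are touched
df <- data.frame(
  a = c(1, Inf, 3),
  b = c(-Inf, 2, NA),
  label = c("x", "y", "z")
)
log_fields <- c("a", "b")

# Replace +Inf and -Inf with 0 in the selected columns and store the result
df[log_fields] <- lapply(df[log_fields], function(x) {
  x[is.infinite(x)] <- 0
  x
})

# Equivalent per-column call using replace() with all three arguments
a_again <- replace(c(1, Inf, 3), is.infinite(c(1, Inf, 3)), 0)
```

Note that is.infinite() is FALSE for NA, so missing values are left as they are.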
I'm pretty new to R, I only know its fundamental concepts.
I have the v vector:
v<-c(1,2,3,4)
and I would like to append to v four NA values, obtaining:
c(1,2,3,4,NA,NA,NA,NA)
To solve this I can use a for loop:
for(i in 1:4){
  v<-append(v, NA)
}
My question is: are there clever and/or faster R solutions I could use?
The comments above contain some useful answers. I am collecting them here so that new readers can find them in the answer section rather than in the comments. Thanks to the commenters for their valuable suggestions.
v <- c(v, rep(NA, 4)) # joel.wilson
length(v)<-length(v)+4 # Nicola
'length<-'(v, 8) # akrun
Please note:
In general, joel.wilson's option is the most flexible one, because it can append any value (numeric, character, logical, etc.) any number of times, while the other two solutions can only append NA values, since they work by changing the length of the vector.
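A quick sketch comparing the three suggestions side by side; the variable names v1, v2, v3 and v_chr are only for illustration. The replacement-function form returns a new vector and leaves v itself untouched.

```r
v <- c(1, 2, 3, 4)

# 1. Concatenate with a repeated value
v1 <- c(v, rep(NA, 4))

# 2. Extend the length in place; new slots are filled with NA
v2 <- v
length(v2) <- length(v2) + 4

# 3. Call the replacement function directly; v is not modified
v3 <- `length<-`(v, 8)

# Only the first approach generalises to values other than NA
v_chr <- c(c("a", "b"), rep("z", 3))
```

All three NA-padded results are identical numeric vectors of length 8.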
This question is related to an earlier one, except the answer there (use c() function) is precisely what was not working for me.
First I create a list of vectors and then an additional vector.
a_list <- lapply(mtcars, as.integer)
junk <- c(1:length(a_list[[1]]))
Now, if I use c(a_list, junk) (as suggested in the answer to the earlier question), I get a completely different result from what I get with a_list[["junk"]] <- junk (the latter yielding the desired result). It seems that what gets added by the former approach is as.list(junk).
How can I add junk using c() without it being converted to the result of as.list(junk)?
Use list() in this way: c(a_list, junk=list(junk)).
In trying to work out exactly what was being added in the problematic scenario above (to better formulate my question), I realized that list and as.list do very different things. By turning junk into a single-element list (using list()), it gets added "as is" in the desired fashion.
Actually this was buried in a "show more comments" comment to Dirk Eddelbuettel's answer and (more embarrassingly) in the help for c() itself. Quoting the help:
## do *not* use
c(ll, d = 1:3) # which is == c(ll, as.list(c(d = 1:3)))
## but rather
c(ll, d = list(1:3)) # c() combining two lists
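The difference is easy to see on a small example. This sketch uses a made-up two-element list instead of mtcars; the names flattened and appended are only for illustration.

```r
# Small stand-in for the list built from mtcars
a_list <- list(x = 1:3, y = 4:6)
junk <- 7:9

# c() coerces the vector with as.list(), so each value becomes its own element
flattened <- c(a_list, junk)

# Wrapping junk in list() adds it as one element named "junk"
appended <- c(a_list, junk = list(junk))

length(flattened)
length(appended)
names(appended)
```

The second form gives the same result as a_list[["junk"]] <- junk.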
I need to do quality control on a dataset with more than 3000 variables (columns). However, I only want to apply some conditions to a couple of them. A first step would be to replace outliers by NA: I want to replace the observations that lie more than 3 standard deviations away from the mean by NA. I managed this by going column by column:
height = ifelse(abs(height-mean(height,na.rm=TRUE)) <
3*sd(height,na.rm=TRUE),height,NA)
And I also want to create other variables based on different columns. For example:
data$CGmark = ifelse(!is.na(data$mark) & !is.na(data$height) ,
paste(data$age, data$mark,sep=""),NA)
An example of my dataset would be:
name = factor(c("A","B","C","D","E","F","G","H","H"))
height = c(120,NA,150,170,NA,146,132,210,NA)
age = c(10,20,0,30,40,50,60,NA,130)
mark = c(100,0.5,100,50,90,100,NA,50,210)
data = data.frame(name=name,mark=mark,age=age,height=height)
data
I have tried this (for one condition):
d1=names(data)
list = c("age","height","mark")
ntraits=length(list)
nrows=dim(data)[1]
for(i in 1:ntraits){
  a=list[i]
  b=which(d1==a)
  d2=data[,b]
  for (j in 1:nrows){
    d2[j] = ifelse(abs(d2[j]-mean(d2,na.rm=TRUE)) < 3*sd(d2,na.rm=TRUE),d2[j],NA)
  }
}
Someone told me that I am not storing d2. How can I write for loops that apply the conditions I want? I know that there are similar questions, but I haven't understood them yet. Thanks in advance.
You pretty much wrote the answer in your first line. You're overthinking this one.
First, it's good practice to encapsulate this kind of operation in a function. Yes, a function call adds a tiny bit of overhead, but the code is often easier to read and debug. The same goes for assigning "helper" variables like mean_x: the cost of assigning the variable is very, very small and absolutely not worth worrying about.
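NA_outside_3s <- function(x) {
  mean_x <- mean(x, na.rm = TRUE) # na.rm is needed here too, or the mean is NA
  sd_x <- sd(x, na.rm = TRUE)
  x_outside_3s <- abs(x - mean_x) > 3 * sd_x # TRUE for values beyond 3 sd
  x[which(x_outside_3s)] <- NA # no need for ifelse here; which() skips existing NAs
  x
}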
Of course, you can choose any function name you want. More descriptive is better.
Then if you want to apply the function to every column, just loop over the columns. The function NA_outside_3s is already vectorized, i.e. it takes a numeric vector as an argument and returns a vector of the same length.
cols_to_loop_over <- 1:ncol(my_data) # or, some subset of (numeric) columns.
for (j in cols_to_loop_over) {
  my_data[, j] <- NA_outside_3s(my_data[, j])
}
I'm not sure why you wrote your code the way you did (and it took me a minute to even understand what you were trying to do), but looping over columns is usually straightforward.
In my comment I said not to worry about efficiency, but once you understand how the loop works, you should rewrite it using lapply:
my_data[cols_to_loop_over] <- lapply(my_data[cols_to_loop_over], NA_outside_3s)
Once you know how the apply family of functions works, they are very easy to read if written properly. And yes, they are somewhat faster than looping, but not as much as they used to be. It's more a matter of style and readability.
Also: do NOT name a variable list! This masks the function list, which is an R built-in function and a fairly important one at that. You also shouldn't generally name variables data because there is also a data function for loading built-in data sets.
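Putting the pieces together, here is a self-contained sketch of the approach applied to a chosen subset of columns. The function name na_outside_3sd, the data frame my_data and the vector traits are all made up for illustration. The data is deliberately larger than the nine-row example in the question: with the sample standard deviation, a value in a sample of n points can be at most (n - 1) / sqrt(n) standard deviations from the mean, so with fewer than 11 non-missing values nothing can ever exceed the 3 sd threshold.

```r
# Hypothetical helper: set values more than 3 sd from the mean to NA
na_outside_3sd <- function(x) {
  mean_x <- mean(x, na.rm = TRUE)
  sd_x <- sd(x, na.rm = TRUE)
  is_outlier <- abs(x - mean_x) > 3 * sd_x
  x[which(is_outlier)] <- NA  # which() ignores NA entries in is_outlier
  x
}

# Made-up data: 20 ordinary heights, one missing value, one extreme value
my_data <- data.frame(
  name = paste0("id", 1:22),
  height = c(rep(10, 20), NA, 1000),
  age = rep(c(20, 40), 11)
)

# Apply the rule only to the selected numeric columns, by name
traits <- c("height", "age")
my_data[traits] <- lapply(my_data[traits], na_outside_3sd)

# Rows whose height is now missing
which(is.na(my_data$height))
```

Selecting columns by name avoids the which(d1 == a) lookup from the question and leaves non-numeric columns such as name alone.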
Does anybody know whether it is possible to define a function that takes an arbitrary number of arguments as input? My concrete problem is that I would like to write my own rbind function that can rbind data.frames with similar rownames (and just assigns new, numerical rownames).
This approach here is obviously wrong, but I hope you get my problem/idea:
rbindDF <- function(x){
  N <- length(x)
  # Join x[1] and x[2]
  ...
  # Join x[n-1] and x[n]
}
I tried to find out how this is done in, e.g., rbind or sum, but I cannot remember how to view the source code of .Internal functions.
Calling rbindDF(list(...)) might be one compromise, but I would prefer to be able to call rbindDF(data1,data2,data3) when three data frames are present and rbindDF(data1,data2) when there are two.
Thanks a lot for any hint!
You can use the ellipsis argument (...), which collects any number of arguments. E.g.:
rbindDF <- function(...) {
  df_list <- list(...)
  do.call(rbind, df_list)
}
This will allow any number of data frames to be passed in:
rbindDF(df1, df2, df3)
I take it this question was just about the need for passing an unknown number of arguments rather than the contents of the function itself.
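If the new numerical row names mentioned in the question are also wanted, one possible variant is sketched below. The function name rbind_df and the example data frames are made up for illustration; resetting the row names with NULL is one way to get 1..n numbering, not necessarily what the answerer had in mind.

```r
# Hypothetical variant: bind any number of data frames and renumber the rows
rbind_df <- function(...) {
  df_list <- list(...)            # collect all arguments into a list
  out <- do.call(rbind, df_list)  # rbind(df1, df2, ...) in one call
  rownames(out) <- NULL           # reset to automatic row names 1..n
  out
}

# Two made-up data frames that share the same row names
df1 <- data.frame(x = 1:2, row.names = c("a", "b"))
df2 <- data.frame(x = 3:4, row.names = c("a", "b"))

res <- rbind_df(df1, df2)
rownames(res)
```

The same function accepts two, three or more data frames without any change.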