Let's say I have two vectors, one that includes NA values, and another that is the length of the first vector after dropping the NA values. I am looking to insert the NA values from the first vector into the second vector, while keeping the position of the NA values the same.
a<-c(1,2,3,6,5,NA,4,5,NA,45,6,NA)
b<-c(1,2,4,3,6,5,7,8,40)
This can be done by concatenating each component, but this seems extremely tedious, especially since my data are much more complicated than the above example. Something like
b[which(is.na(a))]<-NA
is what I am looking for, but this of course replaces elements instead of inserting elements like I want. I am at a loss for this even though it seems relatively simple.
Create an NA vector of the same length as 'a', then fill the positions where 'a' is not NA with the values of 'b':
b <- replace(rep(NA, length(a)), !is.na(a), b)
Output:
b
#[1] 1 2 4 3 6 NA 5 7 NA 8 40 NA
Or, more compactly, call replace on 'a' directly:
replace(a, !is.na(a), b)
[1] 1 2 4 3 6 NA 5 7 NA 8 40 NA
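For reuse, the same idea can be wrapped in a small function that also checks that the lengths agree. This is only a sketch: the name `insert_na_like` is made up here and does not come from any package.

```r
# Hypothetical helper: put the values of `vals` into the non-NA slots of
# `template`, leaving the NA positions of `template` where they are.
insert_na_like <- function(template, vals) {
  keep <- !is.na(template)
  # Without this check a length mismatch would be recycled or only warn
  stopifnot(length(vals) == sum(keep))
  replace(template, keep, vals)
}

a <- c(1, 2, 3, 6, 5, NA, 4, 5, NA, 45, 6, NA)
b <- c(1, 2, 4, 3, 6, 5, 7, 8, 40)
insert_na_like(a, b)
# [1]  1  2  4  3  6 NA  5  7 NA  8 40 NA
```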
I want to turn the entire content of a numeric (incl. NA's) data frame into one column. What would be the smartest way of achieving the following?
>df <- data.frame(C1=c(1,NA,3),C2=c(4,5,NA),C3=c(NA,8,9))
>df
  C1 C2 C3
1  1  4 NA
2 NA  5  8
3  3 NA  9
>x <- mysterious_operation(df)
>x
[1] 1 NA 3 4 5 NA NA 8 9
I want to calculate the mean of this vector, so ideally I'd want to remove the NA's within the mysterious_operation - the data frame I'm working on is very large so it will probably be a good idea.
Here are a couple of ways with purrr:
# using invoke, a wrapper around do.call
purrr::invoke(c, df, use.names = FALSE)
# similar to unlist, reduce list of lists to a single vector
purrr::flatten_dbl(df)
Both return:
[1] 1 NA 3 4 5 NA NA 8 9
The mysterious operation you are looking for is called unlist:
> df <- data.frame(C1=c(1,NA,3),C2=c(4,5,NA),C3=c(NA,8,9))
> unlist(df, use.names = FALSE)
[1] 1 NA 3 4 5 NA NA 8 9
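Since the end goal in the question is a mean, the NAs can be handled right after flattening. A minimal sketch, using the question's own data frame (the variable names `x` and `x_clean` are just for illustration):

```r
df <- data.frame(C1 = c(1, NA, 3), C2 = c(4, 5, NA), C3 = c(NA, 8, 9))
x <- unlist(df, use.names = FALSE)

# Either drop the NAs explicitly ...
x_clean <- x[!is.na(x)]

# ... or let mean() skip them
mean(x, na.rm = TRUE)
# [1] 5
```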
We can use unlist and create a single column data.frame
df1 <- data.frame(col =unlist(df))
Just for fun. Of course unlist is the most appropriate function.
alternative
stack(df)[,1]
alternative
do.call(c,df)
do.call(c, c(df, use.names = FALSE)) # unnamed version
Maybe they are more mysterious.
I have a translation table with 67 key columns, and I receive an input row with the same 67 columns.
My goal is to check whether that input row appears in the translation table.
To be clear, the 67 columns together form a key, and an additional 10 columns hold the actual values for that key.
How can I find the matching row quickly when some of the input columns (variables) may be NA?
small example:
input:
a b c d e
1 9 "r" NA NA
translation table:
a b c d e
5 NA NA NA 9
6 9 "o" 4 3
1 9 "r" NA NA
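We can use paste to build one string per row in both datasets, then use %in% to get a logical vector showing which of those strings also occur in the other dataset. Wrapping that in which gives the positions of the rows where this is TRUE. Here 'df2' is taken to be the translation table and 'df1' the input, so the positions returned are rows of the translation table. NA values are pasted as the text "NA", so an NA in the input matches an NA in the table.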
which(do.call(paste, df2) %in% do.call(paste, df1))
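As a rough illustration on the small example from the question, the data frames below are typed in by hand, with `df1` assumed to be the input and `df2` the translation table. The explicit `sep = "\t"` is an addition, on the assumption that tabs never occur in the data; it guards against two different rows pasting to the same string.

```r
# Input row (df1) and translation table (df2), as in the small example
df1 <- data.frame(a = 1, b = 9, c = "r", d = NA, e = NA)
df2 <- data.frame(a = c(5, 6, 1), b = c(NA, 9, 9), c = c(NA, "o", "r"),
                  d = c(NA, 4, NA), e = c(9, 3, NA))

# One string per row; NA is pasted as the text "NA", so NA matches NA
key_input <- do.call(paste, c(df1, sep = "\t"))
key_table <- do.call(paste, c(df2, sep = "\t"))

hit <- which(key_table %in% key_input)
hit
# [1] 3
```

One caveat of this approach is that a literal string "NA" in a key column would be indistinguishable from a missing value.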
I wish to know (by name) which columns in my data frame satisfy a particular condition. For example, if I was looking for the names of any columns that contained more than 3 NA, how could I proceed?
>frame
m n o p
1 0 NA NA NA
2 0 2 2 2
3 0 NA NA NA
4 0 NA NA 1
5 0 NA NA NA
6 0 1 2 3
> for (i in frame){
na <- is.na(i)
as.numeric(na)
total<-sum(na)
if(total>3){
print (i) }}
[1] NA 2 NA NA NA 1
[1] NA 2 NA NA NA 2
So this actually succeeds in evaluating which columns satisfy the condition, however, it does not display the column name. Perhaps subsetting the columns which interest me would be another way to do it, but I'm not sure how to solve it that way either. Plus I'd prefer to know if there's a way to just get the names directly.
I'll appreciate any input.
We can use colSums on a logical matrix (is.na(frame)), check whether it is greater than 3 to get a logical vector and then subset the names of 'frame' based on that.
names(frame)[colSums(is.na(frame))>3]
#[1] "n" "o"
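To see the intermediate step, here is a small reproducible sketch with the question's data typed in by hand. The second variant uses a per-column predicate with `Filter`, which may be handier when the condition is not a simple column sum; the names `na_counts`, `hits` and `hits2` are only illustrative.

```r
frame <- data.frame(m = rep(0, 6),
                    n = c(NA, 2, NA, NA, NA, 1),
                    o = c(NA, 2, NA, NA, NA, 2),
                    p = c(NA, 2, NA, 1, NA, 3))

# Number of NAs in each column
na_counts <- colSums(is.na(frame))
na_counts
# m n o p
# 0 4 4 3

# Names of the columns with more than 3 NAs
hits <- names(frame)[na_counts > 3]

# Same result with an arbitrary per-column condition
hits2 <- names(Filter(function(col) sum(is.na(col)) > 3, frame))
```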
If we are using dplyr, one way is
library(dplyr)
# select(where(...)) needs dplyr >= 1.0.0; it replaces the deprecated summarise_each()/funs()
frame %>%
select(where(~ sum(is.na(.x)) > 3)) %>%
names()
#[1] "n" "o"
Looking for a quick-and-easy solution to a problem which I have only been able to solve inelegantly, by looping. I have an ID vector which looks something like this:
id<-c(NA,NA,1,1,1,NA,1,NA,2,2,2,NA,3,NA,3,3,3)
The NA's that fall in-between a sequence of a single number (id[6], id[14]) need to be replaced by that number. However, the NA's that don't meet this condition (those between sequences of two different numbers) need to be left alone (i.e., id[1],id[2],id[8],id[12]). The target vector is therefore:
id.target<-c(NA,NA,1,1,1,1,1,NA,2,2,2,NA,3,3,3,3,3)
This is not difficult to do by looping through each value, but I am looking to do this to many very long vectors, and was hoping for a neater solution. Thanks for any suggestions.
This seems to work. The idea is to use zoo::na.locf to fill every NA with the last seen value, and then put NA back wherever the original NA sat between two different numbers:
id.target <- zoo::na.locf(id, na.rm = FALSE)
id.target[(c(diff(id.target), 1L) > 0L) & is.na(id)] <- NA
id.target
## [1] NA NA 1 1 1 1 1 NA 2 2 2 NA 3 3 3 3 3
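The `> 0L` comparison above relies on the ids increasing along the vector. A variation that does not need that assumption is to fill from both directions and keep a filled value only where the two fills agree. The sketch below does this in base R; `locf` and `fill_between` are hypothetical helper names, and `locf` is a simple stand-in for zoo::na.locf(x, na.rm = FALSE).

```r
# Base R last-observation-carried-forward (stands in for zoo::na.locf)
locf <- function(x) {
  # Index of the most recent non-NA element, 0 if none seen yet
  last_seen <- cummax(seq_along(x) * (!is.na(x)))
  c(NA, x)[last_seen + 1]
}

# Hypothetical helper: fill an NA only when the nearest non-NA values on
# both sides are the same
fill_between <- function(x) {
  fwd <- locf(x)            # nearest non-NA value to the left
  bwd <- rev(locf(rev(x)))  # nearest non-NA value to the right
  out <- fwd
  out[is.na(fwd) | is.na(bwd) | fwd != bwd] <- NA
  out
}

id <- c(NA, NA, 1, 1, 1, NA, 1, NA, 2, 2, 2, NA, 3, NA, 3, 3, 3)
fill_between(id)
# [1] NA NA  1  1  1  1  1 NA  2  2  2 NA  3  3  3  3  3
```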
Here is a base R option
d1 <- do.call(rbind,lapply(split(seq_along(id), id), function(x) {
i1 <- min(x):max(x)
data.frame(val= unique(id[x]), i1)}))
id[seq_along(id) %in% d1$i1 ] <- d1$val
id
#[1] NA NA 1 1 1 1 1 NA 2 2 2 NA 3 3 3 3 3
I have a vector of values that includes NAs. The values need to be processed by an external program that can't handle NAs, so the NAs are stripped out, the vector is written to a file, processed, then read back in, resulting in a vector whose length is the number of non-NAs. For example, if the input is 7 3 4 NA 5 4 6 NA 1 NA, the output would have just 7 values. What I need to do is re-insert the NAs in their original positions.
So, given two vectors X and Y:
> X
[1] 64 1 9 100 16 NA 25 NA 4 49 36 NA 81
> Y
[1] 8 1 3 10 4 5 2 7 6 9
produce:
8 1 3 10 4 NA 5 NA 2 7 6 NA 9
(you may notice that X is Y^2; that's just for an example).
I could knock out a function to do this but I wonder if there's any nice tricksy ways of doing it... split, list, length... hmmm...
na.omit keeps an attribute of the locations of the NA in the original series, so you could use that to know where to put the missing values:
Y <- sqrt(na.omit(X))
Z <- rep(NA,length(Y)+length(attr(Y,"na.action")))
Z[-attr(Y,"na.action")] <- Y
#> Z
# [1] 8 1 3 10 4 NA 5 NA 2 7 6 NA 9
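To make the attribute less mysterious, this short sketch just prints what na.omit records for the X from the question (the names `kept` and `na_pos` are illustrative). As far as I know, the attribute is absent when X contains no NAs, so that case may need separate handling.

```r
X <- c(64, 1, 9, 100, 16, NA, 25, NA, 4, 49, 36, NA, 81)

kept <- na.omit(X)
na_pos <- attr(kept, "na.action")  # positions of the dropped NAs, class "omit"

as.integer(na_pos)
# [1]  6  8 12

length(kept) + length(na_pos) == length(X)
# [1] TRUE
```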
Answering my own question is probably very bad form, but I think this is probably about the neatest:
rena <- function(X, Z) {
  Y <- rep(NA, length(X))
  Y[!is.na(X)] <- Z
  Y
}
Can also try replace:
replace(X, !is.na(X), Y)
Another variant on the same theme
rena <- function(X,Z){
X[which(!is.na(X))]=Z
X
}
Only the non-NA positions of X are overwritten with Z, so the original NA positions are left as NA.
Edit: Corrected by Marek.