Looking for a quick-and-easy solution to a problem which I have only been able to solve inelegantly, by looping. I have an ID vector which looks something like this:
id<-c(NA,NA,1,1,1,NA,1,NA,2,2,2,NA,3,NA,3,3,3)
The NAs that fall inside a run of a single number (id[6], id[14]) need to be replaced by that number. However, the NAs that don't meet this condition (those between runs of two different numbers, or before the first number) need to be left alone (i.e., id[1], id[2], id[8], id[12]). The target vector is therefore:
id.target<-c(NA,NA,1,1,1,1,1,NA,2,2,2,NA,3,3,3,3,3)
This is not difficult to do by looping through each value, but I am looking to do this to many very long vectors, and was hoping for a neater solution. Thanks for any suggestions.
This seems to work. The idea is to use zoo::na.locf to fill the NAs forward and then put NAs back where they sit between two different numbers
id.target <- zoo::na.locf(id, na.rm = FALSE)
id.target[(c(diff(id.target), 1L) > 0L) & is.na(id)] <- NA
id.target
## [1] NA NA 1 1 1 1 1 NA 2 2 2 NA 3 3 3 3 3
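The `diff(...) > 0` test above assumes the ids increase along the vector. A minimal base R sketch that avoids that assumption (and the zoo dependency) compares, for each NA, the last value before it with the first value after it, and fills only when the two agree. The helper name `locf` is made up for this illustration.

```r
id <- c(NA, NA, 1, 1, 1, NA, 1, NA, 2, 2, 2, NA, 3, NA, 3, 3, 3)

# hypothetical helper: last observation carried forward, in base R
locf <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))  # position of the last non-NA so far (0 if none yet)
  idx[idx == 0] <- NA                      # nothing to carry before the first value
  x[idx]
}

before <- locf(id)            # last available value at or before each position
after  <- rev(locf(rev(id)))  # first available value at or after each position

# fill an NA only when values exist on both sides and are equal
fill <- is.na(id) & !is.na(before) & !is.na(after) & before == after
id.target <- id
id.target[fill] <- before[fill]
id.target
## [1] NA NA 1 1 1 1 1 NA 2 2 2 NA 3 3 3 3 3
```

Because the test is equality rather than `> 0`, ids that repeat or decrease further along the vector are treated the same way.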
Here is a base R option
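d1 <- do.call(rbind, lapply(split(seq_along(id), id), function(x) {
  # positions from the first to the last occurrence of this id value
  i1 <- min(x):max(x)
  data.frame(val = unique(id[x]), i1)
}))
# overwrite every such span (including the NAs inside it) with its id value
id[seq_along(id) %in% d1$i1] <- d1$val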
id
#[1] NA NA 1 1 1 1 1 NA 2 2 2 NA 3 3 3 3 3
Let's say I have two vectors, one that includes NA values, and another that is the length of the first vector after dropping the NA values. I am looking to insert the NA values from the first vector into the second vector, while keeping the position of the NA values the same.
a<-c(1,2,3,6,5,NA,4,5,NA,45,6,NA)
b<-c(1,2,4,3,6,5,7,8,40)
This can be done by concatenating each component, but this seems extremely tedious, especially since my data are much more complicated than the above example. Something like
b[which(is.na(a))]<-NA
is what I am looking for, but this of course replaces elements instead of inserting them. I am at a loss here, even though it seems relatively simple.
Create an NA vector of the same length as 'a' and then fill the positions of the non-NA elements of 'a' with 'b'
b <- replace(rep(NA, length(a)), !is.na(a), b)
Output:
b
#[1] 1 2 4 3 6 NA 5 7 NA 8 40 NA
Or more compactly, do the replace on 'a'
replace(a, !is.na(a), b)
#[1] 1 2 4 3 6 NA 5 7 NA 8 40 NA
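If this has to be done on many vectors, the one-liner can be wrapped in a small function. The name `insert_na` is hypothetical, and the length check is an addition of this sketch: without it, a 'b' of the wrong length would be recycled (silently or with only a warning) instead of raising an error.

```r
# hypothetical helper: put the values of b into the non-NA slots of a
insert_na <- function(a, b) {
  # b must supply exactly one value per non-NA element of a
  stopifnot(length(b) == sum(!is.na(a)))
  replace(a, !is.na(a), b)
}

a <- c(1, 2, 3, 6, 5, NA, 4, 5, NA, 45, 6, NA)
b <- c(1, 2, 4, 3, 6, 5, 7, 8, 40)
insert_na(a, b)
#[1] 1 2 4 3 6 NA 5 7 NA 8 40 NA
```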
Here is a sample of a much bigger data set, in the following format:
Station <-c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A")
Parameter <-c(2,3,NA,4,4,9,NA,NA,10,15,NA,NA,NA,18,20)
Par_Count <-c(1,1,1,2,2,1,2,2,1,1,3,3,3,1,1)
df<-data.frame(Station, Parameter, Par_Count)
df
Station Parameter Par_Count
A 2 1
A 3 1
A NA 1
A 4 2
A 4 2
A 9 1
A NA 2
A NA 2
A 10 1
A 15 1
A NA 3
A NA 3
A NA 3
A 18 1
A 20 1
I want to fill runs of at most 2 consecutive NAs with the average of the previous and next available values in that column. In the original data set some runs contain hundreds of NAs, so I want to leave runs of 3 or more consecutive NAs untouched. Par_Count gives the number of consecutive occurrences of that particular value in Parameter.
I tried with:
library(zoo)
df1 <- within(df, na.approx(df$Parameter, maxgap = 2))
and, for single occurrences only, with:
df1 <- within(df, Parameter[Parameter == is.na(df$Parameter) & Par_Count == 1] <-
lead(Parameter) - lag(Parameter))
but neither attempt worked: no NA value was changed.
The desired output is like:
Station Parameter Par_Count
A 2 1
A 3 1
A 3.5 1
A 4 2
A 4 2
A 9 1
A 9.5 2
A 9.75 2 <-- 9.5 would also work here
A 10 1
A 15 1
A NA 3
A NA 3
A NA 3
A 18 1
A 20 1
You are nearly there. I think you have misinterpreted the use of within: to use within, you need to assign the output of na.approx to a column of the data frame. The following will work:
library(zoo)
df1 <- within(df, Parameter <- na.approx(Parameter, maxgap = 2, na.rm = FALSE))
Note that it is advisable to use na.rm = FALSE; otherwise leading or trailing NAs are removed, and the shortened result no longer fits the column, which leads to an error.
Personally, I think the following is more readable, though it is a matter of style.
library(zoo)
df1 <- df
df1$Parameter <- na.approx(df$Parameter, maxgap = 2, na.rm = FALSE)
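Note that na.approx interpolates linearly, so the two-NA gap between 9 and 10 becomes 9.333 and 9.667 rather than the 9.5/9.75 shown in the question. If the plain average of the previous and next available values is wanted instead (the "9.5 would also work" option), a base R sketch could look like this. Here `maxgap` is just a local variable mirroring na.approx's argument, and `locf` is a made-up helper.

```r
Parameter <- c(2, 3, NA, 4, 4, 9, NA, NA, 10, 15, NA, NA, NA, 18, 20)
maxgap <- 2  # longest run of NAs to fill

# hypothetical helper: last observation carried forward, in base R
locf <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))  # position of the last non-NA so far (0 if none)
  idx[idx == 0] <- NA
  x[idx]
}

prev_val <- locf(Parameter)            # previous available value
next_val <- rev(locf(rev(Parameter)))  # next available value

# length of the NA / non-NA run each element belongs to
runs <- rle(is.na(Parameter))
run_len <- rep(runs$lengths, runs$lengths)

# only NAs in short runs get the average of their two neighbours
fill <- is.na(Parameter) & run_len <= maxgap
filled <- Parameter
filled[fill] <- (prev_val[fill] + next_val[fill]) / 2
filled
# values: 2 3 3.5 4 4 9 9.5 9.5 10 15 NA NA NA 18 20
```

To use it on the data frame, compute `filled` from df$Parameter and assign it to df1$Parameter. A leading or trailing run stays NA, because one of its neighbours is missing.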
To count how many times one row is equal to a value
I have a df here:
df <- data.frame('v1'=c(1,2,3,4,5),
'v2'=c(1,2,1,1,2),
'v3'=c(NA,2,1,4,'1'),
'v4'=c(1,2,3,NaN,5),
'logical'=c(1,2,3,4,5))
For each row, I would like to count how many of v1 to v4 are equal to the value of the variable 'logical', and store the result in a new variable 'count'.
I wrote a for loop like this:
attach(df)
df$count <- 0
for(i in colnames(v1:v4)){
if(df$logical == i){
df$count <- df$count+1}
}
but it doesn't work: the new variable 'count' is still all 0.
Please help to fix it.
The expected result should look like this:
df <- data.frame('v1'=c(1,2,3,4,5),
'v2'=c(1,2,1,1,2),
'v3'=c(NA,2,1,4,'1'),
'v4'=c(1,2,3,NaN,5),
'logical'=c(1,2,3,4,5),
'count'=c(3,4,2,2,2))
Many thanks from a beginner.
We can use rowSums after creating a logical matrix
df$count <- rowSums(df[1:4] == df$logical, na.rm = TRUE)
df$count
#[1] 3 4 2 2 2
Personally, I think the solution by @akrun is so far the most elegant and efficient way to add the count column.
Another way to append the count column to the end of df (I don't know whether it has the "elegance" you are looking for) is to use within, i.e.,
df <- within(df, count <- rowSums(df[1:4] == logical, na.rm = TRUE))
such that you will get
> df
v1 v2 v3 v4 logical count
1 1 1 <NA> 1 1 3
2 2 2 2 2 2 4
3 3 1 1 3 3 2
4 4 1 4 NaN 4 2
5 5 2 1 5 5 2
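For completeness, this is why the loop in the question leaves count at 0: with attach(df), `v1:v4` only uses the first element of each vector (so it is `1:1`), `colnames(1)` is NULL, and a for loop over NULL never runs. Also, `if()` expects a single TRUE/FALSE rather than a whole column of comparisons. A sketch of a working loop that stays close to the original idea (rowSums remains the more idiomatic choice):

```r
df <- data.frame(v1 = c(1, 2, 3, 4, 5),
                 v2 = c(1, 2, 1, 1, 2),
                 v3 = c(NA, 2, 1, 4, '1'),
                 v4 = c(1, 2, 3, NaN, 5),
                 logical = c(1, 2, 3, 4, 5))

df$count <- 0
for (col in c("v1", "v2", "v3", "v4")) {
  # compare the whole column with 'logical' at once (one TRUE/FALSE/NA per row)
  hit <- df[[col]] == df$logical
  # count TRUE as 1; FALSE and NA (from NA/NaN cells) add nothing
  df$count <- df$count + (!is.na(hit) & hit)
}
df$count
#[1] 3 4 2 2 2
```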
I want to turn the entire content of a numeric data frame (including NAs) into one column. What would be the smartest way of achieving the following?
>df <- data.frame(C1=c(1,NA,3),C2=c(4,5,NA),C3=c(NA,8,9))
>df
C1 C2 C3
1 1 4 NA
2 NA 5 8
3 3 NA 9
>x <- mysterious_operation(df)
>x
[1] 1 NA 3 4 5 NA NA 8 9
I want to calculate the mean of this vector, so ideally I'd remove the NAs within mysterious_operation; the data frame I'm working on is very large, so that would probably be a good idea.
Here are a couple of ways with purrr:
# using invoke, a wrapper around do.call
purrr::invoke(c, df, use.names = FALSE)
# similar to unlist: flatten the list of columns into a single double vector
purrr::flatten_dbl(df)
Both return:
[1] 1 NA 3 4 5 NA NA 8 9
The mysterious operation you are looking for is called unlist:
> df <- data.frame(C1=c(1,NA,3),C2=c(4,5,NA),C3=c(NA,8,9))
> unlist(df, use.names = F)
[1] 1 NA 3 4 5 NA NA 8 9
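Since the end goal is the mean, the NAs do not have to be removed first: mean() can skip them with na.rm = TRUE. If the NA-free vector itself is needed, it can be filtered with is.na. A short sketch:

```r
df <- data.frame(C1 = c(1, NA, 3), C2 = c(4, 5, NA), C3 = c(NA, 8, 9))

x <- unlist(df, use.names = FALSE)  # all columns, one after another
x_no_na <- x[!is.na(x)]             # drop the NAs
x_no_na
#[1] 1 3 4 5 8 9
mean(x, na.rm = TRUE)               # same as mean(x_no_na)
#[1] 5
```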
We can use unlist and create a single column data.frame
df1 <- data.frame(col =unlist(df))
Just for fun. Of course unlist is the most appropriate function.
Alternative:
stack(df)[,1]
Alternative:
do.call(c,df)
do.call(c,c(df,use.names=F)) #unnamed version
Maybe they are more mysterious.
I am trying to replace all the groups of elements in a vector that sum up to zero with NAs.
The size of each group is 3. For instance:
a = c(0,0,0,0,2,3,1,0,2,0,0,0,0,1,2,0,0,0)
should be finally:
c(NA,NA,NA,0,2,3,1,0,2,NA,NA,NA,0,1,2,NA,NA,NA)
Until now, I have managed to find the groups having the sum equal to zero via:
b = which(tapply(a,rep(1:(length(a)/3),each=3),sum) == 0)
which yields c(1,4,6)
I then calculate the starting indexes of the groups in the vector via: b <- b*3-2.
Probably there is a more elegant way, but this is what I've stitched together so far.
Now I am stuck at "expanding" the vector of start indexes, to generate a sequence of the elements to be replaced. For instance, if vector b now contains c(1,10,16), I will need a sequence c(1,2,3,10,11,12,16,17,18) which are the indexes of the elements to replace by NAs.
If you have any idea of a solution without a for loop or even a more simple/elegant solution for the whole problem, I would appreciate it. Thank you.
Marius
You can use something like this:
a[as.logical(ave(a, 0:(length(a)-1) %/% 3,
FUN = function(x) sum(x) == 0))] <- NA
a
# [1] NA NA NA 0 2 3 1 0 2 NA NA NA 0 1 2 NA NA NA
Here 0:(length(a)-1) %/% 3 creates group ids of the desired length (in this case, 3), and ave is used to check whether each group adds up to 0.
To assign the values to groups, turn your vector into a three-row matrix, so that each column holds one group. You can then calculate the column sums and compare them with 0. The rest is simple.
a <- c(0,0,0,0,2,3,1,0,2,0,0,0,0,1,2,0,0,0)
a <- as.integer(a)
is.na(a) <- rep(colSums(matrix(a, 3L)) == 0L, each = 3L)
a
#[1] NA NA NA 0 2 3 1 0 2 NA NA NA 0 1 2 NA NA NA
Note that I compare integers on purpose: if your vector is not an integer vector, floating-point rounding can make a zero sum come out slightly different from 0, so you need to consider this FAQ.
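To illustrate the point about non-integer vectors: with doubles, a group that sums to zero on paper may not be exactly 0 after rounding. A sketch that compares against a tolerance instead of using `== 0`; the sample values and the tolerance of 1e-8 are arbitrary choices for illustration.

```r
a <- c(0.1, 0.2, -0.3, 0, 2, 3, 1, 0, 2)   # made-up data; the first group is "zero" on paper
sums <- colSums(matrix(a, nrow = 3L))      # one column per group (assumes length(a) is a multiple of 3)
sums[1] == 0                               # FALSE: rounding leaves a tiny remainder
tol <- 1e-8                                # arbitrary tolerance
is.na(a) <- rep(abs(sums) < tol, each = 3L)
a
#[1] NA NA NA 0 2 3 1 0 2
```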
Or using gl, ave and all
n <- length(a)
a[ave(!a, gl(n, 3, n), FUN=all)] <- NA
a
#[1] NA NA NA 0 2 3 1 0 2 NA NA NA 0 1 2 NA NA NA
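For the step the question got stuck on, expanding start indexes such as c(1, 10, 16) into c(1, 2, 3, 10, 11, 12, 16, 17, 18), it is enough to repeat each start and add a recycled offset vector. A minimal base R sketch, assuming the group size of 3 from the question and a vector length that is a multiple of it:

```r
a <- c(0,0,0,0,2,3,1,0,2,0,0,0,0,1,2,0,0,0)
k <- 3                                                  # group size
sums <- tapply(a, rep(seq_len(length(a) / k), each = k), sum)
b <- (unname(which(sums == 0)) - 1) * k + 1             # start of each zero-sum group: 1 10 16
idx <- rep(b, each = k) + 0:(k - 1)                     # 1 2 3 10 11 12 16 17 18
a[idx] <- NA
a
#[1] NA NA NA 0 2 3 1 0 2 NA NA NA 0 1 2 NA NA NA
```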