Let's say the data is 'ab':
a <- c(1,2,3,NA,5,NA)
b <- c(5,NA,4,NA,NA,6)
ab <-c(a,b)
I would like a new variable that is the sum of the two, while handling NAs as follows:
Desired output:
ab$c <- c(6,2,7,NA,5,6)
That is, number + NA should equal the number (and NA + NA stays NA).
I tried the following, but it does not work as desired:
ab$c <- a+b
which gives me: 6 NA 7 NA NA NA
I also don't know how to include "na.rm=TRUE", which is something I was trying.
I would also like to create a third, categorical variable based on a cutoff: if the value is <= 4 the event is 1, otherwise 0:
Desired output:
ab$d <- c(1,1,1,NA,0,0)
I tried:
ab$d =ifelse(ab$a<=4|ab$b<=4,1,0)
print(ab$d)
gives me logical(0)
Thanks!
a <- c(1,2,3,NA,5,NA)
b <- c(5,NA,4,NA,NA,6)
dfd <- data.frame(a,b)
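# Row sums, treating NA as 0
dfd$c <- rowSums(dfd, na.rm = TRUE)
# Put NA back where both a and b are missing (rowSums gives 0 there)
dfd$c <- ifelse(is.na(dfd$a) & is.na(dfd$b), NA_real_, dfd$c)
# Flag rows whose sum is at least 4
dfd$d <- ifelse(dfd$c >= 4, 1, 0)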
dfd
a b c d
1 1 5 6 1
2 2 NA 2 0
3 3 4 7 1
4 NA NA NA NA
5 5 NA 5 1
6 NA 6 6 1
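The answer above flags `dfd$c >= 4`, which gives d = 1 0 1 NA 1 1 rather than the desired 1 1 1 NA 0 0. The desired d appears to follow the asker's own rule (event if a or b is <= 4). The attempt in the question returned logical(0) because `c(a, b)` joins the two vectors into one plain vector instead of making a data frame, so `ab$a` does not exist. Below is a minimal base R sketch, assuming that "a or b <= 4" is the intended rule. `pmin(..., na.rm = TRUE)` takes the smaller non-missing value in each row, so d is NA only when both values are missing.

```r
a <- c(1, 2, 3, NA, 5, NA)
b <- c(5, NA, 4, NA, NA, 6)
ab <- data.frame(a, b)   # a data frame (not c(a, b)), so ab$a and ab$b exist

# Sum that treats a single NA as 0, but stays NA when both values are NA
ab$c <- ifelse(is.na(ab$a) & is.na(ab$b), NA,
               rowSums(ab[, c("a", "b")], na.rm = TRUE))

# Event = 1 if the smaller non-missing value is <= 4 (i.e. a or b <= 4);
# pmin() with na.rm = TRUE returns NA only when both values are missing
m <- pmin(ab$a, ab$b, na.rm = TRUE)
ab$d <- ifelse(m <= 4, 1, 0)
ab
```

With this data, c is 6 2 7 NA 5 6 and d is 1 1 1 NA 0 0, which matches the desired output in the question.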
Suppose we have a vector:
x<-c(1,3,4,6,7)
And we have another vector that specifies the positions of NAs:
NAs<-c(2,5)
How can I insert NA into x at the 2nd and 5th positions so that x becomes:
x
1 NA 3 4 NA 6 7
Thanks!
Do you want this?
> replace(sort(c(x, NAs)), NAs, NA)
[1] 1 NA 3 4 NA 6 7
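Or a safer solution (the first one depends on the values in x, not only on the positions, so it can give wrong results for other data):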
> v <- c(x, NAs)
> replace(rep(NA, length(v)), !seq_along(v) %in% NAs, x)
[1] 1 NA 3 4 NA 6 7
With a for loop, using append:
for (i in sort(NAs)) x <- append(x, NA, after = i - 1)
#[1] 1 NA 3 4 NA 6 7
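The same idea as the "safer solution" can also be written as pre-allocate and fill. This is a minimal sketch, assuming that `NAs` holds positions in the final vector and contains no duplicates. It selects the non-NA slots with `%in%` rather than `out[-NAs]`, because `out[-integer(0)]` selects nothing when `NAs` is empty.

```r
x <- c(1, 3, 4, 6, 7)
NAs <- c(2, 5)

# Allocate the final length filled with NA, then place x, in order,
# into every position that is not listed in NAs
out <- rep(NA_real_, length(x) + length(NAs))
out[!seq_along(out) %in% NAs] <- x
out
```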
I have a data frame like this:
individual <- c("1",NA,NA,NA,NA,NA,NA,NA,"1","1")
x <- c(665,NA,NA,NA,NA,NA,NA,NA,663,665)
y <- c(-474.5,NA,NA,NA,NA,NA,NA,NA,-474.5,-472.5)
frame <- rep(1:10)
df <- data.frame(individual,x,y,frame)
I have an ID column labeled 'individual', x/y coordinates, and a frame number.
I need to calculate the Euclidean distances between rows from the x, y coordinates, skipping over the NA rows.
So, in the example I gave, I would need to calculate the distances between rows 1 and 9, and between rows 9 and 10. In the real data there would of course be many more rows.
Eventually I need to interpolate the data: if the Euclidean distance is < 5, fill in the missing rows with the ID of the individual; if it is > 5, ignore the gap and interpolate nothing.
Here is the example result data frame that's needed:
individual <- c("1","1","1","1","1","1","1","1","1","1")
x <- c(665,NA,NA,NA,NA,NA,NA,NA,663,665)
y <- c(-474.5,NA,NA,NA,NA,NA,NA,NA,-474.5,-472.5)
frame <- rep(1:10)
dist_measure <- c(NA,NA,NA,NA,NA,NA,NA,NA,2,2.828427)
df <- data.frame(individual,x,y,frame,dist_measure)
Any advice on how to approach this problem is greatly appreciated. My first thought was to write a function that calculates the Euclidean distance and call it in a for loop, but I'm stuck on how to make it work across the NA values. I thought the lag function from the tidyverse might help, but I'm not sure how to integrate it into the loop/function.
Thank you in advance.
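This should work. I've added a second individual to the hypothetical data to show how it works. Note that it treats all NA rows between an individual's first and last appearance as a single gap, and it does not add the dist_measure column.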
individual <- c("1",NA,NA,NA,NA,NA,NA,NA,"1","1",
"2",NA,NA,NA,NA,NA,NA,NA,"2","2")
x <- c(665,NA,NA,NA,NA,NA,NA,NA,663,665,
.665,NA,NA,NA,NA,NA,NA,NA,.663,.665)
y <- c(-474.5,NA,NA,NA,NA,NA,NA,NA,-474.5,-472.5,
-.4745,NA,NA,NA,NA,NA,NA,NA,-.4745,-.4725)
frame <- rep(1:10, 2)
df <- data.frame(individual,x,y,frame)
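# Loop over each individual ID present in the data (instead of hard-coding 1:2)
for (id in unique(df$individual[!is.na(df$individual)])) {
  # Rows from this individual's first to last appearance
  rows <- min(which(df$individual == id)):max(which(df$individual == id))
  tmp <- df[rows, ]
  gap <- which(is.na(tmp$individual))
  if (length(gap) == 0) next  # nothing to fill (avoids range() on an empty vector)
  # Last known row before the gap and first known row after it
  ends <- range(gap) + c(-1, 1)
  if (ends[1] > 0 && ends[2] <= nrow(tmp)) {
    # Euclidean distance between the two known positions
    d <- c(dist(tmp[ends, c("x", "y")]))
    if (d < 5) {
      df$individual[rows] <- id
    }
  }
}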
df
# individual x y frame
# 1 1 665.000 -474.5000 1
# 2 1 NA NA 2
# 3 1 NA NA 3
# 4 1 NA NA 4
# 5 1 NA NA 5
# 6 1 NA NA 6
# 7 1 NA NA 7
# 8 1 NA NA 8
# 9 1 663.000 -474.5000 9
# 10 1 665.000 -472.5000 10
# 11 2 0.665 -0.4745 1
# 12 2 NA NA 2
# 13 2 NA NA 3
# 14 2 NA NA 4
# 15 2 NA NA 5
# 16 2 NA NA 6
# 17 2 NA NA 7
# 18 2 NA NA 8
# 19 2 0.663 -0.4745 9
# 20 2 0.665 -0.4725 10
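The question also asks for the dist_measure column, which the loop above does not produce. The following is a minimal base R sketch (not the answerer's code) that computes the distance from each row with coordinates to the previous such row, skipping the NA rows, and then fills each gap. It assumes that the rows are ordered by frame, that individuals are not interleaved, and that a missing row is one with NA coordinates. The check that both bounding rows belong to the same individual is an added assumption.

```r
individual <- c("1", NA, NA, NA, NA, NA, NA, NA, "1", "1")
x <- c(665, NA, NA, NA, NA, NA, NA, NA, 663, 665)
y <- c(-474.5, NA, NA, NA, NA, NA, NA, NA, -474.5, -472.5)
df <- data.frame(individual, x, y, frame = 1:10)

# Rows that have both coordinates
known <- which(!is.na(df$x) & !is.na(df$y))

# Euclidean distance from each known row to the previous known row,
# skipping the NA rows in between; the first known row has no predecessor
df$dist_measure <- NA_real_
df$dist_measure[known[-1]] <- sqrt(diff(df$x[known])^2 + diff(df$y[known])^2)

# Fill each run of missing rows when the known rows on either side
# belong to the same individual and are less than 5 apart
for (k in seq_along(known)[-1]) {
  lo <- known[k - 1]
  hi <- known[k]
  if (hi - lo > 1 &&
      identical(df$individual[lo], df$individual[hi]) &&
      df$dist_measure[hi] < 5) {
    df$individual[(lo + 1):(hi - 1)] <- df$individual[lo]
  }
}
df
```

With the example data, dist_measure is NA for rows 1 to 8, 2 for row 9 and 2.828427 for row 10, and every row's individual becomes "1", which matches the expected result in the question.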
I'd like to translate the following Stata loop to R:
foreach day of numlist 1/7 {;
replace dywt = 1/7 * 1/Freq[`day',1] if interview_day==`day';
}
Data (R Output):
> INTERVIEW_DAY[1:15]
[1] 5 6 6 4 4 4 1 2 6 4 6 7 6 3 6
> Freq
[1] 0.14353969 0.14795762 0.14089618 0.14074198 0.14194271 0.14295769 0.14196413
> F
[1] 20720
> DYWT[1:15]
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Thank you in advance.
In R, if all of these are vectors, no loop is needed. Index 'Freq' by 'INTERVIEW_DAY' (Freq[INTERVIEW_DAY]); this works because INTERVIEW_DAY holds integer day numbers that can be used as positions in 'Freq'. Then take the reciprocal and multiply by 1/max(INTERVIEW_DAY):
DYWT <- 1/max(INTERVIEW_DAY) * 1/Freq[INTERVIEW_DAY]
Or, if the factor should be based on the number of unique days:
DYWT <- 1/length(unique(INTERVIEW_DAY)) * 1/Freq[INTERVIEW_DAY]
Or simply use 1/7, since there are 7 days. If some days do not occur in 'INTERVIEW_DAY', the hard-coded 1/7 (as in the Stata code) is the safer choice.
Data:
INTERVIEW_DAY <- scan(text = '5 6 6 4 4 4 1 2 6 4 6 7 6 3 6', what = integer())
Freq <- scan(text = '0.14353969 0.14795762 0.14089618 0.14074198 0.14194271 0.14295769 0.14196413', what = numeric())
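To show that the vectorised line is equivalent to the Stata loop, here is a literal loop translation for comparison. This is a sketch using the sample data from the question. `which()` is used so that any NA in INTERVIEW_DAY would simply be skipped, as Stata's `if` would skip it.

```r
INTERVIEW_DAY <- c(5, 6, 6, 4, 4, 4, 1, 2, 6, 4, 6, 7, 6, 3, 6)
Freq <- c(0.14353969, 0.14795762, 0.14089618, 0.14074198,
          0.14194271, 0.14295769, 0.14196413)
DYWT <- rep(NA_real_, length(INTERVIEW_DAY))

# Literal translation of the Stata foreach loop: for each day, replace the
# weight only in the rows interviewed on that day
for (day in 1:7) {
  DYWT[which(INTERVIEW_DAY == day)] <- 1/7 * 1/Freq[day]
}

# The vectorised version gives the same weights
all.equal(DYWT, 1/7 * 1/Freq[INTERVIEW_DAY])
```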
This question already has answers here: "ifelse matching vectors in r" (2 answers). Closed 9 years ago.
I have a dataframe that looks like this:
> df<-data.frame(A=c(NA,1,2,3,4),B=c(NA,5,NA,3,4),C=c(NA,NA,NA,NA,4))
> df
A B C
1 NA NA NA
2 1 5 NA
3 2 NA NA
4 3 3 NA
5 4 4 4
I am trying to create a column "D" from the row values of df. D should be NA if the non-NA values in a row differ (e.g. row 2) or if the row is all NAs (e.g. row 1). Otherwise, D should be the common value of the row, ignoring NAs (e.g. rows 3, 4 and 5). The resulting vector and data frame would look like this:
> df$D<-c(NA,NA,2,3,4)
> df
A B C D
1 NA NA NA NA
2 1 5 NA NA
3 2 NA NA 2
4 3 3 NA 3
5 4 4 4 4
Thank you in advance for your suggestions.
You can use apply() to do the calculation for each row, combined with unique() and !is.na(). !is.na() selects the values that are not NA, unique() keeps the distinct values, and length() gives the number of distinct values. If that number is 1, use the first non-NA value; otherwise, return NA.
df$D<-apply(df,1,function(x)
ifelse(length(unique(x[!is.na(x)]))==1,x[!is.na(x)][1],NA))
Here is one possible approach:
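FUN <- function(x) {
  no.na <- x[!is.na(x)]             # drop the NAs
  len <- length(no.na)
  if (len == 0) return(NA)          # all-NA row
  if (len == 1) return(no.na)       # single non-NA value
  runs <- rle(no.na)$values         # one value per run of identical values
  if (length(runs) > 1) return(NA)  # more than one distinct value
  runs
}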
df$D <- apply(df, 1, FUN)
## > df
## A B C D
## 1 NA NA NA NA
## 2 1 5 NA NA
## 3 2 NA NA 2
## 4 3 3 NA 3
## 5 4 4 4 4
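A vectorised alternative (not from either answer) avoids apply(). A row has a single distinct non-NA value exactly when its row-wise minimum and maximum agree. This sketch assumes that all columns are numeric. `pmin()`/`pmax()` with `na.rm = TRUE` return NA for an all-NA row, so row 1 stays NA.

```r
df <- data.frame(A = c(NA, 1, 2, 3, 4), B = c(NA, 5, NA, 3, 4), C = c(NA, NA, NA, NA, 4))

# Row-wise min and max of the non-NA values (NA when the whole row is NA)
lo <- do.call(pmin, c(df, na.rm = TRUE))
hi <- do.call(pmax, c(df, na.rm = TRUE))

# Keep the value only when all non-NA entries in the row are equal
df$D <- ifelse(lo == hi, lo, NA)
df
```

With the example data this gives D = NA NA 2 3 4, the same as the apply() versions.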