Replace negative values by NA values - r

I have positive, negative, and NA values in a table, and I need to replace the negative values with NA. The positive and NA values should remain as they are. My data set is similar to the one below:
NO. q
1 NA
2 NA
3 -133.6105198
4 -119.6991209
5 28.84460104
6 66.05345087
7 84.7058947
8 -134.4522694
9 NA
10 NA
11 73.20465643
12 -69.90723514
13 NA
14 69.70833003
15 65.27859906
I tried this:
if (q < 0) {
  q = NA
} else {
  q = q
}

Or use replace:
> df$q2 <- replace(df$q, which(df$q < 0), NA)
> df
NO. q q2
1 1 NA NA
2 2 NA NA
3 3 -133.61052 NA
4 4 -119.69912 NA
5 5 28.84460 28.84460
6 6 66.05345 66.05345
7 7 84.70589 84.70589
8 8 -134.45227 NA
9 9 NA NA
10 10 NA NA
11 11 73.20466 73.20466
12 12 -69.90724 NA
13 13 NA NA
14 14 69.70833 69.70833
15 15 65.27860 65.27860
Or with data.table:
library(data.table)
setDT(df)[q < 0, q := NA]
Or with replace in a dplyr pipe:
library(dplyr)
df %>% mutate(q = replace(q, which(q<0), NA))
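For completeness, a sketch of the same idea with dplyr::if_else rather than replace (not in the original answers; if_else is type-strict, so NA_real_ is used here on the assumption that q is a double column):
df %>% mutate(q = if_else(q < 0, NA_real_, q))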

You could try this:
sample <- c(1, -2, NA)
sample[sample < 0] <- NA
sample
[1] 1 NA NA
Or if you're using a data.frame (suppose it's called df):
df$q[df$q < 0] <- NA

You could try
df1$q1 <- NA^(df1$q <0) * df1$q
df1
# NO. q q1
#1 1 NA NA
#2 2 NA NA
#3 3 -133.61052 NA
#4 4 -119.69912 NA
#5 5 28.84460 28.84460
#6 6 66.05345 66.05345
#7 7 84.70589 84.70589
#8 8 -134.45227 NA
#9 9 NA NA
#10 10 NA NA
#11 11 73.20466 73.20466
#12 12 -69.90724 NA
#13 13 NA NA
#14 14 69.70833 69.70833
#15 15 65.27860 65.27860
Or use ifelse
with(df1, ifelse(q < 0, NA, q))
Or
is.na(df1$q) <- df1$q < 0

Another way of accomplishing the same thing (I now see this is almost the same as akrun's answer above, sorry for that):
daf$q = ifelse(daf$q < 0, NA_real_, daf$q)

Related

Making the rows of a data frame to NAs using R

I have a data frame as follows,
aid=c(1:10)
x1_var=rnorm(10,0,1)
x2_var=rnorm(10,0,1)
x3_var=rbinom(10,1,0.5)
data=data.frame(aid,x1_var,x2_var,x3_var)
head(data)
aid x1_var x2_var x3_var
1 1 -0.99759448 -0.2882535 1
2 2 -0.12755695 -1.3706875 0
3 3 1.04709366 0.8977596 1
4 4 0.48883458 -0.1965846 1
5 5 -0.40264114 0.2925659 1
6 6 -0.08409966 -1.3489460 1
I want to set every value in a row to NA when x3_var == 1 (without setting the aid column to NA).
I tried the following code.
> data[which(data$x3_var==1),]=NA
> data
aid x1_var x2_var x3_var
1 NA NA NA NA
2 2 -0.12755695 -1.3706875 0
3 NA NA NA NA
4 NA NA NA NA
5 NA NA NA NA
6 NA NA NA NA
7 NA NA NA NA
8 8 -1.78160459 -1.8677633 0
9 9 -1.65895704 -0.8086148 0
10 10 -0.06281384 1.8888726 0
But this code has also set the values of the aid column to NA. Can anybody help me fix this?
Also, are there other methods that do the same thing?
Thank you
Your code would work if you excluded the aid column from the assignment.
data[which(data$x3_var==1),-1]=NA
You can also do this without which:
data[data$x3_var==1, -1]=NA
In the above two cases I am assuming that you know the position of the aid column, i.e. 1. If you don't know the position of the column, you can use match to get its position:
data[data$x3_var==1, -match('aid', names(data))] = NA
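A hedged alternative to match, if you prefer excluding the column by name rather than computing its position, is setdiff, which drops "aid" from the column names directly:
data[data$x3_var == 1, setdiff(names(data), "aid")] <- NA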
A dplyr solution. Assuming the columns to be altered begin with "x" as in the example data.
library(dplyr)
set.seed(1001)
df1 <- data.frame(aid = 1:10,
x1_var = rnorm(10,0,1),
x2_var = rnorm(10,0,1),
x3_var = rbinom(10,1,0.5))
df1 %>%
mutate(across(starts_with("x"), ~ifelse(x3_var == 1, NA, .x)))
aid x1_var x2_var x3_var
1 1 2.1886481 0.3026445 0
2 2 -0.1775473 1.6343924 0
3 3 NA NA NA
4 4 -2.5065362 0.4671611 0
5 5 NA NA NA
6 6 -0.1435595 0.1102652 0
7 7 NA NA NA
8 8 -0.6229437 -1.0302508 0
9 9 NA NA NA
10 10 NA NA NA
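The same row-blanking can also be sketched with data.table (not part of the original answers; it assumes aid is the only column to protect and relies on data.table coercing the logical NA to each column's type):
library(data.table)
dt <- as.data.table(df1)
cols <- setdiff(names(dt), "aid")  # every column except aid
dt[x3_var == 1, (cols) := NA]      # blank those columns where x3_var is 1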

Wide format: a function to calculate row means for specific batches of columns, then scale up for multiple batches

This is a followup question to a previous post of mine about building a function for calculating row means.
I want to use one of the apply-family functions to iterate over my dataset and each time compute the row mean (which is what the function does) for a group of columns I specify. Unfortunately, I'm missing something critical about how to set up apply(), because I get an error I can't troubleshoot.
Example Data
capital_cities_df <-
  data.frame("europe_paris" = 1:10,
             "europe_london" = 11:20,
             "europe_rome" = 21:30,
             "asia_bangkok" = 31:40,
             "asia_tokyo" = 41:50,
             "asia_kathmandu" = 51:60)
set.seed(123)
capital_cities_df <- as.data.frame(lapply(capital_cities_df,
  function(cc) cc[sample(c(TRUE, NA),
                         prob = c(0.70, 0.30),
                         size = length(cc),
                         replace = TRUE)]))
> capital_cities_df
europe_paris europe_london europe_rome asia_bangkok asia_tokyo asia_kathmandu
1 1 NA NA NA 41 NA
2 NA 12 22 NA 42 52
3 3 NA 23 33 43 NA
4 NA 14 NA NA NA NA
5 NA 15 25 35 45 NA
6 6 NA NA 36 NA 56
7 NA 17 NA NA NA 57
8 NA 18 NA 38 48 NA
9 NA 19 NA 39 49 NA
10 10 NA 30 40 NA 60
Custom Function
library(dplyr)
library(rlang)
continent_mean <- function(df, continent) {
  df %>%
    select(starts_with(continent)) %>%
    dplyr::mutate(!!quo_name(continent) := rowMeans(., na.rm = TRUE))
}
## works for a single case:
continent_mean(capital_cities_df, "europe")
europe_paris europe_london europe_rome europe
1 1 NA 21 11
2 2 12 22 12
3 3 NA 23 13
4 4 14 NA 9
5 NA 15 25 20
6 6 16 26 16
7 NA 17 NA 17
8 NA 18 NA 18
9 NA 19 NA 19
10 10 20 30 20
Trying to apply the function over the data, unsuccessfully
apply(
capital_cities_df,
MARGIN = 2,
FUN = continent_mean(capital_cities_df, continent = "europe")
)
Error in match.fun(FUN) :
'continent_mean(capital_cities_df, continent = "europe")' is not a function, character or symbol
Any other combination of the arguments in apply() didn't work either, nor did sapply. This unsuccessful attempt uses apply for only one group of columns I want the mean of ("europe"). However, my ultimate goal is to pass c("europe", "asia", etc.) to apply, so that the custom function creates row-mean columns for all the groups of columns I specify, in one hit.
What is wrong with my code?
Thanks!
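For reference, the error itself comes from the fact that apply()'s FUN argument expects a function object, while continent_mean(capital_cities_df, continent = "europe") is evaluated first, so its result (a data frame) is what gets passed, hence the match.fun complaint. A minimal sketch of looping over the continent names instead, assuming the function and data defined above:
continents <- c("europe", "asia")
# returns a list with one per-continent data frame each
means_list <- lapply(continents, function(ct) continent_mean(capital_cities_df, ct))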
EDIT 19-AUG-2019
I tried the solution suggested by A. Suliman (see below). It worked for the example data I posted here, but not when scaling it up to my real dataset, where I need to subset additional columns (rather than the "continent" batch only). More specifically, in my real data I have an ID column that I want included in the output alongside the other data when I apply my custom-made function.
Example data including "ID" column
capital_cities_df <- data.frame(
  "europe_paris" = 1:10,
  "europe_london" = 11:20,
  "europe_rome" = 21:30,
  "asia_bangkok" = 31:40,
  "asia_tokyo" = 41:50,
  "asia_kathmandu" = 51:60)
set.seed(123)
capital_cities_df <- as.data.frame(lapply(capital_cities_df,
  function(cc) cc[sample(c(TRUE, NA),
                         prob = c(0.70, 0.30),
                         size = length(cc),
                         replace = TRUE)]))
id <- 1:10
capital_cities_df <- cbind(id, capital_cities_df)
> capital_cities_df
id europe_paris europe_london europe_rome asia_bangkok asia_tokyo asia_kathmandu
1 1 1 NA NA NA 41 NA
2 2 NA 12 22 NA 42 52
3 3 3 NA 23 33 43 NA
4 4 NA 14 NA NA NA NA
5 5 NA 15 25 35 45 NA
6 6 6 NA NA 36 NA 56
7 7 NA 17 NA NA NA 57
8 8 NA 18 NA 38 48 NA
9 9 NA 19 NA 39 49 NA
10 10 10 NA 30 40 NA 60
My function (edited to select id as well)
continent_mean <- function(df, continent) {
  df %>%
    select(., id, starts_with(continent)) %>%
    dplyr::mutate(!!quo_name(continent) := rowMeans(., na.rm = TRUE))
}
> continent_mean(capital_cities_df, "europe") ## works in a single run
id europe_paris europe_london europe_rome europe
1 1 1 NA NA 1.000000
2 2 NA 12 22 12.000000
3 3 3 NA 23 9.666667
4 4 NA 14 NA 9.000000
5 5 NA 15 25 15.000000
6 6 6 NA NA 6.000000
7 7 NA 17 NA 12.000000
8 8 NA 18 NA 13.000000
9 9 NA 19 NA 14.000000
10 10 10 NA 30 16.666667
Trying to apply the function beyond a single use (based on A. Suliman's method), unsuccessfully:
continents <- c("europe", "asia")
lst <- lapply(continents, function(x) continent_mean(df=capital_cities_df[, grep(x, names(capital_cities_df))], continent=x))
## or:
purrr::map_dfc(continents, ~continent_mean(df=capital_cities_df[, grep(.x, names(capital_cities_df))], continent=.x))
In either case I get a variety of error messages:
Error in inds_combine(.vars, ind_list) : Position must be between 0 and n
At other times:
Error: invalid column index : NA for variable: 'NA' = 'NA'
All I wanted was a simple function that lets me calculate row means for whichever columns I specify, but this is getting messy for some reason. Even though I'm eager to figure out what's wrong with my code, I'd also be thankful for a better overarching solution to the entire process.
Thanks!
Use lapply to loop through the continents, then use grep to select the columns belonging to the current continent:
continents <- c("europe", "asia")
lst <- lapply(continents, function(x) continent_mean(df=capital_cities_df[, grep(x, names(capital_cities_df))], continent=x))
#To a dataframe not a list
do.call(cbind, lst)
Using map_dfc from purrr we can get the result in one step
purrr::map_dfc(continents, ~continent_mean(df=capital_cities_df[, grep(.x, names(capital_cities_df))], continent=.x))
Update:
# grep returns the column positions that match "europe" or "asia", e.g.
> grep("europe", names(capital_cities_df))
[1] 2 3 4
# If we need the column names instead, we add value = TRUE to grep
> grep("europe", names(capital_cities_df), value = TRUE)
[1] "europe_paris" "europe_london" "europe_rome"
So to include the id column as well, we can just use c() and call the function as usual:
#NOTE: Here I'm using the old function without select
lst <- lapply(continents, function(x) continent_mean(df=capital_cities_df[, c('id',grep(x, names(capital_cities_df), value = TRUE))], continent=x))
do.call(cbind, lst)
id europe_paris europe_london europe_rome europe id asia_bangkok asia_tokyo asia_kathmandu asia
1 1 1 NA NA 1.00000 1 NA 41 51 31.00000
2 2 NA 12 22 12.00000 2 NA 42 52 32.00000
3 3 3 13 23 10.50000 3 33 43 NA 26.33333
4 4 NA 14 NA 9.00000 4 NA 44 54 34.00000
5 5 NA 15 25 15.00000 5 35 45 55 35.00000
6 6 6 NA NA 6.00000 6 36 46 56 36.00000
7 7 7 17 27 14.50000 7 NA 47 57 37.00000
8 8 NA 18 28 18.00000 8 38 48 NA 31.33333
9 9 9 19 29 16.50000 9 39 49 NA 32.33333
10 10 10 NA 30 16.66667 10 40 NA 60 36.66667
# We have one problem: the id column gets duplicated; map_dfc with select will solve this issue
purrr::map_dfc(continents, ~continent_mean(df=capital_cities_df[, c('id',grep(.x, names(capital_cities_df), value = TRUE))], continent=.x)) %>%
  # Don't select any column whose name is id followed by a digit (the duplicated ids)
  select(-matches('id\\d'))
If you'd like to use the new function (the one with select), just pass capital_cities_df without grep, e.g. using map_dfc:
purrr::map_dfc(continents, ~continent_mean(df=capital_cities_df, continent=.x)) %>%
  select(-matches('id\\d'))
Correction: in continent_mean
continent_mean <- function(df, continent) {
  df %>%
    select(., id, starts_with(continent)) %>%
    # Exclude id from the rowMeans calculation
    dplyr::mutate(!!quo_name(continent) := rowMeans(.[grep(continent, names(.))], na.rm = TRUE))
}

Make an NA prior to a non-NA value (1) a 0 using R

I've got the following data frame df
time <- c("01/01/1951", "02/01/1951", "03/01/1951", "04/01/1951", "05/01/1951", "06/01/1951", "07/01/1951", "08/01/1951", "09/01/1951", "10/01/1951", "11/01/1951", "12/01/1951", "13/01/1951", "14/01/1951", "15/01/1951", "16/01/1951", "17/01/1951", "18/01/1951", "19/01/1951", "20/01/1951", "21/01/1951", "22/01/1951", "23/01/1951")
member <- c(1,NA,NA,3,NA,NA,NA,NA,NA,1,1,NA,2,NA,NA,NA,NA,NA,1,NA,NA,NA,NA)
df <- data.frame(time, member)
df$time = as.Date(df$time,format="%d/%m/%Y")
I'd like the NA value of "member" on the day before a day where member is 1 to become 0, UNLESS there is a 1 on the day before a 1 (two consecutive ones); in that case I wouldn't want the 1 to become 0, just the NA values before a 1.
The desired data frame would be:
df
time member
1 01/01/1951 1
2 02/01/1951 NA
3 03/01/1951 NA
4 04/01/1951 3
5 05/01/1951 NA
6 06/01/1951 NA
7 07/01/1951 NA
8 08/01/1951 NA
9 09/01/1951 0
10 10/01/1951 1
11 11/01/1951 1
12 12/01/1951 NA
13 13/01/1951 2
14 14/01/1951 NA
15 15/01/1951 NA
16 16/01/1951 NA
17 17/01/1951 NA
18 18/01/1951 0
19 19/01/1951 1
20 20/01/1951 NA
21 21/01/1951 NA
22 22/01/1951 NA
23 23/01/1951 NA
ideas?
So we need to check if df$member is NA and the next value is 1. When both of those are true, we set df$member equal to 0:
df$member[is.na(df$member) & c(df$member[-1] == 1, FALSE)] = 0
df
# time member
# 1 1951-01-01 1
# 2 1951-01-02 NA
# 3 1951-01-03 NA
# 4 1951-01-04 3
# 5 1951-01-05 NA
# 6 1951-01-06 NA
# 7 1951-01-07 NA
# 8 1951-01-08 NA
# 9 1951-01-09 0
# 10 1951-01-10 1
# 11 1951-01-11 1
# 12 1951-01-12 NA
# 13 1951-01-13 2
# 14 1951-01-14 NA
# 15 1951-01-15 NA
# 16 1951-01-16 NA
# 17 1951-01-17 NA
# 18 1951-01-18 0
# 19 1951-01-19 1
# 20 1951-01-20 NA
# 21 1951-01-21 NA
# 22 1951-01-22 NA
# 23 1951-01-23 NA
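The same rule can also be sketched with dplyr's lead() (not from the original answer; the extra !is.na() guard keeps the last row, where lead() returns NA, from matching):
library(dplyr)
df <- df %>%
  mutate(member = replace(member, is.na(member) & !is.na(lead(member)) & lead(member) == 1, 0))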

Add an integer to every element of data frame

Say I have a data frame as follows
rsi5 rsi10
1 NA NA
2 NA NA
3 NA NA
4 NA NA
5 NA NA
6 44.96650 NA
7 39.68831 NA
8 28.35625 NA
9 37.77910 NA
10 53.54822 NA
11 52.05308 46.01867
12 80.44368 66.09973
13 60.88418 56.04507
14 53.59851 52.10633
15 46.45874 48.23648
I wish to simply add 1 to each non-NA element of this data frame (i.e. 9 becomes 10). There is probably a very simple solution to this, but simple arithmetic on data frames does not seem to work for me in R, giving very strange results.
Just use + 1 as you would expect. Below is a mock example, as it wasn't worth copying your data for this.
Step One: Create a data.frame
R> df <- data.frame(A=c(NA, 1, 2, 3), B=c(NA, NA, 12, 13))
R> df
A B
1 NA NA
2 1 NA
3 2 12
4 3 13
R>
Step Two: Add one
R> df + 1
A B
1 NA NA
2 2 NA
3 3 13
4 4 14
R>
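If the data frame also contained non-numeric columns (not the case in this example), a hedged sketch that restricts the addition to the numeric columns only:
num_cols <- sapply(df, is.numeric)  # identify numeric columns
df[num_cols] <- df[num_cols] + 1    # add 1 only to those; NAs stay NA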

rowMean if row passes a test

I'm working on a data set where the source name is specified by a two-letter abbreviation prefixed to the variable name. So all variables from source AA start with AA (e.g. AA_var1), and source bb has bb_variable_name_2. There are actually a lot of sources and a lot of variable names, but I keep only two of each as a minimal example.
I want to create a mean variable for any row where the number of sources, that is, the number of unique prefixes with non-NA data in that row, is greater than 1. If there's only one source, I want that mean variable to be NA.
So, for example, my data looks like this:
> head(df)
AA_var1 AA_var2 myid bb_meow bb_A_v1
1 NA NA 123456 10 12
2 NA 10 194200 12 NA
3 12 10 132200 NA NA
4 12 NA 132201 NA 12
5 NA NA 132202 NA NA
6 12 13 132203 14 NA
And I want the following:
> head(df)
AA_var1 AA_var2 myid bb_meow bb_A_v1 rowMeanIfDiverseData
1 NA NA 123456 10 12 NA #has only bb
2 NA 10 194200 12 NA 11 #has AA and bb
3 12 10 132200 NA NA NA #has only AA
4 12 NA 132201 NA 12 12 #has AA and bb
5 NA NA 132202 NA NA NA #has neither
6 12 13 132203 14 NA 13 #has AA and bb
Normally, I just use rowMeans() for this kind of thing. But the additional subsetting, selecting only the columns whose names follow a given prefix convention and doing it at the row level, has me confused between the element-level and the apply-level statements I'm used to.
I can get the prefixes at the dataframe level:
mynames <- names(df[!names(df) %in% c("myid")])
tmp <- str_extract(mynames, perl("[A-Za-z]{2}(?=_)"))
uniq <- unique(tmp[!is.na(tmp)])
So,
> uniq
[1] "AA" "bb"
So, I can make this a function I can apply to df like so:
multiSource <- function(x){
  nm <- names(x[!names(x) %in% badnames])           # exclude c("myid")
  tmp <- str_extract(nm, perl("[A-Za-z]{2}(?=_)"))  # get prefixes
  uniq <- unique(tmp[!is.na(tmp)])                  # ensure unique and not NA
  if (length(uniq) > 1){
    return(T)
  } else {
    return(F)
  }
}
But this is clearly confused, and still operates at the data-set level, i.e.:
> lapply(df,multiSource)
$AA_var1
[1] FALSE
$AA_var2
[1] FALSE
$bb_meow
[1] FALSE
$bb_A_v1
[1] FALSE
And...
> apply(df,MARGIN=1,FUN=multiSource)
Gives TRUE for all.
I'd otherwise like to be saying...
df$rowMean <- rowMeans(df, na.rm=T)
# so, in this case
rowMeansIfTest <- function(X, test) {
  # is this row multiSource TRUE?
  # if yes, return(rowMeans(X))
  # else return(NA)
}
df$rowMeanIfDiverseData <- rowMeansIfTest(df, test=multiSource)
But it is unclear to me how to do this without some kind of for loop.
The strategy here is to split the data frame by columns into variable groups, and for each row identify whether each group has any non-NA values. We then check with rowSums that at least two groups have non-NA values in a row, and if so, attach the row mean as a new column with cbind.
This will generalize to any number of columns so long as they are named in the AA_varXXX format, and so long as the only column not in that format is myid. It is easy enough to fix if this isn't strictly the case, but these are the limitations of the code as written now.
df.dat <- df[!names(df) == "myid"]
diverse.rows <- rowSums(
  sapply(
    split.default(df.dat, gsub("^([A-Z]{2})_var.*", "\\1", names(df.dat))),
    function(x) apply(x, 1, function(y) any(!is.na(y)))
  )) > 1
cbind(df, div.mean = ifelse(diverse.rows, rowMeans(df.dat, na.rm = T), NA))
Produces:
AA_var1 AA_var2 myid BB_var3 BB_var4 div.mean
1 NA NA 123456 10 12 NA
2 NA 10 194200 12 NA 11
3 12 10 132200 NA NA NA
4 12 NA 132201 NA 12 12
5 NA NA 132202 NA NA NA
6 12 13 132203 14 NA 13
This solution seems a little convoluted to me, so there's probably a better way, but it should work for you.
# Here's your data:
df <- data.frame(AA_var1 = c(NA, NA, 12, 12, NA, 12),
                 AA_var2 = c(NA, 10, 10, NA, NA, 13),
                 BB_var3 = c(10, 12, NA, NA, NA, 14),
                 BB_var4 = c(12, NA, NA, 12, NA, NA))
# calculate rowMeans for each subset of variables
a <- rowMeans(df[,grepl('AA',names(df))], na.rm=TRUE)
b <- rowMeans(df[,grepl('BB',names(df))], na.rm=TRUE)
# count non-missing values for each subset of variables
a2 <- rowSums(!is.na(df[,grepl('AA',names(df))]), na.rm=TRUE)
b2 <- rowSums(!is.na(df[,grepl('BB',names(df))]), na.rm=TRUE)
# calculate means:
rowSums(cbind(a*a2,b*b2)) /
rowSums(!is.na(df[,grepl('[AA]|[BB]',names(df))]), na.rm=TRUE)
Result:
> df$rowMeanIfDiverseData <- rowSums(cbind(a*a2,b*b2)) /
+ rowSums(!is.na(df[,grepl('[AA]|[BB]',names(df))]), na.rm=TRUE)
> df
AA_var1 AA_var2 BB_var3 BB_var4 rowMeanIfDiverseData
1 NA NA 10 12 NaN
2 NA 10 12 NA 11
3 12 10 NA NA NaN
4 12 NA NA 12 12
5 NA NA NA NA NaN
6 12 13 14 NA 13
And a little cleanup to exactly match your intended output:
> df$rowMeanIfDiverseData[is.nan(df$rowMeanIfDiverseData)] <- NA
> df
AA_var1 AA_var2 BB_var3 BB_var4 rowMeanIfDiverseData
1 NA NA 10 12 NA
2 NA 10 12 NA 11
3 12 10 NA NA NA
4 12 NA NA 12 12
5 NA NA NA NA NA
6 12 13 14 NA 13
My attempt, somewhat long-winded:
dat <- data.frame(AA_var1 = c(NA, NA, 12, 12, NA, 12),
                  AA_var2 = c(NA, 10, 10, NA, NA, 13),
                  myid = 1:6,
                  BB_var3 = c(10, 12, NA, NA, NA, 14),
                  BB_var4 = c(12, NA, NA, 12, NA, NA))
# which columns are associated with variables used in our mean
varcols <- grep("*var[1-9]", names(dat), value = T)
# which rows have the requisite diversification of non-NAs
# (I assume the prefixes start with capitals followed by an underscore)
meanrow <- apply(!is.na(dat[, varcols]), 1, function(x) {
  n <- varcols[x]
  1 < length(unique(regmatches(n, regexpr("[A-Z]+_", n))))
})
# do the row mean for all
dat$meanval <- rowMeans(dat[, varcols], na.rm = T)
# null out for those without diversification (i.e. !meanrow)
dat[!meanrow, "meanval"] <- NA
I think some of the answers are making this seem more complicated than it is. This will do it:
df$means = ifelse(rowSums(!is.na(df[, grep('AA_var', names(df))])) &
                    rowSums(!is.na(df[, grep('BB_var', names(df))])),
                  rowMeans(df[, grep('_var', names(df))], na.rm = T), NA)
# AA_var1 AA_var2 myid BB_var3 BB_var4 means
#1 NA NA 123456 10 12 NA
#2 NA 10 194200 12 NA 11
#3 12 10 132200 NA NA NA
#4 12 NA 132201 NA 12 12
#5 NA NA 132202 NA NA NA
#6 12 13 132203 14 NA 13
Here's a generalization of the above, given the comment, assuming unique id's (if they're not, create a unique index instead):
library(data.table)
library(reshape2)
dt = data.table(df)
setkey(dt, myid) # not strictly necessary, but makes life easier
# find the conditional
cond = melt(dt, id.var = 'myid')[,
  sum(!is.na(value)), by = list(myid, sub('_var.*', '', variable))][,
  all(V1 != 0), keyby = myid]$V1
# fill in the means (could also do a join, but will rely on ordering instead)
dt[cond, means := rowMeans(.SD, na.rm = T), .SDcols = grep('_var', names(dt))]
dt
# AA_var1 AA_var2 myid BB_var3 BB_var4 means
#1: NA NA 123456 10 12 NA
#2: 12 10 132200 NA NA NA
#3: 12 NA 132201 NA 12 12
#4: NA NA 132202 NA NA NA
#5: 12 13 132203 14 NA 13
#6: NA 10 194200 12 NA 11
fun <- function(x) {
  MEAN <- mean(c(x[1], x[2], x[4], x[5]), na.rm = TRUE)
  CHECK <- sum(!is.na(c(x[1], x[2]))) > 0 & sum(!is.na(c(x[4], x[5]))) > 0
  MEAN * ifelse(CHECK, 1, NaN)
}
df$rowMeanIfDiverseData <- apply(df, 1, fun)
df
AA_var1 AA_var2 myid BB_var3 BB_var4 rowMeanIfDiverseData
1 NA NA 123456 10 12 NaN
2 NA 10 194200 12 NA 11
3 12 10 132200 NA NA NaN
4 12 NA 132201 NA 12 12
5 NA NA 132202 NA NA NaN
6 12 13 132203 14 NA 13
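For a more recent dplyr (>= 1.0) take on the same idea, a sketch using rowwise() and c_across() (not from the original answers; it assumes the prefixes are AA_ and BB_, as in the renamed example data used in the answers above):
library(dplyr)
df %>%
  rowwise() %>%
  mutate(rowMeanIfDiverseData = {
    aa <- c_across(starts_with("AA_"))
    bb <- c_across(starts_with("BB_"))
    # require non-NA data from both sources, otherwise return NA
    if (any(!is.na(aa)) && any(!is.na(bb))) mean(c(aa, bb), na.rm = TRUE) else NA_real_
  }) %>%
  ungroup()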
