Given a data frame in R, how do I determine the number of non-blank values per row?
col1 col2 col3 rowCounts
   1    3              2
        1    6         2
             1         1
                       0
This is how I did it in python:
df['rowCounts'] = df.apply(lambda x: x.count(), axis=1)
What is the R Code for this?
In base R (assuming a blank is NA), we can use rowSums as a vectorized option on the logical matrix !is.na(df). Each TRUE (a non-NA value) counts as 1, so rowSums returns the number of non-NA values in each row.
df$rowCounts <- rowSums(!is.na(df))
-output
df
# col1 col2 col3 rowCounts
#1 1 3 NA 2
#2 NA 1 6 2
#3 NA NA 1 1
#4 NA NA NA 0
If the blank is an empty string ("") instead of NA:
df$rowCounts <- rowSums(df != "", na.rm = TRUE)
Or with apply and MARGIN = 1, which is closer to the Python syntax (though it will be slower than rowSums):
df$rowCounts <- apply(df, 1, function(x) sum(!is.na(x)))
data
df <- structure(list(col1 = c(1L, NA, NA, NA), col2 = c(3L, 1L, NA,
NA), col3 = c(NA, 6L, 1L, NA)), class = "data.frame", row.names = c(NA,
-4L))
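If a data frame may contain both kinds of blank, the two tests can be combined. A minimal sketch, assuming a hypothetical character data frame df_chr (not part of the original question) in which both NA and "" count as blank:

```r
# Hypothetical example data: blanks appear both as NA and as "".
df_chr <- data.frame(col1 = c("1", "", NA),
                     col2 = c("3", "1", ""),
                     stringsAsFactors = FALSE)

# TRUE wherever a cell is NA or an empty string.
# (NA == "" is NA, but TRUE | NA is TRUE, so NA cells are still flagged.)
is_blank <- is.na(df_chr) | df_chr == ""

# Count the non-blank cells in each row.
row_counts <- rowSums(!is_blank)
```

Building the blank mask first keeps the counting step a single rowSums call, as in the answer above.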
Related
I want to loop through one of the columns in my data frame, check a condition, and then set another column to 0 or 1. The code is:
for (i in v$R){
if( is.na(v$R) ==TRUE ){v$V5 = 0}else{v$V5=1}
}
But I get an error. The data frame named 'v' is as follows. V5 contains NA values; I want to set V5 to 0 where the value in column R is NA, and to 1 otherwise. How can I do that?
A B R V5
1 2 3 NA
4 5 NA NA
You can try ifelse like below
df <- within(df,V5 <- ifelse(is.na(R),0,1))
or use unary + (which converts logical values to numeric ones)
df <- within(df,V5 <- +!is.na(R))
such that
> df
A B R V5
1 1 2 3 1
2 4 5 NA 0
If you would like to use loops, you can try
for (i in seq_along(df$R)) {
  # test one element at a time so that if() gets a single TRUE/FALSE
  if (is.na(df$R[i])) df$V5[i] <- 0 else df$V5[i] <- 1
}
DATA
df <- structure(list(A = c(1L, 4L), B = c(2L, 5L), R = c(3L, NA), V5 = c(NA,
NA)), class = "data.frame", row.names = c(NA, -2L))
Try this:
v$V5 <- ifelse(is.na(v$R), 0, 1)
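The loop in the question most likely fails because is.na(v$R) tests the whole column on every iteration, while if() expects a single TRUE or FALSE (recent R versions raise an error for a longer condition; older ones only warned). A small sketch of the difference, rebuilding the question's data frame v from the table shown:

```r
# The question's data, reconstructed from the printed table.
v <- data.frame(A = c(1L, 4L), B = c(2L, 5L), R = c(3L, NA), V5 = c(NA, NA))

# is.na(v$R) returns one value per row, not a single TRUE/FALSE.
na_flags <- is.na(v$R)
length(na_flags)

# Indexing with i hands if() exactly one value per iteration.
for (i in seq_along(v$R)) {
  v$V5[i] <- if (is.na(v$R[i])) 0 else 1
}
```

The loop also iterates over positions (seq_along) rather than over the values of R, so i can be used as an index.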
I am trying to do rowSums but I got zero for the last row and I need it to be "NA".
My df is
a b c sum
1 1 4 7 12
2 2 NA 8 10
3 3 5 NA 8
4 NA NA NA NA
I used this code, based on the question "Sum of two Columns of Data Frame with NA Values":
df$sum<-rowSums(df[,c("a", "b", "c")], na.rm=T)
Any advice will be greatly appreciated
For each row, check whether it is all NA; if so, return NA, otherwise apply sum with na.rm = TRUE. We select columns a, b and c explicitly, even though they are all the columns here, because the poster indicated that there might be additional ones.
sum_or_na <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
transform(df, sum = apply(df[c("a", "b", "c")], 1, sum_or_na))
giving:
a b c sum
1 1 4 7 12
2 2 NA 8 10
3 3 5 NA 8
4 NA NA NA NA
Note
df in reproducible form is assumed to be:
df <- structure(list(a = c(1L, 2L, 3L, NA), b = c(4L, NA, 5L, NA),
c = c(7L, 8L, NA, NA)),
row.names = c("1", "2", "3", "4"), class = "data.frame")
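For larger data, a vectorized variant of the same idea may be worth considering. This is only a sketch and not the approach from the answer above: it computes rowSums with na.rm = TRUE and then overwrites the rows that contain no values at all. The names cols, row_sum and all_missing are introduced here for illustration.

```r
# Same data as in the Note above.
df <- data.frame(a = c(1L, 2L, 3L, NA),
                 b = c(4L, NA, 5L, NA),
                 c = c(7L, 8L, NA, NA))
cols <- c("a", "b", "c")

# Sum each row, ignoring NA (an all-NA row sums to 0 at this point).
row_sum <- rowSums(df[cols], na.rm = TRUE)

# Rows with zero non-NA values should report NA instead of 0.
all_missing <- rowSums(!is.na(df[cols])) == 0
row_sum[all_missing] <- NA

df$sum <- row_sum
```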
I have a dataset, data1, which has ME and PDR columns.
I want to create this third column: case which would look like this:
ME PDR case
1 2 2
NA 1 1
NA 1 1
1 2 2
NA NA NA
I tried this command, but it doesn't return 1 when there is a 1 in either column and no 2 in either of them.
data1$case=ifelse(data1$ME==2 | data1$PDR==2 ,2,ifelse(data1$ME==NA & data1$PDR==NA,NA,1))
We can use pmax
data1$case <- do.call(pmax, c(data1, na.rm = TRUE))
data1$case
#[1] 2 1 1 2 NA
Regarding the OP's case with NA: == returns NA for any element that is NA, so we need to handle the NA by adding a condition (& !is.na(ME), and likewise for PDR)
with(data1, ifelse((ME == 2 & !is.na(ME)) | (PDR == 2 & !is.na(PDR)),
2, ifelse(is.na(ME) &is.na(PDR), NA, 1)))
#[1] 2 1 1 2 NA
NOTE: Using == to check for NA is not recommended (x == NA is always NA); there are dedicated functions that return a logical vector when there are missing values (is.na, complete.cases)
data
data1 <- structure(list(ME = c(1L, NA, NA, 1L, NA), PDR = c(2L, 1L, 1L,
2L, NA)), class = "data.frame", row.names = c(NA, -5L))
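The behaviour of == with NA described above can be checked on a small vector. A minimal sketch; the vector x and the result names are made up for illustration:

```r
x <- c(1L, NA, 2L)

# Comparing with a value gives NA wherever x is NA.
eq_two <- x == 2

# Comparing with NA gives NA everywhere, so it cannot detect missing values.
eq_na <- x == NA

# is.na() is the reliable test for missingness.
is_missing <- is.na(x)

# Guarding the comparison turns the NA into FALSE.
is_two <- !is.na(x) & x == 2
```

Because ifelse returns NA wherever its test is NA, an unguarded comparison in the question's command propagates NA into the result.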
Say I have this data frame in R.
df <- data.frame( col1 = c(3,4,'NA','NA'), col2 = c('NA','NA',1,5))
col1 col2
1 3 NA
2 4 NA
3 NA 1
4 NA 5
I would like to have new column like this
col1 col2 col3
1 3 NA 3
2 4 NA 4
3 NA 1 1
4 NA 5 5
How shall I do that?
At the moment your df does not contain true NA values but rather the strings 'NA'. You probably want true NA, as per @G5W's comment.
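One possible way to get there, sketched under the assumption that every column should end up numeric, is to replace the 'NA' strings and then convert the columns:

```r
# The data frame from the question: mixing numbers and 'NA' in c()
# makes both columns character.
df <- data.frame(col1 = c(3, 4, "NA", "NA"),
                 col2 = c("NA", "NA", 1, 5),
                 stringsAsFactors = FALSE)

# Replace the literal string "NA" with a true missing value.
df[df == "NA"] <- NA

# Convert every column from character to numeric, keeping the data frame.
df[] <- lapply(df, as.numeric)
```

If the data comes from a file, it is usually simpler to avoid the problem when reading it, for example through the na.strings argument of read.csv.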
Once we have true NA we can use:
df$col3 <- ifelse(is.na(df$col1), df$col2, df$col1)
or, with dplyr:
library(dplyr)
df$col3 <- coalesce(df$col1, df$col2)
We can use pmax or pmin to do this (from base R)
df$col3 <- do.call(pmax, c(df, na.rm=TRUE))
df$col3
#[1] 3 4 1 5
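The do.call form may look opaque at first. As a rough illustration, c(df, na.rm = TRUE) builds a list of the columns plus the na.rm argument, so the call amounts to passing each column to pmax by hand:

```r
df <- data.frame(col1 = c(3L, 4L, NA, NA), col2 = c(NA, NA, 1L, 5L))

# Every column of df becomes one argument of pmax, plus na.rm = TRUE.
via_do_call <- do.call(pmax, c(df, na.rm = TRUE))

# The same call written out for two columns.
direct <- pmax(df$col1, df$col2, na.rm = TRUE)
```

The do.call version has the advantage of working for any number of columns, provided they are all comparable.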
data
df <- structure(list(col1 = c(3L, 4L, NA, NA), col2 = c(NA, NA, 1L,
5L)), .Names = c("col1", "col2"), class = "data.frame", row.names = c("1",
"2", "3", "4"))
I have a dataset like this (but this is just a subset; the real dataset has hundreds of ID_Desc variables), where each data point has a person's gender, and whether they checked off a number of descriptors (1) or not (NA):
Gender ID1_Desc_1 ID1_Desc_2 ID1_Desc_3 ID2_Desc_1 ID2_Desc_2 ID2_Desc_3 ID3_Desc_1 ID3_Desc_2 ID3_Desc_3
1 NA NA 1 NA NA 1 NA NA NA
2 NA 1 1 NA NA NA 1 1 NA
1 1 1 1 NA 1 NA NA NA NA
I'm trying to write a loop that will (1) check their gender, (2) based on their gender, check whether they checked off the same descriptor in the first list they saw (lists ID1 and ID2 for Gender=1 and lists ID1 and ID3 for Gender=2), and (3) create a new variable (Same#) that indicates whether they checked off the same descriptor in both lists (by writing a 1) or not (by writing a 0).
I've been working with this code, which seems to be checking their gender ok and creating the new variables (Same#), but it's writing 0's for everything, which is not correct:
for (i in 1:3){
assign(paste("Same",i,sep=""),
ifelse(Gender=="1",
ifelse(paste("ID1_Desc_",i,sep="")==paste("ID2_Desc_",i,sep=""),1,0),
ifelse(paste("ID1_Desc_",i,sep="")==paste("ID3_Desc_",i,sep=""),1,0)
)
)
}
Based on the data I provided, Same1 should be 0 0 1 (since Gender=1 and they chose Desc_3 in both the ID1 and ID2 lists), Same2 should be 0 1 0 (since Gender=2 and they chose Desc_2 in both the ID1 and ID3 lists), and Same3 should be 0 1 0 (since Gender=1 and they chose Desc_2 in both the ID1 and ID2 lists) but right now, all 3 come out as 0 0 0.
I know using loops may not be the best way to do this, but I'd really like to know how to do it with loop if it's possible. If not, anything that works would be incredibly appreciated. Thanks.
You may try this
ind1 <- grep("^ID1", colnames(df))
ind2 <- grep("^ID2", colnames(df))
ind3 <- grep("^ID3", colnames(df))
cond1 <- do.call(cbind,Map(`==` , df[ind1], df[ind2]))
cond2 <- do.call(cbind,Map(`==` , df[ind1], df[ind3]))
Finalind <- do.call(cbind, Map(`|`, as.data.frame(t(cond1)),
as.data.frame(t(cond2))))
res <- (!is.na(Finalind))+0
rownames(res) <- paste0("Same", 1:3)
t(res)
# Same1 Same2 Same3
#V1 0 0 1
#V2 0 1 0
#V3 0 1 0
cbind(df, t(res))
data
df <- structure(list(Gender = c(1L, 2L, 1L), ID1_Desc_1 = c(NA, NA,
1L), ID1_Desc_2 = c(NA, 1L, 1L), ID1_Desc_3 = c(1L, 1L, 1L),
ID2_Desc_1 = c(NA, NA, NA), ID2_Desc_2 = c(NA, NA, 1L), ID2_Desc_3 = c(1L,
NA, NA), ID3_Desc_1 = c(NA, 1L, NA), ID3_Desc_2 = c(NA, 1L,
NA), ID3_Desc_3 = c(NA, NA, NA)), .Names = c("Gender", "ID1_Desc_1",
"ID1_Desc_2", "ID1_Desc_3", "ID2_Desc_1", "ID2_Desc_2", "ID2_Desc_3",
"ID3_Desc_1", "ID3_Desc_2", "ID3_Desc_3"), class = "data.frame",
row.names = c(NA, -3L))
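Since the question asks specifically for a loop, here is one possible sketch. The original attempt compares the strings produced by paste() (for example "ID1_Desc_1" with "ID2_Desc_1"), which are never equal, rather than the columns those names refer to. The version below looks the columns up with [[ ]] and, unlike the answer above, picks the second list according to Gender; on the sample data both give the same result. The names first and second are introduced here for illustration.

```r
df <- data.frame(
  Gender     = c(1L, 2L, 1L),
  ID1_Desc_1 = c(NA, NA, 1L), ID1_Desc_2 = c(NA, 1L, 1L), ID1_Desc_3 = c(1L, 1L, 1L),
  ID2_Desc_1 = c(NA, NA, NA), ID2_Desc_2 = c(NA, NA, 1L), ID2_Desc_3 = c(1L, NA, NA),
  ID3_Desc_1 = c(NA, 1L, NA), ID3_Desc_2 = c(NA, 1L, NA), ID3_Desc_3 = c(NA, NA, NA)
)

for (i in 1:3) {
  # Descriptor i in the first list.
  first <- df[[paste0("ID1_Desc_", i)]]
  # Descriptor i in the list that depends on gender: ID2 for 1, ID3 otherwise.
  second <- ifelse(df$Gender == 1,
                   df[[paste0("ID2_Desc_", i)]],
                   df[[paste0("ID3_Desc_", i)]])
  # 1 if the descriptor was checked (non-NA) in both lists, else 0.
  df[[paste0("Same", i)]] <- as.integer(!is.na(first) & !is.na(second))
}
```

Writing the result into df[[...]] keeps the new variables in the data frame instead of creating loose objects with assign.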