I want to loop through one of the columns in my data frame, check a condition, and then set another column to 0 or 1. The code is:
for (i in v$R){
if( is.na(v$R) ==TRUE ){v$V5 = 0}else{v$V5=1}
}
But I get an error. The data frame, named 'v', is as follows. V5 has NA values; I want to replace them with 0 where the value in the R column is NA, and with 1 otherwise. How can I do that?
A B R V5
1 2 3 NA
4 5 NA NA
You can try ifelse like below
df <- within(df,V5 <- ifelse(is.na(R),0,1))
or use the unary + (which converts logical values to numeric ones)
df <- within(df,V5 <- +!is.na(R))
such that
> df
A B R V5
1 1 2 3 1
2 4 5 NA 0
If you would like to use loops, you can try
for (i in seq_along(df$R)) {
  if (is.na(df$R[i])) df$V5[i] <- 0 else df$V5[i] <- 1
}
DATA
df <- structure(list(A = c(1L, 4L), B = c(2L, 5L), R = c(3L, NA), V5 = c(NA,
NA)), class = "data.frame", row.names = c(NA, -2L))
Try this:
v$V5 <- ifelse(is.na(v$R), 0, 1)
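For reference, the loop in the question fails because is.na(v$R) returns one logical value per row, while if expects a single TRUE or FALSE (from R 4.2.0 a longer condition is an error; earlier versions warned and used only the first element). A minimal sketch, assuming the example data is stored under the name v as in the question, showing that the vectorized forms agree (the intermediate variable names are only for illustration):

```r
# Example data from the question (named `v` here, as in the original post)
v <- data.frame(A = c(1L, 4L), B = c(2L, 5L), R = c(3L, NA), V5 = c(NA, NA))

# is.na() is vectorized: it returns one logical per row, not a single value
na_flags <- is.na(v$R)

# Three equivalent ways to turn the flags into 1 (not NA) / 0 (NA)
via_ifelse  <- ifelse(na_flags, 0, 1)
via_plus    <- +!na_flags             # unary plus coerces logical to integer
via_integer <- as.integer(!na_flags)  # the same coercion, spelled out

v$V5 <- via_ifelse
print(v)
```

The as.integer() form does the same thing as the unary plus and may be easier to read at a glance.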
Given a data frame in R, how do I determine the number of non-blank values per row?
col1  col2  col3  rowCounts
1     3           2
      1     6     2
            1     1
                  0
This is how I did it in python:
df['rowCounts'] = df.apply(lambda x: x.count(), axis=1)
What is the R Code for this?
In base R (treating NA as blank), rowSums is a vectorized option: !is.na(df) is a logical matrix, and rowSums adds up its TRUE values (each counted as 1, i.e. a non-NA cell) for each row
df$rowCounts <- rowSums(!is.na(df))
-output
df
# col1 col2 col3 rowCounts
#1 1 3 NA 2
#2 NA 1 6 2
#3 NA NA 1 1
#4 NA NA NA 0
If the blank is ""
df$rowCounts <- rowSums(df != "", na.rm = TRUE)
Or with apply and MARGIN = 1, which is closer to the Python syntax (though it will be slower than rowSums)
df$rowCounts <- apply(df, 1, function(x) sum(!is.na(x)))
data
df <- structure(list(col1 = c(1L, NA, NA, NA), col2 = c(3L, 1L, NA,
NA), col3 = c(NA, 6L, 1L, NA)), class = "data.frame", row.names = c(NA,
-4L))
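If a blank can be either NA or "" in the same data frame, the two checks can be combined. A minimal sketch, assuming a hypothetical character data frame df2 (not part of the original post) with both kinds of blanks:

```r
# Hypothetical character data where a blank may be either NA or ""
df2 <- data.frame(col1 = c("1", "", NA, NA),
                  col2 = c("3", "1", "", NA),
                  col3 = c(NA, "6", "1", ""),
                  stringsAsFactors = FALSE)

# TRUE where a cell holds a real value: not NA and not the empty string.
# `df2 != ""` is NA wherever the cell is NA, but FALSE & NA is FALSE,
# so `filled` contains no NA and rowSums needs no na.rm
filled <- !is.na(df2) & df2 != ""
row_counts <- rowSums(filled)

# Same count with apply, closer to the row-wise pandas call
row_counts_apply <- apply(df2, 1, function(x) sum(!is.na(x) & x != ""))

print(unname(row_counts))
```

The counts are computed before any rowCounts column is added, so the new column is not counted as a value itself.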
Suppose I have a data frame like this:
1 8
2 12
3 2
5 -6
6 1
8 5
I want to add a row in the places where the 4 and 7 would have gone in the first column and have the second column for these new rows be 0, so adding these rows:
4 0
7 0
I have no idea how to do this in R.
In Excel, I could use a VLOOKUP inside an IFERROR. Is there a similar combination of functions in R to make this happen?
Edit: also, suppose that row 1 was missing and needed to be filled in similarly. Would this require another solution? What if I wanted to add rows until I reached ten rows?
Use tidyr::complete to fill in the missing sequence between min and max values.
library(tidyr)
library(rlang)
complete(df, V1 = min(V1):max(V1), fill = list(V2 = 0))
#Or using `seq`
#complete(df, V1 = seq(min(V1), max(V1)), fill = list(V2 = 0))
# V1 V2
# <int> <dbl>
#1 1 8
#2 2 12
#3 3 2
#4 4 0
#5 5 -6
#6 6 1
#7 7 0
#8 8 5
If we already know min and max of the dataframe we can use them directly. Let's say we want data from V1 = 1 to 10, we can do.
complete(df, V1 = 1:10, fill = list(V2 = 0))
If we don't know the column names beforehand, we can do something like :
col1 <- names(df)[1]
col2 <- names(df)[2]
complete(df, !!sym(col1) := 1:10, fill = as.list(setNames(0, col2)))
data
df <- structure(list(V1 = c(1L, 2L, 3L, 5L, 6L, 8L), V2 = c(8L, 12L,
2L, -6L, 1L, 5L)), class = "data.frame", row.names = c(NA, -6L))
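A base R route that is arguably closer to the VLOOKUP-inside-IFERROR idea from the question is a left join followed by an NA replacement. A minimal sketch, assuming the target keys are 1 to 10 (the names all_keys and filled are only for illustration):

```r
# Example data from the question
df <- data.frame(V1 = c(1L, 2L, 3L, 5L, 6L, 8L),
                 V2 = c(8L, 12L, 2L, -6L, 1L, 5L))

# Every key that should appear in the result (here 1 to 10)
all_keys <- data.frame(V1 = 1:10)

# Left join: keep every key and look up V2 where it exists (the "VLOOKUP" step)
filled <- merge(all_keys, df, by = "V1", all.x = TRUE)

# Keys with no match come back as NA; replace them with 0 (the "IFERROR" step)
filled$V2[is.na(filled$V2)] <- 0L

print(filled)
```

Because the key column is built independently of df, the same code also covers a missing first row or extending the data to ten rows.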
Say I have a list c of three data frames:
> c
$first
a b
1 1 2
2 2 3
3 3 4
$second
a b
1 2 4
2 4 6
3 6 8
$third
a b
1 3 6
2 6 9
3 9 12
I want to run an lapply on c that will do a custom function on each data frame.
The custom function depends on three numbers and I want the function to use a different number depending on which data frame it's evaluating.
I was thinking of utilizing the names 'first', 'second', and 'third', but I'm unsure how to get those names once they're inside the lapply function. It would look something like this:
lapply(c, function(list, num1 = 1, num2 = -1, num3 = 0) {num <- ifelse(names(list) == "first", num1, ifelse(names(list) == "second", num2, num3)); return(list*num)})
So the result I would want would be first multiplied by 1, second multiplied by -1, and third multiplied by 0.
The names function gives the values a and b (the column names) instead of the name of the data frame itself, so that doesn't work. Is there a function that would be able to give me the 'first', 'second', and 'third' values I need?
Or alternatively, is there a better way of doing this in a lapply function?
Maybe it would be easier with Map. We pass the numbers of interest in the order we want and do a simple multiplication
Map(`*`, lst1, c(1, -1, 0))
If the numbers are named
num1 <- setNames(c(1, -1, 0), c("first", "third", "second"))
then, match with the names of the list
Map(`*`, lst1, num1[names(lst1)])
#$first
# a b
#1 1 2
#2 2 3
#3 3 4
#$second
# a b
#1 0 0
#2 0 0
#3 0 0
#$third
# a b
#1 -3 -6
#2 -6 -9
#3 -9 -12
Or, if we decide to go with lapply, loop over the names of the list and use each name to extract both the list element and the corresponding element of the named vector
lapply(names(lst1), function(nm) lst1[[nm]] * num1[nm])
Or with sapply
sapply(names(lst1), function(nm) lst1[[nm]] * num1[nm], simplify = FALSE)
Or another option is map2 from purrr
library(purrr)
map2(lst1, num1[names(lst1)], `*`)
Note: c is the name of a base R function, and it is not recommended to give objects the same name as a function
data
lst1 <- list(first = structure(list(a = 1:3, b = 2:4), class = "data.frame",
row.names = c("1",
"2", "3")), second = structure(list(a = c(2L, 4L, 6L), b = c(4L,
6L, 8L)), class = "data.frame", row.names = c("1", "2", "3")),
third = structure(list(a = c(3L, 6L, 9L), b = c(6L, 9L, 12L
)), class = "data.frame", row.names = c("1", "2", "3")))
Besides the solutions by @akrun, you can also try the following code
mapply(`*`, lst1, c(1, -1, 0), SIMPLIFY = FALSE)
or
lapply(seq_along(lst1), function(k) lst1[[k]]*c(1,-1,0)[k])
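On the original question of how to see the element's name inside the function: lapply passes only the elements, so one option is to hand the names over alongside them. A minimal sketch, assuming a hypothetical helper multiplier_for that picks the number from the name, with the values the question asked for (1, -1, 0):

```r
# Example list from the question (called lst1, as in the answers)
lst1 <- list(first  = data.frame(a = 1:3, b = 2:4),
             second = data.frame(a = c(2L, 4L, 6L), b = c(4L, 6L, 8L)),
             third  = data.frame(a = c(3L, 6L, 9L), b = c(6L, 9L, 12L)))

# Hypothetical helper: choose the multiplier from the element's name
multiplier_for <- function(nm) {
  switch(nm, first = 1, second = -1, 0)  # the unnamed last value is the default
}

# Map walks over the data frames and their names in parallel,
# so the function sees both the data and the name
res <- Map(function(d, nm) d * multiplier_for(nm), lst1, names(lst1))

print(res)
```

Map takes the result names from its first named argument, so res keeps the names first, second and third.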
I have a dataset, data1, which has ME and PDR columns.
I want to create this third column: case which would look like this:
ME PDR case
1 2 2
NA 1 1
NA 1 1
1 2 2
NA NA NA
I tried this command, but it doesn't return 1 when there is a 1 in either column and no 2 in any of them.
data1$case=ifelse(data1$ME==2 | data1$PDR==2 ,2,ifelse(data1$ME==NA & data1$PDR==NA,NA,1))
We can use pmax
data1$case <- do.call(pmax, c(data1, na.rm = TRUE))
data1$case
#[1] 2 1 1 2 NA
Regarding the OP's code: == returns NA for any element that is NA, so we need to handle the NAs by adding a condition (& !is.na(ME), and likewise for PDR)
with(data1, ifelse((ME == 2 & !is.na(ME)) | (PDR == 2 & !is.na(PDR)),
2, ifelse(is.na(ME) &is.na(PDR), NA, 1)))
#[1] 2 1 1 2 NA
NOTE: Using == to check for NA is not recommended; there are functions that return a logical vector for missing values (is.na, complete.cases)
data
data1 <- structure(list(ME = c(1L, NA, NA, 1L, NA), PDR = c(2L, 1L, 1L,
2L, NA)), class = "data.frame", row.names = c(NA, -5L))
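To see why the original ifelse call returns NA instead of 1, it can help to look at how comparisons with NA behave. A minimal sketch on the example data (the names or_result and case_pmax are only for illustration):

```r
# Comparing anything with NA yields NA, never TRUE or FALSE
print(NA == 2)
print(NA == NA)
print(is.na(NA))  # is.na() is the reliable test for missingness

# Example data from the question
data1 <- data.frame(ME = c(1L, NA, NA, 1L, NA), PDR = c(2L, 1L, 1L, 2L, NA))

# In a logical OR, a TRUE on either side wins over NA, but FALSE | NA stays NA,
# and ifelse() returns NA wherever its test is NA
or_result <- data1$ME == 2 | data1$PDR == 2

# The parallel maximum, ignoring NA, avoids the comparisons altogether;
# a row where every value is NA stays NA
case_pmax <- pmax(data1$ME, data1$PDR, na.rm = TRUE)
print(case_pmax)
```

Rows 2 and 3 of or_result are NA rather than FALSE, which is why the nested ifelse never reaches the branch that would return 1 for them.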
I have a dataset like this (but this is just a subset; the real dataset has hundreds of ID_Desc variables), where each data point has a person's gender, and whether they checked off a number of descriptors (1) or not (NA):
Gender ID1_Desc_1 ID1_Desc_2 ID1_Desc_3 ID2_Desc_1 ID2_Desc_2 ID2_Desc_3 ID3_Desc_1 ID3_Desc_2 ID3_Desc_3
1 NA NA 1 NA NA 1 NA NA NA
2 NA 1 1 NA NA NA 1 1 NA
1 1 1 1 NA 1 NA NA NA NA
I'm trying to write a loop that will (1) check their gender, (2) based on their gender, check whether they checked off the same descriptor in the first list they saw (lists ID1 and ID2 for Gender=1 and lists ID1 and ID3 for Gender=2), and (3) create a new variable (Same#) that indicates whether they checked off the same descriptor in both lists (by writing a 1) or not (by writing a 0).
I've been working with this code, which seems to be checking their gender ok and creating the new variables (Same#), but it's writing 0's for everything, which is not correct:
for (i in 1:3){
assign(paste("Same",i,sep=""),
ifelse(Gender=="1",
ifelse(paste("ID1_Desc_",i,sep="")==paste("ID2_Desc_",i,sep=""),1,0),
ifelse(paste("ID1_Desc_",i,sep="")==paste("ID3_Desc_",i,sep=""),1,0)
)
)
}
Based on the data I provided, Same1 should be 0 0 1 (since Gender=1 and they chose Desc_3 in both the ID1 and ID2 lists), Same2 should be 0 1 0 (since Gender=2 and they chose Desc_2 in both the ID1 and ID3 lists), and Same3 should be 0 1 0 (since Gender=1 and they chose Desc_2 in both the ID1 and ID2 lists) but right now, all 3 come out as 0 0 0.
I know using loops may not be the best way to do this, but I'd really like to know how to do it with loop if it's possible. If not, anything that works would be incredibly appreciated. Thanks.
You may try this
ind1 <- grep("^ID1", colnames(df))
ind2 <- grep("^ID2", colnames(df))
ind3 <- grep("^ID3", colnames(df))
cond1 <- do.call(cbind,Map(`==` , df[ind1], df[ind2]))
cond2 <- do.call(cbind,Map(`==` , df[ind1], df[ind3]))
Finalind <- do.call(cbind, Map(`|`, as.data.frame(t(cond1)),
as.data.frame(t(cond2))))
res <- (!is.na(Finalind))+0
rownames(res) <- paste0("Same", 1:3)
t(res)
# Same1 Same2 Same3
#V1 0 0 1
#V2 0 1 0
#V3 0 1 0
cbind(df, t(res))
data
df <- structure(list(Gender = c(1L, 2L, 1L), ID1_Desc_1 = c(NA, NA,
1L), ID1_Desc_2 = c(NA, 1L, 1L), ID1_Desc_3 = c(1L, 1L, 1L),
ID2_Desc_1 = c(NA, NA, NA), ID2_Desc_2 = c(NA, NA, 1L), ID2_Desc_3 = c(1L,
NA, NA), ID3_Desc_1 = c(NA, 1L, NA), ID3_Desc_2 = c(NA, 1L,
NA), ID3_Desc_3 = c(NA, NA, NA)), .Names = c("Gender", "ID1_Desc_1",
"ID1_Desc_2", "ID1_Desc_3", "ID2_Desc_1", "ID2_Desc_2", "ID2_Desc_3",
"ID3_Desc_1", "ID3_Desc_2", "ID3_Desc_3"), class = "data.frame",
row.names = c(NA, -3L))
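Since the question asked for a loop: the original code compares the column names themselves (the string "ID1_Desc_1" is never equal to "ID2_Desc_1"), which is why every result is 0. A minimal sketch of a loop that fetches the columns with df[[name]] and, unlike the code above, also consults Gender; it assumes Gender is either 1 or 2 and that an unchecked box (NA) never counts as a match:

```r
# Example data from the question
df <- data.frame(
  Gender = c(1L, 2L, 1L),
  ID1_Desc_1 = c(NA, NA, 1L), ID1_Desc_2 = c(NA, 1L, 1L), ID1_Desc_3 = c(1L, 1L, 1L),
  ID2_Desc_1 = c(NA, NA, NA), ID2_Desc_2 = c(NA, NA, 1L), ID2_Desc_3 = c(1L, NA, NA),
  ID3_Desc_1 = c(NA, 1L, NA), ID3_Desc_2 = c(NA, 1L, NA), ID3_Desc_3 = c(NA, NA, NA)
)

for (i in 1:3) {
  # df[[name]] fetches the column; paste0() alone only builds its name
  id1 <- df[[paste0("ID1_Desc_", i)]]
  id2 <- df[[paste0("ID2_Desc_", i)]]
  id3 <- df[[paste0("ID3_Desc_", i)]]

  # Compare ID1 with ID2 for Gender 1, and with ID3 otherwise
  other <- ifelse(df$Gender == 1, id2, id3)

  # 1 only when both boxes were checked; NA (unchecked) counts as no match
  df[[paste0("Same", i)]] <- as.integer(!is.na(id1) & !is.na(other) & id1 == other)
}

print(df[, c("Gender", "Same1", "Same2", "Same3")])
```

Writing the results into df with df[[...]] <- keeps them next to the data, instead of creating loose variables with assign().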