I have these columns
utility pass
2 None
3 NA
-1 None
-2 NA
The indicator should be 1 if pass is None and utility > 0.
Desired output:
utility pass indicator
2 None 1
3 NA 0
-1 None 0
-2 NA 0
One possibility could be:
with(df, +(grepl("None", pass, fixed = TRUE) * utility > 0))
[1] 1 0 0 0
Assuming that NA is a real missing value and not the character string "NA", we can achieve the desired output as follows:
With dplyr (case_when would also work):
df %>%
mutate(indicator = ifelse( !is.na(pass) & utility >0 , 1, 0))
utility pass indicator
1 2 None 1
2 3 <NA> 0
3 -1 None 0
4 -2 <NA> 0
Without relying on external packages, we can do the following with the base package:
df$indicator <- ifelse( !is.na(df$pass) & df$utility >0 , 1, 0)
Using within:
within(df, {
indicator <- ifelse(!is.na(pass) & utility >0, 1, 0)
})
utility pass indicator
1 2 None 1
2 3 <NA> 0
3 -1 None 0
4 -2 <NA> 0
Data:
df <- structure(list(utility = c(2L, 3L, -1L, -2L), pass = structure(c(1L,
NA, 1L, NA), .Label = "None", class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
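If it is not certain whether the NA entries are real missing values or the string "NA", testing for "None" directly sidesteps the question. The following is only a sketch: it rebuilds the example data with pass as a character column (an assumption, the data above stores it as a factor) and relies on %in%, which returns FALSE rather than NA for missing values.

```r
# Example data rebuilt with a character column (assumption for this sketch)
df <- data.frame(utility = c(2, 3, -1, -2),
                 pass = c("None", NA, "None", NA),
                 stringsAsFactors = FALSE)

# %in% yields FALSE for NA, so the combined condition never becomes NA
df$indicator <- as.integer(df$pass %in% "None" & df$utility > 0)
df$indicator
```

Because the test names the value it is looking for, any other entry in pass (including the literal string "NA") is treated as not matching.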
I have a data set of boolean variables and I am trying to generate a new variable based on 3 of the existing booleans using ifelse().
The rules I'd like to implement are:
If any of the three columns have value 1, 1
If all of the three columns have value 0, 0
If all of the three columns have value NA, NA
If the three columns have some combination of 0 and NA, 0
Here is the code to generate a sample with 3 variables that I want to use to create a fourth:
df <- structure(list(var1 = c(NA, NA, NA, 0,1),
var2 = c(1, NA, 0,0, 1),
var3 = c(NA, NA, NA,0,1)), class = "data.frame", row.names = c(NA, -5L))
I have tried the following to generate the new variable according to my desired rules:
df$newvar1 <-ifelse(df$var1 == 1 | df$var2 == 1 |df$var3 == 1, 1,
ifelse((is.na(df$var1) & is.na(df$var2) & is.na(df$var3)), NA,0))
df$newvar2 <- ifelse((is.na(df$var1)|df$var1==0) &
(is.na(df$var2)|df$var2==0) &
(is.na(df$var3)|df$var3==0),0,
ifelse(df$var1 == 1 | df$var2 == 1 |df$var3 == 1, 1,
ifelse(is.na(df$var1) & is.na(df$var2) & is.na(df$var3), NA,NA)))
df$newvar3 <-ifelse(df$var1 == 1 | df$var2 == 1 |df$var3 == 1, 1,
ifelse((is.na(df$var1) & is.na(df$var2) & is.na(df$var3)), NA,
ifelse((is.na(df$var1)|df$var1==0) &
(is.na(df$var2)|df$var2==0) &
(is.na(df$var3)|df$var3==0),0,0)))
I don't understand why newvar1 and newvar3 have NA values for rows that combine NAs and 0s, when both examples use "&" between the is.na() conditions (row 3 in the results).
I am assuming that NAs don't show up in newvar2 because the first ifelse() call takes precedence.
Any insight to the ifelse() function or advice on how to get the results I'm looking for would be really helpful.
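A short sketch of what happens in row 3 may help. In R, NA == 1 is NA, and NA | FALSE is also NA, so the first test of the outer ifelse() is NA for that row and ifelse() returns NA without ever reaching the nested call. Row 1 works only because NA | TRUE is TRUE. The names vars, any_one and all_na below are made up for this illustration.

```r
df <- data.frame(var1 = c(NA, NA, NA, 0, 1),
                 var2 = c(1, NA, 0, 0, 1),
                 var3 = c(NA, NA, NA, 0, 1))

# Row 3 is (NA, 0, NA): the test is NA | FALSE | NA, which is NA
row3_test <- df$var1[3] == 1 | df$var2[3] == 1 | df$var3[3] == 1
row3_test

# One way to keep NA out of the tests: count matches with na.rm = TRUE
vars    <- df[c("var1", "var2", "var3")]
any_one <- rowSums(vars == 1, na.rm = TRUE) > 0   # at least one 1 in the row
all_na  <- rowSums(!is.na(vars)) == 0             # every column is NA
df$newvar <- ifelse(any_one, 1, ifelse(all_na, NA, 0))
df
```

Since any_one and all_na are never NA themselves, each row falls into exactly one branch.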
Here is another possible option using rowSums:
df$newvar <- +(rowSums(df, na.rm = TRUE) * NA ^ (rowSums(!is.na(df)) == 0) > 0)
# var1 var2 var3 newvar
#1 NA 1 NA 1
#2 NA NA NA NA
#3 NA 0 NA 0
#4 0 0 0 0
#5 1 1 1 1
This gives your expected results:
df$newvar <- 0
df$newvar[Reduce(`|`, lapply(df[1:3], `%in%`, 1))] <- 1
df$newvar[Reduce(`&`, lapply(df[1:3], is.na))] <- NA
df
# var1 var2 var3 newvar
# 1 NA 1 NA 1
# 2 NA NA NA NA
# 3 NA 0 NA 0
# 4 0 0 0 0
# 5 1 1 1 1
This defaults to 0 and only overwrites rows that meet one of the two known conditions. Any row that contains no 1 and is not entirely NA keeps that default, which covers your 0/NA combinations but also any unexpected value (a 2, say). It's not difficult to test for that, but it wasn't in your logic.
I'm translating Stata code to R code, but now I'm having some n00b troubles like this one.
This is my Stata code:
gen aposentadofam=1 if proprendaposent > 0 & proprendaposent ~=.;
replace aposentadofam=0 if proprendaposent == 0 | proprendaposent ==.;
And this is what I tried to do in R:
# pemg <- mutate(pemg, aposentadofam = NA_real_)
# pemg <- mutate(pemg, aposentadofam = case_when(proprendaposent >0 & !is.na(proprendaposent) ~ 1, TRUE ~ aposentadofam))
# pemg <- mutate(pemg, aposentadofam = case_when(proprendaposent==0 | is.na(proprendaposent) ~ 0, TRUE ~ aposentadofam))
The line with is.na() seems to be running correctly, but the one with !is.na() does not. It gives me this error message:
LHS of case 1 (`proprendaposent > 0 & !is.na(proprendaposent) ~ 1`) must be a logical vector, not a `formula` object.
What should I do?
Not enough reputation to comment (yet!) but I just ran the following using your example code (in R) with no issues. How exactly does your data/code differ?
library(dplyr)
pemg <- data.frame(c(1, 2, 3.1, 4, 5.5, 0, 0, 0, NA))
colnames(pemg) <- "proprendaposent"
pemg <- mutate(pemg, aposentadofam = NA_real_)
pemg <- mutate(pemg, aposentadofam = case_when(proprendaposent >0 & !is.na(proprendaposent) ~ 1, TRUE ~ aposentadofam))
pemg <- mutate(pemg, aposentadofam = case_when(proprendaposent==0 | is.na(proprendaposent) ~ 0, TRUE ~ aposentadofam))
pemg
which outputs:
proprendaposent aposentadofam
1 1.0 1
2 2.0 1
3 3.1 1
4 4.0 1
5 5.5 1
6 0.0 0
7 0.0 0
8 0.0 0
9 NA 0
Often, within() is most illustrative.
dat <- within(dat, {
aposentadofam <- NA
aposentadofam[proprendaposent > 0 & !is.na(proprendaposent)] <- 1
aposentadofam[proprendaposent == 0 | is.na(proprendaposent)] <- 0
})
Or using transform().
dat <- transform(dat, aposentadofam=ifelse(proprendaposent %in% c(0, NA), 0, 1))
Both functions come with base R, so you won't need any extra packages (which are rarely necessary for this kind of task anyway).
# proprendaposent aposentadofam
# 1 0 0
# 2 4 1
# 3 0 0
# 4 0 0
# 5 1 1
# 6 3 1
# 7 1 1
# 8 1 1
# 9 0 0
# 10 NA 0
# 11 NA 0
# 12 3 1
Data
dat <- structure(list(proprendaposent = c(0L, 4L, 0L, 0L, 1L, 3L, 1L,
1L, 0L, NA, NA, 3L)), class = "data.frame", row.names = c(NA,
-12L))
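One difference between the two versions is worth checking against your data. The Stata code leaves aposentadofam missing when proprendaposent is negative, and so does the within() version, whereas the ifelse() with %in% assigns 1 to everything that is neither 0 nor NA. The sketch below uses a made-up vector x with one negative value to show this.

```r
x <- c(-1, 0, 2, NA)   # hypothetical values, including a negative one

# Step-by-step version, mirroring the Stata gen/replace pair
a <- rep(NA_real_, length(x))
a[!is.na(x) & x > 0]  <- 1
a[is.na(x) | x == 0]  <- 0
a   # the negative value stays NA

# One-line version: anything that is not 0 or NA becomes 1
b <- ifelse(x %in% c(0, NA), 0, 1)
b   # the negative value becomes 1
```

If the proportion can never be negative, the two give the same result and the shorter form is fine.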
I have two continuous variables that I dummy coded into categorical variables with 2 levels each. Each of these variables is coded either 0 or 1 for low and high levels of the variable. Both variables were z-scored, so the sign tells me whether a value falls below or above the mean.
MeanAboveAvo <- ifelse(Dataframeforstudy2$avo < 0, 0, 1)
MeanAboveAnx <- ifelse(Dataframeforstudy2$anx < 0, 0 , 1)
My question is how do I dummy code these two variables together? I want to create a single variable with 4 different levels using these two variables (MeanAboveAvo & MeanAboveAnx). I want a single variable that is coded with either 1,2,3,4 and the 1 is (0,0), 2 is (0,1), 3 is (1,0) and 4 is (1,1).
My code is this:
stats <- while(MeanAboveAnx = 0 || MeanAboveAvx = 1) {
if(MeanAboveAnx = 0 & MeanAboveAvo = 0 ){
1
}
else if (MeanAboveAnx = 0 & MeanAboveAvo = 1){
2
}
else if(MeanAboveAnx = 1 & MeanAboveAvo = 0){
3
}
else {
4
}}
It is not coding it at all and I am getting an error message. What can I do differently to get the results I want?
Thank you for your help in advance!
Base R has the function interaction() precisely for this type of problem. The code below could be a one-liner; I have left it in two steps to make it clearer.
f <- with(df, interaction(anx, avo, lex.order = TRUE))
as.integer(f)
# [1] 1 2 1 1 2 3 3 3 4 2
Edit.
I was using the data in TomasIsCoding's answer; here is a solution closer to the question's problem, with anx and avo as z-scores. Thanks to @KonradRudolph for his comment.
f <- with(df, interaction(as.integer(anx < 0),
as.integer(avo < 0),
lex.order = TRUE))
f
# [1] 1.1 0.1 0.1 1.0 0.0 0.1 1.1 1.1 1.1 1.0
#Levels: 0.0 0.1 1.0 1.1
as.integer(f)
# [1] 4 2 2 3 1 2 4 4 4 3
Data.
set.seed(1234)
df <- data.frame(anx = rnorm(10), avo = rnorm(10))
Categorical variables in R don’t need to be numeric (and making them so has several drawbacks!): there’s consequently no need for your ifelse:
MeanAboveAvo <- Dataframeforstudy2$avo >= 0  # TRUE where your ifelse gave 1
MeanAboveAnx <- Dataframeforstudy2$anx >= 0
Next, the code using these encodings contains multiple mistakes:
It’s not clear what the while here is supposed to mean.
All = signs need to be converted to == because you’re performing comparisons.
if, unlike ifelse, isn’t vectorised so you cannot use it to assign its result to a vector of length > 1.
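For illustration only, here is how the if/else chain from the question might look once those three points are addressed: no while, == for the comparisons, and the vectorised ifelse() in place of if. The two input vectors are made-up 0/1 data covering all four combinations, not the asker's data.

```r
# Hypothetical 0/1 inputs, one element per combination
MeanAboveAnx <- c(0, 0, 1, 1)
MeanAboveAvo <- c(0, 1, 0, 1)

# Nested ifelse() evaluates element-wise, so no loop is needed
stats <- ifelse(MeanAboveAnx == 0 & MeanAboveAvo == 0, 1,
         ifelse(MeanAboveAnx == 0 & MeanAboveAvo == 1, 2,
         ifelse(MeanAboveAnx == 1 & MeanAboveAvo == 0, 3, 4)))
stats
```

This keeps the structure of the original attempt; it is not necessarily the most idiomatic encoding.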
If I understand you correctly, then the following is one (canonical) way of encoding the stats:
stats <- paste(MeanAboveAvo, MeanAboveAnx)
This converts the logical vectors into character vectors and concatenates them element-wise. Once again, it is unnecessary (and unconventional!) in R to convert these categories into a numeric variable; though it may make sense to convert it to a factor via as.factor.
Given your mapping rule for anx and avo, you don't actually need a while loop: the mapping is just a binary-to-decimal conversion shifted by 1. In this case, you can do it like below
df <- within(df,code <- 2*anx + avo + 1)
such that
> df
anx avo code
1 0 0 1
2 0 1 2
3 0 0 1
4 0 0 1
5 0 1 2
6 1 0 3
7 1 0 3
8 1 0 3
9 1 1 4
10 0 1 2
Dummy Data
df <- structure(list(anx = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L
), avo = c(0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-10L))
Try this:
as.integer(factor(paste0(MeanAboveAvo, MeanAboveAnx)))
For example:
set.seed(123)
x <- sample(0:1, 10, T) # [1] 0 0 0 1 0 1 1 1 0 0
y <- sample(0:1, 10, T) # [1] 1 1 1 0 1 0 1 0 0 0
as.integer(factor(paste0(x, y)))
# [1] 2 2 2 3 2 3 4 3 1 1
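One caveat with this approach: factor() builds its levels from the combinations that actually occur, so the integer codes shift if one combination is absent. A sketch with made-up vectors in which (0,0) never appears, and explicit levels as one possible fix (the name lv is just for this example):

```r
avo <- c(0, 0, 1, 1, 1)
anx <- c(1, 1, 0, 1, 1)   # the pair (0,0) does not occur

# Levels inferred from the data: "01" becomes code 1 instead of 2
shifted <- as.integer(factor(paste0(avo, anx)))
shifted

# Spelling out all four levels keeps the codes fixed at 1..4
lv <- c("00", "01", "10", "11")
fixed <- as.integer(factor(paste0(avo, anx), levels = lv))
fixed
```

With explicit levels, the code for a given pair is the same no matter which pairs are present in the data.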
I have a very simple problem. I am trying to set the value of column X to 0 if Y in row n does not equal Y in row n-1. My issue is that I do not know how to reference a previous row value in R, and then use that value to set the value of another column.
As an example:
Y X
1 5
1 1
2 0
2 2
X[3,2] is 0 because Y[3,1] does not equal Y[2,1].
I basically need to find all instances of this in a large data set and set the corresponding X value to 0. In pseudocode:
data$X <- 0 if data$Y[n] != data$Y[n-1]
Is there a simple solution to this in R? It really feels as though there should be.
Thank you
Similarly to the post from @markus, with dplyr you can do:
df %>%
mutate(X = (Y == lag(Y, default = first(Y))) * X)
Y X
1 1 5
2 1 1
3 2 0
4 2 2
Given
Y <- c(1, 1, 2, 2)
X <- c(5, 1, 10, 2)
an option would be diff
X * (c(0, diff(Y)) == 0)
# [1] 5 1 0 2
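The idea is to check whether Y[i] - Y[i-1] equals zero, which gives a logical vector that we multiply by X. The leading 0 stands in for the first row, which has no previous row and is therefore left unchanged.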
Another base R option
with(df, X * c(TRUE, !(Y[-1] - Y[-length(Y)])))
#[1] 5 1 0 2
Or using dplyr
library(dplyr)
df %>%
mutate(X = c(X[1], ((duplicated(Y) * X)[-1])))
# Y X
#1 1 5
#2 1 1
#3 2 0
#4 2 2
data
df <- structure(list(Y = c(1L, 1L, 2L, 2L), X = c(5L, 1L, 0L, 2L)),
class = "data.frame", row.names = c(NA, -4L))
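The diff-based answers need a numeric Y, and the duplicated() version assumes a value of Y never comes back after a different one. A sketch that compares each element with its predecessor directly avoids both assumptions. The helper name zero_on_change is hypothetical, and the sketch assumes Y has no missing values.

```r
# Set X to 0 wherever Y differs from the previous row (first row is kept)
zero_on_change <- function(X, Y) {
  prev <- c(Y[1], Y[-length(Y)])   # Y shifted down by one row
  X[Y != prev] <- 0
  X
}

Y <- c("a", "a", "b", "a")   # "a" comes back in row 4
X <- c(5, 1, 10, 2)

res_shift <- zero_on_change(X, Y)
res_shift

# For comparison: duplicated() treats row 4 as a repeat and keeps its X
res_dup <- c(X[1], (duplicated(Y) * X)[-1])
res_dup
```

On data where Y is sorted or grouped, the two give the same result; they differ only when a value reappears later.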
I would like to check whether the previous row's value is the same as the current one (for more than one variable, and also against a list of allowed values). How do I write the code for this? I read that the 'apply' functions can be used.
I searched this topic here before posting this question found somewhat similar but unable to find the exact one. I'm quite new to R.
Here is my sample table: (Flag needs to be done based on conditions)
Ticket No V1 V2 Flag
Tkt10256 1 X 0
Tkt10257 1 aa 0
Tkt10257 2 bb 1
Tkt10257 3 x 0
Tkt10260 1 cc 0
Tkt10260 2 aa 1
Tkt10262 3 bb 0
I have to Flag based on the below conditions (if all the conditions are satisfied then mark as 1)
Variable 2 should be the following one of 4 names (aa, bb, cc, dd)
Variable 1 should be the different from previous row
Ticket number has to be the same as previous row
Thanks in advance for the help !
An approach without looping:
indx1 <- with(df, V2 %in% paste0(letters[1:4], letters[1:4]) )
indx2 <- with(df, c(TRUE,V1[-1]!=V1[-length(V1)]))
indx3 <- with(df, c(FALSE,Ticket.No[-1]==Ticket.No[-nrow(df)]))
df$Flag <- (indx1 & indx2 & indx3)+0
df$Flag
#[1] 0 0 1 0 0 1 0
data
df <- structure(list(Ticket.No = c("Tkt10256", "Tkt10257", "Tkt10257",
"Tkt10257", "Tkt10260", "Tkt10260", "Tkt10262"), V1 = c(1L, 1L,
2L, 3L, 1L, 2L, 3L), V2 = c("X", "aa", "bb", "x", "cc", "aa",
"bb"), Flag = c(0L, 0L, 1L, 0L, 0L, 1L, 0L)), .Names = c("Ticket.No",
"V1", "V2", "Flag"), class = "data.frame", row.names = c(NA,
-7L))
One more:
Check this on your larger data. I'm not exactly sure if duplicated is the right function to use there. If the numbers in the TicketNo column are increasing (i.e. the Xs in TktXXXXX), then it should work fine.
dat2 <- dat[dat$V2 %in% c("aa", "bb", "cc", "dd"),]
rn <- rownames(dat2)[duplicated(dat2[[1]]) & !c(FALSE, diff(dat2[[2]]) == 0)]
dat$Flag <- (rownames(dat) %in% rn)+0
dat
# TicketNo V1 V2 Flag
# 1 Tkt10256 1 X 0
# 2 Tkt10257 1 aa 0
# 3 Tkt10257 2 bb 1
# 4 Tkt10257 3 x 0
# 5 Tkt10260 1 cc 0
# 6 Tkt10260 2 aa 1
# 7 Tkt10262 3 bb 0
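One situation where the subset-first approach can differ from a strict row-by-row comparison: after filtering, "previous row" means the previous row that survived the filter, not the previous row of the original data. The three-row data frame below is made up to show this; whether it matters depends on which reading of "previous row" you need.

```r
# Hypothetical data: row 2 is filtered out, rows 2 and 3 share V1 = 2
dat <- data.frame(TicketNo = c("Tkt1", "Tkt1", "Tkt1"),
                  V1 = c(1L, 2L, 2L),
                  V2 = c("aa", "x", "bb"),
                  stringsAsFactors = FALSE)
ok <- c("aa", "bb", "cc", "dd")

# Subset first: row 3 is compared with row 1, so V1 looks changed
dat2 <- dat[dat$V2 %in% ok, ]
rn <- rownames(dat2)[duplicated(dat2[[1]]) & !c(FALSE, diff(dat2[[2]]) == 0)]
flag_subset <- (rownames(dat) %in% rn) + 0
flag_subset

# Compare against the original previous row: row 3 has the same V1 as row 2
same_ticket <- c(FALSE, dat$TicketNo[-1] == dat$TicketNo[-nrow(dat)])
v1_changed  <- c(FALSE, diff(dat$V1) != 0)
flag_full <- (dat$V2 %in% ok & v1_changed & same_ticket) + 0
flag_full
```

On the sample table in the question both approaches agree, so the difference only shows up on larger data.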
A variation on @akrun's answer:
with(df,
V2 %in% c("aa","bb","cc","dd") &
c(FALSE,diff(V1) != 0) &
c(FALSE,head(Ticket.No, -1)) == Ticket.No
) + 0
#[1] 0 0 1 0 0 1 0
Try:
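ddf$Flag <- 0  # initialise; the first row has no previous row, so it stays 0
for(i in 2:nrow(ddf)){
# all three conditions must hold for row i compared with row i-1
ddf$Flag[i] = ifelse( ddf$V2[i] %in% c('aa', 'bb', 'cc', 'dd')
&& ddf$V1[i] != ddf$V1[(i-1)]
&& ddf$TicketNo[i] == ddf$TicketNo[(i-1)]
,1,0)
}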
ddf
TicketNo V1 V2 Flag
1 Tkt10256 1 X 0
2 Tkt10257 1 aa 0
3 Tkt10257 2 bb 1
4 Tkt10257 3 x 0
5 Tkt10260 1 cc 0
6 Tkt10260 2 aa 1
7 Tkt10262 3 bb 0