Using If/Else on a data frame - r

I have a data set which looks something like
data<-c(0,1,2,3,4,2,3,1,4,3,2,4,0,1,2,0,2,1,2,0,4)
frame<-as.data.frame(data)
I now want to create a new variable within this data frame. If the column "data" reports a number of 2 or more, I want it to have "2" in that row, and if there is a 1 or 0 (e.g. the first two observations), I want the new variable to have a "1" for that observation.
I am trying to do this using the following code:
frame$twohouses<- if (any(frame$data>=2)) {frame$twohouses=2} else {frame$twohouses=1}
However, when I run these three lines of script, every observation in the column "twohouses" is coded with a 2, although a number of them should be coded with a 1.
So my question: what am I doing wrong with my if/else line? Or is there an alternative way to do this?
My question is similar to this one:
Using ifelse on factor in R
but no one has answered that question.

Use ifelse:
frame$twohouses <- ifelse(frame$data>=2, 2, 1)
frame
   data twohouses
1     0         1
2     1         1
3     2         2
4     3         2
5     4         2
...
16    0         1
17    2         2
18    1         1
19    2         2
20    0         1
21    4         2
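The difference between if and ifelse:
if is a control flow statement that takes a single logical value as its condition. In your code, any(frame$data>=2) collapses the whole column to one TRUE, so the first branch runs once and the value 2 is assigned to every row.
ifelse is a vectorised function, taking vectors as all its arguments and returning one result per element.
The help page for if, accessible via ?"if", will also point you to ?ifelse.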
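A minimal sketch of that difference, using the data from the question. It shows why the original attempt filled the column with 2 while ifelse gives one answer per row. The column name twohouses_if and the variable cond are only illustrative.

```r
data <- c(0,1,2,3,4,2,3,1,4,3,2,4,0,1,2,0,2,1,2,0,4)
frame <- as.data.frame(data)

# any() reduces the whole column to one logical value ...
cond <- any(frame$data >= 2)

# ... so if() picks a single branch, and that one value is recycled
# down the entire column (illustrative column name).
frame$twohouses_if <- if (cond) 2 else 1

# ifelse() tests each element and returns one result per row.
frame$twohouses <- ifelse(frame$data >= 2, 2, 1)

table(frame$twohouses_if)  # only the value 2 appears
table(frame$twohouses)     # a mix of 1s and 2s
```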

Try this
frame$twohouses <- ifelse(frame$data>1, 2, 1)
frame
   data twohouses
1     0         1
2     1         1
3     2         2
4     3         2
5     4         2
6     2         2
7     3         2
8     1         1
9     4         2
10    3         2
11    2         2
12    4         2
13    0         1
14    1         1
15    2         2
16    0         1
17    2         2
18    1         1
19    2         2
20    0         1
21    4         2

Related

Create dataframe with repeating string from scratch in R

I would like to create a dataframe that repeats the period from 1 to 10 and assigns an ID to each block of ten rows, 42,574 times,
so that I would end up with a 425,740-row dataframe.
I tried to create a dataframe using the following code
periodstring <- as.numeric(gl(10, 42574))
periods <- as.data.frame(periodstring)
but that sorts the numbers, and other approaches did not quite work. Is there a simple way to do this?
Thanks in advance.
Another option using rep:
data.frame(Period=rep(1:10,times=42574),
ID=rep(1:42574,each=10))
Output sample:
   Period ID
1       1  1
2       2  1
3       3  1
4       4  1
5       5  1
6       6  1
7       7  1
8       8  1
9       9  1
10     10  1
11      1  2
12      2  2
13      3  2
14      4  2
15      5  2
16      6  2
17      7  2
18      8  2
19      9  2
20     10  2
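A small-scale sketch of the same idea, with hypothetical sizes n_id and n_period in place of 42574 and 10 so the result is easy to inspect. The times argument repeats the whole sequence, while each repeats every element in place, which is also why gl(10, 42574) came out in sorted blocks.

```r
n_id <- 3       # hypothetical; the question uses 42574
n_period <- 4   # hypothetical; the question uses 10

panel <- data.frame(
  Period = rep(seq_len(n_period), times = n_id),  # 1..4, 1..4, 1..4
  ID     = rep(seq_len(n_id), each = n_period)    # 1,1,1,1, 2,2,2,2, ...
)

nrow(panel)     # n_id * n_period rows
head(panel, 5)
```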

Retain Max Value of Vector until vector Catches up

I have some cumulative count data. Because of reporting inaccuracies, the cumulative sum sometimes decreases, such as 0 1 2 2 3 3 2 4 5.
I would like to create a new vector that retains the largest value reported and carries it forward until the cumulative count data catches up. So the corrected version of the above would be 0 1 2 2 3 3 3 4 5.
I tried the following:
library(dplyr)  # assumption: lag() below is dplyr::lag; stats::lag would not shift a plain vector
mydf <- data.frame(ts1 = c(0,1,1,1,2,3,2,2,3,4,4,5))
mydf$lag1 <- lag(mydf[,1])
mydf$corrected <- ifelse(is.na(mydf[,2]),mydf[,1],
ifelse(mydf[,2] > mydf[,1], mydf[,2], mydf[,1]))
which returns:
   ts1 lag1 corrected
1    0   NA         0
2    1    0         1
3    1    1         1
4    1    1         1
5    2    1         2
6    3    2         3
7    2    3         3
8    2    2         2
9    3    2         3
10   4    3         4
11   4    4         4
12   5    4         5
This works the first time a value is smaller than the previous one (row 7), but it fails the second time (row 8).
I thought there must be a better way of doing this: a new vector that is equal to the input vector unless the value decreases, in which case it retains the prior value until the input vector exceeds that retained value.
You are looking for cummax:
cummax(mydf$ts1)
#[1] 0 1 1 1 2 3 3 3 3 4 4 5
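As a quick check, here is the example from the question text run through cummax, which returns the running maximum of a vector. The names reported and corrected are illustrative.

```r
reported <- c(0, 1, 2, 2, 3, 3, 2, 4, 5)  # cumulative counts with one dip
corrected <- cummax(reported)              # carry the largest value forward
corrected
#[1] 0 1 2 2 3 3 3 4 5

# The same call can fill a new column of the data frame from the question
mydf <- data.frame(ts1 = c(0,1,1,1,2,3,2,2,3,4,4,5))
mydf$corrected <- cummax(mydf$ts1)
```

Note that cummax propagates missing values: once an NA appears, every later element of the result is NA as well.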

Create new variable based on the value of several other variables

So I have a data set that has multiple variables that I want to use to create a new variable. I have seen other questions like this that use the ifelse statement, but this would be extremely inefficient since the new variable is based on 32 other variables. The variables are coded with values of 1, 2, 3, or NA, and I want the new variable to be coded as 1 if 2 or more of the 32 variables take on a value of 1, and 2 otherwise. Here is a small example of what I have been trying to do.
df <- data.frame(id = 1:10, v1 = c(1,2,2,2,3,NA,2,2,2,2), v2 = c(2,2,2,2,2,1,2,1,2,2),
v3 = c(1,2,2,2,2,3,2,2,2,2), v4 = c(2,2,2,2,2,1,2,2,2,3))
and the result I am looking for is this:
   id v1 v2 v3 v4 new
1   1  1  2  1  2   1
2   2  2  2  2  2   2
3   3  2  2  2  2   2
4   4  2  2  2  2   2
5   5  3  2  2  2   1
6   6 NA  1  3  1   2
7   7  2  2  2  2   2
8   8  2  1  2  2   2
9   9  2  2  2  2   2
10 10  2  2  2  3   2
I have also tried using rowSums within the if else statement, but with the missing values this doesn't work for all observations unless I recode the NAs to another value which I want to avoid doing, and besides that I feel like there would be a much more efficient way of doing this.
I feel like it is likely that this question has been answered before, but I couldn't find anything on it. So help or direction to a previous answer would be appreciated.
It looks like you were very close to getting your desired output, but you were probably missing the na.rm = TRUE argument as part of your rowSums() call. This will remove any NAs before rowSums does its calculations.
Anyway, using your data frame from above, I created a new variable that counts the number of times 1 appears across the variables, while ignoring NA values. Note that I've subsetted the data to exclude the id column:
df$count <- rowSums(df[-1] == 1, na.rm = TRUE)
Then I created another variable using an ifelse statement that returns a 1 if the count is 2 or more or a 2 otherwise.
df$var <- ifelse(df$count >= 2, 1, 2)
The returned output:
   id v1 v2 v3 v4 count var
1   1  1  2  1  2     2   1
2   2  2  2  2  2     0   2
3   3  2  2  2  2     0   2
4   4  2  2  2  2     0   2
5   5  3  2  2  2     0   2
6   6 NA  1  3  1     2   1
7   7  2  2  2  2     0   2
8   8  2  1  2  2     1   2
9   9  2  2  2  2     0   2
10 10  2  2  2  3     0   2
UPDATE / EDIT: As mentioned by Gregor in the comments, you can also just wrap the rowSums function in the ifelse statement for one line of code.
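A sketch of what that one-liner could look like, using the example data frame from the question. The column name new is taken from the desired output there; the exact form suggested in the comments is not shown, so this is an assumption rather than a quotation.

```r
df <- data.frame(id = 1:10, v1 = c(1,2,2,2,3,NA,2,2,2,2), v2 = c(2,2,2,2,2,1,2,1,2,2),
                 v3 = c(1,2,2,2,2,3,2,2,2,2), v4 = c(2,2,2,2,2,1,2,2,2,3))

# df[-1] drops the id column; == 1 gives a logical matrix (NA where the
# value is missing), and na.rm = TRUE makes rowSums skip those NAs.
df$new <- ifelse(rowSums(df[-1] == 1, na.rm = TRUE) >= 2, 1, 2)
df$new
```

With 32 variables the same line works unchanged, as long as df[-1] (or an explicit column selection) covers exactly the columns to be counted.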

Creating a fractional factorial design in R without prohibited pairs

I'm trying to write R code for a choice-based conjoint study.
I can create a factorial design using AlgDesign or conjoint. However, there are combinations of attribute levels that should not appear together.
Using an example from the web:
#Creating a full factorial design
library(AlgDesign)
ffd <- gen.factorial(c(2,2,4), varNames=c("Discount","Amount","Price"), factors="all")
ffd
   Discount Amount Price
1         1      1     1
2         2      1     1
3         1      2     1
4         2      2     1
5         1      1     2
6         2      1     2
7         1      2     2
8         2      2     2
9         1      1     3
10        2      1     3
11        1      2     3
12        2      2     3
13        1      1     4
14        2      1     4
15        1      2     4
16        2      2     4
But what if "Discount" 2 ("no discount") should never be paired with "Amount" 1 ("20% discount")?
Is there a way to tell AlgDesign or conjoint or some other factorial design to remove any prohibited pairs from the design?
Any advice would be appreciated.
You could always generate ffd as you did there, and then remove the rows which meet your criteria, e.g. ffd$Discount == 2 & ffd$Amount == 1. The easy-ish way is to keep all the rows which do not meet the condition, which by De Morgan's law means Discount is not 2 or Amount is not 1:
ffd<-ffd[(ffd$Discount != 2 | ffd$Amount != 1),]
Repeat for each condition you want to reject.
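A minimal sketch of that filtering step. To keep it free of package dependencies, expand.grid stands in for gen.factorial here (an assumption; its columns are integers rather than factors), and prohibited and ffd_ok are hypothetical helper names.

```r
# Full factorial of the same three attributes, built with base R
ffd <- expand.grid(Discount = 1:2, Amount = 1:2, Price = 1:4)

# Flag the prohibited combination, then keep everything else
prohibited <- ffd$Discount == 2 & ffd$Amount == 1
ffd_ok <- ffd[!prohibited, ]

nrow(ffd)     # 16 rows in the full design
nrow(ffd_ok)  # 12 rows once the 4 prohibited rows are dropped
```

Several rules can be combined into one flag with |. Dropping rows this way may leave the remaining design unbalanced, so it is worth checking the result before using it.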
