Use if statement for each cell of a dataframe in R

I have two dataframes, A and B, each with 64 rows and 431 columns. Each dataframe contains values of zeros and ones. I need to create a new dataframe C that has values of 1 when the cell of A is equal to the cell of B, and a value of 0 when a cell of A is different to the cell of B. How to apply the if statement to each cell of the two dataframes?
Example of dataframes
A <- data.frame(replicate(431,sample(0:1,64,rep=TRUE)))
B <- data.frame(replicate(431,sample(0:1,64,rep=TRUE)))
Example rows from A
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 0 1 1 0 1 0 1 0 0 1
2 1 1 0 1 1 0 0 0 0 0
3 1 0 0 0 1 0 0 1 1 0
4 0 0 0 0 1 1 1 1 1 0
5 1 0 1 1 0 0 0 1 1 1
Example rows from B
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 1 0 1 0 0 1 0 1 0 1
2 0 0 0 1 0 1 1 1 1 1
3 1 0 1 1 1 1 0 0 0 0
4 1 0 0 0 0 1 1 0 0 0
5 0 0 0 0 1 1 1 1 1 0
Output I would like to obtain, dataframe C
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 0 0 1 0 0 0 0 0 1 0
2 0 0 1 1 0 0 0 0 0 0
3 1 1 0 0 1 0 1 0 0 1
4 0 1 1 1 0 1 1 0 0 1
5 0 1 0 0 0 0 0 1 0 0

Because comparison and arithmetic operators in R work element by element, you don't even need to use an if statement. You can just do this:
C <- (A == B) * 1
The first part (A == B) goes through every cell of A and B and compares them directly. The result is a bunch of TRUE and FALSE values. Multiplying everything by 1 forces the TRUE values to become 1 and FALSE to become 0.

You assess whether A and B are the same (cell-wise) and then transform the TRUE / FALSE values into binary by multiplying it by 1:
df <- (A == B) * 1

The previous answers are correct. If you really want to use an if statement, then you can use this:
C <- ifelse(A == B, 1, 0)

Basic operations on R matrix-like data structures tend to be cell-wise. Logicals mixed with numbers in arithmetic are coerced to numbers, 0 (FALSE) and 1 (TRUE), so (A == B) + 0 does what you want cell-wise. However, to make sure that the result is a data.frame and not a matrix, you need to call as.data.frame:
C = as.data.frame((A == B) + 0)
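To make the difference between the answers above concrete, here is a minimal sketch on two small hand-written data frames (the 3 x 2 inputs are made up for illustration, not taken from the question). It suggests that all three approaches give the same 0/1 values, and that only the as.data.frame version returns a data.frame rather than a matrix.

```r
# Hypothetical 3 x 2 inputs, small enough to check by eye
A <- data.frame(X1 = c(0, 1, 1), X2 = c(1, 1, 0))
B <- data.frame(X1 = c(1, 1, 0), X2 = c(1, 0, 0))

eq <- A == B                  # logical matrix, compared cell by cell
C1 <- eq * 1                  # numeric matrix of 0/1
C2 <- ifelse(eq, 1, 0)        # same values through ifelse()
C  <- as.data.frame(eq + 0)   # same values, stored as a data.frame

is.matrix(C1)       # the plain arithmetic result is a matrix
is.data.frame(C)    # the wrapped result is a data.frame
all(C1 == C2)       # both routes agree on every cell
C
```

If later code uses C$X1 or other data.frame-only features, the as.data.frame wrapper is probably the safer choice.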

Related

Change values in multiple columns with unique value and merge into single column (R)

Let's say I have a dataset (ds) with 4 rows with 3 variables as seen below:
ds
x1 x2 x3
1 0 0
0 0 1
0 1 0
0 0 1
How do I change the "1" to a unique value for each column and combine them into a single column?
So, the first step:
x1 x2 x3
1 0 0
0 0 3
0 2 0
0 0 3
Then, the second step (creating x4):
x1 x2 x3 x4
1 0 0 1
0 0 3 3
0 2 0 2
0 0 3 3
I have a lot more variables than this, I just want to know how to minimize the number of lines I write so it's not like 10+ lines.
You could do this:
df <- read.table(text="x1 x2 x3
1 0 0
0 0 1
0 1 0
0 0 1", header=TRUE, stringsAsFactors=FALSE)
df <- df*col(df)
df$x4 <- rowSums(df)
x1 x2 x3 x4
1 1 0 0 1
2 0 0 3 3
3 0 2 0 2
4 0 0 3 3
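The reason this works is that col() returns a matrix in which every entry holds its own column number, so multiplying by it turns each 1 into the index of its column. The sketch below rebuilds the example data by hand and keeps the original columns untouched; it assumes each row contains at most one 1, otherwise rowSums would add several column numbers together.

```r
# Same values as the example dataset, typed in directly
ds <- data.frame(x1 = c(1, 0, 0, 0),
                 x2 = c(0, 0, 1, 0),
                 x3 = c(0, 1, 0, 1))

idx <- col(ds)                 # entry [i, j] is simply j
ds$x4 <- rowSums(ds * idx)     # the single non-zero product per row is the column number
ds
```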

undefined columns error when trying to subset

subset_car_data <- car_data[car_data, car_data$Car_Type == "N" & car_data$Term == 60 & car_data$FICO>=675 & car_data$FICO<=725 & car_data$Amount>=30000 & car_data$Amount<=40000]
This is my code. I am attempting to create a subset, subset_car_data, from car_data with specific conditions. However, I keep getting an "undefined columns" error.
df <- data.frame(replicate(10,sample(0:1,10,rep=TRUE)))
df
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 0 0 0 1 1 1 0 1 0 1
2 0 1 1 1 0 0 1 1 0 0
3 0 1 1 0 0 0 1 0 0 0
4 0 0 1 0 1 1 1 1 1 0
5 0 0 1 0 0 1 0 1 0 0
6 1 0 0 1 1 0 1 1 1 0
7 1 0 1 0 1 0 1 1 1 0
8 0 0 0 1 0 0 1 0 0 1
9 0 0 0 0 1 0 1 0 1 1
10 0 0 1 0 0 0 1 1 1 1
You should do something like:
subset_df <- df[df$X1 == 1 & df$X2 == 1 & df$X3 == 1,]
subset_df
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
8 1 1 1 0 0 1 0 0 0 0
10 1 1 1 1 0 1 0 0 0 1
Instead of:
subset_df <- df[df,df$X1 == 1 & df$X2 == 1 & df$X3 == 1]
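Applied to the question's own condition, the same fix would look roughly like the sketch below. The four rows of car_data are invented stand-ins (only the column names come from the question), and keep is just a helper name used here for readability.

```r
# Hypothetical stand-in for car_data; the values are made up
car_data <- data.frame(
  Car_Type = c("N", "N", "U", "N"),
  Term     = c(60, 60, 60, 48),
  FICO     = c(700, 650, 710, 700),
  Amount   = c(35000, 32000, 38000, 31000),
  stringsAsFactors = FALSE
)

# One logical value per row
keep <- car_data$Car_Type == "N" & car_data$Term == 60 &
  car_data$FICO >= 675 & car_data$FICO <= 725 &
  car_data$Amount >= 30000 & car_data$Amount <= 40000

# The condition goes before the comma (rows); leaving the column slot empty keeps all columns
subset_car_data <- car_data[keep, ]
subset_car_data
```

In the original call the data frame itself sat in the row slot and the row condition sat in the column slot, which is why R reported undefined columns.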

R: rowsum function changes order of groups after aggregation

I've got this data frame which has duplicates (same ID but different numbers):
ID X1 X2 X3 X4 X5
45 1 0 0 1 0
45 0 1 0 0 1
15 1 0 1 0 0
7 1 0 1 1 0
7 0 1 0 0 0
I want to sum the vectors that have the same ID so I've used rowsum:
m <- rowsum(m, m$ID)
However it messes up with the order of the rows showing something like this:
ID X1 X2 X3 X4 X5
15 1 0 1 0 0
45 1 1 0 1 1
7 1 1 1 1 0
Instead of what I want:
ID X1 X2 X3 X4 X5
45 1 1 0 1 1
15 1 0 1 0 0
7 1 1 1 1 0
Anyone knows how to fix this?
Put reorder = FALSE in rowsum.
From ?rowsum:
reorder: if ‘TRUE’, then the result will be in order of
‘sort(unique(group))’, if ‘FALSE’, it will be in the order
that groups were encountered.
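A minimal sketch of that suggestion on the data from the question follows. Passing m[-1] instead of m is a choice made here so that the ID column is not summed along with the others; the group labels end up as row names of the result.

```r
# Data from the question, typed in directly
m <- data.frame(
  ID = c(45, 45, 15, 7, 7),
  X1 = c(1, 0, 1, 1, 0),
  X2 = c(0, 1, 0, 0, 1),
  X3 = c(0, 0, 1, 1, 0),
  X4 = c(1, 0, 0, 1, 0),
  X5 = c(0, 1, 0, 0, 0)
)

# reorder = FALSE keeps the groups in the order they first appear
s <- rowsum(m[-1], group = m$ID, reorder = FALSE)
rownames(s)
s
```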

Looking to multiply all rows of all columns by a constant

I am trying to find a simple way (1 line of code or so) to multiply all rows of all columns of a dataframe by a constant, for example 100.
df <- data.frame(replicate(10,sample(0:1,1000,rep=TRUE)))
head(df)
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 0 1 1 0 0 0 0 1 0 1
2 0 0 1 0 0 1 0 1 0 0
3 0 0 1 1 0 0 1 1 0 0
4 0 1 0 1 1 1 0 0 0 1
5 1 0 1 1 0 0 0 1 0 0
6 0 0 0 0 1 0 0 1 1 1
The way I am currently doing it:
dfX1 <- as.data.frame(df$X1 * 100)
But this way I would have to do this 10 times... and then use the cbind function to bind them all back together again.
dfFULL <- cbind(dfX1, dfX2, dfX3...)
Anybody know of a cleaner way?
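One possible approach, offered here as a sketch rather than taken from an answer in this thread: arithmetic on a data frame whose columns are all numeric is applied to every cell, so the whole frame can be multiplied at once. The set.seed call is only added so the example is reproducible.

```r
set.seed(1)  # added for reproducibility, not part of the question
df <- data.frame(replicate(10, sample(0:1, 1000, replace = TRUE)))

# Multiplies every cell by 100 and keeps the result as a data.frame
dfFULL <- df * 100
head(dfFULL)
```

This assumes every column is numeric; a character or factor column would need to be excluded first.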

How to change data to binary in R and keep the row names column?

I have a data frame that looks like this
Site <- c("X1","X2","X3","X4","X5","X6","X7","X8","X9","X10")
A <- c(0,0,1,2,4,5,6,7,13,56)
B <- c(1,0,0,0,0,4,5,7,7,8)
C <- c(2,3,0,0,4,5,67,8,43,21)
D <- c(134,0,0,2,0,0,9,0,45,55)
mydata <- data.frame(Site,A,B,C,D,stringsAsFactors=FALSE)
I want to convert all values > 0 to be 1 (i.e. binary), without jeopardising the column and row names.
I have tried mydata[mydata>=1]<-1 but it also changed my first column (the row names) to 1 as well:
head(mydata)
Site A B C D
1 1 0 1 1 1
2 1 0 0 1 0
3 1 1 0 0 0
4 1 1 0 0 1
5 1 1 0 1 0
6 1 1 1 1 0
So how do I change just the values to binary, not the row names?
We can create a logical matrix and coerce to binary
mydata[-1] <- +(mydata[-1] > 0)
As an alternative to the answer given by @akrun (+1), we can also try using sapply() to convert any number greater than zero to 1 and everything else to 0:
mydata[-1] <- sapply(mydata[-1], function(x) { as.numeric(x > 0) })
mydata
Site A B C D
1 X1 0 1 1 1
2 X2 0 0 1 0
3 X3 1 0 0 0
4 X4 1 0 0 1
5 X5 1 0 1 0
6 X6 1 1 1 0
7 X7 1 1 1 1
8 X8 1 1 1 0
9 X9 1 1 1 1
10 X10 1 1 1 1
If we weren't sure about the relative positioning of the columns, we could also address the numeric columns using mydata[c("A", "B", "C", "D")] or something similar.
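Another way to avoid depending on column positions, sketched here under the assumption that every numeric column should be converted, is to pick the columns by type. The three-row data frame and the name num_cols are illustrative only.

```r
# Shortened, hypothetical version of mydata
mydata <- data.frame(Site = c("X1", "X2", "X3"),
                     A = c(0, 2, 13),
                     B = c(1, 0, 7),
                     stringsAsFactors = FALSE)

num_cols <- sapply(mydata, is.numeric)        # TRUE for A and B, FALSE for Site
mydata[num_cols] <- +(mydata[num_cols] > 0)   # unary plus turns TRUE/FALSE into 1/0
mydata
```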
You could also try this, which works whether the number is negative or positive. It relies on 0/0 being NaN, so zeros become 0 and every other value becomes 1:
mydata[-1] <- (!is.na(mydata[-1]/mydata[-1]))*1
The ifelse function lets you assign a new value depending on whether each element meets your condition. It works on vectors and also on data frames. Here I bind the Site column to the transformed columns.
myBinData <- data.frame(Site = mydata$Site, ifelse(mydata[, -1] == 0, 0, 1))
Site A B C D
1 X1 0 1 1 1
2 X2 0 0 1 0
3 X3 1 0 0 0
4 X4 1 0 0 1
5 X5 1 0 1 0
6 X6 1 1 1 0
7 X7 1 1 1 1
8 X8 1 1 1 0
9 X9 1 1 1 1
10 X10 1 1 1 1
