Looking to multiply all rows of all columns by a constant - r

I am trying to find a simple way (one line of code or so) to multiply every value in a dataframe by a constant, for example 100.
df <- data.frame(replicate(10,sample(0:1,1000,rep=TRUE)))
head(df)
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 0 1 1 0 0 0 0 1 0 1
2 0 0 1 0 0 1 0 1 0 0
3 0 0 1 1 0 0 1 1 0 0
4 0 1 0 1 1 1 0 0 0 1
5 1 0 1 1 0 0 0 1 0 0
6 0 0 0 0 1 0 0 1 1 1
The way I am currently doing it:
dfX1 <- as.data.frame(df$X1 * 100)
But this way I would have to do this 10 times... and then use the cbind function to bind them all back together again.
dfFULL <- cbind(dfX1, dfX2, dfX3...)
Anybody know of a cleaner way?
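A minimal sketch of one way this could be done, relying on the fact that arithmetic operators on an all-numeric data frame are applied to every cell. The seed and the names df_scaled and df_scaled2 are assumptions added for illustration, not part of the original question.

```r
# Reproducible version of the question's data (the seed is an assumption)
set.seed(1)
df <- data.frame(replicate(10, sample(0:1, 1000, rep = TRUE)))

# Arithmetic on an all-numeric data frame is applied to every cell
df_scaled <- df * 100

# Equivalent column-by-column form; assigning into df_scaled2[] keeps
# the data.frame structure and the column names
df_scaled2 <- df
df_scaled2[] <- lapply(df, function(column) column * 100)
```

If the data frame also held non-numeric columns, the lapply form could be restricted to the numeric ones; df * 100 assumes every column is numeric.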

Related

R check equality of one column to rowSums of other columns

I have a dataframe like this:
x y x1 y1 x2 y2 x3 y3
1 0 1 0 0 0 0 0
0 0 3 0 0 0 0 0
2 0 0 0 0 0 2 0
1 0 0 0 1 0 0 0
I want to find the rows where x = x1 + x2 + x3 and the rows where y = y1 + y2 + y3.
Here is my code to check x=x1+x2+x3:
col_x = c(3,5,7)
df[df$x == rowSums(df[col_x])]
This should return rows 1, 3, and 4, but it returned:
x x1 y1 x2 x3 y3
1 1 1 0 0 0 0
2 0 3 0 0 0 0
3 2 0 0 0 2 0
4 1 0 0 1 0 0
I also tried
col_x = c(3,5,7)
df[df$x == apply(df[col_x],1,sum)]
which also gives me:
x x1 y1 x2 x3 y3
1 1 1 0 0 0 0
2 0 3 0 0 0 0
3 2 0 0 0 2 0
4 1 0 0 1 0 0
I can't figure out why it returned all rows and skipped columns y and y2.
You are just missing a comma. Without it, the logical vector is used to select columns instead of rows.
col_x = c(3,5,7)
df[df$x == rowSums(df[col_x]),]
x y x1 y1 x2 y2 x3 y3
1 1 0 1 0 0 0 0 0
3 2 0 0 0 0 0 2 0
4 1 0 0 0 1 0 0 0
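To illustrate why the comma matters, here is a small sketch that rebuilds the question's table and compares the two indexing forms. The names keep and rows_ok are introduced here for illustration only.

```r
# The question's data, rebuilt by hand
df <- data.frame(
  x  = c(1, 0, 2, 1), y  = c(0, 0, 0, 0),
  x1 = c(1, 3, 0, 0), y1 = c(0, 0, 0, 0),
  x2 = c(0, 0, 0, 1), y2 = c(0, 0, 0, 0),
  x3 = c(0, 0, 2, 0), y3 = c(0, 0, 0, 0)
)
col_x <- c(3, 5, 7)
keep <- df$x == rowSums(df[col_x])  # TRUE FALSE TRUE TRUE

# No comma: the logical vector indexes COLUMNS and is recycled across all
# 8 of them, so columns 2 and 6 (y and y2) are dropped and every row is kept
names(df[keep])

# Trailing comma: the logical vector indexes ROWS and all columns are kept
rows_ok <- df[keep, ]
```

The recycling of the length-4 logical vector over 8 columns is what produces the odd-looking output in the question.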
A possible solution:
library(dplyr)
df %>%
filter(x == rowSums(across(matches("x\\d$"))) &
y == rowSums(across(matches("y\\d$"))))
#> x y x1 y1 x2 y2 x3 y3
#> 1 1 0 1 0 0 0 0 0
#> 2 2 0 0 0 0 0 2 0
#> 3 1 0 0 0 1 0 0 0

R: Merge rows by names in one column adding 1 whenever is present [duplicate]

I have a large dataset with one column of gene names, four columns for the detection methods (X1-X4), and three columns for the type of mutation (Y5-Y7). I would like to merge the rows by gene name so that each gene gets a 1 in a column whenever any of its rows has a 1 in that column. Example of the table:
GENE X1 X2 X3 X4 Y5 Y6 Y7
AKT1 1 0 0 0 0 1 0
AKT1 0 0 1 0 0 0 1
AKT1 0 0 1 0 0 1 0
CENPF 0 1 0 0 0 1 0
CENPF 0 0 1 0 0 1 0
FOXA1 1 0 0 0 0 1 0
FOXA1 0 1 0 0 0 1 0
KMT2C 0 1 0 0 1 0 0
KMT2C 0 0 1 0 1 0 0
Expected result for the table above:
GENE X1 X2 X3 X4 Y5 Y6 Y7
AKT1 1 0 1 0 0 1 1
CENPF 0 1 1 0 0 1 0
FOXA1 1 1 0 0 0 1 0
KMT2C 0 1 1 0 1 0 0
Thanks for your help
You can use rowsum to merge by GENE. rowsum adds up the values in each column within each gene, > 0 turns those totals into TRUE / FALSE (TRUE when the total is larger than 0), and the unary + converts the logicals back to 0 or 1.
+(rowsum(x[-1], x$GENE) > 0)
# X1 X2 X3 X4 Y5 Y6 Y7
#AKT1 1 0 1 0 0 1 1
#CENPF 0 1 1 0 0 1 0
#FOXA1 1 1 0 0 0 1 0
#KMT2C 0 1 1 0 1 0 0
Data:
x <- read.table(header=TRUE, text="
GENE X1 X2 X3 X4 Y5 Y6 Y7
AKT1 1 0 0 0 0 1 0
AKT1 0 0 1 0 0 0 1
AKT1 0 0 1 0 0 1 0
CENPF 0 1 0 0 0 1 0
CENPF 0 0 1 0 0 1 0
FOXA1 1 0 0 0 0 1 0
FOXA1 0 1 0 0 0 1 0
KMT2C 0 1 0 0 1 0 0
KMT2C 0 0 1 0 1 0 0")
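The one-liner above can be unpacked step by step. This is only a sketch on a cut-down, hand-made table (two genes, three columns); the intermediate names sums, found, and res are assumptions added for illustration.

```r
# Small stand-in for the gene table
x <- data.frame(
  GENE = c("AKT1", "AKT1", "CENPF", "CENPF"),
  X1   = c(1, 0, 0, 0),
  X2   = c(0, 0, 1, 0),
  X3   = c(0, 1, 0, 1)
)

sums  <- rowsum(x[-1], x$GENE)  # per-gene column totals, one row per gene
found <- sums > 0               # logical matrix: was a 1 seen at all?
res   <- +found                 # unary plus coerces TRUE/FALSE to 1/0
```

Note that res is a matrix whose row names are the genes, not a data frame with a GENE column.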
One way would be to take max for all the columns for each GENE.
This can be done in base R :
result <- aggregate(.~GENE, df, max, na.rm = TRUE)
result
# GENE X1 X2 X3 X4 Y5 Y6 Y7
#1 AKT1 1 0 1 0 0 1 1
#2 CENPF 0 1 1 0 0 1 0
#3 FOXA1 1 1 0 0 0 1 0
#4 KMT2C 0 1 1 0 1 0 0
dplyr :
library(dplyr)
df %>% group_by(GENE) %>% summarise(across(X1:Y7, ~ max(.x, na.rm = TRUE)))
and data.table :
library(data.table)
setDT(df)[, lapply(.SD, max), GENE, .SDcols = X1:Y7]
Does this work:
library(dplyr)
dat %>% group_by(GENE) %>% summarise(across(X1:Y7, ~ case_when(1 %in% . ~ 1, TRUE ~ 0)))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 4 x 8
GENE X1 X2 X3 X4 Y5 Y6 Y7
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AKT1 1 0 1 0 0 1 1
2 CENPF 0 1 1 0 0 1 0
3 FOXA1 1 1 0 0 0 1 0
4 KMT2C 0 1 1 0 1 0 0
Data used:
dat
# A tibble: 9 x 8
GENE X1 X2 X3 X4 Y5 Y6 Y7
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AKT1 1 0 0 0 0 1 0
2 AKT1 0 0 1 0 0 0 1
3 AKT1 0 0 1 0 0 1 0
4 CENPF 0 1 0 0 0 1 0
5 CENPF 0 0 1 0 0 1 0
6 FOXA1 1 0 0 0 0 1 0
7 FOXA1 0 1 0 0 0 1 0
8 KMT2C 0 1 0 0 1 0 0
9 KMT2C 0 0 1 0 1 0 0

Change values in multiple columns with unique value and merge into single column (R)

Let's say I have a dataset (ds) with 4 rows with 3 variables as seen below:
ds
x1 x2 x3
1 0 0
0 0 1
0 1 0
0 0 1
How do I change the "1" to a unique value for each column and combine them into a single column?
So, the first step:
x1 x2 x3
1 0 0
0 0 3
0 2 0
0 0 3
Then, the second step (creating x4):
x1 x2 x3 x4
1 0 0 1
0 0 3 3
0 2 0 2
0 0 3 3
I have many more variables than this, so I want to know how to do it in as few lines as possible rather than writing 10+ lines.
You could do this:
df <- read.table(text="x1 x2 x3
1 0 0
0 0 1
0 1 0
0 0 1", header=TRUE, stringsAsFactors=FALSE)
df <- df*col(df)
df$x4 <- rowSums(df)
x1 x2 x3 x4
1 1 0 0 1
2 0 0 3 3
3 0 2 0 2
4 0 0 3 3
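The answer hinges on col(), which returns a matrix of the same shape where each cell holds its own column number. A short sketch of what happens at each step, using the same data (the names idx and coded are added here for illustration):

```r
df <- data.frame(x1 = c(1, 0, 0, 0), x2 = c(0, 0, 1, 0), x3 = c(0, 1, 0, 1))

idx <- col(df)      # integer matrix: every cell in column j contains j
coded <- df * idx   # each 1 becomes its column number, each 0 stays 0

# Adding up each row recovers the single non-zero code. This assumes every
# row has at most one 1; with several 1s the codes would be summed together.
coded$x4 <- rowSums(coded)
```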

undefined columns error when trying to subset

subset_car_data <- car_data[car_data, car_data$Car_Type == "N" & car_data$Term == 60 & car_data$FICO>=675 & car_data$FICO<=725 & car_data$Amount>=30000 & car_data$Amount<=40000]
This is my code. I am attempting to create a subset, subset_car_data, from car_data with specific conditions. However, I keep getting an "undefined columns" error.
df <- data.frame(replicate(10,sample(0:1,10,rep=TRUE)))
df
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 0 0 0 1 1 1 0 1 0 1
2 0 1 1 1 0 0 1 1 0 0
3 0 1 1 0 0 0 1 0 0 0
4 0 0 1 0 1 1 1 1 1 0
5 0 0 1 0 0 1 0 1 0 0
6 1 0 0 1 1 0 1 1 1 0
7 1 0 1 0 1 0 1 1 1 0
8 0 0 0 1 0 0 1 0 0 1
9 0 0 0 0 1 0 1 0 1 1
10 0 0 1 0 0 0 1 1 1 1
You should do something like:
subset_df <- df[df$X1 == 1 & df$X2 == 1 & df$X3 == 1,]
subset_df
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
8 1 1 1 0 0 1 0 0 0 0
10 1 1 1 1 0 1 0 0 0 1
Instead of:
subset_df <- df[df,df$X1 == 1 & df$X2 == 1 & df$X3 == 1]
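Applied to the question's own condition, the fix might look like the sketch below. The car_data values are invented stand-ins, since the original data is not shown; only the column names come from the question.

```r
# Hypothetical stand-in for car_data; the values are invented for illustration
car_data <- data.frame(
  Car_Type = c("N", "N", "U", "N"),
  Term     = c(60, 60, 60, 48),
  FICO     = c(700, 650, 710, 700),
  Amount   = c(35000, 35000, 32000, 38000)
)

# The row condition goes BEFORE the comma; leaving the part after the
# comma empty keeps every column
cond <- car_data$Car_Type == "N" & car_data$Term == 60 &
  car_data$FICO >= 675 & car_data$FICO <= 725 &
  car_data$Amount >= 30000 & car_data$Amount <= 40000
subset_car_data <- car_data[cond, ]

# subset() evaluates the condition inside the data frame, so the
# car_data$ prefix is not needed
subset_car_data2 <- subset(
  car_data,
  Car_Type == "N" & Term == 60 & FICO >= 675 & FICO <= 725 &
    Amount >= 30000 & Amount <= 40000
)
```

In the original call the condition sat after the comma, where R expects a column selection, which is consistent with an error about undefined columns.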

Use if statement for each cell of a dataframe in R

I have two dataframes, A and B, each with 64 rows and 431 columns. Each dataframe contains values of zeros and ones. I need to create a new dataframe C that has values of 1 when the cell of A is equal to the cell of B, and a value of 0 when a cell of A is different to the cell of B. How to apply the if statement to each cell of the two dataframes?
Example of dataframes
A <- data.frame(replicate(431,sample(0:1,64,rep=TRUE)))
B <- data.frame(replicate(431,sample(0:1,64,rep=TRUE)))
Example rows from A
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 0 1 1 0 1 0 1 0 0 1
2 1 1 0 1 1 0 0 0 0 0
3 1 0 0 0 1 0 0 1 1 0
4 0 0 0 0 1 1 1 1 1 0
5 1 0 1 1 0 0 0 1 1 1
Example rows from B
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 1 0 1 0 0 1 0 1 0 1
2 0 0 0 1 0 1 1 1 1 1
3 1 0 1 1 1 1 0 0 0 0
4 1 0 0 0 0 1 1 0 0 0
5 0 0 0 0 1 1 1 1 1 0
Output I would like to obtain, dataframe C
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 0 0 1 0 0 0 0 0 1 0
2 0 0 1 1 0 0 0 0 0 0
3 1 1 0 0 1 0 1 0 0 1
4 0 1 1 1 0 1 1 0 0 1
5 0 1 0 0 0 0 0 1 0 0
Because of R's behind the scenes magic, you don't even need to use an if statement. You can just do this:
C <- (A == B) * 1
The first part (A == B) goes through every cell of A and B and compares them directly. The result is a bunch of TRUE and FALSE values. Multiplying everything by 1 forces the TRUE values to become 1 and FALSE to become 0.
You assess whether A and B are the same (cell-wise) and then transform the TRUE / FALSE values into binary by multiplying it by 1:
df <- (A == B) * 1
The previous answers are correct. If you really want to use an if statement, then you can use this:
C <- ifelse(A == B, 1, 0)
Basic operations on R matrix-like data structures tend to be cell-wise. Logicals mixed with numbers in arithmetic are coerced to numbers, 0 (FALSE) and 1 (TRUE), so (A == B) + 0 does what you want cell-wise. However, A == B returns a matrix, so to make sure that the result is a data.frame you need to call as.data.frame:
C = as.data.frame((A == B) + 0)
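The answers above differ mainly in the type of object they return. A small sketch on reduced data (4 rows, 5 columns instead of 64 by 431) makes that visible; the seed and the names eq and C2 are assumptions added for illustration.

```r
set.seed(42)  # seed is an assumption, for reproducibility
A <- data.frame(replicate(5, sample(0:1, 4, rep = TRUE)))
B <- data.frame(replicate(5, sample(0:1, 4, rep = TRUE)))

eq <- A == B                 # cell-wise comparison; a logical MATRIX
C  <- as.data.frame(eq + 0)  # numeric 0/1, converted back to a data frame

# ifelse() produces the same 0/1 values, but the result is also a matrix
C2 <- ifelse(eq, 1, 0)

class(eq)  # shows that the comparison result is a matrix
class(C)   # shows that C is a data.frame
```

If a data frame is needed downstream, wrapping the result in as.data.frame() works for any of the variants.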
