I am looking to subtract multiple rows from the same row within a dataframe.
For example:
Group A B C
A 3 1 2
B 4 0 3
C 4 1 1
D 2 1 2
This is what I want it to look like:
Group A B C
B 1 -1 1
C 1 0 -1
D -1 0 0
So in other words:
Row B - Row A
Row C - Row A
Row D - Row A
Thank you!
Here's a dplyr solution:
library(dplyr)
df %>%
mutate(across(A:C, ~ . - .[1])) %>%
filter(Group != "A")
This gives us:
Group A B C
1: B 1 -1 1
2: C 1 0 -1
3: D -1 0 0
Here's an approach with base R:
data[-1] <- do.call(rbind,
apply(data[-1],1,function(x) x - data[1,-1])
)
data[-1,]
# Group A B C
#2 B 1 -1 1
#3 C 1 0 -1
#4 D -1 0 0
Data:
data <- structure(list(Group = c("A", "B", "C", "D"), A = c(3L, 4L, 4L,
2L), B = c(1L, 0L, 1L, 1L), C = c(2L, 3L, 1L, 2L)), class = "data.frame", row.names = c(NA,
-4L))
We could also replicate the first row and subtract it from the rest:
cbind(data[-1, 1, drop = FALSE], data[-1, -1] - data[1, -1][col(data[-1, -1])])
-output
# Group A B C
#2 B 1 -1 1
#3 C 1 0 -1
#4 D -1 0 0
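Another way to express the same idea in base R is sweep(), which subtracts a vector of per-column values from every row of a matrix. This is only a sketch, assuming every column except Group is numeric; the names m, diffs and result are made up for illustration.

```r
data <- data.frame(
  Group = c("A", "B", "C", "D"),
  A = c(3L, 4L, 4L, 2L),
  B = c(1L, 0L, 1L, 1L),
  C = c(2L, 3L, 1L, 2L)
)

# Numeric part as a matrix (assumption: all columns except Group are numeric)
m <- as.matrix(data[-1])

# Subtract row 1 (the reference row) from each remaining row, column by column
diffs <- sweep(m[-1, , drop = FALSE], 2, m[1, ])

# Re-attach the Group labels of the remaining rows
result <- cbind(data[-1, "Group", drop = FALSE], diffs)
result
```

Working on a matrix keeps the arithmetic simple, and drop = FALSE guards against the case where only one row is left after removing the reference row.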
I have a dataframe df which looks like this
ID A B C D E F G
1 0 0 1 -1 1 0 0
2 1 1 1 0 0 0 0
3 -1 0 1 0 -1 -1 0
.
.
.
I want to add two columns at the end of each row showing the sum of the positive values and the sum of the negative values, so df would look like this
ID A B C D E F G pos neg
1 0 0 1 -1 1 0 0 2 -1
2 1 1 1 0 0 0 0 3 0
3 -1 0 1 0 -1 -1 0 1 -3
.
.
.
I can't figure out how to do this. I have tried the following, which turns the df into a list:
df$neg <- rowSums(df < 0)
I have also tried the following:
df$neg <- rowSums(df[, c("A", "B", "C", "D", "E", "F", "G")] < 0)
which throws an error message:
Error in df[, c("A", "B", "C", :
subscript out of bounds
Any help would be really appreciated, thanks!
We can try this
cbind(
df,
pos = rowSums(df[-1] * (df[-1] > 0)),
neg = rowSums(df[-1] * (df[-1] < 0))
)
which gives
ID A B C D E F G pos neg
1 1 0 0 1 -1 1 0 0 2 -1
2 2 1 1 1 0 0 0 0 3 0
3 3 -1 0 1 0 -1 -1 0 1 -3
Data
> dput(df)
structure(list(ID = 1:3, A = c(0L, 1L, -1L), B = c(0L, 1L, 0L
), C = c(1L, 1L, 1L), D = c(-1L, 0L, 0L), E = 1:-1, F = c(0L,
0L, -1L), G = c(0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-3L))
Using dplyr:
library(dplyr)
df %>%
mutate(pos = rowSums(replace(.[-1],.[-1]<0,0)),
neg = rowSums(replace(.[-1],.[-1]>0,0)))
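A further base R variant, sketched here under the assumption that every column except ID is numeric, clamps the values with pmax() and pmin() before summing. It also shows why rowSums(df < 0) from the question does not give the desired result: it counts the negative entries instead of adding them up. The names vals and count_neg are illustrative only.

```r
df <- data.frame(
  ID = 1:3,
  A = c(0L, 1L, -1L), B = c(0L, 1L, 0L), C = c(1L, 1L, 1L),
  D = c(-1L, 0L, 0L), E = c(1L, 0L, -1L), F = c(0L, 0L, -1L),
  G = c(0L, 0L, 0L)
)

# Value columns only, as a matrix (assumption: everything except ID is numeric)
vals <- as.matrix(df[-1])

# pmax(vals, 0) replaces negatives with 0, so only positive values are summed
df$pos <- rowSums(pmax(vals, 0))

# pmin(vals, 0) replaces positives with 0, so only negative values are summed
df$neg <- rowSums(pmin(vals, 0))

# For comparison: this counts how many entries are negative in each row
count_neg <- rowSums(vals < 0)
df
```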
I have two data frames. I want to merge them so that the values from one are added to the existing values of the other wherever both have the same column.
For example:
df1
No A B C D
1 1 0 1 0
2 0 1 2 1
3 0 0 1 0
df2
No A B E F
1 1 0 1 1
2 0 1 2 1
3 2 1 1 0
Finally, I want the output table like this.
df
No A B C D E F
1 2 0 1 0 1 1
2 0 2 2 1 2 1
3 2 1 1 0 1 0
Note: I did try merge(), but in this case, it did not work.
Any help/suggestion would be appreciated.
Reproducible sample data
df1 <-
structure(list(No = 1:3, A = c(1L, 0L, 0L), B = c(0L, 1L, 0L),
C = c(1L, 2L, 1L), D = c(0L, 1L, 0L)), class = "data.frame", row.names = c(NA,
-3L))
df2 <-
structure(list(No = 1:3, A = c(1L, 0L, 2L), B = c(0L, 1L, 1L),
E = c(1L, 2L, 1L), F = c(1L, 1L, 0L)), class = "data.frame", row.names = c(NA,
-3L))
You can also carry out this operation by left_joining these two data frames:
library(dplyr)
library(stringr)
df1 %>%
left_join(df2, by = "No") %>%
mutate(across(ends_with(".x"), ~ .x + get(str_replace(cur_column(), "\\.x", "\\.y")))) %>%
rename_with(~ str_replace(., "\\.x", ""), ends_with(".x")) %>%
select(!ends_with(".y"))
No A B C D E F
1 1 2 0 1 0 1 1
2 2 0 2 2 1 2 1
3 3 2 1 1 0 1 0
You can first row-bind the two dataframes and then compute the sum of each column while 'grouping' by the No column. This can be done like so:
library(dplyr)
bind_rows(df1, df2) %>%
group_by(No) %>%
summarise(across(c(A, B, C, D, E, `F`), ~ sum(.x, na.rm = TRUE)),
.groups = "drop")
If a column exists in only one of the data frames (here C, D, E and F), bind_rows() pads the missing values with NA. Passing na.rm = TRUE to sum() means these NA values are treated like zeros.
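The effect of na.rm can be seen in isolation with a tiny, made-up vector (padded is a hypothetical name standing in for one group of one column after the row-bind):

```r
# One real value plus the NA padding that row-binding introduces
padded <- c(2L, NA)

sum(padded)                # NA: a single missing value makes the sum missing
sum(padded, na.rm = TRUE)  # 2: the NA is dropped before summing

# A group that consists only of padding sums to 0
all_padding <- c(NA_integer_, NA_integer_)
sum(all_padding, na.rm = TRUE)
```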
Using data.table :
library(data.table)
rbindlist(list(df1, df2), fill = TRUE)[, lapply(.SD, sum, na.rm = TRUE), No]
# No A B C D E F
#1: 1 2 0 1 0 1 1
#2: 2 0 2 2 1 2 1
#3: 3 2 1 1 0 1 0
We can use base R (R 4.1.0 or later, for the native pipe and the \(x) lambda syntax). Collect the data frames into a list ('lst1'), then take the union of their column names ('nm1'). Loop over the list and, in each element, use setdiff to add the missing columns filled with 0. Finally, rbind the elements and use aggregate to get the sum grouped by 'No'.
lst1 <- mget(ls(pattern= '^df\\d+$'))
nm1 <- lapply(lst1, names) |>
{\(x) Reduce(union, x)}()
lapply(lst1, \(x) {x[setdiff(nm1, names(x))] <- 0; x}) |>
{\(x) do.call(rbind, x)}() |>
{\(dat) aggregate(.~ No, data = dat, FUN = sum, na.rm = TRUE,
na.action = na.pass)}()
# No A B C D E F
#1 1 2 0 1 0 1 1
#2 2 0 2 2 1 2 1
#3 3 2 1 1 0 1 0
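If only two data frames are involved, the same result can be sketched without stacking them, by aligning the rows on No with match() and adding the shared columns directly. This assumes that every No in df1 also appears in df2 (otherwise NA would be introduced); the names shared, extra, idx and out are illustrative.

```r
df1 <- data.frame(No = 1:3, A = c(1L, 0L, 0L), B = c(0L, 1L, 0L),
                  C = c(1L, 2L, 1L), D = c(0L, 1L, 0L))
df2 <- data.frame(No = 1:3, A = c(1L, 0L, 2L), B = c(0L, 1L, 1L),
                  E = c(1L, 2L, 1L), F = c(1L, 1L, 0L))

# Columns present in both data frames (besides the key) and only in df2
shared <- setdiff(intersect(names(df1), names(df2)), "No")
extra  <- setdiff(names(df2), names(df1))

# Row of df2 that corresponds to each row of df1
idx <- match(df1$No, df2$No)

out <- df1
for (col in shared) {
  # Add the matching df2 value to the existing df1 value
  out[[col]] <- df1[[col]] + df2[[col]][idx]
}

# Append the columns that exist only in df2, in df1's row order
out <- cbind(out, df2[idx, extra, drop = FALSE])
out
```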
I have some variables which take value between 1 and 5. I would like to code them 0 if they take the value between 1 and 3 (included) and 1 if they take the value 4 or 5.
My dataset looks like this
var1 var2 var3
1 1 NA
4 3 4
3 4 5
2 5 3
So I would like it to be like this:
var1 var2 var3
0 0 NA
1 0 1
0 1 1
0 1 0
I tried writing a function and applying it to each column:
making_binary <- function (var){
var <- factor(var >= 4, labels = c(0, 1))
return(var)
}
df <- lapply(df, making_binary)
But I got an error: incorrect labels: length 2 must be 1 or 1
Where did I go wrong?
Thank you very much for your answers!
You can use :
df[] <- +(df == 4 | df == 5)
df
# var1 var2 var3
#1 0 0 NA
#2 1 0 1
#3 0 1 1
#4 0 1 0
The comparison df == 4 | df == 5 returns logical values (TRUE/FALSE). The unary + then converts those logical values to integers (1/0 respectively), and NA stays NA.
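On a plain vector, the conversion looks like this (x and flag are just example names):

```r
x <- c(1L, 4L, NA, 5L)

# Logical result of the comparison; NA stays NA
flag <- x == 4 | x == 5
flag

# Unary plus coerces logical to integer: TRUE -> 1, FALSE -> 0, NA -> NA
+flag

# as.integer() does the same conversion more explicitly
as.integer(flag)
```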
If you want to apply this for selected columns you can subset the columns by position or by name.
cols <- 1:3 #Position
#cols <- grep('var', names(df)) #Name
df[cols] <- +(df[cols] == 4 | df[cols] == 5)
As far as your function is concerned you can do :
making_binary <- function (var){
var <- as.integer(var >= 4)
# which is a faster version of
#var <- ifelse(var >= 4, 1, 0)
return(var)
}
df[] <- lapply(df, making_binary)
data
df <- structure(list(var1 = c(1L, 4L, 3L, 2L), var2 = c(1L, 3L, 4L,
5L), var3 = c(NA, 4L, 5L, 3L)), class = "data.frame", row.names = c(NA, -4L))
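As for the error in the original function, a likely cause (an assumption, since the full data is not shown) is a column in which var >= 4 takes only one distinct value. factor() then finds a single level, and two labels no longer fit. The sketch below reproduces that situation and shows one way to keep the factor approach by fixing the levels up front; to_binary_factor and only_low are hypothetical names.

```r
# A column where no value reaches 4, so var >= 4 is FALSE everywhere
only_low <- c(1L, 2L, 3L)

# Original approach: one observed level but two labels, so factor() stops
res <- tryCatch(
  factor(only_low >= 4, labels = c(0, 1)),
  error = function(e) "error"
)
res

# Stating both levels explicitly makes the labels always match
to_binary_factor <- function(var) {
  factor(var >= 4, levels = c(FALSE, TRUE), labels = c(0, 1))
}
f <- to_binary_factor(only_low)
levels(f)

# To get numbers back, go through character: as.integer(f) alone
# would return the level codes (1/2), not the labels (0/1)
as.integer(as.character(f))
```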
I think ifelse would fit the problem well:
df[] <- lapply(df, function(x) ifelse(x >=1 & x <=3, 0, x))
df
var1 var2 var3
1 0 0 NA
2 4 0 4
3 0 4 5
4 0 5 0
df[] <- lapply(df, function(x) ifelse(x >=4 & x <=5, 1, x))
df
var1 var2 var3
1 0 0 NA
2 1 0 1
3 0 1 1
4 0 1 0
If you need to do the two steps at once, you can look at dplyr::case_when() or data.table::fcase().
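For a single cut point, a one-step version is also possible in base R without either package. This sketch uses findInterval() with the threshold 4, which returns 0 for values below 4, 1 for values of 4 or more, and NA for NA; it assumes all columns are numeric.

```r
df <- data.frame(var1 = c(1L, 4L, 3L, 2L),
                 var2 = c(1L, 3L, 4L, 5L),
                 var3 = c(NA, 4L, 5L, 3L))

# findInterval(x, vec = 4) counts how many break points are <= x:
# 0 when x < 4, 1 when x >= 4
df[] <- lapply(df, findInterval, vec = 4)
df
```

With more than two categories, a longer vec of increasing break points gives 0, 1, 2, ... in the same way.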
You can simply test if the value is larger than 3, which will return TRUE and FALSE and cast this to a number:
+(x>3)
# var1 var2 var3
#[1,] 0 0 NA
#[2,] 1 0 1
#[3,] 0 1 1
#[4,] 0 1 0
In case you want this only for some columns, you have to subset them. E.g. for column 1 and 2:
+(x[1:2]>3)
#+(x[c("var1","var2")]>3) #Alternative
# var1 var2
#[1,] 0 0
#[2,] 1 0
#[3,] 0 1
#[4,] 0 1
Data:
x <- data.frame(var1 = c(1L, 4L, 3L, 2L), var2 = c(1L, 3L, 4L, 5L)
, var3 = c(NA, 4L, 5L, 3L))
I have a dataset in R in long format. Each ID does not appear the same number of times (i.e. one ID might be one row, another might appear 79 rows).
e.g.
ID V1 V2
1 B 0
1 A 1
1 C 0
2 C 0
3 A 0
3 C 0
I want to create a variable V3 that is 1 on every row of an ID if any of the rows for that ID has V2 == 1, and 0 otherwise
e.g.
ID V1 V2 V3
1 B 0 1
1 A 1 1
1 C 0 1
2 C 0 0
3 A 0 0
3 C 0 0
In base R we can use any - and ave for the grouping.
DF$V3 <- with(DF, ave(V2, ID, FUN = function(x) any(x == 1)))
DF
# ID V1 V2 V3
#1 1 B 0 1
#2 1 A 1 1
#3 1 C 0 1
#4 2 C 0 0
#5 3 A 0 0
#6 3 C 0 0
data
DF <- structure(list(ID = c(1L, 1L, 1L, 2L, 3L, 3L), V1 = c("B", "A",
"C", "C", "A", "C"), V2 = c(0L, 1L, 0L, 0L, 0L, 0L)), .Names = c("ID",
"V1", "V2"), class = "data.frame", row.names = c(NA, -6L))
Here's a tidyverse solution.
If V2 can only be 0 or 1:
library(dplyr)
df %>%
group_by(ID) %>%
mutate(V3 = max(V2))
If you want to check that V2 is exactly 1:
df %>%
group_by(ID) %>%
mutate(V3 = as.numeric(any(V2 == 1)))
Another base R option is
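df$V3 <- with(df, +(ID %in% which(rowsum(V2, ID) > 0)))
# Note: which() returns positions in the rowsum() result, so this relies on the IDs being 1, 2, 3, ...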
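When the IDs are arbitrary labels rather than 1, 2, 3, ..., a safe pattern is to compute one value per group and then look it up by name. The sketch below does this with tapply(); the IDs 10, 20 and 35 and the name has_one are made up for illustration.

```r
DF <- data.frame(ID = c(10L, 10L, 10L, 20L, 35L, 35L),
                 V2 = c(0L, 1L, 0L, 0L, 0L, 0L))

# One logical per ID: does any row of that ID have V2 == 1?
has_one <- tapply(DF$V2 == 1, DF$ID, any)
has_one

# Look the group result up by name so it repeats on every row of the ID
DF$V3 <- as.integer(has_one[as.character(DF$ID)])
DF
```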
I am very new to R, and I sincerely appreciate your help.
The following is part of my data:
subjectID A B C D E F G H I J
S001 1 1 1 1 1 0 0
S002 1 1 1 0 0 0 0
I want to sum the rows from A to J, and so the data will look like this:
subjectID A B C D E F G H I J TOTAL
S001 1 1 1 1 1 0 0 5
S002 1 1 1 0 0 0 0 3
Thank you so much! I would like to count only the values in A to J that equal 1.
As suggested, I post here my answers.
This one uses apply. df[-1] excludes the first column (which is not numeric), and x[x == 1] keeps only the elements of x equal to 1, where x is a single row because of the 1 passed to apply.
df$TOTAL <- apply(df[-1], 1, function(x) sum(x[x == 1], na.rm = TRUE))
Another way in base R, easier to code and (I bet) much faster, is:
df$TOTAL <- rowSums(df[-1] == 1, na.rm = TRUE)
Both give this result:
df
subjectID A B C D E F G H I J TOTAL
1 S001 1 1 1 1 1 0 0 NA NA NA 5
2 S002 1 1 1 0 0 0 0 NA NA NA 3
Data
df <- structure(list(subjectID = structure(1:2, .Label = c("S001",
"S002"), class = "factor"), A = c(1L, 1L), B = c(1L, 1L), C = c(1L,
1L), D = c(1L, 0L), E = c(1L, 0L), F = c(0L, 0L), G = c(0L, 0L
), H = c(NA, NA), I = c(NA, NA), J = c(NA, NA)), .Names = c("subjectID",
"A", "B", "C", "D", "E", "F", "G", "H", "I", "J"), class = "data.frame", row.names = c(NA,
-2L))
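With 0/1 data, counting the 1s and summing the row give the same number, but the two differ as soon as other values appear. A small made-up example (scores is a hypothetical data frame, not the asker's data) shows what the == 1 comparison contributes:

```r
scores <- data.frame(subjectID = c("S001", "S002"),
                     A = c(1, 2), B = c(1, 0), C = c(NA, 1))

# Logical matrix: TRUE where a value equals 1, NA where it is missing
is_one <- scores[-1] == 1
is_one

# Counts the TRUE entries per row, ignoring NA
n_ones <- rowSums(is_one, na.rm = TRUE)

# Adds up the raw values per row, so the 2 contributes 2
plain_sum <- rowSums(scores[-1], na.rm = TRUE)

n_ones
plain_sum
```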
Another similar option to the one posted by SabDeM but using sapply to sum only numeric columns
df$Total <- rowSums(df[ ,sapply(df, is.numeric)])
Output:
subjectID A B C D E F G H I J Total
1 S001 1 1 1 1 1 0 0 NA NA NA 5
2 S002 1 1 1 0 0 0 0 NA NA NA 3