R how to do the partial row sums - r

I am very new to R, and I sincerely appreciate your help.
The following is part of my data:
subjectID A B C D E F G H I J
S001 1 1 1 1 1 0 0
S002 1 1 1 0 0 0 0
I want to sum the rows from A to J, and so the data will look like this:
subjectID A B C D E F G H I J TOTAL
S001 1 1 1 1 1 0 0 5
S002 1 1 1 0 0 0 0 3
Thank you so much! I would like to sum only where variables A to J equal 1.

As suggested, I am posting my answers here.
This one uses apply. df[-1] excludes the first column (which is not numeric), and x[x == 1] subsets x (a single row, because of the 1 passed to apply) to the values equal to 1.
df$TOTAL <- apply(df[-1], 1, function(x) sum(x[x == 1], na.rm = TRUE))
Another way in base R, which is easier to write and I bet much faster:
df$TOTAL <- rowSums(df[-1] == 1, na.rm = TRUE)
Both give this result:
df
subjectID A B C D E F G H I J TOTAL
1 S001 1 1 1 1 1 0 0 NA NA NA 5
2 S002 1 1 1 0 0 0 0 NA NA NA 3
Data
df <- structure(list(subjectID = structure(1:2, .Label = c("S001",
"S002"), class = "factor"), A = c(1L, 1L), B = c(1L, 1L), C = c(1L,
1L), D = c(1L, 0L), E = c(1L, 0L), F = c(0L, 0L), G = c(0L, 0L
), H = c(NA, NA), I = c(NA, NA), J = c(NA, NA)), .Names = c("subjectID",
"A", "B", "C", "D", "E", "F", "G", "H", "I", "J"), class = "data.frame", row.names = c(NA,
-2L))
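If the columns to sum sit between other columns, df[-1] is not enough. A minimal sketch, assuming the wanted columns are contiguous and run from A to J (the names first and last are only illustrative), looks the positions up by name:

```r
# Same data as above, rebuilt with data.frame() for brevity
df <- data.frame(
  subjectID = c("S001", "S002"),
  A = c(1L, 1L), B = c(1L, 1L), C = c(1L, 1L), D = c(1L, 0L),
  E = c(1L, 0L), F = c(0L, 0L), G = c(0L, 0L),
  H = c(NA, NA), I = c(NA, NA), J = c(NA, NA)
)

# Positions of the first and last column of the range
first <- match("A", names(df))
last <- match("J", names(df))

# Count the cells equal to 1 in that range only, ignoring NA
df$TOTAL <- rowSums(df[first:last] == 1, na.rm = TRUE)
df$TOTAL
```

This gives 5 and 3 again, and it keeps working if more non-numeric columns are added before A or after J.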

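Another option, similar to the one posted by SabDeM, but using sapply to sum only the numeric columns. Adding na.rm = TRUE keeps the result from turning into NA if a numeric column contains missing values:
df$Total <- rowSums(df[ , sapply(df, is.numeric)], na.rm = TRUE)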
Output:
subjectID A B C D E F G H I J Total
1 S001 1 1 1 1 1 0 0 NA NA NA 5
2 S002 1 1 1 0 0 0 0 NA NA NA 3
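One thing to watch with this approach: in the dput data above, H, I and J are logical NA, so is.numeric drops them. If a column is numeric and has missing values, rowSums needs na.rm = TRUE. A small sketch on made-up data (the data frame x and its columns are hypothetical):

```r
# Toy data: q is numeric and has a missing value
x <- data.frame(id = c("a", "b"), p = c(1, 0), q = c(NA_real_, 1))

# Flag the numeric columns
num <- vapply(x, is.numeric, logical(1))

without_na_rm <- rowSums(x[num])               # first row becomes NA
with_na_rm <- rowSums(x[num], na.rm = TRUE)    # NA is skipped
```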

Related

Add new columns to a dataframe containing sum of positive values in a row and sum of negative values in a row - R

I have a dataframe df which looks like this
ID A B C D E F G
1 0 0 1 -1 1 0 0
2 1 1 1 0 0 0 0
3 -1 0 1 0 -1 -1 0
.
.
.
I want to add two columns at the end of each row showing the sum of the positive values and the sum of the negative values, so df would look like this
ID A B C D E F G pos neg
1 0 0 1 -1 1 0 0 2 -1
2 1 1 1 0 0 0 0 3 0
3 -1 0 1 0 -1 -1 0 1 -3
.
.
.
I can't figure out how to do this. I have tried the following which turns the df into a list
df$neg <- rowSums(df < 0)
I have also tried the following:
df$neg <- rowSums(df[, c("A", "B", "C", "D", "E", "F", "G")] < 0)
which throws up an error message:
Error in df[, c("A", "B", "C", :
subscript out of bounds
Any help would be really appreciated, thanks!
We can try this
cbind(
df,
pos = rowSums(df[-1] * (df[-1] > 0)),
neg = rowSums(df[-1] * (df[-1] < 0))
)
which gives
ID A B C D E F G pos neg
1 1 0 0 1 -1 1 0 0 2 -1
2 2 1 1 1 0 0 0 0 3 0
3 3 -1 0 1 0 -1 -1 0 1 -3
Data
> dput(df)
structure(list(ID = 1:3, A = c(0L, 1L, -1L), B = c(0L, 1L, 0L
), C = c(1L, 1L, 1L), D = c(-1L, 0L, 0L), E = 1:-1, F = c(0L,
0L, -1L), G = c(0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-3L))
Using dplyr:
library(dplyr)
df %>%
  mutate(pos = rowSums(replace(.[-1], .[-1] < 0, 0)),   # zero out negatives, then sum
         neg = rowSums(replace(.[-1], .[-1] > 0, 0)))   # zero out positives, then sum
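For comparison, note that rowSums(df < 0) from the question counts the negative cells instead of adding them up. A base R sketch of the same idea as the answers above, assuming all columns except ID are numeric, clips the values with pmax and pmin before summing:

```r
df <- data.frame(
  ID = 1:3,
  A = c(0L, 1L, -1L), B = c(0L, 1L, 0L), C = c(1L, 1L, 1L),
  D = c(-1L, 0L, 0L), E = c(1L, 0L, -1L), F = c(0L, 0L, -1L),
  G = c(0L, 0L, 0L)
)

m <- as.matrix(df[-1])           # numeric part only, without ID

df$pos <- rowSums(pmax(m, 0))    # negatives clipped to 0, then summed
df$neg <- rowSums(pmin(m, 0))    # positives clipped to 0, then summed
df
```

The matrix m is taken before the new columns are added, so pos does not leak into neg.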

Subtracting multiple rows from the same row in R

I am looking to subtract multiple rows from the same row within a dataframe.
For example:
Group A B C
A 3 1 2
B 4 0 3
C 4 1 1
D 2 1 2
This is what I want it to look like:
Group A B C
B 1 -1 1
C 1 0 -1
D -1 0 0
So in other words:
Row B - Row A
Row C - Row A
Row D - Row A
Thank you!
Here's a dplyr solution:
library(dplyr)
df %>%
mutate(across(A:C, ~ . - .[1])) %>%
filter(Group != "A")
This gives us:
Group A B C
1: B 1 -1 1
2: C 1 0 -1
3: D -1 0 0
Here's an approach with base R:
data[-1] <- do.call(rbind,
apply(data[-1],1,function(x) x - data[1,-1])
)
data[-1,]
# Group A B C
#2 B 1 -1 1
#3 C 1 0 -1
#4 D -1 0 0
Data:
data <- structure(list(Group = c("A", "B", "C", "D"), A = c(3L, 4L, 4L,
2L), B = c(1L, 0L, 1L, 1L), C = c(2L, 3L, 1L, 2L)), class = "data.frame", row.names = c(NA,
-4L))
We could also replicate the first row and subtract it from the rest
cbind(data[-1, 1, drop = FALSE], data[-1, -1] - data[1, -1][col(data[-1, -1])])
-output
# Group A B C
#2 B 1 -1 1
#3 C 1 0 -1
#4 D -1 0 0
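Another base R option is sweep, which subtracts a vector from every row of a matrix. This is only a sketch, assuming the first row is the reference row and every column except Group is numeric (the names m and diffs are illustrative):

```r
data <- data.frame(
  Group = c("A", "B", "C", "D"),
  A = c(3L, 4L, 4L, 2L), B = c(1L, 0L, 1L, 1L), C = c(2L, 3L, 1L, 2L)
)

m <- as.matrix(data[-1])                        # numeric columns as a matrix

# Subtract row 1 from each remaining row, column by column
diffs <- sweep(m[-1, , drop = FALSE], 2, m[1, ])

result <- data.frame(Group = data$Group[-1], diffs)
result
```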

merge multiple columns in one table?

I have a table with several columns, I would like to make a column by combining 'R1,R2 and R3' columns in a table.
DF:
ID R1 T1 R2 T2 R3 T3
rs1 A 1 NA . NA 0
rs21 NA 0 C 1 C 1
rs32 A 1 A 1 A 0
rs25 NA 2 NA 0 A 0
Desired output:
ID R1 T1 R2 T2 R3 T3 New_R
rs1 A 1 NA . NA 0 A
rs21 NA 0 C 1 C 1 C
rs32 A 1 A 1 A 0 A
rs25 NA 2 NA 0 A 0 A
We can use tidyverse
library(tidyverse)
DF %>%
  mutate(New_R = pmap_chr(select(., starts_with("R")), ~ c(...) %>%
    na.omit %>%
    unique %>%
    str_c(collapse = "")))
#. ID R1 T1 R2 T2 R3 T3 New_R
#1 rs1 A 1 <NA> . <NA> 0 A
#2 rs21 <NA> 0 C 1 C 1 C
#3 rs32 A 1 A 1 A 0 A
#4 rs25 <NA> 2 <NA> 0 A 0 A
If there is only one non-NA element per row, we can use coalesce
DF %>%
mutate(New_R = coalesce(!!! select(., starts_with("R"))))
Or in base R
DF$New_R <- do.call(pmin, c(DF[grep("^R\\d+", names(DF))], na.rm = TRUE))
data
DF <- structure(list(ID = c("rs1", "rs21", "rs32", "rs25"), R1 = c("A",
NA, "A", NA), T1 = c(1L, 0L, 1L, 2L), R2 = c(NA, "C", "A", NA
), T2 = c(".", "1", "1", "0"), R3 = c(NA, "C", "A", "A"), T3 = c(0L,
1L, 0L, 0L)), class = "data.frame", row.names = c(NA, -4L))
You can use the ifelse function in a nested way:
DF$New_R <- ifelse(!is.na(DF$R1), DF$R1,
ifelse(!is.na(DF$R2), DF$R2,
ifelse(!is.na(DF$R3), DF$R3, NA)))
ifelse takes three arguments: a condition, what to return if the condition is fulfilled, and what to return if it is not. It can be applied to a data frame column, treating each row separately. In my example it will pick the first non-NA value found.
We can use apply row-wise, removing NA values and keeping only unique values.
cols <- paste0("R", 1:3)
DF$New_R <- apply(DF[cols], 1, function(x)
  paste0(unique(na.omit(x)), collapse = ""))
DF
# ID R1 T1 R2 T2 R3 T3 New_R
#1 rs1 A 1 <NA> . <NA> 0 A
#2 rs21 <NA> 0 C 1 C 1 C
#3 rs32 A 1 A 1 A 0 A
#4 rs25 <NA> 2 <NA> 0 A 0 A
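The answers above agree on the sample data because every row has a single distinct letter. They differ once a row holds two different values. A small sketch on hypothetical data (not from the question) shows the difference between "first non-NA" and "paste the unique values":

```r
# Hypothetical rows: the first one has two different letters
DF2 <- data.frame(R1 = c("A", NA), R2 = c("C", "C"), stringsAsFactors = FALSE)

# First non-NA value per row (what coalesce and the nested ifelse do)
first_non_na <- Reduce(function(a, b) ifelse(is.na(a), b, a), DF2)

# All unique non-NA values per row, pasted together
all_unique <- apply(DF2, 1, function(x) paste0(unique(na.omit(x)), collapse = ""))

first_non_na   # "A" "C"
all_unique     # "AC" "C"
```

Which one is right depends on what a row with conflicting values should mean in your data.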

Reshaping data in R to a singular matrix

It may be a little difficult to explain exactly what I want, but I will try my best
Say here is my data in R
R1 R2 R3 R4
a b a a
b d c b
e
I want to reshape the data frame so that the data ends up in a kind of singular matrix form, like this
a b c d e
R1 1 1 0 0 0
R2 0 1 0 1 0
R3 1 0 1 0 0
R4 1 1 0 0 1
I assume this is straight forward as it seems easy but my limited knowledge on R is making this a hassle for me
Thanks for your time
What about this?
un <- sort(unique(c(as.matrix(df))))
res <- apply(df, 2, function(x) un %in% x)
rownames(res) <- un
res[] <- as.numeric(res)
t(res)
a b c d e
R1 1 1 0 0 0
R2 0 1 0 1 0
R3 1 0 1 0 0
R4 1 1 0 0 1
The following uses the plyr library's ldply function, which transforms a list and returns a data.frame.
library(plyr)
data_as_list = list(R1=c('a', 'b'), R2=c('b', 'd'), R3=c('a', 'c'), R4=c('a', 'b', 'e'))
result <- ldply(data_as_list, function(item) {
  sapply(letters[1:5], function(letter) letter %in% item) * 1
})
Given a list of character vectors, we generate a row of the resulting data.frame from each item in the list by asking whether the first 5 letters (a-e) appear in the vector (item). Multiplying by 1 is a hack to convert a boolean vector to a 1-or-0 integer vector, if that's really what you want.
Results:
.id a b c d e
1 R1 1 1 0 0 0
2 R2 0 1 0 1 0
3 R3 1 0 1 0 0
4 R4 1 1 0 0 1
To fix up the row names:
rownames(result) <- result$.id
result <- result[, -which(colnames(result)=='.id')]
Now you have:
a b c d e
R1 1 1 0 0 0
R2 0 1 0 1 0
R3 1 0 1 0 0
R4 1 1 0 0 1
Base R solution:
data_as_list = list(R1=c('a', 'b'), R2=c('b', 'd'), R3=c('a', 'c'), R4=c('a', 'b', 'e'))
stack(data_as_list)
#-----------
values ind
1 a R1
2 b R1
3 b R2
4 d R2
5 a R3
6 c R3
7 a R4
8 b R4
9 e R4
#---------
xtabs( ~ values+ind, data=stack(data_as_list) )
#-----------
ind
values R1 R2 R3 R4
a 1 0 1 1
b 1 1 0 1
c 0 0 1 0
d 0 1 0 0
e 0 0 0 1
xtabs( ~ ind+values, data=stack(data_as_list) )
#----------
values
ind a b c d e
R1 1 1 0 0 0
R2 0 1 0 1 0
R3 1 0 1 0 0
R4 1 1 0 0 1
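If you start from the data frame with blank cells rather than from a list, the same stack and table idea still works once blanks and repeats are dropped per column. A sketch, assuming blanks are stored as empty strings (the names long and ind_mat are illustrative):

```r
df <- data.frame(
  R1 = c("a", "b", ""), R2 = c("b", "d", "d"),
  R3 = c("a", "c", ""), R4 = c("a", "b", "e"),
  stringsAsFactors = FALSE
)

# One entry per distinct non-blank value in each column, in long form
long <- stack(lapply(df, function(x) unique(x[x != ""])))

# Cross-tabulate column name against value: 1 if present, 0 if not
ind_mat <- unclass(table(long$ind, long$values))
ind_mat
```

Because duplicates are removed before tabulating, the repeated "d" in R2 is counted once.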
Another approach is to use mtabulate from the "qdapTools" package. This will work for either a data.frame or a list... which should make sense, of course :-)
library(qdapTools)
x <- mtabulate(df)
x[] <- as.numeric(x > 0)
x
# V1 a b d c e
# R1 1 1 1 0 0 0
# R2 0 0 1 1 0 0
# R3 1 1 0 0 1 0
# R4 0 1 1 0 0 1
Since there are two "d" values in "R2", we use the as.numeric(x > 0) to convert to just ones and zeroes. You can drop the first column, which has counted the blanks.
I've used the sample data provided by @barerd:
df <- structure(list(R1 = structure(c(2L, 3L, 1L), .Label = c("", "a",
"b"), class = "factor"), R2 = structure(c(2L, 2L, 1L), .Label = c("b",
"d"), class = "factor"), R3 = structure(c(2L, 3L, 1L), .Label = c("",
"a", "c"), class = "factor"), R4 = structure(1:3, .Label = c("a",
"b", "e"), class = "factor")), .Names = c("R1", "R2", "R3", "R4"
), row.names = c(NA, -3L), class = "data.frame")
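Here is a possibility, assuming ae holds the candidate values and R1 to R4 are character vectors with the values of each column. This could be improved to scale better.
ae <- letters[1:5]  # assumed definition: the values "a" to "e"
matrix(as.numeric(rbind(ae %in% R1,
                        ae %in% R2,
                        ae %in% R3,
                        ae %in% R4)), 4, 5)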
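One way the %in% idea might be made to scale, sketched here under the assumption that the columns are kept in a named list (cols_list and res are illustrative names), is to loop over the list instead of writing one line per column:

```r
cols_list <- list(R1 = c("a", "b"), R2 = c("b", "d"),
                  R3 = c("a", "c"), R4 = c("a", "b", "e"))

# Candidate values are derived from the data instead of being typed in
ae <- sort(unique(unlist(cols_list)))

# One 0/1 vector per list element; sapply returns them as columns, so transpose
res <- t(sapply(cols_list, function(v) as.integer(ae %in% v)))
colnames(res) <- ae
res
```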
x1<-as.character(grep("[a-z]",unique(unlist(df)),value=TRUE)) #df is data
x2<-data.frame(do.call(rbind,lapply(1:ncol(df),function(i){ifelse(x1 %in% df[,i],1,0)})))
colnames(x2)<-x1
row.names(x2)<-names(df)
x2
a b d c e
R1 1 1 0 0 0
R2 0 1 1 0 0
R3 1 0 0 1 0
R4 1 1 0 0 1
First of all, I suppose this is data from a csv file or a table, which you can read into R with read.table() or read.csv().
And you should post it with dput(), like:
structure(list(R1 = structure(c(2L, 3L, 1L), .Label = c("", "a",
"b"), class = "factor"), R2 = structure(c(2L, 2L, 1L), .Label = c("b",
"d"), class = "factor"), R3 = structure(c(2L, 3L, 1L), .Label = c("",
"a", "c"), class = "factor"), R4 = structure(1:3, .Label = c("a",
"b", "e"), class = "factor")), .Names = c("R1", "R2", "R3", "R4"
), row.names = c(NA, -3L), class = "data.frame")
so that we can put it into R easily.
You can reshape your data with the "reshape" library. There are many documents on reshaping data in R, including the help pages, but basically you can transpose your data with t(), so that columns become rows. You can melt() it, so that each row becomes a unique id-variable combination like:
X1 X2 value
1 R1 1 a
2 R2 1 d
3 R3 1 a
4 R4 1 a
5 R1 2 b
6 R2 2 d
7 R3 2 c
8 R4 2 b
9 R1 3
10 R2 3 b
11 R3 3
12 R4 3 e
and then, you can cast(data, formula, function) the melted data into any shape. Since you wanted to see the distribution of the values according to R* stuff, I used the following formula:
t(cast(melt(t(t), id=c("a", "b", "c", "d", "e")), value~X1))[, 2:6]
and got:
a b c d e
R1 1 1 0 0 0
R2 0 1 0 2 0
R3 1 0 1 0 0
R4 1 1 0 0 1

How to transport columns from one matrix to another matrix according to first column values

I have the following matrix:
1 a d
2 s c
4 d 0
7 f t
I want to have the following:
1 a d
2 s c
3 0 0
4 d 0
5 0 0
6 0 0
7 f t
Moreover, I would like it to be done in a way where I would not have to specify each column...
Thank you,
G
Or use merge
df2 <- merge(data.frame(V1 = seq_len(max(df[, 1]))), df, by = "V1", all.x = TRUE)
df2[is.na(df2)] <- 0
# V1 V2 V3
# 1 1 a d
# 2 2 s c
# 3 3 0 0
# 4 4 d 0
# 5 5 0 0
# 6 6 0 0
# 7 7 f t
Where df is
df <- structure(list(V1 = c(1L, 2L, 4L, 7L), V2 = c("a", "s", "d",
"f"), V3 = c("d", "c", "0", "t")), .Names = c("V1", "V2", "V3"
), class = "data.frame", row.names = c(NA, -4L))
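The same result can be reached without merge by looking up each wanted id with match. This is a sketch under the same assumption as above, namely that the first column holds positive integer ids (the names full, idx and out are illustrative):

```r
df <- data.frame(
  V1 = c(1L, 2L, 4L, 7L),
  V2 = c("a", "s", "d", "f"),
  V3 = c("d", "c", "0", "t"),
  stringsAsFactors = FALSE
)

full <- data.frame(V1 = seq_len(max(df$V1)))   # every id from 1 to the maximum
idx <- match(full$V1, df$V1)                   # NA where the id is missing in df

# Pull the remaining columns by position; missing ids give rows of NA
out <- cbind(full, df[idx, -1, drop = FALSE])
out[is.na(out)] <- "0"                         # fill the gaps
rownames(out) <- NULL
out
```

No column other than the first is named in the code, so it carries over to wider tables.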
If dat is a data.frame (it is better to store columns of mixed classes in a data.frame than in a matrix):
dat2 <- as.data.frame(matrix(0, ncol=ncol(dat), nrow=max(dat$V1)))
dat2$V1 <- 1:nrow(dat2)
dat2[dat2$V1 %in% dat$V1,-1] <- unlist(dat[,-1])
dat2
# V1 V2 V3
#1 1 a d
#2 2 s c
#3 3 0 0
#4 4 d 0
#5 5 0 0
#6 6 0 0
#7 7 f t
Or you could do
dat1 <- transform(dat[rep(1:nrow(dat),c(1,diff(dat$V1))),], V1=seq_along(V1))
dat1[duplicated(dat1[,-1], fromLast=TRUE),-1] <- 0
data
dat <- structure(list(V1 = c(1L, 2L, 4L, 7L), V2 = c("a", "s", "d",
"f"), V3 = c("d", "c", "0", "t")), .Names = c("V1", "V2", "V3"
), class = "data.frame", row.names = c(NA, -4L))
