Multiplying and combining two data frames in R

I have two data frames
data frame 1
A B C
1 1 0
0 0 0
1 1 0
data frame 2
1 2
0 4
100 0
100 4
I need to multiply and combine the columns to obtain
A1 A2 B1 B2 C1 C2
0 4 0 4 0 0
0 0 0 0 0 0
100 4 100 4 0 0

Here's one approach:
do.call(cbind, lapply(df1, "*", as.matrix(df2)))
1 2 1 2 1 2
[1,] 0 4 0 4 0 0
[2,] 0 0 0 0 0 0
[3,] 100 4 100 4 0 0
This returns a matrix. Wrap the result in as.data.frame() if you need a data frame.
This is based on the following data:
df1 <- data.frame(A = c(1,0,1), B = c(1,0,1), C = 0)
df2 <- data.frame("1" = c(0,100,100), "2" = c(4,0,4),
check.names = FALSE)
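The matrix above carries the column names 1 2 1 2 1 2 rather than the A1 A2 B1 B2 C1 C2 asked for in the question. A minimal sketch of one way to add them, assuming the names should be each df1 column name pasted onto each df2 column name (the variable res is introduced here for illustration):

```r
df1 <- data.frame(A = c(1, 0, 1), B = c(1, 0, 1), C = 0)
df2 <- data.frame("1" = c(0, 100, 100), "2" = c(4, 0, 4),
                  check.names = FALSE)

# Multiply every column of df1 by the whole of df2, then bind side by side
res <- do.call(cbind, lapply(df1, "*", as.matrix(df2)))

# Build A1 A2 B1 B2 C1 C2: repeat each df1 name once per df2 column
colnames(res) <- paste0(rep(names(df1), each = ncol(df2)), names(df2))

res <- as.data.frame(res)
res
```

Setting the names on the matrix before converting avoids having duplicated column names in the data frame at any point.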

Related

Adding together multiple sets of columns in R

I'm trying to add several sets of columns together.
Example df:
df <- data.frame(
key = 1:5,
ab0 = c(1,0,0,0,1),
ab1 = c(0,2,1,0,0),
ab5 = c(1,0,0,0,1),
bc0 = c(0,1,0,2,0),
bc1 = c(2,0,0,0,0),
bc5 = c(0,2,1,0,1),
df0 = c(0,0,0,1,0),
df1 = c(1,0,3,0,0),
df5 = c(1,0,0,0,6)
)
Giving me:
key ab0 ab1 ab5 bc0 bc1 bc5 df0 df1 df5
1 1 1 0 1 0 2 0 0 1 1
2 2 0 2 0 1 0 2 0 0 0
3 3 0 1 0 0 0 1 0 3 0
4 4 0 0 0 2 0 0 1 0 0
5 5 1 0 1 0 0 1 0 0 6
For each prefix, I want to add the column ending in 5 to the column ending in 0 and store the sum in the 0 column.
So the end result would be:
key ab0 ab1 ab5 bc0 bc1 bc5 df0 df1 df5
1 1 2 0 1 0 2 0 1 1 1
2 2 0 2 0 3 0 2 0 0 0
3 3 0 1 0 1 0 1 0 3 0
4 4 0 0 0 2 0 0 1 0 0
5 5 2 0 1 1 0 1 6 0 6
I could add the columns together using 3 lines:
df$ab0 <- df$ab0 + df$ab5
df$bc0 <- df$bc0 + df$bc5
df$df0 <- df$df0 + df$df5
But my real example has over a hundred columns so I'd like to iterate over them and use apply.
The column names of the first set are contained in col0 and the names of the second set are in col5.
col0 <- c("ab0","bc0","df0")
col5 <- c("ab5","bc5","df5")
I created a function to add the columns together using mapply:
fun1 <- function(df,x,y) {
df[,x] <- df[,x] + df[,y]
}
mapply(fun1,df,col0,col5)
But I get an error: Error in df[, x] : incorrect number of dimensions
Thoughts?
Simply add the two column subsets together as data frames. This assumes both subsets have the same number of columns, in matching order. No loops are needed, since the addition is vectorized.
final_df <- df[grep("0", names(df))] + df[grep("5", names(df))]
final_df <- cbind(final_df, df[grep("0", names(df), invert=TRUE)])
final_df <- final_df[order(names(final_df))]
final_df
# ab0 ab1 ab5 bc0 bc1 bc5 df0 df1 df5 key
# 1 2 0 1 0 2 0 1 1 1 1
# 2 0 2 0 3 0 2 0 0 0 2
# 3 0 1 0 1 0 1 0 3 0 3
# 4 0 0 0 2 0 0 1 0 0 4
# 5 2 0 1 1 0 1 6 0 6 5
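The mapply call in the question fails because mapply also iterates over df itself, so inside fun1 the argument df is a single column (a vector) and df[, x] has no second dimension. Since col0 and col5 already hold the matching names, a minimal base R sketch can index with them directly and assign the sums back in place, assuming the two vectors list the pairs in the same order:

```r
df <- data.frame(
  key = 1:5,
  ab0 = c(1, 0, 0, 0, 1), ab1 = c(0, 2, 1, 0, 0), ab5 = c(1, 0, 0, 0, 1),
  bc0 = c(0, 1, 0, 2, 0), bc1 = c(2, 0, 0, 0, 0), bc5 = c(0, 2, 1, 0, 1),
  df0 = c(0, 0, 0, 1, 0), df1 = c(1, 0, 3, 0, 0), df5 = c(1, 0, 0, 0, 6)
)
col0 <- c("ab0", "bc0", "df0")
col5 <- c("ab5", "bc5", "df5")

# Element-wise sum of the two subsets, written back into the "0" columns;
# column order and all other columns are left untouched
df[col0] <- df[col0] + df[col5]
df
```

Unlike pattern matching on "0" and "5", this does not pick up unrelated columns whose names happen to contain those digits.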
You could use map2 from the purrr package to iterate over the two vectors at once:
df <- data.frame(
key = 1:5,
ab0 = c(1,0,0,0,1),
ab1 = c(0,2,1,0,0),
ab5 = c(1,0,0,0,1),
bc0 = c(0,1,0,2,0),
bc1 = c(2,0,0,0,0),
bc5 = c(0,2,1,0,1),
df0 = c(0,0,0,1,0),
df1 = c(1,0,3,0,0),
df5 = c(1,0,0,0,6)
)
col0 <- c("ab0","bc0","df0")
col5 <- c("ab5","bc5","df5")
purrr::map2(col0, col5, function(x, y) {
df[[x]] <<- df[[x]] + df[[y]]
})
> df
key ab0 ab1 ab5 bc0 bc1 bc5 df0 df1 df5
1 1 2 0 1 0 2 0 1 1 1
2 2 0 2 0 3 0 2 0 0 0
3 3 0 1 0 1 0 1 0 3 0
4 4 0 0 0 2 0 0 1 0 0
5 5 2 0 1 1 0 1 6 0 6
Here's an approach using tidyr and dplyr from the tidyverse meta-package.
First, I reshape the table into long ("tidy") format, split each column name into a group prefix and a number, and spread by the number.
Then I do the calculation you describe.
Finally, I bring it back into the original format using the inverse of step 1.
library(tidyverse)
df_tidy <- df %>%
# Step 1
gather(col, value, -key) %>%
separate(col, into = c("grp", "num"), 2) %>%
spread(num, value) %>%
# Step 2
mutate(`0` = `0` + `5`) %>%
# Step 3, which is just the inverse of Step 1.
gather(num, value, -key, - grp) %>%
unite(col, c("grp", "num")) %>%
spread(col, value)
df_tidy
key ab_0 ab_1 ab_5 bc_0 bc_1 bc_5 df_0 df_1 df_5
1 1 2 0 1 0 2 0 1 1 1
2 2 0 2 0 3 0 2 0 0 0
3 3 0 1 0 1 0 1 0 3 0
4 4 0 0 0 2 0 0 1 0 0
5 5 2 0 1 1 0 1 6 0 6

R: Generating sparse matrix with all elements as rows and columns

I have a data set of user-to-user relations. Not every user appears in both the row and the column variable. For example,
U1 U2 T
1 3 1
1 6 1
2 4 1
3 5 1
U1 and U2 are the users in the dataset. Creating a sparse matrix with the following code (df holds the dataset above as a data frame) gives:
trustmatrix <- xtabs(T~U1+U2,df,sparse = TRUE)
3 4 5 6
1 1 0 0 1
2 0 1 0 0
3 0 0 1 0
This matrix doesn't have all the users as both rows and columns, as the one below does.
1 2 3 4 5 6
1 0 0 1 0 0 1
2 0 0 0 1 0 0
3 0 0 0 0 1 0
4 0 0 0 0 0 0
5 0 0 0 0 0 0
6 0 0 0 0 0 0
How can I get the matrix above from the sparse matrix in R?
We can convert the columns to factors with levels 1 through 6 and then use xtabs (df1 here is the question's data frame)
df1[1:2] <- lapply(df1[1:2], factor, levels = 1:6)
as.matrix(xtabs(T~U1+U2,df1,sparse = TRUE))
# U2
#U1 1 2 3 4 5 6
# 1 0 0 1 0 0 1
# 2 0 0 0 1 0 0
# 3 0 0 0 0 1 0
# 4 0 0 0 0 0 0
# 5 0 0 0 0 0 0
# 6 0 0 0 0 0 0
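The levels 1:6 above are hard-coded. A small sketch of deriving them from the data instead, assuming every user appears at least once in U1 or U2 (the data frame is re-created here as df, and the name users is introduced for illustration):

```r
library(Matrix)  # xtabs(sparse = TRUE) returns a Matrix-package object

df <- data.frame(U1 = c(1, 1, 2, 3), U2 = c(3, 6, 4, 5), T = 1)

# All user IDs seen in either column, used as the common factor levels
users <- sort(unique(c(df$U1, df$U2)))
df[c("U1", "U2")] <- lapply(df[c("U1", "U2")], factor, levels = users)

trustmatrix <- xtabs(T ~ U1 + U2, df, sparse = TRUE)
m <- as.matrix(trustmatrix)  # dense copy, only for inspection
m
```

A user who never appears in either column cannot be recovered this way; in that case the full set of IDs has to be supplied explicitly, as in the answer above.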
Or another option is to get the expanded index filled with 0s and then use sparseMatrix
library(tidyverse)
library(Matrix)
df2 <- crossing(U1 = 1:6, U2 = 1:6) %>%
left_join(df1) %>%
mutate(T = replace(T, is.na(T), 0))
sparseMatrix(i = df2$U1, j = df2$U2, x = df2$T)
Or use spread
spread(df2, U2, T)
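If the expanded grid is only needed to fix the dimensions, sparseMatrix() can take them directly through its dims argument, which avoids storing explicit zeros. A minimal sketch, assuming the user IDs are consecutive integers starting at 1 (the data frame is re-created here as df; n and m are names introduced for illustration):

```r
library(Matrix)

df <- data.frame(U1 = c(1L, 1L, 2L, 3L), U2 = c(3L, 6L, 4L, 5L), T = 1)

# Largest user ID seen in either column gives the square dimension
n <- max(df$U1, df$U2)

m <- sparseMatrix(i = df$U1, j = df$U2, x = df$T,
                  dims = c(n, n),
                  dimnames = list(as.character(1:n), as.character(1:n)))
m
```

Only the four observed pairs are stored; the remaining cells are implicit zeros.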

R Join dataframe column to a partially matching grid

I have a data frame object where combinations of variables are represented by 1, but which is sparsely populated in that I do not have all combinations mapped out.
e.g.
A B C Outcome
1 0 0 700
0 1 0 900
0 0 1 450
1 1 0 280
0 1 1 100
... which is missing the potential combinations [101] and [111]
From this, I'd like to expand out all combinations of A, B, and C, taking the outcome value where the combination exists, and where not, populate Outcome with a zero.
e.g.
A B C Outcome
1 0 0 700
1 1 0 280
1 0 1 0 <- new row
1 1 1 0 <- new row
0 1 0 900
0 1 1 100
0 0 1 450
I'm afraid I don't really have any idea how to do this functionally. I've had a look at expand.grid() - for example the following also using the plyr package
expand.grid(rlply(n, c(0,1)))
which for n=3 gives
Var1 Var2 Var3
1 0 0 0
2 1 0 0
3 0 1 0
4 1 1 0
5 0 0 1
6 1 0 1
7 0 1 1
8 1 1 1
which pretty much gives me the grid I'm after, but I'm not clear now how to join my "Outcome" values to this grid, particularly where n is large (say 60 or 70 variables).
Any help gratefully received!
df <- read.table(text =
"A B C Outcome
1 0 0 700
0 1 0 900
0 0 1 450
1 1 0 280
0 1 1 100",
header = TRUE)
res <-
merge(
x = do.call(what = "expand.grid", lapply(head(as.list(df), - 1), unique)),
y = df,
all.x = TRUE
)
res$Outcome[is.na(res$Outcome)] <- 0
res
# A B C Outcome
# 1 0 0 0 0
# 2 0 0 1 450
# 3 0 1 0 900
# 4 0 1 1 100
# 5 1 0 0 700
# 6 1 0 1 0
# 7 1 1 0 280
# 8 1 1 1 0
Edit:
Not sure whether it should go in a separate answer, but here is a more elegant way with the tidyr package:
library(tidyr)
complete(df, A, B, C, fill = list(Outcome = 0))
If you want to avoid typing all 60 or 70 column names:
complete_(df, cols = setdiff(names(df), "Outcome"), fill = list(Outcome = 0))
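The question's expand.grid idea can also be joined to Outcome in base R without typing the variable names. The following is a sketch under the assumption that every column except Outcome is a 0/1 indicator; vars and grid are names introduced for illustration. Note that the full grid has 2^n rows, so this is only practical for small n; with 60 or 70 variables the grid cannot be materialized at all.

```r
df <- read.table(text =
"A B C Outcome
1 0 0 700
0 1 0 900
0 0 1 450
1 1 0 280
0 1 1 100",
header = TRUE)

# Every column except Outcome is treated as a 0/1 indicator
vars <- setdiff(names(df), "Outcome")

# Full 0/1 grid with the same column names as the data
grid <- expand.grid(setNames(rep(list(0:1), length(vars)), vars))

# Left join on the shared columns, then fill the unmatched rows with 0
res <- merge(grid, df, all.x = TRUE)
res$Outcome[is.na(res$Outcome)] <- 0
res
```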

How to sum and combine two data frames?

I have two data frames:
DATA1:
ID com_alc_cd com_liv_cd com_hyee_cd
A 1 0 0
B 0 0 1
D 0 0 0
C 0 1 0
DATA2:
ID com_alc_dd com_liv_dd com_hyee_dd
B 0 2 0
A 1 0 2
C 0 1 0
D 0 1 0
I want to combine the two data frames, so as to obtain the sum of the two:
SUM(DATA1, DATA2):
ID com_alc com_liv com_hyee
A 2 0 2
B 0 2 1
C 0 2 0
D 0 1 0
Try this, for example (assuming both data frames contain the same IDs and the same column layout):
d1 <- DATA1[order(DATA1$ID),]
d2 <- DATA2[order(DATA2$ID),]
data.frame(ID=d1$ID,as.matrix(subset(d1,select=-ID)) +
as.matrix(subset(d2,select=-ID)))
ID com_alc_cd com_liv_cd com_hyee_cd
1 A 2 0 2
2 B 0 2 1
4 C 0 2 0
3 D 0 1 0
EDIT: a general solution
library(reshape2)
## put the data in the long format
res <- do.call(rbind,lapply(list(DATA1,DATA2),melt,id.vars='ID'))
## polish names
res$variable <- gsub('(.*_.*)_.*','\\1',res$variable)
## wide format and aggregate using sum
dcast(ID~variable,data=res,fun.aggregate=sum)
ID com_alc com_hyee com_liv
1 A 2 2 0
2 B 0 1 2
3 C 0 0 2
4 D 0 0 1
You can also use aggregate (df1 and df2 here stand for DATA1 and DATA2):
names(df1) <- names(df2)
df3 <- rbind(df1, df2)
res <- aggregate(df3[,-1], by=list(df3$ID), sum)
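Sorting both data frames by ID works when they hold exactly the same IDs. A hedged alternative sketch aligns the rows explicitly with match() and strips the _cd/_dd suffix from the result names. It assumes every ID in DATA1 also occurs in DATA2 and that the value columns are in the same order in both; idx and sums are names introduced for illustration.

```r
DATA1 <- data.frame(ID = c("A", "B", "D", "C"),
                    com_alc_cd  = c(1, 0, 0, 0),
                    com_liv_cd  = c(0, 0, 0, 1),
                    com_hyee_cd = c(0, 1, 0, 0))
DATA2 <- data.frame(ID = c("B", "A", "C", "D"),
                    com_alc_dd  = c(0, 1, 0, 0),
                    com_liv_dd  = c(2, 0, 1, 1),
                    com_hyee_dd = c(0, 2, 0, 0))

# Row of DATA2 that corresponds to each row of DATA1
idx <- match(DATA1$ID, DATA2$ID)

# Element-wise sum of the value columns (the ID column is dropped with -1)
sums <- as.matrix(DATA1[-1]) + as.matrix(DATA2[idx, -1])

# Drop the trailing "_cd" suffix inherited from DATA1
colnames(sums) <- sub("_[^_]+$", "", colnames(sums))

res <- data.frame(ID = DATA1$ID, sums)
res <- res[order(res$ID), ]
res
```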

Binning and Naming New Columns with Mean of Binned Columns

This probably has been asked already, but I could not find it. I have a data set, where column names are numbers, and row names are sample names (see below).
"599.773" "599.781" "599.789" "599.797" "599.804" "599.812" "599.82" "599.828"
"A" 0 0 0 0 0 2 1 4
"B" 0 0 0 0 0 1 0 3
"C" 0 0 0 0 2 1 0 1
"D" 3 0 0 0 3 1 0 0
I want to bin the columns, say every 4 columns, by summation, and then name the new columns with the mean of the binned columns. For the above table I would end up with:
"599.785" "599.816"
"A" 0 7
"B" 0 4
"C" 0 4
"D" 3 4
The new column names, 599.785 and 599.816, are average of the column names that were binned. I think something like cut would work for a vector of numbers, but I am not sure how to implement it for large data frames. Thanks for any help!
colnames <- c("599.773", "599.781", "599.789", "599.797",
"599.804", "599.812" ,"599.82" ,"599.828" )
mat <- matrix(scan(), nrow=4, byrow=TRUE)
0 0 0 0 0 2 1 4
0 0 0 0 0 1 0 3
0 0 0 0 2 1 0 1
3 0 0 0 3 1 0 0
colnames(mat)=colnames
rownames(mat) = LETTERS[1:4]
sRows <- function(mat, cols) rowSums(mat[, cols])
# columns in bin number `bin`: 1:4 for bin 1, 5:8 for bin 2
binCols <- function(bin) ((bin - 1) * 4 + 1):(bin * 4)
sapply(1:(dim(mat)[2]/4), function(bin) sRows(mat, binCols(bin)) )
[,1] [,2]
A 0 7
B 0 4
C 0 4
D 3 4
accum <- sapply(1:(dim(mat)[2]/4), function(bin)
sRows(mat, binCols(bin)) )
colnames(accum) <- sapply(1:(dim(mat)[2]/4),
function(bin)
mean(as.numeric(colnames(mat)[ binCols(bin) ])) )
accum
#-------
599.785 599.816
A 0 7
B 0 4
C 0 4
D 3 4
First of all, using numeric values as column names is not good practice.
Even so, here is a solution that produces what the OP asked for.
## read data without checking names
dt <- read.table(text='
"599.773" "599.781" "599.789" "599.797" "599.804" "599.812" "599.82" "599.828"
"A" 0 0 0 0 0 2 1 4
"B" 0 0 0 0 0 1 0 3
"C" 0 0 0 0 2 1 0 1
"D" 3 0 0 0 3 1 0 0',header=TRUE, check.names =FALSE)
cols <- as.numeric(colnames(dt))
## create a factor to groups columns
ff <- rep(c(TRUE,FALSE),each=length(cols)/2)
## using tapply to group operations by ff
vals <- do.call(cbind,tapply(cols,ff,
function(x)
rowSums(dt[,paste0(x)])))
nn <- tapply(cols,ff,mean)
## names columns with means
colnames(vals) <- nn[colnames(vals)]
vals
599.816 599.785
A 7 0
B 4 0
C 4 0
D 4 3
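The TRUE/FALSE factor above splits the columns into exactly two halves. For bins of a fixed width, a grouping index can be computed from the column positions instead. The following is a sketch, not the answerer's method: it uses base R's rowsum() on the transposed matrix, and bin_size, grp and binned are names introduced for illustration.

```r
mat <- matrix(c(0, 0, 0, 0, 0, 2, 1, 4,
                0, 0, 0, 0, 0, 1, 0, 3,
                0, 0, 0, 0, 2, 1, 0, 1,
                3, 0, 0, 0, 3, 1, 0, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(LETTERS[1:4],
                              c("599.773", "599.781", "599.789", "599.797",
                                "599.804", "599.812", "599.82", "599.828")))

bin_size <- 4
# Bin number for each column: 1 1 1 1 2 2 2 2
grp <- ceiling(seq_len(ncol(mat)) / bin_size)

# rowsum() sums rows by group, so transpose, sum, and transpose back
binned <- t(rowsum(t(mat), grp))

# Name each bin with the mean of the original numeric column names
colnames(binned) <- as.character(tapply(as.numeric(colnames(mat)), grp, mean))
binned
```

If the number of columns is not a multiple of bin_size, the last bin simply contains fewer columns.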
