Consider the following dataset:
I need to create new columns by multiplying every a column by every b column, naming each new column after its pair: a1_b1, a1_b2, ..., a1_b4, a2_b1, a2_b2, and so on, as shown in the figure.
I am using R for the analysis. Although I have shown only two a columns and two b columns here, in reality the data is 1600 by 25, hence the question.
This might be fast enough:
set.seed(42)
DF <- data.frame(a1 = sample(1:10),
a2 = sample(1:10),
b1 = sample(1:10),
b2 = sample(1:10))
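a <- grep("^a", names(DF)) # anchored, so only columns whose names start with "a" match
b <- grep("^b", names(DF))
combs <- expand.grid(a, b)
# \(...) is the lambda shorthand introduced in R 4.1.0
res <- do.call(mapply, c(list(FUN = \(...) do.call(`*`, DF[, c(...)])), combs))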
colnames(res) <- paste(names(DF)[combs[[1]]], names(DF)[combs[[2]]], sep = "_")
cbind(DF, res)
# a1 a2 b1 b2 a1_b1 a2_b1 a1_b2 a2_b2
#1 1 8 9 3 9 72 3 24
#2 5 7 10 1 50 70 5 7
#3 10 4 3 2 30 12 20 8
#4 8 1 4 6 32 4 48 6
#5 2 5 5 10 10 25 20 50
#6 4 10 6 8 24 60 32 80
#7 6 2 1 4 6 2 24 8
#8 9 6 2 5 18 12 45 30
#9 7 9 8 7 56 72 49 63
#10 3 3 7 9 21 21 27 27
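Since data frames of equal size can be multiplied element by element, the same result can probably be had without mapply by indexing both blocks with the pair list and multiplying once. A minimal sketch, assuming the a and b columns are identified by the prefixes "^a" and "^b"; a_cols, b_cols, pairs and out are illustrative names, not from the original answer:

```r
# Example data (same shape as in the answer above)
set.seed(42)
DF <- data.frame(a1 = sample(1:10), a2 = sample(1:10),
                 b1 = sample(1:10), b2 = sample(1:10))

# Column names of the two groups, matched at the start of the name
a_cols <- grep("^a", names(DF), value = TRUE)
b_cols <- grep("^b", names(DF), value = TRUE)

# Every (a, b) pair, with the a column varying fastest
pairs <- expand.grid(a = a_cols, b = b_cols, stringsAsFactors = FALSE)

# Two data frames of identical dimensions multiply element-wise in one step
res <- DF[pairs$a] * DF[pairs$b]
names(res) <- paste(pairs$a, pairs$b, sep = "_")

out <- cbind(DF, res)
```

With 1600 a columns and 25 b columns this builds 40,000 product columns in a single vectorised operation, so memory rather than looping is likely to be the limiting factor.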
The operation in the question is the transpose of a Khatri-Rao (column-wise Kronecker) product. We use the Matrix package, which ships with R, so it does not have to be installed. Using the input in the Note at the end, pick out the two blocks of columns, transpose them, apply KhatriRao, and transpose the result back. This gives a sparse matrix (class "dgCMatrix"), which as.matrix converts to a dense matrix, as shown, and as.data.frame(as.matrix(...)) converts to a data frame.
library(Matrix)
rownames(dat) <- 1:nrow(dat)
ix <- grep("^a", colnames(dat)) # positions of the columns whose names start with "a"
as.matrix(t(KhatriRao(t(dat[, -ix]), t(dat[, ix]), make.dimnames = TRUE)))
giving:
a1:b1 a2:b1 a1:b2 a2:b2
1 101 838.3 108.3 898.89
2 204 1050.6 220.6 1136.09
3 309 1957.0 357.0 2261.00
4 416 1664.0 464.0 1856.00
5 525 1638.0 578.0 1803.36
6 749 2118.6 838.6 2372.04
Note
dat <- setNames(cbind(BOD, BOD + 100), c("a1", "a2", "b1", "b2"))
dat
giving
a1 a2 b1 b2
1 1 8.3 101 108.3
2 2 10.3 102 110.3
3 3 19.0 103 119.0
4 4 16.0 104 116.0
5 5 15.6 105 115.6
6 7 19.8 107 119.8
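To make the Khatri-Rao connection concrete, here is a base R sketch of the same row-wise Kronecker (face-splitting) product that does not use Matrix. It is an illustration for small inputs rather than a replacement for KhatriRao; A, B and rowkron are hypothetical names, and dat is recreated exactly as in the Note.

```r
# Recreate the input from the Note
dat <- setNames(cbind(BOD, BOD + 100), c("a1", "a2", "b1", "b2"))
A <- as.matrix(dat[c("a1", "a2")])
B <- as.matrix(dat[c("b1", "b2")])

# For each row i, kronecker(B[i, ], A[i, ]) = (b1*a1, b1*a2, b2*a1, b2*a2)
rowkron <- t(sapply(seq_len(nrow(A)), function(i) kronecker(B[i, ], A[i, ])))

# Name the columns in the same "a:b" style as make.dimnames = TRUE produced above
colnames(rowkron) <- paste(rep(colnames(A), times = ncol(B)),
                           rep(colnames(B), each = ncol(A)), sep = ":")
rowkron
```

Each row of the result is the Kronecker product of that row's b values with its a values, which is why transposing, applying the column-wise KhatriRao, and transposing back gives the same columns.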
I wrote a simple function that produces all combinations of its input. The input is a sequence of four (x, y) coordinate pairs, called a, b, c and d inside the function.
intervals<-function(x1,y1,x2,y2,x3,y3,x4,y4){
a<-c(x1,y1)
b<-c(x2,y2)
c<-c(x3,y3)
d<-c(x4,y4)
union<-expand.grid(a,b,c,d)
union
}
intervals(2,10,3,90,6,50,82,7)
Var1 Var2 Var3 Var4
1 2 3 6 82
2 10 3 6 82
3 2 90 6 82
4 10 90 6 82
5 2 3 50 82
6 10 3 50 82
7 2 90 50 82
8 10 90 50 82
9 2 3 6 7
10 10 3 6 7
11 2 90 6 7
12 10 90 6 7
13 2 3 50 7
14 10 3 50 7
15 2 90 50 7
16 10 90 50 7
Now I want to find the max of x and the min of y for each row of this output. For example, row 2 contains the four values (10, 3, 6, 82): 3, 6 and 82 come from x (x2, x3, x4) and 10 comes from y (y1), so the max of x is 82 and the min of y is 10.
So what I want is two values from each row.
I don't know how to approach this kind of logic. Any ideas or suggestions?
You can pass the x and y vectors to the function separately, use expand.grid to create all combinations, and then take the max of x and the min of y from each row.
intervals<-function(x, y){
tmp <- do.call(expand.grid, rbind.data.frame(x, y))
names(tmp) <- paste0('col', seq_along(tmp))
result <- t(apply(tmp, 1, function(p) {
suppressWarnings(c(max(p[p %in% x]), min(p[p %in% y])))
}))
result[is.infinite(result)] <- NA
result <- as.data.frame(result)
names(result) <- c('max_x', 'min_y')
result
}
intervals(c(2,3,6,82), c(10, 90, 50, 7))
# max_x min_y
#1 82 NA
#2 82 10
#3 82 90
#4 82 10
#5 82 50
#6 82 10
#7 82 50
#8 82 10
#9 6 7
#10 6 7
#11 6 7
#12 6 7
#13 3 7
#14 3 7
#15 2 7
#16 NA 7
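The answer above decides whether a value is an x or a y by membership (p %in% x), so a value that occurs in both x and y would be counted on both sides. A minimal sketch that tracks positions instead, assuming x and y have the same length; intervals_pos and pick_y are hypothetical names:

```r
intervals_pos <- function(x, y) {
  stopifnot(length(x) == length(y))
  n <- length(x)
  # One logical column per coordinate: TRUE means "take y", FALSE means "take x".
  # expand.grid varies the first column fastest, matching the row order above.
  pick_y <- as.matrix(do.call(expand.grid, rep(list(c(FALSE, TRUE)), n)))
  xs <- matrix(x, nrow(pick_y), n, byrow = TRUE)
  ys <- matrix(y, nrow(pick_y), n, byrow = TRUE)
  xs[pick_y] <- NA   # keep x only where x was chosen
  ys[!pick_y] <- NA  # keep y only where y was chosen
  # A row with no x (or no y) gives -Inf (or Inf) plus a warning; map it to NA
  max_x <- suppressWarnings(apply(xs, 1, max, na.rm = TRUE))
  min_y <- suppressWarnings(apply(ys, 1, min, na.rm = TRUE))
  max_x[is.infinite(max_x)] <- NA
  min_y[is.infinite(min_y)] <- NA
  data.frame(max_x, min_y)
}

res <- intervals_pos(c(2, 3, 6, 82), c(10, 90, 50, 7))
```

For the example input this should reproduce the 16 rows shown above, and it still works when x and y share a value, e.g. intervals_pos(c(5, 1), c(5, 9)).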
Is there a way to create multiple variables in a loop? For example, if my data frame has a variable called 'test' (among others), how can I create a series of new variables, say 'test1', 'test2', ..., 'testn', defined as test^1, test^2, ..., test^n?
As an example:
mynum <- 1:10
myletters <- letters[1:10]
mydf <- data.frame(mynum, myletters)
mydf
mynum myletters
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
6 6 f
7 7 g
8 8 h
9 9 i
10 10 j
for (i in 1:5)
{paste0(var, i) <- mynum^i
}
But it errors out.
I am trying to create variables var1, var2, var3, etc., equal to mynum^1, mynum^2, mynum^3, etc.
Best regards
Deepak
You can use lapply to create new columns and combine them using do.call + cbind.
n <- 1:5
mydf[paste0('var', n)] <- do.call(cbind, lapply(n, function(x) mydf$mynum^x))
mydf
# mynum myletters var1 var2 var3 var4 var5
#1 1 a 1 1 1 1 1
#2 2 b 2 4 8 16 32
#3 3 c 3 9 27 81 243
#4 4 d 4 16 64 256 1024
#5 5 e 5 25 125 625 3125
#6 6 f 6 36 216 1296 7776
#7 7 g 7 49 343 2401 16807
#8 8 h 8 64 512 4096 32768
#9 9 i 9 81 729 6561 59049
#10 10 j 10 100 1000 10000 100000
Or with purrr's map_dfc:
mydf[paste0('var', n)] <- purrr::map_dfc(n, ~mydf$mynum^.x)
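A base R alternative without lapply is outer, which computes every power in one call. A minimal sketch, assuming the same mydf as in the question; powers is an illustrative name:

```r
mynum <- 1:10
myletters <- letters[1:10]
mydf <- data.frame(mynum, myletters)

n <- 1:5
# outer() builds a 10 x 5 matrix whose [i, j] entry is mynum[i]^n[j]
powers <- outer(mydf$mynum, n, `^`)

# Assigning a matrix to several new column names splits it column by column
mydf[paste0("var", n)] <- powers
mydf
```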
Try this. Because mydf already has two columns, the new variables have to go in positions 3 onwards, which is why the loop uses i+2. Here is the code:
#Data
mynum <- 1:10
myletters <- letters[1:10]
mydf <- data.frame(mynum, myletters, stringsAsFactors = FALSE)
The loop:
#Loop
for (i in 1:5)
{
mydf[,i+2] <- mydf[,'mynum']^i
names(mydf)[i+2] <- paste0('var',i)
}
Output:
mynum myletters var1 var2 var3 var4 var5
1 1 a 1 1 1 1 1
2 2 b 2 4 8 16 32
3 3 c 3 9 27 81 243
4 4 d 4 16 64 256 1024
5 5 e 5 25 125 625 3125
6 6 f 6 36 216 1296 7776
7 7 g 7 49 343 2401 16807
8 8 h 8 64 512 4096 32768
9 9 i 9 81 729 6561 59049
10 10 j 10 100 1000 10000 100000
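The position arithmetic can be avoided by assigning to the new column by name, which is close to what the original paste0(var, i) attempt was reaching for. A minimal sketch, assuming the same mydf:

```r
mydf <- data.frame(mynum = 1:10, myletters = letters[1:10])

for (i in 1:5) {
  # [[<- with a new character name appends a column, so no index offset is needed
  mydf[[paste0("var", i)]] <- mydf$mynum^i
}
mydf
```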
An option with map:
library(dplyr)
library(purrr)
library(stringr) # for str_replace
map_dfc(1:5, ~ mydf$mynum^.x) %>%
rename_all(~ str_replace(., '\\.+', 'var')) %>%
bind_cols(mydf, .)
I have a data frame (mydata) with multiple columns and another data frame (df) with two columns, factor and coefficient. I want to create a new column in mydata that, for each row, is the sum of the values in columns a to e multiplied by the corresponding coefficients for a to e in df. For the first row, newcol should be 70 (1*1 + 2*2 + 3*3 + 4*4 + 8*5). Ideally, I would be able to repeat this 20+ times with different coefficients.
mydata <- data.frame(a = 1:10, b = 2:11, c = 3:12, d = 4:13, d_1 = 5:14, d_2 = 6:15, d_3 = 7:16, e = 8:17)
df <- data.frame(factor = c('a','b','c','d','e'), coefficient = 1:5)
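# %*% does not accept data frames, hence as.matrix(); drop() returns a plain vector
mydata$newcol <- drop(as.matrix(mydata[,c("a","b","c","d","e")]) %*% df$coefficient)
mydata$newcol2 <- drop(as.matrix(mydata[,c("a","b","c","d_1","e")]) %*% df$coefficient)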
Any advice would be helpful!
We can use sweep here: subset mydata to the columns named in df$factor, multiply each column by its coefficient, and then take rowSums to get the sum.
mydata$newcol <- rowSums(sweep(mydata[as.character(df$factor)], 2,df$coefficient, `*`))
mydata
# a b c d d_1 d_2 d_3 e newcol
#1 1 2 3 4 5 6 7 8 70
#2 2 3 4 5 6 7 8 9 85
#3 3 4 5 6 7 8 9 10 100
#4 4 5 6 7 8 9 10 11 115
#5 5 6 7 8 9 10 11 12 130
#6 6 7 8 9 10 11 12 13 145
#7 7 8 9 10 11 12 13 14 160
#8 8 9 10 11 12 13 14 15 175
#9 9 10 11 12 13 14 15 16 190
#10 10 11 12 13 14 15 16 17 205
Alternatively, we can transpose the subset, multiply by the coefficients (which are recycled down each column), and take colSums:
colSums(t(mydata[as.character(df$factor)]) * df$coefficient)
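For the "20+ times with different coefficients" part, one option is to store each coefficient set as a column of a matrix and compute all the sums with a single matrix multiplication. A minimal sketch; coefs, its set names and the second set's values are hypothetical, for illustration only:

```r
mydata <- data.frame(a = 1:10, b = 2:11, c = 3:12, d = 4:13,
                     d_1 = 5:14, d_2 = 6:15, d_3 = 7:16, e = 8:17)

# Hypothetical coefficient sets: one column per set, rows named by factor
coefs <- cbind(set1 = 1:5, set2 = c(2, 0, 1, 0, 3))
rownames(coefs) <- c("a", "b", "c", "d", "e")

# (10 x 5) %*% (5 x k) gives one weighted sum per row and coefficient set;
# as.matrix() is needed because %*% does not accept data frames
scores <- as.matrix(mydata[rownames(coefs)]) %*% coefs
mydata[paste0("newcol_", colnames(coefs))] <- as.data.frame(scores)
mydata
```

Adding more coefficient sets only means adding columns to coefs. If a set uses different columns (such as d_1 instead of d), it needs its own rownames and therefore its own multiplication.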
I'm trying to combine data frames (hundreds of them), but they have different numbers of rows.
df1 <- data.frame(c(7,5,3,4,5), c(43,56,23,78,89))
df2 <- data.frame(c(7,5,3,4,5,8,5), c(43,56,23,78,89,45,78))
df3 <- data.frame(c(7,5,3,4,5,8,5,6,7), c(43,56,23,78,89,45,78,56,67))
colnames(df1) <- c("xVar1","xVar2")
colnames(df2) <- c("yVar1","yVar2")
colnames(df3) <- c("zVar1","zVar2")
a1 <- list(df1,df2,df3)
a1 is what my initial data actually looks like when I get it.
Now if I do:
b1 <- as.data.frame(a1)
I get an error because the data frames do not all have the same number of rows (this would work fine if they did).
How do I make the number of rows equal, or otherwise work around this issue?
I would like to be able to merge the data in this way (here is a working example with the same # of rows):
df1b <- data.frame(c(7,5,3,4,5), c(43,56,23,78,89))
df2b <- data.frame(c(7,5,3,4,6), c(43,56,24,48,89))
df3b <- data.frame(c(7,5,3,4,5), c(43,56,23,78,89))
colnames(df1b) <- c("xVar1","xVar2")
colnames(df2b) <- c("yVar1","yVar2")
colnames(df3b) <- c("zVar1","zVar2")
a2 <- list(df1b,df2b,df3b)
b2 <- as.data.frame(a2)
Thanks!
cbind.fill from rowr provides functionality for this and fills missing elements with NA:
library(purrr)
library(rowr)
b1 <- purrr::reduce(a1,cbind.fill,fill=NA)
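If rowr is not available, the same NA padding can be sketched in base R: index every data frame up to the longest row count (indices past the end yield NA rows) and bind the results side by side. This assumes a1 is built as in the question; max_rows and padded are illustrative names.

```r
df1 <- data.frame(xVar1 = c(7, 5, 3, 4, 5), xVar2 = c(43, 56, 23, 78, 89))
df2 <- data.frame(yVar1 = c(7, 5, 3, 4, 5, 8, 5),
                  yVar2 = c(43, 56, 23, 78, 89, 45, 78))
df3 <- data.frame(zVar1 = c(7, 5, 3, 4, 5, 8, 5, 6, 7),
                  zVar2 = c(43, 56, 23, 78, 89, 45, 78, 56, 67))
a1 <- list(df1, df2, df3)

# Pad every data frame with NA rows up to the longest one
max_rows <- max(vapply(a1, nrow, integer(1)))
padded <- lapply(a1, function(d) {
  out <- d[seq_len(max_rows), , drop = FALSE]  # rows past nrow(d) become NA
  rownames(out) <- NULL                        # drop the "NA", "NA.1", ... row names
  out
})

b1 <- do.call(cbind, padded)
b1
```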
Another approach is to add a key (here, the row number) to each data frame and then merge by that key.
# get the list of dfs (better: import the data straight into a list of dfs)
list_df<-mget(ls(pattern = "^df[0-9]+$")) # anchored, so df1b, df2b, ... are not picked up
#add newcolumn -- "key"
list_df<-lapply(list_df, function(df, newcol) {
df[[newcol]]<-seq(nrow(df))
return(df)
}, "key")
#merge function
MergeAllf <- function(x, y){
merge(x, y, by = "key", all = TRUE) # full outer join on the row key
}
#pass list to merge funct
library(tidyverse)
data <- Reduce(MergeAllf, list_df)%>%
select(key, everything())#reorder or can drop "key"
data
key xVar1 xVar2 yVar1 yVar2 zVar1 zVar2
1 1 7 43 7 43 7 43
2 2 5 56 5 56 5 56
3 3 3 23 3 23 3 23
4 4 4 78 4 78 4 78
5 5 5 89 5 89 5 89
6 6 NA NA 8 45 8 45
7 7 NA NA 5 78 5 78
8 8 NA NA NA NA 6 56
9 9 NA NA NA NA 7 67
Solution 1
You can achieve this with rbindlist(). Note that the column names will be the column names of the first data frame in the list:
library(data.table)
b1 = data.frame(rbindlist(a1))
> b1
xVar1 xVar2
1 7 43
2 5 56
3 3 23
4 4 78
5 5 89
6 7 43
7 5 56
8 3 23
9 4 78
10 5 89
11 8 45
12 5 78
13 7 43
14 5 56
15 3 23
16 4 78
17 5 89
18 8 45
19 5 78
20 6 56
21 7 67
Solution 2
Alternatively, give all the data frames the same column names, then bind by row:
b1 = lapply(a1, setNames, c("Var1","Var2"))
Now you can bind by rows:
b1 = do.call(dplyr::bind_rows, b1)
> b1
Var1 Var2
1 7 43
2 5 56
3 3 23
4 4 78
5 5 89
6 7 43
7 5 56
8 3 23
9 4 78
10 5 89
11 8 45
12 5 78
13 7 43
14 5 56
15 3 23
16 4 78
17 5 89
18 8 45
19 5 78
20 6 56
21 7 67
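Both solutions above lose track of which data frame each row came from. A base R sketch that keeps that information in an extra column; the column name source and the names Var1/Var2 are assumptions for illustration:

```r
df1 <- data.frame(xVar1 = c(7, 5, 3, 4, 5), xVar2 = c(43, 56, 23, 78, 89))
df2 <- data.frame(yVar1 = c(7, 5, 3, 4, 5, 8, 5),
                  yVar2 = c(43, 56, 23, 78, 89, 45, 78))
df3 <- data.frame(zVar1 = c(7, 5, 3, 4, 5, 8, 5, 6, 7),
                  zVar2 = c(43, 56, 23, 78, 89, 45, 78, 56, 67))
a1 <- list(df1, df2, df3)

# Harmonise the column names and tag each row with its list position
stacked <- do.call(rbind, Map(function(d, id) {
  d <- setNames(d, c("Var1", "Var2"))
  d$source <- id
  d
}, a1, seq_along(a1)))
stacked
```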
I have a column with 10 random numbers. From it, I want to create a new column in which the two values of each consecutive pair have swapped places; see the example below. How would you do that?
column newcolumn
1 5
5 1
7 6
6 7
25 67
67 25
-10 2
2 -10
-50 36
36 -50
Taking advantage of the fact that R recycles the shorter vector when adding two vectors of different lengths, you can:
a <- data.frame(column=c(1,5,7,6,25,67,-10,2,50,36))
a$newColumn <- a$column[seq(nrow(a)) + c(1, -1)]
Something like this.
a <- data.frame(column=c(1,5,7,6,25,67,-10,2,50,36))
a$newColumn <- 0
a[seq(1,nrow(a),by=2),"newColumn"]<-a[seq(2,nrow(a),by=2),"column"]
a[seq(2,nrow(a),by=2),"newColumn"]<-a[seq(1,nrow(a),by=2),"column"]
# results
column newColumn
1 1 5
2 5 1
3 7 6
4 6 7
5 25 67
6 67 25
7 -10 2
8 2 -10
9 50 36
10 36 50
Here is a base R one-liner: reshape column into a 2 x nrow(df)/2 matrix, swap its rows, and flatten it back into a vector.
df$newcolumn <- c(matrix(df$column, ncol = nrow(df) / 2)[c(2,1), ])
# column newcolumn
#1 1 5
#2 5 1
#3 7 6
#4 6 7
#5 25 67
#6 67 25
#7 -10 2
#8 2 -10
#9 -50 36
#10 36 -50
Sample data
df <- read.table(text =
"column
1
5
7
6
25
67
-10
2
-50
36", header = T)
Another option is to use ave and rev:
transform(df, newCol = ave(x = df$column, rep(1:5, each = 2), FUN = rev))
# column newCol
#1 1 5
#2 5 1
#3 7 6
#4 6 7
#5 25 67
#6 67 25
#7 -10 2
#8 2 -10
#9 -50 36
#10 36 -50
The part rep(1:5, each = 2) creates a grouping variable with one group per pair, and rev reverses the elements within each group. For a column of another even length n, use rep(seq_len(n / 2), each = 2).
Here's a compact way:
a$new_col <- c(matrix(a$column,2)[2:1,])
# column new_col
# 1 1 5
# 2 5 1
# 3 7 6
# 4 6 7
# 5 25 67
# 6 67 25
# 7 -10 2
# 8 2 -10
# 9 50 36
# 10 36 50
The idea is to write the column into a two-row matrix, swap the rows, and unfold it back into a vector.
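All of the approaches above assume an even number of values; with an odd length the matrix versions recycle data and the c(1, -1) version warns. A minimal sketch that handles both cases, assuming the unpaired last element should stay where it is; swap_pairs is a hypothetical name:

```r
swap_pairs <- function(v) {
  i <- seq_len(length(v))
  # Odd positions take the next element, even positions take the previous one
  idx <- ifelse(i %% 2 == 1, i + 1L, i - 1L)
  # With an odd length the last element has no partner, so it keeps its place
  idx[idx > length(v)] <- length(v)
  v[idx]
}

x <- c(1, 5, 7, 6, 25, 67, -10, 2, -50, 36)
swap_pairs(x)
swap_pairs(c(1, 2, 3))
```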