RevoScaleR rxDataStep rowSelection fails when using a variable - R

I am trying to perform a selection on an xdf file with rxDataStep.
I am using rowSelection, and it works when I use explicit values but not when I use a variable, e.g.:
This works:
tmp <- rxDataStep(alias.Xdf,
                  transforms = list(TT_AMOUNT = DC_AMOUNT * RT_AMOUNT,
                                    UNIT_PRICE = RT_VALUE / TT_AMOUNT),
                  varsToKeep = c('DC_AMOUNT', 'RT_AMOUNT', 'RT_VALUE'),
                  rowSelection = (A_ID == 1646041))
but this does not:
x <- 1646041
tmp <- rxDataStep(alias.Xdf,
                  transforms = list(TT_AMOUNT = DC_AMOUNT * RT_AMOUNT,
                                    UNIT_PRICE = RT_VALUE / TT_AMOUNT),
                  varsToKeep = c('DC_AMOUNT', 'RT_AMOUNT', 'RT_VALUE'),
                  rowSelection = (A_ID == x))
It fails with the message:
ERROR: The sample data set for the analysis has no variables.
Caught exception in file: CxAnalysis.cpp, line: 3848. ThreadID: 31156 Rethrowing.
Caught exception in file: CxAnalysis.cpp, line: 5375. ThreadID: 31156 Rethrowing.
What is wrong here? I've been struggling with this for hours and have tried every syntax I found on the web.
Thanks.

We need to pass it in via transformObjects: rxDataStep evaluates transforms and rowSelection in a separate transform environment, so variables from the calling environment are not visible there unless they are handed over explicitly.
library(RevoScaleR)
rxDataStep(alias.Xdf,
           transforms = list(TT_AMOUNT = DC_AMOUNT * RT_AMOUNT,
                             UNIT_PRICE = RT_VALUE / TT_AMOUNT),
           varsToKeep = c('DC_AMOUNT', 'RT_AMOUNT', 'RT_VALUE'),
           rowSelection = (A_ID == x1),
           transformObjects = list(x1 = x))
Using a reproducible example:
set.seed(100)
myData <- data.frame(x = 1:100, y = rep(c("a", "b", "c", "d"), 25),
                     z = rnorm(100), w = runif(100))
z1 <- 2
myDataSubset <- rxDataStep(inData = myData,
                           varsToKeep = c("x", "w", "z"),
                           rowSelection = z > zNew,
                           transformObjects = list(zNew = z1))
#Rows Read: 100, Total Rows Processed: 100, Total Chunk Time: 0.007 seconds
myDataSubset
# x w z
#1 20 0.03609544 2.310297
#2 64 0.79408518 2.581959
#3 96 0.07123327 2.445683
This can also be done with dplyr, which evaluates z > z1 directly in the calling environment, so no transformObjects equivalent is needed:
library(dplyr)
myData %>%
select(x, w, z) %>%
filter(z > z1)
# x w z
#1 20 0.03609544 2.310297
#2 64 0.79408518 2.581959
#3 96 0.07123327 2.445683
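The same transformObjects mechanism works for any external object, not just scalars. For instance, a sketch of selecting several IDs at once (the ID values are made up; it assumes the same alias.Xdf and column names as above):
ids <- c(1646041, 1646042)  # hypothetical ID values
tmp <- rxDataStep(alias.Xdf,
                  varsToKeep = c('DC_AMOUNT', 'RT_AMOUNT', 'RT_VALUE'),
                  rowSelection = A_ID %in% idList,
                  transformObjects = list(idList = ids))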

Related

Using a loop to create a table with ICC results in R

I created a loop to calculate the ICC (intraclass correlation coefficient) between two raters.
For each rater (R1, R2) I have a data frame with 75 variables in columns and 125 observations.
library(irr)
for (i in 1:75) {
  icc <- icc(cbind.data.frame(R1[, i], R2[, i]), model = "twoway",
             type = "agreement", unit = "single")
  print(icc)
}
icc() returns a list of results for each variable.
I tried to integrate into the loop a function that generates a data frame with the icc objects that interest me (value, lower and upper bounds of the 95% confidence interval), but it returns separate tables in different ways.
With this first attempt it returns 75 data frames of one line each, even though I used an rbind command:
for (i in 1:75) {
  icc <- icc(cbind.data.frame(R1[, i], R2[, i]), model = "twoway",
             type = "agreement", unit = "single")
  print(rbind.data.frame(cbind.data.frame(icc$value, icc$lbound, icc$ubound)))
}
In the second case it returns 75 different data frames, each filled with the icc objects of one variable:
for (i in 1:75) {
  icc <- icc(cbind.data.frame(R1[, i], R2[, i]), model = "twoway",
             type = "agreement", unit = "single")
  name_lines_are_variables <- names(L1)
  name_columns <- c("ICC", "Low CI 95%", "Up CI 95%")
  tab <- matrix(c(icc$value, icc$conf.level), nrow = 38, ncol = 2)
  dimnames(tab) <- list(name_lines_are_variables, name_columns)
  print(tab)
}
I appreciate your help
If I've understood your post correctly, the problem with your code is that the results from the icc() function are not being accumulated.
You can solve this problem by declaring an empty data.frame before the for loop, and then using rbind() to append the latest results to the existing results in this data.frame.
Please refer to the code below for an implementation (refer to the comments for clarifications):
rm(list = ls())
#Packages
library(irr)
#Dummy data
R1 <- data.frame(matrix(sample(1:100, 75*125, replace = TRUE), nrow = 75, ncol = 125))
R2 <- data.frame(matrix(sample(1:100, 75*125, replace = TRUE), nrow = 75, ncol = 125))
#Data frame that will accumulate the ICC results
#Initialized with zero rows (but has named columns)
my_icc <- data.frame(R1_col = character(), R2_col = character(),
                     icc_val = double(), icc_lb = double(),
                     icc_ub = double(), icc_conflvl = double(),
                     icc_pval = double(),
                     stringsAsFactors = FALSE)
#For loop
#Iterates through each COLUMN in R1 and R2
#And calculates ICC values with these as inputs
#Each R1[, i]-R2[, j] combination's results are stored
#as a row each in the my_icc data frame initialized above
for (i in 1:ncol(R1)){
  for (j in 1:ncol(R2)){
    #tmpdat is just a temporary variable to hold the current calculation's data
    tmpdat <- irr::icc(cbind.data.frame(R1[, i], R2[, j]), model = "twoway", type = "agreement", unit = "single")
    #Results from the current calculation being appended to the my_icc data frame
    my_icc <- rbind(my_icc,
                    data.frame(R1_col = colnames(R1)[i], R2_col = colnames(R2)[j],
                               icc_val = tmpdat$value, icc_lb = tmpdat$lbound,
                               icc_ub = tmpdat$ubound, icc_conflvl = tmpdat$conf.level,
                               icc_pval = tmpdat$p.value,
                               stringsAsFactors = FALSE))
  }
}
head(my_icc)
# R1_col R2_col icc_val icc_lb icc_ub icc_conflvl icc_pval
# 1 X1 X1 0.14109954 -0.09028373 0.3570681 0.95 0.1147396
# 2 X1 X2 0.07171398 -0.15100798 0.2893685 0.95 0.2646890
# 3 X1 X3 -0.02357068 -0.25117399 0.2052619 0.95 0.5791774
# 4 X1 X4 0.07881817 -0.15179084 0.3004977 0.95 0.2511141
# 5 X1 X5 -0.12332146 -0.34387645 0.1083129 0.95 0.8521741
# 6 X1 X6 -0.17319598 -0.38833452 0.0578834 0.95 0.9297514
Thank you a lot for your help @Dunois. I just had to keep the same column index in the for() loop, because I have to compare the same variables' columns for each rater. The final code:
library(irr)
R1 <- data.frame(matrix(sample(1:100, 75*125, replace = TRUE), nrow = 75, ncol = 125))
R2 <- data.frame(matrix(sample(1:100, 75*125, replace = TRUE), nrow = 75, ncol = 125))
my_icc <- data.frame(R1_col = character(), R2_col = character(),
                     icc_val = double(), icc_lb = double(),
                     icc_ub = double(), icc_conflvl = double(),
                     icc_pval = double(),
                     stringsAsFactors = FALSE)
for (i in 1:ncol(R1)){
  tmpdat <- irr::icc(cbind.data.frame(R1[, i], R2[, i]), model = "twoway", type = "agreement", unit = "single")
  my_icc <- rbind(my_icc,
                  data.frame(R1_col = colnames(R1)[i], R2_col = colnames(R2)[i],
                             icc_val = tmpdat$value, icc_lb = tmpdat$lbound,
                             icc_ub = tmpdat$ubound, icc_conflvl = tmpdat$conf.level,
                             icc_pval = tmpdat$p.value,
                             stringsAsFactors = FALSE))
}
head(my_icc)
#R1_col R2_col icc_val icc_lb icc_ub icc_conflvl icc_pval
#1 X1 X1 0.116928667 -0.1147526 0.33551788 0.95 0.1601141
#2 X2 X2 0.006627921 -0.2200660 0.23238172 0.95 0.4773967
#3 X3 X3 -0.184898902 -0.3980084 0.04542289 0.95 0.9427605
#4 X4 X4 0.066504226 -0.1646006 0.28963006 0.95 0.2862440
#5 X5 X5 -0.035662755 -0.2603757 0.19227801 0.95 0.6196883
#6 X6 X6 -0.055329309 -0.2808315 0.17466685 0.95 0.6805675
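As a side note, since the number of iterations is known up front, you can avoid growing my_icc inside the loop altogether: build one row per variable with lapply() and bind once at the end. A sketch equivalent to the final code above:
library(irr)
# One data frame row per column pair, collected in a list
rows <- lapply(seq_len(ncol(R1)), function(i) {
  tmpdat <- irr::icc(cbind.data.frame(R1[, i], R2[, i]),
                     model = "twoway", type = "agreement", unit = "single")
  data.frame(R1_col = colnames(R1)[i], R2_col = colnames(R2)[i],
             icc_val = tmpdat$value, icc_lb = tmpdat$lbound,
             icc_ub = tmpdat$ubound, icc_conflvl = tmpdat$conf.level,
             icc_pval = tmpdat$p.value,
             stringsAsFactors = FALSE)
})
my_icc <- do.call(rbind, rows)  # bind all rows once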

Issue with the disag_model() function from the disaggregation R package

I was trying out the disaggregation package to evaluate whether it could be used on the dataset I have. My original data are disaggregated, so I've aggregated them to use the disag_model function from the disaggregation package and compare "fitted values" with actual values.
However when I run the function the R session aborts.
I tried to execute the disag_model function step by step and saw that the problem is due to the use of nlminb() to optimize the posterior density function, but I cannot understand why it's happening or how to solve it.
Thanks for your help.
You can find the data I used at this link: https://www.dropbox.com/sh/au7l0e11trzfo19/AACpfRSUpd4gRCveUsh5JX6Ea?dl=0
Please download the folder to run the code.
This is the code I used:
library(tidyverse)
library(raster)
library(disaggregation)
library(sp)
path<- "yourPath/Data"
load(file.path(path, "myRS"))
load(file.path(path, "RAST"))
Data <- read.csv(file = paste(path, "/sim_data.csv", sep = ""))
Data$HasRes <- ifelse(Data$PN50 > runif(nrow(Data)), 1, 0)
for (i in 1:nlayers(myRS)) {
  myRS@layers[[i]]@file@name <- file.path(path, "predStackl10")
}
DFCov <-
as.data.frame(raster::extract(myRS, Data[c("XCoord", "YCoord")]))
Data <- cbind(Data, DFCov)
# Remove NA
NAs <- which(is.na(rowSums(Data[names(myRS)])))
Data <- Data[-NAs, ]
Data$ISO3 <- as.factor(Data$ISO3)
world_shape <-
shapefile(file.path(path, "World.shp"))
lmic_shape <-
  world_shape[(world_shape@data$ISO3 %in% levels(Data$ISO3)), ]
plot(lmic_shape)
# I would like to convert Data into a SpatialPointsDataFrame object
PN50 <- Data
coordinates(PN50) <- c("XCoord", "YCoord")
is.projected(PN50) # see if a projection is defined
proj4string(PN50) <- CRS("+proj=longlat +datum=WGS84")
# compute the mean P50 within each state
PN50_mean <- aggregate(x = PN50,
by = list(Data$ISO3),
FUN = mean)
# compute the centroid of the observations coordinates for each state
PN50_centroid <-
Data %>% group_by(ISO3) %>% summarise(meanX = mean(XCoord), meanY = mean(YCoord))
# assign to each mean the centroid coordinates
PN50_agg <-
as.data.frame(
cbind(
PN50_mean = PN50_mean@data$PN50,
XCoord = PN50_centroid$meanX,
YCoord = PN50_centroid$meanY
)
)
PN50_agg$XCoord <- as.numeric(PN50_agg$XCoord)
PN50_agg$YCoord <- as.numeric(PN50_agg$YCoord)
PN50_agg$ISO3 <- as.character(PN50_centroid$ISO3)
samsiz <-
Data %>% group_by(ISO3) %>% summarise(sz = sum(SampleSize))
PN50_agg$sample_size <- as.numeric(samsiz$sz)
PN50_agg$case <- round(PN50_agg$PN50_mean * PN50_agg$sample_size)
# I would like to have the data in SpatialPolygonsDataFrame format to use the disaggregation package
library(sp)
coordinates(PN50_agg) <- c("XCoord", "YCoord")
proj4string(PN50_agg) <- CRS("+proj=longlat +datum=WGS84")
PN50_polyg <- lmic_shape
PN50_polyg@data <-
  full_join(PN50_polyg@data, PN50_agg@data, by = "ISO3")
# covariates raster
covariate_stack <-
getCovariateRasters(path, shape = raster(x = paste0(path, '/multi.tif')))
names(covariate_stack)
covariate_stack2 <- dropLayer(covariate_stack, nlayers(covariate_stack))
names(covariate_stack2)
plot(covariate_stack2)
covariate_stack2 <- raster::stack(covariate_stack2)
covariate_stack2<-brick(covariate_stack2)
# population raster
extracted <- raster::extract(raster(x = paste0(path, '/multi.tif')), PN50_polyg)
n_cells <- sapply(extracted, length)
PN50_polyg@data$pop_per_cell <- PN50_polyg@data$sample_size / n_cells
population_raster <-
rasterize(PN50_polyg, covariate_stack2, field = 'pop_per_cell')
# prepare data for disag_model()
dis_data <- prepare_data(
polygon_shapefile = PN50_polyg,
covariate_rasters = covariate_stack2,
aggregation_raster = population_raster,
mesh.args = list(
max.edge = c(5, 40),
cut = 0.0005,
offset = 1
),
id_var = "ISO3",
response_var = "case",
sample_size_var = "sample_size",
na.action = TRUE,
ncores = 8
)
# Rho and p(Rho<Rho_min)
dist <- pointDistance(PN50_agg@coords, lonlat = F, allpairs = T)
rownames(dist) <- PN50_agg$ISO3
colnames(dist) <- PN50_agg$ISO3
flattenDist <- function(dist) {
up <- upper.tri(dist)
flat <- data_frame(row = rownames(dist)[row(dist)[up]],
column = rownames(dist)[col(dist)[up]],
dist = dist[up])
return(flat)
}
pair_dist <- flattenDist(dist)
d <- pair_dist$dist
k <- 0.036
CorMatern <- k * d * besselK(k * d, 1)
limits <- sp::bbox(PN50_polyg)
hypotenuse <-
  sqrt((limits[1, 2] - limits[1, 1])^2 + (limits[2, 2] - limits[2, 1])^2)
prior_rho <- hypotenuse / 3
p_rho <- sum(d[CorMatern <= 0.1] < prior_rho) / length(d[CorMatern <= 0.1])
# sigma and p(sigma>sigma_max)
library(boot)
sigma_boost <- function(data, i) {
  sd(data[i] / mean(data[i]))
}
sigma <-
boot(data = dis_data$polygon_data$response,
statistic = sigma_boost,
10000)
prior_sigma <- sigma$t0
p_sigma <- sum(sigma$t >= sigma$t0) / length(sigma$t)
default_priors <-
list(
priormean_intercept = 0,
priorsd_intercept = 4,
priormean_slope = 0,
priorsd_slope = 2,
prior_rho_min = prior_rho,
prior_rho_prob = p_rho,
prior_sigma_max = prior_sigma,
prior_sigma_prob = p_sigma,
prior_iideffect_sd_max = 0.1,
prior_iideffect_sd_prob = 0.01
)
fitted_model <- disag_model(
data = dis_data,
iterations = 1000,
family = "binomial",
link = "logit",
# priors = default_priors,
field = TRUE,
iid = TRUE,
silent = TRUE
)
I was able to run the disag_model function using your dis_data object. There were no errors or crashes. I ran the following lines.
fitted_model <- disag_model(
data = dis_data,
iterations = 1000,
family = "binomial",
link = "logit",
field = TRUE,
iid = TRUE,
silent = TRUE
)
I am running on a Windows machine with 64GB RAM and 8 cores. It took over an hour and used all of my RAM for a while and up to 50% of my CPU, which is not surprising as you are fitting 5.5M pixels over the whole world. Therefore, I suspect it is related to your computer running out of resources. I suggest you try a smaller example to test it out first. Try fewer polygons and fewer pixels in each polygon.
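If you want a concrete starting point for scaling down, something like the following rough sketch (it reuses the objects from your script; the subset size and iteration count are arbitrary) lets you check that the model runs at all before committing to the full 5.5M-pixel problem:
small_polyg <- PN50_polyg[1:5, ]                  # just a handful of polygons
small_ext <- extent(small_polyg)
small_covs <- crop(covariate_stack2, small_ext)   # far fewer pixels
small_pop <- crop(population_raster, small_ext)
small_data <- prepare_data(
  polygon_shapefile = small_polyg,
  covariate_rasters = small_covs,
  aggregation_raster = small_pop,
  id_var = "ISO3",
  response_var = "case",
  sample_size_var = "sample_size",
  na.action = TRUE
)
small_fit <- disag_model(data = small_data, iterations = 100,
                         family = "binomial", link = "logit")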

In R & dabestr, how do I get grouped differences correctly?

Using the dabestr package I'm trying to get the differences between two sets of control & test data. Slightly modifying the example from the help file, I tried:
library(dabestr)
N <- 70
c1 <- rnorm(N, mean = 50, sd = 20)
t1 <- rnorm(N, mean = 200, sd = 20)
ID <- seq(1:N)
long.data <- tibble::tibble(ID = ID, Control1 = c1, Test1 = t1)
meandiff1 <- long.data %>%
tidyr::gather(key = Group, value = Measurement, Control1:Test1)
ID <- seq(1:N) + N
c2 <- rnorm(N, mean = 100, sd = 70)
t2 <- rnorm(N, mean = 100, sd = 70)
long.data <- tibble::tibble(ID = ID, Control2 = c2, Test2 = t2)
meandiff2 <- long.data %>%
tidyr::gather(key = Group, value = Measurement, Control2:Test2)
meandiff <- dplyr::bind_rows(meandiff1, meandiff2)
paired_mean_diff <-
  dabest(meandiff, x = Group, y = Measurement,
         idx = c("Control1", "Test1", "Control2", "Test2"),
         paired = TRUE,
         id.col = ID)
plot(paired_mean_diff)
I get these results (plot omitted): not only is everything compared to Control1, but the paired = TRUE option also seems to have no effect. I was hoping to get something similar to the examples from the package page.
Any pointers on how to achieve that?
For a paired plot, you want to nest the idx keyword option as a list of vectors; each inner vector then forms its own control-test pair, so Test2 is compared against Control2 rather than Control1:
paired_mean_diff <-
  dabest(meandiff, x = Group, y = Measurement,
         idx = list(c("Control1", "Test1"),
                    c("Control2", "Test2")),
         paired = TRUE,
         id.col = ID)
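Plotting the result should then show each pair compared within itself:
plot(paired_mean_diff)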

Optimize the for loop in R

DUMMY DATA SET (the difference from my real data set is that item_code is a string in my case):
in_cluster <- data.frame(item_code = c(1:500))
in_cluster$cluster <-
sample(5, size = nrow(in_cluster), replace = TRUE)
real_sales <- data.frame(item_code = numeric(0), sales = numeric(0))
real_sales <-
data.frame(
item_code = sample(500, size = 100000, replace = TRUE),
sales = sample(500, size = 100000, replace = TRUE)
)
mean_trajectory <- data.frame(sales = c(1:52))
mean_trajectory$sales <- sample(500, size = 52, replace = TRUE)
training_df <- data.frame(
LTF_t_minus_1 = numeric(0),
LTF_t = numeric(0),
LTF_t_plus_1 = numeric(0),
RS_t_minus_1 = numeric(0),
RS_t = numeric(0),
STF_t_plus_1 = numeric(0)
)
training_df[nrow(training_df) + 1, ] <-
c(0, 0, mean_trajectory$sales[[1]], 0, 0, 19) # week 0
week = 2
I have a simple function in R in which all I do is:
system.time({
for (r in 1:nrow(in_cluster)) {
item <- in_cluster[r,]
sale_row <-
dplyr::filter(real_sales, item_code == item$item_code)
if (nrow(sale_row) > 2) {
new_df <- data.frame(
LTF_t_minus_1 = mean_trajectory$sales[[week - 1]],
LTF_t = mean_trajectory$sales[[week]],
LTF_t_plus_1 = mean_trajectory$sales[[week + 1]],
RS_t_minus_1 = sale_row$sales[[week - 1]],
RS_t = sale_row$sales[[week]],
STF_t_plus_1 = sale_row$sales[[week + 1]]
)
training_df <-
bind_rows(training_df, new_df)
}
}
})
I am quite new to R, and this seems really weird given how small the data is: it takes 421.59 seconds to loop through just 500 rows of the data frame.
EDIT_IMPORTANT: However, for the dummy data set given above it took only 1.10 seconds to get the output. Could this be because item_code is a string? Does processing a string item_code take that much time? (I didn't use strings for the dummy data set because I don't know how to create 500 unique strings for item_code in in_cluster and use the same strings in real_sales.)
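For reference, a small sketch of building 500 unique string codes (the "IT…" names here are made up) and reusing them in real_sales:
codes <- sprintf("IT%04d", 1:500)  # "IT0001" .. "IT0500", all unique
in_cluster <- data.frame(item_code = codes, stringsAsFactors = FALSE)
in_cluster$cluster <- sample(5, size = nrow(in_cluster), replace = TRUE)
real_sales <- data.frame(
  item_code = sample(codes, size = 100000, replace = TRUE),  # same strings as in_cluster
  sales = sample(500, size = 100000, replace = TRUE),
  stringsAsFactors = FALSE
)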
I read through a few other articles that suggested ways to optimize R code, and switched to bind_rows instead of rbind, or alternatively:
training_df[nrow(training_df) + 1,] <-
c(mean_trajectory$sales[[week-1]], mean_trajectory$sales[[week]], mean_trajectory$sales[[week+1]], sale_row$sales[[week-1]], sale_row$sales[[week]], sale_row$sales[[week+1]])
Using bind_rows seems to have improved the performance by 36 seconds when looping through the 500 rows of the data frame in_cluster.
Is it possible to use lapply in this scenario? I tried the code below and got an error:
Error in filter_impl(.data, dots) : $ operator is invalid for
atomic vectors
myfun <- function(item, sales, mean_trajectory, week) {
sale_row<- filter(sales, item_code == item$item_code)
data.frame(
LTF_t_minus_1 = mean_trajectory$sales[[week-1]],
LTF_t = mean_trajectory$sales[[week]],
LTF_t_plus_1 = mean_trajectory$sales[[week+1]],
RS_t_minus_1 = sale_row$sales[[week-1]],
RS_t = sale_row$sales[[week]],
STF_t_plus_1 = sale_row$sales[[week+1]])
}
system.time({
lapply(in_cluster, myfun, sales= sales, mean_trajectory = mean_trajectory) %>% bind_rows()
})
Help with lapply would be appreciated; however, my main target is to speed up the loop.
Ok, so there are a lot of bad practices in your code.
You are operating per row
You are creating 2(!) new data frames per row (very expensive)
You are growing an object in a loop (that training_df <- bind_rows(training_df, new_df) keeps growing in each iteration while running a pretty expensive operation (bind_rows))
You are running the same operations over and over again when you could just run them once (why are you running mean_trajectory$sales[[week-1]] and friends per row when mean_trajectory has nothing to do with the loop? You could just assign them afterwards).
And the list goes on...
I would suggest an alternative, simple data.table solution which will perform much better. The idea is to first make a binary join between in_cluster and real_sales (running all the operations while joining, without creating extra data frames and then binding them). Then, run all the mean_trajectory-related lines only once. (I ignored the training_df[nrow(training_df) + 1, ] <- c(0, 0, mean_trajectory$sales[[1]], 0, 0, 19) initialization as it's irrelevant here; you can just add it afterwards using rbind.)
library(data.table) #v1.10.4
## First step
res <-
  setDT(real_sales)[setDT(in_cluster),                           # binary join
                    if (.N > 2) .(RS_t_minus_1 = sales[week - 1],  # the stuff you want to do
                                  RS_t = sales[week],              # by condition
                                  STF_t_plus_1 = sales[week + 1]),
                    on = "item_code",                            # the join key
                    by = .EACHI]                                 # do the operations per each join
## Second step (run the `mean_trajectory` only once)
res[, `:=`(LTF_t_minus_1 = mean_trajectory$sales[week - 1],
           LTF_t = mean_trajectory$sales[week],
           LTF_t_plus_1 = mean_trajectory$sales[week + 1])]
Some benchmarks:
### Creating your data sets
set.seed(123)
N <- 1e5
N2 <- 5e7
in_cluster <- data.frame(item_code = c(1:N))
real_sales <-
data.frame(
item_code = sample(N, size = N2, replace = TRUE),
sales = sample(N, size = N2, replace = TRUE)
)
mean_trajectory <- data.frame(sales = sample(N, size = 25, replace = TRUE))
training_df <- data.frame(
LTF_t_minus_1 = numeric(0),
LTF_t = numeric(0),
LTF_t_plus_1 = numeric(0),
RS_t_minus_1 = numeric(0),
RS_t = numeric(0),
STF_t_plus_1 = numeric(0)
)
week = 2
###############################
################# Your solution
system.time({
for (r in 1:nrow(in_cluster)) {
item <- in_cluster[r,, drop = FALSE]
sale_row <-
dplyr::filter(real_sales, item_code == item$item_code)
if (nrow(sale_row) > 2) {
new_df <- data.frame(
LTF_t_minus_1 = mean_trajectory$sales[[week - 1]],
LTF_t = mean_trajectory$sales[[week]],
LTF_t_plus_1 = mean_trajectory$sales[[week + 1]],
RS_t_minus_1 = sale_row$sales[[week - 1]],
RS_t = sale_row$sales[[week]],
STF_t_plus_1 = sale_row$sales[[week + 1]]
)
training_df <-
bind_rows(training_df, new_df)
}
}
})
### Ran forever - I killed it after half an hour
######################
########## My solution
library(data.table)
system.time({
res <-
setDT(real_sales)[setDT(in_cluster),
if(.N > 2) .(RS_t_minus_1 = sales[week - 1],
RS_t = sales[week],
STF_t_plus_1 = sales[week + 1]),
on = "item_code",
by = .EACHI]
res[, `:=`(LTF_t_minus_1 = mean_trajectory$sales[week - 1],
LTF_t = mean_trajectory$sales[week],
LTF_t_plus_1 = mean_trajectory$sales[week + 1])]
})
# user system elapsed
# 2.42 0.05 2.47
So for 50MM rows the data.table solution ran for about 2 seconds, while your solution ran endlessly until I killed it (after half an hour).
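As for the lapply() attempt in the question: it errored because lapply() over a data frame iterates over its columns (atomic vectors), so item$item_code was called on a vector. Iterating over row indices instead works, though it stays far slower than the data.table approach. A sketch reusing myfun as defined in the question (note that myfun omits the nrow(sale_row) > 2 guard, so this assumes every item has at least three sales rows):
# Iterate over row indices, not columns, passing each row as a 1-row data frame
rows <- lapply(seq_len(nrow(in_cluster)), function(r) {
  myfun(in_cluster[r, , drop = FALSE], real_sales, mean_trajectory, week)
})
training_df <- dplyr::bind_rows(rows)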

Example quaternion multiplication in R

My question about multiplying numbers and symbols in R was answered, and here I would like to give an example of using this for quaternion multiplication. Actually, I am using this on a much larger set (a group of 256 elements), but the principle is the same. I'm very new to working with data.tables, so any additional tips are appreciated.
library(data.table)
groupMult = data.table(
e = c("i","j","k", "e"),
i = c("-e","-k","j", "i"),
j = c("k","-e","-i", "j"),
k = c("-j","i","-e", "k")
);
row.names(groupMult) = c("i", "j", "k", "e");
setkey(groupMult);
# Find X*Y with X = 2i - 3j, Y = k - 4e
X = data.table(i = 2, j = -3);
Y = data.table(k = 1, e = -4);
# reduce groupMult to the vectors we need for multiplication
multMa = groupMult[names(X), names(Y), with = F];
# repeat values of Y ncol(X) times
multY = Y[rep(seq_len(nrow(Y)), each=ncol(X)),];
# repeat values of X ncol(Y) times
multX = t(X[rep(seq_len(nrow(X)), each=ncol(Y)),]);
# coefficient matrix
multMaNum = multY*multX;
row.names(multMaNum) = names(X);
# elementwise multiplication of multMaNum with multMa
res = mapply(paste, multMaNum, multMa, MoreArgs=list(sep='*') )
res[] <- sapply(res , function(x) sub("(.*)([-])(.*)", "\\2\\1\\3", x));
# collapse all elements of the data.table to get final result
res = paste(lapply(res, paste, collapse = " "), collapse = " + ");
> res
[1] "-2*j + -3*i + -8*i + 12*j"
