I have this table with two columns, major_activity_area and word_stem.
major_activity_area                       word_stem
Youth Development                         program
Youth Development                         girl
Youth Development                         youth
Youth Development                         school
Religion Related Spiritual Development    service
Religion Related Spiritual Development    provid
Religion Related Spiritual Development    program
Religion Related Spiritual Development    hous
What I want is to turn each major_activity_area value into its own column, with the word_stem values listed under the matching column, such as:
Youth Development    Religion Related Spiritual Development
program              servic
girl                 provid
youth                program
school               hous
I would appreciate any help! :)
Try the transpose function t().
Since you did not give a sample dataset via dput(), I'll just create a dummy data frame from mtcars:
library(magrittr)  # provides the %>% pipe
df.1 <- mtcars %>%
  head(10)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Using the transpose function in R, t(dataframe):
df.2 <- t(df.1)
This turns the row names into column headers (and returns a matrix rather than a data frame), giving the result below:
Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout Valiant Duster 360 Merc 240D
mpg 21.00 21.000 22.80 21.400 18.70 18.10 14.30 24.40
cyl 6.00 6.000 4.00 6.000 8.00 6.00 8.00 4.00
disp 160.00 160.000 108.00 258.000 360.00 225.00 360.00 146.70
hp 110.00 110.000 93.00 110.000 175.00 105.00 245.00 62.00
drat 3.90 3.900 3.85 3.080 3.15 2.76 3.21 3.69
wt 2.62 2.875 2.32 3.215 3.44 3.46 3.57 3.19
qsec 16.46 17.020 18.61 19.440 17.02 20.22 15.84 20.00
vs 0.00 0.000 1.00 1.000 0.00 1.00 0.00 1.00
am 1.00 1.000 1.00 0.000 0.00 0.00 0.00 0.00
gear 4.00 4.000 4.00 3.000 3.00 3.00 3.00 4.00
carb 4.00 4.000 1.00 1.000 2.00 1.00 4.00 2.00
Merc 230 Merc 280
mpg 22.80 19.20
cyl 4.00 6.00
disp 140.80 167.60
hp 95.00 123.00
drat 3.92 3.92
wt 3.15 3.44
qsec 22.90 18.30
vs 1.00 1.00
am 0.00 0.00
gear 4.00 4.00
carb 2.00 4.00
If this is not what you're looking for, kindly share a sample dataset using dput() so that I can produce exactly what you want.
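For reference, here is a small base R check of what t() does to a data frame. It is only a sketch on the same mtcars sample; the object name df.3 is illustrative and not part of the answer above.

```r
# First ten rows of the built-in mtcars data frame
df.1 <- head(mtcars, 10)

# t() coerces the data frame to a matrix and swaps rows and columns
df.2 <- t(df.1)

is.matrix(df.2)                             # the result is a matrix
identical(colnames(df.2), rownames(df.1))   # row names became column names
dim(df.2)                                   # variables in rows, cars in columns

# Wrap the result in as.data.frame() if a data frame is needed again
df.3 <- as.data.frame(df.2)
```

Note that t() swaps whole rows and columns; it does not group values by a key column.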
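Assuming your data is a tibble or data frame, the function pivot_wider() from the tidyr package (loaded with the tidyverse) would be a solution. I named your data df and then called the function. The code looks like this:
library(tidyverse)
# values_fn = list keeps every stem of an area in a list-column, which unnest() then expands;
# this works as written only when every area has the same number of stems
dfw <- pivot_wider(df, names_from = 'major_activity_area',
                   values_from = 'word_stem', values_fn = list) %>%
  unnest(cols = everything())
dfw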
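If loading the tidyverse is not an option, the same reshaping can be sketched in base R. This is an alternative to the pivot_wider() answer rather than a restatement of it: df is rebuilt here from the table in the question, and the padding step is an added assumption to cover areas with different numbers of stems.

```r
# Sample data rebuilt from the question
df <- data.frame(
  major_activity_area = rep(c("Youth Development",
                              "Religion Related Spiritual Development"), each = 4),
  word_stem = c("program", "girl", "youth", "school",
                "service", "provid", "program", "hous"),
  stringsAsFactors = FALSE
)

# One character vector of stems per activity area
groups <- split(df$word_stem, df$major_activity_area)

# Pad shorter groups with NA so that all columns have the same length
n <- max(lengths(groups))
padded <- lapply(groups, function(v) c(v, rep(NA_character_, n - length(v))))

# check.names = FALSE keeps the spaces in the column names
wide <- data.frame(padded, check.names = FALSE, stringsAsFactors = FALSE)
wide
```

The columns come out in alphabetical order of the area names, because split() sorts the grouping values.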
I would like to pass quoted variables in the group argument of geom_col_wrap to the split_group function.
# I deleted the rest of the function for readability
geom_col_wrap = function(data, mapping, group, ...) {
data |>
split_group(group)
}
# This function was based on the `tidytable` package
split_group = function(data, ...) {
by_quote = as.list(substitute(...()))
by = sapply(by_quote, deparse)
split = vctrs::vec_split(data, data[c(by)])
out = split[["val"]]
names = do.call(paste, c(split[["key"]], sep = "_"))
names(out) = names
return(out)
}
split_group uses substitute() to quote variables, and this is where the problem lies. How can I make split_group recognize the quoted variables passed through the group argument? I know this is easy to solve with rlang, but I need a base R solution.
split_group(mtcars, vs, am)
$`0_1`
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
...
$`1_1`
mpg cyl disp hp drat wt qsec vs am gear carb
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
...
$`1_0`
mpg cyl disp hp drat wt qsec vs am gear carb
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
...
$`0_0`
mpg cyl disp hp drat wt qsec vs am gear carb
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
...
geom_col_wrap(
mtcars,
mapping = ggplot2::aes(x = cyl, y = hp, color = am),
group = c(vs, am)
)
Error in `[.data.frame`(data, c(by)) : undefined columns selected
This error comes from as.list(substitute(...())). It does not unquote the group argument. Why?
Note: I cannot use the dots argument to solve the problem.
Using the miraculous ...() idiom; an explanation is given here.
split_group <- \(x, ...) split(x, x[, sapply(substitute(...()), as.character)])
split_group(mtcars, vs, am)
# $`0.0`
# mpg cyl disp hp drat wt qsec vs am gear carb
# Hornet Sportabout 18.7 8 360.0 175 3.15 3.44 17.02 0 0 3 2
# Duster 360 14.3 8 360.0 245 3.21 3.57 15.84 0 0 3 4
# Merc 450SE 16.4 8 275.8 180 3.07 4.07 17.40 0 0 3 3
# ...
#
# $`1.0`
# mpg cyl disp hp drat wt qsec vs am gear carb
# Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
# Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
# ...
#
# $`0.1`
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# Porsche 914-2 26 4 120.3 91 4.43 2.140 16.70 0 1 5 2
# ...
#
# $`1.1`
# mpg cyl disp hp drat wt qsec vs am gear carb
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
# Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
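To see what the ...() idiom captures, the following sketch isolates that step. The helper name capture_dots is hypothetical and only serves the illustration.

```r
# substitute(...()) returns the arguments passed through the dots as
# unevaluated expressions, so vs and am are never looked up as variables
capture_dots <- function(...) substitute(...())

exprs <- capture_dots(vs, am)

# Convert the captured symbols to column names
cols <- sapply(exprs, as.character)
cols

# The names can then be used for ordinary column selection
head(mtcars[, cols], 3)
```

This only works for bare column names; an expression such as c(vs, am) would be captured as a single call, which is why passing it through another function's argument needs extra handling.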
You basically need the base R version of rlang's {{ group }} or !!enquo(group) workflow: use substitute() to grab your group argument, then use .(group) inside bquote().
However, bquote() only helps you build the expression; we then need eval() to evaluate the new expression.
Another thing: you're using deparse() in split_group(), which would convert c(vs, am) to the single string "c(vs, am)". Instead we'll need to mimic tidyselect so you can use c()-style selection (which also still works without c() for a single column).
Put together, it looks like this.
split_group = function(data, ...) {
by_quote = as.list(substitute(...()))
# Mimic tidyselect
cols = as.list(seq_along(data))
names(cols) = names(data)
by = unlist(lapply(by_quote, eval, cols))
split = vctrs::vec_split(data, data[c(by)])
out = split[["val"]]
names = do.call(paste, c(split[["key"]], sep = "_"))
names(out) = names
return(out)
}
geom_col_wrap = function(data, mapping, group, ...) {
# Use substitute/bquote to "unquote" group arg inside split_group function
# Much like using `{{ group }}` or `!!enquo(group)` in rlang
group = substitute(group)
eval(bquote(
data |>
split_group(.(group))
))
}
geom_col_wrap(
mtcars,
mapping = ggplot2::aes(x = cyl, y = hp, color = am),
group = c(vs, am)
)
#> $`0_1`
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#> Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#> Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
#>
#> $`1_1`
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#> Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#> Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#> Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#> Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
#>
#> $`1_0`
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#>
#> $`0_0`
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#> Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#> Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#> Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#> Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#> Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#> Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
#> AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
#> Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#> Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
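The two mechanisms used in this answer can also be checked in isolation. The sketch below is not the answer's code: wrapper is a hypothetical stand-in for geom_col_wrap that returns the built call instead of evaluating it.

```r
# 1. bquote() splices the captured expression into a new call
wrapper <- function(group) {
  group <- substitute(group)             # captures c(vs, am) unevaluated
  bquote(split_group(data, .(group)))    # builds the call without running it
}
built <- wrapper(c(vs, am))
deparse(built)

# 2. Evaluating the captured expression against a list of column positions,
#    which is how the tidyselect-style lookup in split_group() works
cols <- as.list(seq_along(mtcars))
names(cols) <- names(mtcars)
idx <- eval(quote(c(vs, am)), cols)      # vs and am resolve to their positions
names(mtcars)[idx]
```

Because column names resolve to integer positions, both c(vs, am) and a single bare name such as vs yield a valid index for data[...].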
Any reason you can't use rlang? vctrs depends on rlang so you're already sort of using it anyway.
I have a tibble
a <- tribble(~names,"|david:123|",)
and I've seen code that does the following, but I'm not sure what it does.
a %>% split(.$names)
split() splits the data frame based on the values in a column. You have provided only one row of data, which is not helpful for demonstrating what it does. Let's consider the built-in mtcars dataset.
The unique values in cyl column of mtcars dataset are 6, 4, 8.
unique(mtcars$cyl)
#[1] 6 4 8
When we use mtcars %>% split(.$cyl), it divides the mtcars dataset into a list of length 3, where each element holds the rows for one unique cyl value.
temp <- mtcars %>% split(.$cyl)
temp[[1]]
# mpg cyl disp hp drat wt qsec vs am gear carb
#Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#...
temp[[2]]
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#...
temp[[3]]
# mpg cyl disp hp drat wt qsec vs am gear carb
#Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#...
As we can see, temp[[1]] has all the rows where cyl = 4, temp[[2]] has the rows where cyl = 6, and temp[[3]] has all the rows where cyl = 8.
Similarly, in your case, a %>% split(.$names) splits the data frame/tibble into a list with one element per unique value of names. .$names extracts the names column from the data frame.
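The same behaviour can be sketched on data shaped like the question's tibble. The extra rows and the id column are made up for illustration, and a plain data frame with base split() is used so that no package is needed.

```r
# Hypothetical data: two distinct values in the names column
a <- data.frame(
  names = c("|david:123|", "|anna:456|", "|david:123|"),
  id    = 1:3,
  stringsAsFactors = FALSE
)

# Equivalent to a %>% split(.$names)
parts <- split(a, a$names)

names(parts)                    # one list element per unique value
parts[["|david:123|"]]          # all rows whose names value is "|david:123|"
```

With the single-row tibble from the question, the result is simply a list of length one containing that row.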
How can I iterate the Pearson correlation test over the columns (the [i] below marks where I think the iteration should go), and how can I store the results in a data frame assigned to a variable?
Code example:
*INSTEAD OF DOING THIS*
F.ReedBunting.pear<- cor.test(W_farmland_mean$Years,W_farmland_mean$ReedBunting,method='pearson')
F.Whitethroat.pear<- cor.test(W_farmland_mean$Years,W_farmland_mean$Whitethroat,method='pearson')
F.Rook.pear<- cor.test(W_farmland_mean$Years,W_farmland_mean$Rook,method='pearson')
.
.
.
*HOW CAN IT BE DONE QUICKLY WITH THIS*
workspaceone <- sapply(W_farmland_mean, function(x){
cor.test(W_farmland_mean$Years, W_farmland_mean[, 1[i]], method = 'pearson')
})
I think you should try:
result_cor <- apply(W_farmland_mean,2,function(x){cor.test(W_farmland_mean$Years,x, method = 'pearson')$estimate})
It extracts the Pearson coefficient for the comparison of each column with the Years column of your dataset.
Example
With the mtcars dataset:
df <- mtcars[c(1:10),]
> df
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
And if we apply the function:
result_cor = apply(df,2, function(x){cor.test(x,df$mpg,method ='pearson')$estimate})
And you get the following output:
> result_cor
mpg cyl disp hp drat wt qsec
1.0000000 -0.8614165 -0.7739868 -0.8937223 0.5413585 -0.5991894 0.5494131
vs am gear carb
0.4796102 0.2919683 0.6646449 -0.3711956
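Since the question also asks for the results as a data frame, one possible extension is sketched below on the same mtcars sample. The column names variable, estimate and p_value are illustrative choices, and mpg stands in for the Years column.

```r
df <- mtcars[1:10, ]

# Every column except the reference column
others <- setdiff(names(df), "mpg")

# Run cor.test() per column and keep one row of results for each
result_df <- do.call(rbind, lapply(others, function(v) {
  ct <- cor.test(df$mpg, df[[v]], method = "pearson")
  data.frame(variable = v,
             estimate = unname(ct$estimate),
             p_value  = ct$p.value,
             stringsAsFactors = FALSE)
}))

result_df
```

Using lapply() over column names, rather than apply() over the data frame, avoids the coercion to a matrix and so keeps working if some other columns are not numeric (as long as they are excluded from others).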
I am splitting a large data frame into smaller data frames of 5000 records each. After performing the rbind operation on each subsample, I want to shuffle the subsample data. When I try to shuffle the data, it neither throws an error nor shuffles the data. Can anyone help me reshuffle the data?
# splitting the dataframe into smaller dataframes
test_list <-split(New_data_zero, (seq(nrow(New_data_zero))-1) %/% 5000)
# performing the rbind to add data for all the data frames
for (i in 1: length(test_list)){
test_list[[i]] <- rbind(test_list[[i]],New_data)
}
# Trying to shuffle each subsample, but the operation has no visible effect
for (i in 1: length(test_list)){
test_list[[i]] <- test_list[[i]][sample(1:nrow(test_list[[i]])),]
}
Try this
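myfun <- function(df, numobs) {
  # one group label per row; length.out keeps the label vector as long as the data,
  # which avoids a warning when nrow(df) is not a multiple of numobs
  groups <- rep(1:ceiling(nrow(df)/numobs), each = numobs, length.out = nrow(df))
  sdf <- split(df, groups)
  lapply(sdf, function(x) x[sample(nrow(x)),])
}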
set.seed(1)
myfun(mtcars, 5)
Output
$`1`
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
$`2`
mpg cyl disp hp drat wt qsec vs am gear carb
Merc 280 19.2 6 167.6 123 3.92 3.44 18.30 1 0 4 4
Duster 360 14.3 8 360.0 245 3.21 3.57 15.84 0 0 3 4
Merc 230 22.8 4 140.8 95 3.92 3.15 22.90 1 0 4 2
Valiant 18.1 6 225.0 105 2.76 3.46 20.22 1 0 3 1
Merc 240D 24.4 4 146.7 62 3.69 3.19 20.00 1 0 4 2
etc
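To confirm that a shuffle only reorders rows within each chunk, a check along the following lines may help. It is a sketch with a hypothetical helper name, shuffle_chunks, and uses the integer-division grouping from the question instead of rep().

```r
shuffle_chunks <- function(df, size) {
  # Rows 1..size get label 0, the next size rows get label 1, and so on
  grp <- (seq_len(nrow(df)) - 1) %/% size
  # drop = FALSE keeps each chunk a data frame even with a single column
  lapply(split(df, grp), function(x) x[sample(nrow(x)), , drop = FALSE])
}

set.seed(42)
chunks <- shuffle_chunks(mtcars, 5)

# Each chunk still holds exactly the rows it started with
setequal(rownames(chunks[[1]]), rownames(mtcars)[1:5])

# Compare the order before and after to see whether rows moved
rbind(original = rownames(mtcars)[1:5], shuffled = rownames(chunks[[1]]))
```

Comparing row names before and after is a quick way to tell whether a shuffle happened, since printed values alone can look similar.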
I have seen many questions that touch on this topic but haven't yet found an answer. If I have missed a question that does answer this question, please do mark this and point us to the question.
Scenario: We have a benchmark dataset, we have imputation methods, we systematically delete values from the benchmark and use two different imputation methods. Thus we have a benchmark, imputedData1 and imputedData2.
Question: Is there a function that produces a number representing the difference between the benchmark and imputedData1, and/or between the benchmark and imputedData2? I.e., function(benchmark, imputedData1) = 3.3 and function(benchmark, imputedData2) = 2.8.
Note: Datasets are numerical, datasets are the same size, and the method should work at the data level if possible (i.e. not creating a regression and comparing regressions, unless it can work with ANY numerical dataset).
Reproducible datasets (the imputed ones differ from the benchmark only in the first row):
benchmark:
> head(mtcars,n=10)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
imputedData1:
> head(mtcars,n=10)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 22.0 4 108.0 100 3.90 2.200 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
imputedData2:
> head(mtcars,n=10)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 18.0 6 112.0 105 3.90 2.620 16.46 0 0 3 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
I have tried to use RMSE (root mean squared error) but it didn't work very well so I am trying to find other ways to tackle this problem.
You could also check out package ftsa. It has about 20 error measures that can be calculated. In your case, a scaled error would make sense as the units differ from column to column.
library(ftsa)
error(forecast=unlist(imputedData1),true=unlist(bench),
insampletrue = unlist(bench), method = "mase")
[1] 0.035136
error(forecast=unlist(imputedData2),true=unlist(bench),
insampletrue = unlist(bench), method = "mase")
[1] 0.031151
data
bench <- read.table(text='mpg cyl disp hp drat wt qsec vs am gear carb
21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4',header=TRUE,stringsAsFactors=FALSE)
imputedData1 <- read.table(text='mpg cyl disp hp drat wt qsec vs am gear carb
22.0 4 108.0 100 3.90 2.200 16.46 0 1 4 4
21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4',header=TRUE,stringsAsFactors=FALSE)
imputedData2 <- read.table(text='mpg cyl disp hp drat wt qsec vs am gear carb
18.0 6 112.0 105 3.90 2.620 16.46 0 0 3 4
21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4',header=TRUE,stringsAsFactors=FALSE)
One possible way is to calculate a norm of their difference and prefer the imputation method that minimises this value. There are different matrix norms for different purposes. I'll point you to Wikipedia as a starting point: https://en.wikipedia.org/wiki/Matrix_norm.
In the absence of any specifics about your data I can't really say which you should choose but one method could be to create your own index that averages across different matrix norms and select the imputation method that minimizes this average. Or you could just eyeball them and with any luck one of the methods is a clear winner across most or all matrix norms.
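As one concrete instance of this idea, the Frobenius norm of the difference can be computed with base R's norm(). The sketch below rebuilds the three datasets from the question by editing the first row of head(mtcars, 10); the choice of the Frobenius norm is an assumption for illustration, not a recommendation from the answer.

```r
bench <- head(mtcars, 10)

# Imputed versions differ from the benchmark only in the first row
imputedData1 <- bench
imputedData1["Mazda RX4", c("mpg", "cyl", "disp", "hp", "wt")] <- c(22, 4, 108, 100, 2.2)

imputedData2 <- bench
imputedData2["Mazda RX4", c("mpg", "disp", "hp", "am", "gear")] <- c(18, 112, 105, 0, 3)

# Frobenius norm: square root of the sum of squared cell differences
frob <- function(imputed, benchmark) {
  norm(as.matrix(imputed - benchmark), type = "F")
}

d1 <- frob(imputedData1, bench)
d2 <- frob(imputedData2, bench)
c(imputedData1 = d1, imputedData2 = d2)
```

On unscaled data, columns with large units such as disp dominate the norm, so standardising the columns first may be preferable.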
A simple implementation of what was discussed in the comments that gives a result with same order of magnitude as P Lapointe's answer, just FYI.
library(magrittr)
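# id1 and bm presumably stand for imputedData1 and the benchmark data
center_and_reduce_df <- function(df,bm){
  # subtract the benchmark column means, then divide by the benchmark column sds
  centered <- mapply('-',df,sapply(bm,mean)) %>% as.data.frame(stringsAsFactors= FALSE)
  reduced <- mapply('/',centered,sapply(bm,sd)) %>% as.data.frame(stringsAsFactors= FALSE)
  reduced
}
# as.matrix() is needed because mean() does not accept a data frame
mean(as.matrix((center_and_reduce_df(id1,bm) - center_and_reduce_df(bm,bm))^2)) # 0.03083166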
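A self-contained variant of the same idea, using base R's scale() instead of mapply(), might look like the sketch below. The function name scaled_mse and the rebuilt datasets are assumptions for illustration, and no claim is made that it reproduces the figure quoted above.

```r
bench <- head(mtcars, 10)
imputedData1 <- bench
imputedData1["Mazda RX4", c("mpg", "cyl", "disp", "hp", "wt")] <- c(22, 4, 108, 100, 2.2)

scaled_mse <- function(imputed, benchmark) {
  # Standardise both datasets with the benchmark's column means and sds
  mu    <- colMeans(benchmark)
  sigma <- apply(benchmark, 2, sd)
  z_imp <- scale(as.matrix(imputed),   center = mu, scale = sigma)
  z_ben <- scale(as.matrix(benchmark), center = mu, scale = sigma)
  # Mean squared difference over all cells, now unit-free
  mean((z_imp - z_ben)^2)
}

mse1 <- scaled_mse(imputedData1, bench)
mse1
```

Standardising by the benchmark's own statistics puts every column on a comparable scale, so no single variable dominates the score.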
Not quite sure what you mean by "difference", but if you just want to know how much the corresponding cells differ on average (given that the matrices have the same shape and identical columns/rows), you could use the absolute difference, the Euclidean distance, or the Kolmogorov-Smirnov distance, depending again on what you mean by "difference".
abs(head(mtcars) - (head(mtcars)*0.5)) # differences by cell
mean( as.matrix(abs(head(mtcars) - (head(mtcars)*0.5)))) # mean abs difference
dist( t(data.frame(as.vector(as.matrix(head(mtcars))), (as.vector(as.matrix(head(mtcars)*0.5)))))) # Euclidean; remove t() to see element by element
ks.test( as.vector(as.matrix(head(mtcars))), (as.vector(as.matrix(head(mtcars)*0.5))))$statistic # K-S