R: match rows of a matrices and get location specific info - r

Say you have a matrix M1 as such:
A B C D E F G H I J
353 1 0 1 0 0 1 0 0 1 1
288 1 0 1 0 0 1 1 0 1 1
275 1 0 1 0 1 1 0 0 1 1
236 0 0 1 0 0 1 0 0 1 1
235 0 0 1 0 0 1 1 0 1 1
227 1 0 1 0 1 1 1 0 1 1
the rownames are the values (they are not random they have meaning and it is what I want as I will explain).
Say you have another matrix M2 as such:
A B C D E F G H I J AA
[1,] 0 0 0 0 0 0 0 0 0 0 0
[2,] 1 0 0 0 0 0 0 0 0 0 0
[3,] 0 1 0 0 0 0 0 0 0 0 1
[4,] 1 1 0 0 0 0 0 0 0 0 0
[5,] 0 0 1 0 0 0 0 0 0 0 1
[6,] 1 0 1 0 0 0 0 0 0 0 0
Note A to J is the same number of cols, except the 2 new cols, AA
Now, I want something like:
for (i in 1:nrow(M2)){
if(M2[i,"AA"]==1){
#-1 since I M1 doesnt have the AA column
vec = M2[i,1:(ncol(M2)-1)]
#BELOW is what I am not sure of what how to implement
#get the rowname from M1 that matches vec, and replace M2[i,"AA"] = that value
}
}
The result should be 0, since in this example there are no rows of M1 matching any rows of M2[,A:J]

Related

Automatic subsetting of a dataframe on the basis of a prediction matrix

I have created a prediction matrix for large dataset as follows:
library(mice)
dfpredm <- quickpred(df, mincor=.3)
A B C D E F G H I J
A 0 1 1 1 0 1 0 1 1 0
B 1 0 0 0 1 0 1 0 0 1
C 0 0 0 1 1 0 0 0 0 0
D 1 0 1 0 0 1 0 1 0 1
E 0 1 0 1 0 1 1 0 1 0
**F 0 0 1 0 0 0 1 0 0 0**
G 0 1 0 1 0 0 0 0 0 0
H 1 0 1 0 0 1 0 0 0 1
I 0 1 0 1 1 0 1 0 0 0
J 1 0 1 0 0 1 0 1 0 0
I would like to create a subset of the original df on the basis on dfpredm.
More specifically I would like to do the following:
Let's assume that my dependent variable is F.
According to the prediction matrix F is correlated with C and G.
In addition, C and G are best predicted by D,E and B,D respectively.
The idea is now to create a subset of df based on the dependent variable F,for which in the F row the value is 1.
Fpredictors <- df[,(dfpredm["F",]) == 1]
But also do the same for the variables where the rows in F are 1. I am thinking of first getting the column names like this:
Fpredcol <-colnames(dfpredm[,(dfpredm["c241",]) == 1])
And then doing a for loop with these column names?
For the specific example I would like to end up with the subset.
dfsub <- df[,c("F","C","G","B","E","D")]
I would however like to automate this process. Could anyone show me how to do this?
Here is one strategy that seems like it would work for you:
first_preds <- function(dat, predictor) {
cols <- which(dat[predictor, ] == 1)
names(dat)[cols]
}
# wrap first_preds() for getting best and second best predictors
first_and_second_preds <- function(dat, predictor) {
matches <- first_preds(dat, predictor)
matches <- c(matches, unlist(lapply(matches, function(x) first_preds(dat, x))))
c(predictor, matches) %>% unique()
}
dat[first_and_second_preds(dat, "F")] # order is not exactly the same as your output
F C G D E B
A 1 1 0 1 0 1
B 0 0 1 0 1 0
C 0 0 0 1 1 0
D 1 1 0 0 0 0
E 1 0 1 1 0 1
F 0 1 1 0 0 0
G 0 0 0 1 0 1
H 1 1 0 0 0 0
I 0 0 1 1 1 1
J 1 1 0 0 0 0
Not sure if the ordering in the result is important, but you could add the logic if it is.
Using dat from here (a kinder way to share small R data on SO):
dat <- read.table(
text = "A B C D E F G H I J
A 0 1 1 1 0 1 0 1 1 0
B 1 0 0 0 1 0 1 0 0 1
C 0 0 0 1 1 0 0 0 0 0
D 1 0 1 0 0 1 0 1 0 1
E 0 1 0 1 0 1 1 0 1 0
F 0 0 1 0 0 0 1 0 0 0
G 0 1 0 1 0 0 0 0 0 0
H 1 0 1 0 0 1 0 0 0 1
I 0 1 0 1 1 0 1 0 0 0
J 1 0 1 0 0 1 0 1 0 0",
header = TRUE
)
Something a little more general that would let you use self_select predictors directly:
all_preds <- function(dat, predictors) {
unlist(lapply(predictors, function(x) names(dat)[which(dat[x, ] == 1 )]))
}
dat[all_preds(dat, c("A", "B"))]
B C D F H I A E G J
A 1 1 1 1 1 1 0 0 0 0
B 0 0 0 0 0 0 1 1 1 1
C 0 0 1 0 0 0 0 1 0 0
D 0 1 0 1 1 0 1 0 0 1
E 1 0 1 1 0 1 0 0 1 0
F 0 1 0 0 0 0 0 0 1 0
G 1 0 1 0 0 0 0 0 0 0
H 0 1 0 1 0 0 1 0 0 1
I 1 0 1 0 0 0 0 1 1 0

Creating an occurrence matrix for each char from a large number of equal length strings

I am trying to create a matrix which gives me the occurrence of each element at each position, based on a large number of strings in a vector.
I have the following pet example and potential solution:
set.seed(42)
seqs <- sapply(1:10, FUN = function(x) { paste(sample(LETTERS, size = 11, replace = T), collapse = "") })
test <- lapply(seqs, FUN = function(s) {
do.call(cbind, lapply(LETTERS, FUN = function(ch) {
grepl(ch, unlist(strsplit(s, split="")))
}))
})
testR <- Reduce("+", test)
seqs
# [1] "XYHVQNTDRSL" "SYGMYZDMOXD" "ZYCNKXLVTVK" "RAVAFXPJLAZ" "LYXQZQIJKUB" "TREGNRZTOWE" "HVSGBDFMFSA" "JNAPEJQUOGC" "CHRAFYYTINT"
#[10] "QQFFKYZTTNA"
testR
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23]
[1,] 0 0 1 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 1 1 0 0 0
[2,] 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 1 0
[3,] 1 0 1 0 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0
[4,] 2 0 0 0 0 1 2 0 0 0 0 0 1 1 0 1 1 0 0 0 0 1 0
[5,] 0 1 0 0 1 2 0 0 0 0 2 0 0 1 0 0 1 0 0 0 0 0 0
[6,] 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 1 0 0 0 0 0
[7,] 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0 1 1 0 0 1 0 0 0
[8,] 0 0 0 1 0 0 0 0 0 2 0 0 2 0 0 0 0 0 0 3 1 1 0
[9,] 0 0 0 0 0 1 0 0 1 0 1 1 0 0 3 0 0 1 0 2 0 0 0
[10,] 1 0 0 0 0 0 1 0 0 0 0 0 0 2 0 0 0 0 2 0 1 1 1
[11,] 2 1 1 1 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0
[,24] [,25] [,26]
[1,] 1 0 1
[2,] 0 4 0
[3,] 1 0 0
[4,] 0 0 0
[5,] 0 1 1
[6,] 2 2 1
[7,] 0 1 2
[8,] 0 0 0
[9,] 0 0 0
[10,] 1 0 0
[11,] 0 0 1
I am trying to force myself to not use loops but instead use the vectorised functions but I am not sure if my solution is actually a good (efficient) solution or if I have gotten confused somewhere. It is also fairly difficult to debug if real life data messes up somehow (which sadly is the case).
So my question, what is a good way to solve this problem?
EDIT: Following 989's track, I have done a benchmark test of the proposed solutions here, with a data size more representative of the problem at hand.
library(microbenchmark)
set.seed(42)
seqs <- sapply(1:10000, FUN = function(x) { paste(sample(LETTERS, size = 31, replace = T), collapse = "") })
f.posdef=function(){
test <- lapply(seqs, FUN = function(s) {
do.call(cbind, lapply(LETTERS, FUN = function(ch) {
grepl(ch, unlist(strsplit(s, split="")))
}))
})
(testR <- Reduce("+", test))
}
f.989=function() {
l <- lapply(seqs, function(x) {
m <- matrix(0, nchar(x), 26)
replace(m, cbind(seq(nchar(x)), match(strsplit(x, "")[[1]], LETTERS)), 1)
})
Reduce("+",l)
}
f.docendo1=function()
t(Reduce("+", lapply(strsplit(seqs, "", fixed = TRUE), function(xx)
table(factor(xx, levels = LETTERS), 1:31))))
f.docendo2=function()
t(table(do.call(cbind, strsplit(seqs, "", fixed = TRUE)), rep(1:31, 10000)))
f.akrun=function(){
strsplit(seqs, "") %>%
transpose %>%
map(unlist) %>%
setNames(seq_len(nchar(seqs[1]))) %>%
stack %>%
select(2:1) %>%
table
}
r <- f.posdef()
Note that the main difference between this benchmark and 989's is the sample size.
> all(r==f.989())
[1] TRUE
> all(r==f.docendo1())
[1] TRUE
> all(r==f.docendo2())
[1] TRUE
> all(r==f.akrun())
[1] FALSE
> res <- microbenchmark(f.posdef(), f.989(), f.docendo1(), f.docendo2(), f.akrun())
> autoplot(res)
As the plot shows, akrun's solution is blazing fast but seemingly inaccurate. Thus the gold medal goes to docendo's second solution. However it's probably worth noting that both solutions by docendo as well as the suggestion by 989 has assumptions regarding the length/number of the sample strings or the alphabet size in m <- matrix(0, nchar(x), 26)
In the case of size/length of sample strings (i.e. seqs) it would be an additional call to nchar which should not impact the runtime much. I am not sure how one would avoid making the assumption alphabet size, if this is not known a priori.
Here's another approach in base R which requires less looping than the approach by OP:
t(Reduce("+", lapply(strsplit(seqs, "", fixed = TRUE), function(xx)
table(factor(xx, levels = LETTERS), 1:11))))
# A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
# 1 0 0 1 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 1 1 0 0 0 1 0 1
# 2 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 1 0 0 4 0
# 3 1 0 1 0 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 1 0 0
# 4 2 0 0 0 0 1 2 0 0 0 0 0 1 1 0 1 1 0 0 0 0 1 0 0 0 0
# 5 0 1 0 0 1 2 0 0 0 0 2 0 0 1 0 0 1 0 0 0 0 0 0 0 1 1
# 6 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 1 0 0 0 0 0 2 2 1
# 7 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0 1 1 0 0 1 0 0 0 0 1 2
# 8 0 0 0 1 0 0 0 0 0 2 0 0 2 0 0 0 0 0 0 3 1 1 0 0 0 0
# 9 0 0 0 0 0 1 0 0 1 0 1 1 0 0 3 0 0 1 0 2 0 0 0 0 0 0
# 10 1 0 0 0 0 0 1 0 0 0 0 0 0 2 0 0 0 0 2 0 1 1 1 1 0 0
# 11 2 1 1 1 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1
Or, possibly more efficient:
t(table(do.call(cbind, strsplit(seqs, "", fixed = TRUE)), rep(1:nchar(seqs[1]), length(seqs))))
We can also use table once
library(tidyverse)
strsplit(seqs, "") %>%
transpose %>%
map(unlist) %>%
setNames(seq_len(nchar(seqs[1]))) %>%
stack %>%
select(2:1) %>%
table
# values
#ind A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
# 1 0 0 1 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 1 1 0 0 0 1 0 1
# 2 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 1 0 0 4 0
# 3 1 0 1 0 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 1 0 0
# 4 2 0 0 0 0 1 2 0 0 0 0 0 1 1 0 1 1 0 0 0 0 1 0 0 0 0
# 5 0 1 0 0 1 2 0 0 0 0 2 0 0 1 0 0 1 0 0 0 0 0 0 0 1 1
# 6 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 1 0 0 0 0 0 2 2 1
# 7 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0 1 1 0 0 1 0 0 0 0 1 2
# 8 0 0 0 1 0 0 0 0 0 2 0 0 2 0 0 0 0 0 0 3 1 1 0 0 0 0
# 9 0 0 0 0 0 1 0 0 1 0 1 1 0 0 3 0 0 1 0 2 0 0 0 0 0 0
# 10 1 0 0 0 0 0 1 0 0 0 0 0 0 2 0 0 0 0 2 0 1 1 1 1 0 0
# 11 2 1 1 1 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1
Or slightly more compact is by using mtabulate from qdapTools
library(qdapTools)
strsplit(seqs, "") %>%
transpose %>%
map(unlist) %>%
mtabulate
# A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
#1 0 0 1 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 1 1 0 0 0 1 0 1
#2 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 1 0 0 4 0
#3 1 0 1 0 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 1 0 0
#4 2 0 0 0 0 1 2 0 0 0 0 0 1 1 0 1 1 0 0 0 0 1 0 0 0 0
#5 0 1 0 0 1 2 0 0 0 0 2 0 0 1 0 0 1 0 0 0 0 0 0 0 1 1
#6 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 1 0 0 0 0 0 2 2 1
#7 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0 1 1 0 0 1 0 0 0 0 1 2
#8 0 0 0 1 0 0 0 0 0 2 0 0 2 0 0 0 0 0 0 3 1 1 0 0 0 0
#9 0 0 0 0 0 1 0 0 1 0 1 1 0 0 3 0 0 1 0 2 0 0 0 0 0 0
#10 1 0 0 0 0 0 1 0 0 0 0 0 0 2 0 0 0 0 2 0 1 1 1 1 0 0
#11 2 1 1 1 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1
You could go for match in base R:
l <- lapply(seqs, function(x) {
m <- matrix(0, nchar(x), 26)
replace(m, cbind(seq(nchar(x)), match(strsplit(x, "")[[1]], LETTERS)), 1)
})
all(Reduce("+",l)==testR)
#[1] TRUE
BENCHMARKING (I did not include #akrun's answers as I do not want to install the required packages)
library(microbenchmark)
set.seed(42)
seqs <- sapply(1:10, FUN = function(x) { paste(sample(LETTERS, size = 11, replace = T), collapse = "") })
fOP=function(){
test <- lapply(seqs, FUN = function(s) {
do.call(cbind, lapply(LETTERS, FUN = function(ch) {
grepl(ch, unlist(strsplit(s, split="")))
}))
})
(testR <- Reduce("+", test))
}
f989=function() {
l <- lapply(seqs, function(x) {
m <- matrix(0, nchar(x), 26)
replace(m, cbind(seq(nchar(x)), match(strsplit(x, "")[[1]], LETTERS)), 1)
})
Reduce("+",l)
}
fdocendo.discimus=function()
t(Reduce("+", lapply(strsplit(seqs, "", fixed = TRUE), function(xx)
table(factor(xx, levels = LETTERS), 1:11))))
fdocendo.discimus1=function()
t(table(do.call(cbind, strsplit(seqs, "", fixed = TRUE)), rep(1:11, 10)))
r <- fOP()
all(r==f989())
# [1] TRUE
all(r==fdocendo.discimus())
# [1] TRUE
all(r==fdocendo.discimus1())
# [1] TRUE
res <- microbenchmark(fOP(), f989(), fdocendo.discimus(), fdocendo.discimus1())
print(res, order="mean")
# Unit: microseconds
# expr min lq mean median uq max neval
# f989() 135.813 150.8360 205.3294 154.1415 159.700 4968.565 100
# fdocendo.discimus1() 391.813 405.1845 447.6911 418.2545 445.146 2418.480 100
# fdocendo.discimus() 943.775 990.9495 1090.9905 1015.5880 1062.311 3996.245 100
# fOP() 1486.725 1521.4280 1643.1604 1548.9215 1602.104 5782.838 100

Building a symmetric binary matrix

I have a matrix that is for example like this:
rownames V1
a 1
c 3
b 2
d 4
y 2
q 4
i 1
j 1
r 3
I want to make a Symmetric binary matrix that it's dimnames of that is the same as rownames of above matrix. I want to fill these matrix by 1 & 0 in such a way that 1 indicated placing variables that has the same number in front of it and 0 for the opposite situation.This matrix would be like
dimnames
a c b d y q i j r
a 1 0 0 0 0 0 1 1 0
c 0 1 0 0 0 0 0 0 1
b 0 0 1 0 1 0 0 0 0
d 0 0 0 1 0 1 0 0 0
y 0 0 1 0 1 0 0 0 0
q 0 0 0 1 0 1 0 0 0
i 1 0 0 0 0 0 1 1 0
j 1 0 0 0 0 0 1 1 0
r 0 1 0 0 0 0 0 0 1
Anybody know how can I do that?
Use dist:
DF <- read.table(text = "rownames V1
a 1
c 3
b 2
d 4
y 2
q 4
i 1
j 1
r 3", header = TRUE)
res <- as.matrix(dist(DF$V1)) == 0L
#alternatively:
#res <- !as.matrix(dist(DF$V1))
#diag(res) <- 0L #for the first version of the question, i.e. a zero diagonal
res <- +(res) #for the second version, i.e. to coerce to an integer matrix
dimnames(res) <- list(DF$rownames, DF$rownames)
# 1 2 3 4 5 6 7 8 9
#1 1 0 0 0 0 0 1 1 0
#2 0 1 0 0 0 0 0 0 1
#3 0 0 1 0 1 0 0 0 0
#4 0 0 0 1 0 1 0 0 0
#5 0 0 1 0 1 0 0 0 0
#6 0 0 0 1 0 1 0 0 0
#7 1 0 0 0 0 0 1 1 0
#8 1 0 0 0 0 0 1 1 0
#9 0 1 0 0 0 0 0 0 1
You can do this using table and crossprod.
tcrossprod(table(DF))
# rownames
# rownames a b c d i j q r y
# a 1 0 0 0 1 1 0 0 0
# b 0 1 0 0 0 0 0 0 1
# c 0 0 1 0 0 0 0 1 0
# d 0 0 0 1 0 0 1 0 0
# i 1 0 0 0 1 1 0 0 0
# j 1 0 0 0 1 1 0 0 0
# q 0 0 0 1 0 0 1 0 0
# r 0 0 1 0 0 0 0 1 0
# y 0 1 0 0 0 0 0 0 1
If you want the row and column order as they are found in the data, rather than alphanumerically, you can subset
tcrossprod(table(DF))[DF$rownames, DF$rownames]
or use factor
tcrossprod(table(factor(DF$rownames, levels=unique(DF$rownames)), DF$V1))
If your data is large or sparse, you can use the sparse matrix algebra in xtabs, with similar ways to change the order of the resulting table as before.
Matrix::tcrossprod(xtabs(data=DF, ~ rownames + V1, sparse=TRUE))

merge two rows of n dimension by considering one column

I have a matrix, that has been formed after using cbind()
! ? c e i k l t
dif 0 0 1 0 0 0
dor 1 0 0 0 0 0
dor 0 0 0 0 0 1
same 0 0 0 1 0 0
same 0 1 0 0 0 0
Suggest me a code in R that could merge the rows as below
! ? c e i k l t
same 1 1 0 1 0 0
dif 0 0 1 0 0 0
dor 1 0 0 0 0 1
Thank you..
df<-read.table(header=T,text="ID c e i k l t
dif 0 0 1 0 0 0
dor 1 0 0 0 0 0
dor 0 0 0 0 0 1
same 0 0 0 1 0 0
same 0 1 0 0 0 0")
require(plyr)
ddply(df,.(ID),function(x)colSums(x[,-1]))
ID c e i k l t
1 dif 0 0 1 0 0 0
2 dor 1 0 0 0 0 1
3 same 0 1 0 1 0 0
Command acknowledged:
aggregate(df[, -1], list(df[, 1]), function(x) {
Reduce("|", x)
})
# Group.1 c e i k l t
# 1 dif 0 0 1 0 0 0
# 2 dor 1 0 0 0 0 1
# 3 same 0 1 0 1 0 0
Do you want the sum, or do you want the logical OR:
Logical OR:
require(functional)
aggregate(. ~ ID, data=df, FUN=Compose(any, as.numeric))
ID c e i k l t
1 dif 0 0 1 0 0 0
2 dor 1 0 0 0 0 1
3 same 0 1 0 1 0 0
Sum:
aggregate(. ~ ID, data=df, FUN=sum)
The result here is the same.

Change a long table to wide table

Suppose I have a long table like this:
A <- rep(c("a","b","c","d"),each=4)
B <- rep(c("e","f","g","h"),4)
C <- rep(c("i","j"),8)
D <- rnorm(16)
df <- data.frame(A,B,C,D)
head(df)
A B C D
1 a e i -0.18984508
2 a f j -1.82703822
3 a g i -0.17307580
4 a h j -1.38104238
5 b e i 0.08699983
6 b f j -0.36442461
I would like to change to long table to a wide format so that each element in column A and B is a title of a column. Each row should be a 1 or 0 indicating if elements exists. Column C and D remains the same. The desired table is something like this:
C D a b e f g h
i -0.18984508 1 0 1 0 0 0
j -1.82703822 1 0 0 1 0 0
i -0.17307580 1 0 0 0 1 0
j -1.38104238 1 0 0 0 0 1
i 0.08699983 0 1 1 0 0 0
j -0.36442461 0 1 0 1 0 0
This is a form of reshaping which can be done with the reshape2 package.
library("reshape2")
dcast(melt(df, id.vars=c("C", "D")), C+D~value, fun.aggregate=length)
which gives
C D a b c d e f g h
1 i -1.44485242 0 1 0 0 0 0 1 0
2 i -0.80834639 0 0 0 1 0 0 1 0
3 i -0.15202085 0 0 0 1 1 0 0 0
4 i -0.05626233 1 0 0 0 1 0 0 0
5 i 0.12031754 1 0 0 0 0 0 1 0
6 i 0.62206658 0 0 1 0 0 0 1 0
7 i 0.77101891 0 1 0 0 1 0 0 0
8 i 1.38752097 0 0 1 0 1 0 0 0
9 j -2.52137154 0 0 0 1 0 0 0 1
10 j -0.53231537 0 1 0 0 0 0 0 1
11 j -0.30178539 1 0 0 0 0 0 0 1
12 j -0.29823112 1 0 0 0 0 1 0 0
13 j -0.12988540 0 1 0 0 0 1 0 0
14 j 0.00517754 0 0 1 0 0 1 0 0
15 j 0.51452289 0 0 1 0 0 0 0 1
16 j 0.53260223 0 0 0 1 0 1 0 0
The order is not the same as the original data set, but if that is important put an order column in, carry it through, and then sort on it at the end.

Resources