I need to build a matrix from data stored in several other matrices, all of which have a pointer (key) in their first column. This is how the original matrices might look: a-e are the pointers connecting the data across the matrices, and v-z are the data that is linked together. The arrow points to what I want my final matrix to look like.
a x x
b y y
c z z
d w w
e v v
e v v
d w w
c z z
b y y
a x x
----->
x x x x
y y y y
z z z z
w w w w
v v v v
I can't seem to write the right algorithm to do this; I am getting either "subscript out of bounds" or "replacement has length zero" errors. Here is what I have now, but it is not working.
for(i in 1:length(matlist)){
  tempmatrix = matlist[[i]] # list of matrices to be combined
  genMatrix[1,i] = tempmatrix[1,2]
  for(j in 2:length(tempmatrix[,1])){
    # the row index for the data that needs to be matched with an ECID
    index = which(indexv == tempmatrix[j,1])
    for(k in 1:length(tempmatrix[1,])){
      genMatrix[index,k+i] = tempmatrix[j,k]
    }
    # places the data in same row as the ecid
  }
}
print(genMatrix)
EDIT: I just want to clarify that my example only shows two matrices but in the list matlist there can be any number of matrices. I need to find a way of merging them without having to know how many matrices are in matlist at the time.
We can merge all the matrices in the list using Reduce and merge from base R.
as.matrix(read.table(text="a x x
b y y
c z z
d w w
e v v")) -> mat1
as.matrix(read.table(text="e v v
d w w
c z z
b y y
a x x")) -> mat2
as.matrix(read.table(text="e x z
d z w
c w v
b y x
a v y")) -> mat3
matlist <- list(mat1=mat1, mat2=mat2, mat3=mat3)
Reduce(function(m1, m2) merge(m1, m2, by = "V1", all.x = TRUE),
matlist)[,-1]
#> V2.x V3.x V2.y V3.y V2 V3
#> 1 x x x x v y
#> 2 y y y y y x
#> 3 z z z z w v
#> 4 w w w w z w
#> 5 v v v v x z
Created on 2019-06-05 by the reprex package (v0.3.0)
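When the keys are unique within each matrix, as in the example, the same row alignment can also be sketched with match() and cbind(), which keeps the result a character matrix. This is only a rough sketch: the helper name align_by_key is made up for illustration, and it assumes every key of the first matrix appears exactly once in each of the other matrices (a missing key would give a row of NA).

```r
# Hypothetical helper: reorder every matrix to the key order of the first
# matrix, drop the key column, and bind the value columns side by side.
align_by_key <- function(mats) {
  keys <- mats[[1]][, 1]
  cols <- lapply(mats, function(m) {
    # match() gives, for each key, the row of m that holds it
    m[match(keys, m[, 1]), -1, drop = FALSE]
  })
  do.call(cbind, cols)
}

# Small illustration with two matrices whose rows are in opposite order
mat1 <- matrix(c("a", "b", "c",
                 "x", "y", "z",
                 "x", "y", "z"), ncol = 3)
mat2 <- mat1[3:1, ]

res <- align_by_key(list(mat1, mat2))
print(res)
```

Because the list is processed with lapply(), the number of matrices does not need to be known in advance.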
Or we can append all the matrices together and then use tidyr to go from long to wide and get the desired output.
library(tidyr)
library(dplyr)
bind_rows(lapply(matlist, as.data.frame), .id = "mat") %>%
gather(matkey, val, c("V2","V3")) %>%
unite(matkeyt, mat, matkey, sep = ".") %>%
spread(matkeyt, val) %>%
select(-V1)
#> mat1.V2 mat1.V3 mat2.V2 mat2.V3 mat3.V2 mat3.V3
#> 1 x x x x v y
#> 2 y y y y y x
#> 3 z z z z w v
#> 4 w w w w z w
#> 5 v v v v x z
Created on 2019-06-06 by the reprex package (v0.3.0)
Related
The problem I have using tensorflow is as follows:
For one tensor X with dims n X m
X = [[x11,x12...,x1m],[x21,x22...,x2m],...[xn1,xn2...,xnm]]
I want to get an n X m X m tensor, i.e. a stack of n matrices, each of size m X m.
Each m X m matrix is the result of:
tf.math.greater(tf.reshape(x,(-1,1)), x) where x is a row of X
In words, for every row k in X, I'm trying to get the pairs i,j where xki > xkj. This gives me one m X m matrix per row, and I then want to stack those matrices along the first axis to get an n X m X m cube.
Example:
X = [[1,2],[4,3],[5,7]]
Result = [[[False, False],[True, False]],[[False, True],[False, False]], [[False, False],[True, False]]]
Result has shape 3 X 2 X 2
Reshaping each row is the same as reshaping all rows. Try this:
import tensorflow as tf

def fun(X):
    n, m = X.shape
    X1 = tf.expand_dims(X, -1)      # shape (n, m, 1)
    X2 = tf.reshape(X, (n, 1, m))   # shape (n, 1, m)
    return tf.math.greater(X1, X2)  # broadcasts to shape (n, m, m)

X = tf.Variable([[1,2],[4,3], [5,7]])
print(fun(X))
Output:
tf.Tensor(
[[[False False]
  [ True False]]

 [[False  True]
  [False False]]

 [[False False]
  [ True False]]], shape=(3, 2, 2), dtype=bool)
I am trying to merge a list of matrices all by the first column like this:
a x x
a q q
b y y
c z z
d w w x x x x
e v v q q q q
e r r y y y y
----------> z z z z
a x x w w w w
a q q v v v v
b y y r r r r
c z z
d w w
e v v
e r r
I would like to use the first column to combine the matrices, but it does not need to be in the resulting matrix. What is challenging me is that there are multiple instances of the same value in the first column (a and e).
I have been looking around but have been unable to find any solutions that account for repeated values in the column that the matrices are being joined on. With my current code (shown below) I get something like:
x x x x
q q q q
x x x x
q q q q
x x x x
q q q q
y y y y
z z z z
w w w w
v v v v
r r r r
v v v v
r r r r
v v v v
r r r r
I can't seem to find out why the duplicate rows are appearing, but it has something to do with the length of the list, so I am assuming it takes place in the merge function.
mergeM <- function(list){ # list is a list of matrices
  len = length(list)
  mat = merge(list[[1]],list[[2]],by.x = "V1", by.y = "V1", all = TRUE)
  if(len >2){
    for(i in 3:len){
      mat = merge(mat,list[[i]],by.x = "V1", by.y = "V1", all = TRUE)
    }
  }
  mat = mat[,-1]
  return(mat)
} # end function
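A likely cause of the extra rows: merge() pairs every row of one input with every row of the other input that has the same key, so a key that occurs twice on both sides produces 2 x 2 rows, and the count keeps growing with each further matrix. One possible workaround, sketched below under assumptions, is to number the occurrences of each key and merge on the key plus that occurrence number. The helper names add_occurrence and merge_by_key are made up for illustration, and the sketch assumes that the k-th "a" in one matrix should line up with the k-th "a" in every other matrix.

```r
# Hypothetical helper: convert a matrix to a data frame and tag each row with
# its occurrence number within its key (first "a" -> 1, second "a" -> 2, ...).
add_occurrence <- function(m) {
  df <- as.data.frame(m, stringsAsFactors = FALSE)
  names(df)[1] <- "V1"
  df$occ <- ave(seq_len(nrow(df)), df$V1, FUN = seq_along)
  df
}

# Hypothetical helper: merge any number of matrices on key + occurrence.
merge_by_key <- function(mats) {
  dfs <- lapply(mats, add_occurrence)
  # Prefix value columns per matrix so merge() never has to invent suffixes
  for (i in seq_along(dfs)) {
    vals <- setdiff(names(dfs[[i]]), c("V1", "occ"))
    names(dfs[[i]])[match(vals, names(dfs[[i]]))] <- paste0("m", i, ".", vals)
  }
  out <- Reduce(function(a, b) merge(a, b, by = c("V1", "occ"), all = TRUE), dfs)
  # Drop the helper columns; only the value columns are wanted
  out[, setdiff(names(out), c("V1", "occ")), drop = FALSE]
}

# Illustration: keys "a" and "e" each occur twice
m <- matrix(c("a", "a", "b", "e", "e",
              "x", "q", "y", "v", "r",
              "x", "q", "y", "v", "r"), ncol = 3)
res <- merge_by_key(list(m, m))
print(res)
```

With the occurrence number as part of the join key, each key combination is unique again, so the result has one row per input row instead of one row per pairing.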
I have a data set for MMA bouts.
The structure currently is
Fighter 1, Fighter 2, Winner
x y x
x y x
x y x
x y x
x y x
My problem is that Fighter 1 is always the Winner, so my model will learn that Fighter 1 always wins.
I need to be able to randomly swap Fighter 1 and Fighter 2 for half the data set so that the winner is represented equally in both columns.
Ideally I would have this:
Fighter 1, Fighter 2, Winner
x y x
y x x
x y y
y x x
x y y
Is there a way to randomise across columns without messing up the order of the rows?
I'm assuming your xs and ys are arbitrary and just placeholders. I'll further assume that you need the Winner column to stay the same, you just need that the winner not always be in the first column.
Sample data:
set.seed(42)
x <- data.frame(
F1 = sample(letters, size = 5),
F2 = sample(LETTERS, size = 5),
stringsAsFactors = FALSE
)
x$W <- x$F1
x
# F1 F2 W
# 1 x N x
# 2 z S z
# 3 g D g
# 4 t P t
# 5 o W o
Choose some rows to change, randomly:
(ind <- sample(nrow(x), size = ceiling(nrow(x)/2)))
# [1] 3 5 4
This means that we expect rows 3-5 to change.
Now the random changes:
within(x, { tmp <- F1[ind]; F1[ind] = F2[ind]; F2[ind] = tmp; rm(tmp); })
# F1 F2 W
# 1 x N x
# 2 z S z
# 3 D g g
# 4 P t t
# 5 W o o
Rows 1-2 still show the F1 as the Winner, and rows 3-5 show F2 as the Winner.
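Note that within() returns a modified copy, so the result has to be assigned to keep it. As an alternative sketch, the swap can be written as a single indexed assignment that updates the data frame directly. The data frame bouts and its values below are made up for illustration.

```r
set.seed(1)

# Made-up bouts in which the winner always starts out in column F1
bouts <- data.frame(
  F1 = c("a", "b", "c", "d"),
  F2 = c("w", "x", "y", "z"),
  stringsAsFactors = FALSE
)
bouts$W <- bouts$F1

# Pick half of the rows at random
ind <- sample(nrow(bouts), size = ceiling(nrow(bouts) / 2))

# Swap the two fighter columns for those rows only; W is left untouched
bouts[ind, c("F1", "F2")] <- bouts[ind, c("F2", "F1")]
print(bouts)
```

The right-hand side is assigned by position, so the selected rows receive F2 in the F1 column and F1 in the F2 column, while the row order stays the same.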
I also found that this code worked:
matches_clean[, c("fighter1", "fighter2")] <- lapply(matches_clean[, c("fighter1", "fighter2")], as.character)
changeInd <- !!((match(matches_clean$fighter1, levels(as.factor(matches_clean$fighter1))) -
match(matches_clean$fighter2, levels(as.factor(matches_clean$fighter2)))) %% 2)
matches_clean[changeInd, c("fighter1", "fighter2")] <- matches_clean[changeInd, c("fighter2", "fighter1")]
When converting long data to wide, how do I provide multiple columns to the timevar argument of reshape?
`reshape(DT, idvar="Cell", timevar = "n1", direction="wide")`
For example, something like timevar=c("n1","n2"....)
DT<-data.table(Cell = c("A","A","B","B"), n1=c("x","y","y","a"), n2=c("t","x","x","z"))
Cell n1 n2
1: A x t
2: A y x
3: B y x
4: B a z
but I need output like below:
Cell n1 n2 n3 n4
A x y t NA
B x y a z
The order of elements in the n1, n2, n3 columns of the output doesn't matter; only the unique elements from the n1 and n2 columns are required. Also, I have multiple columns like n1, n2, n3, ..., n in my actual DT.
Here is a rough concept that seems to achieve the desired result.
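foo <- function(x, y, n) {
  l <- as.list(unique(c(x, y)))
  # pad with NA so that every group returns exactly n columns
  if (length(l) < n) l[(length(l)+1):n] <- NA_character_
  l
}
# assign the result so it can be renamed below; the outer parentheses print it
(DTw <- DT[, foo(n1, n2, 4), Cell])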
# Cell V1 V2 V3 V4
# 1: A x y t <NA>
# 2: B y a x z
# Set the names by reference
setnames(DTw, c("Cell", paste0("n", 1:4)))
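Since the actual table has more than two value columns, here is a rough base R sketch of the same idea that takes every column other than Cell, so the columns do not have to be listed by hand. It uses a plain data.frame rather than a data.table, and the names value_cols, vals and wide are made up for illustration.

```r
DT <- data.frame(
  Cell = c("A", "A", "B", "B"),
  n1 = c("x", "y", "y", "a"),
  n2 = c("t", "x", "x", "z"),
  stringsAsFactors = FALSE
)

# Every column except the id column is treated as a value column
value_cols <- setdiff(names(DT), "Cell")

# Unique values per Cell, pooled across all value columns
vals <- lapply(split(DT[value_cols], DT$Cell),
               function(d) unique(unlist(d, use.names = FALSE)))

# Pad each group with NA up to the widest group, one row per Cell
width <- max(lengths(vals))
wide <- t(vapply(vals,
                 function(v) c(v, rep(NA_character_, width - length(v))),
                 character(width)))
colnames(wide) <- paste0("n", seq_len(width))
print(wide)
```

The output width is taken from the data here instead of being passed in as n, which is a design choice of this sketch and not part of the answer above.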
I am looking for an idiomatic way to join a column, say named 'x', which exists in every data.frame element of a list. I came up with a solution with two steps by using lapply and Reduce. The second attempt trying to use only Reduce failed. Can I actually use only Reduce with one anonymous function to do this?
#data
xs <- replicate(5, data.frame(x=sample(letters, 10, T), y =runif(10)), simplify = FALSE)
# This works, but may be still unnecessarily long
otmap = lapply(xs, function(df) df$x)
jotm = Reduce(c, otmap)
# This does not count as another solution:
jotm = Reduce(c, lapply(xs, function(df) df$x))
# Try to use only Reduce function. This produces an error
jotr =Reduce(function(a,b){c(a$x,b$x)}, xs)
# Error in a$x : $ operator is invalid for atomic vectors
We can unlist after extracting the 'x' column
unlist(lapply(xs, `[[`, 'x'))
#[1] b y y i z o q w p d f f z b h m c u f s j e i v y b w j n q e w i r h p z q f x a b v z e x l c q f
#Levels: b d i o p q w y z c f h m s u e j n v r x a l
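As for why the Reduce-only attempt fails: after the first step the accumulator a is already an atomic vector, not a data frame, so a$x is no longer valid. A minimal sketch of a Reduce-only version, assuming the x columns are character vectors (stringsAsFactors = FALSE is set explicitly here), is to pass an empty vector as the initial value so that only the second argument is ever a data frame:

```r
set.seed(1)
xs <- replicate(
  3,
  data.frame(x = sample(letters, 4, TRUE), y = runif(4),
             stringsAsFactors = FALSE),
  simplify = FALSE
)

# acc is always the vector built so far; df is the next data frame in the list
jotr <- Reduce(function(acc, df) c(acc, df$x), xs, character(0))
print(jotr)
```

The third argument of Reduce is the init value, which is what makes the accumulator and the list elements play different roles.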