Different behaviours between base::cbind and dplyr::bind_cols

When combining a data frame and a vector whose length differs from the number of rows, bind_cols gives an error, whereas cbind repeats rows. Why is this?
(And is it really wise to have that as a default behavior of cbind?)
See example data below.
# Example data
library(dplyr) # provides tibble() and bind_cols()
x10 <- c(1:10)
y10 <- c(1:10)
xy10 <- tibble(x10, y10)
z20 <- c(1:20)
# get an error
xyz20 <- dplyr::bind_cols(xy10, z20)
# why does cbind repeat rows of xy10 to suit z20?
xyz20 <- cbind(xy10, z20)
xyz20

base::cbind is a generic function. Its behavior differs between matrices and data frames.
For matrices, it warns if the objects have different numbers of rows (see the Note below).
cbind(as.matrix(xy10), z20)
# x10 y10 z20
# [1,] 1 1 1
# [2,] 2 2 2
# [3,] 3 3 3
# [4,] 4 4 4
# [5,] 5 5 5
# [6,] 6 6 6
# [7,] 7 7 7
# [8,] 8 8 8
# [9,] 9 9 9
#[10,] 10 10 10
#Warning message:
#In cbind(as.matrix(xy10), z20) :
# number of rows of result is not a multiple of vector length (arg 2)
But for data frames, it actually creates a data frame from scratch. So the following two calls are equivalent, both giving a data frame of 20 rows:
cbind(xy10, z20)
## in this way, R's recycling rule steps in
data.frame(xy10[, 1], xy10[, 2], z20)
From ?cbind:
The ‘cbind’ data frame method is just a wrapper for ‘data.frame(..., check.names = FALSE)’. This means that it will split matrix columns in data frame arguments, and convert character columns to factors unless ‘stringsAsFactors = FALSE’ is specified.
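The recycling described above can be checked with base R alone. The following is a minimal sketch on made-up data (df, wide and short are illustrative names, not from the question): a vector whose length is a whole multiple of the row count is recycled, while any other length is rejected with an error.

```r
# Illustrative data, not taken from the question
df <- data.frame(x = 1:10)

# 20 is a whole multiple of 10, so the rows of df are recycled
wide <- cbind(df, z = 1:20)
nrow(wide)

# 20 is not a multiple of 3, so data.frame() refuses to recycle
short <- data.frame(x = 1:3)
res <- tryCatch(cbind(short, z = 1:20), error = function(e) e)
inherits(res, "error")
```

So cbind on a data frame is silent only when the lengths line up as whole multiples.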
Note: In non-data.frame cases, matrices are not allowed to grow bigger. Only vectors will be recycled or truncated.
## handling two vectors
## vector of shorter length is recycled
cbind(1:2, 1:4)
# [,1] [,2]
#[1,] 1 1
#[2,] 2 2
#[3,] 1 3
#[4,] 2 4
## handling two matrices
## has strict requirement on dimensions
cbind(as.matrix(1:2), as.matrix(1:4))
#Error in cbind(as.matrix(1:2), as.matrix(1:4)) :
# number of rows of matrices must match (see arg 2)
## handling a matrix and a vector
## vector of shorter length is recycled
cbind(1:2, as.matrix(1:4))
# [,1] [,2]
#[1,] 1 1
#[2,] 2 2
#[3,] 1 3
#[4,] 2 4
## handling a matrix and a vector
## vector of longer length is truncated
cbind(as.matrix(1:2), 1:4)
# [,1] [,2]
#[1,] 1 1
#[2,] 2 2
#Warning message:
#In cbind(as.matrix(1:2), 1:4) :
# number of rows of result is not a multiple of vector length (arg 2)
From ?cbind:
If there are several matrix arguments, they must all have the same number of rows....
If all the arguments are vectors, ..., values in shorter arguments are recycled to achieve this length...
When the arguments consist of a mix of matrices and vectors, the number of rows of the result is determined by the number of rows of the matrix arguments... vectors... are recycled or subsetted to achieve this length.

Related

How to manipulate large arrays R

I have a large array with dimensions data[1:10,1:50,1:1000]. I would like to swap out the 5th row of all the matrices with new data with the dimensions new_data[1,1:50,1:1000].
So far I have tried to pull the array apart and put it back together:
data1<-data[1:4,1:50,1:1000]
data2<-data[6:10,1:50,1:1000]
combined_data<-rbind(data1,new_data,data2)
However, rbind doesn't seem to be appropriate here and returns a large matrix rather than a large array with dimensions [1:10,1:50,1:1000].
On request here is a simple example:
vec1<-1:4
vec2<-c(1,2,2,4,1,2,2,4)
data_array<-array(c(vec1,vec2),dim=c(4,3,10))
data_array[,,1] # visualizing one of the 10 matrices - say the error is in row 3 where we would expect all 3s
new_data<-array(c(3,3,3),dim=c(1,3,10))
new_data[,,1] # correct data that we want to swap into row 3 of all the matrices
array2<-data_array[1:2,,] #correct data from original array
array3<-array(data_array[4,,],dim=c(1,3,10)) #correct data from original array
combined_data <- rbind(array2,new_data,array3) # attempting to combine the pieces with new_data in the correct row
However this results in a data with the dimensions [1:3,1:60], where I am aiming for the exact same dimensions as the original data_array ([1:4,1:3,1:10]) but with the new_data swapped in at row 3 of each matrix

Try abind from the "abind" package.
library(abind)
array4 <- abind(array2,new_data,along=1)
final_data <- abind(array4,array3,along=1)
The reference is as follows:
http://math.furman.edu/~dcs/courses/math47/R/library/abind/html/abind.html

Since an array is really just a vector with dimensions, you can replace every 4th value (the number of rows in each stratum), starting at the 3rd value (the row you want to replace), with the new_data
data_array[seq(3, by=dim(data_array)[1], to=length(data_array))] <- new_data
data_array
#, , 1
#
# [,1] [,2] [,3]
#[1,] 1 1 1
#[2,] 2 2 2
#[3,] 3 3 3
#[4,] 4 4 4
#
#, , 2
#
# [,1] [,2] [,3]
#[1,] 1 1 1
#[2,] 2 2 2
#[3,] 3 3 3
#[4,] 4 4 4
#...
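As a hedged alternative to the vector-index arithmetic above, the row can also be addressed directly with array subscripts. The sketch below rebuilds the example data from the question and writes into a copy (fixed is an illustrative name), assuming new_data has a first dimension of length one as in the question.

```r
# Rebuild the example array from the question
vec1 <- 1:4
vec2 <- c(1, 2, 2, 4, 1, 2, 2, 4)
data_array <- array(c(vec1, vec2), dim = c(4, 3, 10))
new_data <- array(c(3, 3, 3), dim = c(1, 3, 10))

# Assign into row 3 of every slice; both sides are 3 x 10 once the
# length-one first dimension of new_data is dropped by new_data[1, , ]
fixed <- data_array
fixed[3, , ] <- new_data[1, , ]

dim(fixed)
fixed[, , 1]
```

The dimensions of the array are unchanged, and only row 3 of each slice is overwritten.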

Delete the rows of matrix with Inf [duplicate]

I have three lists with Inf as a numeric and "NaN" as a character variable.
v1<-list(1,Inf,3,4,5,6,"NaN")
v2<-list(1,"NaN",3,4,Inf,6,7)
v3<-list(1,2,3,4,5,6,"NaN")
For the moment I can make a matrix with cbind (A), but the desired result is shown in B).
A) What I got:
matrix<-cbind(v1,v2,v3)
v1 v2 v3
[1,] 1 1 1
[2,] Inf "NaN" 2
[3,] 3 3 3
[4,] 4 4 4
[5,] 5 Inf 5
[6,] 6 6 6
[7,] "NaN" 7 "NaN"
B) I want to get:
v1 v2 v3
[1,] 1 1 1
[2,] 3 3 3
[3,] 4 4 4
[4,] 6 6 6
Context:
I wanted to export some results located in 3 lists into a .txt file; the easiest way for me was to use cbind to get a matrix and use
write.table(matrix, file="mymatrix.txt", row.names=FALSE, col.names=FALSE)

Couple of suggestions regarding the problem.
1) It is better not to name a matrix object as matrix.
2) NaN or NA have a special meaning and are not character strings. By quoting "NaN", it becomes difficult to apply the built-in functions is.nan/is.na to do any manipulations. So, we have to resort to ==/!=
3) It is not clear why the individual lists are combined with cbind into a matrix.
Based on the input data, we can loop through the columns of 'matrix' with apply, then loop through each of the list elements with sapply and flag every element that is not finite or is the string "NaN". rowSums counts the flagged elements per row, and negating the result (! converts 0 to TRUE and any other count to FALSE) gives TRUE only for rows in which every element is finite. Use that logical index to subset the rows.
matrix[!rowSums(apply(matrix, 2, FUN = function(x)
sapply(x, function(y) !(is.finite(y) & y !="NaN")))),]
# v1 v2 v3
#[1,] 1 1 1
#[2,] 3 3 3
#[3,] 4 4 4
#[4,] 6 6 6
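Another way to look at the same problem, offered as a sketch rather than as the answerer's method, is to convert the list-matrix to a numeric matrix first and then rely on is.finite alone. The lists below follow the printed matrix in the question; m, num and clean are illustrative names.

```r
# Lists as shown in the printed matrix of the question
v1 <- list(1, Inf, 3, 4, 5, 6, "NaN")
v2 <- list(1, "NaN", 3, 4, Inf, 6, 7)
v3 <- list(1, 2, 3, 4, 5, 6, "NaN")
m <- cbind(v1, v2, v3)

# Flatten the list-matrix (column-major) and convert to numeric;
# the strings "NaN" and "Inf" become the numeric NaN and Inf
num <- matrix(as.numeric(unlist(m)), nrow = nrow(m), dimnames = dimnames(m))

# Keep only the rows in which every value is finite
clean <- num[rowSums(!is.finite(num)) == 0, , drop = FALSE]
clean
```

Because the result is a plain numeric matrix, it can be passed to write.table without the quoted "NaN" strings.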

returning matrix column indices matching value(s) in R

I'm looking for a fast way to return the indices of columns of a matrix that match values provided in a vector (ideally of length 1 or the same as the number of rows in the matrix)
for instance:
mat <- matrix(1:100,10)
values <- c(11,2,23,12,35,6,97,3,9,10)
the desired function, which I call rowMatches() would return:
rowMatches(mat, values)
[1] 2 1 3 NA 4 1 10 NA 1 1
Indeed, value 11 is first found at the 2nd column of the first row, value 2 appears at the 1st column of the 2nd row, value 23 is at the 3rd column of the 3rd row, value 12 is not in the 4th row... and so on.
Since I haven't found any solution in package matrixStats, I came up with this function:
rowMatches <- function(mat,values) {
  res <- integer(nrow(mat))
  matches <- mat == values
  for (col in ncol(mat):1) {
    res[matches[,col]] <- col
  }
  res[res==0] <- NA
  res
}
For my intended use, there will be millions of rows and few columns. So splitting the matrix into rows (in a list called, say, rows) and calling Map(match, as.list(values), rows) would be way too slow.
But I'm not satisfied by my function because there is a loop, which may be slow if there are many columns. It should be possible to use apply() on columns, but it won't make it faster.
Any ideas?

res <- arrayInd(match(values, mat), .dim = dim(mat))
res[res[, 1] != seq_len(nrow(res)), 2] <- NA
# [,1] [,2]
# [1,] 1 2
# [2,] 2 1
# [3,] 3 3
# [4,] 2 NA
# [5,] 5 4
# [6,] 6 1
# [7,] 7 10
# [8,] 3 NA
# [9,] 9 1
#[10,] 10 1

Roland's answer is good, but I'll post an alternative solution:
res <- which(mat==values, arr.ind = TRUE)
res <- res[match(seq_len(nrow(mat)), res[,1]), 2]
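To check that the which()-based approach agrees with the loop from the question, both can be wrapped in functions and compared on the example data. This is only a sketch; the names rowMatchesLoop and rowMatchesWhich are made up for the comparison.

```r
mat <- matrix(1:100, 10)
values <- c(11, 2, 23, 12, 35, 6, 97, 3, 9, 10)

# Loop version from the question
rowMatchesLoop <- function(mat, values) {
  res <- integer(nrow(mat))
  matches <- mat == values
  for (col in ncol(mat):1) {
    res[matches[, col]] <- col
  }
  res[res == 0] <- NA
  res
}

# which()-based version from the answer above
rowMatchesWhich <- function(mat, values) {
  hits <- which(mat == values, arr.ind = TRUE)
  # which() reports hits in column-major order, so the first hit
  # found for each row is the one in the lowest column
  unname(hits[match(seq_len(nrow(mat)), hits[, 1]), 2])
}

loop_res <- rowMatchesLoop(mat, values)
which_res <- rowMatchesWhich(mat, values)
all.equal(loop_res, which_res)
```

Which of the two is faster on millions of rows would need to be benchmarked on real data.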

Delete specific values in a matrix according to two position vectors

My aim is to delete specific positions in a matrix according to a vector. Just giving you a small example.
Users_pos <- c(1,2)
Items_pos <- c(3,2)
Given a Matrix A:
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
My aim, according to the two vectors Users_pos and Items_pos, is to delete the following values
A[1,3] and A[3,2]
I'm wondering if there's a possibility to do so without typing in the values for rows and columns by hand.

You can index k elements in a matrix A using A[X], where X is a k-row, 2-column matrix where each row is the (row, col) value of the indicated element. Therefore, you can index your two elements in A with the following indexing matrix:
rbind(Users_pos, Items_pos)
# [,1] [,2]
# Users_pos 1 2
# Items_pos 3 2
Using this indexing, you could choose to extract the information currently stored with A[X] or replace those elements with A[X] <- new.values. If you, for instance, wanted to replace these elements with NA, you could do:
A[rbind(Users_pos, Items_pos)] <- NA
A
# [,1] [,2] [,3]
# [1,] 1 NA 3
# [2,] 4 5 6
# [3,] 7 NA 9
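If the two vectors are instead meant as a vector of row indices and a vector of column indices (an assumption; the question does not say so outright), cbind pairs them element-wise where rbind does not. A minimal sketch with the same matrix, where idx is an illustrative name:

```r
A <- matrix(1:9, nrow = 3, byrow = TRUE)
Users_pos <- c(1, 2)  # assumed here to be row indices
Items_pos <- c(3, 2)  # assumed here to be column indices

# cbind() makes one (row, col) pair per row: (1, 3) and (2, 2)
idx <- cbind(Users_pos, Items_pos)
A[idx]

A[idx] <- NA
A
```

Which of the two pairings is wanted depends on what the position vectors represent.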

Saving vectors of different lengths in a matrix/data frame

I have a numeric called area of length 166860. This consists of 412 different elements, most of length 405 and some of length 809. I have their start and end ids.
My goal is to extract them and put them in a matrix/data frame with 412 columns
Right now, I'm trying this code:
m = matrix(NA,ncol=412, nrow=809)
for (j in 1:412){
  temp.start = start.ids[j]
  temp.end = end.ids[j]
  m[,j] = area[temp.start:temp.end]
}
But I just end up with this error message:
"Error in m[, j] = area[temp.start:temp.end] :
number of items to replace is not a multiple of replacement length"

Here's a quite easy approach:
Example data:
area <- c(1:4, 1:5, 1:6, 1:3)
# [1] 1 2 3 4 1 2 3 4 5 1 2 3 4 5 6 1 2 3
start.ids <- which(area == 1)
# [1] 1 5 10 16
end.ids <- c(which(area == 1)[-1] - 1, length(area))
# [1] 4 9 15 18
Create a list with one-row matrices:
mats <- mapply(function(x, y) t(area[seq(x, y)]), start.ids, end.ids, SIMPLIFY = FALSE)
# [[1]]
# [,1] [,2] [,3] [,4]
# [1,] 1 2 3 4
#
# [[2]]
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 2 3 4 5
#
# [[3]]
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 2 3 4 5 6
#
# [[4]]
# [,1] [,2] [,3]
# [1,] 1 2 3
Use the function rbind.fill.matrix from the plyr package to create the matrix and transpose it (t):
library(plyr)
m <- t(rbind.fill.matrix(mats))
# [,1] [,2] [,3] [,4]
# 1 1 1 1 1
# 2 2 2 2 2
# 3 3 3 3 3
# 4 4 4 4 NA
# 5 NA 5 5 NA
# 6 NA NA 6 NA

You are creating a matrix with 412 columns, each of length 809 (nrow), and matrix columns cannot vary in length. This means the value you assign to a column must either have length 809 or a shorter length that divides 809 evenly, so that it can be recycled to fill the column. A segment of length 405 does neither, hence the error. From the manual on ?matrix:
If there are too few elements in data to fill the matrix, then the elements in data are recycled. If data has length zero, NA of an appropriate type is used for atomic vectors (0 for raw vectors) and NULL for lists.
As another commenter said, you may have intended to store one series per row, in which case the matrix needs nrow=412 and ncol=809 and m[j, ] is the way to assign. Either way, you have to pad the value you are assigning with NA so that it is always of length 809. Keeping your 809 x 412 layout:
m = matrix(NA,ncol=412, nrow=809)
for (j in 1:412){
  temp.start = start.ids[j]
  temp.end = end.ids[j]
  val <- area[temp.start:temp.end]
  # pad the segment with NA up to the column length of 809
  m[, j] = c(val, rep(NA, 809 - length(val)))
}

How about this? I've manufactured some sample data:
#here are the random sets of numbers - length either 408 or 809
nums<-lapply(1:412,function(x)runif(sample(c(408,809),1)))
#this represents your numeric (one list of all the numbers)
nums.vec<-unlist(nums)
#get data about the series (which you have)
nums.lengths<-sapply(nums,function(x)length(x))
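# each series starts right after the previous one ends, so drop the last length, not the first
nums.starts<-cumsum(c(1,head(nums.lengths,-1)))
nums.ends<-nums.starts+nums.lengths-1
new.vec<-unlist(lapply(1:412,function(x){
  v<-nums.vec[nums.starts[x]:nums.ends[x]]
  c(v,rep(0,(809-length(v))))
}))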
matrix(new.vec,ncol=412)

What about
m[j,] = area[temp.start:temp.end]
?
Edit:
a <- area[temp.start:temp.end]
m[1:length(a),j] <- a

Maybe others have better answers. As I see it, you have two options:
Change m[,j] to m[1:length(area[temp.start:temp.end]),j] and then you will not get an error but you would have some NA's left.
Use a list of matrices instead, so you would get different dimensions for each matrix.
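The padding idea that runs through the answers above can also be sketched without a loop or an extra package. The example reuses the small area vector from the first answer. The start and end ids are written out by hand, max.len is an illustrative name, and `length<-` is used to pad each segment with NA.

```r
area <- c(1:4, 1:5, 1:6, 1:3)
start.ids <- c(1, 5, 10, 16)
end.ids <- c(4, 9, 15, 18)

# Length of the longest segment; every column is padded to this length
max.len <- max(end.ids - start.ids + 1)

m <- mapply(function(s, e) {
  seg <- area[s:e]
  length(seg) <- max.len  # extends seg with NA up to max.len
  seg
}, start.ids, end.ids)

dim(m)
m
```

Since all padded segments have the same length, mapply simplifies the result to a matrix with one column per segment.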
