I would like to cbind different vectors of non-multiple length. The shorter ones are to be (partly) recycled as in vanilla cbind:
cbind(c(1,2,3),c(4,5))
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 4
Warning message:
In cbind(c(1, 2, 3), c(4, 5)) :
number of rows of result is not a multiple of vector length (arg 2)
The result is as desired, except for the warning. Since I want to use this in an extension, is there a way to suppress the warning or, better, does anyone know a straightforward solution that produces the same result with no warning? -- thanks, S.
Here is one option, wrapping the key concept into a function that arranges for things to just work. The simplest way is to use rep() on each element of ... to extend each input vector in ... to a common length (i.e. the length of the longest input vector).
This is what I do below using the length.out argument of rep().
CBIND <- function(..., deparse.level = 1) {
  dots <- list(...) ## grab the objects passed in via ... this is a list
  len <- sapply(dots, length) ## find lengths of individual vectors
  ## this applies rep() over dots extending each component vector to length
  ## of longest vector in ...
  dots <- lapply(seq_along(dots),
                 function(i, x, len) rep(x[[i]], length.out = len),
                 x = dots, len = max(len))
  ## need to put everything together so arrange for a call to cbind
  ## passing the extended list of vectors and passing along
  ## the deparse.level argument of cbind
  do.call(cbind, c(dots, deparse.level = deparse.level))
}
This gives:
R> CBIND(c(1,2,3),c(4,5))
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 4
R> CBIND(c(1,2,3),c(4,5), deparse.level = 2)
c(1, 2, 3) c(4, 5, 4)
[1,] 1 4
[2,] 2 5
[3,] 3 4
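A more compact variant of the same idea, as a sketch only: it assumes base R's rep_len() and lengths() are available (R >= 3.2), and the name CBIND2 is hypothetical. Applying lapply() to dots itself rather than to its indices also keeps any argument names, so they end up as column names.

```r
# Hypothetical compact variant of CBIND(): recycle every input to the
# length of the longest one, then hand the padded list to cbind().
CBIND2 <- function(..., deparse.level = 1) {
  dots <- list(...)
  n <- max(lengths(dots))                        # length of the longest input
  dots <- lapply(dots, rep_len, length.out = n)  # recycle silently; names are kept
  do.call(cbind, c(dots, deparse.level = deparse.level))
}

res <- CBIND2(a = c(1, 2, 3), b = c(4, 5))
res   # a 3 x 2 matrix; column b is 4, 5, 4
```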
I would certainly favour this over simply clobbering warnings with suppressWarnings() wrapped around the call. For production code you want to explicitly handle the cases you want to allow and let warnings propagate in circumstances where the user has done something you didn't account for.
You could use suppressWarnings, if you really want:
suppressWarnings(cbind(c(1,2,3),c(4,5)))
# [,1] [,2]
# [1,] 1 4
# [2,] 2 5
# [3,] 3 4
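A possible middle ground between the two answers is to silence only this one warning and let every other warning through. The sketch below uses withCallingHandlers() and the "muffleWarning" restart; the wrapper name cbind_quiet is hypothetical, and matching on the English message text is an assumption that will not hold if R's messages are translated in your locale.

```r
# Hypothetical wrapper: muffle only the recycling warning from cbind(),
# leaving all other warnings to propagate as usual.
cbind_quiet <- function(...) {
  withCallingHandlers(
    cbind(...),
    warning = function(w) {
      # Assumption: English message text as shown in the question
      if (grepl("not a multiple of vector length", conditionMessage(w),
                fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

res <- cbind_quiet(c(1, 2, 3), c(4, 5))
```

Unlike suppressWarnings(), an unrelated warning raised inside the call would still reach the user.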
What are the default values for the arguments nrow and ncol in the function matrix in R?
In other words: By writing
matrix(c(1,2,3,4,5,6), ncol=2)
I can make the function matrix automatically calculate how many rows the resulting matrix will have, and the output will be
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
However, if I want ncol to be a positional argument in this function call and I simply remove ncol=, that is
matrix(c(1,2,3,4,5,6), 2)
the "2" is going to end up as the value of nrow and not as the value of ncol, and I will instead get the matrix
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
I can fix this by providing the function a value for nrow so that the 2 is pushed to its correct position, like so:
matrix(c(1,2,3,4,5,6), 3, 2)
and I will get the desired matrix again. I could also use this method of providing a value for nrow if I were using a keyword argument for ncol, but at the same time wanted to be clear and provide the nrow argument as well:
matrix(c(1,2,3,4,5,6), nrow=3, ncol=2)
But now matrix doesn't calculate the number of rows for me; I have to calculate it myself. What value should I write instead of 3 if I want to tell matrix to calculate the number of rows itself? I've tried replacing the 3 with NULL, None (usually works in cases like this in Python), and -1, but these all give me errors (and 0 and 1 give me matrices with 0 rows and 1 row, respectively).
We can place a , before the 2 so that the nrow argument is left empty:
matrix(c(1,2,3,4,5,6), ,2)
Here, the , is placed based on the argument order in the function. If we check ?matrix, the usage is
matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
Note that most of the arguments have default values. It is generally better to pass function arguments by name, but if we want to skip one positionally, we can use the right number of commas (whether this works also depends on how the function handles missing arguments). Here, we can set the dimnames as well after skipping the byrow argument with a ,
matrix(c(1,2,3,4,5,6), ,2, ,list(NULL, c('a', 'b')))
# a b
#[1,] 1 4
#[2,] 2 5
#[3,] 3 6
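As a quick check of the above, the following sketch compares the named form, the empty-argument form, and an explicitly computed nrow. The variable names are illustrative only; the ceiling(length(x) / ncol) expression mirrors what matrix() does when nrow is missing.

```r
x <- c(1, 2, 3, 4, 5, 6)

by_name  <- matrix(x, ncol = 2)   # nrow is worked out by matrix()
by_comma <- matrix(x, , 2)        # same call, nrow left empty positionally
explicit <- matrix(x, nrow = ceiling(length(x) / 2), ncol = 2)

identical(by_name, by_comma)      # TRUE
identical(by_name, explicit)      # TRUE
```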
Does anyone know of a way to add up combinations of numbers within a vector?
Suppose I am going through a for loop and each time I end up with a vector of different lengths, how could I combine each element of this vector such that I have the sum of 2, 3, etc elements?
For example if I have:
vector <- c(1:5)
And want to go through it as in:
element 1 + element 2; element 2 + element 3, etc
But also:
element 1 + element 2 + element 3
How would I do this? It's important to note that in many of the vectors the lengths will be different. So whilst one vector might contain 3 elements another might contain 12.
I know you can do vector[1]+vector[2], but I need some way to iterate throughout the vector wherein it takes into account the above note.
You can use combn:
> combn(vector, 3, FUN = NULL, simplify = TRUE)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 1 1 1 1 1 2 2 2 3
[2,] 2 2 2 3 3 4 3 3 4 4
[3,] 3 4 5 4 5 5 4 5 5 5
The trick here is that each call will return a matrix of results, and you will have to decide how you want to aggregate and store all the various combinations.
If you don't mind having a list, then the following should do the trick:
> sapply(c(1:length(vector)),
function(x) {
combn(vector, x, FUN = NULL, simplify = TRUE)
})
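If only the sums are needed, combn() can apply sum to each combination directly through its FUN argument, so no matrices have to be stored. This is a sketch using the question's example vector; sums_by_size is a hypothetical name. One caveat: when its first argument is a single positive integer, combn() treats it as seq_len() of that number, so a vector of length 1 needs separate handling.

```r
vector <- 1:5

# One entry per combination size k = 1, ..., length(vector);
# each entry holds the sum of every k-element combination.
sums_by_size <- lapply(seq_along(vector),
                       function(k) combn(vector, k, FUN = sum))
names(sums_by_size) <- paste0("k=", seq_along(vector))

sums_by_size[["k=2"]]   # 3 4 5 6 5 6 7 7 8 9
```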
Generate pair IDs
In this case, we need to get the pairs:
combn(3, 2)
Output:
[,1] [,2] [,3]
[1,] 1 1 2
[2,] 2 3 3
Pairs are generated by column.
Sum Over Vector Elements (Using a Subset)
To access each element and perform a summation, we opt to define a helper function that takes the combination and the vector.
# Write a helper function
# sums of the index of the vector
comb_subset_sum = function(x, vec){
return(sum(vec[x]))
}
From this, we can use combn directly or use apply.
Summing for a single k:
combn directly:
# Input Vector
vec = 1:5
# Length of vector
n = length(vec)
# Generate pairwise combinations and obtain pair_sum
# Specify the k (m in R)
m = combn(n, m = 2, FUN = comb_subset_sum, vec = vec)
apply usage:
# Input Vector
vec = 1:5
# Number of Observations
n = length(vec)
# Combinations
# Specify the k (m in R)
combinations = combn(n, m = 2)
# Obtain vectorized sum over subset
subset_summed = apply(combinations, 2, comb_subset_sum, vec = vec)
Example Output:
combinations:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 1 1 1 2 2 2 3 3 4
[2,] 2 3 4 5 3 4 5 4 5 5
subset_summed:
[1] 3 4 5 6 5 6 7 7 8 9
Trace:
vec[1]+vec[2]=3
vec[1]+vec[3]=4
vec[1]+vec[4]=5
vec[1]+vec[5]=6
vec[2]+vec[3]=5
vec[2]+vec[4]=6
vec[2]+vec[5]=7
vec[3]+vec[4]=7
vec[3]+vec[5]=8
vec[4]+vec[5]=9
To obtain the trace output, add the following before return() in comb_subset_sum():
cat(paste0("vec[",x,"]", collapse = "+"), "=", sum(vec[x]), "\n")
Summing for multiple k:
Here, we apply the same logic, just in a way that enables the k value of the combination to take multiple values.
# Input Vector
vec = 1:5
# Length of Vec
n = length(vec)
# Store output
o = vector('list',n)
for(i in seq_along(vec)){
o[[i]] = combn(n, i, FUN = comb_subset_sum, vec = vec)
}
Note: The size of each element of o will vary as the number of combinations will increase and then decrease.
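The varying sizes can be checked against choose(n, k). The sketch below rebuilds the list with lapply() for a vector of arbitrary values (the values are made up for illustration) and compares the element lengths with the binomial coefficients.

```r
vec <- c(2, 7, 11, 20)   # arbitrary example values
n <- length(vec)

# Same idea as the loop above: combine indices, then sum the selected values
o <- lapply(seq_len(n),
            function(k) combn(n, k, FUN = function(idx) sum(vec[idx])))

lengths(o)               # number of sums per k
choose(n, seq_len(n))    # binomial coefficients: should match
```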
Summing over combinations
If we do not care about vector element values, we can then just sum over the actual combinations in a similar way to how we obtained the vector elements.
To generate pairs and then sum, use:
# Input Vector
vec = 1:5
# Length of Vec
n = length(vec)
# Generate all combinations (by column)
# Specify the k (m in R)
m = combn(n, m = 2)
# Obtain sum by going over columns
sum_m = apply(m, 2, sum)
Or do it in one go:
# Specify the k (m in R)
sum_inplace = combn(n, m = 2, FUN = sum)
Equality:
all.equal(sum_m,sum_inplace)
Summing over combinations for multiple k
And, as before, we can set it up to get all sums under different k by using:
# Input Vector
vec = 1:5
# Length of Vec
n = length(vec)
# Store output (varying lengths)
o = vector('list',n)
for(i in seq_along(vec)){
o[[i]] = combn(n, i, FUN = sum)
}
The following relies on the binary representation of numbers. Basically, you have 2^n - 1 non-empty combinations to check. By writing every number between 1 and 2^n - 1 in binary with n bits, you enumerate every subset of elements you might want.
The number2binary function comes from Paul Hiestra's answer in this thread: How to convert integer number into binary vector?
number2binary = function(number, noBits) {
  # intToBits() returns 32 bits, least significant first; reverse them
  binary_vector = rev(as.numeric(intToBits(number)))
  if(missing(noBits)) {
    return(binary_vector)
  } else {
    # keep only the last noBits bits
    binary_vector[-(1:(length(binary_vector) - noBits))]
  }
}
vector <- 1:5
n <- length(vector)
comp_sum <- function(x) {
  binary <- number2binary(x, noBits = n)
  result <- sum(vector[which(binary==1)])
  names(result) <- paste(which(binary == 1), collapse = "+")
  return(result)
}
# the parentheses matter: 1:2^n-1 is evaluated as (1:2^n) - 1, i.e. 0:(2^n - 1)
binaries <- sapply(1:(2^n - 1), comp_sum)
Note: I start at 1 and go up to 2^n - 1, as you do not need the "zero" (the empty selection). By adding some conditions in your comp_sum function, you can pick only the sums of two elements, or of three elements, and so on.
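To pick only the sums of a given number of elements, one option is to build the whole membership table at once and filter on subset size. The sketch below does this with plain integer arithmetic instead of intToBits(); all names in it (masks, weights, member, sizes, sums) are hypothetical, and it is only practical for small n because it materialises all 2^n - 1 subsets.

```r
vec <- 1:5
n <- length(vec)

masks   <- seq_len(2^n - 1)        # one number per non-empty subset
weights <- 2^(seq_len(n) - 1)      # value of each binary digit: 1, 2, 4, ...

# n x (2^n - 1) logical matrix: member[i, j] is TRUE when element i
# belongs to subset j (i.e. binary digit i of masks[j] is 1)
member <- sapply(masks, function(m) (m %/% weights) %% 2 == 1)

sizes <- colSums(member)           # number of elements in each subset
sums  <- colSums(member * vec)     # sum of the selected elements

pair_sums   <- sums[sizes == 2]    # sums of exactly two elements
triple_sums <- sums[sizes == 3]    # sums of exactly three elements
```

The order of pair_sums follows the mask numbering, not the order combn() would produce, so sort both before comparing them.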
You might be looking for rollsum from the zoo package, where you can specify the number of elements you want to add up:
lapply(2:5, function(i) zoo::rollsum(1:5, i))
[[1]]
[1] 3 5 7 9 # two elements roll sum
[[2]]
[1] 6 9 12 # three elements roll sum
[[3]]
[1] 10 14 # four elements roll sum
[[4]]
[1] 15 # five elements roll sum
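If adding a package dependency is not desirable, the same rolling sums can be sketched in base R with embed() from the stats package, which lays out each window of k consecutive elements as one row of a matrix. The helper name roll_sum is hypothetical, and k must not exceed length(x).

```r
x <- 1:5

# Each row of embed(x, k) is one window of k consecutive elements
roll_sum <- function(x, k) rowSums(embed(x, k))

lapply(2:5, function(k) roll_sum(x, k))
# windows of 2: 3 5 7 9; of 3: 6 9 12; of 4: 10 14; of 5: 15
```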
EDIT: For anyone interested, I completed my little project here and it can be seen at this link http://fdrennan.net/pages/myCurve.html
Scroll down to "I think it's been generalized decently" to see the curve_fitter function. If you find it useful, steal it and I don't need credit. I still have ncol as an input but it isn't required anymore. I just didn't delete it.
I am writing a script that will do some least squares stuff for me. I'll be using it to fit curves but want to generalize it. I want to be able to write in "x, x^2" in a function and have it pasted into a matrix function and read. Here is what I am talking about.
expressionInput <- function(func = "A written function", x = "someData",
nCol = "ncol") {
# Should I coerce the string to something in order to make...
func <- as.SOMETHING?(func)
# ...this line to be equivalent to ...
A <- matrix(c(rep(1, length(x)), func), ncol = nCol)
# .... this line
# A <- matrix(c(rep(1, length(x)), x, x^2), ncol = 3)
A
}
expressionInput(func = "x, x^2", x = 1:10, nCol = 3)
This should return a 10 x 3 matrix with 1's in the first column, x in the second, and the squared values in the third column.
The link below will show a few different functions for curve fitting. The idea behind this post is to be able to write in "x + x^2" or "x + sin(x)" or "e^x" etc., and return the coefficients for curve.
http://fdrennan.net/pages/myCurve.html
I think you are looking for something like this
f <- function(expr="", x=NULL, nCol=3) {
expr <- unlist(strsplit(expr,","))
ex <- rep("rep(1,length(x))", nCol)
ex[1:length(expr)] <- expr
sapply(1:length(ex), function(i) eval(parse(text=ex[i])))
}
f("x, x^2", 1:10, 3)
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 2 4 1
[3,] 3 9 1
[4,] 4 16 1
[5,] 5 25 1
[6,] 6 36 1
[7,] 7 49 1
[8,] 8 64 1
[9,] 9 81 1
[10,] 10 100 1
Note that, in your example, you separate the expressions to evaluate using a comma (,). Accordingly, I have used a comma to split the string into expressions. If you pass expressions that themselves contain commas, this will fail. So either restrict yourself to simple expressions without commas or, if that is not possible, use a different character to separate the expressions (and escape it if you need that character to be evaluated).
However, I would also reiterate the warnings in the comments to your question that depending on what you are trying to achieve, there are likely better ways to do it.
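One such alternative, sketched here under the assumption that the terms can be written as a model formula, is to let model.matrix() build the design matrix. The string "x + I(x^2)" and the test data are made up for illustration; note that powers must be wrapped in I() inside a formula, and that the terms are still evaluated as R code, so this is not a sandbox for untrusted input.

```r
x <- 1:10
rhs <- "x + I(x^2)"                  # hypothetical user-supplied right-hand side

# Build a one-sided formula from the string and expand it into a matrix;
# the intercept column of 1's is added automatically.
fml <- as.formula(paste("~", rhs))
A <- model.matrix(fml, data = data.frame(x = x))

# Least-squares coefficients for some made-up response values
y <- 2 + 3 * x + 0.5 * x^2
coefs <- qr.solve(A, y)
```

Here A has the columns (Intercept), x and I(x^2), and because y was generated without noise, coefs recovers 2, 3 and 0.5 up to rounding.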
Suppose I have a function that takes an argument x of dimension 1 or 2. I'd like to do something like
x[1, i]
regardless of whether I got a vector or a matrix (or a table of one variable, or two).
For example:
x = 1:5
x[1,2] # this won't work...
Of course I can check to see which class was given as an argument, or force the argument to be a matrix, but I'd rather not do that. In Matlab, for example, vectors are matrices with all but one dimension of size 1 (and can be treated as either row or column, etc.). This makes code nice and regular.
Also, does anyone have an idea why in R vectors (or in general one dimensional objects) aren't special cases of matrices (or multidimensional objects)?
Thanks
In R, it is the other way round; matrices are vectors. The matrix-like behaviour comes from some extra attributes on top of the atomic vector part of the object.
To get the behaviour you want, you'd need to make the vector be a matrix, by setting dimensions on the vector using dim() or explicit coercion.
> vm <- 1:5
> dim(vm) <- c(1,5)
> vm
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
> class(vm)
[1] "matrix"
Next you'll need to maintain the dimensions when subsetting; by default R will drop dimensions of extent 1, which in the case of vm above is the row dimension. You prevent that by using drop = FALSE in the call to '['(). The behaviour by default is drop = TRUE:
> vm[, 2:4]
[1] 2 3 4
> vm[, 2:4, drop = FALSE]
[,1] [,2] [,3]
[1,] 2 3 4
You could add a class to your matrices and write methods for [ for that class where the argument drop is set to FALSE by default
class(vm) <- c("foo", class(vm))
`[.foo` <- function(x, i, j, ..., drop = FALSE) {
clx <- class(x)
class(x) <- clx[clx != "foo"]
x[i, j, ..., drop = drop]
}
which in use gives:
> vm[, 2:4]
[,1] [,2] [,3]
[1,] 2 3 4
i.e. it keeps the row dimension even though its extent is 1.
Making this fool-proof and pervasive will require a lot more effort but the above will get you started.
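For the original use case, a small normalising helper at the top of the function may be enough. This is a sketch only: as_2d is a hypothetical name, it treats anything without two dimensions as a single row, and it discards names on the input.

```r
# Hypothetical helper: leave 2-d objects alone, turn everything else
# (plain vectors, 1-d tables) into a one-row matrix.
as_2d <- function(x) {
  if (length(dim(x)) == 2) x else matrix(x, nrow = 1)
}

v <- 1:5
m <- matrix(1:6, nrow = 2)

as_2d(v)[1, 2]   # 2
as_2d(m)[1, 2]   # 3
```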
I'm trying to write a function to determine the euclidean distance between x (one point) and y (a set of n points).
How should I pass y to the function? Until now, I have used a matrix like this:
[,1] [,2] [,3]
[1,] 0 2 1
[2,] 1 1 1
Which would pass the points (0,2,1) and (1,1,1) to that function.
However, when I pass x as a normal (column) vector, the two variables don't match in the function.
I either have to transpose x or y, or save a vector of vectors another way.
My question: What is the standard way to save more than one vector in R? (my matrix y)
Is it just my y transposed or maybe a list or dataframe?
There is no standard way, so you should just pick the most effective one. That in turn depends on what this vector of vectors looks like right after creation (it is better to avoid any unnecessary conversion) and on the speed of the function itself.
I believe that a data.frame with columns x, y and z should be a pretty good choice; the distance function will then be quite simple and fast:
d<-function(x,y) sqrt((y$x-x[1])^2+(y$y-x[2])^2+(y$z-x[3])^2)
The apply function with the margin argument = 1 seems the most obvious:
> x
[,1] [,2] [,3]
[1,] 0 2 1
[2,] 1 1 1
> apply(x , 1, function(z) crossprod(z, 1:length(z) ) )
[1] 7 6
> 2*2+1*3
[1] 7
> 1*1+2*1+3*1
[1] 6
So if you wanted distances, then the square root of the crossproduct of the differences to a chosen point seems to work:
> apply(x , 1, function(z) sqrt(sum(crossprod(z -c(0,2,2), z-c(0,2,2) ) ) ) )
[1] 1.000000 1.732051
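The same distances can also be computed without apply(), keeping the points as rows of a matrix as in the question. In this sketch, dist_to_points is a hypothetical name, y is the matrix of points and x the single point; sweep() subtracts x from every row of y (its default FUN is "-").

```r
y <- rbind(c(0, 2, 1),
           c(1, 1, 1))   # one point per row
x <- c(0, 2, 2)          # the reference point

# Subtract x from each row, square, sum across columns, take the root
dist_to_points <- function(x, y) sqrt(rowSums(sweep(y, 2, x)^2))

dist_to_points(x, y)     # 1.000000 1.732051
```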