Randomly populate R dataframe with integers between - r

I would like to create an R dataframe with random integers WITHOUT repetition.
I have come up with this approach which works:
rank_random <- data.frame(matrix(NA, nrow = 13, ncol = 30))
for (colIdx in seq(1:30)) {
rank_random[colIdx,] <- sample(1:ncol(subset(exc_ret, select=-c(Date))), 30, replace=F)
}

I assume that you mean without repetition on each row. If you meant something else, please clarify.
For your example:
N= ncol(subset(exc_ret, select=-c(Date)))
num.rows = 30
t(sapply( seq(num.rows),
FUN=function(x){sample(1:N, num.rows, replace=F)} ))
To test it for a simpler case
N= 5
num.rows = 5
t(sapply( seq(num.rows),
FUN=function(x){sample(1:N, num.rows, replace=F)} ))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 2 4 5 1 3
# [2,] 2 5 1 3 4
# [3,] 5 1 4 3 2
# [4,] 3 4 5 2 1
# [5,] 3 2 5 1 4
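As a quick check of the approach above, the sketch below builds the matrix with replicate() and verifies that no row contains a repeated value. The pool size N = 40 and the 13 x 30 shape are assumptions for illustration, since exc_ret is not shown; note that sample() without replacement needs N to be at least the number of columns.

```r
set.seed(1)        # only to make the example reproducible
N <- 40            # assumed pool size (columns of exc_ret, excluding Date)
num.rows <- 13     # assumed shape, taken from the question's matrix() call
num.cols <- 30

# replicate() returns one sample per column, so transpose to get one per row
m <- t(replicate(num.rows, sample(N, num.cols, replace = FALSE)))
rank_random <- as.data.frame(m)

dim(rank_random)                         # 13 rows, 30 columns
all(apply(m, 1, anyDuplicated) == 0)     # TRUE when no row repeats a value
```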

Related

Subtract vector from one column of a matrix

I'm a complete R novice, and I'm really struggling on this problem. I need to take a vector, evens, and subtract it from the first column of a matrix, top_secret. I tried to call up only that column using top_secret[,1] and subtract the vector from that, but then it only returns the column. Is there a way to do this inside the matrix so that I can continue to manipulate the matrix without creating a bunch of separate columns?
Sure, you can. Here is an example:
m <- matrix(c(1,2,3,4),4,4, byrow = TRUE)
> m
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 1 2 3 4
[3,] 1 2 3 4
[4,] 1 2 3 4
m[,4] <- m[,4] - c(5,5,5,5)
which gives:
> m
[,1] [,2] [,3] [,4]
[1,] 1 2 3 -1
[2,] 1 2 3 -1
[3,] 1 2 3 -1
[4,] 1 2 3 -1
Or another option is replace
replace(m, cbind(seq_len(nrow(m)), 4), m[,4] - 5)
data
m <- matrix(c(1,2,3,4),4,4, byrow = TRUE)
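Applied to the question's own names, the same assignment would look like the sketch below. The contents of top_secret and evens are made up here, since the question does not show them; the only requirement is that evens has one value per row of the matrix.

```r
top_secret <- matrix(1:12, nrow = 4)   # assumed example data
evens <- c(2, 4, 6, 8)                 # assumed example vector, one value per row

# Assign the result back into the first column; the other columns are untouched
top_secret[, 1] <- top_secret[, 1] - evens
top_secret
```

Because the result is assigned back into top_secret[, 1], the matrix stays in one piece and can be manipulated further.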

making combinations of 6 numbers using three pairs from four pairs (1,2), (3,4), (5,6), (7,8) in R

I am trying to make combinations of 6 numbers using three pairs from four pairs (1,2), (3,4), (5,6), (7,8) in R
d<-c(1,2,3,4,5,6,7,8)
dc1<-cbind(d[1:2],d[3:4],d[5:6])
dim(dc1)<-c(1,6)
dc2<-cbind(d[1:2],d[3:4],d[7:8])
dim(dc2)<-c(1,6)
dc3<-cbind(d[1:2],d[5:6],d[7:8])
dim(dc3)<-c(1,6)
dc4<-cbind(d[3:4],d[5:6],d[7:8])
dim(dc4)<-c(1,6)
rbind(dc1,dc2,dc3,dc4)
Is it possible to use combn to obtain
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 4 5 6
[2,] 1 2 3 4 7 8
[3,] 1 2 5 6 7 8
[4,] 3 4 5 6 7 8
I have tried
d<-structure(list(d1=c(1,2),d2=c(3,4),d3=c(5,6),d4=c(7,8)),.Names = c("d1", "d2", "d3", "d4"), row.names = 1:2, class = "data.frame")
dc <- combn(d, 3, simplify=FALSE)
for(i in 1:length(dc)){
dim(dc[i])<-c(1,6)
}
but it is not working. I will appreciate your help. Thanks.
We can create a grouping variable to split and then do the combn
grp <- as.integer(gl(length(d), 2, length(d)))
out <- do.call(rbind, combn(split(d, grp), 3, simplify = FALSE, FUN = unlist))
dimnames(out) <- NULL
out
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 1 2 3 4 5 6
#[2,] 1 2 3 4 7 8
#[3,] 1 2 5 6 7 8
#[4,] 3 4 5 6 7 8
NOTE: Here, the initial object is just the vector 'd' rather than the pre-processed data frame 'd'. If the data are already separated into columns, it is much easier, as @markus mentioned:
t(combn(d, 3, FUN =unlist))
data
d <- 1:8
Here's another way using the combinations function from the gtools package:
Create a list of your pairs:
pair.list <- list(c(1,2), c(3, 4), c(5, 6), c(7, 8))
Then create the 4 choose 3 combo matrix:
library(gtools)
combos <- combinations(4, 3)
Then use the purrr map function to generate the list of output vectors:
library(purrr)
vec.list <- map(1:4, function(x) unlist(pair.list[combos[x, ]]))
Finally convert the list of vectors to a data.frame:
df <- data.frame(Reduce(rbind, vec.list))
The benefit of this strategy is that your tuples can be of any length and have any values.
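For comparison, the same list of pairs can also be combined without extra packages. This is a sketch using base R's combn() directly on the list, not part of the answer above; pair.list is redefined so the snippet runs on its own.

```r
pair.list <- list(c(1, 2), c(3, 4), c(5, 6), c(7, 8))

# combn() picks every set of 3 list elements; unlist() flattens each set
# into one vector of 6 numbers, giving a 6 x 4 matrix that is then transposed
out <- t(combn(pair.list, 3, FUN = unlist))
out
```

As with the gtools version, the pairs could be longer tuples with arbitrary values, as long as they all have the same length.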
Another possibility, starting with the vector 'd':
i <- (seq_along(d) + 1) %/% 2
t(combn(unique(i), 3, function(cb) d[i %in% cb]))
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 2 3 4 5 6
# [2,] 1 2 3 4 7 8
# [3,] 1 2 5 6 7 8
# [4,] 3 4 5 6 7 8

extract every two elements in matrix row in r in sequence to calculate euclidean distance

How can I extract every two elements in sequence from a matrix row and return the result as a matrix, so that I can feed it into a formula for calculation?
For example, I have a one-row matrix with 6 columns:
[,1][,2][,3][,4][,5][,6]
[1,] 2 1 5 5 10 1
I want to extract columns 1 and 2 in the first iteration, 3 and 4 in the second iteration, and so on. The result has to be in the form of a matrix:
[1,] 2 1
[2,] 5 5
[3,] 10 1
My original code:
data <- matrix(c(1,1,1,2,2,1,2,2,5,5,5,6,10,1,10,2,11,1,11,2), ncol = 2)
Center Matrix:
[,1][,2][,3][,4][,5][,6]
[1,] 2 1 5 5 10 1
[2,] 1 1 2 1 10 1
[3,] 5 5 5 6 11 2
[4,] 2 2 5 5 10 1
[5,] 2 1 5 6 5 5
[6,] 2 2 5 5 11 1
[7,] 2 1 5 5 10 1
[8,] 1 1 5 6 11 1
[9,] 2 1 5 5 10 1
[10,] 5 6 11 1 10 2
objCentroidDist <- function(data, centers) {
resultMatrix <- matrix(NA, nrow=dim(data)[1], ncol=dim(centers)[1])
for(i in 1:nrow(centers)) {
resultMatrix [,i] <- sqrt(rowSums(t(t(data)-centers[i, ])^2))
}
resultMatrix
}
objCentroidDist(data,centers)
I want the Result matrix to be as per below:
[,1][,2][,3]
[1,]
[2,]
[3,]
[4,]
[5,]
[6,]
[7,]
[8,]
[9,]
[10,]
My concern is how to calculate the distance between the data and the centers when the data matrix has two columns and the centers matrix has six, i.e. how to compute the distance from each row of the data matrix to every pair of columns in the centers matrix. Each row of the centers matrix holds three centers.
Something like this maybe?
m <- matrix(c(2,1,5,5,10,1), ncol = 6)
list.seq.pairs <- lapply(seq(1, ncol(m), 2), function(x) {
m[,c(x, x+1)]
})
> list.seq.pairs
[[1]]
[1] 2 1
[[2]]
[1] 5 5
[[3]]
[1] 10 1
And, in case you want to iterate over multiple rows in a matrix, you can expand on the above like this:
mm <- matrix(1:18, ncol = 6, byrow = TRUE)
apply(mm, 1, function(x) {
lapply(seq(1, length(x), 2), function(y) {
x[c(y, y+1)]
})
})
EDIT:
I'm really not sure what you're after exactly. I think, if you want each row transformed into a 2 x 3 matrix:
mm <- matrix(1:18, ncol = 6, byrow = TRUE)
list.mats <- lapply(1:nrow(mm), function(x){
a = matrix(mm[x,], ncol = 2, byrow = TRUE)
})
> list.mats
[[1]]
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
[[2]]
[,1] [,2]
[1,] 7 8
[2,] 9 10
[3,] 11 12
[[3]]
[,1] [,2]
[1,] 13 14
[2,] 15 16
[3,] 17 18
If, however, you want to get to your results matrix, I think it's probably easiest to do whatever calculations you need while you're dealing with each row:
results <- t(apply(mm, 1, function(x) {
sapply(seq(1, length(x), 2), function(y) {
val1 = x[y] # Get item one
val2 = x[y+1] # Get item two
val1 / val2 # Do your calculation here
})
}))
> results
[,1] [,2] [,3]
[1,] 0.5000000 0.7500 0.8333333
[2,] 0.8750000 0.9000 0.9166667
[3,] 0.9285714 0.9375 0.9444444
That said, I don't understand what you're trying to do so this may miss the mark. You may have more luck if you ask a new question where you show example input and the actual expected output that you're after, with the actual values you expect.
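If the goal is the distance from each data point to each of the three centers stored in one row of the centers matrix, one possible reading is sketched below. The helper name dist_to_centers and the small data matrix are assumptions for illustration; the center row is the first row of the centers matrix in the question.

```r
# Distance from every row of `data` (n x 2) to each (x, y) center packed
# into `center_row` as x1, y1, x2, y2, ...
dist_to_centers <- function(data, center_row) {
  ctrs <- matrix(center_row, ncol = 2, byrow = TRUE)   # one center per row
  # For each center, subtract it from every data row and take the row norm
  apply(ctrs, 1, function(ctr) sqrt(rowSums(sweep(data, 2, ctr)^2)))
}

data <- rbind(c(1, 1), c(2, 2), c(5, 6))   # assumed sample points
center_row <- c(2, 1, 5, 5, 10, 1)         # first row of the centers matrix

# Rows correspond to data points, columns to the three centers
res <- dist_to_centers(data, center_row)
res
```

Looping this over the rows of the centers matrix would give one such block of distances per row.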

R - How to rbind two lists while alternating their list elements

I'd like to know how to rbind two lists containing vectors into a data frame. e.g.
a<-list(c(1,2,3,4,5), c(2,3,4,5,6))
b<-list(c(3,4,5,6,7), c(4,5,6,7,8))
How to make a data frame from the two lists as the following:
1 2 3 4 5
3 4 5 6 7
2 3 4 5 6
4 5 6 7 8
So I need to take the first element of each list and then rbind them. Then take the second element of each list and then rbind to the previous data frame. I know I could use a for loop but is there a better and faster way to do this?
A variation on @DiscoSuperfly's answer that will work with objects of uneven length, like:
a <- list(c(1,2,3,4,5), c(2,3,4,5,6), c(1,1,1,1,1))
b <- list(c(3,4,5,6,7), c(4,5,6,7,8))
An answer:
L <- list(a,b)
L <- lapply(L, `length<-`, max(lengths(L)))
do.call(rbind, do.call(Map, c(rbind, L)))
# [,1] [,2] [,3] [,4] [,5]
#[1,] 1 2 3 4 5
#[2,] 3 4 5 6 7
#[3,] 2 3 4 5 6
#[4,] 4 5 6 7 8
#[5,] 1 1 1 1 1
A solution using the purrr package.
library(purrr)
map2_dfr(a, b, ~data.frame(rbind(.x, .y)))
X1 X2 X3 X4 X5
1 1 2 3 4 5
2 3 4 5 6 7
3 2 3 4 5 6
4 4 5 6 7 8
Reduce(rbind,Map(rbind,a,b))
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 3 4 5 6 7
[3,] 2 3 4 5 6
[4,] 4 5 6 7 8
Of the answers given, this seems the fastest when using two lists, thanks in large part to @thelatemail's suggested edit (thanks!).
Try this:
rbab<-do.call(rbind,c(a,b)); rbind(rbab[c(TRUE,FALSE),],rbab[c(FALSE,TRUE),])
Output:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 3 4 5 6 7
[3,] 2 3 4 5 6
[4,] 4 5 6 7 8
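Indexing with c(TRUE,FALSE) is recycled over the rows of the stacked matrix and picks every other row, here the first element of a and the first element of b; flipping it to c(FALSE,TRUE) picks the remaining rows. Finally, we rbind the two pieces together.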
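Note that the alternating index relies on each list having exactly two elements. For longer lists, or lists of different lengths, one possible generalization is sketched below; it is an alternative to the code above, not a restatement of it. It relies on order() leaving ties in their original order, so each element of a comes before the element of b at the same position.

```r
a <- list(c(1,2,3,4,5), c(2,3,4,5,6), c(1,1,1,1,1))
b <- list(c(3,4,5,6,7), c(4,5,6,7,8))

ab <- c(a, b)
# Positions 1, 2, 3 (from a) and 1, 2 (from b); ordering them interleaves the lists
idx <- order(c(seq_along(a), seq_along(b)))
res <- do.call(rbind, ab[idx])
res
```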
EDIT: Speed Test
Here's a larger scale speed test, for an objective comparison, which used two lists of 6000 elements each instead of the original a and b provided. A total of 100 iterations were used to estimate these statistics.
#Sample used:
a<-list(c(1,2,3,4,5),c(2,3,4,5,6))
b<-list(c(3,4,5,6,7),c(4,5,6,7,8))
a<-a[rep(1:2,3e3)]
b<-b[rep(1:2,3e3)]
#Here is the collaboration version (with @thelatemail):
func1 <- function(){
rbab<-do.call(rbind,c(a,b)); rbind(rbab[c(TRUE,FALSE),],rbab[c(FALSE,TRUE),])
}
#Here is my original version:
func2 <- function(){
rbind(do.call(rbind,c(a,b))[c(TRUE,FALSE),],do.call(rbind,c(a,b))[c(FALSE,TRUE),])
}
#Here's a base-R translation of @ycw's answer (*translated by thelatemail)
func3 <- function(){
do.call(rbind, Map(rbind, a, b))
}
#Here is @Onyambu's answer (also a great answer for its brevity!):
func4 <- function(){
Reduce(rbind,Map(rbind,a,b))
}
microbenchmark::microbenchmark(
func1(),func2(),func3(),func4()
)
Unit: microseconds
expr min lq mean median uq max neval
func1() 4.39 6.46 14.74 15.85 20.24 31.94 100
func2() 5789.26 6578.83 7114.21 7027.57 7531.52 9411.05 100
func3() 10279.50 10970.70 11611.90 11245.47 11866.70 16315.00 100
func4() 251098.18 265936.30 273667.45 275778.04 281740.77 291279.20 100
I created a new list holding both a and b in alternating positions, and then made it into a matrix. I am sure there is a more elegant way to do this.
a <- list(c(1,2,3,4,5), c(2,3,4,5,6), c(1,1,1,1,1))
b <- list(c(3,4,5,6,7), c(4,5,6,7,8))
# empty list
ab <- vector("list", length = length(a) + length(b))
# put a and b in correct locations
ab[seq(1, length(ab), 2)] <- a
ab[seq(2, length(ab), 2)] <- b
# make the matrix
res <- t(matrix(unlist(ab), nrow=5, ncol=length(a) + length(b)))
> ab <-rbind(unlist(a), unlist(b))
> ab <- rbind(ab[,1:5], ab[,6:10])
> ab
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 3 4 5 6 7
[3,] 2 3 4 5 6
[4,] 4 5 6 7 8
I would do:
d <- t(as.data.frame(c(a,b)))
rbind( d[ seq(1,nrow(d),by=2) ,] , d[ seq(2,nrow(d),by=2) ,])

Build a square-ish matrix with a specified number of cells

I would like to write a function that transforms an integer, n, (specifying the number of cells in a matrix) into a square-ish matrix that contain the sequence 1:n. The goal is to make the matrix as "square" as possible.
This involves a couple of considerations:
1. How to maximize "square"-ness? I was thinking of a penalty equal to the difference in the dimensions of the matrix, e.g. penalty <- abs(dim(mat)[1]-dim(mat)[2]), such that penalty==0 when the matrix is square and is positive otherwise. Ideally this would then, e.g., for n==12 lead to a preference for a 3x4 rather than a 2x6 matrix. But I'm not sure of the best way to do this.
2. Account for odd-numbered values of n. Odd-numbered values of n do not necessarily produce an obvious choice of matrix (unless they have an integer square root, like n==9). I thought about simply adding 1 to n, then handling it as an even number and allowing for one blank cell, but I'm not sure if this is the best approach. I imagine it might be possible to obtain a more square matrix (by the definition in #1) by adding more than 1 to n.
3. Allow the function to trade off squareness (as described in #1) against the number of blank cells (as described in #2), so the function should have some kind of parameter(s) to address this trade-off. For example, for n==11, a 3x4 matrix is pretty square but not as square as a 4x4, but the 4x4 would have many more blank cells than the 3x4.
4. The function needs to optionally produce wider or taller matrices, so that n==12 can produce either a 3x4 or a 4x3 matrix. But this would be easy to handle with a t() of the resulting matrix.
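The penalty idea in the points above could be made concrete along the following lines. This is only a sketch: the name best_dims, the blank_weight parameter, and the particular penalty (difference in dimensions plus a weighted count of blank cells) are assumptions, not an established method.

```r
best_dims <- function(n, blank_weight = 1) {
  side <- ceiling(sqrt(n))
  # Candidate shapes with rows <= cols that can hold n cells
  cand <- expand.grid(rows = seq_len(side), cols = seq_len(n))
  cand <- cand[cand$rows <= cand$cols & cand$rows * cand$cols >= n, ]
  blanks <- cand$rows * cand$cols - n
  # Squareness penalty plus a weighted penalty for unused cells
  penalty <- abs(cand$rows - cand$cols) + blank_weight * blanks
  unlist(cand[which.min(penalty), ])
}

best_dims(12)                       # 3 x 4
best_dims(11)                       # 3 x 4, one blank cell
best_dims(11, blank_weight = 0.1)   # 4 x 4, because blank cells are cheap

d <- best_dims(11)
matrix(seq_len(prod(d)), nrow = d[["rows"]], ncol = d[["cols"]])
```

A taller rather than wider result can then be obtained with t(), as noted in the last point.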
Here's some intended output:
> makemat(2)
[,1]
[1,] 1
[2,] 2
> makemat(3)
[,1] [,2]
[1,] 1 3
[2,] 2 4
> makemat(9)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> makemat(11)
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
Here's basically a really terrible start to this problem.
makemat <- function(n) {
n <- abs(as.integer(n))
d <- seq_len(n)
out <- d[n %% d == 0]
if(length(out)<2)
stop('n has fewer than two factors')
dim1a <- out[length(out)-1]
m <- matrix(1:n, ncol=dim1a)
m
}
As you'll see, I haven't really been able to account for odd-numbered values of n (look at the output of makemat(7) or makemat(11)) as described in #2, or enforce the "squareness" rule described in #1, or the trade-off between them as described in #3.
I think the logic you want is already in the utility function n2mfrow(), which as its name suggests is for creating input to the mfrow graphical parameter and takes an integer input and returns the number of panels in rows and columns to split the display into:
> n2mfrow(11)
[1] 4 3
It favours tall layouts over wide ones, but that is easily fixed via rev() on the output or t() on a matrix produced from the results of n2mfrow().
makemat <- function(n, wide = FALSE) {
if(isTRUE(all.equal(n, 3))) {
dims <- c(2,2)
} else {
dims <- n2mfrow(n)
}
if(wide)
dims <- rev(dims)
m <- matrix(seq_len(prod(dims)), nrow = dims[1], ncol = dims[2])
m
}
Notice I have to special-case n = 3 as we are abusing a function intended for another use and a 3x1 layout on a plot makes more sense than a 2x2 with an empty space.
In use we have:
> makemat(2)
[,1]
[1,] 1
[2,] 2
> makemat(3)
[,1] [,2]
[1,] 1 3
[2,] 2 4
> makemat(9)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> makemat(11)
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
> makemat(11, wide = TRUE)
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
Edit:
The original function padded seq_len(n) with NA, but I realised the OP wanted to have a sequence from 1 to prod(nrows, ncols), which is what the version above does. The one below pads with NA.
makemat <- function(n, wide = FALSE) {
if(isTRUE(all.equal(n, 3))) {
dims <- c(2,2)
} else {
dims <- n2mfrow(n)
}
if(wide)
dims <- rev(dims)
s <- rep(NA, prod(dims))
ind <- seq_len(n)
s[ind] <- ind
m <- matrix(s, nrow = dims[1], ncol = dims[2])
m
}
I think this function implicitly satisfies your constraints. The parameter can range from 0 to Inf. The function always returns either a square matrix with sides of ceiling(sqrt(n)), or a (maybe) rectangular matrix with rows floor(sqrt(n)) and just enough columns to "fill it out". The parameter trades off the selection between the two: if it is less than 1, then the second, more rectangular matrices are preferred, and if greater than 1, the first, always square matrices are preferred. A param of 1 weights them equally.
makemat<-function(n,param=1,wide=TRUE){
if (n<1) stop('n must be positive')
s<-sqrt(n)
bottom<-n-(floor(s)^2)
top<-(ceiling(s)^2)-n
if((bottom*param)<top) {
rows<-floor(s)
cols<-rows + ceiling(bottom / rows)
} else {
cols<-rows<-ceiling(s)
}
if(!wide) {
hold<-rows
rows<-cols
cols<-hold
}
m<-seq.int(rows*cols)
dim(m)<-c(rows,cols)
m
}
Here is an example where the parameter is left at its default, so the two distances are weighted equally:
lapply(c(2,3,9,11),makemat)
# [[1]]
# [,1] [,2]
# [1,] 1 2
#
# [[2]]
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
#
# [[3]]
# [,1] [,2] [,3]
# [1,] 1 4 7
# [2,] 2 5 8
# [3,] 3 6 9
#
# [[4]]
# [,1] [,2] [,3] [,4]
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12
Here is an example of using the param with 11, to get a 4x4 matrix.
makemat(11,3)
# [,1] [,2] [,3] [,4]
# [1,] 1 5 9 13
# [2,] 2 6 10 14
# [3,] 3 7 11 15
# [4,] 4 8 12 16
What about something fairly simple and you can handle the exceptions and other requests in a wrapper?
library(taRifx)
neven <- 8
nodd <- 11
nsquareodd <- 9
nsquareeven <- 16
makemat <- function(n) {
s <- seq(n)
if( odd(n) ) {
s[ length(s)+1 ] <- NA
n <- n+1
}
sq <- sqrt( n )
dimx <- ceiling( sq )
dimy <- floor( sq )
if( dimx*dimy < length(s) ) dimy <- ceiling( sq )
l <- dimx*dimy
ldiff <- l - length(s)
stopifnot( ldiff >= 0 )
if( ldiff > 0 ) s[ seq( length(s) + 1, length(s) + ldiff ) ] <- NA
matrix( s, nrow = dimx, ncol = dimy )
}
> makemat(neven)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 NA
> makemat(nodd)
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 NA
> makemat(nsquareodd)
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 NA
[3,] 3 7 NA
[4,] 4 8 NA
> makemat(nsquareeven)
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16

Resources