Saving vectors of different lengths in a matrix/data frame

Saving vectors of different lengths in a matrix/data frame - r

I have a numeric called area of length 166860. This consists of 412 different elements, most of length 405 and some of length 809. I have their start and end ids.
My goal is to extract them and put them in a matrix/data frame with 412 columns
Right now, I'm trying this code:
m = matrix(NA,ncol=412, nrow=809)
for (j in 1:412){
temp.start = start.ids[j]
temp.end = end.ids[j]
m[,j] = area[temp.start:temp.end]
}
But I just end up with this error message:
"Error in m[, j] = area[temp.start:temp.end] :
number of items to replace is not a multiple of replacement length"

Here's a quite easy approach:
Example data:
area <- c(1:4, 1:5, 1:6, 1:3)
# [1] 1 2 3 4 1 2 3 4 5 1 2 3 4 5 6 1 2 3
start.ids <- which(area == 1)
# [1] 1 5 10 16
end.ids <- c(which(area == 1)[-1] - 1, length(area))
# [1] 4 9 15 18
Create a list with one-row matrices:
mats <- mapply(function(x, y) t(area[seq(x, y)]), start.ids, end.ids)
# [[1]]
# [,1] [,2] [,3] [,4]
# [1,] 1 2 3 4
#
# [[2]]
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 2 3 4 5
#
# [[3]]
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 2 3 4 5 6
#
# [[4]]
# [,1] [,2] [,3]
# [1,] 1 2 3
Use the function rbind.fill.matrix from the plyr package to create the matrix and transpose it (t):
library(plyr)
m <- t(rbind.fill.matrix(mats))
# [,1] [,2] [,3] [,4]
# 1 1 1 1 1
# 2 2 2 2 2
# 3 3 3 3 3
# 4 4 4 4 NA
# 5 NA 5 5 NA
# 6 NA NA 6 NA

You are setting the column length to be 412, and matrices cannot be flexible/variable in their length. This means the value you assign to the columns must either have a length of 412 or something less that can fill into a length of 412. From the manual on ?matrix:
If there are too few elements in data to fill the matrix, then the elements in data are recycled. If data has length zero, NA of an appropriate type is used for atomic vectors (0 for raw vectors) and NULL for lists.
As another commenter said, you may have intended to assign to the rows in which case m[j, ] is the way to do that, but you have to then pad the value you are assigning with NA or allow NA's to be filled so the value being assigned is always of length 809.
m = matrix(NA,ncol=412, nrow=809)
for (j in 1:412){
temp.start = start.ids[j]
temp.end = end.ids[j]
val <- area[temp.start:temp.end]
m[j, ] = c(val, rep(NA, 809 - length(val)))
}

How about this? I've manufactured some sample data:
#here are the random sets of numbers - length either 408 or 809
nums<-lapply(1:412,function(x)runif(sample(c(408,809),1)))
#this represents your numeric (one list of all the numbers)
nums.vec<-unlist(nums)
#get data about the series (which you have)
nums.lengths<-sapply(nums,function(x)length(x))
nums.starts<-cumsum(c(1,nums.lengths[-1]))
nums.ends<-nums.starts+nums.lengths-1
new.vec<-unlist(lapply(1:412,function(x){
v<-nums.vec[nums.starts[x]:nums.ends[x]]
c(v,rep(0,(809-length(v))))
}))
matrix(new.vec,ncol=412)

What about
m[j,] = area[temp.start:temp.end]
?
Edit:
a <- area[temp.start:temp.end]
m[1:length(a),j] <- a

Maybe others have better answers. As I see it, you have two options:
Change m[,j] to m[1:length(area[temp.start:temp.end]),j] and then you will not get an error but you would have some NA's left.
Use a list of matrices instead, so you would get different dimensions for each matrix.

Related

How to vectorize this operation?

I have a n x 3 x m array, call it I. It contains 3 columns, n rows (say n=10), and m slices. I have a computation that must be done to replace the third column in each slice based on the other 2 columns in the slice.
I've written a function insertNewRows(I[,,simIndex]) that takes a given slice and replaces the third column. The following for-loop does what I want, but it's slow. Is there a way to speed this up by using one of the apply functions? I cannot figure out how to get them to work in the way I'd like.
for(simIndex in 1:m){
I[,, simIndex] = insertNewRows(I[,,simIndex])
}
I can provide more details on insertNewRows if needed, but the short version is that it takes a probability based on the columns I[,1:2, simIndex] of a given slice of the array, and generates a binomial RV based on the probability.
It seems like one of the apply functions should work just by using
I = apply(FUN = insertNewRows, MARGIN = c(1,2,3)) but that just produces gibberish..?
Thank you in advance!
IK

The question has not defined the input nor the transformation nor the result so we can't really answer it but here is an example of adding a row of ones to to a[,,i] for each i so maybe that will suggest how you could solve the problem yourself.
This is how you could use sapply, apply, plyr::aaply, reshaping using matrix/aperm and abind::abind.
# input array and function
a <- array(1:24, 2:4)
f <- function(x) rbind(x, 1) # append a row of 1's
aa <- array(sapply(1:dim(a)[3], function(i) f(a[,,i])), dim(a) + c(1,0,0))
aa2 <- array(apply(a, 3, f), dim(a) + c(1,0,0))
aa3 <- aperm(plyr::aaply(a, 3, f), c(2, 3, 1))
aa4 <- array(rbind(matrix(a, dim(a)[1]), 1), dim(a) + c(1,0,0))
aa5 <- abind::abind(a, array(1, dim(a)[2:3]), along = 1)
dimnames(aa3) <- dimnames(aa5) <- NULL
sapply(list(aa2, aa3, aa4, aa5), identical, aa)
## [1] TRUE TRUE TRUE TRUE
aa[,,1]
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
## [3,] 1 1 1
aa[,,2]
## [,1] [,2] [,3]
## [1,] 7 9 11
## [2,] 8 10 12
## [3,] 1 1 1
aa[,,3]
## [,1] [,2] [,3]
## [1,] 13 15 17
## [2,] 14 16 18
## [3,] 1 1 1
aa[,,4]
## [,1] [,2] [,3]
## [1,] 19 21 23
## [2,] 20 22 24
## [3,] 1 1 1

Sampling a number of indivuals in subgroups with no repeating group constellation in R

I have a number of individuals that I want to - randomly - divide in subgroups of size groupsize. This process I want to repeat n_group times - with no repeating group constellation.
How can I achieve this in R?
I tried the following so far:
set.seed(1)
individuals <- 1:6
groupsize <- 3
n_groups <- 4
for(i in 1:n_groups) { print(sample(individuals, groupsize))}
[1] 1 4 3
[1] 1 2 6
[1] 3 2 6
[1] 3 1 5
..but am not sure whether that really does not lead to repeating constellations..?
Edit: After looking at the first suggestions and answers I realized, that another restriction could be interesting to me (sorry for not seeing it upfront..).
Is there (in the concrete example above) a way to ensure, that every individual was in contact with every other individual?

Based on your edited question, I assuma that you want to make sure that all indivuals are in at least one subgroup?
Then this might be the solution:
individuals <- 1:6
groupsize <- 3
n_groups <- 4
#sample groups
library(RcppAlgos)
#initialise
answer <- matrix()
# If the length of all unique elements in the answer is smaller than
# the number of individuals, take a new sample
while (length(unique(as.vector(answer))) < length(individuals)) {
answer <- comboSample(individuals, groupsize, n = n_groups)
# Line below isfor demonstration only
#answer <- comboSample(individuals, groupsize, n = n_groups, seed = 123)
}
# sample answer with seed = 123 (see commented line above)
# [,1] [,2] [,3]
# [1,] 1 3 4
# [2,] 1 3 6
# [3,] 2 3 5
# [4,] 2 3 4
test for groups that contain not every individual
# Test with the following matrix
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 1 3 4
# [3,] 1 4 5
# [4,] 2 3 4
# Note that individual '6' is not present
answer <- matrix(c(1,2,3,1,3,4,1,4,5,2,3,4), nrow = 4, ncol = 3)
while (length(unique(as.vector(answer))) < length(individuals)) {
answer <- comboSample(individuals, groupsize, n = n_groups)
}
# is recalculated to (in this case) the following answer
# [,1] [,2] [,3]
# [1,] 4 5 6
# [2,] 3 4 5
# [3,] 1 3 6
# [4,] 2 4 5
PASSED ;-)

You can use while to dynamically update your combination set, which avoids duplicates, e.g.,
res <- c()
while (length(res) < pmin(n_groups, choose(length(individuals), groupsize))) {
v <- list(sort(sample(individuals, groupsize)))
if (!v %in% res) res <- c(res, v)
}
which gives
> res
[[1]]
[1] 2 5 6
[[2]]
[1] 2 3 6
[[3]]
[1] 1 5 6
[[4]]
[1] 1 2 6

Find all combinations of a set of numbers that add up to a certain total

I've seen a few solutions to similar problems, but they all require iteration over the number of items to be added together.
Here's my goal: from a list of numbers, find all of the combinations (without replacement) that add up to a certain total. For example, if I have numbers 1,1,2,3,5 and total 5, it should return 5,2,3, and 1,1,3.
I was trying to use combn but it required you to specify the number of items in each combination. Is there a way to do it that allows for solution sets of any size?

This is precisely what combo/permuteGeneral from RcppAlgos (I am the author) were built for. Since we have repetition of specific elements in our sample vector, we will be finding combinations of multisets that meet our criteria. Note that this is different than the more common case of generating combinations with repetition where each element is allowed to be repeated m times. For many combination generating functions, multisets pose problems as duplicates are introduced and must be dealt with. This can become a bottleneck in your code if the size of your data is decently large. The functions in RcppAlgos handle these cases efficiently without creating any duplicate results. I should mention that there are a couple of other great libraries that handle multisets quite well: multicool and arrangements.
Moving on to the task at hand, we can utilize the constraint arguments of comboGeneral to find all combinations of our vector that meet a specific criteria:
vec <- c(1,1,2,3,5) ## using variables from #r2evans
uni <- unique(vec)
myRep <- rle(vec)$lengths
ans <- 5
library(RcppAlgos)
lapply(seq_along(uni), function(x) {
comboGeneral(uni, x, freqs = myRep,
constraintFun = "sum",
comparisonFun = "==",
limitConstraints = ans)
})
[[1]]
[,1]
[1,] 5
[[2]]
[,1] [,2]
[1,] 2 3
[[3]]
[,1] [,2] [,3]
[1,] 1 1 3
[[4]]
[,1] [,2] [,3] [,4] ## no solutions of length 4
These functions are highly optimized and extend well to larger cases. For example, consider the following example that would produce over 30 million combinations:
## N.B. Using R 4.0.0 with new updated RNG introduced in 3.6.0
set.seed(42)
bigVec <- sort(sample(1:30, 40, TRUE))
rle(bigVec)
Run Length Encoding
lengths: int [1:22] 2 1 2 3 4 1 1 1 2 1 ...
values : int [1:22] 1 2 3 4 5 7 8 9 10 11 ...
bigUni <- unique(bigVec)
bigRep <- rle(bigVec)$lengths
bigAns <- 199
len <- 12
comboCount(bigUni, len, freqs = bigRep)
[1] 32248100
All 300000+ results are returned very quickly:
system.time(bigTest <- comboGeneral(bigUni, len, freqs = bigRep,
constraintFun = "sum",
comparisonFun = "==",
limitConstraints = bigAns))
user system elapsed
0.273 0.004 0.271
head(bigTest)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 1 2 3 4 25 26 26 26 27 28 30
[2,] 1 1 2 3 5 24 26 26 26 27 28 30
[3,] 1 1 2 3 5 25 25 26 26 27 28 30
[4,] 1 1 2 3 7 24 24 26 26 27 28 30
[5,] 1 1 2 3 7 24 25 25 26 27 28 30
[6,] 1 1 2 3 7 24 25 26 26 26 28 30
nrow(bigTest)
[1] 280018
all(rowSums(bigTest) == bigAns)
[1] TRUE
Addendum
I must mention that generally when I see a problem like: "finding all combinations that sum to a particular number" my first thought is integer partitions. For example, in the related problem Getting all combinations which sum up to 100 in R, we can easily solve with the partitions library. However, this approach does not extend to the general case (as we have here) where the vector contains specific repetition or we have a vector that contains values that don't easily convert to an integer equivalent (E.g. the vector (0.1, 0.2, 0.3, 0.4) can easily be treated as 1:4, however treating c(3.98486 7.84692 0.0038937 7.4879) as integers and subsequently applying an integer partitions approach would require an extravagant amount of computing power rendering this method useless).

I took your combn idea and looped over the possible sizes of the sets.
func = function(x, total){
M = length(x)
y = NULL
total = 15
for (m in 1:M){
tmp = combn(x, m)
ind = which(colSums(tmp) == total)
if (length(ind) > 0){
for (j in 1:length(ind))
y = c(y, list(tmp[,ind[j]]))
}
}
return (unique(lapply(y, sort)))
}
x = c(1,1,2,3,5,8,13)
> func(x, 15)
[[1]]
[1] 2 13
[[2]]
[1] 1 1 13
[[3]]
[1] 2 5 8
[[4]]
[1] 1 1 5 8
[[5]]
[1] 1 1 2 3 8
Obviously, this will have problems as M grows since tmp will get big pretty quickly and the length of y can't be (maybe?) pre-determined.

Similar to mickey's answer, we can use combn inside another looping mechanism. I'll use lapply:
vec <- c(1,1,2,3,5)
ans <- 5
Filter(length, lapply(seq_len(length(vec)),
function(i) {
v <- combn(vec, i)
v[, colSums(v) == ans, drop = FALSE]
}))
# [[1]]
# [,1]
# [1,] 5
# [[2]]
# [,1]
# [1,] 2
# [2,] 3
# [[3]]
# [,1]
# [1,] 1
# [2,] 1
# [3,] 3
You can omit the Filter(length, portion, though it may return a number of empty matrices. They're easy enough to deal with and ignore, I just thought removing them would be aesthetically preferred.
This method gives you a matrix with multiple candidates in each column, so
ans <- 4
Filter(length, lapply(seq_len(length(vec)),
function(i) {
v <- combn(vec, i)
v[, colSums(v) == ans, drop = FALSE]
}))
# [[1]]
# [,1] [,2]
# [1,] 1 1
# [2,] 3 3
# [[2]]
# [,1]
# [1,] 1
# [2,] 1
# [3,] 2
If duplicates are a problem, you can always do:
Filter(length, lapply(seq_len(length(vec)),
function(i) {
v <- combn(vec, i)
v <- v[, colSums(v) == ans, drop = FALSE]
v[,!duplicated(t(v)),drop = FALSE]
}))
# [[1]]
# [,1]
# [1,] 1
# [2,] 3
# [[2]]
# [,1]
# [1,] 1
# [2,] 1
# [3,] 2

Now here is a solution involving gtools:
# Creating lists of all permutations of the vector x
df1 <- gtools::permutations(n=length(x),r=length(x),v=1:length(x),repeats.allowed=FALSE)
ls1 <- list()
for(j in 1:nrow(df1)) ls1[[j]] <- x[df1[j,1:ncol(df1)]]
# Taking all cumulative sums and filtering entries equaling our magic number
sumsCum <- t(vapply(1:length(ls1), function(j) cumsum(ls1[[j]]), numeric(length(x))))
indexMN <- which(sumsCum == magicNumber, arr.ind = T)
finalList <- list()
for(j in 1:nrow(indexMN)){
magicRow <- indexMN[j,1]
magicCol <- 1:indexMN[j,2]
finalList[[j]] <- ls1[[magicRow]][magicCol]
}
finalList <- unique(finalList)
where x = c(1,1,2,3,5) and magicNumber = 5. This is a first draft, I am sure it can be improved here and there.

Not the most efficient but the most compact so far:
x <- c(1,1,2,3,5)
n <- length(x)
res <- 5
unique(combn(c(x,rep(0,n-1)), n, function(x) x[x!=0][sum(x)==res], FALSE))[-1]
# [[1]]
# [1] 1 1 3
#
# [[2]]
# [1] 2 3
#
# [[3]]
# [1] 5
#

How to delete an element at a time from a vector while retaining the others?

I have a vector x containing 5 elements.
x <- (1,2,3,4,5)
I would want to delete one element at each iteration and retain other elements in the vector.(as shown below)
x <- (2,3,4,5) #vector after first iteration
x <- (1,3,4,5) #vector after second iteration
x <- (1,2,4,5) #vector after third iteration
x <- (1,2,3,5) #vector after fourth iteration
and also, is it possible to store these new vectors in a list?
is there a way to extend this to multiple vectors?

You could use combn:
combn(5,4)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 1 1 2
[2,] 2 2 2 3 3
[3,] 3 3 4 4 4
[4,] 4 5 5 5 5
To get the data as a list:
as.list(data.frame(combn(5,4)))
To use this on multiple vectors or a matrix, first transform it into a data.frame, to make it easier for lapply to go over the length (columns) of the data.frame. Then you can use lapply with combn like so:
mat <- data.frame(matrix(1:10,5))
lapply(mat, function(x) combn(x,length(x)-1))
$X1
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 1 1 2
[2,] 2 2 2 3 3
[3,] 3 3 4 4 4
[4,] 4 5 5 5 5
$X2
[,1] [,2] [,3] [,4] [,5]
[1,] 6 6 6 6 7
[2,] 7 7 7 8 8
[3,] 8 8 9 9 9
[4,] 9 10 10 10 10

We can do
lapply(seq_along(x), function(i) x[-i])

drop_n <- function(n, x) x[-n]
lapply(1:5, drop_n, x)

Here you have a way to get what you want. You only need to change the parameter n to make it more general
# Generate a list
L <- list()
# Define the number of elements
n <- 5
# Define the values
values <- 1:n
# Complete the list
for (i in 1:n){
L[[i]] <- values[-i]
}

Questions about missing data

In a matrix, if there is some missing data recorded as NA.
how could I delete rows with NA in the matrix?
can I use na.rm?

na.omit() will take matrices (and data frames) and return only those rows with no NA values whatsoever - it takes complete.cases() one step further by deleting the FALSE rows for you.
> x <- data.frame(c(1,2,3), c(4, NA, 6))
> x
c.1..2..3. c.4..NA..6.
1 1 4
2 2 NA
3 3 6
> na.omit(x)
c.1..2..3. c.4..NA..6.
1 1 4
3 3 6

I think na.rm usually only works within functions, say for the mean function. I would go with complete.cases: http://stat.ethz.ch/R-manual/R-patched/library/stats/html/complete.cases.htm
let's say you have the following 3x3-matrix:
x <- matrix(c(1:8, NA), 3, 3)
> x
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 NA
then you can get the complete cases of this matrix with
y <- x[complete.cases(x),]
> y
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
The complete.cases-function returns a vector of truth values that says whether or not a case is complete:
> complete.cases(x)
[1] TRUE TRUE FALSE
and then you index the rows of matrix x and add the "," to say that you want all columns.

If you want to remove rows that contain NA's you can use apply() to apply a quick function to check each row. E.g., if your matrix is x,
goodIdx <- apply(x, 1, function(r) !any(is.na(r)))
newX <- x[goodIdx,]

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Saving vectors of different lengths in a matrix/data frame - r

What about m[j,] = area[temp.start:temp.end] ? Edit: a <- area[temp.start:temp.end] m[1:length(a),j] <- a

Maybe others have better answers. As I see it, you have two options: Change m[,j] to m[1:length(area[temp.start:temp.end]),j] and then you will not get an error but you would have some NA's left. Use a list of matrices instead, so you would get different dimensions for each matrix.

Related

How to vectorize this operation?

Sampling a number of indivuals in subgroups with no repeating group constellation in R

Find all combinations of a set of numbers that add up to a certain total

How to delete an element at a time from a vector while retaining the others?

Questions about missing data

Categories

Resources