Let's say I have a vector of integers 1:6
w=1:6
I am attempting to obtain a matrix of 90 rows and 6 columns that contains the multinomial combinations from these 6 integers taken as 3 groups of size 2.
6!/(2!*2!*2!)=90
So, columns 1 and 2 of the matrix would represent group 1, columns 3 and 4 would represent group 2 and columns 5 and 6 would represent group 3. Something like:
1 2 3 4 5 6
1 2 3 5 4 6
1 2 3 6 4 5
1 2 4 5 3 6
1 2 4 6 3 5
...
Ultimately, I would want to expand this to other multinomial combinations of limited size (because the numbers get large rather quickly) but I am having trouble getting things to work. I've found several functions that do binomial combinations (only 2 groups) but I could not locate any functions that do this when the number of groups is greater than 2.
I've tried two approaches to this:
Building up the matrix from nothing using for loops and attempting things with the reshape package (thinking that might be something there for this with melt() )
working backwards from the permutation matrix (720 rows) by attempting to retain unique rows within groups and or removing duplicated rows within groups
Neither worked for me.
The permutation matrix can be obtained with
library(gtools)
dat=permutations(6, 6, set=TRUE, repeats.allowed=FALSE)
I think working backwards from the full permutation matrix is a bit excessive but I'm tring anything at this point.
Is there a package with a prebuilt function for this? Anyone have any ideas how I shoud proceed?
Here is how you can implement your "working backwards" approach:
gps <- list(1:2, 3:4, 5:6)
get.col <- function(x, j) x[, j]
is.ordered <- function(x) !colSums(diff(t(x)) < 0)
is.valid <- Reduce(`&`, Map(is.ordered, Map(get.col, list(dat), gps)))
dat <- dat[is.valid, ]
nrow(dat)
# [1] 90
Related
this may be a simple question but I'm fairly new to R.
What I want to do is to perform some kind of addition on the indexes of a list, but once I get to a maximum value it goes back to the first value in that list and start over from there.
for example:
x <-2
data <- c(0,1,2,3,4,5,6,7,8,9,10,11)
data[x]
1
data[x+12]
1
data[x+13]
3
or something functionaly equivalent. In the end i want to be able to do something like
v=6
x=8
y=9
z=12
values <- c(v,x,y,z)
data <- c(0,1,2,3,4,5,6,7,8,9,10,11)
set <- c(data[values[1]],data[values[2]], data[values[3]],data[values[4]])
set
5 7 8 11
values <- values + 8
set
1 3 4 7
I've tried some stuff with additon and substraction to the lenght of my list but it does not work well on the lower numbers.
I hope this was a clear enough explanation,
thanks in advance!
We don't need a loop here as vectors can take vectors of length >= 1 as index
data[values]
#[1] 5 7 8 11
NOTE: Both the objects are vectors and not list
If we need to reset the index
values <- values + 8
ifelse(values > length(data), values - length(data) - 1, values)
#[1] 1 3 4 7
This question already has answers here:
How to produce random integer numbers from 0 - 100 but by 10s only?
(2 answers)
Closed 4 years ago.
I am new to R and would appreciate any help on this 2 step task.
I need to write the R codes to create a list of numbers which are mutiples of 3 between 1 and 40.
The second part would need the codes to randomly select 6 numbers from the list above.
To generate number multiple of 3 try seq with by=3. And then use sample to pick 6 random samples out of that sequence. I have used set.seed(1) to get fixed output:
set.seed(1)
sample(seq(3,40,by=3), 6)
#[1] 12 15 21 30 6 24
Here is a solution in a step-by-step manner:
# 1. List of numbers between 1 and 40
list_numbers <- seq(1:40)
# 2. Filter
list_filter <- sapply(list_numbers, function(x) {x %% 3 == 0})
# 3. List of numbers multiple 3
list_numbers_multiple_3 <- list_numbers[list_filter]
# 4. Select 6 random numbers
sample(list_numbers_multiple_3, 6)
I wish to find the speediest way to find up to 1000 possible combinations of 'n' integers to find a target integer.
For example. Say I wanted to sum to the number '20'. I want to find up to 1000 combinations of four integers that sum to this number. The integers can repeat themselves. I also have a condition that the integer must not be smaller than a particular number, in this case 4.
target<-20 #the number I wish to sum to
lowest<-4 #the smallest integer I allow
size<-4 #the number of integers I wish to use to sum
maxposs <- target - ((size-1) * lowest) #given the lowest, this is the max possible integer. In my example it is 8.
This is how I have started to work this out. Using combn to find all combinations of the four chosen integers and then filtering by those that sum to my target.
m <- combn(rep(lowest:maxposs,size), size)
m1<- m[,colSums(m)==target]
Here, 'm1' has 245 columns. There are only this many solutions. The last few columns:
# [,238] [,239] [,240] [,241] [,242] [,243] [,244] [,245]
#[1,] 4 4 4 4 4 4 5 5
#[2,] 5 5 5 6 7 4 6 4
#[3,] 7 4 5 4 4 5 4 5
#[4,] 4 7 6 6 5 7 5 6
However, in my real application, I can be dealing with very high integers (summing up to 1000) and want to limit myself to a random sample of 1000 possible combinations. As this is for a randomization statistical test, speed is of the essence. I wonder if anyone knows of a faster way of doing this. My way doesn't feel intuitively quick.
my_matrix <- matrix(nrow = 1000, ncol = 4)
i <- 1
nn <- 1000
while(i <= 1000){
x <- sample(x = 4:nn, size = 3)
y = nn - sum(x)
if(y >= 4){
my_matrix[i, ] <- c(x, y)
i <- i + 1
}
}
Per Gavin's suggestion, redone with a preallocated matrix. Now this runs in .158 seconds, twice as fast, and probably scales better.
This question already has answers here:
Create a Data Frame of Unequal Lengths
(6 answers)
Closed 9 years ago.
I have the following elementary issue in R.
I have a for (k in 1:x){...} cycle which produces numerical vectors whose length depends on k.
For each value of k I produce a single numerical vector.
I would like to collect them as rows of a data frame in R, if possible. In other words, I would like to introduce a data frame data s.t.
for (k in 1:x) {
data[k,] <- ...
}
where the dots represent the command producing the vector with length depending on k.
Unfortunately, as far as I know, the length of the rows of a dataframe in R is constant, as it is a list of vectors of equal length. I have already tried to complete each row with a suitable number of zeroes to arrive at a constant length (in this case equal to x). I would like to work "dynamically", instead.
I do not think that this issue is equivalent to merge vectors of different lengths in a dataframe; due to the if cycle, only 1 vector is known at each step.
Edit
A very easy example of what I mean. For each k, I would like to write the vector whose components are 1,2,...,k and store it as kth row of the dataframe data. In the above setting, I would write
for (k in 1:x) {
data[k,] <- seq(1,k,1)
}
As the length of seq(1,k,1) depends on k the code does not work.
You could consider using ldply from plyr here.
set.seed(123)
#k is the length of each result
k <- sample( 5 , 3 , repl = TRUE )
#[1] 2 4 3
# Make a list of vectors, each a sequence from 1:k
ll <- lapply( k , function(x) seq_len(x) )
#[[1]]
#[1] 1 2
#[[2]]
#[1] 1 2 3 4
#[[3]]
#[1] 1 2 3
# take our list and rbind it into a data.frame, filling in missing values with NA
ldply( ll , rbind)
# 1 2 3 4
#1 1 2 NA NA
#2 1 2 3 4
#3 1 2 3 NA
Say I have a data set x and do the following kmeans cluster:
fit <- kmeans(x,2)
My question is in regards to the output of fit$cluster: I know that it will give me a vector of integers (from 1:k) indicating the cluster to which each point is allocated. Instead, is there a way to have the clusters be labeled 1,2, etc... in order of decreasing numerical value of their center?
For example: If x=c(1.5,1.4,1.45,.2,.3,.3) , then fit$cluster should result in (1,1,1,2,2,2) but not result in (2,2,2,1,1,1)
Similarly, if x=c(1.5,.2,1.45,1.4,.3,.3) then fit$cluster should return (1,2,1,1,2,2), instead of (2,1,2,2,1,1)
Right now, fit$cluster seems to label the cluster numbers randomly. I've looked into documentation but haven't been able to find anything. Please let me know if you can help!
I had a similar problem. I had a vector of ages that I wanted to separate into 5 factor groups based on a logical ordinal set. I did the following:
I ran the k-means function:
k5 <- kmeans(all_data$age, centers = 5, nstart = 25)
I built a data frame of the k-means indexes and centres; then arranged it by centre value.
kmeans_index <- as.numeric(rownames(k5$centers))
k_means_centres <- as.numeric(k5$centers)
k_means_df <- data_frame(index=kmeans_index, centres=k_means_centres)
k_means_df <- k_means_df %>%
arrange(centres)
Now that the centres are in the df in ascending order, I created my 5 element factor list and bound it to the data frame:
factors <- c("very_young", "young", "middle_age", "old", "very_old")
k_means_df <- cbind(k_means_df, factors)
Looks like this:
> k_means_df
index centres factors
1 2 23.33770 very_young
2 5 39.15239 young
3 1 55.31727 middle_age
4 4 67.49422 old
5 3 79.38353 very_old
I saved my cluster values in a data frame and created a dummy factor column:
cluster_vals <- data_frame(cluster=k5$cluster, factor=NA)
Finally, I iterated through the factor options in k_means_df and replaced the cluster value with my factor/character value within the cluster_vals data frame:
for (i in 1:nrow(k_means_df))
{
index_val <- k_means_df$index[i]
factor_val <- as.character(k_means_df$factors[i])
cluster_vals <- cluster_vals %>%
mutate(factor=replace(factor, cluster==index_val, factor_val))
}
Voila; I now have a vector of factors/characters that were applied based on their ordinal logic to the randomly created cluster vector.
# A tibble: 3,163 x 2
cluster factor
<int> <chr>
1 4 old
2 2 very_young
3 2 very_young
4 2 very_young
5 3 very_old
6 3 very_old
7 4 old
8 4 old
9 2 very_young
10 5 young
# ... with 3,153 more rows
Hope this helps.
K-means is a randomized algorithm. It is actually correct when the labels are not consistent across runs, or ordered in "ascending" order.
But you can of course remap the labels as you like, you know...
You seem to be using 1-dimensional data. Then k-means is actually not the best choice for you.
In contrast to 2- and higher-dimensional data, 1-dimensional data can efficiently be sorted. If your data is 1-dimensional, use an algorithm that exploits this for efficiency. There are much better algorithms for 1-dimensional data than for multivariate data.