Creating shuffled numbers in R - r

After seeing this example, I was wondering how I could create one set of 15 shuffled numbers, each between 1 and 4, in R.
On this website, you can get 1 set of 15 shuffled numbers
ranging from 1 to 4.
As an example, on my run I got:
Set #1:
3, 2, 2, 1, 1, 1, 3, 2, 2, 3, 2, 1, 3, 4, 1
Is there a way I can replicate the above in R?

If I understood your question correctly, the first solution that comes to mind is the following: very basic, but it does the job.
size <- 40
vec <- sample(1:4, size = size, replace = TRUE)
# redraw until every digit from 1 to 4 appears at least once
while (length(unique(vec)) < 4) {
  vec <- sample(1:4, size = size, replace = TRUE)
}
vec
The while loop will not run for long: with 40 draws, the probability that at least one of the four digits is missing from vec is at most 4 * (3/4)^40, roughly 4 in 100,000.
Of course you can change the size of your vector and the code will still work, unless size is less than 4; in that case vec can never contain all four digits and the loop will run forever.
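To match the 15 numbers asked for in the question, the same redraw-until-complete idea can be wrapped in a small function. This is only a sketch: the name `shuffled_set` and its arguments are made up for illustration, and the guard simply stops early in the case where the loop could never finish.

```r
# Hypothetical helper: draw `size` numbers from `values` with replacement,
# redrawing until every value appears at least once.
shuffled_set <- function(size = 15, values = 1:4) {
  # with fewer draws than values, the condition below can never be met
  stopifnot(size >= length(values))
  repeat {
    vec <- sample(values, size = size, replace = TRUE)
    if (all(values %in% vec)) return(vec)
  }
}

set.seed(42)  # only for reproducibility
set1 <- shuffled_set()
set1
```

If it does not matter whether every digit shows up, `sample(1:4, 15, replace = TRUE)` on its own is enough.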

Related

Removing a number of consecutive rows that fulfills a certain condition

I am trying to remove consecutive rows in a data frame if all the values in those rows are less than 1 and the run is longer than, e.g., 4 rows.
Let's say we have a column [0.1, 0, 5, 4, 0.2, 0.1, 0, 0, 0, 4, 9, 10]. Then I would like to remove only the middle part [0.2, 0.1, 0, 0, 0] and be left with [0.1, 0, 5, 4, 4, 9, 10]. I can easily do this with a for loop, but I am dealing with over 3 million data points and it takes far too long. Therefore I am looking for a solution that makes use of vectorization in R. Does anyone know what function I can use?
Thanks in advance!
You can perform a correlation (a sliding-window sum) over your dataset. Mark every row whose elements are all less than 1 with a 1 and every other row with a 0; a window of 4 consecutive rows then consists only of such rows exactly when its marks sum to 4. After that, it is a matter of expanding each matching window position back to the rows it covers. Here is a complete example with a NumPy array (which you can easily extract from your DataFrame with df.to_numpy()):
import numpy as np
"""
Notation: row whose elements are all < 1, will be called "target row"
Task: Remove every target row in a cluster of 4 consecutive target rows
Input: 11 x 5 dataset with target rows [0, 1, 2, 3, 4, 7]
Output: pruned dataset with rows [5, 6, 7, 8, 9, 10]
(Note that target row 7 must be kept because it's separated from the others)
"""
# Input
n, m = 11, 5
ar = np.random.rand(n, m)
ar[[5, 6, 8, 9, 10]] += 1.
min_rows = 4
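# Find all target rows (every element < 1). Comparing the row sum with the
# number of columns would also flag rows such as [0, 0, 0, 0, 4.5].
sums = (ar < 1).all(axis=1).astype(np.float32)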
print(f" Sums: {sums}")
# Find centers of clusters with 4 consecutive target rows
kernel = np.ones((min_rows,))
output = np.correlate(sums, kernel, mode="same")
print(f" Output: {output}")
mask = output == min_rows
print(f" Mask: {mask.astype(np.float32)}")
# Find all elements in the clusters
mask_ids = np.nonzero(mask)[0]
center = min_rows // 2
rng = np.arange(-center, center + (min_rows % 2 != 0), dtype=np.int32)
ids = (rng + mask_ids.reshape(-1, 1)).ravel()
mask[ids] = True
print(f"New Mask: {mask.astype(np.float32)}")
# mask the dataset
ar = ar[~mask]
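Since the question asks for R, the same idea can also be sketched with run-length encoding via base R's rle(). This is an illustration, not the answer's method: it assumes a run has to be longer than 4 rows to be removed (as in the question's example), and `min_run` and the other variable names are made up here.

```r
x <- c(0.1, 0, 5, 4, 0.2, 0.1, 0, 0, 0, 4, 9, 10)
min_run <- 4  # assumption: only runs longer than this are removed

is_target <- x < 1                 # TRUE for values below 1
runs <- rle(is_target)             # lengths and values of consecutive runs

# a run is dropped when it consists of target rows and is long enough
drop_run <- runs$values & runs$lengths > min_run

# expand the per-run decision back to one flag per element
drop <- rep(drop_run, runs$lengths)
kept <- x[!drop]
kept
# [1]  0.1  0.0  5.0  4.0  4.0  9.0 10.0
```

For a data frame `df` with several numeric columns and no missing values, the flag could be computed as `rowSums(df >= 1) == 0` and the final step becomes `df[!drop, ]`; everything stays vectorized, so no explicit loop over the rows is needed.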

Finding the statistical mode of a vector: When having more than single mode — return the last mode

When calculating the statistical mode of a vector, there is often more than one mode:
c(1, 1, 2, 2, 3, 4) # mode is both 1 and 2
In such scenarios, if I want to decide between two (or more) possible values, I use fmode() from {collapse} package, which offers, through the ties argument, 3 possible methods for deciding:
ties: an integer or character string specifying the method to resolve ties between multiple possible modes, i.e. multiple values with the maximum frequency or sum of weights:

Int.  String  Description
1     first   take the first occurring mode.
2     min     take the smallest of the possible modes.
3     max     take the largest of the possible modes.

Example of fmode()
library(collapse)
my_vec <- c(1, 1, 3, 4, 5, 5, -6, -6, 2, 2) # 4 modes here: 1, 2, 5, -6
fmode(my_vec, ties = "first")
#> [1] 1
fmode(my_vec, ties = "min")
#> [1] -6
fmode(my_vec, ties = "max")
#> [1] 5
My Question
I'm looking for a "last" method — i.e., whenever there's more than one mode, return the "last" mode. But unfortunately, fmode() doesn't have a "last" method.
So if we return to my example, what I wish is that for the vector:
my_vec <- c(1, 1, 3, 4, 5, 5, -6, -6, 2, 2)
I want a function that does
custom_mode_func(my_vec, method = "last")
## [1] 2
The only option you have with collapse is sorting the data beforehand e.g.
library(collapse)
my_vec <- c(1, 1, 3, 4, 5, 5, -6, -6, 2, 2)
data.frame(v = my_vec, g = gl(2, 5)) %>%
  roworder(g) %>%
  tfm(t = data.table::rowid(g)) %>%
  roworder(g, -t) %>%
  gby(g) %>%
  smr(last = fmode(v, ties = "first"))
The reason rev doesn't work is that collapse grouping doesn't split the data, but only determines to which group a row belongs, and then computes statistics on all groups simultaneously using running algorithms in C++ (e.g. the grouped computation is done by fmode itself). So in your code rev is actually executed before the grouping and reverses the entire vector. In this case, a native data.table implementation calling fmode.default directly (to optimize on method dispatch) would probably be the fastest solution. I can think about adding a "last" mode if I find time for that.
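Until such an option exists, the ungrouped case can be covered by a small base-R sketch. It is not part of collapse: `custom_mode_func` is the hypothetical name used in the question, and "last" is assumed here to mean the tied mode whose first occurrence comes latest in the vector.

```r
# Hypothetical helper, base R only
custom_mode_func <- function(x, method = c("first", "last")) {
  method <- match.arg(method)
  ux <- unique(x)                     # distinct values, in order of first appearance
  counts <- tabulate(match(x, ux))    # frequency of each distinct value
  modes <- ux[counts == max(counts)]  # all tied modes, still in order of appearance
  if (method == "first") modes[1] else modes[length(modes)]
}

my_vec <- c(1, 1, 3, 4, 5, 5, -6, -6, 2, 2)
custom_mode_func(my_vec, method = "last")
# [1] 2
custom_mode_func(my_vec, method = "first")
# [1] 1
```

This will be slower than fmode() on large data, so it is meant as a reference for the expected behaviour rather than a replacement.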

r sequence problem - max number of changes in a given sequence

Can somebody help me understand a CS problem?
The problem is the New York Time Rollercoaster problem.
I have a queue:
queue <- seq(from = 1, to = 5)
1 2 3 4 5
A person can bribe the person directly ahead of them in the queue to swap places, but at most 2 times. Thus a queue sequence might look like:
Ride: 1, 2, 3, 4, 5 # Original queue
Ride: 1, 2, 3, 5, 4 # 5 bribes number 4
Ride: 1, 2, 5, 3, 4 # 5 bribes number 3 and thus runs out of bribes and cannot move further (the problem does not state whether 3 can "re-bribe" 5, so I assume they cannot).
Ride: 2, 1, 5, 3, 4 # 2 bribes number 1
So given the input c(1, 2, 3, 4, 5), what is the minimum number of swaps it would take to get to the final output, which would be c(2, 1, 5, 3, 4)?
Python code from here:
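def minimumBribes(q):
    moves = 0
    for pos, val in enumerate(q):
        # nobody can end up more than 2 places ahead of where they started
        if (val - 1) - pos > 2:
            return "Too chaotic"
        # count the people ahead of val who carry a larger number
        # (range replaces the Python 2 xrange of the original)
        for j in range(max(0, val - 2), pos):
            if q[j] > val:
                moves += 1
    return moves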
I am trying to re-create this in R and understand the solution.
Here's a way, I think -
minimumBribes <- function(final_q) {
  change <- final_q - seq_along(final_q)
  if (any(change > 2)) return("Too chaotic!")
  sum(change[change > 0])
}
minimumBribes(final_q = c(2, 1, 5, 3, 4))
[1] 3
Explanation -
initial_q <- 1:5
final_q <- c(2, 1, 5, 3, 4)
# calculate change in position; +ve is gain and -ve is loss
change <- final_q - initial_q
[1] 1 -1 2 -1 -1
# it is clear that if some gained x posn combined then other(s) lost x posn combined
# i.e. sum of posn gains and losses will always be 0
# therefore, to get min total swaps, simply add either gains or losses
# which in a way implies the most direct path from initial_q to final_q
sum(change[change > 0])
[1] 3
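For a line-by-line R counterpart of the Python function from the question, a sketch along these lines may help. The name `minimum_bribes` is an assumption, and the indices are shifted by one because R is 1-based. Note that counting overtakes this way can give a larger result than summing the positive displacements: for c(1, 2, 5, 3, 7, 8, 6, 4) the loop below counts 7 swaps, whereas the positive displacements sum to 6.

```r
# Hypothetical R translation of the Python loop (1-based indexing)
minimum_bribes <- function(q) {
  moves <- 0L
  for (pos in seq_along(q)) {
    val <- q[pos]
    # nobody can end up more than 2 places ahead of where they started
    if (val - pos > 2) return("Too chaotic")
    # anyone who overtook `val` can be no further ahead than position val - 1
    start <- max(1, val - 1)
    if (start < pos) {
      moves <- moves + sum(q[start:(pos - 1)] > val)
    }
  }
  moves
}

minimum_bribes(c(2, 1, 5, 3, 4))
# [1] 3
minimum_bribes(c(1, 2, 5, 3, 7, 8, 6, 4))
# [1] 7
minimum_bribes(c(5, 1, 2, 3, 4))
# [1] "Too chaotic"
```

The inner sum counts, for each person, how many people with a larger number stand in front of them; each such pair corresponds to one bribe.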

R - Randomly pick a number and do it over and over until a condition is achieved

I want to randomly pick a number from a vector with 8 elements that sums to 35. If the number is 0 look for another number. If the number is greater than 0, make this number -1. Do this in a loop until the sum of the vector is 20. How can I do this in R?
For example: vec<-c(2,3,6,0,8,5,6,5)
Pick a number from this list randomly and make the number -1 until the sum of the elements becomes 20.
I'm really not sure this is what you want, but here is my solution based on what I understand of your question. It shows most of the concepts and key functions; use it together with help() to understand them and optimize it.
vec <- c(2, 3, 6, 0, 8, 5, 6, 5)
summ <- 0
new.vec <- NULL
iter <- 1
while (summ < 20) {
  selected <- sample(vec, 1)
  # skip zeros; advance iter only when a value is stored, otherwise
  # new.vec gets an NA gap and sum(new.vec) becomes NA
  if (selected != 0) {
    new.vec[iter] <- selected - 1
    iter <- iter + 1
  }
  summ <- sum(new.vec)
}
Try this:
vec <- c(2, 3, 6, 0, 8, 5, 6, 5)
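# just setting the seed for reproducibility
set.seed(19)
# expand vec into 35 position labels, keep 20 of them at random and count
# them per position; nbins keeps trailing positions that end up at 0
tabulate(sample(rep(seq_along(vec), vec), 20), nbins = length(vec))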
#[1] 0 2 4 0 4 5 3 2
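For comparison, the procedure can also be written as the literal loop described in the question. This is a minimal sketch under the assumption that "make this number -1" means "decrease it by 1"; `target` and `start` are names introduced here for illustration.

```r
set.seed(19)  # only for reproducibility
vec <- c(2, 3, 6, 0, 8, 5, 6, 5)
start <- vec   # copy of the original values, kept for checking
target <- 20

while (sum(vec) > target) {
  i <- sample.int(length(vec), 1)  # pick a random position
  if (vec[i] > 0) {                # zeros are skipped and another pick is made
    vec[i] <- vec[i] - 1
  }
}
vec
```

The two approaches are not equivalent: the loop picks each position with equal probability, while the tabulate() version picks each of the 35 units with equal probability, so larger entries tend to lose more there.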

Variable sample upper value in R

I have the following matrix
m <- matrix(c(2, 4, 3, 5, 1, 5, 7, 9, 3, 7), nrow=5, ncol=2)
colnames(m) = c("Y","Z")
m <-data.frame(m)
I am trying to create a random number in each row where the upper limit is a number based on a variable value (in this case 1*Y based on each row's value for Z)
I currently have:
samp<-function(x){
  sample(0:1,1,replace = TRUE)}
m$randoms <- apply(m,1,samp)
which works well, applying the sample function independently to each row, but I always get an error when I try to alter the x in sample. I thought I could do something like this:
samp<-function(x){
  sample(0:m$Z,1,replace = TRUE)}
m$randoms <- apply(m,1,samp)
but I guess that was wishful thinking.
Ultimately I want the result:
Y Z randoms
2 5 4
4 7 7
3 9 3
5 3 1
1 7 6
Any ideas?
The following will sample from 0 to x$Y for each row, and store the result in randoms:
x$randoms <- sapply(x$Y + 1, sample, 1) - 1
Explanation:
The sapply takes each value in x$Y separately (let's call this y), and calls sample(y + 1, 1) on it.
Note that (e.g.) sample(y+1, 1) will sample 1 random integer from the range 1:(y+1). Since you want a number from 0 to y rather than 1 to y + 1, we subtract 1 at the end.
Also, just pointing out - no need for replace=T here because you are only sampling one value anyway, so it doesn't matter whether it gets replaced or not.
Based on @mathematical.coffee's suggestion and my edited example, this is the slick final result:
m <- matrix(c(2, 4, 3, 5, 1, 5, 7, 9, 3, 7), nrow=5, ncol=2)
colnames(m) = c("Y","Z")
m <-data.frame(m)
m$randoms <- sapply(m$Z + 1, sample, 1) - 1
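A quick way to convince yourself that the bounds are right is to check them directly. The sketch below rebuilds the example data and also shows a fully vectorized alternative based on runif(), which is not from the thread and is only one possible way to avoid the per-row call; `randoms_vec` is a made-up name.

```r
set.seed(1)  # only for reproducibility
m <- data.frame(Y = c(2, 4, 3, 5, 1), Z = c(5, 7, 9, 3, 7))

# one draw per row from 0..Z; sample(n, 1) draws from 1:n, hence the +1 / -1
m$randoms <- sapply(m$Z + 1, sample, 1) - 1

# alternative without sapply: scale a uniform draw and round down
randoms_vec <- floor(runif(nrow(m)) * (m$Z + 1))

# every value should be a whole number between 0 and that row's Z
stopifnot(all(m$randoms >= 0 & m$randoms <= m$Z))
stopifnot(all(randoms_vec >= 0 & randoms_vec <= m$Z))
m
```

Both versions give each integer from 0 to Z the same chance; the runif() variant may be preferable when the data frame has many rows.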
