How to exclusive a set from a large set in R

How to exclusive a set from a large set in R - r

Suppose that I have a set of 10 elements. Suppose that my code is able to choose only 3 elements at a time. Then, I would like it to choose another $3$ elements, however, without selecting the elements that are already selected.
x <- c(4,3,5,6,-2,7,-4,10,22,-12)
Then, suppose that my condition is to select 3 elements that are less than 5. Then,
new_x <- c(4, 3, -2)
Then, I would like to select another 3 elements that are less than 5 but were not selected at the first time. If there is no 3 element then the third element should have value zero.
Hence,
new_xx <- c(-4,-12,0)
Any help, please?

Here is an option using split
f <- function(x, max = 5, n = 3) {
x <- x[x < max]
ret <- split(x, rep(1:(length(x) / n + 1), each = n)[1:length(x)])
lapply(ret, function(w) replace(rep(0, n), 1:length(w), w))
}
f(x)
#$`1`
#[1] 4 3 -2
#
#$`2`
#[1] -4 -12 0
Explanation: We define a custom function that first selects entries < 5, then splits the resulting vector into chunks of length 3 and stores the result in a list, and finally 0-pads those list elements that are vectors of length < 3.
Sample data
x <- c(4,3,5,6,-2,7,-4,10,22,-12)

Related

How can I create a for-loop to count the number of values within a vector that fall between a set boundary?

I'm trying to set an upper and lower boundary of a vector by simply adding and subtracting a set value from each index. I then want to create a loop that tells me for each value (i) in the vector, how many other points within the vector falls within that boundary.
Essentially creating a pseudo-density calculation based on how many values fall within the established range.
I have my vector "v" that contains random values. I then add/subtract three to it to get the upper and lower ranges. But can't create a loop that will count how many other values from that vector fall within that.
v <- c(1, 3, 4, 5, 8, 9, 10, 54)
for (i in v){
vec2 <- (vec +3 > vec[i] & vec -3 < vec[i])
}
vec2
I get NA's from this code.
I've also tried indexing the vec +/- 3 and it also didn't work.
vec2 <- (vec[i] +3 > vec[i] & vec - 3 < vec[i))
What I want is for every "i" value in the vector, I want to know how many points fall within that value + and -3.
i.e. first value being 1: so the upper limit would be 4 and the lower would be -2. I want it to count how many values remaining in the vector, fall within this. Which would be 3 for the first index (if it includes itself).
vec2 = (3, 4, 3, . . . )

Are you looking for something like this? Your code doesn't work because your syntax is incorrect.
vec <- c(1, 3, 4, 5, 8, 9, 10, 54) #Input vector
countvalswithin <- vector() #Empty vector that will store counts of values within bounds
#For loop to cycle through values stored in input vector
for(i in 1:length(vec)){
currval <- vec[i] #Take current value
lbound <- (currval - 3) #Calculate lower bound w.r.t. this value
ubound <- (currval + 3) #Calculate upper bound w.r.t. this value
#Create vector containing all values from source vector except current value
#This will be used for comparison against current value to find values within bounds.
othervals <- subset(vec, vec != currval)
currcount <- 1 #Set to 0 to exclude self; count(er) of values within bounds of current value
#For loop to cycle through all other values (excluding current value) to find values within bounds of current value
for(j in 1:length(othervals)){
#If statement to evaluate whether compared value is within bounds of current value; if it is, counter updates by 1
if(othervals[j] > lbound & othervals[j] <= ubound){
currcount <- currcount + 1
}
}
countvalswithin[i] <- currcount #Append count for current value to a vector
}
df <- data.frame(vec, countvalswithin) #Input vector and respective counts as a dataframe
df
# vec countvalswithin
# 1 1 3
# 2 3 4
# 3 4 3
# 4 5 4
# 5 8 3
# 6 9 3
# 7 10 3
# 8 54 1
Edit: added comments to the code explaining what it does.

In your for loop we can loop over every element in v, create range (-3, +3) and check how many of the elements in v fall within that range and store the result in new vector vec2.
vec2 <- numeric(length = length(v))
for (i in seq_along(v)) {
vec2[i] <- sum((v >= v[i] - 3) & (v <= v[i] + 3))
}
vec2
#[1] 3 4 4 4 4 3 3 1
However, you can avoid the for loop by using mapply
mapply(function(x, y) sum(v >= y & v <= x), v + 3, v - 3)
#[1] 3 4 4 4 4 3 3 1

Monte Carlo Simulation with Replacement Based On Sum of A Column

I am trying to simulate an unlikely situation in a videogame using a Monte Carlo simulation. I'm extremely new at coding and thought this would be a fun situation to simulate.
There are 3 targets and they are being attacked 8 times independently. My problem comes with how to deal with the fact that one of the columns cannot be attacked more than 6 times, when there are 8 attacks.
I would like to take any attack aimed at column 2 select one of the other 2 columns at random to attack instead, but only if column 2 has been attacked 6 times already.
Here is my attempt to simulate with 5000 repeats, for example.
#determine number of repeats
trial <- 5000
#create matrix with a row for each trial
m <- matrix(0, nrow = trial, ncol = 3)
#The first for loop is for each row
#The second for loop runs each attack independently, sampling 1:3 at random, then adding one to that position of the row.
#The function that is called by ifelse() when m[trial, 2] > 6 = TRUE is the issue.
for (trial in 1:trial){
for (attack in 1:8) {
target <- sample(1:3, 1)
m[trial, target] <- m[trial, target] + 1
ifelse(m[trial, 2] > 6, #determines if the value of column 2 is greater than 6 after each attack
function(m){
m[trial, 2] <- m[trial, 2] - 1 #subtract the value from the second column to return it to 6
newtarget <- sample(c(1,3), 1) #select either column 1 or 3 as a new target at random
m[trial, newtarget] <- m[trial, newtarget] + 1 #add 1 to indicate the new target has been selected
m}, #return the matrix after modification
m) #do nothing if the value of the second column is <= 6
}
}
For example, if I have the matrix below:
> matrix(c(2,1,5,7,1,0), nrow = 2, ncol = 3)
[,1] [,2] [,3]
[1,] 2 5 1
[2,] 1 7 0
I would like the function to look at the 2nd line of the matrix, subtract 1 from 7, and then add 1 to either column 1 or 3 to create c(2,6,0) or c(1,6,1). I would like to learn how to do this within the loop, but it could be done afterwards as well.
I think I am making serious, fundamental error with how to use function(x) or ifelse.
Thank you.

Here's an improved version of your code:
set.seed(1)
trial <- 5000
#create matrix with a row for each trial
m <- matrix(0, nrow = trial, ncol = 3)
#The first for loop is for each row
#The second for loop runs each attack independently, sampling 1:3 at random, then adding one to that position of the row.
#The function that is called by ifelse() when m[trial, 2] > 6 = TRUE is the issue.
for (i in 1:trial){
for (attack in 1:8) {
target <- sample(1:3, 1)
m[i, target] <- m[i, target] + 1
#determines if the value of column 2 is greater than 6 after each attack
if(m[i, 2] > 6){
#subtract the value from the second column to return it to 6
m[i, 2] <- m[i, 2] - 1
#select either column 1 or 3 as a new target at random
newtarget <- sample(c(1,3), 1)
#add 1 to indicate the new target has been selected
m[i, newtarget] <- m[i, newtarget] + 1
}
}
}
# Notice the largest value in column 2 is no greater than 6.
apply(m, 2, max)
set.seed is used to make the results reproducible (usually just used for testing). The ifelse function has a different purpose than the normal if-else control flow. Here's an example:
x = runif(100)
ifelse(x < 0.5, 0, x)
You'll notice any element in x that is less than 0.5 is now zero. I changed your code to have an if block. Notice that m[i, 2] > 6 returns a single TRUE or FALSE whereas in the small example above, x < 0.5 a vector of logicals is returned. So ifelse can take a vector of logicals, but the if block requires there be only a single logical.
You were on the right track with using function, but it just isn't necessary in this case. Often, but not always, you'll define a function like this:
f = function(x)
x^2
But just returning the value doesn't mean what you want is changed:
x = 5
f(5) # 25
x # still 5
For more on this, look up function scope in R.
Lastly, I changed the loop to be i in 1:trial instead of trial in 1:trial. You probably wouldn't notice any issues in your case, but it is better practice to use a separate variable than that which makes up the range of the loop.
Hope this helps.
P.S. R isn't really known for it's speed when looping. If you want to make things goes faster, you'll typically need to vectorize your code.

Swapping elements between more than 2 arrays

Swapping elements within a single array (x) is a classic problem in computer science. The immediate (but by no means only, e.g., XOR) solution in a low-level language like C is to use a temporary variable:
x[0] = tmp
x[0] = x[1]
x[1] = tmp
The above algorithm swaps the first and second elements of x.
To swap elements between two subarrays, x and y, is similar
x[0] = tmp
x[0] = y[1]
y[1] = tmp
What about for the case of 3 arrays with the added restriction that an element of Array 1 must be swapped with an element of Array 2 and an element of Array 2 must be swapped with an element of Array 3? Elements in Arrays 1 and 3 are not swapped with one another.
How can such an approach (with the added restriction) be generalized to k arrays?

You could create a for-loop that repeats your set of instructions:
l=list(x = c(1,2,3,4,5),y = c(5,4,3,2,1),z = c(6,7,8,9,10))
swap_elements <- function(l)
{
for(i in 1:(length(l)-1))
{
tmp = l[[i]][1]
l[[i]][1] = l[[i+1]][2]
l[[i+1]][2] = tmp
}
return(l)
}
Output:
> swap_elements(l)
$x
[1] 4 2 3 4 5
$y
[1] 7 1 3 2 1
$z
[1] 6 5 8 9 10

if the Arrays are stacked into a matrix, you can lag the rows to create the required action
k <- 6
#generate dummy data with k rows and 3 columns
mat <- matrix(seq_len(3*k), nrow=k, byrow=TRUE)
mat
#lag the matrix
mat[c(seq_len(k)[-1], 1),]

The number of data points in matrix and vector forms

Supposed that X contains 1000 rows with m columns, where m equal to 3 as follows:
set.seed(5)
X <- cbind(rnorm(1000,0,0.5), rnorm(1000,0,0.5), rnorm(1000,0,0.5))
Variable selection is performed, then the condition will be checked before performing the next operation as follows.
if(nrow(X) < 1000){print(a+b)}
,where a is 5 and b is 15, so if nrow(X) < 1000 is TRUE, then 20 will be printed out.
However, in case that X happens to be a vector because only one column is selected,
how can I check the number of data points when X can be either a matrix or vector ?
What I can think of is that
if(is.matrix(X)){
n <- nrow(X)
} else {
n <- length(X)}
if(n < 1000){print(a+b)}
Anyone has a better idea ?
Thank you

You can use NROW for both cases. From ?NROW
nrow and ncol return the number of rows or columns present in x. NCOL and NROW do the same treating a vector as 1-column matrix.
So that means that even if the subset is dropped down to a vector, as long as x is an array, vector, or data frame NROW will treat it as a one-column matrix.
sub1 <- X[,2:3]
is.matrix(sub1)
# [1] TRUE
NROW(sub1)
# [1] 1000
sub2 <- X[,1]
is.matrix(sub2)
# [1] FALSE
NROW(sub2)
# [1] 1000
So if(NROW(X) < 1000L) a + b should work regardless of whether X is a matrix or a vector. I use <= below, since X has exactly 1000 rows in your example.
a <- 5; b <- 15
if(NROW(sub1) <= 1000L) a + b
# [1] 20
if(NROW(sub2) <= 1000L) a + b
# [1] 20
A second option would be to use drop=FALSE when you make the variable selection. This will make the subset remain a matrix when the subset is only one column. This way you can use nrow with no worry. An example of this is
X[, 1, drop = FALSE]

fill up a matrix one random cell at a time

I am filling a 10x10 martix (mat) randomly until sum(mat) == 100
I wrote the following.... (i = 2 for another reason not specified here but i kept it at 2 to be consistent with my actual code)
mat <- matrix(rep(0, 100), nrow = 10)
mat[1,] <- c(0,0,0,0,0,0,0,0,0,1)
mat[2,] <- c(0,0,0,0,0,0,0,0,1,0)
mat[3,] <- c(0,0,0,0,0,0,0,1,0,0)
mat[4,] <- c(0,0,0,0,0,0,1,0,0,0)
mat[5,] <- c(0,0,0,0,0,1,0,0,0,0)
mat[6,] <- c(0,0,0,0,1,0,0,0,0,0)
mat[7,] <- c(0,0,0,1,0,0,0,0,0,0)
mat[8,] <- c(0,0,1,0,0,0,0,0,0,0)
mat[9,] <- c(0,1,0,0,0,0,0,0,0,0)
mat[10,] <- c(1,0,0,0,0,0,0,0,0,0)
i <- 2
set.seed(129)
while( sum(mat) < 100 ) {
# pick random cell
rnum <- sample( which(mat < 1), 1 )
mat[rnum] <- 1
##
print(paste0("i =", i))
print(paste0("rnum =", rnum))
print(sum(mat))
i = i + 1
}
For some reason when sum(mat) == 99 there are several steps extra...I would assume that once i = 91 the while would stop but it continues past this. Can somone explain what I have done wrong...
If I change the while condition to
while( sum(mat) < 100 & length(which(mat < 1)) > 0 )
the issue remains..

Your problem is equivalent to randomly ordering the indices of a matrix that are equal to 0. You can do this in one line with sample(which(mat < 1)). I suppose if you wanted to get exactly the same sort of output, you might try something like:
set.seed(144)
idx <- sample(which(mat < 1))
for (i in seq_along(idx)) {
print(paste0("i =", i))
print(paste0("rnum =", idx[i]))
print(sum(mat)+i)
}
# [1] "i =1"
# [1] "rnum =5"
# [1] 11
# [1] "i =2"
# [1] "rnum =70"
# [1] 12
# ...

See ?sample
Arguments:
x: Either a vector of one or more elements from which to choose,
or a positive integer. See ‘Details.’
...
If ‘x’ has length 1, is numeric (in the sense of ‘is.numeric’) and
‘x >= 1’, sampling _via_ ‘sample’ takes place from ‘1:x’. _Note_
that this convenience feature may lead to undesired behaviour when
‘x’ is of varying length in calls such as ‘sample(x)’. See the
examples.
In other words, if x in sample(x) is of length 1, sample returns a random number from 1:x. This happens towards the end of your loop, where there is just one 0 left in your matrix and one index is returned by which(mat < 1).

The iteration repeats on level 99 because sample() behaves very differently when the first parameter is a vector of length 1 and when it is greater than 1. When it is length 1, it assumes you a random number from 1 to that number. When it has length >1, then you get a random number from that vector.
Compare
sample(c(99,100),1)
and
sample(c(100),1)
Of course, this is an inefficient way of filling your matrix. As #josilber pointed out, a single call to sample could do everything you need.

The issue comes from how sample and which do the sampling when you have only a single '0' value left.
For example, do this:
mat <- matrix(rep(1, 100), nrow = 10)
Now you have a matrix of all 1's. Now lets make two numbers 0:
mat[15]<-0
mat[18]<-0
and then sample
sample(which(mat<1))
[1] 18 15
by adding a size=1 argument you get one or the other
now lets try this:
mat[18]<-1
sample(which(mat<1))
[1] 3 13 8 2 4 14 11 9 10 5 15 7 1 12 6
Oops, you did not get [1] 15 . Instead what happens in only a single integer (15 in this case) is passed tosample. When you do sample(x) and x is an integer, it gives you a sample from 1:x with the integers in random order.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

How to exclusive a set from a large set in R - r

Related

How can I create a for-loop to count the number of values within a vector that fall between a set boundary?

Monte Carlo Simulation with Replacement Based On Sum of A Column

Swapping elements between more than 2 arrays

The number of data points in matrix and vector forms

fill up a matrix one random cell at a time

Categories

Resources