Putting generated data in a matrix format - r

I have a question about putting simulated data into a matrix. I cannot write the program correctly in R and keep receiving an error. I suspect my "rep" definition and the final "Matrix" expression are wrong, but I do not know how to fix them. My specific question is:
I would like to produce a matrix that contains generated values. I have 20000 generated values each for x and y. As output, I would like a 2000 by 10 matrix in which each column contains the output of one iteration of the following for loop.
My R code:
x=rnorm(2e4,5,6)
vofdiv=quantile(x,probs=seq(0,1,0.1))
y=rnorm(2e4,4,6)
Matrix=rep(NULL,2000)
for(i in 1:10)
{
Matrix[i]=y[(x>=vofdiv[i] & x<vofdiv[i+1])] #The i(th) col of matrix
}
Matrix # A 2000*10 Matrix, as the final output
I would highly appreciate any help!

You have several problems here.
First of all, the correct way to define an empty matrix of size 2000*10 would be
Matrix <- matrix(NA, 2e3, 10)
Although you could potentially create the matrix your way (with rep) and then set dim, something like
Matrix <- rep(NA, 2e4)
dim(Matrix) <- c(2e3, 10)
Second problem is, when trying to insert into a column in a matrix, you need to index it correctly, i.e.,
Matrix[, i] <-
instead of
Matrix[i] <-
The latter will index Matrix as if it were a vector (which it basically is). In other words, it will treat the 2000*10 matrix as a single vector of length 20000 and index into that.
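To make the difference between the two kinds of indexing concrete, here is a small sketch on a toy 2 by 3 matrix (the names m, single and column are made up for illustration). R stores a matrix column by column, so a single index counts down the columns:

```r
# Toy matrix, filled column by column: columns are (1,2), (3,4), (5,6)
m <- matrix(1:6, nrow = 2, ncol = 3)

single <- m[3]       # vector-style index: third element in column-major order
column <- m[, 2]     # matrix-style index: the whole second column

m[, 3] <- c(0L, 0L)  # replaces the entire third column
```

Here single is one number, while column is a vector with one entry per row, which is what the loop in the question needs to assign.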
The third problem is that when your loop reaches i = 10 and evaluates x<vofdiv[i+1], you exclude the value with x == vofdiv[11] (the maximum of x), so the last iteration always returns fewer than 2000 values:
for(i in 1:10)
{
print(length(y[ (x >= vofdiv[i] & x < vofdiv[i+1])]))
}
# [1] 2000
# [1] 2000
# [1] 2000
# [1] 2000
# [1] 2000
# [1] 2000
# [1] 2000
# [1] 2000
# [1] 2000
# [1] 1999 <----
Thus, you will get an error if you try to fill a column of length 2000 with a vector of length 1999, because all columns of a matrix in R must have the same length.
The workaround would be to add = to the second comparison (so that < becomes <=), such as
Matrix <- matrix(NA, 2e3, 10)
for(i in 1:10)
{
Matrix[, i] <- y[x >= vofdiv[i] & x <= vofdiv[i + 1]]
}
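As a cross-check, here is a self-contained sketch that runs the corrected loop and compares it with a loop-free alternative based on cut() and split(). The seed and the names bins and Matrix2 are assumptions added for illustration. It also relies on the decile boundaries not coinciding with data values (apart from the minimum and maximum), which holds here because quantile() interpolates between neighbouring order statistics of continuous data.

```r
set.seed(42)  # assumption: any seed works, it only makes the run reproducible
x <- rnorm(2e4, 5, 6)
y <- rnorm(2e4, 4, 6)
vofdiv <- quantile(x, probs = seq(0, 1, 0.1))

# Loop version from the answer: one decile of x per column
Matrix <- matrix(NA_real_, 2e3, 10)
for (i in 1:10) {
  Matrix[, i] <- y[x >= vofdiv[i] & x <= vofdiv[i + 1]]
}

# Alternative without an explicit loop: label each x with its decile,
# split y by that label, and bind the ten pieces together as columns
bins <- cut(x, breaks = vofdiv, include.lowest = TRUE)
Matrix2 <- do.call(cbind, split(y, bins))

dim(Matrix)
dim(Matrix2)
```

Both versions keep the y values of each decile in their original order, so the two matrices should agree column by column.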

Related

Monte Carlo Simulation with Replacement Based On Sum of A Column

I am trying to simulate an unlikely situation in a videogame using a Monte Carlo simulation. I'm extremely new at coding and thought this would be a fun situation to simulate.
There are 3 targets and they are being attacked 8 times independently. My problem comes with how to deal with the fact that one of the columns cannot be attacked more than 6 times, when there are 8 attacks.
I would like to take any attack aimed at column 2 select one of the other 2 columns at random to attack instead, but only if column 2 has been attacked 6 times already.
Here is my attempt to simulate with 5000 repeats, for example.
#determine number of repeats
trial <- 5000
#create matrix with a row for each trial
m <- matrix(0, nrow = trial, ncol = 3)
#The first for loop is for each row
#The second for loop runs each attack independently, sampling 1:3 at random, then adding one to that position of the row.
#The function that is called by ifelse() when m[trial, 2] > 6 = TRUE is the issue.
for (trial in 1:trial){
for (attack in 1:8) {
target <- sample(1:3, 1)
m[trial, target] <- m[trial, target] + 1
ifelse(m[trial, 2] > 6, #determines if the value of column 2 is greater than 6 after each attack
function(m){
m[trial, 2] <- m[trial, 2] - 1 #subtract the value from the second column to return it to 6
newtarget <- sample(c(1,3), 1) #select either column 1 or 3 as a new target at random
m[trial, newtarget] <- m[trial, newtarget] + 1 #add 1 to indicate the new target has been selected
m}, #return the matrix after modification
m) #do nothing if the value of the second column is <= 6
}
}
For example, if I have the matrix below:
> matrix(c(2,1,5,7,1,0), nrow = 2, ncol = 3)
[,1] [,2] [,3]
[1,] 2 5 1
[2,] 1 7 0
I would like the function to look at the 2nd line of the matrix, subtract 1 from 7, and then add 1 to either column 1 or 3 to create c(2,6,0) or c(1,6,1). I would like to learn how to do this within the loop, but it could be done afterwards as well.
I think I am making serious, fundamental error with how to use function(x) or ifelse.
Thank you.
Here's an improved version of your code:
set.seed(1)
trial <- 5000
#create matrix with a row for each trial
m <- matrix(0, nrow = trial, ncol = 3)
#The outer for loop runs over the trials (one row each)
#The inner for loop runs each attack independently, sampling 1:3 at random, then adding one to that position of the row.
for (i in 1:trial){
for (attack in 1:8) {
target <- sample(1:3, 1)
m[i, target] <- m[i, target] + 1
#determines if the value of column 2 is greater than 6 after each attack
if(m[i, 2] > 6){
#subtract the value from the second column to return it to 6
m[i, 2] <- m[i, 2] - 1
#select either column 1 or 3 as a new target at random
newtarget <- sample(c(1,3), 1)
#add 1 to indicate the new target has been selected
m[i, newtarget] <- m[i, newtarget] + 1
}
}
}
# Notice the largest value in column 2 is no greater than 6.
apply(m, 2, max)
set.seed is used to make the results reproducible (usually just used for testing). The ifelse function has a different purpose than the normal if-else control flow. Here's an example:
x = runif(100)
ifelse(x < 0.5, 0, x)
You'll notice that any element in x that is less than 0.5 is now zero. I changed your code to use an if block. Notice that m[i, 2] > 6 returns a single TRUE or FALSE, whereas in the small example above x < 0.5 returns a vector of logicals. So ifelse can take a vector of logicals, but the if block requires a single logical.
You were on the right track with using function, but it just isn't necessary in this case. Often, but not always, you'll define a function like this:
f = function(x)
x^2
But just returning the value doesn't mean what you want is changed:
x = 5
f(5) # 25
x # still 5
For more on this, look up function scope in R.
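The same point applies to matrices, which is why the function inside ifelse() in the question could not change m. A minimal sketch, assuming a hypothetical helper named bump that is not part of the original code:

```r
m <- matrix(0, nrow = 2, ncol = 3)

# Hypothetical helper: adds 1 to one cell and returns the modified copy
bump <- function(mat, row, col) {
  mat[row, col] <- mat[row, col] + 1  # changes only the local copy
  mat
}

bump(m, 1, 2)            # returns a modified copy ...
unchanged <- m[1, 2]     # ... but m itself is untouched

m <- bump(m, 1, 2)       # reassign the result to keep the change
changed <- m[1, 2]
```

The modification only sticks once the returned matrix is assigned back to m.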
Lastly, I changed the loop to be i in 1:trial instead of trial in 1:trial. You probably wouldn't notice any issues in your case, but it is better practice to use a separate variable than that which makes up the range of the loop.
Hope this helps.
P.S. R isn't really known for its speed when looping. If you want to make things go faster, you'll typically need to vectorize your code.
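One possible vectorized sketch is shown here, under stated assumptions: it draws all attacks at once and redirects the overflow from column 2 afterwards, which the question says is acceptable. It should give the same distribution as the loop but not the same random numbers for a given seed. The names trials, targets, overflow and to_first are made up for illustration.

```r
set.seed(1)
trials <- 5000

# One row per trial, one column per attack; each entry is the target hit
targets <- matrix(sample(1:3, trials * 8, replace = TRUE), nrow = trials)

# Count hits per target for every trial
m <- cbind(rowSums(targets == 1), rowSums(targets == 2), rowSums(targets == 3))

# Attacks on column 2 beyond the sixth (0, 1 or 2 per trial) must be redirected
overflow <- pmax(m[, 2] - 6, 0)

# Each redirected attack goes to column 1 or 3 with equal probability
to_first <- rbinom(trials, size = overflow, prob = 0.5)
m[, 1] <- m[, 1] + to_first
m[, 3] <- m[, 3] + (overflow - to_first)
m[, 2] <- m[, 2] - overflow

apply(m, 2, max)
```

Every row still sums to 8 attacks, and column 2 never exceeds 6.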

Finding sum of number nearest to specific number

I have following vector of numbers in r
bay_no <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
bay_cont <- c(45,25,25,0,19,61,2,134,5,27,0,54,102,97,5,6,65,47,85,0)
count <- 3
bay_to_serve <- sum(bay_cont)/count
In the bay_cont vector above, I want to find the running sum that is closest to bay_to_serve; in this case bay_to_serve = 268.
From 45 through 2 the sum is 177, and from 45 through 134 the sum is 311. Since 311 is closest to 268, it should return the index of 134, i.e. 8 from bay_no.
This gives one group, bay_no = 1-8.
Then start again in bay_cont from 5 and continue until the sum is close to 268.
Desired output is
bay_no 1-8, 9-14 and then the remaining bay_nos
How can we do it in R?
I don't know if there is a smart way to do this, but I'd think of nested loops.
Your inner loop may look like this (Please note that I have no access to R right now, so I can't test it.):
old_sum <- bay_cont[1]
for (i in 2:length(bay_cont)) {
new_sum <- sum(bay_cont[1:i])
# stop as soon as adding bay i moves the sum farther from the target
if (abs(bay_to_serve - new_sum) > abs(bay_to_serve - old_sum)) {
output <- paste("bay_no", paste(1, i - 1, sep = "-"), sep = " ")
break
} else {
old_sum <- new_sum
}
}
This way, as soon as adding the next entry moves the sum farther from bay_to_serve than the previous sum was, it breaks the loop and creates an output string for the entries up to the previous one. Just add another loop around this one, plus one or two more if statements, so that the inner loop runs over j:length(bay_cont), where j is first set to 1 and is then set to the first bay after the group just found.
You can try:
res <- NULL
i = 1
while(i < length(bay_cont)){
tmp <- which.min(abs(cumsum(bay_cont[i:length(bay_cont)]) - bay_to_serve))
res <- append(res,tmp)
i = tmp + i
}
cumsum(res)
[1] 8 14 19
If you want to break ties in a specific way, you can use rank together with which.min as follows:
which.min(rank(abs(cumsum(bay_cont[i:length(bay_cont)]) - bay_to_serve), ties.method = "last"))
Then I would create a matrix instead of pasting it together:
cbind(c(1, cumsum(res)[-length(cumsum(res))]+1), cumsum(res))
[,1] [,2]
[1,] 1 8
[2,] 9 14
[3,] 15 19
Of course you can paste it together as well:
apply(cbind(c(1, cumsum(res)[-length(cumsum(res))]+1), cumsum(res)), 1, paste, collapse="-")
[1] "1-8" "9-14" "15-19"
My solution uses a dirty for loop but yields the required indices...
Hope that works for you.
bay_no <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
bay_cont <- c(45,25,25,0,19,61,2,134,5,27,0,54,102,97,5,6,65,47,85,0)
count <- 3
bay_to_serve <- sum(bay_cont)/count
temp_sum=0
for (i in 1:(length(bay_cont)-1)) {
temp_sum=temp_sum+bay_cont[i]
if ( abs(bay_to_serve-temp_sum)<abs(bay_to_serve-(temp_sum +bay_cont[i+1]))) {
print(i)
temp_sum=0
}
}
I have probably misunderstood the question, but it seems easier to do this:
bay_no[ which.min(abs(cumsum(bay_cont) - bay_to_serve)) ]
To start at 5, omit elements 1:4 and add 4 to the which.min index
bay_no[ which.min(abs(cumsum(bay_cont[-(1:4)]) - bay_to_serve))+4 ]
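The cumsum and which.min idea can be wrapped into a small helper that restarts after each group. This is only a sketch: the function name split_near_target and its arguments are made up, it assumes the last group simply takes all remaining bays (as in the desired output), and it assumes earlier groups never use up the whole vector.

```r
bay_cont <- c(45,25,25,0,19,61,2,134,5,27,0,54,102,97,5,6,65,47,85,0)
count <- 3
bay_to_serve <- sum(bay_cont) / count

# Hypothetical helper: returns the last index of each group
split_near_target <- function(values, target, groups) {
  ends <- integer(0)
  start <- 1
  for (g in seq_len(groups - 1)) {
    # running sum from the current start; pick the position closest to target
    running <- cumsum(values[start:length(values)])
    end <- start + which.min(abs(running - target)) - 1
    ends <- c(ends, end)
    start <- end + 1
  }
  c(ends, length(values))  # the final group takes whatever is left
}

ends <- split_near_target(bay_cont, bay_to_serve, count)
starts <- c(1, head(ends, -1) + 1)
groups <- paste(starts, ends, sep = "-")
groups
```

For the data in the question this gives the groups 1-8, 9-14 and 15-20.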

Can I further vectorize this function

I am relatively new to R, and to matrix-based scripting languages in general. I have written this function to return the indices of each row whose content is similar to any other row's content. It is a primitive form of spam reduction that I am developing.
if (!require("RecordLinkage")) install.packages("RecordLinkage")
library("RecordLinkage")
# Takes a column of strings, returns a vector of indices
check_similarity <- function(x) {
threshold <- 0.8
values <- NULL
for(i in 1:length(x)) {
values <- c(values, which(jarowinkler(x[i], x[-i]) > threshold))
}
return(values)
}
Is there a way that I could write this to avoid the for loop entirely?
We can simplify the code somewhat using sapply.
# some test data #
x = c('hello', 'hollow', 'cat', 'turtle', 'bottle', 'xxx')
# same threshold as in the question
threshold = 0.8
# create an x by x matrix specifying which strings are alike
m = sapply(x, jarowinkler, x) > threshold
# set diagonal to FALSE: we're not interested in strings being identical to themselves
diag(m) = FALSE
# And find index positions of all strings that are similar to at least one other string
which(rowSums(m) > 0)
# [1] 1 2 4 5
I.e. this returns the index positions of 'hello', 'hollow', 'turtle', and 'bottle' as being similar to another string
If you prefer, you can use colSums instead of rowSums to get a named vector, but this could be messy if the strings are long:
which(colSums(m) > 0)
# hello hollow turtle bottle
# 1 2 4 5

indexing through values of a nested list using mapply

I have a list of lists, with each sub-list containing 3 values. My goal is to cycle through every value of this nested list in a systematic way (i.e. start with list 1, go through all 3 values, go to list 2, and so on), applying a function to each. But my function hits missing values and breaks and I've traced the problem to the indexing itself, which doesn't behave in the way I am expecting. The lists are constructed as:
pop <- 1:100
treat.temp <- NULL
treat <- NULL
## Generate 5 samples of pop
for (i in 1:5){
treat.temp <- sample(pop, 3)
treat[[i]] <- treat.temp
}
## Create a list with which to index mapply
iterations <- (1:5)
Illustrative function and results.
test.function <- function(j, k){
for (n in 1:3){
print(k[[n]][j])
}
}
results <- mapply(test.function, iterations, treat)
[1] 61
[1] 63
[1] 73
[1] NA
[1] NA
[1] NA
[1] NA
[1] NA
<snipped>
For the first cycle through 'j', this works. But after that it throws NAs. But if I do it manually, it returns the values I would expect.
> print(treat[[1]][1])
[1] 61
> print(treat[[1]][2])
[1] 63
> print(treat[[1]][3])
[1] 73
> print(treat[[2]][1])
[1] 59
> print(treat[[2]][2])
[1] 6
> print(treat[[2]][3])
[1] 75
<snipped>
I'm sure this is a basic question, but I can't seem to find the right search terms to find an answer here or on Google. Thanks in advance!
Edited to Add: MrFlick's answer works well for my problem. I have multiple list inputs (hence mapply) in my actual use. A more detailed example, with a few notes.
pop <- 1:100
years <- seq.int(2000, 2014, 1)
treat.temp <- NULL
treat <- NULL
year.temp <- NULL
year <- NULL
## Generate 5 samples of treated states, control states and treatment years
for (i in 1:5){
treat.temp <- sample(pop, 20)
treat[[i]] <- treat.temp
year.temp <- sample(years, 1)
year[[i]] <- year.temp
}
## Create a list with which to index mapply
iterations <- (1:5)
## Define function
test.function <- function(j, k, l){
for (n in 1:3){
## Cycles treat through each value of jXn
print(k[n])
## Holds treat (k) fixed for each 3 cycle set of n (using first value in each treat sub-list); cycles through sub-lists as j changes
print(k[1])
## Same as above, but with 2nd value in each sub-list of treat
print(k[2])
## Holds year (l) fixed for each 3 cycle set of n, cycling through values of year each time j changes
print(l[1])
## Functionally equivalent to
print(l)
}
}
results <- mapply(test.function, iterations, treat, year)
Well, you might be misunderstanding how mapply works. It steps through both of the arguments you pass in parallel, which means treat is also subset at each iteration. Essentially, the function calls being made are
test.function(iterations[1], treat[[1]])
test.function(iterations[2], treat[[2]])
test.function(iterations[3], treat[[3]])
...
and you seem to treat the k variable as if it were the entire list. You also have your indexes backwards. But just to get your test working, you can do
test.function <- function(j, k){
for (n in 1:3) print(k[n])
}
results <- mapply(test.function, iterations, treat)
but this isn't really a super awesome way to iterate a list. What exactly are you trying to accomplish?
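To see what mapply actually hands to the function, here is a small sketch using toy values copied from the printed output in the question (the names firsts and all_values are made up for illustration):

```r
# Two sub-lists with the values shown in the question's output
treat <- list(c(61, 63, 73), c(59, 6, 75))
iterations <- seq_along(treat)

# mapply pairs iterations[i] with treat[[i]], so inside the function
# k is already a single sub-list's vector, not the whole list
firsts <- mapply(function(j, k) k[1], iterations, treat)

# If the goal is only to visit every value in order, flattening is enough
all_values <- unlist(treat)
```

Because k is already one sub-list, indexing it as k[[n]][j] runs past its end after the first call, which is where the NA values come from.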

fill up a matrix one random cell at a time

I am filling a 10x10 matrix (mat) randomly until sum(mat) == 100.
I wrote the following (i starts at 2 for another reason not specified here, but I kept it at 2 to be consistent with my actual code):
mat <- matrix(rep(0, 100), nrow = 10)
mat[1,] <- c(0,0,0,0,0,0,0,0,0,1)
mat[2,] <- c(0,0,0,0,0,0,0,0,1,0)
mat[3,] <- c(0,0,0,0,0,0,0,1,0,0)
mat[4,] <- c(0,0,0,0,0,0,1,0,0,0)
mat[5,] <- c(0,0,0,0,0,1,0,0,0,0)
mat[6,] <- c(0,0,0,0,1,0,0,0,0,0)
mat[7,] <- c(0,0,0,1,0,0,0,0,0,0)
mat[8,] <- c(0,0,1,0,0,0,0,0,0,0)
mat[9,] <- c(0,1,0,0,0,0,0,0,0,0)
mat[10,] <- c(1,0,0,0,0,0,0,0,0,0)
i <- 2
set.seed(129)
while( sum(mat) < 100 ) {
# pick random cell
rnum <- sample( which(mat < 1), 1 )
mat[rnum] <- 1
##
print(paste0("i =", i))
print(paste0("rnum =", rnum))
print(sum(mat))
i = i + 1
}
For some reason, when sum(mat) == 99 there are several extra steps. I would assume that once i = 91 the while loop would stop, but it continues past this. Can someone explain what I have done wrong?
If I change the while condition to
while( sum(mat) < 100 & length(which(mat < 1)) > 0 )
the issue remains..
Your problem is equivalent to randomly ordering the indices of a matrix that are equal to 0. You can do this in one line with sample(which(mat < 1)). I suppose if you wanted to get exactly the same sort of output, you might try something like:
set.seed(144)
idx <- sample(which(mat < 1))
for (i in seq_along(idx)) {
print(paste0("i =", i))
print(paste0("rnum =", idx[i]))
print(sum(mat)+i)
}
# [1] "i =1"
# [1] "rnum =5"
# [1] 11
# [1] "i =2"
# [1] "rnum =70"
# [1] 12
# ...
See ?sample
Arguments:
x: Either a vector of one or more elements from which to choose,
or a positive integer. See ‘Details.’
...
If ‘x’ has length 1, is numeric (in the sense of ‘is.numeric’) and
‘x >= 1’, sampling _via_ ‘sample’ takes place from ‘1:x’. _Note_
that this convenience feature may lead to undesired behaviour when
‘x’ is of varying length in calls such as ‘sample(x)’. See the
examples.
In other words, if x in sample(x) is of length 1, sample returns a random number from 1:x. This happens towards the end of your loop, where there is just one 0 left in your matrix and one index is returned by which(mat < 1).
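The examples in ?sample suggest a small wrapper, resample, that always samples from the elements of x, even when x has length 1. Here is a sketch of the original loop using it. The anti-diagonal setup is a compact equivalent of the ten row assignments in the question, and the steps counter is an addition for illustration.

```r
# Wrapper from the examples in ?sample: indexes into x instead of
# letting a length-one x be interpreted as 1:x
resample <- function(x, ...) x[sample.int(length(x), ...)]

# Same starting matrix as in the question: ones on the anti-diagonal
mat <- matrix(0, nrow = 10, ncol = 10)
mat[cbind(1:10, 10:1)] <- 1

steps <- 0
while (sum(mat) < 100) {
  rnum <- resample(which(mat < 1), 1)  # safe even when one zero is left
  mat[rnum] <- 1
  steps <- steps + 1
}
steps
```

With this change the loop should finish in exactly 90 steps, one per empty cell.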
The iteration repeats at level 99 because sample() behaves very differently when its first argument is a vector of length 1 than when it is longer. When it has length 1, sample() assumes you want a random number from 1 to that number. When it has length > 1, you get a random element of that vector.
Compare
sample(c(99,100),1)
and
sample(c(100),1)
Of course, this is an inefficient way of filling your matrix. As @josilber pointed out, a single call to sample could do everything you need.
The issue comes from how sample and which do the sampling when you have only a single '0' value left.
For example, do this:
mat <- matrix(rep(1, 100), nrow = 10)
Now you have a matrix of all 1's. Now let's make two numbers 0:
mat[15]<-0
mat[18]<-0
and then sample
sample(which(mat<1))
[1] 18 15
By adding a size=1 argument you get one or the other.
Now let's try this:
mat[18]<-1
sample(which(mat<1))
[1] 3 13 8 2 4 14 11 9 10 5 15 7 1 12 6
Oops, you did not get [1] 15. Instead, only a single integer (15 in this case) is passed to sample. When you do sample(x) and x is a single positive integer, it gives you the integers 1:x in random order.
