Summing elements of a vector in R

I'm trying to figure out how to add each respective component in a vector and store that in another vector. This is what I have so far:
# Create a function to roll a die n times.
RollDie = function(n) sample(1:6, n, replace = TRUE)
die1 = RollDie(500)
die2 = RollDie(500)
die3 = RollDie(500)
die4 = RollDie(500)
die5 = RollDie(500)
die6 = RollDie(500)
# Sum the first component of each vector; together these represent the values
# of the six dice rolled.
X = sum(die1[1], die2[1], die3[1], die4[1], die5[1], die6[1])
X
What I'm trying to do is sum the first, second, etc components of die 1 through 6.
So, the first component of X will be
sum(die1[1], die2[1], die3[1], die4[1], die5[1], die6[1])
the second component of X will be
sum(die1[2], die2[2], die3[2], die4[2], die5[2], die6[2])
the third component of X will be
sum(die1[3], die2[3], die3[3], die4[3], die5[3], die6[3])
and so on. X will have a length of 500.
I'm trying to find the appropriate command, but not having any luck. Please help. Thanks!

A possible solution with a vectorized approach:
rowSums(replicate(6, RollDie(500)))
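To make the one-liner above easier to follow, here is a minimal sketch of what it does step by step. The seed and the intermediate name `rolls` are assumptions added for illustration; they are not part of the original answer.

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable

# Roll a die n times (same idea as the RollDie function in the question)
RollDie <- function(n) sample(1:6, n, replace = TRUE)

# replicate() calls RollDie(500) six times and binds the results
# column-wise, giving a 500 x 6 matrix with one column per die
rolls <- replicate(6, RollDie(500))
dim(rolls)

# rowSums() adds across the six columns, so X[i] is the total of the
# i-th roll of all six dice
X <- rowSums(rolls)
length(X)

# The first component equals the explicit sum from the question
X[1] == sum(rolls[1, ])
```

If the six vectors `die1` to `die6` already exist, `die1 + die2 + die3 + die4 + die5 + die6` gives the same kind of result, because `+` works element-wise on vectors of equal length.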

Related

In R, How to apply if statement in matrix

I am trying to simulate a game.
I throw 2 dice at the same time. If the sum of the 2 dice is greater than or equal to 10, I win 1 point.
If it is lower than 10, I lose 1 point. I do this 1000 times.
To begin, I draw 2000 random samples with set.seed(1234)
set.seed(1234)
d = sample(c(1:6), size = 2000, replace = T)
d
And then, I turn it into a matrix, and sum each row
a = matrix(d, nrow=1000, ncol=2, byrow=T)
t = rowSums(a)
t
Now I have 1000 elements (the sum of the two dice for each throw). I would like to create a vector X holding the points I get on each throw.
How can I apply an if statement to create the vector X here?
Thank you very much
Do you mean this?
X <- ifelse(t>=10,1,-1)
or
X <- 2*(t>=10)-1
Using case_when
library(dplyr)
case_when(t >= 10 ~ 1, TRUE ~ -1)
You could assign a temporary variable and assign points by comparing the values.
tmp <- t
t[tmp >= 10] <- 1
t[tmp < 10] <- -1
Or without a temporary variable.
t1 <- c(-1, 1)[(t >= 10) + 1]
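As a quick check, the base R variants above can be run side by side on the data from the question. This is only a sketch; the names `X1`, `X2`, and `X3` are introduced here for the comparison.

```r
set.seed(1234)
d <- sample(1:6, size = 2000, replace = TRUE)
a <- matrix(d, nrow = 1000, ncol = 2, byrow = TRUE)
t <- rowSums(a)  # sum of the two dice for each of the 1000 throws

X1 <- ifelse(t >= 10, 1, -1)     # vectorized if/else
X2 <- 2 * (t >= 10) - 1          # TRUE/FALSE coerced to 1/0, then rescaled
X3 <- c(-1, 1)[(t >= 10) + 1]    # index 1 picks -1, index 2 picks 1

# All three give the same vector of +1 / -1 points
all(X1 == X2) && all(X2 == X3)

# Net score over the 1000 throws
sum(X1)
```

A plain `if` only looks at a single condition, which is why the vectorized forms are used here instead.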

A number turns into a vector in R, and the answer turns out to be the sum of the vector's components

UrnA =rep(c(10,5,1),c(5,5,5))
UrnB =rep(c(20,5,1),c(9,3,3))
n=1e3
sum=0
for( i in 1:n ){
dice=sample(1:6,1)
sum=sum+(dice<=4)*sample(UrnA,2,replace = FALSE)+(dice>=5)*sample(UrnB,2,replace = FALSE)
}
E=sum/n
I want to use the code above to solve the problem below.
"Urn A contains 5 $10 bills, 5 $5 bills, and 5 $1 bills.
Urn B contains 9 $20 bills, 3 $5 bills, and 3 $1 bills.
A dice is thrown. If it lands on 1,2,3, or 4, two bills are drawn from Urn A (without replacement),
Otherwise two bills are drawn from Urn B. Let X = the total value of the bills drawn.
(a) Use simulations to estimate E[X]."
The problem is that when I run this code, sum turns out to be a vector with two components, which really confuses me. When I worked it out by hand, adding the two components of sum gave the right answer.
You can avoid the for loop if you consider that rolling a single die n times is the same as rolling n dice once.
UrnA <- rep(c(10,5,1), c(5,5,5))
UrnB <- rep(c(20,5,1), c(9,3,3))
n <- 1e3
set.seed(2018);
sum(as.integer(sapply(sample(1:6, n, replace = T), function(x)
if (x <= 4) sample(UrnA, 2) else sample(UrnB, 2))))
#[1] 15818
I'm using a fixed seed here for reproducibility; remove if necessary.
We can confirm convergence by repeating the process 1000 times
val <- sapply(1:1000, function(x)
sum(as.integer(sapply(sample(1:6, n, replace = T), function(x)
if (x <= 4) sample(UrnA, 2) else sample(UrnB, 2)))))
library(ggplot2)
ggplot(data.frame(idx = 1:1000, val = val), aes(idx, val)) +
geom_point() +
ylim(0, pretty(max(val))[2])
Both of your sample() calls return a vector of two values. You need to sum their components.
# Instead of
sum=sum+(dice<=4)*sample(UrnA,2,replace = FALSE)+(dice>=5)*sample(UrnB,2,replace = FALSE)
#Use:
sum=sum+sum((dice<=4)*sample(UrnA,2,replace = FALSE)+(dice>=5)*sample(UrnB,2,replace = FALSE))
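The two-component result comes from R's recycling rule: adding a length-1 value to a length-2 vector gives a length-2 vector. The sketch below shows that, and then computes the expectation directly as a cross-check on the simulations. The direct calculation relies on the fact that each bill drawn without replacement has the urn's mean value in expectation, so two bills are worth twice the urn mean. The names `total` and `exact` are introduced here for illustration.

```r
UrnA <- rep(c(10, 5, 1), c(5, 5, 5))
UrnB <- rep(c(20, 5, 1), c(9, 3, 3))

# Recycling: a scalar plus a vector of two draws is a vector of length 2
total <- 0
total <- total + c(10, 5)
length(total)

# Direct expectation: the die picks Urn A with probability 4/6 and Urn B
# with probability 2/6; two bills are worth 2 * mean(urn) on average
exact <- (4 / 6) * 2 * mean(UrnA) + (2 / 6) * 2 * mean(UrnB)
exact  # about 15.91
```

A simulated estimate of E[X] should land close to this value once n is large.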

Sampling from a subset of data

I have the following problem.
I have multiple subarrays (say 2) that I have populated with character labels (1, 2, 3, 4, 5). My algorithm selects labels at random based on occurrence probabilities.
How can I get R to instead select labels 1:3 for subarray 1 and 4:5 for subarray 2, say, without using subsetting (i.e., []). That is, I want a random subset of labels to be selected for each subarray, instead of all labels assigned to each subarray manually using [].
I know sample() should help.
Using subsetting (which I don't want) one would do
x <- 1:5
sample(x[1:3], size, prob = probs[1:3])
but this assigns labels 1:3 to ALL subarrays.
Would
sample(sample(x), size, replace = TRUE, prob = probs)
work?
Any ideas? Please let me know if this is unclear.
Here is a small example, which selects labels from 1:5 for each of 10 subarrays.
set.seed(1)
N <- 10
K <- 2
Hstar <- 5
probs <- rep(1/Hstar, Hstar)
perms <- 5
## Set up container(s) to hold the identity of each individual from each permutation ##
num.specs <- ceiling(N / K)
## Create an ID for each haplotype ##
haps <- 1:Hstar
## Assign individuals (N) to each subpopulation (K) ##
specs <- 1:num.specs
## Generate permutations, assume each permutation has N individuals, and sample those individuals' haplotypes from the probabilities ##
gen.perms <- function() {
sample(haps, size = num.specs, replace = TRUE, prob = probs) # I would like each subarray to contain a random subset of 1:5.
}
pop <- array(dim = c(perms, num.specs, K))
for (i in 1:K) {
pop[,, i] <- replicate(perms, gen.perms())
}
pop
Hopefully this helps.
I think what you actually want is something like that
num.specs <- 3
haps[sample(seq(haps),size = num.specs,replace = F)]
[1] 3 5 4
That is, a random subset of your vector haps?
Not quite what you want (returns list of matrices instead of 3D array) but this might help
lapply(split(1:5, cut(1:5, breaks=c(0, 2, 5))), function(i) matrix(sample(i, 25, replace=TRUE), ncol=5))
Use cut and split to partition your vector of character labels before sampling them. Here I split your character labels at the value 2. Also, rather than sampling 5 numbers 5 times, you can sample 25 numbers once, and convert to matrix.
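If the goal is for each subarray to receive its own random subset of the labels while still ending up with a 3D array, one possible sketch is to shuffle group assignments with sample() and then sample within each group. The names `grp` and `subsets`, and the choice of 5 columns per subarray, are assumptions made for this illustration; this is not the asker's final code.

```r
set.seed(1)
haps <- 1:5        # haplotype labels
K <- 2             # number of subarrays
perms <- 5
num.specs <- 5     # assumed number of columns per subarray

# Randomly assign each label to one of the K subarrays.
# rep_len(1:K, 5) gives group sizes 3 and 2; sample() shuffles which
# labels fall in which group.
grp <- sample(rep_len(1:K, length(haps)))
subsets <- split(haps, grp)

pop <- array(dim = c(perms, num.specs, K))
for (i in 1:K) {
  # Sample all perms * num.specs values at once, then reshape
  pop[, , i] <- matrix(
    sample(subsets[[i]], perms * num.specs, replace = TRUE),
    nrow = perms
  )
}
pop
```

One caveat: if a group ever contained a single label n, `sample(n, ...)` would sample from 1:n instead, so groups of size one would need separate handling.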

Merging two vectors at random in R

I have two vectors x and y. x is a larger vector compared to y. For example (x is set to all zeros here, but that need not be the case)
x = rep(0,20)
y = c(2,3,-1,-1)
What I want to accomplish is overlay some y's in x but at random. So in the above example, x would look like
0,0,2,3,-1,-1,0,0,0,0,2,3,-1,-1,...
Basically, I'll step through each value in x, pull a random number, and if that random number is less than some threshold, I want to overlay y for the next 4 places in x unless I've reached the end of x. Would any of the apply functions help? Thanks much in advance.
A simple way of doing it would be to choose points at random (the same length as x) from the two vectors combined:
sample(c(x, y), length(x), replace = TRUE)
If you want to introduce some probability into it, you could do something like:
p <- c(rep(2, each = length(x)), rep(1, each = length(y)))
sample(c(x, y), length(x), prob = p, replace = TRUE)
This is saying that an x point is twice as likely to be chosen over a y point (change the 2 and 1 in p accordingly for different probabilities).
Short answer: yes :-) . Write some function like
ranx <- runif(length(x)-length(y)+1)
# some loop or apply func over j...
# note the parentheses: j:j+length(y) would parse as (j:j)+length(y)
if (ranx[j] < threshold) x[j:(j+length(y)-1)] <- y
# and make sure to stop the loop at length(x)-length(y)+1
Something like the following worked for me.
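i = 1
# stop early enough that y always fits in the remaining positions of x
while(i <= length(x) - length(y) + 1){
p.rand = runif(1,0,1)
if(p.rand < prob[i]){
# overlay y on exactly length(y) positions of x
x[i:(i+length(y)-1)] = y
i = i+length(y)
} else {
i = i + 1
}
}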
where prob[i] is some probability vector.
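For reference, here is a self-contained sketch of the loop-based approach using the example vectors from the question. The single `threshold` value of 0.2 and the seed are assumptions chosen for illustration; a per-position probability vector would work the same way.

```r
set.seed(123)            # arbitrary seed for repeatability
x <- rep(0, 20)
y <- c(2, 3, -1, -1)
threshold <- 0.2         # hypothetical chance of starting an overlay

i <- 1
# Only start an overlay where all of y still fits inside x
while (i <= length(x) - length(y) + 1) {
  if (runif(1) < threshold) {
    x[i:(i + length(y) - 1)] <- y   # overlay y on the next length(y) slots
    i <- i + length(y)              # jump past the overlaid block
  } else {
    i <- i + 1
  }
}
x
```

Because the position of each overlay depends on where the previous one ended, this is naturally a sequential loop rather than an apply call.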

R filter() dealing with NAs

I am trying to implement Chebyshev filter to smooth a time series but, unfortunately, there are NAs in the data series.
For example,
t <- seq(0, 1, len = 100)
x <- c(sin(2*pi*t*2.3) + 0.25*rnorm(length(t)),NA, cos(2*pi*t*2.3) + 0.25*rnorm(length(t)))
I am using Chebyshev filter: cf1 = cheby1(5, 3, 1/44, type = "low")
I am trying to filter the time series excluding the NAs, without messing up the order or positions. I have already tried na.rm=T, but it seems there is no such argument.
Then
z <- filter(cf1, x) # apply filter
Thank you guys.
Try using x <- x[!is.na(x)] to remove the NAs, then run the filter.
You can remove the NAs beforehand using the complete.cases function. You also might consider imputing the missing data. Check out the mtsdi or Amelia II packages.
EDIT:
Here's a solution with Rcpp. This might be helpful if speed is important:
require(inline)
require(Rcpp)
t <- seq(0, 1, len = 100)
set.seed(7337)
x <- c(sin(2*pi*t*2.3) + 0.25*rnorm(length(t)),NA, cos(2*pi*t*2.3) + 0.25*rnorm(length(t)))
NAs <- x
x2 <- x[!is.na(x)]
#do something to x2
src <- '
Rcpp::NumericVector vecX(vx);
Rcpp::NumericVector vecNA(vNA);
int j = 0; //counter for vx
for (int i=0;i<vecNA.size();i++) {
if (!(R_IsNA(vecNA[i]))) {
//replace and update j
vecNA[i] = vecX[j];
j++;
}
}
return Rcpp::wrap(vecNA);
'
fun <- cxxfunction(signature(vx="numeric",
vNA="numeric"),
src,plugin="Rcpp")
if (identical(x,fun(x2,NAs)))
print("worked")
# [1] "worked"
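The same re-insertion step can also be sketched in plain R with logical indexing. The helper name `apply_skipping_na` is hypothetical, and a simple recursive smoother from stats::filter stands in for the Chebyshev filter so that the example runs without extra packages.

```r
# Apply f to the non-missing values of x and put the results back in
# their original positions, leaving NA where x was NA.
# Assumes f returns one value per input value.
apply_skipping_na <- function(x, f) {
  ok <- !is.na(x)
  out <- rep(NA_real_, length(x))
  out[ok] <- as.numeric(f(x[ok]))
  out
}

set.seed(7337)
t <- seq(0, 1, len = 100)
x <- c(sin(2*pi*t*2.3) + 0.25*rnorm(length(t)), NA,
       cos(2*pi*t*2.3) + 0.25*rnorm(length(t)))

# Stand-in smoother: first-order recursive filter from the stats package
smooth_fn <- function(v) stats::filter(v, 0.5, method = "recursive")

z <- apply_skipping_na(x, smooth_fn)

# NA positions are preserved
identical(is.na(z), is.na(x))

# With the signal package loaded and cf1 defined as in the question,
# the call would presumably look like:
# z <- apply_skipping_na(x, function(v) signal::filter(cf1, v))
```

Note that dropping the NAs before filtering treats the samples on either side of a gap as if they were adjacent, which may or may not be acceptable for a given series.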
I don't know if ts objects can have missing values, but if you just want to re-insert the NA values, you can use ?insert from R.utils. There might be a better way to do this.
install.packages(c('R.utils', 'signal'))
require(R.utils)
require(signal)
t <- seq(0, 1, len = 100)
set.seed(7337)
x <- c(sin(2*pi*t*2.3) + 0.25*rnorm(length(t)), NA, NA, cos(2*pi*t*2.3) + 0.25*rnorm(length(t)), NA)
cf1 = cheby1(5, 3, 1/44, type = "low")
xex <- na.omit(x)
z <- filter(cf1, xex) # apply
z <- as.numeric(z)
for (m in attributes(xex)$na.action) {
z <- insert(z, ats = m, values = NA)
}
all.equal(is.na(z), is.na(x))
?insert
Here is a function you can use to filter a signal with NAs in it.
The NAs are ignored rather than replaced by zero.
You can then specify a maximum percentage of weight which the NAs may take at any point of the filtered signal. If there are too many NAs (and too few actual data) at a specific point, the filtered signal itself will be set to NA.
# This function applies a filter to a time series with potentially missing data
filter_with_NA <- function(x,
window_length=12, # will be applied centrally
myfilter=rep(1/window_length,window_length), # a boxcar filter by default
max_percentage_NA=25) # which percentage of weight created by NA should not be exceeded
{
# make the signal longer at both sides
signal <- c(rep(NA,window_length),x,rep(NA,window_length))
# see where data are present and not NA
present <- is.finite(signal)
# replace the NA values by zero
signal[!is.finite(signal)] <- 0
# apply the filter
# stats::filter is named explicitly because signal::filter masks it when the signal package is loaded
filtered_signal <- as.numeric(stats::filter(signal,myfilter, sides=2))
# find out which percentage of the filtered signal was created by non-NA values
# this is easy because the filter is linear
original_weight <- as.numeric(stats::filter(present,myfilter, sides=2))
# where this is lower than one, the signal is now artificially smaller
# because we added zeros - compensate that
filtered_signal <- filtered_signal / original_weight
# but where there are too few values present, discard the signal
filtered_signal[100*(1-original_weight) > max_percentage_NA] <- NA
# cut away the padding to left and right which we previously inserted
filtered_signal <- filtered_signal[((window_length+1):(window_length+length(x)))]
return(filtered_signal)
}
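The core idea of the function above, filtering a zero-filled copy of the data and dividing by the filtered presence mask, can be seen on a small vector. This is a condensed sketch with made-up data and a 3-point boxcar filter, not a replacement for the full function.

```r
x <- c(1, 2, NA, 4, 5, 6, NA, NA, 9, 10)   # toy data with gaps
w <- rep(1/3, 3)                            # 3-point boxcar filter

present <- !is.na(x)
x0 <- ifelse(present, x, 0)                 # NAs replaced by zero

# Filter the zero-filled signal and the presence mask with the same weights
num <- as.numeric(stats::filter(x0, w, sides = 2))
den <- as.numeric(stats::filter(as.numeric(present), w, sides = 2))

# Dividing rescales each point by the weight that came from real data,
# so each value is the average of the non-missing neighbours in the window
smoothed <- num / den
smoothed
```

For example, position 3 is an NA in the input, but its window contains 2 and 4, so the smoothed value there is their mean. The first and last positions stay NA because the centred window runs off the ends of the series.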
