Recently I have been trying to simulate a game.
I throw 2 dice at the same time. If the sum of the 2 dice is greater than or equal to 10, I win 1 point.
If it is lower than 10, I lose 1 point. I will do this 1000 times.
To begin, I draw 2000 random samples after calling set.seed(1234):
set.seed(1234)
d = sample(1:6, size = 2000, replace = TRUE)
d
And then, I turn it into a matrix, and sum each row
a = matrix(d, nrow=1000, ncol=2, byrow=T)
t = rowSums(a)
t
Now I have 1000 elements (the sum of the two dice for each throw). I would like to create a vector X holding the point I get for each throw.
How can I apply an if statement to create the vector X?
Thank you very much.
Do you mean this?
X <- ifelse(t>=10,1,-1)
or
X <- 2*(t>=10)-1
Using case_when
library(dplyr)
case_when(t >= 10 ~ 1, TRUE ~ -1)
You could copy t into a temporary variable and assign points by comparing against the copy, so that the comparisons are not affected while t is being overwritten.
tmp <- t
t[tmp >= 10] <- 1
t[tmp < 10] <- -1
Or without a temporary variable.
t1 <- c(-1, 1)[(t >= 10) + 1]
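As a quick sanity check, the vectorised answers above can be compared against each other. The sketch below rebuilds the dice sums under the name sums (a hypothetical name, chosen here so that the base function t() is not masked). It also computes a total and a running score, which the question did not ask for and which are included only as an illustration.

```r
# Rebuild the 1000 dice sums from the question
set.seed(1234)
d <- sample(1:6, size = 2000, replace = TRUE)
sums <- rowSums(matrix(d, nrow = 1000, ncol = 2, byrow = TRUE))

# Three equivalent ways to score each throw: +1 if sum >= 10, otherwise -1
x_ifelse <- ifelse(sums >= 10, 1, -1)
x_arith  <- 2 * (sums >= 10) - 1
x_index  <- c(-1, 1)[(sums >= 10) + 1]

# All three approaches should give the same vector
stopifnot(isTRUE(all.equal(x_ifelse, x_arith)),
          isTRUE(all.equal(x_ifelse, x_index)))

total_score   <- sum(x_ifelse)     # net points after 1000 throws
running_score <- cumsum(x_ifelse)  # score after each throw
```

The ifelse version is probably the easiest to read; the arithmetic and indexing versions avoid the function call but do the same thing.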
Related
I have a data frame with 30 rows and 4 columns (namely, x, y, z, u). It is given below.
mydata = data.frame(x = rnorm(30,4), y = rnorm(30,2,1), z = rnorm(30,3,1), u = rnorm(30,5))
Further, I have a sequence of values, which represent row numbers in my data frame.
myseq = c(seq(1, 30, by = 5))
myseq
[1] 1 6 11 16 21 26
Now, I want to compute the prob values for each segment of 99 rows.
filt = subset(mydata[1:6,], mydata[1:6,]$x < mydata[1:6,]$y & mydata[1:6,]$z < mydata[1:6,]$u)
filt
prob = length(filt$x)/30
prob
Then I need to compute the above prob for 1:6, ..., 27:30, and so on. Here I have only 6 prob values, so I can compute them one by one, but with 100 values it would be tedious. Is there any way to compute all the prob values at once?
Thank you in advance.
BTW: in subset(DF[1:99,], ...), use DF[1:99,] only in the first argument; the column names in the condition do not need to be prefixed with it again, as in
subset(DF[1:99,], cumsuml < inchivaluel & cumsumr < inchivaluer)
Think about how to do this in a list.
The first step is to break your data at the va starting points. I'll start with a list of the indices to break it into:
inds <- mapply(seq, va, c(va[-1] - 1, nrow(DF)), SIMPLIFY=FALSE)
This is now a list of sequences, starting with 1:99, then 100:198, etc. (each segment ends one row before the next starting point). See str(inds) to verify.
Now we can subset a portion of the data based on each element's vector of indices:
filts <- lapply(inds, function(ind) subset(DF[ind,], cumsuml < inchivaluel & cumsumr < inchivaluer))
We now have a list of data frames; let's summarize it:
results <- sapply(filts, function(filt) length(filt$cumsuml)/length(alpha))
Bottom line, it helps to think about how to break this problem into lists, examples at http://stackoverflow.com/a/24376207/3358272.
BTW: instead of initially making a list of indices, we could just break up the data in that first step, as in
DF2 <- mapply(function(a,b) DF[a:b,], va, c(va[-1] - 1, nrow(DF)), SIMPLIFY=FALSE)
filts <- lapply(DF2, function(x) subset(x, cumsuml < inchivaluel & cumsumr < inchivaluer))
results <- sapply(filts, function(filt) length(filt$cumsuml)/length(alpha))
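Applied to the 30-row example from the question, the same idea might look like the sketch below. It assumes the segments are the non-overlapping blocks 1:5, 6:10, ..., 26:30 implied by myseq, and that each prob is divided by the total number of rows, as in the question's length(filt$x)/30. The names starts, ends and probs are hypothetical.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible
mydata <- data.frame(x = rnorm(30, 4), y = rnorm(30, 2, 1),
                     z = rnorm(30, 3, 1), u = rnorm(30, 5))

# Segment start rows (as in myseq) and the matching end rows
starts <- seq(1, 30, by = 5)
ends   <- c(starts[-1] - 1, nrow(mydata))

# For each segment, count rows meeting both conditions, divided by total rows
probs <- mapply(function(a, b) {
  seg <- mydata[a:b, ]
  sum(seg$x < seg$y & seg$z < seg$u) / nrow(mydata)
}, starts, ends)

probs  # one value per segment
```

Because every row belongs to exactly one segment, the values in probs add up to the proportion of all rows that satisfy the condition.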
I am working on spike trains. My code to produce a spike train plot for 20 trials is written below (the image in the original post, omitted here, was a representative plot for 5 trials).
fr = 100 # firing rate (spikes per second)
dt = 1/1000 # bin width: 1 millisecond, expressed in seconds
duration = 2 # duration in s
nBins = 2000 # number of 1 ms bins (duration / dt)
nTrials = 20 # number of simulated trials
MyPoissonSpikeTrain = function(fr = 100) {
  p = runif(nBins)              # one uniform draw per time bin
  q = ifelse(p < fr*dt, 1, 0)   # spike (1) with probability fr*dt in each bin
  return(q)
}
set.seed(1)
SpikeMat <- t(replicate(nTrials, MyPoissonSpikeTrain()))
plot(x=-1,y=-1, xlab="time (s)", ylab="Trial",
main="Spike trains",
ylim=c(0.5, nTrials+1), xlim=c(0, duration))
for (i in 1: nTrials)
{
clip(x1 = 0, x2= duration, y1= (i-0.2), y2= (i+0.4))
abline(h=i, lwd= 1/4)
abline(v= dt*which( SpikeMat[i,]== 1))
}
Each trial has spikes occurring at random time points. What I am now trying to do is pick one random time point, shared by all 20 trials, and get a vector containing, for each trial, the length of the interspike interval that this point falls into. The code to get the vector of intervals between the times at which spikes occur is:
# Interspike intervals (in seconds) for trial i
ISI <- function(i) {
  spike_times <- dt * which(SpikeMat[i, ] == 1)
  diff(spike_times)
}
You can then call ISI(i) for whichever trial you wish to see the interspike interval vector for. A visual representation of what I want is: [image omitted]
I want to get a vector that has the length of the interval this point falls into, for each trial. I want to figure out its distribution as well, but that's for later. Can anybody help me figure out how to code this? Any help is appreciated, even if it's just about how to start or where to look.
Your data
set.seed(1)
SpikeMat <- t(replicate(nTrials, MyPoissonSpikeTrain()))
I suggest transforming your sparse matrix data into a list of indices where spikes occur
L <- lapply(seq_len(nrow(SpikeMat)), function(i) setNames(which(SpikeMat[i, ] == 1), seq_along(which(SpikeMat[i, ] == 1))))
Grab random timepoint
set.seed(1)
RT <- round(runif(1) * ncol(SpikeMat))
# 531
Result
distances contains the distances to the 2 nearest spikes - each element of the list is a named vector where the values are the distances (to RT) and their names are their positions in the vector. nearest_columns shows the original timepoint (column number) of each spike in SpikeMat.
bookend_values <- function(vec) {
lower_val <- head(sort(vec[sign(vec) == 1]), 1)
upper_val <- head(sort(abs(vec[sign(vec) == -1])), 1)
return(c(lower_val, upper_val))
}
distances <- lapply(L, function(i) bookend_values(RT-i))
nearest_columns <- lapply(seq_along(distances), function(i) L[[i]][names(distances[[i]])])
Note that the inter-spike interval of the two nearest spikes that bookend RT can be obtained with
sapply(distances, sum)
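A more compact variant of the same idea is sketched below, returning the interval length in seconds directly. The helper name interval_length is hypothetical, and the sketch assumes that a spike falling exactly on the chosen time point is ignored (as in bookend_values above) and that NA is returned when the point has no spike on one side.

```r
set.seed(1)
fr <- 100; dt <- 1/1000; nBins <- 2000; nTrials <- 20

# One row per trial, one column per 1 ms bin (1 = spike, 0 = no spike)
SpikeMat <- t(replicate(nTrials, as.integer(runif(nBins) < fr * dt)))

RT <- 531  # time point as a column (bin) index, the value found above

# Length (in seconds) of the interspike interval that contains bin rt
interval_length <- function(spikes, rt, dt) {
  idx    <- which(spikes == 1)
  before <- idx[idx < rt]   # spikes strictly before the time point
  after  <- idx[idx > rt]   # spikes strictly after the time point
  if (length(before) == 0 || length(after) == 0) return(NA_real_)
  (min(after) - max(before)) * dt
}

lens <- apply(SpikeMat, 1, interval_length, rt = RT, dt = dt)
lens  # one interval length per trial
```

A histogram of lens over many random time points (for example with hist()) would be one way to start looking at the distribution.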
I have the following problem.
I have multiple subarrays (say 2) that I have populated with character labels (1, 2, 3, 4, 5). My algorithm selects labels at random based on occurrence probabilities.
How can I get R to instead select, say, labels 1:3 for subarray 1 and 4:5 for subarray 2, without using subsetting (i.e., [])? That is, I want a random subset of labels to be selected for each subarray, rather than assigning labels to each subarray manually using [].
I know sample() should help.
Using subsetting (which I don't want) one would do
x <- 1:5
sample(x[1:3], size, prob = probs[1:3])
but this assigns labels 1:3 to ALL subarrays.
Would
sample(sample(x), size, replace = TRUE, prob = probs)
work?
Any ideas? Please let me know if this is unclear.
Here is a small example, which selects labels from 1:5 for each of 10 subarrays.
set.seed(1)
N <- 10
K <- 2
Hstar <- 5
probs <- rep(1/Hstar, Hstar)
perms <- 5
## Set up container(s) to hold the identity of each individual from each permutation ##
num.specs <- ceiling(N / K)
## Create an ID for each haplotype ##
haps <- 1:Hstar
## Assign individuals (N) to each subpopulation (K) ##
specs <- 1:num.specs
## Generate permutations, assume each permutation has N individuals, and sample those individuals' haplotypes from the probabilities ##
gen.perms <- function() {
sample(haps, size = num.specs, replace = TRUE, prob = probs) # I would like each subarray to contain a random subset of 1:5.
}
pop <- array(dim = c(perms, num.specs, K))
for (i in 1:K) {
pop[,, i] <- replicate(perms, gen.perms())
}
pop
Hopefully this helps.
I think what you actually want is something like this:
num.specs <- 3
haps[sample(seq(haps),size = num.specs,replace = FALSE)]
[1] 3 5 4
That is, a random subset of your vector haps?
Not quite what you want (returns list of matrices instead of 3D array) but this might help
lapply(split(1:5, cut(1:5, breaks=c(0, 2, 5))), function(i) matrix(sample(i, 25, replace=TRUE), ncol=5))
Use cut and split to partition your vector of character labels before sampling them. Here I split your character labels at the value 2. Also, rather than sampling 5 numbers 5 times, you can sample 25 numbers once, and convert to matrix.
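Putting the two suggestions together, one possible reading of the question is sketched below: each subarray first gets its own random subset of labels, and individuals are then sampled only from that subset. This is an interpretation rather than a confirmed requirement. In particular, the subsets for different subarrays are drawn independently (so they may overlap), the subset size is itself random, and the names subsets, keep and n_labels are hypothetical.

```r
set.seed(1)
K <- 2; Hstar <- 5; perms <- 5; num.specs <- 5
haps  <- 1:Hstar
probs <- rep(1 / Hstar, Hstar)

pop     <- array(dim = c(perms, num.specs, K))
subsets <- vector("list", K)   # labels allowed in each subarray

for (k in 1:K) {
  # Assumption: each subarray gets between 1 and Hstar - 1 distinct labels
  n_labels <- sample.int(Hstar - 1, 1)
  keep     <- sort(sample(haps, n_labels))
  subsets[[k]] <- keep

  # Sample positions within keep (avoids sample()'s special case when
  # keep has length 1); the probabilities are restricted to the kept labels
  draws <- keep[sample.int(length(keep), perms * num.specs,
                           replace = TRUE, prob = probs[keep])]
  pop[, , k] <- matrix(draws, nrow = perms, ncol = num.specs)
}

subsets
pop
```

If the subarrays must receive disjoint label sets, the cut and split partition shown above is the better starting point.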
In a large data frame (1 million+ rows), I am counting the number of elements (rows) that are within a particular range and satisfy a third criterion. I have 33 of those ranges and use a very slow for loop to get the answer, which works.
As speed is of massive concern, I would appreciate any help to get this to run faster. Can I get rid of the for loop by vectorising it or using some sort of "apply" solution?
Thanks in advance.
Code:
N.data<-c(1:33)
Lower<-c(0,100000,125000,150000,175000,200000,225000,250000,275000,300000,325000,350000,375000,400000,425000,450000,475000,500000,550000,600000,650000,700000,750000,800000,850000,900000,950000,1000000,1100000,1200000,1300000,1400000,1500000)
Upper<-c(100000,125000,150000,175000,200000,225000,250000,275000,300000,325000,350000,375000,400000,425000,450000,475000,500000,550000,600000,650000,700000,750000,800000,850000,900000,950000,1000000,1100000,1200000,1300000,1400000,1500000, 5000000)
for (i in 1:(length(N.data))){
N.data[i]<-nrow(dataset[dataset$Z==c & dataset$X > Lower[i] & dataset$X < Upper[i],])
}
A more efficient approach:
# first logical index (vector)
idx1 <- dataset$Z == c
# second logical index (matrix)
idx2 <- mapply(function(l, u) dataset$X > l & dataset$X < u, Lower, Upper)
# combine both indices and count number of rows
N.data <- colSums(idx1 & idx2)
apply functions are not VECTORIZED. They are merely more efficient implementations of a for loop. To achieve what you seek using vectorization, here is one approach.
# Create a Dummy Dataset and Breaks
dataset = data.frame(
X = rpois(100, 10),
Z = rpois(100, 20)
)
breaks = seq(0, max(dataset$Z), length = 5)
# Add Column with Breaks
dataset = transform(dataset, Z2 = cut(Z, breaks, labels = FALSE))
# Use Aggregate to compute length for each value of Z2
c = 10
aggregate(Z ~ Z2, data = dataset, length, subset = (X == c))
This should be more efficient than using mapply, as it is completely vectorized.
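Because the ranges in the question are contiguous (each Upper equals the next Lower), another option is to bin X once with findInterval and count the bins with tabulate. The sketch below uses a dummy dataset, a shortened hypothetical breaks vector, and target in place of the question's c. One difference to note: the bins here are open on the left and closed on the right, so a value exactly equal to an upper bound is counted, whereas the question's strict < would exclude it.

```r
set.seed(1)
n <- 1e5
# Dummy data standing in for the real dataset
dataset <- data.frame(X = runif(n, 0, 5e6),
                      Z = sample(1:3, n, replace = TRUE))
target <- 2  # hypothetical value playing the role of c in the question

# Bin edges: Lower is breaks[-length(breaks)], Upper is breaks[-1]
breaks <- c(0, 1e5, 1.25e5, 1.5e5, 5e5, 1e6, 5e6)

# Keep only rows matching the third criterion, then bin them in one pass
x_keep <- dataset$X[dataset$Z == target]
bin    <- findInterval(x_keep, breaks, left.open = TRUE)

# Count per bin; indices outside 1..nbins (values outside all ranges) are ignored
counts <- tabulate(bin, nbins = length(breaks) - 1)
counts
```

This touches each row once rather than once per range, which should matter more as the number of ranges grows.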
I have a matrix of one column and 6 rows. I would like to replicate that column i times but change one value randomly each time, and after each iteration, calculate the mean and variance across all columns.
For example:
values = rnorm(6, 6, 1); matrix1 = matrix(values, 6)
After i=1, would look like:
values2 = values
values2[sample(1:6, 1)] = values2[sample(1:6, 1)]+runif(1, 0, 1)
matrix2 = matrix(c(values, values2), 6)
At the end, I would like to output a data frame that looks like so:
i mean var
1 1.23 2.31
2 1.24 2.33 etc...
and so on for many values of i. I imagine there is a way to do this with loops, but I have not been able to figure it out. Thanks for all your help!
If you know how many times you're doing this, it would be best to construct your final matrix beforehand, especially if i is large. However, without that:
jitter.func <- function(x, vec) {
cell <- sample(1:length(vec), 1)
vec[cell] <- vec[cell] + runif(1, 0, 1)
return(c(mean=mean(vec), var=var(vec)))
}
i <- 10
sapply(1:i, jitter.func, vec=values)
j <- 20 # Number of columns
i <- 6 # Number of rows
vec <- matrix(rnorm(i,6,1),ncol=j,nrow=i)# vector replicated j times
idx <- sample(seq(i),j,replace=TRUE) # j random rows
vec[cbind(idx, seq(j))] <- vec[cbind(idx, seq(j))]+runif(j) # add random number to random row in each column
apply(vec,2,plyr::each(mean,var)) # summary statistics
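If the statistics are wanted after each iteration over all columns accumulated so far, a preallocated matrix and a plain loop are one way to get the data frame described in the question. The sketch below rests on two assumptions: each new column is a copy of the original values with one randomly chosen entry perturbed (as in the question's matrix2 example), and "mean and variance across all columns" means the pooled mean and variance of every entry in the matrix so far. The names mat, res and n_iter are hypothetical.

```r
set.seed(1)
n_rows <- 6
n_iter <- 10
values <- rnorm(n_rows, 6, 1)

# Preallocate: column 1 is the original, columns 2..(n_iter + 1) are perturbed copies
mat <- matrix(NA_real_, nrow = n_rows, ncol = n_iter + 1)
mat[, 1] <- values

res <- data.frame(i = seq_len(n_iter), mean = NA_real_, var = NA_real_)

for (i in seq_len(n_iter)) {
  new_col <- values
  cell <- sample.int(n_rows, 1)                # pick one position at random
  new_col[cell] <- new_col[cell] + runif(1)    # perturb that same position
  mat[, i + 1] <- new_col

  filled <- mat[, seq_len(i + 1)]              # all columns created so far
  res$mean[i] <- mean(filled)
  res$var[i]  <- var(as.vector(filled))        # pooled variance of all entries
}

res
```

Using a single cell index for both the read and the write avoids the issue in the question's example, where two separate sample() calls can pick different positions.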