I have a problem that sounds easy, but I really cannot find the mistake. I have 3377 data points (measurements of body temperature). The sampling rate is 5min and I would like to put the data into a matrix. However, R starts recycling once it has put all 3377 data points into the matrix. To prevent r from doing this, I wrote a loop and I want the loop to stop when the end of the vector is reached.
Ankle.r <- 1:3377 # Example data
a = 288 # sampling rate = 5min -> 288 measurement points per day
c = 11 # 11 full days of sampling (and a few more points, wherefore the matrix is to be 12 rows)
Ankle.r2 <- matrix(NA, ncol = a, nrow = c+1) # matrix with NAs for 12 days with 288 cols each (=3456 cells)
x <- length (Ankle.r) # total number of data points, is 3377
for (f in 1:(c+1)){ # for each row
for (p in 1:a){ # for each column (i.e. cell)
st_op <- (((f-1)*p)+p) # STOP criterion, gives the number of cells that have already been filled
if (st_op<x){ # only perform operation if the number of cells filled is < the number of data points (i.e. 3377)
Ankle.r2[f,p] <- Ankle.r[(((f-1)*p)+p)]
} else {stop
}
}
}
However, the loop does not stop...it loops till the last cell in my matrix. According to my calculations, the last 79 cells should remain free (i.e. NA, because 3456 cells - 3377 = 79), but that is only true for the last 8 or so...
Any hints where the mistake is?
Thanks!
I think this does what you would like to do:
Ankle.r <- 1:3377 # Example data
a = 288 # sampling rate = 5min -> 288 measurement points per day
c = 11
length(Ankle.r) <- a * (c + 1) #pad input vector with NA values
m <- matrix(Ankle.r, ncol = a, byrow = TRUE)
Ok, try an example and it will show you where your mistake is...sighing. The loop must be:
Ankle.r2 <- matrix(NA, ncol = a, nrow = c+1) # matrix with NAs for 12 days with 288 cols each (=3456 cells)
x <- length (Ankle.r) # total number of data points, is 3377
for (f in 1:(c+1)){ # for each row
for (p in 1:a){ # for each column (i.e. cell)
st_op <- (((f-1)*a)+p) # STOP criterion, gives the number of cells that have already been filled
if (st_op<=x){ # only perform operation if the number of cells filled is < the number of data points (i.e. 3377)
Ankle.r2[f,p] <- Ankle.r[(((f-1)*a)+p)]
} else {stop
}
}
}
Thanks anyway!
Best,
Christine
Related
I have a data frame with 30 rows and 4 columns (namely, x, y, z, u). It is given below.
mydata = data.frame(x = rnorm(30,4), y = rnorm(30,2,1), z = rnorm(30,3,1), u = rnorm(30,5))
Further, I have a sequence values, which represent row number in my data frame.
myseq = c(seq(1, 30, by = 5))
myseq
[1] 1 6 11 16 21 26
Now, I wanted to compute the prob values for each segment of 99 rows.
filt= subset(mydata[1:6,], mydata[1:6,]$x < mydata[1:6,]$y & mydata[1:6,]$z < mydata[1:6,]$u
filt
prob = length(filt$x)/30
prob
Then I need to compute the above prob for 1:6,.., 27:30 and so on . Here, I have only 6 prob values. So, I can do one by one. If I have 100 values it would be tedious. Are there any way to compute the prob values?.
Thank you in advance.
BTW: in subset(DF[1:99,], ...), use DF[1:99,] in the first argument, not again, ala
subset(DF[1:99,], cumsuml < inchivaluel & cumsumr < inchivaluer)
Think about how to do this in a list.
The first step is to break your data into the va starting points. I'll start with a list of the indices to break it into:
inds <- mapply(seq, va, c(va[-1], nrow(DF)), SIMPLIFY=FALSE)
this now is a list of sequences, starting with 1:99, then 100:198, etc. See str(inds) to verify.
Now we can subset a portion of the data based on each element's vector of indices:
filts <- lapply(inds, function(ind) subset(DF[ind,], cumsuml < inchivaluel & cumsumr < inchivaluer))
We now have a list of vectors, let's summarize it:
results <- sapply(filts, function(filt) length(filt$cumsuml)/length(alpha))
Bottom line, it helps to think about how to break this problem into lists, examples at http://stackoverflow.com/a/24376207/3358272.
BTW: instead of initially making a list of indices, we could just break up the data in that first step, ala
DF2 <- mapply(function(a,b) DF[a:b,], va, c(va[-1], nrow(DF)), SIMPLIFY=FALSE)
filts <- lapply(DF2, function(x) subset(x, cumsuml < inchivaluel & cumsumr < inchivaluer))
results <- sapply(filts, function(filt) length(filt$cumsuml)/length(alpha))
I am working on Spike Trains and my code to get a spike train like this:
for 20 trials is written below. The image is representational for 5 trials.
fr = 100
dt = 1/1000 #dt in milisecond
duration = 2 #no of duration in s
nBins = 2000 #10msSpikeTrain
nTrials = 20 #NumberOfSimulations
MyPoissonSpikeTrain = function(p, fr= 100) {
p = runif(nBins)
q = ifelse(p < fr*dt, 1, 0)
return(q)
}
set.seed(1)
SpikeMat <- t(replicate(nTrials, MyPoissonSpikeTrain()))
plot(x=-1,y=-1, xlab="time (s)", ylab="Trial",
main="Spike trains",
ylim=c(0.5, nTrials+1), xlim=c(0, duration))
for (i in 1: nTrials)
{
clip(x1 = 0, x2= duration, y1= (i-0.2), y2= (i+0.4))
abline(h=i, lwd= 1/4)
abline(v= dt*which( SpikeMat[i,]== 1))
}
Each trial has spikes occuring at random time points. Now what I am trying to work towards, is getting a random sample time point that works for all 20 trials and I want to get the vector consisting of length of the intervals this point falls into, for each trial. The code to get the time vector for the points where the spikes occur is,
A <- numeric()
for (i in 1: nTrials)
{
ISI <- function(i){
spike_times <- c(dt*which( SpikeMat[i, ]==1))
ISI1vec <- c(diff(spike_times))
A <- c(A, ISI1vec)
return(A)}
}
Then you call ISI(i) for whichever trial you wish to see the Interspike interval vector for. A visual representation of what I want is:
I want to get a vector that has the lengths of the interval where this points fall into, for each trial. I want to figure out it's distribution as well, but that's for later. Can anybody help me figure out how to code my way to this? Any help is appreciated, even if it's just about how to start/where to look.
Your data
set.seed(1)
SpikeMat <- t(replicate(nTrials, MyPoissonSpikeTrain()))
I suggest transforming your sparse matrix data into a list of indices where spikes occur
L <- lapply(seq_len(nrow(SpikeMat)), function(i) setNames(which(SpikeMat[i, ] == 1), seq_along(which(SpikeMat[i, ] == 1))))
Grab random timepoint
set.seed(1)
RT <- round(runif(1) * ncol(SpikeMat))
# 531
Result
distances contains the distances to the 2 nearest spikes - each element of the list is a named vector where the values are the distances (to RT) and their names are their positions in the vector. nearest_columns shows the original timepoint (column number) of each spike in SpikeMat.
bookend_values <- function(vec) {
lower_val <- head(sort(vec[sign(vec) == 1]), 1)
upper_val <- head(sort(abs(vec[sign(vec) == -1])), 1)
return(c(lower_val, upper_val))
}
distances <- lapply(L, function(i) bookend_values(RT-i))
nearest_columns <- lapply(seq_along(distances), function(i) L[[i]][names(distances[[i]])])
Note that the inter-spike interval of the two nearest spikes that bookend RT can be obtained with
sapply(distances, sum)
I have the following problem.
I have multiple subarrays (say 2) that I have populated with character labels (1, 2, 3, 4, 5). My algorithm selects labels at random based on occurrence probabilities.
How can I get R to instead select labels 1:3 for subarray 1 and 4:5 for subarray 2, say, without using subsetting (i.e., []). That is, I want a random subset of labels to be selected for each subarray, instead of all labels assigned to each subarray manually using [].
I know sample() should help.
Using subsetting (which I don't want) one would do
x <- 1:5
sample(x[1:3], size, prob = probs[1:3])
but this assigns labels 1:3 to ALL subarrays.
Would
sample(sample(x), size, replace = TRUE, prob = probs)
work?
Any ideas? Please let me know if this is unclear.
Here is a small example, which selects labels from 1:5 for each of 10 subarrays.
set.seed(1)
N <- 10
K <- 2
Hstar <- 5
probs <- rep(1/Hstar, Hstar)
perms <- 5
## Set up container(s) to hold the identity of each individual from each permutation ##
num.specs <- ceiling(N / K)
## Create an ID for each haplotype ##
haps <- 1:Hstar
## Assign individuals (N) to each subpopulation (K) ##
specs <- 1:num.specs
## Generate permutations, assume each permutation has N individuals, and sample those individuals' haplotypes from the probabilities ##
gen.perms <- function() {
sample(haps, size = num.specs, replace = TRUE, prob = probs) # I would like each subarray to contain a random subset of 1:5.
}
pop <- array(dim = c(perms, num.specs, K))
for (i in 1:K) {
pop[,, i] <- replicate(perms, gen.perms())
}
pop
Hopefully this helps.
I think what you actually want is something like that
num.specs <- 3
haps[sample(seq(haps),size = num.specs,replace = F)]
[1] 3 5 4
That is a random subset of your vector haps ?
Not quite what you want (returns list of matrices instead of 3D array) but this might help
lapply(split(1:5, cut(1:5, breaks=c(0, 2, 5))), function(i) matrix(sample(i, 25, replace=TRUE), ncol=5))
Use cut and split to partition your vector of character labels before sampling them. Here I split your character labels at the value 2. Also, rather than sampling 5 numbers 5 times, you can sample 25 numbers once, and convert to matrix.
I have a data matrix with 100,000 rows of values corresponding to methylation values across several cell types. I would like to visually display the changes in methylation in a clustered heatmap. To get the data into a more manageable size I was thinking of creating a new data matrix every 10th or so row. Is there any simple way to do this?
Use seq and combinations of arguments. E.g.:
m1 <- matrix(runif(100000*10), ncol = 10)
m2 <- m1[seq(from = 1, to = nrow(m1), by = 10), ]
> dim(m2)
[1] 10000 10
How does this work? Look at what this does:
> sq <- seq(from = 1, to = nrow(m1), by = 10)
> head(sq)
[1] 1 11 21 31 41 51
> tail(sq)
[1] 99941 99951 99961 99971 99981 99991
> nrow(m1)
[1] 100000
We specify to go from the first row to the last incrementing 10 each step. This gives us rows 1, 11, 21, etc. When we get to the end of the sequence, even though we specified nrow(m1) (which is 100000) the last element in our sequence in 99991. This is because 99991 + 10 would take us beyond the from argument limit (beyond 100000) and hence that is not included in the sequence.
Try the following which takes your large matrix m and generates a list of smaller matrices. It generates a sequence of indices that breaks at every chunk.length values and then collects the chunks.
list.of.matrices <- lapply(X=seq.int(1, nrow(m), by=chunk.length)),
FUN=function (k) {
m[k + seq_len(chunk.length) - 1, ])
})
However, if you have 100,000 rows, it will be wasteful for your RAM to save all these chunks separately. Perhaps, you can just do the required computation on the subsets and save only the results. Just a suggestion.
I have a matrix of one column and 6 rows. I would like to replicate that column i times but change one value randomly each time, and after each iteration, calculate the mean and variance across all columns.
For example:
values = rnorm(6, 6, 1); matrix1 = matrix(values, 6)
After i=1, would look like:
values2 = values
values2[sample(1:6, 1)] = values2[sample(1:6, 1)]+runif(1, 0, 1)
matrix2 = matrix(c(values, values2), 6)
At the end, I would like to output a data frame that looks like so:
i mean var
1 1.23 2.31
2 1.24 2.33 etc...
For many i's. I imagine there is a way to do this with loops, but my skills are not such that I can figure it out. Thanks for all your help!
If you know how many times you're doing this, it would be best to construct your final matrix beforehand, especially if i is large. However, without that:
jitter.func <- function(x, vec) {
cell <- sample(1:length(vec), 1)
vec[cell] <- vec[cell] + runif(1, 0, 1)
return(c(mean=mean(vec), var=var(vec)))
}
i <- 10
sapply(1:i, jitter.func, vec=values)
j <- 20 # Number of columns
i <- 6 # Number of rows
vec <- matrix(rnorm(i,6,1),ncol=j,nrow=i)# vector replicated j times
idx <- sample(seq(i),j,replace=TRUE) # j random rows
vec[cbind(idx, seq(j))] <- vec[cbind(idx, seq(j))]+runif(j) # add random number to random row in each column
apply(vec,2,plyr::each(mean,var)) # summary statistics