sample with a minimal difference between two consecutive values

sample with a minimal difference between two consecutive values - r

I would like to sample values, but have a constraint in place that demands two values are at least window apart. This would be akin to sampling days in a year, but setting the window to be at least a fortnight apart. So far I've tried it like this
check.diff <- TRUE
window <- 14
while (check.diff == TRUE) {
sampled.session <- sort(sample(1:365, size = 5, replace = FALSE))
check.diff <- any(diff(sampled.session) < window)
}
This works nicely if the window constraint is small. If one specifies a rather large value, this can become an infinite loop. While I can insert all sorts of checks and maximum number of iterations, I was wondering if there's a smarter way of attacking this?

One way to do this is by removing candidates from the population from which you take the sample:
set.seed(42)
population <- 1:356
n_samples <- 5
window <- 14
sampled_session <- rep(sample(population, 1), n_samples) # initialize the vector
for (i in seq.int(2, n_samples)) {
borders <- sampled_session[i - 1] + (window - 1) * c(-1, 1)
days_in_window <- seq.int(borders[1], borders[2])
population <- setdiff(population, days_in_window)
sampled_session[i] <- sample(population, 1)
}
sort(sampled_session)
# [1] 90 193 264 309 326
diff(sort(sampled_session))
# [1] 103 71 45 17
Another way would be
set.seed(357)
population <- 1:357
n_samples <- 5
window <- 14
sampled.session <- numeric(n_samples)
for (i in seq_len(n_samples)) {
sampled.session[i] <- pick <- sample(population, 1)
population <- population[-which(population < pick + window & population > pick - window)]
}
sort(sampled.session)
[1] 19 39 111 134 267

Well, how about something like this.
window <- 14
sample_pair <- sample(1:365, size=2)
sample_pair[2] <- sample_pair[2] + window*(diff(foo)<window)
Then dump that pair into any larger sample group.
Or you could scale your entire sample set after drawing. Pseudocode:
samp.window <- diff(range(sample.set))
if (sample.window < window) sample.set <- sample.set *window/sample.window
Followed by a round or truncate if desired. Probably worth checking to make sure these distributions are uniform :-(

Related

How to get a random observation point at a specific time over multiple trials in R?

I am working on Spike Trains and my code to get a spike train like this:
for 20 trials is written below. The image is representational for 5 trials.
fr = 100
dt = 1/1000 #dt in milisecond
duration = 2 #no of duration in s
nBins = 2000 #10msSpikeTrain
nTrials = 20 #NumberOfSimulations
MyPoissonSpikeTrain = function(p, fr= 100) {
p = runif(nBins)
q = ifelse(p < fr*dt, 1, 0)
return(q)
}
set.seed(1)
SpikeMat <- t(replicate(nTrials, MyPoissonSpikeTrain()))
plot(x=-1,y=-1, xlab="time (s)", ylab="Trial",
main="Spike trains",
ylim=c(0.5, nTrials+1), xlim=c(0, duration))
for (i in 1: nTrials)
{
clip(x1 = 0, x2= duration, y1= (i-0.2), y2= (i+0.4))
abline(h=i, lwd= 1/4)
abline(v= dt*which( SpikeMat[i,]== 1))
}
Each trial has spikes occuring at random time points. Now what I am trying to work towards, is getting a random sample time point that works for all 20 trials and I want to get the vector consisting of length of the intervals this point falls into, for each trial. The code to get the time vector for the points where the spikes occur is,
A <- numeric()
for (i in 1: nTrials)
{
ISI <- function(i){
spike_times <- c(dt*which( SpikeMat[i, ]==1))
ISI1vec <- c(diff(spike_times))
A <- c(A, ISI1vec)
return(A)}
}
Then you call ISI(i) for whichever trial you wish to see the Interspike interval vector for. A visual representation of what I want is:
I want to get a vector that has the lengths of the interval where this points fall into, for each trial. I want to figure out it's distribution as well, but that's for later. Can anybody help me figure out how to code my way to this? Any help is appreciated, even if it's just about how to start/where to look.

Your data
set.seed(1)
SpikeMat <- t(replicate(nTrials, MyPoissonSpikeTrain()))
I suggest transforming your sparse matrix data into a list of indices where spikes occur
L <- lapply(seq_len(nrow(SpikeMat)), function(i) setNames(which(SpikeMat[i, ] == 1), seq_along(which(SpikeMat[i, ] == 1))))
Grab random timepoint
set.seed(1)
RT <- round(runif(1) * ncol(SpikeMat))
# 531
Result
distances contains the distances to the 2 nearest spikes - each element of the list is a named vector where the values are the distances (to RT) and their names are their positions in the vector. nearest_columns shows the original timepoint (column number) of each spike in SpikeMat.
bookend_values <- function(vec) {
lower_val <- head(sort(vec[sign(vec) == 1]), 1)
upper_val <- head(sort(abs(vec[sign(vec) == -1])), 1)
return(c(lower_val, upper_val))
}
distances <- lapply(L, function(i) bookend_values(RT-i))
nearest_columns <- lapply(seq_along(distances), function(i) L[[i]][names(distances[[i]])])
Note that the inter-spike interval of the two nearest spikes that bookend RT can be obtained with
sapply(distances, sum)

Plot Wind Barb in R

This is more a question to see if anyone has seen anything like this in their travels. I am working with a lot of weather data and I would like to plot wind based on wind barbs.
I have looked into the package RadioSonde however its plotwind() function is not doing the job I had anticipated. It does have a good example of the type of data data(ExampleSonde)
Arguably I can use TeachingDemos in conjunction with my.symbols() to create these wind barbs. I was just curious if anyone has found (or created) a way to plot wind barbs. Otherwise my.symbols() it is.
Thanks,
Badger

Another way is to create the wind barbs using grid graphics.
First step is to calculate how many, and what type of barb is needed. As described here, I created three types, that represent 50, 10, and 5 knots - I round down the speed to the nearest five.
The function below wind_barb generates a new grob for each wind speed it is given. Using an idea from Integrating Grid Graphics Output with Base Graphics Output - Murrell (pg4) you can plot the grobs easily and represent the wind direction by rotating the viewport.
An example
Create some data
set.seed(1)
dat <- data.frame(x=-2:2, y=-2:2,
direction=sample(0:360, 5),
speed=c(10, 15, 50, 75, 100))
# x y direction speed
# 1 -2 -2 95 10
# 2 -1 -1 133 15
# 3 0 0 205 50
# 4 1 1 325 75
# 5 2 2 72 100
Plot
library(gridBase)
library(grid)
with(dat, plot(x, y, ylim=c(-3, 3), xlim=c(-3, 3), pch=16))
vps <- baseViewports()
pushViewport(vps$inner, vps$figure, vps$plot)
# Plot
for (i in 1:nrow(dat)) {
pushViewport(viewport(
x=unit(dat$x[i], "native"),
y=unit(dat$y[i], "native"),
angle=dat$direction[i]))
wind_barb(dat$speed[i])
popViewport()
}
popViewport(3)
Which produces
wind_barb function to create barbs (please simplify me!). You can change the height and width of the barb by adjusting mlength and wblength arguments respectively.
wind_barb <- function(x, mlength=0.1, wblength=0.025) {
# Calculate which / how many barbs
# any triangles (50)
fif <- floor(x /50)
# and then look for longer lines for remaining speed (10)
tn <- floor( (x - fif* 50)/10)
# and then look for shorter lines for remaining speed (5)
fv <- floor( (x - fif* 50 - tn* 10)/5)
# Spacing & barb length
yadj <- 0.5+mlength
dist <- (yadj-0.5) / 10
xadj <- 0.5+wblength
xfadj <- 0.5+wblength/2
# Create grobs
main_grob <- linesGrob(0.5, c(0.5, yadj ))
# 50 windspeed
if(fif != 0) {
fify <- c(yadj, yadj-dist*seq_len(2* fif) )
fifx <- c(0.5, xadj)[rep(1:2, length=length(fify))]
fif_grob <- pathGrob(fifx, fify, gp=gpar(fill="black"))
} else {
fif_grob <- NULL
fify <- yadj+dist
}
# Ten windspeed
if(tn != 0) {
tny <- lapply(seq_len(tn) , function(x) min(fify) - dist*c(x, x-1))
tn_grob <- do.call(gList,
mapply(function(x,y)
linesGrob(x=x, y=y, gp=gpar(fill="black")),
x=list(c(0.5, xadj)), y=tny, SIMPLIFY=FALSE))
} else {
tn_grob <- NULL
tny <- fify
}
# Five windspeed
if(fv != 0) {
fvy <- lapply(seq_len(fv) , function(x) min(unlist(tny)) -dist* c(x, x-0.5))
fv_grob <- do.call(gList,
mapply(function(x,y)
linesGrob(x=x, y=y, gp=gpar(fill="black")),
x=list(c(0.5, xfadj)), y=fvy, SIMPLIFY=FALSE))
} else {
fv_grob <- NULL
}
# Draw
#grid.newpage()
grid.draw(gList(main_grob, fif_grob, tn_grob, fv_grob))
}
-------------------------------------
comment from sezen below
The plotted wind direction is wrong. To have right meteorological wind direction, use angle = 360 - dat$direction[i]. See http://tornado.sfsu.edu/geosciences/classes/m430/Wind/WindDirection.html

R loop does not stop

I have a problem that sounds easy, but I really cannot find the mistake. I have 3377 data points (measurements of body temperature). The sampling rate is 5min and I would like to put the data into a matrix. However, R starts recycling once it has put all 3377 data points into the matrix. To prevent r from doing this, I wrote a loop and I want the loop to stop when the end of the vector is reached.
Ankle.r <- 1:3377 # Example data
a = 288 # sampling rate = 5min -> 288 measurement points per day
c = 11 # 11 full days of sampling (and a few more points, wherefore the matrix is to be 12 rows)
Ankle.r2 <- matrix(NA, ncol = a, nrow = c+1) # matrix with NAs for 12 days with 288 cols each (=3456 cells)
x <- length (Ankle.r) # total number of data points, is 3377
for (f in 1:(c+1)){ # for each row
for (p in 1:a){ # for each column (i.e. cell)
st_op <- (((f-1)*p)+p) # STOP criterion, gives the number of cells that have already been filled
if (st_op<x){ # only perform operation if the number of cells filled is < the number of data points (i.e. 3377)
Ankle.r2[f,p] <- Ankle.r[(((f-1)*p)+p)]
} else {stop
}
}
}
However, the loop does not stop...it loops till the last cell in my matrix. According to my calculations, the last 79 cells should remain free (i.e. NA, because 3456 cells - 3377 = 79), but that is only true for the last 8 or so...
Any hints where the mistake is?
Thanks!

I think this does what you would like to do:
Ankle.r <- 1:3377 # Example data
a = 288 # sampling rate = 5min -> 288 measurement points per day
c = 11
length(Ankle.r) <- a * (c + 1) #pad input vector with NA values
m <- matrix(Ankle.r, ncol = a, byrow = TRUE)

Ok, try an example and it will show you where your mistake is...sighing. The loop must be:
Ankle.r2 <- matrix(NA, ncol = a, nrow = c+1) # matrix with NAs for 12 days with 288 cols each (=3456 cells)
x <- length (Ankle.r) # total number of data points, is 3377
for (f in 1:(c+1)){ # for each row
for (p in 1:a){ # for each column (i.e. cell)
st_op <- (((f-1)*a)+p) # STOP criterion, gives the number of cells that have already been filled
if (st_op<=x){ # only perform operation if the number of cells filled is < the number of data points (i.e. 3377)
Ankle.r2[f,p] <- Ankle.r[(((f-1)*a)+p)]
} else {stop
}
}
}
Thanks anyway!
Best,
Christine

Randomly selecting from a subset of rows

I have data in blocks[[i]] where i = 4 to 6 like so
Stimulus Response PM
stretagost s <NA>
colpublo s <NA>
zoning d <NA>
epilepsy d <NA>
resumption d <NA>
incisive d <NA>
440 rows in each block[[i]].
Currently my script does some stuff to 1 randomly selected item out of every 15 trials (except for the first 5 trials every 110, also I have it set so I can never choose rows less than 2 apart) for each block [[i]].
What I would like to be able to do is do stuff to 1 item from every 15 trials, randomly selected out of only those where response == "d". i.e., I don't want my random selection to ever do stuff to rows where response=="s". I have no idea how to achieve this but here is the script I have so far, which just randomly chooses 1 row out of each 15:
PMpositions <- list()
for (i in 4:6){
positions <- c()
x <- 0
for (j in c(seq(5, 110-15, 15),seq(115, 220-15, 15),seq(225, 330-15, 15),seq(335,440-15, 15)))
{
sub.samples <- setdiff(1:15 + j, seq(x-2,x+2,1))
x <- sample(sub.samples, 1)
positions <- c(positions,x)
}
PMpositions[[i]] <- positions
blocks[[i]]$Response[PMpositions[[i]]] <- Wordresponse
blocks[[i]]$PM[PMpositions[[i]]] <- PMresponse
blocks[[i]][PMpositions[[i]],]$Stimulus <- F[[i]]
}
I ended up dealing with it like so
PMpositions <- list()
for (i in 1:3){
startingpositions <- c(seq(5, 110-15, 15),seq(115, 220-15, 15),seq(225, 330-15,
15),seq(335, 440-15, 15))
positions <- c()
x <- 0
for (j in startingpositions)
{
sub.samples <- setdiff(1:15 + j, seq(x-2,x+2,1))
x <- sample(sub.samples, 1)
positions <- c(positions,x)
}
repeat {
positions[which(blocks[[i]][positions,2]==Nonwordresponse)]<-
startingpositions[which(blocks[[i]][positions,2]==Nonwordresponse)]+sample(1:15,
size=length(which(blocks[[i]][positions,2]==Nonwordresponse)), replace = TRUE)
distancecheck<- which ( abs( c(positions[2:length(positions)],0)-positions ) < 2)
if (length(positions[which(blocks[[i]][positions,2]==Nonwordresponse)])== 0 & length
(distancecheck)== 0) break
}
PMpositions[[i]] <- positions
blocks[[i]]$Response[PMpositions[[i]]] <- Wordresponse
blocks[[i]]$PM[PMpositions[[i]]] <- PMresponse
blocks[[i]][PMpositions[[i]],]$Stimulus <- as.character(NF[[i]][,1])
Nonfocal[[i]] <- blocks[[i]]
}
I realised when getting stuck on repeat loops that sometimes I have 15 "s" in response in a row! doh. Would be nice to be able to fix this but it is ok for what I need, when I get stuck I'm just running it again (the location of d/s are randomly generated).

EDIT: Here's a different approach that only samples 'd' rows. It's pretty customized code, but the main idea is to use the prob argument to only sample rows where "Response"=="d" and set the probably of sampling all other rows to zero.
Response <- rep(c("s","d"),220)
chunk <- sort(rep(1:30,15))[1:440] # chunks of 15 up to 440
# function to randomly sample from each set of 15 rows
sampby15 <- function(i){
sample((1:440)[chunk==i], 1,
# use the `prob` argument to only sample 'd' values
prob=rep(1,length=440)[chunk==i]*(Response=="d")[chunk==i])
}
s <- sapply(1:15,FUN=sampby15) # apply to each chunk to get sample rows
Response[s] # confirm only 'd' values
# then you have code to do whatever to those rows...

So the really basic function you'll want to operate on each block is like this:
subsetminor <- function(dataset, only = "d", rows = 1) {
remainder <- subset(dataset, Response == only)
return(remainder[sample(1:nrow(remainder), size = rows), ])
}
We can spruce it up a bit to avoid rows next to each other:
subsetminor <- function(dataset, only = "d", rows = 1) {
remainder <- subset(dataset, Response == only)
if(rows > 1) {
sampled <- sample(1:nrow(remainder), size = rows)
pairwise <- t(combn(sampled, 2))
while(any(abs(pairwise[, 1] - pairwise[, 2]) <= 2)) {
sampled <- sample(1:nrow(remainder), size = rows)
pairwise <- t(combn(sampled, 2))
}
}
out <- remainder[sampled, ]
return(out)
}
The above can be simplified/DRY'd out quite a bit, but it should get the job done.

Equal frequency discretization in R

I'm having trouble finding a function in R that performs equal-frequency discretization. I stumbled on the 'infotheo' package, but after some testing I found that the algorithm is broken. 'dprep' seems to no longer be supported on CRAN.
EDIT :
For clarity, I do not need to seperate the values between the bins. I really want equal frequency, it doesn't matter if one value ends up in two bins. Eg :
c(1,3,2,1,2,2)
should give a bin c(1,1,2) and one c(2,2,3)

EDIT : given your real goal, why don't you just do (corrected) :
EqualFreq2 <- function(x,n){
nx <- length(x)
nrepl <- floor(nx/n)
nplus <- sample(1:n,nx - nrepl*n)
nrep <- rep(nrepl,n)
nrep[nplus] <- nrepl+1
x[order(x)] <- rep(seq.int(n),nrep)
x
}
This returns a vector with indicators for which bin they are. But as some values might be present in both bins, you can't possibly define the bin limits. But you can do :
x <- rpois(50,5)
y <- EqualFreq2(x,15)
table(y)
split(x,y)
Original answer:
You can easily just use cut() for this :
EqualFreq <-function(x,n,include.lowest=TRUE,...){
nx <- length(x)
id <- round(c(1,(1:(n-1))*(nx/n),nx))
breaks <- sort(x)[id]
if( sum(duplicated(breaks))>0 stop("n is too large.")
cut(x,breaks,include.lowest=include.lowest,...)
}
Which gives :
set.seed(12345)
x <- rnorm(50)
table(EqualFreq(x,5))
[-2.38,-0.886] (-0.886,-0.116] (-0.116,0.586] (0.586,0.937] (0.937,2.2]
10 10 10 10 10
x <- rpois(50,5)
table(EqualFreq(x,5))
[1,3] (3,5] (5,6] (6,7] (7,11]
10 13 11 6 10
As you see, for discrete data an optimal equal binning is rather impossible in most cases, but this method gives you the best possible binning available.

This sort of thing is also quite easily solved by using (abusing?) the conditioning plot infrastructure from lattice, in particular function co.intervals():
cutEqual <- function(x, n, include.lowest = TRUE, ...) {
stopifnot(require(lattice))
cut(x, co.intervals(x, n, 0)[c(1, (n+1):(n*2))],
include.lowest = include.lowest, ...)
}
Which reproduces #Joris' excellent answer:
> set.seed(12345)
> x <- rnorm(50)
> table(cutEqual(x, 5))
[-2.38,-0.885] (-0.885,-0.115] (-0.115,0.587] (0.587,0.938] (0.938,2.2]
10 10 10 10 10
> y <- rpois(50, 5)
> table(cutEqual(y, 5))
[0.5,3.5] (3.5,5.5] (5.5,6.5] (6.5,7.5] (7.5,11.5]
10 13 11 6 10
In the latter, discrete, case the breaks are different although they have the same effect; the same observations are in the same bins.

How about?
a <- rnorm(50)
> table(Hmisc::cut2(a, m = 10))
[-2.2020,-0.7710) [-0.7710,-0.2352) [-0.2352, 0.0997) [ 0.0997, 0.9775)
10 10 10 10
[ 0.9775, 2.5677]
10

The classInt library is created "for choosing univariate class intervals for mapping or other graphics purposes". You can just do:
dataset <- c(1,3,2,1,2,2)
library(classInt)
classIntervals(dataset, 2, style = 'quantile')
where 2 is the number of bins you want and the quantile style provides quantile breaks. There are several styles available for this function: "fixed", "sd", "equal", "pretty", "quantile", "kmeans", "hclust",
"bclust", "fisher", or "jenks". Check docs for more info.

Here is a function that handle the error :'breaks' are not unique, and automatically select the closest n_bins value to the one you setted up.
equal_freq <- function(var, n_bins)
{
require(ggplot2)
n_bins_orig=n_bins
res=tryCatch(cut_number(var, n = n_bins), error=function(e) {return (e)})
while(grepl("'breaks' are not unique", res[1]) & n_bins>1)
{
n_bins=n_bins-1
res=tryCatch(cut_number(var, n = n_bins), error=function(e) {return (e)})
}
if(n_bins_orig != n_bins)
warning(sprintf("It's not possible to calculate with n_bins=%s, setting n_bins in: %s.", n_bins_orig, n_bins))
return(res)
}
Example:
equal_freq(mtcars$carb, 10)
Which retrieves the binned variable and the following warning:
It's not possible to calculate with n_bins=10, setting n_bins in: 5.

Here is a one liner solution inspired by #Joris' answer:
x <- rpois(50,5)
binSize <- 5
desiredFrequency = floor(length(x)/binSize)
split(sort(x), rep(1:binSize, rep(desiredFrequency, binSize)))

Here's another solution using mltools.
set.seed(1)
x <- round(rnorm(20), 2)
x.binned <- mltools::bin_data(x, bins = 5, binType = "quantile")
table(x.binned)
x.binned
[-2.21, -0.622) [-0.622, 0.1) [0.1, 0.526) [0.526, 0.844) [0.844, 1.6]
4 4 4 4 4

We can use package cutr with feature what = "rough", the look of labels can be customized to taste :
# devtools::install_github("moodymudskipper/cutr")
library(cutr)
smart_cut(c(1, 3, 2, 1, 2, 2), 2, "rough", brackets = NULL, sep="-")
# [1] 1-2 2-3 1-2 1-2 2-3 2-3
# Levels: 1-2 < 2-3

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

sample with a minimal difference between two consecutive values - r

Related

How to get a random observation point at a specific time over multiple trials in R?

Plot Wind Barb in R

R loop does not stop

Randomly selecting from a subset of rows

Equal frequency discretization in R

Categories

Resources