I am trying to bucket coordinates into their nearest coordinate. In a sense, I am doing one iteration of kmeans clustering, with 1222 centroids. Below I have a function that does this, imperfectly, and too slowly as well. I am looking for help on improving this function:
discretizeCourt <- function(x_loc, y_loc) {
# create the dataframe of points that I want to round coordinates to
y <- seq(0, 50, by = 2)
x1 <- seq(1, 93, by = 2)
x2 <- seq(2, 94, by = 2)
x <- c(x1, x2)
coordinates <- data.frame(
x = rep(x, 13),
y = rep(y, each = length(x1)),
count = 0
)
# loop over each point in x_loc and y_loc
# increment the count column whenever a point is 'near' that column
for(i in 1:length(x_loc)) {
this_x = x_loc[i]
this_y = y_loc[i]
coordinates[coordinates$x > this_x-1 &
coordinates$x < this_x+1 &
coordinates$y > this_y-1 &
coordinates$y < this_y+1, ]$count =
coordinates[coordinates$x > this_x-1 &
coordinates$x < this_x+1 &
coordinates$y > this_y-1 &
coordinates$y < this_y+1, ]$count + 1
}
coordinates # return the grid with its counts
}
Here is some test data that I'm working with:
> dput(head(x_loc, n = 50))
c(13.57165, 13.61702, 13.66478, 13.70833, 13.75272, 13.7946,
13.83851, 13.86792, 13.8973, 13.93906, 13.98099, 14.02396, 14.06338,
14.10872, 14.15412, 14.2015, 14.26116, 14.30871, 14.35056, 14.39536,
14.43964, 14.48442, 14.5324, 14.57675, 14.62267, 14.66972, 14.71443,
14.75383, 14.79012, 14.82455, 14.85587, 14.87557, 14.90737, 14.9446,
14.97763, 15.01079, 15.04086, 15.06752, 15.09516, 15.12394, 15.15191,
15.18061, 15.20413, 15.22896, 15.25411, 15.28108, 15.3077, 15.33578,
15.36507, 15.39272)
> dput(head(y_loc, n = 50))
c(25.18298, 25.17431, 25.17784, 25.18865, 25.20188, 25.22865,
25.26254, 25.22778, 25.20162, 25.25191, 25.3044, 25.35787, 25.40347,
25.46049, 25.5199, 25.57132, 25.6773, 25.69842, 25.73877, 25.78383,
25.82168, 25.86067, 25.89984, 25.93067, 25.96943, 26.01083, 26.05861,
26.11965, 26.18428, 26.25347, 26.3352, 26.35756, 26.4682, 26.55412,
26.63745, 26.72157, 26.80021, 26.8691, 26.93522, 26.98879, 27.03783,
27.07818, 27.03786, 26.9909, 26.93697, 26.87916, 26.81606, 26.74908,
26.67815, 26.60898)
My actual x_loc and y_loc files are ~60000 coordinates, and I have thousands of files each with ~60000 coordinates, so it's a lot of work. I am pretty certain that the reason the function runs slow is the way I am indexing / incrementing.
The counting is imperfect. A technically better approach would be to loop over all 60000 points (only 50 are shown above for the example) and, for each point, calculate the distance between that point and each of the 1222 points in the coordinates dataframe. However, that's 60000 * 1222 calculations just for this one set of points, which is too many.
Would greatly appreciate any help on this!
Thanks,
EDIT: I'm working on converting my dataframes / vectors to 2 matrices, and vectorizing the whole approach, will let you know if it works.
If you want to process your data faster than your current solution, consider using the data.table library. Please see the example below:
library(data.table)
df <- data.table(x_loc, y_loc) # Your vectors are turned into a data.table
df$row.idx <- 1:nrow(df) # This column is used as ID for each sample point.
Now we can find the right coordinate for each point, and later count how many points belong to each coordinate. First, we build the coordinates data frame:
y <- seq(0, 50, by = 2)
x1 <- seq(1, 93, by = 2)
x2 <- seq(2, 94, by = 2)
x <- c(x1, x2)
coordinates <- data.frame(
x = rep(x, 13),
y = rep(y, each = length(x1)),
count = 0
)
coordinates$row <- 1:nrow(coordinates) # Similar to yours. However, this time we are interested in seeing which points belong to this coordinate.
Now, we define a function which checks the coordinates and returns the one within one unit distance of the point in question.
f <- function(this_x, this_y, coordinates) {
res <- coordinates[coordinates$x > this_x-1 &
coordinates$x < this_x+1 &
coordinates$y > this_y-1 &
coordinates$y < this_y+1, ]$row
res
}
For each point, we find its right coordinate:
df[, coordinate.idx := f(x_loc, y_loc, coordinates), by = row.idx]
df[, row.idx := NULL]
df contains the following variables: (x_loc, y_loc, coordinate.idx). You can populate coordinates$count using this. Even for 60000 points, it should not take more than 1 second.
for(i in 1:nrow(coordinates)) {
  # assign to row i only; assigning to the whole column would overwrite every count
  coordinates$count[i] = length(which(df$coordinate.idx == i))
}
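For the brute-force distance idea mentioned in the question, one possible sketch is to loop over the 1222 grid points instead of the 60000 locations, so that each pass is a single vectorized comparison. The function name nearest_centroid and the toy inputs are made up for illustration; this is not the approach used in the answer above, and it has not been benchmarked on the real files.

```r
# Hypothetical helper: index of the nearest grid point for every location.
# Loops over the (few) grid points; each pass is vectorized over all locations.
nearest_centroid <- function(px, py, cx, cy) {
  best_d <- rep(Inf, length(px))          # smallest squared distance so far
  best_i <- rep(NA_integer_, length(px))  # grid point that achieved it
  for (k in seq_along(cx)) {
    d <- (px - cx[k])^2 + (py - cy[k])^2
    closer <- d < best_d
    best_d[closer] <- d[closer]
    best_i[closer] <- k
  }
  best_i
}

# Toy data (assumed values): two grid points, three locations
cx <- c(0, 10); cy <- c(0, 0)
idx <- nearest_centroid(px = c(1, 9, 4), py = c(0, 0, 0), cx = cx, cy = cy)

# tabulate() counts how many locations fall on each grid point
counts <- tabulate(idx, nbins = length(cx))
```

With coordinates built as above, the call would look like nearest_centroid(x_loc, y_loc, coordinates$x, coordinates$y), and tabulate(idx, nbins = nrow(coordinates)) would fill the count column. Ties go to the first grid point encountered.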
Recently, I learned how to write a loop that initializes some number, and then randomly generates numbers until the initial number is guessed (while recording the number of guesses it took) such that no number will be guessed twice:
# https://stackoverflow.com/questions/73216517/making-sure-a-number-isnt-guessed-twice
all_games <- vector("list", 100)
for (i in 1:100){
guess_i = 0
correct_i = sample(1:100, 1)
guess_sets <- 1:100 ## initialize a set
trial_index <- 1
while(guess_i != correct_i){
guess_i = sample(guess_sets, 1) ## sample from this set
guess_sets <- setdiff(guess_sets, guess_i) ## remove it from the set
trial_index <- trial_index + 1
}
## no need to store `i` and `guess_i` (as same as `correct_i`), right?
game_results_i <- data.frame(i, trial_index, guess_i, correct_i)
all_games[[i]] <- game_results_i
}
all_games <- do.call("rbind", all_games)
I am now trying to modify the above code to create the following two loops:
(Deterministic) Loop 1 always guesses the midpoint (rounded up) and is told whether the guess is smaller or bigger than the correct number. It then takes the midpoint again (e.g. between its guess and the floor/ceiling) until it reaches the correct number.
(Semi-Deterministic) Loop 2 first makes a random guess and is told whether the guess is bigger or smaller than the correct number. It then divides the difference in half and makes its next guess randomly in a smaller range, repeating this process until it reaches the correct number.
I tried to write a sketch of the code:
#Loop 2:
correct = sample(1:100, 1)
guess_1 = sample(1:100, 1)
guess_2 = ifelse(guess_1 > correct, sample(50:guess_1, 1), sample(guess_1:100, 1))
guess_3 = ifelse(guess_2 > correct, sample(50:guess_2, 1), sample(guess_2:100, 1))
guess_4 = ifelse(guess_3 > correct, sample(50:guess_3, 1), sample(guess_3:100, 1))
#etc
But I am not sure if I am doing this correctly.
Can someone please help me with this?
Thank you!
Example : Suppose I pick the number 68
Loop 1: first random guess = 51, (100-51)/2 + 51 = 75, (75-50)/2 + 50 = 63, (75 - 63)/2 + 63 = 69, (69 - 63)/2 + 63 = 66, etc.
Loop 2: first random guess = 53, rand_between(53,100) = 71, rand_between(51,71) = 65, rand(65,71) = 70, etc.
I don't think you need a for loop for this; you can build the structures up front with sample, sapply and which:
## correct values can repeat, so we set replace to TRUE
corrects <- sample(1:100, 100, replace = TRUE)
## replace is by default FALSE in sample(), if you don't want repeated guesses
## sapply() creates a matrix
guesses <- sapply(1:100, function(x) sample(1:100, 100))
## constructing game_results_i equal to yours, but could be simplified
game_results_i <- data.frame(
i = 1:100,
trial_index = sapply(
1:100,
function(x) which(
## which() returns the indices where the predicate is TRUE; each column is a permutation, so there is exactly one
guesses[, x] == corrects[x]
)
),
guess_i = corrects,
correct_i = corrects # guess_i and correct_i are obviously equal
)
Ok, let's see if now I match question and answer properly :)
If I understood your intentions correctly, in both loops you are setting increasingly tight lower and upper bounds, so each guess reduces the search space. However, this interpretation does not always match your description, so please double-check that it is acceptable for your purposes.
I wrote two functions, guess_bisect for the deterministic loop_1 and guess_sample for loop_2:
guess_bisect <- function(correct, n = 100) {
lb <- 0
ub <- n + 1
trial_index <- 1
guess <- round((ub - lb) / 2) + lb
while (guess != correct) {
# cat(lb, ub, guess, "\n") # uncomment to print the guess iteration
if (guess < correct)
lb <- guess
else
ub <- guess
guess <- round((ub - lb) / 2) + lb
trial_index <- trial_index + 1
}
trial_index
}
guess_sample <- function(correct, n = 100) {
lb <- 0
ub <- n + 1
trial_index <- 1
guess <- lb + sample.int(ub - lb - 1, 1) # sample(x, 1) would draw from 1:x when x has length 1
while (guess != correct) {
# cat(lb, ub, guess, "\n") # uncomment to print the guess iteration
if (guess < correct)
lb <- guess
else
ub <- guess
guess <- lb + sample.int(ub - lb - 1, 1)
trial_index <- trial_index + 1
}
trial_index
}
Obviously, guess_bisect always produces the same result for the same input, whereas guess_sample changes randomly.
Plotting the results in a simple chart suggests that the deterministic bisection is much better on average, since random sampling may happen to pick a guess that barely narrows the range. The x-axis is the correct number, spanning 1 to 100, and the y-axis is the trial index; guess_bisect gives the red curve, and repeated runs of guess_sample give the blue curves.
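To put numbers on that comparison without the chart, here is a minimal self-contained sketch. The driver count_guesses and the two pick_* helpers are hypothetical names that restate the logic of the two functions above in compact form; the seed is arbitrary.

```r
# Hypothetical generic driver: pick(lb, ub) proposes a guess strictly between lb and ub
count_guesses <- function(correct, pick, n = 100) {
  lb <- 0
  ub <- n + 1
  trials <- 0
  repeat {
    guess <- pick(lb, ub)
    trials <- trials + 1
    if (guess == correct) return(trials)
    if (guess < correct) lb <- guess else ub <- guess
  }
}

pick_mid  <- function(lb, ub) lb + round((ub - lb) / 2)       # deterministic bisection
pick_rand <- function(lb, ub) lb + sample.int(ub - lb - 1, 1)  # uniform draw inside the bounds

set.seed(1)
bisect_trials <- sapply(1:100, count_guesses, pick = pick_mid)
sample_trials <- sapply(1:100, count_guesses, pick = pick_rand)

# average number of guesses over all possible correct numbers
c(bisect = mean(bisect_trials), sample = mean(sample_trials))
```

Bisection at least halves the number of remaining candidates on every miss, so with n = 100 it never needs more than 7 guesses; the random variant has no such bound short of n.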
I am working on spike trains, and my code to generate and plot a spike train for 20 trials is written below. (The image is representative and shows 5 trials.)
fr = 100
dt = 1/1000 # bin width in s (1 ms)
duration = 2 # duration in s
nBins = 2000 # number of 1 ms bins (duration / dt)
nTrials = 20 #NumberOfSimulations
MyPoissonSpikeTrain = function(p, fr= 100) {
p = runif(nBins)
q = ifelse(p < fr*dt, 1, 0)
return(q)
}
set.seed(1)
SpikeMat <- t(replicate(nTrials, MyPoissonSpikeTrain()))
plot(x=-1,y=-1, xlab="time (s)", ylab="Trial",
main="Spike trains",
ylim=c(0.5, nTrials+1), xlim=c(0, duration))
for (i in 1: nTrials)
{
clip(x1 = 0, x2= duration, y1= (i-0.2), y2= (i+0.4))
abline(h=i, lwd= 1/4)
abline(v= dt*which( SpikeMat[i,]== 1))
}
Each trial has spikes occurring at random time points. What I am working towards is picking one random sample time point that applies to all 20 trials, and getting a vector of the lengths of the intervals this point falls into, one per trial. The code to get the spike times and the intervals between them is:
# inter-spike intervals (in s) for trial i
ISI <- function(i) {
  spike_times <- dt * which(SpikeMat[i, ] == 1)
  diff(spike_times)
}
Then you call ISI(i) for whichever trial you wish to see the Interspike interval vector for. A visual representation of what I want is:
I want to get a vector that has the length of the interval this point falls into, for each trial. I want to figure out its distribution as well, but that's for later. Can anybody help me figure out how to code my way to this? Any help is appreciated, even if it's just about how to start or where to look.
Your data
set.seed(1)
SpikeMat <- t(replicate(nTrials, MyPoissonSpikeTrain()))
I suggest transforming your sparse matrix data into a list of indices where spikes occur
L <- lapply(seq_len(nrow(SpikeMat)), function(i) {
  idx <- which(SpikeMat[i, ] == 1) # columns (time bins) that contain a spike
  setNames(idx, seq_along(idx))
})
Grab random timepoint
set.seed(1)
RT <- round(runif(1) * ncol(SpikeMat))
# 531
Result
distances contains the distances to the 2 nearest spikes - each element of the list is a named vector where the values are the distances (to RT) and their names are their positions in the vector. nearest_columns shows the original timepoint (column number) of each spike in SpikeMat.
bookend_values <- function(vec) {
lower_val <- head(sort(vec[sign(vec) == 1]), 1)
upper_val <- head(sort(abs(vec[sign(vec) == -1])), 1)
return(c(lower_val, upper_val))
}
distances <- lapply(L, function(i) bookend_values(RT-i))
nearest_columns <- lapply(seq_along(distances), function(i) L[[i]][names(distances[[i]])])
Note that the inter-spike interval of the two nearest spikes that bookend RT can be obtained with
sapply(distances, sum)
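As an alternative way to get the same interval length, base R's findInterval() can locate the spike just before a time point directly. The sketch below uses a hypothetical helper name and toy spike positions; it assumes the spike indices are sorted ascending, which is what which() returns.

```r
# Hypothetical helper: length of the inter-spike interval that contains time bin t.
# spike_idx must be sorted ascending, as returned by which().
interval_length_at <- function(spike_idx, t) {
  k <- findInterval(t, spike_idx)  # number of spikes at or before t
  # NA when there is no spike before t or no spike after it
  if (k == 0 || k == length(spike_idx)) return(NA_real_)
  spike_idx[k + 1] - spike_idx[k]
}

spikes <- c(10, 40, 45, 90)      # toy spike positions (assumed values)
interval_length_at(spikes, 42)   # 42 lies between the spikes at 40 and 45
interval_length_at(spikes, 5)    # before the first spike, so NA
```

Applied to the list built above, sapply(L, interval_length_at, t = RT) * dt would give the interval lengths in seconds. One difference to be aware of: if RT coincides exactly with a spike, this version assigns it to the interval that starts at that spike.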
Hi everyone. I'm trying to filter GPS location data based on distance (UTMs) and time (H:M:S) criteria, both independently and concurrently. Here's the data structure:
head(collar)
FID  animal       date       time         zone  easting  northing
1    URAM01_2012  6/24/2012  10:00:00 AM  13S   356664   3971340
2    URAM01_2012  6/24/2012  1:02:00 PM   13S   356760   3971480
3    URAM01_2012  6/24/2012  4:01:00 PM   13S   357482   3972325
4    URAM01_2012  6/24/2012  7:01:00 PM   13S   356882   3971327
5    URAM01_2012  6/25/2012  4:01:00 AM   13S   356574   3971765
6    URAM01_2012  6/25/2012  7:01:00 AM   13S   357796   3972231
Right now I'm filtering by distance only but I'm having some issues. The code should calculate the distance between FID[1] and FID[2] and then assign that distance to FID[1] in a new column ($step.length). After all distances have been calculated, the data is then subsetted based on a distance rule. Right now I have it set to where I want all locations that are >200m apart. Once subsetted, the process is then repeated until the distance between all subsequent locations is >200m. Here's the code that I've written that accomplishes only a portion of what I'd like to do:
reps <- 10
#Begin loop for the number of reps. Right now it's at 10 just to see if the code works.
for(rep in 1:reps){
#Begin loop for the number of GPS locations in the file
for(i in 1:length(collar$FID)){
#Calculate the distance between a GPS location and the next GPS locations. the formula is the hypotenuse of the Pythagorean theorem.
collar$step.length[i] <- sqrt(((collar$easting[i] - collar$easting[i+1])^2) + ((collar$northing[i] - collar$northing[i+1])^2))
}
#Subset the data. Select all locations that are >200m from the next GPS location.
collar <- subset(collar, step.length >200)
}
Now, the code isn't perfect and I would like to add 2 conditions into the code.
1.) Animal ID isn't considered. Therefore, a distance for the last location of an animal will be generated using the first location of a new animal when the distance should be NA. I thought using for(i in 1:unique(collar$animal)) might work, but it didn't (shocking) and I'm not sure what to do since for(i in length(collar$animal)) doesn't use only unique values.
2.) I'd also like to insert a break in the for loop once all locations are >200m apart. I'm sure there has to be a better way of doing this, but I thought I'd set reps to something large (e.g., 10000) and have R break once a criterion was met:
if(collar$step.length > 200){
break }
Yet, since the if condition has length > 1, only the first element is used. I haven't thought about time or distance/time yet, but if anyone has any suggestions for those endeavors, I'd appreciate the advice. Thanks for your help and guidance.
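On the break condition specifically, wrapping the comparison in all() reduces it to a single TRUE/FALSE, which is what if() expects. Below is a minimal sketch of the repeat-until-done idea on made-up coordinates for one animal; keeping the last location (whose step is NA) is an assumption here, since the subset() call above would drop it.

```r
# Toy coordinates (hypothetical), one animal only
pts <- data.frame(easting = c(0, 50, 300, 320, 700), northing = 0)

repeat {
  # distance from each location to the next one; the last location has none
  step <- c(sqrt(diff(pts$easting)^2 + diff(pts$northing)^2), NA)
  # all() collapses the vector comparison into one logical value
  if (all(step > 200, na.rm = TRUE)) break
  # drop locations that are too close to the next one; keep the last location
  pts <- pts[is.na(step) | step > 200, , drop = FALSE]
}

pts$easting  # eastings of the locations that survive the filter
```

Each pass either removes at least one row or stops, so no fixed number of reps is needed.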
I don't quite understand what you are trying to do with the reps but you can take advantage of the split and unsplit functions to focus on each individual animal.
First I created a distance() function that finds the columns named easting and northing from the object to create a vector of distances. Then we split collar up by the animal, and apply the distance function to each animal. We add this list of distances to the list of animals with some mapply code and then unsplit the results to make everything go back together.
Let me know what you want to do with the ">200" step.
distance <- function(x){
easting <- x$easting
northing <- x$northing
easting2 <- c(easting[-1], NA)
northing2 <- c(northing[-1], NA)
sqrt((easting - easting2)^2 + (northing - northing2)^2)
}
s <- split(collar, collar$animal)
distances <- lapply(s, distance)
s2 <- mapply(cbind, s, "Distance" = distances, SIMPLIFY = FALSE)
collar.new <- unsplit(s2, collar$animal)
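To see what the split/unsplit step produces, here is a small self-contained run on made-up data with two animals. The data frame collar_toy and the helper step_to_next are hypothetical; the helper mirrors the distance() function above.

```r
# Toy data (assumed values): three fixes for animal A, two for animal B
collar_toy <- data.frame(
  FID      = 1:5,
  animal   = c("A", "A", "A", "B", "B"),
  easting  = c(0, 3, 3, 10, 10),
  northing = c(0, 4, 4, 0, 5)
)

# distance from each location to the next one within the same animal
step_to_next <- function(x) {
  sqrt((x$easting - c(x$easting[-1], NA))^2 +
       (x$northing - c(x$northing[-1], NA))^2)
}

pieces <- split(collar_toy, collar_toy$animal)
# unsplit() puts the per-animal vectors back in the original row order
collar_toy$Distance <- unsplit(lapply(pieces, step_to_next), collar_toy$animal)
collar_toy$Distance
```

The last location of each animal gets NA rather than a distance to the first location of the next animal, which is the behaviour asked for in point 1.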
EDIT:
Apologies if this is cumbersome, I'm sure I can get it shorter but as of now let me know if it works for you. I would also be curious to see how fast it runs as I have been making up my own data.
filterout <- function(input, value = NULL){
# requirements of the input object
stopifnot(all(c("FID","animal","easting","northing") %in% colnames(input)))
distance <- function(x){ # internal distance function
e1 <- x$easting; e2 <- c(NA, e1[-nrow(x)])
n1 <- x$northing; n2 <- c(NA, n1[-nrow(x)])
sqrt((e1 - e2)^2 + (n1 - n2)^2)
}
nc <- ncol(input) # save so we can "rewrite" Distance values each reiteration
f <- function(input){ # the recursive function (will run until condition is met)
z <- split(input[,-(nc+1)], input$animal) # split by animal & remove (if any) prior Distance column
distances <- lapply(z, distance) # collect distances
z2 <- mapply(cbind, z, "Distance" = distances, SIMPLIFY = F) # attach distances
r1 <- lapply(z2, function(x) { # delete first row under criteria
a <- x$Distance < value # CRITERIA
a[is.na(a)] <- FALSE # Corrects NA values into FALSE so we don't lose them
first <- which(a == T)[1] # we want to remove one at a time
`if`(is.na(first), integer(0), x$FID[first]) # returns FIDs to remove
})
z3 <- unsplit(z2, input$animal)
# Whether to keep going or not
if(length(unlist(r1)) != 0){ # if list of rows under criteria is not empty
remove <- which(z3$FID %in% unlist(r1, use.names = F)) # remove them
print(unlist(r1, use.names = F)) # OPTIONAL*** printing removed FIDs
f(z3[-remove,]) # and run again
} else {
return(z3) # otherwise return the final list
}
}
f(input)
}
And the function can be used as follows:
filterout(input = collar, value = 200)
filterout(input = collar, value = 400)
filterout(input = collar, value = 600)
EDIT2:
I opened up a bounty question to figure out how to do a certain step, but hopefully this answer helps. It might take about a minute to do 37k rows, but let me know how it goes.
x <- collar
skipdistance <- function(x, value = 200){
d <- as.matrix(dist(x[,c("easting","northing")]))
d[lower.tri(d)] <- 0
pick <- which(d > value, arr.ind = T) # pick[order(pick[,"row"]),] # visual clarity
findConnectionsBase <- function(m) {
n <- nrow(m)
myConnections <- matrix(integer(0), nrow = n, ncol = 2)
i <- j <- 1L
k <- 2L
while (i <= n) {
myConnections[j, ] <- m[i, ]
while (k <= n && m[i, 2] != m[k, 1]) {k <- k + 1L}
i <- k
j <- j + 1L
}
myConnections[!is.na(myConnections[,1]), ]
}
keep.ind <- findConnectionsBase(pick)
keep.row <- unique(c(keep.ind))
cbind(x[keep.row,], Distance = c(NA,d[keep.ind]))
}
a <- do.call(rbind,lapply(split(x, x$animal), skipdistance, value = 200))
dim(a)
Edit #3:
library(lubridate) # great package for string -> dates
# changed to give just rows that satisfy greater than value criteria
skip <- function(dist.var, value = 200){
d <- as.matrix(dist(dist.var))
d[lower.tri(d)] <- 0
pick <- which(d > value, arr.ind = T) # pick[order(pick[,"row"]),] # visual clarity
findConnectionsBase <- function(m) {
n <- nrow(m)
myConnections <- matrix(integer(0), nrow = n, ncol = 2)
i <- j <- 1L
k <- 2L
while (i <= n) {
myConnections[j, ] <- m[i, ]
while (k <= n && m[i, 2] != m[k, 1]) {k <- k + 1L}
i <- k
j <- j + 1L
}
myConnections[!is.na(myConnections[,1]), ]
}
unique(c(findConnectionsBase(pick)))
}
collar <- structure(list(FID = 1:8, animal = c("URAM01_2012", "URAM01_2012", "URAM01_2012", "URAM01_2012", "URAM01_2013", "URAM01_2013", "URAM01_2013", "URAM01_2013"), date = c("6/24/2012", "6/24/2012", "6/24/2012", "6/24/2012", "6/25/2012", "6/25/2012", "6/25/2012", "6/25/2012" ), time = c("10:00:00AM", "1:02:00PM", "4:01:00PM", "7:01:00PM", "4:01:00AM", "7:01:00AM", "7:01:00AM", "7:01:00AM"), zone = c("13S", "13S", "13S", "13S", "13S", "13S", "13S", "13S"), easting = c(356664L,
356760L, 356762L, 356882L, 356574L, 357796L, 357720L, 357300L), northing = c(3971340L, 3971480L, 3971498L, 3971498L, 3971765L, 3972231L, 3972230L, 3972531L)), .Names = c("FID", "animal", "date", "time", "zone", "easting", "northing"), class = "data.frame", row.names = c(NA, -8L))
collar[skip(dist.var = collar[,c("easting","northing")],
value = 200),]
# dist function works on dates, but it makes sense to convert to hours
dist(lubridate::mdy_hms(paste(collar$date, collar$time)))
hours <- 2.99
collar[ skip(dist.var = lubridate::mdy_hms(paste(collar$date, collar$time)),
value = hours * 3600), ]
Big thanks and shout out to Evan for all of his hard work. Obviously, the code that he generated is a bit different from what I proposed, but that's the great thing about this community: sharing unique solutions that we might not have come to ourselves. See Edit #2 for the final code, which filters GPS collar data by the distance between consecutive points.
I have a gridded climate dataset, such as:
# generate time vector
time1 <- seq(14847.5,14974.5, by = 1)
time2 <- seq(14947.5,14974.5, by = 1)
time <- c(time1,time2)
time <- as.POSIXct(time*86400,origin='1970-01-01 00:00')
# generate lat and lon coordinates
lat <- seq(80,90, by = 1)
lon <- seq(20,30, by = 1)
# generate 3dimensional array
dat <- array(runif(length(lat)*length(lon)*length(time)),
dim = c(length(lon),length(lat),length(time)))
such that
> dim(dat)
[1] 11 11 156
The dimensions of the data describe the variable at different longitudes (dim = 1), latitudes (dim = 2), and times (dim = 3).
The issue I have at the moment is that some of the times are repeated, something to do with overlapping sensors measuring the data. Therefore, I was wondering if it was possible to only keep the unique times for dat, but average the data within the grid for the duplicated times i.e. if there are two repeated days we take the average value in each latitude and longitude grid for that time.
I can find the unique times as:
# only select unique times
new_time <- unique(time)
unique_time <- unique(time)
The following code then aims to loop through each grid (lat/lon) and average all of the duplicated days.
# loop through lat/lon coordinates to generate new data
new_dat <- array(dim = c(length(lon),length(lat),length(new_time)))
for(i in 1:length(lon)){
for(ii in 1:length(lat)){
dat2 <- dat[i,ii,]
dat2b <- NA
for(k in 1:length(unique_time)){
idx <- time == unique_time[k]
dat2b[k] <- mean(dat2[idx], na.rm = TRUE)
}
new_dat[i,ii,] <- dat2b
}
}
I'm convinced that this provides the correct answer, but I'm certain there is a much cleaner method to achieve this.
I should also note that my data is quite large (i.e. k = 7000), so this last loop is not very efficient, to say the least.
My original answer:
This is a bit more concise and efficient by use of aggregate:
for(i in 1:length(lon)){
for(ii in 1:length(lat)){
new_dat[i,ii,] <- as.numeric(aggregate(dat[i,ii,], by=list(time),mean)$x)
}
}
It still has 2 out of the 3 of the loops, but it manages to bypass creating dat2, dat2b, and unique_time.
My improved answer:
f <- function(i, ii) as.numeric(aggregate(dat[i, ii, ], by = list(time), mean)$x)
# build the lon/lat index grid once instead of on every access
grid <- expand.grid(i = 1:length(lon), ii = 1:length(lat))
for (g in 1:nrow(grid)) {
  new_dat[grid$i[g], grid$ii[g], ] <- f(grid$i[g], grid$ii[g])
}
Got it down to just 1 loop. We could probably bypass that loop too with an apply.
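A possible way to drop the remaining loop is apply() over the first two dimensions with tapply() doing the per-time averaging. The sketch below uses a small made-up array and numeric time stamps rather than the data from the question. Note that tapply() returns groups in sorted order of the time values, which matches unique(time) only when the times first appear in increasing order.

```r
set.seed(1)
time <- c(1, 2, 2, 3)  # toy time stamps with one duplicated value
dat  <- array(runif(2 * 3 * length(time)), dim = c(2, 3, length(time)))

# for every lon/lat cell, average the values that share a time stamp
avg <- apply(dat, c(1, 2), function(v) {
  as.numeric(tapply(v, time, mean, na.rm = TRUE))
})

# apply() puts the function's result in the first dimension,
# so move time back to the last position: lon x lat x time
new_dat <- aperm(avg, c(2, 3, 1))
dim(new_dat)
```

Here the two slices with time stamp 2 are collapsed into their cell-wise mean, so the third dimension shrinks from 4 to 3.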
I have been working on a project for which I need to find peaks and valleys in a dataset (not just the highest numbers per column, but all of the peaks and valleys).
I did manage to get it to work on 1 column, but I use a for-loop for that and I need to do this for about 50 columns, so I think I should use an 'apply' function. I just don't know how to do so. Can I put 'if' statements and such in an 'apply' function?
Here is what I used for checking one column:
('First' is the name of the dataset and 'Seq1' is the first column)
Lowest = 0
Highest = 0
Summits = vector('numeric')
Valleys = vector('numeric')
for (i in 1:length(First$Seq1))
{
if (!is.na(First$Seq1[i+1]))
{
if (First$Seq1[i] < Lowest) {Lowest = First$Seq1[i]}
if (First$Seq1[i] > Highest) {Highest = First$Seq1[i]}
if (First$Seq1[i] > 0 && First$Seq1[i+1] < 0)
{ Summits <- append(Summits, Highest, after=length(Summits)) }
if (First$Seq1[i] < 0 && First$Seq1[i+1] > 0)
{ Valleys <- append(Valleys, Lowest, after=length(Valleys)) }
}
}
Sure you can! I would first define a helper function that defines what is to be done with one specific column and then you call that function within apply:
HelperFun <- function(x) {
# your code from above, replacing 'Seq1' by x
}
apply(First, 2, HelperFun)
An *apply function is not better for this than a for loop, provided you don't grow an object in the for loop. You must never use append in a loop. Pre-allocate your results object and fill it.
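As one way to avoid growing vectors altogether, the sketch below works per column with rle() and tapply(). It takes one possible reading of the question: a summit is the highest value within each run of positive numbers and a valley the lowest value within each run of negative numbers. That is not identical to the loop above, which tracks a running Highest and Lowest. The helper name and the toy data frame are made up, and dropping NAs and exact zeros is an assumption.

```r
# Hypothetical helper: extreme value of every positive run (summit)
# and of every negative run (valley) in a numeric vector
summits_valleys <- function(x) {
  x <- x[!is.na(x) & x != 0]   # assumption: NAs and exact zeros are dropped
  runs <- rle(x > 0)           # consecutive stretches with the same sign
  run_id <- rep(seq_along(runs$lengths), runs$lengths)
  run_max <- as.vector(tapply(x, run_id, max))
  run_min <- as.vector(tapply(x, run_id, min))
  list(summits = run_max[runs$values], valleys = run_min[!runs$values])
}

# Toy data (assumed values) standing in for the real 'First' dataset
First <- data.frame(
  Seq1 = c(1, 3, 2, -1, -4, -2, 2, 5, 1, -3),
  Seq2 = c(-2, -1, 4, 6, -5, -7, -1, 3, 2, 1)
)

# lapply() rather than apply(), because each column can yield a different number of peaks
res <- lapply(First, summits_valleys)
res$Seq1
```

For Seq1 this gives summits 3 and 5 and valleys -4 and -3.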
This finds all local minima on a grid:
#an example
set.seed(42)
plane <- matrix(rnorm(100, sd = 5), 10)
#plot
library(raster)
plot(raster(plane))
#initialize a logical matrix
res <- matrix(TRUE, ncol = ncol(plane), nrow = nrow(plane))
#check for each subgrid of 2 times 2 cells which of the cells is the minimum
for (i in 1:(nrow(plane) - 1)) {
for (j in 1:(ncol(plane) - 1)) {
inds <- as.matrix(expand.grid(r = i + 0:1, c = j + 0:1))
#cell must be a minimum of all 4 subgrids it is part of
res[inds] <- res[inds] & plane[inds] == min(plane[inds])
}
}
print(res)
plane[res]
#[1] -13.282277 -8.906542 -8.585043 -12.071038 -3.919195 -14.965450 -5.215595 -5.498904 -5.971644 -2.380870 -7.296070
#highlight local minima
plot(rasterToPolygons(raster(res)), border = t(res), add = TRUE)
library(reshape2)
res1 <- melt(res)
res1 <- res1[res1$value,]
text(x = res1$Var2 /10 - 0.05,
y = 1-res1$Var1 /10 + 0.05,
labels = round(plane[res],1))
I've assumed here that diagonal neighbors are counted as neighbors and not only neighbors in the same column or row. But this would be trivial to change.
I know that this is not the solution you want --- you have one-dimensional time series, but here is a (more direct) variation on Roland's solution.
#example data
set.seed(42)
plane <- matrix(rnorm(100, sd = 5), 10)
library(raster)
r <- raster(plane)
f <- focal(r, matrix(1,3,3), min, pad=TRUE, na.rm=TRUE)
x <- r == f
mins <- mask(r, x, maskvalue=FALSE)
pts <- rasterToPoints(mins)
cells <- cellFromXY(x, pts)
r[cells]
plot(r)
text(mins, digits=1)
plot(rasterToPolygons(mins), add=TRUE)
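For a one-dimensional series, a common base R idiom for local extrema is to look at where the sign of the first difference changes. The sketch below uses a hypothetical helper name and a made-up vector; it assumes no flat stretches (equal neighbouring values), which would need extra handling.

```r
# Hypothetical helper: positions of strict local maxima and minima in a vector
local_extrema <- function(x) {
  turn <- diff(sign(diff(x)))  # -2 where rising turns to falling, +2 for the reverse
  list(peaks = which(turn < 0) + 1, valleys = which(turn > 0) + 1)
}

x <- c(1, 3, 2, 0, -2, 1, 4, 4.5, 3)  # toy series (assumed values)
ext <- local_extrema(x)
x[ext$peaks]    # values at the local maxima
x[ext$valleys]  # values at the local minima
```

For a data frame with about 50 columns, lapply(First, local_extrema) would return one list of peak and valley positions per column. The first and last elements are never reported, since they have only one neighbour.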