I have a function that should count the number of stops made by an object in motion. It takes a list of positions (named Trip in the function) and Duration, the length of the trip, which is not used yet but will be needed as the code develops.
To find the number of stops during a trip, I need to compare the positions within each Trip, which look like this:
x,y,z
10,11,13
12,11,14
13,11,15
....
20,11,35
20,11,35
20,11,35
The positions are compared with each other to find which are equal.
In the last three positions the object stays at the same location, so we can conclude that it was stopped. So, to detect a stop, each position has to be compared with the ones that follow.
I wrote this code:
StopsNumber <- function(Trip,Duration)
{
i=1
aux = Trip
while(i<length(Trip))
{
if(aux[i] == aux[i+1] && aux[i] == aux[i+2]){
Stop = aux[i]
NStops = Nstops+1
}
aux = [aux+1]
i=i+1
} # end
return (Stop,Nstops)
}
I think the problem is that I do not know how to build up a list of things. For instance, I don't know whether Stop = aux[i] works properly, because I want Stop to be a list (or a vector) holding the positions where the object stood still. As written, if there is more than one stop, the last one replaces the rest.
Could somebody help me?
Thank you
Your definitions of movement, intervals and stops are unclear, so the code below is fairly long to avoid misunderstandings. Otherwise it could be boiled down to one or two lines. First, some clear-cut definitions:
An interval is the time between two consecutive xyz-points.
Movement has occurred in an interval if its start and end points differ in space.
A stop is an interval of no movement that follows an interval of movement.
You can choose to assume that the object was (or was not) in movement before the first interval. Thus a stop can already happen in the first interval.
A tip: try the loop functions apply(), sapply(), lapply() and foreach() instead of the low-level for() and while().
the code
#your data added some more positions
mixed.vector = c(
10,11,13,
12,11,14,
13,11,15,
20,11,35,
20,11,35, #this is a stop
20,11,35,
13,11,25,
10,20,30,
10,20,30) #this is a stop
#convert data to an appropriate data structure.
#I suggest a matrix. A list of vectors would also do
#some tricks to convert (multiple ways to do this)
#mixed vector to matrix
xyz.matrix = matrix(mixed.vector,ncol=3,byrow=TRUE)
print(xyz.matrix) #each row is a position, columns are x, y and z respectively.
#matrix to list of vectors (if this structure is preferred)
list_of_vectors = split(xyz.matrix,1:dim(xyz.matrix)[1])
print(list_of_vectors)
#list of vectors to matrix (if this is how your data initially is ordered)
xyz.matrix = do.call(rbind,list_of_vectors)
print(xyz.matrix) #and we're back with a matrix
#function checking if intervals have no movement
#(total number of intervals is number of positions minus 1)
find_interval_with_movement = function(XYZ.m) {
nrows = dim(XYZ.m)[1] #scalar of n.position rows
interval_with_movement = !apply(XYZ.m[-nrows,]==XYZ.m[-1,],1,all) #check pairs of row if x y z match.
return(interval_with_movement)
}
#function finding stops, optional assuming the object was moving before first interval
find_stops = function(interval_movements,object.moving.at.t0=TRUE) {
intervals_to_search= c(object.moving.at.t0,interval_movements)
len = length(intervals_to_search)
#search for intervals with no movement where previous interval has movement
did.stop = sapply(2:len,function(i) all(intervals_to_search[(i-1):i] == c(TRUE,FALSE)))
return(did.stop)
}
#these intervals have no movement
print(!find_interval_with_movement(xyz.matrix))
#these intervals had no movement, where previous had
print(find_stops(find_interval_with_movement(xyz.matrix)))
#and the full number of stops
print(sum(find_stops(find_interval_with_movement(xyz.matrix))))
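For comparison, the "one or two lines" version could look roughly like the sketch below. It assumes the same definitions as above and a matrix with one position per row. The names count_stops and moving_at_t0 are made up for this illustration. It also returns the positions where the stops occur, since the question asked for them.

```r
# Sketch only: count stops in a matrix with one xyz position per row.
count_stops <- function(xyz, moving_at_t0 = TRUE) {
  n <- nrow(xyz)
  # TRUE where an interval has no movement (both end points identical)
  still <- rowSums(xyz[-1, , drop = FALSE] != xyz[-n, , drop = FALSE]) == 0
  # TRUE where the previous interval had movement
  moved_before <- c(moving_at_t0, !still)[seq_along(still)]
  stops <- still & moved_before
  list(n_stops = sum(stops),
       positions = xyz[which(stops) + 1, , drop = FALSE])
}

xyz <- matrix(c(10,11,13, 12,11,14, 13,11,15, 20,11,35, 20,11,35,
                20,11,35, 13,11,25, 10,20,30, 10,20,30),
              ncol = 3, byrow = TRUE)
res <- count_stops(xyz)
print(res$n_stops)    # number of stops
print(res$positions)  # one row per stop
```

Because positions is collected with a logical index rather than assigned inside a loop, a later stop does not overwrite an earlier one.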
I struggle with adapting the example of the function bigglm.data.frame in the package biglm to a case where chunksize is not constant and the chunks are instead identified by a factor, say "GROUP", in the input data frame, say "DF" (around 20 million rows in my case). My problem is not storing the data but understanding how to feed it gradually to bigglm. I have made a split version of DF along the variable GROUP, i.e. a list of data frames; call it DATALIST.
I understand that the function, or more exactly its subfunction datafun, must return the next chunk of data. So in my case I want it to move on to the next i in DATALIST[[i]]. I can equally use the original data frame, i.e. subsetting with DF$GROUP==i. My question is how to adapt the example function from the package to do this.
From the package (https://github.com/cran/biglm/blob/master/R/bigglm.R) the function is
function (formula, data, ..., chunksize = 5000)
{
n <- nrow(data)
cursor <- 0
datafun <- function(reset = FALSE) {
if (reset) {
cursor <<- 0
return(NULL)
}
if (cursor >= n)
return(NULL)
start <- cursor + 1
cursor <<- cursor + min(chunksize, n - cursor)
data[start:cursor, ]
}
rval <- bigglm(formula = formula, data = datafun, ...)
rval$call <- sys.call()
rval$call[[1]] <- as.name(.Generic)
rval
}
I am obviously not a good programmer, rather a simple user with a loop mindset, so I had expected bigglm to have an index that I could match to i, but there is none. I see that n refers to the number of rows, and that cursor starts from zero and then increases by chunksize. I know n from my data frame. I could also get cursor from the size of each chunk (length(DATALIST[[i]])), but I first need to identify the chunk itself, and that is where I am stuck.
Meanwhile I know I can just fit a glm to each chunk separately, but that is the more traditional way and I would love to have the big model fitted. One could also suggest equal chunk sizes, but I prepared the chunks precisely to make sure that no chunk contains only zeros or only ones (it is a logit model) once I have controlled for the combined fixed effects.
Thanks for any help!
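One way to approach this, offered as a sketch rather than a tested solution for your data: bigglm also accepts a function for its data argument, which is exactly what datafun is in the package code above. So the cursor can count chunks instead of rows. The helper name make_datafun below is made up, and the toy data frame only stands in for DF.

```r
# Sketch: a data function that walks through a list of data frames,
# mirroring the reset/NULL protocol of datafun in the package code.
make_datafun <- function(chunks) {
  i <- 0                                # index of the last chunk returned
  function(reset = FALSE) {
    if (reset) {
      i <<- 0                           # start over from the first chunk
      return(NULL)
    }
    if (i >= length(chunks))
      return(NULL)                      # no chunks left
    i <<- i + 1
    chunks[[i]]
  }
}

# Toy stand-in for DF, split along GROUP
DF <- data.frame(GROUP = rep(c("a", "b", "c"), times = c(2, 3, 4)),
                 y = c(0, 1, 1, 0, 1, 0, 0, 1, 1),
                 x = 1:9)
DATALIST <- split(DF, DF$GROUP)

datafun <- make_datafun(DATALIST)
while (!is.null(chunk <- datafun())) print(nrow(chunk))  # 2, 3, 4 rows

# Untested assumption about how it would be passed on:
# fit <- biglm::bigglm(y ~ x, data = make_datafun(DATALIST), family = binomial())
```

As far as I know, factor predictors need the same levels in every chunk, so it may be worth checking that split() has not left chunks with differing levels.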
I am trying to get the half-life of a process by finding the time at which the y-variable reaches half of its maximum value, and to apply this across different cases. I have tried two variations of which() in R, but neither gives me the result I want.
#rc and time are columns of a data.frame
time[which.max(rc)] # gives the time at rc-max, but i need the time at half rc-max
time[which(rc==max(rc)/2)] #returns numeric(0)
What can I do to get this value, so that I can apply it to other cases?
You could do something like this...
time <- 1:10 #sample data
rc <- exp(-(1:10))
uniroot( #finds roots of functions
approxfun(time, rc - max(rc) / 2), #linear interpolation function
range(time) #range of values to check
)$root #value of time where rc=max(rc)/2
[1] 1.790988
See the help pages for these functions for further details and options
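To apply this to other cases, the call can be wrapped in a small function. The sketch below is one possible way, not the only one: the name half_life and the column names case, time and rc are assumptions, and the search is restricted to the part of the curve after the peak so that uniroot() sees a sign change.

```r
# Sketch: time at which rc falls to half of its maximum (linear interpolation).
half_life <- function(time, rc) {
  after_peak <- which.max(rc):length(rc)   # only search from the peak onwards
  f <- approxfun(time[after_peak], rc[after_peak] - max(rc) / 2)
  uniroot(f, range(time[after_peak]))$root
}

half_life(1:10, exp(-(1:10)))  # about 1.79 for the sample data above

# Applying it per case, assuming a data frame with columns case, time and rc
d <- data.frame(case = rep(c("A", "B"), each = 10),
                time = rep(1:10, 2),
                rc = c(exp(-(1:10)), exp(-0.5 * (1:10))))
hl <- sapply(split(d, d$case), function(g) half_life(g$time, g$rc))
print(hl)
```

uniroot() stops with an error if rc never drops below half of its maximum within the observed times, which is probably the behaviour you want for incomplete curves.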
Trying to figure out how to loop through a vector and eliminate components containing a particular pattern above a predetermined limit. For example, in the following vector, I might want to keep just the first two instances of both the "a_a_" and "b_b_" components.
x <- c("a_a_a", "a_a_b", "a_a_c", "a_a_d", "b_b_a", "b_b_b", "b_b_c", "b_b_d")
The resulting vector, after the loop deleting extraneous components, would be like this:
x = "a_a_a", "a_a_b", "b_b_a", "b_b_b"
The tricky part is that the code must first detect what is contained in the pattern, then loop through the (extremely long) vector to find all matching patterns, and establish a means of counting instances so that once it hits that given level, it then eliminates all matching components thereafter.
Any help is greatly appreciated.
You can use grep to identify which elements have the patterns and keep only the first two.
patterns = c("a_a", "b_b")
keep = NULL
for(p in patterns) { keep = c(keep, grep(p,x)[1:2]) }
x = x[keep]
x
[1] "a_a_a" "a_a_b" "b_b_a" "b_b_b"
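If the patterns are not known in advance, or some pattern has fewer than two matches (in which case grep(p,x)[1:2] yields NA), a vectorised variant may be easier. This is only a sketch: it assumes that the pattern is everything before the last underscore, and the function name keep_first_n is made up.

```r
x <- c("a_a_a", "a_a_b", "a_a_c", "a_a_d", "b_b_a", "b_b_b", "b_b_c", "b_b_d")

# Sketch: keep the first n elements of each group.
keep_first_n <- function(x, n = 2) {
  prefix <- sub("_[^_]*$", "", x)        # assumption: group = text before the last "_"
  rank_in_group <- ave(seq_along(x), prefix, FUN = seq_along)  # 1, 2, 3, ... within each group
  x[rank_in_group <= n]
}

keep_first_n(x)
# [1] "a_a_a" "a_a_b" "b_b_a" "b_b_b"
```

This avoids an explicit loop over patterns, which may matter for an extremely long vector.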
I need some help to understand this type of code and what happens in it. For instance, take a vector x defined by the integers (8,6,5,4,2,1,9).
The first step of this function is to check whether the length of the vector is greater than 1. For x, this condition holds.
The next step is to find the position of the smallest value in the vector, which is position 6 here. But I don't understand what actually happens in the next steps, and why does it have to combine the results as a vector?
selsort <- function(x) {
if(length(x) > 1) {
mini <- which.min(x)
c(x[mini], selsort(x[-mini])) #selsort() somewhere in here -> recursion
} else x
}
In recursion, there are 2 key cases:
Base case - input produces a result directly
Recursive case - input causes the program to call itself again
In your function, the base case is when the length of x is not greater than 1. When this happens, we just return x. Once we reach the base case, the function is not called any more; all that remains is to backtrack through the previous recursive cases so that those selsort() calls finish executing.
The recursive case is when the length is greater than 1. For this, we combine the smallest value in our vector with the result of selsort() without that smallest value. This will continue until we reach the base case. So, we find the smallest value, remove it from the list, and then repeat with all of the values from the previous run except the one we selected. Once we reach the base case of there only being 1 element left (the largest one), we have no more minimum finding to do, so we just return the last element.
This is called selection sort, because we are specifically selecting 1 element each time (the smallest element). With large data, this is inefficient, but it is a natural way to think about sorting.
There are more efficient sorting algorithms. One nice one that is easy to understand is merge sort: Merge Sort in R
It puts the smallest number at the first position of the vector, removes this entry from the vector and recursively repeats this until the entire vector entries are sorted from smallest to largest number.
Example
In the first step
x <- x1 <- c(8,6,5,4,2,1,9)
the position of the smallest number in the vector is identified by selsort() with the which.min() function. This number is put at the first position. At the same time, this element is removed from the vector. Therefore in the next step one has
x2 <- c(8,6,5,4,2,9)
c(1,selsort(x2))
Now the algorithm searches for the smallest number in x2, which is 2, puts that one on the front and removes it from the vector, leading to:
x3 <- c(8,6,5,4,9)
c(1,c(2,selsort(x3)))
This is repeated until the length of the vector is equal to one. Then there is nothing left to sort and the last number is returned, which is the largest element of the initial vector.
The assignments x1, x2, x3... are mentioned here only to illustrate the sequence of operation of the code. This is done implicitly in the recursive function which uses only one vector x and reduces it by one entry at each iteration.
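To watch this sequence happen, one can add a print statement to the function. The variant below is only an illustration: the name selsort_trace and the depth argument are additions, and the sorting logic is unchanged.

```r
# Sketch: selsort() with a trace of the vector passed to each recursive call.
selsort_trace <- function(x, depth = 0) {
  # indent by recursion depth and show the current input
  cat(strrep("  ", depth), "selsort(", paste(x, collapse = ", "), ")\n", sep = "")
  if (length(x) > 1) {
    mini <- which.min(x)
    c(x[mini], selsort_trace(x[-mini], depth + 1))
  } else x
}

sorted <- selsort_trace(c(8, 6, 5, 4, 2, 1, 9))
print(sorted)
```

Each printed line is one call, and every call receives a vector that is one element shorter than the previous one.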
Hope this helps.
Say I have a vector defined as indexUsed = rep(NA, 10);
I want to assign a value to its ith element in each iteration.
for(i in 1:10){
indexUsed[i] = largestGradient(X, y, indexUsed[is.na(indexUsed)], score)
}
As you can see, I want to use indexUsed[1:(i-1)] to calculate the ith element. For the first element, however, I want a NULL or some other special value there, so that my function knows the input is empty (it then handles assigning the first element differently from the later steps).
I do not know whether this is a good way to write it. How do you usually do this?
I don't have a better way of doing this than with a for loop, but would love to see other people's responses. However, it does seem to me that your code should read
indexUsed[i] <- largestGradient(X, y, indexUsed[!is.na(indexUsed)], score)
For i=1, indexUsed[!is.na(indexUsed)] will be empty, and this should be the base case in your function. For every other iteration, it will retrieve elements 1 through i-1.
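A small runnable sketch of that pattern follows. Since largestGradient is not shown, pick_next below is a hypothetical stand-in that simply returns the first candidate not used yet; the point is only how the empty first-iteration input can be detected with length().

```r
# Hypothetical stand-in for largestGradient(): returns the first unused candidate.
pick_next <- function(used, candidates) {
  if (length(used) == 0) {
    # first iteration: nothing has been chosen yet (the special case)
    return(candidates[1])
  }
  setdiff(candidates, used)[1]
}

indexUsed <- rep(NA_integer_, 5)
for (i in seq_along(indexUsed)) {
  # indexUsed[!is.na(indexUsed)] is integer(0) for i = 1, then elements 1..(i-1)
  indexUsed[i] <- pick_next(indexUsed[!is.na(indexUsed)], 1:10)
}
print(indexUsed)
```

An empty vector therefore plays the role of the NULL you asked about, and no separate special value is needed.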