Comparing times within two vectors and finding the nearest for each element in R

I'm having trouble moving beyond basic programming toward more sophisticated techniques. Could you help me adjust this code?
There are two vectors of dates and times: one records when activities happen, and the other when triggers appear. The aim is to find, for each trigger, the nearest activity date/time that comes after it. The final result is the average of all these differences.
I have this code, and it works, but it's very slow on a large dataset.
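time_activities <- as.POSIXct(c("2008-09-14 22:15:14","2008-09-15 09:05:14","2008-09-16 14:05:14","2008-09-17 12:05:14"), format = "%Y-%m-%d %H:%M:%S")
time_triggers <- as.POSIXct(c("2008-09-15 06:05:14","2008-09-17 12:05:13"), format = "%Y-%m-%d %H:%M:%S")
# one delay per trigger; stays NA if no activity follows that trigger
result <- rep(NA_real_, length(time_triggers))
for (j in 1:length(time_triggers))
{
for(i in 1:length(time_activities))  # assumes time_activities is sorted in time
{
if(time_triggers[j]<time_activities[i])
{
# first activity after trigger j: delay in whole minutes, rounded up
result[j] = ceiling(difftime(time_activities[i], time_triggers[j], units = 'mins'))
break
}
}
}
print(mean(as.numeric(result), na.rm = TRUE))  # skip triggers with no later activity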
Can I get rid of the loop and do everything with vectors? Could you give me a hint about which function I could use to compare all the dates at once?

delay <- sapply(time_triggers, function(x) {
  d <- as.numeric(difftime(time_activities, x, units = 'mins'))  # minutes from trigger x to each activity
  min(d[d > 0]) })  # nearest later activity; Inf (with a warning) if there is none
mean(delay[is.finite(delay)])
This should do the trick. The apply family of functions is a concise replacement for an explicit for loop, although sapply() still calls the function once per trigger, so it is not necessarily much faster.
This gives the average number of minutes between each trigger and the first activity that follows it.
If you want the delay after each individual trigger rather than the mean, look at delay directly; its values correspond, in order, to the elements of time_triggers, with an infinite value where no activity follows a trigger.
UPDATE:
I updated the code to ignore the infinite values produced for triggers that have no later activity, as requested. Sadly, this means the code needs two statements rather than one. You could squeeze it all into one line, but then most of the computation would run twice, which is not very efficient.
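Because sapply() still loops internally, a fully vectorized alternative may scale better on large data. The sketch below swaps in a different technique, a binary search with base R's findInterval(), which is not what the answer above uses. It assumes the example data from the question (parsed here with tz = "UTC" to avoid daylight-saving surprises) and reproduces the question's rounding up to whole minutes.

```r
# Example data from the question; tz = "UTC" is an assumption
time_activities <- as.POSIXct(c("2008-09-14 22:15:14", "2008-09-15 09:05:14",
                                "2008-09-16 14:05:14", "2008-09-17 12:05:14"),
                              tz = "UTC", format = "%Y-%m-%d %H:%M:%S")
time_triggers <- as.POSIXct(c("2008-09-15 06:05:14", "2008-09-17 12:05:13"),
                            tz = "UTC", format = "%Y-%m-%d %H:%M:%S")

# findInterval() needs a non-decreasing vector, so sort the activities (in seconds)
act <- sort(as.numeric(time_activities))
trg <- as.numeric(time_triggers)

# findInterval(trg, act) gives the index of the last activity <= each trigger,
# so adding 1 points at the first activity strictly after it (NA past the end)
idx <- findInterval(trg, act) + 1
next_act <- act[idx]

# Delay in whole minutes, rounded up as in the original loop
delay <- ceiling((next_act - trg) / 60)
mean(delay, na.rm = TRUE)
```

Each trigger costs one binary search instead of a scan over all activities, and triggers with no later activity simply produce NA, which na.rm = TRUE drops.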

Related

Finding duration in seconds between time stamps in R?

I am trying to find the duration in seconds between adjacent time stamps. The first issue is that I am not sure whether R is treating each timestamp as a number. I tried using
TimeStamp <- read.csv("DateStamps.csv", header = TRUE, colClasses = "character")
However, I am not sure it is working, because empty cells, where there should be an NA, show nothing.
For the differences, I want to find four durations in seconds (gather order - start, walk to car - gather order, handoff - walk to car, return to store - handoff), each computed from adjacent columns. However, I am not sure how to do this, or how to write code that picks out the specific differences I want to calculate.
I think the NA issue is answered here:
Change the Blank Cells to "NA"
You need to tell R what constitutes an NA, for example with the na.strings argument of read.csv() (na.strings = "" treats empty cells as NA).
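The sketch below combines both parts of the question: na.strings = "" turns empty cells into NA, each column is then parsed as a date-time, and the difference between every column and the one before it is taken in seconds. The CSV content, the column names and the "%Y-%m-%d %H:%M:%S" format are hypothetical stand-ins, since the real DateStamps.csv is not shown.

```r
# Hypothetical CSV; column names stand in for the question's columns
csv <- "start,gather_order,walk_to_car,handoff,return_to_store
2023-01-01 10:00:00,2023-01-01 10:05:00,2023-01-01 10:07:30,2023-01-01 10:20:00,2023-01-01 10:35:00
2023-01-01 11:00:00,,2023-01-01 11:04:00,2023-01-01 11:15:00,2023-01-01 11:30:00"

# na.strings = "" makes empty cells NA instead of empty strings
TimeStamp <- read.csv(text = csv, colClasses = "character", na.strings = "")

# Parse every column as a date-time (assumed format); NA stays NA
stamps <- as.data.frame(lapply(TimeStamp, as.POSIXct, tz = "UTC",
                               format = "%Y-%m-%d %H:%M:%S"))

# Seconds between each column and the previous (adjacent) column
durations <- sapply(2:ncol(stamps), function(k)
  as.numeric(difftime(stamps[[k]], stamps[[k - 1]], units = "secs")))
colnames(durations) <- paste(names(stamps)[-1], "-", names(stamps)[-ncol(stamps)])
durations
```

A missing time stamp makes both durations that touch it NA, which is usually what you want rather than a silently wrong number.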

Unrecognized index variable [i] in R for-loop

I scripted a simple for-loop to iterate over each row of a data set to calculate the distance between two coordinates. The code uses the 'geosphere' package and the 'distm' function which takes two sets of coordinates and returns the distance in meters (which I convert to miles by multiplying by 0.00062137).
Here is my loop:
##For loop to find distance in miles for each coordinate pair
miles <- 0
for (i in i:3303) {
miles[i] <- distm(x = c(clean.zips[i,4], clean.zips[i,3]), y = c(clean.zips[i,7], clean.zips[i,6]))[,1] * 0.00062137
}
However, when I run it I receive an error:
Error: object 'i' not found
The thing is, I've run this code before and it worked. Other times, I get this error. I'm not changing any code; it just seems to work only some of the time, at random. I feel the loop must be constructed correctly if it does what I want on occasion, but why would it only work sometimes?
OK, I'm not certain what justifies the downvotes on this, but I guess I apologize to whoever thought them necessary.
The issue turned out to be starting the index with an actual numeric value, as Zheyuan suggested (i.e. using '1:3303' rather than 'i:3303'). It only appeared to work when a variable named 'i' already existed in the workspace (for example, left over from an earlier loop); in that case 'i:3303' started from that leftover value instead of raising an error. I feel like I've written loops before using 'i in i:xxx' without first defining 'i', but maybe not. Anyway, it's solved, and thank you!
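The sketch below shows the corrected loop (seq_len() starting from 1, with the result preallocated) next to a vectorized version that computes every row at once. To keep it runnable without the geosphere package, it swaps in a base-R haversine formula as a hypothetical stand-in for distm(); distm() defaults to an ellipsoidal distance, so real results will differ slightly. The data frame is also hypothetical and only mimics the question's column positions (3/4 = lat/lon of the first point, 6/7 = lat/lon of the second).

```r
# Hypothetical stand-in for clean.zips, matching the column positions used in the question
clean.zips <- data.frame(
  id1 = c("p1", "p2"), id2 = c("q1", "q2"),
  lat1 = c(40.7128, 34.0522), lon1 = c(-74.0060, -118.2437),
  note = c("same point", "two cities"),
  lat2 = c(40.7128, 41.8781), lon2 = c(-74.0060, -87.6298)
)

# Haversine distance in meters (stand-in for geosphere::distm); vectorized over inputs
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Corrected loop: index starts at 1 and the result is preallocated
miles <- numeric(nrow(clean.zips))
for (i in seq_len(nrow(clean.zips))) {
  miles[i] <- haversine_m(clean.zips[i, 4], clean.zips[i, 3],
                          clean.zips[i, 7], clean.zips[i, 6]) * 0.00062137
}

# Vectorized alternative: whole columns at once, no loop index to get wrong
miles_vec <- haversine_m(clean.zips[, 4], clean.zips[, 3],
                         clean.zips[, 7], clean.zips[, 6]) * 0.00062137
```

If you stay with geosphere, its distGeo() function also accepts two-column matrices of points and returns one distance per row, which avoids the loop in the same way.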

HW assignment for learning R from scratch

So I am taking a course that requires learning R and I am struggling with one of the questions:
In this question, you will practice calling one function from within another function. We will estimate the probability of rolling two sixes by simulating dice throws. (The correct probability to four decimal places is 0.0278, or 1 in 36).
(1) Create a function roll.dice() that takes a number ndice and returns the result of rolling ndice number of dice. These are six-sided dice that can return numbers between 1 and 6. For example roll.dice(ndice=2) might return 4 6. Use the sample() function, paying attention to the replace option.
(2) Now create a function prob.sixes() with parameter nsamples, that first sets j equal to 0, and then calls roll.dice() multiple times (nsample number of times). Every time that roll.dice() returns two sixes, add one to j. Then return the probability of throwing two sixes, which is j divided by nsamples.
I think I have part one working, so this is what I have:
roll.dice<-function(ndice)
{
roll<-sample(1:6,ndice,TRUE)
return(roll)
}
roll.dice(ndice=2)
but I am struggling with part two. This is what I have so far:
prob.sixes<-function(nsamples) {
j<-vector
j<-0
roll.dice(nsamples)
if (roll.dice==6) {
j<-j+1
return(j)
}
}
prob.sixes(nsamples=3)
Sorry for all the text, but can anybody help me?
Your code has a couple of problems that I can see. The first one is the interpretation of the question. The question says:
Now create a function prob.sixes() with parameter nsamples, that first sets j equal to 0, and then calls roll.dice() multiple times (nsample number of times).
Check your code: are you doing this, or are you calling roll.dice() only once (and with nsamples dice instead of two)? Look for a way to do the same thing (in your case, calling roll.dice()) several times; you might consider a for loop. You also need to store the result of each call in a variable, something like
rolled = roll.dice(2)
Second problem:
Every time that roll.dice() returns two sixes, add one to j.
You are checking whether roll.dice==6, which has two problems. First, roll.dice is a function, not a variable, so comparing it with 6 raises an error instead of testing anything. Second, you don't want to check whether the result equals a single six; you want to know whether it is a pair of sixes. How can you write "a pair of sixes"?
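For reference, here is one possible completion of part two along the lines of the hints above, not the only valid one: call roll.dice(2) nsamples times in a for loop, store each result, and use all(rolled == 6) as the test for "a pair of sixes". The seed is arbitrary and only makes the run reproducible.

```r
roll.dice <- function(ndice) {
  sample(1:6, ndice, replace = TRUE)  # ndice independent six-sided dice
}

prob.sixes <- function(nsamples) {
  j <- 0
  for (s in seq_len(nsamples)) {
    rolled <- roll.dice(2)             # roll a pair of dice
    if (all(rolled == 6)) j <- j + 1   # count only a pair of sixes
  }
  j / nsamples                         # estimated probability
}

set.seed(1)          # arbitrary seed for reproducibility
prob.sixes(100000)   # should be close to 1/36, about 0.0278
```

With 100,000 samples the estimate typically lands within about 0.001 of 0.0278; smaller nsamples values give noticeably noisier estimates.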

Stopping computation in R; will I lose results up to that point?

I am running some matrix algebra on a large data set. Each iteration of the outermost loop fills one element of each of two vectors that are preallocated to 64,797 elements. I print a counter for the outer loop to check progress, which might not be ideal. According to Task Manager, R is still working and using a good deal of memory and processor time. However, the R console is not responding, and the last number I can read is around row 31,000 (there is scroll space, but I cannot scroll down to see the last number printed). I do not know whether the program is "hung" (no longer iterating the outer loop) and I am wasting my time waiting, or whether I should stick it out. The machine has been running for a few days. Given the program's structure, I can END the process and restart from the last row populated. However, if I end the process, will I lose the data already assigned to the vectors I am populating? That would be bad, as I'd have to start all over. Here is the code. The end goals are the vectors save.trace and save.trace2.
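library(MASS)  # ginv() comes from MASS
# Assumed setup (not shown in the original): coor.cal, xmat and bw already exist.
# Run these lines once; re-running them would wipe results already saved.
n <- nrow(coor.cal)
w <- numeric(n)                                 # weights for the current observation i
xtw <- matrix(0, nrow = 27, ncol = nrow(xmat))  # X'W for the current observation i
save.trace <- numeric(n)
save.trace2 <- numeric(n)
for (i in 1:n){
print(i)  # progress counter
for (j in 1:n){
#finding distances between observations
dist<-( (coor.cal[i,1]-coor.cal[j,1])^2 + (coor.cal[i,2]-coor.cal[j,2])^2)^.5
w[j]<-exp(-0.5*((dist/bw)^2))  # Gaussian kernel weight of observation j for observation i
if (dist>bw){w[j]<-0}  # zero weight beyond the bandwidth
}
for (k in 1:27){
xv<-xmat[ ,k]
xtw[k, ]<-xv*w
}
xtwx<-xtw%*%xmat
xtwx.inv<-ginv(xtwx)
xtwx.inv.xtw<-xtwx.inv%*%xtw
xrow<-xmat[i, ]
temp<-xrow%*%xtwx.inv.xtw
save.trace[i]<-temp[i]
save.trace2[i]<-sum(temp*temp)
}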
Here's a small example you can try yourself.
saved <- 0
for(i in 1:100)
{
saved <- i
Sys.sleep(0.1)
}
Run this code and press Esc at some point in the next 10 seconds (before the loop completes).
Then look at the value of saved. It should be more than 0, showing that the progress made before the interruption has been kept.
I did not have enough memory to risk an experiment to answer my question, so I borrowed another machine and tried it: you CAN interrupt the computation (for example with Esc) and still retain previously stored values, as long as the R session itself is not killed. I had not run into this problem before. I tried to delete my question but could not, so I'll leave it in case it helps someone else.
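Interrupting with Esc keeps values in memory, but if the R process itself is killed (for example from Task Manager), everything held in memory is lost. A minimal sketch of guarding against that with periodic checkpoints via saveRDS() and readRDS() follows; slow_step(), n, the five-row interval and the checkpoint path are all hypothetical placeholders for the real per-row computation.

```r
# Hypothetical stand-in for one expensive outer-loop iteration
slow_step <- function(i) sqrt(i)

n <- 20
# Hypothetical checkpoint location; in real use choose a fixed path so a new
# R session can find the file after the old one was killed
checkpoint <- tempfile(fileext = ".rds")

# Resume from the checkpoint if it exists, otherwise start with all NA
save.trace <- if (file.exists(checkpoint)) readRDS(checkpoint) else rep(NA_real_, n)

# Only compute the rows that are still missing
for (i in which(is.na(save.trace))) {
  save.trace[i] <- slow_step(i)
  # Write progress to disk every 5 rows and after the last row
  if (i %% 5 == 0 || i == n) saveRDS(save.trace, checkpoint)
}
```

Writing to disk costs a little time, so the interval is a trade-off between overhead and how much work a crash can lose.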

Limiting Window Size and/or Removing Specific Rows of Time Values In R

I'm trying to figure out how to observe just one particular section of the data in the graph below (e.g. 5pm onwards). I know there are basically two methods of doing this:
1) Method 1: Limiting the window size, which requires the following function:
symbols(Data$Times, Data$y, circles=Data$z, xlim=c("5:00pm","10:00pm"))
The problem is, I get an "invalid 'xlim' value" error when I try to input the two time endpoints.
2) Method 2: Clearing out the rows in Data$Times that have values over 5pm.
The problem here is that I'm not sure how to sort the rows by earliest time -> latest time OR how to define a new variable such that TimesPM <- Data$Times>"5pm" (what I typed just now obviously did not work.)
Any ideas? Thanks in advance.
ETA: This is what I plotted:
Times<-strptime(DATA$Time,format="%I:%M%p")
symbols(Times, y, circles=z, xaxt='n', inches=.4, fg="3", bg=(a), xlab="Times", ylab="y")
axis.POSIXct(1, at=Times, format="%I:%M%p")
Both approaches have the problem that, in all likelihood, your date-time values will not compare correctly with character strings like "5:00pm", even after the coercion done by the ">" comparison operator. To get the best advice, you need to show the output of str(DATA$Times), dput(head(DATA$Times)) or class(Data$Times). Plotting functions generally recognize valid date or date-time classes or their numeric representation. If the ordering operation is not working, that raises the question of whether you have a proper class. But your axis labeling suggests a date-time format of some sort, so we just need to figure out which class it really is.
Because your Time column appears to be a character vector that you convert with strptime(), you probably want to apply the restriction before you pass DATA$Time to strptime(). You still have not offered the requested clarifications, so I cannot give tested or very specific code, but you might do something like
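hr <- strptime(DATA$Time, format="%I:%M%p")$hour  # hour of day, 0-23
keep <- !is.na(hr) & hr >= 17 & hr <= 22          # 5pm through 10:59pm
Times <- strptime(DATA$Time[keep], format="%I:%M%p")
# subset y and z with the same 'keep' so they stay aligned with Times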
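Both methods can be sketched on hypothetical data, assuming Time holds strings like "5:00pm" as the strptime() call in the question suggests. Method 2 keeps only the evening rows; for Method 1, the "invalid 'xlim' value" error appears because character strings are not numeric, so the sketch converts the endpoints the same way as the data and passes them as plain numbers (seconds). tz = "UTC" is an assumption; strptime() places all times on the current date, so data and endpoints share one scale.

```r
# Hypothetical data: Time stored as character strings like "5:00pm"
DATA <- data.frame(Time = c("9:15am", "4:59pm", "5:00pm", "7:30pm", "9:45pm", "11:10pm"),
                   y = 1:6, z = c(2, 3, 1, 4, 2, 5), stringsAsFactors = FALSE)

# Parse once; every time lands on the same (current) date
parsed <- strptime(DATA$Time, format = "%I:%M%p", tz = "UTC")

# Method 2: keep rows from 5pm up to (but not including) 11pm
keep <- !is.na(parsed) & parsed$hour >= 17 & parsed$hour <= 22
pm <- DATA[keep, ]
Times_pm <- parsed[keep]

# Method 1: xlim on the same numeric scale as the x values, not "5:00pm" strings
xlim_pm <- as.numeric(as.POSIXct(strptime(c("5:00pm", "10:00pm"),
                                          format = "%I:%M%p", tz = "UTC")))
# e.g. symbols(as.POSIXct(parsed), DATA$y, circles = DATA$z, xlim = xlim_pm, xaxt = "n")
pm$Time
```

Subsetting whole rows of DATA (rather than only the Time column) keeps y and z aligned with the filtered times.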
