My dataset looks like this (it shows how long each person takes to travel from one place to another, using different cars and making two stops along the way).
I have some NAs in the columns "stop time 1" and "stop time 2" and I would like to impute them. Those NAs are not there because people decided to make only one stop or none, but simply because they forgot to record the time.
Can you please suggest how to do this in R? I'm not a data scientist and I'm really new to this. I need to replace those NAs with time stamps (including only hour and minute, e.g. 08:34).
Thanks!!
P.S.: Will the algorithm be able to impute those two columns while taking into account that, since they represent stops between departure and arrival, the imputed times have to fall between departure and arrival?
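One possible way to keep imputed stops inside the trip is to impute a relative position rather than a clock time. The sketch below is only an illustration under several assumptions: the column names (departure, stop1, stop2, arrival) and the data are made up, no trip crosses midnight, and at least one row has both stops recorded. It places a missing stop at the average relative position seen in the complete rows, so the result always lies between the neighbouring time stamps.

```r
# Hypothetical data: column names and values are made up for illustration.
trips <- data.frame(
  departure = c("08:00", "09:10", "07:30", "10:00"),
  stop1     = c("08:20", NA,      "07:50", "10:30"),
  stop2     = c("08:40", "09:50", NA,      "11:00"),
  arrival   = c("09:00", "10:10", "08:30", "11:30"),
  stringsAsFactors = FALSE
)

# Convert "HH:MM" strings to minutes since midnight (NA stays NA).
to_min <- function(x) {
  t <- as.POSIXlt(x, format = "%H:%M", tz = "UTC")
  t$hour * 60 + t$min
}

# Convert minutes since midnight back to "HH:MM" strings.
to_hhmm <- function(m) {
  ifelse(is.na(m), NA_character_,
         sprintf("%02d:%02d", as.integer(m %/% 60), as.integer(m %% 60)))
}

dep <- to_min(trips$departure)
arr <- to_min(trips$arrival)
s1  <- to_min(trips$stop1)
s2  <- to_min(trips$stop2)

# Average relative positions, estimated from the rows where they are observed.
rel1  <- mean((s1 - dep) / (s2 - dep), na.rm = TRUE)   # stop 1 between departure and stop 2
rel2  <- mean((s2 - s1) / (arr - s1), na.rm = TRUE)    # stop 2 between stop 1 and arrival
frac2 <- mean((s2 - dep) / (arr - dep), na.rm = TRUE)  # stop 2 within the whole trip

# Fill stop 2 first: use stop 1 as the lower bound when it is known.
miss2 <- is.na(s2)
s2[miss2] <- ifelse(is.na(s1[miss2]),
                    dep[miss2] + frac2 * (arr[miss2] - dep[miss2]),
                    s1[miss2] + rel2 * (arr[miss2] - s1[miss2]))

# Then fill stop 1 between departure and the (now complete) stop 2.
miss1 <- is.na(s1)
s1[miss1] <- dep[miss1] + rel1 * (s2[miss1] - dep[miss1])

trips$stop1 <- to_hhmm(round(s1))
trips$stop2 <- to_hhmm(round(s2))
trips
```

Because every imputed value is a fraction between 0 and 1 of a bracketing interval, the ordering departure, stop 1, stop 2, arrival is preserved. A model-based imputation method would not respect that constraint unless it is applied to such relative positions or checked afterwards.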
I am working with some observation data and have run into a bit of an issue beyond my current capabilities. I surveyed different polygons (the column "PolygonID" in the screenshot) for lizards two times during a survey season. I want to determine the total search effort (shown in the column "Effort") for each individual polygon within each survey round. Problem is, the software I was using to collect the data sometimes creates unnecessary repeats for polygons within a survey round. There is an example of this in the screenshot for the rows with PolygonID P3.
Most of the time it does not affect the effort calculations because the start and end time for the rows (the fields used to calculate effort) are the same, and I know how to filter the dataset so it only shows one line per polygon per survey, but I have reason to be concerned there might be some lines where the software glitched and assigned incorrect start and end times for one of the repeat lines. Is there a way I can test whether start and end time match for any such repeats with R, rather than manually going through all the data?
Thank you!
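A check like this can probably be done by counting how many distinct start/end combinations exist within each polygon and survey round. The following is a minimal sketch in base R; the column names Round, Start and End and the sample rows are assumptions, since the real layout is only visible in the screenshot.

```r
# Hypothetical data: only PolygonID is named in the question, the rest is assumed.
surveys <- data.frame(
  PolygonID = c("P1", "P2", "P3", "P3", "P4", "P4"),
  Round     = c(1, 1, 1, 1, 1, 1),
  Start     = c("09:00", "09:30", "10:00", "10:00", "11:00", "11:05"),
  End       = c("09:20", "09:55", "10:25", "10:25", "11:30", "11:30"),
  stringsAsFactors = FALSE
)

# One key per polygon and round, one key per start/end pair.
grp   <- paste(surveys$PolygonID, surveys$Round)
times <- paste(surveys$Start, surveys$End)

# Number of distinct start/end pairs within each polygon and round.
n_versions <- tapply(times, grp, function(x) length(unique(x)))

# Groups with more than one distinct pair contain conflicting repeats.
conflicting  <- names(n_versions)[n_versions > 1]
suspect_rows <- surveys[grp %in% conflicting, ]
suspect_rows
```

In this made-up example the repeated P3 rows agree, so they are not flagged, while the two P4 rows differ in their start time and are returned for manual inspection.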
I'm quite new to time series and I am wondering what the best way is to identify the starting date of a period with low values of a variable. In this example, I would first want to i) identify whether there is such a period of, let's say, at least 5 values that are similarly low and ii) find the starting date of this period.
So in this example (https://i.stack.imgur.com/IxLQg.png) for the first 3 individuals (c15793, c15798 and c3556) I want to figure out that there is such a period and that the starting date is on the 20th of May for c15798 and on the 22nd of May for the other two. But c5157 should be identified as not having such a period.
I have no clue on how I could identify such a period and I was hoping someone would have an idea and point me to a method or a point where I could start. Everything I can think of would require some sort of threshold (e.g. the difference between consecutive measurements) which I don't know how to choose. So if anyone has a more elegant idea or a good idea on how to set a threshold, I would be more than happy to learn about it.
Thanks so much in advance!
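One simple starting point is run-length encoding with rle(), which finds stretches of consecutive values below a cut-off. The sketch below does not solve the problem of choosing the threshold: the value 25, the function name find_low_period, the year 2020 and the data are all assumptions for illustration only.

```r
# Return the first date of the first run of at least `min_len` consecutive
# values below `threshold`, or NA if there is no such run.
# (Hypothetical helper; the threshold has to be chosen by the user.)
find_low_period <- function(dates, values, threshold, min_len = 5) {
  runs   <- rle(values < threshold)
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  hit    <- which(runs$values & runs$lengths >= min_len)
  if (length(hit) == 0) return(as.Date(NA))
  dates[starts[hit[1]]]
}

# Made-up series for one individual: high values, then a low period.
dates  <- seq(as.Date("2020-05-15"), by = "day", length.out = 12)
values <- c(50, 48, 52, 49, 51, 10, 9, 11, 8, 10, 9, 12)

find_low_period(dates, values, threshold = 25)

# A series without a low period gives NA.
find_low_period(dates, rep(50, 12), threshold = 25)
```

If a fixed threshold is hard to justify, the same function could be fed a data-driven cut-off (for example a fraction of each individual's median), or a changepoint method could be used instead. Both would need checking against the real data.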
This is my first time ever asking a question on Stack Overflow and I'm a programming novice so any advice as to how to improve my question asking abilities would be appreciated.
Onto my question: I have two csv files, one containing three columns (date time in dd/mm/yyyy hh:(00 or 30) format, production of a certain product, and demand for said product), and the other containing several columns (decomposition of the date time into year, month, day, hour, and whether it is :00 or :30 represented by 1 or 2 respectively, alongside several columns for independent variables which may affect production/demand of said product).
I've only played around with the first csv file, converting the string into a datetime object but the ts() function won't recognise the datetime objects as my times. I've tried adjusting the frequency parameter but ultimately failed and have no idea how to create a time series using half hourly data. Would appreciate any help.
Thanks in advance!
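For what it's worth, ts() does not store date-times at all. It only needs a start position and a frequency, so half-hourly data is usually described as 48 observations per day. The following is a minimal sketch with made-up values and assumed column names (datetime, production); it also assumes the series is regular, with no missing half hours.

```r
# Hypothetical extract of the first csv file.
raw <- data.frame(
  datetime   = c("01/03/2021 00:00", "01/03/2021 00:30",
                 "01/03/2021 01:00", "01/03/2021 01:30"),
  production = c(10, 12, 11, 13),
  stringsAsFactors = FALSE
)

# Parse the dd/mm/yyyy hh:mm strings.
raw$datetime <- as.POSIXct(raw$datetime, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Position of the first observation within its day (1 = 00:00, 2 = 00:30, ...).
first <- as.POSIXlt(raw$datetime[1])
slot  <- first$hour * 2 + first$min %/% 30 + 1

# One "time unit" is a day, with 48 half-hourly samples per unit.
prod_ts <- ts(raw$production, start = c(1, slot), frequency = 48)
prod_ts
```

Here the time axis counts days from an arbitrary day 1, so the original date-times should be kept alongside for labelling. If the actual time stamps need to travel with the series, an irregular time series class such as those in the zoo or xts packages may be a better fit.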
My suggestion is to apply difftime to all your time data. For instance, as in the following code, you can use your initial time (the time of the first record) as time_start for all comparisons and the other times as time_finish. It then returns the time intervals as a number of seconds, and you can use the other column values as the values at those time stamps.
interval <- as.integer(difftime(strptime(time_finish, "%H:%M"), strptime(time_start, "%H:%M"), units = "secs"))
Second 0 10 15 ....
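As a self-contained illustration of the difftime idea, here is a small sketch with made-up times; the vector name times is an assumption and stands in for the recorded time column.

```r
# Hypothetical recorded times in "HH:MM" format.
times <- c("08:00", "08:10", "08:15")

# Parse once, then measure every record against the first one.
parsed   <- strptime(times, "%H:%M", tz = "UTC")
interval <- as.integer(difftime(parsed, parsed[1], units = "secs"))
interval
```

The result is one integer per record, giving the number of seconds elapsed since the first record.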
I currently have a dataframe which includes a running timeline of POSIXct (see below).
Basically, given a starting time of my choosing, I want to be able to take rows at specific time intervals. For example, say I start taking rows at 4 pm: I then want to take 9 minutes of rows, then not take anything for the next two minutes, then take nine minutes again. I'm guessing the best approach is possibly indexing, but I also thought something like the lubridate package could be used; I'm not sure exactly how to do it.
Thanks!
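One way to do this with plain indexing is to compute the minutes elapsed since the chosen start and keep rows whose position within each 11-minute cycle (9 on, 2 off) falls in the first 9 minutes. This is a minimal sketch; the data frame timeline, its column time and the dates are made up.

```r
# Hypothetical running timeline, one row per minute.
timeline <- data.frame(
  time = seq(as.POSIXct("2021-01-01 15:55:00", tz = "UTC"),
             by = "1 min", length.out = 40)
)

# Chosen starting time (4 pm).
start <- as.POSIXct("2021-01-01 16:00:00", tz = "UTC")

# Minutes elapsed since the start (negative before the start).
elapsed <- as.numeric(difftime(timeline$time, start, units = "mins"))

# Keep rows at or after the start that fall in the first 9 minutes
# of each 11-minute cycle.
keep     <- elapsed >= 0 & (elapsed %% 11) < 9
selected <- timeline[keep, , drop = FALSE]
head(selected, 12)
```

The cycle lengths 9 and 2 can be pulled out into variables if the pattern needs to change. The same logic works for timelines that are not spaced exactly one minute apart, since it is based on elapsed time rather than row position.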
I have time series data that I'm trying to analyse in R. It was provided as a CSV from excel, which I subsequently read as a data.frame all. Let's say it has two columns: all$date and all$people, representing the count of people on a particular date. The frequency is hence daily.
Being from Excel, the dates are integers representing the number of days since 1900-01-01.
I could read the data as people = ts(all$people, start=c(all$date[1], 1), frequency=365); but that gives a silly start value of almost 40000 because the data starts in 2006. The start parameter doesn't take a date object, according to ?ts, so I can't just use as.Date():
ts - ...
start: the time of the first observation. Either a single number
or a vector of two integers, which specify a natural time unit and
a (1-based) number of samples into the time unit. See the examples
for the use of the second form.
I could of course set start=1, but it's a bit painful to figure out what season we're in when the plot tells me interesting things are happening around day 2100. (To be clear, setting frequency=365 does tell me what year we're in, but isn't useful for more precise dates.) Is there a useful way of expressing the date in ts in a human-readable form so that I don't have to keep calling as.Date() to understand when the interesting features are happening?
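One option that avoids ts altogether is to convert the Excel serial numbers to Date once and plot against that column, so the axis is labelled with real dates. The sketch below uses made-up counts. The origin "1899-12-30" is the one given in ?as.Date for dates exported from Excel on Windows, and it is an assumption that this file follows that convention.

```r
# Hypothetical extract: Excel serial dates and daily counts.
all <- data.frame(
  date   = c(38718, 38719, 38720),
  people = c(120, 135, 128)
)

# Convert Excel serial numbers to Date (Windows Excel convention, see ?as.Date).
all$day <- as.Date(all$date, origin = "1899-12-30")

# Plot against real dates so the axis is human-readable.
plot(all$day, all$people, type = "l", xlab = "Date", ylab = "People")
```

If the time series methods of ts are still needed, the Date column can be kept next to the ts object for lookups. A date-indexed class such as zoo (from the zoo package) is another possibility for daily data.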