Extracting dates from columns and sorting them in R

Dear colleagues, I have the following dataset:
Time1 Signal1 Time2 Signal2 Time3 Signal3
2018-05-06 17:41:44 Value 1 2018-05-06 17:32:39 Value 1 2018-05-07 00:06:00 .....
The TimeX columns are in POSIXct format. Because the signals were recorded at different times, I am trying to do a custom resampling, and for that I need to extract the timestamps of each signal.
I need to store the timestamps of all signals in one vector and sort this vector in ascending order.
I have tried:
NewTime<-sort(dataset[,c(1,3,5)])
Error: Can't use matrix or array for column indexing
Also with:
NewTime<-sort(unlist(Time_Trend[, c(1,3,5)]))
But with this last approach I lose the date format. Is there any way of doing this without losing the POSIXct format, and without ending up with the vector in a messy format?
Finally I have tried with this:
NewTime<-cbind(data$X1,data$X3, data$X5)
actualTime<-as.POSIXct(actualTime, origin="2018-05-06 07:50:32") #lowest value
But it returns a vector with dates in the year 2066. Has anyone done this before?

If we want to order the rows based on multiple columns:
dataset[do.call(order, dataset[,c(1,3,5)]),]
If we want to create a single vector of datetime values and then sort it:
sort(do.call(`c`, dataset[c(1, 3, 5)]))
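As a minimal sketch of the second approach, the example below builds a small made-up data frame shaped like the one in the question (the values and the UTC time zone are assumptions) and combines the three time columns into one sorted POSIXct vector. It also shows the unlist() route: unlist() returns plain seconds since 1970-01-01, so converting back needs origin = "1970-01-01". Supplying a 2018 origin instead shifts every value by about 48 years, which is consistent with the 2066 dates reported in the question.

```r
# Made-up data in the same layout as the question (values are assumptions)
dataset <- data.frame(
  Time1   = as.POSIXct(c("2018-05-06 17:41:44", "2018-05-06 18:10:00"), tz = "UTC"),
  Signal1 = c(1.0, 1.1),
  Time2   = as.POSIXct(c("2018-05-06 17:32:39", "2018-05-06 17:55:00"), tz = "UTC"),
  Signal2 = c(2.0, 2.1),
  Time3   = as.POSIXct(c("2018-05-07 00:06:00", "2018-05-07 00:20:00"), tz = "UTC"),
  Signal3 = c(3.0, 3.1)
)

# Route 1: c() has a POSIXct method, so the class survives the combination
time_cols <- unname(as.list(dataset[c(1, 3, 5)]))
NewTime <- sort(do.call(c, time_cols))

# Route 2: unlist() drops the class and returns seconds since 1970-01-01,
# so the origin used to convert back must be the Unix epoch
secs <- sort(unlist(dataset[c(1, 3, 5)], use.names = FALSE))
NewTime2 <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

print(NewTime)
```

Both routes should give the same instants; the first one simply avoids the round trip through plain numbers.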

Related

Comparing dates in a dataframe and appending info based on comparison result in R

So I am lost with the following problem:
I have a dataframe in which one column (STARTED) contains the starting time of a survey, and several others contain the survey schedule of that survey participant (D5 to D10: only the planned survey dates; D17 to D50: the planned send-out times of the measurements per day). I'd like to create two columns that indicate which survey day (1-6) and which measurement per day (1-6) this survey corresponds to.
First problem is the format (!)...
STARTED has the format %Y-%m-%d %H:%M:%S, D5 to D10 %d.%m.%Y and D17 to D50 %d.%m.%Y %H:%M.
I tried dmy_hms() from lubridate, parse_date_time(), and simply as.POSIXct(), but I always fail to get STARTED and the D17 to D50 section into a comparable format. Any solutions on this one?
After just separating STARTED into date & time columns, I was able to compare using ifelse() with D5 to D10 and to create the column of day running from 1 to 6.
This might be more elegant with something like which(), but I was not able to create a vectorized version of this, as which(<<D5:D10>> == STARTED) would need to do the comparison per row. Does anyone have a solution for this?
And lastly, how on earth can I set up the second column indicating the measurement time? The first and last surveys of the day are easy, as they are also uniquely labelled, but for the other four I would need to check per day whether the starting time is before the planned time of the following survey. I could imagine just checking whether STARTED falls between two adjacent planned survey times. With POSIXct objects that might work, if I can parse the different formats.
Help is greatly appreciated, thanks!
[Screenshot of the first rows of the data, taken with View()]
For these first few rows, the intended variable day would need to be c(1,2,1,1,1,2,2) and measurement c(3,2,4,2,1,2,3).
Your other columns are not formatted with %d.%m.%Y; instead they use either %d.%m.%y (date only) or %d.%m.%y %H:%M. Note the change from %Y to %y.
Try:
as.Date("20.05.22", format = "%d.%m.%y")
# [1] "2022-05-20"
as.POSIXct("20.05.22 06:00", format = "%d.%m.%y %H:%M")
# [1] "2022-05-20 06:00:00 EDT"
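Once all columns are parsed with the right format, the row-wise comparison can be vectorised. The sketch below is only an illustration under stated assumptions: the data are made up, only two date columns (D5, D6) are shown, and T1 and T2 are hypothetical stand-ins for the planned send-out times of the matching day. The measurement number is taken to be the count of planned send-out times that had already passed when the survey was started.

```r
# Made-up example rows; T1 and T2 are hypothetical send-out columns
survey <- data.frame(
  STARTED = c("2022-05-20 09:15:00", "2022-05-21 07:30:00", "2022-05-20 06:05:00"),
  D5 = c("20.05.22", "20.05.22", "20.05.22"),
  D6 = c("21.05.22", "21.05.22", "21.05.22"),
  T1 = c("20.05.22 06:00", "21.05.22 06:00", "20.05.22 06:00"),
  T2 = c("20.05.22 09:00", "21.05.22 09:00", "20.05.22 09:00"),
  stringsAsFactors = FALSE
)

started <- as.POSIXct(survey$STARTED, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Planned dates as a numeric matrix (rows = surveys, columns = D5, D6)
planned <- sapply(survey[c("D5", "D6")], function(x) {
  as.numeric(as.Date(x, format = "%d.%m.%y"))
})
started_day <- as.numeric(as.Date(started))

# The vector recycles down the columns, so each row is compared with its own date
survey$day <- apply(planned == started_day, 1, function(hit) which(hit)[1])

# Planned send-out times as seconds; count how many lie at or before STARTED
send_out <- sapply(survey[c("T1", "T2")], function(x) {
  as.numeric(as.POSIXct(x, format = "%d.%m.%y %H:%M", tz = "UTC"))
})
survey$measurement <- rowSums(send_out <= as.numeric(started))

print(survey[c("STARTED", "day", "measurement")])
```

With the real data, the same pattern would apply to D5 to D10 for the day and to the send-out columns of the matched day for the measurement.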

Formatting 24-hour time variable to capture observations in different ranges

I currently have a data frame with a column for Start.Time (imported from a *.csv file), and the format is in 24 hour format (e.g., 20:00:00 equals 8pm). My goal is to capture observations with a start time in various intervals (e.g., between 9:00:00 and 10:00:00), which also meet other criteria. However, it seems that R sorts this 'character' variable in a way that does not align with how our day goes (e.g., 14:00:00 is considered a lower value than 9:00:00).
For example, below is a line of code that works as intended, where I am capturing observations on two different trail segments, which had a start time between 8:00:00 and 9:00:00.
RLLtoMist8.9<-sum((dataset1$Trail.Segment==52|dataset1$Trail.Segment==55) &
(dataset1$Start.Time>="8:00" & dataset1$Start.Time < "9:00"),
na.rm=TRUE)
RLLtoMist8.9
But, this code below does not work as intended, as R is 'valuing' 9:00:00 as greater than 10:00:00.
RLLtoMist9.10 <-
sum((dataset1$Trail.Segment==52|dataset1$Trail.Segment==55) &
(dataset1$Start.Time>="9:00:00 AM" & dataset1$Start.Time < "10:00:00 AM"),
na.rm=TRUE)
It's certainly true that character types are sorted so that "14:00" is less than "9:00". However R has a datetime class which would sort times correctly once a character representation has been parsed.
a <- as.POSIXct("14:00", format="%H:%M")
b <- as.POSIXct("8:00", format="%H:%M")
# test
> a < b
[1] FALSE
You would be able to convert an entire column with:
dataset1$Start.Time <- as.POSIXct(dataset1$Start.Time, format="%H:%M")
The dates of a and b were the system date at the time of conversion, so if you printed them you would see dates and times in the default format. There are packages, such as chron, that let you use just times, but POSIXt objects necessarily have both dates and times. See ?DateTimeClasses. The lubridate package also has an 'interval' class, and there is a difftime function in base R.
There are also the seq.POSIXt and cut.POSIXt functions, either of which could be used to create multiple time or date boundaries for categorical transformations of datetimes.
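A small sketch of both ideas in base R follows. The vectors start_chr and segment are made-up stand-ins for dataset1$Start.Time and dataset1$Trail.Segment, and the fixed dummy date 2000-01-01 is an assumption chosen so that the result does not depend on the day the code is run.

```r
# Made-up stand-ins for dataset1$Start.Time and dataset1$Trail.Segment
start_chr <- c("8:15:00", "9:05:00", "9:59:59", "10:00:00", "14:30:00")
segment   <- c(52, 55, 52, 60, 55)

# Attach a fixed dummy date so every time is parsed onto the same day
start <- as.POSIXct(paste("2000-01-01", start_chr),
                    format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Count observations on segments 52 or 55 starting in [09:00, 10:00)
lower <- as.POSIXct("2000-01-01 09:00:00", tz = "UTC")
upper <- as.POSIXct("2000-01-01 10:00:00", tz = "UTC")
n_9_10 <- sum(segment %in% c(52, 55) & start >= lower & start < upper,
              na.rm = TRUE)
print(n_9_10)

# cut() assigns every observation to its hourly bin in one step
hour_bin <- cut(start, breaks = "hour")
print(table(hour_bin))
```

Binning with cut() can be convenient when counts are needed for every hour rather than for one interval at a time.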
Using the data.table library:
library(data.table)
# convert to data table
dataset1<-data.table(dataset1)
# convert Start.Time from character to a datetime (POSIXct)
dataset1[, Start.Time := as.POSIXct(Start.Time, format="%H:%M:%S")]
#now do your filtering
dataset1[between(Start.Time, as.POSIXct("09:00:00", format="%H:%M:%S"), as.POSIXct("10:00:00", format="%H:%M:%S")) & (Trail.Segment==52 | Trail.Segment==55)]

Copy Timestamp of xts object to another Matrix in R

I am having a bit of difficulty properly extracting timestamps from an xts object and putting them into another matrix. Basically, I have an xts object with a timestamp column in a YYYY-MM-DD HH:MM:SS.SSS format, and I want to extract specific times (in order) and put them into a column in another matrix in the exact same format as they are in the xts object.
For example, let's say the timestamp column for an xts object called mat is given as follows:
2000-01-01 09:05:02.333
2000-01-01 09:06:03.212
2000-01-01 09:06:04.764
2000-01-01 09:07:02.211
Now let's say I want to take the 2nd and 4th times and put them into another matrix (which I'll call mat2), then ideally it should come out like this:
Time
---------------------------
2000-01-01 09:06:03.212
2000-01-01 09:07:02.211
Now, I know that by using the index() function on an xts object you can get the timestamp for that object at a particular index value. However, when I try to do this by writing (for example) mat2[i,"Time"] <- index(mat[i]), it puts a number into mat2 rather than the date/time value from mat, and I'm not sure why that happens. Is there a way to copy timestamps from an xts object and put them into a column of a different matrix in the same format?
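One likely explanation, offered as a hedged sketch rather than a confirmed diagnosis: a numeric matrix can only hold plain numbers, so assigning a POSIXct value into it drops the class and leaves the underlying seconds since 1970-01-01. The example below uses a plain POSIXct vector named idx as a stand-in for index(mat) (an assumption, so that xts does not need to be installed) and stores the selected timestamps in a data frame column instead, which keeps the class.

```r
# idx is a hypothetical stand-in for index(mat) on the xts object
idx <- as.POSIXct(
  c("2000-01-01 09:05:02.333", "2000-01-01 09:06:03.212",
    "2000-01-01 09:06:04.764", "2000-01-01 09:07:02.211"),
  format = "%Y-%m-%d %H:%M:%OS", tz = "UTC"
)

# Assigning into a numeric matrix keeps only the seconds since the epoch
num_mat <- matrix(NA_real_, nrow = 2, ncol = 1, dimnames = list(NULL, "Time"))
num_mat[1, "Time"] <- idx[2]
print(num_mat)

# A data frame column can hold POSIXct values with their class intact
mat2 <- data.frame(Time = idx[c(2, 4)])

# If text is required, format() with %OS3 prints millisecond precision
mat2$Label <- format(mat2$Time, "%Y-%m-%d %H:%M:%OS3")
print(mat2)
```

If a true matrix is required, a character matrix filled with the formatted labels is another option, at the cost of losing the ability to do date arithmetic on it.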

Linking characters from one data.frame to other datasets

I have a data.frame with two columns. The first column contains various specific times during a day. The second column contains the animal behavior (behavior period) that I observed at each specific time:
Time; Behavior
10:20; feeding
10:25; feeding
10:30; resting
...
For each of those behavior periods I have an additional dataset (TimeSeries) which contains data about the actual animal movement (output from a movement sensor). Each TimeSeries has about 100 rows:
Time; Var1; Var2
10:20:01; 1345; 5232
10:20:02; 1423; 5271
...
Now I would like to link each TimeSeries with the behavior from the first dataset, so that R knows that "feeding" is related to the TimeSeries of 10:20 and 10:25, that "resting" is related to the TimeSeries of 10:30, and so on.
Afterwards I want to use this "knowledge" to calculate mean and sd from each TimeSeries. So I will have all the means and sd's from all TimeSeries for each behavior.
It is not clear whether your times are currently characters, factors, POSIXct, variables, etc. So you should first convert them (possibly in a new column) to a numeric variable, something like the number of seconds since midnight. Functions like strptime, difftime, and as.numeric may help.
Add a column to the first data frame that is just 1:nrow(firstdf). Then add a column to the second dataframe that is computed by the findInterval function:
seconddf$newcol <- findInterval( seconddf$seconds, firstdf$seconds )
Now you can merge the 2 data frames on the new columns and the finer grained times will be associated with the activity from the most recent time.
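The steps above can be sketched as follows. The data values are made up, and the helper to_seconds and the column names seconds and id are names introduced here for illustration only.

```r
firstdf <- data.frame(
  Time = c("10:20", "10:25", "10:30"),
  Behavior = c("feeding", "feeding", "resting"),
  stringsAsFactors = FALSE
)
seconddf <- data.frame(
  Time = c("10:20:01", "10:20:02", "10:25:10", "10:31:00"),
  Var1 = c(1345, 1423, 1500, 900),
  Var2 = c(5232, 5271, 5300, 4100),
  stringsAsFactors = FALSE
)

# Hypothetical helper: clock time as seconds since midnight
to_seconds <- function(x, format) {
  t <- strptime(x, format = format, tz = "UTC")
  t$hour * 3600 + t$min * 60 + t$sec
}

firstdf$seconds <- to_seconds(firstdf$Time, "%H:%M")
firstdf$id <- seq_len(nrow(firstdf))  # 1:nrow(firstdf)

# findInterval() needs the break points (firstdf$seconds) sorted ascending
seconddf$seconds <- to_seconds(seconddf$Time, "%H:%M:%S")
seconddf$id <- findInterval(seconddf$seconds, firstdf$seconds)

# Attach the behavior of the most recent observation time to each sensor row
merged <- merge(seconddf, firstdf[c("id", "Behavior")], by = "id")

# Mean and sd of Var1 per behavior period (sd is NA for single-row periods)
means <- tapply(merged$Var1, merged$id, mean)
sds <- tapply(merged$Var1, merged$id, sd)
print(merged)
```

Grouping by id gives one mean and sd per TimeSeries; grouping by Behavior instead would pool all periods with the same behavior.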

Converting hour, minutes columns in dataframe to time format

I have a dataframe containing 3 columns including minutes and hours.
I want to convert these columns (namely the Hour and Min columns) to a time format. Given the data in drame:
Score Hour Min
10 10 56
23 17 01
I would like to get:
Score Time
10 10:56:00
23 17:01:00
You could use ISOdatetime to convert the numbers in the hour and min columns to a POSIXct object. However, a POSIXct object is only defined when it also includes a year, month and day. So, depending on your needs, you can do several things:
If you need a real time object that is printed correctly in graphs, for example, and can be used in arithmetic (addition, subtraction), you need to use ISOdatetime. ISOdatetime returns a so-called POSIXct object, which is an R object that represents time. In ISOdatetime you then just use fixed values for year, month, and day. This of course only works if your dataset does not span multiple years.
If you just need a character column Time, you can convert the POSIXct output to a string using strftime, setting the format argument to "%H:%M:00". In this case, however, you could also use sprintf to create the new character column without converting to POSIXct: sprintf("%02d:%02d:00", drame$Hour, drame$Min). The %02d format zero-pads each value (so minute 1 prints as 01) and assumes the columns hold whole numbers.
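A minimal sketch of both options, using the two rows from the question. The fixed date 2000-01-01 and the UTC time zone are arbitrary assumptions, and the column names TimeCT and Time2 are introduced here only for illustration.

```r
drame <- data.frame(Score = c(10, 23), Hour = c(10, 17), Min = c(56, 1))

# Option 1: a real time object; year, month and day are fixed dummy values
drame$TimeCT <- ISOdatetime(2000, 1, 1, drame$Hour, drame$Min, 0, tz = "UTC")

# Character version derived from the POSIXct column
drame$Time <- format(drame$TimeCT, "%H:%M:%S")

# Option 2: build the character column directly, zero-padding with %02d
drame$Time2 <- sprintf("%02d:%02d:00", drame$Hour, drame$Min)

print(drame[c("Score", "Time")])
```

The POSIXct column is the one to keep if the times will be plotted or used in arithmetic; the character columns are for display only.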
You can use the paste() function to merge the two columns into a character string and then use strptime() to convert it to a timestamp:
x<-1:6
##1 2 3 4 5 6
y<-8:13
## 8 9 10 11 12 13
timestamp <- paste(x,":",y,":00",sep="")
timestamp
will result in
#1:8:00 2:9:00 3:10:00 4:11:00 5:12:00 6:13:00
If you prefer to convert this to a timestamp object, try using
strptime(timestamp,"%H:%M:%S")
## uses current date by default
If you happen to have the date in another column, use paste() to build a character date-time (called mergedData here) and use the line below to get a date-time object
##strptime(mergedData,"%d/%m/%Y %H:%M:%S")
