Random incremental timestamp in R

I want to generate a series of 500 timestamps starting on Jan 1st, 2016, such that the timestamps increase by small random increments, as in the sample below.
Sample:
TimeStamp
2016-01-01 00:00:01
2016-01-01 00:00:12
2016-01-01 00:00:15
2016-01-01 00:01:23
2016-01-01 00:02:31
2016-01-01 00:02:38
2016-01-01 00:03:48
2016-01-01 00:03:55
.....
What I am doing as of now is:
SampleData <- as.data.frame(list(Var1=1:500, Var2=rnorm(1, 500, 500)))
rDate <- function(sDate, eDate, SampleData){
lenDate <- dim(SampleData)[1]
seqDays <- seq.Date(as.Date(sDate), as.Date(eDate), by="day")
aDay <- runif(lenDate, 1, length(seqDays))
Date <- seqDays[aDay]
}
SampleData$TimeStamp <- rDate("2016-01-01", "2016-12-31", SampleData)
SampleData <- SampleData[order(SampleData$TimeStamp), ]
row.names(SampleData) <- NULL
head(SampleData)
But this will produce the following result:
Var1 Var2 TimeStamp
1 200 1020.469 2016-01-01
2 100 1020.469 2016-01-02
3 344 1020.469 2016-01-02
4 447 1020.469 2016-01-04
5 453 1020.469 2016-01-05
6 478 1020.469 2016-01-05
Which is not what I wanted.
Could someone please help?

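Just change seq.Date to seq.POSIXt and as.Date to as.POSIXct, and step by "secs" instead of "day". Note that this builds a vector with one element for every second of the year (about 31.5 million), so it uses a lot of memory: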
SampleData <- as.data.frame(list(Var1=1:500, Var2=rnorm(1, 500, 500)))
rDate <- function(sDate, eDate, SampleData){
lenDate <- dim(SampleData)[1]
seqDays <- seq.POSIXt(as.POSIXct(sDate), as.POSIXct(eDate), by="secs")
aDay <- runif(lenDate, 1, length(seqDays))
Date <- seqDays[aDay]
}
SampleData$TimeStamp <- rDate("2016-01-01", "2016-12-31", SampleData)
SampleData <- SampleData[order(SampleData$TimeStamp), ]
row.names(SampleData) <- NULL
head(SampleData)
Var1 Var2 TimeStamp
1 29 660.4593 2016-01-01 13:25:31
2 213 660.4593 2016-01-02 07:17:10
3 115 660.4593 2016-01-05 01:07:48
4 358 660.4593 2016-01-05 06:24:41
5 276 660.4593 2016-01-06 10:02:18
6 49 660.4593 2016-01-06 21:56:25

Here is another approach: draw random offsets (in seconds) within the date range, sort them, and add them to the start date:
RandomTimeStamp <- function(M, sDate="2016/01/01", eDate="2016/12/31") {
sDate <- as.POSIXct(as.Date(sDate))
eDate <- as.POSIXct(as.Date(eDate))
dTime <- as.numeric(difftime(eDate, sDate, units="secs"))
sTimeStamp <- sort(runif(M, 0, dTime))
TimeStamp <- sDate + sTimeStamp
}
print(RandomTimeStamp(500))
This produces output like the following (this run evidently used 2012 dates rather than the 2016 defaults):
[1] "2012-01-01 18:26:53 IST" "2012-01-02 11:35:47 IST" "2012-01-02 15:02:23 IST" "2012-01-02 19:19:25 IST"
[5] "2012-01-03 04:48:13 IST" "2012-01-03 21:05:42 IST" "2012-01-03 21:16:06 IST" "2012-01-04 21:05:08 IST"
[9] "2012-01-05 05:47:13 IST" "2012-01-05 06:27:44 IST" "2012-01-05 06:40:42 IST" "2012-01-05 21:56:45 IST"
[13] "2012-01-06 22:36:40 IST" "2012-01-07 03:48:37 IST" "2012-01-07 12:55:25 IST" "2012-01-07 20:52:19 IST" .........
You may want to adjust the code to your needs. :)

Maybe something like:
as.POSIXct("2016-01-01 00:00:00") + sort(sample(1:1000, 500))
We can check this with 5 samples:
as.POSIXct("2016-01-01 00:00:00") + sort(sample(1:1000, 5))
#[1] "2016-01-01 00:01:53 IST" "2016-01-01 00:02:06 IST" "2016-01-01 00:03:19 IST"
#[4] "2016-01-01 00:07:31 IST" "2016-01-01 00:12:26 IST"
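This adds 500 distinct random offsets, drawn from 1 to 1000 seconds and sorted so they increase, to midnight on Jan 1st, 2016. To widen the range, increase the upper bound of the sequence 1:1000 to any number you wish (it must be at least 500, since sample draws without replacement by default).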
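If you want the timestamps to grow by small random steps, as in the sample in the question, you can instead draw the gap between consecutive timestamps and accumulate the gaps with cumsum. A minimal sketch; the 1 to 70 second gap range and the UTC time zone are assumptions chosen only for illustration:

```r
# Sketch: increasing timestamps built from random gaps between consecutive events.
# Assumption: gaps of 1 to 70 seconds, chosen only to resemble the sample above.
set.seed(42)                                   # reproducible example
n <- 500
start <- as.POSIXct("2016-01-01 00:00:00", tz = "UTC")
gaps <- sample(1:70, n, replace = TRUE)        # random gap (in seconds) before each timestamp
TimeStamp <- start + cumsum(gaps)              # running total of gaps gives strictly increasing times
head(TimeStamp)
```

Because every gap is at least one second, the series is strictly increasing; the last timestamp lands wherever the accumulated gaps take it, rather than at a fixed end date.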
Another solution, which takes advantage of the entire range, is:
startTime <- as.POSIXct("2016-01-01")
endTime <- as.POSIXct("2016-12-31")
sort(sample(seq(startTime, endTime, by = 1), 500))
Here we generate a sequence of every second from startTime to endTime, take 500 random values from it, and sort them so they increase. Although this covers the entire range, it becomes slow and memory-hungry as the difference between startTime and endTime grows.
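The cost of materialising every second can be avoided by sampling integer offsets with sample.int and adding them to the start time. A minimal sketch, assuming UTC (to sidestep daylight-saving gaps) and that both endpoints may be drawn:

```r
# Sketch: 500 distinct, sorted random seconds between startTime and endTime,
# without building the full per-second sequence.
startTime <- as.POSIXct("2016-01-01 00:00:00", tz = "UTC")
endTime   <- as.POSIXct("2016-12-31 00:00:00", tz = "UTC")
n_secs <- as.numeric(difftime(endTime, startTime, units = "secs"))
# sample.int(n, size) draws distinct integers in 1..n; subtracting 1 gives offsets 0..n_secs
offsets <- sort(sample.int(n_secs + 1, 500) - 1)
TimeStamp <- startTime + offsets
head(TimeStamp)
```

Only the 500 sampled offsets are held in memory, so the run time no longer depends on the length of the range.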

Related

Looping through a data frame of datetimes

I'm trying to create GPS schedules for satellite transmitters that are used to track the migration of a bird species I'm studying. The function below called 'sched_gps_fixes' takes a vector of datetimes and writes them to a .ASF file, which is uploaded to the satellite transmitter. This tells the transmitter what date and time to take a GPS fix. Using R and the sched_gps_fixes function allows me to quickly create a GPS schedule that starts on any day of the year. The software that comes with the transmitters does this as well, but I would have to painstakingly select each time and date I want the transmitter to take a GPS location.
So I want to: 1) create a data frame that contains every day of the year in 2018, and the time I want the transmitter to collect a GPS location, 2) use each row of the data frame as the start date for a sequence of datetimes (so starting on 2018-03-25 12:00:00 for example, I want to create a GPS schedule that takes a GPS point every other day after that, so 2018-03-25 12:00:00, 2018-03-27 12:00:00, etc.), and 3) create a .ASF file for each GPS schedule. Here's a simplified version of what I'm trying to accomplish below:
library(lubridate)
# set the beginning time
start_date <- ymd_hms('2018-01-01 12:00:00')
# create a sequence of datetimes starting January 1
days_df <- seq(ymd_hms(start_date), ymd_hms(start_date+days(10)), by='1 days')
tz(days_df) <- "America/Chicago"
days_df <- as.data.frame(days_df)
days_df
# to reproduce the example
days_df <- structure(list(days_df = structure(c(1514829600, 1514916000,
1515002400, 1515088800, 1515175200, 1515261600, 1515348000, 1515434400,
1515520800, 1515607200, 1515693600), class = c("POSIXct", "POSIXt"
), tzone = "America/Chicago")), .Names = "days_df", row.names = c(NA,
-11L), class = "data.frame")
# the data frame looks like this:
days_df
1 2018-01-01 12:00:00
2 2018-01-02 12:00:00
3 2018-01-03 12:00:00
4 2018-01-04 12:00:00
5 2018-01-05 12:00:00
6 2018-01-06 12:00:00
7 2018-01-07 12:00:00
8 2018-01-08 12:00:00
9 2018-01-09 12:00:00
10 2018-01-10 12:00:00
11 2018-01-11 12:00:00
I would like to loop through each datetime in the data frame, and create a vector for each row of the data frame. So each vector would have a particular row's datetime as the starting date for a GPS schedule, which would take a point every 2 days (something like this):
[1] "2018-01-01 12:00:00 UTC" "2018-01-03 12:00:00 UTC" "2018-01-05 12:00:00 UTC" "2018-01-07 12:00:00 UTC"
[5] "2018-01-09 12:00:00 UTC" "2018-01-11 12:00:00 UTC"
Each vector (or GPS schedule) would then be run in the following function as 'gps_schedule' to create a .ASF file for the transmitters:
sched_gps_fixes(gps_schedule, tz = "America/Chicago", out_file = "./gps_fixes")
So I'm wondering how to create a for loop that would produce a vector of datetimes for each day of 2018. This is pseudocode for what I'm attempting to do:
# create a function called 'create_schedules' to make the GPS schedules and produce a .ASF file for each day of 2018
create_schedules <- function(days_df) {
for(i in 1:nrow(days_df)) {
# start of this schedule: the i-th datetime in the days_df column
start <- days_df$days_df[i]
seq(start, start + days(10), by='2 days')
}
}
# run the function
create_schedules(days_df)
I'm guessing I need an output to store and name each vector by its start date, among other things?
Thanks,
Jay
One option is to use mapply to generate a schedule for each row, based on the schedule definition provided by the OP:
library(lubridate)
# For the sample data, max_date is taken as the last date in days_df. To generate
# schedules for the whole of 2018, max_date can instead be set to 31-Dec-2018.
max_date = max(days_df$days_df)
mapply(function(x)seq(x, max_date, by="2 days"),days_df$days_df)
#Result: only the first 3 and the last 2 items of the list are shown
# [[1]]
# [1] "2018-01-01 12:00:00 CST" "2018-01-03 12:00:00 CST" "2018-01-05 12:00:00 CST"
# [4] "2018-01-07 12:00:00 CST" "2018-01-09 12:00:00 CST" "2018-01-11 12:00:00 CST"
#
# [[2]]
# [1] "2018-01-02 12:00:00 CST" "2018-01-04 12:00:00 CST" "2018-01-06 12:00:00 CST"
# [4] "2018-01-08 12:00:00 CST" "2018-01-10 12:00:00 CST"
#
# [[3]]
# [1] "2018-01-03 12:00:00 CST" "2018-01-05 12:00:00 CST" "2018-01-07 12:00:00 CST"
# [4] "2018-01-09 12:00:00 CST" "2018-01-11 12:00:00 CST"
# ....
# ....
# ....
# [[10]]
# [1] "2018-01-10 12:00:00 CST"
#
# [[11]]
# [1] "2018-01-11 12:00:00 CST"
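Update: based on the OP's request, each schedule below runs from its start date to start + 10 days (10 days is 10*24*3600 seconds). If the OP prefers to have names for the items in the result list, mapply can be used with SIMPLIFY = FALSE and USE.NAMES = TRUE, passing the dates as character so that they become the names: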
mapply(function(x, y)seq(y, y+10*24*3600, by="2 days"),
as.character(days_df$days_df), days_df$days_df,
SIMPLIFY = FALSE,USE.NAMES = TRUE)
#Result
# $`2018-01-01 12:00:00`
# [1] "2018-01-01 12:00:00 CST" "2018-01-03 12:00:00 CST" "2018-01-05 12:00:00 CST"
# [4] "2018-01-07 12:00:00 CST" "2018-01-09 12:00:00 CST" "2018-01-11 12:00:00 CST"
#.......
#.......
#.......so on
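Since the question asks for a for loop, here is a sketch of the same schedules built with a loop that stores each one in a list named by its start time. It uses base R only and rebuilds days_df from a sequence rather than from the dput above; sched_gps_fixes() is the asker's own function and is only mentioned in a comment, not called:

```r
# Rebuild the example days_df: 11 consecutive days at 12:00 Chicago time
start_times <- seq(as.POSIXct("2018-01-01 12:00:00", tz = "America/Chicago"),
                   by = "1 day", length.out = 11)
days_df <- data.frame(days_df = start_times)

# Pre-allocate one list slot per start date
schedules <- vector("list", nrow(days_df))
for (i in seq_len(nrow(days_df))) {
  start <- days_df$days_df[i]
  # one fix every 2 days, from the start date up to start + 10 days
  schedules[[i]] <- seq(start, start + 10 * 24 * 3600, by = "2 days")
}
# Name each schedule by its start time
names(schedules) <- format(days_df$days_df, "%Y-%m-%d %H:%M:%S")

# Each schedule could then be written out inside the loop (not run; the asker's function,
# and the per-schedule file name is a hypothetical choice):
# sched_gps_fixes(schedules[[i]], tz = "America/Chicago", out_file = paste0("./gps_fixes_", i))
schedules[["2018-01-01 12:00:00"]]
```

Adding a fixed number of seconds is safe here because January has no daylight-saving change; for schedules spanning a DST switch, stepping with by = "2 days" up to a calendar end date is the more robust choice.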

How to deal with one column of two formats and single class?

I have one column with two different formats but the same class 'factor'.
D$date
2009-05-12 11:30:00
2009-05-13 11:30:00
2009-05-14 11:30:00
2009-05-15 11:30:00
42115.652
2876
8765
class(D$date)
factor
What I need is to convert the numbers to dates.
D$date <- as.character(D$date)
D$date=ifelse(!is.na(as.numeric(D$date)),
as.POSIXct(as.numeric(D$date) * (60*60*24), origin="1899-12-30", tz="UTC"),
D$date)
Now the numbers were converted, but to strange values such as "1429630800".
I tried without ifelse:
as.POSIXct(as.numeric(42115.652) * (60*60*24), origin="1899-12-30", tz="UTC")
[1] "2015-04-21 15:38:52 UTC"
It was converted nicely.
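The problem is that you are mixing up classes in the true/false halves of your ifelse. ifelse drops attributes such as the class from its yes/no values, so the POSIXct from the true half becomes a plain number (seconds since 1970-01-01), which is then mixed with the character strings of the false half. You can fix this by adding as.character like this: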
D$date = ifelse(!is.na(as.numeric(D$date)),
as.character(as.POSIXct(as.numeric(D$date) * (60*60*24), origin="1899-12-30", tz="UTC")),
D$date)
#D
# date
#1 2009-05-12 11:30:00
#2 2009-05-13 11:30:00
#3 2009-05-14 11:30:00
#4 2009-05-15 11:30:00
#5 2015-04-21 15:38:52
#6 1907-11-15 00:00:00
#7 1923-12-30 00:00:00
You can also write a function that converts a single value to POSIXct, apply it to every element with lapply, and combine the results with do.call:
b <- c("2009-05-12 11:30:00", "2009-05-13 11:30:00", "2009-05-14 11:30:00",
"2009-05-15 11:30:00", "42115.652", "2876", "8765")
foo <- function(x){
if(!is.na(suppressWarnings(as.numeric(x)))){
as.POSIXct(as.numeric(x) * (60*60*24), origin="1899-12-30", tz="UTC")
}else{
as.POSIXct(x, origin="1899-12-30", tz="UTC")
}
}
do.call("c", lapply(b, foo))
[1] "2009-05-12 13:30:00 CEST" "2009-05-13 13:30:00 CEST" "2009-05-14 13:30:00 CEST" "2009-05-15 13:30:00 CEST"
[5] "2015-04-21 17:38:52 CEST" "1907-11-15 01:00:00 CET" "1923-12-30 01:00:00 CET"
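A vectorised alternative avoids both ifelse and the element-by-element loop: parse every value as a date-time string, then overwrite the entries that are numeric with their Excel serial-date conversion. A minimal sketch; the function name convert_mixed_dates is hypothetical, and it assumes the text dates are in "%Y-%m-%d %H:%M:%S" format and that everything should be in UTC:

```r
# Hypothetical helper: convert a vector mixing "YYYY-mm-dd HH:MM:SS" strings and Excel serial numbers
convert_mixed_dates <- function(x) {
  x <- as.character(x)                         # also handles factor input
  num <- suppressWarnings(as.numeric(x))       # NA for the date-time strings
  # parse everything as a date-time string; the serial numbers become NA here
  out <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  is_serial <- !is.na(num)
  # Excel serial dates count days from 1899-12-30
  out[is_serial] <- as.POSIXct(num[is_serial] * 86400, origin = "1899-12-30", tz = "UTC")
  out
}

b <- c("2009-05-12 11:30:00", "2009-05-13 11:30:00", "2009-05-14 11:30:00",
       "2009-05-15 11:30:00", "42115.652", "2876", "8765")
res <- convert_mixed_dates(b)
res
```

The result stays a single POSIXct vector in UTC, so it can be assigned straight back to D$date.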

R: Find missing timestamps in csv

As I failed to solve my problem with PHP/MySQL or Excel due to the data size, I'm now taking my very first steps with R and struggling a bit. The problem is this: I have a second-by-second CSV file with half a year of data that looks like this:
metering,timestamp
123,2016-01-01 00:00:00
345,2016-01-01 00:00:01
243,2016-01-01 00:00:02
101,2016-01-01 00:00:04
134,2016-01-01 00:00:06
As you can see, a few seconds are missing every once in a while (don't ask me why the values are written before the timestamp, but that's how I received the data…). Now I'm trying to calculate the number of values (= seconds) that are missing.
So my idea was to:
1. create a vector that is correct (includes all sec-by-sec timestamps),
2. match the given CSV file with that new vector, and
3. sum up all the timestamps with no value.
I managed to make step 1 happen with the following code:
RegularTimeSeries <- seq(as.POSIXct("2016-01-01 00:00:00", tz = "UTC"), as.POSIXct("2016-01-01 00:00:30", tz = "UTC"), by = "1 sec")
write.csv(RegularTimeSeries, file = "RegularTimeSeries.csv")
To have an idea what I did I also exported the vector to a CSV that looks like this:
"1",2016-01-01 00:00:00
"2",2016-01-01 00:00:01
"3",2016-01-01 00:00:02
"4",2016-01-01 00:00:03
"5",2016-01-01 00:00:04
"6",2016-01-01 00:00:05
"7",2016-01-01 00:00:06
Unfortunately I have no idea how to go on with step 2 and 3. I found some very similar examples (http://www.r-bloggers.com/fix-missing-dates-with-r/, R: Insert rows for missing dates/times), but as a total R noob I struggled to translate these examples to my given sec-by-sec data.
Some hints for the greenhorn would be very very helpful – thank you very much in advance :)
In the tidyverse,
library(dplyr)
library(tidyr)
# parse datetimes
df %>% mutate(timestamp = as.POSIXct(timestamp)) %>%
# complete sequence to full sequence from min to max by second
complete(timestamp = seq.POSIXt(min(timestamp), max(timestamp), by = 'sec'))
## # A tibble: 7 x 2
## timestamp metering
## <time> <int>
## 1 2016-01-01 00:00:00 123
## 2 2016-01-01 00:00:01 345
## 3 2016-01-01 00:00:02 243
## 4 2016-01-01 00:00:03 NA
## 5 2016-01-01 00:00:04 101
## 6 2016-01-01 00:00:05 NA
## 7 2016-01-01 00:00:06 134
If you want the number of NAs (i.e. the number of seconds with no data), add on
%>% tally(is.na(metering))
## # A tibble: 1 x 1
## n
## <int>
## 1 2
You can check which values of your RegularTimeSeries are in your broken time series using which and %in%. First create an example BrokenTimeSeries by removing a few seconds:
RegularTimeSeries <- seq(as.POSIXct("2016-01-01 00:00:00", tz = "UTC"), as.POSIXct("2016-01-01 00:00:30", tz = "UTC"), by = "1 sec")
BrokenTimeSeries <- RegularTimeSeries[-c(3,6,9)] # remove some seconds
This will give you the indices of values within RegularTimeSeries that are not in BrokenTimeSeries:
> which(!(RegularTimeSeries %in% BrokenTimeSeries))
[1] 3 6 9
This will return the actual values:
> RegularTimeSeries[which(!(RegularTimeSeries %in% BrokenTimeSeries))]
[1] "2016-01-01 00:00:02 UTC" "2016-01-01 00:00:05 UTC" "2016-01-01 00:00:08 UTC"
Maybe I'm misunderstanding your problem, but you can count the number of missing seconds simply by subtracting the length of your broken time series from that of RegularTimeSeries, or by taking the length of either of the two resulting vectors above.
> length(RegularTimeSeries) - length(BrokenTimeSeries)
[1] 3
> length(which(!(RegularTimeSeries %in% BrokenTimeSeries)))
[1] 3
> length(RegularTimeSeries[which(!(RegularTimeSeries %in% BrokenTimeSeries))])
[1] 3
If you want to merge the two series to see the missing values, you can do something like this:
#data frame with the regular time series; BrokenTimeSeries keeps each timestamp
#only where it also occurs in the broken series, and is NA otherwise
df <- data.frame(
RegularTimeSeries
)
df$BrokenTimeSeries <- df$RegularTimeSeries
df$BrokenTimeSeries[!(df$RegularTimeSeries %in% BrokenTimeSeries)] <- NA
resulting in:
> df[1:12,]
RegularTimeSeries BrokenTimeSeries
1 2016-01-01 00:00:00 2016-01-01 00:00:00
2 2016-01-01 00:00:01 2016-01-01 00:00:01
3 2016-01-01 00:00:02 <NA>
4 2016-01-01 00:00:03 2016-01-01 00:00:03
5 2016-01-01 00:00:04 2016-01-01 00:00:04
6 2016-01-01 00:00:05 <NA>
7 2016-01-01 00:00:06 2016-01-01 00:00:06
8 2016-01-01 00:00:07 2016-01-01 00:00:07
9 2016-01-01 00:00:08 <NA>
10 2016-01-01 00:00:09 2016-01-01 00:00:09
11 2016-01-01 00:00:10 2016-01-01 00:00:10
12 2016-01-01 00:00:11 2016-01-01 00:00:11
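If all you want is the number of missing seconds, it can be done much more simply: find the number of seconds in your time range, and subtract the number of rows in your dataset. This could be done in R along these lines (assuming the data cover the range inclusively, with at most one row per second):
# UTC avoids losing an hour to a daylight-saving change between January and June
start <- as.POSIXct("2016-01-01 00:00:00", tz = "UTC")
end <- as.POSIXct("2016-06-01 00:00:00", tz = "UTC")
# +1 because both endpoints are counted
n.seconds <- as.numeric(difftime(end, start, units = "secs")) + 1
n.rows <- nrow(my.data.frame)
n.missing.values <- n.seconds - n.rows
Adjust the time range and the name of your data frame (my.data.frame) to match your data.
Hope it helps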
A loop-based alternative marks every timestamp that follows a gap of more than one second:
d <- c("2016-01-01 00:00:01",
"2016-01-01 00:00:02",
"2016-01-01 00:00:03",
"2016-01-01 00:00:04",
"2016-01-01 00:00:05",
"2016-01-01 00:00:06",
"2016-01-01 00:00:10",
"2016-01-01 00:00:12",
"2016-01-01 00:00:14",
"2016-01-01 00:00:16",
"2016-01-01 00:00:18",
"2016-01-01 00:00:20",
"2016-01-01 00:00:22")
d <- as.POSIXct(d)
# gaps[i] holds d[i] if it comes more than 1 second after d[i-1], NA otherwise
gaps <- as.POSIXct(rep(NA, length(d)))
for (i in 2:length(d)){
if(difftime(d[i], d[i-1], units = "secs") > 1){
gaps[i] <- d[i]
}
}
gaps
 [1] NA NA NA NA NA
 [6] NA "2016-01-01 00:00:10 EST" "2016-01-01 00:00:12 EST" "2016-01-01 00:00:14 EST" "2016-01-01 00:00:16 EST"
[11] "2016-01-01 00:00:18 EST" "2016-01-01 00:00:20 EST" "2016-01-01 00:00:22 EST"
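For half a year of second-by-second data, building the full regular sequence is not strictly necessary: a gap of k seconds between two consecutive readings means k - 1 readings are missing, so diff alone gives the count. A minimal sketch using the sample timestamps from the question (in practice timestamp would be the parsed timestamp column of the CSV), assuming UTC:

```r
# Sample timestamps from the question (the metering column is not needed here)
timestamp <- as.POSIXct(c("2016-01-01 00:00:00", "2016-01-01 00:00:01",
                          "2016-01-01 00:00:02", "2016-01-01 00:00:04",
                          "2016-01-01 00:00:06"), tz = "UTC")
# Sort and drop duplicates so each gap is measured between distinct, ordered readings
timestamp <- sort(unique(timestamp))
gaps <- diff(as.numeric(timestamp))   # seconds between consecutive readings
n_missing <- sum(gaps - 1)            # a gap of k seconds means k - 1 missing readings
n_missing
# Readings that are followed by at least one missing second
timestamp[which(gaps > 1)]
```

This only needs one pass over the existing rows, which scales better than matching against a vector of every second in the half year.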

Text process using R

I am quite new to programming and to R.
My dataset includes date-time values like the following:
2007/11/0103
2007/11/0104
2007/11/0105
2007/11/0106
I need an operation that takes the first 10 characters, adds a space, then appends the last two characters followed by ":00", for every value in the column.
Expected results:
2007/11/01 03:00
2007/11/01 04:00
2007/11/01 05:00
2007/11/01 06:00
If you want to actually turn your data into a "POSIXlt" "POSIXt" object in R (so that you can add or subtract days, minutes, etc.), you could do:
# Your data
temp <- c("2007/11/0103", "2007/11/0104", "2007/11/0105", "2007/11/0106")
temp2 <- strptime(temp, "%Y/%m/%d%H")
## [1] "2007-11-01 03:00:00 IST" "2007-11-01 04:00:00 IST" "2007-11-01 05:00:00 IST" "2007-11-01 06:00:00 IST"
You could then extract the hours, for example:
temp2$hour
## [1] 3 4 5 6
Add an hour (3600 seconds):
temp2 + 3600
## [1] "2007-11-01 04:00:00 IST" "2007-11-01 05:00:00 IST" "2007-11-01 06:00:00 IST" "2007-11-01 07:00:00 IST"
And so on. If you just want the format you mentioned in your question (which is just a character string), you can also do
format(strptime(temp, "%Y/%m/%d%H"), format = "%Y/%m/%d %H:%M")
#[1] "2007/11/01 03:00" "2007/11/01 04:00" "2007/11/01 05:00" "2007/11/01 06:00"
Try
library(lubridate)
dat <- read.table(text="2007/11/0103
2007/11/0104
2007/11/0105
2007/11/0106",header=F,stringsAsFactors=F)
dat$V1 <- format(ymd_h(dat$V1),"%Y/%m/%d %H:%M")
dat
# V1
# 1 2007/11/01 03:00
# 2 2007/11/01 04:00
# 3 2007/11/01 05:00
# 4 2007/11/01 06:00
Suppose your dates are in a vector named dates:
library(stringr)
paste0(paste(str_sub(dates, end=10), str_sub(dates, 11)), ":00")
paste and substr are your friends here. Type ? before either name (e.g. ?substr) to see its documentation.
my.parser <- function(a){
paste0(substr(a, 1, 10), ' ', substr(a, 11, 12), ':00') # paste0 is like paste but does not add a separator
}
a<- '2007/11/0103'
my.parser(a) # = "2007/11/01 03:00"
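The same reformatting can also be done in one vectorised call with a regular expression in sub(). A minimal sketch, assuming every value has exactly 10 date characters followed by a two-digit hour:

```r
temp <- c("2007/11/0103", "2007/11/0104", "2007/11/0105", "2007/11/0106")
# Group 1: the first 10 characters (the date); group 2: the final two digits (the hour).
# Rebuild each value as "date hour:00".
out <- sub("^(.{10})([0-9]{2})$", "\\1 \\2:00", temp)
out
```

Values that do not match the pattern are returned unchanged by sub(), which makes malformed entries easy to spot afterwards.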

Splitting a factor at a space in R

I want to split x (which is a factor)
dd = data.frame(x = c("29-4-2014 06:00:00", "9-4-2014 12:00:00", "9-4-2014 00:00:00", "6-5-2014 00:00:00" ,"7-4-2014 00:00:00" , "29-5-2014 00:00:00"))
x
29-4-2014 06:00:00
9-4-2014 12:00:00
9-4-2014 00:00:00
6-5-2014 00:00:00
7-4-2014 00:00:00
29-5-2014 00:00:00
at the horizontal space and get two columns as:
x.date x.time
29-4-2014 06:00:00
9-4-2014 12:00:00
9-4-2014 00:00:00
6-5-2014 00:00:00
7-4-2014 00:00:00
29-5-2014 00:00:00
Any suggestion is appreciated!
strsplit is typically used here, but you can also use read.table:
read.table(text = as.character(dd$x))
# V1 V2
# 1 29-4-2014 06:00:00
# 2 9-4-2014 12:00:00
# 3 9-4-2014 00:00:00
# 4 6-5-2014 00:00:00
# 5 7-4-2014 00:00:00
# 6 29-5-2014 00:00:00
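Since strsplit is mentioned but not shown, here is a minimal sketch of that route, which splits each value at the single space and collects the two pieces into new columns; it assumes every value contains exactly one space:

```r
dd <- data.frame(x = c("29-4-2014 06:00:00", "9-4-2014 12:00:00", "9-4-2014 00:00:00",
                       "6-5-2014 00:00:00", "7-4-2014 00:00:00", "29-5-2014 00:00:00"))
# strsplit needs character input, so convert in case x is a factor
parts <- strsplit(as.character(dd$x), " ", fixed = TRUE)
# first piece of each split is the date, second is the time
split_df <- data.frame(x.date = sapply(parts, `[`, 1),
                       x.time = sapply(parts, `[`, 2),
                       stringsAsFactors = FALSE)
split_df
```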
Another (better) option:
# Convert to POSIXct objects
times <- as.POSIXct(dd$x, format="%d-%m-%Y %T")
# You may also want to specify the time zone
times <- as.POSIXct(dd$x, format="%d-%m-%Y %T", tz="GMT")
Then, to extract the times:
strftime(times, "%T")
[1] "06:00:00" "12:00:00" "00:00:00" "00:00:00" "00:00:00" "00:00:00"
or the dates:
strftime(times, "%D")
[1] "04/29/14" "04/09/14" "04/09/14" "05/06/14" "04/07/14" "05/29/14"
or, really, any format you want:
strftime(times, "%d %b %Y at %T")
[1] "29 Apr 2014 at 06:00:00" "09 Apr 2014 at 12:00:00"
[3] "09 Apr 2014 at 00:00:00" "06 May 2014 at 00:00:00"
[5] "07 Apr 2014 at 00:00:00" "29 May 2014 at 00:00:00"
For more info, see ?as.POSIXct and ?strftime.
Here is another approach using lubridate:
dd = data.frame(x = c("29-4-2014 06:00:00", "9-4-2014 12:00:00", "9-4-2014 00:00:00", "6-5-2014 00:00:00" ,"7-4-2014 00:00:00" , "29-5-2014 00:00:00"),
stringsAsFactors = FALSE)
Note the use of stringsAsFactors = FALSE, which prevents your dates from being read as factors.
library(lubridate)
dd2 <- transform(dd,x2 = dmy_hms(x))
transform(dd2, the_year = year(x2))
x x2 the_year
1 29-4-2014 06:00:00 2014-04-29 06:00:00 2014
2 9-4-2014 12:00:00 2014-04-09 12:00:00 2014
3 9-4-2014 00:00:00 2014-04-09 00:00:00 2014
4 6-5-2014 00:00:00 2014-05-06 00:00:00 2014
5 7-4-2014 00:00:00 2014-04-07 00:00:00 2014
6 29-5-2014 00:00:00 2014-05-29 00:00:00 2014
