I'm trying to create GPS schedules for satellite transmitters that are used to track the migration of a bird species I'm studying. The function below called 'sched_gps_fixes' takes a vector of datetimes and writes them to a .ASF file, which is uploaded to the satellite transmitter. This tells the transmitter what date and time to take a GPS fix. Using R and the sched_gps_fixes function allows me to quickly create a GPS schedule that starts on any day of the year. The software that comes with the transmitters does this as well, but I would have to painstakingly select each time and date I want the transmitter to take a GPS location.
So I want to: 1) create a data frame that contains every day of the year in 2018 and the time I want the transmitter to collect a GPS location, 2) use each row of the data frame as the start date for a sequence of datetimes (so starting on 2018-03-25 12:00:00, for example, I want to create a GPS schedule that takes a GPS point every other day after that: 2018-03-25 12:00:00, 2018-03-27 12:00:00, etc.), and 3) create a .ASF file for each GPS schedule. Here's a simplified version of what I'm trying to accomplish:
library(lubridate)
# set the beginning time
start_date <- ymd_hms('2018-01-01 12:00:00')
# create a sequence of datetimes starting January 1
days_df <- seq(start_date, start_date + days(10), by = '1 day')
tz(days_df) <- "America/Chicago"
days_df <- as.data.frame(days_df)
days_df
# to reproduce the example
days_df <- structure(list(days_df = structure(c(1514829600, 1514916000,
1515002400, 1515088800, 1515175200, 1515261600, 1515348000, 1515434400,
1515520800, 1515607200, 1515693600), class = c("POSIXct", "POSIXt"
), tzone = "America/Chicago")), .Names = "days_df", row.names = c(NA,
-11L), class = "data.frame")
# the data frame looks like this:
days_df
               days_df
1  2018-01-01 12:00:00
2  2018-01-02 12:00:00
3  2018-01-03 12:00:00
4  2018-01-04 12:00:00
5  2018-01-05 12:00:00
6  2018-01-06 12:00:00
7  2018-01-07 12:00:00
8  2018-01-08 12:00:00
9  2018-01-09 12:00:00
10 2018-01-10 12:00:00
11 2018-01-11 12:00:00
I would like to loop through each datetime in the data frame, and create a vector for each row of the data frame. So each vector would have a particular row's datetime as the starting date for a GPS schedule, which would take a point every 2 days (something like this):
[1] "2018-01-01 12:00:00 UTC" "2018-01-03 12:00:00 UTC" "2018-01-05 12:00:00 UTC" "2018-01-07 12:00:00 UTC"
[5] "2018-01-09 12:00:00 UTC" "2018-01-11 12:00:00 UTC"
Each vector (or GPS schedule) would then be run in the following function as 'gps_schedule' to create a .ASF file for the transmitters:
sched_gps_fixes(gps_schedule, tz = "America/Chicago", out_file = "./gps_fixes")
So I'm wondering how to create a for loop that would produce a vector of datetimes for each day of 2018. This is pseudocode for what I'm attempting to do:
# create a function called 'create_schedules' to make the GPS schedules and produce a .ASF file for each day of 2018
create_schedules <- function(days_df) {
  for (i in 1:nrow(days_df)) {
    seq(days_df$days_df[i], days_df$days_df[i] + days(10), by = '2 days')
  }
}
# run the function
create_schedules(days_df)
I'm guessing I need an output to store and name each vector by its start date, among other things?
Thanks,
Jay
One option is to use mapply to generate a schedule for each row, based on the schedule definition provided by the OP:
library(lubridate)
# For the sample data, max_date needs to be calculated. To generate a
# schedule for the whole of 2018, max_date can instead be set to 31-Dec-2018.
max_date <- max(days_df$days_df)
mapply(function(x) seq(x, max_date, by = "2 days"), days_df$days_df)
# Result: only the first three and the last two items of the generated list are shown
# [[1]]
# [1] "2018-01-01 12:00:00 CST" "2018-01-03 12:00:00 CST" "2018-01-05 12:00:00 CST"
# [4] "2018-01-07 12:00:00 CST" "2018-01-09 12:00:00 CST" "2018-01-11 12:00:00 CST"
#
# [[2]]
# [1] "2018-01-02 12:00:00 CST" "2018-01-04 12:00:00 CST" "2018-01-06 12:00:00 CST"
# [4] "2018-01-08 12:00:00 CST" "2018-01-10 12:00:00 CST"
#
# [[3]]
# [1] "2018-01-03 12:00:00 CST" "2018-01-05 12:00:00 CST" "2018-01-07 12:00:00 CST"
# [4] "2018-01-09 12:00:00 CST" "2018-01-11 12:00:00 CST"
# ....
# ....
# ....
# [[10]]
# [1] "2018-01-10 12:00:00 CST"
#
# [[11]]
# [1] "2018-01-11 12:00:00 CST"
If the OP prefers to have names for the items in the result list, then mapply can be used as shown below.
Update: per the OP's request, each schedule now runs from its start date to start + 10 days; 10 days is equivalent to 10*24*3600 seconds.
mapply(function(x, y) seq(y, y + 10*24*3600, by = "2 days"),
       as.character(days_df$days_df), days_df$days_df,
       SIMPLIFY = FALSE, USE.NAMES = TRUE)
# Result
# $`2018-01-01 12:00:00`
# [1] "2018-01-01 12:00:00 CST" "2018-01-03 12:00:00 CST" "2018-01-05 12:00:00 CST"
# [4] "2018-01-07 12:00:00 CST" "2018-01-09 12:00:00 CST" "2018-01-11 12:00:00 CST"
#.......
#.......
#.......so on
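To cover the OP's final step, here is a minimal sketch that writes one .ASF file per schedule. It assumes the OP's sched_gps_fixes() function and lubridate are loaded; the file-naming scheme is illustrative:
# build one schedule per start date (every 2 days for 10 days)
schedules <- lapply(days_df$days_df,
                    function(x) seq(x, x + days(10), by = "2 days"))
# name each schedule by its start date
names(schedules) <- format(days_df$days_df, "%Y-%m-%d")
# write one .ASF file per schedule (sched_gps_fixes() is the OP's function)
for (nm in names(schedules)) {
  sched_gps_fixes(schedules[[nm]],
                  tz = "America/Chicago",
                  out_file = paste0("./gps_fixes_", nm))
}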
Suppose there is a CSV file named ta_sample.csv with the following contents:
"BILL_DT","AMOUNT"
"2015-07-27T18:30:00Z",16000
"2015-07-07T18:30:00Z",6110
"2015-07-26T18:30:00Z",250
"2015-07-22T18:30:00Z",1000
"2015-07-06T18:30:00Z",2640000
Reading the above with read_csv_arrow and customizing the column types, as is usually needed with real production data:
library(arrow)
read_csv_arrow(
  "ta_sample.csv",
  col_names = c("BILL_DT", "AMOUNT"),
  col_types = "td",
  skip = 1,
  timestamp_parsers = c("%Y-%m-%dT%H:%M:%SZ"))
the result is:
# A tibble: 5 x 2
BILL_DT AMOUNT
<dttm> <dbl>
1 2015-07-28 00:00:00 16000
2 2015-07-08 00:00:00 6110
3 2015-07-27 00:00:00 250
4 2015-07-23 00:00:00 1000
5 2015-07-07 00:00:00 2640000
The issue here is that the dates are shifted forward by one day and the time disappears. It is worth mentioning that data.table::fread() as well as readr::read_csv() read it properly, e.g.:
library(readr)
read_csv("ta_sample.csv")
# A tibble: 5 x 2
BILL_DT AMOUNT
<dttm> <dbl>
1 2015-07-27 18:30:00 16000
2 2015-07-07 18:30:00 6110
3 2015-07-26 18:30:00 250
4 2015-07-22 18:30:00 1000
5 2015-07-06 18:30:00 2640000
Parsing example values from the BILL_DT column with strptime also appears to work:
strptime(c("2015-07-27T18:30:00Z", "2015-07-07T18:30:00Z"), "%Y-%m-%dT%H:%M:%SZ")
[1] "2015-07-27 18:30:00 IST" "2015-07-07 18:30:00 IST"
What parameters in read_csv_arrow need to be adjusted to get results identical to that given by readr::read_csv() ?
There are a few things going on here, but they all relate to timezones + how they are interpreted by various parts of R + Arrow + other packages.
When Arrow reads in timestamps, it treats the values as if they were UTC. Arrow does not yet have the ability to specify alternative timezones when parsing[1], so it stores these values as timezoneless (and assumes UTC). In this case, since the timestamps you have are UTC (according to ISO 8601, the Z at the end means UTC), they are stored correctly in Arrow as timezoneless UTC timestamps. The values of the timestamps are the same (that is, they represent the same instant in UTC); the difference is in how they are displayed: as the time in UTC, or in the local timezone.
When the timestamps are converted into R, the timezonelessness is preserved:
> from_arrow <- read_csv_arrow(
+ "ta_sample.csv",
+ col_names = c("BILL_DT", "AMOUNT"),
+ col_types = "td",
+ skip = 1,
+ timestamp_parsers = c("%Y-%m-%dT%H:%M:%SZ"))
>
> attr(from_arrow$BILL_DT, "tzone")
NULL
R defaults to displaying timestamps without a tzone attribute in the local timezone (for me it's currently CDT, for you it looks like it's IST). And, note that timestamps with an explicit timezone are displayed in that timezone.
> from_arrow$BILL_DT
[1] "2015-07-27 13:30:00 CDT" "2015-07-07 13:30:00 CDT"
[3] "2015-07-26 13:30:00 CDT" "2015-07-22 13:30:00 CDT"
[5] "2015-07-06 13:30:00 CDT"
If you would like to display the UTC timestamps, you can do a few things:
Explicitly set the tzone attribute (or you could use lubridate::with_tz() for the same operation):
> attr(from_arrow$BILL_DT, "tzone") <- "UTC"
> from_arrow$BILL_DT
[1] "2015-07-27 18:30:00 UTC" "2015-07-07 18:30:00 UTC"
[3] "2015-07-26 18:30:00 UTC" "2015-07-22 18:30:00 UTC"
[5] "2015-07-06 18:30:00 UTC"
You can set the timezone in your R session so that when R goes to display the time it uses UTC (Note: the tzone attribute is still unset here, but the display is UTC because the session timezone is set to UTC)
> Sys.setenv(TZ="UTC")
> from_arrow <- read_csv_arrow(
3. "ta_sample.csv",
4. col_names = c("BILL_DT", "AMOUNT"),
5. col_types = "td",
6. skip = 1,
7. timestamp_parsers = c("%Y-%m-%dT%H:%M:%SZ"))
> from_arrow$BILL_DT
[1] "2015-07-27 18:30:00 UTC" "2015-07-07 18:30:00 UTC"
[3] "2015-07-26 18:30:00 UTC" "2015-07-22 18:30:00 UTC"
[5] "2015-07-06 18:30:00 UTC"
> attr(from_arrow$BILL_DT, "tzone")
NULL
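Note that Sys.setenv(TZ = ...) changes the timezone for the whole R session; a small sketch for restoring the previous value afterwards:
old_tz <- Sys.getenv("TZ")   # remember the current session timezone
Sys.setenv(TZ = "UTC")
# ... read and inspect the data ...
Sys.setenv(TZ = old_tz)      # restore it when done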
You could read the data into an Arrow table, and cast the timestamp to have an explicit timezone in Arrow before pulling the data into R with collect(). This csv -> Arrow table -> data.frame path is what happens under the hood anyway, so there are no additional conversions going on here (other than the cast). It can also be useful and more efficient to do operations on the Arrow table if you have other transformations to apply, though it is more code than the first two options.
> library(arrow)
> library(dplyr)
> tab <- read_csv_arrow(
+ "ta_sample.csv",
+ col_names = c("BILL_DT", "AMOUNT"),
+ col_types = "td",
+ skip = 1,
+ as_data_frame = FALSE)
>
> tab_df <- tab %>%
+ mutate(BILL_DT_cast = cast(BILL_DT, timestamp(unit = "s", timezone = "UTC"))) %>%
+ collect()
> attr(tab_df$BILL_DT, "tzone")
NULL
> attr(tab_df$BILL_DT_cast, "tzone")
[1] "UTC"
> tab_df
# A tibble: 5 × 3
BILL_DT AMOUNT BILL_DT_cast
<dttm> <dbl> <dttm>
1 2015-07-27 13:30:00 16000 2015-07-27 18:30:00
2 2015-07-07 13:30:00 6110 2015-07-07 18:30:00
3 2015-07-26 13:30:00 250 2015-07-26 18:30:00
4 2015-07-22 13:30:00 1000 2015-07-22 18:30:00
5 2015-07-06 13:30:00 2640000 2015-07-06 18:30:00
This is also made a bit more confusing because base R's strptime() doesn't parse timezones (which is why you're seeing the same clock time but with IST in your example above). lubridate's[2] parsing functions do respect this, and you can see the difference here:
> lubridate::parse_date_time(c("2015-07-27T18:30:00Z", "2015-07-07T18:30:00Z"), "YmdHMS")
[1] "2015-07-27 18:30:00 UTC" "2015-07-07 18:30:00 UTC"
[1] Though we have two issues related to adding this functionality https://issues.apache.org/jira/browse/ARROW-12820 and https://issues.apache.org/jira/browse/ARROW-13348
[2] And, lubridate's docs even mention this:
ISO8601 signed offset in hours and minutes from UTC. For example -0800, -08:00 or -08, all represent 8 hours behind UTC. This format also matches the Z (Zulu) UTC indicator. Because base::strptime() doesn't fully support ISO8601 this format is implemented as an union of 4 orders: Ou (Z), Oz (-0800), OO (-08:00) and Oo (-08). You can use these four orders as any other but it is rarely necessary. parse_date_time2() and fast_strptime() support all of the timezone formats.
https://lubridate.tidyverse.org/reference/parse_date_time.html
I have a column of dates in an R data frame that looks like this:
Date
2020-08-05
2020-08-05
2020-08-05
2020-08-07
2020-08-08
2020-08-08
So the dates are formatted as 'yyyy-mm-dd'.
I am writing this data frame to a CSV that needs to be formatted in a very specific manner. I need to convert these dates to the format 'mm/dd/yyyy hh:mm:ss', so this is what I want the columns to look like:
Date
8/5/2020 12:00:00 AM
8/5/2020 12:00:00 AM
8/5/2020 12:00:00 AM
8/7/2020 12:00:00 AM
8/8/2020 12:00:00 AM
8/8/2020 12:00:00 AM
The dates do not have a timestamp attached to begin with, so all dates will need a midnight timestamp in the format shown above.
I spent quite some time yesterday trying to coerce this format and was unable to. I am easily able to change 2020-08-05 to 8/5/2020 using as.Date(), but the issue arises when I attempt to add the midnight timestamp.
How can I add a midnight timestamp to these reformatted dates?
Thanks so much for any help!
You can use format; any text in the format string that isn't a % directive (here, the fixed midnight time) is passed through literally:
df <- data.frame(Date = as.Date(c("2020-08-05", "2020-08-07")))
format(df$Date, "%m/%d/%Y 12:00:00 AM")
[1] "08/05/2020 12:00:00 AM" "08/07/2020 12:00:00 AM"
dat <- data.frame(
  Date = as.Date("2020-08-05") + c(0, 0, 0, 2, 3, 3)
)
# dates carry no time of day, so %I:%M:%S %p renders midnight (12:00:00 AM)
dat[["Date"]] <- format(dat[["Date"]], "%m/%d/%Y %I:%M:%S %p")
# some locales print "am"/"pm" in lower case; upper-case them to match
dat[["Date"]] <- sub("([ap]m)$", "\\U\\1", dat[["Date"]], perl = TRUE)
dat
## Date
## 1 08/05/2020 12:00:00 AM
## 2 08/05/2020 12:00:00 AM
## 3 08/05/2020 12:00:00 AM
## 4 08/07/2020 12:00:00 AM
## 5 08/08/2020 12:00:00 AM
## 6 08/08/2020 12:00:00 AM
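Since the formatted column is now plain character, write.csv() will emit the strings exactly as shown above (the file name here is illustrative):
write.csv(dat, "formatted_dates.csv", row.names = FALSE)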
Try this:
format(as.POSIXct("2022-11-08", tz = "Australia/Sydney"), "%Y-%m-%d %H:%M:%S")
[1] "2022-11-08 00:00:00"
As I failed to solve my problem with PHP/MySQL or Excel due to the data size, I'm now taking my very first steps with R and struggling a bit. The problem is this: I have a second-by-second CSV file with half a year of data that looks like this:
metering,timestamp
123,2016-01-01 00:00:00
345,2016-01-01 00:00:01
243,2016-01-01 00:00:02
101,2016-01-01 00:00:04
134,2016-01-01 00:00:06
As you can see, some seconds are missing every once in a while (don't ask me why the values are written before the timestamp, but that's how I received the data…). Now I am trying to calculate the number of values (= seconds) that are missing.
So my idea was
to create a vector that is correct (includes all sec-by-sec timestamps),
match the given CSV file with that new vector, and
sum up all the timestamps with no value.
I managed to make step 1 happen with the following code:
RegularTimeSeries <- seq(as.POSIXct("2016-01-01 00:00:00", tz = "UTC"), as.POSIXct("2016-01-01 00:00:30", tz = "UTC"), by = "1 sec")
write.csv(RegularTimeSeries, file = "RegularTimeSeries.csv")
To have an idea what I did I also exported the vector to a CSV that looks like this:
"1",2016-01-01 00:00:00
"2",2016-01-01 00:00:01
"3",2016-01-01 00:00:02
"4",2016-01-01 00:00:03
"5",2016-01-01 00:00:04
"6",2016-01-01 00:00:05
"7",2016-01-01 00:00:06
Unfortunately I have no idea how to go on with steps 2 and 3. I found some very similar examples (http://www.r-bloggers.com/fix-missing-dates-with-r/, R: Insert rows for missing dates/times), but as a total R noob I struggled to translate these examples to my sec-by-sec data.
Some hints for the greenhorn would be very very helpful – thank you very much in advance :)
In the tidyverse,
library(dplyr)
library(tidyr)
# parse datetimes
df %>% mutate(timestamp = as.POSIXct(timestamp)) %>%
# complete the series to the full min-to-max sequence, one row per second
complete(timestamp = seq.POSIXt(min(timestamp), max(timestamp), by = 'sec'))
## # A tibble: 7 x 2
## timestamp metering
## <time> <int>
## 1 2016-01-01 00:00:00 123
## 2 2016-01-01 00:00:01 345
## 3 2016-01-01 00:00:02 243
## 4 2016-01-01 00:00:03 NA
## 5 2016-01-01 00:00:04 101
## 6 2016-01-01 00:00:05 NA
## 7 2016-01-01 00:00:06 134
If you want the number of NAs (i.e. the number of seconds with no data), add on
%>% tally(is.na(metering))
## # A tibble: 1 x 1
## n
## <int>
## 1 2
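The base-R equivalent of that count (a sketch; df_full here stands for the completed data frame from above):
# number of seconds with no metering value
sum(is.na(df_full$metering))
## [1] 2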
You can check which values of your RegularTimeSeries are in your broken time series using which and %in%. First create BrokenTimeSeries from your example:
RegularTimeSeries <- seq(as.POSIXct("2016-01-01 00:00:00", tz = "UTC"), as.POSIXct("2016-01-01 00:00:30", tz = "UTC"), by = "1 sec")
BrokenTimeSeries <- RegularTimeSeries[-c(3,6,9)] # remove some seconds
This will give you the indices of values within RegularTimeSeries that are not in BrokenTimeSeries:
> which(!(RegularTimeSeries %in% BrokenTimeSeries))
[1] 3 6 9
This will return the actual values:
> RegularTimeSeries[which(!(RegularTimeSeries %in% BrokenTimeSeries))]
[1] "2016-01-01 00:00:02 UTC" "2016-01-01 00:00:05 UTC" "2016-01-01 00:00:08 UTC"
Maybe I'm misunderstanding your problem, but you can count the number of missing seconds simply by subtracting the length of your broken time series from that of RegularTimeSeries, or by taking the length of either of the two resulting vectors above.
> length(RegularTimeSeries) - length(BrokenTimeSeries)
[1] 3
> length(which(!(RegularTimeSeries %in% BrokenTimeSeries)))
[1] 3
> length(RegularTimeSeries[which(!(RegularTimeSeries %in% BrokenTimeSeries))])
[1] 3
If you want to merge the series together to see the missing values you can do something like this:
# data frame holding the regular series next to the broken series,
# with NA wherever a second is missing
df <- data.frame(
  RegularTimeSeries
)
df$BrokenTimeSeries <- RegularTimeSeries
df$BrokenTimeSeries[!(RegularTimeSeries %in% BrokenTimeSeries)] <- NA
resulting in:
> df[1:12,]
     RegularTimeSeries    BrokenTimeSeries
1  2016-01-01 00:00:00 2016-01-01 00:00:00
2  2016-01-01 00:00:01 2016-01-01 00:00:01
3  2016-01-01 00:00:02                <NA>
4  2016-01-01 00:00:03 2016-01-01 00:00:03
5  2016-01-01 00:00:04 2016-01-01 00:00:04
6  2016-01-01 00:00:05                <NA>
7  2016-01-01 00:00:06 2016-01-01 00:00:06
8  2016-01-01 00:00:07 2016-01-01 00:00:07
9  2016-01-01 00:00:08                <NA>
10 2016-01-01 00:00:09 2016-01-01 00:00:09
11 2016-01-01 00:00:10 2016-01-01 00:00:10
12 2016-01-01 00:00:11 2016-01-01 00:00:11
If all you want is the number of missing seconds, it can be done much more simply. First find the number of seconds in your time range, and then subtract the number of rows in your dataset. This could be done in R along these lines:
n.seconds <- difftime("2016-06-01 00:00:00", "2016-01-01 00:00:00", units="secs")
n.rows <- nrow(my.data.frame)
n.missing.values <- n.seconds - n.rows
Adjust the time range and the data frame name (my.data.frame here) to match your data.
Hope it helps.
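A worked example with the 7-second span from the question (note the + 1: difftime() gives the span between the endpoints, while the expected row count includes both endpoints):
# expected number of seconds from 00:00:00 to 00:00:06, inclusive
n.seconds <- difftime("2016-01-01 00:00:06", "2016-01-01 00:00:00",
                      units = "secs") + 1
n.rows <- 5                      # rows actually present in the sample CSV
as.numeric(n.seconds) - n.rows   # 2 seconds are missing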
d <- (c("2016-01-01 00:00:01",
"2016-01-01 00:00:02",
"2016-01-01 00:00:03",
"2016-01-01 00:00:04",
"2016-01-01 00:00:05",
"2016-01-01 00:00:06",
"2016-01-01 00:00:10",
"2016-01-01 00:00:12",
"2016-01-01 00:00:14",
"2016-01-01 00:00:16",
"2016-01-01 00:00:18",
"2016-01-01 00:00:20",
"2016-01-01 00:00:22"))
d <- as.POSIXct(d)
for (i in 2:length(d)){
if(difftime(d[i-1],d[i], units = "secs") < -1 ){
c[i] <- d[i]
}
}
class(c) <- c('POSIXt','POSIXct')
c
 [1] NA                        NA                        NA
 [4] NA                        NA                        NA
 [7] "2016-01-01 00:00:10 EST" "2016-01-01 00:00:12 EST" "2016-01-01 00:00:14 EST"
[10] "2016-01-01 00:00:16 EST" "2016-01-01 00:00:18 EST" "2016-01-01 00:00:20 EST"
[13] "2016-01-01 00:00:22 EST"
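The same result without the explicit loop, using diff() (a sketch):
# TRUE wherever an entry follows a gap of more than one second
follows_gap <- c(FALSE, diff(as.numeric(d)) > 1)
gap_times <- d
gap_times[!follows_gap] <- NA
gap_times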
I have a table in R like:
start                duration
02/01/2012 20:00:00  5
05/01/2012 07:00:00  6
etc...               etc...
I got to this by importing a table from Microsoft Excel that looked like this:
date        time      duration
2012/02/01  20:00:00  5
etc...
I then merged the date and time columns by running the following code:
d.f <- within(d.f, { start=format(as.POSIXct(paste(date, time)), "%m/%d/%Y %H:%M:%S") })
I want to create a third column called 'end', which will be calculated as the number of hours after the start time. I am pretty sure that my time is a POSIXct vector. I have seen how to manipulate one datetime object, but how can I do that for the entire column?
The expected result should look like:
start                duration  end
02/01/2012 20:00:00  5         02/02/2012 01:00:00
05/01/2012 07:00:00  6         05/01/2012 13:00:00
etc...               etc...    etc...
Using lubridate
> library(lubridate)
> df$start <- mdy_hms(df$start)
> df$end <- df$start + hours(df$duration)
> df
# start duration end
#1 2012-02-01 20:00:00 5 2012-02-02 01:00:00
#2 2012-05-01 07:00:00 6 2012-05-01 13:00:00
data
df <- structure(list(start = c("02/01/2012 20:00:00", "05/01/2012 07:00:00"
), duration = 5:6), .Names = c("start", "duration"), class = "data.frame", row.names = c(NA,
-2L))
You can simply add duration * 3600 (seconds per hour) to the start column of the data frame. E.g. with one date:
start = as.POSIXct("02/01/2012 20:00:00",format="%m/%d/%Y %H:%M:%S")
start
[1] "2012-02-01 20:00:00 CST"
start + 5*3600
[1] "2012-02-02 01:00:00 CST"
I am quite new in programming and R Software.
My data-set includes date-time variables as following:
2007/11/0103
2007/11/0104
2007/11/0105
2007/11/0106
I need an operation which takes the first 10 characters, then inserts a space, keeps the last two characters (the hour), and appends ':00', for every value.
Expected results:
2007/11/01 03:00
2007/11/01 04:00
2007/11/01 05:00
2007/11/01 06:00
If you want to actually turn your data into a "POSIXlt" "POSIXt" object in R (so you can subtract/add days, minutes, etc. from/to it), you could do
# Your data
temp <- c("2007/11/0103", "2007/11/0104", "2007/11/0105", "2007/11/0106")
temp2 <- strptime(temp, "%Y/%m/%d%H")
temp2
## [1] "2007-11-01 03:00:00 IST" "2007-11-01 04:00:00 IST" "2007-11-01 05:00:00 IST" "2007-11-01 06:00:00 IST"
You could then extract hours for example
temp2$hour
## [1] 3 4 5 6
Add hours
temp2 + 3600
## [1] "2007-11-01 04:00:00 IST" "2007-11-01 05:00:00 IST" "2007-11-01 06:00:00 IST" "2007-11-01 07:00:00 IST"
And so on. If you just want the format you mentioned in your question (which is just a character string), you can also do
format(strptime(temp, "%Y/%m/%d%H"), format = "%Y/%m/%d %H:%M")
#[1] "2007/11/01 03:00" "2007/11/01 04:00" "2007/11/01 05:00" "2007/11/01 06:00"
Try
library(lubridate)
dat <- read.table(text="2007/11/0103
2007/11/0104
2007/11/0105
2007/11/0106",header=F,stringsAsFactors=F)
dat$V1 <- format(ymd_h(dat$V1),"%Y/%m/%d %H:%M")
dat
# V1
# 1 2007/11/01 03:00
# 2 2007/11/01 04:00
# 3 2007/11/01 05:00
# 4 2007/11/01 06:00
Suppose your dates are a vector named dates
library(stringr)
paste0(paste(str_sub(dates, end=10), str_sub(dates, 11)), ":00")
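For example, applied to the sample values:
library(stringr)
dates <- c("2007/11/0103", "2007/11/0104")
# first 10 characters = date, characters 11+ = hour; join and append ":00"
paste0(paste(str_sub(dates, end = 10), str_sub(dates, 11)), ":00")
## [1] "2007/11/01 03:00" "2007/11/01 04:00"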
paste and substr are your friends here. Type ?paste or ?substr to see the documentation.
my.parser <- function(a) {
  # paste0() is paste() with sep = "", so it does not insert whitespace
  paste0(substr(a, 1, 10), ' ', substr(a, 11, 12), ':00')
}
a <- '2007/11/0103'
my.parser(a) # = "2007/11/01 03:00"
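Since substr() and paste0() are vectorized, the same function handles a whole column at once:
dates <- c("2007/11/0103", "2007/11/0104")
my.parser(dates)
## [1] "2007/11/01 03:00" "2007/11/01 04:00"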