library(data.table)
I am trying to do this.
wd <- structure(list(Year = c(2006L, 2006L, 2006L), day = c(361L, 361L,
360L), hour = c(14L, 8L, 8L), mint = c(30L, 0L, 30L), valu1 = c(0.5,
0.3, 0.4), Date = structure(c(1167229800, 1167206400, 1167121800
), class = c("POSIXct", "POSIXt"), tzone = "UTC")), .Names = c("Year",
"day", "hour", "mint", "valu1", "Date"), row.names = c(NA, -3L
), class = "data.frame")
wg <- c("2006/12/27 14:23:59", "2006/12/27 16:47:59", "2006/12/27 19:12:00")
w <- c("0.4", "0.2", "0.5")
wf=data.frame(wg,w)
wg <- as.POSIXct(wf$wg, format = "%Y/%m/%d %T", tz = "UTC")
WG <- data.table(start = wg, end = wg)
setkey(WG)
## Do the same for `wd` adding +/- 30 minutes
setDT(wd)[, `:=`(start = Date - 1800L, end = Date + 1800L)]
## Run foverlaps and extract the match `valu1` column
foverlaps(wd, WG, nomatch = 0L)[, .(wdDate = Date, valu1, WGDate = start)]
wdDate valu1 WGDate
1: 2006-12-27 14:30:00 0.5 2006-12-27 14:23:59
As you can see, the final result only contains valu1 from wd, but I would also like to extract the corresponding values of w from wf.
So I want something like this:
wdDate valu1 WGDate w
1: 2006-12-27 14:30:00 0.5 2006-12-27 14:23:59 0.4
Any idea is welcome.
Real data:
head(wf)
date1 date2 date3n wg w whyt
1 <NA> 2003-01-01 <NA> <NA> NA NA
2 <NA> 2003-01-02 <NA> <NA> NA NA
3 <NA> 2003-01-03 <NA> 2003/01/03 10:30:00 0.2137352 0.34
4 <NA> 2003-01-04 <NA> <NA> NA NA
Facing a problem here:
In my previous answer I created WG because you provided wg as a single vector. If you already have a data set called wf, this whole process is not needed. You just need to adjust wf correctly and then run foverlaps. In other words, forget about WG and do the following:
setDT(wf)[, wg := as.POSIXct(wg, format = "%Y/%m/%d %T", tz = "UTC")]
wf[, `:=`(start = wg, end = wg)]
setkey(wf, start, end)
setDT(wd)[, `:=`(start = Date - 1800L, end = Date + 1800L)]
foverlaps(wd, wf, nomatch = 0L)[, .(wdDate = Date, valu1, WGDate = start, w)]
# wdDate valu1 WGDate w
# 1: 2006-12-27 14:30:00 0.5 2006-12-27 14:23:59 0.4
Updated to a more realistic example; this time added duplicates in interp_b.
I am trying to populate a field in one dataframe (interp_b) using the values from a second dataframe (bait). I want to look at each row's obs_datetime in interp_b, and determine when that plot-station-year was last baited, prior to the obs_datetime. This will later be used to calculate a time-since-bait for each obs_datetime. Bait times are in the bait dataframe in column bait_datetime. The results should go in a field called latestbait_datetime in the interp_b dataframe.
I was visualizing an iterative process where interp_b "latestbait_datetime" keeps getting recalculated until the last row in the bait dataframe is reached. The for-loop I tried is clearly running through the rows and doing the specified calculations but I can't seem to get the output in the format I want; it is producing output for each loop rather than rewriting and updating the interp_b dataframe.
Here is some code to build the two dataframes; interp_b and bait (please excuse the inelegance)
# interp_b dataframe----
interp_b <- structure(list(plot_station_year = c("Cow_C2_2019", "RidingStable_C3_2018",
"RidingStable_C3_2018", "Raf_C1_2018", "Metcalfe_C2_2019"), obs_datetime = structure(c(1559487600,
1544954400, 1541084400, 1515160800, 1567756800), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), latestbait_datetime = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = c("POSIXct",
"POSIXt"))), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L))
# bait dataframe----
bait <- structure(list(plot_station_year = c("Cow_C2_2019", "Cow_C2_2019",
"RidingStable_C3_2018", "Raf_C1_2018"), bait_datetime = structure(c(1557500400,
1559746800, 1543676400, 1491318000), class = c("POSIXct", "POSIXt"
), tzone = "UTC")), class = c("spec_tbl_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -4L), spec = structure(list(
cols = list(plot_station_year = structure(list(), class = c("collector_character",
"collector")), bait_datetime = structure(list(format = "%d-%m-%Y %H:%M"), class = c("collector_datetime",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
and the desired result would look like this
Below are two of my attempts. The first resulted in a dataframe that only contained the final run of the loop and the second attempt resulted in a dataframe containing all of the run results (as expected with the bind).
library(tidyverse)
#attempt #1----
for (i in 1:nrow(bait)) {
print(paste("row =",i))
interpbait <- interp_b %>%
mutate(latestbait_datetime = if_else((plot_station_year == bait$plot_station_year[i] & (obs_datetime >= bait$bait_datetime[i] & (is.na(latestbait_datetime) | latestbait_datetime < bait$bait_datetime[i]))), bait$bait_datetime[i], latestbait_datetime))
}
#attempt #2----
resultb <- data.frame()
for (i in 1:nrow(bait)) {
print(paste("row =",i))
interpbait2 <- interp_b %>%
mutate(latestbait_datetime = if_else((plot_station_year == bait$plot_station_year[i] & (obs_datetime >= bait$bait_datetime[i] & (is.na(latestbait_datetime) | latestbait_datetime < bait$bait_datetime[i]))), bait$bait_datetime[i], latestbait_datetime))
resultb <- bind_rows(resultb, interpbait2)
print(resultb)
}
Any help would be greatly appreciated.
I'm not sure how long this will take, but here is a tidyverse solution. For each row in interp_b, we filter the bait dataframe to the correct plot_station_year, and ensure that all date-times are less than the row in interp_b. Then, we arrange the filtered bait data by descending datetime (so that the most recent dates are on top). We slice the first row of that dataframe so that we only get the most recent date. Then we "pull out" the date-time from the dataframe, and add it onto the appropriate row in interp_b.
library(tidyverse)
library(progress) # for progress bar
# create progress bar to update, so that you can estimate the amount of time it will take to finish the entire loop
pb <- progress_bar$new(total = nrow(interp_b))
for (i in 1:nrow(interp_b)) {
last_time_baited <- bait %>%
#filter bait dataframe to appropriate plot, station, year based on
# the row in interp_b
filter(plot_station_year == interp_b$plot_station_year[i],
# ensure all datetimes are less than that row in interp_b
bait_datetime < interp_b$obs_datetime[i]) %>%
# arrange by datetime (most recent datetimes first)
arrange(desc(bait_datetime)) %>%
# take the top row - this will be the most recent date-time that
# the plot-station was baited
slice(1) %>%
# "pull" that value out of the dataframe so you have a value,
# not a tibble
pull(bait_datetime) #
# update the row in interp_b with the date_time baited
interp_b$latestbait_datetime[i] <- last_time_baited
pb$tick() # print progress
}
The resulting table matches your expected output (interp_b):
# A tibble: 5 x 3
plot_station_year obs_datetime latestbait_datetime
<chr> <dttm> <dttm>
1 Cow_C2_2019 2019-06-02 15:00:00 2019-05-10 11:00:00
2 RidingStable_C3_2018 2018-12-16 10:00:00 2018-12-01 10:00:00
3 RidingStable_C3_2018 2018-11-01 15:00:00 NA
4 Raf_C1_2018 2018-01-05 14:00:00 2017-04-04 11:00:00
5 Metcalfe_C2_2019 2019-09-06 08:00:00 NA
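If you would rather not depend on the tidyverse for this step, the same per-row lookup can be sketched in base R. This is only a rough sketch and I have not benchmarked it: the two data frames are re-typed from the dput output in the question, and secs is just a name I picked for the intermediate vector. For each row it keeps the bait times of the same plot_station_year that fall strictly before obs_datetime and takes the latest one, or NA when there is none.

```r
# Data re-typed from the question's dput output (times are seconds since 1970, UTC)
interp_b <- data.frame(
  plot_station_year = c("Cow_C2_2019", "RidingStable_C3_2018",
                        "RidingStable_C3_2018", "Raf_C1_2018", "Metcalfe_C2_2019"),
  obs_datetime = as.POSIXct(c(1559487600, 1544954400, 1541084400,
                              1515160800, 1567756800),
                            origin = "1970-01-01", tz = "UTC"),
  stringsAsFactors = FALSE
)
bait <- data.frame(
  plot_station_year = c("Cow_C2_2019", "Cow_C2_2019",
                        "RidingStable_C3_2018", "Raf_C1_2018"),
  bait_datetime = as.POSIXct(c(1557500400, 1559746800, 1543676400, 1491318000),
                             origin = "1970-01-01", tz = "UTC"),
  stringsAsFactors = FALSE
)

# secs (hypothetical name): latest earlier bait time per row, as seconds since 1970
secs <- vapply(seq_len(nrow(interp_b)), function(i) {
  earlier <- bait$bait_datetime[
    bait$plot_station_year == interp_b$plot_station_year[i] &
      bait$bait_datetime < interp_b$obs_datetime[i]
  ]
  # no earlier bait for this plot-station-year: leave the row as NA
  if (length(earlier) == 0) NA_real_ else as.numeric(max(earlier))
}, numeric(1))

# convert back to a date-time column in UTC
interp_b$latestbait_datetime <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
interp_b
```

Because the result is rebuilt with tz = "UTC", the bait times print in UTC rather than in your local time zone.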
You could perform an outer join with data.table, and then select the highest bait_datetime below obs_datetime for each plot_station_year.
Edit: I edited my answer to reflect the possibility that there could be multiple obs_datetime values for a given plot_station_year in interp2. To preserve these, we index them and include the index in the filtering step.
One potential improvement for large files (not tested) would be a rolling join (roll) instead of an outer merge followed by a filter.
That version is shown at the end of the reproducible example:
library(data.table)
interp2 <- structure(list(plot_station_year = c("Cow_C2_2019", "Cow_C2_2019", "RidingStable_C3_2018",
"Raf_C1_2018", "Metcalfe_C2_2019"), obs_datetime = structure(c(1559487600, 1559487300,
1544954400, 1515160800, 1567756800), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), latestbait_datetime = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = c("POSIXct", "POSIXt"))), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -5L))
bait2 <- structure(list(plot_station_year = c("Cow_C2_2019", "Cow_C2_2019", "Cow_C2_2019",
"RidingStable_C3_2018", "Raf_C1_2018"), bait_datetime = structure(c(1557500400,
1496674800, 1576674800, 1543676400, 1491318000), class = c("POSIXct", "POSIXt"
), tzone = "UTC")), class = c("spec_tbl_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -5L), spec = structure(list(
cols = list(plot_station_year = structure(list(), class = c("collector_character",
"collector")), bait_datetime = structure(list(format = "%d-%m-%Y %H:%M"), class = c("collector_datetime",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
# add index idx by plot_station_year, remove empty column, set keys
setDT(interp2)[, "latestbait_datetime" := NULL][, idx := 1:.N, by=plot_station_year]
setkeyv(interp2, c("plot_station_year", "idx", "obs_datetime"))
# same for bait2: set as data.table, set keys
setDT(bait2, key=c("plot_station_year", "bait_datetime"))
## option 1: merge files, then filter
# outer join on interp2 and bait2 on first column (and order by bait_datetime)
expected_out <- merge(interp2, bait2, by="plot_station_year", all=TRUE)
# set keys for sorting
setkey(expected_out, plot_station_year, idx, bait_datetime)
# select highest bait_datetime below obs_datetime by plot_station_year and idx
expected_out <- expected_out[is.na(bait_datetime) | bait_datetime < obs_datetime][,
tail(.SD, 1), by=.(plot_station_year, idx)]
# rename and sort columns
setnames(expected_out, old="bait_datetime", new="latestbait_datetime")
setorder(expected_out, -latestbait_datetime, idx, na.last = TRUE)[]
#> plot_station_year idx obs_datetime latestbait_datetime
#> 1: Cow_C2_2019 1 2019-06-02 15:00:00 2019-05-10 15:00:00
#> 2: Cow_C2_2019 2 2019-06-02 14:55:00 2019-05-10 15:00:00
#> 3: RidingStable_C3_2018 1 2018-12-16 10:00:00 2018-12-01 15:00:00
#> 4: Raf_C1_2018 1 2018-01-05 14:00:00 2017-04-04 15:00:00
#> 5: Metcalfe_C2_2019 1 2019-09-06 08:00:00 <NA>
## option 2 (might use less memory): rolling join
bait2[, latestbait_datetime := bait_datetime]
out_alt <- bait2[interp2, .(plot_station_year, obs_datetime, idx, latestbait_datetime),
on=c("plot_station_year", "bait_datetime==obs_datetime"), roll=Inf]
# order
setorder(out_alt, -latestbait_datetime, idx, na.last = TRUE)[]
#> plot_station_year obs_datetime idx latestbait_datetime
#> 1: Cow_C2_2019 2019-06-02 15:00:00 1 2019-05-10 15:00:00
#> 2: Cow_C2_2019 2019-06-02 14:55:00 2 2019-05-10 15:00:00
#> 3: RidingStable_C3_2018 2018-12-16 10:00:00 1 2018-12-01 15:00:00
#> 4: Raf_C1_2018 2018-01-05 14:00:00 1 2017-04-04 15:00:00
#> 5: Metcalfe_C2_2019 2019-09-06 08:00:00 1 <NA>
setcolorder(out_alt, c(1,3,2,4))[]
#> plot_station_year idx obs_datetime latestbait_datetime
#> 1: Cow_C2_2019 1 2019-06-02 15:00:00 2019-05-10 15:00:00
#> 2: Cow_C2_2019 2 2019-06-02 14:55:00 2019-05-10 15:00:00
#> 3: RidingStable_C3_2018 1 2018-12-16 10:00:00 2018-12-01 15:00:00
#> 4: Raf_C1_2018 1 2018-01-05 14:00:00 2017-04-04 15:00:00
#> 5: Metcalfe_C2_2019 1 2019-09-06 08:00:00 <NA>
## test that both options give the same result:
identical(expected_out, out_alt)
#> [1] TRUE
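To see what roll = Inf does in isolation, here is a toy sketch with made-up numbers in place of date-times (baits, obs, site, t and bait_t are all invented names, not part of the data above). For each row of obs, the join picks the row of baits in the same site with the largest t that is not after the observation, and returns NA when there is none.

```r
library(data.table)

# made-up bait times; bait_t is a copy of t so the matched time survives the join
baits <- data.table(site = c("a", "a", "b"), t = c(10, 20, 5))
baits[, bait_t := t]

# made-up observation times
obs <- data.table(site = c("a", "a", "b", "c"), t = c(15, 25, 3, 7))

# exact match on site, rolling (last value carried forward) match on t
res <- baits[obs, on = .(site, t), roll = Inf]
res
```

One difference from the merge-then-filter version is worth keeping in mind: the rolling join also matches a bait time that is exactly equal to the observation time, whereas the filter above uses a strict `<`.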
Error in seq.Date(as.Date(retail$Valid_from), as.Date(retail$Valid_to), :
'from' must be of length 1
I have tried both the methods as mentioned in the question :
How should I deal with 'from' must be of length 1 error?
I basically want to repeat the quantity for each day in a given date range :
HSD_RSP Valid_from Valid_to
70 1/1/2018 15/1/2018
80 1/16/2018 1/31/2018
.
.
.
Method 1 :
byDay = ddply(retail, .(HSD_RSP), transform,
day=seq(as.Date(retail$Valid_from), as.Date(retail$Valid_to), by="day"))
Method 2 :
dt <- data.table(retail)
dt <- dt[,seq(as.Date(Valid_from),as.Date(Valid_to),by="day"),
by=list(HSD_RSP)]
HSD_RSP final_date
70 1/1/2018
70 2/1/2018
70 3/1/2018
70 4/1/2018
.
.
.
output of
dput(head(retail))
structure(list(HSD_RSP = c(61.68, 62.96, 63.14, 60.51, 60.34,
61.63), Valid_from = structure(c(1483315200, 1484524800, 1487116800,
1491004800, 1491523200, 1492300800), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Valid_to = structure(c(1484438400, 1487030400,
1490918400, 1491436800, 1492214400, 1493510400), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
Convert to date, create a sequence of dates between Valid_from and Valid_to and unnest
library(tidyverse)
df %>%
mutate_at(vars(starts_with("Valid")), as.Date, "%m/%d/%Y") %>%
mutate(Date = map2(Valid_from, Valid_to, seq, by = "1 day")) %>%
unnest(Date) %>%
select(-Valid_from, -Valid_to)
# HSD_RSP Date
# <int> <date>
# 1 70 2018-01-01
# 2 70 2018-01-02
# 3 70 2018-01-03
# 4 70 2018-01-04
# 5 70 2018-01-05
# 6 70 2018-01-06
# 7 70 2018-01-07
# 8 70 2018-01-08
# 9 70 2018-01-09
#10 70 2018-01-10
# … with 21 more rows
data
df <- structure(list(HSD_RSP = c(70L, 80L), Valid_from = structure(1:2,
.Label = c("1/1/2018", "1/16/2018"), class = "factor"), Valid_to =
structure(1:2, .Label = c("1/15/2018", "1/31/2018"), class = "factor")),
class = "data.frame", row.names = c(NA, -2L))
Using Ronak Shah's data structure, using data.table:
library(data.table)
dt <- as.data.table(df)
dt[, .(final_date = seq(as.Date(Valid_from, "%m/%d/%Y"), as.Date(Valid_to, "%m/%d/%Y"), by = "day")),
by = HSD_RSP]
HSD_RSP final_date
1: 70 2018-01-01
2: 70 2018-01-02
3: 70 2018-01-03
4: 70 2018-01-04
....
data:
df <- structure(list(HSD_RSP = c(70L, 80L), Valid_from = structure(1:2,
.Label = c("1/1/2018", "1/16/2018"), class = "factor"), Valid_to =
structure(1:2, .Label = c("1/15/2018", "1/31/2018"), class = "factor")),
class = "data.frame", row.names = c(NA, -2L))
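As for the error itself: seq() wants a single start and a single end, so it fails as soon as it is handed a whole column. A sketch for the POSIXct columns in the dput output, assuming it is acceptable to group by row number rather than by HSD_RSP (which would break if a price repeats); only the first two rows of retail are re-typed here, and row_id and daily are names I made up:

```r
library(data.table)

# first two rows of the question's dput output, re-typed by hand
retail <- data.table(
  HSD_RSP    = c(61.68, 62.96),
  Valid_from = as.POSIXct(c("2017-01-02", "2017-01-16"), tz = "UTC"),
  Valid_to   = as.POSIXct(c("2017-01-15", "2017-02-14"), tz = "UTC")
)

# one group per row, so seq() receives exactly one 'from' and one 'to' each time
daily <- retail[
  , .(HSD_RSP,
      final_date = seq(as.Date(Valid_from), as.Date(Valid_to), by = "day")),
  by = .(row_id = seq_len(nrow(retail)))
]
daily
```

The length-one HSD_RSP is recycled against the date sequence inside each group, so every day in a range carries the price of that range.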
I have two data frames with timestamps (in as.POSIXct, format="%Y-%m-%d %H:%M:%S") as below.
df_ID1
ID DATETIME TIMEDIFF EV
A 2019-03-26 06:13:00 2019-03-26 00:13:00 1
B 2019-04-03 08:00:00 2019-04-03 02:00:00 1
B 2019-04-04 12:35:00 2019-04-04 06:35:00 1
df_ID0
ID DATETIME
A 2019-03-26 00:02:00
A 2019-03-26 04:55:00
A 2019-03-26 11:22:00
B 2019-04-02 20:43:00
B 2019-04-04 11:03:00
B 2019-04-06 03:12:00
For each row of df_ID1, I want to find the DATETIME in df_ID0 that has the same ID and is "smaller than but closest to" the DATETIME in df_ID1.
For each matched pair, I then want to compare the TIMEDIFF in df_ID1 with the matched DATETIME in df_ID0: if TIMEDIFF is greater than the matched DATETIME, change EV from 1 to 4 in df_ID1.
My desired result is
df_ID1
ID DATETIME TIMEDIFF EV
A 2019-03-26 06:13:00 2019-03-26 00:13:00 1
B 2019-04-03 08:00:00 2019-04-03 02:00:00 4
B 2019-04-04 12:35:00 2019-04-04 06:35:00 1
I've checked how to compare timestamps and calculate the time difference, and also how to change values based on criteria...
But I cannot find anything that selects the "smaller than but closest to" timestamp, and I cannot figure out how to apply all this logic either.
Any help would be appreciated!
You can do this with a for loop, keeping in mind that if your actual data set is very big, the loop will perform badly. Here df_1 and df_0 are my names for your df_ID1 and df_ID0.
for(i in 1:nrow(df_1)){
sub <- subset(df_0, ID == df_1$ID[i]) # filter on ID
df_0_dt <- max(sub[sub$DATETIME < df_1$DATETIME[i],]$DATETIME) # Take max of those with DATETIME less than (ie less than but closest to)
if(df_0_dt < df_1$TIMEDIFF[i]){ # final condition
df_1[i, "EV"] <- 4
}
}
df_1
# A tibble: 3 x 4
ID DATETIME TIMEDIFF EV
<chr> <dttm> <dttm> <dbl>
1 A 2019-03-26 06:13:00 2019-03-26 00:13:00 1
2 B 2019-04-03 08:00:00 2019-04-03 02:00:00 4
3 B 2019-04-04 12:35:00 2019-04-04 06:35:00 1
One option, using nested mapply, is to first split df_ID1 and df_ID0 by ID, then calculate the time difference between each DATETIME in df_ID1 and those in df_ID0 with the same ID. Get the index of the "smaller than but closest to" value, store it in inds, and change EV to 4 if the corresponding TIMEDIFF is greater than the matched DATETIME value.
df_ID1$EV[unlist(mapply(function(x, y) {
mapply(function(p, q) {
vals = as.numeric(difftime(p, y$DATETIME))
inds = which(vals == min(vals[vals > 0]))
q > y$DATETIME[inds]
}, x$DATETIME, x$TIMEDIFF)
}, split(df_ID1, df_ID1$ID), split(df_ID0, df_ID0$ID)))] <- 4
df_ID1
# ID DATETIME TIMEDIFF EV
#1 A 2019-03-26 06:13:00 2019-03-26 00:13:00 1
#2 B 2019-04-03 08:00:00 2019-04-03 02:00:00 4
#3 B 2019-04-04 12:35:00 2019-04-04 06:35:00 1
data
df_ID0 <- structure(list(ID = structure(c(1L, 1L, 1L, 2L, 2L, 2L),
.Label = c("A",
"B"), class = "factor"), DATETIME = structure(c(1553529720, 1553547300,
1553570520, 1554208980, 1554346980, 1554491520), class = c("POSIXct",
"POSIXt"), tzone = "")), row.names = c(NA, -6L), class = "data.frame")
df_ID1 <- structure(list(ID = structure(c(1L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), DATETIME = structure(c(1553551980, 1554249600,
1554352500), class = c("POSIXct", "POSIXt"), tzone = ""), TIMEDIFF =
structure(c(1553530380,
1554228000, 1554330900), class = c("POSIXct", "POSIXt"), tzone = ""),
EV = c(1, 1, 1)), row.names = c(NA, -3L), class = "data.frame")
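For larger data, a rolling join in data.table might be worth trying for the "smaller than but closest to" lookup. The sketch below is untested on real data and rests on a few assumptions: the timestamps are re-typed from the question and read as UTC, and ev, ref, prev_time and matched are names I invented. Note that roll = Inf also accepts an exactly equal timestamp, which differs slightly from a strict "smaller than".

```r
library(data.table)

# df_ID1 from the question, re-typed (time zone assumed to be UTC)
ev <- data.table(
  ID = c("A", "B", "B"),
  DATETIME = as.POSIXct(c("2019-03-26 06:13:00", "2019-04-03 08:00:00",
                          "2019-04-04 12:35:00"), tz = "UTC"),
  TIMEDIFF = as.POSIXct(c("2019-03-26 00:13:00", "2019-04-03 02:00:00",
                          "2019-04-04 06:35:00"), tz = "UTC"),
  EV = c(1, 1, 1)
)

# df_ID0 from the question, re-typed the same way
ref <- data.table(
  ID = c("A", "A", "A", "B", "B", "B"),
  DATETIME = as.POSIXct(c("2019-03-26 00:02:00", "2019-03-26 04:55:00",
                          "2019-03-26 11:22:00", "2019-04-02 20:43:00",
                          "2019-04-04 11:03:00", "2019-04-06 03:12:00"), tz = "UTC")
)

# keep a copy of the reference time, because the join column takes ev's values
ref[, prev_time := DATETIME]

# for each row of ev: latest ref row with the same ID at or before ev's DATETIME
matched <- ref[ev, on = .(ID, DATETIME), roll = Inf]

# set EV to 4 where TIMEDIFF is later than the matched reference time
matched[!is.na(prev_time) & TIMEDIFF > prev_time, EV := 4]
matched[, .(ID, DATETIME, TIMEDIFF, EV)]
```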
I have 3 data frames, df1 = a time interval, df2 = list of IDs, df3 = list of IDs with associated date.
df1 <- structure(list(season = structure(c(2L, 1L), .Label = c("summer",
"winter"), class = "factor"), mindate = structure(c(1420088400,
1433131200), class = c("POSIXct", "POSIXt")), maxdate = structure(c(1433131140,
1448945940), class = c("POSIXct", "POSIXt")), diff = structure(c(150.957638888889,
183.040972222222), units = "days", class = "difftime")), .Names = c("season",
"mindate", "maxdate", "diff"), row.names = c(NA, -2L), class = "data.frame")
df2 <- structure(list(ID = c(23796, 23796, 23796)), .Names = "ID", row.names = c(NA,
-3L), class = "data.frame")
df3 <- structure(list(ID = c("23796", "123456", "12134"), time = structure(c(1420909920,
1444504500, 1444504500), class = c("POSIXct", "POSIXt"), tzone = "US/Eastern")), .Names = c("ID",
"time"), row.names = c(NA, -3L), class = "data.frame")
The code should check whether df2$ID == df3$ID. If true, and if df3$time >= df1$mindate and df3$time <= df1$maxdate, then the result is df1$maxdate - df3$time; otherwise it is df1$maxdate - df1$mindate. I tried using the ifelse function. This works when I manually specify specific cells, but that is not what I want, as I have many more rows (of uneven number) in each of the dfs.
df1$result <- ifelse(df2[1,1] == df3[1,1] & df3[1,2] >= df1$mindate & df3[1,2] <= df1$maxdate,
difftime(df1$maxdate,df3[1,2],units="days"),
difftime(df1$maxdate,df1$mindate,units="days"))
EDIT: The desired output is (when removing last row of df2):
season mindate maxdate diff result
1 winter 2015-01-01 2015-05-31 23:59:00 150.9576 days 141.9576
2 summer 2015-06-01 2015-11-30 23:59:00 183.0410 days 183.0410
Any ideas? I don't see how I could merge dfs to make them of the same length. Note that df2 can be of any row length and not affect the code. Issues arise when df1 and df3 differ in # of rows.
The > and < are vectorized:
transform(df1,result=ifelse(df3$ID%in%df2$ID & df3$time>mindate & df3$time <maxdate, difftime(maxdate,df3$time),difftime(maxdate,mindate)))
season mindate maxdate diff result
1 winter 2014-12-31 21:00:00 2015-05-31 20:59:00 150.9576 days 141.9576
2 summer 2015-05-31 21:00:00 2015-11-30 20:59:00 183.0410 days 183.0410
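A small standalone sketch of what "vectorized" means here, using made-up times in UTC rather than the df1/df3 objects above (times, inside, to_max and full are invented names). The comparisons run element by element, and ifelse then picks one value per position. Since ifelse drops the difftime class, it seems safer to fix units = "days" explicitly and convert to plain numbers before combining the two branches.

```r
# made-up event times: the first falls inside its interval, the second does not
times   <- as.POSIXct(c("2015-01-10 12:00:00", "2015-12-25 12:00:00"), tz = "UTC")
mindate <- as.POSIXct(c("2015-01-01 00:00:00", "2015-06-01 00:00:00"), tz = "UTC")
maxdate <- as.POSIXct(c("2015-05-31 23:59:00", "2015-11-30 23:59:00"), tz = "UTC")

# element-wise comparison: one TRUE/FALSE per position
inside <- times > mindate & times < maxdate

# both branches in days, as plain numbers
to_max <- as.numeric(difftime(maxdate, times, units = "days"))
full   <- as.numeric(difftime(maxdate, mindate, units = "days"))

# per position: days from the event to maxdate, or the full interval length
result <- ifelse(inside, to_max, full)
result
```

All three vectors have the same length in this sketch; when they differ in length, R recycles the shorter one, which is easy to overlook.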
You can also use the between function from data.table library
library(data.table)
transform(df1,result=ifelse(df3$ID%in%df2$ID&df3$time%between%df1[2:3],
difftime(maxdate,df3$time),difftime(maxdate,mindate)))
season mindate maxdate diff result
1 winter 2014-12-31 21:00:00 2015-05-31 20:59:00 150.9576 days 141.9576
2 summer 2015-05-31 21:00:00 2015-11-30 20:59:00 183.0410 days 183.0410
I have one text file that looks like:
wd <- read.table("C:\\Users\\value.txt", sep ='' , header =TRUE)
head(wd) # hourly values
# Year day hour mint valu1
# 1 2002 1 7 30 0.5
# 2 2002 1 8 0 0.3
# 3 2002 1 8 30 0.4
I want to add another column with a date format like this:
"2002-01-01 07:30:00 UTC"
Thanks for your help
Try this. No packages are used:
transform(wd,
Date = as.POSIXct(paste(Year, day, hour, mint), format = "%Y %j %H %M", tz = "UTC")
)
## Year day hour mint valu1 Date
## 1 2002 1 7 30 0.5 2002-01-01 07:30:00
## 2 2002 1 8 0 0.3 2002-01-01 08:00:00
## 3 2002 1 8 30 0.4 2002-01-01 08:30:00
Note: Input is:
wd <- structure(list(Year = c(2002L, 2002L, 2002L), day = c(1L, 1L,
1L), hour = c(7L, 8L, 8L), mint = c(30L, 0L, 30L), valu1 = c(0.5,
0.3, 0.4)), .Names = c("Year", "day", "hour", "mint", "valu1"
), class = "data.frame", row.names = c(NA, -3L))
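The part doing the work is %j, which reads the day of the year (1 to 366). As a quick check on a single made-up value (year 2006, day 361, 14:30), with x being just an illustrative name:

```r
# %j parses the day of the year, so day 361 of 2006 is 27 December
x <- as.POSIXct("2006 361 14 30", format = "%Y %j %H %M", tz = "UTC")
format(x, "%Y-%m-%d %H:%M:%S")
# "2006-12-27 14:30:00"
```

Setting tz = "UTC" keeps the result independent of the machine's local time zone.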
You might be able to simplify things with a package like lubridate, but this should be enough to illustrate the solution. Next time, it would save time for people answering if you provided code to create the sample data, as I've done here.
d <- read.table(header=T, stringsAsFactors=F, text="
Year day hour mint valu1
2002 1 7 30 0.5
2002 1 8 0 0.3
2002 1 8 30 0.4
")
require(stringr)
d$datetime <- strptime(
paste0(
d$Year, "-",
str_pad(d$day,3,pad="0"),
str_pad(d$hour,2,pad="0"),
":",
str_pad(d$mint, 2, pad="0")
),
format="%Y-%j %H:%M"
)