When I turn the date vector into a data frame, the dates turn into integers. These integers have a very strange format and I cannot work out how to turn them back into dates (including seconds).
vdates <- seq(c(ISOdate(2018,10,01)), by = "3 sec", length.out = 200000)
dfDates <- as.data.frame((matrix(vdates)))
dfDates$id <- 1:nrow(dfDates)
colnames(dfDates) <- c("Dates", "Id")
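The numbers in the Dates column are most likely the seconds since 1970-01-01 UTC: matrix() drops the POSIXct class and keeps only the underlying values. A minimal sketch, assuming a much shorter sequence than in the question (10 values, purely for illustration), of two ways around this: build the data frame directly from the vector, or convert the raw seconds back with as.POSIXct().

```r
# Shorter sequence than in the question, purely for illustration
vdates <- seq(ISOdate(2018, 10, 1), by = "3 sec", length.out = 10)

# Option 1: build the data frame directly so the POSIXct class is kept
dfDates <- data.frame(Dates = vdates, Id = seq_along(vdates))
class(dfDates$Dates)

# What matrix() leaves behind: plain seconds since 1970-01-01 UTC
secs <- as.vector(matrix(vdates))

# Option 2: convert the raw seconds back to date-times
restored <- as.POSIXct(secs, origin = "1970-01-01", tz = "GMT")
format(restored[2], "%Y-%m-%d %H:%M:%S")
```

ISOdate() defaults to the GMT time zone, which is why tz = "GMT" is used when restoring the values.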
I have created a sequence of dates with this script:
dates<-seq(
from=as.POSIXct("2015-1-1 0","%Y-%m-%d %H", tz="UTC"),
to=as.POSIXct("2015-12-31 24", "%Y-%m-%d %H", tz="UTC"),
by="hour"
)
Now I want to store the result in the first column of an empty data frame:
df<-data.frame(Date=as.POSIXct(character()),Area=character(), Application=character(), Type= character(),
Reading=double())
using this code
df$Date<-dates
but it gives me an error:
Error in `$<-.data.frame`(`*tmp*`, "Date", value = c(1420070400, 1420074000, :
replacement has 8761 rows, data has 0
Can anyone help me sort out this issue, please?
A data.frame needs columns of equal length: it cannot have one column with 8761 observations while the others have 0. A workaround is to initialize a data.frame with the correct dimensions for your data, filled with NA, and then assign the columns.
# Initialize df
df <- data.frame(matrix(NA, nrow = length(dates), ncol = 5))
# Define names of cols and add column
names(df) <- c("Date", "Area", "Application", "Type", "Reading")
df$Date <- dates
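Note that the NA matrix above produces logical columns. As an alternative sketch (not the answer's method), the data frame can be built in one step from the date vector, with typed NA placeholders that data.frame() recycles to the right length. The end point is written as 2016-01-01 00:00 here, on the assumption that this is what "2015-12-31 24" in the question resolves to.

```r
dates <- seq(
  from = as.POSIXct("2015-01-01 00:00", tz = "UTC"),
  to   = as.POSIXct("2016-01-01 00:00", tz = "UTC"),
  by   = "hour"
)

# Length-one typed NAs are recycled to length(dates) rows
df <- data.frame(
  Date        = dates,
  Area        = NA_character_,
  Application = NA_character_,
  Type        = NA_character_,
  Reading     = NA_real_,
  stringsAsFactors = FALSE
)
str(df)
```

This keeps the intended column types (character and double), so later assignments do not have to change them.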
I have a data frame with daily data. I need to bind it to hourly data, but first I need to convert it to a suitable POSIXct format. It looks like this:
set.seed(42)
df <- data.frame(
Date = seq.Date(from = as.Date("2015-01-01", "%Y-%m-%d"), to = as.Date("2015-01-29", "%Y-%m-%d"), by = "day"),
var1 = runif(29, min = 5, max = 10)
)
result <- data.frame(
Date = seq.POSIXt(from = as.POSIXct("2015-01-01 00:00:00", "%Y-%m-%d %H:%M:%S", tz = ""),
to = as.POSIXct("2015-01-29 23:00:00", "%Y-%m-%d %H:%M:%S", tz = ""), by = "hour"),
var1 = rep(df$var1, each = 24) )
However, my data is not as easy to work with as the above. I have lots of missing dates, so I need to be able to take the specific df$Date vector and convert it to an hourly POSIXct sequence, with the matching daily values.
I've looked high and low but been unable to find anything on this.
The way I went about this was to find the min and max of the dataset and deem them hour 0 and hour 23.
hourly <- data.frame(Hourly=seq(min(as.POSIXct(paste(df$Date, "00:00:00"),tz="")),max(as.POSIXct(paste(df$Date, "23:00:00"),tz="")),by="hour"))
hourly[,"Var1"] <- df[match(x = as.Date(hourly$Hourly),df$Date),"var1"]
This turns the daily values into hourly ones, with each day's var1 assigned to every hour of that day. Missing daily values should therefore not be an issue: where there is no match, the result is NA.
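To make the missing-date behaviour concrete, here is a small sketch of the same match() idea on hypothetical data (the data frame daily and its values are made up for illustration). It assumes UTC throughout, because as.Date() on a POSIXct value converts in UTC by default, and mixing that with a local time zone can shift hours onto the wrong day.

```r
# Hypothetical daily data with 2015-01-03 missing
daily <- data.frame(
  Date = as.Date(c("2015-01-01", "2015-01-02", "2015-01-04")),
  var1 = c(5.5, 7.25, 9)
)

# Hourly grid from the first day at 00:00 to the last day at 23:00 (UTC)
hourly <- data.frame(
  Hourly = seq(
    from = as.POSIXct(paste(min(daily$Date), "00:00:00"), tz = "UTC"),
    to   = as.POSIXct(paste(max(daily$Date), "23:00:00"), tz = "UTC"),
    by   = "hour"
  )
)

# Look up each hour's calendar day in the daily data; unmatched days give NA
hourly$var1 <- daily$var1[match(as.Date(hourly$Hourly, tz = "UTC"), daily$Date)]

# Hours on the missing day carry NA
table(is.na(hourly$var1))
```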
Let's say I have a data frame consisting of 3 columns with dates:
index <- c("31.10.2012", "16.06.2012")
begin <- c("22.10.2012", "29.05.2012")
end <- c("24.10.2012", "17.06.2012")
index.new <- as.Date(index, format = "%d.%m.%Y")
begin.new <- as.Date(begin, format = "%d.%m.%Y")
end.new <- as.Date(end, format = "%d.%m.%Y")
data.frame(index.new, begin.new, end.new)
My problem: I want to select (subset) the rows, where the interval of begin and end-date is within 4 days before the index-day. This is obviously only in row no 2.
Can you help me out here?
The way you express the problem is messy: in the first case dates.new[1]>dates.new[2], and in the second case dates.new[3]<dates.new[4]. Putting the intervals in order:
interval1 = c(dates.new[2], dates.new[1])
interval2 = c(dates.new[3],dates.new[4])
If you want to check whether interval2 CONTAINS interval1:
all.equal(findInterval(interval1, interval2),c(1,1))
Please let me know if this works and if it is what you want.
library("timeDate")
index <- c("31.10.2012", "16.06.2012")
begin <- c("22.10.2012", "29.05.2012")
end <- c("24.10.2012", "17.06.2012")
index.new <- as.Date(index, format = "%d.%m.%Y")
begin.new <- as.Date(begin, format = "%d.%m.%Y")
end.new <- as.Date(end, format = "%d.%m.%Y")
data <- data.frame(index.new, begin.new, end.new)
apply(data, 1, function(x){paste(x[1]) %in% paste(timeSequence(x[2], x[3], by = "day"))})
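For comparison, a base R sketch that needs no extra package. It rests on one assumed reading of the question: a row is kept when its [begin, end] interval overlaps the window from 4 days before the index day up to the index day. The names dat and keep are introduced here for illustration only.

```r
index <- as.Date(c("31.10.2012", "16.06.2012"), format = "%d.%m.%Y")
begin <- as.Date(c("22.10.2012", "29.05.2012"), format = "%d.%m.%Y")
end   <- as.Date(c("24.10.2012", "17.06.2012"), format = "%d.%m.%Y")
dat <- data.frame(index, begin, end)

# Assumed condition: [begin, end] overlaps [index - 4 days, index]
keep <- dat$begin <= dat$index & dat$end >= dat$index - 4

# Subset the rows that satisfy the condition
dat[keep, ]
```

With the example dates, only the second row satisfies the condition, which agrees with the expectation stated in the question.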
I am subtracting dates in xts i.e.
library(xts)
# make data
x <- data.frame(x = 1:4,
BDate = c("1/1/2000 12:00","2/1/2000 12:00","3/1/2000 12:00","4/1/2000 12:00"),
CDate = c("2/1/2000 12:00","3/1/2000 12:00","4/1/2000 12:00","9/1/2000 12:00"),
ADate = c("3/1/2000","4/1/2000","5/1/2000","10/1/2000"),
stringsAsFactors = FALSE)
x$ADate <- as.POSIXct(x$ADate, format = "%d/%m/%Y")
# object we will use
xxts <- xts(x[, 1:3], order.by= x[, 4] )
#### The subtractions
# answer in days
transform(xxts, lag = as.POSIXct(BDate, format = "%d/%m/%Y %H:%M") - index(xxts))
# answer in hours
transform(xxts, lag = as.POSIXct(CDate, format = "%d/%m/%Y %H:%M") - index(xxts))
Question: How can I standardise the result so that I always get the answer in hours? Not by multiplying the days by 24, as I will not know beforehand whether the subtraction will be reported in days or hours.
Unless I can somehow check whether the result is in days, perhaps using grep and a regex, and then multiply within an if clause.
I have tried to work through this and went for the grep/regex approach, but this doesn't even keep the negative sign:
p <- transform(xxts, lag = as.POSIXct(BDate, format = "%d/%m/%Y %H:%M") - index(xxts))
library(stringr)
ind <- grep("days", p$lag)
p$lag[ind] <- as.numeric( str_extract_all(p$lag[ind], "\\(?[0-9,.]+\\)?")) * 24
p$lag
#2000-01-03 2000-01-04 2000-01-05 2000-01-10
# 36 36 36 132
I am convinced there is a more elegant solution...
OK, difftime works:
transform(xxts, lag = difftime(as.POSIXct(BDate, format = "%d/%m/%Y %H:%M"), index(xxts), units = "hours"))
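The reason the plain subtraction switches between days and hours is that `-` on date-times picks its units automatically from the size of the difference. A small sketch outside xts, using two made-up timestamps (t1 and t2 are illustrative names), shows the difference between the automatic units and an explicit units argument:

```r
t1 <- as.POSIXct("2000-01-01 12:00", tz = "UTC")
t2 <- as.POSIXct("2000-01-03 00:00", tz = "UTC")

# Plain subtraction chooses the units automatically
auto_units <- t1 - t2
units(auto_units)

# difftime() with explicit units always reports hours and keeps the sign
in_hours <- difftime(t1, t2, units = "hours")
as.numeric(in_hours)

# An existing difftime can also be converted when extracting the number
as.numeric(auto_units, units = "hours")
```

Either route avoids parsing the printed output, so the negative sign is preserved.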
I have a data.frame containing a time series. I would like to thin the data by keeping only the entries measured on even-numbered days. For example:
set.seed(1)
RandData <- rnorm(100,sd=20)
Locations <- rep(c('England','Wales'),each=50)
today <- Sys.Date()
dseq <- (seq(today, by = "1 days", length = 100))
Date <- as.POSIXct(dseq, format = "%Y-%m-%d")
Final <- data.frame(Loc = Locations,
Doy = as.numeric(format(Date,format = "%j")),
Temp = RandData)
So, how would I reduce this data frame to contain only the entries measured on even-numbered days, i.e. Loc, Doy and Temp on day 172, day 174 and so on?
What about:
Final[Final$Doy%%2==0,]
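A reproducible sketch of the same filter, assuming a fixed start date instead of Sys.Date() and a made-up Temp column so the result does not depend on when the code is run:

```r
# Fixed start date (an assumption, so the result is reproducible)
dseq <- seq(as.Date("2021-06-20"), by = "1 day", length.out = 10)

Final <- data.frame(
  Date = dseq,
  Doy  = as.numeric(format(dseq, "%j")),  # day of year
  Temp = seq_along(dseq)                  # placeholder values
)

# Keep only rows whose day of year is even
even_days <- Final[Final$Doy %% 2 == 0, ]
even_days$Doy
```

One caveat: day-of-year parity does not alternate across a year boundary (day 365 and day 1 are both odd), so a series spanning New Year will have two consecutive days dropped there.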