I would like to load the following data structure into R as a time series:
Date 06:00 07:00 .... 22:00
01.11.2011 1 4 .... 42
02.11.2011 6 2 .... 21
...
Can R load this directly, or do I need to transform my data first?
Can anybody help me with this?
First create some data:
Lines <- "Date 06:00 07:00 08:00
01.11.2011 1 4 42
02.11.2011 6 2 21"
DF <- read.table(text = Lines, header = TRUE, check.names = FALSE)
Now create zoo object z using chron date/times:
library(zoo)
library(chron)
tt <- as.chron(outer(DF[[1]], names(DF)[-1], paste), format = "%d.%m.%Y %H:%M")
z <- zoo(c(as.matrix(DF[-1])), tt)
(Replacing as.chron with as.POSIXct would give POSIXct date/times.)
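As a rough sketch of the POSIXct variant mentioned above, the same reshaping can be done in base R alone, without zoo or chron. The names stamps and long, the explicit tz = "UTC" and the colClasses argument are assumptions added for this illustration and are not part of the original answer.

```r
# Same toy data as above, read with the date column kept as character
Lines <- "Date 06:00 07:00 08:00
01.11.2011 1 4 42
02.11.2011 6 2 21"
DF <- read.table(text = Lines, header = TRUE, check.names = FALSE,
                 colClasses = c("character", rep("numeric", 3)))

# Paste every date with every time-of-day column name (rows = dates, cols = times)
stamps <- outer(DF[[1]], names(DF)[-1], paste)

# c() flattens column by column, which matches c(as.matrix(DF[-1]))
tt <- as.POSIXct(c(stamps), format = "%d.%m.%Y %H:%M", tz = "UTC")
long <- data.frame(time = tt, value = c(as.matrix(DF[-1])))

# Put the observations in chronological order
long <- long[order(long$time), ]
```

The result is a plain two-column data frame in time order, which can be handed to zoo() or another time series constructor if needed.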
I want to make a time series of how often each date and time is observed. The raw data looks something like this:
dd-mm-yyyy hh:mm
28-2-2018 0:12
28-2-2018 11:16
28-2-2018 12:12
28-2-2018 13:22
28-2-2018 14:23
28-2-2018 14:14
28-2-2018 16:24
The date and time format is not one that R parses by default, so I had to convert it:
extracted_times <- as.POSIXct(bedrijf.CSV$viewed_at, format = "%d-%m-%Y %H:%M")
I ordered the data with frequency in a table using the following code:
timeserieswithoutzeros <- table(extracted_times)
The data looks something like this now:
2018-02-28 00:11:00 2018-02-28 01:52:00 2018-02-28 03:38:00
1 2 5
2018-02-28 04:10:00 2018-02-28 04:40:00 2018-02-28 04:45:00
2 1 1
As you may see there are a lot of unobserved dates and times.
I want to add these unobserved dates and times with the frequency of 0.
I tried the complete function, but I get an error saying it can't be used, because I use as.POSIXct().
Any ideas?
As already mentioned in the comments by @eric-lecoutre, you can combine your observations with a sequence beginning at the earliest and ending at the latest date/time using seq, and then subtract 1 from the frequency table.
timeseriesWithzeros <- table(c(extracted_times, seq(min(extracted_times), max(extracted_times), "1 min")))-1
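To see why subtracting 1 works, here is a small sketch on made-up data: every minute in the range appears exactly once in the sequence, so each count is inflated by exactly one. The names obs and grid and the three sample timestamps are assumptions for illustration only.

```r
# Three made-up observations; 00:14 occurs twice, 00:13 never
obs <- as.POSIXct(c("2018-02-28 00:12", "2018-02-28 00:14", "2018-02-28 00:14"),
                  format = "%Y-%m-%d %H:%M", tz = "UTC")

# Every minute between the first and last observation
grid <- seq(min(obs), max(obs), by = "1 min")

# Each minute is counted once via grid, so subtract that extra 1
counts <- table(c(obs, grid)) - 1
```

Here counts holds 1, 0 and 2 for the minutes 00:12, 00:13 and 00:14.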
Maybe the following is what you want.
First, coerce the data to class "POSIXct" and create the sequence of all date/times between min and max in steps of 1 minute.
bedrijf.CSV$viewed_at <- as.POSIXct(bedrijf.CSV$viewed_at, format = "%d-%m-%Y %H:%M")
new <- seq(min(bedrijf.CSV$viewed_at),
max(bedrijf.CSV$viewed_at),
by = "1 mins")
tmp <- data.frame(viewed_at = new)
Now see if these values are in the original data.
tmp$viewed <- tmp$viewed_at %in% bedrijf.CSV$viewed_at
tbl <- xtabs(viewed ~ viewed_at, tmp)
sum(tbl != 0)
#[1] 7
Final clean up.
rm(new, tmp)
I have a data frame (df) like the following:
Date Arrivals
2014-07 100
2014-08 150
2014-09 200
I know that I can convert the yearmon dates to the first date of each month as follows:
df$Date <- as.POSIXct(paste0(as.character(df[,1]),"-01"), format = "%Y-%m-%d")
However, given that my data is not available until the end of the month I want to index it to the end rather than the beginning, and I cannot figure it out. Any help appreciated.
If the Date variable is an actual yearmon class vector, from the zoo package, the as.Date.yearmon method can do what you want via its argument frac.
Using your data, and assuming that the Date was originally a character vector
library("zoo")
df <- data.frame(Date = c("2014-07", "2014-08", "2014-09"),
Arrivals = c(100, 150, 200))
I convert this to a yearmon vector:
df <- transform(df, Date2 = as.yearmon(Date))
Assuming this is what you have, then you can achieve what you want using as.Date() with frac = 1:
df <- transform(df, Date3 = as.Date(Date2, frac = 1))
which gives:
> df
Date Arrivals Date2 Date3
1 2014-07 100 Jul 2014 2014-07-31
2 2014-08 150 Aug 2014 2014-08-31
3 2014-09 200 Sep 2014 2014-09-30
That shows the individual steps. If you only want the final Date this is a one-liner
## assuming `Date` is a `yearmon` object
df <- transform(df, Date = as.Date(Date, frac = 1))
## or if not a `yearmon`
df <- transform(df, Date = as.Date(as.yearmon(Date), frac = 1))
The argument frac is the fraction of the month to assign to the resulting dates when converting from yearmon objects to Date objects. Hence, to get the first day of the month, rather than convert to a character and paste on "-01" as your Question showed, it's better to coerce to a Date object with frac = 0.
If the Date in your df is not a yearmon class object, then you can solve your problem by converting it to one and then using the as.Date() method as described above.
Here is a way to do it using the zoo package.
R code:
library(zoo)
df
# Date Arrivals
# 1 2014-07 100
# 2 2014-08 150
# 3 2014-09 200
df$Date <- as.Date(as.yearmon(df$Date), frac = 1)
# output
# Date Arrivals
# 1 2014-07-31 100
# 2 2014-08-31 150
# 3 2014-09-30 200
Using lubridate, you can add a month and subtract a day to get the last day of the month:
library(lubridate)
ymd(paste0(df$Date, '-01')) + months(1) - days(1)
# [1] "2014-07-31" "2014-08-31" "2014-09-30"
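If neither zoo nor lubridate is available, the same "add a month, subtract a day" idea can be sketched in base R. The helper name month_end is hypothetical and the code assumes the input is a character vector in "YYYY-MM" form.

```r
# Hypothetical helper: last day of the month for "YYYY-MM" strings
month_end <- function(ym) {
  first <- as.Date(paste0(ym, "-01"))            # first day of each month
  ends <- lapply(first, function(d) {
    # second element of a monthly sequence = first day of the next month
    seq(d, by = "month", length.out = 2)[2] - 1  # step back one day
  })
  do.call(c, ends)                               # back to a single Date vector
}

df <- data.frame(Date = c("2014-07", "2014-08", "2014-09"),
                 Arrivals = c(100, 150, 200))
df$Date <- month_end(df$Date)
```

Because the step goes through the first day of the following month, month lengths and leap years are handled by seq() itself.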
I have a column of date datatype in which the dates are in 4/1/2007 format. I want to extract the month value and the date value from that column into separate columns in R. My dates run from 01/01/2012 to 01/01/2015. Please help.
If your variable is date type (as you say in the post) simply use following to extract month:
month_var = format(df$datecolumn, "%m") # this will give output like "09"
month_var = format(df$datecolumn, "%b") # this will give output like "Sep"
month_var = format(df$datecolumn, "%B") # this will give output like "September"
If your date variable is not in date format, then you will have to convert it to date format first.
df$datecolumn <- as.Date(df$datecolumn, format = "%m/%d/%Y")
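Putting the two steps together, a minimal sketch might look like the following. It reads "date value" in the question as the day of the month, which is an assumption, and the sample strings and the column names month and day are made up for illustration.

```r
# Made-up sample in month/day/year form, stored as character
df <- data.frame(datecolumn = c("4/1/2007", "01/01/2012", "12/25/2014"),
                 stringsAsFactors = FALSE)

# Convert the strings to Date first
df$datecolumn <- as.Date(df$datecolumn, format = "%m/%d/%Y")

# format() returns character, so convert to integer for numeric columns
df$month <- as.integer(format(df$datecolumn, "%m"))
df$day   <- as.integer(format(df$datecolumn, "%d"))
```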
Assuming your initial data is character and not POSIX.
df <- data.frame(d = c("4/1/2007", "01/01/2012", "02/01/2015"),
stringsAsFactors = FALSE)
df
# d
# 1 4/1/2007
# 2 01/01/2012
# 3 02/01/2015
These are not yet "dates", just strings.
df$d2 = as.POSIXct(df$d, format = "%m/%d/%Y")
df
# d d2
# 1 4/1/2007 2007-04-01
# 2 01/01/2012 2012-01-01
# 3 02/01/2015 2015-02-01
Now they are proper dates (in the R fashion). These two lines extract just a single component from each "date"; see ?strptime for details on all available formats.
df$dY = format(df$d2, "%Y")
df$dm = format(df$d2, "%m")
df
# d d2 dY dm
# 1 4/1/2007 2007-04-01 2007 04
# 2 01/01/2012 2012-01-01 2012 01
# 3 02/01/2015 2015-02-01 2015 02
An alternative method would be to extract the substrings from each string, but now you're getting into regex-pain; for that, I'd suggest sticking with somebody else's regex lessons-learned, and translate through POSIXct (or even POSIXlt if you want).
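For the POSIXlt route mentioned above, a small sketch: a POSIXlt object exposes its components as list elements, so no format strings or regexes are needed. The names lt and parts are illustrative; note that mon counts from 0 and year counts from 1900.

```r
# Same sample strings as above, parsed to Date and then to POSIXlt
d  <- as.Date(c("4/1/2007", "01/01/2012", "02/01/2015"), format = "%m/%d/%Y")
lt <- as.POSIXlt(d)

parts <- data.frame(
  year  = lt$year + 1900,  # years are stored as an offset from 1900
  month = lt$mon + 1,      # months are stored as 0-11
  day   = lt$mday          # day of the month, 1-31
)
```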
The format of my excel data file is:
day value
01-01-2000 00:00:00 4
01-01-2000 00:01:00 3
01-01-2000 00:02:00 1
01-01-2000 00:04:00 1
I open my file with this:
ts = read.csv(file=pathfile, header=TRUE, sep=",")
How can I add rows with a zero in column “value” for the timestamps that are missing from the data frame? Output example:
day value
01-01-2000 00:00:00 4
01-01-2000 00:01:00 3
01-01-2000 00:02:00 1
01-01-2000 00:03:00 0
01-01-2000 00:04:00 1
This is now completely automated in the padr package. Takes only one line of code.
original <- data.frame(
day = as.POSIXct(c("01-01-2000 00:00:00",
"01-01-2000 00:01:00",
"01-01-2000 00:02:00",
"01-01-2000 00:04:00"), format="%m-%d-%Y %H:%M:%S"),
value = c(4, 3, 1, 1))
library(padr)
library(dplyr) # for the pipe operator
original %>% pad %>% fill_by_value(value)
See vignette("padr") or this blog post for how it works.
I think this is a more general solution, which relies on creating a sequence of all timestamps, using that as the basis for a new data frame, and then filling in your original values in that df where applicable.
# convert original `day` to POSIX
ts$day <- as.POSIXct(ts$day, format="%m-%d-%Y %H:%M:%S", tz="GMT")
# generate a sequence of all minutes in a day
minAsNumeric <- 946684800 + seq(0, 60*60*24 - 60, by=60) # all minutes of your first day (946684800 is 2000-01-01 00:00:00 GMT)
minAsPOSIX <- as.POSIXct(minAsNumeric, origin="1970-01-01", tz="GMT") # convert those minutes to POSIX
# build complete dataframe
newdata <- as.data.frame(minAsPOSIX)
newdata$value <- ts$value[match(newdata$minAsPOSIX, ts$day)] # fill in original `value`s where present (exact matching)
newdata$value[is.na(newdata$value)] <- 0 # replace NAs with 0
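The same "build the full sequence, then fill in" idea can also be sketched with merge(), which avoids hard-coding the epoch offset. The names obs, grid and filled are made up for this illustration, and the data is the four-row sample from the question.

```r
# The sample from the question, with 00:03:00 missing
obs <- data.frame(
  day = c("01-01-2000 00:00:00", "01-01-2000 00:01:00",
          "01-01-2000 00:02:00", "01-01-2000 00:04:00"),
  value = c(4, 3, 1, 1),
  stringsAsFactors = FALSE
)
obs$day <- as.POSIXct(obs$day, format = "%m-%d-%Y %H:%M:%S", tz = "GMT")

# One row per minute between the first and last observation
grid <- data.frame(day = seq(min(obs$day), max(obs$day), by = "min"))

# Left join: keep every minute, NA where nothing was observed
filled <- merge(grid, obs, by = "day", all.x = TRUE)
filled$value[is.na(filled$value)] <- 0
```

Here the range comes from the data itself; to cover a whole day instead, the seq() bounds could be set explicitly.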
Try:
ts = read.csv(file=pathfile, header=TRUE, sep=",", stringsAsFactors=FALSE)
ts.tmp = rbind(ts,list("01-01-2000 00:03:00",0))
ts.out = ts.tmp[order(ts.tmp$day),]
Notice that you need to force the strings in the first column to load as character and not as factors, otherwise you will have issues with the rbind. To turn the day column into a factor after that, just do:
ts.out$day = as.factor(ts.out$day)
Tidyr offers the nice complete function to generate rows for implicitly missing data. I use replace_na to turn NA values to 0 in second step.
ts %>%
  tidyr::complete(day = seq.POSIXt(min(day), max(day), by = "min")) %>%
  dplyr::mutate(value = tidyr::replace_na(value, 0))
Notice that I set the granularity of the dates to minutes since your dataset expects a row every minute.
After converting a date/time character string into POSIXlt using strptime, I am left with the following (data truncated for ease here):
DateTime North South West East Seast System
1 2008-09-12 01:00:00 1919.9 3721.4 2085.9 2565.5 2571.1 12863.8
2 2008-09-12 02:00:00 1827.0 3518.1 1965.3 2396.9 2410.7 12118.0
3 2008-09-12 03:00:00 1755.4 3388.4 1866.8 2338.7 2335.2 11684.5
4 2008-09-12 04:00:00 1733.5 3327.1 1810.0 2295.6 2290.2 11456.4
5 2008-09-12 05:00:00 1742.7 3327.3 1831.4 2314.2 2302.3 11517.9
6 2008-09-12 06:00:00 1912.2 3504.4 1986.7 2515.0 2502.6 12420.9
I then have aggregated the data (seemingly right) into year-month averages using the following snippet of code:
North_Monthly_Avg <- aggregate(North, list(Date=format(DateTime, "%Y-%m")),mean)
which yields the following:
Date x
1 2008-09 2192.066
2 2008-10 1885.074
3 2008-11 1675.373
4 2008-12 1637.231
5 2009-01 1752.693
6 2009-02 1743.393
I can plot the 'x' values but cannot get the year-months to label properly on the x-axis, since it is only plotting the index. Not sure what I am missing... I have played around with axis.POSIXct, but had no luck.
Try zoo and lattice:
library(zoo)
library(lattice)
dat <- 'Date Time North South West East Seast System
2008-09-12 01:00:00 1919.9 3721.4 2085.9 2565.5 2571.1 12863.8
2008-09-12 02:00:00 1827.0 3518.1 1965.3 2396.9 2410.7 12118.0
2008-09-12 03:00:00 1755.4 3388.4 1866.8 2338.7 2335.2 11684.5
2008-09-12 04:00:00 1733.5 3327.1 1810.0 2295.6 2290.2 11456.4
2008-09-12 05:00:00 1742.7 3327.3 1831.4 2314.2 2302.3 11517.9
2008-09-12 06:00:00 1912.2 3504.4 1986.7 2515.0 2502.6 12420.9'
z <- read.zoo(text = dat, header = TRUE, index.column = 1:2, tz = "")
xyplot(z)
zAgg <- aggregate(z$North, by = as.yearmon, FUN = mean)
dat2 <- 'Date x
2008-09 2192.066
2008-10 1885.074
2008-11 1675.373
2008-12 1637.231
2009-01 1752.693
2009-02 1743.393'
zAgg <- read.zoo(text = dat2, header = TRUE, FUN = as.yearmon)
plot(zAgg, xaxt = "n")
tt <- time(zAgg)
m <- format(tt, "%m")
axis(side = 1, at = tt, labels = ifelse(m == "01", trunc(tt), m), cex.axis = .7)
Try using as.integer() on the date (with a format that has no dash, since "2008-09" cannot be converted to an integer):
North_Monthly_Avg <- aggregate(North, list(Date=as.integer(format(DateTime, "%Y%m"))),mean)
@user1062431,
To edit the tick labels to your preferred format, edit the m <- format(tt, "%m") line in Oscar's answer.
To get the format 12 - 2008 you need to change:
m <- format(tt, "%m") to m <- format(tt, "%m - %Y")
To get the format Dec 2008 you need to change:
m <- format(tt, "%m") to m <- format(tt, "%b %Y")
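As a quick check of what these format strings produce, here is a sketch that uses a plain Date as a stand-in for the yearmon index tt (an assumption made so that it runs without zoo). The month name from "%b" depends on the session locale.

```r
# Stand-in for one element of the time index
d <- as.Date("2008-12-01")

lab_num  <- format(d, "%m - %Y")  # numeric month, e.g. for "12 - 2008" style labels
lab_abbr <- format(d, "%b %Y")    # abbreviated month name, locale dependent
```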
I think the problem is that there is no date. You will have to settle with a 1st of the month or 15th of the month and apply that to your aggregated table.
I came up with this:
North_Monthly_Avg=aggregate(North,by=list(format(DateTime,'%Y-%m')),mean)
names(North_Monthly_Avg)=c('Month','North')
North_Monthly_Avg$day=15
North_Monthly_Avg$Date=paste(North_Monthly_Avg$Month,North_Monthly_Avg$day,sep='-')
North_Monthly_Avg$Date=strptime(North_Monthly_Avg$Date,'%Y-%m-%d')
plot(North_Monthly_Avg$Date,North_Monthly_Avg$North,xaxt='n') # the xaxt='n' removes any ticks on the x axis
axis(1,as.numeric(North_Monthly_Avg$Date),labels=format(North_Monthly_Avg$Date,'%Y-%m')) # formats the x axis to your liking
I am fairly new to R, so this may not be the most elegant solution, but it will work.
If you prefer the 1st of the month, replace the 15 with 1 in the $day line and change the sep in paste to '-0'.
The problem you're having is because you are using format to create the groupings to use for the subdivision. This makes the values into strings, so that plotting functions don't know to plot them like dates.
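The cut function has a cut.POSIXt method that will group the date/times by month for you. It returns a factor whose levels are full dates (the first day of each month), so the result converts cleanly with as.Date() and the plotting functions can then treat it as dates.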
Instead of
North_Monthly_Avg <- aggregate(North, list(Date=format(DateTime, "%Y-%m")),mean)
Just use
North_Monthly_Avg <- aggregate(North, list(Date = cut(DateTime, "month")), mean)
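A small self-contained sketch of this approach, on made-up values: group with cut(), then convert the resulting factor back to Date so that plotting functions see real dates. The four sample timestamps, the values in North and the name avg are assumptions for illustration.

```r
# Made-up hourly-style readings spanning two months
DateTime <- as.POSIXct(c("2008-09-12 01:00:00", "2008-09-30 02:00:00",
                         "2008-10-01 03:00:00", "2008-10-15 04:00:00"),
                       tz = "UTC")
North <- c(10, 20, 30, 50)

# Monthly means; `by` has to be a list, and cut() yields a factor
avg <- aggregate(North, list(Date = cut(DateTime, "month")), mean)

# The factor levels are first-of-month dates, so they parse directly
avg$Date <- as.Date(as.character(avg$Date))
```

With avg$Date as a Date column, a call such as plot(avg$Date, avg$x, type = "b") gets a date axis rather than a plain index.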
You can try the package openair and use its function timeAverage.
Hourly to monthly
library(openair)
mydata$date <- as.POSIXct(strptime(mydata$date, format = "%d/%m/%Y %H:%M", tz = "GMT"))
monthly <- timeAverage(mydata, avg.time = "month")