I have the following data frame in R:
Date Value
1986-01-02 25.67
1986-01-03 23.56
1986-01-06 34.56
1986-01-07 23.77
1986-01-08 25.67
1986-01-09 26.56
1986-01-10 25.56
1986-01-13 28.77
.
.
.
2018-07-03 73.45
2018-07-04 74.34
2018-07-05 73.45
2018-07-06 74.34
2018-07-09 72.34
The Date column is in POSIXct format and excludes weekends (Saturday and Sunday). I want to convert it into a daily time series in R.
I am doing the following:
ts_object <- ts(df,frequency = 365)
It gives me the following ts:
Time Series:
Start = c(1, 1)
End = c(23, 193)
Frequency = 365
Date Value
1.000000 505008000 25.67
1.002740 505094400 23.56
1.005479 505353600 34.56
Why is it not taking the date in the correct format? Am I setting the frequency right for a daily time series object?
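ts() converts a data frame to a numeric matrix, so the POSIXct Date column is coerced to seconds since 1970-01-01 (505008000 is 1986-01-02). You will also have to add the missing days (Saturday and Sunday), because frequency = 365 assumes one observation for every calendar day.
One way to generate them is as follows: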
df <- data.frame( Date = seq(as.Date("1986-01-02"), as.Date("1986-01-07"), 1))
df$Date <- as.character(df$Date)
ds <- read.table(text = "Date Value
1986-01-02 25.67
1986-01-03 23.56
1986-01-06 34.56
1986-01-07 23.77", header = TRUE)
df <- merge(df, ds, by = "Date", all.x = TRUE)
df[is.na(df)] <- 0
df
Date Value
1 1986-01-02 25.67
2 1986-01-03 23.56
3 1986-01-04 0.00
4 1986-01-05 0.00
5 1986-01-06 34.56
6 1986-01-07 23.77
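The padding above fills the weekend rows with 0, which a model would treat as real observations. As a hedged alternative, the sketch below keeps the weekends as NA and passes only the Value column to ts(), so no date column gets coerced to numbers. It assumes the dates are stored as Date; all_days, full, start_year and start_day are illustrative names that do not appear in the question.

```r
# Weekday-only sample taken from the question
ds <- data.frame(
  Date  = as.Date(c("1986-01-02", "1986-01-03", "1986-01-06", "1986-01-07")),
  Value = c(25.67, 23.56, 34.56, 23.77)
)

# Every calendar day between the first and last observation
all_days <- data.frame(Date = seq(min(ds$Date), max(ds$Date), by = "day"))

# Left join: days without an observation (the weekend) get Value = NA
full <- merge(all_days, ds, by = "Date", all.x = TRUE)

# Start the series at (year, day of year) of the first observation
start_year <- as.integer(format(full$Date[1], "%Y"))
start_day  <- as.integer(format(full$Date[1], "%j"))
ts_object  <- ts(full$Value, start = c(start_year, start_day), frequency = 365)
```

Whether NA or an imputed value is the better placeholder depends on what the downstream model accepts.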
year_month amount_usd
201501 -390217.24
201502 230944.09
201503 367259.69
201504 15000.00
201505 27000.21
201506 38249.65
df <- structure(list(year_month = 201501:201506, amount_usd = c(-390217.24,
230944.09, 367259.69, 15000, 27000.21, 38249.65)), class = "data.frame", row.names = c(NA,
-6L))
I want to bring it into DD/MM/YYYY format so that I can use it in Prophet forecasting code.
This is what I have tried so far:
for (loopitem in loopvec){
df2 <- subset(df, account_id==loopitem)
df3 <- df2[,c("year_month","amount_usd")]
df3$year_month <- as.Date(df3$year_month, format="YYYY-MM", origin="1/1/1970")
try <- prophet(df3, seasonality.mode = 'multiplicative')
}
Error in fit.prophet(m, df, ...) :
Dataframe must have columns 'ds' and 'y' with the dates and values respectively.
You need to paste a day number (I'm just using the first of the month) onto the year_month values; you can then use the ymd() function from lubridate to convert the column to a Date object.
library(dplyr)
library(lubridate)
mutate_at(df, "year_month", ~ymd(paste(., "01")))
year_month amount_usd
1 2015-01-01 -390217.24
2 2015-02-01 230944.09
3 2015-03-01 367259.69
4 2015-04-01 15000.00
5 2015-05-01 27000.21
6 2015-06-01 38249.65
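The error message in the question also asks for columns named ds and y, which the date conversion alone does not provide. A minimal base R sketch of both steps, assuming year_month is an integer such as 201501 and that the first day of each month is an acceptable stand-in date; prophet_df is a hypothetical name.

```r
df <- data.frame(
  year_month = 201501:201506,
  amount_usd = c(-390217.24, 230944.09, 367259.69, 15000, 27000.21, 38249.65)
)

# Append "01" as the day so that %Y%m%d can parse each value as a Date,
# and use the column names that the error message asks for
prophet_df <- data.frame(
  ds = as.Date(paste0(df$year_month, "01"), format = "%Y%m%d"),
  y  = df$amount_usd
)
```

prophet_df could then be passed to prophet() in place of df3 inside the loop.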
I'm trying to figure out how to create a sequence of dates and times in this format, from 2018-01-01 01:00 to 2018-03-30 01:00,
for each patient, and fill the new empty values with random numbers.
My data looks like this:
Patients temperature
Patient1 37
Patient2 36
Patient3 35.4
I want the data to look like this:
Patients temperature Time
Patient1 37 2018-01-01 01:00
Patient2 36 2018-01-01 01:00
Patient3 35.4 2018-01-01 01:00
Patient1 NA 2018-01-01 02:00
Patient2 NA 2018-01-01 02:00
Patient3 NA 2018-01-01 02:00
Patient1 NA 2018-01-01 03:00
Patient2 NA 2018-01-01 03:00
Patient3 NA 2018-01-01 03:00
So the Time variable runs until 2018-03-30 01:00, and the temperature can be NA at first; I will then generate random numbers, without repeating the same temperature values for each patient.
I tried this command, but it didn't work, and I don't know how to assign the time to each patient:
Time <- seq (from=as.POSIXct("2018-1-1 01:00"), to=as.POSIXct("2018-3-30 01:00", tz="UTC"), by="hour")
I also tried this command:
dt = data.table(ID = Sensor7$StationID,Time = seq (from=as.POSIXct("2018-01-01 02:00"), to=as.POSIXct("2018-03-30 01:00",format = "%Y-%m-%d %H:%M",by="hour")))
But it gave me this error message:
Error in seq.POSIXt(from = as.POSIXct("2018-01-01 00:00"), to = as.POSIXct("2018-03-30 23:00", :
exactly two of 'to', 'by' and 'length.out' / 'along.with' must be specified
Does anyone have any idea how to get the data into the format I'm looking for, please?
You weren't too far off. Try this:
# I reproduce your data:
library(data.table)
data = data.table::fread(input =
"Patients,temperature
Patient1,37
Patient2,36
Patient3,35.4")
library(dplyr)
Time <- seq(from = as.POSIXct("2018-01-01 01:00", tz = "UTC"), to = as.POSIXct("2018-03-30 01:00", tz = "UTC"), by = "hour")
And this should do what you want:
data %>%
group_by(Patients) %>%
do({data.frame("temperature" = c(.data$temperature, rep(NA,length(Time) - nrow(.data))), Time)})
Here's one way:
dat = data.frame(Patients=paste0("Patient", 1:3), temperature=c(37,36,35.4))
Time = seq(as.POSIXct("2018-01-01 01:00"), as.POSIXct("2018-03-30 01:00"), by="hour")
new.data = data.frame(
Patient = rep(dat$Patients, each=length(Time)),
Time = rep(Time, length(dat$Patients))
)
I'm not sure how you want to generate the random values, but here's a generic method:
new.data$Random.Temperature = rnorm(nrow(new.data), 35, 1)
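The long table shown in the question (every patient repeated for every hour, with the known reading on the first hour and NA afterwards) can also be built with base R's expand.grid. This is only a sketch: it assumes UTC timestamps, and grid and first are illustrative names.

```r
dat <- data.frame(
  Patients    = paste0("Patient", 1:3),
  temperature = c(37, 36, 35.4)
)
Time <- seq(
  as.POSIXct("2018-01-01 01:00", tz = "UTC"),
  as.POSIXct("2018-03-30 01:00", tz = "UTC"),
  by = "hour"
)

# One row per (patient, hour) combination; Patients varies fastest,
# which gives the row order shown in the question
grid <- expand.grid(Patients = dat$Patients, Time = Time,
                    stringsAsFactors = FALSE)

# Known temperature at the first hour, NA everywhere else
grid$temperature <- NA_real_
first <- grid$Time == Time[1]
grid$temperature[first] <-
  dat$temperature[match(grid$Patients[first], dat$Patients)]
```

The NA entries can afterwards be replaced with whatever random draw is appropriate, for example by assigning into grid$temperature[is.na(grid$temperature)].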
In a data frame, I have wind speed data measured four times a day, at 00:00, 06:00, 12:00 and 18:00. To combine these with other data, I need to fill in the time in between at a resolution of 15 minutes. I would like to fill the gaps by simple interpolation.
The following example produces two corresponding sample data frames, df1 and df2, which need to be merged. In the merged data frame, the gaps between the 6-hourly values (where the value is NA) need to be filled by simple mean interpolation. My problem is how to merge both and do the actual interpolation between the given values.
First dataframe
Creation:
# create a corresponding sample data frame
df1 <- data.frame(
date = seq.POSIXt(
from = ISOdatetime(2015,10,1,0,0,0, tz = "GMT"),
to = ISOdatetime(2015,10,14,23,59,0, tz= "GMT"),
by = "6 hour"
),
windspeed = abs(rnorm(14*4, 10, 4)) # abs() because wind speed should be positive
)
Resulting dataframe:
> # show the head of the dataframe
> head(df1)
date windspeed
1 2015-10-01 00:00:00 17.928217
2 2015-10-01 06:00:00 11.306025
3 2015-10-01 12:00:00 6.648131
4 2015-10-01 18:00:00 10.320146
5 2015-10-02 00:00:00 2.138559
6 2015-10-02 06:00:00 9.076344
Second dataframe
Creation:
# create a 2nd corresponding sample data frame
df2 <- data.frame(
date = seq.POSIXt(
from = ISOdatetime(2015,10,1,0,0,0, tz = "GMT"),
to = ISOdatetime(2015,10,14,23,59,0, tz= "GMT"),
by = "15 min"
),
var = abs(rnorm(14*24*4, 300, 100))
)
Resulting dataframe:
> # show the head of the 2nd dataframe
> head(df2)
date var
1 2015-10-01 00:00:00 198.2657
2 2015-10-01 00:15:00 472.9041
3 2015-10-01 00:30:00 605.8776
4 2015-10-01 00:45:00 429.0949
5 2015-10-01 01:00:00 400.2390
6 2015-10-01 01:15:00 317.1503
Here is a solution.
First merge them, using all = TRUE to keep all rows:
df3 <- merge(df1, df2, all = TRUE)
Then use approx() for the interpolation:
df3$windspeed <- approx(x = df1$date, y = df1$windspeed, xout = df2$date)$y
The only caveat is that the rows after the last windspeed measurement will be NA, because approx() does not extrapolate by default; everything in between is filled.
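If those trailing NA values are unwanted, approx() has a rule argument; rule = 2 repeats the nearest observed value outside the measured range. A small sketch under that assumption, with target and filled as illustrative names and set.seed() added only so the sample is reproducible:

```r
set.seed(1)  # reproducible sample values

# 6-hourly measurements, 14 days x 4 per day
df1 <- data.frame(
  date = seq(ISOdatetime(2015, 10, 1, 0, 0, 0, tz = "GMT"),
             ISOdatetime(2015, 10, 14, 18, 0, 0, tz = "GMT"),
             by = "6 hours"),
  windspeed = abs(rnorm(14 * 4, 10, 4))
)

# 15-minute grid that extends past the last 18:00 measurement
target <- seq(ISOdatetime(2015, 10, 1, 0, 0, 0, tz = "GMT"),
              ISOdatetime(2015, 10, 14, 23, 45, 0, tz = "GMT"),
              by = "15 min")

# Linear interpolation inside the range; with rule = 2 the points after
# the last measurement take the last measured value instead of NA
filled <- approx(x = df1$date, y = df1$windspeed,
                 xout = target, rule = 2)$y
```

Holding the last value constant is a modelling choice; leaving the tail as NA may be more honest if the gap matters.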
Date DE VE
12/1/2016 93.387 0.095
11/1/2016 77.968 0.095
10/1/2016 65.184 0.095
9/1/2016 63.984 0.095
8/1/2016 67.657 0.095
The Date column is in %m/%d/%Y format.
DE and VE are daily averages. How can I convert from daily average to monthly total in R, based on the actual number of days in each month? For example, the total for 12/2016 = 93.387*31. I need to calculate the monthly total for all 10*12 months from 2006-01 to 2016-12.
To find the number of days in a month, you can use the days_in_month function from the lubridate package.
It takes a date-time object, so you have to convert your Date column to a known date or date-time class (i.e. "POSIXct, POSIXlt, Date, chron, yearmon, yearqtr, zoo, zooreg, timeDate, xts, its, ti, jul, timeSeries, and fts objects").
Then you can mutate your data frame, multiplying the daily averages by the number of days.
library(lubridate)
library(dplyr)
myDf <- read.table(text = "Date DE VE
12/1/2016 93.387 0.095
11/1/2016 77.968 0.095
10/1/2016 65.184 0.095
9/1/2016 63.984 0.095
8/1/2016 67.657 0.095", header = TRUE)
mutate(myDf, Date = as.Date(Date, format = "%m/%d/%Y"),
monthlyTotalDE = DE * days_in_month(Date),
monthlyTotalVE = VE * days_in_month(Date))
# Date DE VE monthlyTotalDE monthlyTotalVE
# 1 2016-12-01 93.387 0.095 2894.997 2.945
# 2 2016-11-01 77.968 0.095 2339.040 2.850
# 3 2016-10-01 65.184 0.095 2020.704 2.945
# 4 2016-09-01 63.984 0.095 1919.520 2.850
# 5 2016-08-01 67.657 0.095 2097.367 2.945
EDIT
If you use a new column name in mutate, it appends that column to the data frame.
To avoid adding columns, reuse the column names that already exist; mutate then overwrites those columns, e.g.
mutate(myDf, Date = as.Date(Date, format = "%m/%d/%Y"),
DE = DE * days_in_month(Date),
VE = VE * days_in_month(Date))
# Date DE VE
# 1 2016-12-01 2894.997 2.945
# 2 2016-11-01 2339.040 2.850
# 3 2016-10-01 2020.704 2.945
# 4 2016-09-01 1919.520 2.850
# 5 2016-08-01 2097.367 2.945
If you have a lot of columns to compute, I suggest using mutate_each: it is very powerful and saves you the pain of doing it manually with mutate, or the loss of performance from a traditional loop.
Use vars to include or exclude variables in mutate_each.
You can exclude a variable manually by preceding its name with a minus sign, vars = -Date, or use a vector to exclude several variables, vars = -c(Date, DE).
You can also use the special selection functions from dplyr::select, for example one_of(c("DE", "VE")) or DE:VE; see ?dplyr::select for more information. To drop variables, put - before the function: -contains("Date").
Warning: if you use vars to include variables, do not write the argument name vars = explicitly if you want to keep the original column names.
myDf %>%
mutate(Date = as.Date(Date, format = "%m/%d/%Y")) %>%
mutate_each(funs(. * days_in_month(Date)),
vars = -Date)
# Date DE VE
# 1 2016-12-01 2894.997 2.945
# 2 2016-11-01 2339.040 2.850
# 3 2016-10-01 2020.704 2.945
# 4 2016-09-01 1919.520 2.850
# 5 2016-08-01 2097.367 2.945
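mutate_each() and funs() are deprecated in recent dplyr releases. The same step can be sketched with across(), assuming dplyr 1.0.0 or later is installed; monthly is an illustrative name.

```r
library(dplyr)      # assumption: dplyr >= 1.0.0, which provides across()
library(lubridate)

myDf <- data.frame(
  Date = c("12/1/2016", "11/1/2016", "10/1/2016", "9/1/2016", "8/1/2016"),
  DE   = c(93.387, 77.968, 65.184, 63.984, 67.657),
  VE   = rep(0.095, 5)
)

monthly <- myDf %>%
  mutate(Date = as.Date(Date, format = "%m/%d/%Y")) %>%
  # multiply every column except Date by the number of days in its month
  mutate(across(-Date, ~ .x * days_in_month(Date)))
```

The -Date selection plays the same role as vars = -Date in the mutate_each call.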
I am quite new to R, and I am trying to find a way to average continuous data over specific periods of time.
My data is a month-long recording of several parameters at 1 s time steps.
The table, read via read.csv, has the date and time in one column and several other columns with values.
TimeStamp UTC Pitch Roll Heave(m)
05-02-13 6:45 0 0 0
05-02-13 6:46 0.75 -0.34 0.01
05-02-13 6:47 0.81 -0.32 0
05-02-13 6:48 0.79 -0.37 0
05-02-13 6:49 0.73 -0.08 -0.02
So I want to average the data over specific intervals, for example 20 min, so that the average for 7:00 takes all the points from 6:41 to 7:00 and returns their mean, and so on for the entire dataset.
The result would look like this:
TimeStamp
05-02-13 19:00 462
05-02-13 19:20 332
05-02-13 19:40 15
05-02-13 20:00 10
05-02-13 20:20 42
Here is a reproducible dataset similar to your own.
meteorological <- data.frame(
TimeStamp = rep.int("05-02-13", 1440),
UTC = paste(
rep(formatC(0:23, width = 2, flag = "0"), each = 60),
rep(formatC(0:59, width = 2, flag = "0"), times = 24),
sep = ":"
),
Pitch = runif(1440),
Roll = rnorm(1440),
Heave = rnorm(1440)
)
The first thing that you need to do is to combine the first two columns to create a single (POSIXct) date-time column.
library(lubridate)
meteorological$DateTime <- with(
meteorological,
dmy_hm(paste(TimeStamp, UTC))
)
Then set up a sequence of break points for your different time groupings.
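# tz = "UTC" makes ymd() return POSIXct, which seq() needs for a "20 mins" step
breaks <- seq(ymd("2013-02-05", tz = "UTC"), ymd("2013-02-06", tz = "UTC"), by = "20 mins")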
Finally, you can calculate the summary statistics for each group. There are many ways to do this. ddply from the plyr package is a good choice.
library(plyr)
ddply(
meteorological,
.(cut(DateTime, breaks)),
summarise,
MeanPitch = mean(Pitch),
MeanRoll = mean(Roll),
MeanHeave = mean(Heave)
)
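Note that cut() labels each group by the start of its interval, whereas the question asks for the 7:00 average to cover 6:41 to 7:00. One way to get end-labelled intervals in base R is to round every timestamp up to the next 20-minute boundary and aggregate on that. This is a sketch on a small hypothetical sample (obs, BinEnd, bin_seconds and means are illustrative names), not the method used in the answer above.

```r
set.seed(42)  # reproducible sample values

# One hour of per-minute readings, 06:41 to 07:40
obs <- data.frame(
  DateTime = seq(as.POSIXct("2013-02-05 06:41:00", tz = "UTC"),
                 as.POSIXct("2013-02-05 07:40:00", tz = "UTC"),
                 by = "1 min"),
  Pitch = runif(60)
)

# Round each timestamp up to the next multiple of 20 minutes (1200 seconds);
# a timestamp already on a boundary stays where it is
bin_seconds <- 20 * 60
obs$BinEnd <- as.POSIXct(
  ceiling(as.numeric(obs$DateTime) / bin_seconds) * bin_seconds,
  origin = "1970-01-01", tz = "UTC"
)

# Mean of Pitch per interval, labelled by the interval's end time
means <- aggregate(Pitch ~ BinEnd, data = obs, FUN = mean)
```

Here the first row of means is labelled 07:00 and averages the readings from 06:41 through 07:00.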
Please see if something simple like this works for you:
myseq <- data.frame(time=seq(ISOdate(2014,1,1,12,0,0), ISOdate(2014,1,1,13,0,0), "5 min"))
myseq$cltime <- cut(myseq$time, "20 min", labels = FALSE)
> myseq
time cltime
1 2014-01-01 12:00:00 1
2 2014-01-01 12:05:00 1
3 2014-01-01 12:10:00 1
4 2014-01-01 12:15:00 1
5 2014-01-01 12:20:00 2
6 2014-01-01 12:25:00 2
7 2014-01-01 12:30:00 2
8 2014-01-01 12:35:00 2
9 2014-01-01 12:40:00 3
10 2014-01-01 12:45:00 3
11 2014-01-01 12:50:00 3
12 2014-01-01 12:55:00 3
13 2014-01-01 13:00:00 4