Hello, can you help me compute the difference in hours between consecutive rows of one column in R?
I use only base R. I would like to create a new column with the hours,
so that the column looks like this:
hours<-c(0,24,23,21,31,26,28)
time<-c('10. 4. 2018 10:16:11',
'11. 4. 2018 10:16:15',
'12. 4. 2018 10:13:31',
'13. 4. 2018 8:16:31',
'14. 4. 2018 15:16:21',
'15. 4. 2018 17:16:31',
'16. 4. 2018 19:15:31')
I have one column (time) and I would like to create a new column (hours).
Thanks.
Enhancing Sotos' approach,
c(0, round(diff(as.POSIXct(time, format = '%d. %m. %Y %H:%M:%S'), units = "hours")))
comes close to OP's expected result
[1] 0 24 24 22 31 26 26
Data
time <- c(
'10. 4. 2018 10:16:11',
'11. 4. 2018 10:16:15',
'12. 4. 2018 10:13:31',
'13. 4. 2018 8:16:31',
'14. 4. 2018 15:16:21',
'15. 4. 2018 17:16:31',
'16. 4. 2018 19:15:31'
)
Another way is the following.
First coerce to class POSIXct.
time <- as.POSIXct(time, format = "%d. %m. %Y %H:%M:%S")
Now use difftime. Setting units = "hours" guarantees the required units; the default, units = "auto", only happens to pick hours for this data.
c(0, difftime(time[-1], time[-length(time)], units = "hours"))
#[1] 0.00000 24.00111 23.95444 22.05000 30.99722 26.00278 25.98333
The rounded output is simple to obtain.
round(c(0, difftime(time[-1], time[-length(time)], units = "hours")))
#[1] 0 24 24 22 31 26 26
Related
My data set is monthly from Jan 1997 to Dec 2021. I need the Month column to be in a proper date format, but as.Date doesn't recognise the cell contents as they are. Please help.
Month BrentSpot GDP Agriculture Production Construction Services
1 Jan-1997 23.54 63.8229 53.5614 81.9963 87.2775 59.4453
2 Feb-1997 20.85 64.7182 53.9091 82.1917 87.8350 60.5018
3 Mar-1997 19.13 64.9264 54.2569 81.6142 88.6714 60.8375
4 Apr-1997 17.56 65.2327 55.1264 82.0006 89.5170 61.0981
5 May-1997 19.02 64.7336 55.8220 82.0093 89.8144 60.4470
6 Jun-1997 17.58 65.1322 56.3438 82.3350 89.4891 60.8886
Gdp_Brent_Table$Month = seq(ymd('1997-01-01'),ymd('2021-12-01'), by = 'months')
(this seemed to do the trick; ymd() comes from the lubridate package)
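The seq() call above replaces the column with a freshly generated sequence rather than parsing the existing labels. If the labels themselves need to be converted, a base R sketch along the following lines might work. It assumes every label has the form "Mon-YYYY" with English three-letter month abbreviations; the vector month_txt is a hypothetical stand-in for Gdp_Brent_Table$Month.

```r
# Hypothetical stand-in for the Month column
month_txt <- c("Jan-1997", "Feb-1997", "Dec-2021")

# Look up the month number in R's built-in month.abb constant,
# which avoids depending on the session locale (unlike "%b")
mon <- match(substr(month_txt, 1, 3), month.abb)
yr  <- as.integer(substr(month_txt, 5, 8))

# Build ISO dates pinned to the first day of each month
month_date <- as.Date(sprintf("%d-%02d-01", yr, mon))
format(month_date)
# [1] "1997-01-01" "1997-02-01" "2021-12-01"
```

Parsing the labels keeps the dates tied to the actual rows, which matters if any month is missing or out of order.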
I have the following vector, which contains data for each day of December.
vector1 <- c(1056772, 674172, 695744, 775040, 832036,735124,820668,1790756,1329648,1195276,1267644,986716,926468,828892,826284,749504,650924,822256,3434204,2502916,1262928,1025980,1828580,923372,658824,956916,915776,1081736,869836,898736,829368)
Now I want to create a time series object on a weekly basis and used the following code snippet:
weeklyts = ts(vector1,start=c(2016,12,01), frequency=7)
However, the starting and end points are not correct. I always get the following time series:
> weeklyts
Time Series:
Start = c(2017, 5)
End = c(2021, 7)
Frequency = 7
[1] 1056772 674172 695744 775040 832036 735124 820668 1790756 1329648 1195276 1267644 986716 926468 828892 826284 749504
[17] 650924 822256 3434204 2502916 1262928 1025980 1828580 923372 658824 956916 915776 1081736 869836 898736 829368
Does anybody know what I am doing wrong?
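To get a time series that starts and ends as you would expect, you need to think about its structure. You have 31 days of data from December 2016.
The start argument of ts takes two numbers, not three: the first is the time unit (for example the year) and the second is the position within that unit, which depends on frequency. So use something like c(2016, 1) if you start with month 1 of 2016. See the following example.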
ts(1:12, start = c(2016, 1), frequency = 12)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2016 1 2 3 4 5 6 7 8 9 10 11 12
Daily data is awkward with ts. A ts object has a single fixed frequency, so it cannot represent leap years exactly. That is why you see people using a frequency of 365.25 for daily data with a yearly cycle. To get a good December 2016 series we can do the following:
ts(vector1, start = c(2016, 336), frequency = 366)
Time Series:
Start = c(2016, 336)
End = c(2016, 366)
Frequency = 366
[1] 1056772 674172 695744 775040 832036 735124 820668 1790756 1329648 1195276 1267644 986716 926468 828892 826284 749504
[17] 650924 822256 3434204 2502916 1262928 1025980 1828580 923372 658824 956916 915776 1081736 869836 898736 829368
Note the following things that are going on:
Frequency is 366 because 2016 is a leap year
start is c(2016, 336) because "2016-12-01" is day 336 of the year
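The two numbers above do not have to be worked out by hand. The following sketch derives them from the date with format(); the variable names are illustrative, and seq_len(31) is a placeholder for vector1.

```r
first_day <- as.Date("2016-12-01")

# "%j" gives the day of the year (001-366)
doy  <- as.integer(format(first_day, "%j"))
year <- as.integer(format(first_day, "%Y"))

# The day-of-year of 31 December tells us whether the year has 365 or 366 days
days_in_year <- as.integer(format(as.Date(paste0(year, "-12-31")), "%j"))

# Placeholder data standing in for the 31 December values
dec_ts <- ts(seq_len(31), start = c(year, doy), frequency = days_in_year)
start(dec_ts)
end(dec_ts)
```

Here doy evaluates to 336 and days_in_year to 366, matching the values used above.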
Personally I use the xts package (and zoo) to handle daily data, and use the functions in xts to aggregate to a weekly time series. The result can then be used with packages that expect ts objects, such as forecast.
edit: added small xts example
my_df <- data.frame(dates = seq.Date(as.Date("2016-12-01"), as.Date("2017-01-31"), by = "day"),
var1 = rep(1:31, 2))
library(xts)
my_xts <- xts(my_df[, -1], order.by = my_df$dates)
# rollup to weekly. Dates shown are the last day in the weekperiod.
my_xts_weekly <- period.apply(my_xts, endpoints(my_xts, on = "weeks"), colSums)
head(my_xts_weekly)
[,1]
2016-12-04 10
2016-12-11 56
2016-12-18 105
2016-12-25 154
2017-01-01 172
2017-01-08 35
Depending on your needs you can transform this back into a data.frame and so on. Read the help for period.apply, as you can specify your own function to apply to each period. And read the xts (and zoo) vignettes.
I have previous experience with MATLAB but am very new to R. The basic problem that I am having is this:
I have data with 10 columns. The first 6 columns correspond to year, month, day, hour, minute and second.
E.g data_example =
2013 6 15 11 15 0 ...
2013 6 15 11 20 0 ...
2013 6 15 11 25 0 ...
In MATLAB I could easily turn these into date numbers with datenum(data_example(:,1:6)),
but what is the best way in R to get a similar numerical representation of the 6 columns?
Here are some alternatives. They all make use of ISOdatetime :
1) Assuming DF is your data frame try ISOdatetime like this:
DF$datetime <- ISOdatetime(DF[[1]], DF[[2]], DF[[3]], DF[[4]], DF[[5]], DF[[6]])
2) or like this:
DF$datetime <- do.call(ISOdatetime, setNames(as.list(DF[1:6]), NULL))
3a) If this is a time series suitable for zoo (distinct times and all numeric) then we could use read.zoo in the zoo package together with ISOdatetime like this:
library(zoo)
z <- read.zoo(DF, index = 1:6, FUN = ISOdatetime)
3b) or using read.zoo to read from a file or character string (latter shown here):
# sample input lines
Lines <- "2013 6 15 11 15 0 1
2013 6 15 11 20 0 2
2013 6 15 11 25 0 3
"
library(zoo)
z <- read.zoo(text = Lines, index = 1:6, FUN = ISOdatetime)
which gives this zoo series:
> z
2013-06-15 11:15:00 2013-06-15 11:20:00 2013-06-15 11:25:00
1 2 3
Use the parse_date_time function from the lubridate package.
x <- paste0(data_example[,1:6])
x <- parse_date_time(x,"%y%m%d %H%M")
More information in the documentation
EDIT
@joran told me to test it, and it didn't work, so I made some modifications:
data_example = data.frame(t(c(13,2,9,14,30)))
x <- paste0(data_example[,1:3],collapse="-")
y <- paste0(data_example[,4:5],collapse=":")
xy<- paste(x,y)
xy <- parse_date_time(xy,"%y%m%d %H%M")
xy
# "2013-02-09 14:30:00 UTC"
I don't know if there is a cleaner way to do it
The units of the returned value are a bit different in R than in Matlab (see comment in code). Also, since you have other columns in your data frame, you will first need to subset the data frame to contain only the relevant (6) date columns, then add them back to the data frame as a new column at the end.
test <- data.frame("year"=c(2013, 2013, 2013, 2001, 1970)
, "month"=c(6,6, 6, 4, 1)
, "day"=c(15,15, 15, 19, 1)
, "hour"=c(11,11, 11, 11, 0)
, "min"=c(15,20, 25, 30, 0)
, "second"=c(0,0, 0 ,0, 0))
# zero-pad each component to at least 2 digits
# (%02d is used because zero-padding with %s is platform-dependent)
dates00 <- apply(test, c(1,2), sprintf, fmt="%02d")
# combine the date components in each row into a single string
dates0 <- apply(dates00, 1, paste, collapse=" ")
# convert to a date-time object; tz="UTC" keeps the result independent of the local time zone
dates <- as.POSIXct(dates0, format="%Y %m %d %H %M %S", tz="UTC")
# numbers are seconds since "1970-01-01 00:00:00 UTC"; according
# to the help file for datenum, Matlab returns the number (from
# datenum) as fractional days since "January 0, 0000"
as.numeric(dates)
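If numbers on the same scale as Matlab's datenum are wanted, the seconds can be rescaled to days and shifted. The sketch below assumes the times are in UTC and relies on 719529 being the datenum of 1970-01-01; the data frame and variable names are illustrative only.

```r
# Small illustrative data frame with the six date-time columns
test <- data.frame(year = c(2013, 1970), month = c(6, 1), day = c(15, 1),
                   hour = c(11, 0), min = c(15, 0), second = c(0, 0))

# Build POSIXct values directly from the columns (UTC is an assumption)
dt <- ISOdatetime(test$year, test$month, test$day,
                  test$hour, test$min, test$second, tz = "UTC")

# Seconds since 1970-01-01 -> fractional days since 1970-01-01
unix_days <- as.numeric(dt) / 86400

# Shift to the datenum origin; 719529 corresponds to 1970-01-01
datenum_like <- unix_days + 719529
datenum_like
```

For many purposes the shift is unnecessary, since differences between times are the same on either scale.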
I am trying to do something which seems simple but is proving a bit of a challenge so I hope someone can help!
I have a time series of observations of temperature:
Lines <-"1971-01-17 298.9197
1971-01-17 298.9197
1971-02-16 299.0429
1971-03-17 299.0753
1971-04-17 299.3250
1971-05-17 299.5606
1971-06-17 299.2380
2010-07-14 298.7876
2010-08-14 298.5529
2010-09-14 298.3642
2010-10-14 297.8739
2010-11-14 297.7455
2010-12-14 297.4790"
DF <- read.table(textConnection(Lines), col.names = c("Date", "Value"))
DF$Date <- as.Date(DF$Date)
mean.ts <- aggregate(DF["Value"], format(DF["Date"], "%m"), mean)
This produces:
> mean.ts
Date Value
1 01 1.251667
2 02 1.263333
This is just an example -- my data is for many years so I can calculate a full monthly average of the data.
What I then want to do is calculate the difference between each individual January and the mean January I calculated above.
If I move away from using Date/Time class I could do this with some loops but I want to see if there is a "neat" way to do this in R? Any ideas?
You can just add the year as an aggregating variable. This is easier using the formula interface:
> aggregate(Value~format(Date,"%m")+format(Date,"%Y"),data=DF,mean)
format(Date, "%m") format(Date, "%Y") Value
1 01 1971 298.9197
2 02 1971 299.0429
3 03 1971 299.0753
4 04 1971 299.3250
5 05 1971 299.5606
6 06 1971 299.2380
7 07 2010 298.7876
8 08 2010 298.5529
9 09 2010 298.3642
10 10 2010 297.8739
11 11 2010 297.7455
12 12 2010 297.4790
As I understand your question, you want the difference between each month's value and the mean for that month, so you probably want to use ave rather than aggregate:
diff.mean.ts <- ave(DF[["Value"]],
list(format(DF[["Date"]], "%m")), FUN=function(x) x-mean(x) )
If you wanted it in the same dataframe, then just assign it as a column:
DF$diff.mean.ts <- diff.mean.ts
The ave function is designed for adding columns to existing data frames because it returns a vector of the same length as its first argument, in this case DF[["Value"]]. In the present instance it returns all 0's, which is the correct answer because each month has only one distinct value.
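To see non-zero differences, each month needs values from more than one year. The following sketch uses made-up numbers (the data frame temps and its values are purely illustrative) to show ave returning one anomaly per row.

```r
# Made-up data: two Januaries and two Februaries
temps <- data.frame(
  Date  = as.Date(c("1971-01-17", "1972-01-17", "1971-02-16", "1972-02-16")),
  Value = c(298, 300, 299, 303)
)

# Group by calendar month and subtract the month's mean from each value
temps$anom <- ave(temps$Value, format(temps$Date, "%m"),
                  FUN = function(x) x - mean(x))
temps$anom
# [1] -1  1 -2  2
```

The January mean here is 299 and the February mean is 301, so each row shows its distance from its own month's mean.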
I need to convert a date (m/d/y format) into 3 separate columns on which I hope to run an algorithm. (I'm trying to convert my dates into Julian Day Numbers.) I saw this suggestion for another user for separating data out into multiple columns using Oracle. I'm using R and am thoroughly stuck on how to code this appropriately. Would A1, A2, ... represent my new column headings, and what would the format difference be with the "update set" section?
update <tablename> set A1 = substr(ORIG, 1, 4),
A2 = substr(ORIG, 5, 6),
A3 = substr(ORIG, 11, 6),
A4 = substr(ORIG, 17, 5);
I'm trying hard to improve my skills in R but cannot figure this one...any help is much appreciated. Thanks in advance... :)
I use the format() method for Date objects to pull apart dates in R. Using Dirk's datetxt, here is how I would go about breaking up a date into its constituent parts:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
datetxt <- as.Date(datetxt)
df <- data.frame(date = datetxt,
year = as.numeric(format(datetxt, format = "%Y")),
month = as.numeric(format(datetxt, format = "%m")),
day = as.numeric(format(datetxt, format = "%d")))
Which gives:
> df
date year month day
1 2010-01-02 2010 1 2
2 2010-02-03 2010 2 3
3 2010-09-10 2010 9 10
Note what several others have said; you can get the Julian dates without splitting out the various date components. I added this answer to show how you could do the breaking apart if you needed it for something else.
Given a text variable x, like this:
> x
[1] "10/3/2001"
then:
> as.Date(x,"%m/%d/%Y")
[1] "2001-10-03"
converts it to a date object. Then, if you need it:
> julian(as.Date(x,"%m/%d/%Y"))
[1] 11598
attr(,"origin")
[1] "1970-01-01"
gives you a Julian date (relative to 1970-01-01).
Don't try the substring thing...
See help(as.Date) for more.
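Note that julian() counts days from 1970-01-01. If the astronomical Julian Day Number is what is needed (an assumption about the question), adding a constant offset should be enough, because the Julian Day Number of 1970-01-01 is 2440588. A minimal sketch:

```r
x <- "10/3/2001"
d <- as.Date(x, "%m/%d/%Y")

# Days since 1970-01-01, the same number that julian(d) reports
days_since_epoch <- as.numeric(d)

# Shift to the astronomical scale; 2440588 is the Julian Day Number of 1970-01-01
jdn <- days_since_epoch + 2440588
jdn
# [1] 2452186
```

As a sanity check, the same offset gives 2451545 for 2000-01-01, the day on which the J2000.0 epoch falls.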
Quick ones:
Julian date converters already exist in base R, see eg help(julian).
One approach may be to parse the date as a POSIXlt and to then read off the components. Other date / time classes and packages will work too but there is something to be said for base R.
Pulling dates apart by hand as strings is almost always a bad approach.
Here is an example:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
dates <- as.Date(datetxt) ## you could examine these as well
plt <- as.POSIXlt(dates) ## now as POSIXlt types
plt[["year"]] + 1900 ## years are with offset 1900
#[1] 2010 2010 2010
plt[["mon"]] + 1                  ## and months are on the 0 .. 11 interval
#[1] 1 2 9
plt[["mday"]]
#[1] 2 3 10
df <- data.frame(year=plt[["year"]] + 1900,
month=plt[["mon"]] + 1, day=plt[["mday"]])
df
# year month day
#1 2010 1 2
#2 2010 2 3
#3 2010 9 10
And of course
julian(dates)
#[1] 14611 14643 14862
#attr(,"origin")
#[1] "1970-01-01"
To convert a date (m/d/y format) into 3 separate columns, consider the df,
df <- data.frame(date = c("01-02-18", "02-20-18", "03-23-18"))
df
date
1 01-02-18
2 02-20-18
3 03-23-18
Convert to date format
df$date <- as.Date(df$date, format="%m-%d-%y")
df
date
1 2018-01-02
2 2018-02-20
3 2018-03-23
To get three separate columns with year, month and day,
library(lubridate)
df$year <- year(ymd(df$date))
df$month <- month(ymd(df$date))
df$day <- day(ymd(df$date))
df
date year month day
1 2018-01-02 2018 1 2
2 2018-02-20 2018 2 20
3 2018-03-23 2018 3 23
Hope this helps.
Hi Gavin: another way [using your idea] is:
The data-frame we will use is oilstocks which contains a variety of variables related to the changes over time of the oil and gas stocks.
The variables are:
colnames(stocks)
"bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC"
"emMN" "emMN.1" "chdate" "chV" "cbO" "chC" "chMN" "chMX"
One of the first things to do is change the emdate field, which is an integer vector, into a date vector.
realdate<-as.Date(emdate,format="%m/%d/%Y")
Next we want to split emdate column into three separate columns representing month, day and year using the idea supplied by you.
dfdate <- data.frame(date=realdate)
year=as.numeric(format(realdate,"%Y"))
month=as.numeric(format(realdate,"%m"))
day=as.numeric(format(realdate,"%d"))
ls() will include the individual vectors, day, month, year and dfdate.
Now merge the dfdate, day, month, year into the original data-frame [stocks].
ostocks<-cbind(dfdate,day,month,year,stocks)
colnames(ostocks)
"date" "day" "month" "year" "bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC" "emMN" "emMX" "chdate" "chV"
"cbO" "chC" "chMN" "chMX"
Similar results and I also have date, day, month, year as separate vectors outside of the df.