Extract numbers from string as numeric or dates in R

I am working with some HDF5 data sets. The dates are stored inside the files, and the file names give no hint of them. The attribute file consists of day of the year, month of the year, day of the month and year columns.
I would like to pull out the dates to create a time series identity for each file, i.e. a year-month-day format that can be used for time series.
A sample of the data can be downloaded here:
[ ftp://l5eil01.larc.nasa.gov/tesl1l2l3/TES/TL3COD.003/2007.08.31/TES-Aura_L3-CO_r0000006311_F01_09.he5 ]
There is an attribute group file and a data group file.
I use the R library "rhdf5" to explore the HDF5 files, e.g.:
CO1<-h5ls ("TES-Aura_L3-CO_r0000006311_F01_09.he5")
Attr<-h5read("TES-Aura_L3-CO_r0000006311_F01_09.he5","HDFEOS INFORMATION/coremetadata")
Data<-h5read("TES-Aura_L3-CO_r0000006311_F01_09.he5", "HDFEOS/SWATHS/ColumnAmountNO2/Data Fields/ColumnAmountNO2Trop") # HDF5 paths use forward slashes; backslashes are invalid escapes in an R string
When read, Attr consists of a long string in which the only required information is "2007-08-31", the date of acquisition. I have been able to extract this using the stringr library:
regexp <- "([[:digit:]]{4})([-])([[:digit:]]{2})([-])([[:digit:]]{2})"
Date<-str_extract(Attr,pattern=regexp)
which returns the Date as:
"2007-08-31"
The only problem left now is that Date isn't recognised as numeric or as a date. How do I convert it? I need to bind the dates with the data for all days to create a time series (more like an identifier, as the data sets are irregular). A sample of how it looks after extracting the dates from the strings and binding them with the CO values for each date is below:
Dates CO3b
[1,] "2011-03-01" 1.625811e+18
[2,] "2011-03-04" 1.655504e+18
[3,] "2011-03-11" 1.690428e+18
[4,] "2011-03-15" 1.679871e+18
[5,] "2011-03-17" 1.705987e+18
[6,] "2011-03-17" 1.661198e+18
[7,] "2011-03-17" 1.662694e+18
[8,] "2011-03-20" 1.520328e+18
[9,] "2011-03-21" 1.510642e+18
[10,] "2011-03-21" 1.556637e+18
However, R recognises these dates as character and not as date. I need to convert them to a time series I can work with.

Seems like you've already done all the hard work! Here's how you could take it across the finish line.
From your comment, it seems like you have the strings in a good format. Given that your variable is named Date, simply run
dateObjects<-as.Date(Date) #where Date is your variable
and either a single value or a vector of character strings (in the format you gave in the comment) will become Date objects, which you could use with a library like zoo to create a time series.
If your strings are not necessarily in the format you've described, then refer to the following link to see how to format other string forms as dates.
http://www.statmethods.net/input/dates.html
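As a minimal sketch of both cases, using base R only: ISO-style strings convert with no format argument, while other layouts need an explicit format. The vectors iso_strings and other_strings are made-up examples, not values read from your file.

```r
# Hypothetical inputs: ISO-style strings and day/month/year strings
iso_strings   <- c("2011-03-01", "2011-03-04", "2011-03-11")
other_strings <- c("01/03/2011", "04/03/2011", "11/03/2011")

# ISO-style strings need no format argument
iso_dates <- as.Date(iso_strings)

# Other layouts need an explicit format
other_dates <- as.Date(other_strings, format = "%d/%m/%Y")

class(iso_dates)               # "Date"
all(iso_dates == other_dates)  # TRUE
diff(iso_dates)                # gaps between observations, in days
```

Once the values are Date objects, arithmetic such as diff() works, which is what character strings cannot give you.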
Given your example data frame you can create a time series in the following way, using the package zoo.
library(zoo)
datavect<-as.zoo(df$CO3b)
index(datavect)<-as.Date(df$Dates) #the date column is named Dates in your sample
Here we take your CO data, convert it to a zoo object, then assign the appropriate date to each entry, converting it from a character to a Date object. Now if you print datavect, you'll see each data entry attached to a date. This allows you to take advantage of zoo methods, such as merge and window.
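One thing to watch with your sample: several dates repeat (2011-03-17 appears three times), and zoo generally expects unique index entries. A minimal base R sketch, assuming you are happy to average the values that share a day (that rule is an assumption, not something your data dictates); df is rebuilt here from a few rows of your sample:

```r
# Rows copied from the sample in the question
df <- data.frame(
  Dates = c("2011-03-15", "2011-03-17", "2011-03-17", "2011-03-17", "2011-03-20"),
  CO3b  = c(1.679871e+18, 1.705987e+18, 1.661198e+18, 1.662694e+18, 1.520328e+18),
  stringsAsFactors = FALSE
)

# Average the CO values that share a date (assumed aggregation rule)
daily <- aggregate(CO3b ~ Dates, data = df, FUN = mean)

# Convert the now-unique date strings to Date objects
daily$Dates <- as.Date(daily$Dates)

nrow(daily)                 # 3 unique dates
anyDuplicated(daily$Dates)  # 0, so the dates can serve as a unique index
```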

Here is one approach that does not use string extraction. If you know how long your time series should be, which you should from the length of your dataset and knowledge of its periodicity, you could create a regular date series and add it to a data.frame with the other variables of interest. Assuming you have daily data, the code below would work; your length.out will differ.
d1 <- ISOdate(year=2007,month=8,day=31)
d2 <- as.Date(format(seq(from=d1,by="day",length.out=10),"%Y-%m-%d"))
[1] "2007-08-31" "2007-09-01" "2007-09-02" "2007-09-03" "2007-09-04" "2007-09-05" "2007-09-06" "2007-09-07" "2007-09-08" "2007-09-09"
class(d2)
[1] "Date"
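The same sequence can be built without the ISOdate and format round trip, since seq() works on Date objects directly. A minimal sketch, where the values column is only a placeholder for your own measurements:

```r
# Start date taken from the example above
start <- as.Date("2007-08-31")

# Ten consecutive days, already of class Date
d2 <- seq(from = start, by = "day", length.out = 10)

# Placeholder values standing in for the real measurements
series <- data.frame(Dates = d2, values = seq_along(d2))

class(series$Dates)  # "Date"
range(series$Dates)  # "2007-08-31" "2007-09-09"
```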
Edit of Original:
Oh I see. Well, after reading in your new data example, the code below worked for me. It was a pretty straightforward transform. Cheers.
library(magrittr) # Needed for the pipe operator %>% it makes it really easy to string steps together.
dateData
Dates CO3b
1 2011-03-01 1.63e+18
2 2011-03-04 1.66e+18
3 2011-03-11 1.69e+18
4 2011-03-15 1.68e+18
5 2011-03-17 1.71e+18
6 2011-03-17 1.66e+18
7 2011-03-17 1.66e+18
8 2011-03-20 1.52e+18
9 2011-03-21 1.51e+18
10 2011-03-21 1.56e+18
dateData %>% sapply(class) # classes before transforming (character,numeric)
dateData[,1] <- as.Date(dateData[,1]) # Transform to date
dateData %>% sapply(class) # classes after transforming (Date,numeric)
str(dateData) # one more check
'data.frame': 10 obs. of 2 variables:
$ Dates: Date, format: "2011-03-01" "2011-03-04" "2011-03-11" "2011-03-15" ...
$ CO3b : num 1.63e+18 1.66e+18 1.69e+18 1.68e+18 1.71e+18 ...

Related

Extract time from factor column in R

I would like to extract the time from a table column sd_data$start in R with the following characteristics:
str(sd_data$start)
Factor w/ 122 levels "01/03/2017 08:00",..: 1 2 5 10 12 14 18 19 20 21 ...
I found similar questions on the forum but so far all the answers have only given me NAs or blank values (00:00:00) so I see no other option than raise the question again specifically for my dataset.
I have managed to extract the dates and move them to a new column in the table with little effort and I am very surprised how difficult it is (for me at least) to do the same for hours, minutes and seconds. I must be overlooking something.
sd_data$start_date <- as.Date(sd_data$start,format='%d/%m/%Y')
sd_data$start_time <-
Thanks in advance for helping me to find the right lines of code to complete this task.
Here is an example of what I am trying to do and where I am failing to get the time out.
smpldata <- "01/03/2017 08:00"
smpltime <-as.Date(as.character(smpldata),format='%d/%m/%Y %M:%S')
smpltime
# [1] 08:00 = what I would like to see
# [1] "2017-03-01" = what I am seeing
Try using as.character() to convert to character before converting to a date, because the factor type is not converted well. Also include the other string elements in the date format, as suggested above by Sotos. Your data has hours and minutes but no seconds, so the format stops at %M:
sd_data$start_date <-
as.Date(as.character(sd_data$start),
format='%d/%m/%Y %H:%M')
Another tip is to take a look at the lubridate package. It's very useful for this kind of task.
library(lubridate)
smpldata <- as.factor("01/03/2017 08:00")
(smpltime <-dmy_hm(as.character(smpldata)))
[1] "2017-03-01 08:00:00 UTC"
Here you still see the date. You can handle just the time for plots and other needs using hour() and minute().
hour(smpltime)
[1] 8
minute(smpltime)
[1] 0
Or you can use the format() function to get exactly what you want.
format(smpltime, "%H:%M:%S")
[1] "08:00:00"
format(smpltime, "%H:%M")
[1] "08:00"
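If you would rather not add a package, a base R sketch of the same idea follows. It assumes every level has the day/month/year hour:minute layout shown in the question, and it sets tz = "UTC" only to keep the result independent of the local time zone.

```r
# Factor input, as in the question
smpldata <- as.factor("01/03/2017 08:00")

# Convert the factor to character first, then parse date and time together
smpltime <- as.POSIXct(as.character(smpldata),
                       format = "%d/%m/%Y %H:%M", tz = "UTC")

# Split into a Date value and a character time value
start_date <- as.Date(smpltime)
start_time <- format(smpltime, "%H:%M")

start_date  # "2017-03-01"
start_time  # "08:00"
```

The key difference from as.Date() is that as.POSIXct() keeps the time of day, so format() has something to print.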

How to extract date from time series and convert it to date in R

I have a dataset consisting of 4 variables, namely date, gold price, crude price and dollar price. I converted the data to a time series using the ts() function. After converting, the dates were changed into numeric values. I am able to retrieve the dates from the converted values using the as.Date() function. Now I want to replace the numeric values with the dates themselves in the ts object's date variable.
#CONVERTING DATA FRAME TO TIME SERIES OBJECT.
Gold.ts <- ts(Gold,start=Gold$DATE[1])
head(Gold.ts)
#OUTPUT.
DATE GOLD.PRICE CRUDE DOLLAR.INR
[1,] 13152 533.9 63.42 44.705
[2,] 13153 526.3 62.79 44.600
[3,] 13154 539.7 64.21 44.320
head(as.Date(index(Gold.ts)))
[1] "2006-01-04" "2006-01-05" "2006-01-06" "2006-01-07" "2006-01-08" "2006-01-09"
Gold.ts$DATE <- as.Date(index(Gold.ts)) # This won't work because $ is not acceptable to extract variables from a time series object.
index(Gold.ts) <- as.Date(index(Gold.ts)) #This should work but gives error. How to display date instead of values in time series object i.e Gold.ts?
What is the right way to do it?

Convert characters to dates in a panel data set

I am downloading and using the following panel data,
# load / install package
library(rsdmx)
library(dplyr)
# Total
Assets.PIT <- readSDMX("http://widukind-api.cepremap.org/api/v1/sdmx/IMF/data/IFS/..Q.BFPA-BP6-USD")
Assets.PIT <- as.data.frame(Assets.PIT)
names(Assets.PIT)[10]<-"A.PI.T"
names(Assets.PIT)[6]<-"Code"
AP<-Assets.PIT[c("WIDUKIND_NAME","Code","TIME_PERIOD","A.PI.T")]
AP<-rename(AP, Country=WIDUKIND_NAME, Year=TIME_PERIOD)
My goal is to convert the column vector Year in the dataframe AP into a vector of class dates. In other words, I want R to understand the time series part of my panel data. For your information, I have quarterly data, with unbalanced date range across cross sections (in my case countries).
head(AP$Year)
[1] "2008-Q2" "2008-Q3" "2008-Q4" "2009-Q1" "2009-Q2" "2009-Q3"
Or,
AP$Year<-as.factor(AP$Year)
head(AP$Year)
[1] 2008-Q2 2008-Q3 2008-Q4 2009-Q1 2009-Q2 2009-Q3
264 Levels: 1950-Q1 1950-Q2 1950-Q3 1950-Q4 1951-Q1 1951-Q2 1951-Q3 1951-Q4 1952-Q1 1952-Q2 1952-Q3 ... 2015-Q4
Is there any easy solution to convert these character dates into time-series dates?
library(zoo)
as.Date(as.yearqtr(AP$Year, format ='%Y-Q%q'))
This should do it.
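For reference, a base R sketch of the same conversion, in case zoo is not available. It assumes every value has the exact "YYYY-Qn" layout shown above and maps each quarter to its first day; quarter_to_date is a hypothetical helper name.

```r
# Hypothetical helper: "2008-Q2" -> first day of that quarter
quarter_to_date <- function(x) {
  x       <- as.character(x)            # also handles factor input
  year    <- as.integer(substr(x, 1, 4))
  quarter <- as.integer(substr(x, 7, 7))
  month   <- (quarter - 1L) * 3L + 1L   # Q1 -> 1, Q2 -> 4, Q3 -> 7, Q4 -> 10
  as.Date(sprintf("%d-%02d-01", year, month))
}

quarters <- c("2008-Q2", "2008-Q3", "2008-Q4", "2009-Q1")
quarter_to_date(quarters)
# "2008-04-01" "2008-07-01" "2008-10-01" "2009-01-01"
```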

Converting numeric dates into Hours since format

I'm having problems converting dates into an "hours since" format. I don't think I need the more complex methods of date manipulation, but I might be wrong. Does anyone know a good solution?
The data I have is in a table format, which I read in from a text file. A 3 line example of the 5,000+ rows of data is:
date1 <- matrix(c(2007,2007,2007, 12,12,12,1,2,3,0.365,0.096,-0.416),nrow=3)
Which prints out as:
date1
[,1] [,2] [,3] [,4]
[1,] 2007 12 1 0.365
[2,] 2007 12 2 0.096
[3,] 2007 12 3 -0.416
The first column is the year, the second the month, and third the day. The value in the 4th column is an index value relevant to my study.
The data I would like to match the index value is in a slightly odd format, of hours since "1800-01-01"
ftime <- c(1822548, 1822572, 1822596)
ftime can be printed as just a date, via the following function.
as.Date(ftime/24,"1800-01-01")
[1] "2007-12-01" "2007-12-02" "2007-12-03"
My code all uses the numeric values in ftime to match data, but I cannot work out how to convert the new data (date1) into the same format.
I have the feeling it should be a simple solution, but cannot seem to get it to work.
Help is always greatly appreciated!
You can use the difftime function, if I got what you want:
#setting the origin
myorigin<-as.Date("1800-01-01")
#converting date1 to Date objects
myDates<-as.Date(do.call(function(...) paste(...,sep="-"),as.data.frame(date1[,1:3])))
#get the results
difftime(myDates,myorigin,units="hours")
#Time differences in hours
#[1] 1822536 1822560 1822584
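Note that these values are 12 hours behind the ftime values in the question (1822536 versus 1822548), which suggests ftime is stamped at noon rather than midnight. A minimal sketch of closing that gap, assuming the noon offset holds for the whole file:

```r
# Data from the question
date1 <- matrix(c(2007, 2007, 2007, 12, 12, 12, 1, 2, 3, 0.365, 0.096, -0.416),
                nrow = 3)
ftime <- c(1822548, 1822572, 1822596)

origin <- as.Date("1800-01-01")
dates  <- as.Date(paste(date1[, 1], date1[, 2], date1[, 3], sep = "-"))

# Hours from the origin to midnight of each date
hours_midnight <- as.numeric(difftime(dates, origin, units = "hours"))

# Assumed 12-hour shift so the values line up with ftime
hours_noon <- hours_midnight + 12

hours_noon                # 1822548 1822572 1822596
match(hours_noon, ftime)  # 1 2 3
```

With the shift applied, match() can be used to look up the index value in column 4 for each ftime entry.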

Converting hour, minutes columns in dataframe to time format

I have a dataframe containing 3 columns, including hours and minutes.
I want to convert these columns (namely hours and minutes) to a time format. Given the data in drame:
Score Hour Min
10 10 56
23 17 01
I would like to get:
Score Time
10 10:56:00
23 17:01:00
You could use ISOdatetime to convert the numbers in the hour and min columns to a POSIXct object. However, a POSIXct object is only defined when it also includes a year, month and day. So depending on your needs, you can do several things:
If you need a real time object that is printed correctly in graphs, for example, and can be used in arithmetic (addition, subtraction), you need to use ISOdatetime. ISOdatetime returns a so-called POSIXct object, which is an R object that represents time. In ISOdatetime you then just use fixed values for year, month, and day. This of course only works if your dataset does not span multiple years.
If you just need a character column Time, you can convert the POSIXct output to a string using strftime, setting the format argument to "%H:%M:00". In this case, however, you could also use sprintf to create the new character column without converting to POSIXct: sprintf("%s:%s:00", drame$Hour, drame$Min).
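A minimal sketch of both options, using the two rows from the question. The placeholder date 2000-01-01 and the UTC time zone are arbitrary assumptions; the sprintf variant uses %02d rather than %s so that a minute stored as the number 1 prints as 01.

```r
# Data from the question
drame <- data.frame(Score = c(10, 23), Hour = c(10, 17), Min = c(56, 1))

# Option 1: a real time object on an arbitrary placeholder date
drame$TimeObj <- ISOdatetime(2000, 1, 1, drame$Hour, drame$Min, 0, tz = "UTC")

# Option 2: a zero-padded character column, no POSIXct needed
drame$Time <- sprintf("%02d:%02d:00",
                      as.integer(drame$Hour), as.integer(drame$Min))

format(drame$TimeObj, "%H:%M:%S")  # "10:56:00" "17:01:00"
drame$Time                         # "10:56:00" "17:01:00"
```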
You can use the paste() function to merge the two columns into a character string and then use strptime() to convert it to a timestamp.
x<-1:6
##1 2 3 4 5 6
y<-8:13
## 8 9 10 11 12 13
timestamp <- paste(x,":",y,":00",sep="")
timestamp
will result in
#1:8:00 2:9:00 3:10:00 4:11:00 5:12:00 6:13:00
If you prefer to convert this to a timestamp object, try using
strptime(timestamp,"%H:%M:%S")
## uses current date by default
If you happen to have the date in another column, use paste() to build a character date-time string (mergedData below) and use the following to get a date-time:
##strptime(mergedData,"%d/%m/%Y %H:%M:%S")
