Error in converting character [duplicate] - r

I have data like this.
> head(new3)
Date Hour Dayahead Actual Difference
1 2015-01-01 0:00 42955 42425 530
2 2015-01-01 0:15 42412 42021 391
3 2015-01-01 0:30 41901 42068 -167
4 2015-01-01 0:45 41355 41874 -519
5 2015-01-01 1:00 40710 41230 -520
6 2015-01-01 1:15 40204 40810 -606
Their characteristics are as below:
> str(new3)
'data.frame': 35044 obs. of 5 variables:
$ Date : Date, format: "2015-01-01" "2015-01-01" "2015-01-01" "2015-
01-01" ...
$ Hour : chr "0:00" "0:15" "0:30" "0:45" ...
$ Dayahead : chr "42955" "42412" "41901" "41355" ...
$ Actual : int 42425 42021 42068 41874 41230 40810 40461 40160 39958
39671 ...
$ Difference: chr "530" "391" "-167" "-519" ...
I tried to convert Hour and Dayahead to numeric using as.numeric, but it shows me this.
> new3$Dayahead<-as.numeric(new3$Dayahead)
Warning message:
NAs introduced by coercion
> new3$Hour<-as.numeric(new3$Hour)
Warning message:
NAs introduced by coercion
So when I checked with str again, it showed me this.
> str(new3)
'data.frame': 35044 obs. of 5 variables:
$ Date : Date, format: "2015-01-01" "2015-01-01" "2015-01-01" "2015-
01-01" ...
$ Hour : num NA NA NA NA NA NA NA NA NA NA ...
$ Dayahead : num 42955 42412 41901 41355 40710 ...
$ Actual : int 42425 42021 42068 41874 41230 40810 40461 40160 39958
39671 ...
$ Difference: chr "530" "391" "-167" "-519" ...
My questions are:
1) Why do I get the 'NAs introduced by coercion' warning message?
2) How can I solve the problem above?
3) Why do I get NA data for Hour and how can I solve it?
Thank you.

As already mentioned in the comments, a string that contains a non-numeric character (here the ":" in your Hour column) cannot be converted with as.numeric, which is why you get NA along with the coercion warning.
I am not sure why you want to convert your times to numeric, but if you'd like to perform operations on them (e.g., calculate time differences), you should convert your dates and times to a POSIX date-time class. In your case run:
new3$fulldate <- as.POSIXlt(paste(new3$Date, new3$Hour, sep = " "))
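To illustrate why a date-time class helps, here is a minimal sketch on a three-row stand-in for new3 (the toy data frame, the explicit format string, the UTC time zone, and the choice of POSIXct instead of POSIXlt are assumptions made for the example, not part of the original data):

```r
# Toy stand-in for the first rows of new3 (assumed values)
new3 <- data.frame(
  Date = as.Date(rep("2015-01-01", 3)),
  Hour = c("0:00", "0:15", "0:30"),
  stringsAsFactors = FALSE
)

# Combine date and time, then parse with an explicit format and time zone
new3$fulldate <- as.POSIXct(
  paste(new3$Date, new3$Hour),
  format = "%Y-%m-%d %H:%M",
  tz = "UTC"
)

# Differences between consecutive timestamps, expressed in minutes
gaps <- diff(new3$fulldate)
gap_minutes <- as.numeric(gaps, units = "mins")
gap_minutes
```

POSIXct stores each value as a single number of seconds, which tends to be more convenient than POSIXlt for a data frame column.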

Try this:
hour <- c("0:00", "0:15", "0:30", "0:45", "1:00", "1:15")
Replace the : with a . and then you can convert:
hour <- gsub(":", ".", hour)
hour <- as.numeric(hour)
hour
[1] 0.00 0.15 0.30 0.45 1.00 1.15
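Note that the gsub approach turns a quarter past midnight into 0.15, which is a label rather than a quantity of hours. If true decimal hours are wanted (an assumption about the goal, since the question does not say), a minimal base R sketch is to split on the colon and divide the minutes by 60; the names parts, hh, mm and decimal_hour are made up for the example:

```r
hour <- c("0:00", "0:15", "0:30", "0:45", "1:00", "1:15")

# Split each "H:MM" string into its hour and minute parts
parts <- strsplit(hour, ":", fixed = TRUE)
hh <- as.numeric(vapply(parts, `[`, character(1), 1))
mm <- as.numeric(vapply(parts, `[`, character(1), 2))

# Minutes become a fraction of an hour
decimal_hour <- hh + mm / 60
decimal_hour
```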

Related

How to convert a column(class is POSIXct) in a preloaded dataset in R to a character column?

I have this preloaded dataset in R that I got from library(dplyr). The dataset is named pedestrian.
I want to convert a column Date_Time whose class is S3: POSIXct to chr.
I used as.character() but it didn't give me what I want. My goal is to create a new dataset that excludes the Date_Time column, but select() still keeps the Date_Time column.
You seem to be mistaken about where the data came from. There is no such dataset in dplyr. How do I know this?
data(package="dplyr")
# ---- returns ---
Data sets in package ‘dplyr’:
band_instruments Band membership
band_instruments2 Band membership
band_members Band membership
starwars Starwars characters
storms Storm tracks data
There is a dataset named "pedestrian" in the naniar package and it does have entries that match values in the fragment of data imaged in the link (although the column names and column order are different). Specifically:
grep("Birrarung", naniar::pedestrian$sensor_name) # returns 8455+1001 values
But your dataset has too many entries to be an exact match. Your dataset has 66,037 lines, somewhat less than double the number of rows in naniar::pedestrian.
str(naniar::pedestrian)
tibble [37,700 × 9] (S3: tbl_df/tbl/data.frame)
$ hourly_counts: int [1:37700] 883 597 294 183 118 68 47 52 120 333 ...
$ date_time : POSIXct[1:37700], format: "2016-01-01 00:00:00" "2016-01-01 01:00:00" "2016-01-01 02:00:00" "2016-01-01 03:00:00" ...
$ year : int [1:37700] 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
$ month : Ord.factor w/ 12 levels "January"<"February"<..: 1 1 1 1 1 1 1 1 1 1 ...
$ month_day : int [1:37700] 1 1 1 1 1 1 1 1 1 1 ...
$ week_day : Ord.factor w/ 7 levels "Sunday"<"Monday"<..: 6 6 6 6 6 6 6 6 6 6 ...
$ hour : int [1:37700] 0 1 2 3 4 5 6 7 8 9 ...
$ sensor_id : int [1:37700] 2 2 2 2 2 2 2 2 2 2 ...
$ sensor_name : chr [1:37700] "Bourke Street Mall (South)" "Bourke Street Mall (South)" "Bourke Street Mall (South)" "Bourke Street Mall (South)" ...
Your request seems a bit confusing. "My goal is to create a new dataset that exclude Date_Time column. Select() still keep Date_Time column." There is no column with that name in the pedestrian dataset (since capitalization needs to be exact), but if there were, it would be a simple matter to exclude it:
new_dat <- old_dat[!names(old_dat) %in% "Date_Time"]
# And maybe
new_dat <- old_dat %>% select(-Date_Time)
It would also be a simple matter to destructively convert to character with:
new_dat$Date_Time <- as.character(old_dat$Date_Time)
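A minimal base R sketch of both operations on a made-up two-row data frame (old_dat, its values, and the UTC time zone are assumptions for illustration). It uses format() with an explicit format string instead of as.character(), so that the text representation does not depend on how a given R version prints midnight timestamps:

```r
# Hypothetical stand-in for a data frame with a POSIXct column
old_dat <- data.frame(
  Date_Time = as.POSIXct(c("2015-01-01 00:00:00", "2015-01-01 01:00:00"), tz = "UTC"),
  Count = c(1630L, 826L)
)

# Drop the column by name; drop = FALSE keeps the result a data frame
new_dat <- old_dat[, names(old_dat) != "Date_Time", drop = FALSE]

# Convert the column to character with a fixed, explicit format
chr_dat <- old_dat
chr_dat$Date_Time <- format(chr_dat$Date_Time, "%Y-%m-%d %H:%M:%S")

str(new_dat)
str(chr_dat)
```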
Since you have not shown what code you tried and cannot even tell us where the data comes from, we can only speculate what you did and how you are failing.
Maybe someone else created a dataset from the publicly accessible data at https://data.melbourne.vic.gov.au/Transport/Pedestrian-Counting-System-Monthly-counts-per-hour/b2ak-trbp
AHA! Found it!
https://tsibble.tidyverts.org/reference/pedestrian.html
str(pedestrian)
tbl_ts [66,037 × 5] (S3: tbl_ts/tbl_df/tbl/data.frame)
$ Sensor : chr [1:66037] "Birrarung Marr" "Birrarung Marr" "Birrarung Marr" "Birrarung Marr" ...
$ Date_Time: POSIXct[1:66037], format: "2015-01-01 00:00:00" "2015-01-01 01:00:00" "2015-01-01 02:00:00" "2015-01-01 03:00:00" ...
$ Date : Date[1:66037], format: "2015-01-01" "2015-01-01" "2015-01-01" "2015-01-01" ...
$ Time : int [1:66037] 0 1 2 3 4 5 6 7 8 9 ...
$ Count : int [1:66037] 1630 826 567 264 139 77 44 56 113 166 ...
- attr(*, "key")= tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
..$ Sensor: chr [1:4] "Birrarung Marr" "Bourke Street Mall (North)" "QV Market-Elizabeth St (West)" "Southern Cross Station"
..$ .rows : list<int> [1:4]
.. ..$ : int [1:14566] 1 2 3 4 5 6 7 8 9 10 ...
.. ..$ : int [1:16414] 14567 14568 14569 14570 14571 14572 14573 14574 14575 14576 ...
.. ..$ : int [1:17518] 30981 30982 30983 30984 30985 30986 30987 30988 30989 30990 ...
.. ..$ : int [1:17539] 48499 48500 48501 48502 48503 48504 48505 48506 48507 48508 ...
.. ..# ptype: int(0)
..- attr(*, ".drop")= logi TRUE
- attr(*, "index")= chr "Date_Time"
..- attr(*, "ordered")= logi TRUE
- attr(*, "index2")= chr "Date_Time"
- attr(*, "interval")= interval [1:1] 1h
..# .regular: logi TRUE
And now I see that the inimitable Rob Hyndman is behind this project: https://robjhyndman.com/hyndsight/tsibbles/
And I guess I am a bit behind the times or perhaps I should say I'm behind the time series. There are over 500 hits for an SO search on "[r] tsibble". Now I need to ask; did this answer your question?
Testing:
new_ped <- pedestrian %>% select(-Date_Time) # fails
pedestrian$Date_Time <- as.character(pedestrian$Date_Time) # succeeds
The tsibble::pedestrian dataset is a tsibble (an S3 object of class tbl_ts, as the str output above shows) and may not behave in the same manner as an ordinary data frame. But my error was in using the wrong operator for column removal.
?select # need to choose dplyr version rather than MASS version
Need to use ! rather than "-":
new_ped <- pedestrian %>% select(!Date_Time)
> str(new_ped)
tibble [66,037 × 4] (S3: tbl_df/tbl/data.frame)
$ Sensor: chr [1:66037] "Birrarung Marr" "Birrarung Marr" "Birrarung Marr" "Birrarung Marr" ...
$ Date : Date[1:66037], format: "2015-01-01" "2015-01-01" "2015-01-01" "2015-01-01" ...
$ Time : int [1:66037] 0 1 2 3 4 5 6 7 8 9 ...
$ Count : int [1:66037] 1630 826 567 264 139 77 44 56 113 166 ...

Why does cut() turn my POSIXct vector into a factor vector and what can I do to stop this?

How can I use cut while maintaining the POSIXct class of my date.time vector?
library(data.table)
library(lubridate)
Some data:
air.temp <- c(-1.7202,-1.6524,-1.5689,-1.6785,-1.6060,-1.8843)
soil.temp <- c(3.6972,3.6839,3.6716,3.6586,3.6460,3.6701)
date.time <- c('2007-01-01 00:05:00','2007-01-01 00:10:00',
'2007-01-01 00:15:00','2007-01-01 00:20:00',
'2007-01-01 00:25:00','2007-01-01 00:30:00')
DT <- data.table(date.time, air.temp, soil.temp)
DT[, date.time := parse_date_time(date.time, 'YmdHMS')]
Structure shows the date.time column is in the desired POSIXct format:
str(DT)
Classes ‘data.table’ and 'data.frame': 6 obs. of 3 variables:
$ date.time: POSIXct, format: "2007-01-01 00:05:00" ...
$ air.temp : num -1.72 -1.65 -1.57 -1.68 -1.61 ...
$ soil.temp: num 3.7 3.68 3.67 3.66 3.65 ...
- attr(*, ".internal.selfref")=<externalptr>
Now I cut five minute data to fifteen minute:
DT_15_min <- DT[, lapply(.SD, mean), by=(date.time = cut(date.time, "15 min"))]
Structure shows the conversion to factor vector:
str(DT_15_min)
Classes ‘data.table’ and 'data.frame': 2 obs. of 3 variables:
$ date.time: Factor w/ 2 levels "2007-01-01 00:05:00",..: 1 2
$ air.temp : num -1.65 -1.72
$ soil.temp: num 3.68 3.66
- attr(*, ".internal.selfref")=<externalptr>
Is it possible to cut while maintaining POSIXct vector class?
My desired result is to have my data aggregated from a five minute interval to a fifteen minute interval while maintaining the original class of the vector (POSIXct in this case).
As always, I am grateful for any advice.
cut is designed to return factors. If you want to group by 15 min intervals, you could try using the rounding functions from lubridate, e.g.
DT_15_min <- DT[, lapply(.SD, mean), by=(date.time = floor_date(date.time, "15 mins"))]
str(DT_15_min)
Classes ‘data.table’ and 'data.frame': 3 obs. of 3 variables:
$ date.time: POSIXct, format: "2007-01-01 00:00:00" "2007-01-01 00:15:00" ...
$ air.temp : num -1.69 -1.62 -1.88
$ soil.temp: num 3.69 3.66 3.67
- attr(*, ".internal.selfref")=<externalptr>
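You can also use dplyr (note that ceiling_date rounds up, whereas floor_date above rounds down):
library(dplyr)
df <- tibble(date.time, air.temp, soil.temp) %>%
  mutate(date.time = ceiling_date(ymd_hms(date.time), unit = "15 mins")) %>%
  group_by(date.time) %>%
  summarize_all(mean) # funs() is deprecated in current dplyr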
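For comparison, the same 15-minute flooring can be sketched in base R without lubridate or data.table, by rounding the underlying number of seconds down to a multiple of 900. This is only an illustration under assumptions: the UTC time zone and the names df, bucket and res are made up here.

```r
air.temp  <- c(-1.7202, -1.6524, -1.5689, -1.6785, -1.6060, -1.8843)
soil.temp <- c(3.6972, 3.6839, 3.6716, 3.6586, 3.6460, 3.6701)
date.time <- as.POSIXct(
  c("2007-01-01 00:05:00", "2007-01-01 00:10:00", "2007-01-01 00:15:00",
    "2007-01-01 00:20:00", "2007-01-01 00:25:00", "2007-01-01 00:30:00"),
  tz = "UTC"
)
df <- data.frame(date.time, air.temp, soil.temp)

# Floor each timestamp to the start of its 15-minute (900-second) interval
bucket <- as.POSIXct(
  floor(as.numeric(df$date.time) / 900) * 900,
  origin = "1970-01-01", tz = "UTC"
)

# Average both temperature columns within each interval
res <- aggregate(df[c("air.temp", "soil.temp")],
                 by = list(date.time = bucket), FUN = mean)
str(res)
```

The grouping column stays POSIXct because the buckets are built from the numeric timestamps and never pass through a factor.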

ifelse Statement Returning Number Instead Of Date

I have a series of dates in my code that are in an ifelse statement, that are returning a single numerical value instead of a date.
osa <- read.delim("C:/RMathew/RScripts/osaevents/osaevents.txt", stringsAsFactors=TRUE)
#
osa$datetime <- ymd_hms(osa$datetime)
osa$date <- as.Date(osa$datetime)
sixoclock <- 6*60*60
osa$daystart <- ymd_hms(ymd(osa$date) + sixoclock)
osa$dateplus <- osa$date + 1
osa$dateminus <- osa$date - 1
osa$dayend <- ymd_hms(ymd(osa$dateplus) + sixoclock)
osa$dateloca <- osa$datetime >= osa$daystart
osa$datelocb <- osa$datetime < osa$dayend
osa$milldate <- ifelse(osa$dateloca==TRUE & osa$datelocb==TRUE,
osa$date,osa$dateminus)
The place where this data originates considers the time between 6 AM on any given day to 6 AM the following day, as one day. The code above is trying to compare the date to the question of is it after 6 AM on a particular day, but before 6 AM on the following day, to assign it the earlier day's date (for whatever day it might be).
So far so good, but it returns a single number for the osa$milldate instead of the dates in the ifelse columns.
'data.frame': 897 obs. of 16 variables:
$ datetime : POSIXct, format: "2015-08-13 15:11:53" "2015-08-13 14:53:26" "2015-08-13 14:34:58" "2015-08-13 14:16:18" ...
$ stream : Factor w/ 1 level "fc": 1 1 1 1 1 1 1 1 1 1 ...
$ fe : num 18.1 18 17.6 18.1 18.5 ...
$ ni : num 2.97 2.99 2.92 3.2 3.32 ...
$ cu : num 3.41 3.35 2.99 3.58 3.73 ...
$ pd : num 138 157 139 166 183 ...
$ mg : num 13.8 13.8 14.4 14.3 13.9 ...
$ so : num 9.67 9.81 9.65 10.58 11.37 ...
$ date : Date, format: "2015-08-13" "2015-08-13" "2015-08-13" "2015-08-13" ...
$ daystart : POSIXct, format: "2015-08-13 06:00:00" "2015-08-13 06:00:00" "2015-08-13 06:00:00" "2015-08-13 06:00:00" ...
$ dateplus : Date, format: "2015-08-14" "2015-08-14" "2015-08-14" "2015-08-14" ...
$ dateminus: Date, format: "2015-08-12" "2015-08-12" "2015-08-12" "2015-08-12" ...
$ dayend : POSIXct, format: "2015-08-14 06:00:00" "2015-08-14 06:00:00" "2015-08-14 06:00:00" "2015-08-14 06:00:00" ...
$ dateloca : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
$ datelocb : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
$ milldate : num 16660 16660 16660 16660 16660 ...
Thoughts? Also, there is likely to be a more elegant way to do this.
See the help file for ifelse
Warning:
The mode of the result may depend on the value of ‘test’ (see the
examples), and the class attribute (see ‘oldClass’) of the result
is taken from ‘test’ and may be inappropriate for the values
selected from ‘yes’ and ‘no’.
Sometimes it is better to use a construction such as
(tmp <- yes; tmp[!test] <- no[!test]; tmp)
, possibly extended to handle missing values in ‘test’.
This describes precisely what is going on in your example -- the date class attribute is lost -- and a work around -- a multi-step approach.
osa$milldate <- osa$date
ind<- osa$dateloca==TRUE & osa$datelocb==TRUE
osa$milldate[!ind] <- osa$dateminus[!ind]
Another option is replace.
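A. Webb set me on the right path. ifelse was stripping the Date class from the result. The solution above with the index seemed to jumble the date order when I tried it, which can happen when the replacement on the right-hand side is not subset with the same index. As A. Webb pointed out from the help file, the class attribute is what gets lost, and the following line fixed it immediately.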
class(osa$milldate) <- class(osa$date)
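A small self-contained sketch of the behaviour and two possible fixes, using two made-up timestamps in UTC (the vectors datetime, after_six, raw, fixed and shifted are hypothetical names, and the check is simplified to "hour is at least 6"). The last variant shifts the timestamp back six hours before taking the date, which may be the more compact route to a 6 AM to 6 AM day:

```r
datetime <- as.POSIXct(c("2015-08-13 15:11:53", "2015-08-13 03:20:00"), tz = "UTC")
date <- as.Date(datetime, tz = "UTC")
after_six <- as.numeric(format(datetime, "%H")) >= 6

# ifelse() takes its attributes from the logical test, so the Date class is lost
raw <- ifelse(after_six, date, date - 1)
class(raw)

# Fix 1: start from a Date vector and overwrite by index, subsetting both sides
fixed <- date
fixed[!after_six] <- (date - 1)[!after_six]

# Fix 2: shift by six hours, then take the date
shifted <- as.Date(datetime - 6 * 3600, tz = "UTC")

format(fixed)
format(shifted)
```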

Create ITime intervals in data.table

I have a datetime variable (vardt) stored as character in a large data.table, e.g. "21/07/2011 15:54:57".
I can turn it into ITime class (e.g. 15:54:57) with DT[,newtimevar:=as.ITime(substr(DT$vardt,12,19))] but I would like to create groups of minutes, so from 21/07/2011 15:54:57 I would obtain 15:54:00 or 15:54.
I have tried: DT[,cuttime := as.ITime(cut(DT$vardt, breaks = "1 min",))]
but it didn't work. I am reading the zoo package documentation but I haven't found anything yet. Any idea/function that could be useful for this case in a large data table?
Here are two possible approaches:
library(data.table)
##
x <- Sys.time()+sample(seq(0,24*3600,60),101,TRUE)
x <- gsub(
"(\\d+)\\-(\\d+)\\-(\\d+)",
"\\3/\\2/\\1",
x)
##
DT <- data.table(vardt=x)
##
DT[,time:=as.ITime(substr(vardt,12,19))]
##
DT[,hour_min:=as.ITime(
gsub("(\\d+)\\:(\\d+)\\:(\\d+)",
"\\1\\:\\2\\:00",time))]
DT[,c_hour_min:=substr(time,1,5)]
##
R> head(DT)
vardt time hour_min c_hour_min
1: 28/01/2015 05:38:30 05:38:30 05:38:00 05:38
2: 27/01/2015 14:15:30 14:15:30 14:15:00 14:15
3: 28/01/2015 06:03:30 06:03:30 06:03:00 06:03
4: 28/01/2015 00:37:30 00:37:30 00:37:00 00:37
5: 27/01/2015 17:59:30 17:59:30 17:59:00 17:59
6: 28/01/2015 03:46:30 03:46:30 03:46:00 03:46
R> str(DT,vec.len=2)
Classes ‘data.table’ and 'data.frame': 101 obs. of 4 variables:
$ vardt : chr "28/01/2015 05:38:30" "27/01/2015 14:15:30" ...
$ time :Class 'ITime' int [1:101] 20310 51330 21810 2250 64770 ...
$ hour_min :Class 'ITime' int [1:101] 20280 51300 21780 2220 64740 ...
$ c_hour_min: chr "05:38" "14:15" ...
- attr(*, ".internal.selfref")=<externalptr>
The first case, hour_min, preserves the ITime class, while the second case, c_hour_min, is just a character vector.
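As a further illustration, the minute grouping can also be sketched in base R by parsing the full string and truncating to the minute. This does not use ITime at all; the sample values, the UTC time zone, and the names parsed, minute_start and hour_min are assumptions for the example:

```r
vardt <- c("21/07/2011 15:54:57", "21/07/2011 15:55:03")

# Parse day/month/year strings with an explicit format
parsed <- as.POSIXct(vardt, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Truncate to the start of the minute, then keep only hours and minutes
minute_start <- as.POSIXct(trunc(parsed, units = "mins"))
hour_min <- format(minute_start, "%H:%M")
hour_min
```

Keeping minute_start as POSIXct preserves the date as well, which matters if the table spans more than one day.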

R Converting POSIXlt to xts

I have a time series in the format:
> str(Y$Date)
POSIXlt[1:174110], format: "2001-01-01 12:00:00" "2001-01-01 05:30:00" "2001-01-02 01:30:00" "2001-01-02 02:00:00" "2001-01-02 02:00:00" "2001-01-02 02:01:00" "2001-01-02 04:00:00" "2001-01-02 04:00:00" ...
> summary(Y$Date)
Min. 1st Qu. Median Mean 3rd Qu. Max.
"2001-01-01 05:30:00" "2004-03-15 10:40:30" "2007-01-03 04:00:00" "2006-11-11 15:53:11" "2009-08-13 12:00:00" "2011-12-30 12:30:00"
> length(Y$Date)
[1] 174110
which I need to convert to a xts format. In order to do so I have done the following:
date <- Y$Date
date <- as.xts(date)
> xtsible(date) # tests whether or not the data is convertible to xts format
[1] TRUE
However:
> str(date)
An 'xts' object of zero-width
> length(date)
[1] 0
> head(date['2001'])
[,1]
2001-01-01 05:30:00 NA
2001-01-01 12:00:00 NA
2001-01-02 01:30:00 NA
2001-01-02 02:00:00 NA
2001-01-02 02:00:00 NA
2001-01-02 02:00:00 NA
and in order to get the data back into the data frame:
> Y$date <- date
Error in `$<-.data.frame`(`*tmp*`, "date", value = numeric(0)) :
replacement has 0 rows, data has 174110
and
> as.data.frame(date)
Error in data.frame(`coredata(x)` = c(NA_character_, NA_character_, NA_character_, :
duplicate row.names: 2001-01-02 02:00:00, ... , 2001-01-08 06:00:00, 200
In addition: Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
corrupt data frame: columns will be truncated or padded with NAs
> str(Y)
'data.frame': 174110 obs. of 17 variables:
$ Date : POSIXlt, format: "2001-01-01 12:00:00" "2001-01-01 05:30:00" "2001-01-02 01:30:00" "2001-01-02 02:00:00" ...
$ C : chr "MA" "IN" "SI" "ID" ...
$ Event : chr "MALAY VEHICLE SALES" "Interbank Offer Rate - Percent" "Advance GDP Estimate (YoY)" "Foreign Reserves" ...
$ News : num NA NA NA NA NA NA NA NA NA NA ...
$ Growth : num 148 NA 0.3 387.2 0 ...
$ Surv.M : num NA NA NA NA NA NA NA NA NA NA ...
$ Act : num 30892 NA 10.5 29281.4 12500 ...
$ Prior : num 30744 8100 10.2 28894.2 12500 ...
$ Revised : num NA NA NA NA NA ...
$ Type : chr NA NA "%" "$B" ...
$ Freq. : chr "M" "NA" "Q" "M" ...
$ Ticker : chr "MAVSTTL Index" "IMIBOR Index" "SGAVYOY% Index" "IDGFA Index" ...
$ Period : chr "Nov" "12/31/13" "4Q" "Dec" ...
$ Category: chr "NA" "NA" "NA" "NA" ...
$ Time : chr "12:00:00 AM" "05:30:00 AM" "01:30:00 AM" "02:00:00 AM" ...
$ Country : chr "Malaysia" "India" "Singapore" "Indonesia" ...
$ date : POSIXlt, format: "2001-01-01 12:00:00" "2001-01-01 05:30:00" "2001-01-02 01:30:00" "2001-01-02 02:00:00" ...
I don't know why I can't properly convert the data into the xts format and then get it back into the data frame.
Your help is very much appreciated.
I have answered a similar question asked by you previously. I guess it has caused some confusion. When you see ?xts, it says that xts creates an "extensible time-series" object. First we have to specify x, the object of which time-series has to be made and then specify the index, i.e. the time-series itself (Y$Date in your case).
Here is a simplified solution:
Y_new <- xts(x = Y[,-1], order.by = Y$Date)
This creates a new object Y_new in the time-series format with all the data of Y, with the added benefit of easily selecting desired time intervals.

Resources