Convert date and time from factor to numeric in R

I have a dataframe with a date-time column; let's call the dataframe date_time. Since the column is of factor type, I would like to convert the whole column to numeric form without changing anything else, e.g. 2020-01-20 14:02:50 to 20200120140250.
I have about 1000 rows of data. Does anyone know how to produce this output? I have tried as.numeric and gsub, but they don't work. I think using POSIXct might work, but I do not understand the reasoning behind it.
example of my data:
2020-07-08 21:40:26
2020-07-08 16:48:57
2020-07-01 15:54:10
2020-07-13 20:27:06
2020-07-27 16:08:12
and the list goes on.

You can try:
gsub("[[:punct:] ]", "", as.character(as.POSIXct("2020-01-20 14:02:50")))
The as.character keeps the formatted visual output instead of working with the underlying numbers.
UPDATE:
date_time <- data.frame(time = as.POSIXct(
  c("2020-07-08 21:40:26", "2020-07-08 16:48:57", "2020-07-01 15:54:10",
    "2020-07-13 20:27:06", "2020-07-27 16:08:12", "2020-01-20 14:02:50")))
date_time$num_time <- gsub("[[:punct:] ]", "", as.character(date_time$time))
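As an alternative to stripping punctuation with gsub, format() can render a POSIXct column directly in the compact layout (a sketch; the result is kept as character, since a 14-digit value is awkward as a plain number):

```r
date_time <- data.frame(time = as.POSIXct("2020-01-20 14:02:50"))
# format() writes the datetime straight into the YYYYMMDDHHMMSS layout,
# so there is no punctuation to remove afterwards
date_time$num_time <- format(date_time$time, "%Y%m%d%H%M%S")
date_time$num_time
# [1] "20200120140250"
```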

Solution with lubridate
library(lubridate)
dt1 <- as.factor(c("2020-07-08 21:40:26", "2020-07-08 16:48:57", "2020-07-01 15:54:10",
                   "2020-07-13 20:27:06", "2020-07-27 16:08:12"))
dt <- data.frame(date=ymd_hms(dt1))
dt
class(dt$date)
Result
date
1 2020-07-08 21:40:26
2 2020-07-08 16:48:57
3 2020-07-01 15:54:10
4 2020-07-13 20:27:06
5 2020-07-27 16:08:12
> class(dt$date)
[1] "POSIXct" "POSIXt"
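A note on why as.numeric alone didn't work in the question: a POSIXct value is stored as the number of seconds since 1970-01-01 UTC, so as.numeric() returns that epoch offset rather than a YYYYMMDDHHMMSS number (minimal sketch):

```r
# POSIXct stores seconds since the Unix epoch, so as.numeric()
# exposes that offset rather than a human-readable number
x <- as.POSIXct("2020-01-20 14:02:50", tz = "UTC")
as.numeric(x)
# [1] 1579528970
```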

Related

How can I save Time data to csv file using write_csv without T or Z character?

I'm trying to save data that includes time values with write_csv, but it keeps showing T and Z characters (the ISO 8601 format, as far as I know).
For example, 2022-12-12 08:00:00 is shown as 2022-12-12T08:00:00Z when the csv file is opened with Notepad.
I want to keep the original format after saving the csv file, but I couldn't find an option for this.
I saw an article about this problem, but there was no answer.
Here are two solutions.
First write a data set to a temp file.
library(readr)
df1 <- data.frame(datetime = as.POSIXct("2022-12-12 08:00:00"),
                  x = 1L, y = 2)
csvfile <- tempfile(fileext = ".csv")
# write the data, this is the problem instruction
write_csv(df1, file = csvfile)
Created on 2023-01-31 with reprex v2.0.2
1. Change nothing
This is probably not what you want but read_csv recognizes write_csv's ISO8601 output format, so if the data is written to file with write_csv and read in from disk with read_csv the problem doesn't occur.
# read from file as text, problem format is present
readLines(csvfile)
#> [1] "datetime,x,y" "2022-12-12T08:00:00Z,1,2"
# read from file as spec_tbl_df, problem format is not present
read_csv(csvfile, show_col_types = FALSE)
#> # A tibble: 1 × 3
#> datetime x y
#> <dttm> <dbl> <dbl>
#> 1 2022-12-12 08:00:00 1 2
2. Coerce to "character"
If the datetime column of class "POSIXct" is coerced to character, the ISO 8601 format is gone and the file shows the original format; read_csv will still recognize the column as a datetime afterwards.
This is done in a pipe, below with the base pipe operator introduced in R 4.1, so that the original data set is not changed.
# coerce the problem column to character and write to file
# done in a pipe it won't alter the original data set
df1 |>
  dplyr::mutate(datetime = as.character(datetime)) |>
  write_csv(file = csvfile)
# check result, both are OK
readLines(csvfile)
#> [1] "datetime,x,y" "2022-12-12 08:00:00,1,2"
read_csv(csvfile, show_col_types = FALSE)
#> # A tibble: 1 × 3
#> datetime x y
#> <dttm> <dbl> <dbl>
#> 1 2022-12-12 08:00:00 1 2
Final clean up.
unlink(csvfile)
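Another option worth noting, assuming base R output is acceptable: utils::write.csv renders a POSIXct column with its default character representation rather than ISO 8601, so the T/Z form never appears (a sketch, not a drop-in replacement for the readr pipeline):

```r
df1 <- data.frame(datetime = as.POSIXct("2022-12-12 08:00:00"),
                  x = 1L, y = 2)
csvfile <- tempfile(fileext = ".csv")
# base write.csv formats the datetime as "2022-12-12 08:00:00"
write.csv(df1, file = csvfile, row.names = FALSE)
grepl("2022-12-12 08:00:00", readLines(csvfile)[2], fixed = TRUE)
# [1] TRUE
unlink(csvfile)
```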

How can I convert character vector to numeric-time but keep same structure

I need to convert the "ride_length" column from a character vector to numeric, preferably keeping the same HHH:MM:SS format. If the only way to accomplish this is to convert to seconds or minutes, that is an acceptable alternative. Ultimately I need to be able to analyze this data in a meaningful way, which I cannot do while it is a character vector. I have tried strptime(), chron(), POSIXct(), and as.numeric(), all with no success. "ride_length" was created in Excel before being imported.
I found a workaround by creating a new "ride_length" column and then converting to numeric using:
q1_2021$ride_length <- difftime(q1_2021$ended_at, q1_2021$started_at)
q1_2021$ride_length <- as.numeric(as.character(q1_2021$ride_length))
But (if possible) I want to understand how to answer the original question using the Excel-created "ride_length" column.
Updating with dput(head()), which I hope provides reproducible data. I removed the unnecessary columns:
structure(list(
  started_at = c("2/6/2021 15:56", "2/5/2021 14:22", "2/6/2021 20:21",
                 "2/27/2021 21:07", "2/20/2021 23:23", "2/28/2021 17:50"),
  ended_at = c("2/27/2021 14:06", "2/26/2021 9:42", "2/13/2021 11:28",
               "3/5/2021 15:11", "2/25/2021 16:12", "3/5/2021 2:14"),
  ride_length = c("502:09:14", "499:20:38", "159:07:08", "138:04:00",
                  "112:49:54", "104:24:14")
), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
Since there is not a base R function that works for you, why not create your own?
The following function converts durations in the given format to seconds:
as_seconds <- function(durations) {
  vapply(strsplit(durations, ":"), function(x) {
    sum(c(rep(0, 3 - length(x)), as.numeric(x)) * c(3600, 60, 1))
  }, 1)
}
Now, since you don't have reproducible data (we can't copy-paste data from a screen shot), let's create a simple sample vector:
times <- c("332:21:46", "254:12:01", "1:22", "13:12:01")
So we can do:
as_seconds(times)
#> [1] 1196506 915121 82 47521
It's quite reasonable to just use the number of seconds for analysis: remember you can store these in a different column so you can still have the durations in character format for display. There are other things you can do with the seconds, for example convert them into durations using the lubridate package:
lubridate::seconds_to_period(as_seconds(times))
#> [1] "13d 20H 21M 46S" "10d 14H 12M 1S" "1M 22S" "13H 12M 1S"
If you only want to keep the character format in your data frame, you can just convert to seconds on demand. For example, we can use order along with our as_seconds function to put the durations in order:
times[order(as_seconds(times))]
#> [1] "1:22" "13:12:01" "254:12:01" "332:21:46"
Or reverse order:
times[order(-as_seconds(times))]
#> [1] "332:21:46" "254:12:01" "13:12:01" "1:22"
Created on 2022-02-16 by the reprex package (v2.0.1)
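For completeness, the difftime workaround mentioned in the question also applies directly to the dput sample once the Excel-style timestamps are parsed with an explicit format (a sketch using the question's column names and the first two rows):

```r
q1_2021 <- data.frame(
  started_at = c("2/6/2021 15:56", "2/5/2021 14:22"),
  ended_at   = c("2/27/2021 14:06", "2/26/2021 9:42")
)
# parse the "m/d/Y H:M" strings so difftime sees real datetimes
q1_2021$started_at <- as.POSIXct(q1_2021$started_at, format = "%m/%d/%Y %H:%M")
q1_2021$ended_at   <- as.POSIXct(q1_2021$ended_at,   format = "%m/%d/%Y %H:%M")
q1_2021$ride_secs  <- as.numeric(difftime(q1_2021$ended_at, q1_2021$started_at,
                                          units = "secs"))
q1_2021$ride_secs
# [1] 1807800 1797600
```

Note the seconds differ slightly from the HHH:MM:SS values in the sample because the Excel timestamps only carry minute precision.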

How to read quarterly data with R?

I'm trying to fit a Bayesian VAR, but I can't even get my data read in properly. I get the data from https://sdw.ecb.europa.eu/, but since a lot of it is quarterly, I have a problem merging my variables: I'm unable to convert, for example, "2020-Q1" from character to date with as.Date.
I used the sub function to get 2020-1, for example, and then tried as.Date(x, format="%Y-%q"), but it doesn't work, so I'm stuck.
textData <- "yearQuarter,Amount
2019-Q1,1000
2019-Q2,2000
2019-Q3,3000"
df <- read.csv(text=textData,header = TRUE,stringsAsFactors = FALSE)
as.Date(df$yearQuarter,format="%Y-%q")
...which produces:
> as.Date(df$yearQuarter,format="%Y-%q")
[1] NA NA NA
Thank you for your help !
library(lubridate)
d = yq("2020-Q1")
d
# [1] "2020-01-01"
year(d)
# [1] 2020
quarter(d)
# [1] 1
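If adding lubridate is not an option, the quarter label can also be mapped to the first month of its quarter with base string functions (a sketch; the helper name yq_to_date and the sub patterns are assumptions about the "YYYY-Qn" layout):

```r
yq_to_date <- function(x) {
  yr  <- as.integer(sub("-Q.*", "", x))   # "2020-Q1" -> 2020
  qtr <- as.integer(sub(".*-Q", "", x))   # "2020-Q1" -> 1
  # first day of the first month of the quarter
  as.Date(sprintf("%d-%02d-01", yr, (qtr - 1) * 3 + 1))
}
yq_to_date(c("2019-Q1", "2019-Q2", "2020-Q4"))
# [1] "2019-01-01" "2019-04-01" "2020-10-01"
```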

For Loop to Rename Column Names of Many Objects R

I am looking for a way to rename the columns of several objects with a for loop or other method in R. Ultimately, I want to be able to bind the rows of each Stock object into one large data frame, but cannot due to differing column names. Example below:
AAPL <-
Date AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted Stock pct_change
2020-05-14 304.51 309.79 301.53 309.54 39732300 309.54 AAPL 0.61
2020-05-15 300.35 307.90 300.21 307.71 41561200 307.71 AAPL -0.59
GOOG <-
Date GOOG.Open GOOG.High GOOG.Low GOOG.Close GOOG.Volume GOOG.Adjusted Stock pct_change
2020-05-14 1335.02 1357.420 1323.910 1356.13 1603100 1356.13 GOOG 0.50
2020-05-15 1350.00 1374.480 1339.000 1373.19 1705700 1373.19 GOOG 1.26
For this example I have 2 objects (AAPL and GOOG), but realistically I would be working with many more. Can I create a for loop that iterates over each object and renames the 2nd column of each to "Open", the 3rd column to "High", the 4th column to "Low", etc., so I can then bind all these objects together?
I already have a column named "Stock", so I do not need the Ticker part of the column name.
Using quantmod we can read a set of stock ticker symbols, clean their names & rbind() into a single data frame.
There are three key features illustrated within this answer, including:
Use of get() to access the objects written by quantmod::getSymbols() once they are loaded into memory.
Use of the symbol names passed into lapply() to add a symbol column to each data frame.
Conversion of the dates stored as row names in the xts objects written by getSymbols() to a data frame column.
First, we'll use getSymbols() to read data from yahoo.com.
library(quantmod)
from.dat <- as.Date("12/02/19",format="%m/%d/%y")
to.dat <- as.Date("12/06/19",format="%m/%d/%y")
theSymbols <- c("AAPL","AXP","BA","CAT","CSCO","CVX","XOM","GS","HD","IBM",
"INTC","JNJ","KO","JPM","MCD","MMM","MRK","MSFT","NKE","PFE","PG",
"TRV","UNH","UTX","VZ","V","WBA","WMT","DIS","DOW")
getSymbols(theSymbols,from=from.dat,to=to.dat,src="yahoo")
# since quantmod::getSymbols() writes named data frames, need to use
# get() with the symbol names to access each data frame
head(get(theSymbols[[1]]))
> head(get(theSymbols[[1]]))
AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
2019-12-02 267.27 268.25 263.45 264.16 23621800 262.8231
2019-12-03 258.31 259.53 256.29 259.45 28607600 258.1370
2019-12-04 261.07 263.31 260.68 261.74 16795400 260.4153
2019-12-05 263.79 265.89 262.73 265.58 18606100 264.2359
Having illustrated how to access the symbol objects in the global environment, we'll use lapply() to extract the dates from the row names, clean the column headings, and write the symbol name as a column for each symbol's data object.
# convert to list
symbolData <- lapply(theSymbols, function(x){
  y <- as.data.frame(get(x))
  colnames(y) <- c("open","high","low","close","volume","adjusted")
  y$date <- rownames(y)
  y$symbol <- x
  y
})
Finally, we convert the list of data frames to a single data frame.
#combine to single data frame
combinedData <- do.call(rbind,symbolData)
rownames(combinedData) <- 1:nrow(combinedData)
...and the output:
> nrow(combinedData)
[1] 120
> head(combinedData)
open high low close volume adjusted date symbol
1 267.27 268.25 263.45 264.16 23621800 262.8231 2019-12-02 AAPL
2 258.31 259.53 256.29 259.45 28607600 258.1370 2019-12-03 AAPL
3 261.07 263.31 260.68 261.74 16795400 260.4153 2019-12-04 AAPL
4 263.79 265.89 262.73 265.58 18606100 264.2359 2019-12-05 AAPL
5 120.31 120.36 117.07 117.26 5538200 116.2095 2019-12-02 AXP
6 116.04 116.75 114.65 116.57 3792300 115.5256 2019-12-03 AXP
If you can guarantee the order of these columns, a loop works, but note that iterating over list(AAPL, GOOG) only renames throwaway copies; to modify the original objects, fetch and reassign them by name with get() and assign():
for (nm in c("AAPL", "GOOG")) {
  df <- get(nm)
  colnames(df) <- c("Date", "Open", "High", "Low", "Close", "Volume", "Adjusted", "Stock", "pct_change")
  assign(nm, df)
}
With lapply, we can loop over the list and remove the prefix in the column names with sub. This can be done without any external packages
lst1 <- lapply(list(AAPL, GOOG), function(x) {
  colnames(x) <- sub(".*\\.", "", colnames(x))
  x
})
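Once the names agree, the objects stack into one data frame, which was the stated end goal. A self-contained sketch with toy stand-ins for the question's AAPL/GOOG objects (values are illustrative, not real quotes):

```r
# toy stand-ins for the ticker objects in the question
AAPL <- data.frame(Date = "2020-05-14", AAPL.Open = 304.51, Stock = "AAPL")
GOOG <- data.frame(Date = "2020-05-14", GOOG.Open = 1335.02, Stock = "GOOG")
lst1 <- lapply(list(AAPL, GOOG), function(x) {
  colnames(x) <- sub(".*\\.", "", colnames(x))  # drop the ticker prefix
  x
})
# row-bind now that every element shares the columns Date, Open, Stock
combined <- do.call(rbind, lst1)
```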

How do I change the index in a csv file to a proper time format?

I have a CSV file of 1000 daily prices
They are of this format:
1 1.6
2 2.5
3 0.2
4 ..
5 ..
6
7 ..
.
.
1700 1.3
The index is from 1:1700
But I need to specify a begin date and end date this way:
Start period is lets say, 25th january 2009
and the last 1700th value corresponds to 14th may 2013
So far I've gotten this close to solving the problem:
> dseries <- ts(dseries[,1], start = ??time??, freq = 30)
How do I go about this? thanks
UPDATE:
managed to create a separate object with dates as suggested in the answers and plotted it, but the y-axis is weird, as shown in the screenshot
Something like this?
as.Date("25-01-2009",format="%d-%m-%Y") + (seq(1:1700)-1)
A better way, thanks to @AnandaMahto:
seq(as.Date("2009-01-25"), by="1 day", length.out=1700)
Plotting:
df <- data.frame(
myDate=seq(as.Date("2009-01-25"), by="1 day", length.out=1700),
myPrice=runif(1700)
)
plot(df)
R stores Date-classed objects as the integer offset from "1970-01-01", but the as.Date.numeric method needs an offset ('origin'), which can be any starting date:
rDate <- as.Date.numeric(dseries[,1], origin="2009-01-24")
Testing:
> rDate <- as.Date.numeric(1:10, origin="2009-01-24")
> rDate
[1] "2009-01-25" "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29"
[6] "2009-01-30" "2009-01-31" "2009-02-01" "2009-02-02" "2009-02-03"
You didn't need to add the extension .numeric, since R would automatically dispatch to that method if you used the generic stem, as.Date, with a numeric argument. I just put it in because as.Date.numeric has different arguments than as.Date.character.
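The integer-offset storage mentioned above is easy to verify (a minimal sketch):

```r
# Date objects are days since 1970-01-01, so the day after the
# epoch has an underlying value of 1
as.numeric(as.Date("1970-01-02"))
# [1] 1
as.numeric(as.Date("2009-01-25"))  # days from the epoch to the start date
# [1] 14269
```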
