Converting character to POSIXct in R Markdown is Making Me Sad

I'm learning to use R as part of a certificate program, and while compiling a report from a practice project in R Markdown, getting my date-times back to POSIXct rather than character has been very difficult. I know write_csv() saves all columns in a data frame as plain text, and for things like numeric and Date columns I have corrected them using something like:
library(lubridate) # provides the is.Date() and is.POSIXct() helpers used below
is.factor(all_trips_v2_rmd$date) # check whether the variable is factor-type
all_trips_v2_rmd$date <- as.Date(as.character(all_trips_v2_rmd$date)) # convert the column to Date-type
is.Date(all_trips_v2_rmd$date) # confirm the column is now Date-type
which does the job just fine. However, when I run this:
is.factor(all_trips_v2_rmd$started_at)
all_trips_v2_rmd$started_at <- as.POSIXct(as.Date(as.character(all_trips_v2_rmd$started_at)), format="%Y/%m/%d/%H/%M/%S")
is.POSIXct(all_trips_v2_rmd$started_at)
the output of the first date-time value is "2021-04-11 19:00:00". The %Y/%m/%d part of the information is generally correct, but the %H/%M/%S part is exactly 19:00:00 for every row. Looking at the character-only version of the .csv file, the first date-time value is "2021-04-12T18:25:36Z". I've been looking everywhere for a fix, and none of the solutions seem to work.
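A likely explanation of the symptom, for anyone landing here: wrapping the value in as.Date() discards the time of day, so as.POSIXct() receives midnight UTC, and midnight UTC rendered in a UTC-5 local timezone (e.g. America/Chicago in April) displays as 19:00:00 on the previous day, which matches the output above. The format argument also never matches, because the stored strings are ISO 8601 ("2021-04-12T18:25:36Z"), not slash-separated. A minimal sketch of a direct parse, reusing the column name from the question:
# Parse the ISO 8601 strings directly; as.Date() must not be involved,
# since it truncates the time-of-day information.
all_trips_v2_rmd$started_at <- as.POSIXct(
  all_trips_v2_rmd$started_at,
  format = "%Y-%m-%dT%H:%M:%SZ",
  tz = "UTC"
)
# Or with lubridate, which recognises ISO 8601 without a format string:
# all_trips_v2_rmd$started_at <- ymd_hms(all_trips_v2_rmd$started_at)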

Related

Importing date from csv in R

I want to import an Excel file into R, and the file contains a column with date and time in this form:
20.08.2018 16:32:20
If I change to standard format in the csv file itself it looks like this:
43332,68912
If I read in the file using read_excel() in R, this date looks like this:
43332.689120370371
How can I turn the current format into a date format in R?
It is good practice not to edit anything in a .csv (or Excel) file, treating it as read-only, and to make changes in a script (so, in R).
Let's call your data frame "my_df" and your datetime variable "date".
library(readr)
library(magrittr)
my_df$date %<>% parse_datetime("%d.%m.%Y %H:%M:%S")
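The %<>% compound pipe from magrittr assigns the result back to its left-hand side. For readers who prefer plain assignment, the same line can be written without magrittr:
my_df$date <- readr::parse_datetime(my_df$date, "%d.%m.%Y %H:%M:%S")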
Edit: Trying to piece together information from your comments, I created an Excel file with one column called STARTED, with date and time in the form 20.08.2018 16:32:20 as you indicate in the question. Since you seem to like readxl:
library(readxl)
library(readr) # parse_datetime() comes from readr, not readxl
library(magrittr)
myData <- read_excel("myData.xlsx")
myData$STARTED %<>% parse_datetime("%d.%m.%Y %H:%M:%S")
Which is the same code I already wrote above. This gives:
# A tibble: 1 x 1
STARTED
<dttm>
1 2018-08-20 16:32:20
If you only get NA, your data is not in the format given by your example 20.08.2018 16:32:20.
Following your discussion with @prosoitos, it looks like the import function cannot make sense of your date column:
Your line of example data in the comments contains no quotes around the date string. That implies that you copied the data either after opening it in Excel (or similar), or that your survey tool does not qualify dates as strings. Did you open your .csv in Excel, save it as .xlsx, and try to import the result into R? That would explain the mess you get, as Excel could try to interpret the date strings and convert them to some funny Microsoft format nobody else uses.
Please don't do that; use a raw .csv file that was never touched by Excel and import it directly into R.
Your read function obviously does not understand the content of your date variable and apparently replaces it with some Unix standard time, i.e. seconds since 1970. However, it looks like those timestamps are invalid (43332 would be around noon on 1970-01-01); otherwise you could easily transform them to human-readable dates.
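As a quick check of that reading, treating 43332 as seconds since the Unix epoch (note that the answer to the last question below suggests such spreadsheet numbers are really day counts from an 1899-12-30 origin, not Unix seconds):
as.POSIXct(43332, origin = "1970-01-01", tz = "UTC")
# [1] "1970-01-01 12:02:12 UTC"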
I suggest you try importing your csv with:
read.csv("your_data.csv", header=TRUE, stringsAsFactors=FALSE)
You may have to specify your separator, e.g. sep = "\t" for a tab-separated file; read.csv() defaults to a comma, while read.table() defaults to whitespace. After that, the dates in your data frame are simple text strings and you can follow up with what @prosoitos said.
(Sorry for adding an additional answer. I would have commented on @prosoitos's answer, but I have insufficient reputation points.)
Read CSV into R:
MyData <- read.csv(file="TheDataIWantToReadIn.csv", header=TRUE, sep=",")

issues with formatting a column with date in r

I'm having an issue trying to format dates in R. I tried the following code:
rdate <- as.Date(dusted$time2, "%d/%m/%y")
and also the recommendations in this Stack Overflow question, Changing date format in R, but still couldn't get it to work.
geov<-dusted
geov$newdate <- strptime(as.character(geov$time2), "%d/%m/%Y")
All I'm getting is NA for the whole date column. These are daily values; I would love it if R could read them. Data available here: https://www.dropbox.com/s/awstha04muoz66y/dusted.txt?dl=0
To convert to a date, as long as you have successfully imported the data into a data frame such as dusted or geov, and time2 holds dates as strings resembling 10-27-06, try:
geov$time2 = as.Date(geov$time2, "%m-%d-%y")
The equals sign = is used just to save on typing; it is equivalent to <-, so you can still use <- if you prefer.
This stores the converted dates right back into geov$time2, overwriting it, instead of creating a new variable geov$newdate as in your original question; a new variable is not required for the conversion. But if for some reason you really need one, feel free to use geov$newdate.
Similarly, you didn't need to copy dusted to a new geov data frame just to convert. It does save time for testing purposes, though: if the conversion doesn't work, you can restart by copying dusted to geov again instead of having to re-import the data from the file.
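The all-NA column in the question is the usual symptom of a format string that does not match the text. A minimal illustration using the sample value from above:
as.Date("10-27-06", "%d/%m/%Y") # NA: wrong separator and field order
as.Date("10-27-06", "%m-%d-%y") # "2006-10-27"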
Additional resources
help(strptime) lists the date format codes such as %y. On Linux, man date also documents the format codes.

Why am I getting different output from the Alteryx R tool

I'm using the Alteryx R Tool to sign an Amazon HTTP request. To do so, I need the hmac() function included in the digest package.
I'm using a text input tool that includes the key and a datestamp.
Key= "foo"
datastamp= "20120215"
Here's the issue. When I run the following script:
the.data <- read.Alteryx("1", mode="data.frame")
write.Alteryx(base64encode(hmac(the.data$key,the.data$datestamp,algo="sha256",raw = TRUE)),1)
I get an incorrect result when compared to when I run the following:
write.Alteryx(base64encode(hmac("foo","20120215",algo="sha256",raw = TRUE)),1)
The difference is that when I hardcode the values for the key and datestamp I get the correct result, but if I use the variables from the R data frame I get incorrect output.
Does the data frame alter the data in some way? Has anyone come across this when working with the R Tool in Alteryx?
Thanks for your input.
The issue appears to be that when creating the data frame, your character variables are converted to factors. The way to fix this with the data.frame constructor function is
the.data <- data.frame(Key="foo", datestamp="20120215", stringsAsFactors=FALSE)
I haven't used read.Alteryx but I assume it has a similar way of achieving this.
Alternatively, if your data frame has already been created, you can convert the factors back into character:
write.Alteryx(base64encode(hmac(
as.character(the.data$Key),
as.character(the.data$datestamp),
algo="sha256",raw = TRUE)),1)

R convert imported Excel numeric back from R factors to numeric

I'm trying to read an Excel created .csv file into R. I've tried numerous suggestions but none have completely panned out for me.
Here's how the data looks in the .csv file, with the first row being the header:
recipe_type,State,Successes,Attempts
paper,alabama ,586,3379
Here are my R commands to import the .csv file:
options( StringsAsFactors=F )
results<-read.csv("recipe results.csv", header=TRUE, as.is=T)
results$Successes
[1] "586"
And Successes is being treated as character data.
And I've also tried this approach:
results[,3] <- as.numeric(levels(results$Successes)), but I get the rank of each value in this column rather than the actual value, which another post said would happen.
Any ideas on how to get this data treated as numeric so I can get proper stat.desc stats for it?
Thanks
Direct conversion of a factor to numeric yields the internal level codes, which have nothing to do with the values themselves. You need to convert to character first:
results[,3] <- as.numeric(as.character(results$Successes))
Equivalently (see ?factor), you can convert the levels to numeric, and index by the (implicit) numeric conversion of the factor.
as.numeric(levels(results$Successes))[results$Successes]
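A small demonstration of the difference, using made-up values like those in the question:
f <- factor(c("586", "3379"))
as.numeric(f)               # 2 1 -- level codes (levels sort as strings)
as.numeric(as.character(f)) # 586 3379
as.numeric(levels(f))[f]    # 586 3379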
I realise this is an old question, but I came across it today while having a similar issue.
I eventually found that (in my case) the problem arose because Excel's 'Number' format includes a comma in its values, so: 1,000 instead of 1000. Once I removed the commas I was able to convert from factors without NA values.
df$col1 <-as.numeric(gsub(",","",df$col1))
Just in case anyone comes across something similar.
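As a quick check of the comma-stripping approach above (gsub() coerces a factor to character on its own, so it works directly on the column):
x <- factor(c("1,000", "2,500"))
as.numeric(gsub(",", "", x)) # 1000 2500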
I found the gdata package to be most helpful; it worked without any issues aside from a warning.
This URL contains info on the package: http://www.r-tutor.com/r-introduction/data-frame/data-import
I did convert my spreadsheet from .xlsx to .xls, which gdata seemed to expect. I didn't test whether an .xlsx would work.

Dates when exporting to CSV and reading into R

I'm using the R[1] package RGoogleDocs[2] to connect to my Google Docs, obtain a list of spreadsheets, and import a specific worksheet from a specified spreadsheet. I can achieve this with no problem following the example given at https://github.com/hammer/google-spreadsheets-to-r-dataframe
The problem I have encountered is with date columns. Under Google Docs I've selected to format these as YYYY-MM-DD and they display fine within Google Docs.
However, the exported CSV which gets imported into R has these as numeric fields; for example:
Displayed in Google Docs > As imported to R
2013-02-15 > 41320
2013-02-19 > 41324
2013-02-26 > 41331
2013-03-22 > 41355
This isn't necessarily a problem, as these appear to be elapsed dates, but I don't know the origin from which they are being counted. Once I know the origin, I can convert them internally in R, since as.Date() allows the origin to be specified (as.Date(date, origin=...)).
To try and get round this I set the formatting of the date columns to plain text, but despite typing the dates with leading zeros for days/months < 10, they are exported without them, so the as.Date() function complains about them being in a non-standard format.
I therefore have two options/questions...
1) What is the origin that Google Docs uses internally for representing dates? (I've searched for this through the Google Help but can't find it, and wider web-searches have been fruitless)
2) Is there a method to export dates as strings to CSV? (I have tried this, but when they're set to "plain text" in Google Docs, the leading zeros typed when entering the dates are not present in the export, meaning that R complains about the dates being in a non-standard format*.)
Thanks in advance for your time,
slackline
[1] http://www.r-project.org/
[2] http://www.omegahat.org/RGoogleDocs/
* I could write a function to pull out the day/month/year as individual elements and derive this myself, but figured there is a more direct method.
Concerning your question number 1): Apparently, Google Docs uses 1899-12-30 as the date origin:
as.Date(41320, origin="1899-12-30")
# [1] "2013-02-15"
# etc
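This is the same serial-date system Excel uses: whole days since 1899-12-30, with the time of day as a fractional day. Under that assumption, it also explains the 43332.68912 value in the Excel question above; a sketch:
serial <- 43332.689120370371
as.POSIXct(serial * 86400, origin = "1899-12-30", tz = "UTC")
# [1] "2018-08-20 16:32:20 UTC"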
