I want to import an Excel file into R. The file contains a column with date and time in this form:
20.08.2018 16:32:20
If I change it to the standard format in the csv file itself, it looks like this:
43332,68912
If I read the file in with read_excel(), the date looks like this in R:
43332.689120370371
How can I turn the current format into a date format in R?
It is good practice not to edit anything in a .csv (or Excel) file itself, i.e. to treat it as read-only, and to make changes in a script instead (so in R).
Let's call your data frame "my_df" and your datetime variable "date".
library(readr)
library(magrittr)
my_df$date %<>% parse_datetime("%d.%m.%Y %H:%M:%S")
Edit: Trying to piece together information from your comments, I created an Excel file with one column called STARTED, containing date and time in the form 20.08.2018 16:32:20 as you indicate in the question. Since you seem to like readxl:
library(readxl)
library(readr)
library(magrittr)
myData <- read_excel("myData.xlsx")
myData$STARTED %<>% parse_datetime("%d.%m.%Y %H:%M:%S")
Which is the same code I already wrote above. This gives:
# A tibble: 1 x 1
STARTED
<dttm>
1 2018-08-20 16:32:20
If you only get NA, your data is not in the format given by your example 20.08.2018 16:32:20.
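One quick check is to run the parser on a single literal value; readr's parse_datetime() returns NA (with a parsing-failure warning) whenever the format string does not match the data:

```r
library(readr)

# matches the stated format, parses cleanly
parse_datetime("20.08.2018 16:32:20", "%d.%m.%Y %H:%M:%S")
#> [1] "2018-08-20 16:32:20 UTC"

# wrong separators in the format string: returns NA with a warning
parse_datetime("20.08.2018 16:32:20", "%d/%m/%Y %H:%M:%S")
#> [1] NA
```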
Following your discussion with @prosoitos, it looks like the import function cannot make sense of your date column:
Your line of example data in the comments contains no quotes around your date string. That implies that you either copied that data after opening it with Excel (or similar), or your survey tool does not quote dates as strings. Did you open your .csv in Excel, save it as .xlsx and then try to import the result into R? That would explain the mess you get, as Excel may try to interpret the date strings and convert them to a funny Microsoft format nobody else uses.
Please don't do that; use a raw csv file that was never touched by Excel and import it directly into R.
Your read function evidently could not make sense of your date variable, because by the time it saw the data, Excel had already replaced the date strings with its serial datetime numbers. Note that these are not Unix timestamps (seconds since 1970): 43332.68912 counts days since 1899-12-30 (the origin used by Excel on Windows), with the fractional part as the time of day, which works out to 2018-08-20 16:32:20. Such values can be converted back to human-readable dates.
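The conversion can be done in base R; 25569 is the day offset between the Windows Excel origin (1899-12-30) and the Unix epoch (1970-01-01):

```r
x <- 43332.689120370371  # Excel serial datetime: days since 1899-12-30

# shift to the Unix epoch, scale days to seconds;
# round() guards against floating-point error in the day fraction
as.POSIXct(round((x - 25569) * 86400), origin = "1970-01-01", tz = "UTC")
#> [1] "2018-08-20 16:32:20 UTC"

# the date part alone, via as.Date's numeric method and the Excel origin
as.Date(floor(x), origin = "1899-12-30")
#> [1] "2018-08-20"
```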
I suggest you try importing your csv with:
read.csv("your_data.csv", header=TRUE, stringsAsFactors=FALSE)
You may have to specify your separator, e.g. sep = "\t" for a tab-separated file. Note that read.csv defaults to sep = ",", while read.table defaults to whitespace. After that, the dates in your data frame are plain text strings and you can follow up with what @prosoitos said.
(Sorry for adding an additional answer. I would have commented on @prosoitos's answer, but I have insufficient reputation points.)
Read the CSV into R:
MyData <- read.csv(file = "TheDataIWantToReadIn.csv", header = TRUE, sep = ",")
Related
I am new to R and I am facing this problem:
I have a large dataset (both csv file and Rdata file) that contains some date and time columns.
[screenshot: section of the dataset]
I need to do some calculations and some data visualization with it, but problems arise with the convertedTime column. I need to visualize it as minutes:seconds (with one decimal), just as I see it in the csv file when I open it with Excel. I need to work with the same format as shown in the Excel file.
[screenshot: Excel sample of timeConverted]
When I load the data in R (I have tried both formats), the convertedTime values appear in their full format. How can I convert them into %M:%OS1?
keyData <- read.csv('keyPressDataWithLaneDeviation.csv')
print(head(keyData))
library(dplyr)
keyDataNoError <- filter(keyData, typingErrorMadeOnTrial == 0)
print(head(keyDataNoError))
strptime(keyDataNoError$timeConverted, format = "%M:%0S1")
print(head(keyDataNoError))
After I filter the dataset, I try to format the time, without results. The outputs of the last two prints are identical. Where am I wrong?
Another thing I tried was loading the Rdata file instead. But with the Rdata file I don't even get decimals in the convertedTime column, and I really don't understand why.
[link: Rdata file]
[link: csv file]
You're looking for strftime, not strptime.
strftime(keyDataNoError$timeConverted, format = "%M:%OS1")
e.g.:
a<-"2018-02-24 11:30:05.105"
strftime(a, format="%M:%OS1")
[1] "30:05.1"
strftime(a, format="%M:%OS3")
[1] "30:05.105"
strftime(a, format="%M:%OS5")
[1] "30:05.10500"
Note that strftime outputs a character class object, not a POSIXt class.
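If the times still need to be computed on (not just displayed), one approach, sketched here in base R, is to keep the column as POSIXct and control only how fractional seconds print:

```r
t <- as.POSIXct("2018-02-24 11:30:05.105", tz = "UTC")

# POSIXct hides sub-seconds when printing unless digits.secs is set
options(digits.secs = 1)
print(t)

# a minutes:seconds display string; the result is character, not POSIXt
strftime(t, format = "%M:%OS1")
#> [1] "30:05.1"
```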
I am using the openxlsx package to write a file back out. I have already used the as.Date and format functions to make my dates look like dd-mmm-yyyy.
However, when I open the Excel file, even though a date appears as, say, 12-may-2018, I cannot filter it like an Excel date. Excel shows the cell type as General.
Even if I convert it to the date format in Excel, it still doesn't let me filter by year, month, and day, which works for real Excel dates. I can convert a cell to the date type by manually placing my cursor in the cell and pressing the return key.
Doing that for the whole dataset would be far too much manual effort, which I want to avoid. Is there any way to make it happen? Thanks for any suggestions.
Here is my code:
data$datecolumn <- as.Date(as.numeric(data$datecolumn), origin = origin - somenumberforcalibration, format = "%d")
data$datecolumn <- format(data$datecolumn, format = "%d-%b-%Y")
write.xlsx(data, filename)
Here, datecolumn is read in Excel's numeric format.
I just saw a code snippet where a date read from a CSV as a string was converted to POSIXct and then written back out to CSV, and that file is read as dates by Excel. I haven't found anything for xlsx yet.
The format function converts the date back to a string, and that was causing the whole issue; remove the format call and things work fine. @Tjebo and @Roman Lustrik helped me with this.
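In other words, leave the column as class Date (or POSIXct) and let openxlsx write it as a true Excel date. A minimal sketch, assuming openxlsx is installed; the data frame and serial numbers below are hypothetical stand-ins for the question's data:

```r
library(openxlsx)

# hypothetical data frame; datecolumn holds Excel serial date numbers
data <- data.frame(datecolumn = c(43232, 43332))

# Excel serial numbers (Windows origin) -> Date class
data$datecolumn <- as.Date(data$datecolumn, origin = "1899-12-30")

# optional: control how Excel displays the dates
options("openxlsx.dateFormat" = "dd-mmm-yyyy")

# no format() step: a Date-class column is written as a real Excel date,
# so Excel can filter it by year, month, and day
write.xlsx(data, file.path(tempdir(), "filename.xlsx"))
```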
When I try to read in an Excel file, it always messes with the date/time.
library(openxlsx)
download.file("http://ec.europa.eu/economy_finance/db_indicators/surveys/documents/series/nace2_ecfin_1801/services_subsectors_sa_nace2.zip", destfile="services_subsectors_sa_nace2.zip")
unzip("services_subsectors_sa_nace2.zip")
bcs<-read.xlsx("services_subsectors_sa_m_nace2.xlsx", colNames=TRUE, sheet="73")
Column 1 (no name given in the original dataset) is the date/time column. By default this column is given the name 73 when it enters R.
I tried
as.POSIXct(bcs$`73`, format = "%d/%m/%Y", tz = "CET")
Any help is much appreciated. Thank you :)
You can use the janitor package, especially the function excel_numeric_to_date.
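A short sketch, assuming janitor is installed; it maps Excel serial numbers straight to Date (or POSIXct) values:

```r
library(janitor)

excel_numeric_to_date(43332)
#> [1] "2018-08-20"

# keep the time of day as well (returns POSIXct)
excel_numeric_to_date(43332.68912, include_time = TRUE)
```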
Another option is to use the readxl package to read your Excel file, which automatically converts date columns to datetime:
library(readxl)
read_excel("services_subsectors_sa_m_nace2.xlsx", sheet="73")
My CSV looks as follows
But when I read it into R, the format of Date and Time changes (the most important issue).
Here is my simple code that I used to read the csv
library(readr)
dat1<- read_csv("2010.csv")
How can I make it so that the format for date and time doesn't change and looks like the first picture?
Try:
dat1 <- read.table("2010.csv", sep = ",", header = TRUE)
If this doesn't work, then:
First check exactly what's in your .csv by opening it with a text or code editor instead of Excel or something like that. The problem, I think, may be that an app like MS Excel does some automatic format conversion in preview mode, so what you see may not be what's really in your file.
For example: 2015-1-1 in the .csv, when opened with Excel, becomes 2015/1/1.
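One way to see the raw content, with no spreadsheet reinterpretation, is to read the first lines as plain text; a self-contained sketch using a throwaway file:

```r
# write a tiny csv so the example is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(c("Date,Time", "2015-1-1,16:32:20"), tmp)

# readLines shows the characters as they are on disk, no reformatting
readLines(tmp, n = 2)
```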
I'm having an issue trying to format dates in R. I tried the following code:
rdate <- as.Date(dusted$time2, "%d/%m/%y") and also the recommendations in this Stack Overflow question, Changing date format in R, but still couldn't get it to work.
geov<-dusted
geov$newdate <- strptime(as.character(geov$time2), "%d/%m/%Y")
All I'm getting is NA for the whole date column. These are daily values; I would love it if R could read them. Data available here: https://www.dropbox.com/s/awstha04muoz66y/dusted.txt?dl=0
To convert to dates, as long as you have successfully imported the data into a data frame such as dusted or geov, and time2 holds dates as strings resembling 10-27-06, try:
geov$time2 = as.Date(geov$time2, "%m-%d-%y")
The equal sign = is used just to save typing. It is equivalent to <-, so you can still use <- if you prefer.
This stores the converted dates right back into geov$time2, overwriting it, instead of creating a new variable geov$newdate as in your original question. A new variable is not required for the conversion; but if for some reason you really need one, feel free to use geov$newdate.
Similarly, you also didn't need to copy dusted to a new geov data frame just to convert. It does save time for testing purposes, though: if the conversion doesn't work, you can restart by copying dusted to geov again instead of re-importing the data from the file into dusted.
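The conversion can be checked on a single value before touching the whole column; with strings like 10-27-06, the format %m-%d-%y treats the last field as a two-digit year:

```r
as.Date("10-27-06", "%m-%d-%y")
#> [1] "2006-10-27"

# vectors convert the same way; NA flags values that do not match the format
as.Date(c("10-27-06", "27/10/2006"), "%m-%d-%y")
#> [1] "2006-10-27" NA
```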
Additional resources
help(strptime) is useful for looking up date format codes such as %y. On Linux, man date can also reveal the date codes.