R - Importing xlsx file with date column

When I try to read in an Excel file, the date/time column always gets mangled.
library(openxlsx)
download.file("http://ec.europa.eu/economy_finance/db_indicators/surveys/documents/series/nace2_ecfin_1801/services_subsectors_sa_nace2.zip", destfile="services_subsectors_sa_nace2.zip")
unzip("services_subsectors_sa_nace2.zip")
bcs<-read.xlsx("services_subsectors_sa_m_nace2.xlsx", colNames=TRUE, sheet="73")
Column 1 (which has no name in the original dataset) is the date/time column. By default, this column is given the name 73 when it is read into R.
I tried
as.POSIXct(bcs$`73`, format="%d/%m/%Y", tz="CET")
Any help is much appreciated. Thank you :)

You can use the janitor package, in particular its function excel_numeric_to_date.
Another option is to read your Excel file with the readxl package, which automatically converts date columns to datetime:
library(readxl)
read_excel("services_subsectors_sa_m_nace2.xlsx", sheet="73")
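If you prefer to stay with openxlsx, the column most likely arrives as Excel serial day numbers rather than text, which would explain why a format string such as "%d/%m/%Y" does not match. A minimal base R sketch, assuming the column holds such serial numbers (the values below are made up for illustration and are not taken from the actual file):

```r
# Hypothetical serial day numbers standing in for the first column of the sheet
serial <- c(43101, 43132, 43160)

# Excel on Windows counts days from 1899-12-30
dates <- as.Date(serial, origin = "1899-12-30")

print(format(dates))
```

The same origin is what janitor's excel_numeric_to_date is built around, so both routes should agree for ordinary dates.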

Related

Importing date from csv in R

I want to import an Excel file into R, and the file contains a column with date and time in this form:
20.08.2018 16:32:20
If I change to standard format in the csv file itself it looks like this:
43332,68912
If I read in the file using read_excel() in R, this date looks like this:
43332.689120370371
How can I turn the current format into a date format in R?
It is good practice not to edit anything in a .csv (or Excel) file, that is, to treat such files as read-only, and to make changes in a script (so in R).
Let's call your data frame "my_df" and your datetime variable "date".
library(readr)
library(magrittr)
my_df$date %<>% parse_datetime("%d.%m.%Y %H:%M:%S")
Edit: Trying to piece together information from your comments, I created an Excel file with one column called STARTED with date and time in the form 20.08.2018 16:32:20, as you indicate in the question. Since you seem to like readxl:
library(readxl)
library(readr)     # provides parse_datetime()
library(magrittr)  # provides the %<>% assignment pipe
myData <- read_excel("myData.xlsx")
myData$STARTED %<>% parse_datetime("%d.%m.%Y %H:%M:%S")
Which is the same code I already wrote above. This gives:
# A tibble: 1 x 1
STARTED
<dttm>
1 2018-08-20 16:32:20
If you only get NA, your data is not in the format given by your example 20.08.2018 16:32:20.
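If the column instead arrives as a number such as 43332.689120370371 (the value quoted in the question), it can be converted directly. This is only a sketch, assuming the number is an Excel serial date-time (days since 1899-12-30, with the time of day as the fractional part). Excel stores no time zone, so UTC is used here purely to avoid any shift:

```r
serial <- 43332.689120370371

# 25569 is the serial number of 1970-01-01; 86400 is the number of seconds per day.
# Rounding guards against floating-point error in the fractional day.
secs <- round((serial - 25569) * 86400)

stamp <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
formatted <- format(stamp, "%Y-%m-%d %H:%M:%S")
print(formatted)
```

This reproduces the 20.08.2018 16:32:20 shown in the question.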
Following your discussion with @prosoitos, it looks like the import function cannot make sense of your date column:
Your line of example data in the comments contains no quotes around your date string. That implies that you either copied that data by opening it with Excel (or similar), or your survey tool does not qualify dates as strings. Did you open your .csv in Excel, save it as .xlsx and try to import the result into R? That would explain the mess you get, as Excel could try to interpret the date strings and convert them to some funny Microsoft format nobody else uses.
Please don't do that; use a raw csv file that was never touched by Excel and import it directly into R.
Your read function obviously does not understand the content of your date variable. The number it returns is not Unix time (seconds since 1970): read that way, 43332 would be about noon on 1970/01/01. It looks like an Excel serial date-time instead, which counts days since 1899-12-30 and stores the time of day as the fractional part.
I suggest you try importing your csv with:
read.csv("your_data.csv", header=TRUE, stringsAsFactors=FALSE)
You may have to specify your separator, e.g. sep = "\t" (for a tab-separated file), if it is not a comma, which is the default separator of read.csv. After that, the dates in your data frame are simple text strings and you can follow up with what @prosoitos said.
(Sorry for adding an additional answer. I would have commented on @prosoitos's answer, but I have insufficient reputation points.)
# Read CSV into R
MyData <- read.csv(file="TheDataIWantToReadIn.csv", header=TRUE, sep=",")

Date and Time reduction in R

I am new with R and I am facing this problem:
I have a large dataset (both csv file and Rdata file) that contains some date and time columns.
section of the dataset
I need to do some calculations and some data visualization with it, but problems arise with the convertedTime column. I need to display it as "minutes:seconds" (with one decimal), which is how I see the values in the csv file when I open it with Excel. I need to work with the same format as shown in the Excel file.
Excel sample of timeConverted.
When I load the data in R (I have tried both formats), the convertedTime values are shown in their full format. How can I convert them to %M:%OS1?
keyData <- read.csv('keyPressDataWithLaneDeviation.csv')
print(head(keyData))
library(dplyr)
keyDataNoError <- filter(keyData, typingErrorMadeOnTrial ==0)
print(head(keyDataNoError))
strptime(keyDataNoError$timeConverted, format = "%M:%OS1")
print(head(keyDataNoError))
After I filter the dataset, I try to format the time, without results. The outputs of the last two prints are identical. Where am I going wrong?
Another thing that I tried is loading the Rdata file instead. But with the Rdata file I don't even get decimals in the convertedTime column, and I really do not understand why.
Rdata file
csv file
You're looking for strftime, not strptime.
strftime(keyDataNoError$timeConverted, format = "%M:%OS1")
e.g.:
a<-"2018-02-24 11:30:05.105"
strftime(a, format="%M:%OS1")
[1] "30:05.1"
strftime(a, format="%M:%OS3")
[1] "30:05.105"
strftime(a, format="%M:%OS5")
[1] "30:05.10500"
Note that strftime outputs a character class object, not a POSIXt class.
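The two prints in the question are identical for a second reason: the formatted result is never assigned back to the data frame. A small sketch of how that could look, using made-up timestamps in place of the real timeConverted column (the column name timeShort is hypothetical):

```r
# Hypothetical stand-in for the real data; only the column name comes from the question
keyData <- data.frame(
  timeConverted = c("2018-02-24 11:30:05.105", "2018-02-24 11:31:12.120"),
  stringsAsFactors = FALSE
)

# Parse the text first so that the fractional seconds are kept
stamps <- as.POSIXct(keyData$timeConverted, tz = "UTC")

# Store the formatted text in a new column; the result is character, not a time class
keyData$timeShort <- format(stamps, "%M:%OS1")
print(keyData)
```

Keeping the original column next to the formatted one leaves the full timestamps available for calculations, while the character column is used only for display.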

as.Date fails in R: 'character string is not in a standard unambiguous'

I am working with data in R that I imported from Excel format.
I have a column (isas1_3b) in the dataframe (LPAv1.1.1) which is in the character format. Upon importing, the dates have changed from the format dd/mm/yy, to days (e.g., 41268).
I have tried to convert this as below:
as.Date(LPAv1.1.1$isas1_3b, origin = "1899-12-30")
However, I get the following error:
Error in charToDate(x) : character string is not in a standard unambiguous format
4. stop("character string is not in a standard unambiguous format")
3. charToDate(x)
2. as.Date.character(LPAv1.1.1$isas1_3b, origin = "1899-12-30")
1. as.Date(LPAv1.1.1$isas1_3b, origin = "1899-12-30")
I'm not sure what I am doing wrong. After numerous searches, the above conversion is what was recommended.
I should also add that there are two other date columns in the original Excel document, but they have both been read in as 'POSIXct' 'POSIXt'.
Other info that may be relevant:
macOS 13.13.3 R 3.3.3 RStudio 1.1.419
Can someone please help resolve this issue... I am assuming it is something that I am doing. Please let me know if you need any more info.
As thelatemail rightly pointed out, the column with the days information must be in numeric format.
d <- 41268
as.Date(d, origin = "1899-12-30")
#[1] "2012-12-25"
On your dataset, this will fix it:
library(dplyr)
mutate(LPAv1.1.1, isas1_3b = as.Date(as.numeric(isas1_3b),
origin = "1899-12-30"))
The variable class was not consistent within the column/vector. There was a mix of dates, strings, and four digit numbers. Once I corrected these, it worked as expected. Thank you all for your help.
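When a column mixes serial numbers with date strings, as described above, one way to handle both is to convert each kind separately and then combine the results. This is only a sketch on made-up values, assuming the text dates use the dd/mm/yyyy pattern; anything that matches neither form stays NA so it can be inspected by hand:

```r
# Hypothetical mixed column: a serial number, a text date, and junk
raw <- c("41268", "25/12/2012", "not a date")

# Entries that look like numbers are treated as Excel serial day numbers
serial <- suppressWarnings(as.numeric(raw))
from_serial <- as.Date(serial, origin = "1899-12-30")

# Entries that are text dates are parsed with an explicit format
from_text <- as.Date(raw, format = "%d/%m/%Y")

# Prefer the serial interpretation and fall back to the text one
missing <- is.na(from_serial)
parsed <- from_serial
parsed[missing] <- from_text[missing]
print(parsed)
```

Note that a bare four-digit number such as a year would be read as a serial number by this sketch, so entries of that kind still need to be corrected at the source.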
How did you import it from Excel? Is it possible to adjust the import method so that it imports the column the way you are expecting?
Importing the data from Excel with the package "xlsx" gives you two options: read.xlsx, which will detect/guess the class type based on the data in that column, or read.xlsx2, where you set the class types manually with the colClasses option. (More info here: http://www.sthda.com/english/wiki/r-xlsx-package-a-quick-start-guide-to-manipulate-excel-files-in-r)
Another useful option is XLConnect.
A possible downside of both of these packages is they rely on Java to do the work of importing, so you must have Java installed for these to work.
I solved the issue by changing cell format for the dates in Excel to Date Type "2014-03-24" and saving the file as CSV (MS-DOS) (*csv) before loading into R.

R: Writing data frame into excel with large number of rows

I have a data frame (panel form) in R with 194498 rows and 7 columns. I want to write it to an Excel file (.xlsx) using res <- write.xlsx(df, output), but R hangs (it keeps showing the stop sign at the top left of the console) without making any change to the target file (output). Finally it shows the following:
Error in .jcheck(silent = FALSE) :
Java Exception <no description because toString() failed>.jcall(row[[ir]], "Lorg/apache/poi/ss/usermodel/Cell;", "createCell", as.integer(colIndex[ic] - 1))<S4 object of class "jobjRef">
I have loaded readxl and xlsx packages. Please suggest to fix it. Thanks.
Install and load package named 'WriteXLS' and try writing out your R object using function WriteXLS(). Make sure your R object is written in quotes like the one below "data".
# Store your data with 194498 rows and 7 columns in a data frame named 'data'
# Install package named WriteXLS
install.packages("WriteXLS")
# Loading package
library(WriteXLS)
# Writing out R object 'data' in an Excel file created namely data.xlsx
WriteXLS("data", ExcelFileName="data.xlsx", row.names=FALSE, col.names=TRUE)
Hope this helped.
This does not answer your question, but might be a solution to your problem.
You could save the file as a CSV instead, like so:
write.csv(df , "df.csv")
Then open the CSV and save it as an Excel file.
I gave up on trying to import/export Excel files with R because of hassles like this.
In addition to Pete's answer: I wouldn't recommend write.csv because it can take minutes to run. I used fwrite() (from the data.table library) and it did the same thing in about 1-2 seconds.
The post author asked about large files. I dealt with a table about 2.3 million rows long, and write.data (and fwrite) weren't able to write more than about 1 million rows. It just cuts the data away. So instead use write.table(Data, file="Data.txt"). You can open it in Excel and split the one column by your delimiter (use the sep argument) and voila!
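For the CSV route suggested above, a quick check that every row actually reached the file can be reassuring. A minimal base R sketch on a made-up data frame with the same number of rows as in the question (the two columns are hypothetical; the real data has seven):

```r
# Hypothetical data with the row count from the question
n <- 194498
df <- data.frame(id = seq_len(n), value = seq_len(n) * 0.5)

# Write to a temporary file; row.names = FALSE avoids an extra unnamed first column
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# Read the file back and count the rows that were written
rows_written <- nrow(read.csv(path))
print(rows_written)
```

If the count matches in R but rows go missing after opening the file in Excel, the spreadsheet's own row limit is a more likely cause than the write function.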

read.xlsx function is reading one of my variables as "factor" instead of "POSIXct"

I have an xlsx file with a number of variables (columns). Quite a few are listed in date format (MM/DD YYYY HH:MM A/P) in the .xlsx file. When I load this file into R using read.xlsx, all of the variables with date format load as POSIXct except ONE, which always loads as a factor variable. Any thoughts on why this may be?
For reference I am loading the data using code similar to that below:
data <- read.xlsx("file.xlsx", sheetIndex = 1, header = TRUE)
Well, I figured it out! Turns out one of the entries for this variable (of the hundreds) was entered slightly incorrectly in the xlsx file (it was listed as 15:00 PM, an impossible time!), which threw off the xlsx package I suppose. Once fixed, the data pull results in a column with POSIXct entries.
Hope this helps anyone else in the future encountering a similar problem!
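To locate such an entry without scanning the spreadsheet by eye, one option is to parse the factor as text with an explicit format and look at what fails. This is a sketch on made-up values; the format string is an assumption about how the cells are written, and %p relies on an English or C locale:

```r
# Hypothetical column as it might arrive in R: a factor with one impossible time
raw <- factor(c("03/24/2014 03:00 PM", "03/25/2014 15:00 PM", "03/26/2014 09:30 AM"))

# %I is the 12-hour clock, so an hour of 15 cannot be parsed and becomes NA
parsed <- as.POSIXct(as.character(raw), format = "%m/%d/%Y %I:%M %p", tz = "UTC")

# Positions and original text of the entries that failed to parse
bad_rows <- which(is.na(parsed))
print(as.character(raw)[bad_rows])
```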
