Read Excel data as is using readxl in R

I have to read an Excel file in R. The file has a column with values such as 50%, 20%, ... and another column with dates in the format "12-December-2017", but R converts the data in both columns.
I am using the readxl package, and I specified in the col_types parameter that all columns should be read as text. When I check the data frame, every column is of type character, but the percentages have changed to decimals and the dates to numbers.
excelfile2<-read_excel(filePath,col_types=rep("text",8))
I want to read the Excel file as is. Any help will be appreciated.

This is because what Excel displays is not what is actually stored in the file.
For example, if Excel displays "12-December-2017", what is stored is a day count (43081 for that date), which R converts using the origin 1899-12-30. Likewise, a cell shown as 50% is stored as 0.5.
My suggestion is to open the Excel file with a text reader so you can see what you are really reading into R.
Then you can either define everything as text in Excel, or apply some transformations in R to convert the day counts into Date or POSIXct values.
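A minimal sketch of the R-side transformation, assuming the workbook uses Excel's default (Windows) date system, for which the matching R origin is 1899-12-30. The vectors raw_dates and raw_pct are hypothetical stand-ins for what read_excel returns when col_types is "text".

```r
# Hypothetical values as they arrive from read_excel(..., col_types = "text")
raw_dates <- c("43081", "43082")  # Excel day counts stored as text
raw_pct   <- c("0.5", "0.2")      # percentages stored as fractions

# Day counts -> Date (assumes the default 1900 date system)
dates <- as.Date(as.numeric(raw_dates), origin = "1899-12-30")

# Fractions -> percentage labels like the ones Excel displays
pct <- paste0(round(as.numeric(raw_pct) * 100, 2), "%")

print(dates)
print(pct)
```

To get the original display style back, format(dates, "%d-%B-%Y") can be used; the month name it produces depends on the session locale.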

Related

How to stop R from automatically converting string to date?

I have a large Excel file I'm reading into R for a sports analytics project. One of the columns is height, and the format in Excel is ft-in (e.g. 5-10). This is fine in Excel because that column was specifically formatted as plain text so that it is not converted to a date. But when I read the csv into R, the data frame automatically converts it to a date. Is there a command/parameter to have it not do this?

How to read my excel as strings in R? (R reads my excel file as date format)

I am trying to read an Excel file in R.
I used the read_excel() function.
My Excel file is full of numbers such as 18116.28
But R seems to recognize the numbers as dates and times.
R reads this as 1949-08-06 06:49:24
Why does this happen? And how can I stop this?

Is there a way to read in a large document as a data.frame in R?

I'm trying to use ggplot2 on a large data set stored in a csv file. I used to read it with Excel.
I don't know how to convert this data into a data.frame. In particular, I have a date column with the following format: "2020/04/12:12:00". How can I get R to understand this format?
If it's a csv, you can use:
the fread function from data.table. This will be the fastest way to read your csv.
read_csv or read_csv2 (for ;-delimited documents) from the readr package.
If it's an .xls (or .xlsx) document, have a look at the readxl package.
All these functions import your data as data.frames (with additional classes, such as data.table for fread or tibble for read_csv).
Edit
Given your comment, it looks like your file is not an Excel file but a csv. If you want to convert a column to a date type, assuming your data.table is called df:
df[, dates := as.POSIXct(get(colnames(df)[1]), format = "%Y/%m/%d:%H:%M")]
Note that you don't need to use cbind or even reassign the data.table, because the := operator modifies it in place.
As the message says, you don't need the extra precision of POSIXlt.
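The same format string can be tried in base R before applying it to the whole table. This is a small sketch, assuming the timestamps look like the one in the question; the vector name raw and the UTC time zone are assumptions made for illustration.

```r
# Hypothetical sample values in the "2020/04/12:12:00" layout
raw <- c("2020/04/12:12:00", "2020/04/13:08:30")

# Parse with an explicit format; tz is fixed so results do not depend on the machine
parsed <- as.POSIXct(raw, format = "%Y/%m/%d:%H:%M", tz = "UTC")

print(parsed)
print(class(parsed))
```

Values that do not match the format come back as NA, so checking anyNA(parsed) is a quick way to spot rows with a different layout.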
Going by the question alone, I would suggest the openxlsx package; it has significantly reduced the time I need to read large datasets. Three points you may find helpful, based on your question and the comments:
The read command is the same as in the xlsx package, but I would suggest calling it as openxlsx::read.xlsx(file_path).
The arguments are also the same, except that sheetIndex is replaced by sheet, which takes the sheet index (a sheet name should also be accepted).
If the existing columns are converted to character, then a simple as.Date would work.

R xlsx Excel number formats

I have created a workbook (createWorkbook) and formatted the headings, however how do I format the number values such that when I write/save to Excel I get the following:
For the first field e.g.
3615 becomes 3,615
For subsequent fields e.g.
70.658 becomes 70.7
Note, I still want Excel to recognize the values as doubles.
Thank you!

How do I control the format of columns when I use the write.csv function?

I've created a dataframe in R, and one of my columns converts a date such as 01/08/2018 (dd/mm/yyyy) into the text form Aug-18 (mmm-yy). However, when I write this to csv using the write.csv function, Excel automatically converts it to a date.
Is there a way I can specify the column type to be "Text" so that Excel doesn't change it to date format?
One simple trick that IMHO gets far too little attention is to pad your date columns with whitespace, e.g. df$mydate <- paste(' ', df$mydate, sep=''). This stops Excel from interpreting the text as dates.
I have started routinely doing that for all kinds of risky columns when doing R<->Excel transformations.
Taken from here: https://support.office.com/en-us/article/stop-automatically-changing-numbers-to-dates-452bd2db-cc96-47d1-81e4-72cec11c4ed8
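A short sketch of the padding trick end to end, using a made-up data frame (the column names id and mydate and the temporary file are assumptions for illustration). It only shows what ends up in the csv; how Excel treats the padded value is as described above.

```r
# Hypothetical data with a text column that Excel tends to turn into dates
df <- data.frame(id = 1:2,
                 mydate = c("Aug-18", "Sep-18"),
                 stringsAsFactors = FALSE)

# Prepend one space so the cell no longer looks like a date
df$mydate <- paste0(" ", df$mydate)

# Write to a temporary csv and inspect the raw lines
out <- tempfile(fileext = ".csv")
write.csv(df, out, row.names = FALSE)
csv_lines <- readLines(out)
print(csv_lines)
```

When the file is read back into R, trimws() removes the padding again.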
