I am trying to read in an Excel file using the readxl package with a column of time stamps. For this particular file, they are randomly distributed times, so they may look like 00:01, 00:03, 00:04, 00:08, 00:10, etc., and I need these timestamps to read into R correctly. Instead, the time stamps turn into random decimals.
I looked in the Excel file (which is output from a different program) and it appears the column type within Excel is "Custom". When I convert that "Custom" column to "Text", it shows me the decimals that are actually stored and read into R. Is there a way to load in the timestamps instead of the decimals?
I have already tried using col_types to make them text or integers, but the values are still being read in as decimals and not timestamps.
df <- readxl::read_xlsx(
  "./Data/LAFLAC_60sec_9999Y1_2019-06-28.xlsx",
  range     = readxl::cell_cols("J:CE"),
  col_names = TRUE
)
The decimals are a representation of the fraction of a day elapsed since midnight, so 1:00 am is 0.041667, or 1/24.
In R, you can convert those numbers back into timestamps in a variety of ways.
See this answer for more info:
https://stackoverflow.com/a/14484019/6912825
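For instance, here is a minimal sketch of one such conversion (the vector name ts and the sample values are hypothetical): multiply the day fraction by 86,400 seconds and format the result as a clock time.

ts <- c(0.000694, 0.002083, 0.041667)   # hypothetical day fractions: 00:01, 00:03, 01:00

secs <- round(ts * 24 * 60 * 60)        # fraction of a day -> seconds after midnight
format(as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"), "%H:%M")
[1] "00:01" "00:03" "01:00"

Alternatively, readxl can do the conversion during the read if you declare the column type. A sketch assuming the timestamps sit in the first column of the range (the col_types vector is a guess, and it needs one entry per column; J:CE spans 74 columns):

df <- readxl::read_xlsx(
  "./Data/LAFLAC_60sec_9999Y1_2019-06-28.xlsx",
  range     = readxl::cell_cols("J:CE"),
  col_names = TRUE,
  # "date" for the assumed timestamp column, "guess" for the other 73
  col_types = c("date", rep("guess", 73))
)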
I imported some data from Excel to RStudio (a csv file). The data contains date information. The date format I want is month-day-year (e.g. 2-10-16 means February 10th, 2016). The problem is that Excel auto-fills 2-10-16 to 2002-10-16, and the problem persisted when I imported the data into R. So my date column contains both correctly formatted dates (e.g. 2-10-16) and incorrectly formatted dates (e.g. 2002-10-16). Because I have a lot of dates, it is impossible to change everything manually. I have tried to use this code:
as.Date(data[,1], format="%m-%d-%y")
but it gives me NA for the incorrectly formatted dates (e.g. 2002-10-16). Does anybody know how to make all the dates correctly formatted?
Thank you very much in advance!
Would you consider enforcing a consistent date format in Excel before importing the data into R?
The best approach is likely to change how the data is captured in Excel, even if it means storing the dates as strings. Otherwise what you're looking at is string manipulation to convert the values into dates, which could potentially create incorrect data.
This will remove the first two digits and then allow conversion to a date.
as.Date(sub('^\\d{2}', '', '2002-10-16'), '%m-%d-%y')
[1] "2016-02-10"
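As a usage note: because ^\\d{2} only matches when the string starts with two digits, the same call leaves correctly formatted dates untouched, so it can be applied to the whole mixed column (sample values taken from the question):

dates <- c('2-10-16', '2002-10-16')
as.Date(sub('^\\d{2}', '', dates), '%m-%d-%y')
[1] "2016-02-10" "2016-02-10"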
I wrote a data frame out in CSV format with R and opened it with Excel. Then I converted it to an Excel workbook and made some edits to it.
When I imported that Excel file into R again, the date column looked like this (a few are numbers and a few are dates):
Date
39387
39417
15/01/2007
16/01/2007
I tried to change the format within Excel but failed. The General and Number options in Excel's Format menu generate numbers like the ones above, which appear in no way related to the dates.
It seems all four of your examples are dates in January (the 11th and 12th for the first two), that the Excel involved has been configured to expect MDY (rather than DMY, as is popular in the UK, for example) with its date system set to '1900', and that the CSV wrote the dates as 'conventional' dates rather than date serial numbers.
So what Excel saw first was:
11/01/2007
12/01/2007
15/01/2007
16/01/2007
intended to represent the 11th, 12th, 15th and 16th of January. However, because it expects MDY, Excel interpreted (coercing the text to a date serial number) the first two entries as November 1 and December 1. For 'months' 15 and 16 it did no interpretation and just reported the text as it was in the CSV.
The coercion is automatic, with no option to turn it off (nor is it 'reversed' by a change of format). Various ways to address this (common) issue are mentioned in the link kindly provided by @Gerard Wilkinson. I am not providing a full solution here since (a) some things are under user control (e.g. the '1904' date system is an option, as is the choice between MDY and DMY) and (b) the preferred route will depend to some extent on how often it is required and what the user is most familiar with.
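That said, if re-exporting from Excel is not an option, the damage can be undone after import. What follows is only an illustrative sketch, not a full solution; it assumes the '1900' date system and that every bare integer in the column is an MDY-misinterpreted serial number:

x <- c("39387", "39417", "15/01/2007", "16/01/2007")

is_serial <- grepl("^\\d+$", x)
out <- as.Date(x, format = "%d/%m/%Y")    # parses only the true text dates; serials become NA
# Decode the serials; origin "1899-12-30" matches Excel's 1900 date system
out[is_serial] <- as.Date(as.numeric(x[is_serial]), origin = "1899-12-30")
# Those serials came from an MDY reading of DMY text, so swap day and month back
out[is_serial] <- as.Date(format(out[is_serial], "%Y-%d-%m"))
out
[1] "2007-01-11" "2007-01-12" "2007-01-15" "2007-01-16"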
I'm trying to read an Excel file into R, with two columns containing dates. Here is my problem: when I view my data file in R, most of the dates are in the right format, but some were transformed into numbers that don't make sense at all. I attached images to show the different outputs from R and Excel.
(Only pay attention to the columns "ArrivalDate" and "ActlFlightDate".)
[Image: output seen from R]
[Image: output seen from Excel]
My question is: how, in R, can I make those numbers become the dates they are supposed to be? Especially since the class of the elements in those columns is character.
Thank you in advance!
The dates in your Excel file are in different formats; that is why some sit on the left side of the column and some on the right side in the spreadsheet. They are probably a mix of an actual Excel date format and Text or General. You should copy the columns to new columns, then highlight the new column, right-click, and change the format until all your dates are uniform. Then you can import into R and have consistent date data.
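If fixing the workbook is not possible, the stray serial numbers can also be repaired after import. A sketch in base R (hypothetical values; it assumes the column arrived as character, the '1900' date system, and dd/mm/yyyy strings):

ArrivalDate <- c("42736", "05/01/2017", "42741")     # hypothetical mixed column

serial <- suppressWarnings(as.numeric(ArrivalDate))  # NA for the genuine date strings
fixed  <- as.Date(ArrivalDate, format = "%d/%m/%Y")
# Excel's 1900 date system corresponds to origin "1899-12-30" in R
fixed[!is.na(serial)] <- as.Date(serial[!is.na(serial)], origin = "1899-12-30")
fixed
[1] "2017-01-01" "2017-01-05" "2017-01-06"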
I need the following help; if possible, please let me know your comments.
My objective:
I have multiple .csv files in one location.
All .csv files have different numbers of rows (m) and columns (n), i.e. m != n.
All .csv files have almost the same dates (calendar day & time stamps, e.g. 04/01/2016 7:01), but the interesting point is that some files have some time stamps missing.
All .csv files have the following columns in common (Open, High, Low, Close, Date).
My objective is to import only the "Close" column from all the .csv files, even though each file has a different number of rows because some time stamp data is missing in some files.
If the data for a time stamp is missing but the previous value is present, repeat the previous value.
If the data for a time stamp is missing and the previous value is also missing, put NA there. This only applies to the first few data points.
Here is my plan:
Reading/writing files: we'll need to implement logic to read the files in a certain fashion and then write separate files for the different sets of instruments.
Inconsistent time series: you'll notice that the time series is not consistent and continuous for some securities, so you need to generate your own datetime stamps and then fill in data against each datetime stamp (wherever available).
Missing data points: there will be certain timestamps for which you don't have data; make your time series continuous by filling in the missing points with data from the previous timestamp.
Maybe try

library(data.table)   # for rbindlist()

read_in <- function(csv){
  f <- read.csv(csv)
  f <- f[!is.na(f$time_stamp), ]    # drop rows with a missing time stamp
  f[, c("time_stamp", "close")]     # keep the key plus the Close column
}

l <- lapply(csv_list, read_in)
df <- rbindlist(l)
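The snippet above stacks the Close columns but does not yet handle the missing time stamps. A sketch of that fill-forward step (the column names time_stamp and close and the date range are assumptions): generate a complete minute-by-minute index, left-join a file's data against it, and carry the last observation forward; gaps before the first observation stay NA.

library(zoo)   # for na.locf()

# Hypothetical master index covering the trading window at 60-second intervals
full_index <- data.frame(
  time_stamp = seq(
    from = as.POSIXct("2016-01-04 07:00"),
    to   = as.POSIXct("2016-01-04 15:30"),
    by   = "1 min"
  )
)

# one_file: a data.frame with time_stamp (as POSIXct) and close, as returned by read_in()
aligned <- merge(full_index, one_file, by = "time_stamp", all.x = TRUE)

# Repeat the previous value; na.rm = FALSE keeps leading gaps as NA
aligned$close <- na.locf(aligned$close, na.rm = FALSE)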
I have data in Excel, and after reading it into R it appears as follows:
lob2     lob3
1.86E+12 7.58E+12
I want it as:
lob2             lob3
1857529190776.75 7587529190776.75
This difference causes me to get different results when I do my analysis later on.
How is the data stored in Excel (does it think it is a number, a string, a date, etc.)?
How are you getting the data from Excel to R? If you save the data as a .csv file and then read it into R, look at the intermediate file: Excel is known to abbreviate numbers when saving, and R would then see character strings instead of numbers. You need to find a way to tell Excel to export the data in the correct format with the correct precision.
If you are using a package (there is more than one), then look into the details of that package for how to grab the numbers correctly (you may need to make changes in Excel so that it knows they are numbers).
Lastly, what does the str function say about your R object? It could be that R is storing the proper numbers and only displaying the short version, as mentioned in the comments. Or it could be that R received strings that did not convert nicely to numbers and is storing them as characters or factors. The str function will let you see how your data is stored in R, and therefore how to convert or display it correctly.
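As a quick illustration of that last point (object and column names hypothetical): if str reports a numeric, the value is stored in full and only the printing is shortened, which sprintf or options(digits = ) will reveal.

str(df$lob2)                 # if this says num, the full value is stored
 num 1.86e+12

sprintf("%.2f", df$lob2)     # render the stored value at full precision
[1] "1857529190776.75"

options(digits = 15)         # or widen R's default printing globally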