I am new to R and I am facing this problem:
I have a large dataset (both csv file and Rdata file) that contains some date and time columns.
(Screenshot: section of the dataset.)
I need to do some calculations and some data visualization with it, but problems arise with the convertedTime column. I need to display it as "minutes:seconds" (with one decimal), which is how it appears when I open the csv file with Excel. I have to work with the same format as shown in the Excel file.
(Screenshot: Excel sample of timeConverted.)
When I load the data into R (I have tried both formats), the convertedTime values are expressed in their full format. How can I convert them into %M:%OS1?
keyData <- read.csv('keyPressDataWithLaneDeviation.csv')
print(head(keyData))
library(dplyr)
keyDataNoError <- filter(keyData, typingErrorMadeOnTrial ==0)
print(head(keyDataNoError))
strptime(keyDataNoError$timeConverted, format = "%M:%0S1")
print(head(keyDataNoError))
After I filter the dataset I try to format the time, without results: the outputs of the last two prints are identical. Where am I going wrong?
Another thing I tried was loading the Rdata file instead. But with the Rdata file I don't even get decimals in the convertedTime column, and I really do not understand why.
You're looking for strftime, not strptime.
strftime(keyDataNoError$timeConverted, format = "%M:%OS1")
e.g.:
a<-"2018-02-24 11:30:05.105"
strftime(a, format="%M:%OS1")
[1] "30:05.1"
strftime(a, format="%M:%OS3")
[1] "30:05.105"
strftime(a, format="%M:%OS5")
[1] "30:05.10500"
Note that strftime outputs a character vector, not a POSIXt object.
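Also remember to assign the result to a column; the strptime call in the question discards its output, which is why the two prints are identical. A minimal sketch, assuming timeConverted is in a datetime format that as.POSIXlt can parse:
# strftime returns character, so store the formatted strings in a new
# column and keep the original for any further computation
keyDataNoError$timeFormatted <- strftime(keyDataNoError$timeConverted, format = "%M:%OS1")
print(head(keyDataNoError))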
I have a series of massive data files that range in size from 800k to 1.4M rows, and one variable in particular has a fixed length of 12 characters (numeric data, but with leading zeros wherever the number of non-zero digits is fewer than 12). The column should look like this:
col
000000000003
000000000102
000000246691
000000000042
102851000324
etc.
I need to export these files to CSV for a client, using R. The final data NEEDS to retain the 12-character structure, but when I open the CSV files in Excel, the zeros disappear. This happens even after converting the entire data frame to character. The code I am using is as follows.
df1 %>%
mutate(across(everything(), as.character))
##### I did this for all data frames #####
export(df1, "df1.csv")
export(df2, "df2.csv")
....
export(df17, "df17.csv)
I've read a few other posts saying this is an Excel problem, and that makes sense, but given the number of data files and the amount of data, as well as the client's need to open them in Excel, I need a way to handle it on the front end in R. Any ideas?
Yes, this is definitely an Excel problem!
To demonstrate: in Excel, enter your column values, save the file as a CSV, and then re-open it in Excel; the leading zeros will disappear.
One option is to add a leading non-numeric character, such as a single quote:
paste0("'", df$col)
Not great, but it's an option.
A slightly better option is to paste Excel's TEXT function around the character string; Excel will then evaluate the function when the file is opened.
df$col <- paste0("=Text(", df$col, ", \"000000000000\")")
#or
df$col <- paste0("=\"", df$col, "\"")
write.csv(df, "df2.csv", row.names = FALSE)
Of course, if the CSV file is saved and reopened, the leading zeros will again disappear.
Another option is to save the file directly as .xlsx with the "writexl", "xlsx", or a similar package.
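For example, a minimal sketch with writexl, assuming the columns were already converted to character as in the question (the file name is illustrative):
library(writexl)
# character columns are written as Excel text cells, so the leading
# zeros survive when the client opens the file
write_xlsx(df1, "df1.xlsx")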
I want to import an Excel file into R, and the file contains a column with date and time in this form:
20.08.2018 16:32:20
If I change it to the standard number format in the csv file itself, it looks like this:
43332,68912
If I read the file in using read_excel(), the date looks like this:
43332.689120370371
How can I turn the current format into a date format in R?
It is good practice not to edit anything in a .csv (or Excel) file, i.e. to treat it as read-only, and to make all changes in a script (so in R).
Let's call your data frame "my_df" and your datetime variable "date".
library(readr)
library(magrittr)
my_df$date %<>% parse_datetime("%d.%m.%Y %H:%M:%S")
Edit: Trying to piece together information from your comments, I created an Excel file with one column called STARTED, holding date and time in the form 20.08.2018 16:32:20 as you indicate in the question. Since you seem to like readxl:
library(readxl)
library(readr)  # for parse_datetime()
library(magrittr)
myData <- read_excel("myData.xlsx")
myData$STARTED %<>% parse_datetime("%d.%m.%Y %H:%M:%S")
This is the same code I already wrote above. It gives:
# A tibble: 1 x 1
STARTED
<dttm>
1 2018-08-20 16:32:20
If you only get NA, your data is not in the format given by your example 20.08.2018 16:32:20.
Following your discussion with @prosoitos, it looks like the import function cannot make sense of your date column:
Your line of example data in the comments contains no quotes around the date string. That implies that you copied the data after opening it with Excel (or similar), or that your survey tool does not quote dates as strings. Did you open your .csv in Excel, save it as .xlsx and try to import the result into R? That would explain the mess you get, as Excel may have tried to interpret the date strings and convert them to some funny Microsoft format nobody else uses.
Please don't do that; use a raw csv file that was never touched by Excel and import it directly into R.
Your read function evidently did not understand the content of your date variable and returned Excel's underlying serial number instead: days since 1899-12-30, with the time of day as the fractional part. 43332.68912 works out to 2018-08-20 16:32:20, which is exactly your example timestamp, so the values can be transformed back into human-readable dates.
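For example, a minimal sketch of that conversion, assuming the usual 1899-12-30 day-zero of Windows Excel:
x <- 43332.68912
# days -> seconds, then count from Excel's day-zero
as.POSIXct(x * 86400, origin = "1899-12-30", tz = "UTC")
# [1] "2018-08-20 16:32:19 UTC"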
I suggest you try importing your csv with:
read.csv("your_data.csv", header=TRUE, stringsAsFactors=FALSE)
You may have to specify your separator, e.g. sep = "\t" for a tab-separated file; note that read.csv defaults to a comma, while read.table defaults to whitespace. After that, the dates in your data frame are plain text strings and you can follow up with what @prosoitos said.
(Sorry for adding an additional answer. I would have commented on @prosoitos's answer, but I have insufficient reputation points.)
Read CSV into R:
MyData <- read.csv(file = "TheDataIWantToReadIn.csv", header = TRUE, sep = ",")
I am using the openxlsx package to write a file back out. I have already used as.Date and the format function to make my dates look like dd-mmm-yyyy.
However, when I open the Excel file, even though a date shows as, say, "12-may-2018", I cannot filter the dates like Excel dates. Excel shows the data type as General.
Even if I convert the column to a date format in Excel, it still doesn't let me filter by year, month, and day, which works for real Excel dates. I can convert a cell to the date type by manually placing my cursor in the middle of the cell and pressing the return key.
Doing that for the whole dataset would be far too much manual effort. Is there any way to make this happen? Thanks for any suggestions.
Here is my code:
data$datecolumn <- as.Date(as.numeric(data$datecolumn), origin = origin - somenumberforcalibration)
data$datecolumn <- format(data$datecolumn, format = "%d-%b-%Y")
write.xlsx(data, filename)
Here, datecolumn arrives in Excel's numeric format. I did once see a code snippet where a date was read from CSV as a string, converted to POSIXct, and written back to CSV, and Excel then read it as a date; I haven't found anything for xlsx yet.
The format function turns the date back into a string, which was causing the whole issue; remove the format call and things work fine. @Tjebo and @Roman Lustrik helped me with this.
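For reference, a minimal sketch of the working approach (the column name, filename, and the 1899-12-30 Windows-Excel origin are assumptions based on the question):
library(openxlsx)
# keep the column as Date class; do NOT format() it back into a string
data$datecolumn <- as.Date(as.numeric(data$datecolumn), origin = "1899-12-30")
# openxlsx then writes a true Excel date; this option controls its display
options(openxlsx.dateFormat = "dd-mmm-yyyy")
write.xlsx(data, filename)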
I am using the openxlsx package's read.xlsx to import a data frame from a multi-class column. The desired result is to import all values as strings, exactly as they are represented in Excel. However, some decimals are imported as very long floats.
Sample data is simply an Excel file with a column containing the following rows:
abc123,
556.1,
556.12,
556.123,
556.1234,
556.12345
require(openxlsx)
df <- read.xlsx('testnumbers.xlsx', )
Using the above R code to read the file results in df containing these string values:
abc123,
556.1,
556.12,
556.12300000000005,
556.12339999999995,
556.12345000000005
The Excel file provided in production has the column formatted as "General". If I format the column as Text, there is no change unless I explicitly double-click each cell in Excel and hit Enter; in that case, the number is correctly displayed as a string. Unfortunately, clicking each cell isn't an option in the production environment. Any solution, whether in Excel, R, or otherwise, is appreciated.
Edit:
I've read through this question and believe I understand the math behind what's going on. At this point, I suppose I'm looking for a workaround. How can I get a float from Excel to an R dataframe as text without changing the representation?
Why Are Floating Point Numbers Inaccurate?
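For instance, the tails can be reproduced in R by printing the doubles at 17 significant digits, which exposes the stored binary values (the strings below are the same ones shown above):
sprintf("%.17g", c(556.123, 556.1234, 556.12345))
# [1] "556.12300000000005" "556.12339999999995" "556.12345000000005"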
I was able to get the correct formats into a data frame using pandas in Python:
import pandas as pd
test = pd.read_excel('testnumbers.xlsx', dtype = str)
This will suffice as a workaround, but I'd like to see a solution built in R.
Here is a workaround in R using openxlsx that I used to solve a similar issue. I think it will solve your question, or at least allow you to format the cells as text in the Excel files programmatically.
I use it to reformat specific cells in a large number of files (in my case I'm converting from General to scientific, as an example of how you might alter this for another format).
This uses functions from the openxlsx package that you reference in the OP.
First, load the xlsx file in as a workbook (stored in memory, which preserves all the xlsx formatting etc.; slightly different from the method shown in the question, which pulls in only the data):
testnumbers <- loadWorkbook(here::here("test_data/testnumbers.xlsx"))
Then create a "style" that converts the numbers to text, and apply it to the worksheet (in memory):
numbersAsText <- createStyle(numFmt = "TEXT")
addStyle(testnumbers, sheet = "Sheet1", style = numbersAsText, cols = 1, rows = 1:10)
Finally, save it back out:
saveWorkbook(testnumbers,
file = here::here("test_data/testnumbers_formatted.xlsx"),
overwrite = TRUE)
When you open the Excel file, the numbers will be stored as text.
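A hedged sketch for batching the same fix over many files, assuming each has a Sheet1 with the numbers in column 1, rows 1:10 (paths illustrative):
library(openxlsx)
files <- list.files(here::here("test_data"), pattern = "\\.xlsx$", full.names = TRUE)
for (f in files) {
  wb <- loadWorkbook(f)
  # format the target cells as text, as above
  addStyle(wb, sheet = "Sheet1", style = createStyle(numFmt = "TEXT"),
           rows = 1:10, cols = 1)
  saveWorkbook(wb, f, overwrite = TRUE)
}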
I have an xlsx file with a number of variables (columns). Quite a few are listed in date format (MM/DD YYYY HH:MM A/P) in the .xlsx file. When I load this file into R using read.xlsx, all of the date-format variables load as POSIXct except ONE, which always loads as a factor. Any thoughts on why this may be?
For reference I am loading the data using code similar to that below:
data <- read.xlsx("file.xlsx", sheetIndex = 1, header = TRUE)
Well, I figured it out! It turns out one of the entries for this variable (of the hundreds) was entered slightly incorrectly in the xlsx file (it was listed as 15:00 PM, an impossible time!), which threw off the xlsx package, I suppose. Once fixed, the data pull results in a column of POSIXct entries.
Hope this helps anyone else in the future encountering a similar problem!
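If anyone needs to hunt down such a bad entry programmatically, here is a hedged sketch; the column name is hypothetical and the format string is assumed from the MM/DD YYYY HH:MM A/P description (%I with %p rejects impossible 12-hour times like 15:00 PM):
raw <- as.character(data$ThatDateColumn)  # hypothetical column name
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %I:%M %p")
# rows where the raw text exists but refuses to parse
which(!is.na(raw) & is.na(parsed))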