Tearing my hair out on this one. Took me hours just to get rJava up and running (because Mac OS X El Capitan was not wanting to play nice with Java) in order to load Excel-specific data-importing packages etc. But in the end this hasn't helped my problem, and I'm just about at my wits' end. Please help.
Basic situation is this:
I have simple Excel data of time durations, over a span of a couple of years. So the two columns I'm importing are the time (duration) and the year (2016, 2017 etc).
In Excel the data is formatted as [h]:mm:ss so it displays correctly (the data is the number of hours worked in a month, so typically something like 80:xx:xx ~ 120:xx:xx). I'm aware that in Excel, despite the cells being formatted as above and only showing the relevant period of hours, in reality Excel has attached an (irrelevant, arbitrary) date to this hours data. I have searched and searched and found no way around this limitation in the way Excel handles dates/times/durations.
I import this data into R via the "import data -> import from excel data set" menu item in the R Commander GUI, not the console.
However, when importing the data into R, each value displays as a single number, e.g. approx. 110 hrs is converted to 4.xxxxx, not hh:mm:ss. So when running analyses and generating plots etc, instead of the actual (meaningful) 110:xx:xx type data being displayed, a completely meaningless 4.xxxxxx is displayed.
If I change the formatting of the Excel cells to display the date as well as the time rather than use the [h]:mm:ss cell formatting, R interprets the data as something equally useless, like 1901/02/04 05:23 am.
I have installed and loaded a variety of packages such as xlsx, XLConnect, lubridate etc, but it hasn't made any difference to how R interprets the Excel data on import, from the GUI at least.
Please tell me how do I either
a) edit the raw data to a format that R will understand as a time duration (and nothing but a time duration) in hh:mm:ss format, or
b) format the current data from within R after importing, so that it displays the data in the correct way rather than a useless number or arbitrary date/time?
[Please note: I can use the console, when given the commands etc needed to be executed. But I need to find a solution that ultimately will allow the data to be imported and/or manipulated from within the GUI, not from typing a bunch of commands into the console, as the end user (not me) has zero programming ability and cannot use a console, and will only ever be using R via the GUI.]
Is your code importing the data from excel as seconds?
library(lubridate)
duration <- lubridate::as.duration(400000)  # 400,000 seconds
as.numeric(duration, "hours")
# [1] 111.1111
as.numeric(duration, "days")
# [1] 4.62963
seconds_to_period(400000)
# [1] "4d 15H 6M 40S"
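The 4.xxxxx value in the question looks like a duration expressed in days (Excel stores times as fractions of a day), which matches the 4.62963 days shown above for roughly 111 hours. As a minimal sketch under that assumption, the helper below converts such day values back to an [h]:mm:ss string using base R only. The function name `excel_days_to_hms` is made up for illustration.

```r
# Hypothetical helper: convert a duration in days (as imported from Excel)
# into an "[h]:mm:ss" style string, without any date component.
excel_days_to_hms <- function(days) {
  total_seconds <- round(days * 86400)        # 86400 seconds per day
  hours   <- total_seconds %/% 3600           # whole hours, may exceed 24
  minutes <- (total_seconds %% 3600) %/% 60   # leftover whole minutes
  seconds <- total_seconds %% 60              # leftover seconds
  sprintf("%d:%02d:%02d",
          as.integer(hours), as.integer(minutes), as.integer(seconds))
}

hms <- excel_days_to_hms(c(4.62963, 3.5))
print(hms)
```

If plain numbers are preferable for plots, multiplying the imported column by 24 gives decimal hours directly.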
I'm working with a collection of big data sets (https://catalogue.data.gov.bc.ca/dataset/vegetation-composite-polygons) and am running into issues with st_read taking a long time to load in data. I'm running RStudio 1.2.1335 and R 3.6.1 on a 2018 MacBook Pro with 32 GB of RAM.
The data in the above link only represents 1/8 of the total data I have (I have data for 2008-2018).
Currently, loading in the unformatted data with st_read and .gdb requires most of the day (5-8 hours). After I format and trim the data to include only what I want, loading in the data only takes 5-10 minutes.
I am interested in solutions to avoid the process taking 10 days to load in each layer of the .gdb and formatting the data. Ideally, I'd be able to load in all the layers at once and format all 62,795,746 rows at the same time (taken from st_layers).
I've tried loading the data using .gdb, .shp, and .gpkg.
.gpkg is fast once I get the data formatted.
If the query argument in sf::st_read() isn't working, you could potentially import it in using the R-ArcGIS bridge. This assumes you have a licensed copy of ArcGIS Pro. arc.select() has a fields argument where you can specify which fields you want, and a where_clause argument for an SQL where expression. If the data are available through AGOL, you can also add a spatial filter, where the querying is done on the server (see arcpullr).
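For reference, here is a small sketch of what the query argument of sf::st_read() looks like in use. It assumes the sf package is installed and uses a tiny made-up layer (named `veg`, with a made-up `species` column) written to a temporary GeoPackage, not the actual vegetation data.

```r
library(sf)

# Toy point layer standing in for the real polygons (illustrative only).
pts <- st_as_sf(
  data.frame(id = 1:3, species = c("FD", "PL", "FD"),
             x = c(0, 1, 2), y = c(0, 1, 2)),
  coords = c("x", "y"), crs = 4326
)

gpkg <- tempfile(fileext = ".gpkg")
st_write(pts, gpkg, layer = "veg", quiet = TRUE)

# Only rows matching the WHERE clause are read into R.
subset_fd <- st_read(
  gpkg,
  query = "SELECT * FROM veg WHERE species = 'FD'",
  quiet = TRUE
)
print(subset_fd)
```

Filtering at read time like this avoids loading the full layer and trimming it afterwards, though how much it helps on a large .gdb depends on the driver.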
I have a very annoying issue.
I have some oxygen measurements saved in an .xlsx table (created directly by the device software). Opened with Excel, this is part of my file.
In the first picture, we can see that the software sometimes skips a second (11:13:00 then 11:13:02).
In the second picture, just notice the continuity of time from 11:19:01 to 11:19:09.
I read my Excel table into R with the readxl package, using this code:
oxy <- read_excel("./Metabolism/20180502 DAPH 20.xlsx" , 1)
And before any manipulation, when I check my table in R (RStudio), this is what I get:
In the first case, R kept the time continuity by adding 11:13:01 and shifted the following rows.
Then, later, the reverse happens: time was continuous in Excel, but R skips a second and again shifts the following rows.
In the end, there is the same number of rows. I guess it is a problem with the way R and Excel round the time. But these small errors prevent me from using the date to merge two tables, and the calculations afterwards are wrong.
Can I do something to tell R to read the data exactly the way Excel saved them?
Thank you very much!
Index both with a sequential integer counter each starting at the same point and use that for merging like with like. If you want the Excel version to be 'definitive' convert the index back to time with a lookup based on your Excel version.
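A minimal sketch of the index-based merge described above, using made-up data frames (`excel_tbl`, `r_tbl`) and made-up columns rather than the real measurements. It assumes both tables have the same number of rows in the same order, as stated in the question.

```r
# Hypothetical stand-ins: the same three readings, with time stamps that
# disagree by a second in places.
excel_tbl <- data.frame(time = c("11:13:00", "11:13:02", "11:13:03"),
                        oxygen = c(8.1, 8.0, 7.9),
                        stringsAsFactors = FALSE)
r_tbl <- data.frame(time = c("11:13:00", "11:13:01", "11:13:02"),
                    temp = c(20.1, 20.2, 20.2),
                    stringsAsFactors = FALSE)

# Sequential integer counters starting at the same point.
excel_tbl$idx <- seq_len(nrow(excel_tbl))
r_tbl$idx     <- seq_len(nrow(r_tbl))

# Merge on the counter; keep only the Excel time column so that
# the Excel version stays the definitive one.
merged <- merge(excel_tbl, r_tbl[, c("idx", "temp")], by = "idx")
print(merged)
```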
I have built a script that extracts data from numerous backend systems and I want the output to be formatted like this:
Which package is the best and/or easiest to do this with?
I am aware of the xlsx package, but am also aware that there are others available so would like to know which is best in terms of ease and/or simplicity to achieve my desired output.
A little more detail:
If I run the report across seven days, then the resulting data frame is 168 rows deep (1 row represents 1 hour, 168 hours per week). I want each date (00:00 - 23:00) to be broken out into day-long blocks, as per the image I have provided.
(Also note that I am in London, England, and as such am currently in timezone UTC+1, which means that right now, the hourly breakdown for each date will range from 01:00 - 00:00 on the next day because our backend systems run on the UTC timezone and that is fine.)
At present, I copy and paste (transpose) the values across manually, but want to be able to fully automate the process so that I can run the script (function), and have the resulting output looking like the image.
This is what the current final output looks like:
Try the package openxlsx. The package offers a lot of custom formatting for .xlsx documents and is actively developed / fairly responsive to GitHub issues. The vignettes on the CRAN website are particularly useful.
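As a rough sketch of the reshaping step, assuming the desired layout is one column per date and one row per hour (the image is not available here, so this is a guess), the 168 hourly values can be folded into a 24 x 7 table in base R. The data, column names and output file name are made up; the final write is skipped if openxlsx is not installed.

```r
# Made-up hourly report: 7 days x 24 hours = 168 rows.
hours  <- seq(as.POSIXct("2019-07-01 00:00", tz = "UTC"),
              by = "hour", length.out = 168)
report <- data.frame(timestamp = hours, value = seq_len(168))

# matrix() fills column by column, so each column holds one day's 24 values.
wide <- matrix(
  report$value, nrow = 24, ncol = 7,
  dimnames = list(format(hours[1:24], "%H:%M"),
                  unique(format(hours, "%Y-%m-%d")))
)
wide_df <- data.frame(hour = rownames(wide), wide, check.names = FALSE)
print(head(wide_df, 3))

# Optional: write the block out as an .xlsx file.
if (requireNamespace("openxlsx", quietly = TRUE)) {
  openxlsx::write.xlsx(wide_df, file.path(tempdir(), "hourly_report.xlsx"))
}
```

This only works as written when the data covers whole days with no missing hours; gaps would need filling before the reshape.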
Recently I obtained a lot of RTF files that contain econ data for an analysis I need to do. Unfortunately, this is the only way the Bureau of Statistics of my country could provide time series data over a long time span. If there is a one-time need to select a particular indicator for 10 years or so, I'm OK finding these values by hand using Word/Notepad/TextEdit (for Mac). But my problem is that I have 15 files with data that I need to combine somehow into one dataset for my work. Before I even start, I don't have a clue whether it is possible to read those files into an appropriate format (data.frame). I wanted to ask for expert opinion on how to approach this task. An example of one of the files can be downloaded from here:
https://www.dropbox.com/s/863ikx6poid8unc/Export_for_SO.rtf?dl=0
All values are in Russian. The dataset represents exports of a particular product (first column) across countries (second column) in US dollars for 2 periods.
Thank you.
Use the code found at https://datascienceplus.com/how-to-import-multiple-csv-files-simultaneously-in-r-and-create-a-data-frame/
and replace read_csv with read_rtf.
You may want to manually convert your files to another format using an office suite or a text editor. You should be able to use "Save As" to save them in another format.
While in R, you may want to give striprtf a try. I'm guessing you will still have to clean your data a bit afterward.
You can install the package like this:
install.packages("striprtf")
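A minimal sketch of reading several RTF files in one go, assuming striprtf is installed. To stay self-contained it writes a tiny made-up RTF file to a temporary location; in practice `rtf_files` would point at the 15 real files. Note that read_rtf returns a character vector of text lines, not a data frame, so the result still needs parsing and cleaning.

```r
texts <- list()

if (requireNamespace("striprtf", quietly = TRUE)) {
  # Tiny made-up RTF file standing in for the real exports.
  rtf_file <- tempfile(fileext = ".rtf")
  writeLines("{\\rtf1\\ansi Hello world\\par}", rtf_file)

  rtf_files <- c(rtf_file)  # replace with the paths of the real files

  # One character vector of extracted text per file.
  texts <- lapply(rtf_files, striprtf::read_rtf)
  names(texts) <- basename(rtf_files)
  print(texts)
}
```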
I'm trying to import a table from Microsoft Access (.accdb) to R.
The code that I use is:
library(RODBC)
testdb <- file.path("modelEAU Database V.2.accdb")
channel <- odbcConnectAccess2007(testdb)
WQ_data <- sqlFetch(channel, "WaterQuality")
It seems to work, but the problem is importing the date and time data. The Access file has two columns, one with a date field (dd/mm/yyyy) and another with a time field (hh:mm:ss). When I import them into R, the date column shows the date in yyyy-mm-dd format and the time column shows 1899-12-30 hh:mm:ss. Also, R doesn't seem to recognise these columns as usable date/time variables, so I can't work with them.
Also, I tried the mdb.get function but it didn't work as well.
Does anybody know how to import the data into R from Access while defining the date and time format? Any idea how to import the Access file as a text file?
Note: I'm working with Office 2010 and R version 2.14.1
Thanks a lot in advance.
Look at the result of running str on your data frame. That will tell you more about how the data is actually stored. Generally dates and times are stored as a number from an origin date (Access uses 30 Dec. 1899 because MS thought that 1900 was a leap year). Sometimes it is stored as the number of days since the origin with time being represented as a fraction of the day, other times it is the number of seconds (or milliseconds) since the origin.
You will need to see how the data was sent (whether Access and ODBC converted to strings first, or sent days or seconds), then you will have a better feel for how to work with these (possibly converting) in R.
There is an article in the June 2004 edition of R News (the predecessor to the R Journal) that details the common ways to deal with dates and times in R and could be very useful to you.
You should decide what you want to end up with, a single column of DateTimes, 2 columns with numbers, 2 columns with characters, etc.
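As an illustration of the "single column of DateTimes" option, here is a minimal base R sketch. It assumes both columns arrive as POSIXct (which str would confirm) and uses a made-up data frame `WQ` in place of the real WQ_data; the UTC time zone is also an assumption.

```r
# Made-up stand-in for the imported table: a date column and a time column
# carrying the 1899-12-30 placeholder date.
WQ <- data.frame(
  Date = as.POSIXct(c("2012-03-01", "2012-03-02"), tz = "UTC"),
  Time = as.POSIXct(c("1899-12-30 08:15:00", "1899-12-30 17:45:30"), tz = "UTC")
)
str(WQ)  # check how the columns are actually stored

# Keep the date part of one column and the time part of the other,
# then parse the combined string into a single date-time column.
WQ$DateTime <- as.POSIXct(
  paste(format(WQ$Date, "%Y-%m-%d"), format(WQ$Time, "%H:%M:%S")),
  format = "%Y-%m-%d %H:%M:%S", tz = "UTC"
)
print(WQ$DateTime)
```

If str shows character or numeric columns instead, the conversion step would differ (for example, parsing strings or adding days to the 1899-12-30 origin).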