When using ROracle to fetch data from a database, I am running into an issue trying to fetch large integers (up to 21 digits). The database column has the format NUMBER(38,0).
Fetching them through a simple select does not work: the numbers get garbled from the 12th digit on.
I can circumvent this by converting them to characters (to_char(COLUMN_NAME)), but this is far from ideal.
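For reference, the character-based workaround looks roughly like this (a sketch; the connection details and names are placeholders):
library(ROracle)
con <- dbConnect(Oracle(), username = "user", password = "pwd", dbname = "db")  # hypothetical connection
# Render the value as text on the Oracle side so nothing is squeezed through a double.
# Note that R doubles only hold 15-16 significant digits, and even bit64::integer64
# tops out at 19 digits, so 21-digit values have to stay character in R.
res <- dbGetQuery(con, "select to_char(COLUMN_NAME) as col_chr from MY_TABLE")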
A solution from an Oracle forum that converts to binary double (cast(COLUMN_NAME as binary_double)) does not work in my case; a BINARY_DOUBLE is a 64-bit float, so it cannot represent 21 significant digits exactly anyway.
Do you have a hint about which data types to use?
Related
How do you collect a Spark table into R using sparklyr while preserving long integers without having to convert them to strings beforehand?
My understanding is that R has an integer64 type that can be used to handle large integer values. Spark handles such values using its LongType but when I collect a Spark table with a Long I get a double on the R side. The issue with doubles is that they can lose precision.
I have attached an image to show the discrepancy that happens when collecting the data frame. If I convert the value into a string, it is collected perfectly. But if I don't, it turns into a double and then loses precision.
I was wondering if there were some Spark configs or some options I have to set somewhere to get sparklyr to collect it as an integer64.
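In the meantime, the string workaround can at least be wrapped up so the values land as integer64 on the R side (a sketch; sc, your_table, and id are placeholder names, and the values must fit in 64 bits):
library(sparklyr)
library(dplyr)
library(bit64)
sc <- spark_connect(master = "local")   # or your existing connection
sdf <- tbl(sc, "your_table")            # hypothetical table with a LongType column `id`
local_df <- sdf %>%
  mutate(id_chr = as.character(id)) %>% # cast to string inside Spark
  collect() %>%                         # strings survive collection intact
  mutate(id = as.integer64(id_chr), id_chr = NULL)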
I've been using the REDCapR package to read in data from my survey form. I was reading in the data with no issue using redcap_read until I realized I needed to add a field restriction to one question on my survey. Initially it was a short answer field asking users how many of something they had, and people were doing expectedly annoying things like spelling out numbers or entering "a few" instead of a number. But all of that data read in fine. I changed the field to be a short answer field (same type as before) that requires the response to be an integer and now the data won't read into R using redcap_read.
When I run:
redcap_read(redcap_uri=uri, token=api_token)$data
I get the error message that:
Column [name of my column] can't be converted from numeric to character
I also noticed when I looked at the data that it read in the 1st and 6th records of that column (both zeros) just fine (out of 800+ records), but everything else is NA. Is there an inherent problem with trying to read in data from a text field restricted to an integer, or is there another way to do this?
Edit: it also reads the dates fine, which are text fields with a date field restriction. This seems to be very specific to reading in the validated numbers from the text field.
I also tried redcapAPI::exportRecords and it will continue to read in the rest of the dataset, but reads in NA for all values in the column with the integer restriction.
Upgrade REDCapR to the version on GitHub, which stacks the batches on top of each other before determining the data type (see #257).
# install.packages("remotes") # Run this line if the 'remotes' package isn't installed already.
remotes::install_github(repo="OuhscBbmc/REDCapR")
In your case, I believe that the batches (of 200 records, by default) contain different data types (character & numeric, according to the error message), which won't stack on top of each other silently.
The REDCapR::redcap_read() function should work then. (If not, please create a new issue).
Two alternatives, sketched below, are
calling redcap_read_oneshot with a large value of guess_max, or
calling redcap_read_oneshot with guess_type = FALSE.
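For example (a sketch; uri and api_token as in your question, and the guess_max value is arbitrary):
# Alternative 1: scan more rows before guessing each column's data type.
ds1 <- REDCapR::redcap_read_oneshot(redcap_uri=uri, token=api_token, guess_max=1000)$data
# Alternative 2: skip type guessing entirely; every column comes back as character.
ds2 <- REDCapR::redcap_read_oneshot(redcap_uri=uri, token=api_token, guess_type=FALSE)$data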
I have a column containing user-entered descriptions. These descriptions can be anything, but I do need them sorted into a logical order.
The text can be anything like
16 to 26 months
40 to 60 months
Literacy
Mathematics
When I order these in a SQL statement, the text items come back fine. However, any beginning with numbers come back in an illogical order,
i.e.
16 to 26 months
will be before
8 to 20 months
I understand why, as the sort compares the first character and so on, but I don't know how to alter the SQL statement (using SQLite) to improve the ordering without messing up the entries beginning with text.
When I cast to numeric, the numbers sort fine, but the items beginning with text go wrong.
Thanks
What you need is to sort the values in "natural order". To achieve this you will need to implement your own collating sequence; SQLite doesn't provide one for this case.
There are some questions (and answers) regarding this topic here on SO, but they are for other RDBMS. The best I could find in a quick search was this:
http://wiki.ozanh.com/doku.php?id=python:database:sqlite:how_to_natural_sort
You should think about improving your table schema, e.g. splitting the period into separate integer columns (monthsMin, monthsMax) instead of using text, which would make sorting much easier. You can always build a string from these values if necessary.
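If a full collating sequence is more than you want to take on, a rough approximation for data like yours is to sort on the leading number first (a sketch; the table and column names are made up):
library(DBI)
con <- dbConnect(RSQLite::SQLite(), "mydb.sqlite")  # hypothetical database
# CAST(text AS INTEGER) yields the leading number of the string and 0 for pure text,
# so the CASE puts number-led rows first in numeric order, then text rows alphabetically.
dbGetQuery(con, "
  SELECT description
  FROM stages
  ORDER BY CASE WHEN description GLOB '[0-9]*' THEN 0 ELSE 1 END,
           CAST(description AS INTEGER),
           description
")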
I have data in Excel, and after reading it into R it displays as follows:
lob2     lob3
1.86E+12 7.58E+12
I want it as:
lob2             lob3
1857529190776.75 7587529190776.75
This difference causes me to get different results when doing my analysis later on.
How is the data stored in Excel (does it think it is a number, a string, a date, etc.)?
How are you getting the data from Excel to R? If you save the data as a .csv file and then read it into R, look at the intermediate file: Excel is known to abbreviate numbers when saving, and R would then see character strings instead of numbers. You need to find a way to tell Excel to export the data in the correct format with the correct precision.
If you are using a package (there is more than one) then look into the details of that package for how to grab the numbers correctly (you may need to make changes in Excel so that it knows they are numbers).
Lastly, what does the str function on your R object say? It could be that R is storing the proper numbers and only displaying the short version as mentioned in the comments. Or, it could be that R received strings that did not convert nicely to numbers and is storing them as characters or factors. The str function will let you see how your data is stored in R, and therefore how to convert or display it correctly.
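For example, this shows how a full-precision double can masquerade as an abbreviated one (a sketch using the value from the question):
x <- 1857529190776.75
str(x)                                      # num 1.86e+12 -- display only, the value is intact
print(x)                                    # 1.857529e+12 with the default 7 significant digits
options(digits = 16)
print(x)                                    # 1857529190776.75 -- the precision was there all along
format(x, scientific = FALSE, digits = 16)  # the same, as a plain string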
My data tends to be medium to large but never qualifies as "BIG" data. The data is almost always complexly relational. For the purposes I'm talking about here, 10-50 tables with a total size of 1-10 GB. Nothing more. When I deal with data bigger than this, I'll stick it into Postgres or SQL Server.
Overall, I like SQLite, but the data I work with has lots and lots of date/datetime fields, and dealing with date fields in SQLite makes my head hurt. When I move data back and forth between R and SQLite, my dates often get mangled.
I am looking for either a file-based alternative to SQLite that is easy to work with from R,
OR
Better techniques/packages for moving data in/out of SQLite and R without mangling the dates. My goal is to stop mangling my dates. For example, when I use dbWriteTable from the RSQLite package my dates are usually messed up in a way that makes them impossible to work with.
My primary workstation is running Ubuntu but I work in an office dominated by Windows. If suggesting an alternative to SQLite, +++ for an alternative that works on both platforms (or more).
Use epoch times and dates (days from origin, seconds from origin). Converting epochs into R POSIXct or Date is fast (parsing date strings is very slow).
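For instance (a sketch; the numbers are arbitrary):
# Seconds since the Unix epoch -> POSIXct
as.POSIXct(1500000000, origin = "1970-01-01", tz = "UTC")  # "2017-07-14 02:40:00 UTC"
# Days since the Unix epoch -> Date
as.Date(17000, origin = "1970-01-01")                      # "2016-07-18"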
Edit: Another alternative, after re-reading and considering the size of your data:
You could simply save the tables directly in R format, perhaps with a small piece of extra metadata describing the key relationships between tables. You would have to create your own conventions and all, but it's definitely smoother (no impedance mismatches).
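A minimal sketch of that approach (object and file names hypothetical):
# Each table goes to its own .rds file; readRDS restores it exactly,
# dates and classes included, with no round trip through SQL types.
saveRDS(orders, "orders.rds")
orders <- readRDS("orders.rds")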
Also, I'm personally very partial to the package data.table. It's fast and has a syntax which is pure R but has a nice mapping onto SQL concepts. E.g. in dt[i, j, by=list(...)], i corresponds to "where", j corresponds to "select", and by to "group by", and there are facilities for joins as well, although I wrote infix wrappers around those so it was easier to remember.
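A minimal illustration of that mapping (made-up data):
library(data.table)
dt <- data.table(grp = c("a", "a", "b", "b"), val = c(1, 2, 3, 4))
# i = where, j = select, by = group by
dt[val > 1, list(total = sum(val)), by = list(grp)]
#    grp total
# 1:   a     2
# 2:   b     7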
I typically do my data processing work exclusively in R (after an initial pull from SQLite), and I find data.table faster and more practical than massive sqldf queries.
http://datatable.r-forge.r-project.org/
SQLite wants to read dates in the standard format "YYYY-MM-DD HH:MM:SS" (you can omit the time part if you don't need it); I don't know of a way to read arbitrary date strings. This results in a normalized date being stored.
On output, you format the date using SQLite functions into whatever your other software needs; check the options of strftime().
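For example, to pull the stored dates back into R as Unix epoch seconds (a sketch; the table and column names match the example below):
library(DBI)
con <- dbConnect(RSQLite::SQLite(), "mydb.sqlite")  # hypothetical database
# strftime('%s', ...) renders the stored date as seconds since 1970-01-01
secs <- dbGetQuery(con, "SELECT strftime('%s', mydate) AS s FROM mydata")$s
as.POSIXct(as.numeric(secs), origin = "1970-01-01", tz = "UTC")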
For instance, Octave likes the day number since year 0, so if I have a table mydata with a column "mydate", I'd do
select julianday(mydate)-1721059.666667 from mydata
The magic number is julianday("0000-01-01T00:00:00-04:00") and compensates for the fact that julianday counts from 4714 BC (or thereabouts), whereas Octave counts from year 0.