Referring to database tables in R

I have a database named Team which has 40 tables. How can I connect to that database and refer to a particular table without writing a SQL query, just by using R data structures?

I am not sure what you mean by "How can I connect to that database and refer to a particular table without writing a SQL query".
I am not aware of a way to "see" DB tables as R data frames or arrays without first importing the tuples through some sort of query (in SQL); this seems to be the most practical way to use R with DB data, short of going through the hassle of exporting the tables as .csv files first and re-reading them in R.
There are a couple of ways to import data from a DB into R, so that the result of a query becomes an R data structure (including proper type conversion, ideally).
Here is a short guide on how to do that with SQL-R, and a similar brief introduction to the DBI family.
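For example, a minimal sketch using DBI with RSQLite (the database file and table names here are just placeholders, and another DBI backend such as RMySQL, RPostgres or odbc would work the same way):

library(DBI)      # generic database interface
library(RSQLite)  # SQLite driver used for this illustration

# Open a connection to the database (file name is a placeholder)
con <- dbConnect(RSQLite::SQLite(), dbname = "Team.sqlite")

# List the available tables, then pull one of them into a data frame
dbListTables(con)
players <- dbReadTable(con, "players")   # hypothetical table name
str(players)

dbDisconnect(con)

dbReadTable() hides the SELECT behind a single call, which is about as close as you get to referring to a table without writing the SQL yourself.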

Related

Can we create a temporary table in Oracle through R, get the result, and delete the table afterwards?

I have a .sql file that basically creates a temporary table using multiple joins to form the data needed. Then I manually copy the result, paste it into Excel, and delete the table in the Oracle DB afterwards.
As the query is big, I don't think it would be a good idea to rewrite the Oracle query in R.
Is there any way I can run that .sql file directly through RStudio and store the result in a data frame?
I don't know R.
However, consider moving the code you have into a stored procedure. You'd then, in a single line (hopefully), call that procedure from R; it would do its job (populate the table) and you'd just use its contents in R.
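If you would rather keep the .sql file, a rough sketch along the following lines may work, assuming the file boils down to a single SELECT and that you connect through DBI (the DSN and file names are placeholders):

library(DBI)
# an Oracle driver is needed, e.g. ROracle or an ODBC driver via the odbc package
con <- dbConnect(odbc::odbc(), dsn = "my_oracle_dsn")   # placeholder DSN

# Read the whole .sql file into one string and run it
query  <- paste(readLines("my_query.sql"), collapse = "\n")
result <- dbGetQuery(con, query)   # returns a data frame

dbDisconnect(con)

Note that dbGetQuery() runs one statement at a time, so a script that creates a temporary table, selects from it and drops it would have to be split into separate dbExecute()/dbGetQuery() calls, which is why wrapping the logic in a stored procedure, as suggested above, keeps the R side to a single call.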

Writing date and time into a SQL database in R

I am trying to create a SQL database from a data set with a column that contains both date and time. The problem I am running into is that when the data is written into the SQL database and read back into R, the date and time column ends up with a numeric structure rather than POSIXct, or does not show the dates and times correctly.
I have been using the RSQLite and DBI packages to work between the two. I just started working with SQL; is there an appropriate way to get date and time columns into and out of a SQL database?
Thank you for your time.
SQLite does not support date and time types. Here are some options:
Convert the date/time fields back to R classes yourself (a minimal sketch of this follows these options). You could write a separate function for each table that reads it into R and does the conversion transparently, or, if you control the database itself, you could adopt a naming convention for the columns that lets a single function perform the conversion according to the naming rules. Another way to implement a naming convention, other than writing your own function, is to use the sqldf package. If you use sqldf("select ...", dbname = "mydb", connection = con, method = "name__class") it will convert every column whose name has two underscores to the class named after the two underscores. Note that name__class itself has two underscores.
The dbmisc package can also perform conversions. To use it you must prepare a schema, i.e. a layout specification, for each table, as described in its documentation.
Use a different database that does support date/time types. I usually use the H2 database in such cases. The RH2 package includes the entire H2 database software right in the R driver package in a similar manner to RSQLite.
As per a comment below, the latest version of RSQLite has support for time and date fields; note, however, that this is on the R side, like the other solutions above (except H2), and does not change the fact that SQLite itself has no such support. So, for example, if you use SQL to modify such a field, say adding 1 to get the next date, it will no longer be of the same type.
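As an illustration of the first option (converting the fields yourself), here is a minimal sketch with RSQLite; the table and column names are made up:

library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# A data frame with a POSIXct column
events <- data.frame(id = 1:2,
                     ts = as.POSIXct(c("2020-01-01 10:00:00",
                                       "2020-01-02 11:30:00"), tz = "UTC"))
dbWriteTable(con, "events", events)

# Depending on the RSQLite version the column may come back as numeric
# (seconds since the epoch) or text, so convert it back explicitly
out <- dbReadTable(con, "events")
out$ts <- as.POSIXct(out$ts, origin = "1970-01-01", tz = "UTC")
str(out)

dbDisconnect(con)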

Filtering data while reading from S3 to Spark

We are moving to AWS EMR/S3 and using R for analysis (the sparklyr library). We have 500 GB of sales data in S3 containing records for multiple products. We want to analyze data for a couple of products and want to read only a subset of the files into EMR.
So far my understanding is that spark_read_csv will pull in all the data. Is there a way in R/Python/Hive to read data only for the products we are interested in?
In short, the choice of format (plain CSV) puts you at the wrong end of the efficiency spectrum here.
Using data
partitioned by the column of interest (the partitionBy option of the DataFrameWriter, or the correct directory structure), or
clustered by the column of interest (the bucketBy option of the DataFrameWriter together with a persistent metastore),
can help to narrow down the search to particular partitions in some cases (a rough sketch of the partitioning approach follows this answer), but if filter(product == p1) is highly selective, then you're likely looking at the wrong tool.
Depending on the requirements, a proper database or a data warehouse on Hadoop might be a better choice.
You should also consider choosing a better storage format (like Parquet).
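For example, a rough sparklyr sketch of the partitioning idea; the paths and column names are made up for illustration:

library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "yarn")   # or "local" for testing

# One-off conversion: read the CSV once and rewrite it as Parquet
# partitioned by product, so later reads can skip irrelevant partitions
sales_csv <- spark_read_csv(sc, "sales_csv", "s3://my-bucket/sales/")   # placeholder path
spark_write_parquet(sales_csv, "s3://my-bucket/sales_parquet/",
                    partition_by = "product")

# Subsequent analyses only touch the partitions they filter on
sales <- spark_read_parquet(sc, "sales", "s3://my-bucket/sales_parquet/")
p1    <- sales %>% filter(product == "p1")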

How to retrieve a 100+ GB BigQuery table into R

I currently have a table in BigQuery with a size of 100+ GB that I would like to retrieve into R. I am using the list_tabledata() function from the bigrquery package in R, but it takes a huge amount of time.
Does anyone have recommendations on handling this amount of data in R and on how to boost performance? Any packages or tools?
tabledata.list is not a great way to consume a large amount of table data from BigQuery - as you note, it's not very performant. I'm not sure if bigrquery has support for table exports, but the best way to retrieve data from a large BigQuery table is using an export job. This will dump the data to a file on Google Cloud Storage that you can then download to your desktop. You can find more info on exporting tables in our documentation.
Another option would be: instead of bringing that large volume of data to the code, try to bring your code to the data. This can be challenging in terms of implementing the logic in BQL. JS UDFs might help. It depends.
If this is not doable, I would recommend either using sampled data or revisiting your model.
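A hedged sketch of the export route with a recent version of bigrquery (using the newer bq_* API rather than list_tabledata()); the project, dataset and bucket names are placeholders:

library(bigrquery)

tbl <- bq_table("my-project", "my_dataset", "my_big_table")   # placeholder identifiers

# Run an extract job that dumps the table to Google Cloud Storage,
# sharded into multiple compressed files because of its size
bq_table_save(tbl,
              destination_uris   = "gs://my-bucket/export/my_big_table_*.csv.gz",
              destination_format = "CSV",
              compression        = "GZIP")

# The exported files can then be downloaded (e.g. with gsutil) and read locally,
# or processed on a machine closer to the data.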

How to import a data frame into RSQLite while specifying column constraints?

I am trying to put a large data frame into a new table of a database. It could simply be done via:
dbWriteTable(conn = db, name = "sometablename", value = my.data)
However, I want to specify the primary keys, foreign keys, and the column types, like NUMERIC, TEXT, and so on.
Is there anything I can do? Should I create a table with my columns first and then add the data frame into it?
RSQLite assumes you already have your data.frame all set before writing it to disk; there is not much you can specify in the writing call. So I see two ways: either fix things before firing the query that writes the table, or after. I usually write the table from R to disk, then polish it using dbGetQuery to alter the table's attributes. The only problem with this workflow is that SQLite has very limited features for altering tables.
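One way to run that workflow in the other direction, i.e. define the schema first and then append the data, is sketched below; the table layout is made up:

library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), "mydb.sqlite")

# Create the table with explicit types and constraints first
dbExecute(con, "
  CREATE TABLE sometablename (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    score   NUMERIC,
    team_id INTEGER REFERENCES team(id)
  )")

# Then append the data frame into the existing table instead of
# letting dbWriteTable() guess the column types
dbWriteTable(con, "sometablename", my.data, append = TRUE)

dbDisconnect(con)

Keep in mind that SQLite only enforces the REFERENCES clause when foreign keys are switched on for the connection, e.g. via dbExecute(con, "PRAGMA foreign_keys = ON").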
