Export R data frame to MS Access - r

I am trying to export a data frame from R to MS Access but it seems to me that there is no package available to do this task. Is there a way to export a data frame directly to Access? Any help will be greatly appreciated.

The following works for medium sized datasets, but may fail if MyRdataFrame is too large for the 2GB limit of Access or conversion type errors.
library(RODBC)
db <- "C:Documents/PreviouslySavedBlank.accdb"
Mycon <- odbcConnectAccess2007(db)
sqlSave(Mycon, MyRdataFrame)

There is the ImportExport package.
The database has to already exist (at least in my case). So you have to create it first.
It has to be a access database 2000 version with extension .mdb
Here is an example:
ImportExport::access_export("existing_databse.mdb",as.data.frame(your_R_data),
tablename="bob")
with "bob" the name of the table you want to create in the database. Choose your own name of course and it has to be a non already existing table
It will also add a first column called rownames which is just an index column
Note that creating a .accdb file and then changing the extension to .mdb wont work ^^ you really have to open it and save it as .mdb. I added as.data.frame() but if your data is already one then no need.
There might be a way for .accdb files using directly sqlSave (which is used internally by ImportExport) and specifying the driver from the RODBC package. This is in the link in the comment from #BenJacobson. But the solution above worked for me and it was only one line.

Related

In R and Sparklyr, writing a table to .CSV (spark_write_csv) yields many files, not one single file. Why? And can I change that?

Background
I'm doing some data manipulation (joins, etc.) on a very large dataset in R, so I decided to use a local installation of Apache Spark and sparklyr to be able to use my dplyr code to manipulate it all. (I'm running Windows 10 Pro; R is 64-bit.) I've done the work needed, and now want to output the sparklyr table to a .csv file.
The Problem
Here's the code I'm using to output a .csv file to a folder on my hard drive:
spark_write_csv(d1, "C:/d1.csv")
When I navigate to the directory in question, though, I don't see a single csv file d1.csv. Instead I see a newly created folder called d1, and when I click inside it I see ~10 .csv files all beginning with "part". Here's a screenshot:
The folder also contains the same number of .csv.crc files, which I see from Googling are "used to store CRC code for a split file archive".
What's going on here? Is there a way to put these files back together, or to get spark_write_csv to output a single file like write.csv?
Edit
A user below suggested that this post may answer the question, and it nearly does, but it seems like the asker is looking for Scala code that does what I want, while I'm looking for R code that does what I want.
I had the exact same issue.
In simple terms, the partitions are done for computational efficiency. If you have partitions, multiple workers/executors can write the table on each partition. In contrast, if you only have one partition, the csv file can only be written by a single worker/executor, making the task much slower. The same principle applies not only for writing tables but also for parallel computations.
For more details on partitioning, you can check this link.
Suppose I want to save table as a single file with the path path/to/table.csv. I would do this as follows
table %>% sdf_repartition(partitions=1)
spark_write_csv(table, path/to/table.csv,...)
You can check full details of sdf_repartition in the official documentation.
Data will be divided into multiple partitions. When you save the dataframe to CSV, you will get file from each partition. Before calling spark_write_csv method you need to bring all the data to single partition to get single file.
You can use a method called as coalese to achieve this.
coalesce(df, 1)

Accessing large spreadsheets written from R

I am using the following R script to write the data. table into the excel file in my set directory. However, the size of the file is in GB's as the total rows are 50 million+. Hence upon opening the file, I just see a blank grey screen and nothing else.
How can I see the contents in the file?
The first line is just for illustration purpose.
Final1 <- rep(iris, times = 1000000)
fwrite(Final1,"data2.csv")
You mention that this is part of the report. I would be willing to bet good money that whoever will be reading this report will not check all or most of the values by hand. In which case, you don't need a format that is easily browsable, e.g. xlsx or even csv. If this indeed is the case, you might want to try a (relational) database. If you do not have anything centralized, you might want to give SQLite a try. You save everything into one file which acts as a database. There are packages that handle this interaction in R. You can try with sqldf or RSQLite.

Bulkload option exporting data from R to Teradata using RODBC

I have done many researches on how to upload a huge data in .txt through R to Teradata DB. I tried to use RODBC's sqlSave() but it did not work. I also followed some other similar questions posted such as:
Write from R to Teradata in 3.0 OR Export data frame to SQL server using RODBC package OR How to quickly export data from R to SQL Server.
However, since Teradata somehow is structured differently than MS SQL server, most of those options suggested are not applicable to my situation.
I know that there is a TeradataR package available but it has not been updated since like 2-3 years ago.
So here are my 2 main problems I am facing:
1. How to bulk load (all records at once) data in .txt format to Teradata using R if there is any way. (So far I only tried using SAS to do so, but I need to explore this in R)
2. The data is big like 500+ MB so I cannot load it through R, I am sure there is a way to go around this but directly pull data from server.
Here is what I tried according to one of posts but this was for MS SQL server:
toSQL = data.frame(...) #this doesn't work for me cause its too big.
write.table(toSQL,"C:\\export\\filename.txt",quote=FALSE,sep=",",row.names=FALSE,col.names=FALSE,append=FALSE);
sqlQuery(channel,"BULK
INSERT Yada.dbo.yada
FROM '\\\\<server-that-SQL-server-can-see>\\export\\filename.txt'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\\n'
)");
*Note: there is an option in Teradata to insert/import data but that is the same as writing millions of rows of Insert statements.
Sorry that I do not have sample codes at this point since the package that I found wasn't the right one that I should use.
Anyone has similar issues/problems like this?
Thank you so much for your help in advance!
I am not sure if you figured out how to do this but I second Andrew's solution. If you have Teradata installed on your computer you can easily run the FastLoad Utility from the shell.
So I would:
export by data frame to a txt file (Comma separated)
create my fastload script and call the exported txt file from within the fastload scrip (you can learn more about it here)
run the shell command referencing my fastload script.
setwd("pathforyourfile")
write.table(mtcars, "mtcars.txt", sep = ",", row.names = FALSE,quote= FALSE, na = "NA",col.names = FALSE)
shell("fastload < mtcars_fastload.txt")
I hope this sorts your issue. Let me know if you need help especially on the fastloading script. More than happy to help.

Read a PostgreSQL local file into R in chunks and export to .csv

I have a local file in PostgreSQL format that I would like to read into R into chunks and export it as .csv.
I know this might be a simple question but I'm not at all familiar with PostgreSQL or SQL. I've tried different things using R libraries like RPostgreSQL, RSQLite and sqldf but I couldn't get my head around this.
If your final goal is to create a csv file, you can do it directly using PostgreSQL.
You can run something similar to this:
COPY my_table TO 'C:\my_table.csv' DELIMITER ',' CSV HEADER;
Sorry if I misunderstood your requirement.
The requirement is to programmatically create a very large .csv file from scratch and populate it from data in a database? I would use this approach.
Step 1 - isolate the database data into a single table with an auto incrementing primary key field. Whether you always use the same table or create and drop one each time depends on the possibility of concurrent use of the program.
Step 2 - create the .csv file with your programming code. It can either be empty, or have column headers, depending on whether or not you need column headers.
Step 3 - get the minimum and maximum primary key values from your table.
Step 4 - set up a loop in your programming code using the values from Step 3. Inside the loop:
query the table to get x rows
append those rows to your file
increment the variables that control your loop
Step 5 - Do whatever you have to do with the file. Don't try to read it with your programming code.

Command to use with easy way the insert of R dataframe

I have a dataframe loaded successfully in R.
I would like to give the data of df to someone else to use them with quick and easy way without need to load again the file into a df.
Which is the command to give the whole data of df (not the str())
You can save the file into a .RData using save or save.image, depending on your needs. First one will save specific objects while the latter will dump the whole workspace to a file. This method has the advantage of working on probably any R object.
Another option is as #user1945827 mentioned, using dput which will produce a string that is parseable into another R session. This will not work for complex (like S4) objects.

Resources