Bulkload option exporting data from R to Teradata using RODBC - r

I have done a lot of research on how to upload huge .txt data from R to a Teradata DB. I tried RODBC's sqlSave() but it did not work. I also followed some other similar questions posted here, such as:
Write from R to Teradata in 3.0 OR Export data frame to SQL server using RODBC package OR How to quickly export data from R to SQL Server.
However, since Teradata is structured differently from MS SQL Server, most of the suggested options are not applicable to my situation.
I know that there is a TeradataR package available, but it has not been updated in 2-3 years.
So here are the 2 main problems I am facing:
1. How to bulk load (all records at once) data in .txt format into Teradata using R, if there is any way. (So far I have only done this with SAS, but I need to explore it in R.)
2. The data is big (500+ MB), so I cannot load it into R first; I am sure there is a way around this, e.g., by pulling the data directly from the server.
Here is what I tried according to one of those posts, but it was for MS SQL Server:
toSQL = data.frame(...) # this doesn't work for me because the data is too big
write.table(toSQL, "C:\\export\\filename.txt", quote = FALSE, sep = ",",
            row.names = FALSE, col.names = FALSE, append = FALSE)
sqlQuery(channel, "BULK INSERT Yada.dbo.yada
                   FROM '\\\\<server-that-SQL-server-can-see>\\export\\filename.txt'
                   WITH (
                     FIELDTERMINATOR = ',',
                     ROWTERMINATOR = '\\n'
                   )")
*Note: there is an option in Teradata to insert/import data, but that is the same as writing millions of INSERT statements.
Sorry that I do not have sample code at this point, since the package I found wasn't the right one to use.
Has anyone run into similar issues/problems?
Thank you so much for your help in advance!

I am not sure if you figured out how to do this, but I second Andrew's solution. If you have Teradata installed on your computer, you can easily run the FastLoad utility from the shell.
So I would:
export my data frame to a .txt file (comma separated)
create my FastLoad script and reference the exported .txt file from within the FastLoad script (you can learn more about it here; a rough sketch is included after this answer)
run the shell command referencing my FastLoad script.
setwd("pathforyourfile")
write.table(mtcars, "mtcars.txt", sep = ",", row.names = FALSE, quote = FALSE,
            na = "NA", col.names = FALSE)
shell("fastload < mtcars_fastload.txt")
I hope this sorts out your issue. Let me know if you need help, especially with the FastLoad script. More than happy to help.
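For anyone looking for a starting point, here is a rough, untested sketch of how the two pieces could fit together from R. The TDPID, credentials, database, table, and column definitions are all placeholders, and FastLoad expects the target table to be empty, so check the FastLoad documentation for the exact syntax before running anything like this.
# Rough sketch only: every name and credential below is a placeholder.
fastload_script <- '
LOGON tdpid/username,password;
DATABASE mydb;
SET RECORD VARTEXT ",";
DEFINE mpg (VARCHAR(20)), cyl (VARCHAR(20)), disp (VARCHAR(20))
  FILE = mtcars.txt;
BEGIN LOADING mydb.mtcars_stage
  ERRORFILES mydb.mtcars_err1, mydb.mtcars_err2;
INSERT INTO mydb.mtcars_stage VALUES (:mpg, :cyl, :disp);
END LOADING;
LOGOFF;
'
writeLines(fastload_script, "mtcars_fastload.txt")
shell("fastload < mtcars_fastload.txt")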

Related

How can I fix the 'line x did not have y elements' error when trying to use read.csv.sql?

I am a relative beginner to R trying to load and explore a large (7GB) CSV file.
It's from the Open Food Facts database and the file is downloadable here: https://world.openfoodfacts.org/data (the raw csv link).
It's too large to read straight into R and my searching has made me think the sqldf package could be useful. But when I try and read the file in with this code ...
library(sqldf)
library(here)
read.csv.sql(here("02. Data", "en.openfoodfacts.org.products.csv"), sep = "\t")
I get this error:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 10 did not have 196 elements
Searching around made me think it's because there are missing values in the data. With read.csv, it looks like you can set fill = TRUE and get around this. But I can't work out how to do this with the read.csv.sql function. I also can't actually open the csv in Excel to inspect it because it's too large.
Does anyone know how to solve this or if there is a better method for reading in this large file? Please keep in mind I don't really know how to use SQL or other database tools, mostly just R (but can try and learn the basics if helpful).
Based on the error message, it seems unlikely that you can read the CSV file in toto into memory, even once. I suggest that, for analyzing the data within it, you may need to change your data access to something else, such as:
A DBMS, whether embedded/file-based (duckdb or RSQLite, lower cost of entry) or a full client-server DBMS (e.g., PostgreSQL, MariaDB, SQL Server). With this method, you would connect (using DBI) to the database (embedded or otherwise), query for the subset of data you want/need, and work on that data. It is feasible to do in-database aggregation as well, which might be a necessary step in your analysis.
An Arrow parquet file. These are directly supported by dplyr functions in a lazy fashion, meaning that when you call open_dataset("path/to/my.parquet"), it immediately returns an object but does not load data; you then build your dplyr mutate/filter/select/summarize pipe (with some limitations), and only when you finally call ... %>% collect() does it load the resulting data into memory. Similar to the SQL option above in that you work on subsets at a time, but if you're already familiar with dplyr, it is much closer to that than learning SQL from scratch.
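As a rough sketch of that lazy Arrow workflow (the file path and column names below are made up for illustration):
library(arrow)
library(dplyr)

# Opens the dataset lazily; no data is read into memory yet.
ds <- open_dataset("path/to/openfoodfacts.parquet")

# Build the pipeline with dplyr verbs; nothing is materialized until collect().
high_sugar <- ds %>%
  filter(sugars_100g > 50) %>%          # hypothetical column
  select(product_name, sugars_100g) %>%
  collect()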
There are ways to get a large CSV file into each of these.
Arrow/Parquet: How to convert a csv file to parquet (python, arrow/drill); a quick search in your favorite search engine should provide other possibilities. Regardless of the language you want to do your analysis in ("R"), don't constrain yourself to solutions using that language.
SQL: DuckDB (https://duckdb.org/docs/data/csv.html), SQLite (https://www.sqlitetutorial.net/sqlite-import-csv/), and other DBMSes tend to have a "bulk" command for importing raw CSV.
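As a sketch of the DuckDB route in R (the file name, table name, and columns are illustrative; the CSV reader also has options such as ignore_errors for malformed rows, so check the DuckDB CSV documentation):
library(DBI)
library(duckdb)

# One on-disk database file; DuckDB ingests the CSV without going through R's memory.
con <- dbConnect(duckdb::duckdb(), dbdir = "openfoodfacts.duckdb")

# read_csv_auto infers column types; delim is a tab here because the file is tab-separated.
dbExecute(con, "
  CREATE TABLE products AS
  SELECT * FROM read_csv_auto('en.openfoodfacts.org.products.csv', delim = '\t')
")

# Pull only the subset you need into R.
peek <- dbGetQuery(con, "SELECT * FROM products LIMIT 10")

dbDisconnect(con, shutdown = TRUE)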

How to open a PostgreSQL database dump in R

I have many files (with no extension) that I would like to open with R and extract data frames from.
The header says that it is a
--
-- PostgreSQL database dump
It has many tables inside it, and the only noticeable pattern I detected is that at the end of each table there is a "." (see screenshot).
Is there a smart way to import this file and extract/break it into meaningful data frames?
Thank you in advance!

Accessing large spreadsheets written from R

I am using the following R script to write the data.table to a file in my working directory. However, the size of the file is in GBs, as there are 50 million+ rows in total. Hence, upon opening the file, I just see a blank grey screen and nothing else.
How can I see the contents of the file?
The first line is just for illustration purposes.
Final1 <- rep(iris, times = 1000000)
fwrite(Final1,"data2.csv")
You mention that this is part of a report. I would be willing to bet good money that whoever reads this report will not check all or even most of the values by hand. In that case, you don't need a format that is easily browsable, e.g., xlsx or even csv. If this is indeed the case, you might want to try a (relational) database. If you do not have anything centralized, you might want to give SQLite a try: you save everything into one file, which acts as a database. There are packages that handle this interaction in R; you can try sqldf or RSQLite.
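A minimal sketch of that idea with RSQLite (the file name, table name, and query are placeholders; your_data_frame stands for the data you currently fwrite to CSV):
library(DBI)
library(RSQLite)

# One file on disk acts as the whole database.
con <- dbConnect(RSQLite::SQLite(), "report_data.sqlite")

# Write the big data frame once ...
dbWriteTable(con, "results", your_data_frame, overwrite = TRUE)

# ... and readers of the report pull only the slice they need.
peek <- dbGetQuery(con, "SELECT * FROM results LIMIT 100")

dbDisconnect(con)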

Export R data frame to MS Access

I am trying to export a data frame from R to MS Access but it seems to me that there is no package available to do this task. Is there a way to export a data frame directly to Access? Any help will be greatly appreciated.
The following works for medium-sized datasets, but it may fail if MyRdataFrame is too large for Access's 2 GB limit or if there are type-conversion errors.
library(RODBC)
db <- "C:/Documents/PreviouslySavedBlank.accdb"
Mycon <- odbcConnectAccess2007(db)
sqlSave(Mycon, MyRdataFrame)
odbcClose(Mycon)
There is the ImportExport package.
The database has to already exist (at least in my case). So you have to create it first.
It has to be an Access 2000-format database with the .mdb extension.
Here is an example:
ImportExport::access_export("existing_database.mdb", as.data.frame(your_R_data),
                            tablename = "bob")
with "bob" the name of the table you want to create in the database. Choose your own name of course and it has to be a non already existing table
It will also add a first column called rownames which is just an index column
Note that creating a .accdb file and then changing the extension to .mdb wont work ^^ you really have to open it and save it as .mdb. I added as.data.frame() but if your data is already one then no need.
There might be a way to handle .accdb files by using sqlSave directly (which is used internally by ImportExport) and specifying the driver via the RODBC package. This is in the link in the comment from @BenJacobson. But the solution above worked for me and it was only one line.
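For completeness, a rough, untested sketch of that sqlSave-with-driver idea; the driver name assumes the Microsoft Access ODBC driver (Access Database Engine) is installed on Windows, and the path and table name are placeholders:
library(RODBC)

# Assumed driver name; adjust to whatever Access ODBC driver is installed on your machine.
con <- odbcDriverConnect(
  "Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:/Documents/PreviouslySavedBlank.accdb"
)

sqlSave(con, MyRdataFrame, tablename = "MyTable", rownames = FALSE)
odbcClose(con)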

RODBC error: SqlSave unable to append to table

We hit an error with RODBC and the sqlSave command. We are a bit confused about what to do, since the same sqlSave command works when the data we are trying to save to a Sybase database is small (under ~10,000 rows). When trying to save bigger data (~200,000 rows), the saving process starts without any problems, but it crashes after a few thousand rows have been saved. Then we hit this error message: "unable to append to table.."
We use this kind of code:
library(RODBC)
channel <- odbcConnect("linfo-test", uid="DBA", pwd="xxxxxx", believeNRows=FALSE)
sqlSave(channel=channel, dat=matkat, tablename = "testitaulu", append = TRUE)
odbcClose(channel)
If someone has any idea why this happens only with bigger data and how we could fix this, we would be extremely grateful. We are lacking ideas ourselves.
sqlSave with append=TRUE pretty much never works. You will have to explicitly write an SQL INSERT INTO statement, which is unfortunate. Sorry for the bad news.
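A rough sketch of that fallback, building the INSERT statements yourself and sending them through sqlQuery (the column names are placeholders, and character values need quoting/escaping appropriate to your data):
library(RODBC)

channel <- odbcConnect("linfo-test", uid = "DBA", pwd = "xxxxxx", believeNRows = FALSE)

# Build one INSERT per row; col_a is assumed numeric, col_b character (placeholders).
values  <- sprintf("(%s, '%s')", matkat$col_a, gsub("'", "''", matkat$col_b))
inserts <- paste0("INSERT INTO testitaulu (col_a, col_b) VALUES ", values)

# Send the statements one by one; slow, but a failure points at a specific row.
for (stmt in inserts) sqlQuery(channel, stmt)

odbcClose(channel)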
sqlSave does work, but you have to be really careful with it.
You need to remember:
the column names of your R data frame and of the SQL table have to match EXACTLY; remember that R is case sensitive, and even a leading or trailing space can make a difference
make sure that the data types in R and in the SQL table match
make sure that you are not missing any column in the insert (even one that has a default in SQL)
the connection has to be set up properly; if the SQL user doesn't have permission to read and write the destination table (used in sqlSave), it can also fail
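As a hedged illustration of that checklist (the column names and types below are placeholders; they must match the existing table exactly):
library(RODBC)

channel <- odbcConnect("linfo-test", uid = "DBA", pwd = "xxxxxx", believeNRows = FALSE)

sqlSave(channel,
        dat       = matkat,
        tablename = "testitaulu",
        append    = TRUE,
        rownames  = FALSE,                # don't send an extra row-names column
        fast      = FALSE,                # plain INSERTs: slower, but errors are easier to trace
        varTypes  = c(col_a = "numeric",  # placeholder column/type mapping
                      col_b = "varchar(255)"))

odbcClose(channel)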
