We hit an error with RODBC and the sqlSave command. We are a bit confused about what to do, since the same sqlSave call works when the data we are trying to save to the Sybase database is small (under ~10,000 rows). When we try to save bigger data (~200,000 rows), the saving process starts without any problems, but it crashes after a few thousand rows have been saved and we get the error message "unable to append to table...".
We use this kind of code:
library(RODBC)
channel <- odbcConnect("linfo-test", uid="DBA", pwd="xxxxxx", believeNRows=FALSE)
sqlSave(channel=channel, dat=matkat, tablename = "testitaulu", append = TRUE)
odbcClose(channel)
If someone has any idea why this happens only with bigger data and how we could fix it, we would be extremely grateful. We are out of ideas ourselves.
sqlSave with append=TRUE pretty much never works. You will have to explicitly write an SQL INSERT INTO statement, which is unfortunate. Sorry for the bad news.
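For what it's worth, here is one way to do that from R: build the INSERT statements yourself and send them with sqlQuery(). This is only a sketch, assuming the matkat data frame and testitaulu table from the question and only character/numeric columns; it sends one INSERT per row, which is slow but sidesteps sqlSave(append = TRUE) entirely.
library(RODBC)
channel <- odbcConnect("linfo-test", uid = "DBA", pwd = "xxxxxx", believeNRows = FALSE)
# build one INSERT ... VALUES (...) statement per row of the data frame
cols <- paste(names(matkat), collapse = ", ")
for (i in seq_len(nrow(matkat))) {
  vals <- sapply(matkat[i, ], function(x) {
    if (is.na(x)) "NULL"                                    # NULL for missing values
    else if (is.character(x) || is.factor(x))
      paste0("'", gsub("'", "''", as.character(x)), "'")    # quote and escape text
    else as.character(x)                                    # numerics as-is
  })
  sqlQuery(channel, paste0("INSERT INTO testitaulu (", cols, ") VALUES (",
                           paste(vals, collapse = ", "), ")"))
}
odbcClose(channel)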
sqlSave does work, but you have to be really careful with it.
You need to remember:
- the column names of your R data frame and of the SQL table have to match EXACTLY, and R is case sensitive; even a leading or trailing space makes a difference (the sketch after this list shows one quick way to check)
- make sure the data types in R and in the SQL table match
- make sure you are not missing any column in the insert (even one that has a default in SQL)
- the connection has to be set up properly; if the SQL user doesn't have permission to read and write the destination table used in sqlSave, it can also fail
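A quick way to check the first point before calling sqlSave (just a sketch; channel, testitaulu and matkat stand in for your own connection, table and data frame):
db_cols <- sqlColumns(channel, "testitaulu")$COLUMN_NAME   # column names as the database sees them
df_cols <- names(matkat)                                   # column names of the R data frame
setdiff(df_cols, db_cols)   # data frame columns the table doesn't have (case/space mismatches show up here)
setdiff(db_cols, df_cols)   # table columns missing from the data frame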
I am a relative beginner to R trying to load and explore a large (7GB) CSV file.
It's from the Open Food Facts database and the file is downloadable here: https://world.openfoodfacts.org/data (the raw csv link).
It's too large to read straight into R and my searching has made me think the sqldf package could be useful. But when I try and read the file in with this code ...
library(sqldf)
library(here)
read.csv.sql(here("02. Data", "en.openfoodfacts.org.products.csv"), sep = "\t")
I get this error:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 10 did not have 196 elements
Searching around made me think it's because there are missing values in the data. With read.csv, it looks like you can set fill = TRUE and get around this. But I can't work out how to do this with the read.csv.sql function. I also can't actually open the csv in Excel to inspect it because it's too large.
Does anyone know how to solve this or if there is a better method for reading in this large file? Please keep in mind I don't really know how to use SQL or other database tools, mostly just R (but can try and learn the basics if helpful).
Based on the error message, it seems unlikely that you can read the CSV file in toto into memory, even once. For analyzing the data within it, I suggest you may need to change your data access to something else, such as:
A DBMS, whether embedded (duckdb or RSQLite, lower cost of entry) or a full client-server DBMS (e.g., PostgreSQL, MariaDB, SQL Server). With this method, you would connect (using DBI) to the database (embedded or otherwise), query for the subset of data you want/need, and work on that data. It is feasible to do in-database aggregation as well, which might be a necessary step in your analysis.
An Arrow parquet file. These are directly supported by dplyr functions in a lazy fashion: when you call open_dataset("path/to/my.parquet"), it immediately returns an object but does not load data; you then chain your dplyr mutate/filter/select/summarize pipe (with some limitations), and only when you finally call ... %>% collect() is the resulting data loaded into memory. Similar to the SQL option above in that you work on subsets at a time, but if you're already familiar with dplyr, it is much closer to that than learning SQL from scratch.
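A sketch of that lazy workflow (the parquet path and the column names are made up for illustration):
library(arrow)
library(dplyr)
ds <- open_dataset("path/to/openfoodfacts.parquet")   # returns immediately, loads no data
result <- ds %>%
  filter(countries_en == "France") %>%                # hypothetical column
  select(product_name, energy_100g) %>%               # hypothetical columns
  collect()                                           # only now is the subset read into memory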
There are ways to get a large CSV file into each of these.
Arrow/Parquet: "How to convert a csv file to parquet" (python, arrow/drill); a quick search in your favorite search engine should provide other possibilities. Regardless of the language you want to do your analysis in (R), don't constrain yourself to solutions using that language.
SQL: DuckDB (https://duckdb.org/docs/data/csv.html), SQLite (https://www.sqlitetutorial.net/sqlite-import-csv/), and other DBMSes tend to have a "bulk" command for importing raw CSV.
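For instance, the DuckDB route can be driven entirely from R. A sketch (the database file name is a placeholder, and read_csv_auto's type guesses may need overriding on a messy file):
library(DBI)
library(duckdb)
con <- dbConnect(duckdb::duckdb(), dbdir = "openfoodfacts.duckdb")
# let DuckDB bulk-load the tab-separated file without ever pulling it into R
dbExecute(con, "CREATE TABLE products AS
                SELECT * FROM read_csv_auto('en.openfoodfacts.org.products.csv', delim='\t')")
# then query only the subset you need
small <- dbGetQuery(con, "SELECT code, product_name FROM products LIMIT 100")   # hypothetical columns
dbDisconnect(con, shutdown = TRUE)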
I'm pretty new to mongolite, and I'm trying to query my database, which is server side. However, I'm running into an issue where all of my queries return no data.
To connect, I run the following code:
con <- mongo(db="terrain", url="mongodb://22.92.59.149:27017")
After that, I run the following code, and get this output:
con$count('{}')
0
con$find('{}')
data frame with 0 columns and 0 rows
When I export the database from the command line as a csv, it exports the exact same file that was imported via the command line, so as far as I can tell the problem is on my end. Additionally, I believe I am looking in the correct location, because when I run listDatabases I get a list of all of the databases:
admin <- mongo(db="admin", url="mongodb://22.92.59.149:27017")
admin$run('{"listDatabases":1}')
This segment of code outputs a list of all of the databases, their size on disk, and whether they are empty or not. This is the same result as when you query the DB names from the command line.
What am I missing here? I'm sure it's something very simple.
Thanks in advance!
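One thing worth checking (a hedged guess, not a confirmed fix): mongolite's mongo() binds a connection to a single collection, and the default collection name is "test", so counting against a collection that doesn't exist returns 0 even when the database itself has data. Listing the collections inside the terrain database shows which name to pass to the collection= argument:
con <- mongo(db = "terrain", url = "mongodb://22.92.59.149:27017")
con$run('{"listCollections": 1}')    # shows the collection names inside "terrain"
# then reconnect against the actual collection, e.g. a hypothetical "elevation"
con <- mongo(collection = "elevation", db = "terrain", url = "mongodb://22.92.59.149:27017")
con$count('{}')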
I am trying to export a data frame from R to MS Access but it seems to me that there is no package available to do this task. Is there a way to export a data frame directly to Access? Any help will be greatly appreciated.
The following works for medium-sized datasets, but it may fail if MyRdataFrame is too large for Access's 2 GB limit or if there are data type conversion errors.
library(RODBC)
db <- "C:/Documents/PreviouslySavedBlank.accdb"   # an existing (blank) Access database
Mycon <- odbcConnectAccess2007(db)
sqlSave(Mycon, MyRdataFrame)
odbcClose(Mycon)
There is the ImportExport package.
The database has to already exist (at least in my case), so you have to create it first.
It has to be an Access 2000 database with the .mdb extension.
Here is an example:
ImportExport::access_export("existing_database.mdb", as.data.frame(your_R_data),
                            tablename = "bob")
with "bob" the name of the table you want to create in the database. Choose your own name, of course; it has to be a table that doesn't already exist.
It will also add a first column called rownames, which is just an index column.
Note that creating a .accdb file and then changing the extension to .mdb won't work ^^ you really have to open it and save it as .mdb. I added as.data.frame(), but if your data is already a data frame then there is no need.
There might be a way to handle .accdb files by using sqlSave directly (it is used internally by ImportExport) and specifying the driver from the RODBC package. This is in the link in the comment from @BenJacobson. But the solution above worked for me, and it was only one line.
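If you want to try that route, here is a sketch with odbcDriverConnect (untested here; the driver name has to match what is installed on your machine, and the .accdb file must already exist):
library(RODBC)
con <- odbcDriverConnect(
  "Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:/Documents/PreviouslySavedBlank.accdb")
sqlSave(con, MyRdataFrame, tablename = "bob", rownames = FALSE)
odbcClose(con)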
I have done a lot of research on how to upload huge .txt data through R to a Teradata DB. I tried RODBC's sqlSave(), but it did not work. I also followed some other similar questions posted, such as:
Write from R to Teradata in 3.0 OR Export data frame to SQL server using RODBC package OR How to quickly export data from R to SQL Server.
However, since Teradata is structured differently from MS SQL Server, most of the suggested options are not applicable to my situation.
I know that there is a teradataR package available, but it has not been updated for about 2-3 years.
So here are the 2 main problems I am facing:
1. How to bulk load (all records at once) the .txt data into Teradata using R, if there is any way to do so. (So far I have only done this with SAS, but I need to explore it in R.)
2. The data is big (500+ MB), so I cannot load it entirely into R; I am sure there is a way to work around this and pull the data directly from the server.
Here is what I tried, following one of those posts, but it was for MS SQL Server:
toSQL = data.frame(...) #this doesn't work for me cause its too big.
write.table(toSQL,"C:\\export\\filename.txt",quote=FALSE,sep=",",row.names=FALSE,col.names=FALSE,append=FALSE);
sqlQuery(channel,"BULK
INSERT Yada.dbo.yada
FROM '\\\\<server-that-SQL-server-can-see>\\export\\filename.txt'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\\n'
)");
*Note: there is an option in Teradata to insert/import data, but that is the same as writing millions of INSERT statements.
Sorry that I do not have sample codes at this point since the package that I found wasn't the right one that I should use.
Anyone has similar issues/problems like this?
Thank you so much for your help in advance!
I am not sure if you figured out how to do this, but I second Andrew's solution. If you have the Teradata utilities installed on your computer, you can easily run the FastLoad utility from the shell.
So I would:
- export my data frame to a txt file (comma separated)
- create my FastLoad script and reference the exported txt file from within the FastLoad script (you can learn more about it here)
- run the shell command referencing my FastLoad script.
setwd("pathforyourfile")
write.table(mtcars, "mtcars.txt", sep = ",", row.names = FALSE, quote = FALSE, na = "NA", col.names = FALSE)
shell("fastload < mtcars_fastload.txt")
I hope this sorts your issue. Let me know if you need help especially on the fastloading script. More than happy to help.
I can use sqlSave to create a new table and to append data to that table, but I want the table to have some additional columns (for instance, an autoincrementing "ID" column); I am manually adding these columns after creating the table and testing that I can save to it and append to it. As soon as I try to use sqlSave to append more data after adding those columns, I get an error:
Error in odbcUpdate... missing columns in 'data'
So I added an ID column to my data frame (so its columns would match my database table) and tried setting it to "NULL", NULL, and "". I keep getting the same error.
Any ideas?
Thanks,
Aerik
P.S. I'm using RODBC with the MySQL ODBC driver, version 5.1
Ah, I got it. The sqlSave function seems to lowercase everything. I'm not sure what checks it does behind the scenes, but if I make an "id" column it works, while an "ID" column does not.
Try the case="nochange" argument in odbcConnect. I'm using RODBC (1.3-10) with MySQL ODBC driver version 5.2 and it works for me.
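For example (a sketch; the DSN, credentials, data frame and table name are placeholders):
library(RODBC)
channel <- odbcConnect("my_dsn", uid = "user", pwd = "pass", case = "nochange")   # keep column-name case as-is
sqlSave(channel, mydata, tablename = "MyTable", append = TRUE, rownames = FALSE)
odbcClose(channel)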