Could not run statement: Lost connection to MySQL server during query - r

Transferring an R data frame to a MySQL (MariaDB) database table, I get the following error: Lost connection to MySQL server during query
Example data can be loaded in R with this command
cntxt <- read.delim("http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=comext%2FCOMEXT_METADATA%2FCLASSIFICATIONS_AND_RELATIONS%2FENGLISH%2FCN.txt", header = FALSE, quote = "", stringsAsFactors = FALSE)
I use the RMySQL package to transfer the data frame to the database:
con <- RMySQL::dbConnect(RMySQL::MySQL(), dbname = "test")
RMySQL::dbWriteTable(con, "cntxt", cntxt, row.names = FALSE, overwrite = TRUE)
The database write operation works fine on my laptop for tables of any size. But on a server it returns an error. The error only appears for sufficiently large tables (above 1000 rows):
dbWriteTable() succeeds for 1000 lines of data
RMySQL::dbWriteTable(con, "cntxt", head(cntxt,1000), row.names = FALSE, overwrite = TRUE)
# [1] TRUE
dbWriteTable() fails for 2000 lines of data
RMySQL::dbWriteTable(con, "cntxt", head(cntxt,2000), row.names = FALSE, overwrite = TRUE)
# Error in .local(conn, statement, ...) :
# could not run statement: Lost connection to MySQL server during query
Based on related questions, I have checked the value of max_allowed_packet:
mysql> SHOW VARIABLES LIKE 'max_allowed_packet';
| max_allowed_packet | 16777216 |
16Mb should be more than enough for 2000 lines of data.
Where is the error coming from?
There is nothing visible in the MySQL error log /var/log/mysql/error.log.
Server version: 10.1.26-MariaDB-0+deb9u1 Debian 9.1

Replace the RMySQL package with the newer RMariaDB package
install.packages("RMariaDB")
Then tables of more than 2000 rows can be transferred without error:
con <- RMariaDB::dbConnect(RMariaDB::MariaDB(), dbname="test")
cntxt <- read.delim("http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=comext%2FCOMEXT_METADATA%2FCLASSIFICATIONS_AND_RELATIONS%2FENGLISH%2FCN.txt", header = FALSE, quote = "", stringsAsFactors = FALSE)
RMariaDB::dbWriteTable(con, "cntxt", cntxt, row.names = FALSE, overwrite = TRUE)
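As a quick sanity check (a sketch assuming the same connection con as above), the row counts in R and in MariaDB can be compared after the transfer:
# Compare the local and remote row counts, then close the connection
nrow(cntxt)
DBI::dbGetQuery(con, "SELECT COUNT(*) AS n FROM cntxt")
DBI::dbDisconnect(con)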

Related

dbAppendTable() error when I try to append data to a local server

I'm just starting my journey with r, so I'm a complete newbie and I can't find anything that will help me solve this.
I have a csv table (random integers in each column) with 9 columns. I read 8 of them and want to append them to a SQL table with 8 fields (Col1 ... Col8, all ints). After loading the csv into RStudio, it looks right and has only 8 columns.
The code I'm using is:
# Libraries
library(DBI)
library(odbc)
library(tidyverse)
# CSV Files
df = head(
  read_delim(
    "C:/Data/test.txt",
    " ",
    trim_ws = TRUE,
    skip = 1,
    skip_empty_rows = TRUE,
    col_types = cols('X7' = col_skip())
  ),
  -1
)
# Add Column Headers
col_headings <- c('Col1', 'Col2', 'Col3', 'Col4', 'Col5', 'Col6', 'Col7', 'Col8')
names(df) <- col_headings
# Connect to SQL Server
con <- dbConnect(odbc(), "SQL", timeout = 10)
# Append data
dbAppendTable(conn = con,
              schema = "tmp",
              name = "test",
              value = df,
              row.names = NULL)
I'm getting this error message:
> Error in result_describe_parameters(rs@ptr, fieldDetails) :
>   Query requires '8' params; '18' supplied.
I ran into this issue also. I agree with Hayward Oblad: the dbAppendTable() function appears to be finding another table of the same name, which throws the error. Our solution was to specify the name parameter as an Id() (from DBI::Id()).
So taking your example above:
# Append data
dbAppendTable(conn = con,
              name = Id(schema = "tmp", table = "test"),
              value = df,
              row.names = NULL)
Ran into this issue...
Error in result_describe_parameters(rs@ptr, fieldDetails) :
  Query requires '6' params; '18' supplied.
when saving to a Snowflake database, and couldn't find any good information on the error.
Turns out that there was a test schema whose tables had exactly the same names as those in the prod schema. DBI::dbAppendTable() doesn't differentiate between the schemas, so until those tables in the test schema were renamed to unique table names, the params error persisted.
Hope this saves someone the 10 hours I spent trying to figure out why DBI was throwing the error.
See here for more on this:
ODBC/DBI in R will not write to a table with a non-default schema in R
Add name = Id(schema = "my_schema", table = "table_name") to DBI::dbAppendTable(), or in my case to DBI::dbWriteTable().
Not sure why the function does not use the schema from my connection object, though; it seems redundant.
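For reference, a minimal sketch of that fix applied to DBI::dbWriteTable(); the DSN, schema, and table names are placeholders, and df stands for the data frame from the question:
library(DBI)
# Hypothetical connection, for illustration only
con <- dbConnect(odbc::odbc(), "SQL", timeout = 10)
# Qualify the target with DBI::Id() so the write goes to my_schema.table_name
# instead of the connection's default schema
dbWriteTable(con,
             name = Id(schema = "my_schema", table = "table_name"),
             value = df,
             append = TRUE)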

Import multiple csv files into a PostgreSQL database using R (memory error)

I am trying to import a dataset (spread over many csv files) into R and afterwards write the data into a table in a PostgreSQL database.
I successfully connected to the database, created a loop to import the csv files, and tried to import them.
R then returns an error because my PC runs out of memory.
My question is:
Is there a way to create a loop, which imports the files one after another, writes them into the postgresql table and deletes them afterwards?
That way I would not run out of memory.
Code which returns the memory error:
# Libraries needed for the connection and the csv import
library(RPostgreSQL)
library(readr)
library(dplyr)
# connect to PostgreSQL database
db_tankdata <- 'tankdaten'
host_db <- 'localhost'
db_port <- '5432'
db_user <- 'postgres'
db_password <- 'xxx'
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname = db_tankdata, host = host_db,
                 port = db_port, user = db_user, password = db_password)
# check if the connection was successful
dbExistsTable(con, "prices")
# function to load multiple csv files
import_csvfiles <- function(path){
  files <- list.files(path, pattern = "*.csv", recursive = TRUE, full.names = TRUE)
  lapply(files, read_csv) %>% bind_rows() %>% as.data.frame()
}
# import files and write them to the database
prices <- import_csvfiles("path...")
dbWriteTable(con, "prices", prices, append = TRUE, row.names = FALSE)
Thanks in advance for the feedback!
If you change the lapply() to include an anonymous function, you can read each file and write it to the database, reducing the amount of memory required. Since lapply() acts as an implied for() loop, you don't need an extra looping mechanism.
import_csvfiles <- function(path){
  files <- list.files(path, pattern = "*.csv", recursive = TRUE, full.names = TRUE)
  lapply(files, function(x){
    # Read one file at a time and append it to the table, so only a single
    # file's worth of data is held in memory at any point
    prices <- read.csv(x)
    dbWriteTable(con, "prices", prices, append = TRUE, row.names = FALSE)
  })
}
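A hedged usage sketch of that helper, reusing the connection details from the question (credentials and the directory path are placeholders):
library(DBI)
library(RPostgreSQL)
# Connect as in the question (placeholder credentials)
con <- dbConnect(dbDriver("PostgreSQL"), dbname = "tankdaten", host = "localhost",
                 port = "5432", user = "postgres", password = "xxx")
# Write every csv found under the directory into the "prices" table, file by file
import_csvfiles("C:/data/tankdaten")   # hypothetical path
dbDisconnect(con)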
I assume that the csv files you are importing into your database are very large? As far as I know, with the code you have written R first stores all of the data in a single data frame, holding it in memory. The alternative is to read each CSV file in chunks, as you would with Python's pandas.
When calling ?read.csv I saw the following output:
nrows : the maximum number of rows to read in. Negative and other invalid values are ignored.
skip : the number of lines of the data file to skip before beginning to read data.
Why don't you try to read 5000 rows at a time into the data frame, write them to the PostgreSQL database, and then repeat this for each file?
For example, for each file do the following:
number_of_lines = 5000                      # Number of lines to read at a time
col_names <- names(read.csv(x, nrows = 1))  # Read the column names once from the header; x is the current csv file
row_skip = 1                                # Skip the header line on the first chunk
keep_reading = TRUE                         # We will change this value to stop the while loop
while (keep_reading) {
  my_data <- read.csv(x, nrows = number_of_lines, skip = row_skip,
                      header = FALSE, col.names = col_names)
  dbWriteTable(con, "prices", my_data, append = TRUE, row.names = FALSE) # Write the chunk to the DB
  row_skip = row_skip + number_of_lines
  # Exit statement: the last chunk comes back shorter than the requested number of lines
  if (nrow(my_data) < number_of_lines) {
    keep_reading = FALSE
  } # end-if
} # end-while
By doing this you are breaking the csv up into smaller parts. You can play around with the number_of_lines variable to reduce the number of loop iterations. This may seem a bit hacky with a loop involved, but I'm sure it will work.
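To apply the chunked read to every file, here is a hedged sketch of a small wrapper (the function name and the chunk_size parameter are assumptions; the connection con and the "prices" table are reused from the question):
# Read each csv in chunks and append every chunk to the "prices" table
import_csvfiles_chunked <- function(path, chunk_size = 5000) {
  files <- list.files(path, pattern = "\\.csv$", recursive = TRUE, full.names = TRUE)
  for (x in files) {
    col_names <- names(read.csv(x, nrows = 1))   # header of the current file
    row_skip <- 1                                # skip the header line
    repeat {
      chunk <- tryCatch(
        read.csv(x, nrows = chunk_size, skip = row_skip,
                 header = FALSE, col.names = col_names),
        error = function(e) NULL                 # no lines left to read
      )
      if (is.null(chunk) || nrow(chunk) == 0) break
      dbWriteTable(con, "prices", chunk, append = TRUE, row.names = FALSE)
      row_skip <- row_skip + chunk_size
    }
  }
}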

Error when inserting a data frame using RMySQL in R

Using R, I tried to insert a data frame. My script looks like this:
con <- dbConnect(RMySQL::MySQL(), username = "xxxxxx", password = "xxxxxx",host = "127.0.0.1", dbname = "xxxxx")
dbWriteTable(conn=con,name='table',value=as.data.frame(df), append = TRUE, row.names = F)
dbDisconnect(con)
When the script hits the line below:
dbWriteTable(conn=con,name='table',value=as.data.frame(df), append = TRUE, row.names = F)
I got an error like below:
Error in .local(conn, statement, ...) : could not run statement: Invalid utf8 character string: 'M'
I am not sure why this error occurred. This is part of a script that has run fine on another machine.
Please advise.
You need to create a proper connection first; only then can you insert the data frame into your DB.
For the connection to work, the username, password, host name, and database name must all be correct. The code is essentially the same as yours, I only removed some parameters.
try this:
mydb = dbConnect(MySQL(), user='root', password='password', dbname='my_database', host='localhost')
Here I just insert the iris data into my_database:
data(iris)
dbWriteTable(mydb, name='db', value=iris)
This inserts the iris data frame as a table named db in my_database.
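If the connection itself is fine and the 'Invalid utf8 character string' error persists, one possibility (not confirmed in this thread, just a hedged guess) is a non-UTF-8 character somewhere in the data frame; a minimal sketch of re-encoding the character columns before the write, assuming the source text is latin1:
# Re-encode every character column to UTF-8 before writing (assumes latin1 input)
char_cols <- vapply(df, is.character, logical(1))
df[char_cols] <- lapply(df[char_cols], iconv, from = "latin1", to = "UTF-8")
dbWriteTable(conn = con, name = 'table', value = as.data.frame(df),
             append = TRUE, row.names = FALSE)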

Having trouble reading a table using the sqldf package (R)

Background:
I can successfully pull a particular dataset (shown in the code below) from the internet using the read.csv() function. However, when I try to utilize the sqldf package to speed up the process using read.csv.sql() it produces errors. I've tried various solutions but can't seem to solve this problem.
I can successfully pull the data and create the data frame that I want with read.csv() using the following code:
ce_data <- read.csv("http://download.bls.gov/pub/time.series/cx/cx.data.1.AllData",
                    fill = TRUE, header = TRUE, sep = "")
To test the functionality of sqldf on my machine, I successfully tested read.csv.sql() by reading in the data as 1 variable rather than the 5 desired using the following code:
library(sqldf)
ce_data_sql1 <- read.csv.sql("http://download.bls.gov/pub/time.series/cx/cx.data.1.AllData",
                             sql = "select * from file")
To produce the result that I got using read.csv() but utilizing the speed of read.csv.sql(), I tried this code:
ce_data_sql2 <- read.csv.sql("http://download.bls.gov/pub/time.series/cx/cx.data.1.AllData",
                             fill = TRUE, header = TRUE, sep = "", sql = "select * from file")
Unfortunately, it produced this error:
trying URL 'http://download.bls.gov/pub/time.series/cx/cx.data.1.AllData'
Content type 'text/plain' length 24846571 bytes (23.7 MB)
downloaded 23.7 MB
Error in sqldf(sql, envir = p, file.format = file.format, dbname = dbname, :
  unused argument (fill = TRUE)
I have tried various methods to address the errors, using sqldf documentation and have been unsuccessful.
Question:
Is there a solution where I can read in this table with 5 variables desired using read.csv.sql()?
The reason you are reading it in as a single variable is that you did not correctly specify the separator for the original file. Try the following, with sep = "\t" for tab-separated data:
ce_data_sql2 <- read.csv.sql("http://download.bls.gov/pub/time.series/cx/cx.data.1.AllData",
                             sep = "\t", sql = "select * from file")
The error you are getting in the final example:
Error in sqldf(sql, envir = p, file.format = file.format, dbname = dbname, :
  unused argument (fill = TRUE)
is due to the fact that read.csv.sql() does not accept the fill argument.
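As a hedged follow-up, row filtering can also be pushed into the sql argument itself rather than done after the read; the column name year below is an assumption based on the file's header line:
library(sqldf)
# Select only part of the file while reading; 'year' is assumed to be one of
# the header columns of cx.data.1.AllData
ce_data_2015 <- read.csv.sql("http://download.bls.gov/pub/time.series/cx/cx.data.1.AllData",
                             sep = "\t",
                             sql = "select * from file where year = 2015")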

Connecting R to a Teradata VOLATILE TABLE

I am using R to try to connect to a Teradata database and am running into difficulties.
The steps in the process are below:
1) Create Connection
2) Create a VOLATILE TABLE
3) Load information from a data frame into the Volatile table
Here is where it fails, giving me an error message
Error in sqlSave(conn, mydata, tablename = "TEMP", rownames = FALSE, :
first argument is not an open RODBC channel
The code is below
# Import Data From Text File and remove duplicates
mydata = read.table("Keys.txt")
mydata.unique = unique(mydata)
strSQL.TempTable = "CREATE VOLATILE TABLE TEMP………[Table Details]"
"UNIQUE PRIMARY INDEX(index)"
"ON COMMIT PRESERVE ROWS;"
# Connect To Database
conn <- tdConnect('Teradata')
# Execute Temp Table
tdQuery(strSQL.TempTable)
sqlSave(conn, mydata, tablename = "TEMP ",rownames = FALSE, append = TRUE)
Can anyone help? Is it closing off the connection before I can upload the information into the table?
My mistake, I had been confusing libraries.
Basically, the lines
# Connect To Database
conn <- tdConnect('Teradata')
# Execute Temp Table
tdQuery(strSQL.TempTable)
sqlSave(conn, mydata, tablename = "TEMP ",rownames = FALSE, append = TRUE)
can all be replaced by this
# Connect To Database
library(RODBC)
channel <- odbcConnect('Teradata')
# Execute Temp Table
sqlQuery(channel, paste(strSQL.TempTable))
sqlSave(channel, mydata, tablename = "TEMP", rownames = FALSE, append = TRUE)
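As a hedged follow-up check (the table name is taken from the code above), the volatile table can be queried through the same channel before closing it:
# Confirm the rows arrived in the volatile table, then close the channel
sqlQuery(channel, "SELECT COUNT(*) FROM TEMP;")
odbcClose(channel)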
Now I'm being told I don't have access to do this, but that is another question for another forum.
Thanks
