How to import an 11-million-row table into RStudio from Google BigQuery? - r

I am trying to do some data exploration on a dataset I have. The table I want to import has 11 million rows. Here are the script and the output:
#Creating a variable for our BQ project space
project_id = 'project space'
#Query
Step1 <-
"
insertquery
"
#Executing the query from the variable above
Step1_df <- query_exec(Step1, project = project_id, use_legacy_sql = FALSE, max_pages = Inf, page_size = 99000)
Error:
Error in curl::curl_fetch_memory(url, handle = handle) :
Operation was aborted by an application callback
Is there a different BigQuery library I can use? I'm also looking to speed up the import.

Related

Insert R data frame into SQL Server (RODBC): error "table not found"

I would like to write my whole data frame from R to SQL Server, preferably using RODBC's sqlSave (not sqlQuery). Here is my sample code:
library(RODBC)
myconn <- odbcDriverConnect("some connection string")
mydf <- data.frame(col_1 = c(1,2,3), col_2 = c(2,3,4))
sqlSave(myconn, mydf, tablename = '[some_db].[some_schema].[my_table]', append = F, rownames = F, verbose=TRUE)
odbcClose(myconn)
After I execute it, I get this error message:
Error in sqlColumns(channel, tablename) :
‘my_table’: table not found on channel
When I check in SQL Server, an empty table is present.
If I run the same code again, I get this error message:
Error in sqlSave(myconn, mydf, tablename = "[some_db].[some_schema].[my_table]", :
42S01 2714 [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]There is already an object named 'my_table' in the database.
[RODBC] ERROR: Could not SQLExecDirect 'CREATE TABLE [some_db].[some_schema].[my_table] ("col_1" float, "col_2" float)'
Any suggestions on how to troubleshoot?
UPDATE
In SSMS I can run the following commands successfully:
CREATE TABLE [some_db].[some_schema].[my_table] (
test int
);
drop table [some_db].[some_schema].[my_table]
Here are the details of the connection string:
Driver=ODBC Driver 17 for SQL Server; Server=someserveraddress; Uid=user_login; Pwd=some_password
To avoid the error, you could specify the database in the connection string:
Driver=ODBC Driver 17 for SQL Server; Server=someserveraddress; Database=some_db; Uid=user_login; Pwd=some_password
and leave the square brackets out of the table name:
sqlSave(myconn, mydf, tablename = 'some_schema.my_table', append = F, rownames = F, verbose=TRUE)

Python sqlite3: writing to a table from pandas read_csv results in error "incomplete input"

I'm running on Python 3.8 and have the following code:
import sqlite3
import pandas as pd
conn = sqlite3.connect('feedback.db')
c = conn.cursor()
read_feedback = pd.read_csv(r'C:\jup\feedback.csv', parse_dates=['Created'])
read_feedback.to_sql('FEEDBACK_IN', conn, if_exists='append', index=False)  # Insert the values from the CSV file into the table 'FEEDBACK_IN'
c.execute('''INSERT INTO FEEDBACK_IN (Feedback, Created, User)''')
The table has been created using this code:
c.execute('''CREATE TABLE FEEDBACK_IN
([generated_id] INTEGER PRIMARY KEY,[Feedback] text, [Created] date, [User] text)''')
A sample from the CSV:
Feedback,Created,User
Quick help. Thank you!,29/12/2020,mailaddress@domain.com
The DataFrame read from the CSV looks OK:
   Feedback                Created     User
0  Quick help. Thank you!  2020-12-29  mailaddress@domain.com
When execution reaches c.execute('''INSERT INTO FEEDBACK_IN (Feedback, Created, User)''') an error is thrown:
----> 1 c.execute('''INSERT INTO FEEDBACK_IN (Feedback, Created, User)''')
OperationalError: incomplete input
I have played with the date formats and the order of the fields, and now I'm lost.
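The INSERT statement names the target columns but never supplies a VALUES clause (or a SELECT), so SQLite reaches the end of the string while still expecting more and reports "incomplete input". Separately, to_sql(..., if_exists='append') has already inserted the CSV rows into FEEDBACK_IN, so the extra INSERT line can most likely just be deleted. The sketch below contrasts the incomplete statement with a complete, parameterized one, using only the standard sqlite3 module, an in-memory database, and a hypothetical row (the date is stored as ISO text, an assumption about the desired format):

```python
import sqlite3

# The statement from the question: a column list with no VALUES clause.
BROKEN_SQL = "INSERT INTO FEEDBACK_IN (Feedback, Created, User)"
# A complete version: one ? placeholder per listed column.
FIXED_SQL = "INSERT INTO FEEDBACK_IN (Feedback, Created, User) VALUES (?, ?, ?)"

conn = sqlite3.connect(":memory:")  # throwaway database for the demo
conn.execute(
    "CREATE TABLE FEEDBACK_IN "
    "([generated_id] INTEGER PRIMARY KEY, [Feedback] text, [Created] date, [User] text)"
)

# Executing the incomplete statement raises sqlite3.OperationalError.
error = None
try:
    conn.execute(BROKEN_SQL)
except sqlite3.OperationalError as exc:
    error = exc
print("broken statement:", error)

# Hypothetical rows standing in for the DataFrame contents.
rows = [("Quick help. Thank you!", "2020-12-29", "mailaddress@domain.com")]
conn.executemany(FIXED_SQL, rows)
conn.commit()
print(conn.execute("SELECT Feedback, Created, User FROM FEEDBACK_IN").fetchall())
```

Binding values through placeholders also means the driver handles quoting, so text containing apostrophes or commas needs no special treatment.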

Write to Snowflake VARIANT column from R

I am trying to load data into Snowflake using the following code, but I get an error.
con <- DBI::dbConnect(
  drv = odbc::odbc(),
  driver = "SnowflakeDSIIDriver",
  server = "<>",
  authenticator = 'externalbrowser',
  warehouse = "<>",
  database = "<>",
  UID = "<>",
  role = "<>"
)
DBI::dbAppendTable(con, name = DBI::Id(schema = "<>", table = "<>"), value = tmp[1:2,])
tmp was downloaded from the same Snowflake table in RStudio, using an SQL chunk:
```{sql connection=con, output.var = 'tmp'}
select top 10 *
FROM <>
```
The error seems to stem from a VARIANT column in which I store a JSON string.
Error in new_result(connection@ptr, statement, immediate) :
nanodbc/nanodbc.cpp:1374: 22000: SQL compilation error:
Expression type does not match column data type, expecting VARIANT but got VARCHAR(2) for column FEATURES
I had this once, and the cause was invalid JSON (a missing bracket somewhere). Check that every value you write to the VARIANT column parses as valid JSON.
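Following that suggestion, a quick pre-upload check is to try parsing every FEATURES value and list the ones that fail. The sketch below uses Python's standard json module purely as a self-contained illustration (in R, jsonlite::validate() serves a similar purpose). The sample values are hypothetical, and skipping None on the assumption that it becomes SQL NULL is also an assumption:

```python
import json


def invalid_json_rows(values):
    """Return (position, error message) pairs for values that do not parse as JSON.

    None is skipped, assuming it is written as SQL NULL, which a VARIANT
    column accepts.
    """
    problems = []
    for position, text in enumerate(values):
        if text is None:
            continue
        try:
            json.loads(text)
        except (TypeError, ValueError) as exc:  # JSONDecodeError subclasses ValueError
            problems.append((position, str(exc)))
    return problems


# Hypothetical FEATURES values; the second one is missing a closing bracket.
features = ['{"a": 1}', '{"b": [1, 2}', None, "[]"]
for position, message in invalid_json_rows(features):
    print(f"row {position}: {message}")
```

Running this over the column before calling DBI::dbAppendTable() points straight at the rows that need fixing.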

Updating a table with a data frame

I'm using data in a data frame to try to update a table in an SQLite database that looks like this:
Part | Price
-----|------
a    | 5
b    | 9
I am getting a syntax error for this:
for(row in 1:nrow(newdata)){dbGetQuery(conn=db,"UPDATE Parts SET Price = ",newdata$Price[row], " WHERE Part = '", newdata$Part[row],"';")}
The exact error I'm getting:
Error in rsqlite_send_query(conn@ptr, statement) : near " ": syntax error
Why is this?
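dbGetQuery() expects the whole SQL statement as one string. In your call only "UPDATE Parts SET Price = " is used as the statement; the remaining pieces are passed as extra arguments instead of being concatenated, so SQLite receives an incomplete statement and reports the syntax error. Build the query into a single string first:
for (row in seq_len(nrow(newdata))) {
  # sprintf() assembles one complete UPDATE per row; %s also copes with non-integer prices
  # dbExecute() is the DBI function for statements that return no result set
  dbExecute(conn = db, sprintf("UPDATE Parts SET Price = %s WHERE Part = '%s';",
                               newdata$Price[row], newdata$Part[row]))
}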
It's also possible to accomplish this with paste or paste0, but sprintf can be easier to read.
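Pasting values into SQL text still leaves you handling quotes yourself (a part name containing an apostrophe breaks the statement) and exposes the query to SQL injection. A safer pattern is to bind values as parameters and let the driver quote them; in R, DBI's dbExecute() accepts a params argument for this with RSQLite. The sketch below shows the same idea with Python's sqlite3 and executemany(), using an in-memory copy of the Parts table and hypothetical new prices standing in for newdata:

```python
import sqlite3


def update_prices(conn, new_prices):
    """Set Parts.Price for each (part, price) pair, binding values as parameters."""
    conn.executemany(
        "UPDATE Parts SET Price = ? WHERE Part = ?",
        [(price, part) for part, price in new_prices],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Parts (Part TEXT PRIMARY KEY, Price REAL)")
conn.executemany("INSERT INTO Parts (Part, Price) VALUES (?, ?)", [("a", 5), ("b", 9)])
conn.commit()

# Hypothetical replacement prices standing in for the R data frame `newdata`.
new_prices = [("a", 6.5), ("b", 9)]
update_prices(conn, new_prices)
print(conn.execute("SELECT Part, Price FROM Parts ORDER BY Part").fetchall())
```

executemany() prepares the UPDATE once and runs it for every pair, which also avoids building one SQL string per row.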

Programmatically building an SQL query in R/Shiny/RODBC

I'm building an SQL query statement using dateRangeInput() in R/Shiny. My problem is handling the quoting needed to include the dates in the WHERE clause of the SQL:
Here is my code:
t.query <- paste0("Select [sensor_name], [temperature] from [dbo].[temperature_sensor] where network_id = '24162' and date > "
                  , sQuote(format(input$my.dateRange[1], format="%d-%m-%Y"))
                  , " and date < "
                  , sQuote(format(input$my.dateRange[2], format="%d-%m-%Y"))
)
Now the statement closes with a single quote and I receive the error below:
42000 102 [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Incorrect syntax near '‘'. [RODBC] ERROR: Could not SQLExecDirect 'Select [sensor_name], [temperature] from [dbo].[temperature_sensor] where network_id = '24162' and date > ‘18-09-2017’ and date < ‘22-09-2017’'
I think I need to close the string with " since I started it with "Select ...; I tried explicitly adding """ or dQuote("") to concatenate a " character, but I still get an error.
Any advice is highly appreciated.
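The ‘ in the error message is the clue: when options(useFancyQuotes = TRUE), the default in most setups, sQuote() wraps text in typographic quotes (‘…’), which SQL Server does not accept as string delimiters. Setting options(useFancyQuotes = FALSE) makes sQuote() emit plain ' quotes. Better still, I'd recommend using RODBCext, which lets you parameterize your query: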
library(RODBCext)
channel <- odbcConnect(...)  # make your connection object here
Data <-
  sqlExecute(channel = channel,
             query = "Select [sensor_name], [temperature]
                      from [dbo].[temperature_sensor]
                      where network_id = ? and date between ? and ?",
             data = list('24162',
                         format(input$my.dateRange[1], format = "%Y-%m-%d"),
                         format(input$my.dateRange[2], format = "%Y-%m-%d")),
             fetch = TRUE,
             stringsAsFactors = FALSE)
This approach has several advantages: it removes the frustration of matching quotes (which you shouldn't be doing anyway, for the next reason), and it protects your data against SQL injection.
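The same parameterized pattern can be sketched with Python's sqlite3 standing in for RODBCext and SQL Server (the table layout and rows below are hypothetical). The dates are bound as ISO-8601 strings, so no quote characters ever appear in the SQL text, and a value containing quotes is treated as data rather than SQL:

```python
import sqlite3
from datetime import date


def readings_between(conn, network_id, start, end):
    """Return (sensor_name, temperature) rows for one network between two dates.

    All values are bound through ? placeholders; dates are passed as ISO text
    (YYYY-MM-DD), so string comparison orders them correctly. BETWEEN is inclusive.
    """
    sql = (
        "SELECT sensor_name, temperature FROM temperature_sensor "
        "WHERE network_id = ? AND date BETWEEN ? AND ? ORDER BY date"
    )
    return conn.execute(sql, (network_id, start.isoformat(), end.isoformat())).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temperature_sensor "
    "(sensor_name TEXT, temperature REAL, network_id TEXT, date TEXT)"
)
# Hypothetical sample readings.
conn.executemany(
    "INSERT INTO temperature_sensor VALUES (?, ?, ?, ?)",
    [
        ("s1", 20.5, "24162", "2017-09-17"),
        ("s1", 21.0, "24162", "2017-09-19"),
        ("s2", 19.0, "99999", "2017-09-20"),
    ],
)
print(readings_between(conn, "24162", date(2017, 9, 18), date(2017, 9, 22)))
```

Note that BETWEEN includes both endpoints, whereas the original > and < comparisons excluded them.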
