I'm connecting to a PostgreSQL db using R and the RPostgreSQL package. The db has a number of schemas and I would like to know which tables are associated with a specific schema.
So far I have tried:
dbListTables(db, schema="sch2014")
dbGetQuery(db, "dt sch2014.*")
dbGetQuery(db, "\dt sch2014.*")
dbGetQuery(db, "\\dt sch2014.*")
None of which have worked.
This related question also exists: Setting the schema name in postgres using R, which would solve the problem by defining the schema at the connection. However, it's not yet been answered!
Reading this answer https://stackoverflow.com/a/15644435/2773500 helped. I can use the following to get the tables associated with a specific schema:
dbGetQuery(db,
"SELECT table_name FROM information_schema.tables
WHERE table_schema='sch2014'")
The following should work (using DBI v1.1.1):
DBI::dbListObjects(conn, DBI::Id(schema = 'schema_name'))
While it has all the info you want, it's hard to access and hard to read.
I would recommend something that produces a data frame:
# get a hard to read table given some Postgres connection `conn`
x = DBI::dbListObjects(conn, DBI::Id(schema = 'schema_name'))
# - extract column "table" comprising a list of Formal class 'Id' objects then
# - extract the 'name' slot for each S4 object element
# could also do `lapply(x$table, function(x) x@name)`
v = lapply(x$table, function(x) slot(x, 'name'))
# create a dataframe with header 'schema', 'table'
d = as.data.frame(do.call(rbind, v))
Or in one line:
d = as.data.frame(do.call(rbind, lapply(DBI::dbListObjects(conn, DBI::Id(schema = 'schema_name'))$table, function(x) slot(x, 'name'))))
Or in a more "tidy" way:
conn %>%
DBI::dbListObjects(DBI::Id(schema = 'schema_name')) %>%
dplyr::pull(table) %>%
purrr::map(~slot(.x, 'name')) %>%
dplyr::bind_rows()
The output is something like:
> d
       schema  table
1 schema_name mtcars
You can use the table_schema option rather than just schema to see a list of tables within a specific schema. So, keeping with your example code snippet above, the line below should work:
dbListTables(db, table_schema="sch2014")
I am trying to connect R with Mongo using two packages: rmongodb and RMongo.
I would like to build a Mongo query from R that filters on an index named id.
The id is a 19-digit integer, e.g. 1234567891234567891, which is stored in Mongo as a NumberLong. Using rmongodb I don't know how to create a query that correctly understands my 19-digit index, e.g.:
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.append(buf, "id", '6120367800331863610')
query <- mongo.bson.from.buffer(buf)
b <- mongo.find.one(mongo, ns=namespace, query)
or
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.append.long(buf, "id", 6120367800331863610)
query <- mongo.bson.from.buffer(buf)
b <- mongo.find.one(mongo, ns=namespace, query)
In the first snippet, my query looks like id : 2 6120367800331863610.
The 2 means the value is sent as BSON type string, not NumberLong, so my query produces no results.
In the second snippet, the number I supply is changed by R to
id : 18 6120367800331864064. The type is now correct (18 is the BSON type for NumberLong), but the number itself has changed. R has trouble representing such big numbers exactly; I tried converting 6120367800331863610 with bit64, but the integer64 type is not supported by mongo.bson.buffer.append.long().
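The rounding can be reproduced without Mongo at all; as a quick check in plain base R (nothing rmongodb-specific here), the literal is stored as a 64-bit double and rounded to the nearest representable value:
sprintf("%.0f", 6120367800331863610)
# [1] "6120367800331864064"   <- the same shifted value that ends up in the query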
My second approach to the problem was to use the RMongo package. With it I was able to get the id I was looking for, but I can't use nested keys there:
dbGetQueryForKeys(mongo,namespace,"{'id':6120367800331863610}","{'id': 1, 'data.product': 1}")
The id is correct, but for data.product I get only null values. When I change the key to {'id': 1, 'data': 1} it gives me a data frame with the id and the whole data document, which I then have to parse into columns; that is a time-consuming operation because of the JSON-like structure of this part.
I would be grateful for any help.
The dbWriteTable function in RPostgreSQL seems to ignore column names and tries to push the data from R to PostgreSQL as-is. This is problematic when appending to existing tables, particularly if there are columns left unspecified in the R object that should be given default values.
RMySQL handles this case very gracefully by adding the column names to LOAD DATA LOCAL INFILE. How do I force RPostgreSQL to assign default values to unspecified columns in dbWriteTable when append=TRUE?
Here is an example:
CREATE TABLE test (
column_a varchar(255) not null default 'hello',
column_b integer not null
);
insert into test values (DEFAULT, 1);
Which yields the following table:
select * from test;
 column_a | column_b
----------+----------
 hello    |        1
(1 row)
I want to insert some new data to this table from R:
require('RPostgreSQL')
driver <- PostgreSQL()
con <- dbConnect(driver, host='localhost', dbname='development')
set.seed(42)
x <- data.frame(column_b=sample(1:100, 10))
dbWriteTable(con, name='test', value=x, append=TRUE, row.names=FALSE)
dbDisconnect(con)
But I get the following error:
Error in postgresqlgetResult(new.con) :
RS-DBI driver: (could not Retrieve the result : ERROR: missing data for
column "column_b"
CONTEXT: COPY test, line 1: "92"
)
This is because I have not specified the column_a field, so dbWriteTable is trying to write the data for column_b into column_a. I would like to force dbWriteTable to use the defaults for column_a, and properly write column_b to column_b.
I should only get a failure if:
I fail to specify a column with no default value
I try to insert a column that doesn't exist in the table
I insert the wrong datatype into an existing column
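For reference, one possible workaround (just a sketch with plain RPostgreSQL/DBI calls rather than a dbWriteTable feature; it reuses the test table and the data frame x from above) is to write the data to a staging table and then copy only the supplied column, so PostgreSQL fills in the defaults:
# write the incoming rows to a throwaway staging table
dbWriteTable(con, name='test_staging', value=x, row.names=FALSE)
# copy only the column we actually have; column_a falls back to its default
res <- dbSendQuery(con, "INSERT INTO test (column_b) SELECT column_b FROM test_staging")
dbClearResult(res)
# drop the staging table again
dbRemoveTable(con, 'test_staging')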
I had exactly the same problem, this fixed it.
Check out the dbWriteTable2 function from package caroline.
The code then allows you to write a data frame without an id column into the database using add.id = TRUE, e.g.
dbWriteTable2(con_psql, "domains", data_domains, append = TRUE, overwrite = FALSE, row.names = FALSE, add.id = TRUE)
I have a data frame where some of the column names contain a dot, for example Company.1.
When I use such a column in a sqldf query it throws an error:
data = sqldf("select Company.1 from test")
Error in sqliteExecStatement(con, statement, bind.data) :
RS-DBI driver: (error in statement: near ".1": syntax error)
Any workaround so that i can use the column name as it is?
The dot has another meaning in SQL (e.g., separating table name from column name) and
is replaced by an underscore before sending the data to SQLite.
library(sqldf)
test <- data.frame( "Company.1" = 1:10 )
sqldf( 'SELECT Company_1 FROM test' )
This problem is caused by the . in your column name; if you change it to Company_1 it works:
data = sqldf("select Company_1 from test")
The solution for the latest update of sqldf is answered here
We only need to write the SQL statement between single quotes, and the
column names including dots between double quotes or
backticks/backquotes interchangeably.
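For example (a small sketch assuming a recent sqldf/RSQLite where the dot is no longer converted to an underscore):
library(sqldf)
test <- data.frame("Company.1" = 1:10)
# SQL statement in single quotes, the dotted column name in double quotes ...
sqldf('SELECT "Company.1" FROM test')
# ... or, interchangeably, in backticks
sqldf('SELECT `Company.1` FROM test')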
I have a csv file that has ~1.9M rows and 32 columns. I also have limited RAM, which makes loading it into memory very inconvenient. As a result I am thinking of using a database, but I do not have any intimate knowledge of the subject, and so far I have looked around on this site without finding a viable solution.
The CSV file looks like this:
Case,Event,P01,P02,P03,P04,P05,P06,P07,P08,P09,P10,P11,P12,P13,P14,P15,P16,P17,P18,P19,P20,P21,P22,P23,P24,P25,P26,P27,P28,P29,P30
C000039,E97553,8,10,90,-0.34176313227395744,-5.581162038780728E-4,-0.12090388100201072,-1.5172412910939355,-0.9075283173030568,2.0571877671625742,-0.002902632819930783,-0.6761896565590585,-0.7258602353522214,0.8684602429202587,0.0023189312896576167,0.002318939470525324,-0.1881462494296103,-0.0014303471592995315,-0.03133299206977217,7.72338072867324E-4,-0.08952068388668191,-1.4536398437657685,-0.020065144945600275,-0.16276139919188118,0.6915962670997067,-1.593412697264055,-1.563877781707804,-1.4921751129092755,4.701551108078644,6,-0.688302560842075
C000039,E23039,8,10,90,-0.3420173545012358,-5.581162038780728E-4,-1.6563770995734233,-1.5386562526752448,-1.3604342580422861,2.1025445031625525,-0.0028504751366762804,-0.6103972392687121,-2.0390388918403284,-1.7249948885013526,0.00231891181914203,0.0023189141684282384,-0.18603688853814693,-0.0014303471592995315,-0.03182759137355937,0.001011754948131039,0.13009444290656555,-1.737249614361576,-0.015763602969926262,-0.16276139919188118,0.7133868949811379,-1.624962995908364,-1.5946762525901037,-1.5362787555380522,4.751479927607516,6,-0.688302560842075
C000039,E23039,35,10,90,-0.3593468363273839,-5.581162038780728E-4,-2.2590624066428937,-1.540784192984501,-1.3651511418164592,0.05539868728273849,-0.00225912499740972,0.20899232681704485,-2.2007336302050633,-2.518401278903022,0.0023189850665203673,0.0023189834133465186,-0.1386548782028836,-0.0013092574968056093,-0.0315006293688149,9.042390365542781E-4,-0.3514180333671346,-1.8007561969675518,-0.008593259125791147,-2.295351187387221,0.6329101442826701,-1.8095530459660578,-1.7748676145152822,-1.495347406256394,2.553693742122162,34,-0.6882806822066699
....
....
up to 1.9M rows
As you can see, the 'Case' column repeats itself, but I want to get only the unique records before importing it into a data frame. So I used this:
f<-file("test.csv")
bigdf <- sqldf("select * from 'f' where Case in (select Case from 'f' group by Case having count(*) = 1)", dbname = tempfile(), file.format = list(header = T, row.names = F))
However I get this error:
Error in sqliteExecStatement(con, statement, bind.data) :
RS-DBI driver: (error in statement: near "in": syntax error)
Is there something obvious I am missing here?
Many thanks in advance.
CASE is a keyword, so you have to quote this column name as "Case" in your query.
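So the query from the question should work once the column name is quoted (a sketch based on the original statement):
f <- file("test.csv")
bigdf <- sqldf("select * from 'f' where \"Case\" in
                 (select \"Case\" from 'f' group by \"Case\" having count(*) = 1)",
               dbname = tempfile(),
               file.format = list(header = T, row.names = F))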
For those who want unique rows using sqldf, use DISTINCT:
newdf <- sqldf("SELECT DISTINCT * FROM df") # Get unique rows
sqldf uses SQLite Syntax by default.
newdf <- sqldf("SELECT DISTINCT name FROM df") # Get unique column values
newdf <- sqldf("SELECT *, COUNT(DISTINCT name) as NoNames FROM df GROUP BY whatever") # Get a count of unique names
If you use "Case" in sqldf in R, you should put a "," before "Case", because the "Case" query is the whole line; you should keep it separate.