How to pull column names from multiple tables using R

Apologies in advance; I'm new to RStudio.
There are two parts to this question:
1) I have a large database with almost 6,000 tables in it, many of which contain no data. Is there a way in R to pull a list of only the table names that have data in them?
I know how to pull a list of all table names and how to pull specific table data using the code below:
test <- odbcDriverConnect('driver={SQL Server};server=(SERVER);database=(DB_Name);trusted_connection=true')
rest <- sqlQuery(test, 'SELECT * FROM information_schema.tables')
Table1 <- sqlFetch(test, "PROPERTY")
Above is the code I use to access the database and tables.
"test" is the connection
"rest" shows the list of 5,803 tables names.. one of which is called "PROPERTY"
"Table1" is simply pulling one of the tables named "PROPERTY".
I am looking to make "rest" show only the tables that have data in them.
2) My ultimate goal, which leads to the second question, is to create a table that lists every table from this database in column 1, with columns 2, 3, 4, etc. holding every one of the column headers contained in each table. Any idea how to do that?
Thanks so much!

The Tables object below returns a data frame giving all of the tables in the database and how many rows are in each table. As a condition, it requires that any table selected have at least one record. This is probably the fastest way to get your list of non-empty tables. I pulled the query to get that information from https://stackoverflow.com/a/14163881/1017276
My only reservation about that query is that it doesn't give the schema name, and it is possible to have tables with the same name in different schemas. So this is likely only going to work well within one schema at a time (a schema-aware variant is sketched after the code below).
library(RODBCext)
Tables <-
  sqlExecute(
    channel = test,
    query = "SELECT T.name TableName, I.rows Records
             FROM sysobjects T, sysindexes I
             WHERE T.xtype = ? AND I.id = T.id AND I.indid IN (0, 1) AND I.rows > 0
             ORDER BY TableName;",
    data = list(xtype = "U"),
    fetch = TRUE,
    stringsAsFactors = FALSE
  )
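If the schema matters to you, here is a hedged (untested) variant of the same idea using the newer sys.* catalog views, which expose the schema name directly:
Tables2 <-
  sqlQuery(
    test,
    "SELECT s.name AS SchemaName, t.name AS TableName, SUM(p.rows) AS Records
     FROM sys.tables t
     JOIN sys.schemas s ON s.schema_id = t.schema_id
     JOIN sys.partitions p ON p.object_id = t.object_id AND p.index_id IN (0, 1)
     GROUP BY s.name, t.name
     HAVING SUM(p.rows) > 0
     ORDER BY SchemaName, TableName;",
    stringsAsFactors = FALSE
  )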
This next part takes the tables you found above and gets the column information for each of them. Lastly, it binds everything into one single data frame of all the column names.
Columns <-
  lapply(Tables$TableName,
         function(x) sqlColumns(test, x))
Columns <- do.call("rbind", Columns)
sqlColumns is a function in RODBC.
sqlExecute is a function in RODBCext that allows for parameterized queries. I tend to use that anytime I need to use quoted strings in a query.
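For the second part of the question (every table in column 1 and that table's column headers spread across columns 2, 3, 4, etc.), here is a rough sketch building on the Columns data frame above. It assumes the TABLE_NAME and COLUMN_NAME fields that sqlColumns returns, and pads shorter tables with NA so every row has the same width:
## one row per table, headers in the remaining columns
col_list <- split(Columns$COLUMN_NAME, Columns$TABLE_NAME)
n_max <- max(lengths(col_list))
Wide <- do.call(
  rbind,
  lapply(names(col_list), function(tbl) {
    headers <- as.character(col_list[[tbl]])
    data.frame(TableName = tbl,
               t(c(headers, rep(NA, n_max - length(headers)))),
               stringsAsFactors = FALSE)
  })
)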

Related

Inserting R code output (data frame) into multiple columns of an SQL table at once

For example, I made an SQL table with column names "Names", "Class", and "age", and I have a data frame which I made using R code:
data_structure1 <- as.data.frame("Name")
data_structure1
data_structure2 <- as.data.frame("Class")
data_structure2
data_structure3 <- as.data.frame("age")
data_structure3
final_df <- cbind(data_structure1, data_structure2, data_structure3)
final_df
#dataframe "Name" contains multiple entries as different names of students.
#dataframe "Class" contains multiple entries as classes in which student are studying like 1,2,3,4 etc.
#dataframe "age" contains multiple entries as ages of students.
I want to insert final_df into my SQL table with the column names "Names", "Class", and "age".
Below is the command I am using for inserting only one column into one column of the SQL table, and this is working fine for me.
title <- sqlQuery(conn, paste0("INSERT INTO AMPs(Names) VALUES('", Names, "')"))
title
But this time I want to insert a data frame of multiple columns into multiple columns of the SQL table.
Please help me with this. Thanks in advance.
Using the DBI package and a package for your database (e.g., RPostgres) you can do something like this, assuming the table already exists:
AMPs <- data.frame(Names = c("Foo", "Bar"), Class = "Student", age = c(21L, 22L))
conn <- DBI::dbConnect(...) # read doc to set parameters
## Create a parameterized INSERT statement
db_handle <- DBI::dbSendStatement(
  conn,
  "INSERT INTO AMPs (Names, Class, age) VALUES ($1, $2, $3)"
)
## Passing the data.frame
DBI::dbBind(db_handle, params = unname(as.list(AMPs)))
DBI::dbHasCompleted(db_handle)
## Close statement handle and connection
DBI::dbClearResult(db_handle)
DBI::dbDisconnect(conn)
You may also want to look at DBI::dbAppendTable.
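For instance, if the data frame's column names match the table's field names, appending is a one-liner (a minimal sketch using the AMPs objects from above):
## appends all rows of the data frame to the existing table
DBI::dbAppendTable(conn, "AMPs", AMPs)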
Edit:
The example above works with PostgreSQL, and I see that the OP tagged the question with 'sql-server'. I don't have the means to test it with SQL Server, but the DBI interface aims at being general, so I believe it should work. One important thing to note is the syntax, in particular the syntax for the bind parameters. In PostgreSQL the parameters are written as $1 for the first parameter, $2 for the second, and so on. This might be different for other databases. I think SQL Server uses ?, and that the binding is done by position, i.e., the first ? is bound to the first supplied parameter, the second ? to the second, and so on.
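Under that assumption, a hedged, untested sketch of the SQL Server variant via the odbc package ("my_dsn" is a hypothetical data source name):
conn <- DBI::dbConnect(odbc::odbc(), dsn = "my_dsn")  # hypothetical DSN
stmt <- DBI::dbSendStatement(
  conn,
  "INSERT INTO AMPs (Names, Class, age) VALUES (?, ?, ?)"
)
DBI::dbBind(stmt, params = unname(as.list(AMPs)))  # bound by position
DBI::dbClearResult(stmt)
DBI::dbDisconnect(conn)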

RSQLite dbGetQuery with input from Data Frame

I have a database called "db" with a table called "company" which has a column named "name".
I am trying to look up a company name in db using the following query:
dbGetQuery(db, 'SELECT name,registered_address FROM company WHERE LOWER(name) LIKE LOWER("%APPLE%")')
This gives me the following correct result:
name
1 Apple
My problem is that I have a bunch of companies to look up and their names are in the following data frame
df <- as.data.frame(c("apple", "microsoft","facebook"))
I have tried the following method to get the company name from my df and insert it into the query:
sqlcomp <- paste0("'SELECT name, ","registered_address FROM company WHERE LOWER(name) LIKE LOWER(",'"', df[1,1],'"', ")'")
dbGetQuery(db,sqlcomp)
However this gives me the following error:
tinyformat: Too many conversion specifiers in format string
I've tried several other methods but I cannot get it to work.
Any help would be appreciated.
This code should work:
df <- as.data.frame(c("apple", "microsoft","facebook"))
comparer <- paste(paste0("LOWER(name) LIKE LOWER('%", df[,1], "%')"), collapse = " OR ")
sqlcomp <- sprintf("SELECT name, registered_address FROM company WHERE %s", comparer)
dbGetQuery(db,sqlcomp)
Hope this helps you move on.
Please vote my solution if it is helpful.
Using paste to splice data into a query is generally a bad idea, due to SQL injection (whether truly malicious or just accidental spoiling of the query). It's also better to keep the query free of "raw data" because DBMSes tend to optimize a query once and reuse that optimization every time they see the same query; if you encode data in the query itself, each variation is a new query, so the optimization is defeated.
It's generally better to use parameterized queries; see https://db.rstudio.com/best-practices/run-queries-safely/#parameterized-queries.
For you, I suggest the following:
df <- data.frame(names = c("apple", "microsoft", "facebook"))
qmarks <- paste(rep("?", nrow(df)), collapse = ",")
qmarks
# [1] "?,?,?"
dbGetQuery(con,
           sprintf("select name, registered_address from company where lower(name) in (%s)", qmarks),
           params = tolower(df$names))
This takes advantage of three things:
the SQL IN operator, which takes a list (vector in R) of values and conditions on "set membership";
optimized queries; if you subsequently run this query again (with three arguments), the DBMS will reuse the optimized query. (Granted, if you run it with other than three companies, it will have to reoptimize, so the gain is limited);
no need to deal with quoting/escaping your data values; for instance, if it is feasible that your company names might include single or double quotes (perhaps typos on user entry), then splicing the value into the query itself will either cause the query to fail, or force you to jump through hoops to ensure all quotes are escaped properly for the DBMS to see the correct strings.
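To see why the quoting matters, consider a hypothetical company name containing an apostrophe ("o'brien" below is made up for illustration):
company <- "o'brien"
## pasting produces ... lower(name) = 'o'brien' ... -- broken SQL
bad <- sprintf("select name from company where lower(name) = '%s'", company)
## a parameterized query lets the driver handle the quote safely
dbGetQuery(con, "select name from company where lower(name) = ?",
           params = list(company))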

Adding value to existing database table in RSQLite

I am new to RSQLite.
I have an input document in text format in which values are separated by '|'.
I created a table with the required variables (dummy code as follows):
db <- dbConnect(SQLite(), dbname = "test.sqlite")
dbSendQuery(conn = db,
            "CREATE TABLE TABLE1(
             MARKS INTEGER,
             ROLLNUM INTEGER,
             NAME CHAR(25),
             DATED DATE)"
)
However, I am stuck at how to import values into the created table.
I cannot use the INSERT INTO ... VALUES command, as there are thousands of rows and more than 20 columns in the original data file, and it is impossible to manually type in each data point.
Can someone suggest an alternative efficient way to do so?
You are using a scripting language. The whole point of it is literally to avoid manually typing each data point. Sorry.
You have two routes:
1: You have correctly opened a database connection and created an empty table in your SQLite database. Nice!
To load data into the table, load your text file into R using e.g. df <- read.table('textfile.txt', sep = '|') (modify the arguments to fit your text file).
To have a 'dynamic' INSERT statement, you can use placeholders. RSQLite allows both named and positional placeholders. To insert a single row, you can do:
dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (?, ?, ?);', params = list(1, 16, 'Big fellow'))
You see? The first ? got value 1, the second ? got value 16, and the last ? got the string Big fellow. Also note that you do not enclose placeholders for text in quotation marks (' or ")!
Now, you have thousands of rows, or just more than one. Either way, you can send in your data frame. dbSendQuery has some requirements: 1) each vector must have the same number of entries (not an issue when providing a data.frame), and 2) you may only submit the same number of vectors as you have placeholders.
I assume your data frame df contains the columns mark, roll, and name, corresponding to the table's columns. Then you may run:
dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (:mark, :roll, :name);', params = df)
This will execute an INSERT statement for each row in df!
TIP! Because an INSERT statement is executed for each row, inserting thousands of rows can take a long time: after each insert, data is written to file and indices are updated. Instead, enclose the inserts in a transaction:
dbBegin(db)
res <- dbSendQuery(db, 'INSERT ...;', params = df)
dbClearResult(res)
dbCommit(db)
and SQLite will write the data to a journal file, only saving the result when you execute dbCommit(db). Try both methods and compare the speed!
2: Ah, yes, the second way. This can be done entirely in SQLite.
With the SQLite command-line utility (sqlite3 from your command line, not R), you can attach a text file as a table and simply run an INSERT INTO ... SELECT ...; command. Alternatively, read the text file in sqlite3 into a temporary table and run an INSERT INTO ... SELECT ...;.
Useful site to remember: http://www.sqlite.com/lang.html
A little late to the party, but DBI provides dbAppendTable() which will write the contents of a dataframe to an SQL table. Column names in the dataframe must match the field names in the database. For your example, the following code would insert the contents of my random dataframe into your newly created table.
library(DBI)
db <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbExecute(db,
          "CREATE TABLE TABLE1(
           MARKS INTEGER,
           ROLLNUM INTEGER,
           NAME TEXT
          )"
)
df <- data.frame(MARKS = sample(1:100, 10),
                 ROLLNUM = sample(1:100, 10),
                 NAME = stringi::stri_rand_strings(10, 10))
dbAppendTable(db, "TABLE1", df)
I don't think there is a nice way to do a large number of inserts directly from R. SQLite does have a bulk insert functionality, but the RSQLite package does not appear to expose it.
From the command line you may try the following:
.separator |
.import your_file.csv your_table
where your_file.csv is the CSV (or pipe delimited) file containing your data and your_table is the destination table.
See the documentation under CSV Import for more information.

R/ROracle How to get dbListTables from specific SCHEMA

How do I get the list of table names from the database for a certain schema?
tabellen <- dbListTables(con, all=T)
gives all the tables in the database, but I would like to restrict it to one schema. I read in the ROracle package documentation that I can specify the schema like:
tabellen <- dbListTables(con, schema="K")
However, I get an empty character vector...
When I use the SQL command:
rs <- dbSendQuery(con, "SELECT * FROM ALL_TABLES WHERE OWNER ='K'")
data <- fetch(rs)
It works, but I get a data frame, not a list, which I would prefer. Is there a way to get the list of tables directly? [SOLVED] Too much programming... I wrote scheme instead of schema. Thanks for pointing it out; my bad, sorry for that.
Additionally, how can I get the column names for a certain chosen table? [NOT SOLVED]
Thanks for the help.
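For the unsolved part, one possible approach (a hedged sketch, not from the original thread) is to query Oracle's ALL_TAB_COLUMNS view over the same connection; 'MY_TABLE' is a placeholder for the table you chose:
rs <- dbSendQuery(con, "SELECT COLUMN_NAME FROM ALL_TAB_COLUMNS
                        WHERE OWNER = 'K' AND TABLE_NAME = 'MY_TABLE'
                        ORDER BY COLUMN_ID")
cols <- fetch(rs)$COLUMN_NAME  # a plain character vector of column names
dbClearResult(rs)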

Finding out the size of a Netezza table using UNIX SAS

What syntax/tables can be used to determine the size (GB) of a Netezza table? I am accessing via UNIX SAS (either ODBC or the libname engine). I assume there is a view which will give this info?
So you're interested in two system views: _v_obj_relation_xdb and _v_sys_object_dslice_info. The first (_v_obj_relation_xdb) contains the table information (name, type, etc.) and the second (_v_sys_object_dslice_info) contains the size-per-disk information. You probably want to take a look at both of those views to get a good idea of what you're really after, but the simple query would be:
select objname, sum(used_bytes) size_in_bytes
from _V_OBJ_RELATION_XDB
join _V_SYS_OBJECT_DSLICE_INFO on (objid = tblid)
where objname = 'UPPERCASE_TABLE_NAME'
group by objname
This returns the size of the table in bytes and I'll leave the conversion to GB as an exercise to the reader. There are some other interesting fields there so you might want to check out those views.
You could also use _v_sys_object_storage_size:
select b.objid,
       b.database as db,
       lower(b.objname) as tbl_nm,
       lower(b.owner) as owner,
       b.objtype,
       d.used_bytes / pow(1024, 3) as used_gb,
       d.skew,
       cast(b.createdate as timestamp) as createdate_ts,
       cast(b.objmodified as timestamp) as objmodified_ts
from _v_obj_relation_xdb b
inner join _v_sys_object_storage_size d
        on b.objid = d.tblid
       and lower(b.objname) = 'table name'
The size on disk (used_bytes) represents compressed data and includes storage for any deleted rows in the table.
The table rowcount statistic (reltuples) is generally very accurate, but it is just a statistic and not guaranteed to match the "select count(*)" table rowcount.
You can get this information via a catalog query:
select tablename, reltuples, used_bytes from _v_table_only_storage_stat where tablename = 'FOOBAR';
