My background is in data science with R, but in my current position I'm pulling data through Rails and ActiveRecord. I want to perform transformations on my data, create new columns, and save the result in a temporary way that allows me to continue querying it like a regular table, but without actually making changes to the database.
In R, this might look something like:
new_table <- old_table[old_table$date >= '2020-01-01', ]
new_table$average <- mean(new_table$value)
I would take this new_table and perform any number of queries I could have done to the old_table, and once I close my app I expect this temporary table to be removed as well.
This particular transformation is simple and wouldn't require a new table, but for example, there are a number of tables I'd like to join with my new_table. It would be easier if I could perform my transformations once and then join it, rather than joining the old_table and performing the transformation each time.
Since your question is vague I'll give a general answer that might not fit your use but it's a best guess at this point. There are numerous ways to use the DB connection in Rails to query directly, as referenced in the link in my comments above. But as an experiment I wanted to see if this would work and it does, at least with a project that is using Postgres. I wanted it to be DB agnostic so I'm avoiding calling the DB connection directly...
First create a temporary class in the Rails console:
rails c
Loading development environment (Rails...
class MyTempTable < ActiveRecord::Base
end
=> nil
EDIT:
In addition to the method below, you can also do this to create the table:
MyTempTable.find_by_sql('create temp table my_temp_tables AS select...')
This will create the temp table directly from a query. You could then use a join statement if you wanted data from more than one table in the new temp table, and you can add any additional columns you want.
End Edit
Now you have a class that will act like a table with the usual ActiveRecord methods. Rails now assumes there is a table in the DB called my_temp_tables (must be plural). You can then create a temp table (if your DBMS supports temp tables) like this:
MyTempTable.find_by_sql('create temp table my_temp_tables(col1, col2... ')
Now you have a temp table with the columns you want. You can then do SQL operations using
MyTempTable.find_by_sql('INSERT INTO my_temp_tables SELECT * FROM ....')
You can then treat MyTempTable like any other model in Rails. If you wanted all the columns from one table joined with some columns from another table, you can create the temp table as above; you just have to create all the columns first (at least in Postgres; in MSSQL you can probably create the temp table by inserting directly from a select with a join). If you are new to Rails you can grab column names by doing this on existing tables:
some_columns = SomeTable.column_names
=> ["id", "name", "serial", "purchased", ...]
Now you have an array of the column names so you don't have to type all of them. You can list out the columns you want from the various tables, cut and paste them into the create temp table... statement, then INSERT the joined data into MyTempTable.
If you do much of this regularly you'll probably want to keep a listing of all your column names in a text file. You can also create Rake tasks that do all of this and save the data to some format, or send it off to wherever it is supposed to go. That way you can have it all in a file that you can just run: it will create the temp tables and do the work, and when it closes out, the temporary classes and tables go away.
You might want to investigate some Ruby Gems, there are probably existing gems that do some of what you want. But as a proof of concept this works. You could also spin up a local Rails app and use scripting to import the data you want into tables, then just flush and recreate it at will.
Any Rails gurus that know of a better way, please add an answer or edit this one. This is mostly a thought experiment for me since I wanted to see if it was possible.
If you want to create views that you can access later on you could use a gem like https://github.com/scenic-views/scenic
Or something like this might be of interest: https://github.com/igorkasyanchuk/rails_db
Sounds like you're keen on the benefits of having some structure and tools available to work on the data, but don't want the data persisted in a db table.
Maybe use a model without a table like this.
I am looking for a way to format conditional values in batch instead of typing them manually. For example, I am filtering on 5-digit codes in SQL, and my source of the codes is a list in Excel. There can be hundreds of codes to add to a SQL WHERE clause. Is there a tool or formatting method that will take a list of values and format it with single quotes and comma separation?
From this:
30239
30240
30241
30242
To this:
'30239',
'30240',
'30241',
'30242',
...
Then, these formatted values can be pasted into the WHERE clause instead of manually typing all of this out. Again, this is for hundreds of values...
I used to use BrioQuery, which had functionality to import text files for use in filtering, but my current query tool, TOAD Data Point, does not seem to have this.
Thank you
Look into SQL*Loader. Create a staging table to contain the imported values. Use the loader to populate the staging table. Then modify your query to reference the staging table; it becomes something like:
Select ...
where target_column_name in (select column_name from stage_table).
The structure "where in ( select)" may not be the best for performance, but once loaded you will have all the facilities SQL offers at your disposal.
It has been a few years since I've used TOAD, but as I remember it has import functionality. There are other tools for loading data from Excel into Oracle. SQL*Loader just happens to be the one Oracle supplies with the RDBMS.
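If a staging table is more than you need, the quoting asked about in the question can also be done with a few lines of script. Here is a minimal Python sketch, assuming the codes are pasted one per line; the function name `format_in_list` and the column name `code` are made up for illustration:

```python
def format_in_list(raw: str) -> str:
    """Turn newline-separated values into a quoted, comma-separated list."""
    # Drop blank lines and stray whitespace copied along from the spreadsheet.
    codes = [line.strip() for line in raw.splitlines() if line.strip()]
    # Double any embedded single quote so the result stays valid SQL.
    quoted = ["'" + code.replace("'", "''") + "'" for code in codes]
    return ",\n".join(quoted)


codes_from_excel = """30239
30240
30241
30242"""

# "code" is a hypothetical column name; substitute your own.
print("WHERE code IN (\n" + format_in_list(codes_from_excel) + "\n)")
```

Keep in mind that Oracle limits an IN list to 1000 expressions, so for very long lists the staging table is probably still the safer route.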
I'm trying to reprocess ga_sessions_yyyymmdd data, but I'm finding that ga_sessions did not originally have a field called [channelGrouping], although more recent data does.
So my jobs work fine for the latest version of ga_sessions, but when I try to reprocess earlier ga_sessions data the job fails because the [channelGrouping] field is missing.
Usually that is what you want, but in this case it's not. I want to make sure I'm sticking to the latest ga_sessions schema, and I would like the job to just set missing columns to null for the dates when they did not exist.
Is there any way around this?
Perhaps I need to make an empty table called ga_sessions_template_latest and union it onto whatever ga_sessions_ daily table I'm handling. Maybe this would 'upgrade' the old ga_sessions to the new structure.
Attached is a screenshot of exactly what I mean (my union idea will actually be horrible due to nested fields in ga_sessions).
I don't have such a script yet. But since the tables are under your project you are able to update them. You can write a script and update the schema on all tables with missing columns from the most recent schema set.
I envision a script that gets the most recent table schema.
It then goes back through the past tables one by one, compares each schema with the latest one, identifies the missing columns, defines them as not required and nullable, adds them to the existing schema, and runs the update on the table. Data won't be modified; you will just have additional columns with null values.
You can also try this out on a few tables from the Web UI.
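The compare step of such a script might look roughly like the Python sketch below. It assumes each schema is available as a list of field dicts with "name", "type", "mode" and optional nested "fields" keys (the shape of a JSON schema file); the helper names and the toy schemas are made up for illustration, and actually applying the merged schema to the table is left out:

```python
import copy


def _relax(field):
    """Make a newly added field (and its children) non-REQUIRED."""
    if field.get("mode") == "REQUIRED":
        field["mode"] = "NULLABLE"
    for child in field.get("fields", []):
        _relax(child)


def add_missing_fields(old_fields, latest_fields):
    """Return a copy of old_fields extended with fields found only in latest_fields."""
    merged = copy.deepcopy(old_fields)
    by_name = {f["name"]: f for f in merged}
    for latest in latest_fields:
        existing = by_name.get(latest["name"])
        if existing is None:
            # Column is missing from the old table: add it as nullable.
            new_field = copy.deepcopy(latest)
            _relax(new_field)
            merged.append(new_field)
        elif latest.get("fields"):
            # Both schemas have this RECORD: recurse into its nested fields.
            existing["fields"] = add_missing_fields(
                existing.get("fields", []), latest["fields"]
            )
    return merged


# Toy schemas, heavily cut down; real ga_sessions schemas are much larger.
old_schema = [
    {"name": "visitId", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "totals", "type": "RECORD", "mode": "NULLABLE", "fields": [
        {"name": "visits", "type": "INTEGER", "mode": "NULLABLE"},
    ]},
]
latest_schema = [
    {"name": "visitId", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "totals", "type": "RECORD", "mode": "NULLABLE", "fields": [
        {"name": "visits", "type": "INTEGER", "mode": "NULLABLE"},
        {"name": "sessionQualityDim", "type": "INTEGER", "mode": "NULLABLE"},
    ]},
    {"name": "channelGrouping", "type": "STRING", "mode": "NULLABLE"},
]

merged_schema = add_missing_fields(old_schema, latest_schema)
print([f["name"] for f in merged_schema])
```

Because the function recurses into RECORD fields, it also picks up columns that were added inside nested structures, which is where a plain union gets awkward.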
I imported a table with thousands of Equipments. Then imported another table with types of equipments, which contain around 20 types.
When I wrote the cypher query below to associate them, Neo4j warned me about a cartesian product. Is there a better way to create the associations? Should I have done it during the CSV import?
MATCH (te:Equipment_Type),(e:Equipment)
WHERE te.type_id = e.type_id
CREATE (e)-[:TYPE_OF]->(te)
Update
I tried what Brian suggested, during the CSV import, and it worked like a charm.
Imported the Equipment Types first;
Then created an index on Equipment(type_id);
Modified the code to search during CSV import.
From Neo4j Console:
Added 100812 labels, created 100812 nodes, set 414307 properties,
created 100812 relationships, statement executed in 33902 ms.
The Code:
CREATE INDEX ON :Equipment(type_id)
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "http://localhost/Equipments.csv" AS row
MERGE (e:Equipment {eqp_id: row.eqp_id, name: row.name, type_id: row.type_id})
WITH e, row
MATCH (te:Equipment_Type)
WHERE te.type_id = row.type_id
CREATE (e)-[:TYPE_OF]->(te)
With the size of data that you're talking about it's not a big deal, especially if you have indexes on Equipment_Type:type_id and Equipment:type_id. It's warning you because a cartesian product in a query can seem quick when you first write it on a small dataset and then get slow quickly as you get more data.
But yes, creating the relationships during the CSV import would be the best way to approach it, probably.
I'd like to use flyway for a DB update in a situation where a DB already exists with production data in it. The problem I'm looking at now (and I have not found a nice solution yet) is the following:
There is an existing DB table with numeric IDs, e.g.
create table objects ( obj_id number, ...)
There is a sequence "obj_seq" to allocate new obj_ids
During my DB migration I need to introduce a few new objects, hence I need new object IDs. However, I do not know at development time what ID numbers these will be
There is a DB trigger which later references these IDs. To improve performance I'd like to avoid determining the actual IDs every time the trigger runs, and rather put the IDs directly into the trigger
Example (very simplified) of what I have in mind:
insert into objects (obj_id, ...) values (obj_seq.nextval, ...)
select obj_seq.currval from dual
-> store this in variable "newID"
create trigger on some_other_table
when new.id = newID
...
Now, is it possible to dynamically determine/use such variables? I have seen the flyway placeholders but my understanding is that I cannot set them dynamically as in the example above.
I could use a Java-based migration script and do whatever string magic I like - so, that would be a way of doing it, but maybe there is a more elegant way using SQL?
Many thx!!
tge
If the table you are updating contains only reference data, get rid of the sequence and assign the IDs manually.
If it contains a mix of reference and user data, you need to select the id based on values in other columns.
I need to modify a column in a SQLite database, but I have to do it programmatically because the database is already in production. From my research I have found that in order to do this I must do the following.
Create a new table with new schema
Copy data from old table to new table
Drop old table
Rename new table to old table's name
That seems like a ridiculous amount of work for something that should be relatively easy. Is there not an easier way? All I need to do is change a constraint on an existing column and give it a default value.
That's one of the better-known drawbacks of SQLite: ALTER TABLE has no MODIFY COLUMN support, and it's on the list of SQL features that SQLite does not implement.
edit: Removed the bit mentioning that it might be supported in a future release, as the page was updated to indicate that this is no longer the case
If the modification is not too big (e.g. change the length of a varchar), you can dump the db, manually edit the database definition and import it back again:
echo '.dump' | sqlite3 test.db > test.dump
then open the file with a text editor, search for the definition you want to modify and then:
cat test.dump | sqlite3 new-test.db
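The same dump, edit and reload round trip can be scripted if you'd rather not edit the file by hand. A minimal sketch using Python's built-in sqlite3 module, with an in-memory database and a made-up `people` table standing in for test.db; the helper name `redefine_column` is also just for illustration:

```python
import sqlite3


def redefine_column(src, old_def, new_def):
    """Dump src, swap one column definition in the SQL text, load it into a new DB."""
    # iterdump() yields the database as SQL statements, much like the shell's .dump.
    dump = "\n".join(src.iterdump())
    # The "manual edit" step: a plain text replacement on the dumped schema.
    edited = dump.replace(old_def, new_def)
    dst = sqlite3.connect(":memory:")  # stands in for new-test.db
    dst.executescript(edited)
    return dst


# Hypothetical source database standing in for test.db.
src = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name VARCHAR(10));
    INSERT INTO people (name) VALUES ('Ada');
""")

dst = redefine_column(src, "name VARCHAR(10)", "name VARCHAR(50) NOT NULL DEFAULT ''")
print(dst.execute("PRAGMA table_info(people)").fetchall())
print(dst.execute("SELECT id, name FROM people").fetchall())
```

The text replacement is as blunt as a manual search in an editor, so the old definition string has to be specific enough to match only the column you mean to change.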
As said here, these kinds of features are not implemented by SQLite.
As a side note, you could combine your first two steps with a create table with select:
CREATE TABLE tmp_table AS SELECT id, name FROM src_table
When I ran "CREATE TABLE tmp_table AS SELECT id, name FROM src_table", I lost all the column type formatting (e.g., a time field turned into an integer field).
As initially stated, it seems like this should be easier, but here is what I did to fix it. I had this problem because I wanted to change the Not Null setting on a column, and SQLite doesn't really help there.
I used the 'SQLite Manager' Firefox add-on (use whatever tool you like). I created the new table by copying the old create statement, making my modification, and executing it. Then to get the data copied over, I just highlighted the rows, R-click 'Copy Row(s) as SQL', replaced "someTable" with my table name, and executed the SQL.
Various good answers already given to this question, but I also suggest taking a look at the sqlite.org page on ALTER TABLE which covers this issue in some detail: What (few) changes are possible to columns (RENAME|ADD|DROP) but also detailed workarounds for other operations in the section Making Other Kinds Of Table Schema Changes and background info in Why ALTER TABLE is such a problem for SQLite. In particular the workarounds point out some pitfalls when working with more complex tables and explain how to make changes safely.
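For reference, the four steps from the question can be wrapped in a single transaction so the table is never left half-migrated. This is a rough Python sketch with a hypothetical `items` table whose `status` column gains NOT NULL and a default. It deliberately ignores indexes, triggers, views and foreign keys, which the sqlite.org procedure covers:

```python
import sqlite3


def rebuild_items(conn):
    """Change a column's constraint and default by rebuilding the table."""
    try:
        conn.executescript("""
            BEGIN;
            -- 1. Create a new table with the new schema.
            CREATE TABLE items_new (
                id     INTEGER PRIMARY KEY,
                name   TEXT,
                status TEXT NOT NULL DEFAULT 'active'
            );
            -- 2. Copy the data, filling the default in where the old value was NULL.
            INSERT INTO items_new (id, name, status)
                SELECT id, name, COALESCE(status, 'active') FROM items;
            -- 3. Drop the old table.
            DROP TABLE items;
            -- 4. Rename the new table to the old table's name.
            ALTER TABLE items_new RENAME TO items;
            COMMIT;
        """)
    except sqlite3.Error:
        # Undo the partial rebuild so the original table is left untouched.
        conn.rollback()
        raise


# Hypothetical starting point: status is nullable and has no default.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, status TEXT);
    INSERT INTO items (name, status) VALUES ('a', NULL), ('b', 'retired');
""")
rebuild_items(conn)
conn.execute("INSERT INTO items (name) VALUES ('c')")  # picks up the new default
print(conn.execute("SELECT id, name, status FROM items ORDER BY id").fetchall())
```

Writing the new CREATE TABLE out in full, rather than using CREATE TABLE ... AS SELECT, is what keeps the declared column types and constraints intact.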