VERTICA: changing projections

I need to change the super-projections on a table (they have the wrong sort order and segmentation).
Here is how I tried to do it.
Rename existing projections:
ALTER PROJECTION schema.table_b0 RENAME TO table_b0_2;
ALTER PROJECTION schema.table_b1 RENAME TO table_b1_2;
Create new projections:
CREATE PROJECTION schema.table
as select * from schema.table
order by ...
segmented by hash (...) all nodes;
Refresh:
select refresh('schema.table');
Drop old ones:
DROP PROJECTION table_b0_2;
DROP PROJECTION table_b1_2;
I guess it's almost what I want, but...
I get two projections with suffixes "_b0" and "_b1", but usually (if the table creation was done right) there are two projections with suffixes "_b0" and "_super". Why?
After creating the projections this way I can't drop the table without the CASCADE parameter in the DROP TABLE statement, so I end up with my projections as separate objects. Is there anything I can do to fix that (to create the projections as though they had been created by the right CREATE TABLE statement from the start)?

It's just a name. When Vertica creates a default superprojection, it names it _super. (I think this naming convention is new, though.) Default projections are not going to be optimal, and you'll want to replace them using DBD. I'm assuming you are already aware of b0 vs b1 and K-safety.
Default projections are objects you did not explicitly create. You can tell that these are default projections by looking at the projections view: they will say DELAYED CREATION. Since you did not explicitly create them, Vertica allows them to be dropped with the table. If, however, you explicitly create a projection by hand or using DBD, it will require that you drop those first or use CASCADE.
A couple of notes. First, the CREATE PROJECTION should use a projection name, not the table name; you'll get an "object already exists" error if they are the same. Second, before you drop the old projections, you may need to move the ancient history mark with select make_ahm_now();.
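Putting those notes together, the corrected sequence might look like this (a sketch: the projection name table_new and the ORDER BY / SEGMENTED BY columns are placeholders, and KSAFE 1 assumes a K-safe cluster):
CREATE PROJECTION schema.table_new -- its own name, distinct from the table
AS SELECT * FROM schema.table
ORDER BY col1, col2
SEGMENTED BY HASH(col1) ALL NODES KSAFE 1;
SELECT REFRESH('schema.table'); -- populate the new projection
SELECT MAKE_AHM_NOW(); -- advance the AHM so the old projections can be dropped
DROP PROJECTION schema.table_b0_2;
DROP PROJECTION schema.table_b1_2;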
Hope this helps.

Rails - Create and operate on a temporary table?

My background is in data science with R, but in my current position I'm pulling data through Rails and ActiveRecord. I want to perform transformations on my data, create new columns, and save the result in a temporary way that allows me to continue querying it like a regular table, but without actually making changes to the database.
In R, this might look something like:
new_table <- old_table[old_table$date >= '2020-01-01', ]
new_table$average <- mean(new_table$value)
I would take this new_table and perform any number of queries I could have done to the old_table, and once I close my app I expect this temporary table to be removed as well.
This particular transformation is simple and wouldn't require a new table, but for example, there are a number of tables I'd like to join with my new_table. It would be easier if I could perform my transformations once and then join it, rather than joining the old_table and performing the transformation each time.
Since your question is vague, I'll give a general answer that might not fit your use, but it's a best guess at this point. There are numerous ways to use the DB connection in Rails to query directly, as referenced in the link in my comments above. But as an experiment I wanted to see if this would work, and it does, at least with a project that is using Postgres. I wanted it to be DB agnostic, so I'm avoiding calling the DB connection directly...
First create a temporary class in the Rails console:
rails c
Loading development environment (Rails...
class MyTempTable < ActiveRecord::Base
end
=> nil
EDIT:
In addition to the method below, you can also do this to create the table:
MyTempTable.find_by_sql('create temp table my_temp_tables AS select...')
This will create the temp table directly from a query. You could then use a join statement if you wanted data from more than one table in the new temp table, and you can add any additional columns you want
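For instance, a hedged sketch of the join variant (the orders/prices tables and their columns are made up for illustration):
MyTempTable.find_by_sql(
  "CREATE TEMP TABLE my_temp_tables AS " \
  "SELECT o.id, o.name, p.price " \
  "FROM orders o JOIN prices p ON p.order_id = o.id"
)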
End Edit
Now you have a class that will act like a table with the usual ActiveRecord methods. Rails now assumes there is a table in the DB called my_temp_tables (it must be plural). You can then create a temp table (if your DBMS supports temp tables) like this:
MyTempTable.find_by_sql('create temp table my_temp_tables(col1, col2... ')
Now you have a temp table with the columns you want. You can then do SQL operations using
MyTempTable.find_by_sql('INSERT INTO my_temp_tables SELECT * FROM ....')
You can then treat MyTempTable like any other model in Rails. If you want all the columns from one table joined with some columns from another, you can create the temp table as above; you just have to create all the columns first (at least in Postgres; in MSSQL you can probably create the temp table by inserting directly from a select => join statement). If you are new to Rails, you can grab column names from existing tables like this:
some_columns = SomeTable.column_names
=> ["id", "name", "serial", "purchased", ...]
Now you have an array of the column names, so you don't have to type all of them. You can list out the columns you want from the various tables, cut and paste them into the create temp table... statement, then INSERT the joined data into MyTempTable.
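A hedged sketch of scripting that step instead of pasting by hand (SomeTable is from above; the WHERE clause is illustrative):
# Build the column list from an existing model instead of typing it out
cols = SomeTable.column_names - ["id"]
MyTempTable.find_by_sql(
  "CREATE TEMP TABLE my_temp_tables AS " \
  "SELECT #{cols.join(', ')} FROM some_tables WHERE purchased = true"
)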
If you do much of this regularly, you'll probably want to keep a listing of all your column names in a text file. You can also create Rake tasks that do all of this and save the data to some format, or send it off to wherever it is supposed to go. That way you can have it all in a file that you can just run; it will create the temp tables and do the work, and when it closes out, the temporary classes and tables go away.
You might want to investigate some Ruby Gems, there are probably existing gems that do some of what you want. But as a proof of concept this works. You could also spin up a local Rails app and use scripting to import the data you want into tables, then just flush and recreate it at will.
Any Rails gurus that know of a better way, please add an answer or edit this one. This is mostly a thought experiment for me since I wanted to see if it was possible.
If you want to create views that you can access later on you could use a gem like https://github.com/scenic-views/scenic
Or something like this might be of interest: https://github.com/igorkasyanchuk/rails_db
Sounds like you're keen on the benefits of having some structure and tools available to work on the data, but don't want the data persisted in a db table.
Maybe use a model without a table like this.
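A minimal sketch of that idea, assuming Rails 5.2+ where ActiveModel::Attributes is available (the class and attribute names are illustrative):
# A tableless model: ActiveModel structure and type casting, no DB table
class TempReport
  include ActiveModel::Model
  include ActiveModel::Attributes

  attribute :date,  :date
  attribute :value, :float
end

report = TempReport.new(date: "2020-01-01", value: 42.0)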

filter pushdown using spark-sql on map type column in parquet

I am trying to store my data in a nested way in parquet, using a map type column to store complex objects as values.
Could somebody let me know whether filter pushdown works on map type columns or not? For example, below is my SQL query:
`select measureMap['CR01'].tenorMap['1M'] from RiskFactor where businessDate='2016-03-14' and bookId='FI-UK'`
measureMap is a map with a String key and a custom data type as value, containing 2 attributes - a String and another map of String, Double pairs.
I want to know whether pushdown will work on the map or not, i.e. if the map has 10 key-value pairs, will Spark bring the whole map's data into memory and create the object model, or will it filter out the data depending on the key at the I/O read level?
I also want to know whether there is any way to specify the key in the where clause, something like - where measureMap.key = 'CR01'?
The short answer is no. Parquet predicate pushdown doesn't work on MapType columns or on nested parquet structures.
The Spark Catalyst optimizer only understands top-level columns in the parquet data. It uses the column type, column data range, encoding, etc. to finally generate the whole-stage code for the query.
When the data is in a MapType format, it is not possible to get this information from the column. You could have hundreds of key-value pairs inside a map, which makes a predicate pushdown impossible with the current Spark infrastructure.
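If the filter keys are known up front, one workaround (a sketch: RiskFactorFlat and cr01_1m are made-up names, and CREATE TABLE ... USING assumes Spark 2.x+) is to promote them to top-level columns at write time, where pushdown does apply:
CREATE TABLE RiskFactorFlat USING parquet AS
SELECT
  businessDate,
  bookId,
  measureMap['CR01'].tenorMap['1M'] AS cr01_1m, -- promoted, pushdown-friendly copy
  measureMap -- keep the full map for other lookups
FROM RiskFactor;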

EMC Documentum DQL - How to delete repeating attribute

I have a few objects created on my database and I need to delete some of the repeating attributes related to them.
The query I'm trying to run is:
UPDATE gemp1_product objects REMOVE ingredients[1] WHERE (r_object_id = '08015abd8002cd68')
But all I get is the following error message:
Error querying database.
[DM_QUERY_E_UPDATE_INDEX]error: "UPDATE: Unable to REMOVE the attribute ingredients at index 1."
[DM_OBJECT_W_DELETE_ATTR_POSITION_ERROR]warning: "attempt to delete non-existent attribute 88"
Object 08015abd8002cd68 exists and I can see it in the database. Queries like SELECT and DELETE work fine, but I do not want to delete the whole object.
There is no easy way to do this. The reason is that repeating attributes are ordered, to enable multiple repeating attributes to be synchronized for a given object.
Either
set the attribute value to be empty for the given position, and change your code to discard empty attributes, or
use multiple DQL statements to shuffle the order so that the last one becomes empty, or
change your data model, e.g. use a single attribute as a property bag with pre-defined delimiters.
Details (1)
UPDATE gemp1_product OBJECTS SET ingredients[1] = '' WHERE ...
Details (2)
For each index, first find the value at index+1:
SELECT ingredients
FROM gemp1_product
WHERE (i_position*-1)-1 = <index+1>
ENABLE (ROW_BASED)
Use the value in a new query:
UPDATE gemp1_product OBJECTS SET ingredients[1] = '<value_from_above>' WHERE ...
It should also be possible to do this by nesting DQL somehow, but it might not be worth the effort.
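Details (3)
A sketch: ingredient_bag would be a new single-value attribute and '|' an arbitrary delimiter you parse in your code:
UPDATE gemp1_product OBJECTS SET ingredient_bag = 'flour|sugar|salt' WHERE (r_object_id = '08015abd8002cd68')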
Something is either wrong with your query or with your repository. I think you are either mistyping your attribute name or using the wrong index in your UPDATE query.
If you google for DM_OBJECT_W_DELETE_ATTR_POSITION_ERROR you'll find a bit more detailed explanation:
CAUSE: Program executed a DeleteAttr operation that specified an non-existent attribute position (either a negative number or a number larger than the number of attributes in the object).
From this you could guess that the type isn't in a consistent state, or that you are trying to remove too big an index of your repeating attribute, etc. Did you check your repository with the Consistency Checker job and other similar jobs?
As for removing a repeating property (attribute) value with a DQL query, this is unachievable with a single query, since you need to specify the index position, which you don't know up front. Writing a simple script, or doing it manually if there isn't a big number of values to delete, is the way to go.
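For example, a minimal IAPI sketch of that manual route (the object ID and attribute come from the question; the index is whatever position you want removed):
retrieve,c,gemp1_product where r_object_id='08015abd8002cd68'
remove,c,l,ingredients[1]
save,c,l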

Data from Multiple Data Sources in One Column in Grid

I've been thrown quite the scenario today. Essentially, I have one table (ProjTransPosting) that houses records, and that table relates to a number of similarly structured tables (ProjCostTrans, ProjRevenueTrans, etc). They relate by TransId, but each TransId will relate to only one of the child tables (meaning if a TransId of 137 exists in ProjCostTrans, there cannot be a TransId of 137 in ProjRevenueTrans). The schemas of the child tables are identical.
So my original thought was to create a Map and create the mappings from the various child tables, and then use this Map as a datasource in the form so everything can show up in one column. I created all the relationships between the Map and the child tables, along with the relation to the parent table. I put the Map in the form as a datasource, and this caused a blank Grid, although I don't know why. Is it the case that the Map object can only be of one table type at any given time? I thought the purpose of this was that it could be universal and act as a buffer for many record types. I'd like to pursue this route, as this definitely would achieve what I'm looking for.
Failing this, I was forced to arrange my Data Source to perform something like this: SELECT ProjTransPosting LEFT JOIN ProjCostTrans LEFT JOIN ProjRevenueTrans ... The problem with this is that each child table I add on creates additional columns, and the values of the other columns are all NULL (blank in AX). So I have something like this:
Parent.TransId  ChildA.Field  ChildB.Field  ChildC.Field
1               NULL          1256          NULL
2               1395          NULL          NULL
3               NULL          4762          NULL
4               NULL          NULL          1256
Normally, the user would deal with the annoyance of having the extra columns show up, but they also want to be able to filter on the fields in all the child tables. In my example above, they want to filter on "1256" and have the results return TransIds 1 and 4, but since the values are spread across multiple columns, the user obviously cannot do this.
Ideally the Map would "combine" these columns into one and then the user could filter easily on it. Any ideas on how to proceed with this?
Try creating a union query and then a view based on that query.
Maps are supposed to be used only in X++, and not as data sources in forms.
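A hedged X++ sketch of building such a union query (the table names follow the question; in practice you would define the query and a view over it in the AOT, then use the view as the form data source):
// Union the similarly structured child tables into one result set
Query                query = new Query();
QueryBuildDataSource qbds;

query.queryType(QueryType::Union);
qbds = query.addDataSource(tableNum(ProjCostTrans));
qbds = query.addDataSource(tableNum(ProjRevenueTrans));
qbds.unionType(UnionType::UnionAll); // keep duplicate rows; omit for UNION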
This sounds like the exact purpose of table inheritance in AX 2012.
http://msdn.microsoft.com/en-us/library/gg881053.aspx
When to use:
http://msdn.microsoft.com/en-us/library/gg843731.aspx
EDIT: Adding my comments here to make this a fuller answer.
Let's say you have three tables TabPet, TabPetCat, TabPetDog, where TabPet is the supertype table and the others are descendants.
If you insert two records each into TabPetCat and TabPetDog (4 total), they will all have unique RecIds. Let's say TabPetCat gets 5637144580 and 5637144581, and TabPetDog gets 5637144582 and 5637144583.
If you open TabPet, you will see 5637144580, 5637144581, 5637144582, and 5637144583.
So what you would do is make your table ProjTransPosting the supertype and then ProjCostTrans, ProjRevenueTrans, etc descendant tables. Unless transId is really necessary, you could just get rid of it and only use RecId.
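Once the hierarchy is in place, a quick X++ sketch of how one select against the supertype sees every descendant's rows:
// One query over the supertype returns the cost, revenue, etc. rows together
ProjTransPosting trans;

while select trans
{
    info(strFmt("%1", trans.RecId));
}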

Determine flyway variables from earlier SQL step

I'd like to use Flyway for a DB update, in a situation where a DB already exists with productive data in it. The problem I'm looking at now (and I have not found a nice solution yet) is the following:
There is an existing DB table with numeric IDs, e.g.
create table objects ( obj_id number, ...)
There is a sequence "obj_seq" to allocate new obj_ids
During my DB migration I need to introduce a few new objects, hence I need new object IDs. However, I do not know at development time what ID numbers these will be.
There is a DB trigger which later references these IDs. To improve performance I'd like to avoid determining the actual IDs every time the trigger runs, and rather put the IDs directly into the trigger.
Example (very simplified) of what I have in mind:
insert into objects (obj_id, ...) values (obj_seq.nextval, ...)
select obj_seq.currval from dual
-> store this in variable "newID"
create trigger on some_other_table
when new.id = newID
...
Now, is it possible to dynamically determine/use such variables? I have seen the Flyway placeholders, but my understanding is that I cannot set them dynamically as in the example above.
I could use a Java-based migration script and do whatever string magic I like - so that would be a way of doing it, but maybe there is a more elegant way using SQL?
Many thx!!
tge
If the table you are updating contains only reference data, get rid of the sequence and assign the IDs manually.
If it contains a mix of reference and user data, you need to select the id based on values in other columns.
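A sketch of the first option in Oracle syntax (the literal 1001, the trigger and second table names, and the empty body are placeholders):
-- Reference data gets a hand-assigned, stable ID...
insert into objects (obj_id) values (1001);

-- ...so the migration can safely hard-code it in the trigger
create or replace trigger trg_some_other_table
before insert on some_other_table
for each row
when (new.id = 1001)
begin
  null; -- real logic here
end;
/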
