Duplicate column issue in migration scripts in a .NET microservice module: I have to resolve a duplicate column in a migration that has already executed (ASP.NET)

In our .NET Core microservices, another team works on open source modules, and I extend their modules in my project. I had already added a column to an entity, and then the same column was added by the open source team. Now a duplicate column error is being thrown.
I cannot alter the open source migration files, and my column is already in production.
How can I resolve this issue? Please suggest.

I think we are dealing with multiple smells here.
Each microservice should have its own data realm.
By the sound of it, you extended a table of an entity that is generated by the open source module.
But that does not help you now; the question is whether you can merge the data that is in this column.
If so, you might get away with a simple copy/update.
What you need to do is the following (a hedged migration sketch follows below):
Create a new column with a different name.
Copy your existing data into the new column.
Drop your column.
Execute their migration.
Copy your data back into the column.
Test your application very carefully to confirm that the logic you implemented for your generated data still works as intended.
Drop your backup column.
Depending on the amount of data this will cause downtime, so plan ahead and have a rollback strategy ready in case something goes wrong.
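Here is a minimal sketch of the first three steps as an EF Core migration, assuming you are on EF Core and assuming a hypothetical Orders table with a conflicting Status column; the names, types, and ordering have to be adapted to your real schema, and the copy-back happens in a later migration that runs after the open source one.

using Microsoft.EntityFrameworkCore.Migrations;

public partial class BackupConflictingColumn : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        // 1. Create a backup column with a different name.
        migrationBuilder.AddColumn<string>(
            name: "Status_Backup",
            table: "Orders",
            nullable: true);

        // 2. Copy the existing data into the backup column.
        migrationBuilder.Sql("UPDATE Orders SET Status_Backup = Status;");

        // 3. Drop your own column so the open source migration can add it again.
        migrationBuilder.DropColumn(name: "Status", table: "Orders");
    }

    protected override void Down(MigrationBuilder migrationBuilder)
    {
        // Rollback: restore the original column from the backup.
        migrationBuilder.AddColumn<string>(name: "Status", table: "Orders", nullable: true);
        migrationBuilder.Sql("UPDATE Orders SET Status = Status_Backup;");
        migrationBuilder.DropColumn(name: "Status_Backup", table: "Orders");
    }
}

After the open source migration has re-created the column, a follow-up migration copies Status_Backup back into Status, and a final one drops the backup column once everything has been verified.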
Personal opinion on how to prevent those smells
Every time I have needed an open source project in one of my own projects, I wrote a wrapper around it. This has multiple benefits.
For one, if the API of the project changes, you only have to update it in exactly one place, which improves maintainability.
Because it sits behind a wrapper, it automatically gets its own database, and if I need to extend an entity that comes from one of those open source projects, I usually do it via a foreign key to a separate table, which then gets linked via a view (a rough sketch follows below).
Yes, this costs some performance, but in the end it has been worth it every time.
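As an illustration only, here is a minimal sketch of that extension approach, assuming a hypothetical open source Product entity that I never touch and my own ProductExtension table that points back to it by ID; all names and mapping details are made up.

using Microsoft.EntityFrameworkCore;

// Entity owned by the open source module; not modified here.
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// My own extension table, kept in the wrapper's database.
public class ProductExtension
{
    public int Id { get; set; }
    public int ProductId { get; set; }        // logical FK back to the open source entity
    public string MyCustomField { get; set; }
}

public class WrapperDbContext : DbContext
{
    public DbSet<ProductExtension> ProductExtensions => Set<ProductExtension>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // One extension row per product; readers see the combined shape
        // through a database view that joins the two tables.
        modelBuilder.Entity<ProductExtension>()
            .HasIndex(e => e.ProductId)
            .IsUnique();
    }
}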

Related

Version control of big data tables (Iceberg)

I'm building Iceberg tables on top of a data lake. These tables are used by reporting tools. I'm trying to figure out the best way to version and deploy changes to these tables in a CI/CD process. E.g. I would like to add a column to an Iceberg table. To do that I have to write an ALTER TABLE statement, save it to the git repository, and deploy it via the CI/CD pipeline. The tables are accessible via the AWS Glue Catalog.
I couldn't find much info about this on Google, so if anyone could share some knowledge, it would be much appreciated.
Cheers.
Agree with @Fokko Driesprong. This is a supplement only.
Sometimes, table changes are treated as part of task version changes. That is, the table change statements (ALTER TABLE) are bound to task upgrades.
Tasks are sometimes deployed automatically, so the pipeline often executes the table change statement first and then deploys the new task. If the change is disruptive, we need to stop the old task first and then deploy the new one.
Corresponding to the upgrade, we also keep a rollback script and, of course, the corresponding table change statement to undo it.
Thanks for asking this question. I don't think there is a definitive way of doing this. In practice I see most people bundling the schema change as part of the job that writes to the Iceberg table. This way you can make sure that new columns are populated right away by the new version of the job. If you don't make any breaking changes (such as deleting a column), the downstream jobs won't break. Hope this helps!

Can we compare the contents of two folders in Spotfire?

I have two environments: one is development and the other is production. Let's say I have a folder in production that has all my metadata, like information links (ILs), joins, data sources (DS), analyses, scripts, etc. In development I have the same folder, but with new enhancements made to it.
Now I want to compare the two to see what changes have been made, so that from the result I can understand the impact.
So could you please tell me what the way is to compare those two folders across the development and production environments?
For the requirements posted here, you can create an information link on top of the LIB_ITEMS table to fetch library item details from the Spotfire database. A different set of activities is performed at this link, but the approach can be used for your requirements as well.

How to avoid code duplication for non-data structures (views, stored procedures etc)

My project contains a lot of objects like views and stored procedures that change quite frequently. Right now I have to create a new SQL script for every update, containing the complete source code of the changed objects even though I've actually changed only a few lines. This leads to massive code duplication, and I also find it difficult to review these changes.
I'd like to have only one current version of the SQL script for every object (view or procedure) and recreate these objects every time I redeploy the database. As a result I could change the existing source file (as in Java or C programming) instead of creating a new update script every time I need to alter a view or procedure.
Is there a way to execute certain scripts every time I migrate the database with Flyway?
I'm not sure why this got so many downvotes; it's a perfectly understandable and valid question. Perhaps it's because it closely resembles this open question:
Migrating Stored Procedures with Flyway
We are actually starting to run up against this issue now. We've been using Flyway for development and testing (and love it), but we've come to a point where we're starting to have to use procs/triggers/views (p/t/v's), and the fundamental disconnect between how we did it before and how we must do it with Flyway is starting to be a strain.
Before, for a given database object (let's say it's a procedure), there'd be one source file. And if you needed to change the proc 'n' times, there would be 'n' versions of the same file in your VCS. Diff tools work great, IDEs all understand this, merges detect when two developers working in separate branches change the proc, etc. You know, old school.
But with Flyway, any one proc with 'n' changes is now scattered across 'n' files. Instead of "one object in one file with 'n' versions", you have "one object in 'n' files with one change each". I now need to do a text search in my IDE for any instance of "proc_name" if I want to know the history of changes to the proc; the VCS knows nothing about it. Devs can each create a migration in their own branch that succeeds when deployed on its own, but together they leave the proc missing an update.
I'm not saying any of this to complain about Flyway, and I fully realize it's not a simple area. I'd almost say it's unsolvable (by Flyway).
We're scheming how to handle this problem, and I'd be very interested to know how others have handled it.
Repeatable migrations are supported as of Flyway 4.0.
Just add SQL files whose names start with "R" and contain no version information to your migration folder:
R__Blue_cars.sql
You have to ensure that the script can be applied repeatedly.
This is usually done with "CREATE OR REPLACE" clauses in your DDL statements.
https://flywaydb.org/documentation/migration/repeatable

Is ADO.NET Entity Framework (with ASP.NET MVC v2) a viable option when writing custom and constantly updated websites?

I've just finished going through the MvcMusicStore tutorial found here. It's an excellent tutorial with working source code. One of my favorite MVC v2 tutorials so far.
That tutorial is my first introduction to using ADO.NET Entity Framework and I must admit that most of it was really quick and straight-forward. However, I am worried about maintainability. How customizable is this framework when the customer requests additional features to their site that require new fields, tables and relationships?
I am very concerned about not being able to efficiently execute customer's change orders because the Entity models are basically drag-and-drop, computer generated code. My experience with code generators is not good. What if something goes haywire in the guts of the model and I'm unable to put humpty-dumpty back together?
In the long run, I wonder if using hand typed models which human-beings can read and edit is a more efficient course than using Entity Framework.
Has anyone worked enough with entity framework to say that they are comfortable using it in a very fluid development environment?
I have been using Entity Framework (v1.0) for about a year in my current project. We have hundreds of tables, all added to the edmx.
Problems we face (though I'm not sure whether the newer Entity Framework resolves these issues):
When you are used to the VS.NET IDE, you will be used to doing all drag/drop operations from the IDE. The problem is, once your edmx hosts hundreds of tables, the IDE really stalls and you have to wait 3-4 minutes before it becomes responsive.
With so many tables, any edits you make to the edmx take a long time.
When you use version control, comparing a 10,000-line XML file is quite painful. Think about merging two branches, each with a 10,000-line edmx, plus the tables, new associations between tables, and deleted associations, and going back and forth comparing XML. You will need a good XML comparison tool if you are serious about merging two big edmx files.
For performance reasons we had to make the csdl, msl and ssdl embedded resources.
Your edmx has to be in sync with your DB all the time; or at least, when you try to update the edmx, it will try to sync and might throw some obscure errors if they are out of sync.
Be aware that your entities (tables/views) should always have a primary key, else you will get obscure errors. See my other question here.
Things we did, or that I might consider in the future, when using EF:
Use multiple edmx files, with one edmx per group of tables that are logically grouped/linked together. Be aware that if you do this, each edmx should live in its own namespace. If you try to add two related tables (say person and address) to two edmx files in the same namespace, you will get a compiler error stating that the foreign key relationship is already defined. (Tip: create a folder and create the edmx under this folder. If you try to alter the namespace in the edmx without having the folder, it does not save the namespace properly the next time you open/edit it.)
Fewer tables in an edmx => a less heavy container => good.
Fewer tables in an edmx => easier to merge when merging two branches.
Be aware of the fact that the object context is not thread safe.
Your repository (or whatever DAO you use) should be responsible for creating and disposing the container it creates (a sketch follows below). Using DI frameworks, especially in a web app, complicated things for us. Web requests are served from the thread pool, and the containers were not disposed properly after the web request was served because the thread itself was not disposed. The container got reused (when the thread was reused) and created a lot of concurrency issues.
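As a rough sketch of that last point, assuming a hypothetical generated context MyEntities with an Orders entity set (both names made up), the important part is that the repository creates the context itself and disposes it when the unit of work ends, rather than letting a container keep one alive per thread.

using System;
using System.Collections.Generic;
using System.Linq;

public class OrderRepository : IDisposable
{
    // One context per repository instance; never shared across threads.
    private readonly MyEntities _context = new MyEntities();

    public List<Order> GetOrdersForCustomer(int customerId)
    {
        return _context.Orders
                       .Where(o => o.CustomerID == customerId)
                       .ToList();
    }

    public void Dispose()
    {
        // The context lives and dies with the repository.
        _context.Dispose();
    }
}

The repository itself is then wrapped in a using block scoped to a single web request, so the context is guaranteed to be disposed when the request ends.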
Don't trust your VS IDE. Get a good XML editor and know how to edit the edmx file (though you don't need to edit the designer). Get your hands dirty.
ALWAYS ALWAYS ALWAYS (I just cannot emphasize this enough) run a SQL profiler (and I mean on each and every step of your code) when you execute your queries. As complex as the query might look, you will be surprised to find how many times you hit the DB. Example:
var myOrders = from t in context.Table
               where t.CustomerID == 123
               select t;                      // query not executed yet (deferred execution)
if (myOrders.Count() > 0)                     // DB query to find the count
{
    var firstOrder = myOrders.First();        // second DB query to get the first result
}
Better approach:
// query materialized, just one hit to the DB because of ToList()
var myOrders = (from t in context.Table
                where t.CustomerID == 123
                select t).ToList();
if (myOrders.Count > 0)                       // no DB hit, the list is already in memory
{
    // do something
    var myOrder = myOrders[0];                // no DB hit
}
Know when to use tracking and when to use no-tracking (for read-only scenarios); web apps do a lot more reads than writes. Set this up properly when you initialize your container (see the sketch below).
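For illustration, a small hedged sketch against a hypothetical generated ObjectContext called MyEntities with an Orders set; in the ObjectContext-based EF versions discussed here, read-only queries can opt out of change tracking through MergeOption.NoTracking (the later DbContext-based API uses AsNoTracking() instead).

using System.Data.Objects;   // MergeOption lives here in EF 1.0/4.0
using System.Linq;

using (var context = new MyEntities())
{
    // Read-only query: skip change tracking to save memory and CPU.
    context.Orders.MergeOption = MergeOption.NoTracking;

    var orders = context.Orders
                        .Where(o => o.CustomerID == 123)
                        .ToList();
}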
Did I forget compiled queries? Look here for more goodies (a short sketch follows below).
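As a sketch only, again assuming the hypothetical MyEntities context and Orders set, a compiled query is built once with CompiledQuery.Compile and reused with different parameter values, which saves the repeated cost of translating the LINQ expression.

using System;
using System.Data.Objects;   // CompiledQuery (ObjectContext-based EF)
using System.Linq;

public static class OrderQueries
{
    // Compiled once, reused for every call; only the parameter changes.
    public static readonly Func<MyEntities, int, IQueryable<Order>> ByCustomer =
        CompiledQuery.Compile((MyEntities context, int customerId) =>
            context.Orders.Where(o => o.CustomerID == customerId));
}

// Usage:
// var orders = OrderQueries.ByCustomer(context, 123).ToList();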
When getting thousands of rows back from your DB, make sure you use IQueryable and detach the objects from the ObjectContext so that you don't run out of memory.
Update:
Julie Lerman addresses the same problem with a similar solution. Her post also points to Ward's work on dealing with a huge number of tables.
I'm not too familiar with Entity Framework, but I believe it simply generates an EDM file which can be hand-edited. I know I've done this quite frequently with the Linq-to-SQL DBML files that the designer generates (it's often faster to hand-edit them than use the designer for small tweaks).
You know, I'd be interested if any developers could provide some insight into this.
The Entity Framework examples I've seen only seem to involve about ten to twenty tables, which is really quite small scale.
What about using the EF on a database with hundreds or even a thousand tables?
Personally, I know several developers and organisations that were burned by LINQ-to-SQL and are holding off for a year or so to see what direction EF takes.
Starting from Entity Framework 4 (with Visual Studio 2010), the generated code is output from T4 (Text Template Transformation Toolkit) files, which you can edit, so you have full control over what is generated. See Oleg Sych's blog, which is a mine of information about T4. Code generation is not a problem, and T4 opens up so many possibilities that I can't live without it anymore.
I'm currently working on a project where we use Entity Framework 4 for the data access layer and Scrum as the agile project management method. From one sprint to another there are several tables added, others modified, and new requirements added. Once you have run into each potential EF problem at least once (like knowing that default values from the database are not persisted by default in the .edmx file, or that changing a nullable column to a non-nullable one and updating the designer doesn't change the mapped property's state), you're good to go.
Edit: to answer your question, it's EF 4 whose code generation is based on T4, rather than T4 supporting EF. On EF 3.5 (or EF 1.0 if you prefer), you could in theory use T4 by writing the templates from scratch, parsing the EDMX file in the T4 code and generating your entities. It would be quite a lot of work considering all of this is already done by EF 4. Plus, Entity Framework 3.5 only supports one type of entity, whereas EF 4 has built-in or downloadable templates for POCO entities (which don't know anything about persistence), self-tracking entities, and so on.
As for Entity Framework itself, I think it was lacking many features in its first release and, while usable, was quite frustrating to work with. EF4 is much improved. It still lacks some basic features (like enum support), but it has become my data access layer of choice.

Core Data pre-populated SQLite issue, Z_METADATA

I have an issue with Core Data. I am creating a Core Data-based application that populates a UITableViewController in one of the tabs. Basically, I have read somewhere that there is an issue with providing a pre-populated SQLite file to be used to load up the data. I created a pre-populated data file and at first had issues with Z_METADATA and other anomalies like that. If we are creating our own SQLite file, is there something we have to include, such as certain table names, etc.?
Note: I didn't create the application with "Use Core Data for storage" ticked at the beginning, so I'm not sure if that makes a difference.
Doron, take a look at A Blog On Tech for a really great example of how to do what you are trying to do. Basically, it's best to let Xcode create the base SQLite DB for you, copy it to your code directory, pre-populate your data there, and then finally add it to the project through Xcode.
So while it is possible to retrofit Core Data into an application that you haven't set up with it from the beginning in Xcode, it is much easier to start from there.
