How the Alfresco repository handles a document upload

I would like to understand how the Alfresco repository works whenever a document is uploaded. How exactly are the content stored in the file system, the metadata in the DB, and the indexes in Solr?

You will have to dive deep into the Alfresco documentation, and the code as well, to get all of these details.
Alfresco stores documents as physical files in a structured directory layout (and you can define the repository folder structure in your code as well). The CMIS services expose all the operations you can perform on documents, and CMIS internally caches data in the CMIS session for better performance.
Document metadata and node references are all stored in the database (e.g. PostgreSQL), and in recent versions documents are indexed automatically by Solr.
You specify the fields you want in the Solr document for indexing. Searching documents in Solr is faster than querying the database, but query consistency is eventual for Solr. So, depending on your use case, you have to decide whether to query Solr or the database.
Whenever a CRUD operation is performed on a document, it is reflected in the database first, and the document is then indexed asynchronously in Solr. This asynchronous indexing is what leads to the eventual consistency.
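
To make the flow concrete, here is a minimal sketch of a document upload through CMIS, using the Apache Chemistry OpenCMIS client. The URL, credentials, file name and content here are placeholder assumptions for a local Alfresco install, not part of the question:

import java.io.ByteArrayInputStream;
import java.util.HashMap;
import java.util.Map;
import org.apache.chemistry.opencmis.client.api.Document;
import org.apache.chemistry.opencmis.client.api.Folder;
import org.apache.chemistry.opencmis.client.api.Session;
import org.apache.chemistry.opencmis.client.runtime.SessionFactoryImpl;
import org.apache.chemistry.opencmis.commons.PropertyIds;
import org.apache.chemistry.opencmis.commons.SessionParameter;
import org.apache.chemistry.opencmis.commons.data.ContentStream;
import org.apache.chemistry.opencmis.commons.enums.BindingType;
import org.apache.chemistry.opencmis.commons.enums.VersioningState;

Map<String, String> params = new HashMap<>();
params.put(SessionParameter.USER, "admin");        // assumption: demo credentials
params.put(SessionParameter.PASSWORD, "admin");
params.put(SessionParameter.ATOMPUB_URL,
        "http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/atom");
params.put(SessionParameter.BINDING_TYPE, BindingType.ATOMPUB.value());
Session session = SessionFactoryImpl.newInstance().createSession(params);

Folder root = session.getRootFolder();
Map<String, Object> props = new HashMap<>();
props.put(PropertyIds.OBJECT_TYPE_ID, "cmis:document");
props.put(PropertyIds.NAME, "hello.txt");          // metadata -> database

byte[] bytes = "hello".getBytes();                  // content -> file system content store
ContentStream content = session.getObjectFactory().createContentStream(
        "hello.txt", bytes.length, "text/plain", new ByteArrayInputStream(bytes));

// The new node is visible to DB-backed lookups immediately;
// Solr picks up the change and indexes it asynchronously.
Document doc = root.createDocument(props, content, VersioningState.MAJOR);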

Related

How to keep the generation number when copying a file from one GCS bucket to another

I'm using a GCS bucket for WordPress (the wp-stateless plugin).
After creating and uploading a media file to a bucket, I copy it to another bucket (a duplicate). But the generation number of each object changes (seemingly at random).
My question is: how can I keep the generation number in the destination bucket the same as in the source bucket?
Thanks in advance.
Basically, there is no official way to keep the same version and generation numbers when copying files from one bucket to another. This is working as intended (WAI), and it is intuitive: the generation number refers to a specific object residing in a specific bucket. When you copy it to another bucket, it is not the same object (it's a copy), so it cannot keep the same generation number.
I can think of a workaround: keep your own record of the objects' versions somewhere, and then make an organized copy through the API. This would mean dumping the bucket, but you would need a list of all the objects and their versions and then add them in sequential order (which sounds like a lot of work). You could keep your own versioning (or mirror the original versioning) in the metadata of each object.
If your application depends on the objects' versioning, I would recommend using custom metadata. Basically, if you do your own versioning in custom metadata, that metadata is kept intact when the objects are copied to a new bucket.
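
As a sketch of that workaround with the google-cloud-storage Java client (the bucket and object names here are hypothetical), you could record the source object's generation as custom metadata while copying:

import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.util.Map;

Storage storage = StorageOptions.getDefaultInstance().getService();
Blob source = storage.get(BlobId.of("source-bucket", "media/image.png"));

// Carry the source generation along as custom metadata on the copy.
BlobInfo target = BlobInfo.newBuilder(BlobId.of("dest-bucket", source.getName()))
        .setContentType(source.getContentType())
        .setMetadata(Map.of("source-generation", String.valueOf(source.getGeneration())))
        .build();

storage.copy(Storage.CopyRequest.newBuilder()
        .setSource(source.getBlobId())
        .setTarget(target)
        .build()).getResult();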
There is already a feature request about this, but it is marked as currently infeasible.
However, you can raise a new feature request here.

Firestore schema migration between projects

I have a Firebase project which basically has two environments: Staging and Production. The way I organized them is by creating separate Firebase projects. Each of my projects uses Firebase Cloud Functions and Firestore. In addition, each project is associated with a specific Git branch. Both branches are integrated into a CI/CD pipeline in Google Cloud Build.
So, in order to make it absolutely clear, I will share a simple diagram:
As you can see, I have the source code for the cloud functions under source control and there's nothing to worry about there. The issue comes in when I have the following situation:
A Firestore schema change is present on Staging
Cloud function (on Staging) is adjusted to the new schema.
Merge staging branch into production.
Due to the old Firestore schema on production, the new functions there won't work as expected.
To work around it, I need to manually go to the production Firestore instance and adjust the schema there (with the risk of messing up production data).
Ideally, I would have that operation automated, with the existing project data adjusted to the new schema dynamically after the merge.
Is that possible somehow? Something like migrations in .NET Core.
Cloud Firestore is schema-less - documents have no enforced schema. Code is able to write whatever fields it wants at any time that it wants. (For web and mobile clients, this is gated by security rules, but for backend code, there are no restrictions.) As such, there is no such thing as a formal migration in Cloud Firestore.
The "schema" of your documents is effectively defined by your code that reads and writes those documents. This means that migrating a data to a new format means that you're going to have to write code to perform the required changes. There is really no easy way around this. All you can really do is design your updates so that they are not disruptive to existing code when it comes time to move them to another environment. This means your code should be resilient to breaking changes, or simply do not perform breaking changes until after all code has been updated to deal with those changes.
You can use Google Cloud to download an archive of the Firestore data, run a migration script yourself on the archive, and then upload the archive to restore your Firestore database.
https://cloud.google.com/firestore/docs/manage-data/export-import
Google Cloud gives you a lot of command line access for managing your Firestore service.
# manage indexes
gcloud firestore indexes
# export all data to a bucket
gcloud firestore export gs://[BUCKET_NAME]
# import data from a bucket
gcloud firestore import gs://[BUCKET_NAME]/[filename]
# manage admin "functions" currently running (i.e. kill long processes)
gcloud firestore operations
To download/upload your archive from Google Cloud buckets:
# list files in a bucket
gsutil ls gs://[BUCKET_NAME]
# download
gsutil cp gs://[BUCKET_NAME]/[filename] .
# upload
gsutil cp [filename] gs://[BUCKET_NAME]/[filename]
Once you have set up Google Cloud to be accessible from your build scripts, it is possible to automate data migration scripts that download, transform and then upload the data.
It's recommended to maintain a "migrations" document in your Firestore so you can track which revision of the migration still needs to be run.
To avoid heavy migration tasks, try adding a "version" property to documents, and then use the converter callbacks on the query builder to mutate the data to the latest schema on the client side. While this won't help you handle changes in Firestore rules or functions, it is often the easier route for tiny changes that are mostly cosmetic.
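
The converter callbacks are a feature of the web/mobile client SDKs; on the server side the same version-gated upgrade can be done by hand. A sketch with the Firestore Java client, where the collection, fields and the v1-to-v2 change are all hypothetical:

import com.google.cloud.firestore.DocumentSnapshot;
import com.google.cloud.firestore.Firestore;
import java.util.HashMap;
import java.util.Map;

static final long CURRENT_VERSION = 2;

// Read an invoice and upgrade it in memory to the latest schema version.
// Assumes the document exists.
static Map<String, Object> readInvoice(Firestore db, String id) throws Exception {
    DocumentSnapshot snap = db.collection("invoices").document(id).get().get();
    Map<String, Object> data = new HashMap<>(snap.getData());
    Long version = snap.getLong("version");
    if (version == null || version < CURRENT_VERSION) {
        // Hypothetical v1 -> v2 change: a flat "amount" becomes "net" + "tax".
        Number amount = (Number) data.getOrDefault("amount", 0);
        data.put("net", amount.doubleValue());
        data.put("tax", 0.0);
        data.put("version", CURRENT_VERSION);
        // Optionally write the upgraded shape back to migrate lazily:
        // db.collection("invoices").document(id).set(data).get();
    }
    return data;
}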

Search for a just-created node in Alfresco (Java)

For one requirement, I need to query a document that has just been created. If I use a Lucene search, the indexing takes a few seconds, so the document may not appear in the search results.
The query will be executed from an Alfresco web script or from a scheduler that runs every 5 seconds.
Right now I am doing it by using NodeService and finding the child by name, which is not an efficient way to do it. I am using the Java API.
Is there any other way to do it?
Thank you
You don't mention what version of Alfresco you are using, but it looks like you are using Solr.
If you just created the document, the recommendation is to keep the reference to it, so you don't have to search for it again.
However, sometimes it is not possible to have the document reference. For example, client1 is not aware that client2 just created a document. If you are using Alfresco 4.2 or later, you can probably enable Transactional Metadata Queries (TMQ), which let you run searches against the database, so there is no Solr latency. Please review the whole section, because you need to comply with four conditions to use TMQ (see the query sketch after the list):
Enable the TMQ patch, so the nodes properties tables get indexed in the database.
Enable searches using the database whenever possible (TRANSACTIONAL_IF_POSSIBLE).
Make sure that you use the correct query language (CMIS, AFTS, db-lucene, etc.)
Your query must be supported by TMQ.
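
For example, a DB-backed (transactional) query from the Java foundation API might look like the following sketch, assuming an injected SearchService bean, Alfresco 4.2+ with TMQ enabled, and a placeholder file name:

import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.repository.StoreRef;
import org.alfresco.service.cmr.search.QueryConsistency;
import org.alfresco.service.cmr.search.ResultSet;
import org.alfresco.service.cmr.search.SearchParameters;
import org.alfresco.service.cmr.search.SearchService;

SearchParameters sp = new SearchParameters();
sp.addStore(StoreRef.STORE_REF_WORKSPACE_SPACESSTORE);
sp.setLanguage(SearchService.LANGUAGE_CMIS_ALFRESCO);
sp.setQuery("SELECT * FROM cmis:document WHERE cmis:name = 'just-created.txt'");
sp.setQueryConsistency(QueryConsistency.TRANSACTIONAL);  // force the DB, no Solr latency

ResultSet results = searchService.query(sp);
try {
    for (NodeRef nodeRef : results.getNodeRefs()) {
        // a node created in a committed transaction is visible here immediately
    }
} finally {
    results.close();
}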

Is there any open source folder/browser/creator for Azure Blob Storage?

We are looking to create a document management area that models a standard folder structure on top of Azure Blob Storage, in an MVC3 application.
E.g.
Users can create folders
Users can upload documents to folders
Users can list directory contents, etc.
Users can delete documents
Users can download documents
Now, I appreciate that Azure Blob Storage only has containers, and the rest is faked by slashes creating paths. However, this seems like the sort of functionality someone else must already have built.
I did some searching but couldn't find anything. Basically, I'm after something like CloudXplorer or Azure Storage Explorer, but web based.
Does anyone know of any Azure Blob Storage implementation with a web front end (ideally MVC)?
You can start by looking at BlobShare, an MVC application which allows you to upload, download, view, share, ... blobs.
However, for the sorting part you'll need to build something yourself. I would personally consider using Windows Azure Caching (Preview) to do this: whenever you access a 'directory', cache the blobs in that 'directory' and do the sorting based on the data in the cache (the same applies to paging). Then use something like Service Bus Topics/Queues to refresh the cache whenever someone adds/deletes/renames/... a blob (plus a timeout for directories not accessed in X minutes).
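
To illustrate the "folders faked by slashes" part: listing one level of a virtual directory is just a delimiter-based listing. A sketch with the Azure Storage v12 Java client (an MVC3 stack would use the analogous .NET calls; the container name, prefix and environment variable are hypothetical):

import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobContainerClientBuilder;
import com.azure.storage.blob.models.BlobItem;

BlobContainerClient container = new BlobContainerClientBuilder()
        .connectionString(System.getenv("AZURE_STORAGE_CONNECTION_STRING"))
        .containerName("documents")
        .buildClient();

// List one "directory" level: items where isPrefix() is true are virtual folders.
for (BlobItem item : container.listBlobsByHierarchy("invoices/2023/")) {
    if (Boolean.TRUE.equals(item.isPrefix())) {
        System.out.println("folder: " + item.getName());
    } else {
        System.out.println("file:   " + item.getName());
    }
}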
I also found this sample: http://peterkellner.net/2010/11/12/azure-storage-treeviewer-directory-browser/. It isn't web based, but it creates a basic tree structure and displays it. There is also a related discussion here: http://social.msdn.microsoft.com/Forums/en-US/windowsazuredata/thread/256cfc0f-bccc-4bf7-b7eb-cb7c7aca0c8a
See my answer over here https://stackoverflow.com/a/65944680/7988162 on how to use the Storage Explorer across all platforms.
Hierarchical blob storage now exists through Data Lake Storage Gen2. Check out the documentation here.

How to enable iCloud support for SQLite?

I want to provide iCloud support for my wrapper around SQLite. It does not use Core Data.
I wonder how to enable iCloud for it. The database content changes all the time (it's for invoicing). Also, if it's possible to have some kind of versioning, that would be great.
Is there any sample I can use to do this?
The short answer is no: you would need to use Core Data, as you suspected. Apple has stated that direct use of SQLite is unsupported.
Edit: check out the section on iCloud that is now in the iOS Application Programming Guide, under Using iCloud in Conjunction with Databases:
Using iCloud with a SQLite database is possible only if your app uses Core Data to manage that database. Accessing live database files in iCloud using the SQLite interfaces is not supported and will likely corrupt your database. However, you can create a Core Data store based on SQLite as long as you follow a few extra steps when setting up your Core Data structures. You can also continue to use other types of Core Data stores—that is, stores not based on SQLite—without any special modifications.
You can't just put the SQLite database in the iCloud container, because it might get corrupted. (As you modify an SQLite DB, temporary files are created and renamed, so if the sync process starts copying those files, you'll get a corrupt database.)
If you don't want to move to Core Data, you can do what Core Data does: store your database in your documents folder, and store a transaction log in the iCloud container. Every time you change the database, you append those changes to a log file, so you can play them back and make the equivalent changes on other devices.
This gets pretty complicated: aside from getting the log/replay logic right, you'll want to coalesce redundant changes and periodically collapse the log into a complete copy of the database.
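
To make the log/replay idea concrete, here is a minimal, platform-neutral sketch, shown in Java with plain JDBC since the mechanism itself is not iOS-specific (the file names, the SQL, and the connection string are hypothetical). A real implementation would also track which log entries each device has already applied and coalesce redundant changes, as noted above:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.sql.Connection;
import java.sql.Statement;
import java.util.List;

public class ChangeLog {
    // Apply a change to the local DB and append it to the shared log in one step.
    // db would come from e.g. DriverManager.getConnection("jdbc:sqlite:app.db").
    static void applyAndLog(Connection db, Path log, String sql) throws Exception {
        try (Statement st = db.createStatement()) {
            st.executeUpdate(sql);
        }
        Files.writeString(log, sql + "\n",
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // On another device: replay every logged statement against the local copy.
    static void replay(Connection db, Path log) throws Exception {
        List<String> statements = Files.readAllLines(log);
        try (Statement st = db.createStatement()) {
            for (String sql : statements) {
                if (!sql.isBlank()) st.executeUpdate(sql);
            }
        }
    }
}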
You might have an easier time developing a solution if you can exploit knowledge of your application (Core Data has to solve the problem in the general case). For example, you could save invoices as separate files in the cloud container (text, property list, XML, JSON, whatever), writing them out as the database changes and only importing them when the system tells you they were created or changed.
In summary, your choice is either to migrate to Core Data or write a sync solution yourself. Which one is best depends on the particulars of your application.
