I am uploading several files to an Alfresco repository via WebDAV. The batch process works fine, but after the upload, all dates in the repository are changed to the current date.
How can I make it keep and show the original file dates (creation and modification)?
Thanks.
You can leverage metadata extractors, whose main purpose is to extract metadata from binary files during upload. There are lots of built-in metadata extractors; just look for implementers of the interface org.alfresco.repo.content.metadata.MetadataExtracter. Several extractors can extract the creation date and set it as cm:created on the Alfresco node.
You can enable metadata extraction by applying it as a rule on a space; look for the action named Extract Common Metadata in the actions drop-down box when creating the rule.
I don't believe it's possible without the importing code explicitly turning off the default behaviour of the "cm:auditable" policy, and I suspect the WebDAV driver doesn't do this, since it has no way of knowing whether that would be appropriate - there are cases where forcing the creation and modification dates to today is the correct thing to do.
This behaviour is discussed in some detail here - it might be worth evaluating whether the bulk filesystem import tool is a more appropriate way to import the content into Alfresco, particularly since it can preserve the creation and modification dates if you tell it to (i.e. by specifying the values of those properties).
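For completeness, here is a rough sketch of what "turning off that behaviour" can look like in custom import code, assuming the standard Alfresco foundation services (BehaviourFilter, NodeService) are injected and that nodeRef and the original dates come from your own import logic; it is not something the WebDAV driver does for you:

    import java.util.Date;
    import org.alfresco.model.ContentModel;
    import org.alfresco.repo.policy.BehaviourFilter;
    import org.alfresco.service.cmr.repository.NodeRef;
    import org.alfresco.service.cmr.repository.NodeService;

    // Sketch only: preserve original creation/modification dates on an imported node.
    public class DatePreservingImport {
        private BehaviourFilter behaviourFilter; // injected Spring beans; setters omitted
        private NodeService nodeService;

        public void applyOriginalDates(NodeRef nodeRef, Date originalCreated, Date originalModified) {
            // Switch off cm:auditable handling for this node so our values are not overwritten.
            behaviourFilter.disableBehaviour(nodeRef, ContentModel.ASPECT_AUDITABLE);
            try {
                nodeService.setProperty(nodeRef, ContentModel.PROP_CREATED, originalCreated);
                nodeService.setProperty(nodeRef, ContentModel.PROP_MODIFIED, originalModified);
            } finally {
                // Re-enable auditing so later edits behave normally.
                behaviourFilter.enableBehaviour(nodeRef, ContentModel.ASPECT_AUDITABLE);
            }
        }
    }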
I'm trying to obtain a dependency graph (either as an image or in text form) from a bazel cquery. According to the documentation, the option --output=graph is currently only supported by bazel query, not by cquery. Unfortunately, in our project it's not possible to use query, since it fetches some external dependencies with restricted access; only using a config (with cquery) prevents those restricted dependencies from being fetched.
Is there a work-around to somehow get a graph-like structure from cquery? The default output is just a flattened list which seems to contain no information on the inter-dependencies between the targets.
If the inter-dependencies can somehow be printed, I guess it would be quite easy to reconstruct an image from it.
The following works: use query instead of cquery and append the flag --keep_going to ignore the errors caused by external dependencies that cannot be fetched by everyone. Then --output=graph can be used.
Note: The result might be different from a configured cquery, but for our purposes, it doesn't matter much.
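Concretely, the invocation would look something like this (the target label is just a placeholder):

    # query (instead of cquery) supports --output=graph; --keep_going tells bazel
    # to continue past the external dependencies that cannot be fetched by everyone.
    bazel query 'deps(//some/package:target)' --keep_going --output=graph > graph.dot

    # Optionally render the graph with Graphviz, if it is available:
    dot -Tpng graph.dot -o graph.png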
I have a Plone instance which contains some structures that I need to copy to a new Plone instance (and much more which should not be copied). Those structures are document trees ("books" of Archetypes folders and documents) which use resources (e.g. images and animations, referenced by UID) outside those trees, in a separate structure which of course also contains lots of resources not needed by the trees to be copied.
I have already tried copying the whole data and deleting the unneeded parts, but this takes very (!) long, so I'm looking for a better way.
Thus, the idea is to traverse my little forest of document trees and transfer them and the resources they need (sparsely rebuilding that separate structure) to the new Plone instance. I have full access to both of them.
Is there a suggested way to accomplish this? Or should I export all of them, including the resources structure, and delete all unneeded stuff afterwards?
I have found that each time I do this type of migration by hand, I make mistakes that force me to do it again.
OTOH, if migration is automated, I can run it, find out what I did wrong, fix the migration, and do it all over again until I am satisfied.
In this context, to automate your migration, I advise you to look at collective.transmogrifier.
I recommend jsonmigrator, which is a twist on the collective.transmogrifier approach mentioned by Godefroid. See my blog post on it here.
You can even use it to migrate from Archetypes to Dexterity types (you just need matching field names and, roughly speaking, matching types).
Trying to select only the resources to import will be tricky though. Perhaps you can find a way to iterate through your document trees and "touch" (in a unix sense) any resource that you are using, then copy across only the resources whose "timestamp" indicates that they have been touched.
I know you can use touch to create a new empty file.
I just learned that touch can also be used to update the access and modification times of a file. I don't quite know in what situations, and why, you would need to update the access and modification times of a file, i.e. what the usefulness of this particular function is.
Thanks!
Some utilities depend on the timestamps of files.
For example, make uses timestamps to decide whether it needs to do something (usually build a target), by comparing the timestamps of the source files with those of the outputs (executables, object files, ...).
By touching a source file and then running make, you can force a rebuild.
In addition, touch has a -d option that can fake the modification time.
If one "knows what he's doing" she can avoid long build time, due to unnecessary re-compilations.
For example, when adding a declaration to a common header file,
that does not change any old API, one can fake the header real modification time,
and bypass Makefile's dependencies.
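For illustration (the file names are just placeholders):

    # Make the source file look newer than its outputs, forcing make to rebuild it.
    touch main.c
    make

    # Backdate a header after an API-neutral edit so make skips dependent rebuilds.
    # The exact date format accepted by -d depends on your touch implementation.
    touch -d "2020-01-01 00:00" common.h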
I have a bunch of XSD Files which I did not write myself. The files sometimes import each other:
<xs:import namespace="http://www.mysite.com/xmlns/xXX-YYYY/V" schemaLocation="http://www.mysite.com/xmlns/xXX-YYYY/V/schema_A.xsd"/>
and I would like to get an overview of the dependencies without having to read through all of them.
The URI specified by schemaLocation does not exist; instead, a catalog.xml file is used to resolve the schema locations.
http://de.wikipedia.org/wiki/XML_Catalogs
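For reference, a catalog entry that redirects such a schemaLocation to a local copy might look roughly like this (the local path is only illustrative):

    <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
      <system systemId="http://www.mysite.com/xmlns/xXX-YYYY/V/schema_A.xsd"
              uri="schemas/schema_A.xsd"/>
    </catalog>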
Can anybody recommend a tool that can visualize the dependencies of my schemas by also processing the information given in the catalog.xml file?
Thanks
Mischa
To follow up on my comment...
I am not aware of any tool that takes into account OASIS catalog files. Have a look at this response, see if it supports what you need (and your platform).
Strictly speaking, there are a number of issues with dependency diagrams, which is why such a question should be qualified with why you want one.
Some think a diagram truly shows the dependencies between XSD files; that is not true: it may show what the author thinks the dependencies are, but not what the processor actually agrees to. schemaLocation is just a hint that processors may or may not use: "may not" if they're instructed otherwise (well-known XSDs could be cached internally, through catalog entries or any other proprietary "catalogs"), or because the processor may decide there is no need to load an external reference when there's no use for it anyway (which can happen in some corner cases).
A diagram built from the explicit schema locations is definitely easier to produce. It only shows what the author intended; that doesn't mean it is the "real" one (content may be pulled in indirectly, which makes the whole XSD set valid, while individual XSDs, opened independently of the set, would be invalid).
Trying to build a diagram where dangling or non-existent schemaLocation values are overridden through a catalog is much harder, due to the multitude of ways to structure the content and the resolution mechanism. It would have the same shortcoming as the one above (except that now the author in question is the author of the catalog file, rather than whoever authored the XSDs).
The "true" dependency graph can be built by traversing a schema set that has already been loaded and compiled. Even then, you would still need to define criteria for dependencies due to substitutable components (elements in substitution groups, or derived types used through the xsi:type attribute). That is even harder.
Take a look at this tool: DocFlex/XML XSDDoc.
It is an XML schema documentation generator.
It doesn't visualize XSD dependencies, but it does work with XML catalogs.
The overview of each XSD file lists all other XSD files referenced from it (i.e. imported, included or redefined), and there is also an opposite list of the schemas that reference the given one.
So you can use it to figure out which XSD files depend on which. At least, that will be easier than reading raw XSD files.
As an example, here is documentation generated with that tool:
XML Schemas for DITA 1.1. It was generated essentially from two files:
http://docs.oasis-open.org/dita/v1.1/OS/schema/ditaarch.xsd
http://docs.oasis-open.org/dita/v1.1/OS/schema/catalog.xml
ditaarch.xsd is the schema driver that pulls all other schemas (25 in total); catalog.xml is the XML catalog, via which all file references are resolved.
What is specified in the schemaLocation attributes of those schemas themselves is just opaque URIs.
I'm looking for the best solution to allow our users to upload XLS spreadsheets so that they can be used to populate tables in our data warehouse (DW).
Our users are heavy Business Objects (BO) users, and BO lets you export to XLS. When they have data in a spreadsheet that needs to be loaded into the DW, they need a process to upload the data in the XLS to the DW's database. As a result, we end up with many of these "interfaces", when I think what we really need is a programmatic, automated feed. Using Excel as a data source for inter-system feeds just seems, in my gut, like a bad idea to me.
Question #1: I'd like to see if you agree and why or why not.
OK, there is no swimming against that tide, so I now take as a given that XLS uploads are here to stay for us. Now I need to find the best solution. First, I'll explain what we do now and then what I don't like about it:
Via web pages, we provide empty XLS files (no rows) with a defined set of columns. Each file is intended to be used to update a different target destination table. In each spreadsheet is an "upload" button. Pushing the Upload button results in the macro in the spreadsheet serializing the contents of the file to CSV and FTPing the data to a server folder. Periodically, a scheduler fires off an Informatica ETL job that uses the CSV file as input and loads the data into a custom XLS-specific staging table and then, if the records pass edits, into the appropriate target table. Any errors encountered are logged to an error table. For each XLS file uploaded, the data ends up in a separate staging and error table that is specific to the file.
Some of the things I don't like about our process are:
1) The macro code in the XLS is too exposed (it includes passwords, for example), can be tampered with, and there are issues ensuring that the users are using the latest XLS templates.
2) Business rule edits are placed in the ETL program, where they should probably be, but because we would like to catch the errors ASAP, i.e. in the spreadsheet, edits are also added to the macro code. This results in duplication of business edits. I want these rules in one place and centrally controlled. IMHO, putting any macro code in the XLS introduces a maintenance issue, even calls to stored procedures (some of which we have) or calls to web services (we haven't yet tried to call .NET web services from XLS macros).
3) Every XLS file upload template has its own process with a distinct set of staging and error tables and a custom screen for reporting the errors encountered. It seems like we need a more generalized, re-usable solution.
Besides often getting data exported to XLS from BO, the users also like Excel because it is easier for editing a large number of records and less clunky than editing individual records via a web interface.
This is the general direction that I am thinking:
First, I want the users to have the ease of editing that Excel provides, but without embedding macros in the spreadsheet. I experimented with Farpoint's Grid with Excel compatibility...
http://www.fpoint.com/netproducts/spreadweb/tour/excel.aspx
...and I found that it was quite easy to let a user open an XLS file that resides on their PC, have it open in a browser, and easily access the data from server-side .NET web code. Excel isn't running locally in their browser, but the functionality of Excel is reproduced, presumably through a lot of client-side scripting that I expect would be a real pain to duplicate myself. You can even cut and paste from a local spreadsheet into the web spreadsheet. This sounds great, but my biggest problem is cost. Our company is near death and won't allow us to purchase any new software.
Next, I want to identify the common components across all spreadsheet upload processing and come up with generic processing code. For example, I imagine a table which defines each of our spreadsheets and their formats, including the column names and data type definitions, perhaps in terms of their destination columns instead of hard coding. Based on this template definition table, I can generate the XLS templates for download, and I can also perform simple generic edits to ensure that the data entered matches the table definition. One common web page can be used to present the data, report data-type mismatch errors, and allow the user to correct them. I would also define a common "staging" table for storing the uploaded data - a narrow table keyed by submission # and row number, with name and value columns, perhaps. No more "custom everything" is the goal.
Next I need to decide where to put the business rules. My department's management firmly believes that all loading of data should be done by Informatica ETL batch processes, and that therefore the rules/edits belong "in Informatica". I have zero experience with the Informatica tools; I am more of a .NET guy. I am therefore unsure how these rules are implemented, but I suspect they are not reusable in the sense of being callable from a .NET web page to validate a particular record. You see, in some cases, when the user is not performing a bulk upload, they do have the ability to edit a specific record, and I would like the same edits that are applied by the ETL bulk insert process to be applied to an individual update of a single record via a web page. Is the solution to write a single web service or stored procedure that can be called either from the web page updating a single record, or thousands of times, once for each record in a bulk upload? The latter sounds inefficient.
Your thoughts on anything above would be very much welcomed.
From a cost perspective, the effort you'd need to go through to re-create spreadsheet functionality on the web will exceed the cost of Farpoint or other controls. Even if you made $20 an hour, do you think you could complete a working product in under 2 weeks? I think you have the facts on your side in your discussion of the maintenance issues if you allow ETL functionality to exist in Excel - you have twice the amount of work to maintain the transformation rules. I think you need to convince management that in order to create a maintainable, robust solution you need some flexible utilities.
Farpoint is a good choice. There is also SpreadsheetGear, a .NET engine that interprets Excel macros and can run on a web server. It has a Win32 control that allows you to create a WinForms solution with very Excel-like interface functionality. Last time I checked there was no web control for the product. It does an excellent job of providing Excel capability for processing large amounts of data.
Good luck. I think you will find a good solution, since you seem to have a good grasp of the pros and cons of all the different potential solutions.