I have a Plone site with a lot of data in it and I would like to query the database for usage statistics; e.g. how many calendars have more than one entry, how many blogs per group have entries after a given date, etc.
I want to run the script from the command line... something like this:
bin/instance [script name]
I've been googling for a while now but can't find out how to do this.
Also, can anybody provide some help on how to get user-specific information, such as when a user last logged in and which items they created?
Thanks!
Eric
In general, you can query the portal_catalog to locate content by searching various indexes. See http://plone.org/documentation/manual/developer-manual/indexing-and-searching/querying-the-catalog and http://docs.zope.org/zope2/zope2book/SearchingZCatalog.html for an introduction to the catalog.
In some cases the built-in indexes will allow you to do the query you want. In other cases you may need to write some Python to narrow down the results after doing an initial catalog query.
If you put your querying code in a file called foo.py, you can run it via:
bin/instance run foo.py
Within foo.py, you can refer to the root of the database as 'app'. The catalog would then be found at app.site.portal_catalog, where 'site' is the id of your Plone site.
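For example, a minimal foo.py might look like this (just a sketch - the site id 'site', the portal_type and the date are placeholders you would adjust to your own site):
# foo.py -- run it with: bin/instance run foo.py
# 'app' is provided as a global by "bin/instance run" and is the Zope root.
from DateTime import DateTime

portal = app.site                     # replace 'site' with the id of your Plone site
catalog = portal.portal_catalog

# example: count documents modified after a given date
results = catalog(portal_type='Document',
                  modified={'query': DateTime('2012/01/01'), 'range': 'min'})
print('%d documents modified since 2012-01-01' % len(results))

# narrow down further in Python when the built-in indexes are not enough
for brain in results[:10]:
    print('%s (last modified %s)' % (brain.getPath(), brain.modified))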
Finding information about users happens via a separate API (for the Pluggable Auth Service). I'd suggest asking a separate question about that.
Hi and thanks in advance. I want to delete a folder from Google Cloud Storage, including all the versions of all the objects inside. That's easy when you use gsutil from your laptop (you can just use the folder name as a prefix and pass the flag to delete all versions/generations of each object).
...but I want it in a script that is triggered periodically (for example while I'm on holiday). My current ideas are Apps Script and Google Cloud Functions (or Firebase Functions). The problem is that in these cases I don't have an interface as powerful as gsutil; I have to use the REST API, so I cannot say something like "delete everything with this prefix", nor "all the versions of this object". Thus the best I can do is:
a) List all the objects with a given prefix. So for prefix "myFolder" I receive:
myFolder/obj1 - generation 10
myFolder/obj1 - generation 15
myFolder/obj2 - generation 12
... and so on for hundreds of files and at least 1 generation/version per file.
b) For each file/generation, delete it by giving the complete object name plus its generation.
As you can see, that seems like a lot of work. Do you know of a better alternative?
Listing the objects you want to delete and deleting them is the only way to achieve what you want.
The only alternative is to use Object Lifecycle Management, which can delete objects for you automatically based on conditions, if those conditions cover your requirements.
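That said, if you end up in a Cloud Function you don't have to hand-roll the REST calls: the google-cloud-storage Python client can do the list-and-delete loop for you. A rough sketch (the bucket name and prefix are placeholders, and it assumes application default credentials):
# Delete every generation of every object under a "folder" prefix.
from google.cloud import storage

BUCKET = 'my-bucket'      # placeholder
PREFIX = 'myFolder/'      # placeholder "folder" to wipe

client = storage.Client()
bucket = client.bucket(BUCKET)

# versions=True returns one entry per generation of each object
for blob in bucket.list_blobs(prefix=PREFIX, versions=True):
    # deleting with an explicit generation removes that specific version
    bucket.delete_blob(blob.name, generation=blob.generation)
    print('deleted %s (generation %s)' % (blob.name, blob.generation))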
I want to know which tables are being read by a query.
for each Customer where CustomerID = 12345.
Eventually this customer will be found in this example, but Progress must 'read' many tables before getting to customer 12345.
How do I know exactly which tables are read (by CustomerID) prior to getting to customer 12345?
*NOTE: I do not have access to modify the code being run for this selection. Ideally I would run a separate set of code that is executed at the same time as the customer query above to track the reads.
EDIT: To put it more clearly - can you track reads from a given program (.p) or process ID and output either a RECID or the primary key to a file?
I understand the information is being read off the disk and probably stored in a database buffer. So how would I get at the information in the database buffer?
You seem to be mixing up a few different things.
In a situation like your example, where you FIND a specific record in one, and only one, table, there is just a single record read. Progress will find that record by first scanning a relevant index. That might be 2 or 3 "logical reads" of the b-tree to get to the proper node. The record block and index blocks may, or may not, be read from disk - that depends on what has happened previously.
There are "Virtual System Tables" available that can tell you how many READ operations take place against a particular table or index. But they do not trace the specific ROWID or other identifying data. _TableStat and _IndexStat are aggregates for all users on the system, _UserTableStat and _UserIndexStat are specific to a particular user's activity. You do need to set the -tablerangesize and -indexrangesize parameters adequately to take advantage of these.
If you have enabled the table and index statistics then you can use a tool like ProTop - http://protop.wss.com to get insight into this activity. Or you can write your own code.
OpenEdge Auditing does not track reads. That would be prohibitively expensive.
It's probably not really a good idea but, in theory, you could write FIND triggers for the tables you are interested in. That doesn't require access to the application source but you would need a development license. It will probably kill performance to do this though - so unless this is a non-production test environment that you just want to fiddle with I wouldn't really do that.
You mention wanting to know how you got to that point. That sounds more like you might need to have a "4gl trace". One easy way to get the stack trace of a running process is to execute:
$DLC/bin/proGetStack PID (UNIX)
or
%DLC%\bin\proGetStack PID (Windows)
This command will generate a "protrace.pid" file containing a 4gl stack trace and other interesting information.
There are also more complicated ways to get that info like using PROMON and the "client statement cache" or setting various log entry types at session startup. But proGetStack is pretty convenient and requires no code or scripting changes.
Some great options from Tom above, and all of them may be relevant to you. The one he only skirts around is the logging options, and I feel obliged to expand on this because I'm giving a talk on it in a couple of weeks!
Assuming you are running a modern version of Progress, or even 10.2B08, you have client logging available to you. Start your session with these additional options:
-clientlog "\somefolder\somefile.txt"
-logentrytypes "QryInfo:3"
This will log the details of all the queries in your session to the file you specified above. If you navigate to the point in the system where you want to analyse your query, then empty and save the logfile, you can run the offending query and see all the detail you need.
The output tells you all sorts of useful information, including the number of reads on each table compared with the number of records returned to the user. You also get the index that was selected.
Using Tom's advice and/or this will get you what you need.
I know I can set up multiple namespaces for DoctrineCacheBundle in the config.yml file. But can I use one driver with multiple namespaces?
The case is that in my app I want to cache all queries for all of my entities. The problem is flushing the cache on create/update actions: I want to flush only part of my cached queries. My app is used by multiple clients, so when a client updates something in his data, for instance in the Article entity, I want to clear the cache only for this client and only for Article. I could add proper IDs to each query and remove them manually, but the queries are built dynamically. In my API the mobile app sends a version number for which the DB should return data, so I don't know what kind of IDs will be used in the end.
Unfortunately I don't think what you want to do can be solved with some configuration magic. What you want is some sort of indexed cache, and for that you have to find a more powerful tool.
You can take a look at Doctrine's second level cache. I don't know how good it is now (I tried it once when it was in beta and it did not make the cut for me).
Or you can build your own cache manager. If you do, I recommend using Redis. Its data structures will help you keep your indexes (this can be simulated with memcached, but it requires more work). What I mean by indexes:
You will have a key like client_1_articles, where 1 is the client id. In that key you will store all the ids of the articles of client 1. For every article id you will have a key like article_x, where x is the id of the article. In this example client_1_articles is a rudimentary index that will help you, if you want to at some point, invalidate all the cached articles coming from client 1.
The abstract implementation for the above example will end up being a graph-like structure over your cache, with possibly:
- composed indexes, e.g. 'client_1:category_1' => {article_1, article_2}
- multiple indexes for one item, e.g. 'category_1' => {article_1, article_2, article_3}, 'client_1' => {article_1, article_3}
- etc.
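To make that a bit more concrete, here is a minimal sketch of the index idea in Python with redis-py (the key names and ids are only illustrations; the same structure can be built with any Redis client, including the PHP ones):
# A per-client article cache with a Redis set acting as the index.
import json
import redis

r = redis.Redis()

def cache_article(client_id, article_id, payload):
    # store the cached article under its own key
    r.set('article_%s' % article_id, json.dumps(payload))
    # register the article id in the client's index set
    r.sadd('client_%s_articles' % client_id, article_id)

def invalidate_client(client_id):
    # drop every cached article listed in the client's index, then the index itself
    index_key = 'client_%s_articles' % client_id
    article_ids = r.smembers(index_key)
    if article_ids:
        r.delete(*('article_%s' % i.decode() for i in article_ids))
    r.delete(index_key)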
Hope this helps you in some way. At least that was my solution for a similar problem.
Good luck with your project,
Alexandru Cosoi
I have many servers that I want to monitor with Sensu + InfluxDB. I have already created checks and metric collection with Sensu into InfluxDB.
I installed Chronograf to make queries on the DB and it's working like a charm.
But...
For all my servers, I want to have the same graphs:
CPU usage
CPU load
Memory
Disks
etc...
Even if recreating them is very straightforward, I want to do it automatically. For all my graphs, I want the ability to choose the server I want to watch. All my data in the database looks like this:
server1.memory.total
server1.load_avg.five
server2.memory.total
server2.load_avg.five
[...]
The queries I use, for example, look like this:
SELECT "value" FROM "metrics".."server1.load_avg.five" WHERE time > now() - 1h
I just want to find a way to select the right server for the graph I want to see.
Can I do that with Grafana or Chronograf? Maybe I have to develop my own dashboard; what is the best way to begin this?
Chronograf has an undocumented API that will allow for the functionality you're looking for, but it's still in its early stages and hasn't been tested extensively.
At the moment the routes are minimally documented on our end, and it may require a bit of toying around to figure out how they work. Here's the list of routes for the API:
POST "/api/v0/servers"
GET "/api/v0/servers"
GET "/api/v0/servers/:id"
PUT "/api/v0/servers/:id"
DELETE "/api/v0/servers/:id"
GET "/api/v0/servers/:id/version"
GET "/api/v0/servers/:id/query"
POST "/api/v0/dashboards"
GET "/api/v0/dashboards"
GET "/api/v0/dashboards/:id"
GET "/api/v0/dashboards/:id/export"
PUT "/api/v0/dashboards/:id"
DELETE "/api/v0/dashboards/:id"
DELETE "/api/v0/dashboards/:id/visualizations/:vid/cell"
POST "/api/v0/dashboard_import"
POST "/api/v0/dashboards/:id/cells"
PUT "/api/v0/dashboards/:id/cells"
POST "/api/v0/visualizations"
GET "/api/v0/visualizations"
GET "/api/v0/visualizations/:id"
PUT "/api/v0/visualizations/:id"
DELETE "/api/v0/visualizations/:id"
POST "/api/v0/visualizations/:id/statements"
PUT "/api/v0/visualizations/:id/statements/:sid/text"
PUT "/api/v0/visualizations/:id/statements/:sid/config"
DELETE "/api/v0/visualizations/:id/statements/:sid"
I have a Plone 4 site which contains a lot of users and groups which are stored in the ZODB. Over time, we added some functionality which uses relational data (in a PostgreSQL database); some tables have fields which contain user or group ids.
However, currently the users and groups are defined in the ZODB rather than the RDB, so we don't have proper foreign keys here. Thus, the obvious idea is to migrate the user and group data to the RDB - at least those users and groups which are used by the Plone site; I assume emergency users need to be an exception to this (but those are not members of any groups anyway).
Would this be a good thing to do?
Are there reasons to do it only partly, or should I transfer everything including group memberships? (Since memberships are stored as lists of users (and/or groups) with the containing group, I could imagine a reverse table which holds all groups a user is a member of, and which is maintained by a trigger function.)
Are there any special tools to use?
Thank you!
IMHO it depends on what you want to achieve. In Plone you have PAS, so technically it doesn't really matter where you put users, groups and user-group relationships.
You can store users/groups in:
Plone (by default)
SQL - pas.plugins.sqlalchemy
LDAP/AD - Products.PloneLDAP
There are also many other plugins for authentication, like RPX, Google+, etc.
You can enable, disable and modify the behavior of every plugin through PAS.
Does it make sense to NOT use Plone users?
Of course, if you want to share user credentials (for example via LDAP), or if you need the user information in other apps, etc.
Migration
It should be very simple if the PAS plugins you are using support "Properties" and "User enumeration".
Get the data from one plugin and put it into another one with a simple Python script. Both support the same API.
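For example, if you first just want to get the user data out of Plone (say, to load it into PostgreSQL), a rough export script could look like this - the site id 'site' and the chosen properties are assumptions you would adapt to your setup:
# export_users.py -- run it with: bin/instance run export_users.py
# Dumps userid, fullname and email for every member to a CSV file.
import csv

portal = app.site                      # replace 'site' with the id of your Plone site
membership = portal.portal_membership

with open('users.csv', 'w') as fh:
    writer = csv.writer(fh)
    writer.writerow(['userid', 'fullname', 'email'])
    for member in membership.listMembers():   # may be slow for very large sites
        writer.writerow([
            member.getId(),
            member.getProperty('fullname', ''),
            member.getProperty('email', ''),
        ])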
The tool you're looking for is https://pypi.python.org/pypi/pas.plugins.sqlalchemy/0.3
I've used this in a web portal where users are "shared" with a newsletter system.
I have 200 users and haven't had any problems.
I think the only "good reason" to store users in an external DB rather than in ZODB/Plone is a use case like mine.
Have you ever thought about "extending" Plone users (e.g. https://plone.org/products/collective.examples.userdata)? With plone.api you can easily manipulate users' properties in your code.