I want to evaluate some statistics of my Plone installation and first of all I wanted to know how the total number of Pages of my Plone changing over time.
I had a look at the existing Plone statistics addons. Unfortunately there are none for Plone 5.0. I got quintagroup.analytics running. This addon has interesting metrics, but does not help me seeing the development over time.
So i started programming myself. In order to get the current number of Pages I plan to use a catalog query like this:
catalog = getToolByName(self.context, 'portal_catalog')
catalog.searchResults({'portal_type': 'Document'})
return len(results)
The python script should be run by a cron job every hour and write the result to a log file for me to evaluate later.
My question for you is: Are there any simpler solutions I did not see? Would my solution work? Do you see any conceptual errors in that?
I wonder that there are not more questions like that on the internet. Are people not that much interested in the metrics of their CMS, or did I ignore an obvious simple solution for that?
At the moment the solution does not yet ron, because of my inexperience with the structure of the plone addons, especially calling a python script, but I am working on that.
If you go to "Site Setup" -> "Dexterity Content Types" you can see how many objects of a certain content type currently exist in your site, e.g.:
http://plonedemo.kitconcept.com/##dexterity-types
There is no out-of-the box way to gather or present those statistics over time though.
You could use the data the catalog itself provides:
portal_catalog.Indexes['portal_type'].uniqueValues(withLengths=True)
gives you counts for all types, as a list of (name, count) tuples:
(('CaptchaField', 2), ('Collection', 2), ('Document', 676), ...
I've not double-checked if this avoids searching and len-ing the results, but it's easier than catalog searches if you think you might be interested in more than one type.
(I've only checked this on 4.3.x/Archetypes, but I see no reason it would not work with 5.x/Dexterity as long as it still uses portal_catalog.)
Related
how are you. For a while I've been working for a Gynecologist building her a data base. For the project I am using Firebase and JavaScript. The database is for her to keep track of their patients and she keeps reports on each one of them. I am almost done with the job, the UI is almost finished, the core functionalities of the database (save data, delete, retreive, and update) are up and running but I am stuck in one little thing. She asked me for a way to turn those reports she keeps in the database into a format like PDF so she can print them and give them in case needed to her patients. The thing is that Ive tried with html2pdf, a git repository that works kind of clunky, and tried looking for others but I still cant find one that works correctly. So I wanted to ask you guys if you know of some alternatives. I started thinking about using EXCEl or Word document. But either way it seems quite complicated. Thank you for your time.
Best to all.
I use Kimonolabs right now for scraping data from websites that have the same goal. To make it easy, lets say these websites are online shops selling stuff online (actually they are job websites with online application possibilities, but technically it looks a lot like a webshop).
This works great. For each website an scraper-API is created that goes trough the available advanced search page to crawl all product-url's. Let's call this API the 'URL list'. Then a 'product-API' is created for the product-detail-page that scrapes all necessary elements. E.g. the title, product text and specs like the brand, category, etc. The product API is set to crawl daily using all the URL's gathered in the 'URL list'.
Then the gathered information for all product's is fetched using Kimonolabs JSON endpoint using our own service.
However, Kimonolabs will quit its service end of february 2016 :-(. So, I'm looking for an easy alternative. I've been looking at import.io, but I'm wondering:
Does it support automatic updates (letting the API scrape hourly/daily/etc)?
Does it support fetching all product-URL's from a paginated advanced search page?
I'm tinkering around with the service. Basically, it seems to extract data via the same easy proces as Kimonolabs. Only, its unclear to me if paginating the URL's necesarry for the product-API and automatically keeping it up to date are supported.
Any import.io users here that can give advice if import.io is a usefull alternative for this? Maybe even give some pointers in the right direction?
Look into Portia. It's an open source visual scraping tool that works like Kimono.
Portia is also available as a service and it fulfills the requirements you have for import.io:
automatic updates, by scheduling periodic jobs to crawl the pages you want, keeping your data up-to-date.
navigation through pagination links, based on URL patterns that you can define.
Full disclosure: I work at Scrapinghub, the lead maintainer of Portia.
Maybe you want to give Extracty a try. Its a free web scraping tool that allows you to create endpoints that extract any information and return it in JSON. It can easily handle paginated searches.
If you know a bit of JS you can write CasperJS Endpoints and integrate any logic that you need to extract your data. It has a similar goal as Kimonolabs and can solve the same problems (if not more since its programmable).
If Extracty does not solve your needs you can checkout these other market players that aim for similar goals:
Import.io (as you already mentioned)
Mozenda
Cloudscrape
TrooclickAPI
FiveFilters
Disclaimer: I am a co-founder of the company behind Extracty.
I'm not that much fond of Import.io, but seems to me it allows pagination through bulk input urls. Read here.
So far not much progress in getting the whole website thru API:
Chain more than one API/Dataset It is currently not possible to fully automate the extraction of a whole website with Chain API.
For example if I want data that is found within category pages or paginated lists. I first have to create a list of URLs, run Bulk Extract, save the result as an import data set, and then chain it to another Extractor.Once set up once, I would like to be able to do this in one click more automatically.
P.S. If you are somehow familiar with JS you might find this useful.
Regarding automatic updates:
This is a beta feature right now. I'm testing this for myself after migrating from kimonolabs...You can enable this for your own APIs by appending &bulkSchedule=1 to your API URL. Then you will see a "Schedule" tab. In the "Configure" tab select "Bulk Extract" and add your URLs after this the scheduler will run daily or weekly.
There has been a LOT of development in the Meteor world, and as such it's getting hard to find answers that work for current versions due to the plethora of answers you find for old, out-dated versions.
I have an app that has a LOT of data in a particular collection. By lots I mean somewhere between 10k-100k, and very potentially a lot more. Essentially it's log data, and I need to display the results in a table with no pagination (like a tail). In researching ways to optimize large collections I keep running into things like this that seem to be for older versions of Meteor.
So, as I see it my options are:
Use fast-render plugin to display the page prior to the subscription (at least this is my understand on how it works).
Use some sort of progressive publish function, where it loads limited more relevant bits of data first, then progressively loads the remaining data by expanding the window/limit (not sure if this would cause heavier load on the server, though). There seems to have been a "progressive publish" plugin, but it doesn't seem to be under active development any longer.
Optimize the lookups via indexing (How do you specify that when creating the collection???)
Profiling and optimizing the template further (not sure how).
Some other method I haven't thought of yet...
Some combination of all-the-above.
What is the proper approach by which to publish and render lots of data in this way?
I'm going to assume that "optimize" means reduced query time.
Always start with the biggest bang for your buck.
Unless you're publishing the entire collection, or query on the _id, then you want to create an index using _ensureIndex. Get more info on this on the mongodb website or by searching other questions. http://docs.mongodb.org/manual/reference/method/db.collection.ensureIndex/
Second, limit the fields to just the info you need. eg {fields: {a:1, b:1}}. http://docs.meteor.com/#/full/fieldspecifiers
Third, don't sort.
If this still isn't good enough, make another question with schema & query details & the desired UI so we can better understand the reactivity and why you can't use some form of pagination.
I need to make an application which will access an URL(like http://google.com) and return the time spent to load all elements(images, css, js...) and compare this results with the previous results.
This application need to be a Desktop app, and I will save the informations in a text file ou xml, and use this file do compare with previous results.
I have searched for a similar application, but nothing...
There are some plugins for firefox that list these elements, like Yslow or Firebug, but not what I need.
So, i'm totally lost and I don't know how to start this work?
Exists the possibility of make this application? What language is better for this type of application?
Thks!
This is a very objective question, so without you elaborating more on your requirements, you may not get any useful answers.
Some things you would need to answer are: how many URLs you want to check, where are you wanting to store the results (database, files etc), does it need to run on the desktop or on a server etc.
Personally, I like the statistics that cURL gives you - DNS time, connect time, receive time etc - so you could write something in PHP, but as I stress that is personal preference and may not suit your situation.
I can see where to get an rss feed for the BUG LIST, however I would like to get rss updates for modifications to current bugs if possible.
This is quite high up when searching via Google for it, so I'm adding a bit of advertisement here:
As Bugzilla still doesn't support this I wrote a small web service supporting exactly this. You can find its source code here and a running instance here.
What you're asking for is the subject of this enhancement bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=256718
but no one seems to be working on it.
My first guess is that the way to do it is to add a template somewhere like template/en/default/bug/show.atom.tmpl with whatever you need. Put it in custom or an extension as needed.
If you're interested in working on it or helping someone with it, visit channel #mozwebtools on irc.mozilla.org.
Not a perfect solution, but with the resolution of bug #255606, Bugzilla now allows listing all bugs, by running a search with no criteria, and you can then get the results of the search in Atom format using the link in the bottom of the list.
From the release notes for 4.2:
Configuration: A new parameter search_allow_no_criteria has been added (default: on) which allows admins to forbid queries with no criteria. This is particularly useful for large installations with several tens of thousands bugs where returning all bugs doesn't make sense and would have a performance impact on the database.