How to filter F-Droid (MediaWiki) recent changes by package name? - rss

F-Droid offers a page of recent changes both as HTML and RSS:
https://f-droid.org/wiki/index.php?title=Special:RecentChanges&days=30&from=&hidebots=0&hideanons=1&hideliu=1&limit=500
https://f-droid.org/wiki/api.php?action=feedrecentchanges&limit=500
The RSS does not show the same content. It seems to be truncated. Applying the same parameters as with the HTML link causes all <item> elements to be dropped.
I like to filter the RSS (and optionally the HTML) by a specific package name. Is this possible?
Related
https://f-droid.org/wiki/api.php
https://www.mediawiki.org/wiki/API:Main_page
https://www.mediawiki.org/w/api.php?action=help&modules=feedrecentchanges

Related

When using apoc.load.html, Is it possible to return the full HTML rather than only text?

Lets say I want to scrape the Neo4j RefCard found at: https://neo4j.com/docs/cypher-refcard/current/
And I would like to fetch a 'code' example along with its styling. Here's my target. Notice that it has CSS treatment (font, color...):
...so in Neo4j I call the apoc.load.html procedure as shown here, and you can see it's no problem finding the content:
It returns a map with three keys: tagName, attributes, and text.
The text is the issue for me. It's stripped of all styling. I would like for it to let me know more about the styling of the different parts of this text.
The actual HTML in the webpage looks like following image with all of these span class tags: cm-string, cm-node, cm-atom, etc. Note that this was not generated by Neo4j's apoc.load.html procedure. It came straight from my Chrome browser's inspect console.
I don't need the actual fonts and colors, just the tag names.
I can seen in the documentation that there is an optional config map you can supply, but there's no explanation for what can be configured there. It would be lovely if I could configure it to return, say, HTML rather than text.
The library that Neo4j uses for CSS selection here is jsoup.
So I am hoping to not strip the <span> tags, or otherwise, extract their class names for each segment of text.
Could you not generate the HTML yourself from the properties in your object? It looks they are all span tags with 3 different classes depending on whether your using the property name, property value, or property delimiter?
That is probably how they are generating the HTML themselves.
Okay, two years later I revisited this question I posted, and did find a solution. I'll keep it short.
The APOC procedure CALL apoc.load.html is using the scraping library Jsoup, which is not a full-fledged browser. When it visits a page it reads the html sent by the server but ignores any javascript. As a result, if a page uses javascript for inserting content or even just formatting the content, then Jsoup will miss the html that the javascript would have generated had it run.
So I have just tried out the service at prerender.com. It's simple to use. You send it a URL, it takes your url as an argument and fetches that page itself and executes the page's javascript as it does. It returns the final result as static HTML.
So if I just call prerender.com with apoc.load.html then the Jsoup library will simply ask for the html and this time it will get the fully rendered html. :)
You can try the following two queries and see the difference pre-rendering makes. The span tags in this page are rendered only by javascript. So if we call it asking for its span tags without pre-rendering we get nothing returned.
CALL apoc.load.html("https://neo4j.com/docs/cypher-refcard/current/", {target:".listingblock pre:contains(age: 38) span"}) YIELD value
UNWIND value.target AS spantags
RETURN spantags
...but if we call it via the prender.com website, you will get a bunch of span tags and their content.
CALL apoc.load.html("https://service.prerender.cloud/https://neo4j.com/docs/cypher-refcard/current/", {target:".listingblock pre:contains(age: 38) span"}) YIELD value
UNWIND value.target AS spantags
RETURN spantags

Parsing page data into sidebar - wordpress

What would be the proper procedure for accessing the current page html data and picking up all of a certain tag and throwing them into the sidebar as links?
I'm not sure your proficiency with php, but I'll give you and overview of what you'd probably want to do.
First, you need the HTML. I'm assuming you're running this on a page (in a page.php file or single.php file, or similar), this means that you have access to the global variable $post, which contains the html of the page in it. To access it you can use the helper function get_the_content(), this returns the html being displayed.
Next you need to parse through this to get the h2 tags. A simple regex can handle this, something like <h2[^>]*>(.*)</h2>. It's important to remember that this regex is very picky, so format your html correctly, no multiline h2s.
So now you have the html, and have parsed it with a regex to get the h2s. Now you need to generate the list from the results, and prepend it to the top of the content of the page. There are a ton of ways to do this, the easiest being just running the code in the right spot in the template file.
Of course there are probably better ways of doing this, I'd recommend you look at say a FAQ plugin (if that's what this is for), or do the lists manually (as this system can be broken), or possibly use a custom post type; but for your question, that's how I'd do it.

Autogenerated menu based on headings (h1, h2..)?

I use Wordpress as tool for publicizing big chunks of static text (as pages ...) and I would like to have Table of Content, auto-generated based on HTML headings h1,h2,h3...
How can I do that?
I dont know any plugins that does the job, maybe there are. But a plugin that would does that would need to parse the current post (using database/on page request), get only the headings(using parser like regex) and use them as a menu (write the results as a html list of ul li). Another addition would be to cache the result according to the page revision but it's your choice.

Adding Facebook Meta tags to dotnetnuke pages

I have added the Facebook comments plugin to dynamic pages on my site, and I am attempting to set up administration of those plugins by adding the associated meta tags to their pages. My site uses the CMS dotnetnuke.
I went to the page's settings - under advanced - and added the appropriate meta tag information. However, upon saving, administration is not enabled on that page.
I ran the page through the wc3 validator, and the following error related to that meta tag was produced:
Error Line 11, Column 1926: there is no attribute "property"
…type="text/javascript"></script><meta property="fb:admins" content="76804243"/>
✉
You have used the attribute named above in your document, but the
document type you are using does not support that attribute for this
element. This error is often caused by incorrect use of the "Strict"
document type with a document that uses frames (e.g. you must use the
"Transitional" document type to get the "target" attribute), or by
using vendor proprietary extensions such as "marginheight" (this is
usually fixed by using CSS to achieve the desired effect instead).
This error may also result if the element itself is not supported in
the document type you are using, as an undefined element will have no
supported attributes; in this case, see the element-undefined error
message for further information.
How to fix: check the spelling and case of the element and attribute,
(Remember XHTML is all lower-case) and/or check that they are both
allowed in the chosen document type, and/or use CSS instead of this
attribute. If you received this error when using the element
to incorporate flash media in a Web page, see the FAQ item on valid
flash.
Do I need to specify this attribute somehow in my DNN skin? Any ideas on a possible fix?
Thanks!
Alex
There are several ways you can change the DocType for your site. Here's the entry from the DotNetNuke wiki that describes the options:
http://www.dotnetnuke.com/Resources/Wiki/Page/Set-the-doctype-of-your-skin.aspx

ASP.NET RSS Feed giving style sheet error

Hey im wondering why I am receiving the following error in my rss feed
"This XML file does not appear to have any style information associated with it. The document tree is shown below."
from a bit of research ive done this is becuase I dont have a stylesheet attached. But I have done plenty of RSS Feeds before and normally they pick up the default look and feel as below
http://www.tbray.org/ongoing/ongoing.atom
I am just wondering why this one is giving the above error ?
Most likely, the content-type of your feed is not correct, so IE is treating it as raw XML.
What is the URL of your feed? What browser are you using?
the tbray.org URL returns a
Content-Type: application/atom+xml
What is the content-type header returned by your misbehaving feed?
('wget --save-headers ...' may be useful)
my guess is that you need to make sure you declare your namespaces.
Take tim bray's feed and save it locally as test.htm. bring it up in firefox and it will show nicely. now if you remove a namespace in that's being used like :thr the content will disappear. if you remove the base namespace you'll just get plain text.

Resources