Is there a plugin or any way to automatically graph wiki content pages?

Having a DokuWiki with some content (small to medium in size and depth), I would like to automatically generate a GraphViz or Freeplane graph, or any other easy-to-grasp visualisation of my content.
Why? Because the wiki tends to become less and less effective when searching and organizing its content. As a user I have no good way to get a clear idea of the wiki's structure, which is why more and more often topics are not written and found where they are supposed to be.
"How to generate graphical sitemap of large website" is what I found so far, but because my wiki is not that big, it would be quicker for me to just make a graph manually. And because the main topics are not updated or extended that often (maybe 10 extensions a month at most), it would not be that hard to keep it up to date manually.
However, I would like to avoid manual tasks, at least in the future.
So is there a plugin or any other good way to graph the contents?
starting on the landing page, following the internal wiki links
using the namespace sitemap
Either one would be nice; the first interests me a bit more because it reflects the paths a user could take when just opening the wiki start page. I am grateful for any help, thanks.

I wrote a simple tool to do just that; the graph can then be analyzed in Gephi. Have a look at this blog post: http://www.splitbrain.org/blog/2010-08/02-graphing_dokuwiki_help_needed
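If you would rather roll a small script yourself, here is a minimal sketch of the link-following idea. It relies on two things DokuWiki actually does: pages are stored as plain .txt files under data/pages, and internal links use the [[pagename]] syntax. The PAGES_DIR path and the output file name are assumptions to adapt to your installation; the regex deliberately skips external links.

```python
# Rough sketch: walk DokuWiki's data/pages directory, pull out internal
# [[...]] links and emit a GraphViz DOT file. PAGES_DIR is an assumed
# location; adjust it to your installation.
import os
import re

PAGES_DIR = "/var/www/dokuwiki/data/pages"   # assumed location
LINK_RE = re.compile(r"\[\[([^\]|#]+)")       # [[target|label]] -> target

edges = []
for root, _dirs, files in os.walk(PAGES_DIR):
    for name in files:
        if not name.endswith(".txt"):
            continue
        path = os.path.join(root, name)
        # page id: namespace:page, derived from the file path
        rel = os.path.relpath(path, PAGES_DIR)[:-4].replace(os.sep, ":")
        with open(path, encoding="utf-8") as fh:
            for target in LINK_RE.findall(fh.read()):
                target = target.strip().lower()
                if target and not target.startswith(("http://", "https://")):
                    edges.append((rel, target))

with open("wiki.dot", "w", encoding="utf-8") as out:
    out.write("digraph wiki {\n")
    for src, dst in edges:
        out.write(f'  "{src}" -> "{dst}";\n')
    out.write("}\n")
# Render with: dot -Tsvg wiki.dot -o wiki.svg
```

If you only want the pages reachable from the landing page (your first option), build an adjacency map instead of a flat edge list and do a breadth-first walk from the start page before writing the edges.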

Related

Making a tree of Wikipedia links

I am trying to use the Wikipedia API to get all links on all pages. Currently I'm using
https://en.wikipedia.org/w/api.php?format=json&action=query&generator=alllinks&prop=links&pllimit=max&plnamespace=0
but this does not seem to start at the first article and end at the last. How can I get this to generate all pages and all their links?
The English Wikipedia has approximately 1.05 billion internal links. Considering the list=alllinks module has a limit of 500 links per request, it's not realistic to get all links from the API.
Instead, you can download Wikipedia's database dumps and use those. Specifically, you want the pagelinks dump, which contains information about the links themselves, and very likely also the page dump, for mapping page IDs to page titles.
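For a sense of scale, here is a hedged Python sketch of how the API's continuation mechanism works when you page through the links of just one article with prop=links (the article title below is only an example). Repeating this loop for every article is exactly what makes the API approach impractical, which is why the dumps are the better route.

```python
# Illustrative sketch: page through prop=links for a single article using
# the API's continuation mechanism. Practical for individual pages; use
# the pagelinks/page dumps for the full link graph.
import requests

API = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "format": "json",
    "titles": "Graph theory",      # example article, pick your own
    "prop": "links",
    "plnamespace": 0,
    "pllimit": "max",
}

links = []
while True:
    data = requests.get(API, params=params, timeout=30).json()
    for page in data["query"]["pages"].values():
        for link in page.get("links", []):
            links.append(link["title"])
    if "continue" not in data:
        break
    params.update(data["continue"])   # carry plcontinue into the next request

print(len(links), "links found")
```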
I know this is an old question, but in case anyone else is searching and finds this, I highly recommend looking at Wikicrush to extract the link graph for all of Wikipedia. It produces a relatively compact representation that can be used to very quickly traverse links.

How Does an RSS Feed Work?

How's it going?
I've found a lot of detailed answers about specific problems with RSS feeds, but I can't really figure out how you actually USE one.
Could someone explain?
I see the RSS feed icon at the top of a lot of WordPress sites, including my own, but when I click it, it just seems to be a long XML file. I don't know what to do with it, or even why it is there.
How do you use this? Are you meant to hit it with an API request, or is there a particular kind of software that you use?
Cheers
Before telling you what RSS is, let me describe a common problem that many people have.
Say there is a bunch of sites that you really like and it's sort of a daily routine for you to go through them. They may be a news site, your friend's blog, but also craigslist because you're currently looking for a new house, and maybe a weather site to know how late you should stay at work :)
The first thing you do when you get to work is open your web browser and these sites in new tabs. It's not particularly cumbersome because there are just 4 sites. But think about it: maybe there is a new blog that you start to like and, oh, these cartoons are really funny. Maybe there is also a bit of financial info that you're interested in, and the pictures that your brother is posting to Flickr every couple of days: they just had a new baby! Also, as you're trying to buy a house, you'd love a little raise, and you've figured that your boss really likes it when you tell her that you've read about your company in the news or when you tell her about a new competing product... There is also StackOverflow. You're desperately trying to get this "expert" badge and boost up your reputation: this may help with your boss too, or even when you're looking for a new job.
Opening all these tabs is starting to take a toll and you keep forgetting an important one. You're also slowly getting tired of the different reading experience that all these sites have: small fonts, large fonts, ads all over... etc. Now you have a problem.
Imagine there is a tool that does the following: you can tell it what sites you care about, and then, this tool will look up the new stuff for you. It will show everything in a nice looking format. It should also help you identify what's really worth seeing ASAP or maybe have some kind of "serendipity" mode that you can go into and find interesting stuff that you would have missed otherwise. The tool will obviously send you to the original sites should you need more info about any particular story or classified...
This tool exists. It's usually called a reader, mostly because it lets you read more things online. Oftentimes you'll see them called "RSS readers", because RSS is what they use to get the information from all these sites. RSS is the pipe. You as a user should probably not know about it, but that's what readers depend on. In an ideal world, when you're on a site you like, you should just hit a "follow" button and then be redirected to your reader of choice. Later, when new content is added, you'll get it straight in your reader.
To get a bit into more technical details, RSS (like Atom) is an XML flavor. It's a collection (mostly reverse chronological) of entries. Entries have at least a title and a link to the actual story. They should also include a unique identifier and could have other elements like a description, an image, tags, author information... etc.
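To make that structure concrete, here is a minimal Python sketch of what a reader does with the XML under the hood: fetch the feed and read the title, link and unique identifier out of each entry. The feed URL is a placeholder; a real reader would also remember which identifiers it has already seen so that it only surfaces new entries.

```python
# Minimal sketch: fetch an RSS 2.0 feed and pull the title/link/guid out
# of each <item>. The feed URL is a placeholder.
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://example.com/feed.xml"   # any RSS 2.0 feed

with urllib.request.urlopen(FEED_URL) as resp:
    root = ET.fromstring(resp.read())

for item in root.iter("item"):              # RSS 2.0: <channel><item>...</item>
    title = item.findtext("title", default="(no title)")
    link = item.findtext("link", default="")
    guid = item.findtext("guid", default=link)   # unique id, falls back to link
    print(f"{title}\n  {link}\n  id: {guid}\n")
```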
RSS is great because it's content agnostic. It can be used to represent a lot of different things (as described in the little story) and decouples the publishing platform from the subscribing platform: they don't even know the other one exists. RSS is their lingua-franca.
I wrote a blog post about this very question not long ago. Here's the link if you're interested in reading my personal interpretation. https://www.rss.com/whatisrss
An XML feed is the content of a page with none of the presentational markup; it represents the data in its rawest, most descriptive form. Many readers can interpret XML sources from a variety of places, and each formats all of the data in its own way.

Documentation Generation - What boxes should I aim to tick?

I'm looking at requiring my team to document their code more thoroughly for some major upcoming projects and to make life a little less painful, I am steering towards XML documentation generators such as Sandcastle, Doxygen or Box Live Documenter.
What are the key considerations I should keep in mind when evaluating the best option and what experiences have led you to a particular decision?
For me the key considerations would be:
Fully automated: Can it be set up in such a way that pretty much no outside work is required to create or edit the documentation?
Fully styled: Can the documentation be fully styled so that it looks great in a wiki or PDF after it's generated? I should be able to change colors, font sizes, layouts, etc.
Good filtering: Can I select only the items I want to be generated? I should be able to filter the namespaces, file types, classes, etc.
Customization: Can I include headers, footers, custom elements, etc.?
I found Doxygen could do all of this. Our workflow is as follows:
Developer makes a change to the code
They update the documentation tags right above the code they just changed
We click a generate button
Doxygen will then extract all the XML documentation from the code, filter it to only include the classes and methods we want, and apply the CSS styling we’ve pre-made for it. Our end result is an internal wiki that looks the way we want, and doesn’t require editing.
Extra: We have all our projects in various git repositories. We pull all these down to one root folder and generate the docs from this root folder.
I would be interested to know how others are automating even further.
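One way to push the automation further is to replace the generate button with a small script on a scheduler (cron or a CI job) that refreshes the checkouts and re-runs Doxygen. A rough sketch, assuming a shared Doxyfile in the root folder and with placeholder repository URLs:

```python
# Sketch of the "pull everything and regenerate" step. Repository URLs and
# paths are placeholders; the Doxyfile in ROOT is assumed to point INPUT at
# the checked-out projects and at the pre-made CSS.
import os
import subprocess

ROOT = os.path.expanduser("~/docs-root")          # assumed root folder
REPOS = [                                          # placeholder repositories
    "git@example.com:team/project-a.git",
    "git@example.com:team/project-b.git",
]

os.makedirs(ROOT, exist_ok=True)
for url in REPOS:
    dest = os.path.join(ROOT, url.rsplit("/", 1)[-1].removesuffix(".git"))
    if os.path.isdir(dest):
        subprocess.run(["git", "-C", dest, "pull", "--ff-only"], check=True)
    else:
        subprocess.run(["git", "clone", url, dest], check=True)

subprocess.run(["doxygen", "Doxyfile"], cwd=ROOT, check=True)
```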
Who is paying for the documentation, and why? (Is the system stable enough? Does it add enough value?)
Who is going to read it, and why is she not using a more effective communication channel? (If documentation is the right choice, it is usually because of distance in time or place.)
Who is going to keep it up to date?
When are you going to destroy it? (Automatically, if it hasn't been read or updated in the past three months?)
I mostly prefer better code to make my life less painful, over more documentation, but I like scenario & unit tests and a high level architecture description.
[edit] Documentation costs time and money to write and keep up to date. JavaDoc-style documentation has a serious detrimental effect on the amount of code simultaneously visible, and might be a good idea for the developers using the code, but not for those writing it.

Displaying Flowcharts in ASP.NET

I want to display some data relations (coming from an XML file, for that matter) in ASP.NET as a flowchart.
It doesn't necessarily have to be freeware (although that would be nice).
What do you recommend I use?
Edit:
Automatically arranging the node locations according to the relations would be a big advantage.
I don't claim to be an expert in this, but I just spent a fair amount of time trawling the web looking for such a tool. In the end, I found an awesome JavaScript library for doing this:
jsPlumb: http://code.google.com/p/jsplumb/
It will take some work on your part, but the examples are very nice. It should be able to do everything you want.

Interpreting Search Results

I am tasked with writing a program that, given a search term and the HTML source of a page representing search results from some unknown search engine (it can really be anything: a blog, a shop, Google, eBay, ...), needs to build a data structure of the results containing "what's in the results": a title for each result, the "details" link, the position within the results, etc. It is not known whether the results page contains any of the data at all, or whether there are any search results. The goal is to feed the data structure into another program that extracts meaning.
What I am looking for is not BeautifulSoup or a regexp but rather some clever ideas or algorithms on how to interpret the HTML source. What do I do to find out what part of the page constitutes a single result item? How do I filter out the markup noise to extract the important bits? What would you do? Pointers to fields of research covering what I am trying to do are also greatly appreciated.
Thanks, Simon
I doubt that there exists a silver-bullet algorithm that will just work on any arbitrary search query output without any training.
However, this task can be solved, and actually is solved in many applications, just with a different approach. First you have to define the general structure of a single search result item based on what you actually intend to do with it (it could be a name, date, link, description snippet, etc.), and then write a number of HTML parsers that extract the necessary fields from the search result output of particular web sites.
I know it is not a super sexy solution, but it is probably the only one that works. And it is not rocket science: writing parsers is actually extremely simple; you can write a dozen per day. If you look into the HTML source of a search result, you will notice that the output results are typically very structured and marked with specific div sections or class attributes, so it is very easy to find them in the document. You don't even have to use a complicated HTML parsing library for that; something grep-like will be enough.
For example, on this particular page your question starts with <div class="post-text"> and ends with </div>. Everything in between is the post text, with some HTML formatting that you may want to remove along with extra spaces and "\n". And this <div class="post-text"> appears on the page only once.
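As a rough illustration of that grep-like approach (not a robust HTML parser, and it assumes the extracted block contains no nested </div> before the closing one), the extraction can be as simple as:

```python
# Grep-like sketch: cut the post text out of the page between two known
# markers, then strip tags and collapse whitespace. The markers are
# specific to this site and would differ per parser.
import re

def extract_post_text(html: str):
    start_marker = '<div class="post-text">'
    start = html.find(start_marker)
    if start == -1:
        return None
    end = html.find("</div>", start)
    if end == -1:
        return None
    fragment = html[start + len(start_marker):end]
    text = re.sub(r"<[^>]+>", " ", fragment)     # drop inline HTML formatting
    return re.sub(r"\s+", " ", text).strip()     # collapse spaces and "\n"
```

Each site gets its own pair of markers; that is essentially all one of these small per-site parsers boils down to.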
Once you go to large scale with your retrieval application, you will find out that there is not that big a variety of different search engines across sites, and you will be able to re-use already created parsers for sites using similar search engines.
The only thing you have to remember is built-in self-testing. Sites tend to upgrade and change their design from time to time. If your application is going to live for some time, you will need to include in your parsers some logic that checks the validity of their results and notifies you whenever the search output has changed and is no longer compatible with your parser. Then you will have to modify the particular parser or write a new one.
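A sketch of what such a self-test could look like, assuming each parser returns a list of dicts with at least a title and a link (that result shape and the notify hook are placeholders, not part of any particular library):

```python
# Sketch of a self-test hook: run a parser against a page fetched for a
# known query and complain if the output no longer looks sane.
def check_parser(parser, sample_html, notify):
    results = parser(sample_html)
    ok = (
        results is not None
        and len(results) >= 1                       # at least one result expected
        and all(r.get("title") and r.get("link") for r in results)
    )
    if not ok:
        notify(f"Parser {parser.__name__} returned suspicious output; "
               "the site layout may have changed.")
    return ok
```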
Hope this helps.
