Import "text" strings, to create, automatically, empty pages in MediaWiki - drupal

I am using MediaWiki (sometimes I think it might be better to use Drupal) to create a wiki.
I have tried to find an API or something similar to import a table (CSV, XML, or another format) with text fields.
The idea is to take a document with "name-pages" and "tags" and automatically create empty pages from it.
Finally, the users will see that there are new empty pages to fill!
Every day a scheduler (something like Feed Import in Drupal) would run to bring in new pages. I mean, if the text already exists, do nothing; however, if the text is new, create a new MediaWiki page!
I can't find the right API to do this. Does anybody know a way to do it?
Thank you
Regards!
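
For what it's worth, the standard MediaWiki Action API can do this directly: action=edit with createonly=1 creates a page only if it does not already exist, which matches the "skip it if it exists" requirement. Below is a minimal sketch in Python (assuming the requests library, a hypothetical wiki URL, and an account allowed to create pages; a real import script would also log in first, e.g. with a bot password):
import requests

API_URL = 'https://example.org/w/api.php'  # hypothetical wiki endpoint
session = requests.Session()

# fetch the CSRF token required for edits (log in beforehand on a real wiki)
token = session.get(API_URL, params={
    'action': 'query', 'meta': 'tokens', 'format': 'json',
}).json()['query']['tokens']['csrftoken']

def create_empty_page(title, tags):
    # createonly=1 makes the API refuse to touch pages that already exist,
    # so a daily re-run of the import will not overwrite anything
    return session.post(API_URL, data={
        'action': 'edit',
        'title': title,
        'text': '',  # empty page body for users to fill
        'summary': 'Auto-created from import; tags: ' + ', '.join(tags),
        'createonly': 1,
        'token': token,
        'format': 'json',
    }).json()

# one row of the imported CSV/XML: a "name-page" plus its tags
print(create_empty_page('Some name-page', ['tag1', 'tag2']))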

Related

How many controller files should I create in my Symfony2 application?

In the src/AppBundle/Controller folder, there is a file called DefaultController.php.
I'll create URLs like the ones below. Should I use just DefaultController.php for all URL requests, or is it recommended to use a separate controller file (UserController.php, FeedController.php, etc.) for each part of the site (profile, feed, settings, etc.)?
I also have another question. As far as I understand, we put our HTML files in the app/Resources/views folder to keep them separate. Do I need to create a specific file for each part of the website, just like with flat PHP (settings/index.php, settings/password.php, settings/things.php, settings/security.php, etc.)?
I am not sure whether this question is suitable for SO or not.
settings
/settings
/settings/password
/settings/things
/settings/security
/settings/privacy
/settings/ban
/settings/notifications
/settings/mail
/settings/mobile
/settings/applications
/settings/advertising
/settings/invite
user
/username
/username/photos
/username/friends
/username/posts
feed
/feed
/feed/posts/postid
For both questions there is no hard answer. At the least, you should create a controller for each part of your website. Theoretically you could throw everything into one controller, but it would become a very long file by the time you are finished. Another problem is that action names like indexAction would repeat, which is of course not possible because every method must have a different name, and names like index1Action, index2Action and so on are not a proper solution either :-). Another helpful rule is to create a separate controller for every entity.
Twig files should be written for one page only, or for just a part of a page. Imagine that you have a homepage with the last 10 news items, but also a news page with more news items (maybe with pagination). The news items themselves look the same on both pages. In this case you could make a home.html.twig, a news.html.twig and also a newsitem.html.twig. Both home and news would include newsitem to show the news items...
Hope this sheds some light.

Using XLS files to store JBehave stories

I have based my new project (JBehave, Selenium, Maven and Spring) on the Etsy sample here: https://github.com/jbehave/jbehave-tutorial/tree/master/etsy-selenium/java-spring
I'm really new to JBehave and I would like to know if there is a way to get it to read stories from XLS files?
I think it has something to do with the String[] storyPaths() method that returns the story names, but I don't know how JBehave manages their content.
Thanks a lot for the help!
I need to change the storyPaths() method to be able to read the story titles.
I also have my answer: I need to extend the org.jbehave.core.io.LoadFromClasspath or org.jbehave.core.io.LoadFromURL classes to get the story content.

How to read website content in Python

I am trying to write a program which reads articles (posts) from any website, ranging from Blogspot or WordPress blogs to any other site. To write code that is compatible with almost all websites, whether written in HTML5/XHTML etc., I thought of using RSS/Atom feeds as the basis for extracting content.
However, as RSS/Atom feeds usually do not contain the entire articles, I thought of gathering all the "post" links from the feed using feedparser and then extracting the article content from the respective URLs.
I can get the URLs of all articles on a website (including the summary, i.e., the article content shown in the feed), but I want to access the entire article data, for which I have to use the respective URL.
I came across various libraries like BeautifulSoup, lxml, etc. (various HTML/XML parsers), but I really don't know how to get the "exact" content of the article (by "exact" I mean the data with all hyperlinks, iframes, slide shows, etc. still intact; I don't want the CSS part).
So, can anyone help me with this?
Fetching the HTML code of all linked pages is quite easy.
The hard part is to extract exactly the content you are looking for. If you simply need all code inside of the <body> tag, this shouldn't be a big problem either; extracting all text is equally simple. But if you want a more specific subset, you have more work to do.
I suggest that you install the requests and BeautifulSoup modules (both available via easy_install requests/bs4 or, better, pip install requests/bs4). The requests module makes fetching your page really easy.
The following example fetches an RSS feed and returns three lists:
linksoups is a list of the BeautifulSoup instances of each page linked from the feed
linktexts is a list of the visible text of each page linked from the feed
linkimageurls is a list of lists with the src-urls of all the images embedded in each page linked from the feed
e.g. [['/pageone/img1.jpg', '/pageone/img2.png'], ['/pagetwo/img1.gif', 'logo.bmp']]
import requests, bs4
# request the content of the feed and create a BeautifulSoup object from its content
response = requests.get('http://rss.slashdot.org/Slashdot/slashdot')
responsesoup = bs4.BeautifulSoup(response.text)
linksoups = []
linktexts = []
linkimageurls = []
# iterate over all <link>…</link> tags and fill three lists: one with the soups of the
# linked pages, one with all their visible text and one with the urls of all embedded
# images
for link in responsesoup.find_all('link'):
    url = link.text
    linkresponse = requests.get(url)  # add support for relative urls with urlparse
    soup = bs4.BeautifulSoup(linkresponse.text)
    linksoups.append(soup)
    # append all text between tags inside of the body tag to the second list
    linktexts.append(soup.find('body').text)
    images = soup.find_all('img')
    imageurls = []
    # get the src attribute of each <img> tag and append it to imageurls
    for image in images:
        imageurls.append(image['src'])
    linkimageurls.append(imageurls)
# now somehow merge the retrieved information.
That might be a rough starting point for your project.
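If you prefer to use feedparser (mentioned in the question) for the feed-parsing step, a minimal sketch for collecting the post URLs, assuming the feedparser package is installed and using a hypothetical feed URL, could look like this:
import feedparser

# parse the feed and collect the link of every entry (post)
feed = feedparser.parse('http://example-blog.com/feeds/posts/default')  # hypothetical feed URL
post_urls = [entry.link for entry in feed.entries]
print(post_urls)
You could then fetch and soup each of those URLs in the loop above instead of reading the <link> tags out of the feed soup.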

How to OR Solr term facets via the search URL in a Drupal 7 site?

I have a Drupal 7 website that is running apachesolr search and is using faceting through the facetapi module.
When I use the facets to narrow my searches, everything works perfectly and I can see the filters being added to the search URL, so I can copy them as links (ready-made narrowed searches) elsewhere on the site.
Here is an example of how the apachesolr URL looks after I select several facets/filters:
search_url/search_keyword?f[0]=im_field_tag_term1%3A1&f[1]=im_field_tag_term2%3A100
Where the 'search_keyword' portion is the text I'm searching for and the '%3A' is just the url encoded ':' (colon).
Knowing this format, I can create any number of ready-made searches by creating the correct format for the URL. Perfect!
However, these filters are always ANDed, the same way they are when using the facet interface. Does anyone know if there is a syntax I can use, specifically in the search URL, to OR my filters/facets? Meaning, can I make it such that the result is all entries that contain EITHER of the two filters?
Thanks in advance for any help or pointers in the right direction!
New edit:
I do know how to OR terms within the same vocabulary through the URL; I'm just wondering how to do it for terms in different vocabularies. ;-)
You can write a filter query that looks like:
fq=field1:value1 OR field2:value2
Alternatively you can use localparams to specify the query operator:
fq={!q.op=OR}field1:value1 field2:value2
As far as I know, there's no easier way to do this. There is, in fact, a rather old bug asking for a way to OR the fq parameters...
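Applied to the fields from the original question, the localparams form would give a filter like this (just a sketch of the raw Solr parameter, not the Drupal URL):
fq={!q.op=OR}im_field_tag_term1:1 im_field_tag_term2:100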
I finally found a way to do this in Drupal:
Enable the fq parameter setting.
Go to admin/config/search/apachesolr/[your_search_page]/core_search/edit or just navigate to the settings of the search page you're trying to modify
Check the 'Allow user input using the URL' setting
URL Syntax
Add the following at the end of the URL: ?fq=tid:(16 OR 38), where 16 and 38 are the term ids
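So, following the URL format from the question, a ready-made ORed search link could look like search_url/search_keyword?fq=tid:(16 OR 38), with the space and parentheses URL-encoded as needed.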

Python selecting a value in a combo box and HTTP POST

In Python, I'm trying to read the values on http://utahcritseries.com/RawResults.aspx. How can I read years other than the default of 2002?
So far, using mechanize, I've been able to reference the SELECT and list all of its available options/values but am unsure how to change its value and resubmit the form.
I'm sure this is a common issue and is frequently asked, but I'm not sure what I should even be searching for.
So how about this:
from mechanize import Browser
year="2005"
br=Browser()
br.open("http://utahcritseries.com/RawResults.aspx")
br.select_form(name="aspnetForm")
control=br.form.find_control("ctl00$ContentPlaceHolder1$ddlSeries")
control.set_value_by_label((year,))
response2=br.submit()
print response2.read()
For problems relating to AJAX loading of pages, use Firebug!
Install and open Firebug (it's a Firefox plugin), go to the Net panel, and make sure "All" is selected. Open the URL and change the select box, and see what is sent to the server and what is received.
It seems the catchily-named field ctl00$ContentPlaceHolder1$ddlSeries is what is responsible. Does the following work?
import urllib
postdata = {'ctl00$ContentPlaceHolder1$ddlSeries': 9}
src = urllib.urlopen(
    "http://utahcritseries.com/RawResults.aspx",
    data = urllib.urlencode(postdata)
).read()
print src
