Query CMSPlugin model for draft or live objects - django-cms

I've got a CMSPlugin and a request to display its items in another app.
For every plugin you create there is a live and a draft version, so doing audio = Audio.objects.all() returns duplicate instances.
How would you go about creating a query which only returns the plugin objects from public pages?
My plugin:
from cms.models import CMSPlugin
from django.db import models
from django.utils.translation import ugettext_lazy as _


class Audio(CMSPlugin):
    """
    Model for storing audio clips.
    """
    caption = models.CharField(
        _("Title"),
        max_length=255,
        blank=True
    )
    audio_track = models.FileField()
    description = models.CharField(
        _("Description"),
        max_length=255,
        blank=True,
        null=True
    )

How about something like this:
Audio.objects.filter(placeholder__page__publisher_is_draft=False)
This assumes all Audio plugins belong to a CMS page. CMSPlugin instances are not guaranteed to have a page associated with them, unless you set the page_only option to True on the plugin class:
page_only = True
Docs: http://django-cms.readthedocs.org/en/latest/reference/plugins.html#page-only
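For context, a minimal sketch of where that option lives, assuming a plugin class registered for the Audio model (the AudioPlugin name and the render_template path are illustrative, not taken from the question):

# cms_plugins.py -- minimal sketch; class name and template path are assumptions.
from cms.plugin_base import CMSPluginBase
from cms.plugin_pool import plugin_pool

from .models import Audio


@plugin_pool.register_plugin
class AudioPlugin(CMSPluginBase):
    model = Audio
    name = "Audio"
    render_template = "audio/audio_plugin.html"
    # Only allow this plugin on CMS pages, so every Audio instance has a page
    # behind its placeholder and the publisher_is_draft filter is reliable.
    page_only = True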

Is there a way to send metadata to a confluence document via a script?

I have a task where I need to download a Confluence page and copy it into another page with the same content, but with different metadata (for example, the title of the new page is given as metadata). I'd like to do this automatically: the script gets certain parameters (title, author, etc.) and puts these into the metadata of the second page. Does anyone have an idea how to do this?
If you are using ScriptRunner you can use Groovy code like this:
import com.atlassian.confluence.pages.Page
import com.atlassian.confluence.pages.PageManager
import com.atlassian.sal.api.component.ComponentLocator

def PAGE_ID = 123456

// Look up the source page
def pageManager = ComponentLocator.getComponent(PageManager.class)
def page = pageManager.getPage(PAGE_ID)

// Create a copy with the same content; set whatever metadata you need on it here
def newPage = new Page()
newPage.setTitle(page.getTitle())
newPage.setBody(page.getBodyAsString())
newPage.setSpace(page.getSpace())
pageManager.saveContentEntity(newPage, null)
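If ScriptRunner isn't available, roughly the same copy can be done from a standalone script against the Confluence REST API. The sketch below is only an outline under assumptions (basic auth, a Server/Data Center style /rest/api/content endpoint); the base URL, credentials, space key and page id are placeholders.

# copy_page.py -- rough sketch against the Confluence REST API; the URL,
# credentials, space key and page id are placeholders/assumptions.
import requests

BASE = "https://confluence.example.com"
AUTH = ("user", "password")

def copy_page(source_id, new_title, space_key):
    # Fetch the source page together with its storage-format body
    src = requests.get(
        f"{BASE}/rest/api/content/{source_id}",
        params={"expand": "body.storage"},
        auth=AUTH,
    ).json()

    # Create a new page with the same body but different metadata (title here)
    payload = {
        "type": "page",
        "title": new_title,
        "space": {"key": space_key},
        "body": {
            "storage": {
                "value": src["body"]["storage"]["value"],
                "representation": "storage",
            }
        },
    }
    resp = requests.post(f"{BASE}/rest/api/content", json=payload, auth=AUTH)
    resp.raise_for_status()
    return resp.json()

# Example: copy_page(123456, "Copied page title", "DOC")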

Pages 404 when created via the django-cms API

I'm building a system which creates a site object for clients added to the system and off the back of that creates some basic pages for the site.
Pages are created with published=True, however browsing to them via the pages admin or directly by URL results in a 404.
I've noticed that simply saving the page's settings in the admin then makes the pages load. I therefore suspect there are further actions triggered by the CMS admin, whether through the form used or through signals.
The Code
The creation of pages comes from a post_save signal:
@receiver(post_save, sender=Client)
def create_site_on_client_creation(instance, **kwargs):
    """
    When a Client is created, create a Site object with the client slug as a
    sub-domain to the main site's base domain. Also update the site object
    if the client slug changes.
    """
    site = instance.site or Site()
    if not instance.site_id:
        instance.site = site
        instance.save(update_fields=['site'])
    create_initial_pages(
        site, extra_pages=[
            'Privacy Policy',
            'Terms of Use'
        ]
    )
The function that then creates the content is:
def create_initial_pages(site, **kwargs):
    """ Create the initial pages required for a site """
    # Create a CMS "home" page on the initial site creation
    created_root = False
    root_page = Page.objects.on_site(site).drafts().filter(
        title_set__language=settings.LANGUAGE_CODE,
        title_set__title=HOME_TITLE,
        is_home=True
    ).first()
    if not root_page:
        created_root = True
        root_page = create_page(
            title=HOME_TITLE,
            template=TEMPLATE_INHERITANCE_MAGIC,
            language=settings.LANGUAGE_CODE,
            site=site,
            published=True
        )
    if 'extra_pages' in kwargs:
        for page_title in kwargs['extra_pages']:
            # Check if a page with this title already exists
            existing = Page.objects.on_site(site).drafts().filter(
                title_set__language=settings.LANGUAGE_CODE,
                title_set__title=page_title,
            ).exists()
            if not existing:
                create_page(
                    title=page_title,
                    template=TEMPLATE_INHERITANCE_MAGIC,
                    language=settings.LANGUAGE_CODE,
                    site=site,
                    parent=root_page,
                    published=True
                )
    if created_root:
        root_page.is_home = True
        root_page.save()
I think what's happening here is that you are making changes to the root page after it is published, which is therefore making changes to the draft:
if created_root:
    root_page.is_home = True
    root_page.save()
    # You may need to publish here
    root_page.publish(settings.LANGUAGE_CODE)
The API always fires the publish action on creation if the published kwarg is added: https://github.com/divio/django-cms/blob/develop/cms/api.py#L203
The tests are a good place to see how the API is used in code; finding a test that creates a page tree should help you see whether you have missed a specific setting that causes your page not to be visible: https://github.com/divio/django-cms/blob/develop/cms/tests/test_multilingual.py#L85
Also be sure that your new site has a language configuration in the settings file. It's possible that the admin is confused by the page and language combination, and when you "save the page settings" you are actually creating an entry in your current language: http://docs.django-cms.org/en/latest/reference/configuration.html#internationalisation-and-localisation-i18n-and-l10n
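For reference, the per-site language configuration is keyed by SITE_ID; a minimal sketch might look like the following (the site id 2 and the single 'en' entry are assumptions for illustration):

# settings.py -- minimal sketch; the site ids and language list are assumptions.
CMS_LANGUAGES = {
    1: [
        {'code': 'en', 'name': 'English'},
    ],
    2: [  # the newly created client site
        {'code': 'en', 'name': 'English'},
    ],
    'default': {
        'public': True,
        'hide_untranslated': False,
    },
}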
I would advise rolling / commenting your code back to just the one root page and getting that working as you expect; once that is working you can expand upon it with more pages.
Try just calling the publish method (with a language) on the page after the call to create_page. You'll need to keep the page returned by create_page, e.g.:
my_page = create_page(
    title=page_title,
    template=TEMPLATE_INHERITANCE_MAGIC,
    language=settings.LANGUAGE_CODE,
    site=site,
    parent=root_page,
    published=True
)
my_page.publish("en")

Microsoft Custom Vision - Deleting an iteration doesn't delete images associated with it

In the Microsoft Custom Vision documentation there is this note: "...When you delete an iteration, you end up deleting any images that are uniquely associated with it."
But when I use the Python trainer.delete_iteration(project_id, iteration.id), my images that are uniquely associated with the last trained iteration are not deleted.
Do I need to do something else, or is this not working?
The documentation may need to be updated. Looking at the source code for the delete_iteration method, it looks like it just sends a DELETE request to the iteration URL:
delete_iteration.metadata = {'url': '/projects/{projectId}/iterations/{iterationId}'}
url = self.delete_iteration.metadata['url']
path_format_arguments = {
    'projectId': self._serialize.url("project_id", project_id, 'str'),
    'iterationId': self._serialize.url("iteration_id", iteration_id, 'str')
}
url = self._client.format_url(url, **path_format_arguments)
request = self._client.delete(url, query_parameters)
response = self._client.send(request, header_parameters, stream=False, **operation_config)
So, in order to delete the associated images, it looks like you would need to use the delete_images method as well.
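Something along these lines might work with the training client from the question, though it is only a sketch: the way images are listed per iteration and the delete_images signature differ between SDK versions, so the parameter names below are assumptions to check against your installed client.

# Sketch only: get_tagged_images with an iteration_id filter and
# delete_images taking image_ids may differ in your SDK version.
def delete_iteration_and_images(trainer, project_id, iteration_id):
    # Collect the ids of images associated with this iteration (paged, 50 at a time)
    image_ids = []
    skip = 0
    while True:
        batch = trainer.get_tagged_images(
            project_id, iteration_id=iteration_id, take=50, skip=skip)
        if not batch:
            break
        image_ids.extend(image.id for image in batch)
        skip += len(batch)

    # Delete the images in chunks, then delete the iteration itself
    for i in range(0, len(image_ids), 64):
        trainer.delete_images(project_id, image_ids=image_ids[i:i + 64])
    trainer.delete_iteration(project_id, iteration_id)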

Scraping "older" pages with scrapy, rules and link extractors

I have been working on a project with Scrapy. With help from this lovely community I have managed to scrape the first page of this website: http://www.rotoworld.com/playernews/nfl/football-player-news?ls=roto%3anfl%3agnav. I am now trying to scrape information from the "older" pages as well. I have researched CrawlSpider, rules and link extractors, and believed I had the proper code. I want the spider to perform the same loop on subsequent pages. Unfortunately, at the moment when I run it, it just spits out the first page and doesn't continue to the "older" pages.
I am not exactly sure what I need to change and would really appreciate some help. There are posts going all the way back to February of 2004. I am new to data mining and not sure if it is actually a realistic goal to scrape every post, but if it is I would like to. Please, any help is appreciated. Thanks!
import scrapy
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor


class Roto_News_Spider2(CrawlSpider):
    name = "RotoPlayerNews"

    start_urls = [
        'http://www.rotoworld.com/playernews/nfl/football/',
    ]

    Rules = (Rule(LinkExtractor(allow=(), restrict_xpaths=('//input[@id="cp1_ctl00_btnNavigate1"]',)), callback="parse_page", follow=True),)

    def parse(self, response):
        for item in response.xpath("//div[@class='pb']"):
            player = item.xpath(".//div[@class='player']/a/text()").extract_first()
            position = item.xpath(".//div[@class='player']/text()").extract()[0].replace("-", "").strip()
            team = item.xpath(".//div[@class='player']/a/text()").extract()[1].strip()
            report = item.xpath(".//div[@class='report']/p/text()").extract_first()
            date = item.xpath(".//div[@class='date']/text()").extract_first() + " 2018"
            impact = item.xpath(".//div[@class='impact']/text()").extract_first().strip()
            source = item.xpath(".//div[@class='source']/a/text()").extract_first()
            yield {"Player": player, "Position": position, "Team": team, "Report": report, "Impact": impact, "Date": date, "Source": source}
If your intention is to fetch the data by traversing multiple pages, you don't need to go for Scrapy. If you still want a Scrapy-related solution, then I suggest you opt for Splash to handle the pagination.
I would do something like below to get the items (assuming you have already installed Selenium on your machine):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("http://www.rotoworld.com/playernews/nfl/football/")
wait = WebDriverWait(driver, 10)

while True:
    for item in wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='pb']"))):
        player = item.find_element_by_xpath(".//div[@class='player']/a").text
        player = player.encode()  # it should handle the encoding issue; I'm not totally sure, though
        print(player)

    try:
        idate = wait.until(EC.presence_of_element_located((By.XPATH, "//div[@class='date']"))).text
        if "Jun 9" in idate:  # put here any date you wanna go back to (last limit: where the scraper will stop)
            break
        wait.until(EC.presence_of_element_located((By.XPATH, "//input[@id='cp1_ctl00_btnNavigate1']"))).click()
        wait.until(EC.staleness_of(item))
    except:
        break

driver.quit()
My suggestion: Selenium
If you want to change pages automatically, you can use Selenium WebDriver.
Selenium lets you interact with the page: click buttons, type into inputs, etc. You'll need to change your code to scrape the data and then click the "Older" button. Then it will change the page and keep scraping.
Selenium is a very useful tool; I'm using it right now on a personal project. You can take a look at my repo on GitHub to see how it works. In the case of the page you're trying to scrape, you cannot reach the older posts just by changing the link to be scraped, so you need to use Selenium to change between pages.
Hope it helps.
There's no need to use Selenium in this case. Before scraping, open the URL in a browser and press F12 to inspect the page and watch the requests in the Network tab. When you press next, or "OLDER" in your case, you can see a new request in the Network tab. It provides everything you need. Once you understand how it works, you can write a working spider.
import scrapy
from scrapy import FormRequest
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor


class Roto_News_Spider2(CrawlSpider):
    name = "RotoPlayerNews"

    start_urls = [
        'http://www.<DOMAIN>/playernews/nfl/football/',
    ]

    Rules = (Rule(LinkExtractor(allow=(), restrict_xpaths=('//input[@id="cp1_ctl00_btnNavigate1"]',)), callback="parse", follow=True),)

    def parse(self, response):
        for item in response.xpath("//div[@class='pb']"):
            player = item.xpath(".//div[@class='player']/a/text()").extract_first()
            position = item.xpath(".//div[@class='player']/text()").extract()[0].replace("-", "").strip()
            team = item.xpath(".//div[@class='player']/a/text()").extract()[1].strip()
            report = item.xpath(".//div[@class='report']/p/text()").extract_first()
            date = item.xpath(".//div[@class='date']/text()").extract_first() + " 2018"
            impact = item.xpath(".//div[@class='impact']/text()").extract_first().strip()
            source = item.xpath(".//div[@class='source']/a/text()").extract_first()
            yield {"Player": player, "Position": position, "Team": team, "Report": report, "Impact": impact, "Date": date, "Source": source}

        older = response.css('input#cp1_ctl00_btnNavigate1')
        if not older:
            return

        # Re-submit the ASP.NET form with its hidden fields to request the next page
        inputs = response.css('div.aspNetHidden input')
        inputs.extend(response.css('div.RW_pn input'))
        formdata = {}
        for input in inputs:
            name = input.css('::attr(name)').extract_first()
            value = input.css('::attr(value)').extract_first()
            formdata[name] = value or ''
        formdata['ctl00$cp1$ctl00$btnNavigate1.x'] = '42'
        formdata['ctl00$cp1$ctl00$btnNavigate1.y'] = '17'
        del formdata['ctl00$cp1$ctl00$btnFilterResults']
        del formdata['ctl00$cp1$ctl00$btnNavigate1']
        action_url = 'http://www.<DOMAIN>/playernews/nfl/football-player-news?ls=roto%3anfl%3agnav&rw=1'
        yield FormRequest(
            action_url,
            formdata=formdata,
            callback=self.parse
        )
Be careful: you need to replace each <DOMAIN> placeholder in my code with the correct domain.

Programmatically visit (all) ASP.Net page(s) in a website?

In the security model for our ASP.Net website (.Net 3.5) we store the page name:
page.GetType().Name
as the primary key in a database table to be able to look up whether a user has access to a certain page. The first time a page is visited, this record is created automatically in the database.
We have exported these database statements to insert scripts, but each time a new page gets created we have to update the scripts. It's not a huge issue, but I would like to find an automated way to do this.
I created an attribute that I tagged a few pages with and then wrote a small process to get all the types that have this attribute, create an instance through reflection, and insert the record using the same code for page records mentioned above:
IEnumerable<Type> viewsecurityPages = Assembly.GetExecutingAssembly().GetTypes().Where(t => t.IsDefined(typeof(ViewSecurityAttribute), false));

foreach (Type t in viewsecurityPages)
{
    object obj = Activator.CreateInstance(t, false);
    //clip..(This code just checks if the record already exists in the DB)
    if (feature == null)
    {
        Attribute attb = Attribute.GetCustomAttribute(t, typeof(ViewSecurityAttribute));
        if (attb != null)
        {
            CreateSecurableFeatureForPage((Page)obj, uow, attb.ToString());
        }
    }
}
The issue is that page.GetType().Name, when the page goes through the actual page lifecycle, is something like this:
search_accounts_aspx
but when I use the Activator method above it returns:
Accounts
So the records don't match those in the security table. Is there any way to programmatically "visit" a web page so that it goes through the actual page lifecycle and I get back the correct value from the Name property?
Any help/reference will be greatly appreciated.
Interesting problem...
Of course there's a (too obvious?) way to programmatically visit the page: use System.Net.HttpWebRequest. That requires the URI, though, and not just a handle to the object, so this is a "how do we get there from here?" problem.
My suggestion would be to simply create another attribute (or use that same one) which stores the identifier you need. Then it will be the same either way you access it, right?
Alternatively... why not just use a 3rd party web spider/crawler to crawl your site and hit all the pages? There are several free options. Or am I missing something?
