Scrapy: Unable to extract attribute field using CSS selector

Here is the HTML code
<!DOCTYPE html>
<html lang="en">
<div class="container" id="content-area">
<div class="flex-row flex-baseline flex-space-between" data-id="1826" id="info">
<h1 class="no-margin">XYZ</h1>
<div class="new-stack" id="sublists">Added</div>
</div>
</div>
I want to extract the data-id attribute of the div tag. Here is what I tried with a CSS selector:
>>> response.css("#content-area div")[0].css("::attr[data-id]").get()
I got the error below:
cssselect.parser.SelectorSyntaxError: Got pseudo-element ::attr not at the end of a selector
Here is how I solved it by combining CSS and XPath selectors:
>>> response.css("#content-area div")[0].xpath("@data-id").get()
'1826'
Is there a solution that does this using only a CSS selector?

You need to use parentheses () instead of square brackets []. The pseudo-element is written ::attr(name):
>>> response.css("#content-area div")[0].css("::attr(data-id)").get()

Related

Add external css file to dom AngleSharp

I have an external CSS file that isn't referenced inside of the html file. Is it possible to add this CSS file and apply the styling to the html via AngleSharp on the fly?
Another workaround I've thought of is inserting a reference to the CSS file into the HTML before parsing it into the DOM, but I wanted to know whether AngleSharp supports this directly before implementing the workaround. Thanks so much!
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Test Doc</title>
</head>
<body>
<div id="styleme">
Hello
</div>
</body>
</html>
Notice no css is linked.
And the external css file:
#styleme {
color:blue;
background-color: gray;
}
Yes. There are actually multiple ways.
I guess what you are looking for is:
var config = Configuration.Default.WithCss();
// note: ideally load your document from a URL directly if applicable
var document = await BrowsingContext.New(config)
.OpenAsync(res => res.Content(@"<!doctype html>
<html lang=en>
<head>
<meta charset='utf-8'>
<title>Test Doc</title>
</head>
<body>
<div id=styleme>
Hello
</div>
</body>
</html>"));
var style = document.CreateElement<IHtmlStyleElement>();
// note: if you have the CSS from a URL; choose IHtmlLinkElement instead
style.TextContent = @"#styleme { color:blue; background-color: gray; }";
document.Head.AppendChild(style);
// note: using LINQPad here; you may want to store the style somewhere
document.DefaultView.GetComputedStyle(document.QuerySelector("#styleme")).Dump();
Hope that helps!

Trying to ignore unused layout fragment in Thymeleaf Layout Dialect

Does anyone know if it's possible to hide a layout:fragment if it is not specified in the calling page?
For example, I have a page layout.html that has something like (where there is a separate fragment.html file with header and footer fragments):
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:th="http://www.thymeleaf.org"
xmlns:layout="http://www.ultraq.net.nz/thymeleaf/layout"
th:lang = "en">
<head>
<title layout:title-pattern="$CONTENT_TITLE">TITLE</title>
</head>
<body>
<header layout:replace="fragment :: header">HEADER</header>
<section layout:fragment="messages">MESSAGES</section>
<section layout:fragment="content">CONTENT</section>
<footer layout:replace="fragment :: footer">FOOTER</footer>
</body>
</html>
If a page that uses the layout should not include the "messages" fragment, is there a way to achieve that simply by leaving that fragment out of the page? For example (say, simple.html):
<html layout:decorator="layout">
<head>
<title th:text="#{PAGETITLE_SIMPLE}">SIMPLE PAGE TITLE</title>
</head>
<body>
<section layout:fragment="content">
<p>Put in some random content for the body of the simple page</p>
</section>
</body>
The rendered HTML will still contain the text "MESSAGES" inside a <section> tag.
I was able to work around this by adding the following to simple.html:
<section layout:fragment="messages" th:remove="all"></section>
This seems somewhat sloppy, though. Is there a way to hide this from users of the layout by putting logic in the layout itself, so that it ignores the fragment altogether when a page does not define it?
Using Spring 4.1.6, Thymeleaf 2.1.4, and Layout Dialect 1.3.3.
Thanks
I was able to resolve this by applying to the layout dialect the approach posted by Serge Ballesta in "How to check Thymeleaf fragment is defined".
This is what the rewritten layout.html looks like:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:th="http://www.thymeleaf.org"
xmlns:layout="http://www.ultraq.net.nz/thymeleaf/layout"
th:lang = "en">
<head>
<title layout:title-pattern="$CONTENT_TITLE">TITLE</title>
</head>
<body>
<header layout:replace="fragment :: header">HEADER</header>
<section layout:replace="this :: messages">MESSAGES</section>
<section layout:fragment="content">CONTENT</section>
<footer layout:replace="fragment :: footer">FOOTER</footer>
</body>
</html>
This way, if the calling page (simple.html) only has the <section> for content, no HTML will be rendered for the section for messages. But if the page did have the following, it will be included as intended:
<section layout:fragment="messages">
<p>Message 1</p>
<p>Message 2</p>
</section>

Assemble: Multiple points of content insertion in layout?

Every Assemble user who works with layouts knows that "{{> body }}" marks the point where the content of any page using the layout is inserted. But is it possible to define multiple insertion points, instead of putting everything where the {{> body }} is?
For instance, in my page I would like to define a specific piece of JavaScript, but I want that custom JavaScript to sit at the very bottom of the page along with our other script tags. If everything ends up where the {{> body }} is, this is not possible, since the script will just be appended to the content.
In other words, it would be useful to have {{> script }}, or even customizable tags, marking different insertion points in the layout, with the page that uses the layout defining the content for each of them.
That is my ideal use case. Does anyone know if Assemble supports anything like this?
@Xavier_Ex check out the assemble handlebars helper repo: https://github.com/assemble/example-layout-helpers
And this particular pull request: https://github.com/assemble/handlebars-helpers/pull/75
We added some layout helpers about a month ago that allow you to "extend" a layout and include different content sections. Notice that you'll have to include your layout as a partial in the assemble gruntfile setup for this to work properly...
assemble: {
options: {
flatten: true,
assets: 'docs/assets',
partials: ['src/includes/*.hbs', 'src/layouts/*.hbs'],
layout: false,
data: ['src/data/*.{json,yml}', 'package.json']
},
pages: {
src: 'src/*.hbs',
dest: 'docs/'
}
}
Layout (default.hbs)...
<!DOCTYPE html>
<html lang="en">
<head>
{{#block "head"}}
<meta charset="UTF-8">
<title>{{title}} | {{site.title}}</title>
<link rel="stylesheet" href="{{assets}}/{{stylesheet}}.css">
<link rel="stylesheet" href="{{assets}}/github.css">
{{/block}}
</head>
<body {{#is stylesheet "bootstrap"}}style="padding-top: 40px;"{{/is}}>
{{#block "header"}}
{{! Navbar
================================================== }}
{{> navbar }}
{{/block}}
{{! Subhead
================================================== }}
<header class="{{#is stylesheet "bootstrap"}}jumbotron {{/is}}{{#is stylesheet "assemble"}}masthead {{/is}}subhead">
<div class="container">
<div class="row">
<div class="col col-lg-12">
<h1> DOCS / {{#if title}}{{ uppercase title }}{{else}}{{ uppercase basename }}{{/if}} </h1>
</div>
</div>
</div>
</header>
{{! Page content
================================================== }}
{{#block "body"}}
<div class="container">
<div class="panel panel-docs">
{{> body }}
</div>
</div>
{{/block}}
{{#block "script"}}
<script src="{{assets}}/highlight.js"></script>
<script src="{{assets}}/holder.js"></script>
{{/block}}
</body>
</html>
Some page
{{#extend "default"}}
{{#content "head"}}
<link rel="stylesheet" href="assets/css/home.css" />
{{/content}}
{{#content "body"}}
<h2>Welcome Home</h2>
<ul>
{{#items}}
<li>{{.}}</li>
{{/items}}
</ul>
{{/content}}
{{#content "script"}}
<script src="assets/js/analytics.js"></script>
{{/content}}
{{/extend}}
Hope this helps.

How can I customize how the title is displayed?

I want to customize how the HTML for the title of my Dexterity content type is generated.
I wrote a view template for the type, which uses the metadata.IBasic behavior:
<html ...>
<body>
<metal:content-core fill-slot="content-core">
<metal:content-core define-macro="content-core">
<div id="conent-images">...</div>
...
<div id="content-metadata">
<h1 tal:content="context/title">Title</h1>
...
</div>
...
<div id="content-body">...</div>
</metal:content-core>
</metal:content-core>
</body>
</html>
But Plone then renders the title twice. How can I remove the first occurrence of the title?
With that code you are filling the slot named content-core. The layout that serves as the base for the template defines several slots: content-title, content-description and content-core.
To remove the first occurrence of the title, you can fill the content-title slot with nothing.
<html ...>
<body>
<metal:content-core fill-slot="content-title">
<metal:content-core define-macro="content-title">
</metal:content-core>
</metal:content-core>
<metal:content-core fill-slot="content-core">
<metal:content-core define-macro="content-core">
...
<h1 tal:content="context/title">Title</h1>
...
<div id="content-body">...</div>
</metal:content-core>
</metal:content-core>
</body>
</html>
Another solution is to edit the template where the slots are defined, but this one is enough for me.

plone + formlib: how to reference form.pt

I'm working on Plone 3.2.1 and I've made a formlib form with a custom template:
from Products.Five.formlib import formbase
from Products.Five.browser.pagetemplatefile import ViewPageTemplateFile
...
class MyForm(formbase.PageForm):
...
template = ViewPageTemplateFile('myform.pt')
I want to make a simple change to the standard formlib template. My question is: how do I reference the parts/zope2/lib/python/zope/formlib/pageform.pt inside my template?
<!-- myform.pt -->
<metal:macro metal:use-macro="WHAT GOES HERE??">
<div metal:fill-slot="extra-info">
I just want to put a text before the standard formlib template
</div>
</metal:macro>
Finally, I found the answer:
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:metal="http://xml.zope.org/namespaces/metal"
xmlns:tal="http://xml.zope.org/namespaces/tal"
xmlns:i18n="http://xml.zope.org/namespaces/i18n"
metal:use-macro="context/main_template/macros/master">
<body>
<div metal:fill-slot="main">
<div metal:use-macro="context/@@base-pageform.html/macros/form">
<metal:block fill-slot="extra_info">
<!-- HERE we go -->
</metal:block>
</div>
</div>
</body>
</html>
One thing to watch out for (for anyone looking for this, like me): the line
<divmetal:fill-slot="main">
needs a space in between div and metal:
<div metal:fill-slot="main">
Thanks; very helpful solution.
