I'm using the WP-Property and WP-Property: Importer plugins to manage properties on a WordPress-powered website. The importer uses XPath rules to map fields in an imported XML file to their corresponding fields on the site.
e.g. 'Display Address' maps to 'address/full'
I have a set of elements that look like this:
<property>
  <feature1>Feature</feature1>
  <feature2>Feature</feature2>
  <feature3>Feature</feature3>
  <feature4>Feature</feature4>
  <feature5>Feature</feature5>
  <address>
    <full>abc</full>
    <street>def</street>
    <postcode>ghi</postcode>
  </address>
</property>
I want to group these together into one entry rather than setting up a separate field for each one, so I'm looking for a way to match feature*, but everything I've tried so far has missed the mark. It goes without saying that I had never dabbled with XPath before today!
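In plain XPath 1.0 you can match every element whose name begins with feature by testing the name with starts-with(). I can't speak to exactly what the importer's rule field accepts, but here is a minimal sketch using Python's lxml (an assumption on my part, run against a cut-down copy of the sample XML with placeholder values) just to show the expression working:

from lxml import etree  # third-party library, assumed available

xml = """<property>
  <feature1>Feature one</feature1>
  <feature2>Feature two</feature2>
  <feature3>Feature three</feature3>
  <address>
    <full>abc</full>
  </address>
</property>"""

doc = etree.fromstring(xml)

# Select every child of <property> whose element name starts with "feature".
features = doc.xpath("/property/*[starts-with(local-name(), 'feature')]")
print([f.text for f in features])   # ['Feature one', 'Feature two', 'Feature three']

# Or pull the text nodes directly and join them into a single value for one field:
print(", ".join(doc.xpath("/property/*[starts-with(local-name(), 'feature')]/text()")))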
I noticed that in an RSS feed you can add the <category> tag.
Source: https://www.w3schools.com/xml/rss_tag_category_item.asp
But I don't understand one thing: is there a list of all the allowed categories, or can I write anything? I need a category about video games.
Or can I write anything?
You can write anything.
Unless you're submitting your feed to a directory, with a documented set of categories, it's essentially free text.
However, in RSS:
It has one optional attribute, domain, a string that identifies a categorization taxonomy.
The value of the element is a forward-slash-separated string that identifies a hierarchic location in the indicated taxonomy.
and in Atom:
The "scheme" attribute is an IRI that identifies a categorization
scheme.
you can indicate that your term is from a specific scheme.
In practice, some namespace extensions, such as the iTunes podcast extension, introduce a separate element:
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  ...
  <itunes:category text="Sports">
    <itunes:category text="Wilderness"/>
  </itunes:category>
rather than suggesting use of the scheme attribute. The iTunes guide currently includes:
<itunes:category text="Leisure">
<itunes:category text="Video Games" />
</itunes:category>
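For what it's worth, here is how a consumer sees those values. This is a small sketch using the third-party feedparser library (an assumption on my part, with a placeholder feed URL), which normalizes both RSS and Atom categories into tags with a term and a scheme:

import feedparser  # third-party library, assumed available

d = feedparser.parse("https://example.com/feed.xml")   # placeholder feed URL
for entry in d.entries:
    # 'term' is the free-text category value; 'scheme' is the RSS domain / Atom scheme, if given.
    for tag in entry.get("tags", []):
        print(tag.get("term"), tag.get("scheme"))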
I'm using Scrapy to scrape privacy policies by crawling a website from its homepage. As such, I want to intelligently crawl specific links within pages, namely those containing certain keywords (privacy, data, protection, etc.).
I saw that Scrapy's CrawlSpider and its LinkExtractor object allow for just that; however, I would like the LinkExtractor to apply a regex not only to the discovered links but also to the text within the <a></a> tags, in order to better identify cases like this:
Check out our privacy policy
where the URL might not be a perfect match, but the text within the HTML tags is more helpful.
I saw that Scrapy's LinkExtractor already has an argument called process_value, which can run an operation on the text within the HTML tag, but I'm unsure how I could "return a positive link match" (the way the regex given in the allow parameter would) and thus "add this link to the list of things for the CrawlSpider object to parse".
You’ll be able to do this in Scrapy 1.7.0 or later. See #3635.
The changes add a restrict_text parameter to LinkExtractor. From the master branch of the Scrapy documentation on LinkExtractor:
restrict_text (a regular expression (or list of)) – a single regular expression (or list of regular expressions) that the link’s text must match in order to be extracted. If not given (or empty), it will match all links. If a list of regular expressions is given, the link will be extracted if it matches at least one.
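A minimal sketch of how that could look once 1.7.0 is available (the spider name, domain, and keyword regex are placeholders; two rules are used because, within a single LinkExtractor, allow and restrict_text both have to match, whereas here we want URL or anchor text):

# Requires Scrapy 1.7.0+ for restrict_text
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

KEYWORDS = r"privacy|data[ -]?protection"    # placeholder keyword pattern

class PolicySpider(CrawlSpider):
    name = "policies"                        # placeholder name
    allowed_domains = ["example.com"]        # placeholder domain
    start_urls = ["https://example.com/"]

    rules = (
        # Follow links whose URL matches the keywords...
        Rule(LinkExtractor(allow=[KEYWORDS]), callback="parse_policy", follow=True),
        # ...and also links whose anchor text matches them.
        Rule(LinkExtractor(restrict_text=[KEYWORDS]), callback="parse_policy", follow=True),
    )

    def parse_policy(self, response):
        yield {
            "url": response.url,
            "body_text": " ".join(response.css("body ::text").getall()),
        }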
I have researched a lot on the internet, but I am still not able to find an exact solution to my problem. The problem mentioned in the link below is very similar to mine, but that approach will not work.
If record exists then map, if not map another - Biztalk mapping
My Problem:
I have an XML source file which has invoice information, something like this:
<invoice>
  <invoiceNumber>1124</invoiceNumber>
  <invoiceHeader>...</invoiceHeader>
  <invoiceDetails>...</invoiceDetails>
</invoice>
<invoice>
  <invoiceNumber>1598</invoiceNumber>
  <invoiceHeader>...</invoiceHeader>
  <invoiceDetails>...</invoiceDetails>
</invoice>
It is possible that the subnode information (i.e. invoiceDetails) is sometimes missing for a few invoice numbers, but the header information will always be present.
Output required:
The output file (.csv) should contain all invoice numbers, irrespective of whether the details tag is present or not.
Add a Looping Functoid to the map, with the source link coming from the repeating record that is always present (e.g. invoiceHeader) and the output of the Looping Functoid linked to the row element in the Flat File Schema.
It will work, trust us. :)
It depends on what the output schema looks like, but the solution should be easy: just link the tag to the output.
Behind the scenes, the XSLT produced is a for-each loop over the tag that copies its value to the output, irrespective of whether the details tag is present or not.
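Outside BizTalk, the equivalent logic is easy to picture. This is only an illustration in Python (file names are placeholders, and it assumes the invoices sit under a single root element), not the map itself, but it shows why a missing invoiceDetails node does not drop the row:

# Illustration only: write one CSV row per invoice,
# whether or not <invoiceDetails> is present.
import csv
import xml.etree.ElementTree as ET

tree = ET.parse("invoices.xml")                  # placeholder input file
with open("invoices.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for invoice in tree.getroot().iter("invoice"):
        number = invoice.findtext("invoiceNumber", default="")
        details = invoice.find("invoiceDetails")     # None when the node is missing
        writer.writerow([number, "details present" if details is not None else "no details"])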
I have a list of 100+ unique URLs and I want to use them as triggers for a couple of different audience tags. Sadly, they have no common path that could be used to shorten or batch the process. Currently I am making each one into a path trigger and then assigning it to the proper audience. It would be great if there were a way to upload a group of them to one trigger as an "or"-type list, or to batch-upload them as different triggers and then assign them to each tag individually.
After re-reading the question, this might be a good job for a lookup table. The input variable would be the page path, and you would set the output value to true if the path matches one of your URLs (with false as the default value). Then all of the URLs are in one place and you only need to assign one trigger to your tags.
Alternatively you can use 'Some Page Views' and 'matches RegEx':
Make a list of all URLs and remove the protocol, subdomain, domain name and slashes from them all (so 'https://www.example.com/my-page/' becomes 'my-page' without quotes).
Put all of these partial URLs into a single string separated by |
So: my-page-1|my-page-2|my-third-page (no leading or trailing |, since an empty alternative would match every URL)
Create a Page View trigger
Select 'Some Page Views', then...
Page URL, matches RegEx, [paste list of bar separated partial URLs]
Save
Google Tag Manager using matches RegEx to find generic 'terms of service' and 'about' style pages
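Before pasting the pattern into GTM, it can be worth sanity-checking it. A quick sketch with Python's re module (placeholder paths, with re.search used as a stand-in for GTM's matches RegEx condition):

import re

# Partial URLs joined with | and no leading/trailing pipe
pattern = re.compile(r"my-page-1|my-page-2|my-third-page")

for path in ["/my-page-1/", "/pricing/", "/blog/my-third-page/"]:
    print(path, "->", bool(pattern.search(path)))
# /my-page-1/ -> True, /pricing/ -> False, /blog/my-third-page/ -> True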
We recently changed the directory structure of our website. For example, the path of the following page changed as follows:
Previous: /degrees/msm-technology/
Current: /degrees/masters/ms-management/technology/
They are the same directory with the same pages just listed differently.
I'm attempting to set up a custom Search and Replace filter to update the previous page path so that there is no interruption in the data when tracking the pages. However, it doesn't seem to be working.
Filter Field: Request URI
Search String: ^/degrees/msm-technology/
Replace String: /degrees/masters/ms-management/technology/
The data for the previous page listed ends this week, while the other begins.
You need to escape the hyphen, like this:
Search string: ^/degrees/msm\-technology/
because GA filters use POSIX Regex.
[source: https://support.google.com/analytics/answer/1034836?hl=en]
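As a quick way to check the strings outside of GA, the same search and replace can be run through Python's re module (the escaped hyphen from the answer is also accepted there):

import re

search = r"^/degrees/msm\-technology/"
replace = "/degrees/masters/ms-management/technology/"

print(re.sub(search, replace, "/degrees/msm-technology/"))
# -> /degrees/masters/ms-management/technology/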
Hope this helps.