Can we get the attribute names themselves from Sheets importxml

I am trying to get a list of the data attributes from a website using Google Sheets IMPORTXML function. To clarify, I want the names of the attributes themselves as well as their values, and I don't need the text of the table itself.
Trying the website: https://thesilphroad.com/catalog
A sample of the code I want to extract from is:
<div class="pokemonOption sighted" data-nests="1" data-raid-boss="0" data-obtainable="1" data-released="1" data-shiny-obtainable="1" data-shiny-released="1" data-shadow-available="" data-shadow-released="1" data-pokemon-slug="bulbasaur" style="background-image:url(https://assets.thesilphroad.com/img/pokemon/icons/96x96/1.png), radial-gradient(#a9f712, #2ecc71);"><span>#001</span></div>
The formula I'm using is:
=IMPORTXML("https://thesilphroad.com/catalog","//div[@class='pokemonOption sighted']/@*")
and it returns all the values of the attributes in 1 column, eg:
attributes
pokemonOption sighted
1
1
Bulbasaur
etc...
But what I need is the names as well, e.g. data-nests="1". Alternatively, what about just a list of the attribute names of that div, e.g.
attribute names
class
data-nests
data-raid-boss
data-released
etc...
Does someone know how to extract this into Sheets using IMPORTXML or another method?
Thanks!

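I am not certain, but I think IMPORTXML uses XPath 1.0.
To get the names of all the attributes, you need XPath 2.0, where name() can be applied to each attribute in turn (for example //div/@*/name()); in XPath 1.0, name() only reports the first node of a node-set.
You may want to create a custom function with Google Apps Script instead.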
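function pokemon(cell) {
  const url = 'https://thesilphroad.com/catalog';
  // Fetch the raw HTML of the catalog page.
  const response = UrlFetchApp.fetch(url);
  const content = response.getContentText();
  // Capture each <div class="pokemonOption sighted" ...>...</div> block as a string.
  const results = content.match(/(<div class="pokemonOption sighted".+?<\/div>)/g);
  /* ... */
}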
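One way the elided part could continue is sketched below: once the opening tag has been captured as text, a second regular expression can list each attribute name next to its value. The helper name extractAttributes and the wrapper name POKEMON_ATTRIBUTES are hypothetical, and the sketch assumes the class name contains no regex metacharacters and that all attribute values are double-quoted, as in the sample markup.

```javascript
// Hypothetical helper (not part of the original answer): returns [name, value]
// pairs for the first <div> in `html` whose class attribute equals `className`.
// Assumes `className` has no regex metacharacters and values are double-quoted.
function extractAttributes(html, className) {
  // Find the opening <div ...> tag carrying the requested class.
  const tagPattern = new RegExp('<div\\s+class="' + className + '"[^>]*>');
  const tag = html.match(tagPattern);
  if (!tag) return [];

  // Collect every name="value" pair inside that opening tag.
  const attrPattern = /([\w-]+)="([^"]*)"/g;
  const rows = [];
  let m;
  while ((m = attrPattern.exec(tag[0])) !== null) {
    rows.push([m[1], m[2]]);
  }
  return rows;
}

// Hypothetical Apps Script wrapper; UrlFetchApp exists only inside Apps Script,
// so this function is defined here but can only be called from a sheet.
function POKEMON_ATTRIBUTES() {
  const html = UrlFetchApp.fetch('https://thesilphroad.com/catalog').getContentText();
  return extractAttributes(html, 'pokemonOption sighted');
}
```

Because a custom function may return a two-dimensional array, entering =POKEMON_ATTRIBUTES() in a cell should spill the names into one column and the values into the next. Parsing HTML with regular expressions is fragile, so this is only reasonable while the page keeps the markup shown in the question.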

Related

How to replace comma with a dot in GTM for JSON structured data?

I am new to structured data implementation and have no coding knowledge.
I have spent a week trying to resolve a price warning in the Google Structured Data Testing Tool.
My prices use a comma as the decimal separator, which Google does not accept.
The documentation at http://schema.org/price says: "Use '.' (Unicode 'FULL STOP' (U+002E)) rather than ',' to indicate a decimal point. Avoid using these symbols as a readability separator."
I have an element #PdtPrixRef (selected by CSS) stored in a variable named "Product-price", whose value contains a comma ("12,5"), but I can't find how to replace it in my structured data with the value "12.5". Can someone help me?
Here is my current script:
My actual GTM script
Should I add something to my script, or create a variable (Custom JavaScript)?
I think it's something like
value.replace(",", ".")
but I don't know how to write the complete function from beginning to end.

Yes, you can just create a Custom JavaScript Variable.
Here is the code:
function() {
  var price = {{Product-price}};
  return price.replace(",", ".");
}
Then use this variable in your JSON-LD script.
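If the price element might be missing on some pages, or the variable might return a number rather than a string, a slightly more defensive version of the same replacement could look like the sketch below. The function name normalizePrice is hypothetical and exists only so the logic can run outside GTM.

```javascript
// Hypothetical helper illustrating the same comma-to-dot replacement,
// written as a named function so it can be run and tested outside GTM.
function normalizePrice(rawPrice) {
  // Nothing to convert if the variable did not resolve to a value.
  if (rawPrice === undefined || rawPrice === null) {
    return undefined;
  }
  // Coerce to a string, trim whitespace, then swap the decimal comma for a dot.
  // With a string pattern, replace() only changes the first comma it finds.
  return String(rawPrice).trim().replace(',', '.');
}
```

In GTM, the body would go inside the anonymous function() { ... } of the Custom JavaScript Variable, with {{Product-price}} taking the place of rawPrice.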

Read embedded data that starts with numbers?

I have embedded data that I have imported into Qualtrics using a web service block. The data comes from a .json file and reads something like 0.male, 1.male, 2.male, etc.
I have been trying to read this into my survey using the Qualtrics.SurveyEngine.getEmbeddedData method but without luck.
I'm trying to do something of the form:
let n = 2
Qualtrics.SurveyEngine.getEmbeddedData(n + ".male")
but this has been returning a NULL result. Is it possible to read embedded data that starts with a number?
Also see:
https://community.qualtrics.com/XMcommunity/discussion/15991/read-in-embedded-variables-using-a-loop#latest

The issue isn't the number, it is the dot. getEmbeddedData() doesn't work when the name contains a dot. See https://stackoverflow.com/a/51802695/4434072 for possible alternatives.

How to translate TextMeshPro-StyleTags to the actual RichText in Unity?

I have the following string in TextMeshPro: "<style=Title>This is a Title (...)".
I would like to translate the StyleTag to the defined Opening Tags.
For this example it would translate the string above to the following: "<size=125%><align=center>This is a Title (...)".
How can I do this?

You can get the OpeningTags to a StyleTag by calling the following function: TMP_StyleSheet.GetStyle("[StyleName]").styleOpeningDefinition (with TMP_StyleSheet being a reference to the used TMP-StyleSheet).
So a possible solution is to extract the StyleName from your string (e.g. "(...text) <style=Example> (text...)" would become "Example") and feed it to the function above. Regular Expressions can help to extract the StyleName from your string. Then replace the whole tag with whatever the function returns (e.g. "<size=125%>"). (Note: It returns Null if the tag does not exist). Then do the same with the closing tag.
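The extract-and-replace steps described above can be sketched independently of Unity. The example below is written in JavaScript purely as an illustration (real TextMeshPro code would be C#); the styleSheet object is a hypothetical stand-in for the TMP style sheet, and the closing tags given for Title are an assumption rather than values taken from TextMeshPro.

```javascript
// Hypothetical stand-in for the TMP style sheet: style name -> opening/closing tags.
// The closing definition is assumed for illustration only.
var styleSheet = {
  Title: { opening: '<size=125%><align=center>', closing: '</align></size>' }
};

// Replaces <style=Name> and </style> tags with the definitions found in `styles`.
function expandStyleTags(text, styles) {
  var open = []; // stack of open styles, so each </style> knows what to close
  var pattern = /<style=(?:"([^"]+)"|([^>]+))>|<\/style>/g;
  return text.replace(pattern, function (tag, quoted, bare) {
    if (tag === '</style>') {
      var last = open.pop();
      return last ? last.closing : tag; // unknown or unmatched: leave untouched
    }
    var name = quoted || bare;
    var known = Object.prototype.hasOwnProperty.call(styles, name);
    var style = known ? styles[name] : null;
    open.push(style);
    return style ? style.opening : tag; // unknown style: keep the original tag
  });
}
```

The stack is there because a closing </style> tag does not name the style it closes, so the code has to remember which style was opened last.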

How can I read multiple web addresses when a "%" sign in the address breaks dynamic iteration with sprintf?

The code I use to scrape a multi-page website relies on sprintf to iterate over pages by filling in the dynamic "%d" part of the URL. Recently, the website added parameters containing "%" to the address, and now the mapping function I use with sprintf throws an error because of these new % signs.
url_base <- "https://www.xxxxxx.com/girne?s-r=S&property_type=1&property=&min_price=&max_price=&currency=1&min_m2=&max_m2=&title-type%5B0%5D=1&page=%d&sort=mr"
map_df(1:10,function(i){
emlak <- read_html(sprintf(url_base,i))
fiyat <-emlak%>%html_nodes("#properties .price")%>%html_text()
alan <-emlak%>%html_nodes(".glyphicons-vector-path-square+ .detail-value")%>%html_text()
ilanno <-emlak%>%html_nodes(".fa-hashtag+ .detail-value")%>%html_text()
bolge <-emlak%>%html_nodes("#properties figure")%>%html_text()
data.frame(fiyat,alan,ilanno,bolge,stringsAsFactors = FALSE)
}) -> emlak_table3
Is there any way to define the dynamic placeholder with something other than "%"? I would like to keep using the same procedure to scrape the website and download the page data.

To insert a literal % in sprintf, use %%. For example: sprintf('Your rate: %.1f%%', 31.4).
Thus, every place in your string where you need a literal '%', use two. Every place where you need to insert a value, use one. In your URL, title-type%5B0%5D becomes title-type%%5B0%%5D, while page=%d stays as it is.

Acquiring all nodes that have ids beginning with "ABC"

I'm attempting to scrape a page that has about 10 columns using Ruby and Nokogiri. Most of the columns are straightforward because they have unique class names. However, some of them have ids that seem to consist of long number strings appended to what would be the standard class name.
For example, gametimes are all picked up with .eventLine-time, team names with .team-name, but this particular one has, for example:
<div class="eventLine-book-value" id="eventLineOpener-118079-19-1522-1">-3 -120</div>
.eventLine-book-value is not specific to this column, so it's not useful. The 13 digits are different for every game, and trying something like:
def nodes_by_selector(filename,selector)
file = open(filename)
doc = Nokogiri::HTML(file)
doc.css(^selector)
end
has left me with errors. I've seen ^ and ~ used in other languages, but I'm new to this, and my searches for a way to pick up all data under id=eventLineOpener-XXXX have come to nothing.

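To pick up all data under id=eventLineOpener-XXXX, pass 'div[id*=eventLineOpener]' as the selector. The *= operator matches ids that contain the given text; to match only ids that begin with it, use 'div[id^=eventLineOpener]' instead:
require 'nokogiri'
def nodes_by_selector(filename, selector)
  # Parse the file inside a block so the file handle is closed afterwards.
  doc = File.open(filename) { |file| Nokogiri::HTML(file) }
  doc.css(selector) # e.g. doc.css('div[id*=eventLineOpener]')
end
The method above returns a Nokogiri::XML::NodeSet containing the Nokogiri::XML::Element objects whose id matches eventLineOpener-XXXX.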
Further, to extract the content of each of these Nokogiri::XML::Element objects, you need to iterate over each of these objects and use the text method on those objects. For example:
doc.css('div[id*=eventLineOpener]')[0].text
