Problem with getting specified image link using importxml - web-scraping

I am new to importxml and I am having trouble scraping product data into a Google Spreadsheet with it.
The image element on the webpage looks like this:
<div class="pd-img"><img src="https://img-trendyol.mncdn.com/Assets/ProductImages/oa/47/4778846/1/1032019101285_2_org.jpg" alt="" style="width: 78px; height: 114px; min-width: 78px; min-height: 114px;"></div>
When I try to import "//div[contains(@class,'pd-img')]/img/@src"
it doesn't return the image link at all.
After reading the page source I figured out that this XPath query:
"//div/img/@src"
does return the link, but duplicated and together with 4 other entries (6 cells in total).
Product link I am working with: https://www.trendyol.com/u-s-polo-assn/erkek-gomlek-g081sz004-000-855736-p-4778846?fbclid=IwAR1pOVpTNOyelKsgVpTQZJ0FRrb_37R-HlI_gm0XWb_ka9RaPGTO8JZZpZc
What I need is an importxml formula that returns only the product image from the product page.

Try this, which keeps only the first result that starts with 'http':
=QUERY(IMPORTXML(
"https://www.trendyol.com/u-s-polo-assn/erkek-gomlek-g081sz004-000-855736-p-4778846?fbclid=IwAR1pOVpTNOyelKsgVpTQZJ0FRrb_37R-HlI_gm0XWb_ka9RaPGTO8JZZpZc",
"//div/img/@src"),
"where Col1 starts with 'http' limit 1", 0)

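For checking the selection logic outside of Sheets, here is a minimal Python sketch using only the standard library. It approximates the XPath //div[contains(@class,'pd-img')]/img/@src by collecting the src of each img whose innermost enclosing div carries the class pd-img. The names ProductImageParser, product_image_sources and SAMPLE are made up for this illustration, the banner URL in the sample is hypothetical, and the code does not fetch the live page.

```python
from html.parser import HTMLParser


class ProductImageParser(HTMLParser):
    """Collect src values of <img> tags whose innermost <div> has class pd-img."""

    def __init__(self):
        super().__init__()
        self.div_stack = []  # one bool per open <div>: does it have class pd-img?
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            self.div_stack.append("pd-img" in classes)
        elif tag == "img" and self.div_stack and self.div_stack[-1]:
            if attrs.get("src"):
                self.sources.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()


def product_image_sources(html):
    """Return the image URLs found inside pd-img divs, in document order."""
    parser = ProductImageParser()
    parser.feed(html)
    return parser.sources


# Hypothetical sample: one unrelated image plus the element from the question.
SAMPLE = (
    '<div class="banner"><img src="https://example.com/banner.png"></div>'
    '<div class="pd-img"><img src="https://img-trendyol.mncdn.com/Assets/'
    'ProductImages/oa/47/4778846/1/1032019101285_2_org.jpg" alt=""></div>'
)
```

Calling product_image_sources(SAMPLE) returns a list with only the product image URL, which is the behaviour the class-based XPath is meant to have.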
Related

Scraper: How do we download the image within a div class that has a url opening the image

How do we capture the image defined by the following code? It differs from the usual format, where I only need to capture the src of an img.
<div class="avatar" style="background-image:
url(customavatars/545/5453285_1410924046.jpg);background-size: cover;background-repeat: no-repeat;background-position: center;width: 120px;height: 120px;border-radius: 70px;margin: 10px 0 10px 10px;" onmouseover="showUserInfoBox('userinfo234340519')" onmouseout="clearShowUserInfoTimer()"></div>
Original website: https://mobile.uwants.com/viewthread.php?tid=19780494&extra=page%3D1
The image: https://imgur.com/Gbckna1
Thanks
Since the image you want to get is defined as a css background, you'll need to extract the contents of the style attribute, and parse the css you get from that.
You can try to do this yourself, but it would probably be easier to use an existing library, such as cssutils or tinycss2.
You can use re_first() with a capture group, so that only the path inside url(...) is returned:
response.xpath('//div[@class="avatar"]/@style').re_first(r'url\(([^)]+)\)')
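If you want to test the extraction step without running a spider, a minimal standard-library sketch could look like the following. The function names background_image_url and absolute_image_url are made up for this illustration, and resolving the path against the page URL is an assumption about how the site serves the avatar, not something the question confirms.

```python
import re
from urllib.parse import urljoin

# Matches url(...) in a CSS declaration, with optional quotes around the path.
URL_PATTERN = re.compile(r"""url\(\s*['"]?([^'")]+)['"]?\s*\)""")


def background_image_url(style):
    """Return the path inside the first url(...) of a style string, or None."""
    match = URL_PATTERN.search(style)
    return match.group(1).strip() if match else None


def absolute_image_url(style, page_url):
    """Resolve the extracted path against the page URL (assumed base)."""
    path = background_image_url(style)
    return urljoin(page_url, path) if path else None
```

For the style attribute in the question, background_image_url returns customavatars/545/5453285_1410924046.jpg. A regular expression is enough for a single url(...) value; for anything more involved, a CSS parser such as the ones mentioned above is the safer choice.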

Create shortcode in Wordpress that disables wpautop() on wrapped content?

Looking to place a graphic on the same line/row as an h3 title like this:
<h3 style="display:inline;">Graphic Advantage</h3><img style="max-width: 50px; width: 100%;" src="https://sgdesign.com/images/SGDadvantage.png" alt="SGD Advantage" />
First I set the h3 to display: inline so it only occupies its actual width rather than the full row. I could float the h3 and the graphic left and then do a clearfix, but that seems excessive. The goal is simply to place both the h3 and the small graphic on the same line.
It fails because of WordPress's auto-formatting function wpautop(), which automatically inserts a paragraph tag <p>.
Yet this works when the secondary content (the graphic) is wrapped in a shortcode:
<h3 style="display: inline;">Advantage SGDesign</h3>
[tooltip text="Throughout our site you'll see this icon that will help identify significant differences between SGDesign and other companies"]<img style="max-width: 50px; width: 100%;" src="https://sgdesign.com/images/SGDadvantage.png" alt="SGD Advantage" />[/tooltip]
QUESTION: Best practice to thwart the wpautop() function within a wrapped tag of some type?
Maybe creating a null shortcode like
function no_wp_autoformat_shortcode() {
return null;
}
add_shortcode('nowpautop', 'no_wp_autoformat_shortcode');
This works, but now it hides the wrapped <img ...
So the question changes to: how do you make an image show when it is wrapped by a shortcode?
<h3 style="display:inline;">Graphic Advantage</h3>[nowpautop]<img style="max-width: 50px; width: 100%;" src="https://sgdesign.com/images/SGDadvantage.png" alt="SGD Advantage" />[/nowpautop]
There is a way to stop the WordPress editor from automatically adding <p> elements: remove_filter( 'the_content', 'wpautop' );. The downside is that this disables the automatic <p> elements everywhere in WordPress, which is not good when it comes to paragraphing your text. (It also gets problematic when people cut and paste text from Word documents...)
Yes, using a shortcode seems good to me, since it avoids overriding WordPress core functionality.
Solved by creating a Shortcode that uses the linked file src as an Attribute. Details here:
How to create a Wordpress Shortcode that does nothing?
The primary problem was that Shortcodes REPLACE content, so a shortcode could not simply "wrap" an element to create the desired effect of disabling the auto format wpautop() function. A handler that returns null therefore removes whatever it wraps.

Dead link in an absolutely positioned div

I'm trying desperately to make the link on a linked image work but whenever I click on the link, my browsers (Chrome & Firefox) just tell me they are connecting to the page and I get a continuous spinning image as if the browser were doing something. The link works perfectly if I open in another tab.
The div in question is coded and styled in a Wordpress page. I am making changes to another developer's work for a customer.
Here's the div:
<div style="position: absolute; top: 645px; left: 80px; width: 150px;">
<a href="http://themoneycouple.com/?p=4681" target="_top" style="display:inline-block">
<img src="http://themoneycouple.com/wp-content/uploads/2013/10/kit-history.png" alt="Toolkit History Link" width="150" height="170" style="position:relative; display:block; z-index:999999999999;" />
</a>
</div>
And here's the page it's on: http://themoneycouple.com/resources/love-and-money-kit/
I've tried changing the z-index on every element in multiple ways, tried changing the link target in every way possible, changed display settings, etc.
One last note, there are lots of iframes on this page and I'm not sure if those are messing things up somehow. Massive thanks for any help.
It looks like the URL for that HREF is trying to load a page as a modal using the fancybox protocol. Is this what is expected?
Does that page ID exist? The URL looks different from the other page URLs on the site.
There is a jQuery function that targets every A tag wrapping an IMG tag and applies a class of "fancybox". Modify the specificity of this function to exclude the last item in the gallery. Better yet, create a new class and add it to the elements that should have the fancybox functionality. The same issue happens on other pages where the HREF goes to a page instead of opening an image.
Current jQuery function:
var thumbnails = jQuery("a:has(img)").not(".nolightbox").filter( function() { return /\.(jpe?g|png|gif|bmp)$/i.test(jQuery(this).attr('href')) });
thumbnails.addClass("fancybox").attr("rel","fancybox").getTitle();
I think I found the link you mean. It has a fancybox class on it, so I presume the fancybox JavaScript plugin is trying to open your link in a popup and failing because the link is a redirect. That would explain why right-clicking the link and opening it in a new tab works.

how to change content in iframe in asp.net [VB]

I have a problem.
On my website I have an iframe:
<iframe src="www.blog.com" style="height: 340px; width: 491px"></iframe>
www.blog.com is an outside page that contains user posts. I need to change some words (like swearing) into "#$%#".
Is that possible, and how? Thank you.
Or, alternatively, how can I remove a div or an a element the way AdBlock does? AdBlock removes:
<A class="slick_grey bolded subscribeToNewspaper" href="javascript:;" >
and also
<A id="2291109" class="vote_1" href="javascript:;">
Thanks
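You could use a div instead of an iframe and load the data through Ajax. That way you could manipulate the content before you show it to the user. Note that the browser's same-origin policy will normally block a direct Ajax request to a page on another domain, so the outside page would have to be fetched through your own server.
Something like:
$.get('ajax/test.html', function(data) {
// The /g flag replaces every occurrence, not just the first one.
data = data.replace(/Luck/g, "#$%#");
$('.result').html(data);
});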
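The replacement step itself does not depend on jQuery. Below is a minimal Python sketch of the same idea, for example as it might run on the server that fetches the outside page. The function name censor, the banned word list and the sample sentence are assumptions made for illustration; the question uses ASP.NET, so this only shows the logic, not a drop-in solution.

```python
import re


def censor(text, banned_words, mask="#$%#"):
    """Replace every whole-word, case-insensitive match of a banned word."""
    if not banned_words:
        return text
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(word) for word in banned_words) + r")\b",
        re.IGNORECASE,
    )
    # A function is used as the replacement so the mask is inserted literally.
    return pattern.sub(lambda _match: mask, text)
```

For example, censor("Good Luck and luck again, lucky", ["luck"]) masks the two standalone words and leaves "lucky" untouched, because the pattern only matches whole words.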

How to switch to dynamically named iframe with Behat/Mink

I am writing a test for an iframe that is generated when a button is clicked. The iframe name and id are similar but generated dynamically. I have used "switchToIFrame" successfully before.
<div class="bbbb-frame-container" style="height: 400px; width: 665px; margin-top: -200px; margin-left: -332.5px;">
<div class="user-support-frame-close-container" style="display: block;">
<div class="user-support-frame-close">Close</div>
</div>
<iframe id="poplock_default9636_priv" frameborder="0" name="poplock_default9636_priv" src="blah.blaag.com">
<!DOCTYPE html>
.....................
The poplock_defaultNNNN_priv iframe is the dynamically generated one that I would like to switch to.
First, find an iframe with one of the find methods. If it's the only iframe on the page (or the first one) use the find() method. If it's not, you'll have to use findAll() and rely on the order (since there's no distinct attribute you could search for).
Once you find your iframe, you can use getAttribute to get its name and then use switchToIframe to... well, switch to the iframe.
Thank you Jakub Zalas, you inspired me to test the Stripe popup code, which is embedded in an iframe.
To share my experience of testing a Stripe payment popup:
$page = $this->getSession()->getPage();
$this->getSession()->getDriver()->switchToIFrame('stripe_checkout_app');
$page = $this->getSession()->getPage();
/** @var NodeElement $stripeInputField */
$stripeInputField = $page->findField($field); // where $field can be: 'Email, CVC, Card Number, MM / YY'
$stripeInputField->setValue($value);
// Switch Back to Main Window
$this->getSession()->getDriver()->switchToIFrame(null);
For anyone it may help: to look up the dynamically named iframe, I use a CSS selector on a known prefix:
// Switch to the payment iframe.
$iframe = $this->getSession()->getPage()->find('css', 'iframe[name^="__privateStripeFrame"]');
$iframe_name = $iframe->getAttribute('name');
$this->getSession()->switchToIFrame($iframe_name);
The CSS selector ^= refers to "starts with ..." - instead, you can use *= for "contains ..." if it is not a prefix.
This example is to get the Stripe 3D Secure popup.
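To make the difference between the two selector operators concrete, here is a small Python sketch, independent of Behat and Mink, that picks an iframe name by prefix (like ^=) or by substring (like *=). The names IframeNameCollector and find_iframe_name and the sample markup are made up for this illustration; only the poplock_default9636_priv frame comes from the question.

```python
from html.parser import HTMLParser


class IframeNameCollector(HTMLParser):
    """Collect the name attribute of every <iframe> in document order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            name = dict(attrs).get("name")
            if name:
                self.names.append(name)


def find_iframe_name(html, prefix="", contains=""):
    """Return the first iframe name that starts with prefix and contains the substring."""
    collector = IframeNameCollector()
    collector.feed(html)
    for name in collector.names:
        if name.startswith(prefix) and contains in name:
            return name
    return None


# Hypothetical page fragment with one unrelated frame and the dynamic one.
SAMPLE_PAGE = (
    '<iframe name="ads_frame" src="ads.html"></iframe>'
    '<iframe id="poplock_default9636_priv" frameborder="0" '
    'name="poplock_default9636_priv" src="blah.blaag.com"></iframe>'
)
```

Here find_iframe_name(SAMPLE_PAGE, prefix="poplock_default") mirrors iframe[name^="poplock_default"], and find_iframe_name(SAMPLE_PAGE, contains="_priv") mirrors iframe[name*="_priv"]; both return the dynamically generated name.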
If you need to simply manipulate the iFrame, see this question. The answers, for example, present Javascript to get the first iFrame so you can manipulate it:
window.frames[0].document.body.<your action here>
They also give jQuery examples for arbitrary iFrames:
<iframe id="my_iframe" ...></iframe>
$('#my_iframe').contents().find('html').html();
While those aren't exactly what you've asked for, you could alter them to find the n'th iFrame, if you know the number of iFrames generated and which you'd like to manipulate, and then manipulate it as need be.
