Getting first image url using scrapy - web-scraping

I'm trying to get the first URL, "https://example.com/example.jpg", using the XPath @src, but the result is always /Content/images/defaultThumb.jpg
<img class="pimg" src="https://example.com/example.jpg" onerror="this.src = '/Content/images/defaultThumb.jpg'" title="test" alt="test" style="">

Your XPath was wrong. Try this:
scrapy shell "https://www.trendyol.com/versace?qt=versace"
response.xpath('//img[@class="primg lazy"]/@data-original').extract()
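To illustrate why reading a different attribute can help, here is a minimal sketch that uses only Python's standard-library html.parser rather than Scrapy. The sample markup is an assumption modeled on the question and on the "primg lazy" class from the answer; the real page may differ. It collects the attributes of each img tag so that src and data-original can be compared.

```python
from html.parser import HTMLParser

# Hypothetical markup: a lazy-loaded image keeps a placeholder in src
# and the real URL in data-original (an assumption about the page).
SAMPLE_HTML = """
<img class="primg lazy" src="/Content/images/defaultThumb.jpg"
     data-original="https://example.com/example.jpg" alt="test">
<img class="primg lazy" src="/Content/images/defaultThumb.jpg"
     data-original="https://example.com/second.jpg" alt="test2">
"""


class ImgCollector(HTMLParser):
    """Collect the attributes of every <img> tag as a dict."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs))


collector = ImgCollector()
collector.feed(SAMPLE_HTML)

# Prefer data-original when present, otherwise fall back to src.
urls = [img.get("data-original") or img.get("src") for img in collector.images]
first_url = urls[0] if urls else None
print(first_url)
```

In Scrapy itself, calling .get() (or the older .extract_first()) on the selector instead of .extract() returns only the first match.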


ASP.NET ResolveUrl returns only root

I am wondering why the ResolveUrl() function removes the href attribute value, so that whatever I pass as a URL string ultimately leads to http://localhost:PORT. For example:
SomeText
resolves to
<a href>SomeText</a>
[EDIT] A workaround that helped me, but didn't solve the actual problem:
I didn't put the URL in the ResolveUrl function. I added runat="server" instead, like so:
SomeText
Use single-quotes like this:
<a href='<%= ResolveUrl(@"~/Home.aspx?param=1") %>'>SomeText</a>
Please mark as correct answer if this helped you :)

How to get verification code using css or xpath in Selenium Webdriver

<body>
<div class="row-fluid">
<div class="span12">
<div class="mailview" style="margin-right:18px;">
<p>Dear MohanNimmala First,</p>
<p>Thank you for registering with MediAngels!</p>
<p>
<p>
Verification Code:
<b> 95527</b>
Can someone help me get the verification code, 95527, from the HTML above
using XPath or CSS?
I am using the following XPath: html/body/div[1]/div/div/p[4]/b
Based only on the HTML snippet posted in the question, you may want to try the following XPath:
//p[contains(text(), 'Verification Code')]/b
I think you are setting yourself up for problems later if you nest the 'verification code' so deep in the HTML. When someone changes the layout/html, the test is very likely to fail.
I suggest giving the b tag an ID and to use that ID in your selector. Your test will be more resilient.
Add the HTML attribute to the b-tag:
Verification Code:
<b id="verification-code"> 95527</b>
....
And the XPath:
//*[@id='verification-code']
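As a rough check of the ID-based locator, here is a small sketch using Python's standard-library ElementTree rather than Selenium. The markup is a well-formed stand-in for the email body, and the id attribute is the suggested addition, not something present on the original page. ElementTree supports only a subset of XPath, but attribute predicates like the one above are included.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the email body; the id attribute is the
# addition suggested above, not part of the original page.
SAMPLE = """
<div class="mailview">
  <p>Dear MohanNimmala First,</p>
  <p>Thank you for registering with MediAngels!</p>
  <p>
    Verification Code:
    <b id="verification-code"> 95527</b>
  </p>
</div>
"""

root = ET.fromstring(SAMPLE)

# ElementTree understands only a subset of XPath, but attribute
# predicates such as [@id='...'] are supported.
node = root.find(".//*[@id='verification-code']")

# The text has a leading space in the markup, so strip it.
code = node.text.strip() if node is not None else None
print(code)
```

Because the locator depends only on the id, it keeps working if the surrounding paragraphs are reordered or re-nested.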
The issue with your XPath is probably that you're missing a slash at the beginning. One slash starts from the root; two slashes start anywhere.
/html/body/div/div/div/p[4]/b
Thanks Deef & Har for your answers. The code below works to extract the value from the inner HTML tags:
driver.switchTo().frame("rendermail");
WebElement OTP=driver.findElement(By.tagName("b"));
System.out.println(OTP.getText());

Image Hyperlink in ASP.NET - MVC 4

I am creating a project for the first time in ASP.NET (MVC 4).
What I am trying to do is create an image that is a hyperlink to the index page.
I have searched a lot, and it looks very simple to do,
but I can't understand why it doesn't work for me.
Can someone give me a hand?
Code:
<a href="<%= Url.Action("Index","Home")%><img src="~/Content/imagens/nav-arrow-back.png"/></a>
The action is "Index" in the controller called Home.
You are missing a quote:
<a href="<%=Url.Action("Index","Home")%>"> ...
                                        ^
This is the quote you missed.
For the bad request, fix the whole <img> part:
<img src="<%=Url.Content("~/Content/imagens/nav-arrow-back.png")%>"/>
First up, as previously noted, you're missing a closing quote on that href. Second, MVC 4 doesn't use the <% %> syntax, at least not by default; it should be using Razor v2, which uses @, so your code should look like this:
<a href="@Url.Action("Index","Home")"><img src="~/Content/imagens/nav-arrow-back.png"/></a>
If you use the old syntax I assume it would try to handle the actual text <%= Url.Action("Index","Home")%> as a URL, which clearly won't work.

ASP.NET site move to IIS7 results in gibberish characters in page output

I have an ASP.NET site that was working fine running on Windows Server 2003 / IIS6.
I moved it to Windows Server 2008 / IIS7 and the aspx page output now includes gibberish text.
For example:
p����
�����
The majority of the page renders properly, but there is gibberish here and there.
I have checked the event logs and there is nothing.
Any idea what's going on here?
How can I fix this?
I have noticed that this issue shows up when I include multiple Server.Execute statements in the aspx code:
<% Server.Execute("/inc/top.inc"); %>
<% Server.Execute("/inc/footer.inc"); %>
The .inc files above contain just html. It appears that the files have to be of a significant length to cause the error. Here is the sample html I've been testing with:
<div class="logo">
<a href="/">
<img src="/logo.png" alt="logo" width="31" height="29" class="logoimg" />
</a>
</div>
<div class="logo">
<a href="/">
<img src="/logo.png" alt="logo" width="31" height="29" class="logoimg" />
</a>
</div>
<div class="logo">
<a href="/">
<img src="/logo.png" alt="logo" width="31" height="29" class="logoimg" />
</a>
</div>
<div class="logo">
<a href="/">
<img src="/logo.png" alt="logo" width="31" height="29" class="logoimg" />
</a>
</div>
<div class="logo">
<a href="/">
<img src="/logo.png" alt="logo" width="31" height="29" class="logoimg" />
</a>
</div>
<div class="logo">
<a href="/">
<img src="/logo.png" alt="logo" width="31" height="29" class="logoimg" />
</a>
</div>
Also, the gibberish characters appear inconsistently. If I ctrl+F5 the pages, the gibberish characters change and occasionally don't appear at all.
I would bet the problem is that what you're seeing is the regular error page, gzip-compressed. However, the gzip compression HTTP header got lost when the server was redirected to the error page, so the browser doesn't know to uncompress it. Do you have some custom module that is doing compression? Are you setting the Response.Filter?
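To make the gzip hypothesis concrete, the following sketch (plain Python, not ASP.NET) shows what compressed bytes look like when they are read as text without being decompressed first. The page content is made up for illustration; only the general effect matters.

```python
import gzip

# Made-up page body standing in for the server's error page.
page = "<html><body><p>Server error</p></body></html>" * 20

# What a compression module would send over the wire.
compressed = gzip.compress(page.encode("utf-8"))

# Without a Content-Encoding: gzip header the client treats the bytes
# as text; undecodable bytes show up as replacement characters.
as_seen_by_browser = compressed.decode("utf-8", errors="replace")
print(as_seen_by_browser[:40])

# With the header present, the client decompresses first and the
# original markup comes back intact.
restored = gzip.decompress(compressed).decode("utf-8")
```

A gzip stream always starts with the bytes 0x1f 0x8b, so finding those two bytes at the start of a gibberish run in a hex dump of the response would support this explanation.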
http://forums.asp.net/p/329153/330330.aspx contains a discussion of a similar issue, I wonder if it's the same problem you're seeing. Here's an excerpt from rox.scott's answer:
If you are transferring execution of the page after Response.Type, etc., is set, then the resulting Response will have the Response.Type and encoding set by the initial page, which might not be compatible with characters on the second page.
Solution: make sure you are correctly specifying the Response type and encoding on BOTH pages.
Want to try this and see if it works?
If that doesn't work, http://msdn.microsoft.com/en-us/library/39d1w2xf.aspx has an interesting discussion of various configuration options you can try to force consistent encodings throughout your site. You may want to try some of those. Also, that MSDN article does not use the ContentType directive but instead recommends this:
<%@ Page RequestEncoding="utf-8" ResponseEncoding="utf-8" %>
Not sure if that will generate equivalent results as adjusting ContentType, but it's easy enough to try.
You can also check the globalization element in web.config.
It must be in the system.web section:
<globalization
requestEncoding="utf-8"
responseEncoding="utf-8"
fileEncoding="utf-8"
responseHeaderEncoding="utf-8"
/>
Pop in to Firefox and try manually switching the page encoding (maybe to Windows-1252 first), see if the gibberish clears up. If it's suddenly readable, then at least you will know it's an encoding problem.
I'd also suggest looking at the output in a hex editor to see what you're actually getting. That could also give you a clue where to look. Might also try turning off gzip encoding or page compression.
This has nothing to do with encoding. This is IIS not knowing how to handle error messages when it is used in combination with a virtual directory. My guess is that if you run this site in local debug on a machine, the error messages show up fine. Try that; if so, you can start to infer where the actual error is.
We could never get this resolved.
The only solution that worked was to eliminate use of Server.Execute().
Try setting charset parameter.
<%@ Page ContentType="text/html; charset=utf-8"%>
This is a bit of a stab, but try setting your pipeline mode to "classic" in IIS.
You are probably sending a different encoding than the encoding of the .inc files.
Check the encodings of your .aspx and .inc files, and check the charset parameter of the content-type header being sent to the browser.
EDIT: Since the server is sending UTF-8, you should convert your .inc files to UTF-8.
To do that, open the file in Visual Studio, click File, Advanced Save Options, and select Unicode (UTF-8 without signature) - Codepage 65001. (Near the bottom of the list)
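If there are many include files, the same conversion can be scripted. This is a minimal Python sketch, not part of the Visual Studio workflow; the source encoding cp1252 and the file name top.inc are assumptions, so check what your files actually use before running anything like it. Python's "utf-8" codec writes no byte order mark, which matches the "without signature" option.

```python
import os
import tempfile


def reencode(path, source_encoding="cp1252", target_encoding="utf-8"):
    """Rewrite a text file in place using a different encoding."""
    # newline="" keeps the file's existing line endings untouched.
    with open(path, "r", encoding=source_encoding, newline="") as handle:
        text = handle.read()
    with open(path, "w", encoding=target_encoding, newline="") as handle:
        handle.write(text)


# Demonstration on a throwaway file standing in for an .inc include.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "top.inc")
    with open(sample, "wb") as handle:
        handle.write("<p>caf\u00e9</p>".encode("cp1252"))

    reencode(sample)

    with open(sample, "rb") as handle:
        converted = handle.read()

print(converted)
```

Back up the originals first: decoding with the wrong source encoding will silently produce the wrong characters.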
In the Windows Event Viewer you most probably will be able to find the real cause of the error. That way you can fix it.
Unfortunately, this requires RDP or physical access to the server, and it only fixes the error itself, not the reason the error message fails to appear.

ASP.NET Img tag

I have a problem with how ASP.NET generates the img tag.
I have a server control like this:
<asp:Image runat="server" ID="someWarning" ImageUrl="~/images/warning.gif" AlternateText="Warning" />
I expect it to generate this:
<img id="ctl00_ContentPlaceHolder1_ctl00_someWarning" src="../images/warning.gif" />
but instead it generates this:
<img alt="" src="/Image.ashx;img=%2fimages%2fwarning.gif"</img>
This gives me errors when I execute the following JS:
document.getElementById('ctl00_ContentPlaceHolder1_someWarning')
Any idea why it won't generate the expected html?
Looks like it's trying to use a custom handler (ashx) to deliver the image. Do you have any additional modules that may be overriding the default behaviour of the asp:Image?
Your JavaScript won't work because the image tag has not been given an ID in the HTML that was generated.
You can get the actual ID that is generated by using ClientID. I use this to get the ID of a control for use in JavaScript using syntax similar to the following:
document.getElementById('<%=ddlCountry.ClientID%>').style.display = "block";
However you can also use it in your code-behind to get the same thing.
