parsing html and following a javascript link - r

An academic colleague has asked me to extract information from a website. I need to link the content of a table on a webpage (not too hard) with the contents of a text file which is only reachable (as far as I can tell) by clicking on a JavaScript link, e.g.
<a id="tk1" href="javascript:__doPostBack('tk1$ContentPlaceHolder1$grid$tk$OpenFileButton','')">
The content is conveniently inside a table with id='tk1', which is nice, but how do I follow the link which pulls the text file?
Ideally I'd like to do this in R... I can grab the relevant table in text format by saying
u <- the url of interest...
library(XML)
tables = readHTMLTable(u)
interestingTable <- tables[grep('tk1', names(tables))]
This gives me the text in the table, but how do I grab the HTML for that particular table? And how do I "click" on the button and get the text file behind it?
I note that there is a form with massive hidden values; the site appears to be ASP.NET driven and uses impenetrable URLs.
Many thanks!

This is somewhat tricky and not fully integrated in R, but some system() fiddling will get you started.
Download and install PhantomJS: http://code.google.com/p/phantomjs/
Check the short script on http://menne-biomed.de/uni/JavaButton.html, which emulates your case. When you click the JavaScript anchor, it redirects to http://cran.at.r-project.org/ via doPostBack(inaccesibleJavascriptVar).
Save the following script locally as javabutton.js
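var page = new WebPage();
page.open('http://www.menne-biomed.de/uni/JavaButton.html', function (status) {
    if (status !== 'success') {
        console.log('Unable to access network');
    } else {
        // Runs inside the page, so page-level variables are reachable.
        var ua = page.evaluate(function () {
            var t = document.getElementById('tk1').href;
            // Capture whatever sits between the outer parentheses of the href.
            // A regex literal is used because '\(' inside a string loses its backslash.
            var re = /\((.*)\)/;
            // Evaluate the captured expression to obtain the target URL.
            return eval(re.exec(t)[1]);
        });
        console.log(ua); // Outputs http://cran.at.r-project.org/
    }
    phantom.exit();
});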
With phantomjs on path, call
phantomjs javabutton.js
The link will be displayed on the console. Use any method to get it into RCurl.
Not elegant, but maybe someone will wrap PhantomJS into R one day. In case the link to JavaButton.html gets lost, here it is as code.
<!DOCTYPE html>
<html>
<head>
<script>
inaccesibleJavascriptVar = 'http://' + 'cran.at.r-project.org/';
function doPostBack(myref)
{
window.location.href= myref;
return false;
}
</script>
</head>
<body>
<a id="tk1" href="javascript:doPostBack(inaccesibleJavascriptVar)" >Click here</a>
</body>
</html>
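For the ASP.NET link in the original question, the same regular-expression idea can pull the two arguments out of the __doPostBack call. On ASP.NET WebForms pages these are normally posted back as the hidden form fields __EVENTTARGET and __EVENTARGUMENT, so knowing them is a first step toward replaying the click as a form POST. This is only a minimal sketch, assuming the href always carries two single-quoted arguments; parseDoPostBack is a hypothetical helper, not part of any library.

```javascript
// Hypothetical helper: split "javascript:__doPostBack('target','argument')"
// into its two arguments. Returns null when the href has another shape.
function parseDoPostBack(href) {
  var match = /__doPostBack\('([^']*)'\s*,\s*'([^']*)'\)/.exec(href);
  if (!match) {
    return null;
  }
  return { eventTarget: match[1], eventArgument: match[2] };
}

// The href from the question.
var href = "javascript:__doPostBack('tk1$ContentPlaceHolder1$grid$tk$OpenFileButton','')";
var parsed = parseDoPostBack(href);
console.log(parsed.eventTarget);   // tk1$ContentPlaceHolder1$grid$tk$OpenFileButton
console.log(parsed.eventArgument === ''); // true
```

The other hidden fields of the form (such as __VIEWSTATE) would typically still have to be sent along with these two values.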

Have a look at the RCurl package:
http://www.omegahat.org/RCurl/

Related

Best way to retrieve image from server using ajax [duplicate]

Is it possible to reload an image with an identical file name from a server using jQuery?
For example, I have an image on a page, however, the physical image can change based on user actions. Note, this does not mean the file name changes, but the actual file itself.
i.e.:
User views image on default page
User uploads new image
Default image on page does not change (I assume this is because the file name is identical, so the browser uses the cached version)
Regardless of how often the code below is called, the same issue persists.
$("#myimg").attr("src", "/myimg.jpg");
In the jQuery documentation, the "load" function would be perfect if it had a default method of firing the event as opposed to binding a callback function to a successful/complete load of an element.
Any assistance is greatly appreciated.
It sounds like it's your browser caching the image (which I now notice you wrote in your question). You can force the browser to reload the image by passing an extra variable like so:
var d = new Date();
$("#myimg").attr("src", "/myimg.jpg?"+d.getTime());
It's probably not the best way, but I've solved this problem in the past by simply appending a timestamp to the image URL using JavaScript:
$("#myimg").attr("src", "/myimg.jpg?timestamp=" + new Date().getTime());
Next time it loads, the timestamp is set to the current time and the URL is different, so the browser does a GET for the image instead of using the cached version.
This could be one of the two problems you mention yourself.
The server is caching the image
The jQuery does not fire or at least doesn't update the attribute
To be honest, I think it's number two. It would be a lot easier if we could see some more of your jQuery. But for a start, try removing the attribute first and then setting it again, just to see if that helps:
$("#myimg").removeAttr("src").attr("src", "/myimg.jpg");
Even if this works, post some code since this is not optimal, imo :-)
Here it is in one line, with no worries about hardcoding the image src into the JavaScript (thanks to jeerose for the ideas):
$("#myimg").attr("src", $("#myimg").attr("src")+"?timestamp=" + new Date().getTime());
To bypass caching and avoid adding infinite timestamps to the image URL, strip the previous timestamp before adding a new one. This is how I've done it:
// refresh the image every 60 seconds
var xyro_refresh_timer = setInterval(xyro_refresh_function, 60000);
function xyro_refresh_function() {
    // refreshes an image with a .xyro_refresh class regardless of caching
    // get the src attribute
    var source = jQuery(".xyro_refresh").attr("src");
    // remove previously added timestamps:
    // turns "image.jpg?timestamp=1234" into "image.jpg", so timestamps do not pile up
    source = source.split("?", 1)[0];
    // prep new src attribute by adding a timestamp
    var new_source = source + "?timestamp=" + new Date().getTime();
    // alert(new_source); // you may want to alert that during development to check the result
    // set the new src attribute
    jQuery(".xyro_refresh").attr("src", new_source);
}
This works great! However, if you reload the src multiple times, the timestamps get concatenated to the URL as well. I've modified the accepted answer to deal with that.
$('#image_reload_button').on('click', function () {
var img = $('#your_image_selector');
var src = img.attr('src');
var i = src.indexOf('?dummy=');
src = i != -1 ? src.substring(0, i) : src;
var d = new Date();
img.attr('src', src + '?dummy=' + d.getTime());
});
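The snippets above assume the image URL has no query string of its own. A slightly more general sketch, assuming the same "dummy" parameter name, replaces an old stamp wherever it sits in the query and picks "?" or "&" as needed. withCacheBuster is a hypothetical helper name, and the stamp is passed in so the function stays easy to check.

```javascript
// Hypothetical helper: return src with a fresh "dummy" cache-busting parameter.
function withCacheBuster(src, stamp) {
  // Drop a previous dummy=<digits> parameter, keeping the rest of the query intact.
  var cleaned = src.replace(/([?&])dummy=\d+(&|$)/, function (m, lead, trail) {
    return trail ? lead : '';
  });
  // Use "?" when there is no query string yet, "&" otherwise.
  var separator = cleaned.indexOf('?') === -1 ? '?' : '&';
  return cleaned + separator + 'dummy=' + stamp;
}

console.log(withCacheBuster('/myimg.jpg', 1));              // /myimg.jpg?dummy=1
console.log(withCacheBuster('/myimg.jpg?dummy=1', 2));      // /myimg.jpg?dummy=2
console.log(withCacheBuster('/img?id=1023&dummy=5', 6));    // /img?id=1023&dummy=6

// In the page this might be used as:
// img.attr('src', withCacheBuster(img.attr('src'), new Date().getTime()));
```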
Have you tried resetting the image container's HTML? Of course, if it's the browser that is caching, then this wouldn't help.
function imageUploadComplete () {
$("#image_container").html("<img src='" + newImageUrl + "'>");
}
Sometimes even a solution like this does not refresh the src properly:
$("#Image").attr("src", $('#srcVal').val()+"&"+Math.floor(Math.random()*1000));
Try this instead; it worked for me:
$("#Image").attr("src", "dummy.jpg");
$("#Image").attr("src", $('#srcVal').val()+"&"+Math.floor(Math.random()*1000));
Using "#" as a delimiter might be useful.
My images are kept in a "hidden" folder above "www" so that only logged-in users are allowed access to them. For this reason I cannot use the ordinary <img src=/somefolder/1023.jpg>; instead I send requests to the server like <img src=?1023>, and it responds by sending back the image kept under the name '1023'.
The application is used for image cropping, so after an Ajax request to crop the image, its content changes on the server but it keeps its original name. To see the result of the cropping, after the Ajax request has completed, the first image is removed from the DOM and a new image is inserted with the same name <img src=?1023>.
To avoid caching, I append a timestamp prefixed with "#" to the request, so it becomes <img src=?1023#1467294764124>. The hash part never reaches the server (browsers do not send the URL fragment), so it responds correctly by sending back my image kept as '1023'. Thus I always get the latest version of the image without much server-side decoding.
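Why no server-side decoding is needed can be checked with the standard URL parser: the fragment is a separate component from the path and query, and browsers leave it out of the HTTP request. A small sketch, where example.com is only a placeholder host and the "?1023" query and timestamp are taken from the answer above:

```javascript
// example.com is a placeholder; only the query and fragment matter here.
var url = new URL('http://example.com/?1023#1467294764124');

// What ends up in the HTTP request line: path plus query string.
var requestTarget = url.pathname + url.search;

console.log(requestTarget); // /?1023
console.log(url.hash);      // #1467294764124 (stays on the client)
```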
Based on @kasper Taeymans' answer.
If you simply need to reload the image (not replace its src with something new), try:
$(function() {
var img = $('#img');
var refreshImg = function(img) {
// the core of answer is 2 lines below
var dummy = '?dummy=';
img.attr('src', img.attr('src').split(dummy)[0] + dummy + (new Date()).getTime());
// remove call on production
updateImgVisualizer();
};
// for display current img url in input
// for sandbox only!
var updateImgVisualizer = function() {
$('#img-url').val(img.attr('src'));
};
// bind img reload on btn click
$('.img-reloader').click(function() {
refreshImg(img);
});
// remove call on production
updateImgVisualizer();
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<img id="img" src="http://dummyimage.com/628x150/">
<p>
<label>
Current url of img:
<input id="img-url" type="text" readonly style="width:500px">
</label>
</p>
<p>
<button class="img-reloader">Refresh</button>
</p>
I may have to reload the image source several times. I found a solution with Lodash that works well for me:
$("#myimg").attr('src', _.split($("#myimg").attr('src'), '?', 1)[0] + '?t=' + _.now());
An existing timestamp will be truncated and replaced with a new one.
If you need a refresh of the exact URL and your browser has the image cached, you can use AJAX and a request header to force your browser to download a new copy (even if it isn't stale yet). Here's how you'd do that:
var img = $("#myimg");
var url = img.attr("src");
$.ajax({
url: url,
headers: { "Cache-Control": "no-cache" }
}).done(function(){
// Refresh is complete, assign the image again
img.attr("src", url);
});
Nothing else worked for me because while appending a token to the query string would download the new image, it didn't invalidate the image in the cache at the old URL so future requests would continue to show the old image. The old URL is the only one sent to the browser, and the server was directing the client to cache the image for longer than it should.
If this still doesn't refresh the image for you, see if this answer helps. For more information, here is documentation on the Cache-Control request header.
In the html:
foreach (var item in images) {
<img src="@Url.Content(item.ImageUrl+"?"+DateTime.Now)" >
}
I simply do this in html:
<script>
$(document).load(function () {
d = new Date();
$('#<%= imgpreview.ClientID %>').attr('src','');
});
</script>
And reload the image in code behind like this:
protected void Page_Load(object sender, EventArgs e)
{
if (!IsPostBack)
{
image.Src = "/image.jpg"; // URL coming from the database
}
}

hide iframe url in HTML source code

How can I hide the iframe URL from the HTML source code?
<iframe src="http://mysite.com" frameborder="0" scrolling="no" width="728" height="90"></iframe>
You can use JavaScript to load the source, so that the iframe URL is not visible in the page source code.
For example with jQuery:
<script type="text/javascript">
$(document).ready(function(e) {
$('iframe').attr('src','http://www.flickr.com/');
});
</script>
<body>
<iframe src=""></iframe>
</body>
You can combine it with $.post to get the value serverside:
$.post('get-iframe-src.php', function(data) {
$('iframe').attr('src',data);
});
You can even load iframe itself to some element like:
$.post('get-iframe.php', function(data) {
$('#element_id').html(data);
});
There are many possible solutions; this is just one of them.
You can't. If the URL isn't in the HTML, how would the browser know where to get it?
One thing you could try is to obscure it to make it slightly harder for someone to find it. You could have the src attribute be blank and then when the document is ready fetch the URL value from the server in a separate AJAX request and update the iframe tag to include that value in the src.
This would be a fair amount of work, however, and wouldn't really accomplish anything. The only thing it would prevent is somebody finding it by viewing the page source. They can still look at the "current version" of the HTML in any web browser's debugging tools. (Right click on an element and inspect it, which is nearly ubiquitous at this point.) Or any other normal traffic-sniffing tools will see it plain as day.
Ultimately, if the web browser needs to know a piece of information, then that information needs to be visible on the client-side.
There's no way to fully block source viewing. But there are a couple ways to disable right-clicking:
1) Javascript:
<script language="JavaScript">
<!--
var message="Your message goes here.";
function click(e) {
if (document.all) {
if (event.button == 2) {
alert(message);
return false;
}
}
if (document.layers) {
if (e.which == 3) {
alert(message);
return false;
}
}
}
if (document.layers) {
document.captureEvents(Event.MOUSEDOWN);
}
document.onmousedown=click;
// -->
</script>
2) Add the following to your <body> tag: oncontextmenu="return false"
reference https://forum.powweb.com/archive/index.php/t-36161.html
I decided on a solution that does not use JavaScript, because with JavaScript it is usually still possible to read the "hidden" content.
Changing the iframe src with JavaScript keeps the URL hidden when checking the page source, but inspecting the element will show the real URL.
My code is in PHP; however, I believe that the logic can be translated to other programming languages. This is how it works:
I kept the iframe tag as usual:
<iframe src="dash_url.php"></iframe>
The trick is inside dash_url.php, where I validate the referer. If it is valid, the page is redirected to the iframe URL. If it is not, then the page shows a message instead.
<?php
$iframe_url = "https://example.com";
// "@" suppresses the notice when no Referer header was sent
$Referer = @$_SERVER["HTTP_REFERER"];
$RefererHost = @explode(":", explode("/", explode("//", $Referer)[1])[0])[0];
if ($RefererHost == $_SERVER["SERVER_NAME"]) {
    header("Location: " . $iframe_url);
    exit;
} else {
    echo "Invalid URL";
}
?>
If a visitor inspects the page or checks the source, the iframe tag will show its src as dash_url.php.
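The host comparison in the PHP above can be written more directly with a URL parser. The sketch below is in JavaScript rather than PHP and only illustrates the check itself; isSameHostReferer is a hypothetical helper, and mysite.com is used as a stand-in host name. Keep in mind that the Referer header is supplied by the client and can be missing or forged, so this hides the URL from casual inspection rather than protecting it.

```javascript
// Hypothetical helper: true when the Referer header points at the given host name.
function isSameHostReferer(referer, serverName) {
  if (!referer) {
    return false; // header missing or empty
  }
  try {
    // hostname excludes the scheme, port and path
    return new URL(referer).hostname === serverName;
  } catch (e) {
    return false; // not a parseable absolute URL
  }
}

console.log(isSameHostReferer('https://mysite.com:8080/page.html', 'mysite.com')); // true
console.log(isSameHostReferer('https://other.example/page.html', 'mysite.com'));   // false
console.log(isSameHostReferer('', 'mysite.com'));                                  // false
```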

jQuery - Display URL of iFrame clicked

I have a MySQL database with multiple URLs.
Each URL is displayed in one iframe, so 3 URLs = 3 iframes. I display them like this.
...
$resu = mysql_fetch_array ($consultation);
echo "<iframe src='".$resu['URL']."' onload='load(this)' id='iframe'></iframe>";
...
With this script I can find out the URL of the clicked iframe (not exactly; besides, it doesn't work in jsFiddle, but it does on my PC: http://jsfiddle.net/7UxHv/).
<script type='text/javascript'>
$(window).load(function(){
$('iframe#iframe').load(function(){
alert($('#iframe').attr('src'));
});
});
</script>
But it actually displays the first database row and not the one clicked.
Is there any way to display the one clicked, with that script or in some other way? Thanks.
edit:
Note that you are generating multiple elements with the same id. An id should be unique.
This might be why you get only the first row to work.
I think the below code should work:
Html:
echo "<iframe src='".$resu['URL']."' onload='loadIframe(this)' scrolling='no'></iframe>";
JS:
window.loadIframe = function(frame) {
alert("Loaded "+$(frame).attr('src'));
}
Try this... change 'src' to 'data-src'... and try the following...
echo "<iframe data-src='".$resu['URL']."' onload='load(this)' id='iframe'></iframe>";
$('#iframe').click(function() {
alert($('#iframe').data('src'));
});
also instead of using onload try using onclick()...
Might work...

Image load timeout in Internet Explorer

I have a page for an internal app that displays document images streamed from a document storage system using a web service. The problem I am having is that when a user does their search they may get hundreds of hits, which I have to display on one large page so they can print them all. This works fine in Firefox, but in IE it stops loading the images after a while so I get a hundred or so displayed and the rest just have the broken image symbol. Is there a setting somewhere that I can change this timeout?
If the issue is indeed a timeout, you might be able to work around it by using a "lazy load" script and adding new images to the document only after existing images have loaded.
There are a lot of ways to do this, but here's a simple example I threw together and tested. Instead of this:
<img src="image001.jpg" />
<img src="image002.jpg" />
<img src="image003.jpg" />
<img src="image004.jpg" />
<!-- Etc etc etc -->
You could do this:
<div id="imgsGoHere">
</div>
<script type="text/javascript">
function crossBrowserEventAttach(objectRef, eventName, functionRef)
{
try {
objectRef.addEventListener(eventName, functionRef, false);
}
catch(err) {
try {
objectRef.attachEvent("on" + eventName, functionRef);
}
catch(err2) {
// event attachment failed
}
}
}
function addImageToPage()
{
var newImageElement = document.createElement("img");
newImageElement.src = imageArray[nextImageNumber];
var targetElement = document.getElementById("imgsGoHere");
targetElement.appendChild(newImageElement);
nextImageNumber++;
if (nextImageNumber < imageArray.length) {
crossBrowserEventAttach(newImageElement, "load", addImageToPage);
crossBrowserEventAttach(newImageElement, "error", addImageToPage);
}
}
var nextImageNumber = 0;
var imageArray = new Array();
imageArray[imageArray.length] = "image001.jpg";
imageArray[imageArray.length] = "image002.jpg";
imageArray[imageArray.length] = "image003.jpg";
// .
// .
// .
// Snip hundreds of rows
// .
// .
// .
imageArray[imageArray.length] = "image999.jpg";
addImageToPage();
</script>
Each image is added to the page only after the previous image loads (or fails to load). If your browser is timing out, I think that will fix it.
Of course, the problem might actually not be a timeout, but rather that you're running out of memory/system resources and IE is giving up. Or there might be an IE DOM limitation like Sra said.
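The sequencing logic of the lazy loader can be separated from the DOM work, which makes it easier to check outside a browser. A minimal sketch, assuming a loader function that reports success or failure through a callback; loadSequentially and fakeLoad are hypothetical names, and fakeLoad only simulates loading for demonstration:

```javascript
// Load URLs one at a time; move on after each success or failure.
function loadSequentially(urls, loadOne, onDone) {
  var results = [];
  function next(index) {
    if (index >= urls.length) {
      onDone(results);
      return;
    }
    loadOne(urls[index], function (ok) {
      results.push({ url: urls[index], ok: ok });
      next(index + 1);
    });
  }
  next(0);
}

// Fake loader for demonstration: pretends that "image002.jpg" fails.
function fakeLoad(url, finished) {
  finished(url !== 'image002.jpg');
}

var outcome = null;
loadSequentially(['image001.jpg', 'image002.jpg', 'image003.jpg'], fakeLoad, function (results) {
  outcome = results;
});
console.log(outcome.map(function (r) { return r.url + ':' + r.ok; }).join(' '));
// image001.jpg:true image002.jpg:false image003.jpg:true
```

In a browser, the loader would create an img element, attach load and error handlers that call finished(true) or finished(false), and then set src.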
No final solution, but some hints...
I think the IE DOM hangs. I've seen this in other cases. I simply needed to show the images, so I used a script that loads each image when it comes into view, but I think that won't work if you hit print directly. Can you use the newer CSS ability to embed image data directly (data URIs) instead of linking to it? That should solve your problem. I am not quite sure, but I think it is supported since IE 8.
My guess is that you have to work around the IE setting, the easiest way to do it is simply not showing images that are not loaded or replacing them with a default image:
your html:
<img src="http://domain.com/image.jpg" />
your js:
$('img').load(function(){
// ... loaded
}).error(function(){
// ... not loaded, replace
$(this).attr('src','/whatever/default.jpg');
// ... not loaded, hide
$(this).hide();
});
That is a problem with Microsoft. Unfortunately, this is a setting that would have to be changed on every single computer, as there is no remote way to alter it. To change it on your computer, try opening regedit and adding the ReceiveTimeout DWORD with a value of (# of minutes)*60000 (the value is in milliseconds). Hope this helps. -CodeKid1001
Edit: Sorry about that, I forgot to put in the registry path:
HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings
I used something similar to load HD pictures as a background using ASP pages.
But I used jQuery to handle the images and their loading. This is a sample for one image, but with a bit of tweaking you can load images dynamically:
var myImage = new Image();
$(myImage).load(function () {
    $(this).hide(); // Stops the loading effect of large images; can be removed
    $('.csBackground li').append(this); // Append image to where you need it
    $(myImage).show();
}).error(function () {
    checkImages(); // try to reload the image or something?
}).attr('src', settings.images[0]); // I pass an array from ASP code-behind, so 0 can be 'i'
So instead of changing the timeout, just try to reload the images on error.
Otherwise I only found a solution that is client-specific (HTTP timeout):
http://support.microsoft.com/kb/813827

jQuery $.get refreshing page instead of providing data

I have written some code using jQuery to use Ajax to get data from another WebForm, and it works fine. I'm copying the code to another project, but it won't work properly. When a class member is clicked, it will give me the ProductID that I have concatenated onto the input ID, but it never alerts the data from the $.get. The test page (/Products/Ajax/Default.aspx) that I have set up simply returns the text "TESTING...". I installed Web Development Helper in IE, and it shows that the request is getting to the test page and that the status is 200 with my correct return text. However, jQuery refreshes my calling page before it will ever show me the data that I'm asking for. Below are the code snippets from my page. Please let me know if there are other code blocks that you need to see. Thank you!
<script type="text/javascript">
$(document).ready(function() {
$(".addtocart_a").click(function() {
var sProdIDFileID = $(this).attr("id");
var aProdIDFileID = sProdIDFileID.split("_");
var sProdID = aProdIDFileID[5];
// *** This alert shows fine -- ProdID: 7
alert("ProdID: " + sProdID);
$.get("/Products/Ajax/Default.aspx", { test: "yes" }, function(data) {
// *** This alert never gets displayed
alert("Data Loaded: " + data);
}, "text");
});
});
</script>
<input src="/images/add_to_cart.png" name="ctl00$ctl00$ContentPlaceHolder1$ContentPlaceHolder1$aAddToCart_7" type="image" id="ctl00_ctl00_ContentPlaceHolder1_ContentPlaceHolder1_aAddToCart_7" class="addtocart_a" />
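The input is of type "image", so clicking it submits the form, and the page reloads before the $.get callback can run. The easiest way to stop that is to return false from the click handler (or call e.preventDefault()), which cancels the default submit.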
$(".addtocart_a").click(function(e){
// REST OF FUNCTION
return false;
});
Good luck! If you need anything else let me know.
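The two pieces of the handler, cancelling the default submit and pulling the product id out of the generated client id, can be sketched without a browser. This is only an illustration: productIdFromClientId and makeClickHandler are hypothetical helpers, and fakeEvent stands in for a real click event. Taking the last "_"-separated segment avoids hardcoding index 5, which would break if the control were nested at a different depth.

```javascript
// Take the product id from the last "_"-separated segment of the client id.
function productIdFromClientId(clientId) {
  var parts = clientId.split('_');
  return parts[parts.length - 1];
}

// Build a click handler that cancels the default submit before doing its work.
function makeClickHandler(onProduct) {
  return function (e) {
    e.preventDefault(); // stops the image input from submitting the form
    onProduct(productIdFromClientId(e.target.id));
  };
}

// Stand-in for a browser click event, for demonstration only.
var prevented = false;
var fakeEvent = {
  target: { id: 'ctl00_ctl00_ContentPlaceHolder1_ContentPlaceHolder1_aAddToCart_7' },
  preventDefault: function () { prevented = true; }
};
var seen = null;
makeClickHandler(function (id) { seen = id; })(fakeEvent);
console.log(seen, prevented); // 7 true
```

With jQuery, the returned function could be passed to $(".addtocart_a").click(...), with the $.get call placed inside the onProduct callback.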

Resources