fastest way to get all links and images from a webpage? - wordpress

This isn't really a problem, more something I'd like to automate.
I built a website and had to copy a lot of content from the previous site. I did that by copy-pasting the content from the old pages into new pages made with WordPress.
All links and images in the content still point to the old site. So I'd like to find something like a web-scraping tool that would analyze a list of selected links and output all links pointing outside my new site, plus a list of all images that I have to download.

Assuming your old and new websites have the same URL structure, here is a bookmarklet that you can save as a bookmark in your toolbar.
To make the job easy, open a page on the old website and click the bookmarklet you've saved (code below). The code replaces links to the old website with links to the new website. Images are treated the same way. Next, copy the updated content and paste it into the editor of your new website (WordPress admin).
In the developer console (F12 key), you will get a list of all the images that you have to download.
javascript:(function(){
var jqscript = document.createElement('script');
jqscript.onload = function() {
// treat the <a> tags
jQuery('#my-content-container').find('a[href^="http://my-old-website.com"]').each(function(i, anchor) {
jQuery(anchor).attr('href', jQuery(anchor).attr('href').replace('http://my-old-website.com', 'http://my-new-website.com/new-directory'));
});
// treat the <img> tags, and make a list of images to download
var images_to_download = [];
jQuery('#my-content-container').find('img').each(function(i, image) {
images_to_download.push(jQuery(image).attr('src'));
jQuery(image).attr('src', jQuery(image).attr('src').replace('http://my-old-website.com', 'http://my-new-website.com/new-directory'));
});
// output a list of images to the developer console
console.log(images_to_download);
};
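jqscript.src = "https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js";
// the script only starts loading once the element is attached to the document
document.head.appendChild(jqscript);
}());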
P.S. To save this bookmarklet, right-click your browser's toolbar, create a new bookmark, and enter the above code as the Location/URL.
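The question also asks for a plain list of links and images that still point at the old site. A minimal sketch of that classification step follows, assuming it is run on a page of the new site; the helper name `classifyUrls` is hypothetical, and the container id and host names are the placeholders used in the bookmarklet.

```javascript
// Hypothetical helper: split URLs into those still pointing at the old site
// and everything else. Relative URLs are resolved against `pageUrl`.
function classifyUrls(urls, pageUrl, oldHost) {
  const result = { oldSite: [], other: [] };
  for (const raw of urls) {
    let url;
    try {
      url = new URL(raw, pageUrl);
    } catch (e) {
      continue; // skip values that cannot be parsed as a URL
    }
    (url.hostname === oldHost ? result.oldSite : result.other).push(url.href);
  }
  return result;
}

// In a browser, collect the inputs from the content container (assumed id).
if (typeof document !== 'undefined') {
  const root = document.querySelector('#my-content-container') || document;
  const hrefs = Array.from(root.querySelectorAll('a[href]'), a => a.getAttribute('href'));
  const srcs = Array.from(root.querySelectorAll('img[src]'), img => img.getAttribute('src'));
  console.log('links', classifyUrls(hrefs, location.href, 'my-old-website.com'));
  console.log('images', classifyUrls(srcs, location.href, 'my-old-website.com'));
}
```

The `oldSite` array of the images result is the download list; the `oldSite` array of the links result shows which links still need remapping.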

This is just an option you should think about: you could use absolute paths instead of relative paths, which helps you reuse the markup without having to remap every link in it.
Relative path:
Read about my Tahiti vacation.
Absolute path:
Read about my Tahiti vacation.

Programmatically added stylesheet ASP.NET (VB)

I'm currently programmatically adding a stylesheet to my content page using VB:
Dim link As New HtmlLink()
link.Attributes.Add("rel", "stylesheet")
link.Attributes.Add("type", "text/css")
link.Href = "Styles/AddNewModelStyles.css"
Me.Page.Header.Controls.Add(link)
At first glance, this seems to be working as my web page is formatted properly. However, I added new styling to the stylesheet and nothing happened. I completely removed all styling from the stylesheet, refreshed the page, and nothing changed. It seems to me like the original stylesheet is being cached, but I can't figure out how to clear the cache. I've tried F5, Shift+F5, closing browser and opening new, all to no avail. Anyone have any ideas why this is happening?
Internet Explorer: open your browser's Tools menu and select Internet Options -> General -> Browsing History -> Settings -> View Files.
Select all of the files, delete them, and reopen your browser.
Google Chrome: press Ctrl+Shift+Del, select "the beginning of time" in the drop-down, and click Clear Browsing Data.
I was able to figure out the solution to my problem by using Inspect element to find where the stylesheets were being loaded in. When you programmatically add a stylesheet to a content page, the stylesheet link gets added to the master web page head tag, not the content page head tag. My stylesheet was ultimately in the wrong directory (the stylesheet directory of the content page, NOT of the master page), and moving it to the master page's directory solved the issue.
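A quick way to check where a relative href such as "Styles/AddNewModelStyles.css" ends up is to resolve it the way the browser does. The following is a sketch in JavaScript for the browser console, not part of the ASP.NET code; the function name `resolveStylesheetUrl` and the page URL are assumptions made for illustration.

```javascript
// Hypothetical helper: show the absolute URL a browser derives from a
// relative stylesheet href, given the URL of the page being viewed.
function resolveStylesheetUrl(href, pageUrl) {
  return new URL(href, pageUrl).href;
}

// The page URL below is made up for illustration.
console.log(resolveStylesheetUrl('Styles/AddNewModelStyles.css',
  'http://example.com/Admin/AddNewModel.aspx'));

// In a browser console, list the resolved URL of every stylesheet link.
if (typeof document !== 'undefined') {
  document.querySelectorAll('link[rel="stylesheet"]').forEach(function (link) {
    console.log(link.href); // the href property is already absolute
  });
}
```

If the logged URL points at a directory that does not contain the stylesheet, the request fails and edits to the file never reach the page, which can look like a caching problem.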

How to reload single file in chrome developer tools

I'm working on a complicated site that has a lot of CSS and JS files that load on every page. I'm working on a single CSS file using Chrome's developer tools. Once the CSS is mostly correct in developer tools (Elements tab, Styles sidebar), it is copied to a local CSS file and then uploaded to the web server. Since only a single CSS file has been modified, it would be faster to reload that one file instead of hard-refreshing and reloading the entire site, including images, JS, and CSS.
The site has an option to minify the css file and combine it with the other css files, creating one single very large css file. That option is turned off while in development mode. Adding a version number to the css file name isn't the trick I'm looking for.
Is it possible in Chrome Developer tools to click on a source file and refresh only that file?
This is a bit of a hack, but I think it'll work for your scenario.
When I initially load an example page, you can see three CSS requests:
I want to refresh the devsite-googler-buttons.css file, so I find it in my DOM Tree:
(Command+F on Mac or Control+F on Windows / Linux opens up that search panel at the bottom of the Elements panel... makes it easier to find stuff in a big DOM)
Right-click, select Edit as HTML, and then append a random query string to the end of the link:
And in the Network panel, you can see that the file was re-downloaded:
See also: Konrad's answer provides some handy code for automating this via a Snippet.
It might be handy, in your situation, to automate it a bit:
function reloadCSS() {
const links = document.getElementsByTagName('link');
Array.from(links)
.filter(link => link.rel.toLowerCase() === 'stylesheet' && link.href)
.forEach(link => {
const url = new URL(link.href, location.href);
url.searchParams.set('forceReload', Date.now());
link.href = url.href;
});
}
reloadCSS();
This function forces all CSS files to be reloaded by appending the current time to their URLs.
You can modify it to target a specific file. You can run it from console, via DevTools 'snippets' functionality or make it into an extension.
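As a rough sketch of the "target a specific file" variant, the filter can also match on part of the file name. The names `withCacheBuster` and `reloadOneCSS` are hypothetical; the query parameter is the same `forceReload` used above.

```javascript
// Hypothetical helper: return `href` with a cache-busting query parameter.
function withCacheBuster(href, baseUrl, stamp) {
  const url = new URL(href, baseUrl);
  url.searchParams.set('forceReload', String(stamp));
  return url.href;
}

// Reload only the stylesheets whose URL contains `fileName`.
function reloadOneCSS(fileName) {
  Array.from(document.getElementsByTagName('link'))
    .filter(link => link.rel.toLowerCase() === 'stylesheet' && link.href.includes(fileName))
    .forEach(link => {
      link.href = withCacheBuster(link.href, location.href, Date.now());
    });
}

// Only runs in a browser; the file name is the one from the earlier answer.
if (typeof document !== 'undefined') {
  reloadOneCSS('devsite-googler-buttons.css');
}
```

Saved as a DevTools snippet, this re-requests just the matching file and leaves the other stylesheets, scripts, and images alone.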
If you don't mind refreshing the page, but don't want to re-download all resources, try the following.
Open the css file in a new tab. (You can right click css files from the Chrome developer tools and choose "open in new tab");
Hard-refresh this tab (ctrl/cmd + f5);
Soft-refresh the page (f5 or ctrl/cmd + r).
As far as I know, live editing is the only way to get what you are looking for. There is no way to refresh a single CSS file.

Prevent iframe from adding entries to browser's history with links inside a pdf document

I cannot figure out any way to prevent that behavior with hyperlinks inside a PDF document. Is there any way to achieve this?
We cannot replace iframe with object or embed tag.
@Jared Farrish: Nope, we want to prevent links in a PDF document from adding entries to the browser's history. For example, a PDF document is loaded in an iframe, and it contains links to other PDF documents. When a user clicks a link, the new PDF is loaded into the target iframe, and an entry is also added to the browser's history. I need to prevent that.
@GolezTrol: Because it's a requirement. If you place it in an embed or object tag, clicking a link navigates the whole site to the new PDF document. I need to load a new PDF document without navigating away from the current web page.
At least in my test using FF4 and Chrome (latest), using a Javascript appendChild did not add it to the history.
var iframe = document.createElement('iframe');
iframe.src = "https://web3.unt.edu/riskman/PDF/UNTSafetyManual__Revised__3-2-06.pdf";
document.body.appendChild(iframe);
http://jsfiddle.net/ZML3u/
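Building on that observation, one possible approach when the page's own script switches documents is to replace the whole iframe element instead of changing its src. This is only a sketch under that assumption: the helper name `replaceIframe` is hypothetical, the history behavior should be verified in the browsers you target, and it does not intercept links clicked inside the PDF itself, which are handled by the PDF viewer.

```javascript
// Hypothetical helper: swap `oldFrame` for a brand-new iframe showing `url`.
// `doc` is the document that owns the frame (normally `document`).
function replaceIframe(doc, oldFrame, url) {
  const fresh = doc.createElement('iframe');
  fresh.src = url; // set before insertion, so the frame starts at this URL
  if (oldFrame.id) fresh.id = oldFrame.id;
  oldFrame.parentNode.replaceChild(fresh, oldFrame);
  return fresh;
}
```

A call would look like `replaceIframe(document, document.getElementById('viewer'), 'other.pdf')`, where the id `viewer` and the file name are placeholders.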

How do I get a chrome extension to load an iframe containing a local page

I'm writing a Chrome extension where the standard popup page is used as a menu, and I add an iframe at the bottom of the page to display some output. The display.html page contains the output I intend to display in the iframe appended to the page. The code below, inside my content script, appends the iframe, but it looks for display.html on the web server rather than in the code packaged with the extension. Is there some way to get it to load my display.html page, rather than one that may or may not exist on whichever page the extension is used on?
var ifrm = document.createElement("iframe");
ifrm.setAttribute("src", "display.html");
ifrm.style.width = "100%";
ifrm.style.height = "20%";
document.body.appendChild(ifrm);
Updated answer
Per @Cnly's comment, please use chrome.runtime.getURL to get the URL of the embedded resource. (The original answer is extremely old, predating even manifest v2).
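Applied to the code from the question, that could look roughly like the sketch below. The wrapper function `appendDisplayFrame` is hypothetical, and the URL lookup is passed in as a parameter only so the function can be tried outside an extension. It also assumes display.html is listed under `web_accessible_resources` in the manifest, which is normally required before a web page can load a packaged file.

```javascript
// Sketch of the content-script code from the question, with the src taken
// from the extension package instead of the current website.
// `getURL` is passed in so the function is easy to try outside an extension.
function appendDisplayFrame(doc, getURL) {
  const ifrm = doc.createElement('iframe');
  ifrm.setAttribute('src', getURL('display.html'));
  ifrm.style.width = '100%';
  ifrm.style.height = '20%';
  doc.body.appendChild(ifrm);
  return ifrm;
}

// Inside the extension's content script:
if (typeof chrome !== 'undefined' && chrome.runtime && chrome.runtime.getURL) {
  appendDisplayFrame(document, path => chrome.runtime.getURL(path));
}
```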
Original answer
I think you may want chrome.extension.getURL to get the URL of the embedded resource.

How do I search an iframe for a specific image or grab the source code

My main goal is to search an iframe for a specific image. I know what the image will be (abc_clicked.gif) but am not sure how I can access the iframe to either:
1) search the iframe for the image itself
2) grab the source code in which I will manually search myself for the image
I am trying to accomplish this with JavaScript, as I don't see how PHP could help me at all in this case.
Anyone have any ideas???? I'm lost....
If the iFrame is hosted on the same domain, you can access the DOM the same as you would for the main page using contentDocument.
For example:
var iframeElement = document.getElementById('myiframe');
var imageElement = iframeElement.contentDocument.getElementById('myImage');
(assuming you're working in a Web page and looking for a JavaScript solution)
If the iframed page is in a different domain, there's not much you can do.
If it's in the same domain, here is a cross-browser way to access its content:
var doc=ifr.contentWindow||ifr.contentDocument;
if (doc.document) doc=doc.document;
You can then search your iframe:
var imgs = doc.getElementsByTagName("img");
// etc.
Your second option is also valid (but might be more complicated), use ajax to retrieve and parse the page source.
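Putting the first option together for the image named in the question, a minimal sketch might look like this. It assumes a same-origin iframe; the helper name `findImagesBySrc` is hypothetical, and `myiframe` is the id used in the earlier example.

```javascript
// Hypothetical helper: return the <img> elements in `doc` whose src ends
// with `fileName` (query string and fragment are ignored).
// `doc` is a same-origin iframe's document.
function findImagesBySrc(doc, fileName) {
  return Array.from(doc.getElementsByTagName('img'))
    .filter(img => (img.getAttribute('src') || '').split(/[?#]/)[0].endsWith(fileName));
}

// In a browser; wait for the iframe to finish loading before searching it.
if (typeof document !== 'undefined') {
  const frame = document.getElementById('myiframe');
  if (frame && frame.contentDocument) {
    console.log(findImagesBySrc(frame.contentDocument, 'abc_clicked.gif'));
  }
}
```

An empty array means the image is not in the iframe's document at the time of the search.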
