How to find out the upload/post time of a specific website URL? - web-search

Often when searching for information, I hit the problem that the author of an article/website/blog post doesn't give a date.
Is there any way (maybe a special meta search engine, web archives, or Google search operators) to find out at least in which month and year a website URL was uploaded?
Thanks

Putting
javascript:alert(document.lastModified)
in the address bar of a browser with the page loaded pops up a date and time. Where this time data comes from I have no idea; probably the time the HTML or PHP file was created on the server. On the other hand, I thought JavaScript cannot access the filesystem, but I'm no expert...
I'm still curious whether someone knows a reliable method of finding out when a specific .html page was created, as I find it useful for research.
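For what it's worth, document.lastModified is generally populated from the Last-Modified HTTP response header; if the server doesn't send that header, browsers fall back to the current date and time, which is why the bookmarklet can report a misleadingly fresh date. A minimal sketch for checking that header directly, using Java's standard HttpClient (the URL is a placeholder):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LastModifiedCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; substitute the page you are investigating.
        String url = args.length > 0 ? args[0] : "https://example.com/";

        HttpClient client = HttpClient.newHttpClient();
        // A HEAD request fetches only the response headers, not the body.
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<Void> response =
                client.send(request, HttpResponse.BodyHandlers.discarding());

        // Static files often report their modification time here; dynamic
        // pages (PHP and the like) usually report generation time or omit
        // the header entirely, so treat the value as a hint, not proof.
        System.out.println("Last-Modified: "
                + response.headers().firstValue("Last-Modified").orElse("(not sent)"));
    }
}

Even when present, the header dates the file, not the content, so it cannot reliably tell you when a post was written.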

Related

Facebook shares not using og:url when clicked in Facebook?

One of the purposes of og:url, I thought, was that it was a way to make sure session variables, or any other personal information that might find its way into a URL, would not be passed along when sharing in places like Facebook. According to the best practices on Facebook's developer pages: "URL
A URL with no session id or extraneous parameters. All shares on Facebook will use this as the identifying URL for this article."
(under good examples: developers.facebook.com/docs/sharing/best-practices)
This does NOT appear to be working, and I am puzzled as to how I misunderstood it, or what I have wrong in my code. Here's an example:
https://vault.sierraclub.org/fb/test.html?name=adrian
When I drop things into the debugger, it seems to be working fine...
https://developers.facebook.com/tools/debug/sharing/?q=https%3A%2F%2Fvault.sierraclub.org%2Ffb%2Ftest.html%3Fname%3Dadrian
og:url reads as expected (without name=adrian).
But if I share this on Facebook and then click the link, the URL goes to the one with name=adrian in it, not the og:url.
Am I doing something incorrectly here, or have I misunderstood? If the latter, how does one keep things like session variables out of shares?
Thanks for any insight.
UPDATE
Facebook replied to a bug report on this, and I learned that I was indeed reading the documentation incorrectly:
developers.facebook.com/bugs/178234669405574/
The question then remains: is there any other method of keeping session variables/authentication tokens out of shares?

Retrieving relevant posts from WordPress blogs

I have a requirement to write a program in Java that retrieves all the posts containing a given keyword (or keywords) from all WordPress sites.
This is how I approached the problem. I initially thought I would crawl the WordPress sites looking for the keywords I am interested in, but I realized that if there is an endpoint for WordPress search, it makes my job a lot easier. So I looked around to see if there is a search endpoint to which I can submit queries and get back links to posts.
All I found is http://en.search.wordpress.com. I can still tweak the URL and get some links. But:
I would like to know if there is a better way to handle this problem.
The search link I posted is meant for end users, and it might limit my search results since I query it through a program.
I would also like to retrieve posts from a given date range; I am not sure if this is possible with my approach.
Appreciate any help in this regard. Thank you.
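A possible shortcut, assuming the target sites run a reasonably recent WordPress with the standard REST API enabled: the /wp-json/wp/v2/posts endpoint accepts the documented search, after, and before query parameters, which cover both the keyword and the date-range requirements. A minimal Java sketch under that assumption (the site URL, keyword, and dates are placeholders):

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class WordPressSearch {
    public static void main(String[] args) throws Exception {
        // Hypothetical target site; any WordPress site exposing the
        // standard REST API should answer the same way.
        String site = "https://example.wordpress.com";
        String keyword = "climate";

        // after/before take ISO 8601 timestamps, covering the date range.
        String url = site + "/wp-json/wp/v2/posts"
                + "?search=" + URLEncoder.encode(keyword, StandardCharsets.UTF_8)
                + "&after=2023-01-01T00:00:00"
                + "&before=2023-12-31T23:59:59"
                + "&per_page=20";

        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                HttpRequest.newBuilder(URI.create(url)).build(),
                HttpResponse.BodyHandlers.ofString());

        // The response is a JSON array of posts, each with a "link" field
        // holding the post URL; a real program would parse it with a JSON
        // library instead of printing the raw payload.
        System.out.println(response.body());
    }
}

This queries one site at a time; "all WordPress sites" would still need a list of sites or a discovery step on top.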
How about this approach:
Assuming you don't need to go back through the history and scrape all the old data, I would just stick to tags:
http://en.wordpress.com/tags/
I would crawl it every day, get the most popular tags (by font size), and then for each tag get the articles published in the past 24 hours.
For each post, get all the comments and search for your keywords.
Would that work? If not, please share more details.
Good luck
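If you take the tag route, most WordPress tag pages also expose an RSS feed at .../feed/, which is much easier to parse than the HTML. A rough sketch of the "published in the past 24 hours" filter, assuming such a feed URL (the address below is a placeholder):

import java.io.ByteArrayInputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class TagFeedCrawler {
    public static void main(String[] args) throws Exception {
        // Hypothetical feed URL; adjust to whatever the tag page links to.
        String feedUrl = "https://example.wordpress.com/tag/photography/feed/";

        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<byte[]> response = client.send(
                HttpRequest.newBuilder(URI.create(feedUrl)).build(),
                HttpResponse.BodyHandlers.ofByteArray());

        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(response.body()));

        ZonedDateTime cutoff = ZonedDateTime.now().minusHours(24);
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            String link = item.getElementsByTagName("link").item(0).getTextContent();
            String pubDate = item.getElementsByTagName("pubDate").item(0).getTextContent();
            // RSS dates are RFC 1123, e.g. "Tue, 03 Jun 2008 11:05:30 +0000".
            ZonedDateTime published =
                    ZonedDateTime.parse(pubDate, DateTimeFormatter.RFC_1123_DATE_TIME);
            if (published.isAfter(cutoff)) {
                System.out.println(published + "  " + link);
            }
        }
    }
}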

How to refresh Facebook's scrape cache for a whole site

I need to make Facebook re-scrape its cache for every page on my website (3000+ pages).
The only way I know how to do that is through the Open Graph Debugger,
and I cannot run that for 3000+ pages...
I read on Facebook's developer support page that this (Stack Overflow) is the place to ask questions, but there is little to no knowledge here about refreshing Facebook's URL cache.
Can you please suggest any working solution to re-scrape a page?
My website: Mentallica
One possible answer, given the number of URLs you've got, is to use the batch invalidator. You could get a list of your URLs from an access log, or maybe do a recursive directory listing and turn the paths into URLs (if it's a flat site), or the like. At least you don't have to do them one at a time. Once you have a list, paste it into the invalidator (multiple lines).
The batch invalidator is here:
https://developers.facebook.com/tools/debug/sharing/batch/
It IS frustrating. I've searched several places, and don't really see a solution. We have a website with all of the proper tags, yet FB refuses to refresh any past posts with the new website data.
Three years later, but this may help someone: paste your URL here and click "Fetch new scrape information": https://developers.facebook.com/tools/debug/og/object/
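With thousands of pages, scripting the refresh is likely more practical than any debugger UI. The Graph API has documented support for forcing a re-scrape by POSTing a URL with scrape=true and an app access token; a minimal sketch (the token and URLs are placeholders, and current Graph API docs and rate limits should be checked):

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class FacebookRescrape {
    public static void main(String[] args) throws Exception {
        // Placeholder token: a real app access token is required.
        String accessToken = "APP_ID|APP_SECRET";
        // In practice, read the 3000+ URLs from a file or your sitemap.
        List<String> urls = List.of(
                "https://example.com/page1.html",
                "https://example.com/page2.html");

        HttpClient client = HttpClient.newHttpClient();
        for (String url : urls) {
            String body = "id=" + URLEncoder.encode(url, StandardCharsets.UTF_8)
                    + "&scrape=true"
                    + "&access_token=" + URLEncoder.encode(accessToken, StandardCharsets.UTF_8);
            HttpRequest request = HttpRequest.newBuilder(URI.create("https://graph.facebook.com/"))
                    .header("Content-Type", "application/x-www-form-urlencoded")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(url + " -> HTTP " + response.statusCode());
            // Pause between calls to stay well inside rate limits.
            Thread.sleep(500);
        }
    }
}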

Find the actual page opened from the URL in ASP.NET

Hi, so I keep running across websites which, when browsed or searched (using their own search function), return a static URL, e.g. ?id=16 or default.aspx, no matter what page of the website you visit after the search has been performed. This becomes a problem when I want to go directly to a post/page within one of these sites, so I'm wondering: does anyone know how I could actually find out what the absolute URL is,
so that I can navigate straight to it? I'm not really familiar with coding; I have tried looking at the page source, but I wasn't really able to glean anything from there.
The basics of ASP.NET URLs: http://www.codeproject.com/Articles/142013/There-is-something-about-Paths-for-Asp-net-beginne
It all really depends on what you're trying to find; as far as finding a back way to locate an absolute path, it's highly doubtful. If the owner of the site (as with most blogs) wants you to have a permalink to a page, they use URL rewriting to put things like the page title into the URI. A lot of MVC sites do this now.
The '?id=16' you're seeing is just a query string, a parameter that feeds whatever logic the server runs to pick the page.

Flex 3: Project Architecture & SEO

I've got a Flex 3 project. One of the problems I have is that not very much of its content is indexed by Google. Currently, I pull data from a MySQL database, so the Googlebot doesn't see most of the site.
My goal is to increase the amount of content indexed by Google, improve the SEO, and improve SERPs.
I thought that instead of pulling the data from the database, I would change the project's architecture and create separate "pages". So, in my case, I would compile each puzzle separately and upload it to the server in its own directory. This way the info in each puzzle would get indexed.
The negative is that if I add a puzzle, I'd have to add a link to it in all of the puzzles that are already on the server. I would have to add the link, re-compile each puzzle and upload it to the server. Is there a way to get around this problem? Also, if I wanted to communicate some data from one puzzle to another in the future, I wouldn't be able to do so.
Any suggestions?
Thank you.
-Laxmidi
The usual way to achieve this goal is to develop a hidden parallel site in HTML.
On the first page you will have your Flash and, hidden by JavaScript, a list of links to the other pages. These links will be parsed by the robots. Ideally, the href pages are virtual (look up "URL rewriting"). On each "fake" page, your server-side language will print the content or links from your database AND the Flash. The Flash will be given a string telling it where it is and what it's supposed to show.
Example: http://www.mysite.com/category1/content7. URL rewriting sends this request to http://www.mysite.com/index.php?uri=category1/content7. The page should display the Flash with the FlashVar "uri=category1/content7". The Flash then knows which content it has to display, so when a user arrives from Google by following this link, he finds the content he was looking for.
All linking and content for SEO should be in HTML; don't trust the robots' ability to read Flash.
Have a look at Adobe's reference on deep linking.
You can generate the website's sitemap.xml with a daily cron process, such that the URLs encode the application state you need. Each URL encodes whatever content needs to be retrieved from the database, with just one index.html page.
good luck!
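A rough sketch of that sitemap idea, assuming the puzzle slugs live in the same database the site already uses (the slugs and domain below are placeholders; a real cron job would read them via JDBC):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class SitemapGenerator {
    public static void main(String[] args) throws IOException {
        // Hypothetical slugs; a real job would query them from the database.
        List<String> slugs = List.of("category1/content7", "category1/content8");

        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String slug : slugs) {
            // Each URL encodes the application state, so one index page
            // plus URL rewriting can serve every entry.
            xml.append("  <url><loc>https://www.mysite.com/")
               .append(slug)
               .append("</loc></url>\n");
        }
        xml.append("</urlset>\n");

        Files.writeString(Path.of("sitemap.xml"), xml.toString());
        System.out.println("Wrote " + slugs.size() + " URLs to sitemap.xml");
    }
}

Adding a new puzzle then only means adding a database row; the nightly cron run picks it up without recompiling anything.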
