I need to make a web scraper that takes an address entered by the client and then retrieves data for that address from a specific site. I downloaded Web-Harvest; is that the right tool to start with while I learn how to write the program?
Also, can someone point me to a good tutorial on how to do this, if possible?
Here is a good web-scraper comparison table. It may help you to choose the right scraper.
I am trying to scrape content from some webpages of a site. I tried html-agility-pack with C#, which does a good job of scraping HTML. I need to go through a number of pages while scraping. My question is: how can I hide the fact that I am a web scraper? I do not want the other side to know that I am scraping their content. Please let me know if there is any way to do this. Looking forward to your responses.
Thanks
Use a Tor proxy:
Tor Project
You can reset the proxy after every page or after every site. Keep in mind that some sites look for certain patterns and can tell you're scraping them. With html agility pack the web is one big data repository; just make sure you're not using someone else's data in a way that would get you in trouble.
I am interested in implementing Facebook-style link sharing in my web application. On Facebook, when we paste a link, it shows a preview of the linked content: a thumbnail, a few lines of text, and so on.
How can I do that?
I know it's the Open Graph Protocol, but how do I implement it in my web application (based on Spring MVC)?
What technologies are needed for this? I am a Java and jQuery guy.
Is it necessary to use Facebook for this?
Open Graph isn't a library or a script you can use to build an application that does what you want. Open Graph is a protocol: a set of rules that provide a convenient scheme for building social applications.
Following those rules ensures that there is a standardized way to work with that data.
So the short answer: OG does not provide this functionality; you have to build it yourself (though there are some pretty good links and scripts that make your life much easier: scroll to the very bottom of http://ogp.me/). What you gain by using OG is that every application that works with OG (Facebook and Google, to name two examples) can work with your data properly.
It might not be the answer you searched for, but I think it should give you a little information on what OG really is.
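To give a rough idea of what "building it yourself" can look like, here is a minimal sketch in Python that pulls the `og:` meta tags out of a page's HTML using only the standard library. The names `OpenGraphParser` and `extract_open_graph`, and the sample page with its example.com URLs, are made up for illustration; fetching the page, handling missing tags, and rendering the preview are left out. The same idea should carry over to Java with an HTML parser such as jsoup.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> tags from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.properties.setdefault(prop, content)


def extract_open_graph(html_text):
    """Return a dict of Open Graph properties found in html_text."""
    parser = OpenGraphParser()
    parser.feed(html_text)
    return parser.properties


# Hypothetical page; a real application would download this HTML first.
SAMPLE_PAGE = """
<html><head>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="video.movie" />
<meta property="og:url" content="https://example.com/the-rock" />
<meta property="og:image" content="https://example.com/rock.jpg" />
<title>Not an Open Graph tag</title>
</head><body></body></html>
"""

preview = extract_open_graph(SAMPLE_PAGE)
for name, value in preview.items():
    print(name, "=", value)
```

The title, URL, and image collected this way are the pieces a link preview is typically built from, and none of it requires Facebook.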
I will explain exactly what I am trying to do, and maybe someone can tell me a simple way that I can do it.
I want to track the amount of money pledged on a Kickstarter project page. The amount pledged is consistently kept within a certain tag. What are all the ways I can do this programmatically?
I am just starting to learn web development, so that should give you some context to help me better. (I've learned bits and pieces of C, Python, VB, JS, HTML/CSS.)
Is there a simple, hacky way to do this with free tools? How would I do it all on my own? Extending this idea further, how would I notify my Android device when the amount has surpassed a predefined threshold? Is this the process known as scraping? What tools do I need to accomplish this? What language do I need to use? Do I need my own web space?
If I eventually turned this concept into an Android app, is there a way to load only a small portion of a website (maybe just enough of the source to reach the tag I am looking for), so that I get the data I want without wasting a bunch of my smartphone data loading the rest of the page?
Thank you for any help you can provide!
I'm not familiar with Kickstarter's API -- do they have one? -- but here is how I'd approach this problem:
You want to "ping" Kickstarter periodically for information. One way to do that on Android is with the BuzzBox SDK.
With each execution of the background task:
Load a portion of the Kickstarter page with jQuery into your own HTML document.
Compare it with a threshold and possibly the previously stored value. A basic <= comparison should do, unless you want to be very thorough about parsing.
Use notification in Android to notify the user once the amount is updated.
Wrap all this into an app.
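The extract-and-compare part of those steps can be sketched in Python, one of the languages you mentioned. This is only an illustration under stated assumptions: the element id `pledged`, the sample fragment, and the helper names `PledgeParser`, `read_pledged`, and `should_notify` are all hypothetical, since the real tag on the Kickstarter page is not given here. It also assumes the target element contains nothing but the amount text.

```python
import re
from decimal import Decimal
from html.parser import HTMLParser


class PledgeParser(HTMLParser):
    """Collects the text inside the element with the given id.

    Assumes that element contains only text (no nested tags).
    """

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self._inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == self.element_id:
            self._inside = True

    def handle_endtag(self, tag):
        self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.text += data


def read_pledged(html_text, element_id):
    """Return the pledged amount found in html_text as a Decimal."""
    parser = PledgeParser(element_id)
    parser.feed(html_text)
    digits = re.sub(r"[^\d.]", "", parser.text)  # drop "$", commas, spaces
    if not digits:
        raise ValueError("no amount found for id %r" % element_id)
    return Decimal(digits)


def should_notify(current, threshold, previous=None):
    """True only when the amount has newly reached the threshold."""
    if current < threshold:
        return False
    return previous is None or previous < threshold


# Hypothetical fragment; the real page markup will differ.
SAMPLE = '<div><span id="pledged">$12,345</span> pledged of $10,000 goal</div>'

amount = read_pledged(SAMPLE, "pledged")
if should_notify(amount, Decimal("10000")):
    print("Threshold reached:", amount)
```

Storing the previous value between runs is what keeps the notification from firing again on every check once the threshold has been passed.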
I am looking at building the login/registration part of a website (ASP.NET) and would like to see some example code or instructions on how to do this properly. For example, how to use cookies correctly and how to encrypt what is stored in the cookie so that the session persists until the user logs out or times out.
I do not want to use the built-in ASP.NET Membership/Provider stuff, as it looks painful to use and not very flexible. Please do not answer with 'This is how easy the ASP.NET Membership/Provider stuff is to use, just check this out and you will use it!' as I don't want to use it!
Don't get me wrong, I'm just wondering what exactly you find painful and inflexible about the ASP.NET providers. I've used them a lot and I find them very flexible. I've even written some custom providers, which is a straightforward process.
If you don't want to use ASP.NET providers, what are your exact requirements? I might help you out if I understand a bit better what you are trying to achieve.
Michael
I was wondering if anyone can help me with this. I've been looking everywhere for this information: I want to make a web application using Dashcode RSS. As far as I know, you can't link to external sources. Does anyone know a way I can get around that? From what I understand, a little PHP can get around this, but I'm unsure where to look.
OK, first thing: no PHP. Dashcode is limited to HTML, CSS and JavaScript. Having said that, there is a whole range of system calls that can be made using the functionality provided by various parts of the Xcode system.
Second, yes, you can link to external sources such as other web sites, APIs on, say, Twitter or Google, RSS feeds and so on. I'm not sure where you got the idea to the contrary.
If you want to learn how to do a Dashcode RSS project, open up Dashcode, start a new project (either web based or Dashboard based) and choose the RSS project. This will give you an out-of-the-box template to which you can add your own information and then see how it works. Then customise it.
In the above I am assuming Snow Leopard and the latest Dashcode/Xcode, but it will still give you most of what you want on earlier versions.