Is web-scraping legal for scientific purposes? [closed] - web-scraping

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am writing a research paper on a service ranking algorithm, and I want to demonstrate its performance and accuracy by running it on public data, for example from the Apple store, Google Play, Expedia, etc. Can I parse their data from HTML and use it in my research, or would I be committing an illegal act (web scraping)?
And should I state explicitly in my paper that the data is used only for scientific purposes?
I've read about web scraping and the controversy over its legality, but I did not find any article that covers its use for scientific purposes only.
Thanks in advance

There is nothing inherently illegal about web-scraping a site.
However, I would suggest that you pay attention to the particular site's "Terms of Use" to see if it is something which they expressly forbid. For example, the Expedia Terms of Use here http://www.expedia.ie/p/support/termsofuse outline:
you may not visit or make available the website or any part of the web
pages of the website by automatic means, such as by using crawlers or
shop bots to systematically retrieve or copy information or connect
the content of the website functionally to another website via links
That being said, as long as you don't exert an unreasonable load on the site, or republish their content as your own, I don't expect you will run into any problems.
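To make "don't exert an unreasonable load" a bit more concrete, here is a minimal Python sketch that checks a site's robots.txt rules and picks a delay between requests. The `research-bot` user agent, the example.org URLs, and the sample robots.txt content are made up for illustration; only the standard-library `urllib.robotparser` calls are real. Keep in mind that robots.txt is separate from the Terms of Use, so passing this check says nothing about what the terms allow.

```python
from urllib import robotparser

# Hypothetical robots.txt content; a real scraper would download the site's own file.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def polite_delay(rp: robotparser.RobotFileParser, user_agent: str,
                 default: float = 1.0) -> float:
    """Seconds to wait between requests: the site's Crawl-delay if given,
    otherwise a conservative default (the default value is an assumption)."""
    delay = rp.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


rp = build_parser(EXAMPLE_ROBOTS)
allowed = rp.can_fetch("research-bot", "https://example.org/listing/1")
blocked = rp.can_fetch("research-bot", "https://example.org/private/data")
delay = polite_delay(rp, "research-bot")
```

In a real crawl loop you would skip any URL for which `can_fetch` returns `False` and call `time.sleep(delay)` between requests.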

Related

When is a web page or project considered complete? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I have been asked to design a website for a client as a "side job". I am trying to write up a statement of work for the project. In the past, I have done similar work and have often run into a situation where I believe the work is "done", but the client wants endless tweaks and changes. (As you know, websites are perpetually "under construction".)
When you have requirements such as "Design a Home page, design a Contact Us page", how do you define a page as "done"?
Don't put anything live, until they accept your work is complete. This should be enough of an incentive for them not to string you along, and allows them to have the quality website they require.
Ask the client to set up a requirements specification for version 1. Once you have met the requirements contained in this document, your job is complete. Everything else belongs to the next version.
In the same situation, I tell my client: "You want A, B, C and D. OK, sign here, and we are agreed that the scope of the application is A-D. If you want something more in future, it is not part of our contract, so we'll deal with that later, and of course it has its own price." This way you make them think before signing. Lots of things become clearer and lots of needs show up suddenly, but in future they'll either pay more for more needs or stop asking :)

Legality of Mining Crowdsourced Data [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I have a project idea for which I want to mine publicly available data that another website has collected by crowdsourcing, so that I have initial data for my own project. To reiterate, I want to write a robot to grab data that is displayed on another website and use it for my own website. Does anyone know the legality of this sort of thing? Does the original website own the data that was given to it by a crowd? Even if so, can I use it?
Web scraping is a legally complicated issue.
The hassles of legal action and enforceability often keep scrapers from getting in trouble.
Outright duplication is considered actionable, although courts have ruled that "duplication of facts" is permitted (US).
I advise you read up here: http://en.wikipedia.org/wiki/Web_scraping#Legal_issues
Best,
Legally, you should be fine as long as the data is made available and the people have consented, you aren't hacking, and the other site has permission to share. Check for a license on the other site; if there isn't one, inquire, or be prepared for access to be denied at some point. Even though the data is publicly available, that doesn't mean the other site wants it to be.
Also, double-check that you don't inadvertently publish private data as well.

Develop a journal system with Drupal [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I am going to develop a journal system that has paper submission and review actions with evaluation forms, something like the OJS system. I want to use Drupal for it, but I am not sure if it is a good choice.
Does Drupal have the ability to create such applications?
It is a very generic question. To answer part of it:
Drupal can be customized and used for a lot of projects, thanks to the powerful community and module developers.
Let me give a glimpse of the possibilities; you can find the rest:
Each paper can be a content type. Each user can have specific roles and permissions (e.g. publisher, editor, reviewer) that limit them to exactly what you allow them to do. They can apply for higher roles as well.
Each review process can be captured and maintained using the Workflow module. There are plenty of tutorials for that.
Lists of articles can be shown with various properties and filters using Views. They can be shown in various regions of a theme you select, customize, or build yourself.
The community can be built using forums.
In short, there are thousands of possible ways you can build this. One note from personal experience: sometimes you will find that extremely tough things can be done in simple ways, while simple things take time. This is mostly because, as with all systems, it takes a bit of time to get used to the Drupal API.
Best of luck!

What should I do when a standard is made private and only accessible for a fee? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I have some software to which we added support for an open common file format (.iwb). The government organisation that initiated that work has been cut in the cutbacks.
Now a not-for-profit organisation has taken up the mantle. However, it's going to cost money, and once you pay you are not allowed to reveal the "materials" you gain.
http://www.imsglobal.org/iwbcff/jointheIWBCFFIalliance.cfm
I understand people need to be paid, but the whole not-sharing thing makes it feel like it's going against what a standard is meant for.
What's a good strategy:
Pay up and shut up (there might be plenty of closed standards that work in this way)
Fork the standard to an organisation that will not require people to pay to read it
Drop the file format
Stay behind the curve and reverse engineer the files
Any standard that is not freely accessible is no standard at all but is instead a proprietary format. I'd say either:
Petition them to open the standard up
Drop your support for it (and tell your customers why you have to)
Fork an earlier open version and create a free version of the standard
Paying for access to a standard sounds like a horrible idea because:
It encourages this behavior
It's likely to just be wasted money because others won't want to pay either, and a standard used by no one is not a standard.
Publish the last version you had access to.
State that you support that version of the standard.

How to handle flagged content in a community? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
In a multilingual community with almost exclusively user-generated content, is there a commonly used way to treat flagged content (profanity, racism, generally illegal material, etc.)?
As there will be a lot of non-English content, the only way to handle the flagging itself is crowdsourcing by the community, with flagged content automatically hidden or deleted at a threshold. But what method could be used to stop abuse, e.g. "I don't like him, let's all report this and get it deleted"?
First of all, it depends on your content.
But in general, I would start by hiding or deleting flagged content at a threshold.
When the community grows I would add crowdsourcing and create a balance from both.
I would also do a general scan on all posts to search for keywords which might lead or contain bad content.
Also, you will need to build in some tolerance, as some posts might contain a reference to illegal material but with good intentions.
For example: "don't take drugs".
If the community builds well, I would mostly rely on it.
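As a rough illustration of the threshold and keyword-scan ideas above, here is a minimal Python sketch. Everything in it is an assumption for the example: the `Post` class, the `HIDE_THRESHOLD` value, and the placeholder `BLOCKLIST` are hypothetical. Counting distinct reporters instead of raw reports blunts the simplest form of abuse (one person reporting repeatedly), and a keyword hit only queues the post for human review instead of hiding it, which leaves room for posts like "don't take drugs".

```python
from dataclasses import dataclass, field

HIDE_THRESHOLD = 3          # hypothetical: distinct reporters needed to auto-hide
BLOCKLIST = {"drugs"}       # hypothetical placeholder keyword list


@dataclass
class Post:
    text: str
    reporters: set = field(default_factory=set)  # IDs of users who flagged the post
    hidden: bool = False
    needs_review: bool = False


def keyword_scan(post: Post, blocklist: set = BLOCKLIST) -> bool:
    """Queue a post for human review if it contains a blocklisted word.
    It is deliberately not hidden, since the word may be used with good intent."""
    words = {w.strip(".,!?\"'").lower() for w in post.text.split()}
    if words & blocklist:
        post.needs_review = True
    return post.needs_review


def flag(post: Post, reporter_id: str, threshold: int = HIDE_THRESHOLD) -> bool:
    """Record a flag; hide the post once enough distinct users have flagged it.
    A hidden post is also queued for review so a human can confirm or undo it."""
    post.reporters.add(reporter_id)
    if len(post.reporters) >= threshold:
        post.hidden = True
        post.needs_review = True
    return post.hidden


spam = Post("buy cheap stuff")
for user in ["alice", "alice", "alice"]:
    flag(spam, user)                 # same reporter three times counts once
hidden_after_repeats = spam.hidden
flag(spam, "bob")
flag(spam, "carol")                  # third distinct reporter triggers the hide
hidden_after_three_users = spam.hidden

advice = Post("Don't take drugs!")
advice_queued = keyword_scan(advice)
```

This does not stop a coordinated group from reaching the threshold, which is why hidden posts still land in the review queue in this sketch.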
Another option you might consider is to allow your users to "hide" other users, i.e. not see the content of hidden users.
This allows people to "remove" other users that they don't feel contribute to the community.
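A per-user hide list can be as simple as a filter applied when rendering a feed. The sketch below assumes posts are plain dictionaries with an `author` key and that each viewer has a set of hidden author names; both are hypothetical shapes chosen for the example.

```python
def visible_posts(posts: list, hidden_authors: set) -> list:
    """Return only the posts whose author the viewer has not hidden."""
    return [post for post in posts if post["author"] not in hidden_authors]


feed = [
    {"author": "alice", "text": "Welcome!"},
    {"author": "troll42", "text": "..."},
    {"author": "bob", "text": "Thanks for the tip."},
]
filtered = visible_posts(feed, hidden_authors={"troll42"})
```

Because the filter is applied per viewer, nothing is deleted and other users still see the hidden author's posts.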
You could also allow users to report bad posts, and allow a human to decide whether or not to hide or delete the post. You would have to have community rules for this to be effective.
