Copy information from external website to my website - web-scraping

Apologies if this is the wrong place to ask and if it is, please do let me know where best to do so.
I want to write a script that will pull data from website B (external site, not owned by myself) and display that data on website A (site owned by myself).
Now, I know how to do this programmatically and so my question is more about the legalities of the approach.
For example, Twitter provides API access so that you can embed tweets or a twitter feed into your page. The sites that I would like to pull data from may or may not have such APIs and so I would have to write a scraper.
Am I allowed to scrape information from websites and display it on my own site? I will of course make it absolutely clear where the information has come from; I do not intend to use any information and claim that it is my own.

I think this is generally frowned upon, as you are basically doing the same as copying a CD, putting your own label on it, and selling it to others (i.e. taking someone else's stuff and pretending it's your own). I suppose it depends on the licence of the website you are scraping. If the website provides an API (like Twitter), then they probably allow copying.

Related

WordPress/WooCommerce plugin for upload/download files client & admin

I'm new here and don't know if I'm doing this right, but I thought I might ask here. Let me tell you about the project real quick.
I'm developing a WordPress and WooCommerce website where the product is a request for one of several electricity contracts (some paid, others not), and the clients need to upload a number of documents for us to prepare said contract. We, the site admins, need to download the client's documents and, once they are greenlit, ready, and confirmed, upload them back again for the user/client to download. We also need these documents to be accessible to the client via their My Account page.
I have struggled so much to find a plugin that does this. I've found tons of file management plugins, but those show the root folder of the server and we don't want that. Can you guys recommend a plugin that does something like this? Maybe one that uses a shortcode? Any help is very much welcome!

Duplicate Wordpress Site

I am writing to see if anyone has any tips on how one might be able to duplicate a WordPress site.
We have branded and designed a research study site, and would like to copy this site entirely and rebrand it for a different study.
Does anyone know what might need to go into this to do so? Having trouble figuring this out!
Best,
Taylor
I guess it depends on what you mean by "rebrand". Just duplicating the site should be a relatively easy job. You will have to download everything from your public_html and also get a backup of your database. Then upload the files from public_html to the new hosting and import the database there.
After that, a few more things come up. First, the domain name: you will need to change it to the new one you want to use. I will not get into details about that, since you can find lots of good tutorials on how to do it with some simple googling. If you need to change any pictures, logos or anything else, you designed the site, so you should know what to change.
Then, if other parts still need changing, for example text fields that mention the brand of the previous research site, I can suggest a tool like wp-cli, which is the only tool that currently comes to mind for such a case. It is a really useful and powerful tool, but it requires SSH access to the hosting.
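As a rough sketch of that last step: wp-cli has a `search-replace` command, and it can be driven from a small script. The brand names and the install path below are made up for illustration, and the helper name is hypothetical. The example only builds a dry-run command so nothing gets written until you have looked at the report.

```python
import shlex


def build_search_replace(old_text, new_text, wp_path, dry_run=True):
    """Build a wp-cli search-replace command as an argument list.

    old_text, new_text and wp_path are placeholders for this sketch.
    """
    command = ["wp", "search-replace", old_text, new_text, f"--path={wp_path}"]
    if dry_run:
        # --dry-run reports what would change without touching the database.
        command.append("--dry-run")
    return command


# Hypothetical brand names and install location.
command = build_search_replace("Study Alpha", "Study Beta", "/var/www/new-study")
print(shlex.join(command))

# To execute it over SSH on the host, something like this should do:
#   import subprocess
#   subprocess.run(command, check=True)
```

Once the dry run looks right, call the helper with `dry_run=False` and run the command for real, ideally after taking a fresh database backup.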
If I come up with something else as well, I will update this.

Protect Website Against Piracy

I have a membership website where I sell video content, but I have found out that users are downloading the content. Although I tried Amazon with CloudFront and a firewall, and have now moved to Vimeo Pro, users are always able to download the content using various extensions for Chrome or Firefox.
Is there a way that the website can detect such extensions and prevent the user from accessing the website? Maybe an overlay with a message would do the trick.
The website is built on WordPress, so any plugin or code would be highly appreciated.
Thanks for your help!
The simple answer is that there is really no effective way to stop people downloading your videos, if you want them to be able to actually view them.
You can authenticate users and control access that way but even this does not stop authenticated users copying and sharing the video.
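To make the access-control part concrete, here is a minimal sketch of signed, expiring video links using only the Python standard library. The secret, the URL layout and the function names are all assumptions for illustration, not something a particular video host provides. As said above, this only limits who can request the file and for how long; it does not stop an authorized user from saving it.

```python
import hashlib
import hmac
import time

# Hypothetical server-side secret; in practice load it from configuration.
SECRET_KEY = b"replace-with-a-long-random-secret"


def _signature(path, user_id, expires_at, key):
    """HMAC-SHA256 over the fields that must not be tampered with."""
    message = f"{path}|{user_id}|{expires_at}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign_video_url(path, user_id, expires_at, key=SECRET_KEY):
    """Return a link that is only valid for this user until expires_at."""
    sig = _signature(path, user_id, expires_at, key)
    return f"{path}?user={user_id}&expires={expires_at}&sig={sig}"


def is_request_allowed(path, user_id, expires_at, signature,
                       now=None, key=SECRET_KEY):
    """Check the expiry time and the signature of an incoming request."""
    now = time.time() if now is None else now
    if now > expires_at:
        return False
    expected = _signature(path, user_id, expires_at, key)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature)
```

A logged-in member would get a link from `sign_video_url(...)` with a short expiry, and the server would call `is_request_allowed(...)` before serving the file.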
The usual approach is to accept it will be downloaded and use an encryption mechanism along with a key exchange mechanism which means that only people with the proper rights can see it - this is what the common DRM systems do.
Even with this, your protection level will depend on what you need to protect. If the video is an entertainment video and you just don't want people viewing it for free, then this is likely a good enough solution for you. If your video contains sensitive information, e.g. company data, that you don't want anyone to know at all, then even this won't stop someone simply pointing a camera at the screen and getting a copy (albeit a low-quality one).

best implementation for user group display differences

I am developing a site in WordPress that offers functionality and content to companies.
Each company will have hundreds of users. All users of all companies get the same content.
However, the main header changes (it needs to include the company's own logo). Each company will also have its own sub-domain, at least for the login page, preferably for all pages.
The content will change regularly, so I would prefer having only one copy of that.
So the requirements are:
Same content for all users at same relative url
Different header based on group of current user
Different base url per group
Forwarding of the user to the correct base url if they log in under a wrong one
What is the best way to implement this?
Straight WP with a sub-theme that deals with the header. Mod-rewrite to deal with the urls
WP-MultiSite (how would the same content under different base urls work here?)
Several copies of the site and somehow sync the content (how would I do the sync?)
Use a different CMS
Which of these is the most future-proof way to go, assuming I might have to deal with thousands of companies, each with hundreds to thousands of users?
Also, if there is an easier way because I missed something in my research, like an existing plugin, that would be great too.
Thanks for your help.
I would say that such a thing depends on a lot more than these requirements. For instance, how granular would you like to have your user management? And how much are the users allowed to do on the different groups? Is unique information allowed on the different domains, or is all the information shared?
Based on the information you are providing, I think you would be best off using the multisite version of WordPress. You could then use a broadcast plugin to share the information on all sites, and create a template site from which to create new sites (using the NS Cloner plugin, for instance).
There are of course some problems with this approach, for instance search engine optimisation. You will get a lot of duplicate content, which will hurt the Google ranking of the individual sites.
It would also be possible to do this using a single site install, but then you'll run into problems with the multiple domain structure. It can be done, but the available caching plugins will not support it (at least not that I know of), whereas a multisite environment is supported out of the box. It is also more difficult to keep users from posting on different domains, as they are all using a single install. A multisite environment also has a shared user base, but users can be added to or removed from the different sites at will.
Using a multisite environment would also allow you greater flexibility template-wise.
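Whichever install type you pick, the routing rules from the question (header per group, forwarding to the correct base url) come down to a small amount of logic. Here is a minimal sketch of it in Python rather than as a WordPress plugin; the base domain, the company slugs and the logo paths are all invented for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical base domain and company table, for illustration only.
BASE_DOMAIN = "example.com"
COMPANIES = {
    "acme": {"logo": "/logos/acme.png"},
    "globex": {"logo": "/logos/globex.png"},
}


def company_from_host(host):
    """Map 'acme.example.com' (optionally with a port) to 'acme'."""
    host = host.split(":")[0].lower()
    suffix = "." + BASE_DOMAIN
    if not host.endswith(suffix):
        return None
    slug = host[: -len(suffix)]
    return slug if slug in COMPANIES else None


def header_logo(host):
    """Pick the header logo for the sub-domain being visited."""
    slug = company_from_host(host)
    return COMPANIES[slug]["logo"] if slug else "/logos/default.png"


def redirect_target(url, user_company):
    """Return the same relative URL on the user's own sub-domain,
    or None if the user is already on the right one."""
    parts = urlsplit(url)
    if company_from_host(parts.netloc) == user_company:
        return None
    new_host = f"{user_company}.{BASE_DOMAIN}"
    return urlunsplit(
        (parts.scheme, new_host, parts.path, parts.query, parts.fragment)
    )
```

Because only the host changes, the content keeps the same relative url for every company, which matches the first requirement in the question.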

Architecture ideas to allow customers to build their own site, based off external site's data?

I'm not entirely sure how to properly ask this, so please bear with me.
I have an idea for a site I would like to build, which would basically be a site for members to create some data and have it housed in my database. I would like to offer a value-add to the site which would allow people to spin off their own website via my own "website builder" tool (probably some sort of CMS). Their website would be able to communicate with my master database to display their data.
Getting down to the crux of the topic, I'm looking for architectural advice/ideas/etc. regarding what services I could use to do this. I'm not looking for a 100% automated solution, but something along these lines (which may not be completely correct, I admit):
Customer puts in an order to create their own site, using my tools.
I set up a separate domain for them, roll out the CMS foundation to the site, and the customer has full editing control of the CMS to design it however they would like.
The CMS would have some customizations so that it includes functionality to call APIs located on the master site, which would return the relevant data.
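The third step could look roughly like the sketch below: a thin client that the customer site uses to ask the master site for data. The endpoint, the `site` query parameter and the bearer-token scheme are all assumptions made up for this example; the real API would be whatever the master site defines.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical base URL of the master site's API.
MASTER_API = "https://master.example.com/api/v1"


def build_request(site_id, api_key, resource, params=None):
    """Build the HTTP request a customer site would send to the master site."""
    query = urlencode(dict(params or {}, site=site_id))
    return Request(
        f"{MASTER_API}/{resource}?{query}",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Accept": "application/json",
        },
    )


def fetch_member_data(site_id, api_key, resource, params=None, opener=urlopen):
    """Call the master API and decode the JSON response.

    `opener` is injectable so the function can be exercised without a network.
    """
    request = build_request(site_id, api_key, resource, params)
    with opener(request, timeout=10) as response:
        return json.load(response)
```

Keeping this client in its own module (or plugin) also keeps it separate from the CMS core files.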
In the research I have done on SO, I've seen a lot of mentions of Umbraco which honestly looks like a good start. I'm just worried that when I go to upgrade a version, I have to deal with overwriting my custom API functionality. I'm guessing this is the nature of the beast, and requires me to accept/plan for it.
Does anyone have any thoughts about this? Some high-level starting points? Thanks!
I've been thinking about this same issue for my customers.
It is not hard to automatically roll out a stock CMS such as WordPress or Joomla. This sort of thing is done all the time by the "1 click installers" that DreamHost and others have.
Including custom widgets or plugins for the CMS that can connect to your main app is also not hard.
For DNS, you can use Amazon Route 53 or another DNS service that offers a good API at the DNS management level.
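For Route 53, adding a customer's record comes down to submitting a change batch. The sketch below only builds that batch as a plain dictionary; the domain names and the helper name are hypothetical, and the commented call at the end assumes you use the boto3 library with credentials and a hosted zone already set up.

```python
def cname_upsert_batch(customer_domain, target_host, ttl=300):
    """Build a Route 53 change batch that creates or updates one CNAME record.

    customer_domain and target_host are placeholders for this sketch.
    """
    return {
        "Comment": f"Point {customer_domain} at the shared CMS host",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or update it if it exists
                "ResourceRecordSet": {
                    "Name": customer_domain,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": target_host}],
                },
            }
        ],
    }


# Hypothetical names; note that a CNAME cannot sit at the zone apex.
batch = cname_upsert_batch("www.customer-site.example", "cms.example.com")

# With boto3 installed and configured, the batch would be sent like this:
#   import boto3
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="<your hosted zone id>", ChangeBatch=batch)
```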
I suggest that you focus on using a CMS that is very popular (e.g. WordPress or Joomla) rather than something less well known such as Umbraco. Using a more popular system will drastically reduce your training costs. Remember that if you supply the CMS to your customers, then they'll also expect you to supply the support for it...
