Can anyone help me interpret robots.txt from YouTube Studio? - web-scraping

What I want to accomplish is to know the exact rules on YouTube (YouTube Studio in particular) for web scraping and for using bots to automate data gathering for some channels (around 20). In their terms and conditions I read the following:
"Access the Service using automated means (such as robots, botnets, or scrapers), except (a) when using a public search engine in accordance with the YouTube robots.txt file, (b) with YouTube's prior written consent obtained or (c) as permitted by applicable law."
So I went to:
https://studio.youtube.com/robots.txt
When I read the file it seems to me that YouTube Studio allows crawlers with a crawl delay of 0.5 seconds. So that means I can write a crawler that gathers information from YouTube Studio, right? Since there are no URLs disallowed. Maybe it's an obvious question since I gave the answer myself, but I just wanted to check what your opinions were, because I don't have a lot of experience with crawlers and it is very important that YouTube doesn't ban my IP, or worse. Can you take a look and tell me what you think?
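For context, this is roughly the kind of thing I had in mind (a rough sketch; the URLs are placeholders and the Crawl-delay lookup is hand-rolled rather than a proper robots.txt parser):

```typescript
// Rough sketch of a polite crawler that honors the Crawl-delay directive in
// robots.txt. Runs on Node 18+ (global fetch). The URLs below are placeholders.

const ROBOTS_URL = "https://studio.youtube.com/robots.txt";

// Pull the Crawl-delay value (in seconds) out of robots.txt, if present.
async function getCrawlDelayMs(robotsUrl: string): Promise<number> {
  const text = await (await fetch(robotsUrl)).text();
  const match = text.match(/^\s*crawl-delay:\s*([\d.]+)/im);
  return match ? parseFloat(match[1]) * 1000 : 1000; // default: 1 second
}

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function crawl(urls: string[]): Promise<void> {
  const delayMs = await getCrawlDelayMs(ROBOTS_URL);
  for (const url of urls) {
    const response = await fetch(url);
    console.log(url, response.status);
    await sleep(delayMs); // wait the advertised delay between requests
  }
}

// Placeholder list; in reality this would be the ~20 channel pages.
crawl(["https://example.com/channel-1", "https://example.com/channel-2"]).catch(
  console.error,
);
```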

Related

First Byte Time scores F

I recently purchased a new theme and installed WordPress on my GoDaddy hosting account for my portfolio. I am still working on it, but as of right now I sometimes get page load times of 10-20 seconds, and other times 2 seconds (usually after the page has been cached). I have done all that I believe I can (without breaking the site) to optimize performance (reducing image sizes, using a free CDN, using W3 Total Cache, etc.).
It seems that my main issue is this 'TTFB' wait time I get whenever I go to a new page that hasn't been cached yet. How can I fix this? Is it the theme's fault? Do I NEED to switch hosting providers? I really don't want to go through the hassle of doing that and paying so much more just to have less than optimal results. I am new to this.
My testing site:
http://test.ninamariephotography.com/
See my WebPageTest results here:
http://www.webpagetest.org/result/161111_9W_WF0/
Thank you in advance to anyone for your help:)
Time To First Byte can depend on geography, but I don't think that's your problem here. I reran your test and got a B.
I think the issue is your hosting is a tiny shared instance, and you're serving static files. Here are some ideas to speed things up.
Serve images using an image-serving service. Check out imgix, which is $3/month. Serving images off an external domain could help in unexpected ways, depending on the HTTP protocol version, the browser version, and how connections are shared.
Try lossy compression. You lose some image detail, but you also lose some file size. Check out compressor.io for an easy tool.
Concatenate and minify scripts. You have a number of little JavaScript files that load individually. Consider joining them together and minifying them. I don't know the WordPress tool chain; perhaps there's a setting for this?
If none of that helps, you should experiment with a different hosting choice.
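If you want to spot-check the first-byte time yourself rather than waiting on WebPageTest runs, a small script is enough. This is just a sketch: fetch() resolves once the response headers arrive, so the number includes DNS/TCP/TLS setup and is an upper bound on TTFB.

```typescript
// Quick check of time-to-headers (Node 18+). fetch() resolves when the
// response headers have arrived, so the elapsed time approximates
// connection setup plus TTFB.

async function timeToHeaders(url: string): Promise<number> {
  const start = performance.now();
  const response = await fetch(url);
  const elapsed = performance.now() - start;
  // Drain the body so the connection is reused cleanly on repeat runs.
  await response.arrayBuffer();
  return elapsed;
}

async function main(): Promise<void> {
  const url = "http://test.ninamariephotography.com/";
  for (let run = 1; run <= 3; run++) {
    const ms = await timeToHeaders(url);
    console.log(`run ${run}: ${ms.toFixed(0)} ms to first headers`);
  }
}

main().catch(console.error);
```

Running it a few times in a row makes the cold-cache versus warm-cache difference obvious.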

I've just bought PAW and registered with their site. How do I use PawPrints?

I've just bought Paw and, while exploring the app, found a mention of pawprints, which appear to be some sort of saved snippets or requests or something. I registered with the website and it tells me I have no saved pawprints. I've searched all over the help files and documentation and can't actually see how to create a pawprint, or even a clear definition of what a pawprint actually is.
So my questions are, what are pawprints and how do I use them?
Okay, thanks Micha.
From the blog post (which Google couldn't find when I searched):
Last May, we launched Pawprint, a quick way to share the requests you tested in Paw. The idea of getting a short link that you can paste anywhere, sharing what you just see on screen, was very appealing and something we wanted to do almost since the beginning of Paw.
That's handy to report bugs to the API provider (often those backend guys sitting on the other side of the room), or to show to the consumers (often the client folks playing with smartphones and web browsers) how your PATCH endpoint works.
In Paw, just hit ⌘/, and a permalink will be copied. Paste it anywhere from Slack and GitHub tickets, to StackOverflow answers.
You'll also get client code generated in many languages, plus cURL or HTTPie command lines, to run the same request from code.
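To give a concrete idea of what running the same request from code looks like, a PATCH call like the one mentioned above might be written out as below. This is my own illustration, not Paw's actual generated output; the endpoint, token, and payload are made up:

```typescript
// Illustration only: the endpoint, token, and payload are hypothetical.
async function patchUser(): Promise<void> {
  const response = await fetch("https://api.example.com/users/42", {
    method: "PATCH",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer <token>", // placeholder credential
    },
    body: JSON.stringify({ displayName: "New name" }),
  });
  console.log(response.status, await response.json());
}

patchUser().catch(console.error);
```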
Apparently the Paw website is being updated now to make this clearer.

How to restrict external access to a specific sub-URL in IIS7

I've currently got a reasonably large site up that I've been asked to make changes to.
Currently, to log in to this site you need to go to:
www.example.com/folder/loginpage.html
This site is only accessible internally at this time and it is unlikely to ever be accessible externally.
We would, however, like to be able to direct external users to a subdirectory on the site (a 'survey' form), which is located at
www.example.com/folder/subfolder/survey.html
This survey writes its results back to the main application and I believe they are tightly integrated.
We initially tried the idea of using an additional IIS7 box as a reverse proxy; however, it is quite confusing to me. I'm not very familiar with IIS/ARR and the other features required (I'm mostly familiar with networking). I did try to follow a number of tutorials but didn't get very far. I'd like to avoid it if possible.
How can I, using IIS7 (the site is in ASP.NET), restrict external users from accessing anything other than the survey pages (and a few necessary included files)?
Is it possible to make www.example.com/folder/subfolder/survey.html a 'website' in itself so that I can publish a URL like survey.example.com externally?
I've come across other examples where access is restricted for specific pages but the root of the site is still accessible, i.e. www.eg.com/ is allowed but www.eg.com/admin.aspx is denied. I'd like the reverse in effect and, if possible, to hide the 'true' URL.
Hope someone can help! If using a reverse proxy is possible I'm happy to do it, but I'd need detailed instructions.
Thanks for reading,
Much appreciated!
Edit: Sorry all, I'm new to Stack Overflow; I've just realised that there are several other sub-communities. Would it be more appropriate to ask this in a different community? If so, which one?
Thanks!

ASP.NET Browser Debug (support information) page

So one of the many, many tasks I'm faced with daily as a developer is trying to get our support department to collect as much information about the end user's environment as possible.
Browser version, current cookies, plugins, etc. It would be handy to point people to a specific page on our site and say "copy and paste this to support".
In the past I've always written these by hand, and used third party tools (such as BrowserHawk) to get as much info as possible.
How does everyone else deal with getting this information from end users? Is there a nice package I'm unaware of that gives a detailed dump of a user's environment without having to get the user to run an app?
Just to clarify, I'm not looking for ELMAH-style error reporting (which is very helpful as well!); this is mainly for the client-side stuff.
Some months ago I saw that the Google Ads page has a nice report button. What this button does is use JavaScript to capture the page as it is and send you the report, with all the details and an image of the actual page.
So I found this library, http://html2canvas.hertzen.com/, that does the same thing.
And here is an example page with this kind of feedback:
http://hertzen.com/experiments/jsfeedback/
So I added this feedback option, and I ask the users to point out the issue and send the feedback, so for each page I have a very nice image of what is not going well.
The next thing is that I log and check all errors, and I fix them soon after.
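In case it helps, here is roughly how the feedback capture can be wired up. This is a minimal sketch, assuming html2canvas is installed from npm; the /support/feedback endpoint and the button id are just placeholders:

```typescript
// Minimal feedback widget: screenshot the page with html2canvas and POST it
// along with basic environment details. "/support/feedback" is a made-up
// endpoint; point it at whatever your support system accepts.
import html2canvas from "html2canvas";

async function sendFeedback(comment: string): Promise<void> {
  // Render the current page into a canvas and grab it as a PNG data URL.
  const canvas = await html2canvas(document.body);
  const screenshot = canvas.toDataURL("image/png");

  const report = {
    comment,
    screenshot,
    url: location.href,
    userAgent: navigator.userAgent,
    cookiesEnabled: navigator.cookieEnabled,
    viewport: { width: window.innerWidth, height: window.innerHeight },
  };

  await fetch("/support/feedback", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(report),
  });
}

// Example: hook it up to a "Report a problem" button (id is a placeholder).
document.getElementById("report-button")?.addEventListener("click", () => {
  sendFeedback("User-reported issue").catch(console.error);
});
```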

Do you know any tools to remove badware/malware from my website, which Google blocks?

I have a website which Google blocked because it had badware. I removed the viruses from the server and it's completely clean now. The problem is that the virus made changes in the HTML, JS, and ASP files on the site and added hidden iframes and strange scripts. I removed everything I found in the files, but the website is too big, so does anyone have a tool I can use to remove all the effects of this badware?
Google gave me this site as a reference for removing the badware from my site:
http://www.stopbadware.org/home/security
Thanks,
Wipe everything from the server, check all the files, and re-upload them if they're clean. That's the only thing you can do.
Upload the latest version of the site from your source control DB. If you don't follow source control, it's high time you started doing it. ;-)
Find a good search and replace tool. If you are using Dreamweaver then you can do a site-wide search. The same is applicable to Visual InterDev as well.
+1 William's comment. You can do a simple grep for characteristic strings your particular infection has left behind, such as “<iframe” or the start of the encoded scripts, but you can't be sure to find all the changes that have happened without a manual inspection. This is what having a clean copy on your local computer is for.
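To make the grep idea concrete, something like the sketch below will walk a local copy of the site and flag files containing suspicious markers (the extensions and marker strings are examples only; adjust them to whatever your particular infection inserted):

```typescript
// Walk a local copy of the site and report files containing suspicious
// markers. The markers and extensions are examples only; tailor them to
// the strings your particular infection inserted.
import { readdirSync, readFileSync, statSync } from "fs";
import { join, extname } from "path";

const MARKERS = ["<iframe", "eval(unescape(", "document.write(unescape("];
const EXTENSIONS = new Set([".html", ".htm", ".js", ".asp", ".aspx"]);

function scan(dir: string): void {
  for (const name of readdirSync(dir)) {
    const path = join(dir, name);
    if (statSync(path).isDirectory()) {
      scan(path); // recurse into subfolders
    } else if (EXTENSIONS.has(extname(path).toLowerCase())) {
      const content = readFileSync(path, "utf8");
      for (const marker of MARKERS) {
        if (content.includes(marker)) {
          console.log(`${path}: contains "${marker}"`);
        }
      }
    }
  }
}

scan("./site-backup"); // path to your local copy of the site
```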
I removed the viruses from the server
Really? Are you clean of rootkits? How can you be sure? After an infection, the only sure-fire way to recover a clean server is to reinstall everything on it from the operating system upwards.
Have you discovered and fixed the method the intruders used to get in? If not, you can be sure another of the Russian malware gangs' automated exploits will be back soon enough.
Try soswebscan.
Scan your website free of cost with soswebscan.
For more details, visit the soswebscan website: http://soswebscan.jobandproject.com
