How to create a script for downloading random YouTube videos?

I need a script that:
creates a folder named "videos" in the current path
continuously checks whether randomly generated YouTube URLs (built with pwgen) point to valid videos
when a valid URL is found, launches a parallel process to download the YouTube or Vimeo video
the videos are converted to *.mov and stored in the videos folder
the file names are numbered from 1 upwards
when the video download finishes, the parallel process stops
when the script stops, it deletes the videos folder
the purpose of this script is to:
create an interactive installation with openFrameworks, or a similar tool
I want to use:
youtube-dl, ffmpeg and pwgen
I will be using:
macOS High Sierra
everything will be open source and published on GitHub
pwgen will have to take as arguments:
the number of characters needed to form a URL and the number of hashes (random strings) to generate (see the example right after this list)
youtube-dl and ffmpeg will start from something like:
youtube-dl -t mov URL
that's all I know for now
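For the pwgen part, the call the question describes would look roughly like the line below. One caveat: pwgen's default character set is alphanumeric only, so it cannot produce the '-' and '_' characters that appear in real YouTube video IDs, which is why the snippet that follows draws its characters from /dev/urandom instead.

# ten fully random 11-character strings (11 is the length of a YouTube video ID)
pwgen -s 11 10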

while true; do
    # 11 URL-safe characters, the length of a YouTube video ID
    video_id=$(LC_CTYPE=C tr -dc 'A-Za-z0-9_-' < /dev/urandom | head -c 11)
    # HEAD request: keep only the HTTP status code
    if [[ $(curl -s --head -o /dev/null -w '%{http_code}' "https://www.youtube.com/watch?v=$video_id") = 200 ]]; then
        youtube-dl "https://www.youtube.com/watch?v=$video_id"
    fi
done
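Building on that check, here is a minimal sketch of the whole workflow described in the question: create the videos folder, probe random IDs, download each hit in a background process with file names numbered from 1 upwards, and delete the folder when the script stops. The oEmbed validity check and the ffmpeg conversion step are my own assumptions rather than something fixed by the question, and error handling is deliberately thin.

#!/usr/bin/env bash
# Sketch only: assumes youtube-dl, ffmpeg and curl are installed.
set -u

mkdir -p videos
trap 'rm -rf videos' EXIT              # delete the folder when the script stops

n=1
while true; do
    # 11 URL-safe characters, the length of a YouTube video ID
    video_id=$(LC_CTYPE=C tr -dc 'A-Za-z0-9_-' < /dev/urandom | head -c 11)
    url="https://www.youtube.com/watch?v=$video_id"

    # the oEmbed endpoint answers non-200 for nonexistent videos,
    # which is a more reliable check than the watch page itself (assumption)
    status=$(curl -s -o /dev/null -w '%{http_code}' \
        "https://www.youtube.com/oembed?url=$url&format=json")

    if [ "$status" = 200 ]; then
        # one background process per hit; it ends when the download finishes
        (
            youtube-dl -o "videos/$n.%(ext)s" "$url" &&
            src=$(ls "videos/$n".* 2>/dev/null | head -n 1) &&
            ffmpeg -loglevel error -i "$src" "videos/$n.mov" &&
            rm -f "$src"
        ) &
        n=$((n + 1))
    fi
done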

Related

loop ffmpeg command through sets of subfolders and process one folder at a time

I have this script in a .bat file that processes every .mp4 file and creates a new one with the same name but with -no-Logo appended, for each video file.
The problem is that I have 150 folders, each containing video files that I want to run this script on. Is there a way to run it on each folder one by one, doing the same task?
script
for %%a in ("*.mp4") do ffmpeg -i "%%a" -filter:v "crop=1280:700:0:0" -c:a copy "%%~na-old.mp4"
for %%a in ("*.mp4") do ffmpeg -i "%%~na-old.mp4" -vf scale=1280:720,setsar=1:1 "%%~na-no-Logo.mp4"
for %%a in ("*.mp4") do del "%%~na-old.mp4"
I don't want it to run 150 times at once, as that would kill my PC; I want it to go one folder at a time.
If I understood your question correctly, I would use two nested "for" loops. However, it is not clear whether you are using Linux bash or a Windows .bat file, because you selected the bash tag but mention using .bat.
The loop below processes one MP4 at a time anyway: do one folder, then the next, until all are done.
for dir in */; do
    printf '\n\nFolder processed: %s\n\n' "$dir"
    count=0
    for mp4file in "$dir"*.mp4; do
        # place your ffmpeg command here; as a placeholder this one just
        # re-encodes each file to a "-no-Logo" copy
        ffmpeg -y -loglevel error -v quiet -stats \
            -i "$mp4file" "${mp4file%.mp4}-no-Logo.mp4"
        count=$((count + 1))
    done
    printf '\n MP4 processed count = %s\n' "$count"
done
You can continue from there.
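For completeness, one way to "continue from there" is to drop the three commands from the .bat file into that inner loop. The sketch below assumes every folder is a direct subdirectory of wherever the script is run from; the crop and scale values are the ones from the question.

#!/usr/bin/env bash
for dir in */; do
    printf '\n\nFolder processed: %s\n\n' "$dir"
    count=0
    for mp4file in "$dir"*.mp4; do
        [ -e "$mp4file" ] || continue                                  # folder with no .mp4
        case $mp4file in *-old.mp4|*-no-Logo.mp4) continue ;; esac     # skip files we created
        base="${mp4file%.mp4}"
        ffmpeg -loglevel error -i "$mp4file" \
            -filter:v "crop=1280:700:0:0" -c:a copy "${base}-old.mp4"
        ffmpeg -loglevel error -i "${base}-old.mp4" \
            -vf "scale=1280:720,setsar=1:1" "${base}-no-Logo.mp4"
        rm -f "${base}-old.mp4"
        count=$((count + 1))
    done
    printf '\n MP4 processed count = %s\n' "$count"
done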

Dynamic exclusion list in lsyncd

Our cloud platform is powered by OpenNebula, so we have two instances of the frontend in "cold swap". We use the lsyncd daemon to keep the instances' datastores synced, but there is a catch: we don't want to sync VM images that have the extension .bak, because another script moves all the .bak files to other storage on a schedule. The sync script logic is: find all the .bak files in /var/lib/one/datastores/, then create exclude.lst, then start lsyncd. That seems OK until we take a look at the datastores:
oneadmin@nola:~/cluster$ dir /var/lib/one/datastores/1/
006e099c57061d87d4b8f78ec7199221
008a10fa0764c9ac8d6fb9206c9b69bd
069299977f2fea243a837efed271182f
0a73a9adf74d92b4f175abcb578cabac
0b1cc002e370e1acd880cf781df0a6fb
0b470b182ac6d554774a3615ce87e292
0c0d98d1e0aabc23ef548ddb564c578d
0c3fad9c92a8efc7e13a73d8ae85caa3
..and so on.
We solved it with this monstrous function:
function create_exclude {
    oneimage list -x | \
        xmlstarlet sel -t -m "IMAGE_POOL/IMAGE" -v "ID" -o ";" -v "NAME" -o ";" -v "SOURCE" -o ":" | \
        sed s/:/'\n'/g | \
        awk -F";" '/.bak;\/var\/lib/ {print $3}' | \
        cut -d / -f8 > /var/lib/one/cluster/exclude.lst
}
The result is a list of the VM IDs that have .bak images inside, so we can exclude the whole VM folder from syncing. That's not quite what we wanted, since the original image then stays unsynced, but it can be worked around by restarting the lsyncd script at the moment the other script moves the .bak files to the other storage.
Now we get to the topic of the question.
It works until a new .bak is created. There is no way to add a new line to exclude.lst "on the go"; we have to stop lsyncd and restart the script that re-creates exclude.lst. And there is also no way to catch the moment a new .bak is created, short of yet another script polling for it periodically.
I believe a less complicated solution exists. It depends on OpenNebula, of course, particularly on the way the /datastores/ folder stores VMs.
Glad to know you are using OpenNebula to run your cloud :) Have you tried to use our Community Forum for support? I'm sure the rest of the Community will be happy to give a hand!
Cheers!
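One way to handle the "on the go" part, sketched here without any OpenNebula specifics, is to let inotifywait watch the datastores and rebuild the exclude list whenever a new .bak appears. This assumes the inotify-tools package is installed, that create_exclude is the function shown above (the cluster.sh path is hypothetical), and that lsyncd runs as a service you can restart.

#!/usr/bin/env bash
# Rebuild exclude.lst and bounce lsyncd whenever a new .bak shows up.
source /var/lib/one/cluster/cluster.sh        # provides create_exclude (hypothetical file)

inotifywait -m -r -e create -e moved_to --format '%f' /var/lib/one/datastores/ |
while read -r name; do
    case $name in
        *.bak)
            create_exclude
            systemctl restart lsyncd          # or however lsyncd is managed in your setup
            ;;
    esac
done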

wget, recursively download all jpegs works only on website homepage

I'm using wget to download all jpegs from a website.
I searched a lot and this should be the way:
wget -r -nd -A jpg "http://www.hotelninfea.com"
This should recursively (-r) download the jpeg files (-A jpg) and store them all in a single directory, without recreating the website's directory tree (-nd).
Running this command downloads only the jpegs from the homepage of the website, not the jpegs from the rest of the site.
I know that a jpeg file could have different extensions (jpg, jpeg) and so on, but that is not the case here; there also aren't any robots.txt restrictions in play.
If I remove the filter from the previous command, it works as expected:
wget -r -nd "http://www.hotelninfea.com"
This is happening on Lubuntu 16.04 64bit, wget 1.17.1
Is this a bug, or am I misunderstanding something?
I suspect that this is happening because the main page you mention contains links to the other pages in the form http://.../something.php, i.e., with an explicit extension. The option -A jpg then has the "side effect" of removing those pages from the traversal process.
Perhaps a slightly dirty workaround in this particular case would be something like this:
wget -r -nd -A jpg,jpeg,php "http://www.hotelninfea.com" && rm -f *.php
i.e., to download only the necessary extra pages and then delete them if wget successfully terminates.
ewcz's answer pointed me in the right direction: the --accept acclist parameter has a dual role, it defines both the rules for saving files and the rules for following links.
Reading the manual more deeply, I found this:
If ‘--adjust-extension’ was specified, the local filename might have ‘.html’ appended to it. If Wget is invoked with ‘-E -A.php’, a filename such as ‘index.php’ will be accepted, but upon download will be named ‘index.php.html’, which no longer matches, and so the file will be deleted.
So you can do this
wget -r -nd -E -A jpg,php,asp "http://www.hotelninfea.com"
But of course a webmaster could be using custom extensions, so I think the most robust solution would be a bash script, something like this:
WEBSITE="http://www.hotelninfea.com"
DEST_DIR="."
image_urls=$(wget -nd --spider -r "$WEBSITE" 2>&1 | grep '^--' | awk '{ print $3 }' | grep -i '\.\(jpeg\|jpg\)')
for image_url in $image_urls; do
    DESTFILE="$DEST_DIR/$RANDOM.jpg"
    wget "$image_url" -O "$DESTFILE"
done
With --spider, wget does not download the pages, it just checks that they are there.
$RANDOM expands to a random number (a bash built-in), used here to give each file a unique name.
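If you would rather keep each image's original file name instead of a random one, a small variation on the same loop (reusing the $image_urls and $DEST_DIR variables above) could be:

for image_url in $image_urls; do
    wget -nc -P "$DEST_DIR" "$image_url"      # -nc: don't overwrite, -P: target directory
done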

How to run my script using wget?

I have a URL in my custom module which runs a long script. If I call the URL via wget, it just downloads the page content; it doesn't run the script. How do I do it?
I would have thought that even though it downloaded the page it would still run the script.
To run without downloading the file use:
wget -O - -q -t 1 http://example.com/path/to/file.php
From memory:
-O followed by a hyphen redirects the output to standard output, so it's not saved to a file.
-q is for quiet
-t is the number of attempts.
You can use man wget to look up any further options.
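If you only want to trigger the script and don't care about what the page prints, a variation (same example URL as above) is to throw the output away and just check wget's exit status:

if wget -q -t 1 -O /dev/null "http://example.com/path/to/file.php"; then
    echo "script triggered"
else
    echo "request failed" >&2
fi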

Get list of files via http server using cli (zsh/bash)

Greetings to everyone,
I'm on OSX. I use the terminal a lot, a habit from my old Linux days that I never got past. I wanted to download the files listed on this HTTP server: http://files.ubuntu-gr.org/ubuntistas/pdfs/
I selected them all with the mouse, put them in a txt file and then gave the following command in the terminal:
for i in `cat ../newfile`; do wget http://files.ubuntu-gr.org/ubuntistas/pdfs/$i;done
I guess it's pretty self-explanatory.
I was wondering if there is any easier, better, cooler way to download these "linked" pdf files using wget or curl.
Regards
You can do this with one line of wget as follows:
wget -r -nd -A pdf -I /ubuntistas/pdfs/ http://files.ubuntu-gr.org/ubuntistas/pdfs/
Here's what each parameter means:
-r makes wget recursively follow links
-nd avoids creating directories so all files are stored in the current directory
-A restricts the files saved by type
-I restricts by directory (this one is important if you don't want to download the whole internet ;)
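If you prefer to keep the text-file approach from the original loop (../newfile holding one file name per line), the same list can also be fed straight to wget, for example:

# prepend the directory URL to each name, then let wget read the URLs from stdin
sed 's|^|http://files.ubuntu-gr.org/ubuntistas/pdfs/|' ../newfile | wget -i -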
