Randomly extract video frames from multiple files - r

Basically I have a folder with hundreds of video files (*.avi), each roughly an hour long. What I would like to achieve is a piece of code that could go through each one of those videos, randomly select two or three frames from each file, and then either stitch them back together or, alternatively, save the frames to a folder as JPEGs.
Initially I thought I could do this using R, but I quickly realised I would need something else, possibly working together with R.
Is it possible to call FFMPEG from R to do the task above?
I've trawled the internet looking for things that could help me get started, but most of what I've found is too specific and not really applicable to what I need to do.
Could anyone please help me out or simply point me in the right direction?
Many thanks

I had a related question here recently, and found it was more straightforward to do this in bash, if you're using a Unix system.
I might get downvoted to oblivion for posting this answer here as it's not related to R, but I hope it helps. Something like this worked for me:
#!/bin/bash
for i in *.avi
do
    # count the total frames and the frame rate by decoding to the null muxer
    TOTAL_FRAMES=$(ffmpeg -i "$i" -vcodec copy -acodec copy -f null /dev/null 2>&1 | grep frame | cut -d ' ' -f 1 | sed s/frame=//)
    FPS=$(ffmpeg -i "$i" -vcodec copy -acodec copy -f null /dev/null 2>&1 | grep fps | cut -d ' ' -f 2 | sed s/fps=//)
    for j in {1..3}
    do
        # note: $RANDOM tops out at 32767, so frames beyond that can never be picked
        RANDOM_FRAME=$((RANDOM % TOTAL_FRAMES))
        TIME=$((RANDOM_FRAME / FPS))
        ffmpeg -ss "$TIME" -i "$i" -frames:v 1 "${i}_${j}.jpg"
    done
done
Basically, for each .avi in the directory, the number of frames and the FPS are calculated. Then three random frames are extracted as .jpgs, using bash's $RANDOM variable and feeding the chosen frame to ffmpeg as a seek time in seconds (RANDOM_FRAME divided by FPS).
You could always do these calculations from inside R with system() calls if you're not familiar with bash lingo.

Try this; it worked for me with many formats.
Written in bash, of course. I randomise the seconds, minutes and hours independently, each constrained to the range allowed by the video's duration, and combine them into a proper timestamp. All you need is a random timestamp within the video's duration range.
#!/bin/bash
## This script will create timestamps.
NO_OF_FRAMES=10
# pull the HH:MM:SS.fraction DURATION tag out of ffprobe's stream info
DURATION=$(ffprobe -select_streams v -show_streams input.avi 2>/dev/null | grep DURATION | sed -e 's/TAG:DURATION=//')
HOUR=$(echo $DURATION | cut -d ':' -f 1)
MIN=$(echo $DURATION | cut -d ':' -f 2)
SEC=$(echo $DURATION | cut -d ':' -f 3 | cut -d '.' -f 1)
# randomise each component within the duration's own bounds
for s in $(shuf -i 0-$SEC -n $NO_OF_FRAMES)
do
    m=$(shuf -i 0-$MIN -n 1)
    h=$(shuf -i 0-$HOUR -n 1)
    echo "$h:$m:$s"
done
Now you can just pass it through another loop in another script to grab your frames, or just make it a function 🙃 (a sketch of that follows the loop below).
#!/bin/bash
times=$(bash time_stamps.sh | sort)
for TIME in $times
do
    ffmpeg -ss "$TIME" -i input.avi -frames:v 1 "sample_$TIME.jpg"
done
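If you'd rather fold both steps into one function, here is a minimal sketch of that idea (my variant, not the scripts above: it takes the duration in seconds from ffprobe's format section and randomises over total seconds, so no timestamp can land past the end of the file; the file name and frame count are placeholders):
#!/bin/bash
grab_random_frames () {
    local video=$1 n=${2:-3}
    local total
    # total duration in whole seconds, straight from the container metadata
    total=$(ffprobe -v error -show_entries format=duration \
            -of default=noprint_wrappers=1:nokey=1 "$video")
    total=${total%.*}
    # pick n distinct second offsets and grab one frame at each
    for t in $(shuf -i 0-"$total" -n "$n"); do
        ffmpeg -ss "$t" -i "$video" -frames:v 1 "${video%.*}_${t}s.jpg"
    done
}
grab_random_frames input.avi 3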

Related

What is the fastest way to copy a large number of file paths mentioned in a text file, from one directory to another

I have a text file that lists a large number of file paths, one per line. I need to copy all these files from the source directory to a destination directory.
Currently, the command line I tried is
while read line; do cp "$line" dest_dir; done < my_file.txt
This seems to be a bit slow. Is there a way to parallelise this whole thing or speed it up?
You could try GNU Parallel as follows:
parallel --dry-run -a fileList.txt cp {} destinationDirectory
If you like what it says, remove the --dry-run.
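If you want more or fewer simultaneous copies, GNU Parallel's -j flag sets the job count (8 here is an arbitrary pick):
parallel -j 8 -a fileList.txt cp {} destinationDirectory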
You could do something like the following (in your chosen shell):
#!/bin/bash
BATCHSIZE=2
# **NOTE**: check it exists with -f and points at the right place. You might not need this; it depends on your own taste for risk.
ln -s `which cp` /tmp/myuniquecpname
# **NOTE**: this sort of unquoted word-splitting loop can have limits in some shells
for i in `cat test.txt`
do
    BASENAME="`basename $i`"
    echo doing /tmp/myuniquecpname "$i" "test2/$BASENAME"
    /tmp/myuniquecpname "$i" "test2/$BASENAME" &
    # count running copies via the unique name (that's why cp was symlinked)
    COUNT=`ps -ef | grep /tmp/myuniquecpname | grep -v grep | wc -l`
    # **NOTE**: maybe need to put a timeout on this loop
    until [ $COUNT -lt $BATCHSIZE ]; do
        COUNT=`ps -ef | grep /tmp/myuniquecpname | grep -v grep | wc -l`
        echo waiting...
        sleep 1
    done
done
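For comparison, GNU xargs can give a similar batching effect in one line; a sketch assuming the same my_file.txt and dest_dir names as the question (-P sets how many cp processes run at once, -I{} substitutes one path per line):
xargs -a my_file.txt -P 4 -I{} cp {} dest_dir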

Read hex data into less

I want to feed a big data file to less -s -M +Gg, i.e. read the given data as hex in less.
While-loop example (see ntc2's answer)
The less command is explained here.
Replacing the yes with a binary file which is converted to hex:
while read -u 10 p || [[ -n $p ]]; do
    hexdump -e '/4 "%08x\n"' {$p} \
        | less -s -M +Gg
done 10</Users/masi/Dropbox/7-8\:2015/r3.raw
where the looping is based on this thread here.
How can you read such data into less?
I don't understand the details of the example, but I think you want to put the less outside of the loop, like this:
while read -u 10 p || [[ -n $p ]]; do
    hexdump -e '/4 "%08x\n"' {$p}
done 10</Users/masi/Dropbox/7-8\:2015/r3.raw | less -s -M +Gg
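As an aside, and only a guess since the intent of the loop isn't clear: if the goal is simply to page through one raw file as hex, the loop may not be needed at all:
hexdump -e '/4 "%08x\n"' /Users/masi/Dropbox/7-8\:2015/r3.raw | less -s -M +Gg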

use gpsd or cgps to return latitude and longitude then quit

I would like a dead-simple way to query my GPS location from a USB dongle on the Unix command line.
Right now, I know I've got a functioning software and hardware system, as evidenced by the success of the cgps command in showing me my position. I'd now like to be able to make short requests for my GPS location (lat, long in decimals) from the command line. My USB serial device's path is /dev/ttyUSB0 and I'm using a GlobalSat dongle that outputs generic NMEA sentences.
How might I accomplish this?
Thanks
telnet 127.0.0.1 2947
?WATCH={"enable":true}
?POLL;
gives you your answer, but you still need to separate the wheat from the chaff. It also assumes the GPS is not coming in from a cold start.
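If you'd rather script that telnet session than type it, something along these lines should work (a sketch, assuming netcat is available; the sleeps give gpsd time to answer, and the exact JSON layout of the ?POLL response varies a little between gpsd versions):
# enable watching, poll once, keep the first line carrying a latitude
(printf '?WATCH={"enable":true}\n'; sleep 2; printf '?POLL;\n'; sleep 1) \
    | nc 127.0.0.1 2947 | grep -m 1 '"lat"'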
A short script could be, e.g.:
#!/bin/bash
exec 2>/dev/null
# get positions
gpstmp=/tmp/gps.data
gpspipe -w -n 40 > ${gpstmp}1 &
ppid=$!
sleep 10
kill -9 $ppid
# keep the first line carrying 9-decimal coordinates; -o prints lat and lon on separate lines
cat ${gpstmp}1 | grep -om1 "[-]\?[[:digit:]]\{1,3\}\.[[:digit:]]\{9\}" > $gpstmp
size=$(stat -c%s $gpstmp)
if [ $size -gt 10 ]; then
    cat $gpstmp | sed -n -e 1p > /tmp/gps.lat
    cat $gpstmp | sed -n -e 2p > /tmp/gps.lon
fi
rm $gpstmp ${gpstmp}1
This will output 40 sentences, grep the first lat/lon pair into temporary files, and then clean up.
Or, from the GPS3 GitHub repository, place the alpha gps3.py in the same directory as the following Python 2.7-3.4 script, and execute it.
from time import sleep
import gps3

the_connection = gps3.GPSDSocket()
the_fix = gps3.DataStream()
try:
    for new_data in the_connection:
        if new_data:
            the_fix.refresh(new_data)
            if not isinstance(the_fix.TPV['lat'], str):  # check for valid data
                speed = the_fix.TPV['speed']
                latitude = the_fix.TPV['lat']
                longitude = the_fix.TPV['lon']
                altitude = the_fix.TPV['alt']
                print('Latitude:', latitude, 'Longitude:', longitude)
            sleep(1)
except KeyboardInterrupt:
    the_connection.close()
    print("\nTerminated by user\nGood Bye.\n")
If you want it to close after one iteration, also import sys and then replace sleep(1) with sys.exit().
much easier solution:
$ gpspipe -w -n 10 | grep -m 1 lon
{"class":"TPV","device":"tcp://localhost:4352","mode":2,"lat":11.1111110000,"lon":22.222222222}
source
You can use my script gps.sh, which returns "x,y":
#!/bin/bash
x=$(gpspipe -w -n 10 | grep lon | tail -n1 | cut -d":" -f9 | cut -d"," -f1)
y=$(gpspipe -w -n 10 | grep lon | tail -n1 | cut -d":" -f10 | cut -d"," -f1)
echo "$x,$y"
sh gps.sh
43.xx4092000,6.xx1269167
Putting a few of the bits of different answers together with a bit more jq work, I like this version:
$ gpspipe -w -n 10 | grep -m 1 TPV | jq -r '[.lat, .lon] | @csv'
40.xxxxxx054,-79.yyyyyy367
Explanation:
(1) use grep -m 1 after invoking gpspipe, as used in @eadmaster's answer, because grep exits as soon as the first match is found. This gets you results faster instead of having to wait for 10 lines (or using two invocations of gpspipe).
(2) use jq to extract both fields simultaneously; the @csv formatter is more readable. Note the use of jq -r (raw output), so that the output is not put in quotes. Otherwise the output would be "40.xxxx,-79.xxxx", which might be fine or even better for some applications.
(3) Search for the TPV field by name for clarity. This is the "time, position, velocity" record, which is the one we want for extracting the current lat and lon. Just searching for "lat" or "lon" risks getting confused by the GST object that some GPSes supply; in that object, 'lat' and 'lon' are the standard deviation of the position error, not the position itself.
Improving on eadmaster's answer, here is a more elegant solution:
gpspipe -w -n 10 | jq -r '.lon' | grep "[[:digit:]]" | tail -1
Explanation:
Ask gpsd for the data 10 times
Parse the received JSONs using jq
We want only numeric values, so filter using grep
We want the last received value, so use tail for that
Example:
$ gpspipe -w -n 10 | jq -r '.lon' | grep "[[:digit:]]" | tail -1
28.853181286

Unix - Need to cut a file which has multiple blanks as delimiter - awk or cut?

I need to get the records from a text file in Unix. The delimiter is multiple blanks. For example:
2U2133   1239
1290fsdsf   3234
From this, I need to extract
1239
3234
The delimiter for all records will always be 3 blanks.
I need to do this in a Unix script (.scr) and write the output to another file, or use it as input to a do-while loop. I tried the below:
while read readline
do
read_int=`echo "$readline"`
cnt_exc=`grep "$read_int" ${Directory path}/file1.txt| wc -l`
if [ $cnt_exc -gt 0 ]
then
int_1=0
else
int_2=0
fi
done < awk -F' ' '{ print $2 }' ${Directoty path}/test_file.txt
test_file.txt is the input file and file1.txt is a lookup file. But the above is not working and gives me syntax errors near awk -F.
I tried writing the output to a file. The following worked on the command line:
more test_file.txt | awk -F' ' '{ print $2 }' > output.txt
This works and writes the records to output.txt when run on the command line. But the same command does not work from the Unix script (it is a .scr file).
Please let me know where I am going wrong and how I can resolve this.
Thanks,
Visakh
The job of replacing multiple delimiters with just one is left to tr:
cat <file_name> | tr -s ' ' | cut -d ' ' -f 2
tr translates or deletes characters, and is perfectly suited to preparing your data for cut to work properly.
The manual states:
-s, --squeeze-repeats
replace each sequence of a repeated character that is
listed in the last specified SET, with a single occurrence
of that character
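Running that on the sample data from the question shows the squeeze-then-cut effect:
$ printf '2U2133   1239\n1290fsdsf   3234\n' | tr -s ' ' | cut -d ' ' -f 2
1239
3234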
It depends on the version or implementation of cut on your machine. Some versions support an option, usually -i, that means 'ignore blank fields' or, equivalently, allow multiple separators between fields. If that's supported, use:
cut -i -d' ' -f 2 data.file
If not (and it is not universal — and maybe not even widespread, since neither GNU nor MacOS X have the option), then using awk is better and more portable.
You need to pipe the output of awk into your loop, though:
awk -F' ' '{print $2}' ${Directory_path}/test_file.txt |
while read readline
do
    read_int=`echo "$readline"`
    cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt | wc -l`
    if [ $cnt_exc -gt 0 ]
    then int_1=0
    else int_2=0
    fi
done
The only residual issue is whether the while loop runs in a sub-shell, and therefore does not modify your main shell's variables, only its own copies of them.
With bash, you can use process substitution:
while read readline
do
    read_int=`echo "$readline"`
    cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt | wc -l`
    if [ $cnt_exc -gt 0 ]
    then int_1=0
    else int_2=0
    fi
done < <(awk -F' ' '{print $2}' ${Directory_path}/test_file.txt)
This leaves the while loop in the current shell, but arranges for the output of the command to appear as if from a file.
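To see the difference concretely, here is a tiny bash demo of the sub-shell issue (assuming bash without the lastpipe option, so each pipeline stage runs in a sub-shell):
# the pipeline version: count is set in a sub-shell and lost afterwards
printf 'a\nb\n' | while read x; do count=$((count+1)); done
echo "${count:-unset}"   # prints: unset
# the process-substitution version keeps the loop in the current shell
while read x; do count2=$((count2+1)); done < <(printf 'a\nb\n')
echo "$count2"           # prints: 2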
The blank in ${Directory path} is not normally legal (unless it is another Bash feature I've missed out on); you also had a typo (Directoty) in one place.
Other ways of doing the same thing aside, the error in your program is this: You cannot redirect from (<) the output of another program. Turn your script around and use a pipe like this:
awk -F' ' '{ print $2 }' ${Directory path}/test_file.txt | while read readline
etc.
Besides, the use of "readline" as a variable name may or may not get you into problems.
In this particular case, you can use the following line
sed 's/  */\t/g' <file_name> | cut -f 2
to get your second column.
In bash you can start from something like this:
# with cut's single-space delimiter, the 3-blank separator makes the second column field 4
for n in `cat ${Directory_path}/test_file.txt | cut -d " " -f 4`
do
    grep -c "$n" ${Directory_path}/file*.txt
done
This should have been a comment, but since I cannot comment yet, I am adding this here.
This is from an excellent answer here: https://stackoverflow.com/a/4483833/3138875
tr -s ' ' <text.txt | cut -d ' ' -f4
tr -s '<character>' squeezes multiple repeated instances of <character> into one.
It's not working in the script because of the typo ("Directoty") in the last line of your script.
Cut isn't flexible enough. I usually use Perl for that:
cat file.txt | perl -F'   ' -ane 'print $F[1]."\n"'
Instead of the triple space after -F you can put any Perl regular expression. You access fields as $F[n], where n is the field number (counting starts at zero). This way there is no need for sed or tr.

Passing output from one command as argument to another [duplicate]

This question already has answers here:
How to pass command output as multiple arguments to another command
(5 answers)
Closed 5 years ago.
I have this for loop:
for i in `ls -1 access.log*`; do tail $i |awk {'print $4'} |cut -d: -f 1 |grep - $i > $i.output; done
ls will give access.log, access.log.1, access.log.2 etc.
tail will give me the last line of each file, which looks like: 192.168.1.23 - - [08/Oct/2010:14:05:04 +0300] etc. etc. etc
awk+cut will extract the date (08/Oct/2010 - but different in each access.log), which will allow me to grep for it and redirect the output to a separate file.
But I cannot seem to pass the output of awk+cut to grep.
The reason for all this is that those access logs include lines with more than one date (06/Oct, 07/Oct, 08/Oct) and I just need the lines with the most recent date.
How can I achieve this?
Thank you.
As a side note, tail displays the last 10 lines by default; use tail -n 1 if you want only the very last line.
A possible solution would be to grep this way:
for i in access.log*; do grep "$(tail "$i" | awk '{print $4}' | cut -d: -f1 | sed 's/\[/\\[/')" "$i" > "$i.output"; done
Why don't you break it up into steps?
for file in access.log*
do
    what=$(tail "$file" | awk '{print $4}' | cut -d: -f1)
    grep "$what" "$file" >> output
done
You shouldn't use ls that way. Also, ls -l gives you information you don't need. The -f option to grep will allow you to pipe the pattern to grep. Always quote variables that contain filenames.
for i in access.log*; do awk 'END {sub(":.*","",$4); print substr($4,2)}' "$i" | grep -f - "$i" > "$i.output"; done
I also eliminated tail and cut since AWK can do their jobs.
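For the sample log line from the question, that awk program boils down to just the date:
$ echo '192.168.1.23 - - [08/Oct/2010:14:05:04 +0300] etc.' | awk 'END {sub(":.*","",$4); print substr($4,2)}'
08/Oct/2010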
Umm...
Use xargs or backticks.
man xargs
or
http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_03_04.html , section 3.4.5. Command substitution
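For instance, untested sketches of both approaches on this question's file names (grep -F is used so the leading [ in the date is taken literally):
# command substitution: the inner pipeline's output becomes grep's pattern
grep -F "$(tail -n 1 access.log | awk '{print $4}' | cut -d: -f1)" access.log
# xargs: turn one command's output into another command's arguments
tail -n 1 access.log | awk '{print $4}' | cut -d: -f1 | xargs -I{} grep -F {} access.log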
you can try:
grep "$(stuff to get piped over to be grep-ed)" file
I haven't tried this, but applying that idea here would look something like:
for i in access.log*; do grep -F "$(tail "$i" | awk '{print $4}' | cut -d: -f1)" "$i" > "$i.output"; done
