How to move files based on a list (which contains the filename and destination path) in terminal?

I have a folder that contains a lot of files. In this case images.
I need to organise these images into a directory structure.
I have a spreadsheet that contains the filenames and the corresponding path where the file should be copied to. I've saved this file as a text document named files.txt
+--------------+-----------------------+
| image01.jpg | path/to/destination |
+--------------+-----------------------+
| image02.jpg | path/to/destination |
+--------------+-----------------------+
I'm trying to use rsync with the --files-from flag but can't get it to work.
According to man rsync:
--include-from=FILE
This option is related to the --include option, but it specifies a FILE that contains include patterns (one per line). Blank lines in the file and lines starting with ';' or '#' are ignored. If FILE is -, the list will be read from standard input
Here's the command I'm using: rsync -a --files-from=/path/to/files.txt path/to/destinationFolder
And here's the rsync error: syntax or usage error (code 1) at /BuildRoot/Library/Caches/com.apple.xbs/Sources/rsync/rsync-52.200.1/rsync/options.c(1436) [client=2.6.9]
It's still pretty unclear to me how the files.txt document should be formatted/structured and why my command is failing.
Any help is appreciated.
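For what it's worth, --files-from expects a plain list of source paths plus a single source directory and a single destination on the command line, which is likely why the command above errors; it also can't send different files to different destinations, which is what the spreadsheet describes. A minimal sketch of a plain shell loop instead, assuming files.txt is tab-separated (all file and directory names below are made up for the demo):

```shell
# Demo setup: placeholder images and a tab-separated list
# (these names are invented purely for illustration).
touch image01.jpg image02.jpg
printf 'image01.jpg\tdest/a\nimage02.jpg\tdest/b\n' > files.txt

# Move each file to the destination listed next to it.
TAB=$(printf '\t')
while IFS="$TAB" read -r file dest; do
    mkdir -p "$dest"          # create the destination folder if needed
    mv -- "$file" "$dest/"    # use cp -- instead to copy rather than move
done < files.txt
```

If your spreadsheet export uses a different separator (comma, semicolon), set IFS accordingly.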

Related

rsync exclude list from database?

I know I can exclude rsync files listed in a text file, but can I make rsync read a sqlite (or other) database as an exclude list?
Otherwise I guess I could dump the sqlite to a text file, but I would like to eliminate the extra step, since I have many files in many directories.
The man page says:
--exclude-from=FILE
This option is related to the --exclude option, but it specifies a FILE that contains exclude patterns (one per line). Blank lines in the file and lines starting with ";" or "#" are ignored. If FILE is -, the list will be read from standard input.
So just pipe the file names into rsync:
sqlite3 my.db "SELECT filename FROM t" | rsync --exclude-from=- ...

Is there any way to extract only one file (or a regular expression) from tar file

I have a tar.gz file.
Because of space issues, and because extraction takes a long time, I need to extract only selected files.
I have tried the below
grep -l '<text>' *
file1
file2
only file1,file2 should be extracted.
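For the pattern-matching part of the question, GNU tar can restrict extraction to entries matching a glob via --wildcards (BSD tar behaves differently; the archive and file names below are made up for the demo):

```shell
# Demo setup: build a throwaway archive containing three files.
mkdir -p demo
touch demo/file1 demo/file2 demo/notes.txt
tar -czf test.tar.gz -C demo file1 file2 notes.txt

# Extract only the entries matching a glob (GNU tar's --wildcards).
mkdir -p out
tar -xzf test.tar.gz -C out --wildcards 'file*'
```

Only file1 and file2 land in out/; notes.txt stays compressed in the archive.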
You can do this with the --extract option, like this:
tar --extract --file=test.tar.gz main.c
Here in --file, specify the .gz filename, and at the end specify the filename you want to extract.

What should I do to save all the tail -f data to a file swa3?

I have swa1.out, which has a list of online data inputs.
swa2 is a file which should skip the keywords from swa1.
swa3 is a file where it should write the data.
Can anyone help with this?
I have tried the command below, but I'm not able to get it to work:
tail -f SWA1.out | grep -vf SWA2 >> swa3

Number of lines differ in text and zipped file

I zipped a few files in Unix and later found that the zipped files have a different number of lines than the raw files.
$ wc -l /location/filename.txt /location/filename.zip
70308 /location/filename.txt
2931 /location/filename.zip
How's this possible?
Zip files are binary files; the wc command is aimed at text files.
The compressed version of a text file may contain more or fewer newline characters than the original, because compression is not done line by line. (If the compressed file gave the same output as the original for every command, there would be no point in compressing it at all.)
From wc man page:
-l, --lines
print the newline counts
To get matching output, you should try
$ unzip -c filename.zip | wc -l # Decompress on stdout and count the lines
This will give (about) 3 extra lines (if no directory structure is involved), because unzip -c prints an Archive: header and a name line per file. If you compressed a directory containing the text file, instead of just the file, you may see a few more lines containing file/directory information.
In a compression algorithm, each word/character is replaced by some binary sequence.
Suppose \n is replaced by 0011100,
and some other character 'x' is encoded as 0001010 (the code for \n);
wc then counts every byte in the zip file that happens to match the newline code, so the count can be anything.
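The effect is easy to reproduce (using gzip here rather than zip, since the point is the same for any compressor; sample.txt is a throwaway file):

```shell
# A 3-line text file, compressed and counted both ways.
printf 'one\ntwo\nthree\n' > sample.txt
wc -l < sample.txt                    # 3: actual lines of text
gzip -c sample.txt > sample.txt.gz    # compress to a new file, keep the original
wc -l < sample.txt.gz                 # counts stray 0x0a bytes in compressed data
gzip -dc sample.txt.gz | wc -l        # decompress to stdout first: 3 again
```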

Unzip only limited number of files in linux

I have a zipped file containing 10,000 compressed files. Is there a Linux command/bash script to unzip only 1,000 files ? Note that all compressed files have same extension.
unzip -Z1 test.zip | head -n 1000 | sed 's| |\\ |g' | xargs unzip test.zip
-Z1 provides a raw list of file names
The sed expression escapes spaces (works everywhere, including macOS)
You can use wildcards to select a subset of files. E.g.
Extract all contained files beginning with b:
unzip some.zip b*
Extract all contained files whose name ends with y:
unzip some.zip *y.extension
You can either select a wildcard pattern that is close enough, or examine the output of unzip -l some.zip closely to determine a pattern or set of patterns that will get you exactly the right number.
I did this:
unzip -l zipped_files.zip |head -1000 |cut -b 29-100 >list_of_1000_files_to_unzip.txt
I used cut to get only the filenames, first 3 columns are size etc.
Now loop over the filenames :
for files in `cat list_of_1000_files_to_unzip.txt `; do unzip zipped_files.zip $files;done
Some advice:
Run unzip to list the files only, redirecting the output to a file
Truncate this file to keep only the top 1,000 rows
Pass the file back to unzip to extract only the listed files

Why did my use of the read command not do what I expected?

I wreaked some havoc on my computer when I played with the commands suggested by vezult [1]. I expected the one-liner to ask which file names to remove. However, it immediately removed the files in a folder:
> find ./ -type f | while read x; do rm "$x"; done
I expected it to wait for my input on stdin [2]. I cannot understand its behavior. How does the read command work, and where do you use it?
What happened there is that read reads from stdin. When you put it at the end of a pipe, it reads from that pipe.
So the output of your find becomes
file1
file2
and so on; read reads that and replaces x successively with file1 then file2, and so your loop becomes
rm "file1"
rm "file2"
and sure enough, that rm's every file starting at the current directory ".".
A couple hints.
You didn't need the "/".
It's better and safer to say
find . -type f
because should you happen to type ". /" (ie, dot SPACE slash) find will start at the current directory and then go look starting at the root directory. That trick, given the right privileges, would delete every file in the computer. "." is already the name of a directory; you don't need to add the slash.
The find or rm commands can do this for you
It sounds like what you wanted to do was go through all the files in all the directories starting at the current directory ".", and have it ASK if you want to delete it. You could do that with
find . -type f -exec rm -i {} \;
or
find . -type f -ok rm {} \;
and not need a loop at all. You can also do
rm -r -i *
and get nearly the same effect, except that it will try to delete directories too. If the directory is empty, that'll even work.
Another thought
Come to think of it, unless you have a LOT of files, you could also do
rm -i `find . -type f`
Now the find in backquotes will become a bunch of file names on the command line, and the '-i' interactive flag on rm will ask the yes or no question.
Charlie Martin gives you a good dissection and explanation of what went wrong with your specific example, but doesn't address the general question of:
When should you use the read command?
The answer to that is - when you want to read successive lines from some file (quite possibly the standard output of some previous sequence of commands in a pipeline), possibly splitting the lines into several separate variables. The splitting is done using the current value of '$IFS', which normally means on blanks and tabs (newlines don't count in this context; they separate lines). If there are multiple variables in the read command, then the first word goes into the first variable, the second into the second, ..., and the residue of the line into the last variable. If there's only one variable, the whole line goes into that variable.
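The splitting behaviour is quick to demonstrate (the variable names here are arbitrary):

```shell
# Each variable takes one word; the last variable takes the residue of the line.
printf 'alpha beta gamma delta\n' | {
    read -r first second rest
    echo "first=$first second=$second rest=$rest"
}
# prints: first=alpha second=beta rest=gamma delta
```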
There are many uses. This is one of the simpler scripts I have that uses the split option:
#!/bin/ksh
#
# @(#)$Id: mkdbs.sh,v 1.4 2008/10/12 02:41:42 jleffler Exp $
#
# Create basic set of databases
MKDUAL=$HOME/bin/mkdual.sql
ELEMENTS=$HOME/src/sqltools/SQL/elements.sql
cat <<! |
mode_ansi with log mode ansi
logged with buffered log
unlogged
stores with buffered log
!
while read dbs logging
do
    if [ "$dbs" = "unlogged" ]
    then bw=""; cw=""
    else bw="-ebegin"; cw="-ecommit"
    fi
    sqlcmd -xe "create database $dbs $logging" \
        $bw -e "grant resource to public" -f $MKDUAL -f $ELEMENTS $cw
done
The cat command with a here-document sends its output into a pipe, so it feeds the while read dbs logging loop. The first word goes into $dbs and is the name of the (Informix) database I want to create; the remainder of the line is placed into $logging. The body of the loop handles unlogged databases (where begin and commit do not work), then runs a program called sqlcmd (completely separate from the Microsoft newcomer of the same name; it has been around since about 1990) to create a database and populate it with some standard tables and data: a simulation of the Oracle 'dual' table, and a set of tables related to the 'table of elements'.
Other scripts that use the read command are bigger (by far), but generally read lines containing one or more file names and some other attributes of relevance, and then apply an appropriate transform to the files using the attributes.
Osiris JL: file * | grep 'sh.*script' | sed 's/:.*//' | xargs wgrep read
esqlcver:read version letter
jlss: while read directory
jlss: read x || exit
jlss: read x || exit
jlss: while read file type link owner group perms
jlss: read x || exit
jlss: while read file type link owner group perms
kb: while read size name
mkbod: while read directory
mkbod:while read dist comp
mkdbs:while read dbs logging
mkmsd:while read msdfile master
mknmd:while read gfile sfile version notes
publictimestamp:while read name type title
publictimestamp:while read name type title
Osiris JL:
'Osiris JL: ' is my command line prompt; I ran this in my 'bin' directory. 'wgrep' is a variant of grep that only matches entire words (to avoid words like 'already'). This gives some indication of how I've used it.
The 'read x || exit' lines are for an interactive script that reads a response from standard input, but exits if the command gets EOF (for example, if standard input comes from /dev/null).
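The EOF behaviour behind those lines can be seen directly (the messages are arbitrary):

```shell
# read returns a non-zero status at EOF, so `read x || exit` stops a
# script whose stdin has run dry (here, stdin redirected from /dev/null).
if read -r x < /dev/null; then
    echo "got: $x"
else
    echo "EOF reached"
fi
# prints: EOF reached
```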
