rsync exclude list from database? - sqlite

I know I can exclude rsync files listed in a text file, but can I make rsync read a sqlite (or other) database as an exclude list?
Otherwise I guess I could dump the sqlite to a text file, but I would like to eliminate the extra step, since I have many files in many directories.

The man page says:
--exclude-from=FILE
This option is related to the --exclude option, but it specifies a FILE that contains exclude patterns (one per line). Blank lines in the file and lines starting with ";" or "#" are ignored. If FILE is -, the list will be read from standard input.
So just pipe the file names into rsync:
sqlite3 my.db "SELECT filename FROM t" | rsync --exclude-from=- ...
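A minimal end-to-end sketch of that pipeline, assuming both sqlite3 and rsync are installed. The table t and column filename come from the command above; the temporary directory layout and the file names keep.txt and skip.txt are made up for illustration. Each row is read by rsync as an exclude pattern, so names containing wildcard characters would need extra care.

```shell
#!/bin/sh
# Sketch only: the throwaway tree and database below are illustrative.
work=$(mktemp -d)
mkdir "$work/src" "$work/dst"
: > "$work/src/keep.txt"
: > "$work/src/skip.txt"

if command -v sqlite3 >/dev/null 2>&1 && command -v rsync >/dev/null 2>&1; then
    # One row per file name that should be excluded.
    sqlite3 "$work/my.db" \
        "CREATE TABLE t(filename TEXT); INSERT INTO t VALUES('skip.txt');"

    # sqlite3 prints one name per line; rsync reads them from stdin ("-").
    sqlite3 "$work/my.db" "SELECT filename FROM t" |
        rsync -a --exclude-from=- "$work/src/" "$work/dst/"

    result=$(ls "$work/dst")
else
    result="skipped"   # tools missing, nothing was copied
fi
echo "$result"
```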

Related

How to move files based on a list (which contains the filename and destination path) in terminal?

I have a folder that contains a lot of files. In this case images.
I need to organise these images into a directory structure.
I have a spreadsheet that contains the filenames and the corresponding path where the file should be copied to. I've saved this file as a text document named files.txt
+--------------+-----------------------+
| image01.jpg | path/to/destination |
+--------------+-----------------------+
| image02.jpg | path/to/destination |
+--------------+-----------------------+
I'm trying to use rsync with the --files-from flag but can't get it to work.
According to man rsync:
--include-from=FILE
This option is related to the --include option, but it specifies a FILE that contains include patterns (one per line). Blank lines in the file and lines starting with ';' or '#' are ignored. If FILE is -, the list will be read from standard input
Here's the command I'm using: rsync -a --files-from=/path/to/files.txt path/to/destinationFolder
And here's the rsync error: syntax or usage error (code 1) at /BuildRoot/Library/Caches/com.apple.xbs/Sources/rsync/rsync-52.200.1/rsync/options.c(1436) [client=2.6.9]
It's still pretty unclear to me how the files.txt document should be formatted/structured and why my command is failing.
Any help is appreciated.

Loading multiple CSV files with SQLite

I'm using SQLite, and I need to load hundreds of CSV files into one table. I couldn't find anything like this on the web. Is it possible?
Please note that in the beginning I used Oracle, but since Oracle has a limit of 1000 columns per table, and each of my CSV files has more than 1500 columns, I had to find another solution. I want to try SQLite, since I can install it quickly and easily.
These CSV files were supplied with this number of columns and I can't change or split them (never mind why).
Please advise.
I ran into a similar problem, and the comments on your question gave me the answer that finally worked for me.
Step 1: merge the multiple csv's into a single file. Exclude the header for most of them but write down the header from one of them in the beginning.
Step 2: Load the single merged csv into SQLite.
For step 1 I used:
$ head -1 one.csv > all_combined.csv
$ tail -n +2 -q *.csv >> all_combined.csv
The first command writes only the first line of one CSV file (any of the files will do); the second command writes each file starting from line 2, thereby excluding the headers. The -q option makes sure that tail never prints the file names as headers.
Make sure to put all_combined.csv in a separate folder; otherwise the *.csv glob matches it as well and tail ends up reading the file it is appending to.
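The two commands can also be written as a loop that keeps the merged file outside the input folder. This is only a sketch: the temporary folders and the sample files one.csv and two.csv are made up, and it assumes every file shares the same header and ends with a newline.

```shell
#!/bin/sh
# Sketch: merge CSV files that share one header into a single file
# stored outside the input folder. All names here are illustrative.
in_dir=$(mktemp -d)     # stands in for the folder holding the CSV files
out_dir=$(mktemp -d)    # separate folder, so the glob never sees the output
printf 'id,name\n1,a\n' > "$in_dir/one.csv"
printf 'id,name\n2,b\n3,c\n' > "$in_dir/two.csv"

merged="$out_dir/all_combined.csv"
first=1
for f in "$in_dir"/*.csv; do
    if [ "$first" -eq 1 ]; then
        cat "$f" > "$merged"          # first file: keep its header line
        first=0
    else
        tail -n +2 "$f" >> "$merged"  # later files: drop the header line
    fi
done
cat "$merged"
```

Because the output lives in its own folder, the glob can never pick up the merged file.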
To load into SQLite (Step 2) the answer given by Hot Licks worked for me:
sqlite> .mode csv
sqlite> .import all_combined.csv my_new_table
This assumes that my_new_table hasn't been created yet. Alternatively, you can create it beforehand and then load, but in that case leave the header out in Step 1.
http://www.sqlite.org/cli.html --
Use the ".import" command to import CSV (comma separated value) data into an SQLite table. The ".import" command takes two arguments which are the name of the disk file from which CSV data is to be read and the name of the SQLite table into which the CSV data is to be inserted.
Note that it is important to set the "mode" to "csv" before running the ".import" command. This is necessary to prevent the command-line shell from trying to interpret the input file text as some other format.
sqlite> .mode csv
sqlite> .import C:/work/somedata.csv tab1
There are two cases to consider: (1) Table "tab1" does not previously exist and (2) table "tab1" does already exist.
In the first case, when the table does not previously exist, the table is automatically created and the content of the first row of the input CSV file is used to determine the name of all the columns in the table. In other words, if the table does not previously exist, the first row of the CSV file is interpreted to be column names and the actual data starts on the second row of the CSV file.
For the second case, when the table already exists, every row of the CSV file, including the first row, is assumed to be actual content. If the CSV file contains an initial row of column labels, that row will be read as data and inserted into the table. To avoid this, make sure that table does not previously exist.
Note that when you import into a table that already exists, you need to make sure that the files DO NOT have an initial line defining the field names. And, for "hundreds" of files you will probably want to prepare a script rather than typing in each file individually.
I didn't find a nicer way to solve this so I used find along with xargs to avoid creating a huge intermediate .csv file:
find . -type f -name '*.csv' | xargs -I% sqlite3 database.db ".mode csv" ".import % new_table" ".exit"
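find prints the file names, and the -I% option makes xargs run the command once per input line, with % replaced by the name of a CSV file. Per the rules quoted above, only the first import creates the table from its header row; for every later file the header row is inserted as data, so strip the headers first if that matters.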
You can use DB Browser for SQLite to do this pretty easily.
File > Import > Table from CSV file... and then select all the files to open them together into a single table.
I just tested this out with a dozen CSV files and got a single 1 GB table from them without any work. As long as they have the same schema, DB Browser is able to put them together. You'll want to keep the 'Column Names in first line' option checked.

Number of lines differ in text and zipped file

I zipped a few files in Unix and later found that the zipped files have a different number of lines than the raw files.
>>wc -l
70308 /location/filename.txt
2931 /location/filename.zip
How's this possible?
Zip files are binary files; the wc command is meant for text files.
The zip-compressed version of a text file may contain more or fewer newline characters than the original, because compression is not done line by line. If both files gave the same output for every command, there would be no point in compressing and keeping the file in a different format.
From wc man page:
-l, --lines
print the newline counts
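To get a matching count, decompress to stdout and count the lines:
$ unzip -c /location/filename.zip | wc -l # Decompress on stdout and count the lines
This gives about 3 extra lines, because unzip -c also prints the archive and member names (assuming no directory structure is involved). If you compressed a directory containing the text file instead of just the file, you may see a few more lines containing the file/directory information. Using unzip -p instead of -c prints only the file data, without the extra lines.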
A compression algorithm replaces words/characters with binary sequences.
Suppose \n is replaced by 0011100
and some other character 'x' is replaced by 0001010, which is the bit pattern of \n.
wc then searches the zip file for the sequence 0001010, so the count of these can differ from the number of lines in the original.
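The effect is easy to reproduce. The sketch below uses gzip as a stand-in for zip (an assumption made only because gzip is more commonly preinstalled; the principle is the same), and the file name data.txt is made up. It counts newlines in the raw text, in the compressed bytes, and in the decompressed stream.

```shell
#!/bin/sh
# Sketch: wc -l counts newline bytes, so it is only meaningful on the
# decompressed text. gzip stands in for zip here.
work=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 1000; i++) print i }' > "$work/data.txt"
gzip -c "$work/data.txt" > "$work/data.txt.gz"   # compressed binary copy

raw_lines=$(wc -l < "$work/data.txt")
gz_lines=$(wc -l < "$work/data.txt.gz")          # newline bytes in binary data
unpacked_lines=$(gzip -dc "$work/data.txt.gz" | wc -l)

echo "raw: $raw_lines  compressed: $gz_lines  decompressed: $unpacked_lines"
```

Only the raw and decompressed counts are expected to agree; the count for the compressed file depends on whatever bytes the compressor happens to emit.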

Make rsync exclude all directories that contain a file with a specific name

I would like rsync to exclude all directories that contain a file with a specific name, say ".rsync-exclude", independent of the contents of the ".rsync-exclude" file.
If the file ".rsync-exclude" contained just "*", I could use rsync -r SRC DEST --filter='dir-merge,- .rsync-exclude'.
However, the directory should be excluded independent of the contents of the ".rsync-exclude" file (it should at least be possible to leave the ".rsync-exclude" file empty).
Any ideas?
rsync does not support this (at least the manpage does not mention anything), but you can do it in two steps:
run find to find the .rsync-exclude files and turn each match into the path of its containing directory
pipe this list to --exclude-from (or use a temporary file)
--exclude-from=FILE
This option is related to the --exclude option, but it specifies a FILE that contains exclude patterns (one per line). Blank lines in the file and lines starting with ';' or '#' are ignored. If FILE is -, the list will be read from standard input.
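The two steps can be sketched as follows. The directory names keep and skip are invented for the demonstration, and the rsync call itself is left as a comment because the destination is not known here. The sketch assumes directory names contain no rsync wildcard characters (*, ?, [) and ignores a marker placed directly in the top-level source directory.

```shell
#!/bin/sh
# Build a throwaway tree: "skip" holds a marker file, "keep" does not.
src=$(mktemp -d)
mkdir -p "$src/keep" "$src/skip/nested"
: > "$src/skip/.rsync-exclude"      # an empty marker file is enough
: > "$src/keep/a.txt"

# Turn each marker path into a pattern for its directory: anchored at the
# transfer root (leading "/") and limited to directories (trailing "/").
excludes=$(cd "$src" && find . -mindepth 2 -type f -name .rsync-exclude |
    sed -e 's|^\.||' -e 's|/\.rsync-exclude$|/|')
printf '%s\n' "$excludes"

# With rsync installed, the list can then be fed in on standard input:
#   printf '%s\n' "$excludes" | rsync -a --exclude-from=- "$src/" DEST/
```

The trailing slash on the source in the commented rsync call matters: it makes the patterns relative to the contents of the source directory.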
Alternatively, if you do not mind putting something in the files, you can use:
-F
The -F option is a shorthand for adding two --filter rules to your command. The first time it is used is a shorthand for this rule:
--filter='dir-merge /.rsync-filter'
This tells rsync to look for per-directory .rsync-filter files that have been sprinkled through the hierarchy and use their rules to filter the files in the transfer. If -F is repeated, it is a shorthand for this rule:
--filter='exclude .rsync-filter'
This filters out the .rsync-filter files themselves from the transfer.
See the FILTER RULES section for detailed information on how these options work.
Old question, but I had the same one..
You can add the following filter:
--filter="dir-merge,n- .rsync-exclude"
Now you can place a .rsync-exclude file in any folder and write the names of the files and folders you want to exclude, one per line. For example:
#.rsync-exclude file
folderYouWantToExclude
allFilesThatStartWithXY*
someSpecialImage.png
So you can use patterns in there too.
What you can't do is:
#.rsync-exclude file
folder/someFileYouWantToExclude
Hope it helps! Cheers
rsync -avz --exclude 'dir' /source /destination

Compare two folders which have many files inside contents

I have two folders with approx. 150 Java property files.
In a shell script, how can I compare both folders to see if there is any new property file in either of them, and what the differences between the property files are?
The output should be in a report format.
To get summary of new/missing files, and which files differ:
diff -arq folder1 folder2
-a treats all files as text, -r recursively searches subdirectories, and -q reports briefly, i.e. only whether files differ.
diff -r will do this, telling you both if any files have been added or deleted, and what's changed in the files that have been modified.
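A small sketch that combines both forms into a simple report. The folders are temporary stand-ins for folder1 and folder2, and the .properties file names and contents are made up. Note that diff exits with a non-zero status when it finds differences, which the script has to allow for.

```shell
#!/bin/sh
# Sketch: compare two throwaway folders of .properties files.
a=$(mktemp -d)
b=$(mktemp -d)
printf 'color=red\n'  > "$a/app.properties"
printf 'color=blue\n' > "$b/app.properties"    # same name, different content
printf 'size=1\n'     > "$a/only_in_a.properties"

# Summary: which files exist on one side only, and which files differ.
if summary=$(diff -rq "$a" "$b"); then
    echo "Folders are identical"
else
    printf '== Summary ==\n%s\n' "$summary"
fi

# Details: line-level changes for files present on both sides.
details=$(diff -r "$a" "$b" || true)
printf '== Details ==\n%s\n' "$details"
```

Redirecting the script's output to a file gives a basic report that can be archived or mailed.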
I used
diff -rqyl folder1 folder2 --exclude=node_modules
in my nodejs apps.
Could you use dircmp ?
The diff command in Unix is used to find the differences between files (of all types). Since a directory is also a type of file, the differences between two directories can easily be figured out with diff. For more options, run man diff on your Unix box.
-b Ignores trailing blanks (spaces and tabs) and treats other strings of blanks as equivalent.
-i Ignores the case of letters. For example, `A' will compare equal to `a'.
-t Expands <TAB> characters in output lines. Normal or -c output adds character(s) to the front of each line that may adversely affect the indentation of the original source lines and make the output lines difficult to interpret. This option will preserve the original source's indentation.
-w Ignores all blanks (<SPACE> and <TAB> characters) and treats all other strings of blanks as equivalent. For example, `if ( a == b )' will compare equal to `if(a==b)'.
and there are many more.

Resources