How to accept dynamic attributes (no of columns present in csv file) - unix

How can I handle a CSV file whose number of columns is not fixed? I want to aggregate on particular columns, for example cust_id and notification_type, with a count, and store the output in another CSV file.
I have tried this:
awk 'BEGIN{FS=OFS=","}{a[$2 OFS $3]+=$4}END{for(i in a)print i,a[i]}' file_name
It is a single-line command. I want a proper script.
It should be wrapped in a script that is called like this:
script_name.sh input_file output_folder
Sample file (the actual file may be several GB in size):
1,A,OTC,1
2,B,RC,1
3,C,PB,1
4,A,OTC,1
5,A,RC,1
6,B,RC,1
The output should be this:
1,A,OTC,2
,RC,1
2,B,RC,1
3,C,PB,1
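One way to wrap the one-liner from the question in a script is sketched below. It is a minimal sketch, assuming POSIX sh and awk, comma-separated input without quoted commas, and that the grouping columns are 2 and 3 with the value to sum in column 4, as in the one-liner. The names aggregate.sh, aggregate and aggregated.csv are made up for illustration. Extra columns in the input do not matter, since only fields 2, 3 and 4 are read.

```shell
#!/bin/sh
# aggregate.sh (hypothetical name): sum column 4 per (column 2, column 3) pair.
# Usage: aggregate.sh input_file output_folder

aggregate() {
    input=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    # awk keeps one array entry per distinct key, so memory use grows with
    # the number of groups rather than with the size of the input file.
    # "for (key in sum)" visits keys in no particular order, hence the sort.
    awk 'BEGIN { FS = OFS = "," }
         { sum[$2 OFS $3] += $4 }
         END { for (key in sum) print key, sum[key] }' "$input" |
        sort > "$outdir/aggregated.csv"
}

# Run only when both arguments are given; with no arguments the file just
# defines the function, so it can also be sourced from another script.
if [ "$#" -eq 2 ]; then
    aggregate "$1" "$2"
elif [ "$#" -ne 0 ]; then
    echo "Usage: $0 input_file output_folder" >&2
    exit 1
fi
```

Called as sh aggregate.sh input.csv out_dir, it writes out_dir/aggregated.csv. For the sample file above the result is A,OTC,2 then A,RC,1 then B,RC,2 then C,PB,1, one per line.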

Related

how to include filename in first column of existing csv file

I have an existing CSV file. I want to modify it so that the file name appears as the first column of every row.
Example
file.csv
1,love,anger
Modified csv
file.csv
file.csv,1,love,anger
Can we do it with a one-liner in awk or another Unix tool?
Thanks a lot in advance
another one
$ awk '{print FILENAME (NF?",":"") $0}' file
It's just as simple as: awk '{if($0) printf("%s,%s\n", FILENAME, $0); else print FILENAME;}' file.csv
where file.csv is input file name.
UPD: I modified the command, adding a condition so that empty lines are handled correctly.

unix search a pattern on one column and remove those lines

I am trying to search one column for a particular pattern, drop the rows that match, and create a new file without those rows.
Sample Data:
col1|col2|col3|col4
abc|test123|demo|test
def|test345|exam|write
ghf|456|test|account
ijk|789|travel|destination
Expected Output:
col1|col2|col3|col4
ghf|456|test|account
ijk|789|travel|destination
I want to search for the pattern "test" in the 2nd column, remove the matching rows, and write the remaining rows to a new file as shown in the expected output.
The file is delimited by "|".
awk -F"|" '{if(index($2,"test")==0) printf "%s\n", $0}' test > test_out
Here test is the original file and test_out is the resulting file.
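The same filter can be written more compactly, because a bare awk condition prints every line for which it is true. A minimal sketch, assuming the same "|" delimiter and a literal (not regex) match on the 2nd field; filter_rows is a hypothetical helper name:

```shell
# Print only the rows whose 2nd field does not contain the literal
# string "test". The header survives because "col2" does not contain it.
filter_rows() {
    awk -F'|' 'index($2, "test") == 0' "$1"
}
```

Usage would be filter_rows test > test_out, which reads the original file and leaves it untouched.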

Loading multiple CSV files with SQLite

I'm using SQLite, and I need to load hundreds of CSV files into one table. I couldn't find a way to do this on the web. Is it possible?
Please note that I started out with Oracle, but since Oracle has a limit of 1000 columns per table and my CSV files have more than 1500 columns each, I had to find another solution. I want to try SQLite, since I can install it quickly and easily.
These CSV files were supplied with that number of columns and I can't change or split them (never mind why).
Please advise.
I ran into a similar problem, and the comments on your question gave me the answer that finally worked for me.
Step 1: merge the multiple CSV files into a single file. Drop the header line from all of them, but write the header from one of them at the top.
Step 2: Load the single merged csv into SQLite.
For step 1 I used:
$ head -1 one.csv > all_combined.csv
$ tail -n +2 -q *.csv >> all_combined.csv
The first command writes only the first line of one CSV file (any of them will do). The second command writes every file from line 2 onward, which excludes the headers. The -q option makes sure that tail never prints the file name as a header.
Make sure to put all_combined.csv in a separate folder. Otherwise the *.csv pattern matches it as well, and tail ends up reading the file it is appending to.
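Step 1 can also be done in a single awk pass, which keeps the header of the first file and skips the first line of every later file. This is a minimal sketch, assuming every file starts with the same one-line header; merge_csv is a hypothetical helper name, and the output path is expected to lie outside the set of input files.

```shell
# merge_csv OUT FILE...: concatenate CSV files, keeping only the first header.
merge_csv() {
    out=$1
    shift
    [ "$#" -gt 0 ] || return 1    # without inputs awk would wait on stdin
    # FNR restarts at 1 for each file while NR keeps counting, so
    # "FNR == 1 && NR != 1" is true for every header except the first.
    awk 'FNR == 1 && NR != 1 { next } { print }' "$@" > "$out"
}
```

For example, merge_csv ../all_combined.csv *.csv writes the merged file one directory up, so the glob cannot pick it up.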
To load into SQLite (Step 2) the answer given by Hot Licks worked for me:
sqlite> .mode csv
sqlite> .import all_combined.csv my_new_table
This assumes that my_new_table does not exist yet. Alternatively, you can create the table beforehand and then load, but in that case leave out the header in Step 1.
http://www.sqlite.org/cli.html --
Use the ".import" command to import CSV (comma separated value) data into an SQLite table. The ".import" command takes two arguments which are the name of the disk file from which CSV data is to be read and the name of the SQLite table into which the CSV data is to be inserted.
Note that it is important to set the "mode" to "csv" before running the ".import" command. This is necessary to prevent the command-line shell from trying to interpret the input file text as some other format.
sqlite> .mode csv
sqlite> .import C:/work/somedata.csv tab1
There are two cases to consider: (1) Table "tab1" does not previously exist and (2) table "tab1" does already exist.
In the first case, when the table does not previously exist, the table is automatically created and the content of the first row of the input CSV file is used to determine the name of all the columns in the table. In other words, if the table does not previously exist, the first row of the CSV file is interpreted to be column names and the actual data starts on the second row of the CSV file.
For the second case, when the table already exists, every row of the CSV file, including the first row, is assumed to be actual content. If the CSV file contains an initial row of column labels, that row will be read as data and inserted into the table. To avoid this, make sure that table does not previously exist.
Note that you need to make sure that the files DO NOT have an initial line defining the field names. And, for "hundreds" of files you will probably want to prepare a script rather than typing in each file individually.
I didn't find a nicer way to solve this so I used find along with xargs to avoid creating a huge intermediate .csv file:
find . -type f -name '*.csv' | xargs -I% sqlite3 database.db ".mode csv" ".import % new_table" ".exit"
find prints out the file names and the -I% parameter to xargs causes the command after it to be run once for each line, with % replaced by a name of a csv file.
You can use DB Browser for SQLite to do this pretty easily.
File > Import > Table from CSV file... and then select all the files to open them together into a single table.
I just tested this out with a dozen CSV files and got a single 1 GB table from them without any work. As long as they have the same schema, DB Browser is able to put them together. You'll want to keep the 'Column Names in first line' option checked.

Subsetting a file into multiple files based on a value in the last two positions of the record in the file using powershell

I want to split a file into multiple txt files based on the value in the last two positions of each record, using PowerShell. The source file comes from an IBM z/OS machine and has no file extension. Currently I use an awk command to split it on the last two characters of each record, like this:
awk '{ print > ("file" substr($0, length($0) - 1) ".txt") }' RAW
The file name is RAW, and the command creates one output file per distinct value of the last two characters of a record. So if AA is the value in the last two positions of a record, I get an output file like fileAA.txt. How can I achieve this in PowerShell?
Thanks
You could try something like:
Get-Content RAW | %{ $fn="file"+$_.Substring($_.length-2)+".txt"; $_ | Out-File $fn -Append; }

To replace the first character of the last line of a unix file with the file name

We need a shell script that goes through all txt files in the current directory and checks, for each file, whether it is empty or contains data (which I believe can be done with the wc command).
Empty files should be ignored. In our case, every txt file in this directory is either empty or contains a large amount of data, with a last line like this:
Z|11|21||||||||||
That is the last line has the character Z then | then an integer then | then an integer then certain numbers of | symbols.
If the file is not empty, then we just assume it to have this format. Data before the last line are garbled and not necessary for us but there will be at least one line before the last line, i.e. there will be at least two lines guaranteed if the file is non-empty.
We need code that, for each non-empty file, replaces the 'Z' in the last line with 'filename.txt' and writes the new data into another file, say tempfile. The last line thus becomes:
filename.txt|11|21|||||||
The rest of the line stays the same. From the tempfile, the last line, i.e., filename.txt|int|int|||||, is taken out and appended to a finalfile. The contents of tempfile are then cleared to receive data from the next filename.txt in the same directory. finalfile ends up holding the edited last lines of all non-empty txt files in that directory.
Eg: file1.txt has data as
....
....
....
Z|1|1|||||
and file2.txt has data as
....
....
....
Z|2|34|||||
After running the script, new data of file1.txt becomes
.....
.....
.....
file1.txt|1|1||||||
This will be written into a new file say temp.txt which is initially empty. From there the last line is merged into a file final.txt. So, the data in final.txt is:
file1.txt|1|1||||||
After this merging, the data in temp.txt is cleared
New data of file2.txt becomes
...
...
...
file2.txt|2|34||||||
This will be written into the same file temp.txt. From there the last line is merged into the same file final.txt.
So, the data in final.txt is
file1.txt|1|1||||||
file2.txt|2|34||||||
After processing all N non-empty txt files in the same directory, the data in final.txt becomes
file1.txt|1|1||||||
file2.txt|2|34||||||
file3.txt|8|3||||||
.......
.......
.......
fileN.txt|22|3|||||
For some of the conditions, I already know the command, like
For finding files in a directory of type text,
find <directory> -type f -name "*.txt"
For taking the last line and merging it into another file
tail -1 file.txt>>destination.txt
You can use sed to replace the "Z" character. You'll be in a loop, so you can use the file name that the loop gives you. The script below replaces the Z in the last line with the file name and echoes the resulting line.
Good luck.
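#!/bin/bash
filename=test.txt
# Take the last line and replace its leading Z with the file name.
# Assumes the name contains no "/" or "&", which are special to sed here.
line=$(tail -n 1 "$filename" | sed "s/^Z/$filename/")
echo "$line"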
Edit:
Did you run your find command first and look at the output? It has, of course, a ./ at the start of each line. That will break sed, since the sed command uses / as its delimiter. It also does not fit your problem statement, which has no extra "/" before the file name. You said current directory, but the command you give will traverse ALL subdirectories. Try keeping it simple and using ls.
# `2>/dev/null` puts stderr to null, instead of writing to screen. this stops
# us getting the "no files found" (error) and thinking it's a file!
for filename in `ls *.txt 2>/dev/null` ; do
... stuff ...
done
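Putting the pieces together, the whole task might look like the sketch below. It is not the answer's exact method: it uses a plain *.txt glob instead of parsing ls output, awk instead of sed so that characters in the file name need no escaping, and it skips the intermediate temp file by appending each edited last line directly. It assumes the output file is given as a bare name in the current directory; collect_last_lines is a hypothetical name.

```shell
#!/bin/sh
# collect_last_lines OUT: for every non-empty *.txt file in the current
# directory, take its last line, replace the first character (the "Z")
# with the file name, and append the result to OUT.
collect_last_lines() {
    out=$1
    : > "$out"                        # start from an empty output file
    for f in *.txt; do
        [ -s "$f" ] || continue       # skip empty files and an unmatched glob
        if [ "$f" = "$out" ]; then    # do not read our own output file
            continue
        fi
        # ENVIRON hands the name to awk without any escape processing.
        tail -n 1 "$f" |
            name=$f awk '{ print ENVIRON["name"] substr($0, 2) }' >> "$out"
    done
}
```

Running collect_last_lines final.txt in the directory from the question would leave one line per non-empty file in final.txt, each starting with that file's name.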

Resources