Unix command to get the count of lines in a CSV file

Hi, I am new to UNIX and I have to get the count of lines from incoming CSV files. I have used the following command to get the count:
wc -l filename.csv
Consider files coming in with one record: I am getting some files with * at the start, and for those files, if I issue the same command, I get a count of 0. Does * mean anything here? Also, if I get a file with Ctrl-M (CR) line endings instead of NL, how do I get the count of lines in that file? Please suggest a command that solves the issue.
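For the Ctrl-M (CR) case specifically, one approach (a hedged sketch, not taken from the answers below) is to translate carriage returns into newlines before counting, since wc -l only counts LF characters:
tr '\r' '\n' < filename.csv | wc -l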

The following command helps you get the count:
cat FILE_NAME | wc -l

All of the answers are wrong: CSV files can contain line breaks inside quoted fields, and those should still be counted as part of the same record. If you have either Python or PHP on your machine, you can do something like this:
Python
# From stdin
cat *.csv | python -c "import csv; import sys; print(sum(1 for i in csv.reader(sys.stdin)))"
# From file name
python -c "import csv; print(sum(1 for i in csv.reader(open('csv.csv'))))"
PHP
# From stdin
cat *.csv | php -r 'for($i=0; fgetcsv(STDIN); $i++);echo "$i\n";'
# From file name
php -r 'for($i=0, $fp=fopen("csv.csv", "r"); fgetcsv($fp); $i++);echo "$i\n";'
I have also created a script to simulate the output of wc -l: https://github.com/dhulke/CSVCount
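To illustrate the point, here is a hedged example with a made-up two-record file (the /tmp path is an assumption):
# create a 2-record CSV where one quoted field contains an embedded newline
printf 'id,comment\n1,"line one\nline two"\n' > /tmp/sample.csv
wc -l < /tmp/sample.csv    # prints 3 (physical lines)
python -c "import csv; print(sum(1 for i in csv.reader(open('/tmp/sample.csv'))))"    # prints 2 (CSV records)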

In case you have multiple .csv files in the same folder, use
cat *.csv | wc -l
to get the total number of lines in all CSV files in the current directory. Note that -c counts bytes and -m counts characters (identical as long as the files are plain ASCII). You can also use wc to count the number of files in a directory, e.g.: ls | wc -l
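If you also want per-file counts alongside the total, wc reports both when given the file names directly (a hedged sketch; the file names and numbers are illustrative):
wc -l *.csv
#  120 a.csv
#   75 b.csv
#  195 total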

wc -l mytextfile
Or to only output the number of lines:
wc -l < mytextfile
Usage: wc [OPTION]... [FILE]...
or: wc [OPTION]... --files0-from=F
Print newline, word, and byte counts for each FILE, and a total line if
more than one FILE is specified. With no FILE, or when FILE is -,
read standard input.
-c, --bytes print the byte counts
-m, --chars print the character counts
-l, --lines print the newline counts
--files0-from=F read input from the files specified by
NUL-terminated names in file F;
If F is - then read names from standard input
-L, --max-line-length print the length of the longest line
-w, --words print the word counts
--help display this help and exit
--version output version information and exit
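As a quick illustration of the difference between the two forms above (the count of 42 is illustrative):
wc -l mytextfile      # prints: 42 mytextfile
wc -l < mytextfile    # prints: 42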

You can also use xsv for this. It supports many other subcommands that are useful for CSV files.
xsv count file.csv

echo $(wc -l file_name.csv|awk '{print $1}')

Related

Using grep to get the count of files where keyword exist

I am trying to get the count of files which contain a matching keyword in a directory. The code I used is:
grep -r -i --include=\*.sas 'keyword'
Can anyone help me figure out how to get the count of the files which contain the keyword?
Thanks
You will need to do two things. The first is to suppress normal output from grep and print only the file name with -l. The second is to pipe the output through to wc -l to get the count of the lines, hence the count of the files.
grep -ril "keyword" --include="*.sas" * | wc -l

Tar running log file unix

I have a huge log file. I know I can tar it at the end, but I want the file to get zipped after every 10K lines, while also ensuring that no data is lost.
The final goal is to stop the file from growing indefinitely and to keep it under a specific size limit.
Just some sample code:
sh script.sh > log1.log &
Now, I want to keep zipping log1.log so that it never crosses a specific size limit.
Regards,
Abhay
Let the file be file.txt; then you can do:
x=$(wc -l file.txt | cut -f 1 -d " ")
if [[ $x -gt 10000 ]]
then
sed '1,10000d' file.txt > file2.txt
fi
After that, just zip file2.txt and remove it.
Consider using the split command. It can split by lines, bytes, pattern, etc.
split -l 10000 log1.log `date "+%Y%m%d-%H%M%S-"`
This will split the file named "log1.log" into one or more files. Each file will contain no more than 10,000 lines. These files will be named something like 20180327-085711-aa, 20180327-085711-ab, etc. You can use split's -a argument for really large log files so that it will use more than two characters in the file suffix.
The tricky part is that your shell script is still writing to the file. After the contents are split, the log must be truncated. Note that there is a small time slice between splitting the file and truncating it, so some logging data might be lost.
This example splits into 50,000 line files:
$ wc log.text
528193 1237600 10371201 log.text
$ split -l 50000 log.text `date "+%Y%m%d-%H%M%S-"` && cat /dev/null > log.text
$ ls
20180327-090530-aa 20180327-090530-ae 20180327-090530-ai
20180327-090530-ab 20180327-090530-af 20180327-090530-aj
20180327-090530-ac 20180327-090530-ag 20180327-090530-ak
20180327-090530-ad 20180327-090530-ah log.text
$ wc 20180327-090530-aa
50000 117220 982777 20180327-090530-aa
If you only want to truncate the file if it reaches a certain size (number of lines), wrap this split command in a shell script that gets run periodically (such as through cron). Here's an example of checking file size:
if (( `wc -l < log.text` > 1000000 ))
then
echo time to truncate
fi
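Putting the pieces together, here is a minimal sketch of such a periodic script; the log name, the 10,000-line limit, and the gzip step are assumptions based on the question:
#!/bin/bash
LOG=log1.log
LIMIT=10000

if (( $(wc -l < "$LOG") > LIMIT )); then
    prefix=$(date "+%Y%m%d-%H%M%S-")
    # split the current contents into LIMIT-line chunks, then truncate the log;
    # lines written between the split and the truncate can be lost
    split -l "$LIMIT" "$LOG" "$prefix"
    : > "$LOG"
    # optionally compress the finished chunks
    gzip "$prefix"*
fi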

Unix - How do you pass / pipe the output of a sed command to an ls command

I am able to determine the size of an input file using the following command :
ls -l employee.txt | awk '{print $5}'
This will print only the file size.
And I am able to print the contents of a file, ignoring its first and last lines (the header and trailer). The command below does this for me:
sed '1d;$d' employee.txt
But how do I combine these two commands so that they determine the size of the file while ignoring the header and trailer? At the same time, the header and trailer should not be removed from the input file.
I am able to achieve this with two statements: one to copy the full file, minus the header and trailer, into a new file, and then an ls on the new file, as below:
sed '1d;$d' employee.txt > employee1.txt
ls -l employee1.txt
I tried to do it in a single statement, as below, but to no avail. Any input would be helpful.
sed '1d;$d' employee.txt | ls -l employee.txt
ls -l `sed '1d;$d' employee.txt`
sed '1d;$d' employee.txt |xargs ls -l $1
Don't parse ls to get the file size. To get the size of a file on disk, use stat -c '%s' filename, or to get the size of a stream of characters, use wc -c.
# size of employee.txt
stat -c '%s' employee.txt
# size without header and footer
sed '1d;$d' employee.txt | wc -c
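If you want it as a single statement, as the question asks, here is a hedged one-liner combining the two (file name taken from the question):
echo "Size without header and trailer: $(sed '1d;$d' employee.txt | wc -c) bytes"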
You can do it with
cat someFilename.txt | wc -c
The first command outputs the content of the file and the second one counts the size of that input.
This does not remove the header and the trailer; for that you can use the sed command as before.
I think you're approaching this in the wrong way.
My suggestion: query the total file size using the "stat" command:
stat --format=%s employee.txt
You can store the result in a variable like this (assuming you're running a Bash shell):
fileSize="$(stat --format=%s employee.txt)"
Now you retrieve the first line (assuming Bash shell again):
firstLine="$(head -n 1 employee.txt)"
Next you retrieve the last line (assuming Bash shell again):
lastLine="$(tail -n 1 employee.txt)"
Now compute the size of the first and the last line and subtract it from the file size:
echo "Total adjusted size is:" $((fileSize - ${#firstLine} - ${#lastLine}))
Note that ${#firstLine} and ${#lastLine} do not include the trailing newlines, so this figure can come out a couple of bytes larger than the sed '1d;$d' ... | wc -c result.

Unix: How can I count all lines containing a string in all files in a directory and see the output for each file separately

In UNIX I can do the following:
grep -o 'string' myFile.txt | wc -l
which will count the number of lines in myFile.txt containing the string.
Or I can use :
grep -o 'string' *.txt | wc -l
which will count the number of lines in all .txt extension files in my folder containing the string.
I am looking for a way to do the count for all files in the folder, but with the output shown separately for each file, something like:
myFile.txt 10000
myFile2.txt 20000
myFile3.txt 30000
I hope I have made myself clear; if not, you can see a somewhat similar example in the output of:
wc -l *.txt
Why not simply use grep -c, which counts matching lines? According to the GNU grep manual it's even in POSIX, so it should work pretty much anywhere.
Incidentally, your use of -o makes your commands count every occurrence of the string, not every line with any occurrences:
$ cat > testfile
hello hello
goodbye
$ grep -o hello testfile
hello
hello
And you're doing a regular expression search, which may differ from a string search (see the -F flag for string searching).
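For reference, a hedged sketch of what the per-file output looks like with grep -c (the file names and counts are illustrative; the search string is the one from the question):
grep -c 'string' *.txt
# myFile.txt:10000
# myFile2.txt:20000
# myFile3.txt:30000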
Use a loop over all files, something like
for f in *.txt; do echo -n $f $'\t'; grep 'string' "$f" | wc -l; done
But I must admit that @Yann's grep -c is neater :-). The loop can be useful for more complicated things, though.

Is there any way to extract only one file (or files matching a regular expression) from a tar file

I have a tar.gz file.
Because of space issues, and because a full extract takes too long, I need to extract only the selected files.
I have tried the below:
grep -l '<text>' *
file1
file2
Only file1 and file2 should be extracted.
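As a side note (a hedged sketch; GNU tar and the archive name archive.tar.gz are assumptions), tar can extract several named members in one go, or members matching a shell pattern via --wildcards, which is handy when more than one file should come out:
tar -xzf archive.tar.gz file1 file2
tar --wildcards -xzf archive.tar.gz 'file*'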
You can do this with the --extract option, like this:
tar --extract --file=test.tar.gz main.c
Here in --file, specify the .tar.gz filename, and at the end specify the name of the file you want to extract.

What should I do to save all the tail -f data to a file swa3?

I have swa1.out, which has a list of online data inputs.
swa2 is a file with keywords that should be filtered out of swa1.
swa3 is the file where the data should be written.
Can anyone help with this?
I have tried the below command, but I'm not able to get it to work:
tail -f SWA1.out | grep -vf SWA2 >> swa3
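A hedged note on the tail -f attempt above: when grep writes to a file instead of a terminal it block-buffers its output, so matches may not show up in swa3 for a long while. GNU grep's --line-buffered flag flushes each line as it is written (file names as in the question):
tail -f SWA1.out | grep --line-buffered -v -f SWA2 >> swa3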
