Counting the number of lines starting with a specific character in UNIX

How do I count the number of lines starting with "N" in a fastq file in UNIX? I have tried sed but I am not getting the expected output.

Try:
sed -n '/^N/p' file.txt | wc -l

I don't know the layout of a fastq file, but can you use grep?
grep -c "^N" file.txt
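For a quick sanity check, here is a throwaway sample file (the data below is made up) showing that both approaches, plus an awk equivalent, agree:

```shell
# Small made-up sample; two of the four lines start with N
printf 'NACGT\nACGTN\nNNTTA\nGGCC\n' > sample.txt

grep -c '^N' sample.txt                            # 2
sed -n '/^N/p' sample.txt | wc -l                  # 2
awk '/^N/ { n++ } END { print n+0 }' sample.txt    # 2
```

grep -c is the most direct of the three, since counting matching lines is exactly what it does.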

Related

Extract text from variable in netCDF file using ncks

I am trying to extract the variable "flash_lon" from a file and output to a text file in plain text - using ncks.
When I use the following command, it displays the variables I need on screen and outputs to a file.
ncks -v flash_lon -x file.nc output.txt
However, the file is not in readable text. In the documentation for ncks, it says that "ncks will print netCDF data in ASCII format".
What do I need to do in order to simply extract the variable to text? It is just text. I have attached an image below showing the data in the command line working, surely there must be a way to get it to output. I am on Windows 10.
If you have ncdump and sed you can output just the data, like this:
ncdump -v flash_lon file.nc | sed -e '1,/data:/d' -e '$d' > output.txt
A solution I use frequently and found here:
https://www.unidata.ucar.edu/mailing_lists/archives/netcdfgroup/2011/msg00317.html
If you don't want even the first lines with the variable name, you can cut those with tail:
ncdump -v flash_lon file.nc | sed -e '1,/data:/d' -e '$d' | tail -n +3 > output.txt
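To see what the sed expression is doing, here is a mock of ncdump-style output (the structure is simplified and the values invented; real ncdump output has more header detail):

```shell
# Mock ncdump output: header, a "data:" marker, the values, and a closing brace
cat > mock_dump.txt <<'EOF'
netcdf file {
dimensions:
        n = 3 ;
variables:
        float flash_lon(n) ;
data:

 flash_lon = -101.2, -100.8, -99.5 ;
}
EOF

# '1,/data:/d' deletes everything up to and including the "data:" line;
# '$d' drops the final "}" line
sed -e '1,/data:/d' -e '$d' mock_dump.txt > output.txt
cat output.txt
```

What remains is just the blank line and the data line, which is why the answer above then uses tail to trim the leading lines.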

Unix: How can I count all lines containing a string in all files in a directory and see the output for each file separately

In UNIX I can do the following:
grep -o 'string' myFile.txt | wc -l
which will count the number of lines in myFile.txt containing the string.
Or I can use :
grep -o 'string' *.txt | wc -l
which will count the number of lines in all .txt extension files in my folder containing the string.
I am looking for a way to do the count for all files in the folder but to see the output separated for each file, something like:
myFile.txt 10000
myFile2.txt 20000
myFile3.txt 30000
I hope I have made myself clear; if not, you can see a somewhat close example in the output of:
wc -l *.txt
Why not simply use grep -c which counts matching lines? According to the GNU grep manual it's even in POSIX, so should work pretty much anywhere.
Incidentally, your use of -o makes your commands count every occurrence of the string, not every line with any occurrences:
$ cat > testfile
hello hello
goodbye
$ grep -o hello testfile
hello
hello
And you're doing a regular expression search, which may differ from a string search (see the -F flag for string searching).
Use a loop over all files, something like
for f in *.txt; do printf '%s\t' "$f"; grep 'string' "$f" | wc -l; done
But I must admit that @Yann's grep -c is neater :-). The loop can be useful for more complicated things though.
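A quick illustration with two throwaway files (names and contents invented for the example):

```shell
# Two made-up files containing the search string
printf 'string here\nno match\nstring again\n' > a.txt
printf 'string\n' > b.txt

# With several file arguments, grep -c prints one count per file
grep -c 'string' a.txt b.txt
# a.txt:2
# b.txt:1

# Equivalent loop, handy when you need extra per-file processing
for f in a.txt b.txt; do
    printf '%s\t%s\n' "$f" "$(grep -c 'string' "$f")"
done
```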

How to delete duplicate lines in file in unix?

I can delete duplicate lines in files using the sort -u and uniq commands. Is the same possible using sed or awk?
There's a "famous" awk idiom:
awk '!seen[$0]++' file
It has to keep the unique lines in memory, but it preserves the file order.
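A quick demonstration of the idiom on a made-up file, showing that only the first occurrence of each line survives and the original order is kept:

```shell
# Throwaway input with duplicates out of order
printf 'b\na\nb\nc\na\n' > dup.txt

# seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++
# is true exactly once per distinct line
awk '!seen[$0]++' dup.txt > dedup.txt
cat dedup.txt
# b
# a
# c
```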
sort and uniq alone are enough to remove duplicates:
sort filename | uniq > filename2
If the file consists of numbers, use sort -n.
After sorting we can use this sed command
sed -E '$!N; /^(.*)\n\1$/!P; D' filename
If the file is unsorted, combine it with sort:
sort filename | sed -E '$!N; /^(.*)\n\1$/!P; D'
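Illustrating the sort + sed combination on a small made-up file (the sed expression drops each line that is identical to the one after it, which removes duplicates once the input is sorted):

```shell
# Throwaway input with unsorted duplicates
printf 'pear\napple\npear\napple\n' > fruit.txt

sort fruit.txt | sed -E '$!N; /^(.*)\n\1$/!P; D'
# apple
# pear
```

Unlike the awk idiom, this changes the line order (sorted output), but it does not need to hold all unique lines in memory.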

WC command of mac showing one less result

I have a text file which is over 60MB in size. It has entries on 5105043 lines, but when I run wc -l it reports only 5105042, one less than the actual count. Does anyone have any idea why this is happening?
Is it a common thing when the file size is large?
The last line does not end with a newline character.
One trick to get the result you want would be:
sed -n '=' <yourfile> | wc -l
This tells sed just to print the line number of each line in your file which wc then counts. There are probably better solutions, but this works.
The last line in your file is probably missing a newline ending. IIRC, wc -l merely counts the number of newline characters in the file.
If you try: cat -A file.txt | tail does your last line contain a trailing dollar sign ($)?
EDIT:
Assuming the last line in your file is lacking a newline character, you can append a newline character to correct it like this:
printf "\n" >> file.txt
The results of wc -l should now be consistent.
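The behaviour is easy to reproduce with printf, which (unlike echo) lets you control the trailing newline; the file name here is made up:

```shell
# Three lines, but no newline after the last one
printf 'line1\nline2\nline3' > nofinal.txt

wc -l < nofinal.txt                 # 2: wc -l counts newline characters
sed -n '=' nofinal.txt | wc -l      # 3: sed numbers the incomplete last line too

printf '\n' >> nofinal.txt          # append the missing newline
wc -l < nofinal.txt                 # 3
```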
60 MB is a bit big for this approach, but for smaller files another option could be:
cat -n file.txt
or, to print only the final line count:
cat -n file.txt | cut -f1 | tail -1

find and replace from command line unix

I have a multi line text file where each line has the format
..... Game #29832: ......
I want to append the character '1' to each number on each line (which is different on every line), does anyone know of a way to do this from the command line?
Thanks
sed -i -e 's/Game #[0-9]*/&1/' file
-i is for in-place editing, and & means whatever matched from the pattern. If you don't want to overwrite the file, omit the -i flag.
Using sed:
sed -e 's/\(Game #[0-9]*\)/\11/' file
sed 's/ Game #\([0-9]*\):/ Game #\11:/' yourfile.txt
GNU awk
awk '{b=gensub(/(Game #[0-9]+)/ ,"\\11","g",$0); print b }' file
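A quick check of the sed & form on made-up input in the question's format:

```shell
# Throwaway input; the numbers are invented
printf 'foo Game #29832: bar\nbaz Game #7: qux\n' > games.txt

# & stands for the whole match, so "1" is appended to each number
sed 's/Game #[0-9]*/&1/' games.txt
# foo Game #298321: bar
# baz Game #71: qux
```

Add -i (as in the first answer) once the output looks right, to edit the file in place.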
