I have a file in unix, say as below:
$ cat file
1.this is the test file.
The file have data related to recent survey.
The requirement is to cut the first two characters from the 1st line only and print the rest of the file's content as it is. I have tried the cut command for this:
$ cut -c 3- file
this is the test file.
e file have data related to recent survey.
but it removes the 2 characters from every line. Is there any way to apply the cut only to the first line, or any other command we can use to get the required result? (Also, the first two characters to remove could be anything, not necessarily 1.)
Using shell utilities head, tail and cut:
head -n1 file | cut -c3-; tail -n+2 file
or a sed one-liner:
sed '1s/..//' file
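Run against the sample file above, either one-liner produces the required output:

$ sed '1s/..//' file
this is the test file.
The file have data related to recent survey.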
I'm searching through text files using grep and sed commands and I also want the file names displayed before my results. However, I'm trying to remove part of the file name when it is displayed.
The file names are formatted like this: aja_EPL_1999_03_01.txt
I want to have only the date without the beginning letters and without the .txt extension.
I've been searching for an answer, and it seems it's possible to do that with a sed or grep command, using something like this with lookbehind and lookahead to extract the text between _ and .txt:
(?<=_)\d+(?=\.)
But I must be doing something wrong, because it hasn't worked for me and I possibly have to add something as well, so that it doesn't extract only the first number, but the whole date. Thanks in advance.
Edit: Adding also the working command I've used just in case. I imagine whatever command is needed would have to go at the beginning?
sed '/^$/d' *.txt | grep -P '(^([A-ZÖÄÜÕŠŽ].*)?[Pp][Aa][Ll]{2}.*[^\.]$)' *.txt --colour -A 1
The results look like this:
aja_EPL_1999_03_02.txt:PALLILENNUD : korraga üritavad ümbermaailmalendu kaks meeskonda
A desired output would be this:
1999_03_02:PALLILENNUD : korraga üritavad ümbermaailmalendu kaks meeskonda
First off, you might want to think about your regular expression. While you say the one you have works, I wonder if it could be simplified. You told us:
(^([A-ZÖÄÜÕŠŽ].*)?[Pp][Aa][Ll]{2}.*[^\.]$)
It looks to me as if this is intended to match lines that start with a case insensitive "PALL", possibly preceded by any number of other characters that start with a capital letter, and that lines must not end in a backslash or a dot. So valid lines might be any of:
PALLILENNUD : korraga üritavad etc etc
Õlu on kena. Do I have appalling speling?
Peeter Pall is a limnologist at EMU!
If you'd care to narrow down this description a little and perhaps provide some examples of lines that should be matched or skipped, we may be able to do better. For instance, your outer parentheses are probably unnecessary.
Now, let's clarify what your pipe isn't doing.
sed '/^$/d' *.txt
This reads all your .txt files as an input stream, deletes any empty lines, and prints the output to stdout.
grep -P 'regex' *.txt --otheroptions
This reads all your .txt files, and prints any lines that match regex. It does not read stdin.
So, in the command line you're using right now, your sed command is utterly ignored, as its output is not being read by grep. You COULD instruct grep to read from both files and stdin:
$ echo "hello" > x.txt
$ echo "world" | grep "o" x.txt -
x.txt:hello
(standard input):world
But that's not what you're doing.
By default, when grep reads from multiple files, it precedes each match with the name of the file from which that match originated. That's also what you're seeing in my example above -- two inputs, one x.txt and the other - a.k.a. stdin, each separated by a colon from the match it supplied.
While grep does include the most minuscule capability for filtering (with -o, or GNU grep's \K with optional Perl-compatible REs), it does NOT provide you with any options for formatting the filename. Since you can't control grep's output format, you're limited to either parsing the output you've got, or using some other tool.
Parsing is easy, if your filenames are predictably structured as they seem to be from the two examples you've provided.
For this, we can ignore that these lines contain a filename and data. For the purpose of the filter, they are a stream which follows a pattern. It looks like you want to strip off all characters from the beginning of each line up to but not including the first digit. You can do this by piping through sed:
sed 's/^[^0-9]*//'
Or you can achieve the same effect by using grep's minimal filtering to return every match starting from the first digit:
grep -o '[0-9].*'
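For instance, applied to your sample output line, with a second substitution appended to also drop the .txt (an extra step beyond the filter above, since your desired output omits the extension):

$ echo 'aja_EPL_1999_03_02.txt:PALLILENNUD : korraga üritavad ümbermaailmalendu kaks meeskonda' | sed 's/^[^0-9]*//; s/\.txt:/:/'
1999_03_02:PALLILENNUD : korraga üritavad ümbermaailmalendu kaks meeskonda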
If this kind of pipe-fitting is not to your liking, you may want to replace your entire grep with something in awk that combines functionality:
$ awk '
/[\\.]$/ {next}             # skip lines ending in backslash or dot
/^([A-ZÖÄÜÕŠŽ].*)?PALL/ { # lines to match
f=FILENAME
sub(/^[^0-9]*/,"",f) # strip unwanted part of filename, like sed
printf "%s:%s\n", f, $0
getline # simulate the "-A 1" from grep
printf "%s:%s\n", f, $0
}' *.txt
Note that I haven't tested this, because I don't have your data to work with.
Also, awk doesn't include any of the fancy terminal-dependent colourization that GNU grep provides through the --colour option.
I have a text file containing around 500 lines. Each line is an absolute path to a file. I want to delete these files using a script.
There's a suggestion here but my files have spaces in them. They have been treated with \ to escape the space, but it still doesn't work. There is discussion in that thread about problems with whitespace, but no solutions.
I can't simply use the find command as that won't give me the precise result, I need to use the list (which was created by running find and editing out the discrepancies).
Edit: some context. I noticed that iTunes has re-downloaded and copied multiple songs and put them in the same directory as the original songs, e.g., inside a particular album directory is '01 This Song.aac' and '01 This Song 1.aac'.
I ran a find to produce a text file with all songs matching "* 1.*" to get songs ending in 1 but of any file type. I ran this in my iTunes Media/Music directory.
Some of these songs included in the file had the number 1 in but weren't actually duplicates (victims of circumstance), so I manually deleted them.
The file I am left with is around 500 lines with songs all including spaces in the filenames. Because it's an iTunes issue, there are just a few songs in one directory, then more in another, then another, and so on -- I can't just run a script on a single directory, it has to work recursively and run only on the files named in my list.txt
As you would expect, the trick is to get the quoting right:
while IFS= read -r line; do rm -- "$line"; done < filename
(IFS= and -r keep leading whitespace and backslashes in the names intact; -- guards against names starting with a dash.)
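If the names in list.txt still contain the backslash escapes you mentioned (e.g. 01\ This\ Song 1.aac), strip them before deleting, since the quoted "$line" must be the literal path. A sketch, assuming backslashes appear only as escape characters:

sed 's/\\//g' list.txt | while IFS= read -r line; do rm -- "$line"; done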
To remove a file whose name has spaces, you can just wrap the whole path in quotes.
And to delete the list of files, I would recommend changing each line of your file so that it looks like an rm call. The fastest way is to use sed. So if your file is in the following format:
/home/path/file name.asd
/opt/some/string/another name.wasd
...
The one-liner for that would be something like this:
sed -e 's/^/rm -f "/' file.txt | sed -e 's/$/" ;/' > newfile.sh
The first sed replaces the beginning of each line with rm -f ", and the second sed replaces the end of each line with " ;.
It would produce file with following content:
rm -rf "/home/path/file name.asd" ;
rm -rf "/opt/some/string/another name.wasd" ;
...
So you can just execute this file as a bash script.
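The two sed invocations can also be combined into a single command, producing the same script in one pass:

sed 's/^/rm -f "/; s/$/" ;/' file.txt > newfile.sh
bash newfile.sh

Either way, note that this approach breaks if a filename itself contains a double quote or a $ (the shell expands those inside double quotes); the while read loop above doesn't have that problem.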
I have strings in the following formats:
WATSON_AJAY_AB04_DOTHING.data
WATSON_NAVNEET_CK4_DOTHING.data
WATSON_PRASHANTH_KJ56_DOTHING.data
WATSON_ABHINAV_KD323_DOTHING.data
From the above strings, how can I extract
AB04, CK4, KJ56, KD323
in Unix?
echo "$string" | cut -d'_' -f3
You could use sed or grep for this task, but since the string is so simple, I don't think you will need to.
One method is to use the 'cut' utility. Below is an example directly on the BASH shell/command line:
jimm@pi$ string='WATSON_AJAY_AB04_DOTHING.data'
jimm@pi$ cut -d '_' -f 3 <<< "$string"
AB04 <-- outputs the result directly
(edit: of course Lucas' answer above is also a quick 'one-liner' that does the same thing as above - he beat me to it) :)
The cut will take an _ character as the delimiter (the -d '_' part), then display the 3rd slice of the string (the -f 3 part).
Or, if you want to output that 3rd slice from a list of content (using your list above), you can write a simple BASH script.
First, save the lines above ('WATSON...etc') into something like text.txt. Then open up your favorite text editor and type:
#!/bin/sh
cut -d '_' -f 3 < "$1"
Save that script to some useful name like slice.sh, and make sure it is executable with something like chmod 775 slice.sh.
Then at the command line you can execute the script against your text file, and immediately get an output of those parts of the file you want (in this case the third set of text, separated by the _ character):
$ ./slice.sh text.txt
AB04
CK4
KJ56
KD323
Hope that helps! Bear in mind that the commands above may vary a bit, depending on the flavor of *nix you are using, but it should at least point you in the right direction.
I'm using Terminal on OS X 10.x. I have some data files of the format:
mbh5.0_mrg4.54545454545_period0.000722172513951.params.dat
mbh5.0_mrg4.54545454545_period0.00077271543854.params.dat
mbh5.0_mrg4.59090909091_period-0.000355232058085.params.dat
mbh5.0_mrg4.59090909091_period-0.000402015664015.params.dat
I know that there will be some files with similar numbers after mbh and mrg, but I won't know ahead of time what the numbers will be or how many similarly numbered ones there will be. My goal is to cat all the data from all the files with similar numbers after mbh and mrg into one data file. So from the above I would want to do something like...
cat mbh5.0_mrg4.54545454545*dat > mbh5.0_mrg4.54545454545.dat
cat mbh5.0_mrg4.5909090909*dat > mbh5.0_mrg4.5909090909.dat
I want to automate this process because there will be many such files.
What would be the best way to do this? I've been looking into sed, but I don't have a solution yet.
for file in *.params.dat; do
prefix=${file%_*}
cat "$file" >> "$prefix.dat"
done
This part, ${file%_*}, removes the last underscore and the following text from the end of $file and saves the result in the prefix variable. (Ref: http://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion)
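For example, with the first sample filename from the question:

$ file='mbh5.0_mrg4.54545454545_period0.000722172513951.params.dat'
$ echo "${file%_*}.dat"
mbh5.0_mrg4.54545454545.dat

Every file sharing that prefix is therefore appended to the same output file.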
It's not 100% clear to me what you're trying to achieve here but if you want to aggregate files into a file with the same number after "mbh5.0_mrg4." then you can do the following.
ls -l mbh5.0_mrg4* | awk '{print "cat " $9 " >> mbh5.0_mrg4." substr($9,13,11) ".dat" }' | /bin/bash
The ls -l lists the files, and the awk takes the 9th column (the filename) from the result of the ls. With some string concatenation, the result is passed to /bin/bash to be executed. Note the use of >> so that files sharing a prefix are appended to the same output file rather than overwriting it.
This is a Linux bash script, so it assumes you have /bin/bash; I'm not 100% familiar with OS X. The script also assumes that the number you're grouping on is always in the same place in the filename. I think you can change /bin/bash to almost any shell you have installed.
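For the sample filenames above, the awk stage emits (and /bin/bash then runs) commands like:

cat mbh5.0_mrg4.54545454545_period0.000722172513951.params.dat >> mbh5.0_mrg4.54545454545.dat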
I am looking for a unix command to get a single line by passing line number to a big file (with around 5 million records). For example to get 10th line, I want to do something like
command file-name 10
Is there any such command available? We could do this by looping through each record, but that would be a time-consuming process.
This forum entry suggests:
sed -n '52p' file
for printing the 52nd line of a file.
Going further, there are a lot of ways to do this, and other related tricks.
If you want multiple lines to be printed,
sed -n -e 'Np' -e 'Mp'
where N and M are the numbers of the only lines to be printed. Refer to 10 Awesome Examples for Viewing Huge Log Files in Unix.
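For example, to print only lines 10 and 20:

sed -n -e '10p' -e '20p' file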
command | sed -n '10p'
or
sed -n '10p' file
You could do something like:
head -n<lineno> <file> | tail -n1
That gives you the first <lineno> lines of the file, then keeps only the last line of that output (your line).
Edit: It seems all the solutions here are pretty slow. However, by definition you'll have to iterate through the records, since the operating system keeps no line index for byte-oriented files. (In some sense, all these programs are going to do is count the number of \n or \r characters.) In lieu of a great answer, I'll also present the timings on my system of several of these commands!
[mjschultz@mawdryn ~]$ time sed -n '145430980p' br.txt
0b10010011111111010001101111010111
real 0m25.871s
user 0m17.315s
sys 0m2.360s
[mjschultz@mawdryn ~]$ time head -n 145430980 br.txt | tail -n1
0b10010011111111010001101111010111
real 0m41.112s
user 0m39.385s
sys 0m4.291s
[mjschultz@mawdryn ~]$ time awk 'NR==145430980{print;exit}' br.txt
0b10010011111111010001101111010111
real 2m8.835s
user 1m38.076s
sys 0m3.337s
So, on my system, it looks like the sed -n '<lineno>p' <file> solution is fastest!
You can use awk:
awk 'NR==10{print;exit}' file
Put an exit after printing the 10th line so that awk won't process the 5-million-record file any further.
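The same early-exit trick applies to sed: telling it to quit right after printing means the rest of the file is never read, which matters on a 5-million-line file when the target line is near the start:

sed -n '10{p;q}' file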