How to delete lines from a file that start with certain words - unix

My file is a CSV file on a Unix server, and it looks like this:
"Product_Package_Map_10302017.csv","451","2017-10-30 05:02:26"
"Targeting_10302017.csv","13","2017-10-30 05:02:26",
"Targeting_Options_10302017.csv","42","2017-10-30 05:02:27"
I want to delete a particular line based on a filename keyword.

You can use grep -v:
grep -v '^"Product_Package_Map_10302017.csv"' file > file.filtered
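'^"Product_Package_Map_10302017.csv"' matches the string "Product_Package_Map_10302017.csv" (including the double quotes) at the beginning of the line. Strictly speaking, each unescaped . matches any character; write \. if that matters.
Or GNU sed can do it in place (with BSD/macOS sed, write -i '' instead of -i):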
sed -i '/^"Product_Package_Map_10302017.csv"/d' file
See this related post for other alternatives:
Delete lines in a text file that contain a specific string
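If the . in the filename must match only a literal dot, a prefix test that uses no regular expression avoids the issue entirely. A minimal sketch, assuming a POSIX shell and awk; the function name delete_lines_with_prefix is hypothetical and not part of the answers above:

```shell
#!/bin/sh
# Hypothetical helper: print every line of a file except those that
# begin with a literal prefix. No regex is used, so "." in the prefix
# matches only a dot.
delete_lines_with_prefix() {
    # The prefix reaches awk through the environment, which avoids the
    # backslash processing that "awk -v" would apply to it.
    # index() returns 1 only when the line starts with the prefix.
    prefix=$1 awk 'index($0, ENVIRON["prefix"]) != 1' "$2"
}
```

Usage would be delete_lines_with_prefix '"Product_Package_Map_10302017.csv"' file > file.filtered, then mv file.filtered file once you have checked the result.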

See this previous question. A grep-based answer would be my first choice but, as you can see, there are many ways to address this one!
(Would have just commented, but my 'rep' is not yet high enough)

Related

How to handle a file having header in between the records after removing duplicates from the file

We have a file that was processed by a Unix command to remove duplicates. After de-duplication, the new file has the header in between the records. Please help me solve this; thanks in advance for your input.
Unix command: sort -u >
I would do something like this, where file is your de-duplicated file and "headers" stands for the text of your header line:
grep "headers" file >output.txt
grep -v "headers" file >>output.txt
The idea is the following: first put the header into output.txt, then append everything that is not a header to the same file.
The single > creates output.txt (or truncates it if it already exists); the double >> appends to the existing file.
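Another way to avoid the problem is to keep the header out of the sort altogether. A minimal sketch, assuming the header is the first line of the original (not yet de-duplicated) file; dedupe_keep_header is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print the header once, followed by the unique
# data lines. Assumes line 1 of the input is the header.
dedupe_keep_header() {
    file=$1
    header=$(head -n 1 "$file")
    # Emit the header exactly as it appears on line 1.
    printf '%s\n' "$header"
    # Sort the remaining lines uniquely, dropping any repeated copies of
    # the header (-x -F: whole-line, fixed-string match). LC_ALL=C gives
    # a predictable byte-wise sort order.
    tail -n +2 "$file" | grep -v -x -F -- "$header" | LC_ALL=C sort -u
}
```

Usage: dedupe_keep_header original.csv > output.txt. Because the header never enters sort, it cannot end up between the records.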

Delete files from a list in a text file

I have a text file containing around 500 lines. Each line is an absolute path to a file. I want to delete these files using a script.
There's a suggestion here, but my files have spaces in their names. I have escaped the spaces with \, but it still doesn't work. There is discussion on that thread about problems with whitespace, but no solutions.
I can't simply use the find command, as that won't give me the precise result; I need to use the list (which was created by running find and editing out the discrepancies).
Edit: some context. I noticed that iTunes has re-downloaded and copied multiple songs and put them in the same directory as the original songs, e.g., inside a particular album directory is '01 This Song.aac' and '01 This Song 1.aac'.
I ran a find to produce a text file with all songs matching "* 1.*" to get songs ending in 1 but of any file type. I ran this in my iTunes Media/Music directory.
Some of these songs included in the file had the number 1 in but weren't actually duplicates (victims of circumstance), so I manually deleted them.
The file I am left with is around 500 lines, and every filename in it contains spaces. Because it's an iTunes issue, there are a few songs in one directory, more in another, and so on, so I can't just run a script on a single directory. It has to work recursively and act only on the files named in my list.txt.
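As you would expect, the trick is to get the quoting right. IFS= keeps leading and trailing spaces, -r keeps backslashes literal (so the list should hold plain paths without \ escapes), and -- stops rm from treating a name that starts with - as an option:
while IFS= read -r line; do rm -- "$line"; done < filename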
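A slightly more defensive version of the loop, wrapped in a hypothetical function delete_listed_files that takes the list file as its argument. It is a sketch, not a drop-in tool: rm -f suppresses errors for paths that no longer exist, which you may prefer to see.

```shell
#!/bin/sh
# Hypothetical helper: delete every file named in a list, one path per line.
delete_listed_files() {
    list=$1
    # IFS= keeps leading/trailing spaces, -r keeps backslashes literal,
    # and "|| [ -n "$path" ]" also processes a last line with no newline.
    while IFS= read -r path || [ -n "$path" ]; do
        [ -n "$path" ] || continue   # skip blank lines
        # Quoting keeps "01 This Song 1.aac" as a single argument.
        rm -f -- "$path"
    done < "$list"
}
```

Usage: delete_listed_files list.txt. It walks the list rather than any directory tree, so files in different album directories are handled in one pass.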
To remove a file whose name contains spaces, wrap the whole path in quotes.
To delete the whole list, I would rewrite each line of your file as an rm call. The quickest way is with sed. If your file is in the following format:
/home/path/file name.asd
/opt/some/string/another name.wasd
...
The one-liner would be something like this:
sed -e 's/^/rm -f "/' file.txt | sed -e 's/$/" ;/' > newfile.sh
The first sed replaces the beginning of each line with rm -f ", and the second replaces the end of each line with " ;.
It produces a file with the following content:
rm -f "/home/path/file name.asd" ;
rm -f "/opt/some/string/another name.wasd" ;
...
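You can then execute this file as a bash script (bash newfile.sh), ideally after reviewing it. Because the paths sit inside double quotes, this breaks for any path containing ", $, `, or \.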
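If the paths might contain those characters, wrapping them in single quotes is safer, with each embedded ' rewritten as '\''. A sketch under that assumption; make_rm_script is a hypothetical name, and the generated script should still be reviewed before running it:

```shell
#!/bin/sh
# Hypothetical helper: turn a list of paths into a script of rm commands.
make_rm_script() {
    # 1. drop blank lines
    # 2. turn each ' into '\'' (close quote, escaped quote, reopen quote)
    # 3. wrap each line as: rm -f -- '<path>'
    sed -e '/^$/d' -e "s/'/'\\\\''/g" -e "s/^/rm -f -- '/" -e "s/\$/'/" "$1"
}
```

Usage: make_rm_script file.txt > newfile.sh, inspect newfile.sh, then run sh newfile.sh. Inside single quotes nothing but ' is special, which is why only that one character needs rewriting.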

Unable to understand this sed usage

I have come across this usage of the Unix sed command and can't understand what it does. Could you please help me understand it? If possible, please share a reference that explains usages like this.
sed -i '/^export JAVA_HOME/ s:.*:export JAVA_HOME=/usr/java/default\nexport HADOOP_PREFIX=/usr/local/hadoop\nexport HADOOP_HOME=/usr/local/hadoop\n:' $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
The command is simple, though it assumes GNU sed because of the way it uses the -i option; for macOS Sierra and related systems, you'd need to use -i '' in place of just -i.
Overall, it corresponds to:
sed -i '/Pattern/ s:.*:Replacement:' file
where:
-i means overwrite each input file with its edited output without creating a backup copy.
/Pattern/ is ^export JAVA_HOME, which matches a line starting with the word export followed by a single space and JAVA_HOME.
s:.*:Replacement: is a substitute command, using : instead of the more conventional / (often s/.*/Replacement/) as the pattern delimiter. This is done because the replacement text contains slashes. The .* matches the whole line. The rest of the material is written in place of the original export JAVA_HOME line. The \n sequence expands to a newline (a GNU sed extension), so it actually produces several lines of output; because the replacement ends with \n, a blank line follows them.
file is $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
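To see the effect before editing anything, drop -i so the result goes to standard output instead of back into the file. A sketch assuming GNU sed (for \n in the replacement); preview_hadoop_env_edit is a hypothetical wrapper around the command from the question:

```shell
#!/bin/sh
# Hypothetical wrapper: the command from the question without -i, so the
# edited text is printed rather than written back. Assumes GNU sed.
preview_hadoop_env_edit() {
    sed '/^export JAVA_HOME/ s:.*:export JAVA_HOME=/usr/java/default\nexport HADOOP_PREFIX=/usr/local/hadoop\nexport HADOOP_HOME=/usr/local/hadoop\n:' "$1"
}
```

Running preview_hadoop_env_edit "$HADOOP_PREFIX/etc/hadoop/hadoop-env.sh" shows the three export lines, then a blank line from the trailing \n, in place of the original JAVA_HOME line; all other lines pass through unchanged.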
As others have pointed out, this is a sed command invocation. The name is short for "Stream EDitor", and it is quite useful for modifying files programmatically. Your best bet is to read the man page (man sed), but I've broken down your particular command here for instructive purposes:
sed # The command
-i # Edit file in place (no backup)
'/^export JAVA_HOME/ # For every line that begins with 'export JAVA_HOME'...
s: # substitute...
.*: # the entire line with...
export JAVA_HOME=/usr/java/default
export HADOOP_PREFIX=/usr/local/hadoop
export HADOOP_HOME=/usr/local/hadoop
:' # End of command
$HADOOP_PREFIX/etc/hadoop/hadoop-env.sh # Run on the following file
Points of interest:
Commands can be limited to a particular address range or scope. Here, the scope was a search.
The substitute command can be delimited by almost any character (usually /, but in this case : was chosen to avoid escaping every / in the file paths).
The sed expression was enclosed in ' to prevent shell expansion of variables. Although no expansions would have taken place in this scenario, it is fairly common to see the expression wrapped in ' to eliminate the possibility.
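Both points can be checked on a single line of input. This sketch runs the same substitution with three different delimiters and then shows that single quotes keep $HOME from being expanded by the shell (the sample path is made up for illustration):

```shell
#!/bin/sh
line='JAVA_HOME=/usr/lib/jvm'
# With / as the delimiter, every / inside the paths must be escaped.
with_slash=$(printf '%s\n' "$line" | sed 's/\/usr\/lib/\/usr\/java/')
# With : or | as the delimiter, the paths can be written as-is.
with_colon=$(printf '%s\n' "$line" | sed 's:/usr/lib:/usr/java:')
with_pipe=$(printf '%s\n' "$line" | sed 's|/usr/lib|/usr/java|')
printf '%s\n' "$with_slash" "$with_colon" "$with_pipe"

# Inside single quotes the shell leaves $HOME alone, so sed inserts it literally.
literal=$(printf 'x\n' | sed 's/x/$HOME/')
printf '%s\n' "$literal"
```

All three substitutions print JAVA_HOME=/usr/java/jvm; only the readability differs.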

Trim a file name in Unix

I have a file with name
ROCKET_25_08:00.csv
I want to trim the name of the file to
ROCKET_25_.csv
I tried mv, but mv alone is not what I need, because there may be more than one such file.
I want to keep the name up to the second _.
How can I do that in Unix?
Please advise.
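Some utilities provide more flexible renaming, but a solution that uses nothing beyond the standard UNIX tools (like sed) would be:
ls -d * | sed -nre 's/^([^_]*_[^_]*_)(.*)(\....)$/mv -v "\1\2\3" "\1\3"/p' | bash
This works only in the current directory; it won't process subdirectories. -r requires GNU sed, the (\....) group assumes a three-character extension, and -n with the p flag keeps non-matching names from being passed to bash and run as commands. It still breaks on names containing ", $, or `, and if two files trim to the same name, mv silently overwrites the first.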
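Since the question asks for the name up to the second _, the new name can also be computed with POSIX parameter expansion and no sed. A sketch; trim_to_second_underscore is a hypothetical name, and it assumes the name contains at least two underscores and an extension:

```shell
#!/bin/sh
# Hypothetical helper: ROCKET_25_08:00.csv -> ROCKET_25_.csv
trim_to_second_underscore() {
    name=$1
    first=${name%%_*}    # text before the first _
    rest=${name#*_}      # text after the first _
    second=${rest%%_*}   # text between the first and second _
    ext=${name##*.}      # extension after the last .
    printf '%s\n' "${first}_${second}_.${ext}"
}
```

For example, mv -- "$f" "$(trim_to_second_underscore "$f")" renames one file; the expansions run in the shell itself, so no external process is started per name.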
It's not at all clear what you are actually trying to do, but if you just want to remove text between the last underscore and the period, you can do:
f=ROCKET_25_08:00.csv
echo "${f%_*}_.csv"
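To apply the ${f%_*} expansion to every matching file in a directory, a loop like the following should work. It is a sketch (trim_csv_names is a hypothetical name) that skips a file instead of overwriting when two names trim to the same result, since the question mentions there may be several files:

```shell
#!/bin/sh
# Hypothetical helper: rename NAME_suffix.csv to NAME_.csv in directory $1,
# removing the text between the last _ and .csv.
trim_csv_names() {
    for f in "$1"/*_*.csv; do
        [ -e "$f" ] || continue          # the glob matched nothing
        new="${f%_*}_.csv"
        [ "$f" = "$new" ] && continue    # already trimmed
        if [ -e "$new" ]; then
            echo "skipping $f: $new already exists" >&2
            continue
        fi
        mv -- "$f" "$new"
    done
}
```

Usage: trim_csv_names . in the directory holding the files. Which of two colliding files gets renamed depends on glob order, so check the skip messages.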

How to delete one specific line in a file and modify the next line in unix?

I have a text file and there was a mistake when it was created. To fix this I need to delete a line with a specific unique string and delete the characters in the following line that precede the # symbol. I was able to do this with sed and cut but it only output that one line, not the many other 1000s of lines in my file. Here is an example of the part of the file that needs fixing. I know the line #s (delete 45603341 and modify 45603342) where this mistake occurs.
#HWI-1KL135:70:C305EACXX:5:2105:6727:102841 1:N:0:CAGATC
CCAAGTGTCACCTCTTTTATTTATTGATTT#HWI-1KL135:70:C305EACXX:5:1101:1178:2203 1:N:0:CAGATC
I need the output to look like this and for it to leave the rest of the file intact.
#HWI-1KL135:70:C305EACXX:5:1101:1178:2203 1:N:0:CAGATC
Thanks!
How about:
sed -i -e '45603341d;45603342s/^.*\(#.*\)$/\1/' <filename>
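where you replace <filename> with the name of your file. sed's line numbers refer to the input, so deleting line 45603341 does not shift line 45603342. The greedy ^.* keeps the text from the last # on the line onward, which is fine here because that line contains a single #.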
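The same two commands can be wrapped so that the line number is a parameter. A sketch; fix_record_at is a hypothetical name, and [^#]* is used instead of the greedy ^.* so the text is cut at the first #:

```shell
#!/bin/sh
# Hypothetical helper: delete line N and strip everything before the
# first # on line N+1. Prints the result; add -i to edit in place.
fix_record_at() {
    n=$1
    file=$2
    next=$((n + 1))
    # "${n}d" deletes line n; on line n+1, ^[^#]*# matches up to and
    # including the first #, which is put back as the replacement.
    sed -e "${n}d" -e "${next}s/^[^#]*#/#/" "$file"
}
```

For the file in the question this would be fix_record_at 45603341 <filename> > fixed.txt.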
If you want to modify a particular line and delete the line above it, run:
sed -ri '45603342s/^([^#]*)(#.*)$/\2/g; 45603341d' aa
Example:
$ cat aa
#HWI-1KL135:70:C305EACXX:5:2105:6727:102841 1:N:0:CAGATC
CCAAGTGTCACCTCTTTTATTTATTGATTT#HWI-1KL135:70:C305EACXX:5:1101:1178:2203 1:N:0:CAGATC
$ sed -r '2s/^([^#]*)(#.*)$/\2/g; 1d' aa
#HWI-1KL135:70:C305EACXX:5:1101:1178:2203 1:N:0:CAGATC
This might work for you (GNU sed):
sed '45603341!b;N;s/^.*\n[^#]*//' file
Leave every line other than 45603341 as is. On that line, append the following line, then remove everything from the start up to (but not including) the first # in the appended line.
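If the line number is not known in advance, the same N-based idea can be triggered by a pattern instead of a line address. A sketch for GNU sed; fix_record_matching is a hypothetical name, and the pattern is a basic regular expression that must not contain an unescaped /:

```shell
#!/bin/sh
# Hypothetical helper: on each line matching the pattern, append the next
# line (N), then remove the first line, the newline, and everything before
# the first # in the appended line.
fix_record_matching() {
    pattern=$1
    file=$2
    sed "/$pattern/{N;s/^.*\n[^#]*//;}" "$file"
}
```

Usage: fix_record_matching '102841 1:N:0:CAGATC$' <filename>. Make the pattern specific enough that it matches only the broken header, because every matching line is merged with the line after it.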
An alternative to sed is to use vim macros (this also works on Windows). The main disadvantage is that a macro is harder to integrate into scripts than a sed command. The main advantage is that it allows complex edits like "search for this pattern, then clear the line, go down 3 lines, move to column 40, switch lines, ...". If you are already familiar with Vim, it is also much more intuitive.
In this particular case you would do something like:
qq (start macro recording)
/^#HWI.*CAGATC$ (search pattern; press Enter)
dd (delete line)
dt# (delete everything up to, but not including, the #)
q (end macro)
To run the macro 100 times:
100@q
