How to handle a file having header in between the records after removing duplicates from the file

We have a file that was de-duplicated with a Unix command. After the de-duplication, the header line ends up in between the records in the new file. How can we fix this? Thanks in advance for any input.
Unix command: sort -u >

I would do something like this, where "headers" stands for a pattern that matches only the header line and input.txt stands for the de-duplicated file:
grep "headers" input.txt >output.txt
grep -v "headers" input.txt >>output.txt
The idea is to write the header line to output.txt first, and then append every line that is not a header.
The first command uses a single > because it creates (or truncates) the output file; the second uses >> because it appends to the file that already exists.
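If the original, unsorted file is still available, the problem can also be avoided at the source. The following is a minimal sketch, assuming the header is the first line of that file; dedupe_keep_header and the sample data are hypothetical names used only for illustration.

```shell
#!/bin/sh
# dedupe_keep_header INPUT OUTPUT  (hypothetical helper name)
# Assumption: the header is the first line of INPUT, before any sorting.
dedupe_keep_header() {
    {
        head -n 1 "$1"               # copy the header line unchanged
        tail -n +2 "$1" | sort -u    # sort and de-duplicate only the records
    } > "$2"
}

# Example run on made-up sample data
workdir=$(mktemp -d)
printf 'id,name\n3,carol\n1,alice\n3,carol\n2,bob\n' > "$workdir/input.txt"
dedupe_keep_header "$workdir/input.txt" "$workdir/output.txt"
cat "$workdir/output.txt"
```

Because the header never passes through sort, it cannot end up between the records.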

Related

Issue while renaming a file with file pattern in unix

As part of our process, we get an input file in .gz format. We need to unzip this file and add a suffix to the end of the file name. The file name contains a timestamp, so I am trying to use a pattern while unzipping and renaming the file.
Input file name :
Mem_Enrollment_20200515130341.dat.gz
Step 1:
Unzipping this file : (working as expected)
gzip -d Mem_Enrollment_*.dat.gz
output :
Mem_Enrollment_20200515130341.dat
Step 2: Renaming this file : (issues while renaming)
Again, I am using the pattern, but I know it won't work in this case. So, what should I do to rename this file?
mv Mem_Enrollment_*.dat Mem_Enrollment_*.dat_D11
output :
Mem_Enrollment_*.dat_D11
expected output :
Mem_Enrollment_20200515130341.dat_D11

Try:
for fn in Mem_Enrollment_*.dat
do
mv "${fn}" "${fn}_D11"
done
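Both steps from the question can also be combined in one loop. This is only a sketch: unzip_and_suffix is a hypothetical helper name, and it assumes gzip -d leaves behind a file with the same name minus the .gz extension.

```shell
#!/bin/sh
# unzip_and_suffix SUFFIX ARCHIVE...  (hypothetical helper name)
# Decompresses each archive, then appends SUFFIX to the resulting file name.
unzip_and_suffix() {
    suffix=$1
    shift
    for gz in "$@"; do
        [ -e "$gz" ] || continue        # skip the literal pattern when nothing matched
        gzip -d -- "$gz" || continue    # on success the archive is replaced by ${gz%.gz}
        dat=${gz%.gz}                   # strip the trailing .gz from the name
        mv -- "$dat" "${dat}${suffix}"
    done
}

# Usage with the names from the question:
#   unzip_and_suffix _D11 Mem_Enrollment_*.dat.gz
```

The shell expands the pattern only on the source side, so the target name is built from each matched name inside the loop rather than from a second wildcard.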

With just DataStage, you could loop over ls output from an Execute Command stage via "ls Mem_Enrollment_*.dat.gz" and then use #FM as the delimiter when looping over the output list. You could then break out the gzip and the rename into two separate commands, which helps with readability in your job.
The only caveat is that the Start Loop stage doesn't accept #FM in the delimiter due to some internal quirk in DataStage, so you need to set a user variable equal to it and pass that as the delimiter.

How to delete lines from a file that start with certain words

My file is a CSV file on a Unix server and looks like this:
"Product_Package_Map_10302017.csv","451","2017-10-30 05:02:26"
"Targeting_10302017.csv","13","2017-10-30 05:02:26",
"Targeting_Options_10302017.csv","42","2017-10-30 05:02:27"
I want to delete a particular line based on a keyword in its filename.

You can use grep -v:
grep -v '^"Product_Package_Map_10302017.csv"' file > file.filtered
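'^"Product_Package_Map_10302017.csv"' matches lines that begin with the string "Product_Package_Map_10302017.csv". Note that each unescaped . matches any character; write \. for a strictly literal match.
Or sed can do it in place (the -i option as written here is GNU sed syntax):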
sed -i '/^"Product_Package_Map_10302017.csv"/d' file
See this related post for other alternatives:
Delete lines in a text file that contain a specific string
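For a purely literal match on the start of the line, with no regular-expression characters to escape, awk's index() function can be used. A minimal sketch, where drop_lines_starting_with is a hypothetical helper name and the sample rows are the ones from the question:

```shell
#!/bin/sh
# drop_lines_starting_with PREFIX FILE  (hypothetical helper name)
# Prints FILE without the lines that begin with the literal string PREFIX.
# index() is a plain substring search, so dots are not treated as wildcards.
# Assumption: PREFIX contains no backslashes (awk -v interprets escape sequences).
drop_lines_starting_with() {
    awk -v prefix="$1" 'index($0, prefix) != 1' "$2"
}

# Example run on the sample rows from the question
workdir=$(mktemp -d)
cat > "$workdir/file" <<'EOF'
"Product_Package_Map_10302017.csv","451","2017-10-30 05:02:26"
"Targeting_10302017.csv","13","2017-10-30 05:02:26",
"Targeting_Options_10302017.csv","42","2017-10-30 05:02:27"
EOF
drop_lines_starting_with '"Product_Package_Map_' "$workdir/file" > "$workdir/file.filtered"
cat "$workdir/file.filtered"
```

Passing only the keyword part of the name as the prefix means the date in the filename does not have to be known in advance.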

See this previous question. A grep-based answer would be my first choice but, as you can see, there are many ways to address this one!
(Would have just commented, but my 'rep' is not yet high enough)

Converting Filename to Filename_Inode

I'm writing my first script, which takes a file and moves it to another folder. I want to rename the file to filename_inode instead of just filename, in case there are any files with the same name.
I've figured out how to build this name with the following four variables:
inode=$(ls -i "$1" | cut -c1-7) # lists the file the user names and cuts the inode number from the output
space="_" # separator placed between the basename and the inode
bname=$(basename "$1") # gets the file name without the directory part
bnamespaceinode=$bname$space$inode # combines the three values into one variable
echo "$bnamespaceinode" # prints filename_inode to the window
So the final echo shows filename_inode, which is what I want, but when I try to move the file using mv or cp I get errors.
I don't think there is anything wrong with the syntax I'm using for the mv and cp commands, so I'm thinking I need to concatenate the three variables into a new file, or use the result of the first and then append the other two to that file?
I've tried both of the above but am still not having any luck. Any ideas?
Thanks

Without clearer examples, I guess this could work:
TARGETDIR=/my/target/directory
mv "$1" "$TARGETDIR/$(basename "$1" | sed 's/_.*/_inode/')"
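A sketch of the move described in the question, building the target name from the real inode number. The helper name move_with_inode is hypothetical, and the inode is read before the move, so it is the number the file has in its original location.

```shell
#!/bin/sh
# move_with_inode FILE TARGETDIR  (hypothetical helper name)
# Moves FILE to TARGETDIR/<basename>_<inode>.
move_with_inode() {
    # ls -di prints "<inode> <name>"; awk keeps the first field whatever its width
    inode=$(ls -di -- "$1" | awk '{print $1}')
    bname=$(basename "$1")
    mv -- "$1" "$2/${bname}_${inode}"
}

# Usage (paths are placeholders):
#   move_with_inode ./report.txt /my/target/directory
```

Taking the first whitespace-separated field avoids assuming that the inode number is exactly seven characters wide.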

To replace the first character of the last line of a unix file with the file name

We need a shell script that finds all txt files in the current directory and, for each file, checks whether it is empty or contains data (which I believe can be done with the wc command).
If a file is empty, ignore it. In our case, every txt file in this directory is either empty or contains a large amount of data, and the last line of the file looks like this:
Z|11|21||||||||||
That is the last line has the character Z then | then an integer then | then an integer then certain numbers of | symbols.
If the file is not empty, we simply assume it has this format. The data before the last line is garbled and of no use to us, but there will always be at least one line before the last line, i.e. a non-empty file is guaranteed to have at least two lines.
We need code that, for each non-empty file, replaces the 'Z' in the last line with 'filename.txt' and writes the new data into another file, say tempfile. The last line thus becomes:
filename.txt|11|21|||||||
The rest of the line stays the same. From tempfile, the last line, i.e. filename.txt|int|int|||||, is taken and appended to finalfile. The contents of tempfile are then cleared to receive data from the next filename.txt in the same directory. In the end, finalfile holds the edited last lines of all non-empty txt files in that directory.
Eg: file1.txt has data as
....
....
....
Z|1|1|||||
and file2.txt has data as
....
....
....
Z|2|34|||||
After running the script, new data of file1.txt becomes
.....
.....
.....
file1.txt|1|1||||||
This will be written into a new file say temp.txt which is initially empty. From there the last line is merged into a file final.txt. So, the data in final.txt is:
file1.txt|1|1||||||
After this merging, the data in temp.txt is cleared
New data of file2.txt becomes
...
...
...
file2.txt|2|34||||||
This will be written into the same file temp.txt. From there the last line is merged into the same file final.txt.
So, the data in final.txt is
file1.txt|1|1||||||
file2.txt|2|34||||||
After processing all N non-empty txt files in the directory, the data in final.txt becomes
file1.txt|1|1||||||
file2.txt|2|34||||||
file3.txt|8|3||||||
.......
.......
.......
fileN.txt|22|3|||||
For some of the steps, I already know the command, for example:
To find txt files in a directory:
find <directory> -type f -name "*.txt"
To take the last line and append it to another file:
tail -1 file.txt >> destination.txt

You can use sed to replace the "Z" character. You'll be in a loop, so you can use the filename you already have there. This replaces the Z at the start of the last line with the filename and then echoes the result.
Good luck.
#!/bin/bash
filename=test.txt
line=$(tail -n 1 "$filename" | sed "s/^Z/$filename/")
echo "$line"
Edit:
Did you run your find command first and look at the output? It puts a ./ at the start of each line. That will break sed, since sed uses / as a delimiter here. It also does not match your problem statement, which has no extra "/" before the filename. You said current directory, but the command you give will traverse ALL subdirectories. Keep it simple and use ls.
# `2>/dev/null` puts stderr to null, instead of writing to screen. this stops
# us getting the "no files found" (error) and thinking it's a file!
for filename in `ls *.txt 2>/dev/null` ; do
... stuff ...
done
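Putting the pieces together, the whole task might be sketched as follows. This is an illustration under assumptions, not a definitive implementation: collect_last_lines is a hypothetical helper name, the intermediate temp file from the question is skipped because each edited line can be appended directly, and plain shell string handling is used in place of sed so that no delimiter can clash with the file name.

```shell
#!/bin/sh
# collect_last_lines OUTFILE  (hypothetical helper name)
# For every non-empty *.txt file in the current directory, takes the last line,
# replaces its leading Z with the file name, and appends the result to OUTFILE.
collect_last_lines() {
    out=$1
    : > "$out"                                  # start with an empty output file
    for filename in *.txt; do
        [ "$filename" = "$out" ] && continue    # do not read the output file itself
        [ -s "$filename" ] || continue          # skip empty files and an unmatched *.txt pattern
        last=$(tail -n 1 "$filename")
        case $last in
            # ${last#Z} is the line without its leading Z; the file name takes its place
            Z*) printf '%s%s\n' "$filename" "${last#Z}" >> "$out" ;;
        esac
    done
}

# Usage, run from the directory that holds the txt files:
#   collect_last_lines final.txt
```

Lines that do not start with Z are ignored, which also keeps the output file from feeding back into itself if it is named differently from the argument.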

Unix: prepending a file without a dummy-file?

I do not want:
$ cat file > dummy; $ cat header dummy > file
I want similar to the command below but to the beginning, not to the end:
$ cat header >> file

You can't append to the beginning of a file without rewriting the file. The first way you gave is the correct way to do this.

This is easy to do in sed if you can embed the header string directly in the command:
$ sed -i "1iheader1,header2,header3" file
Or if you really want to read it from a file, you can do so with bash's help:
$ sed -i "1i$(<header)" file
BEWARE that "-i" overwrites the input file with the results. If you want sed to make a backup, change it to "-i.bak" or similar, and of course always test first with sample data in a temp directory to be sure you understand what's going to happen before you apply to your real data.

The whole dummy file thing is pretty annoying. Here's a one-liner that I just tried out and that seems to work.
echo "`cat header file`" > file
The backticks make the command inside the quotes run first, so the shell does not complain about the output file also being an input file. It seems related to hhh's solution but is a bit shorter. I suppose this could cause problems if the files are really large, because I think I've seen the shell complain about commands becoming too long with backticks. The output of the part that runs first must be held in a buffer somewhere so that the original can be overwritten, but I'm not enough of an expert to know where that buffer lives or how large it can be.

You can't prepend to a file without reading all the contents of the file and writing a new file with your prepended text plus the contents of the file. Think of a file in Unix as a stream of bytes: it's easy to append to the end of a stream, but there is no easy operation to "rewind" the stream and insert data in front of it. Even after a seek to the beginning of the file, any data you write overwrites the beginning of the file instead of shifting it.
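The rewrite described above can be wrapped in a small function so that the temporary file is handled in one place. A minimal sketch, assuming mktemp is available (it is common but not part of POSIX); prepend is a hypothetical helper name.

```shell
#!/bin/sh
# prepend HEADER_FILE TARGET  (hypothetical helper name)
# Writes HEADER_FILE followed by TARGET into a temporary file,
# then renames the temporary file over TARGET.
prepend() {
    # Create the temp file next to TARGET so the final mv stays on one filesystem
    tmp=$(mktemp "$2.XXXXXX") || return 1
    if cat -- "$1" "$2" > "$tmp"; then
        mv -- "$tmp" "$2"       # replace the original only after the copy succeeded
    else
        rm -f -- "$tmp"         # clean up and report failure
        return 1
    fi
}

# Usage:
#   prepend header file
```

One design caveat: the replaced file takes on the permissions of the file mktemp created, which are typically more restrictive than the original's.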

One possibility is to use a here-document:
cat > "prependedfile" << ENDENDEND
prepended line(s)
`cat "file"`
ENDENDEND
There may be a memory limitation to this trick.

Thanks to the right search term, I found this:
printf '%s\n%s\n' "include .headers.java" "$(cat fileObject.java)" > fileObject.java
Then with a file:
printf '%s\n\n%s\n' "$(cat .headers.java)" "$(cat fileObject.java)" > fileObject.java

If you want to prepend "header" to "file", why not append "file" to "header"?
cat file >> header

Below is a simple C shell attempt to solve this problem. This "prepend.sh" script takes two parameters:
$1 - The file containing the text to prepend.
$2 - The original/target file to be modified.
#!/bin/csh
if (-e ./tmp.txt) then
rm ./tmp.txt
endif
cat $1 > ./tmp.txt
cat $2 >> ./tmp.txt
mv $2 $2.bak
mv ./tmp.txt $2
