Splitting a file on the basis of line number - unix

Can you please advise a unix command? I have a file which contains records in the below format:
333434
435435
435443
434543
343536
The total line count is 89380, and I want to split this large file into smaller files of 1000 lines each, by line number.
Could you please advise a unix command to achieve this?
Can the unix split command be used here?

Use split
Syntax: split [options] filename prefix
Replace filename with the name of the large file you wish to split. Replace prefix with the name you wish to give the small output files. You can exclude [options], or replace it with either of the following:
-l linenumber
-b bytes
If you use the -l (a lowercase L) option, replace linenumber with the number of lines you'd like in each of the smaller files (the default is 1,000). If you use the -b option, replace bytes with the number of bytes you'd like in each of the smaller files.
The split command will give each output file it creates the name prefix with an extension tacked to the end that indicates its order. By default, the split command adds aa to the first output file, proceeding through the alphabet to zz for subsequent files. If you do not specify a prefix, most systems use x .
Example1:
split myfile
Assuming myfile is a 3,000-line file, this will output three 1000-line files: xaa, xab, and xac.
Example2:
split -l 500 myfile segment
Assuming myfile is a 3,000-line file, this will output six 500-line files: segmentaa, segmentab, segmentac, segmentad, segmentae, and segmentaf.
Example3:
Assume myfile is a 160KB file:
split -b 40k myfile segment
This will output four 40KB files: segmentaa, segmentab, segmentac, and segmentad.
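As a quick check, the behaviour described above can be reproduced on a throwaway file (the file and prefix names here are examples, not from the question):

```shell
# A 2,500-line throwaway file stands in for the real one.
seq 2500 > myfile

# Split into 1000-line pieces named part_aa, part_ab, part_ac.
split -l 1000 myfile part_

# The first two pieces hold 1000 lines each; the last holds the 500-line remainder.
wc -l part_aa part_ab part_ac
```

Concatenating the pieces back together reproduces the original file exactly.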

You can use the --lines switch or its short form -l
split --lines=1000 input_file_name output_file_prefix

You can also use the sed command.
For example, sed -n "1,1000p" yourfile > outputfile writes lines 1 through 1000 to outputfile.
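If you really do want sed rather than split, you need a loop to extract every 1000-line chunk, not just the first. A minimal sketch under assumed names (yourfile, chunk_N), using a small sample file in place of the 89,380-line one:

```shell
#!/bin/sh
# Build a sample 2,500-line file (the question's file has 89,380 lines).
seq 2500 > yourfile

total=$(wc -l < yourfile)
start=1
i=1
while [ "$start" -le "$total" ]; do
    end=$((start + 999))
    sed -n "${start},${end}p" yourfile > "chunk_$i"   # lines start..end
    start=$((end + 1))
    i=$((i + 1))
done

wc -l chunk_1 chunk_2 chunk_3   # 1000, 1000, and the 500-line remainder
```

split -l 1000 does all of this in one command, which is why it is the usual answer.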

Related

UNIX split command splitting this file, but what names are resulting?

We receive a big csv file from a client (500k lines, est) that we split into smaller chunks using the split command.
You can see how we're using the command below, but my bash knowledge is a bit rusty. Could someone refresh me on the ${processFile}_ bit below, and on how the files end up being named? I'm not recalling what the underscore does...
split -l 50000 $PROCESSING_CURRENT_DIR/$processFile ${processFile}_
This isn't anything to do with bash; it's how the split(1) command processes its arguments to split the input.
Syntax is:
split [OPTION]... [FILE [PREFIX]]
DESCRIPTION
Output pieces of FILE to PREFIXaa, PREFIXab, ...; default size is 1000 lines, and default PREFIX is 'x'.
With no FILE, or when FILE is -, read standard input.
So it uses the given prefix and makes output files.
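The ${processFile}_ expansion and the resulting names can be seen with a small test (the variable and file names below are illustrative, not the client's actual file):

```shell
# ${processFile} is plain variable expansion; the braces only separate the
# variable name from the literal underscore, so the prefix is "clients.csv_".
processFile=clients.csv
echo "${processFile}_"            # clients.csv_

# split then writes clients.csv_aa, clients.csv_ab, ... in the current directory.
seq 120000 > "$processFile"
split -l 50000 "$processFile" "${processFile}_"
ls "${processFile}_"*             # clients.csv_aa clients.csv_ab clients.csv_ac
```

Without the braces, $processFile_ would be read as a variable named processFile_, which is why the braces matter here.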

Adding line numbers in vi and concatenating files in vi

How do I concatenate 3 files in vi with a blank line after every file's content?
Also, :set number does not save changes. I want line numbers set permanently for a file. How can I do that?
Look for a command-line option and use that. cat -n adds line numbers; if you like that format and are editing a file that has already been saved, use
:% !cat -n %
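The concatenation half of the question can also be done from the shell rather than inside vi; a minimal sketch that joins three files, each followed by a blank line (the file names are examples):

```shell
# Sample inputs (assumed names).
printf 'one\n'   > file1
printf 'two\n'   > file2
printf 'three\n' > file3

# Append each file followed by one blank line to combined.txt.
for f in file1 file2 file3; do
    cat "$f"
    echo            # the blank separator line
done > combined.txt

cat combined.txt    # one, blank, two, blank, three, blank
```

The combined file can then be opened in vi as usual.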

Number of lines differ in text and zipped file

I zipped a few files in unix and later found that the zipped files have a different number of lines than the raw files.
$ wc -l /location/filename.txt /location/filename.zip
70308 /location/filename.txt
2931 /location/filename.zip
How's this possible?
zip files are binary files; the wc command is meant for text files.
The compressed version of a text file may contain more or fewer newline bytes, because compression is not done line by line. (If the compressed file gave the same output as the original for every command, there would be no point in compressing it and keeping it in a different format.)
From wc man page:
-l, --lines
print the newline counts
To get the matching output, you should try
$ unzip -c filename.zip | wc -l # Decompress on stdout and count the lines
This will report about 3 extra lines (if there is no directory structure involved), because unzip -c also prints an Archive: header and the name of each file. If you compressed a directory containing the text file instead of just the file, you may see a few more lines of file/directory information.
In a compression algorithm, words/characters are replaced by binary code sequences.
Suppose \n is encoded as 0011100,
and some other character 'x' happens to be encoded as a sequence that ends in the newline byte.
wc then counts every newline byte it finds in the zip file, so the count it reports can vary arbitrarily.
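The effect is easy to reproduce. The sketch below uses gzip rather than zip (an assumption, so no zip tooling is needed), but the point is the same: wc -l on the compressed binary counts stray newline bytes, not lines.

```shell
seq 1000 > sample.txt                 # a 1,000-line text file
gzip -c sample.txt > sample.txt.gz    # compress; same idea applies to zip

wc -l < sample.txt                    # 1000
wc -l < sample.txt.gz                 # unrelated number: newline bytes in the binary
gzip -dc sample.txt.gz | wc -l        # 1000 - decompress first, then count
```

Decompressing to stdout before counting, as the unzip -c answer shows, is the general fix.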

Subsetting a file into multiple files based on a value in the last two positions of the record in the file using powershell

I want to subset a file into multiple txt files based on the value in the last two positions of each record, using powershell. The source file comes from an IBM z/OS machine and has no file extension. Currently I do this with an awk command keyed on the last two characters of each record, like below:
awk '{ print > ("file" substr($0, length-1) ".txt") }' RAW
The file is named RAW, and the command creates one output file per distinct value found in the last two positions of a record. So if AA is the value in the last two positions of a record, that record goes to fileAA.txt. How can I achieve this in powershell?
Thanks
You could try something like:
Get-Content RAW | ForEach-Object { $fn = "file" + $_.Substring($_.Length - 2) + ".txt"; $_ | Out-File $fn -Append }

To replace the first character of the last line of a unix file with the file name

We need a shell script that retrieves all txt files in the current directory and for each file checks if it is an empty file or contains any data in it (which I believe can be done with wc command).
If it is empty, ignore it. Otherwise (in our situation, every txt file in this directory is either empty or contains a lot of data), the last line of the file will look like this:
Z|11|21||||||||||
That is, the last line has the character Z, then |, then an integer, then |, then another integer, then a certain number of | symbols.
If the file is not empty, we simply assume it has this format. The data before the last line is garbled and not needed, but there will be at least one line before the last line, i.e. a non-empty file is guaranteed to have at least two lines.
We need code which, for each non-empty file, replaces the 'Z' in the last line with the file's name and writes the new data into another file, say tempfile. The last line thus becomes:
filename.txt|11|21|||||||
The remaining part of the line stays the same. From tempfile, the last line (filename.txt|int|int|||||) is appended to a finalfile, and tempfile is then cleared to receive data from the next txt file in the same directory. finalfile thus contains the edited last lines of all non-empty txt files in the directory.
Eg: file1.txt has data as
....
....
....
Z|1|1|||||
and file2.txt has data as
....
....
....
Z|2|34|||||
After running the script, new data of file1.txt becomes
.....
.....
.....
file1.txt|1|1||||||
This will be written into a new file say temp.txt which is initially empty. From there the last line is merged into a file final.txt. So, the data in final.txt is:
file1.txt|1|1||||||
After this merging, the data in temp.txt is cleared
New data of file2.txt becomes
...
...
...
file2.txt|2|34||||||
This will be written into the same file temp.txt. From there the last line is merged into the same file final.txt.
So, the data in final.txt is
file1.txt|1|1||||||
file2.txt|2|34||||||
After processing all N files that are of type txt, non-empty, and in the same directory, the data in final.txt becomes
file1.txt|1|1||||||
file2.txt|2|34||||||
file3.txt|8|3||||||
.......
.......
.......
fileN.txt|22|3|||||
For some of the steps, I already know the command.
To find files of type text in a directory:
find <directory> -type f -name "*.txt"
To take the last line of a file and append it to another file:
tail -1 file.txt >> destination.txt
You can use sed to replace the 'Z' character. You'll be in a loop, so you can use the filename you already have there. This just replaces the Z with the filename and then echoes the line.
Good luck.
#!/bin/bash
filename=test.txt
line=$(tail -1 "$filename" | sed "s/Z/$filename/")
echo "$line"
Edit:
Did you run your find command first and look at the output? Each line starts with ./. That will break sed, since sed uses / as a delimiter, and it also won't match your problem statement, which has no extra ./ before the filename. You also said the current directory, while the find command you give traverses all subdirectories. Try keeping it simple and using ls.
# `2>/dev/null` puts stderr to null, instead of writing to screen. this stops
# us getting the "no files found" (error) and thinking it's a file!
for filename in `ls *.txt 2>/dev/null` ; do
... stuff ...
done
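Putting the fragments above together, here is a minimal sketch of the whole task. The names temp.txt and final.txt come from the question; the demo input files and everything else are assumptions:

```shell
#!/bin/sh
# Demo data: two non-empty .txt files and one empty one (assumed layout).
printf 'garbage line\nZ|1|1|||||\n' > file1.txt
printf 'other junk\nZ|2|34|||||\n'  > file2.txt
: > empty.txt

: > final.txt
for filename in *.txt; do
    case "$filename" in temp.txt|final.txt) continue ;; esac
    [ -s "$filename" ] || continue       # -s: true only for non-empty files
    # Use | as the sed delimiter so a / in the name cannot break the command.
    tail -1 "$filename" | sed "s|^Z|$filename|" > temp.txt
    cat temp.txt >> final.txt
    : > temp.txt                         # clear temp.txt for the next file
done

cat final.txt
# file1.txt|1|1|||||
# file2.txt|2|34|||||
```

Going through temp.txt is only there because the question asks for it; appending the sed output to final.txt directly would work just as well.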
