Unix random 16 character number

Hey, I need help in Unix creating a random 16-character-long number using the digits 0-9, but with the first digit not being 0 (1-9).
tr -c -d 0-9 < /dev/urandom | fold -w16
Something like this, but with the first digit not being 0.

First, generate a 1-digit random number using the digits 1-9.
Second, generate a 15-digit random number using the digits 0-9.
Then, concatenate the two, as in the sketch below.
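A minimal sketch of those three steps, assuming a Linux-style /dev/urandom and a head that supports -c (GNU or BSD):
first=$(tr -dc 1-9 < /dev/urandom | head -c 1)     # one digit from 1-9
rest=$(tr -dc 0-9 < /dev/urandom | head -c 15)     # fifteen digits from 0-9
printf '%s%s\n' "$first" "$rest"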

Since you are on UNIX, you can use jot, with bounds supplied by shell arithmetic expansion:
jot -r 1 $((10 ** 15)) $(( (10 ** 15) * 2 ))
Generates numbers like:
1595866171875968
Note that with this particular upper bound every result begins with 1, which still satisfies the requirement that the first digit is not 0.

Related

Generating random 1 to 12 digit numbers in R following some condition

I wish to write a program in R to generate one random number (a positive integer) for each length from 3 digits up to 12 digits, following these conditions:
There is no order in the consecutive number digits.
Strictly no repetition of digits in a number until the 9th digit number.
0 can be used after the 9 digit number only.
After 10 digits, a digit can be used twice but with no order.
And most importantly:
The first number will not be the last number of the next line, and vice versa.
All I know how to use is the sample command in R:
sample(1:9, size=n, replace=FALSE)
where n is the number of digits I wish to generate. However, I need to write a more generalized function or program which strictly obeys these conditions.

awk output into specified number of columns

EDIT: I can get all my desired values from the command:
awk '/Value/{print $4}' *.log > ofile.csv
This makes a .csv file with a single column with hundreds of values. I would like to separate these values into a specified number of columns, i.e. instead of having values 1-1000 in a single column, I could specify that I want 100 columns and then my .csv file would have the first column be 1-10, 2nd column be 11-20... 100th column be 991-1000.
Previously, I was using the pr command to do this, but it doesn't work when the number of columns I want is too high (>36 in my experience).
awk '/Value/{print $4}' *.log | pr -112s',' > ofile.csv
the pr command gives the following message:
pr: page width too narrow
Is there an alternative to this command that I can use, that won't restrict the amount of comma delimiters in a row of data?
If your values are always the same length, you can use column:
$ seq 100 200 | column --columns 400 | column --table --output-separator ","
100,103,106,109,112,115,118,121,124,127,130,133,136,139,142,145,148,151,154,157,160,163,166,169,172,175,178,181,184,187,190,193,196,199
101,104,107,110,113,116,119,122,125,128,131,134,137,140,143,146,149,152,155,158,161,164,167,170,173,176,179,182,185,188,191,194,197,200
102,105,108,111,114,117,120,123,126,129,132,135,138,141,144,147,150,153,156,159,162,165,168,171,174,177,180,183,186,189,192,195,198,
The --columns option controls the layout here, but despite its name, the 400 in my example is an output width in characters, not a number of columns.
If your values are not the same character length, you will find spaces inserted where the values have a different width.
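If your values are not all the same width and you only need comma-separated output with an exact number of columns (no visual padding), an awk sketch along these lines could stand in for pr; the cols variable, the vals array and the top-to-bottom fill order are my own choices here, not part of the answer above:
awk -v cols=100 '
    { vals[NR] = $0 }                        # slurp one value per line
    END {
        rows = int((NR + cols - 1) / cols)   # rows needed for the requested column count
        for (r = 1; r <= rows; r++) {
            line = ""
            for (c = 0; c < cols; c++) {
                i = c * rows + r             # fill each column top to bottom, like pr/column
                if (i <= NR)
                    line = line (c ? "," : "") vals[i]
            }
            print line
        }
    }' ofile.csv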

How to plot data from file from specific lines start at line with some special string

I am trying to execute command similar to
plot "data.asc" every ::Q::Q+1500 using 2 with lines
But I have a problem with that "Q" number. It is not a known value but the number of the line containing some specific string. Let's say I have a line with the string "SET_10:" and my data to plot comes after this specific line. Is there some way to identify the number of the line with that specific string?
An easy way is to pass the data through GNU sed to print just the wanted lines:
plot "< sed -n <data.asc '/^SET_10:/,+1500{/^SET_10:/d;p}'" using 1:2 with lines
The -n suppresses default output, the /^SET_10:/,+1500 address range says between which lines to apply the {...} commands, and those commands delete the trigger line and print (p) the others.
To make sure you have a compatible GNU sed, try the command on its own for a small number of lines, e.g. 5:
sed -n <data.asc '/^SET_10:/,+5{/^SET_10:/d;p}'
If this does not output the first 5 data lines following the SET_10: line, an alternative is to use awk, since counting lines in sed without this GNU-specific syntax is awkward. Test the (standard POSIX, not GNU-specific) awk equivalent:
awk <data.asc 'end!=0 && NR<=end{print} /^SET_10:/{end=NR+5}'
and if that is ok, use it in gnuplot as
plot "< awk <data.asc 'end!=0 && NR<=end{print} /^start/{end=NR+1500}'" using 1:2 with lines
Here's a version entirely within gnuplot, with no external commands needed. I tested this on gnuplot 5.0 patchlevel 3, using the following bash commands to create a simple dataset of 20 lines, of which only the 5 lines after the line with "start" in column 1 are to be plotted. You don't need to do this.
for i in $(seq 1 20)
do let j=i%2
echo "$i $j"
done >data.asc
sed -i data.asc -e '5a\
start'
The actual gnuplot uses a variable endlno, initially set to NaN (not-a-number), and a function f which takes 3 parameters: a boolean start saying whether column 1 holds the matching string, the current line number lno, and the current column-1 value val. If the line number is less than or equal to the ending line number (which therefore is no longer NaN), f returns val; otherwise, if the start condition is true, the wanted ending line number is stored in endlno and NaN is returned. If we have not yet seen the start, NaN is returned.
gnuplot -persist <<\!
endlno=NaN
f(start,lno,val) = ((lno<=endlno)?val:(start? (endlno=lno+5,NaN) : NaN))
plot "data.asc" using (f(stringcolumn(1)eq "start", $0, $1)):2 with lines
!
Since gnuplot does not plot points with NaN values, we ignore lines up to the start, and again after the wanted number of lines.
In your case you need to change 5 to 1500 and "start" to "SET_10:".
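For instance, with those two substitutions made (and assuming SET_10: really is the entire first field of its line, so the string comparison matches), the gnuplot-only version becomes:
gnuplot -persist <<\!
endlno=NaN
f(start,lno,val) = ((lno<=endlno)?val:(start? (endlno=lno+1500,NaN) : NaN))
plot "data.asc" using (f(stringcolumn(1)eq "SET_10:", $0, $1)):2 with lines
!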

Using sed or awk (or similar) incrementally or with a loop to do deletions in data file based on lines and position numbers given in another text file

I am looking to do deletions in a data file at specific positions in specific lines, based on a list in a separate text file, and have been struggling to get my head around it.
I'm working in cygwin, and have a (generally large) data file (data_file) to do the deletions in, and a tab-delimited text file (coords_file) listing the relevant line numbers in column 2 and the matching position numbers for each of those lines in column 3.
Effectively, I think I'm trying to do something similar to the following incomplete sed command, where coords_file$2 represents the line number taken from the 2nd column of coords_file and coords_file$3 represents the position in that line to delete from.
sed -r 's coords_file$2/(.{coords_file$3}).*/\1/' datafile
I'm wondering if there's a way to include a loop or iteration so that sed runs first using the values in the first row of coords_file to fill in the relevant line and position coordinates, and then runs again using the values from the second row, etc. for all the rows in coords_file? Or if there's another approach, e.g. using awk to achieve the same result?
e.g. for awk, I identified these coordinates based on string matches using this really handy awk command from Ed Morton's response to this question: line and string position of grep match.
awk 'NR==FNR{strings[$0]; next} {for (string in strings) if ( (idx = index($0,string)) > 0 ) print string, FNR, idx }' strings.txt data_file > coords_file.txt
I was thinking something similar could potentially work by doing an in-place deletion rather than just finding the lines, such as incorporating a simple find-and-replace like {if($0=="somehow_reference_coords_file_values_here"){$0=""}}. But it's a bit beyond me (I'm a coding novice, so I barely understand how that original command actually works, let alone how to modify it).
File examples
data_file
#vandelay.1
blablablablablablablablablablablabla
+
mehmehmehmehmehmehmehmehmehmehmehmeh
#vandelay.2
blablablablablablablablablablablabla
+
mehmehmehmehmehmehmehmehmehmehmehmeh
#vandelay.3
blablablablablablablablablablablabla
+
mehmehmehmehmehmehmehmehmehmehmehmeh
coords_file (tab-delimited)
(column 1 is just the string that was matched, column 2 is the line number it matched in, and column 3 is the position number of the match).
stringID 2 20
stringID 4 20
stringID 10 27
stringID 12 27
Desired result:
#vandelay.1
blablablablablablab
+
mehmehmehmehmehmehm
#vandelay.2
blablablablablablablablablablablabla
+
mehmehmehmehmehmehmehmehmehmehmehmeh
#vandelay.3
blablablablablablablablabl
+
mehmehmehmehmehmehmehmehme
Any guidance would be much appreciated thanks! (And as I mentioned, I'm very new to this coding scene, so apologies if some of that doesn't make sense or my question format's shonky (or if the question itself is rudimentary)).
Cheers.
(Incidentally, this has all been a massive workaround to delete strings identified in the blablabla lines of data_file, as well as at the same positions 2 lines below (i.e. the mehmehmeh lines), since the mehmehmeh characters are quality scores that match the blablabla characters for each sample (each #vandelay.xx). Essentially this: sed -i 's/string.*//' datafile, but also running the same deletion 2 lines below every time it identifies the string. So if there's actually an easier script to do just that instead of all the stuff in the question above, please let me know!)
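(For reference, the row-by-row loop the question asks about could be sketched roughly as below, assuming GNU sed as shipped with Cygwin and the tab-delimited coords_file layout shown; it rewrites data_file once per row, so the one-pass awk answer that follows is far more efficient on large files.)
while IFS=$'\t' read -r str line pos; do
    # keep only the first pos-1 characters of the given line
    sed -i -r "${line}s/^(.{$((pos - 1))}).*/\1/" data_file
done < coords_file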
You can simply use a one-liner awk to do that:
$ awk 'NR==FNR{a[$2]=$3;next} (FNR in a){$0=substr($0,1,a[FNR]-1)}1' coords_file data_file
#vandelay.1
blablablablablablab
+
mehmehmehmehmehmehm
#vandelay.2
blablablablablablablablablablablabla
+
mehmehmehmehmehmehmehmehmehmehmehmeh
#vandelay.3
blablablablablablablablabl
+
mehmehmehmehmehmehmehmehme
Brief explanation:
NR==FNR{a[$2]=$3;next}: build the map from line number to matching position in array a. This part of the expression only processes coords_file, because NR==FNR holds only while reading the first file.
(FNR in a): once awk moves on to data_file, this checks whether the current line number FNR is a key in array a.
$0=substr($0,1,a[FNR]-1): re-assigns $0 to the truncated line, i.e. everything before the recorded position.
1: print all lines.
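As for the workaround described at the end of the question: if the goal is simply to delete from a known string to the end of the line, and make the same cut two lines further down, a single-pass awk sketch along these lines could skip the coords_file step entirely (here "string" is just a placeholder for the text actually being searched for):
awk 'match($0, "string") {               # "string" is a placeholder pattern
         cut[NR + 2] = RSTART            # remember to cut the quality line 2 below at the same position
         $0 = substr($0, 1, RSTART - 1)  # drop the match and everything after it
     }
     NR in cut { $0 = substr($0, 1, cut[NR] - 1) }
     1' data_file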

How to exclude columns from a data.frame and keep the spaces between columns?

I have a tab-delimited table (like the one below) with millions of rows and 340 columns
HanXRQChr00c0001 68062 N N N N A
HanXRQChr00c0001 68080 N N N N A
HanXRQChr00c0001 68285 N N N N A
I want to remove 28 columns. It is easy to do that, but in the output file I lose the space between my columns.
Is there any way to exclude these columns and still keep the space between them like above?
You can try different things. I include some of them below:
awk -i inplace '{$0=gensub(/\s*\S+/,"",28)}1' file
(GNU awk: removes the 28th field in place)
or
sed -i -r 's/(\s+)?\S+//28' file
(GNU sed: removes the 28th field together with the whitespace before it)
or
awk '{$28=""; print $0}' file
(blanks field 28, but note that assigning a field makes awk rebuild the line with single spaces, which loses the original spacing)
or using cut as mentioned in the comments.
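If the table really is tab-delimited, another option (a sketch assuming GNU cut, with fields 3-30 standing in for whichever 28 columns you want to drop) keeps the original tab separators exactly as they were:
cut --complement -f3-30 file > newfile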
