I need to grep for the hands that belong to the collection defined here:
γ = { h ∈ H | h contains at least 3 cards of every suit }. Examples:
Ad7hTc4h8d8sAsKd5c9cQhJdTs ∈ γ
5dKc9cJcTh7sQc3s4sAs7c2cTs ∉ γ
The suits are c d h s.
The file that I need to use is given here:
http://computergebruik.ugent.be/oefeningenreeks1/kaarten1.txt
Thanks in advance!
I tried egrep -c '([cdhs]).*\1.*\1' kaarten1.txt, but that command only requires one of the 4 suit characters to appear 3 times, while I also need the other 3 characters to appear 3 times each... So I tried egrep -c '([c]).*\1.*\1 | [d]).*\1.*\1 | [h]).*\1.*\1 | [s]).*\1.*\1' kaarten1.txt, but that doesn't seem to work either.
The trick is to use multiple greps, searching for the suits one at a time and chaining the greps together. Each grep will successively cull non-matching hands from the results. In pseudo-code:
grep 3-diamonds | grep 3-hearts | grep 3-spades | grep 3-clubs
I'll leave it to you to figure out how to write those individual greps.
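For reference, one way those individual greps could look is to chain one grep per suit, each requiring at least three cards of that suit, with -c on the last one to count the surviving hands (a sketch; it relies on the suit letters c, d, h, s being the only lower-case characters in a hand, as in your examples):
grep 'c.*c.*c' kaarten1.txt | grep 'd.*d.*d' | grep 'h.*h.*h' | grep -c 's.*s.*s'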
I have a huge file with many columns. I want to count the number of occurrences of each value in one column. Therefore, I use
cut -f 2 "file" | sort | uniq -c
. I get the result I want. However, when I read this file into R, it shows that I have only 1 column, even though the data looks like the example below.
Example:
123 Chelsea
65 Liverpool
77 Manchester city
2 Brentford
What I want is two columns: one for the counts, the other for the names. However, I get only one. Can anyone help me split the column into 2, or suggest a better method to extract this from the big file?
Thanks in advance !!!!
Not a beautiful solution, but try this.
Pipe the output of the previous command into this while loop:
"your program" | while read count city
do
    printf "%20s\t%s\n" "$count" "$city"
done
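For example, plugged into the pipeline from your question (a sketch; "file" is still a placeholder for your actual file name):
cut -f 2 "file" | sort | uniq -c | while read count city
do
    printf "%20s\t%s\n" "$count" "$city"
done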
If you simply want to count the unique values in each column, your best bet is the cut command with a custom delimiter, in this case the space character.
Keep in mind that the second field can itself contain spaces, e.g. Manchester city.
So, in order to count the unique occurrences of the first column:
cut -d ' ' -f1 <your_file> | sort | uniq | wc -l
where -d sets the delimiter to a space ' ' and -f1 selects the first column; sort groups identical values so that uniq keeps one instance of each, and wc -l counts them.
Similarly, to count the unique occurrences of the second column:
cut -d ' ' -f2- <your_file> | sort | uniq | wc -l
where everything is the same except for -f2-, which selects everything from the second column to the last (see the cut man page for the -f<from>-<to> syntax).
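For instance, on one of the lines from the example above:
echo "77 Manchester city" | cut -d ' ' -f1     # prints: 77
echo "77 Manchester city" | cut -d ' ' -f2-    # prints: Manchester city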
EDIT
Based on the update to your question, here is a suggestion for how to get what you want in R:
You can use cut together with pipe():
df = read.csv(pipe("cut -f1,2- -d ' ' <your_csv_file>"))
And this should return a dataframe with the data separated as you want.
I want to get a count of matches with grep.
grep -c works fine with one pattern.
But when I pipe (|) a second pattern, it returns nothing.
Here is an example:
File test with contents:
A C
A A
A B
A B C
I run the command:
>grep A test
A C
A A
A B
A B C
I run the command:
>grep -c A test
4
I run the command:
>grep A test | grep B
A B
A B C
I run the command:
grep -c A test | grep B
returns nothing
How can I get a count of 2 for the second example?
thank you
grep -c A test outputs 4. Searching for rows that match B in output consisting only of the line 4 is, unsurprisingly, empty.
Instead, you want to count at the last step:
grep A test | grep -c B
Here, the first command filters the rows down to only those that contain A, then the second command keeps only those that also contain B and counts them.
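With the sample file above, this gives the count of 2 you were after:
>grep A test | grep -c B
2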
The role of the pipe is to feed the output of the previous command into the next command as its input, so you should use grep A test | grep -c B
I have the following JSON:
{"us":{"$event":"5bbf4a4f43d8950b5b0cc6d2"},"org":"TΙ UIH","rc":{"$event":"13"}}
{"us":{"$event":"5bbf4a4f43d8950b5b0cc6d3"},"org":"TΙ UIH","rc":{"$event":"13"}}
{"us":{"$event":"5bbf4a4f43d8950b5b0cc6d4"},"org":"AB KIO","rc":{"$event":"13"}}
{"us":{"$event":"5bbf4a4f43d8950b5b0cc6d5"},"org":"GH SVS","rc":{"$event":"17"}}
How could I achieve the following output (TSV)?
13 TΙ UIH 2
13 AB KIO 1
17 GH SVS 1
So far, from what I have searched, I have:
jq -sr 'group_by(.org)|.[]|[.[0].org, length]|@tsv'
How could I add one more group_by to achieve the desired result?
I was able to obtain the expected result from your sample JSON using the following:
group_by(.org, .rc."$event")[] | [.[0].rc."$event", .[0].org, length] | @tsv
You can try it on jqplay.org.
The modification of the group_by clause ensures we will have one entry by pair of .org/.rc.$event (without it we would only have one entry by .org, which might hide some .rc.$event).
Then we add the .rc.$event to the array you create just as you did with the .org, accessing the value of the first item of the array since we know they're all the same anyway.
To sort the result, you can put it in an array and use sort_by(.[0]), which will sort by the first element of the rows:
[group_by(.org, .rc."$event")[] | [.[0].rc."$event", .[0].org, length]] | sort_by(.[0])[] | @tsv
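For reference, the full invocation might look like this (a sketch, assuming the JSON objects sit one per line in a file named input.json, which -s slurps into a single array and -r prints as raw TSV):
jq -sr '[group_by(.org, .rc."$event")[] | [.[0].rc."$event", .[0].org, length]] | sort_by(.[0])[] | @tsv' input.json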
I have a tab-separated table (like below) with millions of rows and 340 columns
HanXRQChr00c0001 68062 N N N N A
HanXRQChr00c0001 68080 N N N N A
HanXRQChr00c0001 68285 N N N N A
I want to remove 28 columns. That is easy to do, but in the output file I lose the separation between my columns.
Is there any way to exclude these columns and still keep the separation between the remaining ones, like above?
You can try different things. I include some of them below:
awk -i inplace '{$0=gensub(/\s*\S+/,"",28)}1' file
or
sed -i -r 's/(\s+)?\S+//28' file
or
awk '{$28=""; print $0}' file
or using cut as mentioned in the comments.
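For the cut route: if GNU cut is available and the file is tab-delimited (tab is cut's default delimiter), the --complement option drops a set of columns while leaving the tabs between the remaining columns untouched. A sketch, with fields 5-32 standing in for the 28 columns you want to drop:
cut --complement -f5-32 file > file_trimmed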
My objective is to repeatedly run an R script, each time with a different set of parameters.
To do so, I have been using a bash script to pass the command-line parameters to the R script by looping through an input file, in which each line contains a different combination of 7 parameters.
The input file looks like this:
10 food 0.00005 0.002 1 OBSERVED 0
10 food 0.00005 0.002 1 OBSERVED 240
10 food 0.00005 0.002 1 OBSERVED 480
10 food 0.00005 0.002 1 OBSERVED 720
10 food 0.00005 0.002 1 OBSERVED 960
10 food 0.00005 0.002 1 OBSERVED 1200
The R script to which the command-line parameters are passed begins like this:
commandArgs(trailingOnly=FALSE)
A <- as.numeric (commandArgs()[as.numeric(length(commandArgs()) -6 )])
B <- commandArgs()[as.numeric(length(commandArgs()) -5 )]
C <- as.numeric (commandArgs()[as.numeric(length(commandArgs()) -4 )])
D <- as.numeric (commandArgs()[as.numeric(length(commandArgs()) -3 )])
E <- as.numeric (commandArgs()[as.numeric(length(commandArgs()) -2 )])
F <- commandArgs()[as.numeric(length(commandArgs()) -1 )]
G <- as.numeric (commandArgs()[as.numeric(length(commandArgs()) )])
The bash loop that reads these in and dispatches the R script is as follows:
#!/bin/bash
N=0
cat Input.txt | while read LINE ; do
N=$((N+1))
echo "R --no-save < /home/trichard/Script.R" "$LINE" | bsub -N -q priority -R "select[model==Xeon5450]"
done
However, the problem is that there are millions of lines in Input.txt, so this approach is way too slow (it prevents other LSF users from submitting their own jobs).
So, the question is, how to do the above using an LSF array?
The main trick is to extract the nth line from the input file. Assuming you're on a Unix-like system, you can use the "sed" command to do that. Here's an example:
N=$(wc -l < input.txt)
echo 'R --no-save -f Script.R --args $(sed "${LSB_JOBINDEX}q;d" input.txt)' |
bsub -J "R_Job[1-$N]" -N -q priority -R "select[model==Xeon5450]"
Correct argument quoting is a bit tricky and very important in this example.
Note that this uses the R "--args" option to avoid warning messages about unrecognized arguments. I'd also suggest using commandArgs(trailingOnly=TRUE) in the R script so you only see the arguments of interest.
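The sed "${LSB_JOBINDEX}q;d" idiom is what extracts a single line by its number: q quits right after printing line N, and d suppresses every other line. A toy illustration:
sed "3q;d" input.txt    # prints only line 3 of input.txt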
Maybe you should consider putting it all into R and using a 'foreach' loop construct with a proper parallelization framework like 'doMPI' (or pure Rmpi if you are really motivated ;-)). That way the job management system on the cluster has full control and you are basically submitting one single job.
This is more a hint than a solution to your specific problem.
Steve Westson's answer works well; thanks!
However, in the LSF system, the maximum number of jobs within a single array is limited to ~1000. That means that when you have >1000 jobs, you need to submit multiple job arrays, like this:
#!/bin/bash
increment=1000                     # LSF caps a single job array at ~1000 elements
startvalue=1
stopvalue=$(wc -l < Col_Treat_BETA_MU_RAND_METHOD_part1.txt)
# round the line count up to the next multiple of $increment, plus one extra block
stopvalue=$(( ($increment*((stopvalue+999)/$increment))+$increment ))
end=$increment
for ((s=$startvalue,e=$end ; e<$stopvalue; s+=$increment,e+=$increment)); do
    echo $s "-" $e
    echo 'R --no-save -f script.R --args $(sed "${LSB_JOBINDEX}q;d" input.txt)' | bsub -J "R_Job[$s-$e]" -N -q normal
done
So this successfully submits all jobs instantaneously, without the original job-by-job loop that essentially blocks other users and annoys your sysadmin. Thanks again!
I am posting this as an answer as it exceeds the max length for a comment.