What is the best way to get the number of responses of a findscu query?
For now I am thinking of exporting the responses to an .xml file and counting the tags. Is there a better way?
The answer to your question depends on what you mean by "best way".
Using one of findscu's --extract-xxx options and then counting the number of created files (or, in the case of --extract-xml-single, the number of "dataset" elements) would be a possible solution. However, it requires creating output files, which might slow down the process.
Alternatively, you could count the number of lines in the log output that contain the text "Find Response:", i.e. something like the following should work with "bash":
findscu dicomserver.co.uk 11112 -P -k 0008,0052=PATIENT -k 0010,0020 2>&1 | fgrep -c "Find Response:"
I have a huge table I want to extract information from. Firstly, I want to extract a certain line based on a pattern -> I've done that successfully with grep. However this line has loads of columns and I'm interested only in a couple of them that have a certain pattern in them (partial match - beginning of the string). Is it possible to extract only the columns and the number of the column (the nth column) for some partial matches? Hope I was clear enough.
Languages: Preferably in bash but I can also work in R, alternatively I'm open to suggestions if you think another language can be more helpful.
Thanks!
Awk is perfect for this kind of thing. To help you write a complete script we'd need more details, but I'm guessing you'll want to use awk's print statement. To print the nth column of a file "your_file" (here n = 3), do:
awk -v n=3 '{print $n}' your_file
In solving your problem you may also want to loop over all N columns which you can do via:
N=$(awk '{print NF; exit}' your_file)  # number of columns in the first line
for i in $(seq 1 "$N"); do
    awk -v col="$i" '{print $col}' your_file
done
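For the actual question (print only the columns whose value starts with a given prefix, together with their column number), one awk pass per line avoids starting awk once per column. A minimal sketch, assuming whitespace-separated columns; the sample data, the prefix "ab" and the file name your_file are placeholders:

```shell
#!/bin/sh
# Work in a scratch directory; the sample table below is hypothetical.
cd "$(mktemp -d)" || exit 1
printf 'id1 abc x abd\nid2 zz abe q\n' > your_file

# For every line, print "column_number value" for each field that begins
# with the prefix. index() does a plain string search (no regex), and a
# return value of 1 means the match is at the very start of the field.
awk -v prefix="ab" '{
    for (i = 1; i <= NF; i++)
        if (index($i, prefix) == 1)
            print i, $i
}' your_file
```

To restrict this to the line you already found with grep, pipe grep's output into the awk command instead of passing your_file.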
I tried to research this but could not find the answer. I know that ls -l lists everything in the folder alphabetically, whilst ls -alt lists files by modification date, without regard to alphabetical order.
I tried ls -l -alt and also ls -alt -l, with no luck. What is the correct way to combine them?
Edit: With example.
Say I have the following list of directories:
aalexand bam carson duong garrett hollande jjackson ksmith mkumba olandt rcs solorzan truong yoo
aalfs battiste chae echo ghamilto holly jkelly kturner mls old.2016 reichman sophia twong zbib
I want to order them alphabetically, so aalexand would come first. However, if aalfs has been modified more recently, it should appear first.
So if this were an SQL query, we would order by date last modified and group by directory name.
I am not sure what you want to do.
But, first of all: ls -l -alt is a double use of the -l parameter (take a look at man ls for more information about the parameters).
ls -l (long listing format) lists one file per line together with extra information such as permissions; if you don't need that information, use -1 instead of -l. The -a option includes hidden files, and -t sorts by modification time. You cannot sort by name AND by time: the time would only act as a tie-breaker for two files with the same name, which is not possible. Could you please explain what you want in more detail?
Perhaps you could include a short example list of files with their modification times and your desired output; then maybe I can understand.
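To see the two orderings side by side, here is a small sketch using two of the names from the question, created as empty files in a scratch directory. The timestamps are made up and set with touch -t, so that aalfs is the more recently modified one:

```shell
#!/bin/sh
# Scratch directory so nothing real is touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical modification times: aalfs is newer than aalexand.
touch -t 202001010000 aalexand
touch -t 202301010000 aalfs

# Default order: by name (C locale for predictable collation).
LC_ALL=C ls -1

# -t: newest modification time first, regardless of name.
LC_ALL=C ls -1t
```

The first listing starts with aalexand, the second with aalfs. Adding -r reverses either order, e.g. ls -1tr lists the oldest file first.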
grep -F -f file1 file2
file1 is 90 MB (2.5 million lines, one word per line)
file2 is 45 GB
That command doesn't actually produce anything whatsoever, no matter how long I leave it running. Clearly, this is beyond grep's scope.
It seems grep can't handle that many queries from the -f option. However, the following command does produce the desired result:
head file1 > file3
grep -F -f file3 file2
I have doubts about whether sed or awk would be appropriate alternatives either, given the file sizes.
I am at a loss for alternatives... please help. Is it worth learning some SQL? Is it easy? Can anyone point me in the right direction?
Try using LC_ALL=C. It makes grep use the C locale, so it compares plain bytes instead of decoding multibyte UTF-8 characters; the source below reports a speedup of about 140 times. I had a 26 GB file that took around 12 hours to search; with LC_ALL=C it took a couple of minutes.
Source: Grepping a huge file (80GB) any way to speed it up?
So what I do is:
LC_ALL=C fgrep "pattern" <input >output
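Applied to the original question, the same idea means prefixing the grep -F -f command itself with LC_ALL=C. If, as stated in the question, both files hold one word per line, adding -x limits grep to whole-line matches, which also avoids substring hits. A small sketch with made-up miniature versions of file1 and file2:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
# Hypothetical miniature inputs: file1 = patterns, file2 = data.
printf 'apple\npear\n' > file1
printf 'apple\npineapple\npear\nplum\n' > file2

# C locale: byte-wise matching, no multibyte decoding.
# -F fixed strings, -x whole-line matches only, -f read patterns from file1.
LC_ALL=C grep -F -x -f file1 file2 > matches
cat matches
```

Without -x, "pineapple" would also be reported, because it contains "apple".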
I don't think there is an easy solution.
If you wrote your own program to do this, you would end up with a nested loop, where the outer loop iterates over the lines in file2 and the inner loop iterates over file1 (or vice versa). The number of iterations grows with size(file1) * size(file2), which is a very large number when both files are large. Making one file smaller using head apparently resolves this issue, at the cost of no longer giving the correct result.
A possible way out is indexing (or sorting) one of the files. If you iterate over file2 and for each word you can determine whether or not it is in the pattern file without having to fully traverse the pattern file, then you are much better off. This assumes that you do a word-by-word comparison. If the pattern file contains not only full words, but also substrings, then this will not work, because for a given word in file2 you wouldn't know what to look for in file1.
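One concrete form of the sorting idea, assuming exact whole-line word matches, is to sort both files and let comm walk through them in lockstep, so after sorting each file is read once instead of being compared pairwise. This is a swapped-in technique (sort plus comm), not something the answer itself prescribes, and the sample files are made up:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
# Hypothetical miniature inputs, one word per line.
printf 'pear\napple\n' > file1
printf 'plum\napple\npineapple\npear\n' > file2

# sort and comm must agree on collation; the C locale is also the fastest.
export LC_ALL=C

# sort handles inputs larger than RAM by merging temporary runs on disk.
sort -u file1 > file1.sorted
sort file2 > file2.sorted

# -12 suppresses lines unique to either file, leaving only common lines.
comm -12 file1.sorted file2.sorted
```

The result comes out in sorted order rather than file2's original order, and since file1 is deduplicated each matching word is reported once. For a 45 GB file, GNU sort's -T option (directory for temporary files) may be needed if /tmp is small.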
Learning SQL is certainly a good idea, because learning something is always good. It will, however, not solve your problem, because SQL will suffer from the same quadratic effect described above. It may simplify indexing, should indexing be applicable to your problem.
Your best bet is probably taking a step back and rethinking your problem.
You can try ack; some people claim it is faster than grep.
You can try GNU parallel:
parallel --progress -a file1 'grep -F {} file2'
Parallel has got many other useful switches to make computations faster.
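The parallel command above starts one grep per line of file1, so with 2.5 million patterns file2 would be read 2.5 million times. A variant closer to the question's own observation (grep copes with a shorter pattern file) is to split file1 into chunks and run grep -F -f once per chunk. A minimal sketch with plain background jobs instead of parallel; the sample data and the chunk size of 2 lines are only for this tiny demo, and a realistic chunk size is an assumption you would have to tune:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
# Hypothetical miniature inputs.
printf 'apple\npear\nplum\n' > file1
printf 'apple pie\npear tree\nfig\nplum jam\n' > file2

# Split the pattern list into pieces: chunk_aa, chunk_ab, ...
split -l 2 file1 chunk_

# One grep per chunk, all running concurrently in the background.
for c in chunk_*; do
    LC_ALL=C grep -F -f "$c" file2 > "$c.out" &
done
wait

# A line matching patterns from several chunks appears in several
# .out files, so deduplicate when combining the results.
sort -u chunk_*.out
```

Every chunk still scans all of file2, so the number of chunks trades grep's memory use against repeated reads of the big file.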
Grep can't handle that many queries, and at that volume, it won't be helped by fixing the grep -f bug that makes it so unbearably slow.
Are both file1 and file2 composed of one word per line? That means you're looking for exact matches, which we can do really quickly with awk:
awk 'NR == FNR { query[$0] = 1; next } query[$0]' file1 file2
NR (number of records, the line number) is only equal to the FNR (file-specific number of records) for the first file, where we populate the hash and then move onto the next line. The second clause checks the other file(s) for whether the line matches one saved in our hash and then prints the matching lines.
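If file2's lines contain several words rather than one, the same hash can still be used without looping over all queries: check each whitespace-separated field of a line against the hash instead of the whole line. A sketch under that assumption (it finds whole-word matches only, not substrings), with made-up sample files:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
# Hypothetical miniature inputs.
printf 'apple\npear\n' > file1
printf 'an apple a day\npineapple juice\npear tree\nplum\n' > file2

# First file: store every query word as a hash key.
# Other files: print the line as soon as one of its fields is a key.
awk 'NR == FNR { query[$0] = 1; next }
     { for (i = 1; i <= NF; i++) if ($i in query) { print; next } }' file1 file2
```

Each line costs one hash lookup per field, so the run time grows with the size of file2 rather than with size(file1) * size(file2).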
Otherwise, you'll need to iterate:
awk 'NR == FNR { query[$0]=1; next }
{ for (q in query) if (index($0, q)) { print; next } }' file1 file2
Instead of merely checking the hash, we have to loop through each query and see whether it occurs in the current line ($0). This is much slower, but unfortunately necessary (at least we're matching plain strings rather than regexes, which would be slower still). The loop stops at the first match.
If you actually want to evaluate the lines of the query file as regular expressions, you can use $0 ~ q instead of the faster index($0, q). Note that awk uses POSIX extended regular expressions, roughly the same as grep -E or egrep; don't rely on bounded quantifiers ({1,7}) being supported by every awk implementation, or on the GNU extensions for word boundaries (\b) and shorthand character classes (\s, \w, etc.).
These should work as long as the hash doesn't exceed what awk can store. This might be as low as 2.1B entries (a guess based on the highest 32-bit signed int) or as high as your free memory.
I have data from HTTP access logs on which I need to do the following:
Search for the pattern in all files in a specific directory
Write that data to another file
Check new file for uniqueness and remove duplicate entries
Data looks like this:
<IP address> - - [09/Sep/2012:17:35:39 +0000] "GET /api/v1/user/followers?user_id=577670686&access_token=666507ba-8e88-423b-83c6-9df44bee2c8b& HTTP/1.1" 200 172209 <snip>
I'm particularly interested in the numeric part of user_id=577670686, which I would like to print to a new file (I haven't tried that part yet).
I've tried to use sed, but I'm not really trying to manipulate the data, so it seems incredibly clumsy. I looked at awk, but the data isn't really column-based and the $# designations didn't work for this data (it would be in $10, right?). And I couldn't see a way to get rid of the rest of the field that $# returns. It was suggested that I use Perl, so I've looked at examples on Google, but it's so foreign to me. Any suggestions?
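Use sed to extract the relevant part, then pipe it through sort and uniq to report:
$ sed -nE 's/.*user_id=([0-9]+).*/\1/p' access.log | sort | uniq -c
This prints each unique user_id value together with its number of occurrences. The -n option together with the p flag makes sed print only the lines where the substitution succeeded, so log lines without a user_id are not counted.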
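For the full task in the question (all files in a directory, duplicates removed, result written to a new file), the same sed expression can be combined with sort -u. A sketch assuming the logs live in a directory called logs and end in .log (both hypothetical); the sample lines follow the question's format with made-up IPs and tokens:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
mkdir logs
# Hypothetical log lines in the format shown in the question.
printf '%s\n' \
  '1.2.3.4 - - [09/Sep/2012:17:35:39 +0000] "GET /api/v1/user/followers?user_id=577670686&access_token=x& HTTP/1.1" 200 172209' \
  '1.2.3.4 - - [09/Sep/2012:17:35:40 +0000] "GET /api/v1/user/followers?user_id=42&access_token=y& HTTP/1.1" 200 100' > logs/a.log
printf '%s\n' \
  '5.6.7.8 - - [09/Sep/2012:17:36:00 +0000] "GET /api/v1/user/followers?user_id=577670686&access_token=z& HTTP/1.1" 200 100' \
  '5.6.7.8 - - [09/Sep/2012:17:36:01 +0000] "GET /index.html HTTP/1.1" 200 50' > logs/b.log

# Print only the captured IDs, drop duplicates, write them to a new file.
sed -nE 's/.*user_id=([0-9]+).*/\1/p' logs/*.log | sort -u > user_ids.txt
cat user_ids.txt
```

sort -u orders the IDs as text; use sort -nu if you want numeric order.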
I have two files ...
file1:
002009092312291100098420090922111
010555101070002956200453T+00001190.81+00001295.920010.87P
010555101070002956200449J+00003128.85+00003693.90+00003128
010555101070002956200176H+00000281.14+00000300.32+00000281
file2:
002009092410521000098420090709111
010560458520002547500432M+00001822.88+00001592.96+00001822
010560458520002547500432D+00000106.68+00000114.77+00000106
In both files, for every record starting with 01, the string from the 3rd to the 25th character (i.e. up to and including the letter) is the key.
Based on this key, I have to compare the two files: if a record in file2 has a matching key, it should replace that record in file1; otherwise it should be appended.
Well, this is a fairly unspecific (and basic) programming question. We'll be better able to help you if you explain exactly what you did and where you got stuck.
Also, it looks a bit like homework, and people are wary of giving too much help on homework problems, as it might look like cheating.
To get you started:
I'd recommend Perl to solve this, but awk or another scripting language will also do. I'd recommend against sh/bash, as they are weak on text manipulation; also combining grep et al will become rather cumbersome.
First write a Perl program that filters records starting with 01. Then extract the key and put it into a hash (a Perl structure). Then output a new, combined file as required.
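The steps above can also be sketched in awk, which the answer mentions as an alternative to Perl. This is a minimal sketch under these assumptions: the key is characters 3-25 of every record starting with 01 (as stated in the question), lines of file1 not starting with 01 (such as the header) are copied unchanged, file2's header is not carried over, and unmatched file2 records are appended in file2's order. The last file2 record in the sample is made up so that one key actually matches:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  '002009092312291100098420090922111' \
  '010555101070002956200453T+00001190.81+00001295.920010.87P' \
  '010555101070002956200449J+00003128.85+00003693.90+00003128' > file1
printf '%s\n' \
  '002009092410521000098420090709111' \
  '010560458520002547500432M+00001822.88+00001592.96+00001822' \
  '010555101070002956200449J+00009999.99+00009999.99+00009999' > file2

awk '
    # key = characters 3-25, i.e. 23 characters starting at position 3
    function key(s) { return substr(s, 3, 23) }

    # Pass 1 (file2): remember each 01 record by key, in input order
    NR == FNR {
        if (/^01/) {
            k = key($0)
            if (!(k in rec)) order[++n] = k
            rec[k] = $0
        }
        next
    }

    # Pass 2 (file1): replace 01 records whose key occurs in file2
    /^01/ && (key($0) in rec) { k = key($0); print rec[k]; used[k] = 1; next }

    # Every other line of file1 is copied unchanged
    { print }

    # Append file2 records whose key never occurred in file1
    END { for (i = 1; i <= n; i++) if (!(order[i] in used)) print rec[order[i]] }
' file2 file1 > merged
cat merged
```

file2 is read first so that its records are already in the hash when file1 is processed; the order array keeps the appended records in the order they appear in file2, since awk's for (k in array) order is unspecified.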
Use awk to get characters 3-25 of the records starting with 01, for example:
awk '/^01/ {print substr($0, 3, 23)}' file_name
Do this for both files, keep the two lists of keys in separate buffers, and compare the buffers in a shell script using a for line in ... loop.
Whenever a key in the second buffer matches one in the first, grep for that record in the first file and replace it with the corresponding line from the second file. You will need to work out some of the logic yourself.