I want to see the differences between 2 files that are not in the local filesystem but on the web. So I think I have to use diff, curl and some kind of piping.
Something like
curl http://to.my/file/one.js http://to.my/file.two.js | diff
but it doesn't work.
The UNIX tool diff can compare two files. If you use the <() process substitution syntax, you can compare the output of the commands inside it:
diff <(curl file1) <(curl file2)
So in your case, you can say:
diff <(curl -s http://to.my/file/one.js) <(curl -s http://to.my/file.two.js)
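If you want some context around the changes, the standard -u (unified output) flag works the same way with process substitution, for example:
diff -u <(curl -s http://to.my/file/one.js) <(curl -s http://to.my/file.two.js)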
Some people arriving at this page might be looking for a line-by-line diff rather than a code-diff. If so, and with coreutils, you could use:
comm -23 <(curl http://to.my/file/one.js | sort) \
<(curl http://to.my/file.two.js | sort)
This gives the lines in the first file that are not in the second file. You could use comm -13 to get lines in the second file that are not in the first file.
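For example, the reverse direction looks like this (same URLs as in the question):
comm -13 <(curl http://to.my/file/one.js | sort) \
         <(curl http://to.my/file.two.js | sort)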
If you're not restricted to coreutils, you could also use sd (stream diff), which requires neither sorting nor process substitution and supports infinite streams, like so:
curl http://to.my/file/one.js | sd 'curl http://to.my/file.two.js'
The fact that it supports infinite streams allows for some interesting use cases: you could use it with a curl inside a while(true) loop (assuming the page gives you only "new" results), and sd will time out the stream after some specified time with no new streamed lines.
Here's a blogpost I wrote about diffing streams on the terminal, which introduces sd.
Related
Ok so I'm still learning the command line stuff like grep and diff and their uses within the scope of my project, but I can't seem to wrap my head around how to approach this problem.
So I have 2 files, each containing hundreds of 20-character-long strings. Let's call the files A and B. I want to search through A and, using the values in B as keys, locate UNIQUE string entries that occur in A but not in B (there are duplicates, so unique is the key here).
Any Ideas?
Also I'm not opposed to finding the answer myself, but I don't have a good enough understanding of the different command line scripts and their functions to really start thinking of how to use them together.
There are two ways to do this: with comm, or with grep, sort, and uniq.
comm
comm afile bfile
comm compares the (sorted) files and outputs 3 columns: lines only in afile, lines only in bfile, and lines common to both. The -1, -2 and -3 switches tell comm not to print the corresponding column, so suppressing columns 2 and 3 leaves just the lines that are only in afile.
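Since comm expects sorted input, a sketch for this question (using bash process substitution to sort on the fly, and sort -u to drop the duplicates) would be:
comm -23 <(sort -u afile) <(sort -u bfile)   # lines that appear only in afile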
grep sort uniq
grep -F -v -f bfile afile | sort | uniq
or just
grep -F -v -f bfile afile | sort -u
if your sort handles the -u option.
(Note: the command fgrep, if your system has it, is equivalent to grep -F.)
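A possible refinement, since your entries are fixed 20-character strings: add -x so a line in afile is excluded only on an exact whole-line match, not when a bfile string merely appears inside it.
grep -F -x -v -f bfile afile | sort -u   # -x: whole-line matches only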
Look up the comm command (POSIX comm) to do this. See also Unix command to find lines common in two files.
I have a query regarding the execution of a complex command in the makefile of the current system.
I am currently using a shell command in the makefile to execute the command. However, my command fails as it is a combination of many commands and its execution collects a huge amount of data. The makefile content is something like this:
variable=$(shell ls -lart | grep name | cut -d/ -f2- )
However, the make execution fails with an execvp failure, since the file listing is huge and I need to parse all of it.
Please suggest any ways to overcome this issue. Basically I would like to execute a complex command and assign its output to a makefile variable which I want to use later on.
(This may take a few iterations.)
This looks like a limitation of the architecture, not a Make limitation. There are several ways to address it, but you must show us how you use variable, otherwise even if you succeed in constructing it, you might not be able to use it as you intend. Please show us the exact operations you intend to perform on variable.
For now I suggest you do a couple of experiments and tell us the results. First, try the assignment with a short list of files (e.g. three) to verify that the assignment does what you intend. Second, in the directory with many files, try:
variable=$(shell ls -lart | grep name)
to see whether the problem is in grep or cut.
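As an extra diagnostic (my assumption, not something you have confirmed): if the assignment itself succeeds but a recipe that uses $(variable) fails, you may simply be hitting the system's limit on command-line length, which you can check with getconf.
getconf ARG_MAX   # upper bound on the combined size of command-line arguments and environment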
Rather than store the list of files in a variable, you can easily use shell functionality to get the same result. It's a bit odd that you're flattening a recursive ls to get only the leaves and then running mkdir -p, which is really only useful if the parent directory doesn't exist, but if you know which depths you want to cover (for example the current directory and all subdirectories one level down), you can do something like this:
directories:
	for path in ./*name* ./*/*name*; do \
		mkdir "/some/path/$$(basename "$$path")" || exit 1; \
	done
or even, as a plain shell command (double each $ if you put this inside a make recipe):
find . -name '*name*' -exec sh -c 'mkdir "/some/path/$(basename "$1")"' _ {} \;
I want to find a string pattern in a file in Unix. I use the below command:
$grep 2005057488 filename
But the file contains millions of lines and I have many such files. What is the fastest way to find the pattern, other than grep?
grep is generally as fast as it gets. It's designed to do one thing and one thing only, and it does it very well. You can read why here.
However, to speed things up there are a couple of things you could try. Firstly, it looks like the pattern you're looking for is a fixed string. Fortunately, grep has a 'fixed-strings' option:
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified by POSIX.)
Secondly, because grep is generally pretty slow on UTF-8, you could try disabling national language support (NLS) by setting the environment variable LANG=C. Therefore, you could try this concoction:
LANG=C grep -F "2005057488" file
Thirdly, it wasn't clear in your question, but if you're only trying to find whether something exists at all in your file, you could also cap the number of matches with -m. With -m 1, grep will quit immediately after the first occurrence is found. Your command could now look like this:
LANG=C grep -m 1 -F "2005057488" file
Finally, if you have a multicore CPU, you could give GNU parallel a go. It even comes with an explanation of how to use it with grep. To run 1.5 jobs per core and give 1000 arguments to grep:
find . -type f | parallel -k -j150% -n 1000 -m grep -H -n STRING {}
To grep a big file in parallel use --pipe:
< bigfile parallel --pipe grep STRING
Depending on your disks and CPUs it may be faster to read larger blocks:
< bigfile parallel --pipe --block 10M grep STRING
grep works faster than sed.
$grep 2005057488 filename
$sed -n '/2005057488/p' filename
Still, both work to get that particular string in a file.
sed -n '/2005057488/p' filename
Not sure if this is faster than grep though.
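One way to settle it on your own data is simply to time both approaches on the same file (a sketch using the string and filename from the question; output is discarded so you only measure the search):
time env LANG=C grep -F '2005057488' filename > /dev/null
time sed -n '/2005057488/p' filename > /dev/null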
I prefer not to create new files. I want to accomplish something similar to:
cmd1 > a
cmd2 > b
cat a b b | sort | uniq -u
but without using files a and b.
Unix utilities are generally file oriented, so nothing quite does what you want.
However, zsh can autocreate temporary files with the following syntax:
diff =(cmd1) =(cmd2)
It can also create temporary named pipes (or use the special files /dev/fd/n to reference anonymous pipes) with
diff <(cmd1) <(cmd2)
However, many diffs call lseek() on their input, so they won't work with named pipes.
(diff is in general a more useful command for comparing very similar output than your pipeline above.)
See the "process substitution" section of the "zshexpn" man page for more details.
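Applied to the pipeline in your question, a bash/zsh sketch (where cmd1 and cmd2 stand for your actual commands; note cmd2 runs twice here, just as b was read twice) would be:
cat <(cmd1) <(cmd2) <(cmd2) | sort | uniq -u   # lines from cmd1 that never appear in cmd2's output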
I have a bunch of commands I would like to execute in parallel. The commands are nearly identical. They can be expected to take about the same time, and can run completely independently. They may look like:
command -n 1 > log.1
command -n 2 > log.2
command -n 3 > log.3
...
command -n 4096 > log.4096
I could launch all of them in parallel in a shell script, but the system would be loaded far more than is strictly necessary to keep the CPU(s) busy (each task takes 100% of one core until it has finished). This would cause the disk to thrash and make the whole thing slower than a less greedy approach to execution.
The best approach is probably to keep about n tasks executing, where n is the number of available cores.
I am keen not to reinvent the wheel. This problem has already been solved in the Unix make program (when used with the -j n option). I was wondering if perhaps it was possible to write generic Makefile rules for the above, so as to avoid the linear-size Makefile that would look like:
all: log.1 log.2 ...
log.1:
	command -n 1 > log.1
log.2:
	command -n 2 > log.2
...
If the best solution is not to use make but another program/utility, I am open to that as long as the dependencies are reasonable (make was very good in this regard).
Here is more portable shell code that does not depend on brace expansion:
LOGS := $(shell seq 1 1024)
Note the use of := to define a more efficient variable: the simply expanded "flavor".
See pattern rules in the GNU make manual.
Another way, if this is the only reason you need make, is to use the -n and -P options of xargs.
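A sketch of that approach, assuming GNU or BSD xargs (-P is not POSIX) and using command as the placeholder for your real program, as in the question:
seq 1 4096 | xargs -n 1 -P 4 sh -c 'command -n "$1" > "log.$1"' _   # -n 1: one argument per job, -P 4: at most 4 jobs at once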
First the easy part. As Roman Cheplyaka points out, pattern rules are very useful:
LOGS = log.1 log.2 ... log.4096
all: $(LOGS)
log.%:
	command -n $* > log.$*
The tricky part is creating that list, LOGS. Make isn't very good at handling numbers. The best way is probably to call on the shell. (You may have to adjust this script for your shell; shell scripting isn't my strongest subject.)
NUM_LOGS = 4096
LOGS = $(shell for ((i=1 ; i<=$(NUM_LOGS) ; ++i)) ; do echo log.$$i ; done)
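With those pieces in the Makefile, the parallelism itself comes from make's -j option, just as you anticipated; for example:
make -j8 all   # keep at most 8 jobs running at once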
xargs -P is the "standard" way to do this.
Note that depending on disk I/O, you may want to limit parallelism to the number of spindles rather than cores.
If you do want to limit to cores, note the new nproc command in recent coreutils.
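For example (a sketch assuming GNU coreutils for nproc):
make -j"$(nproc)" all   # one make job per available core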
With GNU Parallel you would write:
parallel command -n {} ">" log.{} ::: {1..4096}
10 second installation:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
Learn more: http://www.gnu.org/software/parallel/parallel_tutorial.html https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1