I was trying to store multiple diff outputs in one .patch file to keep versioning in one file instead of running multiple
diff -u f1 f2 > f1.patch
commands. Preferably I'd keep running
diff -u[other params?] f1 f2 >> f1.patch
to end up with one file containing all the changes, which would let me later run patch against it to reconstruct f1 as it was at any given moment.
Unfortunately, patch fails on a file generated this way. It only seems to apply the first patch from the file and then quits with an error.
My question: is that possible with diff and patch? And if so, how?
Thank you in advance.
I need to download many files from a server (specifically Tectia), ideally using the ssh package. These files all follow a predictable pattern across multiple subfolders. The file path is formatted like this:
/directory/subfolder/A001/abcde001.csv
Where A001 counts up alongside the last 3 digits of the filename (/A002/abcde002.csv and so on)
In the vignette for scp_download it states that the files parameter may contain wildcards, so I have tried something like
scp_download(session, "/directory/subfolder/A.*/abcde.*[.]csv", to=tempdir())
and
scp_download(session, "directory/subfolder/A\\d{3}/abcde\\d{3}[.]csv", to=tempdir())
but no matter which combination of patterns or wildcards I can think of (which isn't many) I only get something like
Warning: SSH warning: scp: /directory/subfolder/A\d{3}/abcde\d{3}[.]csv: No such file or directory
What I'm hoping to do is either find a way to do pattern matching here, or find a way to store the Tectia directory contents as a string to be read by scp_download. I've made sure that my session is connected properly, and the download works fine when I don't attempt any pattern matching.
I had the same problem. It happens because when you use * in your pattern it gets escaped when it is sent to the server. However, when you request a specific file name like /directory/subfolder/A001/abcde001.csv, it works fine.
In the end I changed my code to follow these steps:
1. Get the list of files/folders using the ls command via the ssh_exec_wait function, and store it in a variable.
2. Download the files in that variable separately.
session <- ssh_connect("username@ip", passwd = "password")
files<-capture.output(ssh_exec_wait(session, command = 'ls /directory/subfolder/A001/*'))
dnc1<- scp_download(session, files[1], to = paste0(getwd(),"/data/"))
dnc2<- scp_download(session, files[2], to = paste0(getwd(),"/data/"))
dnc3<- scp_download(session, files[3], to = paste0(getwd(),"/data/"))
The last three commands can be put in a loop, since there could be hundreds or thousands of files.
I have a series of sequential directories to gather files from on a linux server that I am logging into remotely and processing from an R terminal.
/r18_060, /r18_061, ... /r18_118, /r18_119
Each directory is for the day of year the data was logged on, and it contains a series of files with a standard prefix such as "fl.060.gz"
I have to supply a function that contains multiple system() commands with a linux glob for the day. I want to divide the year into 60-day intervals to make the QA/QC more manageable. Since I'm crossing from 099 to 100 in the glob, I have to use brace expansion to match the correct sequence of days.
ls -d /root_directory/r18_{0[6-9]?,1[0-1]?}
ls -d /root_directory/r18_{060..119}
All of these work fine when I manually input these globs into my bash shell, but I get an error when the system() function provides a similar command through R.
day_glob <- "{060..119}"
system(paste("zcat /root_directory/r18_", day_glob, "/fl.???.gz > tmpfile", sep = ""))
>gzip: cannot access '/root_directory/r18_{060..119}': No such file or directory
I know that this could be an issue with the shell that the system() function operates in, but when I query that, it gives the expected shell and user name:
system("env | grep ^SHELL=")
>SHELL=/bin/bash
system("echo $USER")
>tgw
Does anyone know why this fails when it is passed through R's system() command? What can I do to get around this problem without removing the system call altogether? There are many scripts that rely on these functions, and re-writing the entire family of R scripts would be time prohibitive.
Previously I had been using 50-day intervals, which avoided this problem, but I thought this should be easy to change and would mean one less iteration of my QA/QC scripts per year. I'm new to the linux OS, so I figured I might just be missing something obvious.
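My current suspicion (unconfirmed) is that system() hands the command to /bin/sh rather than to the shell named in $SHELL, and on many distributions /bin/sh is dash, which does not perform brace expansion. Outside of R the contrast looks like this:
# brace expansion is a bash feature; a plain POSIX shell such as dash
# passes the braces through to the command literally
sh -c 'echo /root_directory/r18_{060..119}'     # braces left unexpanded on a dash-style /bin/sh
bash -c 'echo /root_directory/r18_{060..119}'   # expands to r18_060 r18_061 ... r18_119
If that is what's happening, forcing bash explicitly, e.g. system(paste("bash -c", shQuote(cmd))), or building the 060-119 sequence in R itself with sprintf("%03d", 60:119), would presumably sidestep it, but I haven't confirmed this.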
I have lots (millions) of small log files in S3, with the name (date/time) identifying each one, i.e. servername-yyyy-mm-dd-HH-MM, e.g.
s3://my_bucket/uk4039-2015-05-07-18-15.csv
s3://my_bucket/uk4039-2015-05-07-18-16.csv
s3://my_bucket/uk4039-2015-05-07-18-17.csv
s3://my_bucket/uk4039-2015-05-07-18-18.csv
...
s3://my_bucket/uk4339-2015-05-07-19-23.csv
s3://my_bucket/uk4339-2015-05-07-19-24.csv
...
etc
From EC2, using the AWS CLI, I would like to simultaneously download all files from 2015 whose minute equals 16, for servers uk4339 and uk4338 only.
Is there a clever way to do this?
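The closest thing I've come up with so far (an untested sketch) is to let aws s3 cp do the filtering itself with its --exclude/--include wildcards; the bucket and naming pattern below are the ones from above:
# untested: exclude everything, then whitelist minute 16 for the two servers;
# a recursive copy transfers several objects concurrently by default
aws s3 cp s3://my_bucket/ . --recursive --exclude "*" --include "uk4339-2015-*-16.csv" --include "uk4338-2015-*-16.csv"
My understanding is that the CLI still has to list every key to apply these filters, so with millions of files this may be slow, which is partly why I ask about the file structure below.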
Also, if this is a terrible file structure in S3 for querying data, I would be extremely grateful for any advice on how to set it up better.
I can put a relevant aws s3 cp ... command into a loop in a shell/bash script to sequentially download the relevant files, but I was wondering if there is something more efficient.
As an added bonus I would like to row-bind the results together into one CSV as well.
A mock CSV file can be generated with this line of R code:
R> write.csv(data.frame(cbind(a1=rnorm(100),b1=rnorm(100),c1=rnorm(100))),file='uk4339-2015-05-07-19-24.csv',row.names=FALSE)
The csv that is created is uk4339-2015-05-07-19-24.csv. FYI, I will be importing the combined data into R at the end.
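For the row-binding bonus, I'm assuming that once the files are downloaded locally something simple at the shell level would do, keeping only the first file's header row:
# untested: print the header from the first file only, then the data rows of the rest
awk 'FNR==1 && NR!=1 {next} {print}' uk4338-2015-*-16.csv uk4339-2015-*-16.csv > combined.csv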
As you didn't answer my questions, nor indicate what OS you use, it is somewhat hard to make any concrete suggestions, so I will briefly suggest you use GNU Parallel to parallelise your S3 fetch requests to get around the latency.
Suppose you somehow generate a list of all the S3 files you want and put the resulting list in a file called GrabMe.txt like this
s3://my_bucket/uk4039-2015-05-07-18-15.csv
s3://my_bucket/uk4039-2015-05-07-18-16.csv
s3://my_bucket/uk4039-2015-05-07-18-17.csv
s3://my_bucket/uk4039-2015-05-07-18-18.csv
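One way you might build that list in the first place (assuming, as in the question, the objects sit directly under the bucket root, and filtering for the minute-16 files of uk4338/uk4339) is:
# list the bucket, keep the key column, filter on the naming pattern,
# and prefix each key with the bucket to form s3:// URLs
aws s3 ls s3://my_bucket/ | awk '{print $4}' | grep -E '^uk433[89]-2015-[0-9]{2}-[0-9]{2}-[0-9]{2}-16\.csv$' | sed 's|^|s3://my_bucket/|' > GrabMe.txt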
Then you can get them in parallel, say 32 at a time, like this:
parallel -j 32 echo aws s3 cp {} . < GrabMe.txt
or if you prefer reading left-to-right
cat GrabMe.txt | parallel -j 32 echo aws s3 cp {} .
You can obviously alter the number of parallel requests from 32 to any other number. At the moment, it just echoes the command it would run, but you can remove the word echo when you see how it works.
There is a good tutorial here, and Ole Tange (the author of GNU Parallel) is on SO, so we are in good company.
I've created a mapping in MapForce 2013 and exported the MFX file. Now, I need to be able to run the mapping using MapForce Server. The problem is, I need to specify both the input EDI file and the output file. As far as I can tell, the usage pattern is to run the mapping with MapForce server using the input/output configuration in the MFX itself, not passed in on the command line.
I suppose I could change the input/output to some standard file name and then just write the input file to that path before performing the mapping, and then grab the output from the standard output file path when the mapping is complete.
But I'd prefer to be able to do something like:
MapForceServer run -in=MyInputFile.txt -out=MyOutputFile.xml MyMapping.mfx > MyLogFile.txt
Is something like this possible? Perhaps using parameters within the mapping?
There are two options that I've come across in dealing with a similar situation.
Option 1 - If you set the input XML file to *.xml in the component settings, mapforceserver.exe will automatically process every matching file in the directory, assuming your source is XML (this should work for text just the same). Similar to the example below, you can set up a cleanup routine to move the files into another folder after processing.
Note: It looks in the folder where the schema file is located.
Option 2 - Since your output is XML you can use Altova's RaptorXML (rack up another license charge). You can generate XSLT 2.0 code from the mapping and use a batch file to execute it automatically, something like this.
::#echo off
for %%f IN (*.xml) DO (RaptorXML xslt --xslt-version=2 --input="%%f" --output="out/%%f" %* "mymapping.xslt"
if NOT errorlevel 1 move "%%f" processed
if errorlevel 1 move "%%f" error)
sleep 15
mymapping.bat
I tossed in a sleep command to loop the batch for rechecking every 15 seconds. Unfortunately this does not work if your output target is a database.
I have a slew of CSS files to go through where someone just grunted through making alterations to various core stylesheets on a number of subsites. Obviously if the original developer had had some foresight they would have just included a master stylesheet and overridden the necessary elements…
I first started off with comm thinking that it might do the trick, but quickly found that it needed to receive a sorted input file.
I then switched over to diff and have gotten down to the following through some reading and research:
diff --unchanged-group-format="## %dn,%df%c'\012'%<" --old-group-format='' --new-group-format='' --changed-group-format='' file_1.css file_2.css
The previous command is obviously almost there, but:
A) I need to grep out the ## lines (which should be fine, right? At first glance this appears right, but does diff throw in any other unexpected lines that need to be yanked?) and then
B) I need to create two more files: the first containing the leftover unique lines from file_1.css, and the second the leftover unique lines from file_2.css.
Obviously the first "in common" file will go into an include folder and then be included into the two latter created files as @import url("common.css");
I am thinking that the following simple alteration will create the latter two files to which I'm referring:
diff --unchanged-group-format='' --old-group-format="## %dn,%df%c'\012'%<" --new-group-format='' --changed-group-format='' file_1.css file_2.css
diff --unchanged-group-format='' --old-group-format='' --new-group-format="## %dn,%df%c'\012'%>" --changed-group-format='' file_1.css file_2.css
Sample files:
file 1: https://gist.github.com/c13843972c47b5037704
file 2: https://gist.github.com/fff39eae386e8969dc10
So for example, upon executing a test of the following:
diff --unchanged-group-format="## %dn,%df%c'\012'%<" --old-group-format='' --new-group-format='' --changed-group-format='' file_1.css file_2.css | egrep -v "^##\d*" > common.css
diff --unchanged-group-format='' --old-group-format="## %dn,%df%c'\012'%<" --new-group-format='' --changed-group-format='' file_1.css file_2.css | egrep -v "^##\d*" > old.css
And then searching for body with egrep "^body" *css, it yielded only one body entry in common.css and none in old.css, whereas file_1.css and file_2.css each contain a different body entry. So obviously this methodology is flawed.
How would one go about creating these two files that would ultimately become the common include and the override files?
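My best guess at the gap so far: every command above sets --changed-group-format='', so any line that differs between the two files (like those two body rules) is dropped from every output. Something along these lines (untested) might route the changed lines into the per-file outputs instead, without needing the ## markers at all:
diff --unchanged-group-format='%=' --old-group-format='' --new-group-format='' --changed-group-format='' file_1.css file_2.css > common.css
diff --unchanged-group-format='' --old-group-format='%<' --new-group-format='' --changed-group-format='%<' file_1.css file_2.css > file_1.only.css
diff --unchanged-group-format='' --old-group-format='' --new-group-format='%>' --changed-group-format='%>' file_1.css file_2.css > file_2.only.css
Here file_1.only.css and file_2.only.css are just placeholder names for the two override files.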
@ylluminate, you have a few options:
1. Use BeyondCompare to visually verify the differences. It does a fantastic job comparing similar files, and it allows saving the common lines, left-only lines, and right-only lines. The only downside is that it is interactive, and if you have a lot of files it will take some time. On the positive side, it looks like you want to build trust first by testing things out a few times.
2. Add a format for --changed-group-format and capture the modified code (along with the old code, as your command does now). You then need to run one more comparison to get what is in the new code but not in the old code. The downside here is that validation is going to be hard.
3. Saving all the lines in a database table and comparing columns is another option. Take care to store old and new line numbers. The downsides are that the data lines need to be unique and that blank lines will be dropped.
I would go with option 1 if I had fewer than 50 files.
Hope this helps.
PS: I am not associated with BeyondCompare in any way, just a happy user of the software.