R TSP Package with Concorde

I have the TSP package installed and working.
I downloaded all the files from the Concorde (TSP/Waterloo) website, tried different versions, and even extracted all the files.
I put the files in my R working directory.
Finally, when running concorde_path(), it was able to find the files.
However, when I run concorde_help() I receive an error.
I got a cygwin1.dll not found error. So I installed cygwin.
I still get an error.
I tried putting all the Concorde files in the bin folder of Cygwin (where cygwin1.dll lives) and pointing the R working directory and concorde_path() there, and I get a status 123 error.
I also have the Concorde Windows app downloaded and it does work. I found another post suggesting that the Windows app has to work before Concorde will work within R.
I am running R/RStudio under Windows.
Thank you for any suggestions and help you may have.

I just got this working for TSPMap so hopefully it helps someone out.
Concorde for Windows doesn't appear to have a command-line interface that works with the TSP package. This is where Cygwin comes in, because that version of Concorde can work on the command line and interface with the TSP package.
You really need to get it working in Cygwin first, so you need to get the Cygwin console up and running.
If you've got the console working, you can download and gunzip the Concorde binary and test that. Just running ./concorde.exe within Cygwin should display the help for the program.
Another test is to use the following test file and see if that runs through Concorde.
NAME: TEST
TYPE: TSP
DIMENSION: 6
EDGE_WEIGHT_TYPE: EXPLICIT
EDGE_WEIGHT_FORMAT: FULL_MATRIX
EDGE_WEIGHT_SECTION:
0 1 2 1 1 2
1 0 1 2 2 1
2 1 0 1 2 1
1 2 1 0 1 2
1 2 2 1 0 1
2 1 1 2 1 0
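For instance, if the matrix above is saved as a file named test.tsp (the file name here is just an illustration) in the same directory as the binary, the following commands in the Cygwin console should exercise it:
./concorde.exe              # no arguments: should print the usage/help text
./concorde.exe test.tsp     # should solve the small instance in the file above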
Once you get that working in Cygwin, it is time to try from the command line to see if Cygwin is integrated with Windows. If that is fine, R shouldn't have an issue with it.
This command should display the Concorde command-line help if your system can recognise it.
t1 <- try(system("c:/cygwin64/home/davisst5/concorde.exe"))
This one tests that concorde_path() should work:
concorde_path("c:/cygwin64/home/davisst5/")
found: concorde.exe
If you've done all of this already and are still having issues, it's possible that there is a 32/64-bit version problem where either R or Cygwin is running in a different mode and cannot call the other properly (which might be a source of the dll issues). I've got RGui in 64-bit and Cygwin in 64-bit. One tutorial I read said that it was critically important to install 32-bit Cygwin to make it work, so that's possibly why.

I followed the answer given by "Stephen Davison" above. However, I still got the cygwin1.dll file missing error. I had 32-bit Cygwin installed on the C drive. What I did was copy the cygwin1.dll file from the C:\cygwin\bin folder and paste it into my R working directory, which is E:\RA\Concorde_Code
Then in my RStudio (64-bit), I ran the following code to check whether Concorde is working or not:
concordePath = "E:/RA/Concorde_Code/"
concorde_path(concordePath)
It says the following
found: concorde concorde.exe
Then I ran the following code
concorde_help()
It gave me the following output:
The following options can be specified in solve_TSP with method "concorde" using clo in control:
/Concorde_Code/concorde
Usage: /Concorde_Code/concorde [-see below-] [dat_file]
-B do not branch
-C # maximum chunk size in localcuts (default 16)
-d use dfs branching instead of bfs
-D f edgegen file for initial edge set
-e f initial edge file
-E f full edge file (must contain initial edge set)
-f write optimal tour as edge file (default is tour file)
-F f read extra cuts from file
-g h be a grunt for boss h
-h be a boss for the branching
-i just solve the blossom polytope
-I just solve the subtour polytope
-J # number of tentative branches
-k # number of nodes for random problem
-K h use cut server h
-M f master file
-m use multiple passes of cutting loop
-n s problem location (just a name or host:name, not a file name)
-o f output file name (for optimal tour)
-P f cutpool file
-q do not cut the root lp
-r # use #x# grid for random points, no dups if #<0
-R f restart file
-s # random seed
-S f problem file
-t f tour file (in node node node format)
-u v initial upperbound
-U do not permit branching on subtour inequalities
-v verbose (turn on lots of messages)
-V just run fast cuts
-w just subtours and trivial blossoms
-x delete files on completion (sav pul mas)
-X f write the last root fractional solution to f
-y use simple cutting and branching in DFS
-z # dump the #-lowest reduced cost edges to file xxx.rcn
-N # norm (must specify if dat file is not a TSPLIB file)
0=MAX, 1=L1, 2=L2, 3=3D, 4=USER, 5=ATT, 6=GEO, 7=MATRIX,
8=DSJRAND, 9=CRYSTAL, 10=SPARSE, 11-15=RH-norm 1-5, 16=TOROIDAL
17=GEOM, 18=JOHNSON
This confirmed that Concorde is properly installed and working.
After the installation, I ran the TSP code to check that Concorde is working, using the following code:
tour_test <- solve_TSP(tsp_test, method = "concorde")
It is working fine now. I got the following output
Used control parameters:
clo =
exe = E:\RA\Concorde_Code\/concorde
precision = 6
verbose = TRUE
keep_files = FALSE
/Concorde_Code/concorde -x -o file225841777aa0.sol file225841777aa0.dat
Host: Pasha Current process id: 1193
Using random seed 1586547969
Problem Name: TSP
Generated by write_TSPLIB (R-package TSP)
Problem Type: TSP
Number of Nodes: 6
Explicit Lengths (CC_MATRIXNORM)
Optimal Solution: 60000.00
Total Running Time: 0.01 (seconds)


Parallel download of 7000 files

Could you please advise on an effective method to download a large number of files from EBI: https://github.com/eQTL-Catalogue/eQTL-Catalogue-resources/tree/master/tabix
We can use wget sequentially on each file. I have seen some information about using a Python script: How to parallelize file downloads?
But perhaps there are also complementary ways using a bash script or R?
If you are not requiring R here, then the xargs command-line utility allows parallel execution. (I'm using the linux version in the findutils set of utilities. I believe this is also supported in the version of wget in git-bash. I don't know if the macos binary is installed by default nor if it includes this option, ymmv.)
For proof, I'll create a mywget script that prints the start time (and args) and then passes all arguments to wget.
(mywget)
echo "$(date) :: ${#}"
wget "${#}"
I also have a text file urllist with one URL per line (it's crafted so that I don't have to encode anything or worry about spaces, etc). (Because I'm using a personal remote server to benchmark this, and I don't want the slashdot effect, I'll obfuscate the URLs here ...)
(urllist)
https://somedomain.com/quux0
https://somedomain.com/quux1
https://somedomain.com/quux2
First, no parallelization, simply consecutive (default). (The -a urllist is to read items from the file urllist instead of stdin. The -q is to be quiet, not required but certainly very helpful when doing things in parallel, since the typical verbose option has progress bars that will overlap each other.)
$ time xargs -a urllist ./mywget -q
Tue Feb 1 17:27:01 EST 2022 :: -q https://somedomain.com/quux0
Tue Feb 1 17:27:10 EST 2022 :: -q https://somedomain.com/quux1
Tue Feb 1 17:27:12 EST 2022 :: -q https://somedomain.com/quux2
real 0m13.375s
user 0m0.210s
sys 0m0.958s
Second, adding -P 3 so that I run up to 3 simultaneous processes. The -n1 is required so that each call to ./mywget gets only one URL. You can adjust this if you want a single call to download multiple files consecutively.
$ time xargs -n1 -P3 -a urllist ./mywget -q
Tue Feb 1 17:27:46 EST 2022 :: -q https://somedomain.com/quux0
Tue Feb 1 17:27:46 EST 2022 :: -q https://somedomain.com/quux1
Tue Feb 1 17:27:46 EST 2022 :: -q https://somedomain.com/quux2
real 0m13.088s
user 0m0.272s
sys 0m1.664s
In this case, as BenBolker suggested in a comment, parallel download saved me nothing; it still took 13 seconds. However, you can see that in the first block they started sequentially, with 9 seconds and then 2 seconds between the three downloads. (We can infer that the first file is much larger, taking 9 seconds, and the second file took about 2 seconds.) In the second block, all three started at the same time.
(Side note: this doesn't require a shell script at all; you can use R's system or the processx::run functions to call xargs -n1 -P3 wget -q with a text file of URLs that you create in R. So you can still do this comfortably from the warmth of your R console.)
I had a similar task and my approach was the following:
I used Python, Redis and supervisord.
I pushed all the paths/URLs of the files I needed to a Redis list (I just created a small Python script to read my CSV and push it to a Redis queue/list).
Then I created another Python script to read (pull) one item from the Redis list and download it.
Using supervisord, I just launched 10 parallel Python processes that were pulling data from Redis (file paths) and downloading the files.
It might be too complicated for you, but this solution is very scalable and can use multiple servers, etc.
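For what it's worth, the same queue idea can be sketched in plain bash with redis-cli instead of the Python scripts and supervisord used above (the list name download_queue is illustrative, and a running Redis server is assumed):
# producer: push every URL/path onto a Redis list
while read -r url; do
redis-cli rpush download_queue "$url" > /dev/null
done < files.txt
# worker: pop one URL at a time and download it; launch several copies of this loop in parallel
while url=$(redis-cli lpop download_queue); [ -n "$url" ]; do
wget -q "$url"
done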
Thank you all. I have investigated a few other ways to do it:
#!/bin/bash
############################
while read -r file; do
wget "${file}" &
done < files.txt
###########################
while read -r file; do
wget "${file}" -b
done < files.txt
##########################
cat files.txt | xargs -n 1 -P 10 wget -q
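Note that the first form launches every download at once (with 7000 files that is thousands of simultaneous wget processes) and the script exits before the background jobs finish. A small sketch with a wait added, if you want the script to block until everything is done (the xargs -P form above caps the number of simultaneous downloads instead):
while read -r file; do
wget -q "${file}" &
done < files.txt
wait   # block until all background downloads have finished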

How should I deal with "sph2pipe command not found" error message?

I'm trying to use the sph2pipe tool to convert SPH files into wav or mp3 files. Although I have downloaded and installed the tool from here: https://www.ldc.upenn.edu/language-resources/tools/sphere-conversion-tools
I still don't see any program that I can use.
On Windows 10, after downloading sph2pipe and clicking the .exe file, a window just popped up briefly and never showed up again. And then I can't find any program called sph2pipe on the system, and no command named sph2pipe either.
On Mac, I downloaded the program (from where, I forget), but after clicking the executable file on the Mac, I got a terminal window saying:
Last login: Tue May 8 18:57:21 on ttys001
Pennys-MBP:~ me$ /Users/me/Downloads/SPH/sph2pipe_v2.5/sph2pipe ; exit;
Usage: sph2pipe [-h hdr] [-t|-s b:e] [-c 1|2] [-p|-u|-a] [-f typ] infile [outfile]
default conditions (for 'sph2pipe infile'):
input file contains sphere header
output full duration of input file
output all channels from input file
output same sample coding as input file
output format is WAV on Wintel machines, SPH elsewhere
output is written to stdout
optional controls (items bracketed separately above can be combined):
-h hdr -- treat infile as headerless, get sphere info from file 'hdr'
-t b:e -- output portion between b and e sec (floating point)
-s b:e -- output portion between b and e samples (integer)
-c 1 -- only output first channel
-c 2 -- only output second channel
-p -- force conversion to 16-bit linear pcm
-u -- force conversion to 8-bit ulaw
-a -- force conversion to 8-bit alaw
-f typ -- select alternate output header format 'typ'
five types: sph, raw, au, rif(wav), aif(mac)
logout
Saving session...
...copying shared history...
...saving history...truncating history files...
...completed.
[Process completed]
But still, when I type sph2pipe in my terminal, I get the response:
-bash: sph2pipe: command not found
Can somebody help me? I need to do the conversion very soon.
Thank you!
I figured it out:
sph2pipe.exe file file.wav
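On the Mac side, the "command not found" message just means the directory containing the binary is not on your PATH. A sketch using the download path from the question (the input/output file names are only illustrative); per the usage text above, -f rif selects the RIFF/WAV output header:
# either add the directory to PATH for this session ...
export PATH="$PATH:/Users/me/Downloads/SPH/sph2pipe_v2.5"
sph2pipe -f rif input.sph output.wav
# ... or call the binary by its full path
/Users/me/Downloads/SPH/sph2pipe_v2.5/sph2pipe -f rif input.sph output.wav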

GNUPlot cannot be executed after mpirun command in PBS script

I have a PBS script something like this:
#PBS -N marcell_single_cell
#PBS -l nodes=1:ppn=1
#PBS -l walltime=20000:00:00
#PBS -e stderr.log
#PBS -o stdout.log
# Specify the shell type
#PBS -S /bin/bash
# Specify the queue type
#PBS -q dque
#uncomment this if you want to debug the process
#set -vx
cd $PBS_O_WORKDIR
ulimit -s unlimited
NPROCS=`wc -l < $PBS_NODEFILE`
#export PATH=$PBS_O_PATH
echo This job has allocated $NPROCS nodes
echo Cleaning old files...
rm -rf *.png *.plt *.log
echo Cleaning success
/opt/Lib/openmpi-2.1.3/bin/mpirun -np $NPROCS /scratch4/marcell/CellMLSimulator/bin/CellMLSimulator -ionmodel grandi2010 -solverType CVode -irepeat 4 -dt 0.01
gnuplot -p plotting.gnu
I got an error something like this, shown in the PBS error log:
/var/spool/torque/mom_priv/jobs/6265.node01.SC: line 28: gnuplot: command not found
I've already made sure that the path to gnuplot has been added to the PATH environment variable.
However, the strange part is that if I swap the order of the commands, gnuplot first and then mpirun, there isn't any error. I suspect that some commands after mpirun need some special configuration, but I don't know how to do that.
I already tried following this solution, but to no avail:
sleep command not found in torque pbs but works in shell
EDITED:
It seems that both before and after mpirun I still get the error, and this is the which result:
which: no gnuplot in (/opt/intel/composer_xe_2011_sp1.9.293/bin/intel64:/opt/intel/composer_xe_2011_sp1.9.293/bin/intel64:/opt/pgi/linux86-64/9.0-4/bin:/opt/openmpi/bin:/usr/kerberos/bin:/prog/tools/grace/grace/bin:/home/prog/ansys_inc/v121/fluent/bin:/bin:/usr/bin:/opt/intel/composer_xe_2011_sp1.9.293/mpirt/bin/intel64:/opt/intel/composer_xe_2011_sp1.9.293/mpirt/bin/intel64:/scratch7/feber/jdk1.8.0_101:/scratch7/feber/code/apache-maven/bin:/usr/local/bin:/scratch7/cml/bin)
It's strange, since when I try to find gnuplot, there is one in /usr/local/bin:
ls -l /usr/local/bin/gnuplot
-rwxr-xr-x 1 root root 3262113 Sep 18 2017 /usr/local/bin/gnuplot
Moreover, if I run those commands without PBS, they execute as I expected:
/scratch4/marcell/CellMLSimulator/bin/CellMLSimulator -ionmodel grandi2010 -solverType CVode -irepeat 4 -dt 0.01
gnuplot -p plotting.gnu
It's very likely that your system has different "login/head nodes" and "compute nodes". This is a commonly used practice in many supercomputing clusters. While you build and launch your application from the head node, it gets executed on one or more compute nodes.
The compute nodes can have different hardware and software compared to the head nodes. In your case, gnuplot is installed only on the head node, as you can see from the different outputs of which gnuplot. To solve this, you have three approaches:
Request the system administrators to install gnuplot on the compute nodes.
Build and install your own version of gnuplot in a filesystem accessible from the compute nodes (see the sketch after this list). It could be your home directory or somewhere else depending on your cluster. In general, the filesystem where your application lives will be available; in your case, anywhere under /scratch4/marcell/ would probably work.
Run gnuplot on the head node after the MPI jobs finish as a post-processing step. PBS/Torque does not provide a direct way to do this. You'll need to write a separate bash (not PBS) script to do this.
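For example, a rough sketch of the second option, assuming you build a private copy of gnuplot into a prefix under /scratch4/marcell (the prefix and paths here are illustrative):
# build and install gnuplot somewhere the compute nodes can read
./configure --prefix=/scratch4/marcell/local
make && make install
# then, in the PBS script, either put that prefix on PATH ...
export PATH=/scratch4/marcell/local/bin:$PATH
gnuplot -p plotting.gnu
# ... or call the binary by its absolute path
/scratch4/marcell/local/bin/gnuplot -p plotting.gnu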

Multithreaded program only runs on a single processor after compiling, how do I troubleshoot?

I am trying to run a compiled program that is supposed to run on multiple processors. But with the same data, sometimes this program runs in parallel and sometimes it doesn't (with the identical PBS script file!). I suspect that something is wrong with some of the compute nodes that won't let it run in parallel (I don't get to choose the compute node I want). How can I troubleshoot whether this is a bug in the program or a problem with the compute node?
As per the sysadmin's advice, I am using ulimit -s 100000, but this doesn't change anything. Also, this program is not an MPI program (it runs only on a single node, with multiple processors).
The code that I run is as follows:
quorum_error_correct_reads -q 68 \
--contaminant=/data004/software/GIF/packages/masurca/2.3.0rc1/bin/../share/adapter.jf \
-m 1 -s 1 -g 1 -a 3 --thread=32 -w 10 -e 3 \
quorum_mer_db.jf aa.renamed.fastq ab.renamed.fastq ac.renamed.fastq ad.renamed.fastq ae.renamed.fastq af.renamed.fastq ag.renamed.fastq \
--no-discard -o pe.cor --verbose
Thanks for any advice you can offer. I will greatly appreciate your help!
PS: I don't have sudo access.
EDIT: I know it is supposed to be using multiple processors because, when I SSH into the node and do top -c, I can see the above command sometimes running at around 3200% CPU the whole time and sometimes at only 100% CPU the whole time. This is the only step involved and there are no other sub-processes within this program. Also, I am using an HPC system, where I submit the job to a compute node, each with 32 processors and 512 GB RAM.

strange behavior of fc -l command

I have two unix machines, both running AIX 5.3
My $HOME is mounted on machine1.
Using NFS, logging in to machine2 goes to the same $HOME.
I log in to machine2 first, then machine1.
Both using telnet.
The 2 sessions will share the same .sh_history file.
I found that the fc -l behavior is very strange.
In machine2, I issue the commands in telnet:
fc -l
ksh fc -l
Both give the same output.
In machine1,
fc -l
ksh fc -l
give DIFFERENT results
The result for ksh fc -l
is the same as /usr/bin/fc -l
Also, when I run a script like this:
#!/usr/bin/ksh
fc -l
The result is same as /usr/bin/fc -l
Could anyone tell me what happened?
Alvin SIU
Ah, wisdom of the ancients... (Since this post is over a year old.)
Anyway, I just encountered this problem in Solaris 10. Issue seems to be this: When you define a function in /etc/profile, or in any file called by /etc/profile, your HISTFILE variable gets ignored by the Korn shell, and the shell instead uses ".sh_history" when accessing its history. Not sure why this is.
The result is that you see other root shells' commands. You can test it with:
lsof -p $$
or
cat /proc/$$/fd/63
It's possible that the login shell is not ksh or that $HISTFILE is being reset. One thing you can do is echo $HISTFILE in the various situations and see if it's different. Another thing to check is to see what shell you're in using ps.
Bash (default $HOME/.bash_history), for example, will have a different $HISTFILE than ksh (default $HOME/.sh_history).
Another possible reason for the difference is that the builtin fc may be able to see in-memory history that hasn't been written to disk yet (which the external /usr/bin/fc wouldn't be able to see). If this is true, it may be version dependent. Bash, for example, doesn't write history to the file until the shell exits. Ksh (at least the version I'm using) writes it immediately.
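For example, a quick way to run those checks in each session (machine1 and machine2) and compare:
echo $HISTFILE    # the history file this shell is actually using
echo $0           # which shell the session was started with
ps -p $$          # confirms the shell process behind the current session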
