Spacy.io Entity Linker "not enough values to unpack (expected 2, got 0)" - python-3.6

I have been trying to use Spacy.io's Wikipedia Entity Linker posted here.
When running "wikidata_train_entity_linker.py" I got the following error at the 3rd epoch.
I need help understanding why I am getting the error below. I googled and the only mention of a similar problem did not include a solution.
2020-09-03 17:54:31,725 - INFO - entity_linker_evaluation - Counts: {'EVENT': 2409, 'GPE': 16137, 'NORP': 2601, 'ORG': 12739, 'PERSON': 23443}
2020-09-03 17:54:31,725 - INFO - entity_linker_evaluation - Random: F-score = 0.331 | Recall = 0.199 | Precision = 0.983 | F-score by label = {'EVENT': 0.9166104742638795, 'GPE': 0.5135877024430415, 'NORP': 0.2743334404111789, 'ORG': 0.2596817157297999, 'PERSON': 0.11490371085112372}
2020-09-03 17:54:31,725 - INFO - entity_linker_evaluation - Prior: F-score = 0.331 | Recall = 0.199 | Precision = 0.983 | F-score by label = {'EVENT': 0.9166104742638795, 'GPE': 0.5135877024430415, 'NORP': 0.2743334404111789, 'ORG': 0.2596817157297999, 'PERSON': 0.11490371085112372}
2020-09-03 17:54:31,725 - INFO - entity_linker_evaluation - Oracle: F-score = 0.332 | Recall = 0.199 | Precision = 1.0 | F-score by label = {'EVENT': 0.91681654676259, 'GPE': 0.5161379310344828, 'NORP': 0.2820343461030383, 'ORG': 0.2596994535519126, 'PERSON': 0.11490833065294308}
Traceback (most recent call last):
File "wikidata_train_entity_linker.py", line 226, in <module>
plac.call(main)
File "/Users/eliranboraks/opt/anaconda3/envs/spacy/lib/python3.6/site-packages/plac_core.py", line 328, in call
cmd, result = parser.consume(arglist)
File "/Users/eliranboraks/opt/anaconda3/envs/spacy/lib/python3.6/site-packages/plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "wikidata_train_entity_linker.py", line 172, in main
docs, golds = zip(*train_batch)
ValueError: not enough values to unpack (expected 2, got 0)
The command I used is python3 wikidata_train_entity_linker.py -o output_lt_2m_model -l "FAC,LOC,PRODUCT,WORK_OF_ART,LAW,LANGUAGE,DATE,TIME,PERCENT,MONEY,QUANTITY,ORDINAL,CARDINAL" -t 500000 -d 50000 output_lt_2m
The knowledge base directory was created successfully.
2020-09-03 12:13:02,283 - INFO - train_descriptions - Trained entity descriptions on 2155 (non-unique) descriptions across 5 epochs
2020-09-03 12:13:02,283 - INFO - train_descriptions - Final loss: 0.8585907478066995
2020-09-03 12:13:02,283 - INFO - kb_creator - Getting entity embeddings
2020-09-03 12:13:02,535 - INFO - train_descriptions - Encoded: 431 entities
2020-09-03 12:13:02,535 - INFO - kb_creator - Adding 431 entities
2020-09-03 12:13:02,544 - INFO - kb_creator - Adding aliases from Wikipedia and Wikidata
2020-09-03 12:13:02,544 - INFO - kb_creator - Adding WP aliases
2020-09-03 12:13:02,651 - INFO - __main__ - kb entities: 431
2020-09-03 12:13:02,651 - INFO - __main__ - kb aliases: 326
2020-09-03 12:13:05,640 - INFO - __main__ - Done!
Environment:
MacOS Catalina
Python 3.6
Spacy.io 2.3.2
Platform: Darwin-19.5.0-x86_64-i386-64bit

This error indicates that for a certain training batch, the algorithm couldn't find appropriate gold links to train on. I'm afraid you'll have to dig a bit into the code and your data to see what is going on. It looks like you have a relatively small KB. You can only have gold links if you have an NER in the pipeline that hits entries from that KB. If that doesn't happen, the EL algorithm doesn't have any data to work with, and throws this (unfortunately quite ugly) error.
You could try moving this line
docs, golds = zip(*train_batch)
to directly below it, within the try block. Then the error should be logged, but hopefully the training will continue. That will show you whether the problem is just within that one training batch, or more general.
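As a rough illustration of why the unpacking fails and how an empty batch could be skipped instead: `zip(*[])` yields nothing, so `docs, golds = ...` has zero values to unpack. The sketch below assumes each batch is a plain list of (doc, gold) pairs, as the traceback suggests. `split_batch` and `iterate_batches` are hypothetical helpers and are not part of the spaCy script.

```python
import logging

logger = logging.getLogger(__name__)


def split_batch(train_batch):
    """Return (docs, golds) for a batch, or None when the batch is empty."""
    if not train_batch:
        # zip(*[]) would yield nothing and the unpacking below would raise
        # "not enough values to unpack (expected 2, got 0)".
        return None
    docs, golds = zip(*train_batch)
    return docs, golds


def iterate_batches(batches):
    """Yield (docs, golds) pairs, logging and skipping empty batches."""
    for index, train_batch in enumerate(batches):
        pair = split_batch(train_batch)
        if pair is None:
            logger.warning("Batch %d has no gold links; skipping it", index)
            continue
        yield pair
```

If every batch ends up being skipped, the problem is not a single unlucky batch but the mismatch between the NER output and the KB described above.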

Related

How to resolve error " Hunk #2 FAILED at 456. 1 out of 2 hunks FAILED"

I am trying to run the following command in an Ubuntu terminal:
patch -p0 -i adjustmentFile.patch
It gives the following error:
patching file ./src/helpStructures/CastaliaModule.cc
patching file ./src/node/communication/mac/tunableMac/TunableMAC.cc
Hunk #2 FAILED at 456.
1 out of 2 hunks FAILED -- saving rejects to file ./src/node/communication/mac/tunableMac/TunableMAC.cc.rej
I tried almost all the suggestions in the linked question "Hunk #1 FAILED at 1. What's that mean?", but nothing worked.
Here are my version details:
VIM - Vi IMproved 8.0 (2016 Sep 12, compiled Jun 06 2019 17:31:41)
Included patches: 1-1453
The patch file:
diff -r -u ./src/helpStructures/CastaliaModule.cc ./src/helpStructures/CastaliaModule.cc
--- ./src/helpStructures/CastaliaModule.cc 2010-12-09 09:56:47.000000000 -0300
+++ ./src/helpStructures/CastaliaModule.cc 2011-12-20 00:16:39.944320051 -0300
@@ -180,6 +180,8 @@
classPointers.resourceManager = getParentModule()->getParentModule()->getSubmodule("ResourceManager");
else if (name.compare("SensorManager") == 0)
classPointers.resourceManager = getParentModule()->getSubmodule("ResourceManager");
+ else if (name.compare("Routing") == 0)
+ classPointers.resourceManager = getParentModule()->getParentModule()->getSubmodule("ResourceManager");
else
opp_error("%s module has no rights to call drawPower() function", getFullPath().c_str());
if (!classPointers.resourceManager)
Only in ./src/helpStructures: CastaliaModule.cc~
diff -r -u ./src/node/communication/mac/tunableMac/TunableMAC.cc ./src/node/communication/mac/tunableMac/TunableMAC.cc
--- ./src/node/communication/mac/tunableMac/TunableMAC.cc 2011-03-30 02:14:34.000000000 -0300
+++ ./src/node/communication/mac/tunableMac/TunableMAC.cc 2011-12-19 23:57:43.894686687 -0300
@@ -405,6 +405,8 @@
void TunableMAC::fromRadioLayer(cPacket * pkt, double rssi, double lqi)
{
TunableMacPacket *macFrame = dynamic_cast <TunableMacPacket*>(pkt);
+ macFrame->getMacRadioInfoExchange().RSSI = rssi;
+ macFrame->getMacRadioInfoExchange().LQI = lqi;
if (macFrame == NULL){
collectOutput("TunableMAC packet breakdown", "filtered, other MAC");
return;
@@ -454,7 +456,8 @@
}
case DATA_FRAME:{
- toNetworkLayer(macFrame->decapsulate());
+ cPacket *netPkt = decapsulatePacket(macFrame);
+ toNetworkLayer(netPkt);
collectOutput("TunableMAC packet breakdown", "received data pkts");
if (macState == MAC_STATE_RX) {
cancelTimer(ATTEMPT_TX);
Only in ./src/node/communication/mac/tunableMac: TunableMAC.cc~
Patching takes some changes made to a file X, and applies them to a different instance of file X. That is, suppose you start with generation 1 of file X; you make changes to get generation 2-a, and someone else starts with generation 1 to make generation 2-b. Now you want to take his edits that created his generation 2-b, and apply them to your generation 2-a.
If 'his' changes clash with 'your' changes, they cannot be automatically patched.
You'll need to look at the changes being made in hunk 2.
- toNetworkLayer(macFrame->decapsulate());
+ cPacket *netPkt = decapsulatePacket(macFrame);
+ toNetworkLayer(netPkt);
and figure out what you want the result to look like. Someone needs to know what the result is supposed to be. You can't resolve conflicts without knowledge of intent.
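To see whether a hunk's expected "before" text is still present in your copy of the file, something like the following sketch can help. It is only an exact-match check under simplifying assumptions: the real patch tool also uses line offsets and fuzz, and `old_side` and `find_hunk` are hypothetical helper names.

```python
def old_side(hunk_lines):
    """Return the lines a hunk expects to find in the target file."""
    expected = []
    for line in hunk_lines:
        if line.startswith("+"):
            continue  # added lines are not in the target file yet
        expected.append(line[1:])  # strip the leading ' ' or '-' marker
    return expected


def find_hunk(file_lines, hunk_lines):
    """Return the 0-based index where the hunk's old side matches, or -1."""
    expected = old_side(hunk_lines)
    last_start = len(file_lines) - len(expected)
    for start in range(last_start + 1):
        if file_lines[start:start + len(expected)] == expected:
            return start
    return -1
```

A result of -1 for hunk 2 would mean that the context or the removed line no longer appears verbatim in TunableMAC.cc, which is the situation where the change has to be merged by hand.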

How can I make a Python file run another one that is not in the same directory?

This is my code. I have tried many solutions given on Stack Overflow, but I am not able to run another Python file that is not in the same directory.
f.py
import os
print "Start..."
file='"C:\Users\Mohit\Desktop\ML PROJECT\Practical Session on R\Practical
Session on R\Session II - Regression\run.py"'
os.system('python file ')
print "Done"
Output:
Fatal error: cannot open file 'file': No such file or directory
run.py
import os
print "Start..."
files=r'"C:\Users\Mohit\Desktop\ML PROJECT\Practical Session on R\Practical
Session on R\Session II - Regression\decisionTree.R"'
os.system('Rscript '+files)
print "Done"
I get the desired result when I run run.py.
Output:
C:\Users\Mohit\Desktop\ML PROJECT\Practical Session on R\Practical Session
on R\
Session II - Regression>python r.py
Start...
START
elapsed
0.17
Step 1: Library Inclusion
Step 2: Variable Declaration[1]
[1] "regressionDataSet.csv"
Step 3: Data Loading RMSD
irNumber
14395 1.167 5364.27 4616.64
15435 2.500 25468.10 475274.00 -
8739 4.039 5921.03 7071.89
11586 0.000 14192.70 75413.00
13765 0.000 17432.40 245147.00 -
10087 3.814 7333.51 26580.70
[1] 16382
[1] "RMSD" "Area"
[5] "SS" "ResidueLeng
Step 4: Counting dataset[1] 1638
Step 5: Choose Target Variable[1
Step 6: Choose Inputs Variable[1
"SS"
[5] "ResidueLength" "PairNumber"
[1] 6
Step 7: Select training dataset
th PairNumber RMSD
14395 5364.27 4616.64 -696.
15435 25468.10 475274.00 -10600.
8739 5921.03 7071.89 -1658.
11586 14192.70 75413.00 -7939.
13765 17432.40 245147.00 -10885.
10087 7333.51 26580.70 -2746.
[1] 8191
Step 8: Select testing dataset
PairNumber RMSD
10216 20510.40 254178.0 44500.0
2385 9981.42 28499.0 -4789.0
1886 21107.10 192443.0 -4860.0
13684 17765.00 76543.0 -8164.0
10319 6308.91 10287.3 -3370.0
13088 5844.34 11139.7 -1879.7
[1] 8192
Step 9: Model Building -> decis
ength + PairNumber
n= 8191
node), split, n, deviance, yval
* denotes terminal node
1) root 8191 23819.88000 2.364
2) Energy< -6161.885 1592 4
4) ResidueLength< 387.5 74
8) Energy< -7785.3 262
9) Energy>=-7785.3 481
5) ResidueLength>=387.5 84
10) Energy< -12910.5 173
11) Energy>=-12910.5 676
22) ResidueLength< 467.
44) ResidueLength>=39
45) ResidueLength< 39
23) ResidueLength>=467.
3) Energy>=-6161.885 6599 17
6) SS< 21.5 1584 3795.813
7) SS>=21.5 5015 13126.850
14) Energy< -3971.125 163
28) ResidueLength< 192.
29) ResidueLength>=192.
58) ResidueLength< 26
116) Energy< -4680.5
117) Energy>=-4680.5
59) ResidueLength>=26
15) Energy>=-3971.125 337
30) Area< 8601.98 1655
60) Energy< -2933.36
61) Energy>=-2933.36
31) Area>=8601.98 1721
Step 10: Prediction using -> de
13684 10319 13088
3.24618361 0.51722167 3.10727318
Step 11: Extracting Actual[1] 3.
Step 12: Model Evaluation[1] 0.5
[1] 0.25
[1] 1.19
[1] 48.19
elapsed
0.53
null device
1
modelName r R rm
elapsed decisionTree 0.5 0.25 1.
Step 13: Writing to file
Step 14: Saving the Model -> dec
Done
Total Time Taken: 0.53 secDone
But I get the following error when I run it through f.py
Error:
Start...
Start...
START
elapsed
0.18
Step 1: Library Inclusion
Step 2: Variable Declaration[1] "decisionTree"
[1] "regressionDataSet.csv"
Step 3: Data LoadingError in file(file, "rt") : cannot open the connec
Calls: read.csv -> read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file 'regressionDataSet.csv': No such file or directory
Put an r in front of your file path to make it a raw string, so that the backslashes \ are not treated as escape characters where you don't want them to be (see also this question).
Secondly, os.system('python file') will literally execute "python file"; your variable isn't being used.
Also make sure there is no line break in your file path (it must be one long line).
import os
print "Start..."
folder = r'C:\Users\Mohit\Desktop\ML PROJECT\Practical Session on R\Practical Session on R\Session II - Regression'
file= r'"{}\run.py"'.format(folder)
os.chdir(folder)
os.system('python '+file)
print "Done"
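For readers on Python 3, the same idea (run the other script with its own folder as the working directory) might be sketched with subprocess as below. This is an illustrative variant, not the code from the answer above, and `run_script` is a hypothetical helper name.

```python
import subprocess
import sys
from pathlib import Path


def run_script(script_path):
    """Run a Python script with its own folder as the working directory."""
    script = Path(script_path).resolve()
    completed = subprocess.run(
        [sys.executable, str(script)],  # list form: no manual quoting of spaces
        cwd=str(script.parent),         # relative paths in the script resolve here
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,                     # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout
```

Setting cwd plays the role of os.chdir(folder) in the answer: a relative name such as 'regressionDataSet.csv' is then looked up next to the script rather than next to the caller.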

Restore vgg16 network in tensorflow

This one has been giving me a headache for quite some time now, even though it seems to be very basic.
I have the vgg16 network downloaded as a .ckpt file
(from https://github.com/tensorflow/models/blob/master/slim/README.md#Pretrained)
Now what I want to do is load, for example, the tensor of the first convolution layer of this network as an array in R.
I tried
restorer = tf$train$Saver()
sess = tf$Session()
restorer$restore(sess, "/home/beheerder/R/vgg_16.ckpt")
But then I do not see any variables appearing in my environment.
I'm working in R, but an answer in Python is OK as well, as I can probably translate it to R.
Saver takes the variables to restore in its constructor. In other words, you have to create the variables before you can restore them. Here is the example from Saver's doc:
v1 = tf.Variable(..., name='v1')
v2 = tf.Variable(..., name='v2')
# Pass the variables as a dict:
saver = tf.train.Saver({'v1': v1, 'v2': v2})
# Or pass them as a list.
saver = tf.train.Saver([v1, v2])
If you were to run the first line of your code in python you would get:
In [1]: import tensorflow as tf
In [2]: saver = tf.train.Saver()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-18da33d742f9> in <module>()
----> 1 saver = tf.train.Saver()
/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.pyc in __init__(self, var_list, reshape, sharded, max_to_keep, keep_checkpoint_every_n_hours, name, restore_sequentially, saver_def, builder, defer_build, allow_empty, write_version, pad_step_number)
1054 self._pad_step_number = pad_step_number
1055 if not defer_build:
-> 1056 self.build()
1057 if self.saver_def:
1058 self._check_saver_def()
/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.pyc in build(self)
1075 return
1076 else:
-> 1077 raise ValueError("No variables to save")
1078 self._is_empty = False
1079 self.saver_def = self._builder.build(
ValueError: No variables to save
You can see how model variables are created before being restored in the 20 lines starting from https://github.com/tensorflow/models/blob/master/slim/train_image_classifier.py#L338
This code gets executed if you make a call to train_image_classifier.py similar to the flower example in https://github.com/tensorflow/models/blob/master/slim/README.md#fine-tuning-a-model-from-an-existing-checkpoint

Snakemake: MissingInputException in snakemake pipeline

I'm trying out a SnakeMake pipeline and I'm stuck on an error I really don't understand.
I've got a directory (raw_data) in which I have the input files :
ll /home/nico/labo/etudes/Optimal/data/raw_data
total 41M
drwxrwxr-x 2 nico nico 4,0K mars 6 16:09 ./
drwxrwxr-x 11 nico nico 4,0K mars 6 16:14 ../
-rw-rw-r-- 1 nico nico 15M févr. 27 12:21 sampleA_R1.fastq.gz
-rw-rw-r-- 1 nico nico 19M févr. 27 12:22 sampleA_R2.fastq.gz
-rw-rw-r-- 1 nico nico 3,4M févr. 27 12:21 sampleB_R1.fastq.gz
-rw-rw-r-- 1 nico nico 4,3M févr. 27 12:22 sampleB_R2.fastq.gz
This directory contains 4 files for 2 samples.
I created a config json file for the SnakeMake pipeline named config_snakemake_Optimal_mapping_BaL.json:
{
"fastqExtension": "fastq.gz",
"fastqDir": "/home/nico/labo/etudes/Optimal/data/raw_data",
"outputDir": "/home/nico/labo/etudes/Optimal/data/mapping_BaL",
"logDir": "logs",
"reference": {
"fasta": "/home/nico/labo/references/genomes/HIV1/BaL_AY713409/BaL_AY713409.fasta",
"index": "/home/nico/labo/references/genomes/HIV1/BaL_AY713409/BaL_AY713409.fasta.bwt"
}
}
And finally the SnakeMake file snakefile_bwa_samtools.py:
import subprocess
from os.path import join
### Globals ---------------------------------------------------------------------
# A Snakemake regular expression matching fastq files.
SAMPLES, = glob_wildcards(join(config["fastqDir"], "{sample}_R1."+config["fastqExtension"]))
print(SAMPLES)
### Rules -----------------------------------------------------------------------
# Pipeline output files
rule all:
input: expand(join(config["outputDir"], "{sample}.bam.bai"), sample=SAMPLES)
# Reads alignment on reference genome and BAM file creation
rule bwa_mem_to_bam:
input:
index = config["reference"]["index"],
fasta = config["reference"]["fasta"],
fq1_ID = "{sample}_R1."+config["fastqExtension"],
fq2_ID = "{sample}_R2."+config["fastqExtension"],
fq1 = join(config["fastqDir"], "{sample}_R1."+config["fastqExtension"]),
fq2 = join(config["fastqDir"], "{sample}_R2."+config["fastqExtension"])
output:
temp(join(config["outputDir"], "{sample}.bamUnsorted"))
version:
subprocess.getoutput(
"man bwa | tail -n 1 | cut -d ' ' -f 1 | cut -d '-' -f 2"
)
log:
join(config["outputDir"], config["logDir"], "{sample}.bwa_mem.log")
message:
"Alignment of {input.fq1_ID} and {input.fq2_ID} on {input.fasta} with BWA version {version}."
shell:
"bwa mem {input.fasta} {input.fq1} {input.fq2} 2> {log} | samtools view -Sbh - > {output}"
# Sorting the BAM files on genomic positions
rule bam_sort:
input:
join(config["outputDir"], "{sample}.bamUnsorted")
output:
join(config["outputDir"], "{sample}.bam")
log:
join(config["outputDir"], config["logDir"], "{sample}.samtools_sort.log")
version:
subprocess.getoutput(
"samtools --version | "
"head -1 | "
"cut -d' ' -f2"
)
message:
"Genomic sorting of {input} with samtools version {version}."
shell:
"samtools sort -f {input} {output} 2> {log}"
# Indexing the BAM files
rule bam_index:
input:
join(config["outputDir"], "{sample}.bam")
output:
join(config["outputDir"], "{sample}.bam.bai")
message:
"Indexing {input}."
shell:
"samtools index {input}"
I run this pipeline:
snakemake --cores 3 --snakefile /home/nico/labo/scripts/pipeline_illumina/snakefile_bwa_samtools.py --configfile /home/nico/labo/etudes/Optimal/data/snakemake_config_files/config_snakemake_Optimal_mapping_BaL.json
and I've got the following error outputs:
['sampleB', 'sampleA']
MissingInputException in line 18 of /home/nico/labo/scripts/pipeline_illumina/snakefile_bwa_samtools.py:
Missing input files for rule bwa_mem_to_bam:
sampleB_R1.fastq.gz
sampleB_R2.fastq.gz
or, depending on the run:
['sampleB', 'sampleA']
PeriodicWildcardError in line 40 of /home/nico/labo/scripts/pipeline_illumina/snakefile_bwa_samtools.py:
The value _unsorted in wildcard sample is periodically repeated (sampleB_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted). This would lead to an infinite recursion. To avoid this, e.g. restrict the wildcards in this rule to certain values.
The samples are correctly detected, as they appear in the list (the first line of both outputs), and I'm surely messing something up with the wildcards in the rule bwa_mem_to_bam, but I really don't get why.
Any clue?
I had a quick look at your code.
Why didn't the first one work?
Look at how you declare fq1_ID and fq1 (and likewise fq2_ID and fq2): you did not assign them the same string. For fq1 you prepend the directory of the file, which is not present for fq1_ID, so Snakemake searches the workdir (the current directory if the -d option is not set) for a file with that bare name, because these variables are in the input section.
So removing fq1_ID and fq2_ID gets rid of the missing-file problem.
Hugo
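The lookup described above can be mimicked in plain Python. The sketch below is not Snakemake's implementation; it only assumes that a relative input path is resolved against the working directory while an absolute one is used as is. `missing_inputs` is a hypothetical helper name.

```python
import os


def missing_inputs(patterns, sample, workdir):
    """Return the input paths that do not exist once {sample} is filled in."""
    missing = []
    for pattern in patterns:
        path = pattern.format(sample=sample)
        # A relative path is looked up under workdir; os.path.join leaves
        # an absolute path unchanged.
        resolved = os.path.join(workdir, path)
        if not os.path.exists(resolved):
            missing.append(path)
    return missing
```

With the fastq files stored only in fastqDir, the bare pattern "{sample}_R1.fastq.gz" would be reported as missing, while the pattern joined with fastqDir would be found, matching the file names listed in the MissingInputException.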
Finally, I got the pipeline working by removing the fq1_ID and fq2_ID variables from the rule bwa_mem_to_bam and replacing input.fq1_ID and input.fq2_ID in the rule's message with input.fq1 and input.fq2.
The message is less elegant, but the pipeline runs correctly. I still don't understand exactly where the mistake was; if someone can explain, I'm still listening!
The correct code for rule bwa_mem_to_bam:
rule bwa_mem_to_bam:
input:
index = config["reference"]["index"],
fasta = config["reference"]["fasta"],
fq1 = join(config["fastqDir"], "{sample}_R1."+config["fastqExtension"]),
fq2 = join(config["fastqDir"], "{sample}_R2."+config["fastqExtension"])
output:
temp(join(config["outputDir"], "{sample}.bamUnsorted"))
version:
subprocess.getoutput(
"man bwa | tail -n 1 | cut -d ' ' -f 1 | cut -d '-' -f 2"
)
log:
join(config["outputDir"], config["logDir"], "{sample}.bwa_mem.log")
message:
"Alignment of {input.fq1} and {input.fq2} on {input.fasta} with BWA version {version}."
shell:
"bwa mem {input.fasta} {input.fq1} {input.fq2} 2> {log} | samtools view -Sbh - > {output}"
Thanks, Hugo, for checking my code and for your explanation; it makes sense!
I had a flash of insight on waking up this morning (the best kind) and realized that I had neglected the params part of the rule: fq1_ID and fq2_ID are not inputs but params.
I changed the code to this:
rule bwa_mem_to_bam:
input:
index = config["reference"]["index"],
fasta = config["reference"]["fasta"],
fq1 = join(config["fastqDir"], "{sample}_R1.fastq.gz"),
fq2 = join(config["fastqDir"], "{sample}_R2.fastq.gz")
output:
temp(join(config["outputDir"],"{sample}_unsorted.bam"))
params:
fq1_ID = "{sample}_R1.fastq.gz",
fq2_ID = "{sample}_R2.fastq.gz",
ref_ID = os.path.basename(config["reference"]["fasta"])
version:
subprocess.getoutput(
"man bwa | tail -n 1 | cut -d ' ' -f 1 | cut -d '-' -f 2"
)
log:
join(config["outputDir"], config["logDir"], "{sample}.bwa_mem.log")
message:
"Alignment of {params.fq1_ID} and {params.fq2_ID} on {params.ref_ID} with BWA version {version}."
shell:
"bwa mem {input.fasta} {input.fq1} {input.fq2} 2> {log} | samtools view -Sbh - > {output}"
And it works just fine!
snakemake --cores 3 --snakefile /home/nico/labo/scripts/pipeline_illumina/snakefile_bwa_samtools.py --configfile /home/nico/labo/etudes/Optimal/data/snakemake_config_files/config_snakemake_Optimal_mapping_BaL.json
Provided cores: 3
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
2 bam_index
2 bam_sort
2 bwa_mem_to_bam
7
Alignment of sampleB_R1.fastq.gz and sampleB_R2.fastq.gz on BaL_AY713409.fasta with BWA version 0.7.12.
Alignment of sampleA_R1.fastq.gz and sampleA_R2.fastq.gz on BaL_AY713409.fasta with BWA version 0.7.12.
1 of 7 steps (14%) done
Genomic sorting of sampleB_unsorted.bam with samtools version 1.2.
Removing temporary output file /home/nico/labo/etudes/Optimal/data/mapping_BaL/sampleB_unsorted.bam.
2 of 7 steps (29%) done
Indexing sampleB.bam.
3 of 7 steps (43%) done
4 of 7 steps (57%) done
Genomic sorting of sampleA_unsorted.bam with samtools version 1.2.
Removing temporary output file /home/nico/labo/etudes/Optimal/data/mapping_BaL/sampleA_unsorted.bam.
5 of 7 steps (71%) done
Indexing sampleA.bam.
6 of 7 steps (86%) done
localrule all:
input: /home/nico/labo/etudes/Optimal/data/mapping_BaL/sampleB.bam.bai, /home/nico/labo/etudes/Optimal/data/mapping_BaL/sampleA.bam.bai
7 of 7 steps (100%) done
And I finally get my correct messages:
Alignment of sampleB_R1.fastq.gz and sampleB_R2.fastq.gz on
BaL_AY713409.fasta with BWA version 0.7.12.
Alignment of sampleA_R1.fastq.gz and sampleA_R2.fastq.gz on BaL_AY713409.fasta
with BWA version 0.7.12.

Is there an 11 digits limit for time series numbers in x12 for R?

I am trying to use the x12 function in the x12 package for R.
My problem is that when I use a time series object (tso) with monthly data in which each observation is a large number (11 or more digits), the function writes a spec file that x12a.exe (the binaries) cannot read.
The x12 binaries do not allow the spec file to be wider than 132 columns.
In my example the spec file has 144 columns, which I believe causes this error message in R: "ERROR: Input record longer than limit : 133".
When I use smaller numbers (fewer columns) in the spec file, there has been no problem so far. When creating the spec file on my own with X-12-ARIMA for Windows, I have never seen the problem, because I always use the "free" format (one observation per line) for the series.
My question is: how do I set the format for the time series object to "free", or somehow get just one observation per line, in the "Rout.spc" file while using the x12 function in the x12 package for R?
I am using R version 2.15.2 and R-studio version 0.97.318
Attached is my example code in R-studio, output in R-console, and the spec file
"Rstudio"
library(x12)
alt <- read.csv2("alt.csv",header=T)
tal <- ts(data=alt,start=c(1995,4),freq=12)
x12path <- shortPathName("C:\\Dokumenter\\X_12_Arima_Program\\x12a\\x12a.exe")
x12tal <- x12(tso=tal,automdl=T,x12path=x12path,period=12,trendma=23)
"Console"
C:\Dokumenter\Eksperimentering\x12>md gra
C:\Dokumenter\Eksperimentering\x12>C:\DOKUME~1\X_12_A~2\x12a\x12a.exe Rout -g gra
X-12-ARIMA Seasonal Adjustment Program
Version Number 0.3 Build 192
Execution began Mar 12, 2013 23.46.25
Reading input spec file from Rout.spc
Storing any program output into Rout.out
Storing any program error messages into Rout.err
ERROR: Input record longer than limit : 133
Line 6: start=1995.4
^
ERROR: Expected an real number not "111"
Program error(s) halt execution for Rout.spc
Check error file Rout.err
Error messages generated from processing the X-12-ARIMA spec file
Rout.spc:
Error in readx12Out(file, freq_series = frequency(tso), start_series = start(tso), :
Error! No proper run of x12! Check your parameter settings.
"The spec file: Rout.spc"
series{
title="R Output for X12a"
decimals=2
start=1995.4
period=12
data=(
14056669449 12785389868 12772341230 12342935128 12081332395 12110109950 12367542268 12911930417 12836340370 12214486074 12057940408 11555540809
10002847699 9199284760 8704422249 8492914782 8507816348 8470254675 8665139772 8653204621 9177471163 9676069791 9483990311 9825510541
7613345714 7168896536 7527318694 7721174940 7584049271 7586159794 7411383039 7565724342 7555103032 7148551906 7792379395 7493885451
6636374143 6390731897 6160711917 6003196233 5955867663 5868369296 5858314348 6098506333 6297774946 6074680955 6132163345 5875098456
5198306672 4891946405 4875765641 4834436461 4835096514 4804664875 4684550404 4733459404 5056773308 4912329843 5080643820 4568733581
4286693348 3898776528 3872776341 3842469172 3756957390 3782676505 3924066331 3810475969 3943259720 3665136687 3962811976 3449264257
3120637669 2813261665 2692920289 2652153941 2557247524 2658115616 2777287302 2688976703 2712004412 2596430893 2520548046 2455531008
2429263753 2187017586 2181610529 2139024441 2008850781 2049874584 2110715482 2218937956 2565352715 2635375627 2598584163 2435211675
2433625715 2350144562 2298764466 2242464445 2288528533 2532374821 2696862060 2877128057 3086285374 3309497319 3684989376 3709283880
3483967873 3294407926 3465439983 3546006197 3526166213 3625899404 3774201496 3941610691 4325836434 4466576126 4115121591 4036118609
3824882119 3552896925 3649624960 3570454122 3622089655 3662984491 3601306018 3604389348 3620162022 3401732239 3158217491 2896252892
2800864675 2630474256 2668229303 2631120097 2343131082 2163910930 2108285015 2067601541 2099699134 1803097392 1742652674 1626660618
1560369744 1448264771 1419659828 1547101381 1310783818 1358686467 1300281852 1315247637 1380387680 1286158497 1329769957 1272124521
1185603967 1125238745 1217223861 1265616553 1222054134 1279497332 1499392605 1810208712 2314301847 2908395453 3388479445 3441615991
3432688695 3691000321 3891303059 4111250935 4258776704 4586315450 5050122946 5156728599 5550332779 5769588984 5943764465 6032516246
5765718572 5521116586 5498458566 5374456514 5130561755 5219814632 5542173962 6883624616 7744043244 7913799960 7416210299 7127265644
6790509897 6562709494 6390985216 6126897801 5855125688 6259675447 6439114484 6634617502 6771498442 6674343925 6295709586 5890916431
5545655270 5315444742 5205711894 5115065476 4648229650 4724377012 4816989052 5049928441 5041395923
)
}
transform{
function=auto
}
automdl {
maxorder=(3,2)
maxdiff=(1,1)
balanced=yes
savelog=(adf amd b5m mu)
}
forecast {
}
x11{
sigmalim=(1.5,2.5)
trendma=23
excludefcst=yes
final=(user)
appendfcst=yes
savelog=all
}
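The width arithmetic behind the error can be checked with a few lines of Python. This sketch only measures the first data row of the spec file above and shows what a one-observation-per-line layout would look like. It does not change what the x12 package writes, and `line_widths` and `one_per_line` are hypothetical helper names.

```python
def line_widths(lines):
    """Return the character width of each line."""
    return [len(line) for line in lines]


def one_per_line(lines):
    """Reflow whitespace-separated values so each sits on its own line."""
    return [value for line in lines for value in line.split()]


# First data row of Rout.spc: twelve 11-digit values separated by single spaces.
first_row = (
    "14056669449 12785389868 12772341230 12342935128 12081332395 12110109950 "
    "12367542268 12911930417 12836340370 12214486074 12057940408 11555540809"
)
print(line_widths([first_row]))                      # [143]
print(max(line_widths(one_per_line([first_row]))))   # 11
```

Twelve 11-digit values plus eleven separators already come to 143 characters before any indentation, which is over the 132-column limit, whereas one value per line stays at 11 characters.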
