BizTalk mapping RECADV D96A

I need to convert the CSV file to EDIFACT RECADV D96A.
The input:
REC;A;ABC;120769;4502902610;0196466358;ABC;;202003051329;OB:505+DP:8718
RECD;1;110000;45;;
RECD;1;120000;50;;
RECD;1;130000;100;;
RECD;2;200000;21;;
RECD;2;210000;12;;
And the output should be:
LIN+1++1:EN'
GIN+BJ+110000:45'
GIN+BJ+120000:50'
GIN+BJ+130000:100'
LIN+2++2:EN'
GIN+BJ+200000:21'
GIN+BJ+210000:12'
This is what I have managed to do so far (it only picks up the first GIN for each line):
LIN+1++1:EN'
GIN+BJ+110000:45'
LIN+2++2:EN'
GIN+BJ+200000:21'
How can I pick up every value for each distinct line number?
Thank you in advance!
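To make the grouping explicit, here is the transformation sketched in R, purely as an illustration of the logic rather than of the BizTalk map itself: the RECD records are grouped on their line-number field, and each group produces one LIN segment followed by one GIN segment per record.
# Illustrative only: group RECD records by line number and emit segments.
recd <- read.csv2(text = c(
  "RECD;1;110000;45;;",
  "RECD;1;120000;50;;",
  "RECD;1;130000;100;;",
  "RECD;2;200000;21;;",
  "RECD;2;210000;12;;"
), header = FALSE, col.names = c("tag", "line", "serial", "qty", "x1", "x2"))
out <- unlist(lapply(split(recd, recd$line), function(grp) {
  lin <- sprintf("LIN+%s++%s:EN'", grp$line[1], grp$line[1])
  gin <- sprintf("GIN+BJ+%s:%s'", grp$serial, grp$qty)
  c(lin, gin)
}))
cat(out, sep = "\n")
Whatever the map ends up looking like, it needs this same grouping: key the LIN loop on the distinct line numbers, then loop the GIN segments over all RECD records that share that number.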

Related

How to Concatenate multiple repetitive nodes into a single node - BizTalk

I have something like this in an input XML
<OrderText>
<text_type>0012</text_type>
<text_content>Text1</text_content>
</OrderText>
<OrderText>
<text_type>ZT03</text_type>
<text_content>Text2</text_content>
</OrderText>
I need to map the above data, after concatenation, into the schema below:
<Order>
<Note>0012:Text1#ZT03:Text2</Note>
</Order>
Can anyone please help?
I'm going to assume that your input actually has a Root node, as otherwise it is not valid XML.
<Root>
<OrderText>
<text_type>0012</text_type>
<text_content>Text1</text_content>
</OrderText>
<OrderText>
<text_type>ZT03</text_type>
<text_content>Text2</text_content>
</OrderText>
</Root>
Then all you need is a map like this, with a String Concatenate functoid whose inputs are:
Input[0] = text_type
Input[1] = :
Input[2] = text_content
Input[3] = #
That goes into a Cumulative Concatenate functoid.
This will give you an output of:
<Order>
<Note>0012:Text1#ZT03:Text2#</Note>
</Order>
Note: There is an extra # at the end, but you could use some more functoids to trim that off if needed.
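Just to spell out the string logic those two functoids implement, sketched here in R purely as an illustration rather than as a BizTalk artifact: each text_type is joined to its text_content with a colon, and the pairs are then joined with #.
text_type    <- c("0012", "ZT03")
text_content <- c("Text1", "Text2")
note <- paste(paste0(text_type, ":", text_content), collapse = "#")
note
# [1] "0012:Text1#ZT03:Text2"
Because collapse only puts the separator between elements, this variant has no trailing #, unlike the per-record # produced by the functoid chain.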
You can use the Value-Mapping Flattening functoid in a map, then feed the result of each into a Concatenate functoid to generate the result string. The map can be executed on a port or in an orchestration.

JSON format in R refuses to parse?

Here is my toy JSON:
"[[143828095,86.82525,78.50037,0.011764707,1.0,1,1],
[143828107,86.82525,78.50037,0.015686275,1.0,1,0],
[143828174,84.82802,83.49646,0.015686275,1.0,1,0],
[143828190,83.3301,92.4895,0.011764707,1.0,1,0],
[143828206,83.3301,92.4895,0.011764707,1.0,1,-1],
[143828251,119.482666,98.4848,0.03137255,1.0,2,1],
[143828325,123.30899,95.93237,0.027450982,1.0,2,0],
[143828334,128.47015,92.4895,0.027450982,1.0,2,0],
[143828351,128.47015,92.4895,0.027450982,1.0,2,-1],
[143828406,115.19141,60.514465,0.019607844,1.0,3,1],
[143828529,121.183105,61.51367,0.019607844,1.0,3,0],
[143828551,121.183105,61.51367,0.019607844,1.0,3,-1],
[143828696,105.502075,94.26935,0.023529414,1.0,8,1],
[143828773,105.502075,94.26935,0.023529414,1.0,8,-1],
[143829030,78.24274,58.18811,0.023529414,1.0,DEL,1],
[143829107,78.24274,58.18811,0.023529414,1.0,DEL,-1],
[143831178,127.47159,76.28339,0.023529414,1.0,8,1],
[143831244,127.47159,76.28339,0.023529414,1.0,8,-1]]"
Now I want to parse it (fromJSON()) but
DEL
within the JSON prevents me from doing so.
Please advise how to fix it.
You can replace "DEL" with, say, 0.
json_string <- "[[143828095,86.82525,78.50037,0.011764707,1.0,1,1], [143828107,86.82525,78.50037,0.015686275,1.0,1,0], [143828174,84.82802,83.49646,0.015686275,1.0,1,0], [143828190,83.3301,92.4895,0.011764707,1.0,1,0], [143828206,83.3301,92.4895,0.011764707,1.0,1,-1], [143828251,119.482666,98.4848,0.03137255,1.0,2,1], [143828325,123.30899,95.93237,0.027450982,1.0,2,0], [143828334,128.47015,92.4895,0.027450982,1.0,2,0], [143828351,128.47015,92.4895,0.027450982,1.0,2,-1], [143828406,115.19141,60.514465,0.019607844,1.0,3,1], [143828529,121.183105,61.51367,0.019607844,1.0,3,0], [143828551,121.183105,61.51367,0.019607844,1.0,3,-1], [143828696,105.502075,94.26935,0.023529414,1.0,8,1], [143828773,105.502075,94.26935,0.023529414,1.0,8,-1], [143829030,78.24274,58.18811,0.023529414,1.0,DEL,1], [143829107,78.24274,58.18811,0.023529414,1.0,DEL,-1], [143831178,127.47159,76.28339,0.023529414,1.0,8,1], [143831244,127.47159,76.28339,0.023529414,1.0,8,-1]]"
json_string <- gsub("DEL", 0, json_string) # You can make the zero any number you like
library(jsonlite)  # fromJSON() also exists in the rjson and RJSONIO packages
fromJSON(json_string)
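If you would rather keep the DEL entries distinguishable instead of turning them into a real-looking number, one alternative (my suggestion, not part of the answer above) is to map them to JSON null, which jsonlite turns into NA when it simplifies the result:
json_string <- gsub("DEL", "null", json_string)  # null instead of an arbitrary number
fromJSON(json_string)                            # numeric matrix, NA where DEL was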
Using a JSON parser (http://json.parser.online.fr/), just deleting the "DEL" entries at the respective places seems to fix the issue.

Matching the first and last characters in a fasta file

I have fasta sequences like the following:
fasta_sequences
seq1_1
"MTFJKASDKASWQHBFDDFAHJKLDPAL"
seq1_2
"GTRFKJDAIUETZUQOIHHASJKKJHPAL"
seq1_3
"MTFJHAZOQIIREUUBSDFHGTRF"
seq2_1
"JUZGFNBGTFCKAJDASEJIJAS"
seq2_1
"MTFHJHJASBBCMASDOEQSDPAL"
seq2_3
"RTZIIASDPLKLKLKLLJHGATRF"
seq3_1
"HMTFLKBNCYXBASHDGWPQWKOP"
seq3_2
"MTFJKASDJLKIOOIEOPWEIOKOP"
I would like to retain only those sequences which start with MTF and end with either KOP, TRF, or PAL. In the end it should look like:
seq1_1
"MTFJKASDKASWQHBFDDFAHJKLDPAL"
seq1_3
"MTFJHAZOQIIREUUBSDFHGTRF"
seq2_1
"MTFHJHJASBBCMASDOEQSDPAL"
seq3_2
"MTFJKASDJLKIOOIEOPWEIOKOP"
I tried the following code in R, but it returned nothing:
new_fasta=grep("^MTF.*(PAL|TRF|KOP)$")
Could anyone help me get the desired output? Thanks in advance.
This is the way to go, I guess. For every element in fasta_sequences (assuming fasta_sequences is a vector containing the sequences):
newseq <- list()
it <- 1
for (i in fasta_sequences) {
  # i is the sequence string for seq1_1, seq1_2, etc.
  a <- substr(i, 1, 3)
  if (a == "MTF") {
    x <- substr(i, nchar(i) - 2, nchar(i))
    if (x == "PAL" | x == "KOP" | x == "TRF") {
      newseq[[it]] <- i
      it <- it + 1
    }
  }
}
Hope it helps
new_fasta=grep("^MTF.*(PAL|TRF|KOP)$",fasta_sequences,perl=TRUE)
Pass fasta_sequences as the second argument and add the perl=TRUE option (note: TRUE, not True, in R).
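One detail worth noting (an addition of mine, not part of the answer above): grep() returns the indices of the matching elements by default, so if you want the sequences themselves rather than their positions, ask for the values:
new_fasta <- grep("^MTF.*(PAL|TRF|KOP)$", fasta_sequences, value = TRUE)
or, equivalently,
new_fasta <- fasta_sequences[grepl("^MTF.*(PAL|TRF|KOP)$", fasta_sequences)]
which also keeps the names if fasta_sequences is a named vector.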

Merging a large number of csv datasets

Here are 2 sample datasets.
PRISM-APPT_1895.csv
https://copy.com/SOO2KbCHBX4MRQbn
PRISM-APPT_1896.csv
https://copy.com/JDytBqLgDvk6JzUe
I have 100 of these types of data sets that I'm trying to merge into one data frame, export that to csv, and then merge that into another very large dataset.
I need to merge everything by "gridNumber" and "Year", creating a time series dataset.
Originally, I imported all of the annual datasets and then tried to merge them with this:
df <- join_all(list(Year_1895, Year_1896, Year_1897, Year_1898, Year_1899, Year_1900, Year_1901, Year_1902,
Year_1903, Year_1904, Year_1905, Year_1906, Year_1907, Year_1908, Year_1909, Year_1910,
Year_1911, Year_1912, Year_1913, Year_1914, Year_1915, Year_1916, Year_1917, Year_1918,
Year_1919, Year_1920, Year_1921, Year_1922, Year_1923, Year_1924, Year_1925, Year_1926,
Year_1927, Year_1928, Year_1929, Year_1930, Year_1931, Year_1932, Year_1933, Year_1934,
Year_1935, Year_1936, Year_1937, Year_1938, Year_1939, Year_1940, Year_1941, Year_1942,
Year_1943, Year_1944, Year_1945, Year_1946, Year_1947, Year_1948, Year_1949, Year_1950,
Year_1951, Year_1952, Year_1953, Year_1954, Year_1955, Year_1956, Year_1957, Year_1958,
Year_1959, Year_1960, Year_1961, Year_1962, Year_1963, Year_1964, Year_1965, Year_1966,
Year_1967, Year_1968, Year_1969, Year_1970, Year_1971, Year_1972, Year_1973, Year_1974,
Year_1975, Year_1976, Year_1977, Year_1978, Year_1979, Year_1980, Year_1981, Year_1982,
Year_1983, Year_1984, Year_1985, Year_1986, Year_1987, Year_1988, Year_1989, Year_1990,
Year_1991, Year_1992, Year_1993, Year_1994, Year_1995, Year_1996, Year_1997, Year_1998,
Year_1999, Year_2000),
by = c("gridNumber","Year"),type="full")
But R keeps crashing, I think because the merge is a bit too large for it to handle, so I'm looking for something that would work better. Maybe data.table? Or another option.
Thanks for any help you can provide.
Almost nine months later and your question has no answer. I could not find your datasets; however, I will show one way to do the job. It is trivial in awk.
Here is a minimal awk script:
BEGIN {
    for (i = 0; i < 10; i++) {
        filename = "out" i ".csv";
        while (getline < filename) print $0;
        close(filename);
    }
}
The script is run as
awk -f s.awk
where s.awk is the above script in a text file.
This script constructs ten filenames: out0.csv, out1.csv ... out9.csv. These are assumed to be the already-existing files that hold the data. The first file is opened and all of its records are sent to standard output; the file is then closed and the next filename is constructed and opened. The script above has little to offer over a command-line read/redirect. You would typically use awk to process a long list of filenames read from another file, with statements to selectively ignore lines or columns depending on various criteria.
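If you would rather stay in R, here is a minimal data.table sketch. It assumes the files are named PRISM-APPT_1895.csv through PRISM-APPT_2000.csv, sit in the working directory, and all share the same columns (including gridNumber and Year); "other" stands in for your second large dataset and is my placeholder, not something from the question.
library(data.table)
# Read each annual file and stack them into one long table.
files  <- sprintf("PRISM-APPT_%d.csv", 1895:2000)
annual <- rbindlist(lapply(files, fread))
# Export the combined data, then merge it with the other large dataset.
fwrite(annual, "PRISM-APPT_1895-2000.csv")
merged <- merge(annual, other, by = c("gridNumber", "Year"), all = TRUE)
Stacking the annual files with rbindlist is generally much cheaper than the repeated full joins, though note it produces a long table rather than the wide one that join_all builds.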

Find oldest files in the directory based on the filename timestamp in Unix

I want the files in the directory to be listed oldest first, based on the date and timestamp embedded in the file name.
Example:
Input files:
AAAG11020709581.txt
AAAG13020709581.txt
AACL11020709581.txt
AACL13020709581.txt
AAFU11020709581.txt
AAFU13020709581.txt
AAHO11020709581.txt
AAHO13020709581.txt
AAPC11020709581.txt
AAPC13020709581.txt
AAPO11020709581.txt
AAPO13020709581.txt
AATR11020709581.txt
AATR13020709581.txt
AARC11020709581.txt
AARC13020709581.txt
Expected output :
AAAG11020709581.txt
AACL11020709581.txt
AAFU11020709581.txt
AAHO11020709581.txt
AAPC11020709581.txt
AAPO11020709581.txt
AARC11020709581.txt
AATR11020709581.txt
AAAG13020709581.txt
AACL13020709581.txt
AAFU13020709581.txt
AAHO13020709581.txt
AAPC13020709581.txt
AAPO13020709581.txt
AARC13020709581.txt
AATR13020709581.txt
Can anyone please suggest how to do this?
sort will by default use the beginning of the line as the key. You can tell it to start at a different place with the -k FIELD.OFFSET notation; e.g. if all the filenames begin with 4 letters, you can skip them like this:
sort -k1.5
Output:
AAAG11020709581.txt
AACL11020709581.txt
AAFU11020709581.txt
AAHO11020709581.txt
AAPC11020709581.txt
AAPO11020709581.txt
AARC11020709581.txt
AATR11020709581.txt
AAAG13020709581.txt
AACL13020709581.txt
AAFU13020709581.txt
AAHO13020709581.txt
AAPC13020709581.txt
AAPO13020709581.txt
AARC13020709581.txt
AATR13020709581.txt
