I believe I have a general understanding of the steps involved, but I am struggling to get the schemas correct, either using the Flat File Schema Generator or by tweaking the config afterwards.
I will give a sample of the data below, but in general it starts with a multi-line header that can contain variable text but always ends with the exact same line ("START-OF-DATA"). The next section consists of rows of delimited data (this is the only part of the file I need to bring into BizTalk). Finally, there is a multi-line footer that always has the same start and end lines ("END-OF-DATA" and "END-OF-FILE").
Sample--my comments are in parens:
START-OF-FILE (this is always here)
(. . . variable number of lines that contain info I don't need . . .)
START-OF-DATA (this is always here)
(many lines of delimited data that I DO need)
END-OF-DATA (this is always here)
(. . . variable number of lines that contain info I don't need . . .)
END-OF-FILE (this is always here)
I have used the flat file generator to create three schemas (header/detail/footer) with the intent to map only the detail. I created a pipeline and assigned the three schemas to the disassembly stage.
I am looking for general tips on what may be wrong with my approach, or what I should be looking out for. However, the error I get when running this is:
The trailer specification specified on the pipeline component
properties does not contain an interchange trailer.
I have googled this error and (as suggested) tried to change the Child order from Infix to Postfix, but this didn't help.
I think this blog should help you:
http://maddcoder.wordpress.com/2012/06/14/using-biztalk-to-parse-a-flatfile-with-multi-line-header-and-trailers/
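As a way to check what the detail schema should end up receiving, the header/detail/footer split described in the question can be sketched outside BizTalk. This is only a minimal Python illustration, not a BizTalk schema or pipeline; the function name `extract_detail` and the pipe-delimited sample rows are hypothetical.

```python
def extract_detail(lines, start="START-OF-DATA", end="END-OF-DATA"):
    """Return the lines strictly between the start and end marker lines."""
    detail = []
    inside = False
    for line in lines:
        text = line.strip()
        if text == start:
            inside = True          # header is over, detail begins on the next line
        elif text == end:
            break                  # everything after this marker is footer
        elif inside:
            detail.append(text)
    return detail


# Hypothetical sample following the layout in the question.
sample = [
    "START-OF-FILE",
    "header text we do not need",
    "START-OF-DATA",
    "a|b|c",
    "d|e|f",
    "END-OF-DATA",
    "footer text we do not need",
    "END-OF-FILE",
]
detail_rows = extract_detail(sample)
print(detail_rows)
```

The point of the sketch is that the header ends at a fixed marker line and the trailer begins at one, so those two lines are the natural boundaries for the three schemas.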
I have data in the form below.
<Status>Active Leave Terminated</Status>
<date>05/06/2014 09/10/2014 01/10/2015</date>
I want to get the data in the form below.
<status>Active</status>
<date>05/06/2014</date>
<status>Leave</status>
<date>09/10/2014</date>
<status>Terminated</status>
<date>01/10/2015</date>
Please help me with a query that retrieves the data as specified above.
Well, you have a string and want to split it at the whitespace. That's what tokenize() is for, and \s matches a whitespace character. To get the corresponding date, you can get the current position in the for loop using at. Together it looks something like this (note that I assume that the input data is the current context item):
let $dates := tokenize(date, "\s+")
for $status at $pos in tokenize(Status, "\s+")
return (
  <status>{$status}</status>,
  <date>{$dates[$pos]}</date>
)
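For readers who want to see the same idea outside XQuery, the split-and-pair-by-position logic can be sketched in Python. This is only an illustration of the technique, not MarkLogic code; the function name `pair_status_dates` is hypothetical, and the input strings are copied from the question.

```python
import re


def pair_status_dates(status_text, date_text):
    """Split both strings on whitespace and pair the tokens by position."""
    statuses = re.split(r"\s+", status_text.strip())
    dates = re.split(r"\s+", date_text.strip())
    # zip stops at the shorter list, so an unmatched status is dropped here,
    # whereas the XQuery version would emit an empty <date/> for it.
    return list(zip(statuses, dates))


pairs = pair_status_dates(
    "Active Leave Terminated",
    "05/06/2014 09/10/2014 01/10/2015",
)
for status, date in pairs:
    print(f"<status>{status}</status>")
    print(f"<date>{date}</date>")
```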
You did not indicate whether your data is on the file system or already loaded into MarkLogic. It's also not clear if this is something you need to do once on a small set of data or on an on-going basis with a lot of data.
If it's on the file system, you can transform it as it is being loaded. For instance, MarkLogic Content Pump can apply a transformation during load.
If you have already loaded the content and you want to transform it in place, you can use Corb2.
If you have a small amount of data, then you can just loop across it using Query Console.
Regardless of how you apply the transformation code, dirkk's answer shows how you need to change it. If you are updating content already in your database, you'll xdmp:node-delete() the original Status and date elements and xdmp:node-insert-child() the new ones.
I am using the following code to read a file with the data.table library:
fread(myfile, header=FALSE, sep=",", skip=100, colClasses=c("character","numeric","NULL","numeric"))
but I get the following error:
The supplied 'sep' was not found on line 80. To read the file as a single character column set sep='\n'.
It says it did not find sep on line 80, however I set skip=100 so it should not pay attention to the first 100 lines.
UPDATE:
I tried with skip=101 and it worked, but it skips the first line where the data starts.
I am using version 1.9.2 of the data.table package and R version 3.0.2 64-bit on Windows 7.
We don't know the version number you're using, but I can make a guess in this case.
Try setting autostart=101.
Note the first paragraph of Details in ?fread :
Once the separator is found on line autostart, the number of columns is determined. Then the file is searched backwards from autostart until a row is found that doesn't have that number of columns. Thus, the first data row is found and any human readable banners are automatically skipped. This feature can be particularly useful for loading a set of files which may not all have consistently sized banners. Setting skip>0 overrides this feature by setting autostart=skip+1 and turning off the search upwards step.
The skip argument is documented as:
If -1 (default) use the procedure described below starting on line autostart to find the first data row. skip>=0 means ignore autostart and take line skip+1 as the first data row (or column names according to header="auto"|TRUE|FALSE as usual). skip="string" searches for "string" in the file (e.g. a substring of the column names row) and starts on that line (inspired by read.xls in package gdata).
and the autostart argument as:
Any line number within the region of machine readable delimited text, by default 30. If the file is shorter or this line is empty (e.g. short files with trailing blank lines) then the last non empty line (with a non empty line above that) is used. This line and the lines above it are used to auto detect sep, sep2 and the number of fields. It's extremely unlikely that autostart should ever need to be changed, we hope.
In your case perhaps the human readable header is much larger than 30 rows, which is why I guess setting autostart=101 might work. No need to use skip.
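The backwards search described in the quoted documentation can be sketched roughly as follows. This is a simplified Python illustration and not fread's actual implementation: it assumes the separator is already known (fread detects it), ignores quoting, and the function name `find_first_data_row` and the sample lines are hypothetical.

```python
def find_first_data_row(lines, autostart, sep=","):
    """Return the 1-based line number of the first row of the delimited block.

    Counts the columns on line `autostart`, then walks upwards until a line
    with a different column count (for example a banner line) is found.
    """
    anchor = lines[autostart - 1]
    if sep not in anchor:
        # Comparable in spirit to the "sep was not found" error in the question.
        raise ValueError(f"separator {sep!r} not found on line {autostart}")
    ncol = anchor.count(sep) + 1
    first = autostart
    while first > 1 and lines[first - 2].count(sep) + 1 == ncol:
        first -= 1
    return first


# Hypothetical file: three banner lines, then a three-column table.
lines = [
    "Report generated by device X",
    "Banner line, with one comma",
    "",
    "a,b,c",
    "1,2,3",
    "4,5,6",
    "7,8,9",
]
first_row = find_first_data_row(lines, autostart=6)
print(first_row)
```

Any value of autostart inside the table gives the same answer, which is why the exact line number does not matter as long as it falls within the delimited block.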
One motivation is for convenience when a file contains multiple tables. By setting autostart to any row inside the table that you want to pluck out of the file, it'll find the first data row and header row for you automatically, and then read just that table. You don't have to worry about getting the exact line number at the start of data like you do with skip. fread can only read one table currently. It could feasibly return a list of tables from a single file, but that's getting a bit complicated and nobody has asked for that.
I am using Robot Framework to test a GUI application.
When I try to run the test case, I get an error like
"Element locator with prefix '| id' is not supported".
But I am using the latest version of Selenium2Library, i.e. 2.39.0.
I would be thankful if somebody could help me with this.
I also have one more question: how do I click on the contents of the GUI when working with Robot Framework?
Thanks in advance.
I think the only way you can get such an error message is if you mix two styles of cell separators in your test. For example, you may be mixing tabs and pipes, or multiple spaces and pipes.
Robot determines which format to use on a line-by-line basis. First, it looks for a tab anywhere in the line being parsed, and if it finds one, it uses tabs to split the line. If it doesn't find a tab, it checks to see if the line begins with a pipe and space. If so, it uses the pipe as the separator. Failing that, it uses multiple spaces as the separator.
I can reproduce the exact error you are getting by mixing pipes with either a tab or multiple spaces. For example, the following will generate the exact same error you report:
# the next line begins with two spaces
click element | id=treeview_tv_active
Robot will detect the two leading spaces and decide to use spaces to split the line into cells. Thus, the first cell will be "click element" and the second cell will be "| id=treeview_tv_active". Selenium looks for everything before the "=" as the locator type, thus it's using "| id" as the locator, which is invalid and results in the error that you see.
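The per-line detection rule and the resulting bad locator prefix can be sketched in Python. This is a simplified model of the behaviour described above, not Robot Framework's actual parser: `split_cells` is a hypothetical helper, leading empty cells are dropped for brevity, and the sample line is written with two leading spaces and two spaces before the pipe so that it splits the way the answer describes.

```python
import re


def split_cells(line):
    """Split one test-data line into cells using the rule described above."""
    if "\t" in line:
        # A tab anywhere in the line selects tab-separated format.
        return [cell.strip() for cell in line.split("\t")]
    if line.startswith("| "):
        # A leading pipe and space selects pipe-separated format.
        return [cell.strip() for cell in line.strip().strip("|").split(" | ")]
    # Otherwise, runs of two or more spaces separate the cells.
    return re.split(r" {2,}", line.strip())


# Assumed problem line: starts with two spaces but tries to use a pipe later.
bad_line = "  click element  | id=treeview_tv_active"
cells = split_cells(bad_line)
prefix = cells[1].split("=", 1)[0]   # what would be taken as the locator type
print(cells)
print(prefix)

# The same keyword written consistently in pipe format.
good_cells = split_cells("| click element | id=treeview_tv_active")
print(good_cells)
```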
Since you haven't shown us your code it's impossible to say for sure, but my guess is that the line causing the problem begins with a space or tab, or has a tab embedded somewhere else in the line, but later in the same line attempts to use pipes as cell separators.
I am using the fread function in R to read files into data.table objects.
However, when reading the file I'd like to skip lines that start with #. Is that possible?
I could not find any mention of that in the documentation.
fread can read from a piped command that filters out such lines, like this:
fread("grep -v '^#' filename")
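The same pre-filtering idea (drop the lines starting with # before the parser sees them) can be sketched without a shell. This is a Python illustration of the technique rather than anything fread does internally; `read_skipping_comments` and the sample text are hypothetical.

```python
import csv


def read_skipping_comments(text, comment="#"):
    """Parse delimited text after dropping lines that start with `comment`."""
    kept = [line for line in text.splitlines() if not line.startswith(comment)]
    # csv.reader accepts any iterable of strings, so no temporary file is needed.
    return list(csv.reader(kept))


sample_text = "# banner\nx,y\n# a comment in the middle\n1,2\n3,4\n"
rows = read_skipping_comments(sample_text)
print(rows)
```

Like grep -v '^#', this only drops lines whose first character is #; a # appearing later in a line is left alone.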
Not currently, but it's on the list to do.
Are the # lines at the top forming a header which is more than 30 lines long?
If so, that's come up before and the solution is :
fread("filename", autostart=60)
where 60 is chosen to be inside the block of data to be read.
From ?fread :
Once the separator is found on line autostart, the number of columns
is determined. Then the file is searched backwards from autostart
until a row is found that doesn't have that number of columns. Thus,
the first data row is found and any human readable banners are
automatically skipped. This feature can be particularly useful for
loading a set of files which may not all have consistently sized
banners. Setting skip>0 overrides this feature by setting
autostart=skip+1 and turning off the search upwards step.
The default autostart=30 might just need bumping up a bit in your case.
Or maybe skip=n or skip="string" helps :
If -1 (default) use the procedure described below starting on line autostart to find the first data row. skip>=0 means ignore autostart and take line skip+1 as the first data row (or column names according to header="auto"|TRUE|FALSE as usual). skip="string" searches for "string" in the file (e.g. a substring of the column names row) and starts on that line (inspired by read.xls in package gdata).
Setting:
I have (simple) .csv and .dat files created by laboratory devices and other programs, storing information on measurements or calculations. I have found solutions for this in other languages but not for R.
Problem:
Using R, I am trying to extract values to quickly display results without opening the created files. There are two typical cases:
a) I need to read a priori unknown values after known key words
b) I need to read lines after known key words or lines
I can't make functions such as scan() and grep() work.
c) Finally I would like to loop over dozens of files in a folder and give me a summary (to make the picture complete: I will manage this part)
I would appreciate any form of help.
OK, it works for the key value (although perhaps not very nicely):
ks <- scan("file.csv", what=character(), sep="")
returns a character vector of everything in the file.
ks[grep("keyword", ks) + 2] # + 2 as the actual value is stored two places ahead
returns the sought values as characters.
as.numeric(gsub(",", ".", ks, fixed=TRUE))
For completeness: the data had to be converted to numbers, and the "," versus "." decimal separator problem needed to be solved.
In one line:
data <- as.numeric(gsub(",", ".", ks[grep("Ks_Boden", ks) + 2], fixed=TRUE))
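The lookup above (find the keyword, take the token two places ahead, convert the decimal comma) can also be written as a small function. This is a Python sketch of the same steps, not a replacement for the R code; `value_after_keyword` and the sample token string are hypothetical, and the offset of 2 is taken from the answer.

```python
def value_after_keyword(tokens, keyword, offset=2):
    """Return the numeric values found `offset` tokens after each keyword hit."""
    values = []
    for i, token in enumerate(tokens):
        if keyword in token and i + offset < len(tokens):
            # Convert a decimal comma to a decimal point before parsing.
            values.append(float(tokens[i + offset].replace(",", ".")))
    return values


# Hypothetical whitespace-separated file content.
tokens = "Ks_Boden = 0,0035 m/s Other = 1,5".split()
ks_values = value_after_keyword(tokens, "Ks_Boden")
print(ks_values)
```

Looping this over the files in a folder would give the per-file summary mentioned in the question.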
Perseverance is not too bad of an asset ;-)
The rest isn't finished yet; I will post once it is.