Is there a way to operate over the ExecutionTestCaseLogs to capture results and pass the results of the ExecutionTestCaseLogs to the TDM? - tosca

I need to capture the ExecutionTestCaseLog results and pass them to the TDM
My vision is that we need to:
1) operate over the ExecutionTestCaseLogs to capture results
2) pass the results of the ExecutionTestCaseLogs to the TDM or elsewhere (a local Excel file)
Any suggestions?

Related

filtered content of corpus by custom function with R

I want to analyse filtered texts with a custom function (a function with parameters) in R.
I used the readLines function to read my text files and got a large list of 258 elements. Then, using VCorpus(VectorSource(files)), I created my corpus from the original text. I am stuck here: I want to find a way to filter the corpus, or to filter my files before inserting them into it.
My aim is to work with the corpus for the rest of my analysis.
The code below allows me to analyse the original text; how can I filter the content of my corpus?
library(tm)

filenames <- list.files(getwd(), pattern = "*.txt")
files <- lapply(filenames, readLines, warn = FALSE)
docs <- VCorpus(VectorSource(files))
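If the goal is to drop unwanted files before they go into the corpus, a minimal sketch (assuming "keyword" is a placeholder for whatever condition you actually want to test) is to build a logical vector over files and subset before calling VCorpus:
keep <- sapply(files, function(lines) any(grepl("keyword", lines)))  # "keyword" is a placeholder pattern
docs <- VCorpus(VectorSource(files[keep]))
The same logical vector could equally be used to subset filenames if you prefer to filter the paths before reading them.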

Extracting multiple JSON files into one dataframe

I am trying to merge multiple JSON files into one data frame and, despite trying all the approaches found on SO, it keeps failing.
The files provide sensor data. The stages I've completed are:
1. Unzip the files - produces JSON files saved as '.txt' files
2. Remove the old zip files
3. Parse the '.txt' files to remove some bugs in the content - random 3-letter + comma combos at the start of some lines, e.g. 'prm,{...'
I've got code which will turn them into data frames individually:
stream <- stream_in(file("1.txt"))
flat <- flatten(stream)
df_it <- as.data.frame(flat)
But when I put it into a function:
df_loop <- function(x) {
  stream <- stream_in(x)
  flat <- flatten(stream)
  df_it <- as.data.frame(flat)
  df_it
}
And then try to run through it:
df_all <- sapply(file.list, df_loop)
I get:
Error: Argument 'con' must be a connection.
Then I've tried to merge the JSON files with rbind.fill and merge, to no avail.
Not really sure where I'm going so terribly wrong, so I would appreciate any help.
You need a small change in your function. Change it to:
stream <- stream_in(file(x))
Explanation
Start with analyzing your original implementation -
stream <- stream_in(file("1.txt"))
The 1.txt here is the file path, which is passed as an input parameter to the file() function. A quick ?file will tell you that it is a
Function to create, open and close connections, i.e., “generalized
files”, such as possibly compressed files, URLs, pipes, etc.
Now if you do a ?stream_in() you will find that it is a
function that implements line-by-line processing of JSON data over a
connection, such as a socket, url, file or pipe
Keyword here being socket, url, file or pipe.
Your file.list is just a list of file paths, character strings to be specific. But in order for stream_in() to work, you need to pass in a connection object, which is the output of the file() function, which in turn takes the file path as a string input.
Chaining that together, you needed to do stream_in(file("/path/to/file.txt")).
Once you do that, your sapply iterates over each path, creates the connection object and passes it as input to stream_in().
Hope that helps!
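Putting the fix together, a sketch of the full workflow might look like this (assuming file.list is a character vector of paths to the '.txt' files; plyr::rbind.fill, which the question already mentions, handles files whose columns differ):
library(jsonlite)
library(plyr)

df_loop <- function(x) {
  stream <- stream_in(file(x))  # wrap the path in file() to get a connection
  flat <- flatten(stream)
  as.data.frame(flat)
}

df_list <- lapply(file.list, df_loop)  # one data frame per file
df_all <- rbind.fill(df_list)          # row-binds, filling missing columns with NA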

test on files to extract before the output export R

I have applied some data mining functions to a PDF corpus (541 PDF files) and I want to save the processed data.
I used
writeCorpus(corpus_processed)
But I need to add a test on the files to be saved, so that only the files containing the word "America" are written out.
I found this function, but I could not work out the remaining parts to adapt it to my needs. I also think it is not applicable to a corpus:
patterns <- sapply(list.files(corpus_processed, full.names = TRUE), FUN = function(x) {
  grep("america", readLines(x))
})
Your current approach is on the right track, but the grep logic is incomplete. Since readLines returns a vector of lines for each file, you need to handle the fact that grep will likely return a vector of match indices. A file matches when the vector returned by grep is not empty.
files <- list.files(corpus_processed, full.names = TRUE)
matches <- sapply(files, function(x) {
  conn <- file(x, open = "r")
  count <- length(grep("\\bamerica\\b", readLines(conn), ignore.case = TRUE))
  close(conn)  # close the connection so it does not leak
  count > 0
})
file_matches <- files[matches]
In the code above, matches is a logical vector, which can then be used to subset your original vector of files to obtain those that contain "America".
Edit:
The above script assumes that files is a list of files (full paths) which contain your material. If not, then you will have to provide such a list.
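If corpus_processed is actually a tm corpus object rather than a directory of files, a sketch along the same lines (assuming the corpus came from the tm package, so that tm_index and content apply) would filter the documents directly and write out only the matching ones:
library(tm)

has_america <- tm_index(corpus_processed, FUN = function(doc) {
  any(grepl("\\bamerica\\b", content(doc), ignore.case = TRUE))
})
writeCorpus(corpus_processed[has_america])  # writes only the documents mentioning "America"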

How do I use getNodeSet (XML package) within a function?

I am trying to develop a script to extract information from XML files. After parsing the XML file I use
idNodes <- getNodeSet(doc, "//compound[@identifier='101.37_1176.0998m/z']")
to subset a particular part of the document and then extract information I need using lines such as
subject <- sapply(idNodes, xpathSApply, path = './condition/sample', function(x) xmlAttrs(x)['name'])
My xml file has hundreds of identifiers of the type 101.37_1176.0998m/z
It is not possible to load all of the identifiers at once, so I need to iterate through the file using getNodeSet followed by data extraction.
My script works fine if I enter the identifier manually, i.e.
idNodes <- getNodeSet(doc, "//compound[@identifier='101.37_1176.0998m/z']")
but I would like to write a function so I can use do.call to pass the function a list of identifiers.
I have tried
xtract <- function(id) {
  idNodes <- getNodeSet(doc, "//compound[@identifier='id']")
}
but when I use this function, i.e.
xtract('102.91_1180.5732m/z')
or
compounds <- c("101.37_1176.0998m/z", "102.91_1180.5732m/z")
do.call("xtract", list(compounds))
it is clear that getNodeSet has not worked, i.e. there is no data to be extracted.
If I use
xtract(102.91_1180.5732m/z)
I get: Error: unexpected input in "xtract(102.91_"
Can anyone help resolve this problem?
In the function it should be
idNodes <- getNodeSet(doc, paste0("//compound[@identifier='", id, "']"))
then the following call will work
xtract('102.91_1180.5732m/z')
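For completeness, a sketch of the full function with the extraction step folded in (assuming doc is the already parsed XML document; the identifiers are the ones from the question):
library(XML)

xtract <- function(id) {
  idNodes <- getNodeSet(doc, paste0("//compound[@identifier='", id, "']"))
  sapply(idNodes, xpathSApply, path = "./condition/sample",
         function(x) xmlAttrs(x)["name"])
}

compounds <- c("101.37_1176.0998m/z", "102.91_1180.5732m/z")
results <- lapply(compounds, xtract)  # one result per identifier, no do.call needed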

Convert each document in corpus into a separate character vector

I have a corpus, created using the tm package, consisting of many documents. I want to use the stringr function str_detect on my documents to see whether a document contains strings from another document. The output I want is a list of TRUE/FALSE values indicating whether each document coincides with every other document in the corpus. Here's a sample of the code, using the crude dataset from the tm package:
library(tm)
library(stringr)
data("crude")
for (i in 1:length(crude)) {
  text <- crude[[i]]
  search <- str_detect(crude, text)
}
But in doing so, I get an error stating that the str_detect function is not applicable to plain text documents. So, what I want to do is to convert each document in the corpus into separate character vectors, so that the str_detect can work.
I tried doing:
chr.vector <- as.character(crude)
It returns one character vector comprising everything in my corpus, which is not what I want. So I was considering doing a for loop, just that I have no idea how to display my output in a good way.
for (i in 1:length(crude)) {
  x <- as.character(crude[[i]])
Can someone advise me on how to complete my code here? Or if there is a better way for me to approach this problem? Thanks!
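One way to complete this (a sketch, assuming a verbatim containment check is what you are after) is to collapse each document into a single character string first and then run str_detect for every document against all of the others:
library(tm)
library(stringr)
data("crude")

# one character string per document
texts <- sapply(crude, function(doc) paste(content(doc), collapse = " "))

# for each document, a logical vector saying which documents contain its text
search <- lapply(texts, function(txt) str_detect(texts, fixed(txt)))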
