Selecting columns in R depending on user input - r

I have a data frame col_metaData in R that looks like this:
sample | b | c | ...
____________________
S1 | 1 | 1 | ...
S2 | 1 | 2 | ...
S3 | 2 | 2 | ...
S4 | 3 | 3 | ...
I want to write a function that returns the samples whose value in a given column is one of a given set of values. For example,
fun(b,c(1,2))
should return
S1 S2 S3
while
fun(c,c(2,3))
should return
S2 S3 S4
and so on. If the column were fixed (say, b), I could simply do:
col_metaData[col_metaData$b %in% inputList,]$sample
But since there can be many more columns (so I can't use if-else), I am looking for a more general way to do this. Can someone please help? Thanks.

I solved it. Just in case anyone comes looking for an answer, we can use this:
col_metaData[col_metaData[,b] %in% inputList,]$sample
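Notice [,b] instead of $b. Here b must be a variable holding the column name (for example "b") or a column index, whereas $b always looks for a column literally named b.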

Robot Framework API - how to get suite and its test cases results

I have a test suite directory which contains test suite files with one or more test cases. Let's say it looks like this:
TestSuite
    Test-1
        Step 1
        Step 2
    Test-2
        Step 1
    Test-3
        Step 1
        Step 2
        Step 3
I would like to parse output.xml to get results like this:
Test-1 | PASS
Test-1 | Step 1 | PASS
Test-1 | Step 2 | PASS
Test-2 | PASS
Test-2 | Step 1 | PASS
Test-3 | PASS
Test-3 | Step 1 | PASS
Test-3 | Step 2 | PASS
Test-3 | Step 3 | PASS
So far I have managed to get only suite files names and results using this code:
from robot.api import ExecutionResult, SuiteVisitor
class PrintSuiteInfo(SuiteVisitor):
    def visit_suite(self, suite):
        print('{} | {}'.format(suite.name, suite.status))
result = ExecutionResult('output.xml')
result.suite.suites.visit(PrintSuiteInfo())
which gives this output:
Test-1 | PASS
Test-2 | PASS
Test-3 | PASS
I can get test case names and results with this code:
from robot.api import ExecutionResult, ResultVisitor
class PrintTestInfo(ResultVisitor):
    def visit_test(self, test):
        print('{} | {}'.format(test.name, test.status))
result = ExecutionResult('output.xml')
result.visit(PrintTestInfo())
but the output is:
Step 1 | PASS
Step 2 | PASS
Step 1 | PASS
Step 1 | PASS
Step 2 | PASS
Step 3 | PASS
so the tests are not related to their suite files, and I need that relation to update results in Jira.
The only idea I had was to include the suite file name in each test case name, but I would like to learn more about robot.api. I have read the documentation many times, but it is still not clear enough to me.
One of my colleagues helped me with solving this. What I was missing was:
test.parent
or one that I figured out myself:
test.longname
which gives output like this:
TestSuite.Test-1.Step 1
TestSuite.Test-1.Step 2
...
It is documented here.
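To get the combined listing asked for in the question, the suites and their tests can be walked together. The following is only a minimal sketch under stated assumptions: iter_rows and print_results are hypothetical helper names, and the code relies only on the name, status, tests and suites attributes of the result objects returned by ExecutionResult. Suites that contain no tests directly (such as the top-level directory suite) are skipped.

```python
def iter_rows(suite):
    """Yield 'suite | status' and 'suite | test | status' rows, depth-first."""
    if suite.tests:
        # Only suites that directly contain tests get a row of their own.
        yield '{} | {}'.format(suite.name, suite.status)
        for test in suite.tests:
            yield '{} | {} | {}'.format(suite.name, test.name, test.status)
    for child in suite.suites:
        yield from iter_rows(child)


def print_results(path='output.xml'):
    # Imported here so iter_rows can be used without Robot Framework installed.
    from robot.api import ExecutionResult
    for row in iter_rows(ExecutionResult(path).suite):
        print(row)
```

Calling print_results('output.xml') prints one row per suite file followed by one row per test in it; inside a visitor, test.parent.name gives the same suite name.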

Levenshtein logic to get all the strings with minimum difference

Suppose I have a data frame with these values:
Mtemp:
-----+
code |
-----+
Ram |
John |
Tracy|
Aman |
I want to compare it with this data frame:
M2:
------+
code |
------+
Vivek |
Girish|
Rum |
Rama |
Johny |
Stacy |
Jon |
I want a result in which each value in Mtemp gets at most 2 possible matches from M2 within Levenshtein distance 2.
I have used
tp<-as.data.frame(amatch(Mtemp$code,M2$code,method = "lv",maxDist = 2))
tp$orig<-Mtemp$code
colnames(tp)<-c('Res','orig')
and I am getting the following result:
Res |orig
-----+-----
3 |Ram
5 |John
6 |Tracy
4 |Aman
Please let me know a way to get 2 values (if possible) for every Mtemp string within Levenshtein distance 2.
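The logic being asked for (keep up to two candidates per string instead of only the single best one) can be sketched independently of any particular package. The block below is a plain Python illustration, not the stringdist API: levenshtein and closest_matches are hypothetical helper names, the comparison is case-sensitive, and ties are broken by the candidates' original order.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def closest_matches(queries, candidates, max_dist=2, limit=2):
    """Map each query to at most `limit` candidates within `max_dist`."""
    result = {}
    for query in queries:
        scored = [(levenshtein(query, cand), cand) for cand in candidates]
        within = [pair for pair in scored if pair[0] <= max_dist]
        within.sort(key=lambda pair: pair[0])  # stable: ties keep input order
        result[query] = [cand for _, cand in within[:limit]]
    return result


mtemp = ["Ram", "John", "Tracy", "Aman"]
m2 = ["Vivek", "Girish", "Rum", "Rama", "Johny", "Stacy", "Jon"]
matches = closest_matches(mtemp, m2)
```

In R, the same approach would mean computing the full distance matrix between the two columns and selecting the two smallest entries per row rather than relying on a single best match.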

How to eliminate nothing elements in an array (1D) in Julia?

I would like to know how I could eliminate nothing elements in a Julia array (1D) like the one below. It was built from reading a text file with lines with no relevant information mixed with lines with relevant information. "nothing" is type Void and I would like to clean the array of all of it.
nothing
nothing
nothing
nothing
nothing
" -16.3651\t 0.1678\t -4.6997\t -14.0152\t -2.6855\t -16.0294\t -7.8049\t -27.1912\t -5.0354\t -14.5187\t\r\n"
" -16.4490\t -1.0910\t -3.6087\t -12.6724\t -1.5945\t -14.7705\t -7.2174\t -25.2609\t -3.7766\t -14.3509\t\r\n"
" -16.4490\t -2.2659\t -2.4338\t -10.9100\t -0.5875\t -13.6795\t -6.7139\t -22.9950\t -2.9373\t -14.0991\t\r\n"
testvector[testvector.!=nothing] is also a very readable option.
Benchmarking can help you choose the most efficient code.
How are you reading that file?
You can filter out nothings from an array:
filter(x -> !is(nothing, x), [nothing, 42]) # => Any[42]
But you may want to clean your data first, with a tsv (tab separated values) file like this:
-16.3651 0.1678 -4.6997 -14.0152 -2.6855 -16.0294 -7.8049 -27.1912 -5.0354 -14.5187
-16.4490 -1.0910 -3.6087 -12.6724 -1.5945 -14.7705 -7.2174 -25.2609 -3.7766 -14.3509
-16.4490 -2.2659 -2.4338 -10.9100 -0.5875 -13.6795 -6.7139 -22.9950 -2.9373 -14.0991
Using readdlm:
julia> readdlm("data.tsv")
3x10 Array{Float64,2}:
-16.3651 0.1678 -4.6997 -14.0152 … -27.1912 -5.0354 -14.5187
-16.449 -1.091 -3.6087 -12.6724 -25.2609 -3.7766 -14.3509
-16.449 -2.2659 -2.4338 -10.91 -22.995 -2.9373 -14.0991
Using DataFrames.readtable:
julia> df = readtable("data.tsv");
julia> names!(df, [symbol(x) for x in 'A':'J'])
2x10 DataFrames.DataFrame
| Row | A | B | C | D | E | F | G |
|-----|---------|---------|---------|----------|---------|----------|---------|
| 1 | -16.449 | -1.091 | -3.6087 | -12.6724 | -1.5945 | -14.7705 | -7.2174 |
| 2 | -16.449 | -2.2659 | -2.4338 | -10.91 | -0.5875 | -13.6795 | -6.7139 |
| Row | H | I | J |
|-----|----------|---------|----------|
| 1 | -25.2609 | -3.7766 | -14.3509 |
| 2 | -22.995 | -2.9373 | -14.0991 |
One simple way is to use the filter! function to update your vector in place, like this:
testvector=[fill(nothing,10) ; [1,2,3]];
# =>13-element Array{Any,1}:
# nothing
# nothing
# nothing
# nothing
# nothing
# nothing
# nothing
# nothing
# nothing
# nothing
# 1
# 2
# 3
filter!(x->x!=nothing, testvector)
# => 3-element Array{Any,1}:
# 1
# 2
# 3
Thanks, @Daniel Arndt.
EDIT: Refer to this paragraph from the Julia documentation:
nothing is a special value that does not print anything at the
interactive prompt. Other than not printing, it is a completely normal
value and you can test for it programmatically.
I think all of the conditions below give the same result (note that in Julia 1.0 and later, is was replaced by === and Void was renamed to Nothing):
x!=nothing
x!==nothing
!is(x,nothing)
!isa(x,Void)
typeof(x)!=Void
To add to the answers above, it appears:
filter(!isnothing, [nothing, 42])
is a working shorthand for filter(x -> !isnothing(x), [nothing, 42]), and returns a one-element array containing 42.
Dear All,
In the end, the code became this:
tmpFile=open(fileName)
tmp=readdlm(tmpFile);
# ind is false for rows whose first-column entry is a string and true otherwise, so it marks the valid rows.
ind=pmap(typeof,tmp[:,1]).!=SubString{ASCIIString};
tmpClean=tmp[ind,:]; # only valid rows will be used
If you have any suggestions to improve it, I would appreciate them. Thank you for your help.

Last matching date in spreadsheet function

I have a spreadsheet where dates are being recorded in regards to individuals, with additional data, as such:
Tom | xyz | 5/2/2012
Dick | foo | 5/2/2012
Tom | bar | 6/1/2012
On another sheet there is a line in which I want to be able to put in the name, such as Tom, and retrieve on the following cell through a formula the data for the LAST (most recent by date) entry in the first sheet. So the first sheet is a log, and the second sheet displays the most recent one. In the following example, the first cell is entered and the remaining are formulas displaying data from the first sheet:
Tom | bar | 6/1/2012
and so on, showing the latest dated entry in the log.
I'm stumped, any ideas?
If you only need to do a single lookup, you can do that by adding two new columns in your log sheet:
Sheet1
| A | B | C | D | E | F
1 | Tom | xyz | 6/2/2012 | | * | *
2 | Dick | foo | 5/2/2012 | | * | *
3 | Tom | bar | 6/1/2012 | | * | *
Sheet2
| A | B | C
1 | Tom | =Sheet1.E1 | =Sheet1.F1
*(E1) = =IF(AND($A1=Sheet2.$A$1;E2=0);B1;E2)
(i.e. paste the formula above in E1, then copy/paste it in the other cells with *)
Explanation: if A is not what you're looking for, go for the next; if it is, but there is a non-empty next, go for the next; otherwise, get it. This way you're selecting the last one corresponding to your search. I'm assuming you want the last entry, not "the one with the most recent date", since that's what you asked in your example. If I interpreted your question wrong, please update it and I can try to provide a better answer.
Update: If the log dates can be out of order, here's how you get the last entry:
*(F1) = =IF(AND($A1=Sheet2.$A$1;C1>=F2);C1;F2)
*(E1) = =IF(C1=F1;B1;E2)
Here I just replaced the test F2=0 (select next if non-empty) for C1>=F2 (select next if more recent) and, for the other column, select next if the first test also did so.
Disclaimer: I'm very inexperienced with spreadsheets, the solution above is ugly but gets the job done. For instance, if you wanted a 2nd row in Sheet2 to do another lookup, you'd need to add two more columns to Sheet1, etc.
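For comparison, the "most recent by date" lookup itself is easy to state outside a spreadsheet. This is only an illustrative Python sketch of the logic, not a spreadsheet formula: latest_entry is a hypothetical helper name, the log is the three rows from the question, and the dates are assumed to be month/day/year (the reading under which "bar" is the latest entry for Tom).

```python
from datetime import datetime

# The log sheet from the question: (name, data, date).
log = [
    ("Tom", "xyz", "5/2/2012"),
    ("Dick", "foo", "5/2/2012"),
    ("Tom", "bar", "6/1/2012"),
]


def latest_entry(rows, name, date_format="%m/%d/%Y"):
    """Return the row for `name` with the most recent date, or None."""
    matches = [row for row in rows if row[0] == name]
    if not matches:
        return None
    # Compare parsed dates, not the raw strings, so row order does not matter.
    return max(matches, key=lambda row: datetime.strptime(row[2], date_format))


tom_latest = latest_entry(log, "Tom")
```

Because the comparison is on parsed dates, the result does not depend on the order in which rows were logged, which is the case the updated formulas above are meant to handle.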

Replacement and non-matches with 'sub'

Months ago I ended up with a sub statement that originally worked with my input data. It has since stopped working causing me to re-examine my ugly process. I hate to share it but it accomplished several things at once:
active$id[grep("CIR",active$description)] <- sub(".*CIR0*(\\d+).*","\\1",active$description[grep("CIR",active$description)],perl=TRUE)
This statement created a new id column by finding rows that had an id embedded in the description column. The sub statement would find the number following a "CIR0" and populate the id column iff there was an id within a row's description. I recognize it is inefficient with the embedded grep subsetting either side of the assignment.
Is there a way to have a 'sub' replacement be NA or empty if the pattern does not match? I feel like I'm missing something very simple but ask for the community's assistance. Thank you.
Example with the results of creating an id column:
| name | id | description |
|------+-----+-------------------|
| a | 343 | Here is CIR00343 |
| b | | Didn't have it |
| c | 123 | What is CIR0123 |
| d | | CIR lacks a digit |
| e | 452 | CIR452 is next |
I was struggling with the same issue a few weeks ago. I ended up using the str_match function from the stringr package. It returns NA if the target string is not found. Just make sure you subset the result correctly. An example:
library(stringr)
str = "Little_Red_Riding_Hood"
sub(".*(Little).*","\\1",str) # Returns 'Little'
sub(".*(Big).*","\\1",str) # Returns 'Little_Red_Riding_Hood'
str_match(str,".*(Little).*")[1,2] # Returns 'Little'
str_match(str,".*(Big).*")[1,2] # Returns NA
I think in this case you could try using ifelse(), i.e.,
active$id[grep("CIR",active$description)] <- ifelse(match, replacement, "")
where match should evaluate to true if there's a match, and replacement is what that element would be replaced with in that case. Likewise, if match evaluates to false, that element's replaced with an empty string (or NA if you prefer).
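The underlying idea in both answers is "extract the capture group if the pattern matches, otherwise return a missing value", rather than relying on a substitution that leaves non-matching input unchanged. Below is a minimal sketch of that idea in Python (not R), using the descriptions from the example table; extract_id is a hypothetical helper name and None stands in for NA.

```python
import re

# "CIR", optional leading zeros, then the digits to keep.
CIR_ID = re.compile(r"CIR0*(\d+)")


def extract_id(description):
    """Return the id embedded after 'CIR', or None when there is no match."""
    match = CIR_ID.search(description)
    return match.group(1) if match else None


descriptions = [
    "Here is CIR00343",
    "Didn't have it",
    "What is CIR0123",
    "CIR lacks a digit",
    "CIR452 is next",
]
ids = [extract_id(text) for text in descriptions]
```

Searching for a match and testing the result avoids the double grep subsetting in the original statement, because every row is handled in a single pass.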