How to eliminate nothing elements in a array (1D) in Julia? - julia

I would like to know how I could eliminate nothing elements in a Julia array (1D) like the one below. It was built from reading a text file with lines with no relevant information mixed with lines with relevant information. "nothing" is type Void and I would like to clean the array of all of it.
nothing
nothing
nothing
nothing
nothing
" -16.3651\t 0.1678\t -4.6997\t -14.0152\t -2.6855\t -16.0294\t -7.8049\t -27.1912\t -5.0354\t -14.5187\t\r\n"
" -16.4490\t -1.0910\t -3.6087\t -12.6724\t -1.5945\t -14.7705\t -7.2174\t -25.2609\t -3.7766\t -14.3509\t\r\n"
" -16.4490\t -2.2659\t -2.4338\t -10.9100\t -0.5875\t -13.6795\t -6.7139\t -22.9950\t -2.9373\t -14.0991\t\r\n"

testvector[testvector.!=nothing] is also a very readable option.
benchmarking can help choose the most efficient code.

How are you reading that file?
You can filter out nothings from an array:
filter(x -> !is(nothing, x), [nothing, 42]) # => Any[42]
But you may want to clean your data first, with a tsv (tab separated values) file like this:
-16.3651 0.1678 -4.6997 -14.0152 -2.6855 -16.0294 -7.8049 -27.1912 -5.0354 -14.5187
-16.4490 -1.0910 -3.6087 -12.6724 -1.5945 -14.7705 -7.2174 -25.2609 -3.7766 -14.3509
-16.4490 -2.2659 -2.4338 -10.9100 -0.5875 -13.6795 -6.7139 -22.9950 -2.9373 -14.0991
Using readdlm:
julia> readdlm("data.tsv")
3x10 Array{Float64,2}:
-16.3651 0.1678 -4.6997 -14.0152 … -27.1912 -5.0354 -14.5187
-16.449 -1.091 -3.6087 -12.6724 -25.2609 -3.7766 -14.3509
-16.449 -2.2659 -2.4338 -10.91 -22.995 -2.9373 -14.0991
Using DataFrmaes.readtable:
julia> df = readtable("data.tsv");
julia> names!(df, [symbol(x) for x in 'A':'J'])
2x10 DataFrames.DataFrame
| Row | A | B | C | D | E | F | G |
|-----|---------|---------|---------|----------|---------|----------|---------|
| 1 | -16.449 | -1.091 | -3.6087 | -12.6724 | -1.5945 | -14.7705 | -7.2174 |
| 2 | -16.449 | -2.2659 | -2.4338 | -10.91 | -0.5875 | -13.6795 | -6.7139 |
| Row | H | I | J |
|-----|----------|---------|----------|
| 1 | -25.2609 | -3.7766 | -14.3509 |
| 2 | -22.995 | -2.9373 | -14.0991 |

one simple way is using filter! function to update your vector like this:
testvector=[fill(nothing,10) ; [1,2,3]];
# =>13-element Array{Any,1}:
# nothing
# nothing
# nothing
# nothing
# nothing
# nothing
# nothing
# nothing
# nothing
# nothing
# 1
# 2
# 3
filter!(x->x!=nothing, testvector)
# => 3-element Array{Any,1}:
# 1
# 2
# 3
thanks #Daniel Arndt
EDIT, Refer to this paragraph from Julia doc:
nothing is a special value that does not print anything at the
interactive prompt. Other than not printing, it is a completely normal
value and you can test for it programmatically.
I think all of the conditions below, reach us to the same result
x!=nothing
x!==nothing
!is(x,nothing)
!isa(x,Void)
typeof(x)!=Void

To add to the answers above, it appears:
filter(!isnothing, [nothing, 42])
is a working shorthand for filter(x -> !isnothing(x), [nothing, 42]), and will correctly return 42.

Dear All,
At the end, the code became this:
tmpFile=open(fileName)
tmp=readdlm(tmpFile);
ind=pmap(typeof,tmp[:,1]).!=SubString{ASCIIString}; # if the first column typeof is string, than pmap will return false, else, it return true. This will provide an index of valid/not valid rows.
tmpClean=tmp[ind,:]; # only valid rows will be used
If you may have any suggestion to improve it, I would appreciate it. Thank you for your help.

Related

Selecting columns in R depending on user input

I have a dataframe col_metadata in R that goes as:
sample | b | c | ...
____________________
S1 | 1 | 1 | ...
S2 | 1 | 2 | ...
S3 | 2 | 2 | ...
S4 | 3 | 3 | ...
I want to make a function that gives me samples that have given values in front of them. For eg.,
fun(b,c(1,2))
should return
S1 S2 S3
while
fun(c,c(2,3))
should return
S2 S3 S4
and so on. If the column would have been fixed (say, b), I could simply do:
col_metaData[col_metaData$b %in% inputList,]$sample
But since there can be many more columns(hence I can't use if-else), I was looking for a different method to do the same. Can someone please help me do this? Thanks...
I solved it. Just in case anyone comes looking for an answer, we can use this:
col_metaData[col_metaData[,b] %in% inputList,]$sample
Notice [,b] instead of $b.

How to separate out letters in a sentence using R

I have a character vector that is a string of letters and punctuation. I want to create a data frame where each column is made up of a letter/character from this string.
e.g.
Character string = I WENT TO THE FAIR
Dataframe = | I | | W | E | N | T | | T | O | | T | H | E | | F | A | I | R |
I thought I could do this using a loop with substr, but I can't work out how to get R to write into separate columns, rather than just writing over the previous letter. I'm new to writing loops etc so struggling a bit to get my head around the way in which to compose what I need.
Thanks for any help and advice that you can offer.
Best wishes,
Natalie
This should get that result
string <- "I WENT TO THE FAIR"
df <- as.data.frame(t(as.data.frame(strsplit(string,""))), row.names = "1")

In julia, how do I assign the output of an expression to a new variable?

Stupid example, I would like to do something like
X=println("hi"),
and get
X="hi".
The general solution is to use IOBuffer and takebuf_string as described by #ARM above. If it's enough to capture the output of print, then
s = string(args...)
gives the string that would have been printed by print(args...). Also,
s = repr(X)
gives the string that would have been printed by showall(X). Both are implemented using IOBuffer and takebuf_string internally.
I think the poster wants to access the nice summary format that you can get from println. One way to access that as a string is to write to a buffer using print and then read it back as a string. There's probably also an easier way.
using DataFrames
data = DataFrame()
data[:turtle] = ["Suzy", "Suzy", "Bob", "Batman", "Batman", "Bob", "Adam"]
data[:mealType] = ["bug", "worm", "worm", "bug", "worm", "worm", "stick"]
stream = IOBuffer()
println(data)
print(stream, data)
yourString = takebuf_string(stream)
returns
"7x2 DataFrame\n| Row | turtle | mealType |\n|-----|----------|----------|\n| 1 | \"Suzy\" | \"bug\" |\n| 2 | \"Suzy\" | \"worm\" |\n| 3 | \"Bob\" | \"worm\" |\n| 4 | \"Batman\" | \"bug\" |\n| 5 | \"Batman\" | \"worm\" |\n| 6 | \"Bob\" | \"worm\" |\n| 7 | \"Adam\" | \"stick\" |"
If you are after formatted strings you can use #sprintf.
julia> x = #sprintf("%s", "hi")
"hi"
julia> x
"hi"
julia> x = #sprintf("%d/%d", 3, 4)
"3/4"
It's a macro though so be careful

Last matching date in spreadsheet function

I have a spreadsheet where dates are being recorded in regards to individuals, with additional data, as such:
Tom | xyz | 5/2/2012
Dick | foo | 5/2/2012
Tom | bar | 6/1/2012
On another sheet there is a line in which I want to be able to put in the name, such as Tom, and retrieve on the following cell through a formula the data for the LAST (most recent by date) entry in the first sheet. So the first sheet is a log, and the second sheet displays the most recent one. In the following example, the first cell is entered and the remaining are formulas displaying data from the first sheet:
Tom | bar | 6/1/2012
and so on, showing the latest dated entry in the log.
I'm stumped, any ideas?
If you only need to do a single lookup, you can do that by adding two new columns in your log sheet:
Sheet1
| A | B | C | D | E | F
1 | Tom | xyz | 6/2/2012 | | * | *
2 | Dick | foo | 5/2/2012 | | * | *
3 | Tom | bar | 6/1/2012 | | * | *
Sheet2
| A | B | C
1 | Tom | =Sheet1.E1 | =Sheet1.F1
*(E1) = =IF(AND($A1=Sheet2.$A$1;E2=0);B1;E2)
(i.e. paste the formula above in E1, then copy/paste it in the other cells with *)
Explanation: if A is not what you're looking for, go for the next; if it is, but there is a non-empty next, go for the next; otherwise, get it. This way you're selecting the last one corresponding to your search. I'm assuming you want the last entry, not "the one with the most recent date", since that's what you asked in your example. If I interpreted your question wrong, please update it and I can try to provide a better answer.
Update: If the log dates can be out of order, here's how you get the last entry:
*(F1) = =IF(AND($A1=Sheet2.$A$1;C1>=F2);C1;F2)
*(E1) = =IF(C1=F1;B1;E2)
Here I just replaced the test F2=0 (select next if non-empty) for C1>=F2 (select next if more recent) and, for the other column, select next if the first test also did so.
Disclaimer: I'm very inexperienced with spreadsheets, the solution above is ugly but gets the job done. For instance, if you wanted a 2nd row in Sheet2 to do another lookup, you'd need to add two more columns to Sheet1, etc.

Replacement and non-matches with 'sub'

Months ago I ended up with a sub statement that originally worked with my input data. It has since stopped working causing me to re-examine my ugly process. I hate to share it but it accomplished several things at once:
active$id[grep("CIR",active$description)] <- sub(".*CIR0*(\\d+).*","\\1",active$description[grep("CIR",active$description)],perl=TRUE)
This statement created a new id column by finding rows that had an id embedded in the description column. The sub statement would find the number following a "CIR0" and populate the id column iff there was an id within a row's description. I recognize it is inefficient with the embedded grep subsetting either side of the assignment.
Is there a way to have a 'sub' replacement be NA or empty if the pattern does not match? I feel like I'm missing something very simple but ask for the community's assistance. Thank you.
Example with the results of creating an id column:
| name | id | description |
|------+-----+-------------------|
| a | 343 | Here is CIR00343 |
| b | | Didn't have it |
| c | 123 | What is CIR0123 |
| d | | CIR lacks a digit |
| e | 452 | CIR452 is next |
I was struggling with the same issue a few weeks ago. I ended up using the str_match function from the stringr package. It returns NA if the target string is not found. Just make sure you subset the result correctly. An example:
library(stringr)
str = "Little_Red_Riding_Hood"
sub(".*(Little).*","\\1",str) # Returns 'Little'
sub(".*(Big).*","\\1",str) # Returns 'Little_Red_Riding_Hood'
str_match(str,".*(Little).*")[1,2] #Returns 'Little'
str_match(str,".*(Big).*")[1,2] # Returns NA
I think in this case you could try using ifelse(), i.e.,
active$id[grep("CIR",active$description)] <- ifelse(match, replacement, "")
where match should evaluate to true if there's a match, and replacement is what that element would be replaced with in that case. Likewise, if match evaluates to false, that element's replaced with an empty string (or NA if you prefer).

Resources