I'd like to write some code to convert an array.
The base array is a kind of permutation array whose components are themselves arrays.
el = [[0, 0, 0], [1, 0, 0], [0, 0, 1] ... [4, 4, 3], [4, 3, 4], [3, 4, 4], [4, 4, 4]]
125-element Array{Array{Int64,1},1}
em = ["[0, 0, 0]", "[1, 0, 0]", "[0, 0, 1]" ... "[4, 4, 3]", "[4, 3, 4]", "[3, 4, 4]", "[4, 4, 4]"]
125-element Array{String,1}
It represents how many steps forward each group has taken.
And there is a sort of data frame, a 3×5 Named Array{String,2}; I used NamedArrays to give it row names.
OrderNA =
|row name |Start | 1st | 2nd | 3rd |Finish|
|:-------- |:----:|:----:|:----:|:----:| ----:|
| G1 | "Stt"| "W1" | "W2" | "W3" | "Fin"|
| G2 | "Stt"| "W2" | "W3" | "W1" | "Fin"|
| G3 | "Stt"| "W3" | "W1" | "W2" | "Fin"|
As you can see, it's an order table.
As mentioned, el's components represent the locations of the groups, and they should be converted using the order table.
e.g. [0, 0, 0] is the location of [G1, G2, G3] => ["Stt", "Stt", "Stt"]
[1, 0, 0] => ["W1", "Stt", "Stt"]
[1, 2, 4] => ["W1", "W3", "Fin"]
So I've tried to convert it as described below, but I failed.
function trans(em)
    for i in 1:length(em)
        for j in 1:length(em[i])
            #show em[i][j]
            if em[i][3j-1] == '0'
                replace(em[i][3j-1]) = "Stt"
            elseif em[i][3j-1] == '1'
                replace(em[i][j]) = OrderNA[j, 1]
            elseif em[i][3j-1] == '2'
                replace(em[i][3j-1]) = OrderNA[j, 2]
            elseif em[i][3j-1] == '3'
                replace(em[i][3j-1]) = OrderNA[j, 3]
            else em[i][3j-1] == '4'
                replace(em[i][3j-1]) = "Fin"
            end
        end
    end
end
syntax: ""em[i][((3 * j) - 1)]" is not a valid function argument name around In[90]:9
I can't figure out what's wrong. How can I solve this? Thanks in advance!
You could do:
el_new = []
for (a, b, c) in el
    # each 0-based step count selects a 1-based column of OrderNA for G1, G2, G3
    push!(el_new, [OrderNA[1, a+1], OrderNA[2, b+1], OrderNA[3, c+1]])
end
Related
There is a list of numbers which represent the sizes of blocks, and I want to find the biggest valley shape in the list.
The constraint is that, unlike a normal valley, the two ends can be flat; in the examples below, [5, 5] still counts as a valley end.
Some examples:
[1, 5, 5, 2, 8] => [5, 5, 2, 8] widest valley
[2, 6, 8, 5] => [2, 6, 8] widest valley
[9, 8, 13, 13, 2, 2, 15, 17] => [13, 13, 2, 2, 15, 17] widest valley
It's not homework or anything, but I am wondering how I can solve it in Erlang.
I solved it in another language, but Erlang is rather recursive, which is why I need some help.
I'm no expert, but I'd solve the problem like this:
-record(valley, {from=1, to=1, deepest=1}).

widest_valley([]) ->
    [];
widest_valley([H]) ->
    [H];
widest_valley([H,T]) ->
    [H,T];
widest_valley(L) ->
    widest_valley(L, #valley{}, #valley{}, 1, 2).

widest_valley(L, _Curr, Widest, _FlatFrom, Pos) when Pos > length(L) ->
    lists:sublist(L, Widest#valley.from, 1 + Widest#valley.to - Widest#valley.from);
widest_valley(L, Curr, Widest, FlatFrom, Pos) ->
    Before = lists:nth(Pos - 1, L),
    AtPos = lists:nth(Pos, L),
    Deepest = lists:nth(Curr#valley.deepest, L),
    Curr1 = if Before == Deepest ->
                    Curr#valley{deepest = if AtPos < Deepest ->
                                                  Pos;
                                             true ->
                                                  Curr#valley.deepest
                                          end};
               AtPos < Before ->
                    #valley{from=FlatFrom, deepest=Pos};
               true ->
                    Curr
            end,
    FlatFrom1 = if AtPos == Before ->
                        FlatFrom;
                   true ->
                        Pos
                end,
    Widest1 = if Pos - Curr1#valley.from > Widest#valley.to - Widest#valley.from ->
                      Curr1#valley{to=Pos};
                 true ->
                      Widest
              end,
    widest_valley(L, Curr1, Widest1, FlatFrom1, Pos + 1).
Let's say you have two lists such as:
list1 = [-2, -1, 0, 1, 2, 3]
list2 = [4, 1, 0, 1, 4, 9]
...and the two lists were zipped into a dictionary to produce:
dict1 = {-2: 4,
         -1: 1,
          0: 0,
          1: 1,
          2: 4,
          3: 9}
...where list1 provides the keys and list2 the values.
You will notice that some of the elements in list2 are duplicates, such as 4 and 1. They show up twice in list2, and consequently in the dictionary.
-2 corresponds to 4
2 corresponds to 4
-1 corresponds to 1
1 corresponds to 1
I am trying to figure out a way, using either the lists or the dictionary, to identify the duplicate items in list2 and return their keys from list1.
So the returned values I would expect from the two lists above would be:
(-2, 2) # from list1, since they both correspond to 4 in list2
(-1, 1) # from list1, since they both correspond to 1 in list2
In this example, list2 happens to be the square of list1. But this will not always be the case.
So ultimately, what I am looking for is a way to return those keys based on their duplicate values.
Any thoughts on how to approach this? I am able to identify the duplicates in list2, but I am completely stuck on how to identify their corresponding keys in list1.
In Python 3:
from itertools import groupby
list1 = [-2, -1, 0, 1, 2, 3]
list2 = [4, 1, 0, 1, 4, 9]
pairs = zip(list2, list1)
ordered = sorted(pairs, key=lambda x: x[0])
groups = ((k, list(g)) for k, g in groupby(ordered, key=lambda x: x[0]))  # generator
duplicates = (k for k in groups if len(k[1]) > 1)  # generator
for k, v in duplicates:
    print(str(k) + " : " + str(list(v)))
result:
1 : [(1, -1), (1, 1)]
4 : [(4, -2), (4, 2)]
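If you'd rather avoid sorting, a plain dictionary mapping each value to the keys that produce it also works; here is a minimal sketch (not part of the original answer) using collections.defaultdict:

from collections import defaultdict

list1 = [-2, -1, 0, 1, 2, 3]
list2 = [4, 1, 0, 1, 4, 9]

# map each value in list2 to all the list1 keys that produce it
keys_by_value = defaultdict(list)
for key, value in zip(list1, list2):
    keys_by_value[value].append(key)

# keep only the values that occur more than once
duplicates = {value: keys for value, keys in keys_by_value.items() if len(keys) > 1}
print(duplicates)  # {4: [-2, 2], 1: [-1, 1]}

This makes a single pass over the lists, so there is no O(n log n) sort.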
Bonus: in functional C#:
var list1 = new[] { -2, -1, 0, 1, 2, 3 };
var list2 = new[] { 4, 1, 0, 1, 4, 9 };
var g = list1.Zip(list2, (a, b) => (a, b)) // create tuples
    .GroupBy(o => o.b, o => o.a, (k, group) => new { key = k, group = group.ToList() }) // create groups
    .Where(o => o.group.Count > 1) // keep groups with at least 2 elements
    .ToList(); // no lazy
foreach (var kvp in g)
    Console.WriteLine($"{kvp.key}: {string.Join(",", kvp.group)}");
result:
4: -2,2
1: -1,1
I have a vector
using Missings
v = allowmissing(rand(100))
v[rand(100) .< 0.1] .= missing
What's the best way to fill the missing entries of v with the last non-missing value?
Currently
for (i, val) in enumerate(v)
    ismissing(val) && (i >= 2) && (v[i] = v[i-1])
end
first_non_missing = findfirst(x -> !ismissing(x), v)
if first_non_missing >= 2
    v[1:first_non_missing-1] .= v[first_non_missing]
end
v = disallowmissing(v)
But I found it to be slow for large vectors. What's an elegant and efficient way to fill missing values with the previous non-missing value?
A simple and fast solution:
replace_missing!(v) = accumulate!((n0,n1) -> ismissing(n1) ? n0 : n1, v, v, init=zero(eltype(v)))
You need an init value in case the first value is missing, and I can't execute your code, but with that said, here is my attempt:
function replace_missing!(v, init=zero(eltype(v)))
    function reduce_missing(n0, n1)
        if ismissing(n1)
            return n0
        else
            return n1
        end
    end
    v[1] = reduce_missing(init, v[1])
    for i = 2:length(v)
        v[i] = reduce_missing(v[i-1], v[i])
    end
    return v
end
using Missings
v = allowmissing(rand(100))
v[rand(100) .< 0.1] .= missing
v = replace_missing!(v)
v = disallowmissing(v)
The following answer is entirely based on the discussions in this thread: Julia DataFrame Fill NA with LOCF. More specifically, it is based on the answers by Danish Shrestha, Dan Getz, and btsays.
As laborg implies, the accumulate function in Base Julia will do the job.
Suppose we have an array: a = [1, missing, 2, missing, 9]. We want to replace the 1st missing with 1 and the second with 2: a = [1, 1, 2, 2, 9], which is a = a[[1, 1, 3, 3, 5]] ([1, 1, 3, 3, 5] here are indexes).
This function will do the job:
ffill(v) = v[accumulate(max, [i*!ismissing(v[i]) for i in 1:length(v)], init=1)]
BTW, "ffill" means "forward filling", a name I adopted from Pandas.
I'll explain below.
The accumulate function returns a new array built from the input array by applying the given operation (here, max) cumulatively.
For those of you who are new to Julia like me: in Julia's arithmetic, i*true == i and i*false == 0. Therefore, when an element of the array is NOT missing, i*!ismissing(v[i]) == i; otherwise it is 0.
In the case of a = [1, missing, 2, missing, 9], [i*!ismissing(a[i]) for i in 1:length(a)] will return [1, 0, 3, 0, 5]. Since this array is in the accumulate function where the operation is max, we'll get [1, 1, 3, 3, 5].
Then a[[1, 1, 3, 3, 5]] will return [1, 1, 2, 2, 9].
That's why
a = ffill(a)
will get [1, 1, 2, 2, 9].
Now, you may wonder why we have init = 1 in ffill(v). Say b = [missing, 1, missing, 3]. Then [i*!ismissing(b[i]) for i in 1:length(b)] will return [0, 2, 0, 4], and the accumulate function will return [0, 2, 2, 4]. The next step, b[[0, 2, 2, 4]], will throw an error because Julia indexing starts at 1, not 0, so b[0] doesn't mean anything.
With init = 1 in the accumulate function, we'll get [1, 2, 2, 4] rather than [0, 2, 2, 4] since 1 (the init we set) is larger than 0 (the first number).
We can go further from here. The ffill() function above only works on a single array. But what if we have a large dataframe?
Say, we have:
using DataFrames
a = ["Tom", "Mike", "John", "Jason", "Bob"]
b = [missing, 2, 3, missing, 8]
c = [1, 3, missing, 99, missing]
df = DataFrame(:Name => a, :Var1 => b, :Var2 => c)
julia> df
5×3 DataFrame
Row │ Name Var1 Var2
│ String Int64? Int64?
─────┼──────────────────────────
1 │ Tom missing 1
2 │ Mike 2 3
3 │ John 3 missing
4 │ Jason missing 99
5 │ Bob 8 missing
Here, Dan Getz's answer comes in handy:
nona_df = DataFrame([ffill(df[!, c]) for c in names(df)], names(df))
julia> nona_df
5×3 DataFrame
Row │ Name Var1 Var2
│ String Int64? Int64?
─────┼─────────────────────────
1 │ Tom missing 1
2 │ Mike 2 3
3 │ John 3 3
4 │ Jason 3 99
5 │ Bob 8 99
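Since the answer borrows the name "ffill" from Pandas, here is a rough plain-Python sketch of the same forward-fill (LOCF) idea, using None for missing values; it illustrates the concept rather than reproducing the Julia one-liner exactly (the name ffill_py is just for this illustration):

def ffill_py(values):
    # forward fill: replace each None with the most recent non-None value
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

print(ffill_py([1, None, 2, None, 9]))  # [1, 1, 2, 2, 9]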
Extract all consecutive repetitions in a given list:
list1 = [1,2,2,3,3,3,3,4,5,5]
It should yield a list like this:
[[2,2],[3,3,3,3],[5,5]]
I tried the code below. I know it is not the proper way to solve this problem, but I could not figure out how to solve it.
list1 = [1,2,2,3,3,3,3,4,5,5]
list2 = []
for i in list1:
    a = list1.index(i)
    if list1[a] == list1[a+1]:
        list2.append([i,i])
print(list2)
You can use this to achieve it. There are "easier" solutions using itertools and groupby that get the same result (see the sketch after the output below); this is how to do it "by hand":
def FindInnerLists(l):
    '''reads a list of int's and groups them into lists of same int value'''
    result = []
    allResults = []
    for n in l:
        if not result or result[0] == n: # not result == empty list
            result.append(n)
        if result[0] != n: # number changed, so we copy the list over into allResults
            allResults.append(result[:])
            result = [n] # and add the current to it
    # edge case - if result contains elements, add them as last item to allResults
    if result:
        allResults.append(result[:])
    return allResults
myl = [2, 1, 2, 1, 1, 1, 1, 2, 2, 2, 1, 2, 7, 1, 1, 1,2,2,2,2,2]
print(FindInnerLists(myl))
Output (works for 2.6 and 3.x):
[[2], [1], [2], [1, 1, 1, 1], [2, 2, 2], [1], [2], [7], [1, 1, 1], [2, 2, 2, 2, 2]]
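For reference, the itertools.groupby route mentioned above could look roughly like this (a sketch, not part of the original answer), applied to the asker's list and keeping only runs longer than one element:

from itertools import groupby

list1 = [1, 2, 2, 3, 3, 3, 3, 4, 5, 5]

# group consecutive equal elements, then keep only the runs of length > 1
runs = [list(group) for _, group in groupby(list1)]
print([run for run in runs if len(run) > 1])  # [[2, 2], [3, 3, 3, 3], [5, 5]]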
Another way to do it:
list1 = [1, 2, 2, 3, 3, 3, 3, 4, 5, 5]
result = [[object()]]  # initiate the result with object() as a placeholder
for element in list1:  # iterate over the rest...
    if result[-1][0] != element:  # the last repeated element does not match the current
        if len(result[-1]) < 2:  # if there were fewer than 2 repeated elements...
            result.pop()  # remove the last element
        result.append([])  # create a new result entry for future repeats
    result[-1].append(element)  # add the current element to the end of the results
if len(result[-1]) < 2:  # finally, if the last element wasn't repeated...
    result.pop()  # remove it
print(result)  # [[2, 2], [3, 3, 3, 3], [5, 5]]
And you can use it on any kind of list, not just a numerical one.
This would work:
list1 = [1,2,2,3,3,3,3,4,5,5]
res = []
add = True
last = [list1[0]]
for elem in list1[1:]:
    if last[-1] == elem:
        last.append(elem)
        if add:
            res.append(last)
            add = False
    else:
        add = True
        last = [elem]
print(res)
Output:
[[2, 2], [3, 3, 3, 3], [5, 5]]
How can I include an outer product (of the previous feature vector and itself) as a layer in Chainer, especially in a way that's compatible with batching?
F.matmul is also very handy.
Depending on the input shapes, you can combine it with F.expand_dims (of course F.reshape works, too) or use the transa/transb arguments.
For details, refer to the official documentation of these functions.
Code
import chainer.functions as F
import numpy as np
print("---")
x = np.array([[[1], [2], [3]], [[4], [5], [6]]], 'f')
y = np.array([[[1, 2, 3]], [[4, 5, 6]]], 'f')
print(x.shape)
print(y.shape)
z = F.matmul(x, y)
print(z)
print("---")
x = np.array([[[1], [2], [3]], [[4], [5], [6]]], 'f')
y = np.array([[[1], [2], [3]], [[4], [5], [6]]], 'f')
print(x.shape)
print(y.shape)
z = F.matmul(x, y, transb=True)
print(z)
print("---")
x = np.array([[1, 2, 3], [4, 5, 6]], 'f')
y = np.array([[1, 2, 3], [4, 5, 6]], 'f')
print(x.shape)
print(y.shape)
z = F.matmul(
    F.expand_dims(x, -1),
    F.expand_dims(y, -1),
    transb=True)
print(z)
Output
---
(2, 3, 1)
(2, 1, 3)
variable([[[ 1. 2. 3.]
[ 2. 4. 6.]
[ 3. 6. 9.]]
[[ 16. 20. 24.]
[ 20. 25. 30.]
[ 24. 30. 36.]]])
---
(2, 3, 1)
(2, 3, 1)
variable([[[ 1. 2. 3.]
[ 2. 4. 6.]
[ 3. 6. 9.]]
[[ 16. 20. 24.]
[ 20. 25. 30.]
[ 24. 30. 36.]]])
---
(2, 3)
(2, 3)
variable([[[ 1. 2. 3.]
[ 2. 4. 6.]
[ 3. 6. 9.]]
[[ 16. 20. 24.]
[ 20. 25. 30.]
[ 24. 30. 36.]]])
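As a sanity check (not from the original answer), F.matmul follows the same batched matrix-multiplication semantics as NumPy, so you can reproduce the shapes with plain np.matmul:

import numpy as np

x = np.array([[[1], [2], [3]], [[4], [5], [6]]], 'f')  # shape (2, 3, 1)
y = np.array([[[1, 2, 3]], [[4, 5, 6]]], 'f')          # shape (2, 1, 3)

# one (3, 1) x (1, 3) outer product per batch element
z = np.matmul(x, y)
print(z.shape)  # (2, 3, 3)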
You can use F.reshape and F.broadcast_to to handle the array shapes explicitly.
Assume you have a 2-dim array h with shape (minibatch, feature).
If you want to calculate the outer product of h and h, try the code below.
Is this what you want to do?
import numpy as np
from chainer import functions as F
def outer_product(h):
    s0, s1 = h.shape
    h1 = F.reshape(h, (s0, s1, 1))
    h1 = F.broadcast_to(h1, (s0, s1, s1))
    h2 = F.reshape(h, (s0, 1, s1))
    h2 = F.broadcast_to(h2, (s0, s1, s1))
    h_outer = h1 * h2
    return h_outer
# test code
h = np.arange(12).reshape(3, 4).astype(np.float32)
h_outer = outer_product(h)
print(h.shape)
print(h_outer.shape, h_outer.data)
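If you want to verify the numbers, the same batched outer product can be written in plain NumPy with einsum (a sketch assuming h has shape (minibatch, feature), as above):

import numpy as np

h = np.arange(12).reshape(3, 4).astype(np.float32)

# 'bi,bj->bij': for each batch element b, the outer product of row b with itself
expected = np.einsum('bi,bj->bij', h, h)
print(expected.shape)  # (3, 4, 4)
# this should match outer_product(h).data from the code above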