I have 2 vectors, let's say v1 = (a,b,c,d,e) and v2 = (e,h,t,b,w), and I want to get a third vector containing the common elements of v1 and v2, in this case v3 = (e,b).
I've seen this has already been asked for C++, but I can't find it for Lua.
I assume when saying vectors, you mean v1 and v2 are both array-like tables, like this:
local v1 = {1,2,4,8,16}
local v2 = {2,3,4,5}
Then you can do it like this:
local v3 = {}
-- compare every element of v1 with every element of v2 (quadratic time)
for _, a in ipairs(v1) do
  for _, b in ipairs(v2) do
    if a == b then
      v3[#v3 + 1] = a
    end
  end
end
Unlike Yu Hao's solution, the code below runs in linear time, but it does need extra memory, unless you can live with v3 instead of insisting on v4. The time difference will probably only be noticeable for large input vectors, though.
local v1={1,2,4,8,16}
local v2={2,3,4,5}
local v3={}
for k,v in ipairs(v1) do v3[v]=false end
for k,v in ipairs(v2) do if v3[v]==false then v3[v]=true end end
for k,v in pairs(v3) do if v3[k]==false then v3[k]=nil end end
local v4={}
local n=0
for k,v in pairs(v3) do n=n+1; v4[n]=k end
for k,v in ipairs(v4) do print(k,v) end
You can take a pass through v1 to build a lookup table for its values, then one pass through v2 to find common values and add them to an output list:
function findcommon(a1, a2)
  local in_a1 = {} -- lookup table for a1's values
  for _, v in ipairs(a1) do in_a1[v] = true end
  local common = {}
  local n = 0
  for _, v in ipairs(a2) do -- ipairs keeps a2's order in the result
    if in_a1[v] then
      in_a1[v] = false -- report each common value only once
      n = n + 1
      common[n] = v
    end
  end
  return common
end
Given your example:
v1 = {'a','b','c','d','e'}
v2 = {'e','h','t','b','w'}
common = findcommon(v1,v2)
print('('..table.concat(common,',')..')') --> (e,b)
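For comparison, here is the same lookup-table idea as a minimal Python sketch. It is not part of the Lua answers above; the function name `find_common` is made up for illustration. It builds the lookup structure from the first list in one pass, then scans the second list once, so the result follows the second list's order.

```python
def find_common(a1, a2):
    """Return the values of a2 that also occur in a1, in a2's order, without repeats."""
    in_a1 = set(a1)              # lookup structure built in one pass over a1
    common = []
    for v in a2:                 # one pass over a2
        if v in in_a1:
            in_a1.discard(v)     # report each common value only once
            common.append(v)
    return common


v1 = ['a', 'b', 'c', 'd', 'e']
v2 = ['e', 'h', 't', 'b', 'w']
print(find_common(v1, v2))  # ['e', 'b']
```

Removing a value from the lookup after the first hit plays the same role as setting `in_a1[v] = false` in the Lua version.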
Just in case you want to intersect larger vectors (arrays/sets), here's an attempt at an unmeasured, premature optimization:
local v1 = {1,2,4,8,16}
local v2 = {2,3,4,5}
local function intersect(t1,t2)
local function make_lookup(t)
local res={}
for _,v in ipairs(t) do
res[v]=true
end
return res
end
local smaller,larger
if (#t1>#t2) then
larger=t1
smaller=t2
else
larger=t2
smaller=t1
end
local lookup=make_lookup(smaller)
local res={}
for _,v in ipairs(larger) do
if lookup[v] then
res[#res+1]=v
end
end
return res
end
local v1v2_intersected=intersect(v1,v2)
The internal lookup table is built from the smaller of the two array tables. In "production" you would probably also want to check that both arguments are valid input (for example, that each one is actually a table).
I have a big file (75GB) memory-mapped into an array d that I want to copy into another array m. Because I do not have 75GB of RAM available, I did:
for (i,v) in enumerate(d)
m[i] = v
end
This copies the file value by value, but I only get a copy rate of ~2MB/s on an SSD where I expect at least 50MB/s for both read and write.
How could I optimize this copy rate?
=== [edit] ===
According to the comments, I changed my code to the following, which sped up the write rate to 15MB/s
function copydcimg(m::Array{UInt16,4}, d::Dcimg)
m .= d
Mmap.sync!(m)
end
copydcimg(m,d)
At this point, I think I should optimize the Dcimg code. This binary file is made of frames spaced by a timestamp. Here is the code I use to access the frames:
module dcimg
using Mmap
using TOML
struct Dcimg <: AbstractArray{UInt16,4} # struct allowing to access dcimg file
filename::String # filename of the dcimg
header::Int # header size in bytes
clock::Int # clock size in bytes
x::Int
y::Int
z::Int
t::Int
m # linear memory map
Dcimg(filename, header, clock, x, y, z, t) =
new(filename, header, clock, x, y, z, t,
Mmap.mmap(open(filename), Array{UInt16, 3},
(x*y+clock÷sizeof(UInt16), z, t), header)
)
end
# following functions allows to access DCIMG like an Array
Base.size(D::Dcimg) = (D.x, D.y, D.z, D.t)
# skip clock
Base.getindex(D::Dcimg, i::Int) =
D.m[i + ((i-1) ÷ (D.x*D.y))*D.clock÷sizeof(UInt16)] # (i-1): indices are 1-based
Base.getindex(D::Dcimg, x::Int, y::Int, z::Int, t::Int) =
D[x + D.x*((y-1) + D.y*((z-1) + D.z*(t-1)))]
# allowing to automatically parse size
function Dcimg(pathtag)
p = TOML.parsefile(pathtag * ".toml")
return Dcimg(pathtag * ".dcimg",
# ...
)
end
export Dcimg, getframe
end
I got it! The solution was to copy the file chunk by chunk, let's say frame by frame (around 1024×720 UInt16 each). This way I reached 300MB/s, which I didn't even know was possible in a single thread. Here is the code.
In module dcimg, I added the methods to access the file frame by frame:
# get frame number n (starting from 1)
getframe(D::Dcimg,n::Int) =
reshape(D.m[
D.x*D.y*(n-1)+1 + (n-1)*D.clock÷sizeof(UInt16) : # cosmetic line break
D.x*D.y*n + (n-1)*D.clock÷sizeof(UInt16)
], D.x, D.y)
# get frame for layer z, time t (starting from 1)
getframe(D::Dcimg,z::Int,t::Int) =
getframe(D, z + D.z*(t-1)) # frame numbers start from 1
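The index arithmetic behind these accessors can be checked in isolation. The Python sketch below is only an illustration under the layout described in the question (each frame of x*y elements is followed by a timestamp, and positions are counted from 1). The names `frame_bounds` and `frame_number` are hypothetical, and `clock_len` is the timestamp size expressed in elements rather than bytes.

```python
def frame_bounds(n, frame_len, clock_len):
    """1-based inclusive (first, last) positions of frame n in the flat buffer.

    frame_len is x*y in elements; clock_len is the timestamp size in elements.
    """
    stride = frame_len + clock_len        # one frame plus the timestamp after it
    first = (n - 1) * stride + 1
    return first, first + frame_len - 1


def frame_number(z, t, n_layers):
    """1-based frame number for layer z at time t, both counted from 1."""
    return z + n_layers * (t - 1)


print(frame_bounds(1, 6, 2))   # (1, 6)
print(frame_bounds(2, 6, 2))   # (9, 14)
print(frame_number(1, 1, 3))   # 1
print(frame_number(3, 2, 3))   # 6
```

With 1-based frame numbers, layer 1 at time 1 has to map to frame 1, which is why the formula uses z rather than z-1.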
Iterating over the frames within a loop
function copyframes(m::Array{UInt16,4}, d::Dcimg)
N = d.z*d.t
F = d.x*d.y
for i in 1:N
m[(i-1)*F+1:i*F] = getframe(d, i)
end
end
copyframes(m,d)
Thanks all in comments for leading me to this.
===== edit =====
For further reading, you might look at:
dd: How to calculate optimal blocksize?
http://blog.tdg5.com/tuning-dd-block-size/
which give hints about the optimal block size to copy at a time.
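The chunk-by-chunk idea is not specific to Julia or to memory maps. As a rough illustration, here is a minimal Python sketch that copies a file one block at a time, so only one block is held in memory. The function name `copy_in_chunks` and the default block size (one 1024×720 frame of 2-byte values) are assumptions for the example, not measured optima.

```python
import os
import tempfile


def copy_in_chunks(src_path, dst_path, chunk_bytes=1024 * 720 * 2):
    """Copy src_path to dst_path one block at a time; return the bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(chunk_bytes)   # at most one chunk held in memory
            if not block:                   # an empty read means end of file
                break
            dst.write(block)
            copied += len(block)
    return copied


# Small self-check on a temporary file (the sizes here are arbitrary).
with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "src.bin")
    dst_file = os.path.join(tmp, "dst.bin")
    payload = bytes(range(256)) * 1000
    with open(src_file, "wb") as f:
        f.write(payload)
    n_copied = copy_in_chunks(src_file, dst_file, chunk_bytes=4096)
    with open(dst_file, "rb") as f:
        copied_ok = f.read() == payload
    print(n_copied, copied_ok)  # 256000 True
```

The best block size depends on the disk and the file system, so it is worth timing a few values, as the links above suggest.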
I would like to iterate over a list and occasionally delete items from that list. Below is a toy example:
function delete_item!(myarray, item)
deleteat!(myarray, findin(myarray, [item]))
end
n = 1000
myarray = [i for i = 1:n];
for a in myarray
if a%2 == 0
delete_item!(myarray, a)
end
end
However, I get an error:
BoundsError: attempt to access 500-element Array{Int64,1} at index [502]
How can I fix it (as efficiently as possible)?
Additional information: the above may seem like a silly example, but in my original problem I have a list of agents which interact. Therefore I am not sure if iterating over a copy would be the best solution. For example:
#creating my agent
mutable struct agent <: Any
id::Int
end
function delete_item!(myarray::Array{agent, 1}, item::agent)
deleteat!(myarray, findin(myarray, [item]))
end
#having my list of agents
n = 1000
myarray = agent[agent(i) for i = 1:n];
#trying to remove agents from list while having them interact
for a in myarray
#agent does stuff
if a.id%2 == 0 #if something happens remove
delete_item!(myarray, a)
end
end
Unfortunately, there is no single answer to this question, as the most efficient approach depends on the logic of the whole model (in particular, on whether other agents' actions depend on an entry actually having been deleted from the array).
In most cases the following approach should be the simplest (I am keeping findin, which is inefficient, because I understand that in general you may have duplicates in myarray):
n = 1000
myarray = [i for i = 1:n];
keep = trues(n)
for (i, a) in enumerate(myarray)
keep[i] || continue # do not process an agent that is marked for deletion
if a%2 == 0 # here application logic might also need to check keep in some cases
keep[findin(myarray, [a])] = false
end
end
myarray = myarray[keep]
If for some reason you really need to delete elements of myarray in each iteration here is how you can do it:
n = 1000
myarray = [i for i = 1:n];
i = 1
while i <= length(myarray)
a = myarray[i]
if a%2 == 0
todelete = findin(myarray, [a])
i -= count(x -> x < i, todelete) # if myarray has duplicates of a you have to move the counter back
deleteat!(myarray, todelete)
else
i += 1
end
end
In general, the code you give will not be very fast. For example, if you know that myarray does not contain duplicates (and I guess you do), it can be much simpler.
EDIT: Here is how you can implement both versions if you know you do not have duplicates (you can simply use agent's index - observe that we can also avoid unnecessary checks):
n = 1000
myarray = [i for i = 1:n];
keep = trues(n)
for (i, a) in enumerate(myarray)
if a%2 == 0 # here application logic might also need to check keep in some cases
keep[i] = false
end
end
myarray = myarray[keep]
If for some reason you really need to delete elements of myarray in each iteration here is how you can do it:
n = 1000
myarray = [i for i = 1:n];
i = 1
while i <= length(myarray)
a = myarray[i]
if a%2 == 0
deleteat!(myarray, i)
else
i += 1
end
end
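The two strategies above (mark, then filter once; or delete in place while only advancing the index when nothing was removed) carry over to other languages. Here is a small Python sketch of both for the no-duplicates case. The function names are invented for the example and `should_remove` stands in for the application logic.

```python
def remove_with_mask(items, should_remove):
    """Mark items during one pass, then build the filtered list once."""
    keep = [True] * len(items)
    for i, item in enumerate(items):
        if should_remove(item):
            keep[i] = False
    return [item for item, k in zip(items, keep) if k]


def remove_in_place(items, should_remove):
    """Delete while scanning; the index only advances when nothing was deleted."""
    i = 0
    while i < len(items):
        if should_remove(items[i]):
            del items[i]      # the next element slides into position i
        else:
            i += 1
    return items


def is_even(a):
    return a % 2 == 0


print(remove_with_mask(list(range(1, 11)), is_even))  # [1, 3, 5, 7, 9]
print(remove_in_place(list(range(1, 11)), is_even))   # [1, 3, 5, 7, 9]
```

The mask version never changes the list while it is being iterated, which is what avoids the out-of-bounds access from the question. The in-place version shifts the tail of the list on every deletion, so it tends to be slower for large lists.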
I am having trouble figuring out how to get the length of an array nested inside an array inside an array (nested depth of 3). In short, the code checks whether the publisher is already in the array; it then either adds a new entry with the new publisher and its corresponding system, or adds the new system to the existing publisher's entry.
output[k][1] is the publisher array
output[k][2][l] is the system
where the first [] is the amount of different publishers
and the second [] is the amount of different systems within the same publisher
So how would I find out what the length of the third deep array is?
function reviewPubCount()
local output = {}
local k = 0
for i = 1, #keys do
if string.find(tostring(keys[i]), '_') then
key = Split(tostring(keys[i]), '_')
for j = 1, #reviewer_code do
if key[1] == reviewer_code[j] and key[1] ~= '' then
k = k + 1
output[k] = {}
-- output[k] = reviewer_code[j]
for l = 1, k do
if output[l][1] == reviewer_code[j] then
ltable = output[l][2]
temp = table.getn(ltable)
output[l][2][temp+1] = key[2]
else
output[k][1] = reviewer_code[j]
output[k][2][1] = key[2]
end
end
end
end
end
end
return output
end
The code has been fixed here for future reference: http://codepad.org/3di3BOD2#output
You should be able to replace table.getn(t) with #t (table.getn is deprecated in Lua 5.1 and was removed in Lua 5.2). Instead of this:
ltable = output[l][2]
temp = table.getn(ltable)
output[l][2][temp+1] = key[2]
try this:
output[l][2][#output[l][2]+1] = key[2]
or this:
table.insert(output[l][2], key[2])
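To show the overall grouping the question is after, here is a hedged Python sketch rather than a fix of the original Lua. It assumes keys of the form `publisher_system` and splits at the first underscore only, which may differ from what the question's `Split` helper does. The function name, the sample keys, and the publisher codes are all invented for the example.

```python
def review_pub_count(keys, reviewer_codes):
    """Group system names by publisher code for keys of the form 'publisher_system'."""
    known = set(reviewer_codes)
    output = []       # list of [publisher, [systems...]] pairs, as in the question
    position = {}     # publisher -> index into output
    for key in keys:
        publisher, sep, system = str(key).partition("_")
        if not sep or publisher == "" or publisher not in known:
            continue  # no underscore, empty publisher, or unknown publisher
        if publisher not in position:
            position[publisher] = len(output)
            output.append([publisher, []])
        output[position[publisher]][1].append(system)
    return output


result = review_pub_count(["nin_wii", "sony_ps3", "nin_ds", "plain"], ["nin", "sony"])
print(result)             # [['nin', ['wii', 'ds']], ['sony', ['ps3']]]
print(len(result[0][1]))  # 2
```

The length of the innermost list is then just the length of `result[k][1]`, which corresponds to `#output[k][2]` in Lua.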
Is there a way to write a function that multiplies two values based only on the fact that they share the same key? Here is some pseudocode for what I have in mind:
operation = {a=12, b=7, c=31}
operator1 = {a=0.5}
operator2 = {a=0.7}
operator3 = {b=0.3}
function Operate(x)
return x.common_key * operation.common_key
end
print (Operate (operator1))
print (Operate (operator3))
---> 6
---> 2.1
This code of course doesn't work, because "common_key" isn't a real thing. It is a stand-in for whatever the function's argument has in common with the "operation" dictionary. In this case, it would be "a", so the function would multiply "operator1.a" and "operation.a" if it could.
You can use the pairs function to iterate over a table, which lets you inspect what keys it has. Additionally, you can access a table with t[k] notation instead of t.name when k holds the string "name", and Lua tables return nil if you access a key that they don't have.
function find_common_keys(t1, t2)
for k,v1 in pairs(t1) do
local v2 = t2[k]
if v2 ~= nil then
print("Found match", k, v1, v2)
end
end
end
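The function above only prints the matching keys. To actually multiply the matching values, as the question asks, the same iteration can collect the products. Here is a small Python sketch of that idea using dicts. The name `operate` and the choice to return a dict of products (one per shared key) are assumptions made for the example.

```python
def operate(operand, operation):
    """Multiply the values stored under every key the two dicts share."""
    return {k: operand[k] * operation[k] for k in operand if k in operation}


operation = {"a": 12, "b": 7, "c": 31}
operator1 = {"a": 0.5}
operator3 = {"b": 0.3}

print(operate(operator1, operation))  # {'a': 6.0}
print(operate(operator3, operation))  # 'b' maps to roughly 2.1 (floating point)
```

Returning a dict keeps the result well defined even when more than one key is shared.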
So I have a table that holds references to other tables like:
local a = newObject()
a.collection = {}
for i = 1, 100 do
local b = newObject()
a.collection[#a.collection + 1] = b
end
Now if I want to see if a particular object is within "a" I have to use pairs like so:
local z = a.collection[ 99 ]
for i,j in pairs( a.collection ) do
if j == z then
return true
end
end
The z object is in the 99th spot, so I have to wait for pairs to iterate through the other 98 objects first. This setup is making my program crawl. Is there a way to make some sort of key that isn't a string, or a table-to-table comparison that is a one-liner? Like:
if a.collection[{z}] then return true end
Thanks in advance!
Why are you storing the object in the value slot and not the key slot of the table?
local a = newObject()
a.collection = {}
for i = 1, 100 do
local b = newObject()
a.collection[b] = i
end
To see if a particular object b is within "a":
return a.collection[b]
If you need integer indexed access to the collection, store it both ways:
local a = newObject()
a.collection = {}
for i = 1, 100 do
local b = newObject()
a.collection[i] = b
a.collection[b] = i
end
Finding:
local z = a.collection[99]
if a.collection[z] then return true end
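As an illustration of the "store it both ways" idea, here is a minimal Python sketch. `Obj` stands in for whatever `newObject()` returns and `Collection` is a made-up wrapper. The sketch relies on Python hashing plain objects by identity by default, which plays the role of using the table itself as a key in Lua.

```python
class Obj:
    """Stand-in for whatever newObject() returns; hashed by identity by default."""
    pass


class Collection:
    def __init__(self):
        self.items = []        # integer-indexed access
        self.index_of = {}     # object -> position, for constant-time membership

    def add(self, obj):
        self.index_of[obj] = len(self.items)
        self.items.append(obj)

    def __contains__(self, obj):
        return obj in self.index_of


collection = Collection()
for _ in range(100):
    collection.add(Obj())

z = collection.items[98]        # the 99th object (Python lists are 0-based)
print(z in collection)          # True
print(Obj() in collection)      # False
```

The membership test is a single hash lookup instead of a scan, at the cost of keeping the reverse mapping in sync whenever objects are added or removed.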
Don't know if it's faster or not, but maybe this helps:
Filling:
local a = {}
a.collection = {}
local z
for i = 1, 100 do
  local b = {}
  a.collection[b] = true -- table/object used as the key
  if i == 99 then z = b end -- keep a reference to the 99th object
end
Finding:
if a.collection[z] then return true end
If that's not what you wanted to do you can break your whole array into smaller buckets and use a hash to keep track which object belongs to which bucket.
You might want to consider switching from pairs() to a regular numeric for loop that indexes the table; pairs() seems to be slower on larger collections of tables.
for i=1, #a.collection do
if a.collection[i] == z then
return true
end
end
I compared the speed of iterating through a collection of 1 million tables using both pairs() and table indexing, and the indexing was a little bit faster every time. Try it yourself using os.clock() to profile your code.
I can't really think of a faster approach than your solution, other than using some kind of hashing function to set unique indexes into the a.collection table. However, doing this would make getting a specific table out a non-trivial task: you wouldn't just be able to do a.collection[99], you'd have to iterate until you found the one you wanted. But then you could easily test whether the table was in a.collection by doing something like a.collection[hashFunc(z)] ~= nil.