Breadth first search, and A* search in a graph? - graph

I understand how to use a breadth first search and A* in a tree structure, but given the following graph, how would it be implemented? In other words, how would the search traverse the graph? S is the start state
Graph Here

It's exactly the same as doing it in a tree. You just need to somehow keep track of which nodes you've already visited so that you don't end up going in circles.

Basically, you treat a graph the same way that you'd treat a tree, except you need to keep track of nodes you've already visited. That's fine for BFS. On top of that, in the case of A*, consider what you'd do when you revisit a node but have found a cheaper route to it.

Paint the graph - Recursively search each node and mark the nodes you visited as dirty. Only recurse when the graph is not dirty.
If memory is not an issue, copy the graph and instead of marking the nodes, remove them from the copy graph.

It's weighted graph. Do you want to find shortest paths or just traverse it?
If you want just traversing, here it is:
1) there is only S in the queue
2) we are adding C and A in the queue, only they are reachable from S directly (with one edge)
3) D, G2 - from C
4) B, E - from A
5) G1 - from D (G2 is already in the queue)
6) there no outgoing edge from G2
7) there's no adjacent nodes of B which aren't already in the queue
So here's the order how nodes where added in the queue: S, C, A, D, G2, B, E, G1

I don't know how helpful you will find this, but here's a complete solution coded in the functional language J (available for free from jsoftware.com).
First, it's probably simplest to work directly from a representation of the graph you show in your picture. I represent this as a (# nodes) x (# nodes) table with a number at (i,j) for the value of the link between node-i and node-j. Also, along the diagonal I've put the number associated with each node itself.
So, I enter this as follows - don't worry too much about the unfamiliar notation, you'll soon see what the result looks like:
grph=: <;.1&>TAB,&.><;._2 ] 0 : 0
A B C D E G1 G2 S
A 2 1 8 2
B 1 1 1 4 2
C 3 1 5
D 1 5 2
E 6 9 7
G1 0
G2 0
S 2 3 5
)
So, I've assigned the variable "grph" as a 9x9 table where the first row and first column are the labels "A"-"E", "G1", "G2", and "S"; I've used tabs to delimit items so this could be cut-and-pasted to or from a spreadsheet as needed.
Now, I'll check the size of my table and display it:
$grph
9 9
grph
+---+--+--+--+--+--+---+---+--+
| | A| B| C| D| E| G1| G2| S|
+---+--+--+--+--+--+---+---+--+
| A | 2| 1| | | 8| | | 2|
+---+--+--+--+--+--+---+---+--+
| B | | 1| 1| 1| | 4 | | 2|
+---+--+--+--+--+--+---+---+--+
| C | | 3| 1| | | | 5 | |
+---+--+--+--+--+--+---+---+--+
| D | | | | 1| | 5 | 2 | |
+---+--+--+--+--+--+---+---+--+
| E | | | | | 6| 9 | 7 | |
+---+--+--+--+--+--+---+---+--+
| G1| | | | | | 0 | | |
+---+--+--+--+--+--+---+---+--+
| G2| | | | | | | 0 | |
+---+--+--+--+--+--+---+---+--+
| S | 2| | 3| | | | | 5|
+---+--+--+--+--+--+---+---+--+
It looks OK and it's easy to compare this to the picture of the graph to check it.
Now I'll drop the first row and column so we're left only with numbers (as boxed literals),
and remove any extraneous tab characters.
grn=. TAB-.~&.>}.}."1 grph
You can see I assign this result to the variable "grn".
Next, I'll replace any empty cells with "_" - which represents infinity - then convert the literals to numeric representation (re-assigning the result to the same name "grn"):
grn=. ".&>(0=#&>grn)}grn,:<'_'
Finally, I'll move the last column and row to the beginning since this is the one for "S" and it's supposed to be first. I'll also display the result to confirm that it looks correct.
]grn=. _1|."1]_1|.grn NB. "S" goes first.
5 2 _ 3 _ _ _ _
2 2 1 _ _ 8 _ _
2 _ 1 1 1 _ 4 _
_ _ 3 1 _ _ _ 5
_ _ _ _ 1 _ 5 2
_ _ _ _ _ 6 9 7
_ _ _ _ _ _ 0 _
_ _ _ _ _ _ _ 0
So, now that I have a simple 8x8 table of numbers representing the graph, it's a simple matter to traverse it.
Here's a simple J function, called "traverseGraph", to read this table, traverse the graph it represents, and return two results: the indexes (0-based origin) of the nodes visited, and the values of the points and edges in the order visited.
traverseGraph=: 3 : 0
pts=. ,_-.~,ix{y [ nxt=. ix=. ,0
while. 0~:#nxt=. ~.ix-.~;([:I._~:])&.><"1 nxt{y do.
ix=. ix,nxt [ pts=. pts,_-.~,nxt{y
end.
ix;pts
)
We start by initializing three variables: the list of indexes "ix" (to zero, since we want to begin in the zeroth row of the table), a variable "nxt" to point to the next group of nodes (initially the same as the starting node), and the list of point values "pts" (starting as the 0th row of our input table, known internally as "y", with all the infinite values removed.)
In the "while." loop, we continue as long as there's more than zero "nxt" values resulting from pulling the current row out of the table and removing any nodes (in "ix") we've already visited. Inside the loop, we accumulate the next set of indexes onto the end of "nxt" and the point values onto "pts". At the end, we return the indexes and point values as our (two-element) result.
We run it like this - it displays the result by default:
traverseGraph grn
+---------------+---------------------------------------------+
|0 1 3 2 5 7 4 6|5 2 3 2 2 1 8 3 1 5 2 1 1 1 4 6 9 7 0 1 5 2 0|
+---------------+---------------------------------------------+
So, the first box contains the indexes starting with "0" and ending with "6". The second boxed item is the vector of point values in the order we accumulated them. I don't know what you do with these, so I just show them.
We can use the indexes to display the node names like this:
0 1 3 2 5 7 4 6{(<"0'SABCDE'),'G1';'G2'
+-+-+-+-+-+--+-+--+
|S|A|C|B|E|G2|D|G1|
+-+-+-+-+-+--+-+--+
I don't know how useful you'll find this but it does outline a simple solution to your problem.

Related

divide rectangle into n equal parts

i want to code a video platform and have a problem and can't think of a solution right now.
I want to divide a rectangle into equal parts
Came up with a solution for a square but i am unable to come up with a solution for different ratios.
Maybe u guys can help me.
example:
BAD GOOD
n=4
________ ________
| | | | | | | |
| | | | | |---|---|
|_|_|_|_| |___|___|
n=2
________ ________
| | | | |
| | | | |
| | | |–––––––|
| | | | |
|___|___| |_______|
Let X be the width of the rectangle and let Y be the height. Let the goal be to divide this rectangle into N rectangles of equal area whose sides are as close to equal as possible.
The solution is not difficult. First, find all factors of the N. Then, write N as a product of two numbers A and B such that A and B are as close as possible (that is, there is no other choice A' and B' such that |A' - B'| < |A - B|). Assume we chose A > B. All we have to do is put A - 1 lines along the long side of the rectangle, and B - 1 lines along the short side.
For example: n = 4, A = 2 and B = 2 is optimal, so you put A - 1 = 1 and B - 1 = 1 lines along each side of the rectangle (as in your GOOD column for n = 4).
For example: n = 21, A = 7 and B = 3 is optional, so you'd put 6 lines along the long edge of the rectangle and 2 lines along the short edge, equally spaced:
_ _ _ _ _ _ _
|_|_|_|_|_|_|_|
|_|_|_|_|_|_|_|
|_|_|_|_|_|_|_|
Of course, for prime values of N, you are not going to get a very nice solution, but then there is no nice solution in that case. In such cases - where A and B are very very different and the rectangle's dimensions are not similarly different - you might want to choose another solution that doesn't require all rectangles have the same side lengths. You could do better by allowing 2 or 3 kinds of rectangles, or more, into the solution, for instance.

Selecting columns in R depending on user input

I have a dataframe col_metadata in R that goes as:
sample | b | c | ...
____________________
S1 | 1 | 1 | ...
S2 | 1 | 2 | ...
S3 | 2 | 2 | ...
S4 | 3 | 3 | ...
I want to make a function that gives me samples that have given values in front of them. For eg.,
fun(b,c(1,2))
should return
S1 S2 S3
while
fun(c,c(2,3))
should return
S2 S3 S4
and so on. If the column would have been fixed (say, b), I could simply do:
col_metaData[col_metaData$b %in% inputList,]$sample
But since there can be many more columns(hence I can't use if-else), I was looking for a different method to do the same. Can someone please help me do this? Thanks...
I solved it. Just in case anyone comes looking for an answer, we can use this:
col_metaData[col_metaData[,b] %in% inputList,]$sample
Notice [,b] instead of $b.

Count merged observations and calculate fraction

I merged two data sets using Stata and now I need to find the fraction and number of projects matched. To do this, I am assuming that I will need to calculate two counts.
How do I get both of the counts to display at the same time, and then divide one by the other?
Below is an example of my _merge variable:
4022. | master only (1) |
4023. | matched (3) |
4024. | using only (2) |
4025. | using only (2) |
4026. | using only (2) |
4027. | matched (3) |
4028. | matched (3) |
4029. | matched (3) |
4030. | matched (3) |
I would first like to count and store all of the variables under _merge, and then count those that don't say "master only". Then divide the two by each other.
For example:
count1 count2 fraction
6019 4020 .66 (4020/6019)
With count1 being everything under _merge, while count2 being everything that was matched (excludes master only).
Using the following toy example:
clear
webuse autosize
merge 1:1 make using http://www.stata-press.com/data/r14/autoexpense
First it is a good idea to confirm the value which corresponds to "master only":
list _merge
+-----------------+
| _merge |
|-----------------|
1. | matched (3) |
2. | matched (3) |
3. | matched (3) |
4. | master only (1) |
5. | matched (3) |
|-----------------|
6. | matched (3) |
+-----------------+
list _merge, nolabel
+--------+
| _merge |
|--------|
1. | 3 |
2. | 3 |
3. | 3 |
4. | 1 |
5. | 3 |
|--------|
6. | 3 |
+--------+
Then generate the three variables by first counting the relevant observations and dividing:
count if _merge
generate count1 = r(N)
count if _merge != 1
generate count2 = r(N)
generate fraction = count2 / count1
display count1
6
display count2
5
display fraction
1.2

SQLite find table row where a subset of columns satisfies a specified constraint

I have the following SQLite table
CREATE TABLE visits(urid INTEGER PRIMARY KEY AUTOINCREMENT,
hash TEXT,dX INTEGER,dY INTEGER,dZ INTEGER);
Typical content would be
# select * from visits;
urid | hash | dx | dY | dZ
------+-----------+-------+--------+------
1 | 'abcd' | 10 | 10 | 10
2 | 'abcd' | 11 | 11 | 11
3 | 'bcde' | 7 | 7 | 7
4 | 'abcd' | 13 | 13 | 13
5 | 'defg' | 20 | 21 | 17
What I need to do here is identify the urid for the table row which satisfies the constraint
hash = 'abcd' AND (nearby >= (abs(dX - tX) + abs(dY - tY) + abs(dZ - tZ))
with the smallest deviation - in the sense of smallest sum of absolute distances
In the present instance with
nearby = 7
tX = tY = tZ = 12
there are three rows that meet the above constraint but with different deviations
urid | hash | dx | dY | dZ | deviation
------+-----------+-------+--------+--------+---------------
1 | 'abcd' | 10 | 10 | 10 | 6
2 | 'abcd' | 11 | 11 | 11 | 3
4 | 'abcd' | 12 | 12 | 12 | 3
in which case I would like to have reported urid = 2 or urid = 3 - I don't actually care which one gets reported.
Left to my own devices I would fetch the full set of matching rows and then dril down to the one that matches my secondary constraint - smallest deviation - in my own Java code. However, I suspect that is not necessary and it can be done in SQL alone. My knowledge of SQL is sadly too limited here. I hope that someone here can put me on the right path.
I now have managed to do the following
CREATE TEMP TABLE h1(v1 INTEGER,v2 INTEGER);
SELECT urid,(SELECT (abs(dX - 12) + abs(dY - 12) + abs(dZ - 12))) devi FROM visits WHERE hash = 'abcd';
which gives
--SELECT * FROM h1
urid | devi |
-------+-----------+
1 | 6 |
2 | 3 |
4 | 3 |
following which I issue
select urid from h1 order by v2 asc limit 1;
which yields urid = 2, the result I am after. Whilst this works, I would like to know if there is a better/simpler way of doing this.
You're so close! You have all of the components you need, you just have to put them together into a single query.
Consider:
SELECT urid
, (abs(dx - :tx) + abs(dy - :tx) + abs(dz - :tx)) AS devi
FROM visits
WHERE hash=:hashval AND devi < :nearby
ORDER BY devi
LIMIT 1
Line by line, first you list the rows and computed values you want (:tx is a placeholder; in your code you want to prepare a statement and then bind values to the placeholders before executing the statement) from the visit table.
Then in the WHERE clause you restrict what rows get returned to those matching the particular hash (That column should have an index for best results... CREATE INDEX visits_idx_hash ON visits(hash) for example), and that have a devi that is less than the value of the :nearby placeholder. (I think devi < :nearby is clearer than :nearby >= devi).
Then you say that you want those results sorted in increasing order according to devi, and LIMIT the returned results to a single row because you don't care about any others (If there are no rows that meet the WHERE constraints, nothing is returned).

How to eliminate nothing elements in a array (1D) in Julia?

I would like to know how I could eliminate nothing elements in a Julia array (1D) like the one below. It was built from reading a text file with lines with no relevant information mixed with lines with relevant information. "nothing" is type Void and I would like to clean the array of all of it.
nothing
nothing
nothing
nothing
nothing
" -16.3651\t 0.1678\t -4.6997\t -14.0152\t -2.6855\t -16.0294\t -7.8049\t -27.1912\t -5.0354\t -14.5187\t\r\n"
" -16.4490\t -1.0910\t -3.6087\t -12.6724\t -1.5945\t -14.7705\t -7.2174\t -25.2609\t -3.7766\t -14.3509\t\r\n"
" -16.4490\t -2.2659\t -2.4338\t -10.9100\t -0.5875\t -13.6795\t -6.7139\t -22.9950\t -2.9373\t -14.0991\t\r\n"
testvector[testvector.!=nothing] is also a very readable option.
benchmarking can help choose the most efficient code.
How are you reading that file?
You can filter out nothings from an array:
filter(x -> !is(nothing, x), [nothing, 42]) # => Any[42]
But you may want to clean your data first, with a tsv (tab separated values) file like this:
-16.3651 0.1678 -4.6997 -14.0152 -2.6855 -16.0294 -7.8049 -27.1912 -5.0354 -14.5187
-16.4490 -1.0910 -3.6087 -12.6724 -1.5945 -14.7705 -7.2174 -25.2609 -3.7766 -14.3509
-16.4490 -2.2659 -2.4338 -10.9100 -0.5875 -13.6795 -6.7139 -22.9950 -2.9373 -14.0991
Using readdlm:
julia> readdlm("data.tsv")
3x10 Array{Float64,2}:
-16.3651 0.1678 -4.6997 -14.0152 … -27.1912 -5.0354 -14.5187
-16.449 -1.091 -3.6087 -12.6724 -25.2609 -3.7766 -14.3509
-16.449 -2.2659 -2.4338 -10.91 -22.995 -2.9373 -14.0991
Using DataFrmaes.readtable:
julia> df = readtable("data.tsv");
julia> names!(df, [symbol(x) for x in 'A':'J'])
2x10 DataFrames.DataFrame
| Row | A | B | C | D | E | F | G |
|-----|---------|---------|---------|----------|---------|----------|---------|
| 1 | -16.449 | -1.091 | -3.6087 | -12.6724 | -1.5945 | -14.7705 | -7.2174 |
| 2 | -16.449 | -2.2659 | -2.4338 | -10.91 | -0.5875 | -13.6795 | -6.7139 |
| Row | H | I | J |
|-----|----------|---------|----------|
| 1 | -25.2609 | -3.7766 | -14.3509 |
| 2 | -22.995 | -2.9373 | -14.0991 |
one simple way is using filter! function to update your vector like this:
testvector=[fill(nothing,10) ; [1,2,3]];
# =>13-element Array{Any,1}:
# nothing
# nothing
# nothing
# nothing
# nothing
# nothing
# nothing
# nothing
# nothing
# nothing
# 1
# 2
# 3
filter!(x->x!=nothing, testvector)
# => 3-element Array{Any,1}:
# 1
# 2
# 3
thanks #Daniel Arndt
EDIT, Refer to this paragraph from Julia doc:
nothing is a special value that does not print anything at the
interactive prompt. Other than not printing, it is a completely normal
value and you can test for it programmatically.
I think all of the conditions below, reach us to the same result
x!=nothing
x!==nothing
!is(x,nothing)
!isa(x,Void)
typeof(x)!=Void
To add to the answers above, it appears:
filter(!isnothing, [nothing, 42])
is a working shorthand for filter(x -> !isnothing(x), [nothing, 42]), and will correctly return 42.
Dear All,
At the end, the code became this:
tmpFile=open(fileName)
tmp=readdlm(tmpFile);
ind=pmap(typeof,tmp[:,1]).!=SubString{ASCIIString}; # if the first column typeof is string, than pmap will return false, else, it return true. This will provide an index of valid/not valid rows.
tmpClean=tmp[ind,:]; # only valid rows will be used
If you may have any suggestion to improve it, I would appreciate it. Thank you for your help.

Resources