Hmmm Assembly Fibonacci Sequence - recursion

I need to write and print out the Fibonacci sequence up to a given integer (which I can choose myself).
I have to do this in Hmmm assembly.
It gets stuck in infinite recursion, but I have no idea why
00 read r4 # User input
01 setn r4 -1 # adds -1 to r4
02 setn r1 1 # r1 == 1
03 setn r2 0 # r2 == 0
04 setn r5 1 # used as the first number of the fibonacci sequence
05 write r5 # 1
06 jeqzn r4 13 # if r4 == 0, the fibonacci sequence stops
07 add r3 r1 r2 # r3 = r1 + r2
08 addn r4 -1 # r4 = r4 -1
09 copy r2 r1 # r2 now equals r1
10 copy r1 r3 # r1 now equals r3
11 write r3 # prints fibonacci number
12 jumpn 06 # checks if r4 == 0
13 halt # stops
current output:
1
1
2
3
5
8
13
21
34
55
89
144
233
377
..
..
Wanted output (example): if input (r4) = 10
1
1
2
3
5
8
13
21
34
55
08 addn r4 -1 (r4 should eventually end up being 0)
06 jeqzn r4 13 (should check when it's true, and it should halt)
What prevents it from halting?

Looks like line 01 sets r4 (the input) to -1 instead of subtracting 1 from it: setn stores a literal value in the register, so the instruction should be addn r4 -1.
It's also worth noting that the current implementation is iterative, not recursive, and the loop doesn't appear to be infinite, just very long: with r4 starting at -1, the countdown has to wrap all the way around before it reaches 0 again (assuming addn wraps rather than saturates).
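For reference, here is the same program with only that one instruction changed (a sketch; not run through an actual Hmmm simulator):
00 read r4     # user input: how many numbers to print
01 addn r4 -1  # r4 = r4 - 1 (one number is printed before the loop)
02 setn r1 1
03 setn r2 0
04 setn r5 1   # first number of the Fibonacci sequence
05 write r5    # 1
06 jeqzn r4 13 # if r4 == 0, stop
07 add r3 r1 r2
08 addn r4 -1
09 copy r2 r1
10 copy r1 r3
11 write r3
12 jumpn 06
13 halt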

Merge two data frames in R by the closest earlier time, not the closest later one

I have two data frames:
df_bestquotes
df_transactions
df_transactions:
day time vol price buy ask bid
1 43688,08 100 195,8 1 195,8 195,74
1 56357,34 20 192,87 1 192,87 192,86
1 57576,14 14 192,48 -1 192,48 192,46
2 50468,29 3 193,83 1 193,86 193,77
2 56107,54 11 194,17 -1 194,2 194,16
7 42549,66 100 188,81 -1 188,85 188,78
7 42724,38 200 188,62 -1 188,66 188,61
7 48924,66 5 189,59 -1 189,62 189,59
8 48950,14 52 187,66 -1 187,7 187,66
9 36242,86 89 186,61 1 186,62 186,56
9 53910,46 1 189,81 -1 189,87 189,81
10 47041,94 15 187,87 -1 187,88 187,86
13 34380,73 87 187,29 -1 187,42 187,27
13 36037,18 100 188,94 1 188,95 188,94
14 46644,64 100 189,29 -1 189,34 189,29
14 57571,12 52 190,03 1 190,03 190
15 36418,71 45 192,07 1 192,07 192,04
15 37223,77 100 191,09 -1 191,07 191,06
17 37245,59 100 186,45 -1 186,47 186,45
23 34200,39 50 189,29 -1 189,29 189,27
24 40294,73 60 193,52 -1 193,54 193,5
29 52813,68 5 202,99 -1 203,01 202,99
29 55279,13 93 203,97 -1 203,98 203,9
30 51356,91 68 204,41 -1 204,45 204,4
30 53530,24 40 204,14 -1 204,18 204,14
df_bestquotes:
day time best_ask best_bid
1 51384,613 31,78 31,75
1 56593,74 31,6 31,55
3 40568,217 31,36 31,32
7 39169,237 31,34 31,28
8 44715,713 31,2 31,17
8 53730,707 31,24 31,19
8 55851,75 31,17 31,14
10 49376,267 31,06 30,99
16 48610,483 30,75 30,66
16 57360,917 30,66 30,64
17 53130,717 30,39 30,32
20 46353,133 30,72 30,63
23 46429,67 29,7 29,64
24 37627,727 29,81 29,63
24 46354,647 29,92 29,77
24 53863,69 30,04 29,93
24 53889,923 30,03 29,95
24 59047,223 29,99 29,2
28 39086,407 30,87 30,83
28 41828,703 30,87 30,8
28 50489,367 30,99 30,87
29 54264,467 30,97 30,85
30 34365,95 31,21 30,99
30 39844,357 31,06 31
30 57550,523 31,18 31,15
For each record of df_transactions, using the day and time, I need to find the best_ask and best_bid that were quoted just before that moment, and incorporate this information into df_transactions.
df_joined: df_transactions + df_bestquotes
day time vol price buy ask bid best_ask best_bid
1 43688,08 100 195,8 1 195,8 195,74
1 56357,34 20 192,87 1 192,87 192,86
1 57576,14 14 192,48 -1 192,48 192,46
2 50468,29 3 193,83 1 193,86 193,77
2 56107,54 11 194,17 -1 194,2 194,16
7 42549,66 100 188,81 -1 188,85 188,78
7 42724,38 200 188,62 -1 188,66 188,61
7 48924,66 5 189,59 -1 189,62 189,59
8 48950,14 52 187,66 -1 187,7 187,66
9 36242,86 89 186,61 1 186,62 186,56
9 53910,46 1 189,81 -1 189,87 189,81
10 47041,94 15 187,87 -1 187,88 187,86
13 34380,73 87 187,29 -1 187,42 187,27
13 36037,18 100 188,94 1 188,95 188,94
14 46644,64 100 189,29 -1 189,34 189,29
14 57571,12 52 190,03 1 190,03 190
15 36418,71 45 192,07 1 192,07 192,04
15 37223,77 100 191,09 -1 191,07 191,06
17 37245,59 100 186,45 -1 186,47 186,45
23 34200,39 50 189,29 -1 189,29 189,27
24 40294,73 60 193,52 -1 193,54 193,5
29 52813,68 5 202,99 -1 203,01 202,99
29 55279,13 93 203,97 -1 203,98 203,9
30 51356,91 68 204,41 -1 204,45 204,4
30 53530,24 40 204,14 -1 204,18 204,14
I have tried the following code, but it doesn't work:
library(data.table)
df_joined = df_bestquotes[df_transactions, on="time", roll = "nearest"]
Here are the real files with many more records; the tables above are an example with only 25 records each.
df_transactions_original
df_bestquotes_original
And my code in R:
matching.R
Any suggestions on how to achieve this? Thanks a lot, guys.
Your attempt uses data.table syntax, but you don't mention data.table. Have you run library(data.table) first?
I think it should rather be:
df_joined = df_bestquotes[df_transactions, on=.(day, time), roll = TRUE]
But I cannot test without the objects. Does it work? roll = "nearest" doesn't give you the previous best quotes but the nearest ones.
EDIT: Thanks for the objects, I checked, and this works for me:
library(data.table)
dfb <- fread("df_bestquotes.csv", dec=",")
dft <- fread("df_transactions.csv", dec = ",")
dfb[, c("day2", "time2") := .(day,time)] # duplicated to keep track of the best quotes days
joinedDf <- dfb[dft, on=.(day, time), roll = +Inf]
It puts NA when there are no best quotes for the day. If you want to roll across days, I suggest you create a unique measure of time. I don't know exactly what time is, but assuming it is in seconds:
dfb[, uniqueTime := day + time/(60*60*24)]
dft[, uniqueTime := day + time/(60*60*24)]
joinedDf <- dfb[dft, on=.(uniqueTime), roll = +Inf]
This works even if time is not in seconds; only the ordering matters in this case.
Good morning @samuelallain, yes, I have used library(data.table) before.
I've edited it into the main post.
I have tried your solution and RStudio returns the following error:
library(data.table)
df_joined = df_bestquotes[df_transactions, on=.("day", "time"), roll = TRUE]
Error in [.data.frame(df_bestquotes, df_transactions, on = .(day, time), :
unused arguments (on = .("day", "time"), roll = TRUE)
Thank you.
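Judging by that error, the data.frame method [.data.frame is being dispatched, which suggests df_bestquotes is still a plain data.frame rather than a data.table; the join keys inside on = .(...) should also be unquoted, as in the answer above. A hedged sketch of the fix, assuming the two objects from the question:
library(data.table)
# convert the existing data.frames to data.tables in place
setDT(df_bestquotes)
setDT(df_transactions)
# unquoted column names inside on = .(...); roll = TRUE rolls the last earlier quote forward
df_joined <- df_bestquotes[df_transactions, on = .(day, time), roll = TRUE]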

how to optimize away common subexpressions?

select x+y as z,
case
when "x"+"y" < 0 then "Less Than Zero"
when "x"+"y" > 0 then "Non Zero"
else "Zero"
end
from sometable;
This returns the expected result, but the addition is performed multiple times for each row of data.
I am trying to optimize the query as follows, but it's not working:
select x+y as z,
case
when "z" < 0 then "Less Than Zero"
when "z" > 0 then "Non Zero"
else "Zero"
end
from sometable;
It always returns "Less Than Zero".
What am I doing wrong in this query? How can I avoid adding x and y multiple times while the query is being executed?
Column aliases in the SELECT clause are not available in other expressions in the same SELECT clause. (What should happen with SELECT x AS y, y AS x ...?)
You can make such an alias available by moving it into a subquery:
SELECT z,
CASE WHEN z < 0 THEN 'Less Than Zero'
WHEN z > 0 THEN 'Non Zero'
ELSE 'Zero'
END
FROM (SELECT x + y AS z
FROM sometable);
However, this only saves typing; it does not actually optimize away the duplicate computation:
sqlite> explain select z, z from (select x+y as z from sometable);
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 11 0 00 Start at 11
1 OpenRead 1 2 0 2 00 root=2 iDb=0; sometable
2 Rewind 1 9 0 00
3 Column 1 0 3 00 r[3]=sometable.x
4 Column 1 1 4 00 r[4]=sometable.y
5 Add 4 3 1 00 r[1]=r[4]+r[3]
6 Add 4 3 2 00 r[2]=r[4]+r[3]
7 ResultRow 1 2 0 00 output=r[1..2]
8 Next 1 3 0 01
9 Close 1 0 0 00
10 Halt 0 0 0 00
11 Transaction 0 0 1 0 01 usesStmtJournal=0
12 TableLock 0 2 0 sometable 00 iDb=0 root=2 write=0
13 Goto 0 1 0 00

I am trying to figure out how to parse a webpage

I am working on a summer project to grab course information from my school website.
I start off by going here: http://www.uah.edu/cgi-bin/schedule.pl?file=fall2015.html&segment=
to gather the course departments.
Then I grab info from pages like this one.
I have what I need filtered down to a list like:
[1] "91091 211 01 PRINC OF FINANCIAL ACCOUNTING 3.0 55 22 33 0 MW 12:45PM 02:05PM BAB 106 Rose-Green E"
[2] "91092 211 02 PRINC OF FINANCIAL ACCOUNTING 3.0 53 18 35 0 TR 09:35AM 10:55AM BAB 123 STAFF"
[3] "91093 211 03 PRINC OF FINANCIAL ACCOUNTING 3.0 48 29 19 0 TR 05:30PM 06:50PM BAB 220 Hoskins J"
[4] "91094 212 01 MANAGEMENT ACCOUNTING 3.0 55 33 22 0 MWF 11:30AM 12:25PM BAB 106 Hoskins J"
[5] "91095 212 02 MANAGEMENT ACCOUNTING 3.0 55 27 28 0 TR 02:20PM 03:40PM BAB 106 Bryson R"
However my issues are as follows:
www.uah.edu/cgi-bin/schedule.pl?file=fall2015.html&segment=CS
I need to add the department from each url. In the link I gave, the department was "CS". I need to have that included with each entry.
I need to turn this into a table, or some other object where I can reference the data like
Max Wait
CRN Course Title Credit Enrl Enrl Avail List Days Start End Bldg Room Instructor
------ ---------- ------------------------------ ------ ---- ---- -------- ---- ------- ------- ------- ----- ---------- --------------------
Basically how the data is displayed on the page.
So my end goal is to go through each of those links I grab, get all the course info (except the section type), and then put it into a giant data.frame that has all the courses like this.
Department CRN Course Title Credit MaxEnrl Enrl Avail WaitList Days Start End Bldg Room Instructor
ACC 91095 212 02 MANAGEMENT ACCOUNTING 3.0 55 27 28 0 TR 02:20PM 03:40PM BAB 106 Bryson R
So far I have this working
require(data.table)
require(gdata)
library(foreach)
uah <- readLines('http://www.uah.edu/cgi-bin/schedule.pl?file=fall2015.html&segment=')
uah <- substring(uah[grep('fall2015', uah)], 10)
uah <- sub("\\\"(.*)", "", uah)
uah <- paste("http://www.uah.edu" , uah , sep = "")
gatherClasses <- function(url){
dep <- readLines(url)
dep <- dep[grep('[[:digit:][:digit:][:digit:][:digit:]][[:digit:][:digit:][:digit:]] [[:digit:][:digit:]]', dep)]
dep <- substring(dep, 6)
dep <- foreach(i=dep) %do% i[grep('[[:digit:][:digit:][:digit:][:digit:]][[:digit:][:digit:][:digit:]] [[:digit:][:digit:]]', i)]
dep <- foreach(i=dep) %do% trim(i)
dep <- dep[2:length(dep)]
return(dep)
}
x <- gatherClasses(uah[1])
x <-unlist(x)
I am having trouble splitting the data in the right places. I am not sure what I should try next.
EDIT:(Working Now)
require(data.table)
require(gdata)
library(foreach)
uah <- readLines('http://www.uah.edu/cgi-bin/schedule.pl?file=sum2015b.html&segment=')
uah <- substring(uah[grep('sum2015b', uah)], 10)
uah <- sub("\\\"(.*)", "", uah)
uah <- paste("http://www.uah.edu" , uah , sep = "")
gatherClasses <- function(url){
L <- readLines(url)
Fields <- sub(" *$", " ", grep("---", L, value = TRUE))
widths <- diff(c(0, gregexpr(" ", Fields)[[1]]))
Data <- grep("\\d{5} \\d{3}", L, value = TRUE)
classes <- read.fwf(textConnection(Data), widths, as.is = TRUE, strip.white = TRUE)
classes$department <- unlist(strsplit(url, '='))[3]
return(classes)
}
allClasses = foreach(i=uah) %do% gatherClasses(i)
allClasses <- do.call("rbind", allClasses)
write.table(allClasses, "c:/sum2015b.txt", sep="\t")
Read the lines into L, grab the "--- ---- etc." line into Fields and ensure that there is exactly one space at the end. Find the character positions of the spaces and difference them to get the field widths. Finally grep out the data portion and read it in using read.fwf which reads fixed width fields. For example, for Art History:
URL <- "http://www.uah.edu/cgi-bin/schedule.pl?file=fall2015.html&segment=ARH"
L <- readLines(URL)
Fields <- sub(" *$", " ", grep("---", L, value = TRUE))
widths <- diff(c(0, gregexpr(" ", Fields)[[1]]))
Data <- grep("\\d{5} \\d{3} \\d{2}", L, value = TRUE)
read.fwf(textConnection(Data), widths, as.is = TRUE, strip.white = TRUE)
giving:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15
1 90628 100 01 ARH SURV:ANCIENT-MEDIEVAL 3 35 27 8 0 TR 12:45PM 02:05PM WIL 168 Joyce L
2 90630 101 01 ARH SURV:RENAISSANCE-MODERN 3 35 14 21 0 MW 12:45PM 02:05PM WIL 168 Stewart D
3 90631 101 02 ARH SURV:RENAISSANCE-MODERN 3 35 8 27 0 MW 03:55PM 05:15PM WIL 168 Stewart D
4 92269 101 03 ARH SURV:RENAISSANCE-MODERN 3 35 5 30 0 TR 11:10AM 12:30PM WIL 168 Shapiro Guanlao M
5 90632 101 04 ARH SURV:RENAISSANCE-MODERN 3 35 13 22 0 TR 02:20PM 03:40PM WIL 168 Shapiro Guanlao M
6 90633 301 01 ANCIENT GREEK ART 3 18 3 15 0 MW 02:20PM 03:40PM WIL 168 Joyce L
7 92266 306 01 COLLAPSE OF CIVILIZATIONS 3 10 4 6 0 TR 12:45PM 02:05PM SST 205 Sever T
8 W 90634 309 01 CONTEMPORARY ART & ISSUES 3 18 10 8 0 TR 09:35AM 10:55AM WIL 168 Stewart D
9 90635 320 01 ST: MODERN ARCHITECTURE 3 12 0 12 0 TR 11:10AM 12:30PM WIL 172 Takacs T
10 90636 400 01 SENIOR THESIS 3 0 0 0 0 TBA TBA TBA TBA Joyce L
11 90637 400 02 SENIOR THESIS 3 0 0 0 0 TBA TBA TBA TBA Stewart D
I wrote and donated that schedule.pl script about 20 years ago because they simply published the flat mainframe files of all the courses on offer for each session. The script's job is to break up the whole set and present it in human-consumable chunks. (That, and back then a browser would choke on that much data.) I understand from one of the former UAH IT people that they tried to do away with it once, but got a great hue and cry from users, so they figured out how to keep it working.
It would be easier for you to ask the UAH IT folks if you can't just retrieve the underlying flat file. It used to be on a public-facing URL, but like I said, that was about 20 years ago, so I don't recall the specifics. The output you see when viewing courses is the same as the flat file, but the flat file contains every department, so you don't have to fetch each separately.

Performance of calculating the distance between two positions on a tree?

Here is a tree. The first column is an identifier for the branch, where 0 is the trunk, L is the first branch on the left and R is the first branch on the right. LL is the branch on the extreme left after the second bifurcation, etc. The variable length contains the length of each branch.
> tree
branch length
1 0 20
2 L 12
3 LL 19
4 R 19
5 RL 12
6 RLL 10
7 RLR 12
8 RR 17
tree = data.frame(branch = c("0","L", "LL", "R", "RL", "RLL", "RLR", "RR"), length=c(20,12,19,19,12,10,12,17))
tree$branch = as.character(tree$branch)
and here is a drawing of this tree
Here are two positions on this tree
posA = tree[4,]; posA$length = 12
posB = tree[6,]; posB$length = 3
The positions are given by the branch ID and the distance (variable length) to the origin of the branch (more info in edits).
I wrote the following messy distance function to calculate the shortest distance along the branches between any two points on the tree. The shortest distance along the branches can be understood as the minimal distance an ant would need to walk along the branches to reach one position from the other position.
distance = function(tree, pos1, pos2){
if (identical(pos1$branch, pos2$branch)){Dist=pos1$length-pos2$length;return(Dist)}
pos1path = strsplit(pos1$branch, "")[[1]]
if (pos1path[1]!="0") {pos1path = c("0", pos1path)}
pos2path = strsplit(pos2$branch, "")[[1]]
if (pos2path[1]!="0") {pos2path = c("0", pos2path)}
loop = 1:min(length(pos1path), length(pos2path))
loop = loop[-which(loop == 1)]
CommonTrace="included"; for (i in loop) {
if (pos1path[i] != pos2path[i]) {
CommonTrace = i-1; break
}
}
if(CommonTrace=="included"){
CommonTrace = min(length(pos1path), length(pos2path))
if (length(pos1path) > length(pos2path)) {
longerpos = pos1; shorterpos = pos2; longerpospath = pos1path
} else {
longerpos = pos2; shorterpos = pos1; longerpospath = pos2path
}
distToNode = 0
if ((CommonTrace+1) != length(longerpospath)){
for (i in (CommonTrace+1):(length(longerpospath)-1)){
distToNode = distToNode + tree$length[tree$branch == paste0(longerpospath[2:i], collapse='')]
}
}
Dist = distToNode + longerpos$length + (tree[tree$branch == shorterpos$branch,]$length-shorterpos$length)
if (identical(shorterpos, pos1)){Dist=-Dist}
return(Dist)
} else { # if they are sisterbranch
Dist=0
if((CommonTrace+1) != length(pos1path)){
for (i in (CommonTrace+1):(length(pos1path)-1)){
Dist = Dist + tree$length[tree$branch == paste0(pos1path[2:i], collapse='')]
}
}
if((CommonTrace+1) != length(pos2path)){
for (i in (CommonTrace+1):(length(pos2path)-1)){
Dist = Dist + tree$length[tree$branch == paste(pos2path[2:i], collapse='')]
}
}
Dist = Dist + pos1$length + pos2$length
return(Dist)
}
}
I think the algorithm works fine, but it is not very efficient. Note that the sign of the distance is important. The sign only makes sense when the two positions are not found on "sister branches"; that is, it makes sense only if one of the two positions lies on the path between the root and the other position.
distance(tree, posA, posB) # -22
I then just loop through all positions of interest like that:
allpositions=rbind(tree, tree)
allpositions$length = c(1,5,8,2,2,3,5,6,7,8,2,3,1,2,5,6)
mat = matrix(-1, ncol=nrow(allpositions), nrow=nrow(allpositions))
for (i in 1:nrow(allpositions)){
for (j in 1:nrow(allpositions)){
posA = allpositions[i,]
posB = allpositions[j,]
mat[i,j] = distance(tree, posA, posB)
}
}
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
# 1 0 -24 -39 -21 -40 -53 -55 -44 -6 -27 -33 -22 -39 -52 -55 -44
# 2 24 0 -15 7 26 39 41 30 18 -3 -9 8 25 38 41 30
# 3 39 15 0 22 41 54 56 45 33 12 6 23 40 53 56 45
# 4 21 7 22 0 -19 -32 -34 -23 15 10 16 -1 -18 -31 -34 -23
# 5 40 26 41 19 0 -13 -15 8 34 29 35 18 1 -12 -15 8
# 6 53 39 54 32 13 0 8 21 47 42 48 31 14 1 8 21
# 7 55 41 56 34 15 8 0 23 49 44 50 33 16 7 0 23
# 8 44 30 45 23 8 21 23 0 38 33 39 22 7 20 23 0
# 9 6 -18 -33 -15 -34 -47 -49 -38 0 -21 -27 -16 -33 -46 -49 -38
# 10 27 3 -12 10 29 42 44 33 21 0 -6 11 28 41 44 33
# 11 33 9 -6 16 35 48 50 39 27 6 0 17 34 47 50 39
# 12 22 8 23 1 -18 -31 -33 -22 16 11 17 0 -17 -30 -33 -22
# 13 39 25 40 18 -1 -14 -16 7 33 28 34 17 0 -13 -16 7
# 14 52 38 53 31 12 -1 7 20 46 41 47 30 13 0 7 20
# 15 55 41 56 34 15 8 0 23 49 44 50 33 16 7 0 23
# 16 44 30 45 23 8 21 23 0 38 33 39 22 7 20 23 0
As an example, let's consider the first and the third positions in the object allpositions. The distance between them is 39 (and -39) because an ant would need to walk 19 units on branch 0 and then walk 12 units on branch L and finally the ant would need to walk 8 units on branch LL. 19 + 12 + 8 = 39
The issue is that I have about 20 very big trees with about 50000 positions and I would like to calculate the distance between any two positions. There are therefore 20 * 50000^2 distances to compute. It takes forever! Can you help me to improve my code?
EDIT
Please let me know if anything is still unclear
tree is a description of a tree. The tree has branches of certain lengths. The name of a branch (variable: branch) indicates its relationship to the other branches. The branch RL is a "parent branch" of the two branches RLL and RLR, where R and L stand for right and left.
allpositions is a data.frame where each line represents one independent position on the tree. You can think of it as the position of a squirrel. The position is defined by two pieces of information: 1) the branch (variable: branch) on which the squirrel is standing, and 2) the distance between the beginning of the branch and the position of the squirrel (variable: length).
Three examples
Consider a first squirrel that is at position (variable: length) 8 on the branch RL (whose length is 12) and a second squirrel that is at position (variable: length) 2 on the branch RLL or RLR. The distance between the two squirrels is 12 - 8 + 2 = 6 (or -6).
Consider a first squirrel that is at position (variable: length) 8 on the branch RL and a second squirrel that is at position (variable: length) 2 on the branch RR. The distance between the two squirrels is 8 + 2 = 10 (or -10).
Consider a first squirrel that is at position (variable: length) 8 on the branch R (whose length is 19) and a second squirrel that is at position (variable: length) 2 on the branch RLL. Knowing that the branch RL has a length of 12, the distance between the two squirrels is 19 - 8 + 12 + 2 = 25 (or -25).
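These three examples can be reproduced with the distance() function and the tree object above; a quick check, with each squirrel built as a one-row data.frame of the shape the function expects (the signs follow the convention described earlier):
sq1 <- data.frame(branch = "RL",  length = 8)
sq2 <- data.frame(branch = "RLL", length = 2)
sq3 <- data.frame(branch = "RR",  length = 2)
sq4 <- data.frame(branch = "R",   length = 8)
distance(tree, sq1, sq2)  # -6  (magnitude 12 - 8 + 2 = 6)
distance(tree, sq1, sq3)  # 10  (8 + 2, sister branches)
distance(tree, sq4, sq2)  # -25 (magnitude 19 - 8 + 12 + 2 = 25)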
The code below uses the igraph package to compute the distances between positions in tree and seems noticeably faster than the code you posted in your question. The approach is to create graph vertices at branch intersections and at the positions along the branches specified in allpositions. Graph edges are the branch segments between these vertices. It uses igraph to build a graph for the tree and allpositions and then finds the distances between the vertices corresponding to the allpositions data.
t.graph <- function(tree, positions) {
library(igraph)
# Assign vertex name to tree branch intersections
n_label <- nchar(tree$branch)
tree$high_vert <- tree$branch
tree$low_vert <- tree$branch
tree$brnch_type <- "tree"
for( i in 1:nrow(tree) ) {
tree$low_vert[i] <- if(n_label[i] > 1) substr(tree$branch[i], 1, n_label[i]-1)
else { if(tree$branch[i] %in% c("R","L")) "0"
else "root" }
}
# combine position data with tree data
positions$brnch_type <- "position"
temp <- merge(positions, tree, by = "branch")
positions <- temp[, c("branch","length.x","high_vert","low_vert","brnch_type.x")]
positions$high_vert <- paste(positions$branch, positions$length.x, sep="_")
colnames(positions) <- c("branch","length","high_vert","low_vert","brnch_type")
tree <- rbind(tree, positions)
# use positions to segment tree branches
tree_brnch <- split(tree, tree$branch)
tree <- data.frame( branch=NA_character_, length = NA_real_, high_vert = NA_character_,
low_vert = NA_character_, brnch_type =NA_character_, seg_len= NA_real_)
for( ib in 1: length(tree_brnch)) {
brnch_seg <- tree_brnch[[ib]][order(tree_brnch[[ib]]$length, decreasing=TRUE), ]
n_seg <- nrow(brnch_seg)
brnch_seg$seg_len <- brnch_seg$length
for( is in 1:(n_seg-1) ) {
brnch_seg$seg_len[is] <- brnch_seg$length[is] - brnch_seg$length[is+1]
brnch_seg$low_vert[is] <- brnch_seg$high_vert[is+1]
}
tree <- rbind(tree, brnch_seg)
}
tree <- tree[-1,]
# Create graph of tree and positions
tree_graph <- graph.data.frame(tree[,c("low_vert","high_vert")])
E(tree_graph)$label <- tree$high_vert
E(tree_graph)$brnch_type <- tree$brnch_type
E(tree_graph)$weight <- tree$seg_len
# calculate shortest distances between position vertices
position_verts <- V(tree_graph)[grep("_", V(tree_graph)$name)]
vert_dist <- shortest.paths(tree_graph, v=position_verts, to=position_verts, mode="all")
return(dist_mat= vert_dist )
}
I've benchmarked the igraph code (the t.graph function) against the code posted in your question by wrapping your distance function and the loop over allpositions in a function named Remi. Sample trees were created as extensions of your tree and allpositions data, for trees of 64, 256, and 2048 branches with allpositions equal to twice these sizes. Comparisons of execution times are shown below; note that times are in milliseconds.
microbenchmark(matR16 <- Remi(tree, allpositions), matG16 <- t.graph(tree, allpositions),
matR256 <- Remi(tree256, allpositions256), matG256 <- t.graph(tree256, allpositions256), times=2)
Unit: milliseconds
expr min lq mean median uq max neval
matR8 <- Remi(tree, allpositions) 58.82173 58.82173 59.92444 59.92444 61.02714 61.02714 2
matG8 <- t.graph(tree, allpositions) 11.82064 11.82064 13.15275 13.15275 14.48486 14.48486 2
matR256 <- Remi(tree256, allpositions256) 114795.50865 114795.50865 114838.99490 114838.99490 114882.48114 114882.48114 2
matG256 <- t.graph(tree256, allpositions256) 379.54559 379.54559 379.76673 379.76673 379.98787 379.98787 2
Compared to the code you posted, the igraph results are only about 5 times faster for the 8-branch case but are over 300 times faster for 256 branches, so igraph seems to scale better with size. I've also benchmarked the igraph code for the 2048-branch case with the following results; again, times are in milliseconds.
microbenchmark(matG8 <- t.graph(tree, allpositions), matG64 <- t.graph(tree64, allpositions64),
matG256 <- t.graph(tree256, allpositions256), matG2k <- t.graph(tree2k, allpositions2k), times=2)
Unit: milliseconds
expr min lq mean median uq max neval
matG8 <- t.graph(tree, allpositions) 11.78072 11.78072 12.00599 12.00599 12.23126 12.23126 2
matG64 <- t.graph(tree64, allpositions64) 73.29006 73.29006 73.49409 73.49409 73.69812 73.69812 2
matG256 <- t.graph(tree256, allpositions256) 377.21756 377.21756 410.01268 410.01268 442.80780 442.80780 2
matG2k <- t.graph(tree2k, allpositions2k) 11311.05758 11311.05758 11362.93701 11362.93701 11414.81645 11414.81645 2
so the distance matrix for about 4000 positions is calculated in less than 12 seconds.
t.graph returns the distance matrix with rows and columns labeled as branch name_position on the branch, so for example
0_7 0_1 L_8 L_5 LL_8 LL_2 R_3 R_2 RL_2 RL_1 RLL_3 RLL_2 RLR_5 RR_6
L_5 18 24 3 0 15 9 8 7 26 25 39 38 41 30
shows the distances from L_5, the position 5 units along the L branch, to the other positions.
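Individual distances can then be pulled out of the returned matrix by those labels, e.g. (assuming the tree and allpositions objects from the question):
vert_dist <- t.graph(tree, allpositions)
vert_dist["L_5", "LL_8"]   # 15: (12 - 5) along L plus 8 along LL
vert_dist["L_5", "RR_6"]   # 30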
I don't know whether this will handle your largest cases, but it may be helpful for some. You also have storage-requirement problems for your largest cases.

Equivalent bitget function in R

Is there a function in R that performs the same operation as bitget in MATLAB/Octave:
bitget
From the bitget help page
Return the status of bit(s) n of unsigned integers in A.
The lowest significant bit is n = 1.
bitget (100, 8:-1:1)
⇒ 0 1 1 0 0 1 0 0
so if you want to get the bit values for an integer in R, you can do
intToBits(100)[8:1]
# [1] 00 01 01 00 00 01 00 00
That technically returns a raw vector, so if you want just a numeric vector, do
as.numeric(intToBits(100)[8:1])
# [1] 0 1 1 0 0 1 0 0
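If you want a drop-in analogue of the MATLAB call, you could wrap this in a small helper; a hypothetical bitget-style function, sketched for a single (32-bit) integer A:
bitget <- function(A, n) {
  # intToBits() returns the 32 bits of A as a raw vector, least significant bit first,
  # so indexing with n picks out the requested bit positions (n = 1 is the lowest bit)
  as.numeric(intToBits(A)[n])
}
bitget(100, 8:1)
# [1] 0 1 1 0 0 1 0 0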
