How to switch between radians and degrees in SAS - math

I'm looking for an easy way to run trig functions in SAS without having to manually convert the angle in each calculation. Below is what I am working with.
I am probably running SAS 9 (the SAS Studio student module), but this is a general SAS question.
I have manually created a variable, 'rad' in the 'calc' data step to deal with this but it adds a step of complexity that I would like to avoid.
I am asking whether there is a system setting, an alternate trig function, or something else that would change the calculation from:
bh_x = cos(rad*bh_a)*bh_l ;
to:
bh_x = cos(bh_a)*bh_l ;
so I don't have to manually convert my angle in degrees to radians for the trig function to work.
Thanks to anyone reading this and putting any mental effort to the solution!
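data spec ;
length
b2h_a 8
b2h_l 8
b2h_l_e 8
bike $ 8
name $ 16
;
/* the order of the numeric variables is inferred from the format list in the calc step */
input
bike $
name $
bh_a bh_l ht_a spcr st_h st_a st_l hb_r hb_a
;
datalines ;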
srcn (0,0) 0 0 67 0 0 0 0 0 0
srcn c 41 658 71.5 27 40 25 120 100 13
srcn ne_27_n13 41 658 71.5 27 40 27 127 100 13
srcn ne_15_0 41 658 71.5 15 40 27 127 100 0
srcn ne_5_0 41 658 71.5 5 40 27 127 100 0
srcn ne_2_n9 41 658 71.5 2 40 27 127 100 9
srcn ne_5_10 41 658 71.5 5 40 27 127 100 -10
srcn ne_10_rf10 41 658 71.5 10 40 27 127 20 -10
srcn max 41 658 90 250 0 0 250 0 0
run ;
data calc ;
set spec ;
pi=constant('pi') ;
rad=pi/180 ;
bh_x = cos(rad*bh_a)*bh_l ;
bh_y = sin(rad*bh_a)*bh_l ;
sr_x = (cos(rad*ht_a)*(spcr+st_h/2))*-1 ;
sr_y = sin(rad*ht_a)*(spcr+st_h/2);
st_x = cos(rad*(90-ht_a+st_a))*st_l ;
st_y = sin(rad*(90-ht_a+st_a))*st_l ;
hb_x = cos(rad*(90-hb_a))*hb_r*-1 ;
hb_y = sin(rad*(90-hb_a))*hb_r ;
hd_x = bh_x + sr_x + st_x + hb_x ;
hd_y = bh_y + sr_y + st_y + hb_y ;
if hd_x=0 then do ;
b2h_a=0 ;
b2h_l=0 ;
end ;
else do ;
b2h_a = atan(hd_y/hd_x)/rad ;
b2h_l = hd_y/sin(b2h_a*rad) ;
end ;
b2h_l_e = b2h_l/25.4 ;
drop pi rad ;
format
b2h_a 5.
b2h_l 5.
b2h_l_e 5.
bh_a 5.
bh_l 5.
ht_a 5.
spcr 5.
st_h 5.
st_a 5.
st_l 5.
hb_r 5.
hb_a 5.
bh_x 5.
bh_y 5.
sr_x 5.
sr_y 5.
st_x 5.
st_y 5.
hb_x 5.
hb_y 5.
hd_x 5.
hd_y 5.
b2h_a 5.
b2h_l 5.
b2h_l_e 5.1
;
run ;

There are no trig functions in SAS that accept DEGREE or GRADIAN arguments. You always need to convert from your data's angular measurement system to RADIAN.
You can write a macro to perform the conversion. Example:
%macro cosD(theta);
%* theta is angle in degrees;
%* emit data step source code that performs conversion from degrees to radians;
cos((&theta) * constant('pi') / 180)
%mend cosD;
In use:
data calc ;
set spec ;
bh_x = %cosD(bh_a) * bh_l ;
You could convert the angular data to radians during the step where input occurs and then not have to worry about it again.
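For readers who want to see the wrapper idea outside SAS, here is a minimal Python sketch of the same pattern: thin helper functions that take degrees and convert to radians internally. The names `cosd` and `sind` are hypothetical helpers chosen for this illustration, and the sample values are taken from the `c` row of the question's data.

```python
import math

def cosd(theta_deg):
    """Cosine of an angle given in degrees (hypothetical helper)."""
    return math.cos(math.radians(theta_deg))

def sind(theta_deg):
    """Sine of an angle given in degrees (hypothetical helper)."""
    return math.sin(math.radians(theta_deg))

# Sample values from the question's data: bh_a = 41 degrees, bh_l = 658
bh_a, bh_l = 41.0, 658.0
bh_x = cosd(bh_a) * bh_l
bh_y = sind(bh_a) * bh_l
print(round(bh_x, 1), round(bh_y, 1))
```

The conversion lives in one place, so the calculation lines read as if the trig functions accepted degrees directly.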


Add values to a new column based on math calculations on three columns R

I have a data frame as the structure below:
geneA geneB start end position
1 Ypc1 Malat1 34 59 36
2 Ypc1 Malat1 35 60 26
3 Ypc1 Malat1 34 59 60
I want to add a new column called distance based on conditional math operations on three columns: start, end and position. I used the if statements below, but I constantly get 0 for the distance column:
if (test$position < test$start) {
  test$distance <- test$start - test$position
} else if (test$position >= test$start & test$position <= test$end) {
  test$distance <- 0
} else if (test$position > test$end) {
  test$distance <- test$end - test$position
}
After the if statements, my output looks like this:
geneA geneB start end position distance
1 Ypc1 Malat1 34 59 36 0
2 Ypc1 Malat1 35 60 26 0
3 Ypc1 Malat1 34 59 60 0
The desired output should be:
geneA geneB start end position distance
1 Ypc1 Malat1 34 59 36 0
2 Ypc1 Malat1 35 60 26 9
3 Ypc1 Malat1 34 59 60 -1
How can I do this?
Thank you in advance.
When testing a condition along a vector, you should use ifelse().
I corrected your code below:
test <- data.frame(geneA = c("Ypc1"), geneB = c("Malat1"),
start = c(34, 35, 34),
end = c(59, 60, 59),
position = c(36, 26, 60))
test$distance <- ifelse(
  test$position < test$start,
  test$start - test$position,
  ifelse(
    test$position > test$end,
    test$end - test$position,
    0))
test
# geneA geneB start end position distance
# 1 Ypc1 Malat1 34 59 36 0
# 2 Ypc1 Malat1 35 60 26 9
# 3 Ypc1 Malat1 34 59 60 -1
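Your code doesn't work because if() only looks at the first element of the condition (and R 4.2 or later raises an error when the condition has more than one element). The first row lies between start and end, so the whole distance column is set to 0.
However, the nested ifelse() is not very readable; I'll look for a shorter way to compute this.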
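The difference between a single scalar test and a per-row test can also be sketched in Python, where the rule is written once for a single row and then applied to every row. The function name `signed_distance` is a hypothetical helper; the rows are the three from the question.

```python
def signed_distance(start, end, position):
    """Distance of position from the [start, end] interval, 0 if inside."""
    if position < start:
        return start - position
    if position > end:
        return end - position
    return 0

# (start, end, position) for the three rows in the question
rows = [(34, 59, 36), (35, 60, 26), (34, 59, 60)]
distances = [signed_distance(s, e, p) for s, e, p in rows]
print(distances)  # [0, 9, -1]
```

Each row gets its own evaluation of the condition, which is what ifelse() does for the R data frame.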
I also tried it using awk:
awk -F, '{
if ($3 < $1)
print $1, $2, $3, $1 - $3
else if ($3 > $2)
print $1, $2, $3, $2 - $3
else
print $1, $2, $3, 0
}'

How to create a window of arbitrary size in Kusto?

Using prev() function I can access previous rows individually.
| sort by Time asc
| extend mx = max_of(prev(Value, 1), prev(Value, 2), prev(Value, 3))
How can I define a window to aggregate over in a more generic way? Say I need the maximum of 100 values in previous rows. How do I write a query that does not require repeating prev() 100 times?
This can be achieved by combining scan and series_stats_dynamic().
scan is used to create an array of last x values, per record.
series_stats_dynamic() is used to get the max value of each array.
// Data sample generation. Not part of the solution
let mytable = materialize(range i from 1 to 15 step 1 | extend Time = ago(1d*rand()), Value = toint(rand(100)));
// Solution starts here
let window_size = 3; // >1
mytable
| order by Time asc
| scan declare (last_x_vals:dynamic)
step s1 : true => last_x_vals = array_concat(array_slice(s1.last_x_vals, -window_size + 1, -1), pack_array(Value));
| extend toint(series_stats_dynamic(last_x_vals).max)
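To make the windowing logic concrete, here is a small Python sketch of the same idea: keep the last `window_size` values (including the current one, as the scan step above does) and take their maximum for each record. The function name `rolling_max` and the sample values are assumptions made for this illustration only.

```python
from collections import deque

def rolling_max(values, window_size):
    """Max over the current value and up to window_size - 1 preceding values."""
    window = deque(maxlen=window_size)  # the oldest value drops out automatically
    result = []
    for v in values:
        window.append(v)
        result.append(max(window))
    return result

print(rolling_max([4, 9, 2, 7, 1, 3], 3))  # [4, 9, 9, 9, 7, 7]
```

Changing the window only requires changing one number, which is the property the question asks for.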

Updating a vector using a while loop in R

I have the following dataframe called nodes_df:
x y node_demand
1 2 62 3
2 80 25 14
3 36 88 1
4 57 23 14
5 33 17 19
6 76 43 2
7 77 85 14
8 94 6 6
10 59 72 6
. . . .
. . . .
. . . .
. . . .
45 60 84 8
46 35 100 5
47 38 2 1
48 9 9 7
50 1 58 2
I have to split this dataframe between hubs and clients.
hubs <- nodes_df[keep <- sample(1:total_nodes, requested_hubs, replace = FALSE),]
client_nodes <- nodes_df[-keep, ]
I need to randomly select 1 row at a time from client_nodes and calculate the total node_demand. I need to keep adding rows until the cumulative node_demand of the selected clients exceeds 120.
random_clients <- client_nodes[sample(nrow(client_nodes), size = 1, replace = FALSE),]
I created the following variables and while loop
node_demand <- c(0)
cumulative_demand <- cumsum(node_demand)
client_nodes <- nodes_df[-keep, ]
last_node <- cumsum(cumulative_demand) >= max_supply_capacity
condition = TRUE
while (condition) {
  random_clients <- client_nodes[sample(nrow(client_nodes), size = 1, replace = FALSE),]
  node_demand <- c(node_demand,random_clients$node_demand)
  cumulative_demand <- cumsum(node_demand)
  if(cumulative_demand <= max_supply_capacity){
    condition == FALSE
  }
}
The loop doesn't stop and I get the following return value:
[1] 0 14 20 26 27 35 49 50 68 79 97 100 101 104 109 118
[17] 119 137 150 164 178 185 188 191 208 209 219 222 227 246 252 272 (it carries on and on)
I am not sure why the loop doesn't stop despite the condition cumulative_demand <= max_supply_capacity being met.
Anybody could show me how to fix it?
I managed to fix it :).
I had to use ifelse() so that R could evaluate the condition on a vector; the plain if() statement wouldn't work in this case.
random_clients <- client_nodes[sample(nrow(client_nodes), size = 1, replace = FALSE),]
node_demand <- c(node_demand,random_clients$node_demand)
cumulative_demand <- cumsum(node_demand)
last_node <- (cumulative_demand <= max_supply_capacity)
ifelse(last_node == FALSE,break,next)
I had to use an ifelse() instead of the if() statement shown in the problem description.
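As a point of comparison, the stopping rule can be sketched in Python with a scalar running total, so the loop condition is a single true/false value. Note that the loop in the question compares with `condition == FALSE` rather than assigning, so its flag never changes. The sketch below is an assumption-laden illustration: the function name `pick_clients`, the sample demands, and the capacity of 50 are hypothetical, and it samples without replacement, which may differ from the original intent.

```python
import random

def pick_clients(demands, max_supply_capacity, seed=None):
    """Draw demands in random order until the running total exceeds capacity."""
    rng = random.Random(seed)
    remaining = list(demands)
    rng.shuffle(remaining)  # random order, i.e. sampling without replacement
    picked, total = [], 0
    for demand in remaining:
        picked.append(demand)
        total += demand
        if total > max_supply_capacity:  # scalar test, evaluated once per draw
            break
    return picked, total

# Hypothetical node_demand values and capacity
demands = [3, 14, 1, 14, 19, 2, 14, 6, 6, 8, 5, 1, 7, 2]
picked, total = pick_clients(demands, 50, seed=1)
print(picked, total)
```

The loop stops on the first draw that pushes the total past the capacity, so the total minus the last draw is still within the limit.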

Performance for calculating the distance between two positions on a tree?

Here is a tree. The first column is an identifier for the branch, where 0 is the trunk, L is the first branch on the left and R is the first branch on the right. LL is the branch on the extreme left after the second bifurcation, and so on. The variable length contains the length of each branch.
> tree
branch length
1 0 20
2 L 12
3 LL 19
4 R 19
5 RL 12
6 RLL 10
7 RLR 12
8 RR 17
tree = data.frame(branch = c("0","L", "LL", "R", "RL", "RLL", "RLR", "RR"), length=c(20,12,19,19,12,10,12,17))
tree$branch = as.character(tree$branch)
and here is a drawing of this tree
Here are two positions on this tree
posA = tree[4,]; posA$length = 12
posB = tree[6,]; posB$length = 3
The positions are given by the branch ID and the distance (variable length) to the origin of the branch (more info in edits).
I wrote the following messy distance function to calculate the shortest distance along the branches between any two points on the tree. The shortest distance along the branches can be understood as the minimal distance an ant would need to walk along the branches to reach one position from the other position.
distance = function(tree, pos1, pos2){
  if (identical(pos1$branch, pos2$branch)){Dist=pos1$length-pos2$length;return(Dist)}
  pos1path = strsplit(pos1$branch, "")[[1]]
  if (pos1path[1]!="0") {pos1path = c("0", pos1path)}
  pos2path = strsplit(pos2$branch, "")[[1]]
  if (pos2path[1]!="0") {pos2path = c("0", pos2path)}
  loop = 1:min(length(pos1path), length(pos2path))
  loop = loop[-which(loop == 1)]
  CommonTrace="included"; for (i in loop) {
    if (pos1path[i] != pos2path[i]) {
      CommonTrace = i-1; break
    }
  }
  if (CommonTrace == "included") { # one position lies on the path from the root to the other
    CommonTrace = min(length(pos1path), length(pos2path))
    if (length(pos1path) > length(pos2path)) {
      longerpos = pos1; shorterpos = pos2; longerpospath = pos1path
    } else {
      longerpos = pos2; shorterpos = pos1; longerpospath = pos2path
    }
    distToNode = 0
    if ((CommonTrace+1) != length(longerpospath)){
      for (i in (CommonTrace+1):(length(longerpospath)-1)){
        distToNode = distToNode + tree$length[tree$branch == paste0(longerpospath[2:i], collapse='')]
      }
    }
    Dist = distToNode + longerpos$length + (tree[tree$branch == shorterpos$branch,]$length-shorterpos$length)
    if (identical(shorterpos, pos1)){Dist=-Dist}
  } else { # if they are sisterbranch
    Dist = 0
    if((CommonTrace+1) != length(pos1path)){
      for (i in (CommonTrace+1):(length(pos1path)-1)){
        Dist = Dist + tree$length[tree$branch == paste0(pos1path[2:i], collapse='')]
      }
    }
    if((CommonTrace+1) != length(pos2path)){
      for (i in (CommonTrace+1):(length(pos2path)-1)){
        Dist = Dist + tree$length[tree$branch == paste(pos2path[2:i], collapse='')]
      }
    }
    Dist = Dist + pos1$length + pos2$length
  }
  return(Dist)
}
I think the algorithm works fine, but it is not very efficient. Note that the sign of the distance is important. This sign only makes sense when the two positions are not found on "sister branches". That is, the sign makes sense only if one of the two positions is found on the way between the root and the other position.
distance(tree, posA, posB) # -22
I then just loop through all positions of interest like that:
allpositions=rbind(tree, tree)
allpositions$length = c(1,5,8,2,2,3,5,6,7,8,2,3,1,2,5,6)
mat = matrix(-1, ncol=nrow(allpositions), nrow=nrow(allpositions))
for (i in 1:nrow(allpositions)){
  for (j in 1:nrow(allpositions)){
    posA = allpositions[i,]
    posB = allpositions[j,]
    mat[i,j] = distance(tree, posA, posB)
  }
}
mat
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
# 1 0 -24 -39 -21 -40 -53 -55 -44 -6 -27 -33 -22 -39 -52 -55 -44
# 2 24 0 -15 7 26 39 41 30 18 -3 -9 8 25 38 41 30
# 3 39 15 0 22 41 54 56 45 33 12 6 23 40 53 56 45
# 4 21 7 22 0 -19 -32 -34 -23 15 10 16 -1 -18 -31 -34 -23
# 5 40 26 41 19 0 -13 -15 8 34 29 35 18 1 -12 -15 8
# 6 53 39 54 32 13 0 8 21 47 42 48 31 14 1 8 21
# 7 55 41 56 34 15 8 0 23 49 44 50 33 16 7 0 23
# 8 44 30 45 23 8 21 23 0 38 33 39 22 7 20 23 0
# 9 6 -18 -33 -15 -34 -47 -49 -38 0 -21 -27 -16 -33 -46 -49 -38
# 10 27 3 -12 10 29 42 44 33 21 0 -6 11 28 41 44 33
# 11 33 9 -6 16 35 48 50 39 27 6 0 17 34 47 50 39
# 12 22 8 23 1 -18 -31 -33 -22 16 11 17 0 -17 -30 -33 -22
# 13 39 25 40 18 -1 -14 -16 7 33 28 34 17 0 -13 -16 7
# 14 52 38 53 31 12 -1 7 20 46 41 47 30 13 0 7 20
# 15 55 41 56 34 15 8 0 23 49 44 50 33 16 7 0 23
# 16 44 30 45 23 8 21 23 0 38 33 39 22 7 20 23 0
As an example, let's consider the first and the third positions in the object allpositions. The distance between them is 39 (and -39) because an ant would need to walk 19 units on branch 0 and then walk 12 units on branch L and finally the ant would need to walk 8 units on branch LL. 19 + 12 + 8 = 39
The issue is that I have about 20 very big trees with about 50000 positions and I would like to calculate the distance between any two positions. There are therefore 20 * 50000^2 distances to compute. It takes forever! Can you help me to improve my code?
Please let me know if anything is still unclear
tree is a description of a tree. The tree has branches of a certain length. The name of the branches (variable: branch) gives indication about the relationship between the branches. The branch RL is a "parent branch" of the two branches RLL and RLR, where R and L stand for right and left.
allpositions is a data.frame in which each line represents one independent position on the tree. You can think of the position of a squirrel. The position is defined by two pieces of information: 1) the branch (variable: branch) on which the squirrel is standing and 2) the distance between the beginning of the branch and the position of the squirrel (variable: length).
Three examples
Consider a first squirrel that is at position (variable: length) 8 on the branch RL (whose length is 12) and a second squirrel that is at position (variable: length) 2 on the branch RLL or RLR. The distance between the two squirrels is 12 - 8 + 2 = 6 (or -6).
Consider a first squirrel that is at position (variable: length) 8 on the branch RL and a second squirrel that is at position (variable: length) 2 on the branch RR. The distance between the two squirrels is 8 + 2 = 10 (or -10).
Consider a first squirrel that is at position (variable: length) 8 on the branch R (whose length is 19) and a second squirrel that is at position (variable: length) 2 on the branch RLL. Knowing that the branch RL has a length of 12, the distance between the two squirrels is 19 - 8 + 12 + 2 = 25 (or -25).
The code below uses the igraph package to compute the distances between positions in tree and seems noticeably faster than the code you posted in your question. The approach is to create graph vertices at branch intersections and at the positions along tree branches specified in allpositions. Graph edges are the branch segments between these vertices. It uses igraph to build a graph for the tree and allpositions and then finds the distances between the vertices corresponding to the allpositions data.
library(igraph)
t.graph <- function(tree, positions) {
  # Assign vertex name to tree branch intersections
  n_label <- nchar(tree$branch)
  tree$high_vert <- tree$branch
  tree$low_vert <- tree$branch
  tree$brnch_type <- "tree"
  for( i in 1:nrow(tree) ) {
    tree$low_vert[i] <- if(n_label[i] > 1) substr(tree$branch[i], 1, n_label[i]-1)
    else { if(tree$branch[i] %in% c("R","L")) "0"
      else "root" }
  }
  # combine position data with tree data
  positions$brnch_type <- "position"
  temp <- merge(positions, tree, by = "branch")
  positions <- temp[, c("branch","length.x","high_vert","low_vert","brnch_type.x")]
  positions$high_vert <- paste(positions$branch, positions$length.x, sep="_")
  colnames(positions) <- c("branch","length","high_vert","low_vert","brnch_type")
  tree <- rbind(tree, positions)
  # use positions to segment tree branches
  tree_brnch <- split(tree, tree$branch)
  tree <- data.frame( branch=NA_character_, length = NA_real_, high_vert = NA_character_,
                      low_vert = NA_character_, brnch_type =NA_character_, seg_len= NA_real_)
  for( ib in 1: length(tree_brnch)) {
    brnch_seg <- tree_brnch[[ib]][order(tree_brnch[[ib]]$length, decreasing=TRUE), ]
    n_seg <- nrow(brnch_seg)
    brnch_seg$seg_len <- brnch_seg$length
    for( is in 1:(n_seg-1) ) {
      brnch_seg$seg_len[is] <- brnch_seg$length[is] - brnch_seg$length[is+1]
      brnch_seg$low_vert[is] <- brnch_seg$high_vert[is+1]
    }
    tree <- rbind(tree, brnch_seg)
  }
  tree <- tree[-1,]
  # Create graph of tree and positions
  # (the constructor name was lost in the source text; graph_from_data_frame(),
  # called graph.data.frame() in older igraph releases, is assumed here)
  tree_graph <- graph_from_data_frame(tree[,c("low_vert","high_vert")])
  E(tree_graph)$label <- tree$high_vert
  E(tree_graph)$brnch_type <- tree$brnch_type
  E(tree_graph)$weight <- tree$seg_len
  # calculate shortest distances between position vertices
  position_verts <- V(tree_graph)[grep("_", V(tree_graph)$name)]
  vert_dist <- shortest.paths(tree_graph, v=position_verts, to=position_verts, mode="all")
  return(dist_mat= vert_dist )
}
I've benchmarked the igraph code (the t.graph function) against the code posted in your question by wrapping your loop over the allpositions data, which calls your distance function, in a function named Remi. Sample trees were created as extensions of your tree and allpositions data for trees of 64, 256, and 2048 branches, with allpositions twice these sizes. Comparisons of execution times are shown below. Notice that times are in milliseconds.
microbenchmark(matR8 <- Remi(tree, allpositions), matG8 <- t.graph(tree, allpositions),
matR256 <- Remi(tree256, allpositions256), matG256 <- t.graph(tree256, allpositions256), times=2)
Unit: milliseconds
expr min lq mean median uq max neval
matR8 <- Remi(tree, allpositions) 58.82173 58.82173 59.92444 59.92444 61.02714 61.02714 2
matG8 <- t.graph(tree, allpositions) 11.82064 11.82064 13.15275 13.15275 14.48486 14.48486 2
matR256 <- Remi(tree256, allpositions256) 114795.50865 114795.50865 114838.99490 114838.99490 114882.48114 114882.48114 2
matG256 <- t.graph(tree256, allpositions256) 379.54559 379.54559 379.76673 379.76673 379.98787 379.98787 2
Compared to the code you posted, the igraph results are only about 5 times faster for the 8 branch case but are over 300 times faster for 256 branches so igraph seems to scale better with size. I've also benchmarked the igraph code for the 2048 branch case with the following results. Again times are in milliseconds.
microbenchmark(matG8 <- t.graph(tree, allpositions), matG64 <- t.graph(tree64, allpositions64),
matG256 <- t.graph(tree256, allpositions256), matG2k <- t.graph(tree2k, allpositions2k), times=2)
Unit: milliseconds
expr min lq mean median uq max neval
matG8 <- t.graph(tree, allpositions) 11.78072 11.78072 12.00599 12.00599 12.23126 12.23126 2
matG64 <- t.graph(tree64, allpositions64) 73.29006 73.29006 73.49409 73.49409 73.69812 73.69812 2
matG256 <- t.graph(tree256, allpositions256) 377.21756 377.21756 410.01268 410.01268 442.80780 442.80780 2
matG2k <- t.graph(tree2k, allpositions2k) 11311.05758 11311.05758 11362.93701 11362.93701 11414.81645 11414.81645 2
so the distance matrix for about 4000 positions is calculated in less than 12 seconds.
t.graph returns the distance matrix, whose rows and columns are labeled by branch name and position on the branch, joined by an underscore. For example, the row
0_7 0_1 L_8 L_5 LL_8 LL_2 R_3 R_2 RL_2 RL_1 RLL_3 RLL_2 RLR_5 RR_6
L_5 18 24 3 0 15 9 8 7 26 25 39 38 41 30
shows the distances from L_5, the position 5 units along the L branch, to the other positions.
I don't know that this will handle your largest cases, but it may be helpful for some. You also have problems with the storage requirements for your largest cases.
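The three worked examples in the question suggest a way to get each distance without building a graph: compute each position's distance from the root, find the branches the two paths share, and subtract the shared part. The Python sketch below illustrates that idea under stated assumptions: it returns unsigned distances only, branch names encode ancestry as in the question, and the function names (`branch_path`, `root_distance`, `tree_distance`) are hypothetical. It is not the method used in either post above.

```python
def branch_path(branch):
    """Branches walked from the trunk '0' down to `branch`, inclusive."""
    if branch == "0":
        return ["0"]
    return ["0"] + [branch[:i] for i in range(1, len(branch) + 1)]

def root_distance(lengths, branch, offset):
    """Distance from the base of the trunk to a position on `branch`."""
    ancestors = branch_path(branch)[:-1]
    return sum(lengths[b] for b in ancestors) + offset

def tree_distance(lengths, pos1, pos2):
    """Unsigned distance along the branches between two (branch, offset) positions."""
    (b1, o1), (b2, o2) = pos1, pos2
    p1, p2 = branch_path(b1), branch_path(b2)
    d1 = root_distance(lengths, b1, o1)
    d2 = root_distance(lengths, b2, o2)
    k = 0  # number of leading branches shared by both paths
    while k < min(len(p1), len(p2)) and p1[k] == p2[k]:
        k += 1
    if k == len(p1) or k == len(p2):
        # same branch, or one position is on the way from the root to the other
        return abs(d1 - d2)
    fork = sum(lengths[b] for b in p1[:k])  # root distance of the node where the paths split
    return d1 + d2 - 2 * fork

lengths = {"0": 20, "L": 12, "LL": 19, "R": 19,
           "RL": 12, "RLL": 10, "RLR": 12, "RR": 17}
print(tree_distance(lengths, ("RL", 8), ("RLL", 2)))  # 6
print(tree_distance(lengths, ("RL", 8), ("RR", 2)))   # 10
print(tree_distance(lengths, ("R", 8), ("RLL", 2)))   # 25
```

Each pair costs time proportional to the depth of the branches, and the root distances could be computed once per position and reused across all pairs.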

Does Gray code exist for other bases than two?

Just a matter of curiosity, is the Gray code defined for bases other than base two?
I tried to count in base 3, writing consecutive values paying attention to change only one trit at a time. I've been able to enumerate all the values up to 26 (3**3-1) and it seems to work.
000 122 200
001 121 201
002 120 202
012 110 212
011 111 211
010 112 210
020 102 220
021 101 221
022 100 222
The only issue I can see, is that all three trits change when looping back to zero. But this is only true for odd bases. When using even bases looping back to zero would only change a single digit, as in binary.
I even guess it can be extended to other bases, even decimal. This could lead to another ordering when counting in base ten ... :-)
0 1 2 3 4 5 6 7 8 9 19 18 17 16 15 14 13 12 11 10
20 21 22 23 24 25 26 27 28 29 39 38 37 36 35 34 33 32 31 30
Now the question, has anyone ever heard of it? Is there an application for it? Or it is just mathematical frenzy?
Yes. Have a look at the Gray code article on Wikipedia. It has a section on n-ary Gray code.
There are many specialized types of Gray codes other than the binary-reflected Gray code. One such type of Gray code is the n-ary Gray code, also known as a non-Boolean Gray code. As the name implies, this type of Gray code uses non-Boolean values in its encodings.
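The base-3 and base-10 listings in the question follow a reflected pattern: each time a new leading digit is added, the shorter sequence is traversed forwards for even digits and backwards for odd digits. A minimal Python sketch of that construction follows; the function name `reflected_gray` is a hypothetical helper, and digits are rendered with `str()`, so it assumes bases up to 10.

```python
def reflected_gray(base, digits):
    """List all base**digits codes so that neighbours differ in exactly one digit."""
    codes = [""]
    for _ in range(digits):
        extended = []
        for d in range(base):
            # walk the shorter codes forwards for even d, backwards for odd d
            block = codes if d % 2 == 0 else codes[::-1]
            extended.extend(str(d) + c for c in block)
        codes = extended
    return codes

print(" ".join(reflected_gray(3, 2)))  # 00 01 02 12 11 10 20 21 22
```

For an even base the last code differs from the first in a single digit, so the sequence is cyclic; for an odd base such as 3 it ends at 22...2, which matches the observation in the question.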
Just for completeness (as aioobe already gave the right answer), here's a C++ program that lists all 168 two-digit Gray codes for base 3 that start with 00 and marks the 96 cyclic ones. Using the algorithm from Wikipedia, you can easily construct longer Gray codes for even bases. For odd bases, you can adapt the program to generate the corresponding Gray codes.
The first cyclic 2-digit gray code found with this program is this one:
00 01 02 12 10 11 21 22 20
After changing the program, the first cyclic 3-digit gray found is this:
000 001 002 012 010 011 021 020 022 122 102 100 101 111
110 112 212 202 222 220 120 121 221 201 211 210 200
#include <stdio.h>
#include <stdlib.h>

// Number of distinct two-trit codes (3 * 3)
#define MAXN 9

int gray_code_count, cyclic_count;

// True if exactly one of the two trits differs between the codes
bool changes_one_trit(int code1, int code2) {
    int trits_changed = 0;
    if ((code1 / 3) != (code2 / 3)) trits_changed++;
    if ((code1 % 3) != (code2 % 3)) trits_changed++;
    return (trits_changed == 1);
}

int generate_gray_code(int* code, int depth) {
    bool already_used;
    if (depth == MAXN) {
        // A complete sequence was found: print it
        for (int i = 0; i < MAXN; i++) {
            printf("%i%i ", code[i]/3, code[i]%3);
        }
        // check if cyclic
        if (changes_one_trit(code[MAXN-1], 0)) {
            printf("cyclic");
            cyclic_count++;
        }
        printf("\n");
        gray_code_count++;
        return 1;
    }
    // Iterate through the codes that only change one trit
    for (int i = 0; i < MAXN; i++) {
        // Check if it was used already
        already_used = false;
        for (int j = 0; j < depth; j++) {
            if (code[j] == i) already_used = true;
        }
        if (already_used) continue;
        if (changes_one_trit(code[depth-1], i)) {
            code[depth] = i;
            generate_gray_code(code, depth + 1);
        }
    }
    return 0;
}

int main() {
    int* code = (int*)malloc(MAXN * sizeof(int));
    code[0] = 0;
    gray_code_count = 0;
    cyclic_count = 0;
    generate_gray_code(code, 1);
    printf("%i gray codes found, %i of them are cyclic\n", gray_code_count, cyclic_count);
    free(code);
    return 0;
}
