Ada: use of aspect Default_Component_Value on an array, with pragma Pack

I'm using GNAT GCC 11.1 and was wondering if someone could explain this behavior to me.
This is my code:
pragma Ada_2012;

with Ada.Text_IO;           use Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Hello is

   type bool_arr is array (Integer range 1 .. Integer'Size) of Boolean with
     Default_Component_Value => True;
   pragma Pack (bool_arr);

   test : bool_arr;

   procedure P is
      idx   : String := "index ";
      strg  : Unbounded_String := To_Unbounded_String (idx);
      strg2 : Unbounded_String;
   begin
      for I in test'Range loop
         Append (strg, I'Image);
         Append (strg2, " " & test (I)'Image);
      end loop;
      Put_Line (To_String (strg));
      Put_Line (To_String (strg2));
   end P;

begin
   Put_Line ("Hello, world!");  --  produces the first line of the output below
   P;
end Hello;
As output I get this:
Hello, world! index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29 30 31 32 TRUE TRUE FALSE FALSE FALSE
TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE
Which is not what I wanted, given that I set Default_Component_Value => True.
If I comment out the pragma Pack (bool_arr), then I get this:
Hello, world! index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Thanks for the help.

Looks like a bug in GNAT, I'm sorry to say.
If I explicitly request the Default_Component_Value by declaring
test : bool_arr := (others => <>);
all components are True, as expected.

Related

Weird behavior of ggplot2::geom_text()

I want ggplot() to label observations with residuals higher than 1.5 times the standard error of the regression. The data are these (from Frank 1984):
d <- data.frame(x=c(43,32,32,30,26,25,23,22,22,21,20,20,19,19,19,18,18,17,17,16,16,16,15,13,12,12,10,10,9,7,6,3), y=c(63.0,54.3,51.0,39.0,52.0,55.0,41.2,47.7,44.5,43.0,46.8,42.4,56.5,55.0,53.0,55.0,45.0,50.7,37.5,61.0,48.1,30.0,51.5,40.6,51.3,50.3,62.4,39.3,43.2,40.4,37.7,27.7))
The model is simple:
m <- lm(y~x,data=d)
Then the ggplot() is:
ggplot(d, aes(x=x, y=y)) + geom_point() + geom_text(label=ifelse(abs(resid(m))>(1.5*sigma(m)),rownames(d),""),
nudge_x = 1, nudge_y = 0, check_overlap = T, color="blue")
giving a plot that is missing a label for the observation in the top left corner (obs #27). Compare:
abs(resid(m))>(1.5*sigma(m))
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE
which indicates correctly that 27 satisfies the condition. Why is it not labelled?
Your labels in geom_text() aren't inside an aes() like they should be, although I'm unsure why you still got partially working labels without it.
I'm including some intermediate steps to work through this more slowly; for me, that helps with debugging and investigating how things work. Feel free to condense.
The assignments of d and m are identical to the OP's. With the extra steps:
library(tidyverse)
d2 <- d %>%
  mutate(row = row_number()) %>%
  mutate(abs_resid = abs(resid(m)), sig = sigma(m)) %>%
  mutate(is_outlier = abs_resid > 1.5 * sig) %>%
  mutate(label = ifelse(is_outlier, row, ""))
head(d2)
#> x y row abs_resid sig is_outlier label
#> 1 43 63.0 1 4.8398378 7.934235 FALSE
#> 2 32 54.3 2 0.9561793 7.934235 FALSE
#> 3 32 51.0 3 2.3438207 7.934235 FALSE
#> 4 30 39.0 4 13.4681223 7.934235 TRUE 4
#> 5 26 52.0 5 1.2832746 7.934235 FALSE
#> 6 25 55.0 6 4.7211239 7.934235 FALSE
ggplot(d2, aes(x = x, y = y)) +
  geom_point() +
  geom_text(aes(label = label), nudge_x = 1, color = "blue")
Created on 2018-07-31 by the reprex package (v0.2.0).
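For completeness, here is the same fix collapsed back into the OP's original one-liner (a sketch, reusing the d and m defined above); the only real change is that the ifelse() moves inside aes(label = ...):
# Sketch: original call with the label mapping moved inside aes()
ggplot(d, aes(x = x, y = y)) +
  geom_point() +
  geom_text(aes(label = ifelse(abs(resid(m)) > 1.5 * sigma(m), rownames(d), "")),
            nudge_x = 1, color = "blue")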

Creating new columns for consecutive TRUEs in R

I want to create new columns that put TRUE if the number of consecutive wins is two, three, etc. So I would like rows 3, 6, 7, 8 to be TRUE in a new column called "twoconswins" and rows 7, 8 to be TRUE in a new column called "threeconswins", and so on. What is the best way of doing this?
     id       date team teamscore opponent opponentscore home   win
9     9 2005-10-05  DET         5      STL             1    1  TRUE
38   38 2005-10-09  DET         6      CAL             3    1  TRUE
48   48 2005-10-10  DET         2      VAN             4    1 FALSE
88   88 2005-10-17  DET         3      SJS             2    1  TRUE
110 110 2005-10-21  DET         3      ANA             2    1  TRUE
148 148 2005-10-27  DET         5      CHI             2    1  TRUE
179 179 2005-11-01  DET         4      CHI             1    1  TRUE
194 194 2005-11-03  DET         3      EDM             4    1 FALSE
212 212 2005-11-05  DET         1      PHO             4    1 FALSE
I assumed row 1 should be the header, so that actually rows 2, 5, 6 and 7 should evaluate to TRUE for "twoconswins", and rows 6 and 7 for "threeconswins".
You could do:
library(data.table)
df$twoconswins <- (df$win & shift(df$win, 1, NA)) == TRUE
df$threeconswins <- (df$win & shift(df$win, 1, NA) & shift(df$win, 2, NA)) == TRUE
I am thinking this could be more vectorized though, especially if runs of 50 consecutive wins were possible and you'd like to create columns for those too.
If you'd like to make the new columns automatically, in case, say, 500 consecutive wins occur, you could do this:
df <- read.table(text =
'id date team teamscore opponent opponentscore home win
9 9 2005-10-05 DET 5 STL 1 1 TRUE
38 38 2005-10-09 DET 6 CAL 3 1 TRUE
48 48 2005-10-10 DET 2 VAN 4 1 FALSE
88 88 2005-10-17 DET 3 SJS 2 1 TRUE
110 110 2005-10-21 DET 3 ANA 2 1 TRUE
148 148 2005-10-27 DET 5 CHI 2 1 TRUE
179 179 2005-11-01 DET 4 CHI 1 1 TRUE
194 194 2005-11-03 DET 3 EDM 4 1 FALSE
212 212 2005-11-05 DET 1 PHO 4 1 FALSE',
header = TRUE)
rles <- data.frame(values  = c(rle(df$win)$values),
                   lengths = c(rle(df$win)$lengths))
maxconwins <- max(rles[rles$values == TRUE, ])
for (x in 1:maxconwins) {
  x <- seq(1, x)
  partialstring <- paste("shift(df$win,", x, ",NA)", collapse = " & ")
  fullstring <- paste0("df$nr", max(x), "conswins <- (", partialstring, ") == TRUE")
  eval(parse(text = fullstring))
}
df[1:maxconwins,9:12][upper.tri(df[1:maxconwins,9:12], diag = TRUE)] <- NA
> df[,8:12]
win nr1conswins nr2conswins nr3conswins nr4conswins
9 TRUE NA NA NA NA
38 TRUE TRUE NA NA NA
48 FALSE TRUE TRUE NA NA
88 TRUE FALSE FALSE FALSE NA
110 TRUE TRUE FALSE FALSE FALSE
148 TRUE TRUE TRUE FALSE FALSE
179 TRUE TRUE TRUE TRUE FALSE
194 FALSE TRUE TRUE TRUE TRUE
212 FALSE FALSE FALSE FALSE FALSE
BTW, I only added the last line because (FALSE & TRUE & TRUE & NA) == TRUE evaluates to FALSE, while you probably want those cells to be NA. I made sure of that here by setting the upper triangle of the square submatrix to NA afterwards. For readability I hard-coded the column numbers 9 to 12 here, but you could derive them with a function as well if you'd like.
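As a quick base-R illustration of why that expression comes out FALSE rather than NA:
FALSE & NA                            # FALSE -- FALSE dominates &, regardless of the NA
TRUE & NA                             # NA    -- unknown stays unknown
(FALSE & TRUE & TRUE & NA) == TRUE    # FALSE, not NA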
UPDATE:
When using the Reduce() function as suggested by Frank, you could do this for loop instead of the above:
for (x in 1:maxconwins) {
  x <- seq(1, x)
  eval(parse(text = paste0("df$nr", max(x), "conswins <- (Reduce(`&`, shift(df$win, 1:", max(x), "))) == TRUE")))
}
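In case the Reduce()/shift() combination is unfamiliar: data.table's shift(x, 1:n) returns a list of x lagged by 1 through n, and Reduce(`&`, ...) ANDs those vectors together element-wise. A tiny standalone sketch:
library(data.table)
lags <- shift(c(TRUE, TRUE, FALSE, TRUE), 1:2)   # list of the lag-1 and lag-2 versions
Reduce(`&`, lags)                                # element-wise AND across all lags
# [1]    NA    NA  TRUE FALSE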

Match objects with same IDs except for one

I have a dataframe with the following format:
df <- data.frame(DS.ID = c(123, 214, 543, 325, 123, 214),
                 P.ID  = c("AAC", "JGK", "DIF", "ADL", "AAE", "JGK"),
                 OP.ID = c("xxab", "xxac", "xxad", "xxae", "xxab", "xxac"))
DS.ID P.ID OP.ID
1 123 AAC xxab
2 214 JGK xxac
3 543 DIF xxad
4 325 ADL xxae
5 123 AAE xxab
6 214 JGK xxac
I'm trying to find instances where DS.ID is equal to another DS.ID and OP.ID is equal to another OP.ID, but the P.IDs are not equal. I know how to do it with a loop, but I'd rather use a quicker method that returns the DS.IDs/information of those that do not match, either as a logical vector in another column or through the DS.IDs.
Using duplicated:
df$match <- duplicated(df[c("DS.ID", "OP.ID")], fromLast = TRUE) |
            duplicated(df[c("DS.ID", "OP.ID")])
# df
#   DS.ID P.ID OP.ID match
# 1   123  AAC  xxab  TRUE
# 2   214  JGK  xxac  TRUE
# 3   543  DIF  xxad FALSE
# 4   325  ADL  xxae FALSE
# 5   123  AAE  xxab  TRUE
# 6   214  JGK  xxac  TRUE
EDIT after OP clarification
dupli.2 <- duplicated(df[c("DS.ID", "OP.ID")], fromLast = TRUE) | duplicated(df[c("DS.ID", "OP.ID")])
dupli.all <- duplicated(df) | duplicated(df,fromLast=TRUE)
as.logical(dupli.2 - dupli.all)
[1] TRUE FALSE FALSE FALSE TRUE FALSE
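And if you want the offending rows themselves rather than the logical vector, a short follow-up sketch:
mismatch <- as.logical(dupli.2 - dupli.all)
df[mismatch, c("DS.ID", "P.ID", "OP.ID")]
#   DS.ID P.ID OP.ID
# 1   123  AAC  xxab
# 5   123  AAE  xxab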

Take difference between two vectors

I have two vectors:
a<-1:100
b<-sample(1:100,80)
I would like to display those elements of a that are not included in b.
I have tried subset(a, a != b) and a[a != b] but these didn't work. What am I doing wrong?
Because of vectorization in R, using == wouldn't really work for your example. What you should use is setdiff or is.element (the latter of which is equivalent to %in%).
set.seed(1)
a<-1:100
b<-sample(1:100,80)
a[!is.element(a, b)]
# [1] 8 15 33 48 52 54 56 66 68 72 74 80 90 91 92 93 94 96 98 100
setdiff(a, b)
# [1] 8 15 33 48 52 54 56 66 68 72 74 80 90 91 92 93 94 96 98 100
If you look at how == works when you are comparing two vectors, it compares them one pair at a time, recycling the shorter vector whenever necessary. In the first example, x == y, it seems to work correctly, but look at the second example, x == z. This basically checks whether x[1] == z[1], x[2] == z[2], and so on, so immediately there is a misalignment of the sets.
x <- 1:10
y <- 1:5
z <- c(1, 3, 5, 7, 9)
x == y
# [1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
x == z
# [1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
x %in% z
# [1] TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
In R lingo, using %in% to identify the common elements and then negating that with ! is very common, but I find setdiff to be (at least linguistically) more logical.
A useful command is %in%. For every element of vector a, it returns TRUE or FALSE according to whether that element is in vector b. You can then negate this using !. So:
a[!(a %in% b)]

Identifying sequences of repeated numbers in R

I have a long time series where I need to identify and flag sequences of repeated values. Here's some data:
DATETIME WDIR
1 40360.04 22
2 40360.08 23
3 40360.12 126
4 40360.17 126
5 40360.21 126
6 40360.25 126
7 40360.29 25
8 40360.33 26
9 40360.38 132
10 40360.42 132
11 40360.46 132
12 40360.50 30
13 40360.54 132
14 40360.58 35
So if I need to note when a value is repeated three or more times, I have a sequence of four '126' and a sequence of three '132' that need to be flagged.
I'm very new to R. I expect I use cbind to create a new column in this array with a "T" in the corresponding rows, but how to populate the column correctly is a mystery. Any pointers please? Thanks a bunch.
As Ramnath says, you can use rle.
rle(dat$WDIR)
Run Length Encoding
lengths: int [1:9] 1 1 4 1 1 3 1 1 1
values : int [1:9] 22 23 126 25 26 132 30 132 35
rle returns an object with two components, lengths and values. We can use the lengths piece to build a new column that identifies which values are repeated three or more times.
tmp <- rle(dat$WDIR)
rep(tmp$lengths >= 3,times = tmp$lengths)
[1] FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE
This will be our new column.
newCol <- rep(tmp$lengths >= 3, times = tmp$lengths)
cbind(dat,newCol)
DATETIME WDIR newCol
1 40360.04 22 FALSE
2 40360.08 23 FALSE
3 40360.12 126 TRUE
4 40360.17 126 TRUE
5 40360.21 126 TRUE
6 40360.25 126 TRUE
7 40360.29 25 FALSE
8 40360.33 26 FALSE
9 40360.38 132 TRUE
10 40360.42 132 TRUE
11 40360.46 132 TRUE
12 40360.50 30 FALSE
13 40360.54 132 FALSE
14 40360.58 35 FALSE
Use rle to do the job!! It is an amazing function that calculates the number of successive repetitions of numbers in a sequence. Here is some example code showing how you can use rle to flag the miscreants in your data. It returns all rows from the data frame whose WDIR value occurs in a run of 3 or more successive repetitions.
runs = rle(mydf$WDIR)
subset(mydf, WDIR %in% runs$values[runs$lengths >= 3])
Note that %in% matches values rather than positions, so the lone 132 in row 13 comes back as well; if that matters, build the flag from the run lengths themselves, as in the rep() approach above.
Two options for you.
Assuming the data is loaded:
dat <- read.table(textConnection("
DATETIME WDIR
40360.04 22
40360.08 23
40360.12 126
40360.17 126
40360.21 126
40360.25 126
40360.29 25
40360.33 26
40360.38 132
40360.42 132
40360.46 132
40360.50 30
40360.54 132
40360.58 35"), header=T)
Option 1: Sorting
dat <- dat[order(dat$WDIR),] # needed for the 'repeats' to be pasted into the correct rows in next step
dat$count <- rep(table(dat$WDIR),table(dat$WDIR))
dat$more4 <- ifelse(dat$count < 4, F, T)
dat <- dat[order(dat$DATETIME),] # sort back to original order
dat
Option 2: Oneliner
dat$more4 <- ifelse(dat$WDIR %in% names(which(table(dat$WDIR)>3)),T,F)
dat
Since you're a new user, I thought option 1 might be an easier step-by-step approach, although the rep(table(), table()) part may not be intuitive at first.
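To make that last idiom less mysterious, here is a toy illustration (assuming the vector is sorted, as dat is after the order() step):
x <- sort(c(5, 5, 7, 5))      # 5 5 5 7
table(x)                      # counts per value: "5" occurs 3 times, "7" once
rep(table(x), table(x))       # 3 3 3 1 -- each element gets the count of its own value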
