I am trying to conditionally replace some fields in a dataframe; however, my code is finding only about 25% of the actual instances present. I've searched through the other conditional search questions, but didn't find anything matching my problem -- I apologize in advance if I missed one.
Specifically, I am trying to replace all numbers 1 to 9 in dta$day, with a to i.
Here are the first 100 items in that vector: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 1 2 3 4 5 6 7 8 9
When I conditionally search for values 1 to 9, using:
dta$day == c("1","2","3","4","5","6","7","8","9")
It states that only the first and last set in that grouping match my condition as below (I've bolded ~what should be TRUE for your reference):
[1] **TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE** FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE **FALSE**
[33] **FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE** FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE **FALSE FALSE**
[65] **FALSE FALSE FALSE FALSE FALSE FALSE FALSE** FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE **TRUE TRUE TRUE TRUE TRUE TRUE**
[97] **TRUE TRUE TRUE**
The problem must be in that first step, but to show you the result, only the first and last set in that first 100 in my vector are appropriately replaced after applying this code:
dta[dta$day == c("1","2","3","4","5","6","7","8","9"), 1] <- c("a", "b", "c", "d", "e", "f", "g", "h", "i")
[1] **"a" "b" "c" "d" "e" "f" "g" "h" "i"** "10" "11" "12" "13" "14" "15" "16" "17" "18" "19"
[20] "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" **"1" "2" "3" "4" "5" "6" "7"**
[39] "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26"
[58] "27" "28" **"1" "2" "3" "4" "5" "6" "7" "8" "9" "10"** "11" "12" "13" "14" "15" "16" "17"
[77] "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" **"a" "b" "c" "d" "e"
[96] "f" "g" "h" "i"**
If useful, here is the initial state of that vector:
is.numeric(dta$day)
[1] TRUE
summary(dta$day)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 8.00 16.00 15.73 23.00 31.00
I am reproducing the data frame here:
day <- c(1:31,1:28,1:31,1:30)
month <- c(rep_len(1,31),rep_len(2,28),rep_len(3,31),rep_len(4,30))
temp <- rnorm(length(month),10,10)
dta=as.data.frame(cbind(day,month,temp))
And actually, although I am able to reproduce the problem with this toy example, I get a warning that I do not get with my actual data (not reproduced here because it is very large): "longer object length is not a multiple of shorter object length".
I would love some help, and if I haven't provided something or haven't done so in the format needed, please kindly let me know!
It looks like you're checking equivalence against the whole vector (which R recycles element by element), rather than against its components. Try %in% instead, like this:
dta[dta$day %in% c("1","2","3","4","5","6","7","8","9"), ]
Use %in% rather than == and then index your data frame/vector as below to replace 1:9 with a:i as wanted:
y <- c(1:9)
dta$day[dta$day %in% y] <- letters[1:length(y)]
Read more about the different behaviours of these operators here:
Difference between the == and %in% operators in R
And
Difference between `%in%` and `==`
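To see why == finds only some of the matches, here is a small sketch using the toy data from the question. The names eq, idx and new_day are made up for illustration, and the lookup letters[...] assumes the matched values are whole numbers from 1 to 9.

```r
# Toy data from the question: day-of-month for four months
day <- c(1:31, 1:28, 1:31, 1:30)
dta <- data.frame(day = day)

# == recycles 1:9 along the column, so element i is compared only with
# the value that happens to line up with it (the length-mismatch warning
# is silenced here)
eq <- suppressWarnings(dta$day == 1:9)
sum(eq)

# %in% asks, for every element, whether it occurs anywhere in 1:9
idx <- dta$day %in% 1:9
sum(idx)

# Look each matched value up in letters, so the result does not depend on
# the order in which the values appear (assumes whole numbers 1 to 9)
new_day <- as.character(dta$day)
new_day[idx] <- letters[dta$day[idx]]
head(new_day, 12)
```

With this toy data, == flags 18 rows while %in% flags all 36, which matches the pattern in the question where only the first and last groups were replaced.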
I am really new to R and I can't find a way of subsetting matrix rows given a list of indexes.
I have a dataframe called 'demo' with 855 rows and 3 columns that looks like this:
## Subject AGE DX
## 1 011_S_0002_bl 74.3 0
## 2 011_S_0003_bl 81.3 1
## 3 011_S_0005_bl 73.7 0
## 4 022_S_0007_bl 75.4 1
## 5 011_S_0008_bl 84.5 0
## 6 011_S_0010_bl 73.9 1
From this, I want to extract the indexes for all the rows that match DX == 1. So I do:
rownames(demo[demo$DX == 1,])
Which returns:
## [1] "2" "4" "6" "14" "20" "31" "33" "34" "36" "39" "40" "41"
## [13] "46" "47" "53" "54" "55" "58" "64" "67" "69" "70" "72" "81"
## [25] "84" "87" "88" "92" "96" "98" "100" "101" "106" "108" "109" "112"
....
Now I have a matrix called T_hat with 855 rows and 1 column that looks like this:
## [,1]
## [1,] 5.812925
## [2,] 10.477721
## [3,] 1.519726
## [4,] -0.221328
## [5,] 1.784920
What I want is to use the numbers in 'al' to subset the values with the corresponding numbers in the indexes and to get something like this:
## [,1]
## [2,] 10.477721
## [4,] -0.221328
...and so on.
I've tried all these options:
T_hat_a <- T_hat[rownames(demo[demo$DX == 1,]),1]
T_hat_b <- T_hat[is.numeric(rownames(demo[demo$DX == 1,])),1]
T_hat_c <- T_hat[rownames(T_hat) %in% rownames(demo[demo$DX == 1,]),1]
T_hat_d <- T_hat[rownames(T_hat) %in% is.numeric(rownames(demo[demo$DX == 1,])),1]
But none returns what I expect.
T_hat_a = ERROR "no 'dimnames' attributes for array"
T_hat_b = numeric(0)
T_hat_c = numeric(0)
T_hat_d = numeric(0)
I've also tried to convert my matrix to a df, but only the T_hat_a option returns a result, but it is not at all as desired, since it returns different values...
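One possible approach, sketched on only the first five rows shown above: the attempts fail because T_hat has no row names (so character indexing and rownames(T_hat) %in% ... have nothing to match) and because is.numeric() returns a single TRUE/FALSE about the type rather than converting anything. The sketch below assumes demo and T_hat are in the same row order; idx and T_hat_dx1 are names made up for illustration.

```r
# First five rows of the example data from the question
demo <- data.frame(
  Subject = c("011_S_0002_bl", "011_S_0003_bl", "011_S_0005_bl",
              "022_S_0007_bl", "011_S_0008_bl"),
  AGE = c(74.3, 81.3, 73.7, 75.4, 84.5),
  DX = c(0, 1, 0, 1, 0)
)
T_hat <- matrix(c(5.812925, 10.477721, 1.519726, -0.221328, 1.784920), ncol = 1)

# Give the matrix row names so the original positions survive subsetting
rownames(T_hat) <- seq_len(nrow(T_hat))

# Integer positions of the rows with DX == 1 (assumes both objects are aligned)
idx <- which(demo$DX == 1)

# drop = FALSE keeps the result a one-column matrix instead of a plain vector
T_hat_dx1 <- T_hat[idx, , drop = FALSE]
T_hat_dx1
```

Indexing with the logical vector demo$DX == 1 directly would select the same rows; which() is used here only to make the positions visible.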
I have a data set that is in a .Rdata format - something I haven't worked with before. I would like to export the data to a csv or related file for use in Python. I've used "write.csv", "write.table", and a few others and while they all seem like they are writing to the file, when I open it it's completely blank. I've also tried converting the data to a dataframe before exporting with no luck so far.
After importing the file in R, the data is labeled as a Large array (1499904 elements, 11.5 Mb) with the following attributes:
> attributes(data.station)
$`dim`
[1] 12 31 288 7 2
$dimnames
$dimnames[[1]]
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
$dimnames[[2]]
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21"
[22] "22" "23" "24" "25" "26" "27" "28" "29" "30" "31"
$dimnames[[3]]
[1] "" "00:05:00" "00:10:00" "00:15:00" "00:20:00" "00:25:00" "00:30:00" "00:35:00" "00:40:00"
[10] "00:45:00" "00:50:00" "00:55:00" "01:00:00" "01:05:00" "01:10:00" "01:15:00" "01:20:00" "01:25:00"
[19] "01:30:00" "01:35:00" "01:40:00" "01:45:00" "01:50:00" "01:55:00" "02:00:00" "02:05:00" "02:10:00"
[28] "02:15:00" "02:20:00" "02:25:00" "02:30:00" "02:35:00" "02:40:00" "02:45:00" "02:50:00" "02:55:00"
[37] "03:00:00" "03:05:00" "03:10:00" "03:15:00" "03:20:00" "03:25:00" "03:30:00" "03:35:00" "03:40:00"
[46] "03:45:00" "03:50:00" "03:55:00" "04:00:00" "04:05:00" "04:10:00" "04:15:00" "04:20:00" "04:25:00"
[55] "04:30:00" "04:35:00" "04:40:00" "04:45:00" "04:50:00" "04:55:00" "05:00:00" "05:05:00" "05:10:00"
[64] "05:15:00" "05:20:00" "05:25:00" "05:30:00" "05:35:00" "05:40:00" "05:45:00" "05:50:00" "05:55:00"
[73] "06:00:00" "06:05:00" "06:10:00" "06:15:00" "06:20:00" "06:25:00" "06:30:00" "06:35:00" "06:40:00"
[82] "06:45:00" "06:50:00" "06:55:00" "07:00:00" "07:05:00" "07:10:00" "07:15:00" "07:20:00" "07:25:00"
[91] "07:30:00" "07:35:00" "07:40:00" "07:45:00" "07:50:00" "07:55:00" "08:00:00" "08:05:00" "08:10:00"
[100] "08:15:00" "08:20:00" "08:25:00" "08:30:00" "08:35:00" "08:40:00" "08:45:00" "08:50:00" "08:55:00"
[109] "09:00:00" "09:05:00" "09:10:00" "09:15:00" "09:20:00" "09:25:00" "09:30:00" "09:35:00" "09:40:00"
[118] "09:45:00" "09:50:00" "09:55:00" "10:00:00" "10:05:00" "10:10:00" "10:15:00" "10:20:00" "10:25:00"
[127] "10:30:00" "10:35:00" "10:40:00" "10:45:00" "10:50:00" "10:55:00" "11:00:00" "11:05:00" "11:10:00"
[136] "11:15:00" "11:20:00" "11:25:00" "11:30:00" "11:35:00" "11:40:00" "11:45:00" "11:50:00" "11:55:00"
[145] "12:00:00" "12:05:00" "12:10:00" "12:15:00" "12:20:00" "12:25:00" "12:30:00" "12:35:00" "12:40:00"
[154] "12:45:00" "12:50:00" "12:55:00" "13:00:00" "13:05:00" "13:10:00" "13:15:00" "13:20:00" "13:25:00"
[163] "13:30:00" "13:35:00" "13:40:00" "13:45:00" "13:50:00" "13:55:00" "14:00:00" "14:05:00" "14:10:00"
[172] "14:15:00" "14:20:00" "14:25:00" "14:30:00" "14:35:00" "14:40:00" "14:45:00" "14:50:00" "14:55:00"
[181] "15:00:00" "15:05:00" "15:10:00" "15:15:00" "15:20:00" "15:25:00" "15:30:00" "15:35:00" "15:40:00"
[190] "15:45:00" "15:50:00" "15:55:00" "16:00:00" "16:05:00" "16:10:00" "16:15:00" "16:20:00" "16:25:00"
[199] "16:30:00" "16:35:00" "16:40:00" "16:45:00" "16:50:00" "16:55:00" "17:00:00" "17:05:00" "17:10:00"
[208] "17:15:00" "17:20:00" "17:25:00" "17:30:00" "17:35:00" "17:40:00" "17:45:00" "17:50:00" "17:55:00"
[217] "18:00:00" "18:05:00" "18:10:00" "18:15:00" "18:20:00" "18:25:00" "18:30:00" "18:35:00" "18:40:00"
[226] "18:45:00" "18:50:00" "18:55:00" "19:00:00" "19:05:00" "19:10:00" "19:15:00" "19:20:00" "19:25:00"
[235] "19:30:00" "19:35:00" "19:40:00" "19:45:00" "19:50:00" "19:55:00" "20:00:00" "20:05:00" "20:10:00"
[244] "20:15:00" "20:20:00" "20:25:00" "20:30:00" "20:35:00" "20:40:00" "20:45:00" "20:50:00" "20:55:00"
[253] "21:00:00" "21:05:00" "21:10:00" "21:15:00" "21:20:00" "21:25:00" "21:30:00" "21:35:00" "21:40:00"
[262] "21:45:00" "21:50:00" "21:55:00" "22:00:00" "22:05:00" "22:10:00" "22:15:00" "22:20:00" "22:25:00"
[271] "22:30:00" "22:35:00" "22:40:00" "22:45:00" "22:50:00" "22:55:00" "23:00:00" "23:05:00" "23:10:00"
[280] "23:15:00" "23:20:00" "23:25:00" "23:30:00" "23:35:00" "23:40:00" "23:45:00" "23:50:00" "23:55:00"
$dimnames[[4]]
[1] "tempinf" "tempf" "humidityin" "humidity" "solarradiation" "hourlyrainin"
[7] "windspeedmph"
$dimnames[[5]]
[1] "2020" "2021"
Any advice on how to handle this? Thank you!
You have to flatten the array to write it. First we create a reproducible example of your data:
x <- 1:(2 * 3 * 4 * 5 * 6)
dnames <- list(LETTERS[1:2], LETTERS[3:5], LETTERS[6:9], LETTERS[10:14], LETTERS[15:20])
y <- array(x, dim=c(2, 3, 4, 5, 6), dimnames=dnames)
str(y)
# int [1:2, 1:3, 1:4, 1:5, 1:6] 1 2 3 4 5 6 7 8 9 10 ...
# - attr(*, "dimnames")=List of 5
# ..$ : chr [1:2] "A" "B"
# ..$ : chr [1:3] "C" "D" "E"
# ..$ : chr [1:4] "F" "G" "H" "I"
# ..$ : chr [1:5] "J" "K" "L" "M" ...
# ..$ : chr [1:6] "O" "P" "Q" "R" ...
attributes(y)
# $dim
# [1] 2 3 4 5 6
#
# $dimnames
# $dimnames[[1]]
# [1] "A" "B"
#
# $dimnames[[2]]
# [1] "C" "D" "E"
#
# $dimnames[[3]]
# [1] "F" "G" "H" "I"
#
# $dimnames[[4]]
# [1] "J" "K" "L" "M" "N"
#
# $dimnames[[5]]
# [1] "O" "P" "Q" "R" "S" "T"
Now we flatten the array and write it to a file:
z <- as.data.frame.table(y)
str(z)
# 'data.frame': 720 obs. of 6 variables:
# $ Var1: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2 ...
# $ Var2: Factor w/ 3 levels "C","D","E": 1 1 2 2 3 3 1 1 2 2 ...
# $ Var3: Factor w/ 4 levels "F","G","H","I": 1 1 1 1 1 1 2 2 2 2 ...
# $ Var4: Factor w/ 5 levels "J","K","L","M",..: 1 1 1 1 1 1 1 1 1 1 ...
# $ Var5: Factor w/ 6 levels "O","P","Q","R",..: 1 1 1 1 1 1 1 1 1 1 ...
# $ Freq: int 1 2 3 4 5 6 7 8 9 10 ...
write.csv(z, file="dfz.csv", row.names=FALSE)
Finally we read the file and convert it back to an array:
a <- read.csv("dfz.csv", as.is=FALSE)
b <- xtabs(Freq~., a)
class(b) <- "array"
attr(b, "call") <- NULL
names(dimnames(b)) <- NULL
str(b)
# int [1:2, 1:3, 1:4, 1:5, 1:6] 1 2 3 4 5 6 7 8 9 10 ...
# - attr(*, "dimnames")=List of 5
# ..$ : chr [1:2] "A" "B"
# ..$ : chr [1:3] "C" "D" "E"
# ..$ : chr [1:4] "F" "G" "H" "I"
# ..$ : chr [1:5] "J" "K" "L" "M" ...
# ..$ : chr [1:6] "O" "P" "Q" "R" ...
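If the CSV is meant for Python, meaningful column names may be more convenient than Var1 ... Freq. The following is a minimal sketch on a small made-up array (not the real station data): it assumes you name the dimnames list yourself and uses the responseName argument of as.data.frame.table to rename the value column. The names month, day, variable and value are illustrative choices.

```r
# Small stand-in array with named dimensions (values are made up)
y <- array(
  1:24,
  dim = c(2, 3, 4),
  dimnames = list(
    month = c("Jan", "Feb"),
    day = c("1", "2", "3"),
    variable = c("tempf", "humidity", "solarradiation", "windspeedmph")
  )
)

# One row per array cell; dimension names become the column names
z <- as.data.frame.table(y, responseName = "value", stringsAsFactors = FALSE)

# Write to a temporary file and read it back to confirm it is not empty
out <- tempfile(fileext = ".csv")
write.csv(z, file = out, row.names = FALSE)
back <- read.csv(out)
str(back)
```

For an existing array whose dimnames are unnamed, assigning names(dimnames(x)) before flattening should have the same effect.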
I am working with the following data:
District <- c("AR01", "AZ03", "AZ05", "AZ08", "CA01", "CA05", "CA11", "CA16", "CA18", "CA21")
I want to split the string after the second character and put them into two columns.
So that the data looks like this:
state district
AR 01
AZ 03
AZ 05
AZ 08
CA 01
CA 05
CA 11
CA 16
CA 18
CA 21
Is there a simple way to get this done? Thanks so much for your help.
You can use substr if you always want to split by the second character.
District <- c("AR01", "AZ03", "AZ05", "AZ08", "CA01", "CA05", "CA11", "CA16", "CA18", "CA21")
#split district starting at the first and ending at the second
state <- substr(District,1,2)
#split district starting at the 3rd and ending at the 4th
district <- substr(District,3,4)
#put in data frame if needed.
st_dt <- data.frame(state = state, district = district, stringsAsFactors = FALSE)
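If the part after the state code is not always two characters long, a small variation may help. This is a sketch only: the entry "TX123" is made up to show the effect, and it assumes the state code is always the first two characters.

```r
# "TX123" is a hypothetical entry with a three-digit district
District <- c("AR01", "AZ03", "CA21", "TX123")

st_dt <- data.frame(
  state = substr(District, 1, 2),     # characters 1 to 2
  district = substring(District, 3),  # character 3 to the end of each string
  stringsAsFactors = FALSE
)
st_dt
```

substring() defaults its end position to a very large number, so it takes everything from the third character onward regardless of length.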
You could use strcapture from base R:
strcapture("(\\w{2})(\\w{2})",District,
data.frame(state = character(),District = character()))
state District
1 AR 01
2 AZ 03
3 AZ 05
4 AZ 08
5 CA 01
6 CA 05
7 CA 11
8 CA 16
9 CA 18
10 CA 21
where \\w{2} means two word characters (letters, digits or underscore)
The OP has written
I'm more familiar with strsplit(). But since there is nothing to split
on, its not applicable in this case
Au contraire! There is something to split on and it's called lookbehind:
strsplit(District, "(?<=[A-Z]{2})", perl = TRUE)
The lookbehind works like "inserting an invisible break" after 2 capital letters and splits the strings there.
The result is a list of vectors
[[1]]
[1] "AR" "01"
[[2]]
[1] "AZ" "03"
[[3]]
[1] "AZ" "05"
[[4]]
[1] "AZ" "08"
[[5]]
[1] "CA" "01"
[[6]]
[1] "CA" "05"
[[7]]
[1] "CA" "11"
[[8]]
[1] "CA" "16"
[[9]]
[1] "CA" "18"
[[10]]
[1] "CA" "21"
which can be turned into a matrix, e.g., by
do.call(rbind, strsplit(District, "(?<=[A-Z]{2})", perl = TRUE))
[,1] [,2]
[1,] "AR" "01"
[2,] "AZ" "03"
[3,] "AZ" "05"
[4,] "AZ" "08"
[5,] "CA" "01"
[6,] "CA" "05"
[7,] "CA" "11"
[8,] "CA" "16"
[9,] "CA" "18"
[10,] "CA" "21"
We can use str_match to capture first two characters and the remaining string in separate columns.
stringr::str_match(District, "(..)(.*)")[, -1]
# [,1] [,2]
# [1,] "AR" "01"
# [2,] "AZ" "03"
# [3,] "AZ" "05"
# [4,] "AZ" "08"
# [5,] "CA" "01"
# [6,] "CA" "05"
# [7,] "CA" "11"
# [8,] "CA" "16"
# [9,] "CA" "18"
#[10,] "CA" "21"
With the tidyverse this is very easy using the function separate from tidyr:
library(tidyverse)
District %>%
  as_tibble() %>%
  separate(value, c("state", "district"), sep = "(?<=[A-Z]{2})")
# A tibble: 10 × 2
state district
<chr> <chr>
1 AR 01
2 AZ 03
3 AZ 05
4 AZ 08
5 CA 01
6 CA 05
7 CA 11
8 CA 16
9 CA 18
10 CA 21
Treat it as a fixed-width file and import it:
# read fixed width file
read.fwf(textConnection(District), widths = c(2, 2), colClasses = "character")
# V1 V2
# 1 AR 01
# 2 AZ 03
# 3 AZ 05
# 4 AZ 08
# 5 CA 01
# 6 CA 05
# 7 CA 11
# 8 CA 16
# 9 CA 18
# 10 CA 21
I have a 5-level factor that looks like the following:
tmp
[1] NA
[2] 1,2,3,6,11,12,13,18,20,21,22,26,29,33,40,43,46
[3] NA
[4] NA
[5] 5,9,16,24,35,36,42
[6] 4,7,10,14,15,17,19,23,25,27,28,30,31,32,34,37,38,41,44,45,47,48,49,50
[7] 8,39
5 Levels: 1,2,3,6,11,12,13,18,20,21,22,26,29,33,40,43,46 ...
I want to access the items within each level except NA. So I use the levels() function, which gives me:
> levels(tmp)
[1] "1,2,3,6,11,12,13,18,20,21,22,26,29,33,40,43,46"
[2] "4,7,10,14,15,17,19,23,25,27,28,30,31,32,34,37,38,41,44,45,47,48,49,50"
[3] "5,9,16,24,35,36,42"
[4] "8,39"
[5] "NA"
Then I would like to access the elements in each level, and store them as numbers. However, for example,
>as.numeric(cat(levels(tmp)[3]))
5,9,16,24,35,36,42numeric(0)
Can you help me remove the commas within the numbers and the numeric(0) at the very end? I would like to have a numeric vector 5, 9, 16, 24, 35, 36, 42 so that I can use its values as indices to access a data frame. Thanks!
You need to use a combination of unlist, strsplit and unique.
First, recreate your data:
dat <- read.table(text="
NA
1,2,3,6,11,12,13,18,20,21,22,26,29,33,40,43,46
NA
NA
5,9,16,24,35,36,42
4,7,10,14,15,17,19,23,25,27,28,30,31,32,34,37,38,41,44,45,47,48,49,50
8,39")$V1
Next, find all the unique levels, after using strsplit:
sort(unique(unlist(
sapply(levels(dat), function(x)unlist(strsplit(x, split=",")))
)))
[1] "1" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "2" "20" "21" "22" "23" "24" "25" "26"
[20] "27" "28" "29" "3" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "4" "40" "41" "42" "43"
[39] "44" "45" "46" "47" "48" "49" "5" "50" "6" "7" "8" "9"
Does this do what you want?
levels_split <- strsplit(levels(tmp), ",")
lapply(levels_split, as.numeric)
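As a sketch of how this could look end to end: cat() only prints its argument and returns NULL, which is why as.numeric(cat(...)) shows the text followed by numeric(0). The version below rebuilds a factor like tmp (assuming, as in the question's levels() output, that "NA" is an ordinary level string), drops that level before converting, and names each result by its level. lv and idx_list are names made up for illustration.

```r
# Rebuild a factor like tmp; "NA" is assumed to be a plain string level
tmp <- factor(c(
  "NA",
  "1,2,3,6,11,12,13,18,20,21,22,26,29,33,40,43,46",
  "NA",
  "NA",
  "5,9,16,24,35,36,42",
  "4,7,10,14,15,17,19,23,25,27,28,30,31,32,34,37,38,41,44,45,47,48,49,50",
  "8,39"
))

# Drop the "NA" level so as.numeric() has nothing it cannot parse
lv <- setdiff(levels(tmp), "NA")

# Split each level on commas and convert the pieces to numbers
idx_list <- lapply(strsplit(lv, ",", fixed = TRUE), as.numeric)
names(idx_list) <- lv

# Numeric indices for one level, ready to use as row positions
idx_list[["5,9,16,24,35,36,42"]]
```

Each element of idx_list is a plain numeric vector, so something like df[idx_list[[3]], ] would select those rows of a data frame df.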
Using Andrie's dat
val <- scan(text=levels(dat),sep=",")
#Read 50 items
split(val, cumsum(c(TRUE, diff(val) < 0)))
#$`1`
#[1] 1 2 3 6 11 12 13 18 20 21 22 26 29 33 40 43 46
#$`2`
#[1] 4 7 10 14 15 17 19 23 25 27 28 30 31 32 34 37 38 41 44 45 47 48 49 50
#$`3`
#[1] 5 9 16 24 35 36 42
#$`4`
#[1] 8 39
Let's say I have a factor variable with numerous levels and I am trying to group them into several groups.
> levels(dat$years_continuously_insured_order2)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18"
[19] "19" "20"
> levels(dat$age_of_oldest_driver)
[1] "-16" "1" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" "32" "33"
[22] "34" "35" "36" "37" "38" "39" "40
I have a script which runs through these variables and groups them into several categories. However, the number of levels could be (and usually is) different each time my script runs. Therefore, if my original code to group the variables was the following (see below), it wouldn't be of use if, an hour later, my script runs and the levels are different. Instead of 15 levels, I could now have 25 levels and the values are different, but I still need to group them into specific categories.
dat$years_continuously_insured2 <- NA
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[1]] <- NA
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[2:3]] <- "1 or less"
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[4]] <- "2"
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[5:7]] <- "3 +"
dat$years_continuously_insured2 <- factor(dat$years_continuously_insured2)
How can I find a more elegant way to group variables into segments? Are there better ways to do this in R?
Thanks!
You could convert your factor levels in the continuously insured variable to numeric and then cut to your categories and re-factor(). The first step is described in the R-FAQ (done properly, it is a two-step process):
dat$years_cont <- factor( cut( as.numeric(as.character(
dat$years_continuously_insured_order2)),
breaks=c(0,2,3, Inf), right=FALSE ),
labels=c( "1 or less", "2", "3 +")
)
#-----------------
> str(dat)
'data.frame': 100 obs. of 2 variables:
$ years_continuously_insured_order2: Factor w/ 20 levels "1","10","11",..: 4 15 19 5 8 4 16 12 12 18 ...
$ years_cont : Factor w/ 3 levels "1 or less","2",..: 3 3 3 3 3 3 3 2 2 3 ...
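A compact sketch of the same idea on a small made-up factor, passing the labels to cut() directly so that every label stays attached to its interval even if some group happens to be empty. The vector years and the names num and grp are hypothetical.

```r
# Made-up factor whose levels are numbers stored as text
years <- factor(c("1", "10", "2", "3", "20", "1"))

# as.integer() on a factor returns level codes, not the values themselves
as.integer(years)

# Two-step conversion: factor -> character -> numeric
num <- as.numeric(as.character(years))

# Left-closed intervals [0,2), [2,3), [3,Inf), labelled in that order
grp <- cut(num, breaks = c(0, 2, 3, Inf), right = FALSE,
           labels = c("1 or less", "2", "3 +"))
grp
```

Because the breaks are numeric, the grouping keeps working when new levels such as "25" appear in a later run.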
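If your original column is a number, treat it as a number, not a factor. If it is stored as a factor, convert it through as.character() first, because as.integer() on a factor returns the internal level codes rather than the values. A much easier way to do what you're doing is:
bin.value = function(x) {
  # nested ifelse: values up to 1, exactly 2, and everything else
  ifelse(x <= 1, "1 or less", ifelse(x == 2, "2", "3+"))
}
dat$years_continuously_insured2 = as.factor(bin.value(as.integer(as.character(dat$years_continuously_insured))))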