Linear regression after select() method in R

I am trying to create a linear regression model from openintro::babies that predicts a baby's birthweight from all other variables in the data except case.
I have the following code:
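library(tidyverse)
library(tidymodels)

# Drop incomplete rows, then convert birth weight (ounces -> grams)
# and mother's weight (pounds -> kg)
babies <- openintro::babies %>%
  drop_na() %>%
  mutate(bwt = 28.3495 * bwt) %>%
  mutate(weight = 0.453592 * weight)

# Fit bwt on every column except case, then augment the underlying lm fit
linear_reg() %>%
  set_engine("lm") %>%
  fit(formula = bwt ~ ., data = babies %>% select(-case)) %>%
  pluck("fit") %>%
  augment(babies)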
But the output also includes the case variable:
# A tibble: 1,174 x 14
case bwt gestation parity age height weight smoke .fitted .resid .hat .sigma .cooksd .std.resid
<int> <dbl> <int> <int> <int> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 3402. 284 0 27 62 45.4 0 3459. -56.8 0.00374 449. 0.00000863 -0.127
2 2 3203. 282 0 33 64 61.2 0 3547. -344. 0.00227 449. 0.000191 -0.767
3 3 3629. 279 0 28 64 52.2 1 3244. 385. 0.00291 449. 0.000307 0.858
4 5 3062. 282 0 23 67 56.7 1 3396. -334. 0.00475 449. 0.000379 -0.746
5 6 3856. 286 0 25 62 42.2 0 3474. 381. 0.00495 449. 0.000515 0.851
6 7 3912. 244 0 33 62 80.7 0 3065. 848. 0.0137 448. 0.00715 1.90
7 8 3742. 245 0 23 65 63.5 0 3124. 618. 0.00716 449. 0.00197 1.38
8 9 3402. 289 0 25 62 56.7 0 3558. -156. 0.00301 449. 0.0000521 -0.348
9 10 4054. 299 0 30 66 61.7 1 3591. 463. 0.00462 449. 0.000710 1.03
10 11 3969. 351 0 27 68 54.4 0 4527. -558. 0.0221 449. 0.00510 -1.26
# ... with 1,164 more rows
I'm not sure whether my approach is wrong or whether this is simply how the output works.

Your code is correct. You're getting the case column because of the augment(babies) call: augment() appends its columns to whatever data you pass it, so the full babies data, including case, shows up in the output. If you replace it with augment(babies %>% select(-case)), you won't get that column. In other words, the regression model you're fitting does not take the case column into account.
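A minimal sketch of that fix, assuming the openintro, dplyr, tidyr and broom packages are installed. It calls stats::lm() directly instead of going through parsnip (the "lm" engine fits the same kind of lm object that pluck("fit") extracts) and drops case once, so the same data frame is used for both fitting and augmenting:

```r
library(dplyr)
library(tidyr)
library(broom)

# Same cleaning as in the question: drop incomplete rows, convert birth
# weight from ounces to grams and mother's weight from pounds to kg
babies <- openintro::babies %>%
  drop_na() %>%
  mutate(bwt = 28.3495 * bwt,
         weight = 0.453592 * weight)

# Remove `case` once and reuse this data for fitting and augmenting
babies_model <- babies %>% select(-case)

model <- lm(bwt ~ ., data = babies_model)
aug <- augment(model, data = babies_model)

names(coef(model))  # `case` is not among the predictors
names(aug)          # and it no longer appears in the augmented output
```

Because augment() only echoes back the data it is given, the choice of what to pass as data controls which extra columns appear; it has no effect on the fitted model.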

Related

Scraping web tables in R with interactive elements on page

I'm using rvest and tidyverse to scrape and process some data off the web.
There was recently a change to the website where some of the data is now in 2 tables and you can change between them using a button.
I'm trying to figure out how to scrape the data from both. They now seem to share the same CSS class, so I can't figure out how to access each one individually.
The code below seems to grab the "extended snowfall history", but I can't seem to figure out how to get the "2022-2023 winter season" data. Obviously I'll need to do a little processing and math to put the "2022-2023 winter season" into a new row in "extended snowfall history", but I can't even figure out how to grab it.
Currently I have:
library(rvest)
library(tidyverse)

mammoth <- read_html('https://www.mammothmountain.com/on-the-mountain/historical-snowfall')

# html_element() (singular) returns only the first matching table
snow <- mammoth %>%
  html_element('table.css-86hwhl') %>%
  html_table(header = TRUE, convert = TRUE) %>%
  mutate_if(is.character, as.factor) %>%
  mutate_if(is.integer, as.double) %>%
  select(-Total)

A simple approach is to use rvest::html_elements('table.css-86hwhl') (plural rather than singular), which extracts every table element with the CSS class css-86hwhl rather than just the first one. You can then pick the tables you want by position.
For example:
mammoth %>%
html_elements('table.css-86hwhl') %>%
html_table(header= TRUE, convert = TRUE)
returns a list of tibbles:
[[1]]
# A tibble: 53 × 13
Season `Pre-Oct` Oct Nov Dec Jan Feb Mar Apr May Jun Jul Total
<chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
1 1969-70 22 0 0 41 78 30.5 46 27 0 0 0 244.
2 1970-71 60 0 0 109 29 19.5 24 14 0 0 0 256.
3 1971-72 22 0 9 140. 32.2 11 1 53.5 0 0 0 268.
4 1972-73 4 0 57.1 64.5 84.9 103 43 10 4 0 0 370.
5 1973-74 45 0 0 45 87.5 9 82 38 0 0 0 306.
6 1974-75 15 0 13 58.5 26 101 90 75 0 0 0 378.
7 1975-76 27 0 0 14.5 13.5 54 50 38.5 0 0 0 198.
8 1976-77 4 0 0 0 26 27 37 0 0 0 0 94
9 1977-78 6 0 26 98 95.5 97 85.5 78.5 1 0 0 488.
10 1978-79 6 0 29.5 51.5 102. 96 78 11.5 11.5 0 0 386.
# … with 43 more rows
# ℹ Use `print(n = ...)` to see more rows
[[2]]
# A tibble: 4 × 3
Date Inches `Season Total to Date`
<chr> <chr> <chr>
1 November 8 "15\"" "28\""
2 November 7 "2\"" "13\""
3 November 3 "5\"" "11\""
4 November 2 "6\"" "6\""
[[3]]
# A tibble: 53 × 13
Season `Pre-Oct` Oct Nov Dec Jan Feb Mar Apr May Jun Jul Total
<chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
1 1969-70 22 0 0 41 78 30.5 46 27 0 0 0 244.
2 1970-71 60 0 0 109 29 19.5 24 14 0 0 0 256.
3 1971-72 22 0 9 140. 32.2 11 1 53.5 0 0 0 268.
4 1972-73 4 0 57.1 64.5 84.9 103 43 10 4 0 0 370.
5 1973-74 45 0 0 45 87.5 9 82 38 0 0 0 306.
6 1974-75 15 0 13 58.5 26 101 90 75 0 0 0 378.
7 1975-76 27 0 0 14.5 13.5 54 50 38.5 0 0 0 198.
8 1976-77 4 0 0 0 26 27 37 0 0 0 0 94
9 1977-78 6 0 26 98 95.5 97 85.5 78.5 1 0 0 488.
10 1978-79 6 0 29.5 51.5 102. 96 78 11.5 11.5 0 0 386.
# … with 43 more rows
# ℹ Use `print(n = ...)` to see more rows
[[4]]
# A tibble: 4 × 3
Date Inches `Season Total to Date`
<chr> <chr> <chr>
1 November 8 "15\"" "28\""
2 November 7 "2\"" "13\""
3 November 3 "5\"" "11\""
4 November 2 "6\"" "6\""
[[5]]
# A tibble: 53 × 13
Season `Pre-Oct` Oct Nov Dec Jan Feb Mar Apr May Jun Jul Total
<chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
1 1969-70 22 0 0 41 78 30.5 46 27 0 0 0 244.
2 1970-71 60 0 0 109 29 19.5 24 14 0 0 0 256.
3 1971-72 22 0 9 140. 32.2 11 1 53.5 0 0 0 268.
4 1972-73 4 0 57.1 64.5 84.9 103 43 10 4 0 0 370.
5 1973-74 45 0 0 45 87.5 9 82 38 0 0 0 306.
6 1974-75 15 0 13 58.5 26 101 90 75 0 0 0 378.
7 1975-76 27 0 0 14.5 13.5 54 50 38.5 0 0 0 198.
8 1976-77 4 0 0 0 26 27 37 0 0 0 0 94
9 1977-78 6 0 26 98 95.5 97 85.5 78.5 1 0 0 488.
10 1978-79 6 0 29.5 51.5 102. 96 78 11.5 11.5 0 0 386.
# … with 43 more rows
# ℹ Use `print(n = ...)` to see more rows
You can then extract [[1]] (the extended snowfall history) and [[2]] (the 2022-2023 winter season), which are the tables you're looking for, and go from there; the remaining elements repeat them. There is probably a more principled approach, but this should do the job.
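For the "processing and math" step, here is a sketch of turning the current-season table into a one-row tibble shaped like the history table. It assumes the scraped list has been stored in a variable called tables (a hypothetical name, e.g. tables <- mammoth %>% html_elements('table.css-86hwhl') %>% html_table(header = TRUE, convert = TRUE)), and it uses a hand-typed copy of tables[[2]] so it runs offline. The season label "2022-23" is also an assumption, chosen to match the format of the Season column:

```r
library(dplyr)
library(tibble)
library(readr)
library(tidyr)

# Hand-typed stand-in for tables[[2]] so the sketch runs without the website
current <- tibble(
  Date = c("November 8", "November 7", "November 3", "November 2"),
  Inches = c("15\"", "2\"", "5\"", "6\""),
  `Season Total to Date` = c("28\"", "13\"", "11\"", "6\"")
)

new_row <- current %>%
  mutate(
    Inches = parse_number(Inches),                 # "15\"" -> 15
    Month = substr(sub(" .*", "", Date), 1, 3)     # "November 8" -> "Nov"
  ) %>%
  group_by(Month) %>%
  summarise(Inches = sum(Inches)) %>%              # monthly totals
  pivot_wider(names_from = Month, values_from = Inches) %>%
  mutate(Season = "2022-23", .before = 1)          # hypothetical season label

new_row
```

On the live data, bind_rows(tables[[1]], new_row) should then append the new season, with NA in the months that have no snowfall reported yet.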

Make multiple new columns (ideally tidyverse) by applying mutate across a vector?

I am trying to simulate a dataset for a linear regression as part of some Bayesian statistics.
The overall formula is
Y = A + BX
I have simulated a variety of values of A and B using
A <- rnorm(10,0,1)
B <- rnorm(10,0,1)
#10 Random draws from a normal distribution for the values of each of A and B
I set up a table of possible values of X:
# Table of possible values of X from 130 to 170 in steps of 10
stuff <- tibble(x = seq(130, 170, 10)) %>%
  # New column: A plus B times each value of X
  mutate(Y = A + B * x)
This works fine when A and B each hold a single value (i.e. if I do A <- rnorm(1,0,1)),
but it doesn't work when A and B have length greater than 1.
What I am trying to do is something like
mutate(Y[i] = A[i] + B[i]*x)
resulting in 10 new columns, Y1 to Y10.
Any suggestions welcome.

Here's how I would do what I think you want. I'd start long and then convert to wide...
library(tidyverse)
set.seed(123)
df <- tibble() %>%
expand(
nesting(
ID=1:10,
A=rnorm(10,0,1),
B=rnorm(10,0,1)
),
X=seq(130,170,10)
) %>%
mutate(Y=A + B*X)
df
# A tibble: 50 × 5
ID A B X Y
<int> <dbl> <dbl> <dbl> <dbl>
1 1 -1.07 0.426 130 54.4
2 1 -1.07 0.426 140 58.6
3 1 -1.07 0.426 150 62.9
4 1 -1.07 0.426 160 67.2
5 1 -1.07 0.426 170 71.4
6 2 -0.218 -0.295 130 -38.6
7 2 -0.218 -0.295 140 -41.5
8 2 -0.218 -0.295 150 -44.5
9 2 -0.218 -0.295 160 -47.4
10 2 -0.218 -0.295 170 -50.4
# … with 40 more rows
Now, pivot to wide...
df %>%
pivot_wider(
names_from=ID,
values_from=Y,
names_prefix="Y",
id_cols=X
)
# A tibble: 5 × 11
X Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 130 54.4 -38.6 115. 113. 106. 87.8 72.8 -7.90 -40.9 -48.2
2 140 58.6 -41.5 124. 122. 114. 94.7 78.4 -8.51 -44.0 -52.0
3 150 62.9 -44.5 133. 131. 123. 102. 83.9 -9.13 -47.0 -55.8
4 160 67.2 -47.4 142. 140. 131. 108. 89.5 -9.75 -50.1 -59.6
5 170 71.4 -50.4 151. 149. 139. 115. 95.0 -10.4 -53.2 -63.4
At this point you've lost A & B, because you'd need another 10 columns to store the original A's and another 10 to store the original B's.
Personally, I'd probably stick with the long format, because that's most likely going to make your future workflow easier. And I get to keep the A's and B's.
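If you do want the wide result directly, as the question asks, one sketch is to compute one vector per (A[i], B[i]) pair with purrr::map2() and bind them as columns named Y1 to Y10. This skips the long format entirely, so, as noted above, A and B are not kept in the result:

```r
library(purrr)
library(tibble)
library(dplyr)

set.seed(123)
A <- rnorm(10, 0, 1)  # 10 draws for the intercept A
B <- rnorm(10, 0, 1)  # 10 draws for the slope B

stuff <- tibble(x = seq(130, 170, 10))

# One vector per (A[i], B[i]) pair: A[i] + B[i] * x
y_cols <- map2(A, B, function(a, b) a + b * stuff$x)
names(y_cols) <- paste0("Y", seq_along(A))

# Bind the ten vectors next to x as columns Y1, ..., Y10
stuff_wide <- bind_cols(stuff, as_tibble(y_cols))
stuff_wide
```

A base R equivalent is sweep(outer(stuff$x, B), 2, A, "+"), which returns a 5-by-10 matrix with one column per draw.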

Create a column conditioned on two possible levels contained in another column

I have the following dataset:
out
# A tibble: 1,356 x 7
ID GROUP Gender Age Education tests score
<dbl> <chr> <dbl> <dbl> <dbl> <chr> <dbl>
1 1 TRAINING 1 74 18 ADAS_CogT0 14.7
2 1 TRAINING 1 74 18 ROCF_CT0 32
3 1 TRAINING 1 74 18 ROCF_IT0 3.7
4 1 TRAINING 1 74 18 ROCF_RT0 3.9
5 1 TRAINING 1 74 18 PVF_T0 41.3
6 1 TRAINING 1 74 18 SVF_T0 40
7 1 TRAINING 1 74 18 ADAS_CogT7 16
8 1 TRAINING 1 74 18 ROCF_CT7 33
9 1 TRAINING 1 74 18 ROCF_IT7 1.7
10 1 TRAINING 1 74 18 ROCF_RT7 2.4
I would like to create a column whose value is score0 where tests ends with T0 and score7 where tests ends with T7. What are the possible ways to do this?

Please include your data in your posts, e.g. with dput(df).
You could use a combination of case_when and str_detect:
library(dplyr)
library(stringr)
df <- structure(
list(
ID = 1:10,
GROUP = rep('TRAINING', 10),
Gender = rep(1, 10),
Education = rep(74, 10),
test = c(
'ADAS_CogT0',
'ROCF_CT0',
'ROCF_IT0',
'ROCF_RT0',
'PVF_T0',
'SVF_T0',
'ADAS_CogT7',
'ROCF_CT7',
'ROCF_IT7',
'ROCF_RT7'
),
score = c(14.7,32,3.7,3.9,41.3,40,16,33,1.7,2.4)
),
row.names = c(1:10),
class = "data.frame"
)
# Anchor the patterns with $ so only a trailing T0 or T7 matches
df2 <- df %>%
  mutate(new = case_when(str_detect(test, 'T0$') ~ 'score0',
                         str_detect(test, 'T7$') ~ 'score7',
                         TRUE ~ test))
df2
ID GROUP Gender Education test score new
1 1 TRAINING 1 74 ADAS_CogT0 14.7 score0
2 2 TRAINING 1 74 ROCF_CT0 32.0 score0
3 3 TRAINING 1 74 ROCF_IT0 3.7 score0
4 4 TRAINING 1 74 ROCF_RT0 3.9 score0
5 5 TRAINING 1 74 PVF_T0 41.3 score0
6 6 TRAINING 1 74 SVF_T0 40.0 score0
7 7 TRAINING 1 74 ADAS_CogT7 16.0 score7
8 8 TRAINING 1 74 ROCF_CT7 33.0 score7
9 9 TRAINING 1 74 ROCF_IT7 1.7 score7
10 10 TRAINING 1 74 ROCF_RT7 2.4 score7

Do you want the output to be the strings 'score0' and 'score7'?
If so, you may try:
library(dplyr)
out %>%
mutate(result = case_when(grepl('T0$', tests) ~ 'score0',
grepl('T7$', tests) ~ 'score7'))
# ID GROUP Gender Age Education tests score result
#1 1 TRAINING 1 74 18 ADAS_CogT0 14.7 score0
#2 1 TRAINING 1 74 18 ROCF_CT0 32.0 score0
#3 1 TRAINING 1 74 18 ROCF_IT0 3.7 score0
#4 1 TRAINING 1 74 18 ROCF_RT0 3.9 score0
#5 1 TRAINING 1 74 18 PVF_T0 41.3 score0
#6 1 TRAINING 1 74 18 SVF_T0 40.0 score0
#7 1 TRAINING 1 74 18 ADAS_CogT7 16.0 score7
#8 1 TRAINING 1 74 18 ROCF_CT7 33.0 score7
#9 1 TRAINING 1 74 18 ROCF_IT7 1.7 score7
#10 1 TRAINING 1 74 18 ROCF_RT7 2.4 score7
Another option is readr::parse_number(), which extracts the number from each test name:
out %>%
mutate(result = paste0('score', readr::parse_number(tests)))
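A base R variant of the same idea, assuming every test name ends in T followed by the timepoint digits: sub() captures those trailing digits and paste0() prefixes them with "score". The tests vector below is a small hand-typed stand-in for the tests column:

```r
# Stand-in for the `tests` column from the question
tests <- c("ADAS_CogT0", "ROCF_CT0", "PVF_T0", "ADAS_CogT7", "ROCF_RT7")

# Greedy ".*T" runs up to the last T; ([0-9]+)$ captures the trailing digits
result <- paste0("score", sub(".*T([0-9]+)$", "\\1", tests))
result
```

Inside a pipeline this would be out %>% mutate(result = paste0('score', sub('.*T([0-9]+)$', '\\1', tests))). Names that don't match the pattern are left unchanged by sub(), so they would come out as "score" followed by the whole name.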

Looping linear regression output in a data frame in r

I have the dataset below. I want to fit a linear regression for each combination of country and Area and then cbind the predicted values, along with their upper and lower prediction limits, to the dataset as three new columns.
I have done it for one country and one area:
data <- data.frame(country = c("US","US","US","US","US","US","US","US","US","US","UK","UK","UK","UK","UK"),
Area = c("G","G","G","G","G","I","I","I","I","I","A","A","A","A","A"),
week = c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5),amount = c(12,23,34,32,12,12,34,45,65,45,45,34,23,43,43))
data_1 <- data[(data$country=="US" & data$Area=="G"),]
model <- lm(amount ~ week, data = data_1)
pre <- predict(model,newdata = data_1,interval = "prediction",level = 0.95)
pre
How can I loop this over the other combinations of country and Area?

...and a Base R solution:
data <- data.frame(country = c("US","US","US","US","US","US","US","US","US","US","UK","UK","UK","UK","UK"),
Area = c("G","G","G","G","G","I","I","I","I","I","A","A","A","A","A"),
week = c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5),amount = c(12,23,34,32,12,12,34,45,65,45,45,34,23,43,43))
splitVar <- paste0(data$country,"-",data$Area)
dfList <- split(data,splitVar)
result <- do.call(rbind,lapply(dfList,function(x){
model <- lm(amount ~ week, data = x)
cbind(x,predict(model,newdata = x,interval = "prediction",level = 0.95))
}))
result
...the results:
country Area week amount fit lwr upr
UK-A.11 UK A 1 45 36.6 -6.0463638 79.24636
UK-A.12 UK A 2 34 37.1 -1.3409128 75.54091
UK-A.13 UK A 3 23 37.6 0.6671656 74.53283
UK-A.14 UK A 4 43 38.1 -0.3409128 76.54091
UK-A.15 UK A 5 43 38.6 -4.0463638 81.24636
US-G.1 US G 1 12 20.8 -27.6791493 69.27915
US-G.2 US G 2 23 21.7 -21.9985147 65.39851
US-G.3 US G 3 34 22.6 -19.3841749 64.58417
US-G.4 US G 4 32 23.5 -20.1985147 67.19851
US-G.5 US G 5 12 24.4 -24.0791493 72.87915
US-I.6 US I 1 12 20.8 -33.8985900 75.49859
US-I.7 US I 2 34 30.5 -18.8046427 79.80464
US-I.8 US I 3 45 40.2 -7.1703685 87.57037
US-I.9 US I 4 65 49.9 0.5953573 99.20464
US-I.10 US I 5 45 59.6 4.9014100 114.29859

We can also use augment() from the broom package to get the fitted values and prediction limits:
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
data %>%
group_by(country, Area) %>%
nest() %>%
mutate(models = map(data, ~ lm(amount ~ week, data = .)),
aug = map(models, ~ augment(.x, interval = "prediction"))) %>%
unnest(aug) %>%
select(country, Area, amount, week, .fitted, .lower, .upper)
# A tibble: 15 x 7
# Groups: country, Area [3]
country Area amount week .fitted .lower .upper
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 US G 12 1 20.8 -27.7 69.3
2 US G 23 2 21.7 -22.0 65.4
3 US G 34 3 22.6 -19.4 64.6
4 US G 32 4 23.5 -20.2 67.2
5 US G 12 5 24.4 -24.1 72.9
6 US I 12 1 20.8 -33.9 75.5
7 US I 34 2 30.5 -18.8 79.8
8 US I 45 3 40.2 -7.17 87.6
9 US I 65 4 49.9 0.595 99.2
10 US I 45 5 59.6 4.90 114.
11 UK A 45 1 36.6 -6.05 79.2
12 UK A 34 2 37.1 -1.34 75.5
13 UK A 23 3 37.6 0.667 74.5
14 UK A 43 4 38.1 -0.341 76.5
15 UK A 43 5 38.6 -4.05 81.2

Here is a tidyverse way to do this for every combination of country and Area.
library(tidyverse)
data %>%
group_by(country, Area) %>%
nest() %>%
mutate(model = map(data, ~ lm(amount ~ week, data = .x)),
result = map2(model, data, ~data.frame(predict(.x, newdata = .y,
interval = "prediction",level = 0.95)))) %>%
ungroup %>%
select(-model) %>%
unnest(c(data, result))
# country Area week amount fit lwr upr
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 US G 1 12 20.8 -27.7 69.3
# 2 US G 2 23 21.7 -22.0 65.4
# 3 US G 3 34 22.6 -19.4 64.6
# 4 US G 4 32 23.5 -20.2 67.2
# 5 US G 5 12 24.4 -24.1 72.9
# 6 US I 1 12 20.8 -33.9 75.5
# 7 US I 2 34 30.5 -18.8 79.8
# 8 US I 3 45 40.2 -7.17 87.6
# 9 US I 4 65 49.9 0.595 99.2
#10 US I 5 45 59.6 4.90 114.
#11 UK A 1 45 36.6 -6.05 79.2
#12 UK A 2 34 37.1 -1.34 75.5
#13 UK A 3 23 37.6 0.667 74.5
#14 UK A 4 43 38.1 -0.341 76.5
#15 UK A 5 43 38.6 -4.05 81.2

And one more:
library(tidyverse)
data %>%
mutate(CountryArea=paste0(country,Area) %>% factor %>% fct_inorder) %>%
split(.$CountryArea) %>%
map(~lm(amount~week, data=.)) %>%
map(predict, interval = "prediction",level = 0.95) %>%
reduce(rbind) %>%
cbind(data, .)
country Area week amount fit lwr upr
1 US G 1 12 20.8 -27.6791493 69.27915
2 US G 2 23 21.7 -21.9985147 65.39851
3 US G 3 34 22.6 -19.3841749 64.58417
4 US G 4 32 23.5 -20.1985147 67.19851
5 US G 5 12 24.4 -24.0791493 72.87915
6 US I 1 12 20.8 -33.8985900 75.49859
7 US I 2 34 30.5 -18.8046427 79.80464
8 US I 3 45 40.2 -7.1703685 87.57037
9 US I 4 65 49.9 0.5953573 99.20464
10 US I 5 45 59.6 4.9014100 114.29859
11 UK A 1 45 36.6 -6.0463638 79.24636
12 UK A 2 34 37.1 -1.3409128 75.54091
13 UK A 3 23 37.6 0.6671656 74.53283
14 UK A 4 43 38.1 -0.3409128 76.54091
15 UK A 5 43 38.6 -4.0463638 81.24636
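One more variant, sketched with dplyr::group_modify() instead of split(): each group's rows are returned together with their own predictions, so the result does not depend on the original data being sorted by country and Area the way cbind(data, .) does. Rows come back ordered by the grouping keys (UK before US):

```r
library(dplyr)

data <- data.frame(country = c("US","US","US","US","US","US","US","US","US","US","UK","UK","UK","UK","UK"),
                   Area = c("G","G","G","G","G","I","I","I","I","I","A","A","A","A","A"),
                   week = c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5),
                   amount = c(12,23,34,32,12,12,34,45,65,45,45,34,23,43,43))

result <- data %>%
  group_by(country, Area) %>%
  group_modify(function(rows, keys) {
    # rows holds week and amount for one country/Area combination
    model <- lm(amount ~ week, data = rows)
    pred <- predict(model, newdata = rows, interval = "prediction", level = 0.95)
    # pred is a matrix with columns fit, lwr and upr
    bind_cols(rows, as.data.frame(pred))
  }) %>%
  ungroup()

result
```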

regression by group and retain all the columns in R

I am doing a linear regression by group and want to extract the residuals of the regression:
library(dplyr)
library(broom)  # provides augment()
set.seed(124)
dat <- data.frame(ID = sample(111:503, 18576, replace = TRUE),
                  ID2 = sample(11:50, 18576, replace = TRUE),
                  ID3 = sample(1:14, 18576, replace = TRUE),
                  yearRef = sample(1998:2014, 18576, replace = TRUE),
                  value = rnorm(18576))
resid <- dat %>% dplyr::group_by(ID3) %>%
  do(augment(lm(value ~ yearRef, data = .))) %>% ungroup()
How do I also retain ID and ID2 in resid? At the moment, only ID3 is kept in the final data frame.

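Use group_split(), then loop through the groups with map_dfr(), using bind_cols() to attach ID and ID2 to the augment() output. Note that .id = "ID3" stores each group's position in the list (as a character), which matches the actual ID3 value here only because ID3 takes every value from 1 to 14 and group_split() returns the groups in sorted order.
library(dplyr)
library(purrr)
library(broom)
dat %>%
  group_split(ID3) %>%
  map_dfr(~ bind_cols(select(.x, ID, ID2), augment(lm(value ~ yearRef, data = .x))),
          .id = "ID3")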
# A tibble: 18,576 x 12
ID3 ID ID2 value yearRef .fitted .se.fit .resid .hat .sigma .cooksd
<chr> <int> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 196 16 -0.385 2009 -0.0406 0.0308 -0.344 1.00e-3 0.973 6.27e-5
2 1 372 47 -0.793 2012 -0.0676 0.0414 -0.726 1.81e-3 0.973 5.05e-4
3 1 470 15 -0.496 2011 -0.0586 0.0374 -0.438 1.48e-3 0.973 1.50e-4
4 1 242 40 -1.13 2010 -0.0496 0.0338 -1.08 1.21e-3 0.973 7.54e-4
5 1 471 34 1.28 2006 -0.0135 0.0262 1.29 7.26e-4 0.972 6.39e-4
6 1 434 35 -1.09 1998 0.0586 0.0496 -1.15 2.61e-3 0.973 1.82e-3
7 1 467 45 -0.0663 2011 -0.0586 0.0374 -0.00769 1.48e-3 0.973 4.64e-8
8 1 334 27 -1.37 2003 0.0135 0.0305 -1.38 9.86e-4 0.972 9.92e-4
9 1 186 25 -0.0195 2003 0.0135 0.0305 -0.0331 9.86e-4 0.973 5.71e-7
10 1 114 34 1.09 2014 -0.0857 0.0500 1.18 2.64e-3 0.973 1.94e-3
# ... with 18,566 more rows, and 1 more variable: .std.resid <dbl>

Taking the "many models" approach, you can nest the data on ID3 and use purrr::map to create a list-column of the broom::augment data frames. The data list-column has all the original columns aside from ID3; map into that and select just the ones you want. Here I'm assuming you want to keep any column that starts with "ID", but you can change this. Then unnest both the data and the augment data frames.
library(dplyr)
library(tidyr)
dat %>%
group_by(ID3) %>%
nest() %>%
mutate(aug = purrr::map(data, ~broom::augment(lm(value ~ yearRef, data = .))),
data = purrr::map(data, select, starts_with("ID"))) %>%
unnest(c(data, aug))
#> # A tibble: 18,576 x 12
#> # Groups: ID3 [14]
#> ID3 ID ID2 value yearRef .fitted .se.fit .resid .hat .sigma
#> <int> <int> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 11 431 15 0.619 2002 0.0326 0.0346 0.586 1.21e-3 0.995
#> 2 11 500 21 -0.432 2000 0.0299 0.0424 -0.462 1.82e-3 0.995
#> 3 11 392 28 -0.246 1998 0.0273 0.0515 -0.273 2.67e-3 0.995
#> 4 11 292 40 -0.425 1998 0.0273 0.0515 -0.452 2.67e-3 0.995
#> 5 11 175 36 -0.258 1999 0.0286 0.0468 -0.287 2.22e-3 0.995
#> 6 11 419 23 3.13 2005 0.0365 0.0273 3.09 7.54e-4 0.992
#> 7 11 329 17 -0.0414 2007 0.0391 0.0274 -0.0806 7.57e-4 0.995
#> 8 11 284 23 -0.450 2006 0.0378 0.0268 -0.488 7.25e-4 0.995
#> 9 11 136 28 -0.129 2006 0.0378 0.0268 -0.167 7.25e-4 0.995
#> 10 11 118 17 -1.55 2013 0.0470 0.0470 -1.60 2.24e-3 0.995
#> # … with 18,566 more rows, and 2 more variables: .cooksd <dbl>,
#> # .std.resid <dbl>
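A shorter alternative, assuming broom's augment() method for lm: its data argument returns the supplied data frame with the fitted statistics appended, so passing each group's full data keeps ID and ID2 without a separate bind_cols() step. This sketch uses group_modify(), which hands each group's rows (minus ID3) to the function and adds ID3 back afterwards:

```r
library(dplyr)
library(broom)

set.seed(124)
dat <- data.frame(ID = sample(111:503, 18576, replace = TRUE),
                  ID2 = sample(11:50, 18576, replace = TRUE),
                  ID3 = sample(1:14, 18576, replace = TRUE),
                  yearRef = sample(1998:2014, 18576, replace = TRUE),
                  value = rnorm(18576))

resid <- dat %>%
  group_by(ID3) %>%
  # .x is one group's rows without ID3; data = .x keeps ID and ID2
  group_modify(~ augment(lm(value ~ yearRef, data = .x), data = .x)) %>%
  ungroup()

resid
```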
