Say I have this data:
df <- structure(list(a_bracket = structure(c(9L, 8L, 9L,
9L, 9L, 9L), .Label = c("0-15", "16-20", "21-60", "61-100", "101-500",
"501-1000", "1001-3500", "3501-5000", "5001+"), class = "factor"), b_bracket = structure(c(3L,
2L, 3L, 4L, 1L, 4L), .Label = c("18-25", "26-35", "36-40", "41-45",
"46-48", "49-70", "71+"), class = "factor"), gender = structure(c(2L,
2L, 2L, 2L, 1L, 2L), .Label = c("Female", "Male"), class = "factor"),
q1 = structure(c(2L, 2L, 4L, 3L, 1L, 4L
), .Label = c("I don't\nlike a thing",
"I don't\na thing at all", "I like a\nthing",
"Ambivalent about\nthe thing"), class = "factor"), q2 = structure(c(3L,
2L, 1L, 1L, 4L, 1L), .Label = c("Neither like\nnor dislike",
"Somewhat\ndislike", "Somewhat\nlike", "Strongly\ndislike",
"Strongly\nlike"), class = "factor"), q3 = structure(c(2L,
2L, 2L, 3L, 2L, 1L), .Label = c("Moderately", "Not at\nall",
"Quite", "Slightly", "Very"
), class = "factor")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
df
# A tibble: 6 x 6
a_bracket b_bracket gender q1 q2 q3
<fct> <fct> <fct> <fct> <fct> <fct>
1 5001+ 36-40 Male "I don't\na thing at all" "Somewhat\nlike" "Not at\nall"
2 3501-5000 26-35 Male "I don't\na thing at all" "Somewhat\ndislike" "Not at\nall"
3 5001+ 36-40 Male "Ambivalent about\nthe thing" "Neither like\nnor dislike" "Not at\nall"
4 5001+ 41-45 Male "I like a\nthing" "Neither like\nnor dislike" "Quite"
5 5001+ 18-25 Female "I don't\nlike a thing" "Strongly\ndislike" "Not at\nall"
6 5001+ 41-45 Male "Ambivalent about\nthe thing" "Neither like\nnor dislike" "Moderately"
I'm trying to run a series of models, extract the R-squared and AIC from each, and bind them together into a new data frame, with the name of the dependent variable as a third column.
This is my attempt:
model_stats <- function(data){
mod <- glance(
lm(as.numeric(data) ~
a_bracket +
b_bracket +
gender,
data = df))
tibble(
r_squared = mod %>% select(r.squared),
AIC = mod %>% select(AIC)
)
}
map_dfr(
df %>%
select(starts_with("q")),
model_stats,
.id = "question"
) %>% unnest()
But for some reason I don't understand, this repeats the output N times, where N is the number of models I'm running.
Does anyone know what I'm doing wrong here?
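Try this, using pull() instead of select(). pull() extracts r.squared and AIC as plain numbers, whereas select() returns one-column tibbles that end up as nested data-frame columns inside your result: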
library(tidyverse)
library(broom)
model_stats <- function(data){
mod <- glance(
lm(as.numeric(data) ~
a_bracket +
b_bracket +
gender,
data = df))
# pull() extracts each statistic as a plain number rather than a one-column tibble
tibble(
r_squared = mod %>% pull(r.squared),
AIC = mod %>% pull(AIC)
)
}
df %>%
select(starts_with('q')) %>%
map_df(model_stats, .id = 'question')
# question r_squared AIC
# <chr> <dbl> <dbl>
#1 q1 6.59e- 1 21.8
#2 q2 7.5 e- 1 20.4
#3 q3 2.22e-31 20.4
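The same idea can be written with less plumbing by selecting both statistics straight from glance(), so each model contributes one plain row and no unnest() is needed. This is a minimal sketch that uses the built-in mtcars data as a stand-in for df; the response columns (mpg, qsec, drat) and predictors (wt, hp) are arbitrary choices for illustration, and model_stats() here takes the response vector plus the data frame.

```r
library(dplyr)
library(purrr)
library(broom)

# Fit one linear model for a single response vector `y` and return a
# one-row tibble holding only R-squared and AIC from broom::glance().
# `y` is not a column of `data`, so lm() finds it in this function's environment.
model_stats <- function(y, data) {
  glance(lm(y ~ wt + hp, data = data)) %>%
    select(r.squared, AIC)
}

# map_dfr() calls model_stats() once per selected column and row-binds the
# results; .id = "question" stores each column name in a new first column
results <- mtcars %>%
  select(mpg, qsec, drat) %>%
  map_dfr(model_stats, data = mtcars, .id = "question")

results
```

Because each call already returns a flat one-row tibble, the binding step cannot produce nested or repeated rows.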
I have the following dataset
structure(list(Var1 = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L), .Label = c("0", "1"), class = "factor"), Var2 = structure(c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("congruent", "incongruent"
), class = "factor"), Var3 = structure(c(1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L), .Label = c("spoken", "written"), class = "factor"),
Freq = c(8L, 2L, 10L, 2L, 10L, 2L, 10L, 2L)), class = "data.frame", row.names = c(NA,
-8L))
I would like to add another column reporting the sum of each pair of consecutive rows (the 0 and 1 rows of each Var2/Var3 combination). The final result would look like this:
This is what I have done so far:
Table = as.data.frame(table(data_1$unimodal,data_1$cong_cond, data_1$presentation_mode)) %>%
mutate(Var1 = factor(Var1, levels = c('0', '1')))
row = Table %>% #is.factor(Table$Var1)
summarise(across(where(is.numeric),
~ .[Var1 == '0'] + .[Var1 == '1'],
.names = "{.col}_sum"))
column = c(rbind(row$Freq_sum,rep(NA, 4)))
Table$column = column
But I am looking for the quickest way to do this in a single step, without separate intermediate objects. Here I have used the dplyr package, but if you know other ways (with map(), a for loop, or whatever method you consider best), please let me know.
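This should do it:
# colSums() of a 2-row matrix sums each pair of consecutive Freq values;
# rep(each = 2) repeats each sum, and * c(1, NA) blanks out every second row
df$column <-
rep(colSums(matrix(df$Freq, 2)), each=2) * c(1, NA)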
If you don't need the NAs (i.e. the sum can appear on both rows of each pair), you can group by the other two variables:
library(dplyr)
df %>%
group_by(Var2, Var3) %>%
mutate(column = sum(Freq))
# A tibble: 8 × 5
# Groups: Var2, Var3 [4]
Var1 Var2 Var3 Freq column
<fct> <fct> <fct> <int> <int>
1 0 congruent spoken 8 10
2 1 congruent spoken 2 10
3 0 incongruent spoken 10 12
4 1 incongruent spoken 2 12
5 0 congruent written 10 12
6 1 congruent written 2 12
7 0 incongruent written 10 12
8 1 incongruent written 2 12
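Since the question also asks for alternatives to dplyr, here is a base R sketch using ave(). It assumes, as in the sample, that each 0/1 pair sits on two consecutive rows and the row count is even. The data frame is rebuilt by hand from the dput so the block runs on its own, and column_first is a hypothetical name for the NA-padded variant.

```r
# The question's data, rebuilt by hand (same values as the dput)
df <- data.frame(
  Var1 = factor(rep(c("0", "1"), 4)),
  Var2 = factor(rep(c("congruent", "congruent", "incongruent", "incongruent"), 2)),
  Var3 = factor(rep(c("spoken", "written"), each = 4)),
  Freq = c(8L, 2L, 10L, 2L, 10L, 2L, 10L, 2L)
)

# Pair index: rows 1-2 -> 1, rows 3-4 -> 2, and so on
pair_id <- rep(seq_len(nrow(df) / 2), each = 2)

# ave() applies sum() within each pair and returns the result on every row
df$column <- ave(df$Freq, pair_id, FUN = sum)

# NA-padded variant matching the question's layout: keep the sum on the "0" row only
df$column_first <- ifelse(df$Var1 == "0", df$column, NA)

df
```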
I have a data frame with the columns sex (male/female), age (child/adult), survive (yes/no) and freq (a frequency). How can I create a cross tab of sex and age?
sex age survive freq
male child yes 4
male adult yes 0
female child yes 6
female adult yes 3
male child no 1
male adult no 0
female child no 2
female adult no 1
I think you are looking to reshape your data with pivot_wider() from tidyr:
library(tidyr)
df %>% pivot_wider(., names_from = age, values_from = freq)
# A tibble: 4 x 4
sex survive child adult
<fct> <fct> <int> <int>
1 male yes 4 0
2 female yes 6 3
3 male no 1 0
4 female no 2 1
or
library(tidyr)
df %>% pivot_wider(., names_from = c(age, survive), values_from = freq)
# A tibble: 2 x 5
sex child_yes adult_yes child_no adult_no
<fct> <int> <int> <int> <int>
1 male 4 0 1 0
2 female 6 3 2 1
Is this what you are looking for? If not, can you provide the expected outcome?
Data
df = structure(list(sex = structure(c(2L, 2L, 1L, 1L, 2L, 2L, 1L,
1L), .Label = c("female", "male"), class = "factor"), age = structure(c(2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("adult", "child"), class = "factor"),
survive = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("no",
"yes"), class = "factor"), freq = c(4L, 0L, 6L, 3L, 1L, 0L,
2L, 1L)), class = "data.frame", row.names = c(NA, -8L))
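If "cross tab" means a contingency table of sex by age with freq summed over survive, base R's xtabs() builds it directly from the frequency column. A minimal sketch, rebuilding the sample data by hand:

```r
df <- data.frame(
  sex     = factor(c("male", "male", "female", "female", "male", "male", "female", "female")),
  age     = factor(c("child", "adult", "child", "adult", "child", "adult", "child", "adult")),
  survive = factor(c("yes", "yes", "yes", "yes", "no", "no", "no", "no")),
  freq    = c(4L, 0L, 6L, 3L, 1L, 0L, 2L, 1L)
)

# Sum freq over every sex x age combination (survive is collapsed)
tab <- xtabs(freq ~ sex + age, data = df)
tab
```

Adding survive to the formula (freq ~ sex + age + survive) gives a three-way table instead, and ftable() prints such a table in a flat layout.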
Let's say I want to find the mean of several columns, grouped by the quantiles of another column.
My table has several columns. I have computed the 10% quantiles (deciles) of the SalePrice column, and there are some other numeric columns in the table (as well as some factor variables).
I want to calculate the mean of these numeric variables within each SalePrice decile.
Then I want to save the results in a data frame.
I want to use a loop to construct this data frame (or to add the columns to the data frame inside the loop). I have a basic idea of how the loop should look, but don't know how to finish it:
for (i in seq_along(tr)) {
if (is.numeric(tr[[i]])) {
Result <- data.frame() # unfinished: not sure how to fill this in
}
}
Here is what I got for the SalePrice deciles (10% quantiles):
> quantile(tr$SalePrice, c(seq(0, 1,0.1)),na.rm = TRUE, names = TRUE)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
34900 106450 124000 135500 147000 163000 179360 198740 230000 278000 755000
And my data look like this:
> dput(head(tr, 5))
structure(list(
MSSubClass = structure(c(6L, 1L, 6L, 7L, 6L), .Label = c("20", "30", "40", "45", "50", "60", "70", "75", "80", "85", "90", "120", "160", "180", "190"), class = "factor"),
MSZoning = structure(c(4L, 4L, 4L, 4L, 4L), .Label = c("C (all)", "FV", "RH", "RL", "RM"), class = "factor"),
LotFrontage = c(65, 80, 68, 60, 84),
LotArea = c(8450, 9600, 11250, 9550, 14260),
Street = structure(c(2L, 2L, 2L, 2L, 2L), .Label = c("Grvl", "Pave"), class = "factor"),
Alley = structure(c(2L, 2L, 2L, 2L, 2L), .Label = c("Grvl", "NA", "Pave"), class = "factor"),
LotShape = structure(c(4L, 4L, 1L, 1L, 1L), .Label = c("IR1", "IR2", "IR3", "Reg"), class = "factor"),
LandContour = structure(c(4L, 4L, 4L, 4L, 4L), .Label = c("Bnk", "HLS", "Low", "Lvl"), class = "factor"),
Utilities = structure(c(1L, 1L, 1L, 1L, 1L), .Label = c("AllPub", "NoSeWa"), class = "factor"),
LotConfig = structure(c(5L, 3L, 5L, 1L, 3L), .Label = c("Corner", "CulDSac", "FR2", "FR3", "Inside"), class = "factor"),
LandSlope = structure(c(1L, 1L, 1L, 1L, 1L), .Label = c("Gtl", "Mod", "Sev"), class = "factor"),
Neighborhood = structure(c(6L, 25L, 6L, 7L, 14L), .Label = c("Blmngtn", "Blueste", "BrDale", "BrkSide", "ClearCr", "CollgCr", "Crawfor", "Edwards", "Gilbert", "IDOTRR", "MeadowV", "Mitchel", "NAmes", "NoRidge", "NPkVill", "NridgHt", "NWAmes", "OldTown", "Sawyer", "SawyerW", "Somerst", "StoneBr", "SWISU", "Timber", "Veenker"), class = "factor"),
Condition1 = structure(c(3L, 2L, 3L, 3L, 3L), .Label = c("Artery", "Feedr", "Norm", "PosA", "PosN", "RRAe", "RRAn", "RRNe", "RRNn"), class = "factor"),
Condition2 = structure(c(3L, 3L, 3L, 3L, 3L), .Label = c("Artery", "Feedr", "Norm", "PosA", "PosN", "RRAe", "RRAn", "RRNn"), class = "factor"),
BldgType = structure(c(1L, 1L, 1L, 1L, 1L), .Label = c("1Fam", "2fmCon", "Duplex", "Twnhs","TwnhsE"), class = "factor"),
SalePrice = c(208500, 181500, 223500, 140000, 250000)), row.names = c(NA, 5L), class = "data.frame")
I have only included some of the variables here, not all of them.
You did not provide any data, so I had to make a few assumptions. Assuming your data is called df, could you run dput(head(df, 100)) and paste the output here?
In the meantime, does this work for you?
d1 <- runif(1000)
d2 <- runif(1000)
d3 <- runif(1000)
df <- data.frame(SalePrice = d1,
data2 = d2,
data3 = d3)
library(dplyr)
df %>%
mutate(Mydeciles = ntile(data2, 10)) %>%
group_by(Mydeciles) %>%
summarise(mean_sales_price = mean(SalePrice),
mean_data2 = mean(data2),
mean_data3 = mean(data3))
Output:
# A tibble: 10 x 4
Mydeciles mean_sales_price mean_data2 mean_data3
<int> <dbl> <dbl> <dbl>
1 1 0.497 0.0450 0.450
2 2 0.520 0.144 0.522
3 3 0.506 0.250 0.487
4 4 0.472 0.360 0.457
5 5 0.510 0.469 0.553
6 6 0.555 0.564 0.503
7 7 0.510 0.652 0.540
8 8 0.461 0.751 0.482
9 9 0.465 0.844 0.485
10 10 0.530 0.952 0.534
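Solution 2 (applied to the five rows from your dput, hence only 2 groups; use ntile(SalePrice, 10) for deciles on the full data):
df %>%
mutate(Mydeciles = ntile(SalePrice, 2)) %>%
group_by(Mydeciles) %>%
summarise(across(where(is.numeric), mean))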
Gives:
# A tibble: 2 x 4
Mydeciles LotFrontage LotArea SalePrice
<int> <dbl> <dbl> <dbl>
1 1 68.3 9200 176667.
2 2 76 12755 236750
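A data.table answer (using the df with SalePrice, data2 and data3 from above):
library(data.table)
setDT(df)
# cut() turns the decile breaks into one group label per row
df[, .(mean_price = mean(SalePrice), mean_r1 = mean(data2), mean_r2 = mean(data3)),
by = .(qtl = cut(SalePrice, quantile(SalePrice, seq(0, 1, 0.1)), include.lowest = TRUE))]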
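Since the question asks for a loop, here is a base R sketch that completes that idea: cut() assigns each row to a SalePrice decile using the quantile() breaks, and the loop adds one mean column per numeric variable, skipping factors. The tr data frame below is simulated (hypothetical values that reuse a few of the question's column names), so substitute the real data.

```r
# Hypothetical stand-in for the question's `tr`: numeric columns plus a factor
set.seed(1)
tr <- data.frame(
  SalePrice   = round(runif(200, 35000, 755000)),
  LotFrontage = round(runif(200, 20, 200)),
  LotArea     = round(runif(200, 1300, 215000)),
  Street      = factor(sample(c("Grvl", "Pave"), 200, replace = TRUE))
)

# Assign each row to a SalePrice decile (1 = cheapest 10%, 10 = most expensive)
breaks <- quantile(tr$SalePrice, seq(0, 1, 0.1), na.rm = TRUE)
decile <- cut(tr$SalePrice, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Start with one row per decile, then add one mean column per numeric variable
result <- data.frame(decile = sort(unique(decile)))
for (col in names(tr)) {
  if (is.numeric(tr[[col]])) {
    # tapply() returns the per-decile means, ordered by decile
    result[[paste0("mean_", col)]] <- as.vector(tapply(tr[[col]], decile, mean, na.rm = TRUE))
  }
}

result
```

If many rows share the same SalePrice, some quantile breaks can coincide and cut() stops with a "breaks are not unique" error; dplyr::ntile(), as in the first answer, always produces equal-sized groups instead.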
Assume these are a few timestamped observations in a dataset:
Id Status DateCreated Group
10 Read 2017-11-04 18:24:55 Red
10 Write 2017-11-04 18:24:56 Red
10 Review 2017-11-04 18:25:16 Red
10 Read 2017-11-04 18:26:17 Red
10 Write 2017-11-04 18:26:47 Red
How do I collapse rows that are within 1 minute of each other?
For example, rows 1,2,3 are collapsed into 1 row and rows 4 and 5 are collapsed into second row.
The expected output would look like this:
Id Status DateCreated Date Ended Group
10 Read,Write,Review 2017-11-04 18:24:55 2017-11-04 18:25:16 Red, Red, Red
10 Read,Write 2017-11-04 18:26:17 2017-11-04 18:26:47 Red, Red
Here is the code to reproduce the test dataset in this example:
df <- structure(list(Id = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "10", class = "factor"),
Status = structure(c(1L, 3L, 2L, 1L, 3L), .Label = c("Read",
"Review", "Write"), class = "factor"), DateCreated = structure(1:5, .Label = c("2017-11-04 18:24:55",
"2017-11-04 18:24:56", "2017-11-04 18:25:16", "2017-11-04 18:26:17",
"2017-11-04 18:26:47"), class = "factor"), Group = structure(c(1L,
1L, 1L, 1L, 1L), .Label = "Red", class = "factor")), class = "data.frame", row.names = c(NA,
-5L))
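I would do something like this:
library(dplyr)
library(lubridate)
df %>%
mutate(DateCreated = ymd_hms(as.character(DateCreated))) %>%
arrange(DateCreated) %>%
# note: minute() is the clock minute (0-59), so 18:24:56 and 18:25:16 end up in different groups
group_by(minute(DateCreated)) %>%
summarise(Status = paste(Status, collapse = ", "),
Date_ended = last(DateCreated), # must come before DateCreated is overwritten below
DateCreated = first(DateCreated),
Group = paste(Group, collapse = ", "))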
library(lubridate)
library(dplyr)
library(purrr)
df <-
structure(
list(
Id = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "10", class = "factor"),
Status = structure(
c(1L, 3L, 2L, 1L, 3L),
.Label = c("Read",
"Review", "Write"),
class = "factor"
),
DateCreated = structure(
1:5,
.Label = c(
"2017-11-04 18:24:55",
"2017-11-04 18:24:56",
"2017-11-04 18:25:16",
"2017-11-04 18:26:17",
"2017-11-04 18:26:47"
),
class = "factor"
),
Group = structure(c(1L,
1L, 1L, 1L, 1L), .Label = "Red", class = "factor")
),
class = "data.frame",
row.names = c(NA,-5L)
)
df2 <-
df %>%
mutate(DateCreated = as_datetime(df$DateCreated)) %>%
arrange(DateCreated) %>%
mutate(diff = DateCreated - lag(DateCreated))
df2$diff[1] <- 0L
g <- 0
df3 <- mutate(df2, date_groups =
accumulate(df2$diff, function(x, y)
if (y - x < 60)
g
else {
g <<- g + 1
})) %>%
group_by(date_groups) %>%
# compute the end time per group first: inside summarise(), DateCreated is
# overwritten by its first value, so last(DateCreated) would return the start time
mutate(Date_ended = last(DateCreated)) %>%
summarise(
Status = paste(Status, collapse = ", "),
DateCreated = DateCreated[1],
Date_ended = Date_ended[1],
Group = paste(Group, collapse = ", ")
)
df3
#> # A tibble: 2 x 5
#> date_groups Status DateCreated Date_ended Group
#> <dbl> <chr> <dttm> <dttm> <chr>
#> 1 0 Read, Write… 2017-11-04 18:24:55 2017-11-04 18:25:16 Red, Re…
#> 2 1 Read, Write 2017-11-04 18:26:17 2017-11-04 18:26:47 Red, Red
Created on 2019-01-28 by the reprex package (v0.2.1)
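A sketch of a more general approach, assuming "within 1 minute" means each row is less than 60 seconds after the previous row of the same Id, so a chain of close rows forms one block. A new block starts whenever the gap reaches 60 seconds, and cumsum() turns those break points into block numbers. The timestamps are parsed as UTC (an assumption, since the data has no time zone), the end time is computed before DateCreated is overwritten inside summarise(), and DateEnded is used as the column name instead of "Date Ended".

```r
library(dplyr)

# Sample data from the question, with DateCreated stored as a factor as in the dput
df <- data.frame(
  Id = factor(rep("10", 5)),
  Status = factor(c("Read", "Write", "Review", "Read", "Write")),
  DateCreated = factor(c("2017-11-04 18:24:55", "2017-11-04 18:24:56",
                         "2017-11-04 18:25:16", "2017-11-04 18:26:17",
                         "2017-11-04 18:26:47")),
  Group = factor(rep("Red", 5))
)

collapsed <- df %>%
  mutate(DateCreated = as.POSIXct(as.character(DateCreated), tz = "UTC")) %>%
  arrange(Id, DateCreated) %>%
  group_by(Id) %>%
  mutate(
    # as.numeric() on POSIXct gives seconds, so diff() yields gaps in seconds
    # (NA for the first row of each Id)
    gap = c(NA, diff(as.numeric(DateCreated))),
    # a new block starts at the first row and whenever the gap reaches 60 s
    block = cumsum(is.na(gap) | gap >= 60)
  ) %>%
  group_by(Id, block) %>%
  summarise(
    Status = paste(Status, collapse = ","),
    DateEnded = max(DateCreated),   # before DateCreated is overwritten below
    DateCreated = min(DateCreated),
    Group = paste(Group, collapse = ", "),
    .groups = "drop"
  ) %>%
  select(Id, Status, DateCreated, DateEnded, Group)

collapsed
```

If blocks should instead be capped at 60 seconds from their first row, the gap rule above would need to be replaced by a running comparison against each block's start time.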
I am studying this webpage and cannot figure out how to rename the freq column to something else, say "number of times imbibed".
Here is the dput output (the data frame is called bevs):
bevs <- structure(list(name = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L), .Label = c("Bill", "Llib"), class = "factor"), drink = structure(c(2L,
3L, 1L, 4L, 2L, 3L, 1L, 4L), .Label = c("cocoa", "coffee", "tea",
"water"), class = "factor"), cost = 1:8), .Names = c("name",
"drink", "cost"), row.names = c(NA, -8L), class = "data.frame")
And this is working code with output. Again, I'd like to rename the freq column. Thanks!
library(plyr)
bevs$cost <- as.integer(bevs$cost)
count(bevs, "name")
Output
name freq
1 Bill 4
2 Llib 4
Are you trying to do this?
counts <- count(bevs, "name")
names(counts) <- c("name", "number of times imbibed")
counts
plyr's count() returns a data.frame, so you can rename its columns like those of any other data.frame:
counts <- count(bevs, "name")
names(counts)[which(names(counts) == "freq")] <- "number of times imbibed"
print(counts)
# name number of times imbibed
# 1 Bill 4
# 2 Llib 4
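With dplyr instead of plyr, count() takes a name argument that sets the name of the count column directly. A minimal sketch that rebuilds bevs by hand; it calls dplyr::count() explicitly because plyr and dplyr both export count(), and whichever package is loaded last masks the other.

```r
library(dplyr)

# bevs rebuilt from the question's dput
bevs <- data.frame(
  name  = factor(rep(c("Bill", "Llib"), 4)),
  drink = factor(c("coffee", "tea", "cocoa", "water", "coffee", "tea", "cocoa", "water")),
  cost  = 1:8
)

# Count rows per name and name the count column in one step
counts <- dplyr::count(bevs, name, name = "number of times imbibed")
counts
```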