This is the code that I used to try and get the output in the attached:
library(gplots)  # provides plotmeans()
fun_plot5 <- function(ycol, ylab, xcol, data) {
xx3 <- paste(ycol, xcol, sep = "~")
xx3 <- as.formula(xx3)
plotmeans(xx3, data = data,  # use the data argument, not the global get_proposer
xlab = "Gender", ylab = ylab,
main = "Mean Plot with 95% CI")
}
y_cols6 <- names(get_proposer[24:29])
y_lab6 <- c("Actual Offer (by A)", "Actual Amount Transferred to Partner (Bot)", "Actual Payoff (for A)", "Practice Offer (by A)", "Practice Amount Transferred to Partner (Bot)", "Practice Payoff (for A)")
old_par4 <- par(mfrow = c(3,3))
mapply(fun_plot5, y_cols6, y_lab6,
MoreArgs = list(
xcol = "gender",
data = get_proposer
))
I'm trying to change the x-axis values (for all plots) from 1 and 2, to "Male" and "Female", respectively. I tried including this line of code at the end of the code above, but I was still not able to get the outcome I want.
fun_plot5 +
scale_x_discrete(limits = c("Male", "Female"))
When I added this line to one of my other plots that used ggplot, it worked. But it didn't work for the current plots (attached), which are drawn with plotmeans rather than ggplot. How should I go about this?
Many thanks!
Updated with Data
# A tibble: 31 x 10
similar_task age gender income actual_offer actual_payoff actual_partner_transfer practice_partner_transfer practice_offer practice_payoff
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 5 29 1 4 40 126 66 48 30 118
2 3 36 1 4 100 273 273 180 100 180
3 5 39 2 2 0 100 0 0 0 100
4 3 25 1 7 100 6 6 195 100 195
5 3 28 2 7 25 99 24 84 50 134
6 2 45 2 5 80 29 9 42 100 42
7 3 30 1 6 100 45 45 123 100 123
8 5 37 1 3 0 100 0 0 0 100
9 2 38 2 2 25 99 24 63 25 138
10 1 25 1 1 100 183 183 285 100 285
# ... with 21 more rows
The columns that I used in my plots (in attached) can be found in the last few columns in the data, from "actual_offer" to "practice_payoff" (or, columns 24:29 in the entire dataset).
In the plotmeans documentation, the legends argument is defined as the vector of strings used to label the groups. What happens when you try legends = c("Male", "Female") within your plotmeans call?
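A minimal sketch of that suggestion, assuming gender is coded 1 = Male and 2 = Female as described in the question. The toy data frame and its values are made up for illustration and only stand in for get_proposer; the plot is drawn only if gplots is installed.

```r
# Toy stand-in for get_proposer (hypothetical values, for illustration only)
toy <- data.frame(
  gender       = c(1, 1, 1, 2, 2, 2),
  actual_offer = c(40, 100, 100, 0, 25, 80)
)

# One label per gender code, in the order of the codes (1, then 2)
gender_labels <- c("Male", "Female")

if (requireNamespace("gplots", quietly = TRUE)) {
  # legends replaces the default group labels (1 and 2) on the x-axis
  gplots::plotmeans(actual_offer ~ gender, data = toy,
                    legends = gender_labels,
                    xlab = "Gender", ylab = "Actual Offer (by A)",
                    main = "Mean Plot with 95% CI")
}
```

In fun_plot5 the same legends argument would go inside the plotmeans call, so it applies to every panel produced by mapply.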
I have a question related to the function tmerge() in the R package survival.
I am trying to set up a data set with time-dependent covariates, but the covariate value in the initial time period is set to NA (see reprex below).
I have one data frame with baseline variables, time-, and event data, and a second data frame with variables measured 3 months after baseline.
I used the same approach as in the PBC data example in the vignette by Terry Therneau and Co. (or tried to, at least! https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf). On p. 11 it says:
"The tdc and cumtdc arguments can have 1, 2 or three arguments. The first is always the time point, the second, if present, is the value to be inserted, and an optional third argument is the initial value. If the tdc call has a single argument the result is always a 0/1 variable, 0 before the time point and 1 after. For the 2 or three argument form, the starting value before the first definition of the new variable (before the first time point) will be the initial value. The default for the initial value is NA, the value of the tdcstart option."
I am not sure I understand the last bit (highlighted in bold).
I do not get the same problem when I replicate the PBC example. I tried to specify init in the second tmerge call and/or the tdcstart option, without any success (both generate an error). There are no missing values in the covariates or the outcome (time, event).
Reaching out here, since I cannot find out what I am doing wrong.
Thanks a lot in advance!
PS. This is my first post, so I apologize if I have missed something. I hope it makes sense.
library(tidyverse)
library(survival)
set.seed(123)
# Generate data
df_base <- tibble(
ID = as.numeric(1:100),
time = as.integer(runif(100, min = 100, max = 730)),
status = as.factor(sample(x = c("0", "1"), prob = c(0.7, 0.3), size = 100, replace = T)),
vas = as.integer(rnorm(n = 100, mean = 53, sd = 10)))
df_fu <- tibble(
ID = as.numeric(1:100),
fu_3mo = 91,
vas = as.integer(rnorm(n = 100, mean = 44, sd = 15)))
# Baseline data
head(df_base)
#> # A tibble: 6 x 4
#> ID time status vas
#> <dbl> <int> <fct> <int>
#> 1 1 281 0 45
#> 2 2 596 0 55
#> 3 3 357 0 50
#> 4 4 656 1 49
#> 5 5 692 0 43
#> 6 6 128 1 52
# Follow-up data
head(df_fu)
#> # A tibble: 6 x 3
#> ID fu_3mo vas
#> <dbl> <dbl> <int>
#> 1 1 91 76
#> 2 2 91 63
#> 3 3 91 40
#> 4 4 91 52
#> 5 5 91 37
#> 6 6 91 36
# Generate time-dependent covariates
df_tdc <- tmerge(df_base, df_base, id = ID, surgery = event(time, status))
head(df_tdc)
#> ID time status vas tstart tstop surgery
#> 1 1 281 0 45 0 281 0
#> 2 2 596 0 55 0 596 0
#> 3 3 357 0 50 0 357 0
#> 4 4 656 1 49 0 656 1
#> 5 5 692 0 43 0 692 0
#> 6 6 128 1 52 0 128 1
df_tdc <- tmerge(df_tdc, df_fu, id = ID, vas = tdc(fu_3mo, vas))
#> Warning in tmerge(df_tdc, df_fu, id = ID, vas = tdc(fu_3mo, vas)): replacement
#> of variable 'vas'
head(df_tdc)
#> ID time status vas tstart tstop surgery
#> 1 1 281 0 NA 0 91 0
#> 2 1 281 0 76 91 281 0
#> 3 2 596 0 NA 0 91 0
#> 4 2 596 0 63 91 596 0
#> 5 3 357 0 NA 0 91 0
#> 6 3 357 0 40 91 357 0
Created on 2021-11-26 by the reprex package (v0.3.0)
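One possible way around the initial NA, sketched here under the assumption that the baseline vas should be the value for the first interval: stack the baseline measurement at day 0 on top of the 3-month measurement before calling tdc(). This mirrors the PBC example, where the follow-up data set pbcseq already contains a day-0 row for each subject. The three-subject data and the names base, fu, long, day and vas_td are made up for illustration.

```r
library(survival)

# Toy versions of df_base and df_fu (hypothetical values)
base <- data.frame(ID = 1:3, time = c(281, 596, 357),
                   status = c(0, 0, 1), vas = c(45, 55, 50))
fu   <- data.frame(ID = 1:3, fu_3mo = 91, vas = c(76, 63, 40))

# Long format: one row per measurement, with the baseline placed at day 0
long <- rbind(
  data.frame(ID = base$ID, day = 0,         vas_td = base$vas),
  data.frame(ID = fu$ID,   day = fu$fu_3mo, vas_td = fu$vas)
)

# First call sets up the follow-up interval and the event;
# the baseline vas column is left out so it is not overwritten later
td <- tmerge(base[, c("ID", "time", "status")], base, id = ID,
             surgery = event(time, status))

# Second call adds vas as a time-dependent covariate, starting at day 0
td <- tmerge(td, long, id = ID, vas = tdc(day, vas_td))
td
```

Because every subject now has a value defined at day 0, no interval is left before the first definition of vas, so the NA default for the initial value should not come into play.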
I have a data frame named df. In the first step I changed age into age groups, and then I summed pop for each combination of age group and gender.
df <- data.frame(age= c(0,1,3,5,6,29,43,12,1,3,5,12,29,43,0,6), pop= c(12,11,33,45,56,54,67,76,65,11,78,90,112,29,70,60),gender=c(2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1))
Changing age into age groups:
x <- df$age %/% 5
x <- pmax(0, pmin(20, x))
df$agegroup<- c(paste(0:19*5, 1:20*5-1, sep="-"), "+100")[x+1]
Sum of pop for each gender and age group:
df1 <- aggregate(pop ~ gender + agegroup, data = df, FUN = sum)
gender agegroup pop
1 1 0-4 146
2 2 0-4 56
3 1 10-14 90
4 2 10-14 76
5 1 25-29 112
6 2 25-29 54
7 1 40-44 29
8 2 40-44 67
9 1 5-9 138
10 2 5-9 101
As shown in df1, the age group 5-9 comes after 40-44, but I want the age groups in numeric order. My desired output would look like this:
gender agegroup pop
1 1 0-4 146
2 2 0-4 56
3 1 5-9 138
4 2 5-9 101
5 1 10-14 90
6 2 10-14 76
7 1 25-29 112
8 2 25-29 54
9 1 40-44 29
10 2 40-44 67
You're going to want to set agegroup to a factor and specify the factor order. One way to do this is with reorder(). For example
df$agegroup <- reorder(df$agegroup,
as.numeric(gsub("-\\d+","", df$agegroup)))
We use gsub() to take off the second number, and then we can use that to sort by the numeric value of the first number.
Once you've updated the level order to be what you want, you should get the results in the order you want.
levels(df$agegroup)
# [1] "0-4" "5-9" "10-14" "25-29" "40-44"
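As a variation on the same idea, the factor levels can be fixed at the moment agegroup is created, so that aggregate() already returns rows in age order. This is a base R sketch using the data from the question; all_labels is a name introduced here for illustration.

```r
df <- data.frame(
  age    = c(0, 1, 3, 5, 6, 29, 43, 12, 1, 3, 5, 12, 29, 43, 0, 6),
  pop    = c(12, 11, 33, 45, 56, 54, 67, 76, 65, 11, 78, 90, 112, 29, 70, 60),
  gender = c(2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1)
)

# Same binning as in the question: 5-year bands, capped at the last band
x <- pmax(0, pmin(20, df$age %/% 5))
all_labels <- c(paste(0:19 * 5, 1:20 * 5 - 1, sep = "-"), "+100")

# Store agegroup as a factor whose levels are already in age order
df$agegroup <- factor(all_labels[x + 1], levels = all_labels)

# Groups are returned in level order; empty age bands are dropped
df1 <- aggregate(pop ~ gender + agegroup, data = df, FUN = sum)
df1
```

Since the levels are set explicitly, the ordering does not depend on parsing the labels afterwards.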
I am kind of reinventing the wheel here for something that you have already solved but you can use cut and pass breaks and labels to it.
The benefit of using cut is that it will give you factor levels which are already in the order that you want, you just need to arrange them.
library(dplyr)
x1 <- c(0, seq(4, 100, 5))
labels <- c(paste(x1[-length(x1)] + 1, x1[-1], sep = '-'), '100+')
labels[1] <- '0-4'
df %>%
group_by(gender, agegroup = cut(age, c(x1, Inf), labels, include.lowest = TRUE)) %>%
summarise(pop = sum(pop)) %>%
ungroup %>%
arrange(agegroup)
# gender agegroup pop
# <dbl> <fct> <dbl>
# 1 1 0-4 146
# 2 2 0-4 56
# 3 1 5-9 138
# 4 2 5-9 101
# 5 1 10-14 90
# 6 2 10-14 76
# 7 1 25-29 112
# 8 2 25-29 54
# 9 1 40-44 29
#10 2 40-44 67
We can use mixedorder from gtools
df1[gtools::mixedorder(df1$agegroup),]
gender agegroup pop
1 1 0-4 146
2 2 0-4 56
9 1 5-9 138
10 2 5-9 101
3 1 10-14 90
4 2 10-14 76
5 1 25-29 112
6 2 25-29 54
7 1 40-44 29
8 2 40-44 67
I have the following code for a Netflix experiment that reduces the price of Netflix to see whether people watch more or less TV. Each time someone uses Netflix, it shows what they watched and how long they watched it for.
library(tidyverse)
sample_size <- 10000
set.seed(853)
viewing_data <-
tibble(unique_person_id = sample(x = c(1:100),
size = sample_size,
replace = TRUE),
tv_show = sample(x = c("Broadchurch", "Duty-Shame", "Drive to Survive", "Shetland", "The Crown"),
size = sample_size,
replace = TRUE)
)
I then want to write some code that randomly assigns people to one of two groups, treatment and control. However, the dataset is at the row (viewing) level, with 10,000 observations. I want to change it to the person level in R, so that I can assign each person to be either treated or not. A person should not be both treated and not treated, but each person appears many times because there is one row per tv_show viewing. Does anyone know how to reshape the dataset in this case?
library(dplyr)
treatment <- viewing_data %>%
distinct(unique_person_id) %>%
mutate(treated = sample(c("yes", "no"), size = 100, replace = TRUE))
viewing_data %>%
left_join(treatment, by = "unique_person_id")
You can change the way of sampling if you need to...
You can do the following, which groups your observations by person ID and assigns a single treated/control label per group:
library(dplyr)
viewing_data %>%
group_by(unique_person_id) %>%
mutate(group=sample(c("treated","control"),1))
# A tibble: 10,000 x 3
# Groups: unique_person_id [100]
unique_person_id tv_show group
<int> <chr> <chr>
1 9 Drive to Survive control
2 64 Shetland treated
3 90 The Crown treated
4 93 Drive to Survive treated
5 17 Duty-Shame treated
6 29 The Crown control
7 84 Broadchurch control
8 83 The Crown treated
9 3 The Crown control
10 33 Broadchurch control
# … with 9,990 more rows
We can check our results, all of the ids have only 1 group of treated / control:
newdata <- viewing_data %>%
group_by(unique_person_id) %>%
mutate(group=sample(c("treated","control"),1))
tapply(newdata$group,newdata$unique_person_id,n_distinct)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
In case you wanted random and equal allocation of persons into the two groups (complete random allocation), you can use the following code.
library(dplyr)
Persons <- viewing_data %>%
distinct(unique_person_id) %>%
mutate(group=sample(100), # in case the ids are not truly random
group=ifelse(group %% 2 == 0, 0, 1)) # works if only two groups
Persons
# A tibble: 100 x 2
unique_person_id group
<int> <dbl>
1 1 0
2 2 0
3 3 1
4 4 0
5 5 1
6 6 1
7 7 1
8 8 0
9 9 1
10 10 0
# ... with 90 more rows
And to check that we've got 50 in each group:
Persons %>% count(group)
# A tibble: 2 x 2
group n
<dbl> <int>
1 0 50
2 1 50
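The same equal split can also be sketched in base R: build a vector containing exactly 50 labels of each kind, shuffle it, and merge it back onto the row-level data. The names persons and viewing, the seed, and the 1,000-row stand-in for viewing_data are assumptions made for this illustration.

```r
set.seed(853)  # arbitrary seed so the sketch is reproducible

person_ids <- 1:100

# Exactly half treated and half control, in random order
persons <- data.frame(
  unique_person_id = person_ids,
  group = sample(rep(c("treated", "control"), each = length(person_ids) / 2))
)
table(persons$group)

# Small stand-in for the row-level viewing data
viewing <- data.frame(
  unique_person_id = sample(person_ids, size = 1000, replace = TRUE)
)

# Every row inherits the group of its person
viewing <- merge(viewing, persons, by = "unique_person_id")

# Check: each person appears in exactly one group
groups_per_person <- tapply(viewing$group, viewing$unique_person_id,
                            function(g) length(unique(g)))
all(groups_per_person == 1)
```

Shuffling a balanced vector guarantees the 50/50 split, whereas drawing each label independently with sample(..., replace = TRUE) only gives it on average.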
You could also use the randomizr package, which has many more features apart from complete random allocation.
library(randomizr)
Persons <- viewing_data %>%
distinct(unique_person_id) %>%
mutate(group=complete_ra(N=100, m=50))
Persons %>% count(group) # Check
To link this back to the viewing_data, use inner_join.
viewing_data %>% inner_join(Persons, by="unique_person_id")
# A tibble: 10,000 x 3
unique_person_id tv_show group
<int> <chr> <int>
1 10 Shetland 1
2 95 Broadchurch 0
3 7 Duty-Shame 1
4 68 Drive to Survive 0
5 17 Drive to Survive 1
6 70 Shetland 0
7 78 Drive to Survive 0
8 21 Broadchurch 1
9 80 The Crown 0
10 70 Shetland 0
# ... with 9,990 more rows
I have been given a data set called ChickWeight. It contains the weights of chicks over a time period. I need to introduce a new variable that measures the current weight difference compared to day 0.
I first cleaned the data set and took out only the chicks that were recorded for all 12 weigh ins:
library(datasets)
library(dplyr)
# Count the weigh-ins per chick; the count column is named "freq"
Frequency <- dplyr::count(ChickWeight, Chick, name = "freq")
a <- inner_join(ChickWeight, Frequency, by='Chick')
complete <- a[(a$freq == 12),]
head(complete,3)
This data set, called ChickWeight, ships with R in the datasets package.
You can try:
library(dplyr)
ChickWeight %>%
group_by(Chick) %>%
filter(any(Time == 21)) %>%
mutate(wdiff = weight - first(weight))
# A tibble: 540 x 5
# Groups: Chick [45]
weight Time Chick Diet wdiff
<dbl> <dbl> <ord> <fct> <dbl>
1 42 0 1 1 0
2 51 2 1 1 9
3 59 4 1 1 17
4 64 6 1 1 22
5 76 8 1 1 34
6 93 10 1 1 51
7 106 12 1 1 64
8 125 14 1 1 83
9 149 16 1 1 107
10 171 18 1 1 129
# ... with 530 more rows
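For comparison, here is a base R sketch of the same two steps (keep only chicks with all 12 weigh-ins, then subtract each chick's day-0 weight). The names cw, n_obs and day0 are introduced here for illustration.

```r
cw <- as.data.frame(ChickWeight)

# Number of weigh-ins recorded for the chick on each row
n_obs <- ave(cw$weight, cw$Chick, FUN = length)

# Keep only chicks with all 12 weigh-ins
complete <- cw[n_obs == 12, ]

# Day-0 weight of each remaining chick
day0 <- complete[complete$Time == 0, c("Chick", "weight")]

# Difference between the current weight and that chick's day-0 weight
complete$wdiff <- complete$weight -
  day0$weight[match(complete$Chick, day0$Chick)]

head(complete, 3)
```

Looking up the day-0 row explicitly avoids relying on the rows being sorted by Time within each chick.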
I have a dataset that looks like this:
ID SEX WEIGHT BMI
1 2 65 25
1 2 65 25
1 2 65 25
2 1 70 30
2 1 70 30
2 1 70 30
2 1 70 30
3 2 50 18
3 2 50 18
4 1 85 20
4 1 85 20
I want to calculate fat free mass (FFM) and attach the value in a new column in the dataset for each individual. These are the functions to calculate FFM for males and females:
for males (SEX=1):
FFMCalMale <- function (WEIGHT, BMI) {
FFM = 9270*WEIGHT/(6680+216*BMI)
}
and for females (SEX=2):
FFMCalFemale <- function(WEIGHT, BMI) {
FFM = 9270*WEIGHT/(8780+244*BMI)
}
I want to combine these into one function that checks SEX (1 is male, 2 is female), calculates FFM with the matching formula, and applies it to each individual. Could you please help?
Thanks in advance!
You could use ifelse
data$FFM <- ifelse(data$SEX==1,
FFMCalMale(data$WEIGHT, data$BMI),
FFMCalFemale(data$WEIGHT, data$BMI))
A data.table approach:
mydata <- read.table(
header = T, con <- textConnection
('
ID SEX WEIGHT BMI
1 2 65 25
1 2 65 25
1 2 65 25
2 1 70 30
2 1 70 30
2 1 70 30
2 1 70 30
3 2 50 18
3 2 50 18
4 1 85 20
4 1 85 20
'), stringsAsFactors = FALSE)
close(con)
library(data.table) ## load data.table
setDT(mydata) ## convert the data to datatable
FFMCalMale <- function (WEIGHT, BMI) {
FFM = 9270*WEIGHT/(6680+216*BMI)
}
FFMCalFemale <- function(WEIGHT, BMI) {
FFM = 9270*WEIGHT/(8780+244*BMI)
}
setkey(mydata, SEX)
mydata[, FFM := ifelse(SEX == 1,
FFMCalMale(WEIGHT, BMI),
FFMCalFemale(WEIGHT, BMI))][]
# ID SEX WEIGHT BMI FFM
# 1: 2 1 70 30 49.30851
# 2: 2 1 70 30 49.30851
# 3: 2 1 70 30 49.30851
# 4: 2 1 70 30 49.30851
# 5: 4 1 85 20 71.63182
# 6: 4 1 85 20 71.63182
# 7: 1 2 65 25 40.49395
# 8: 1 2 65 25 40.49395
# 9: 1 2 65 25 40.49395
# 10: 3 2 50 18 35.18828
# 11: 3 2 50 18 35.18828
Here are two ways, one just taking the dataframe (assuming it contains columns with the names SEX, WEIGHT, and BMI):
dffunc <- function(dataframe) {
ifelse(dataframe$SEX == 1,
9270 * dataframe$WEIGHT / (6680 + 216 * dataframe$BMI),
9270 * dataframe$WEIGHT / (8780 + 244 * dataframe$BMI))
}
or as you originally formatted it, but adding the SEX parameter:
func <- function(WEIGHT, BMI, SEX) {
ifelse(SEX == 1,
9270 * WEIGHT / (6680 + 216 * BMI),
9270 * WEIGHT / (8780 + 244 * BMI))
}
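As a quick check, the single-function form can be applied to one row per individual from the question's data, using the two formulas exactly as the question states them (216*BMI for males, 244*BMI for females). The names ffm and people are made up for this sketch.

```r
# FFM for males (SEX == 1) and females (SEX == 2), formulas from the question
ffm <- function(WEIGHT, BMI, SEX) {
  ifelse(SEX == 1,
         9270 * WEIGHT / (6680 + 216 * BMI),
         9270 * WEIGHT / (8780 + 244 * BMI))
}

# One row per individual from the example data
people <- data.frame(
  ID     = 1:4,
  SEX    = c(2, 1, 2, 1),
  WEIGHT = c(65, 70, 50, 85),
  BMI    = c(25, 30, 18, 20)
)

people$FFM <- ffm(people$WEIGHT, people$BMI, people$SEX)
people
```

Because ifelse() is vectorised, the same call works unchanged on the full data set with repeated rows per ID.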