Recoding variables in a loop in R

I want to recode several variables at once; all of them should undergo the same recoding.
To do this, I followed the thread below, which describes two ways of doing it:
1) using column numbers
2) using variable names
I tried both, but each gives the same error message:
Error in (function (var, recodes, as.factor, as.numeric = TRUE, levels) :
unused arguments (2 = "1", 3 = "1", 1 = "0", 4 = "0", na.rm = TRUE)
recode variable in loop R
#Loading libraries
library(dplyr)
library(magrittr)
library(plyr)
library(readxl)
library(tidyverse)
#Importing file
mydata <- read_excel("CCorr_Data.xlsx")
df <- data.frame(mydata)
attach(df)
#replacing codes for variables
df %>%
mutate_at(c(1:7), recode, '2'='1', '3'='1', '1'='0', '4'='0', na.rm = TRUE) %>%
mutate_at(c(15:24), recode, '2'='0', na.rm = TRUE)
df %>%
mutate_at(vars(E301, E302, E303), recode,'2'='1', '3'='1', '1'='0', '4'='0', na.rm = TRUE) %>%
mutate_at(vars(B201, B202, B203), recode, '2'='0', na.rm = TRUE)
Can someone tell me where I am going wrong?
My dataset has missing values, which is why I included na.rm = TRUE. I also tried without it, and the error message was the same.
Sample data is below.
structure(list(Country = c(1, 1, 1, 1, 1, 1), HHID = c("12ae5148e245079f-122042",
"12ae5148e245079f-123032", "12ae5148e245079f-123027", "12ae5148e245079f-123028",
"12ae5148e245079f-N123001", "12ae5148e245079f-123041"), HHCode = c("122042",
"123032", "123027", "123028", "N123001", "123041"), A103 = c(2,
2, 2, 2, 2, 2), A104 = c("22", "23", "23", "23", "23", "23"),
Community = c("Mehmada", "Dhobgama", "Dhobgama", "Dhobgama",
"Dhobgama", "Dhobgama"), E301 = c(3, 3, 3, 3, 3, 3), E302 = c(3,
2, 4, 4, 3, 3), E303 = c(3, 2, 3, 3, 3, 3), E304 = c(3, 4,
4, 4, 3, 3), E305 = c(3, 2, 3, 3, 3, 3), E306 = c(3, 3, 3,
3, 3, 3), E307 = c(3, 3, 3, 3, 3, 3), E308 = c(3, 1, 3, 3,
3, 3), B201.1 = c(NA, 1, 1, 1, 1, 1), B202.1 = c(NA, 1, 1,
1, 1, 1), B203.1 = c(NA, 1, 1, 2, 2, 1), B204.1 = c(NA, 2,
1, 2, 1, 1), B205.1 = c(NA, 2, 1, 2, 2, 2), B206.1 = c(NA,
1, 1, 1, 2, 1), B207.1 = c(NA, 2, 1, 2, 2, 1), B208.1 = c(NA,
2, 2, 2, 2, 2), B209.1 = c(NA, 2, 1, 1, 1, 1), B210.1 = c(NA,
1, 1, 1, 1, 1)), row.names = c(NA, 6L), class = "data.frame")

The issue is na.rm = TRUE: recode() has no such argument, so remove it:
library(dplyr)
df %>%
mutate_at(vars(E301, E302, E303), recode,'2'='1', '3'='1', '1'='0', '4'='0') %>%
mutate_at(vars(B201, B202, B203), recode, '2'='0')

Alternatively, using column positions:
library(dplyr)
df %>%
mutate_at(1:7, recode, '2'='1', '3'='1', '1'='0', '4'='0') %>%
mutate_at(15:24, recode, '2'='0')
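
For newer dplyr versions, the same recoding can be sketched with across(), which supersedes mutate_at() from dplyr 1.0.0 onward. The toy data frame below is a hypothetical stand-in that reuses a few values from the sample data. The call is written as dplyr::recode() on the assumption that another attached package might be masking recode(): the argument names in the error message (var, recodes, as.factor) resemble those of car::recode() rather than dplyr's, although the question does not confirm which package they come from.

```r
library(dplyr)

# Hypothetical stand-in using a few columns from the sample data
toy <- data.frame(
  E301   = c(3, 3, 3, 3, 3, 3),
  E302   = c(3, 2, 4, 4, 3, 3),
  B201.1 = c(NA, 1, 1, 1, 1, 1),
  B203.1 = c(NA, 1, 1, 2, 2, 1)
)

recoded <- toy %>%
  # 2 and 3 become 1; 1 and 4 become 0 (numeric replacements keep the columns numeric)
  mutate(across(c(E301, E302),
                ~ dplyr::recode(.x, `2` = 1, `3` = 1, `1` = 0, `4` = 0))) %>%
  # 2 becomes 0; other values, including NA, are left unchanged
  mutate(across(starts_with("B2"),
                ~ dplyr::recode(.x, `2` = 0)))

print(recoded)
```

Missing values need no special argument here: rows that are NA in the input stay NA in the output.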

Related

Multistep alluvial diagram in R

I have a data frame in R with six categories (Pearson1, Spearman1, Kendall1, Pearson2, Spearman2, and Kendall2) and six variables (X1 to X6). Each category holds the ranking of the variables from highest to lowest; for example, X1 is ranked least significant (6th place) in every category.
df <- data.frame(Variable = c("X1", "X2", "X3", "X4", "X5", "X6"),
Pearson1 = c(6, 3, 2, 5, 4, 1),
Spearman1 = c(6, 5, 1, 2, 3, 4),
Kendall1 = c(6, 5, 1, 2, 3, 4),
Pearson2 = c(6, 5, 1, 2, 3, 4),
Spearman2 = c(6, 5, 1, 2, 4, 3),
Kendall2 = c(6, 5, 1, 2, 3, 4))
I want to create an alluvial diagram that follows the variables from one step to the next: the first column (step) should show the variables, followed by their rankings across the six steps. My final result looks like this, but I want it in black and white only, with a different texture for each variable if that's possible.
I have tried the following, but it's not working:
df_long <- reshape2::melt(df, id.vars = "Variable")
alluvial(df_long, col = "Variable", freq = "value",
group = "Variable", border = "white",
hide = c("Variable"))
Using the first example from the documentation as a code template, and adding a "freq" column to the sample df, makes this chart. No reshaping required.
df <- data.frame(Variable = c("X1", "X2", "X3", "X4", "X5", "X6"),
Pearson1 = c(6, 3, 2, 5, 4, 1),
Spearman1 = c(6, 5, 1, 2, 3, 4),
Kendall1 = c(6, 5, 1, 2, 3, 4),
Pearson2 = c(6, 5, 1, 2, 3, 4),
Spearman2 = c(6, 5, 1, 2, 4, 3),
Kendall2 = c(6, 5, 1, 2, 3, 4))
df$freq<-1
alluvial(df[1:7], freq=df$freq, cex = 0.7)
Reverse the vertical order of the first column:
alluvial(df[1:7], freq=df$freq,
cex = 0.7,
ordering = list(
order(df$Variable, decreasing=TRUE),
NULL,
NULL,
NULL,
NULL,
NULL,
NULL
)
)

Problem with gtsummary tbl_stack and tbl_svysummary for continuous weight variables

Problem:
I'm trying to tabulate a continuous variable from a weighted survey. I want to show the mean of this continuous variable separately for each level of a categorical variable, c("Pobre Extremo", "Pobre No Extremo"). My desired table looks something like this (numbers are random):
This is a table from a past year; when I run my syntax on a new year's data, I get this problematic table instead:
I checked whether the labelled data was the problem, but it seems to be fine on that side.
Syntax so far for the tables:
data %>%
as_label() %>%
as_survey_design(weight = fac500a) %>%
tbl_strata2(
strata = pobreza,
~ .x %>%
tbl_svysummary(
by = ocupinf,
include = ing_cap,
missing = "no",
statistic = list(all_continuous() ~ "{mean}"),
label = list(ing_cap = .y)
) %>%
modify_header(all_stat_cols() ~ "**{level}**") %>%
add_overall(., col_label = "Total"),
.combine_with = "tbl_stack",
.combine_args = list(group_header = NULL)
) %>%
modify_table_body(
~ .x %>%
mutate(variable = "pobreza", row_type = "level") %>%
tibble::add_row(
row_type = "label",
variable = "pobreza",
label = "pobreza",
.before = 1L
)
) %>%
modify_column_indent(columns = label, rows = row_type == "level") %>%
bold_labels() %>%
modify_footnote(all_stat_cols() ~ "ing_cap: Mean")
This is my data, as output by dput():
structure(list(ing_cap = structure(c(3153.9033203125, 3153.9033203125,
3153.9033203125, 3153.9033203125, 2420.76844618056, 1920.38439941406,
1920.38439941406, 2773.28385416667, 3264.26846590909, 3264.26846590909,
4211.30403645833, 3283.44856770833, 3481.44609375, 3481.44609375,
6730.44587053571, 6730.44587053571, 6730.44587053571, 3571.64485677083,
3571.64485677083, 6990.048828125, 6990.048828125, 6990.048828125,
6374, 6374, 6374, 3532.215625, 5201.203125, 5201.203125, 4516.51395089286,
4516.51395089286, 10841.1783854167, 10841.1783854167, 10841.1783854167,
6081.54609375, 6081.54609375, 6081.54609375, 6081.54609375, 3669.76139322917,
3669.76139322917, 3669.76139322917, 3460.72778320312, 5076.026953125,
2791.3478515625, 5264.654296875, 5264.654296875, 3697.99633789062,
3697.99633789062, 5197.804296875, 4063.18391927083, 3975.13313802083,
3975.13313802083, 6428.6640625, 6428.6640625, 5685.87834821429,
5685.87834821429, 5685.87834821429, 5685.87834821429, 5737.10049715909,
5737.10049715909, 4644, 4644, 4644, 5124.6, 5124.6, 1830.73010253906,
3699.47631835938, 8673.4296875, 8673.4296875, 8673.4296875, 3629.81884765625,
3629.81884765625, 6726.15364583333, 6726.15364583333, 6726.15364583333,
5125.9521484375, 5125.9521484375, 7991.56591796875, 7991.56591796875,
8089.87926136364, 8089.87926136364, 8089.87926136364, 8089.87926136364,
2730.0802734375, 13985.8271484375, 13985.8271484375, 13985.8271484375,
13985.8271484375, 5944.7998046875, 5944.7998046875, 5944.7998046875,
2476.12651909722, 2476.12651909722, 2476.12651909722, 2476.12651909722,
13624, 7012.70654296875, 7012.70654296875, 7012.70654296875,
7012.70654296875, 6648.0015625), label = "Ingreso per cápita en el hogar", class = c("haven_labelled_spss",
"haven_labelled", "vctrs_vctr", "double")), fac500a = structure(c(354.4443359375,
269.111877441406, 467.653961181641, 467.653961181641, 345.380615234375,
1201.30834960938, 1262.73962402344, 1383.26965332031, 1191.63061523438,
935.718688964844, 769.666625976562, 1235.62524414062, 391.513061523438,
341.510711669922, 391.513061523438, 287.484558105469, 287.484558105469,
334.339538574219, 291.639129638672, 262.072875976562, 251.327713012695,
251.327713012695, 347.6591796875, 229.84504699707, 255.283050537109,
221.039138793945, 258.418426513672, 329.677368164062, 217.956893920898,
242.079177856445, 420.243377685547, 536.125610351562, 467.653961181641,
258.418426513672, 247.823104858398, 242.079177856445, 247.823104858398,
536.125610351562, 403.013153076172, 393.672302246094, 334.339538574219,
345.244873046875, 506.461639404297, 376.046264648438, 417.665008544922,
178.606018066406, 218.673873901367, 334.339538574219, 167.823974609375,
267.186492919922, 316.787017822266, 446.065185546875, 294.903411865234,
287.572387695312, 329.677368164062, 247.823104858398, 258.418426513672,
404.189147949219, 404.189147949219, 267.218231201172, 404.189147949219,
352.567840576172, 216.117523193359, 323.650573730469, 287.572387695312,
329.677368164062, 329.677368164062, 287.572387695312, 217.956893920898,
329.677368164062, 287.572387695312, 287.572387695312, 329.677368164062,
242.079177856445, 334.339538574219, 291.639129638672, 262.072875976562,
245.502563476562, 221.039138793945, 291.639129638672, 245.502563476562,
245.502563476562, 334.339538574219, 205.848175048828, 234.055160522461,
228.630340576172, 311.361968994141, 205.848175048828, 234.055160522461,
271.596160888672, 221.039138793945, 245.502563476562, 251.327713012695,
251.327713012695, 258.418426513672, 217.956893920898, 258.418426513672,
242.079177856445, 329.677368164062, 355.819549560547), label = "Factor de Expansión de Empleo/Ingresos proyecciones CPV-2007", format.spss = "F8.2"),
ocupinf = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2,
1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1,
1, 1, 1, 2, 1, 1, 1, 2, 2, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1,
1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1,
1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2), label = "Situación de informalidad (ocup.principal)", format.spss = "F8.2", labels = c(`empleo informal` = 1,
`empleo formal` = 2), class = c("haven_labelled", "vctrs_vctr",
"double")), pobreza = structure(c(1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), label = "pobreza", format.spss = "F8.2", labels = c(`Pobre Extremo` = 1,
`Pobre No Extremo` = 2, `No Pobre` = 3), class = c("haven_labelled",
"vctrs_vctr", "double"))), row.names = c(NA, -100L), class = c("tbl_df",
"tbl", "data.frame"))
Let me know if you spot any insights.
As @DanielD.Sjoberg pointed out in the comments, it was necessary to state explicitly that every variable in the rows is continuous. So add type = everything() ~ "continuous" to the code:
data %>%
as_label() %>%
as_survey_design(weight = fac500a) %>%
tbl_strata2(
strata = pobreza,
~ .x %>%
tbl_svysummary(
by = ocupinf,
include = ing_cap,
missing = "no",
type = everything() ~ "continuous",
statistic = list(all_continuous() ~ "{mean}"),
label = list(ing_cap = .y)
) %>%
modify_header(all_stat_cols() ~ "**{level}**") %>%
add_overall(., col_label = "Total"),
.combine_with = "tbl_stack",
.combine_args = list(group_header = NULL)
) %>%
modify_table_body(
~ .x %>%
mutate(variable = "pobreza", row_type = "level") %>%
tibble::add_row(
row_type = "label",
variable = "pobreza",
label = "pobreza",
.before = 1L
)
) %>%
modify_column_indent(columns = label, rows = row_type == "level") %>%
bold_labels() %>%
modify_footnote(all_stat_cols() ~ "ing_cap: Mean")

"sample sizes in the longitudinal and event processes differ" in JointModel in r

I am trying to perform a joint model analysis with simulated data. I believe I have formatted the data properly, but I receive this error:
"Error in jointModel(lmeFitJ, coxFit, timeVar = "time.point") :
sample sizes in the longitudinal and event processes differ; maybe you forgot the cluster() argument."
I only see this mentioned in the source code for JM and in one brief and unresolved troubleshooting thread. Where have I messed up? Thank you for any help!
Minimal complete example with first 4 participants:
#required packages (library() attaches only one package per call)
library(readxl)
library(nlme)
library(JM)
#long_data
structure(list(particip.id = c(1, 1, 1, 1, 2, 2, 3, 4, 4, 4,
4), time.point = c(1, 2, 3, 4, 1, 2, 1, 1, 2, 3, 4), school4me = c("DPU",
"DPU", "DPU", "DPU", "DPU", "DPU", "DPU", "DPU", "DPU", "DPU",
"DPU"), hours.a = c(3, 3, 2, 3, 0, 0, 6, 10, 13, 16, 15), hours.b = c(4,
6, 0, 0, 0, 1, 3, 7, 15, 9, 10), enrolled = c(1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1), TimeQ = c(4, 4, 4, 4, 2.9369807105977, 2.9369807105977,
1.50240888306871, 4, 4, 4, 4)), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame"))
#short_data
structure(list(particip.id = c(1, 2, 3, 4), time.point = c(3,
2, 3, 4), school4me = c("DPU", "DPU", "DPU", "DPU"), enrolled = c(0,
0, 0, 1), TimeQ = c(2.376576055, 1.152660467, 2.300307851, 4),
actual = c(1, 1, 1, 0)), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))
#Analysis
lmeFitJ <- lme(hours.a ~ time.point + time.point:school4me, data=long_data, random = ~time.point | particip.id)
coxFit <- coxph(Surv(TimeQ, actual) ~ school4me, data = short_data, x = TRUE)
fitJOINT <- jointModel(lmeFitJ, coxFit, timeVar = "time.point")
#analysis produces: "Error in jointModel(lmeFitJ, coxFit, timeVar = "time.point") : sample sizes in
#the longitudinal and event processes differ; maybe you forgot the cluster() argument."
In the source code you can find
if (is.null(survObject$model))
stop("\nplease refit the Cox model including in the ",
"call to coxph() the argument 'model = TRUE'.")
and
nT <- length(unique(idT))
if (LongFormat && is.null(survObject$model$cluster))
stop("\nuse argument 'model = TRUE' and cluster() in coxph().")
Unfortunately, the sample-size check for the longitudinal process fails first, so you never see these messages. What you get instead is:
("sample sizes in the longitudinal and event processes differ; ",
"maybe you forgot the cluster() argument.\n")
Try adding model = TRUE and cluster(particip.id) to your coxFit, i.e.:
coxFit <- coxph(Surv(TimeQ, actual) ~ school4me + cluster(particip.id), data = short_data, x = TRUE, model = TRUE)
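
Before refitting, it may also be worth checking that both data sets describe the same set of participants. The sketch below copies the particip.id values from the example data. It is only a sanity check, under the assumption that the error relates to mismatched subject counts, and does not reproduce what jointModel() does internally.

```r
# particip.id values copied from the example data in the question
long_ids  <- c(1, 1, 1, 1, 2, 2, 3, 4, 4, 4, 4)  # long_data$particip.id
short_ids <- c(1, 2, 3, 4)                        # short_data$particip.id

# Number of distinct subjects on each side
n_long  <- length(unique(long_ids))
n_short <- length(unique(short_ids))
same_n  <- n_long == n_short

# Subjects present in one data set but not the other
missing_in_short <- setdiff(long_ids, short_ids)
missing_in_long  <- setdiff(short_ids, long_ids)

print(c(n_long = n_long, n_short = n_short))
```

If the counts match and both setdiff() results are empty, as they are for these four participants, the data sets agree on who is in the study, and the model call is the more likely place to look.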

How to plot graphs through two loops

Though this problem has been 'solved' many times, it turns out there's always another problem.
Without the print function it runs with no errors, but with it I get the following:
Error in .subset2(x, i) : recursive indexing failed at level 2
Which I take to mean it doesn't like graphs being created in two layers of iteration? Changing the method to 'qplot(whatever:whatever)' gives exactly the same problem.
The code is meant to print a graph for every pairing of the variables I'm looking at. There are too many for them to fit in a single picture, as with the pairs function, and I need to be able to see the actual variable names on the axes.
load("Transport_Survey.RData")
variables <- select(Transport, "InfOfReceievingWeather", "InfOfReceievingTraffic", "InfOfSeeingTraffic", "InfWeather.Ice", "InfWeather.Rain", "InfWeather.Wind", "InfWeather.Storm", "InfWeather.Snow", "InfWeather.Cold", "InfWeather.Warm", "InfWeather.DarkMorn", "InfWeather.DarkEve", "HomeParking", "WorkParking", "Disability", "Age", "CommuteFlexibility", "Gender", "PassionReduceCongest")
varnames <- list("InfOfReceivingWeather", "InfOfReceivingTraffic", "InfOfSeeingTraffic", "InfWeather.Ice", "InfWeather.Rain", "InfWeather.Wind", "InfWeather.Storm", "InfWeather.Snow", "InfWeather.Cold", "InfWeather.Warm", "InfWeather.DarkMorn", "InfWeather.DarkEve", "HomeParking", "WorkParking", "Disability", "Age", "CommuteFlexibility", "Gender", "PassionReduceCongest")
counterx = 1
countery = 1
for (a in variables) {
for (b in variables) {
print(ggplot(variables, mapping=aes(x=variables[[a]], y=variables[[b]],
xlab=varnames[counterx], ylab=varnames[countery]))+
geom_point())
countery = countery+1
counterx = counterx+1
}
}
#variables2 <- select(Transport, one_of(InfOfReceivingWeather, InfOfReceivingTraffic, InfOfSeeingTraffic, InfWeather.Ice, InfWeather.Rain, InfWeather.Wind, InfWeather.Storm, InfWeather.Snow, InfWeather.Cold, InfWeather.Warm, InfWeather.DarkMorn, InfWeather.DarkEve, HomeParking, WorkParking, Disability, Age, CommuteFlexibility, Gender, PassionReduceCongest))
Here is a mini-data frame for reference, sampled from the columns I'm using:
structure(list(InfOfReceievingWeather = c(1, 1, 1, 1, 4), InfOfReceievingTraffic = c(1,
1, 1, 1, 4), InfOfSeeingTraffic = c(1, 1, 1, 1, 4), InfWeather.Ice = c(3,
1, 3, 5, 5), InfWeather.Rain = c(1, 1, 2, 2, 4), InfWeather.Wind = c(1,
1, 2, 2, 4), InfWeather.Storm = c(1, 1, 1, 2, 5), InfWeather.Snow = c(1,
1, 2, 5, 5), InfWeather.Cold = c(1, 1, 1, 2, 5), InfWeather.Warm = c(1,
1, 1, 1, 3), InfWeather.DarkMorn = c(1, 1, 1, 1, 1), InfWeather.DarkEve = c(1,
1, 1, 1, 1), HomeParking = c(1, 1, 3, 1, 1), WorkParking = c(1,
4, 4, 5, 4), Disability = c(1, 1, 1, 1, 1), Age = c(19, 45, 35,
40, 58), CommuteFlexibility = c(2, 1, 5, 1, 2), Gender = c(2,
2, 2, 2, 1), PassionReduceCongest = c(0, 0, 2, 0, 2)), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
The error comes from how a and b are assigned. When you write for (a in variables), a takes the values stored in each column of variables, not the column name or index. So in your aes mapping, variables[[a]] on the first iteration actually means:
variables[[c(1, 1, 1, 1, 4)]] instead of variables[["InfOfReceievingWeather"]], which cannot work.
To get around this, choose one of the following:
for (a in variables) {
for (b in variables) {
print(ggplot(variables, mapping=aes(x=a, y=b)) ...
or
for (a in 1:ncol(variables)) {
for (b in 1:ncol(variables)) {
print(ggplot(variables, mapping=aes(x=variables[[a]], y=variables[[b]])) ...
Although the first option seems simpler, I prefer the second, because a and b can then be reused as column indices to extract the column names of variables for xlab and ylab.
In the end, something like this should work:
for (a in 1:ncol(variables)) {
  for (b in 1:ncol(variables)) {
    print(ggplot(variables, mapping=aes(x=variables[[a]], y=variables[[b]])) +
            xlab(colnames(variables)[a])+
            ylab(colnames(variables)[b])+
            geom_point())
  }
}
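
A third variant, sketched here under the assumption that ggplot2 3.0.0 or later is installed, loops over column names and uses the .data pronoun inside aes(). The toy data frame and the plots list are hypothetical stand-ins, not the Transport survey.

```r
library(ggplot2)

# Hypothetical stand-in data with two of the column names from the question
toy <- data.frame(Age = c(19, 45, 35), Gender = c(2, 2, 1))

# Looping over a data frame yields column *values*, which is what broke the original code
first_value <- NULL
for (a in toy) {
  first_value <- a
  break
}

# Looping over names(toy) yields column *names*, usable for both data and axis labels
plots <- list()
for (a in names(toy)) {
  for (b in names(toy)) {
    p <- ggplot(toy, aes(x = .data[[a]], y = .data[[b]])) +
      geom_point() +
      labs(x = a, y = b)
    plots[[paste(a, b, sep = "_vs_")]] <- p
    print(p)
  }
}
```

Keeping the plots in a named list also makes it possible to save or inspect a single pairing later without rerunning both loops.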
Does this answer your question?

Make boxplots of columns in R

I am a beginner in R and have a question about making boxplots of columns. I just made a data frame:
SUS <- data.frame(RD = c(4, 3, 4, 1, 2, 2, 4, 2, 4, 1), TK = c(4, 2, 4, 2, 2, 2, 4, 4, 3, 1),
WK = c(3, 2, 4, 1, 3, 3, 4, 2, 4, 2), NW = c(2, 2, 4, 2, NA, NA, 5, 1, 4, 2),
BW = c(3, 2, 4, 1, 4, 1, 4, 1, 5, 1), EK = c(2, 4, 3, 1, 2, 4, 2, 2, 4, 2),
AN = c(3, 2, 4, 2, 3, 3, 3, 2, 4, 2))
rownames(SUS) <- c('Pleasant to use', 'Unnecessary complex', 'Easy to use',
'Need help of a technical person', 'Different functions well integrated','Various function incohorent', 'Imagine that it is easy to learn',
'Difficult to use', 'Confident during use', 'Long duration untill I could work with it')
I tried a number of times, but I did not succeed in making boxplots for all rows. Can someone help me out here?
You can also do it using the tidyverse:
library(tidyverse)
SUS %>%
#create new column and save the row.names in it
mutate(variable = row.names(.)) %>%
#convert your data from wide to long
tidyr::gather("var", "value", 1:7) %>%
#plot it using ggplot2
ggplot(., aes(x = variable, y = value)) +
geom_boxplot()+
theme(axis.text.x = element_text(angle=35,hjust=1))
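
gather() still works but has been superseded by pivot_longer() in tidyr 1.0.0 and later. Here is a minimal sketch of the same reshaping step, using a hypothetical two-row subset of SUS. The column names statement, source_col and score are made up for illustration.

```r
library(tidyr)
library(ggplot2)

# Hypothetical two-row, two-column subset of SUS
SUS_small <- data.frame(
  RD = c(4, 3),
  TK = c(4, 2),
  row.names = c("Pleasant to use", "Unnecessary complex")
)

# Keep the row names as an ordinary column before reshaping
SUS_small$statement <- rownames(SUS_small)

# Wide to long: one row per (statement, original column) pair
long <- pivot_longer(SUS_small,
                     cols = -statement,
                     names_to = "source_col",
                     values_to = "score")

# One box per statement, i.e. per original row
p <- ggplot(long, aes(x = statement, y = score)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 35, hjust = 1))
print(p)
```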
As @blondeclover says in the comments, boxplot() should work fine for a boxplot of each column.
If what you want is a boxplot for each row, then actually your current rows need to be your columns. If you need to do this, you can transpose the data frame before plotting:
SUS.new <- as.data.frame(t(SUS))
boxplot(SUS.new)

Resources