I want to recode several variables together. All of these variables will undergo the same recoding.
To do this, I followed the thread below, which describes two ways of doing it:
1) Using column numbers
2) Using variable names
I tried both, but I get an error message.
Error message for both 1) and 2):
Error in (function (var, recodes, as.factor, as.numeric = TRUE, levels) :
unused arguments (2 = "1", 3 = "1", 1 = "0", 4 = "0", na.rm = TRUE)
recode variable in loop R
#Uploading libraries
library(dplyr)
library(magrittr)
library(plyr)
library(readxl)
library(tidyverse)
#Importing file
mydata <- read_excel("CCorr_Data.xlsx")
df <- data.frame(mydata)
attach(df)
#replacing codes for variables
df %>%
mutate_at(c(1:7), recode, '2'='1', '3'='1', '1'='0', '4'='0', na.rm = TRUE) %>%
mutate_at(c(15:24), recode, '2'='0', na.rm = TRUE)
df %>%
mutate_at(vars(E301, E302, E303), recode,'2'='1', '3'='1', '1'='0', '4'='0', na.rm = TRUE) %>%
mutate_at(vars(B201, B202, B203), recode, '2'='0', na.rm = TRUE)
Can someone tell me where I am going wrong?
My dataset has missing values, which is why I included na.rm = T. I also tried without na.rm, and the error message was the same.
Please see below for sample data.
structure(list(Country = c(1, 1, 1, 1, 1, 1), HHID = c("12ae5148e245079f-122042",
"12ae5148e245079f-123032", "12ae5148e245079f-123027", "12ae5148e245079f-123028",
"12ae5148e245079f-N123001", "12ae5148e245079f-123041"), HHCode = c("122042",
"123032", "123027", "123028", "N123001", "123041"), A103 = c(2,
2, 2, 2, 2, 2), A104 = c("22", "23", "23", "23", "23", "23"),
Community = c("Mehmada", "Dhobgama", "Dhobgama", "Dhobgama",
"Dhobgama", "Dhobgama"), E301 = c(3, 3, 3, 3, 3, 3), E302 = c(3,
2, 4, 4, 3, 3), E303 = c(3, 2, 3, 3, 3, 3), E304 = c(3, 4,
4, 4, 3, 3), E305 = c(3, 2, 3, 3, 3, 3), E306 = c(3, 3, 3,
3, 3, 3), E307 = c(3, 3, 3, 3, 3, 3), E308 = c(3, 1, 3, 3,
3, 3), B201.1 = c(NA, 1, 1, 1, 1, 1), B202.1 = c(NA, 1, 1,
1, 1, 1), B203.1 = c(NA, 1, 1, 2, 2, 1), B204.1 = c(NA, 2,
1, 2, 1, 1), B205.1 = c(NA, 2, 1, 2, 2, 2), B206.1 = c(NA,
1, 1, 1, 2, 1), B207.1 = c(NA, 2, 1, 2, 2, 1), B208.1 = c(NA,
2, 2, 2, 2, 2), B209.1 = c(NA, 2, 1, 1, 1, 1), B210.1 = c(NA,
1, 1, 1, 1, 1)), row.names = c(NA, 6L), class = "data.frame")
The issue is the na.rm = TRUE argument; recode doesn't have that argument. Remove it:
library(dplyr)
df %>%
mutate_at(vars(E301, E302, E303), recode,'2'='1', '3'='1', '1'='0', '4'='0') %>%
mutate_at(vars(B201, B202, B203), recode, '2'='0')
Or, using column numbers:
library(dplyr)
df %>%
mutate_at(1:7, recode, '2'='1', '3'='1', '1'='0', '4'='0') %>%
mutate_at(15:24, recode, '2'='0')
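If the error persists even without na.rm, note that the signature printed in the error, (var, recodes, as.factor, as.numeric = TRUE, levels), is not the signature of dplyr's recode, which is recode(.x, ..., .default, .missing). So a recode from another attached package may be the one being called; that is only an assumption based on the error text. Writing dplyr::recode explicitly is one way to rule it out. Another is a package-free lookup vector, sketched here on made-up values (the toy data frame and the names lookup and cols are hypothetical, not from the question):

```r
# Toy data standing in for the real columns (hypothetical values)
df <- data.frame(E301 = c(3, 2, 4, 1, NA), E302 = c(1, 2, 3, 4, 2))

# Lookup vector: names are the old codes, values are the new codes
lookup <- c("1" = 0, "2" = 1, "3" = 1, "4" = 0)

# Apply the same recoding to every column listed in cols
cols <- c("E301", "E302")
df[cols] <- lapply(df[cols], function(x) unname(lookup[as.character(x)]))

df
```

Missing values stay NA because indexing a named vector with NA returns NA, so no na.rm-style argument is needed.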
I have a data frame in R with 6 categories (Pearson1, Spearman1, Kendall1, Pearson2, Spearman2, and Kendall2) and 6 variables (X1, X2, X3, X4, X5, and X6). Each category holds the ranking of the variables from highest to lowest; for example, X1 appears as the least significant in all categories (6th place).
df <- data.frame(Variable = c("X1", "X2", "X3", "X4", "X5", "X6"),
Pearson1 = c(6, 3, 2, 5, 4, 1),
Spearman1 = c(6, 5, 1, 2, 3, 4),
Kendall1 = c(6, 5, 1, 2, 3, 4),
Pearson2 = c(6, 5, 1, 2, 3, 4),
Spearman2 = c(6, 5, 1, 2, 4, 3),
Kendall2 = c(6, 5, 1, 2, 3, 4))
I want to create an alluvial diagram in which the variables flow from one step to the next. The first column (step) should show the variables, followed by their rankings in the 6 steps. My desired result looks like this, but only in black and white, with a different texture for each variable if that's possible.
I have tried the following, but it's not working:
df_long <- reshape2::melt(df, id.vars = "Variable")
alluvial(df_long, col = "Variable", freq = "value",
group = "Variable", border = "white",
hide = c("Variable"))
Using the first example from the documentation as a code template, and adding a "freq" column to the sample df, makes this chart. No reshaping required.
library(alluvial)
df <- data.frame(Variable = c("X1", "X2", "X3", "X4", "X5", "X6"),
Pearson1 = c(6, 3, 2, 5, 4, 1),
Spearman1 = c(6, 5, 1, 2, 3, 4),
Kendall1 = c(6, 5, 1, 2, 3, 4),
Pearson2 = c(6, 5, 1, 2, 3, 4),
Spearman2 = c(6, 5, 1, 2, 4, 3),
Kendall2 = c(6, 5, 1, 2, 3, 4))
df$freq<-1
alluvial(df[1:7], freq=df$freq, cex = 0.7)
To reverse the vertical order of the first column:
alluvial(df[1:7], freq=df$freq,
cex = 0.7,
ordering = list(
order(df$Variable, decreasing=TRUE),
NULL,
NULL,
NULL,
NULL,
NULL,
NULL
)
)
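For reference, the first element of the ordering list is just an index vector produced by order(); as I understand it, the NULL entries leave the remaining axes in their default order. A quick base R check of what that vector contains, using the variable names from the question (vars and ord are names made up for this sketch):

```r
# The variable names from the question
vars <- c("X1", "X2", "X3", "X4", "X5", "X6")

# Positions of vars when sorted in decreasing order
ord <- order(vars, decreasing = TRUE)

ord        # index vector that would be passed for the first axis
vars[ord]  # the names in the reversed order
```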
Problem:
I'm trying to tabulate a continuous variable from a weighted census. I want to break down the mean of this continuous variable by a categorical variable with levels c("Pobre Extremo", "Pobre No Extremo"). My desired table looks something like this (numbers are random):
This is a table from a past year. When I run my syntax for a new year, I get this problematic table instead:
I checked whether the labelled data was the problem, but it seems fine on that side.
Syntax so far for the tables:
data %>%
as_label() %>%
as_survey_design(weight = fac500a) %>%
tbl_strata2(
strata = pobreza,
~ .x %>%
tbl_svysummary(
by = ocupinf,
include = ing_cap,
missing = "no",
statistic = list(all_continuous() ~ "{mean}"),
label = list(ing_cap = .y)
) %>%
modify_header(all_stat_cols() ~ "**{level}**") %>%
add_overall(., col_label = "Total"),
.combine_with = "tbl_stack",
.combine_args = list(group_header = NULL)
) %>%
modify_table_body(
~ .x %>%
mutate(variable = "pobreza", row_type = "level") %>%
tibble::add_row(
row_type = "label",
variable = "pobreza",
label = "pobreza",
.before = 1L
)
) %>%
modify_column_indent(columns = label, rows = row_type == "level") %>%
bold_labels() %>%
modify_footnote(all_stat_cols() ~ "ing_cap: Mean")
This is my data:
I run function dput():
structure(list(ing_cap = structure(c(3153.9033203125, 3153.9033203125,
3153.9033203125, 3153.9033203125, 2420.76844618056, 1920.38439941406,
1920.38439941406, 2773.28385416667, 3264.26846590909, 3264.26846590909,
4211.30403645833, 3283.44856770833, 3481.44609375, 3481.44609375,
6730.44587053571, 6730.44587053571, 6730.44587053571, 3571.64485677083,
3571.64485677083, 6990.048828125, 6990.048828125, 6990.048828125,
6374, 6374, 6374, 3532.215625, 5201.203125, 5201.203125, 4516.51395089286,
4516.51395089286, 10841.1783854167, 10841.1783854167, 10841.1783854167,
6081.54609375, 6081.54609375, 6081.54609375, 6081.54609375, 3669.76139322917,
3669.76139322917, 3669.76139322917, 3460.72778320312, 5076.026953125,
2791.3478515625, 5264.654296875, 5264.654296875, 3697.99633789062,
3697.99633789062, 5197.804296875, 4063.18391927083, 3975.13313802083,
3975.13313802083, 6428.6640625, 6428.6640625, 5685.87834821429,
5685.87834821429, 5685.87834821429, 5685.87834821429, 5737.10049715909,
5737.10049715909, 4644, 4644, 4644, 5124.6, 5124.6, 1830.73010253906,
3699.47631835938, 8673.4296875, 8673.4296875, 8673.4296875, 3629.81884765625,
3629.81884765625, 6726.15364583333, 6726.15364583333, 6726.15364583333,
5125.9521484375, 5125.9521484375, 7991.56591796875, 7991.56591796875,
8089.87926136364, 8089.87926136364, 8089.87926136364, 8089.87926136364,
2730.0802734375, 13985.8271484375, 13985.8271484375, 13985.8271484375,
13985.8271484375, 5944.7998046875, 5944.7998046875, 5944.7998046875,
2476.12651909722, 2476.12651909722, 2476.12651909722, 2476.12651909722,
13624, 7012.70654296875, 7012.70654296875, 7012.70654296875,
7012.70654296875, 6648.0015625), label = "Ingreso per cápita en el hogar", class = c("haven_labelled_spss",
"haven_labelled", "vctrs_vctr", "double")), fac500a = structure(c(354.4443359375,
269.111877441406, 467.653961181641, 467.653961181641, 345.380615234375,
1201.30834960938, 1262.73962402344, 1383.26965332031, 1191.63061523438,
935.718688964844, 769.666625976562, 1235.62524414062, 391.513061523438,
341.510711669922, 391.513061523438, 287.484558105469, 287.484558105469,
334.339538574219, 291.639129638672, 262.072875976562, 251.327713012695,
251.327713012695, 347.6591796875, 229.84504699707, 255.283050537109,
221.039138793945, 258.418426513672, 329.677368164062, 217.956893920898,
242.079177856445, 420.243377685547, 536.125610351562, 467.653961181641,
258.418426513672, 247.823104858398, 242.079177856445, 247.823104858398,
536.125610351562, 403.013153076172, 393.672302246094, 334.339538574219,
345.244873046875, 506.461639404297, 376.046264648438, 417.665008544922,
178.606018066406, 218.673873901367, 334.339538574219, 167.823974609375,
267.186492919922, 316.787017822266, 446.065185546875, 294.903411865234,
287.572387695312, 329.677368164062, 247.823104858398, 258.418426513672,
404.189147949219, 404.189147949219, 267.218231201172, 404.189147949219,
352.567840576172, 216.117523193359, 323.650573730469, 287.572387695312,
329.677368164062, 329.677368164062, 287.572387695312, 217.956893920898,
329.677368164062, 287.572387695312, 287.572387695312, 329.677368164062,
242.079177856445, 334.339538574219, 291.639129638672, 262.072875976562,
245.502563476562, 221.039138793945, 291.639129638672, 245.502563476562,
245.502563476562, 334.339538574219, 205.848175048828, 234.055160522461,
228.630340576172, 311.361968994141, 205.848175048828, 234.055160522461,
271.596160888672, 221.039138793945, 245.502563476562, 251.327713012695,
251.327713012695, 258.418426513672, 217.956893920898, 258.418426513672,
242.079177856445, 329.677368164062, 355.819549560547), label = "Factor de Expansión de Empleo/Ingresos proyecciones CPV-2007", format.spss = "F8.2"),
ocupinf = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2,
1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1,
1, 1, 1, 2, 1, 1, 1, 2, 2, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1,
1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1,
1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2), label = "Situación de informalidad (ocup.principal)", format.spss = "F8.2", labels = c(`empleo informal` = 1,
`empleo formal` = 2), class = c("haven_labelled", "vctrs_vctr",
"double")), pobreza = structure(c(1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), label = "pobreza", format.spss = "F8.2", labels = c(`Pobre Extremo` = 1,
`Pobre No Extremo` = 2, `No Pobre` = 3), class = c("haven_labelled",
"vctrs_vctr", "double"))), row.names = c(NA, -100L), class = c("tbl_df",
"tbl", "data.frame"))
Let me know if you spot any insights.
As @DanielD.Sjoberg pointed out in the comments, it was necessary to state explicitly that every variable in the rows is continuous. So add type = everything() ~ "continuous" to the code:
data %>%
as_label() %>%
as_survey_design(weight = fac500a) %>%
tbl_strata2(
strata = pobreza,
~ .x %>%
tbl_svysummary(
by = ocupinf,
include = ing_cap,
missing = "no",
type = everything() ~ "continuous",
statistic = list(all_continuous() ~ "{mean}"),
label = list(ing_cap = .y)
) %>%
modify_header(all_stat_cols() ~ "**{level}**") %>%
add_overall(., col_label = "Total"),
.combine_with = "tbl_stack",
.combine_args = list(group_header = NULL)
) %>%
modify_table_body(
~ .x %>%
mutate(variable = "pobreza", row_type = "level") %>%
tibble::add_row(
row_type = "label",
variable = "pobreza",
label = "pobreza",
.before = 1L
)
) %>%
modify_column_indent(columns = label, rows = row_type == "level") %>%
bold_labels() %>%
modify_footnote(all_stat_cols() ~ "ing_cap: Mean")
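One way to sanity-check the means shown in the table is to compute the weighted means directly in base R. This is only a sketch on invented numbers: the toy data frame reuses the column names ing_cap, fac500a and pobreza from the question, but the values are hypothetical, and it assumes the table's "{mean}" is the weight-based mean of ing_cap within each group.

```r
# Hypothetical values; only the column names come from the question
toy <- data.frame(
  ing_cap = c(100, 200, 300, 400),
  fac500a = c(1, 3, 1, 1),
  pobreza = c("Pobre Extremo", "Pobre Extremo",
              "Pobre No Extremo", "Pobre No Extremo")
)

# Weighted mean of ing_cap within each level of pobreza
means <- sapply(
  split(toy, toy$pobreza),
  function(g) weighted.mean(g$ing_cap, g$fac500a)
)

means
```

The same split() call can take a list of two grouping columns (for example pobreza and ocupinf) if a cross-tabulated check is needed.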
I am trying to perform a joint model analysis with simulated data. I believe I have formatted the data properly, but I receive this error:
"Error in jointModel(lmeFitJ, coxFit, timeVar = "time.point") :
sample sizes in the longitudinal and event processes differ; maybe you forgot the cluster() argument."
I only see this mentioned in the source code for JM and in one brief and unresolved troubleshooting thread. Where have I messed up? Thank you for any help!
Minimal complete example with first 4 participants:
#required packages
library(readxl)
library(nlme)
library(JM)
#long_data
long_data <- structure(list(particip.id = c(1, 1, 1, 1, 2, 2, 3, 4, 4, 4,
4), time.point = c(1, 2, 3, 4, 1, 2, 1, 1, 2, 3, 4), school4me = c("DPU",
"DPU", "DPU", "DPU", "DPU", "DPU", "DPU", "DPU", "DPU", "DPU",
"DPU"), hours.a = c(3, 3, 2, 3, 0, 0, 6, 10, 13, 16, 15), hours.b = c(4,
6, 0, 0, 0, 1, 3, 7, 15, 9, 10), enrolled = c(1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1), TimeQ = c(4, 4, 4, 4, 2.9369807105977, 2.9369807105977,
1.50240888306871, 4, 4, 4, 4)), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame"))
#short_data
short_data <- structure(list(particip.id = c(1, 2, 3, 4), time.point = c(3,
2, 3, 4), school4me = c("DPU", "DPU", "DPU", "DPU"), enrolled = c(0,
0, 0, 1), TimeQ = c(2.376576055, 1.152660467, 2.300307851, 4),
actual = c(1, 1, 1, 0)), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))
#Analysis
lmeFitJ <- lme(hours.a ~ time.point + time.point:school4me, data=long_data, random = ~time.point | particip.id)
coxFit <- coxph(Surv(TimeQ, actual) ~ school4me, data = short_data, x = TRUE)
fitJOINT <- jointModel(lmeFitJ, coxFit, timeVar = "time.point")
#analysis produces: "Error in jointModel(lmeFitJ, coxFit, timeVar = "time.point") : sample sizes in
#the longitudinal and event processes differ; maybe you forgot the cluster() argument."
In the source code you can find
if (is.null(survObject$model))
stop("\nplease refit the Cox model including in the ",
"call to coxph() the argument 'model = TRUE'.")
and
nT <- length(unique(idT))
if (LongFormat && is.null(survObject$model$cluster))
stop("\nuse argument 'model = TRUE' and cluster() in coxph().")
Unfortunately, the following error about the longitudinal process is raised first, so you never see those two messages:
("sample sizes in the longitudinal and event processes differ; ",
"maybe you forgot the cluster() argument.\n")
Try adding model = TRUE and cluster(particip.id) to your coxFit call, i.e.:
coxFit <- coxph(Surv(TimeQ, actual) ~ school4me + cluster(particip.id), data = short_data, x = TRUE, model = TRUE)
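Before refitting, it may also help to confirm that the two data sets describe the same set of subjects, since the error is about sample sizes differing. A rough base R check, assuming the comparison is between the number of distinct subjects in the long data and the number of rows in the survival data (the data frames below keep only the id column from the question, and n_long and n_short are names made up here):

```r
# Only the id column from the question's long and short data
long_ids  <- data.frame(particip.id = c(1, 1, 1, 1, 2, 2, 3, 4, 4, 4, 4))
short_ids <- data.frame(particip.id = c(1, 2, 3, 4))

# Number of distinct subjects in the longitudinal data
n_long <- length(unique(long_ids$particip.id))

# Number of rows (one per subject) in the survival data
n_short <- nrow(short_ids)

# Subjects present in one data set but not the other
only_long  <- setdiff(long_ids$particip.id, short_ids$particip.id)
only_short <- setdiff(short_ids$particip.id, long_ids$particip.id)

c(n_long = n_long, n_short = n_short)
```

If the two counts differ on the full data, the setdiff() results show which ids are unmatched.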
Though this problem has been 'solved' many times, it turns out there's always another problem.
Without the print function it runs with no errors, but with it I get the following:
Error in .subset2(x, i) : recursive indexing failed at level 2
I'm taking this to mean it doesn't like graphs being created in two layers of iteration? Changing the method to 'qplot(whatever:whatever)' gives exactly the same problem.
The code is meant to print a graph for every pairing of the variables I'm looking at. There are too many to fit in a single picture, as the pairs function would produce, and I need to be able to see the actual variable names on the axes.
load("Transport_Survey.RData")
variables <- select(Transport, "InfOfReceievingWeather", "InfOfReceievingTraffic", "InfOfSeeingTraffic", "InfWeather.Ice", "InfWeather.Rain", "InfWeather.Wind", "InfWeather.Storm", "InfWeather.Snow", "InfWeather.Cold", "InfWeather.Warm", "InfWeather.DarkMorn", "InfWeather.DarkEve", "HomeParking", "WorkParking", "Disability", "Age", "CommuteFlexibility", "Gender", "PassionReduceCongest")
varnames <- list("InfOfReceivingWeather", "InfOfReceivingTraffic", "InfOfSeeingTraffic", "InfWeather.Ice", "InfWeather.Rain", "InfWeather.Wind", "InfWeather.Storm", "InfWeather.Snow", "InfWeather.Cold", "InfWeather.Warm", "InfWeather.DarkMorn", "InfWeather.DarkEve", "HomeParking", "WorkParking", "Disability", "Age", "CommuteFlexibility", "Gender", "PassionReduceCongest")
counterx = 1
countery = 1
for (a in variables) {
for (b in variables) {
print(ggplot(variables, mapping=aes(x=variables[[a]], y=variables[[b]],
xlab=varnames[counterx], ylab=varnames[countery]))+
geom_point())
countery = countery+1
counterx = counterx+1
}
}
#variables2 <- select(Transport, one_of(InfOfReceivingWeather, InfOfReceivingTraffic, InfOfSeeingTraffic, InfWeather.Ice, InfWeather.Rain, InfWeather.Wind, InfWeather.Storm, InfWeather.Snow, InfWeather.Cold, InfWeather.Warm, InfWeather.DarkMorn, InfWeather.DarkEve, HomeParking, WorkParking, Disability, Age, CommuteFlexibility, Gender, PassionReduceCongest))
Here is a mini-data frame for reference, sampled from the columns I'm using:
structure(list(InfOfReceievingWeather = c(1, 1, 1, 1, 4), InfOfReceievingTraffic = c(1,
1, 1, 1, 4), InfOfSeeingTraffic = c(1, 1, 1, 1, 4), InfWeather.Ice = c(3,
1, 3, 5, 5), InfWeather.Rain = c(1, 1, 2, 2, 4), InfWeather.Wind = c(1,
1, 2, 2, 4), InfWeather.Storm = c(1, 1, 1, 2, 5), InfWeather.Snow = c(1,
1, 2, 5, 5), InfWeather.Cold = c(1, 1, 1, 2, 5), InfWeather.Warm = c(1,
1, 1, 1, 3), InfWeather.DarkMorn = c(1, 1, 1, 1, 1), InfWeather.DarkEve = c(1,
1, 1, 1, 1), HomeParking = c(1, 1, 3, 1, 1), WorkParking = c(1,
4, 4, 5, 4), Disability = c(1, 1, 1, 1, 1), Age = c(19, 45, 35,
40, 58), CommuteFlexibility = c(2, 1, 5, 1, 2), Gender = c(2,
2, 2, 2, 1), PassionReduceCongest = c(0, 0, 2, 0, 2)), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
The error comes from how a and b are assigned. When you write for (a in variables), a and b become the vectors of values contained in the columns of variables, not column names or indices. So, in your aes mapping, variables[[a]] on the first iteration is really
variables[[c(1, 1, 1, 1, 4)]] instead of variables[["InfOfReceievingWeather"]], and that can't work.
To get around this, choose one of the following:
for (a in variables) {
for (b in variables) {
print(ggplot(variables, mapping=aes(x=a, y=b)) ...
or
for (a in 1:ncol(variables)) {
for (b in 1:ncol(variables)) {
print(ggplot(variables, mapping=aes(x=variables[[a]], y=variables[[b]])) ...
Although the first one seems simpler, I prefer the second option because it lets you reuse a and b as column indices to extract the column names of variables for xlab and ylab.
In the end, something like this should work:
for (a in 1:ncol(variables)) {
for (b in 1:ncol(variables)) {
print(ggplot(variables, mapping=aes(x=variables[[a]], y=variables[[b]])) +
xlab(colnames(variables)[a])+
ylab(colnames(variables)[b])+
geom_point())
}
}
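To see the difference between the two loop styles without plotting anything, here is a small base R sketch on two made-up columns (the toy data frame and the names seen_values and seen_names are hypothetical):

```r
# Two made-up columns standing in for the survey data
variables <- data.frame(Age = c(19, 45), Gender = c(2, 2))

# Looping over the data frame itself yields each column's values
seen_values <- list()
for (a in variables) {
  seen_values[[length(seen_values) + 1]] <- a
}

# Looping over names() yields the column names, usable in variables[[nm]]
seen_names <- character(0)
for (nm in names(variables)) {
  seen_names <- c(seen_names, nm)
}

seen_values
seen_names
```

Looping over names(variables) or over seq_len(ncol(variables)) both give something that [[ ]] can index with.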
Does this answer your question?
I am a beginner in R, and have a question about making boxplots of columns in R. I just made a dataframe:
SUS <- data.frame(RD = c(4, 3, 4, 1, 2, 2, 4, 2, 4, 1), TK = c(4, 2, 4, 2, 2, 2, 4, 4, 3, 1),
WK = c(3, 2, 4, 1, 3, 3, 4, 2, 4, 2), NW = c(2, 2, 4, 2, NA, NA, 5, 1, 4, 2),
BW = c(3, 2, 4, 1, 4, 1, 4, 1, 5, 1), EK = c(2, 4, 3, 1, 2, 4, 2, 2, 4, 2),
AN = c(3, 2, 4, 2, 3, 3, 3, 2, 4, 2))
rownames(SUS) <- c('Pleasant to use', 'Unnecessary complex', 'Easy to use',
'Need help of a technical person', 'Different functions well integrated','Various function incohorent', 'Imagine that it is easy to learn',
'Difficult to use', 'Confident during use', 'Long duration untill I could work with it')
I tried a number of times, but I did not succeed in making boxplots for all rows. Can someone help me out here?
You can also do it using the tidyverse:
library(tidyverse)
SUS %>%
#create new column and save the row.names in it
mutate(variable = row.names(.)) %>%
#convert your data from wide to long
tidyr::gather("var", "value", 1:7) %>%
#plot it using ggplot2
ggplot(., aes(x = variable, y = value)) +
geom_boxplot()+
theme(axis.text.x = element_text(angle=35,hjust=1))
As @blondeclover says in the comments, boxplot() should work fine for drawing a boxplot of each column.
If what you want is a boxplot for each row, then actually your current rows need to be your columns. If you need to do this, you can transpose the data frame before plotting:
SUS.new <- as.data.frame(t(SUS))
boxplot(SUS.new)
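To check that the transpose really turns the rows into the groups being plotted, boxplot() can be called with plot = FALSE, which returns the summary statistics instead of drawing. A small sketch on a made-up two-row data frame (toy, toy_t and stats are hypothetical names, and the values are invented):

```r
# Made-up data: two statements (rows) rated by three respondents (columns)
toy <- data.frame(RD = c(4, 3), TK = c(4, 2), WK = c(3, 2))
rownames(toy) <- c("Pleasant", "Complex")

# Transpose so that each former row becomes a column
toy_t <- as.data.frame(t(toy))

# Compute the boxplot statistics without drawing anything
stats <- boxplot(toy_t, plot = FALSE)

stats$names  # one box per former row
stats$n      # number of observations in each box
```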