Multistep alluvial diagram in R - r

I have a data frame in R with 6 categories (Pearson1, Spearman1, Kendall1, Pearson2, Spearman2, and Kendall2) and 6 variables (X1, X2, X3, X4, X5, and X6). Each category holds the ranking of the variables from highest to lowest; for example, X1 appears as the least significant in all categories (6th place).
df <- data.frame(Variable = c("X1", "X2", "X3", "X4", "X5", "X6"),
Pearson1 = c(6, 3, 2, 5, 4, 1),
Spearman1 = c(6, 5, 1, 2, 3, 4),
Kendall1 = c(6, 5, 1, 2, 3, 4),
Pearson2 = c(6, 5, 1, 2, 3, 4),
Spearman2 = c(6, 5, 1, 2, 4, 3),
Kendall2 = c(6, 5, 1, 2, 3, 4))
I want to create an alluvial diagram in which each variable flows from one step to the next. The first column (step) should show the variables, followed by their rankings in the 6 steps. The final result should look like this, but in black and white only, with a different texture for each variable if that's possible.
I have tried the following, but it's not working:
df_long <- reshape2::melt(df, id.vars = "Variable")
alluvial(df_long, col = "Variable", freq = "value",
group = "Variable", border = "white",
hide = c("Variable"))

Using the first example from the documentation as a code template, and adding a "freq" column to the sample df, makes this chart. No reshaping required.
df <- data.frame(Variable = c("X1", "X2", "X3", "X4", "X5", "X6"),
Pearson1 = c(6, 3, 2, 5, 4, 1),
Spearman1 = c(6, 5, 1, 2, 3, 4),
Kendall1 = c(6, 5, 1, 2, 3, 4),
Pearson2 = c(6, 5, 1, 2, 3, 4),
Spearman2 = c(6, 5, 1, 2, 4, 3),
Kendall2 = c(6, 5, 1, 2, 3, 4))
df$freq <- 1
library(alluvial)
alluvial(df[1:7], freq = df$freq, cex = 0.7)
Reverse the vertical order of the first column:
alluvial(df[1:7], freq = df$freq,
         cex = 0.7,
         ordering = list(
           order(df$Variable, decreasing = TRUE),
           NULL,
           NULL,
           NULL,
           NULL,
           NULL,
           NULL
         )
)
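To make the ordering argument less opaque, here is a small sketch of what the list contains: one element per axis, where NULL leaves that axis in its default order. The grayscale palette and the col/border settings are my own assumption for approximating the black-and-white look asked about in the question; textures are not covered here. Only three ranking columns are used to keep it short.

```r
# Sample data from the question (first three ranking columns only, for brevity)
df <- data.frame(Variable = c("X1", "X2", "X3", "X4", "X5", "X6"),
                 Pearson1 = c(6, 3, 2, 5, 4, 1),
                 Spearman1 = c(6, 5, 1, 2, 3, 4),
                 Kendall1 = c(6, 5, 1, 2, 3, 4))

# Row order for the first axis: reverse alphabetical
first_axis <- order(df$Variable, decreasing = TRUE)
print(first_axis)  # 6 5 4 3 2 1

# One list element per axis; NULL keeps the default order on that axis
ord <- c(list(first_axis), rep(list(NULL), ncol(df) - 1))

# Assumption: one shade of gray per variable instead of colour
grays <- gray.colors(nrow(df), start = 0.2, end = 0.8)

# Only draw if the alluvial package is installed
if (requireNamespace("alluvial", quietly = TRUE)) {
  alluvial::alluvial(df, freq = rep(1, nrow(df)), col = grays,
                     border = "white", ordering = ord, cex = 0.7)
}
```

Building the list with rep(list(NULL), ...) avoids typing one NULL per axis when the number of columns changes.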

Related

Is there a way, i can order the axis on a melted ggplot? [duplicate]

This question already has answers here:
Order discrete x scale by frequency/value
(7 answers)
How do you specifically order ggplot2 x axis instead of alphabetical order? [duplicate]
(2 answers)
ggplot2, Ordering y axis
(1 answer)
R ggplot ordering bars within groups
(1 answer)
Closed 6 months ago.
I have a problem with a plot that I want to order, but it seems it can't be done.
install.packages("reshape2")
library(reshape2)
install.packages("ggplot2")
library(ggplot2)
df <- createRegressionTable(data,colname)
gg <- melt(df, id = "colname")
return(
ggplot(gg, aes(
x = colname, y = variable, fill = value
)) +
geom_tile(show.legend = FALSE) +
geom_text(aes(label = value), alpha = 0.6) +
scale_fill_gradient(low = "#D5E8D4", high = "#F8CECC") +
labs(
x = "Regressant",
y = "Regressor"
) +
theme(legend.key = element_blank())
)
I know the function createRegressionTable is a black box but this is the result:
list(colname = c("zielrichtungU", "zielrichtungO",
"imitationU", "imitationO", "steuerungU", "steuerungO", "neuheitU",
"neuheitO", "netzwerkU", "netzwerkO"), zielrichtungU = c(5, 1,
5, 1, 3, 4, 1, 1, 1, 1), zielrichtungO = c(1, 5, 1, 5, 1, 5,
3, 5, 1, 1), imitationU = c(5, 1, 5, 5, 1, 5, 1, 1, 4, 1), imitationO = c(1,
5, 5, 5, 1, 1, 5, 5, 5, 5), steuerungU = c(3, 1, 1, 1, 5, 5,
1, 2, 1, 1), steuerungO = c(4, 5, 5, 1, 5, 5, 3, 5, 1, 3), neuheitU = c(1,
3, 1, 5, 1, 3, 5, 5, 1, 1), neuheitO = c(1, 5, 1, 5, 2, 5, 5,
5, 1, 1), netzwerkU = c(1, 1, 4, 5, 1, 1, 1, 1, 5, 5), netzwerkO = c(1,
1, 1, 5, 1, 3, 1, 1, 5, 5))
I checked whether the output of melt is scrambled, but it seems to be ordered as I wished, so now I don't know where the problem lies.
And here is the plot that I'd love to order:
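A likely cause, sketched below under the assumption that colname is a plain character column: melt() turns variable into a factor whose levels follow the column order, but colname stays character, and ggplot2 places character values on a discrete axis in alphabetical order. Converting it to a factor with explicit levels keeps the original order. The three-row data frame is a shortened, made-up stand-in for the real melted table.

```r
# Shortened stand-in for the melted table (hypothetical subset of the real data)
gg <- data.frame(
  colname  = c("zielrichtungU", "imitationU", "steuerungU"),
  variable = c("zielrichtungU", "zielrichtungU", "zielrichtungU"),
  value    = c(5, 5, 3),
  stringsAsFactors = FALSE
)

# A character column ends up in alphabetical order on a discrete axis
alphabetical <- sort(unique(gg$colname))
print(alphabetical)  # "imitationU" "steuerungU" "zielrichtungU"

# Fix the order by making it a factor with levels in order of appearance
gg$colname <- factor(gg$colname, levels = unique(gg$colname))
print(levels(gg$colname))  # "zielrichtungU" "imitationU" "steuerungU"
```

With colname stored as a factor, the ggplot() call from the question can stay unchanged.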

recoding variables in a loop in R

I want to recode several variables together. All of these variables will undergo the same recoding.
For this, I followed the thread below, which describes two ways of doing it:
1) using column numbers
2) using variable names
I tried both, but I get an error message.
Error message for both 1) and 2):
Error in (function (var, recodes, as.factor, as.numeric = TRUE, levels) :
unused arguments (2 = "1", 3 = "1", 1 = "0", 4 = "0", na.rm = TRUE)
recode variable in loop R
#Loading libraries
library(dplyr)
library(magrittr)
library(plyr)
library(readxl)
library(tidyverse)
#Importing file
mydata <- read_excel("CCorr_Data.xlsx")
df <- data.frame(mydata)
attach(df)
#replacing codes for variables
df %>%
mutate_at(c(1:7), recode, '2'='1', '3'='1', '1'='0', '4'='0', na.rm = TRUE) %>%
mutate_at(c(15:24), recode, '2'='0', na.rm = TRUE)
df %>%
mutate_at(vars(E301, E302, E303), recode,'2'='1', '3'='1', '1'='0', '4'='0', na.rm = TRUE) %>%
mutate_at(vars(B201, B202, B203), recode, '2'='0', na.rm = TRUE)
Can someone tell me where I am going wrong?
In my dataset there are missing values, which is why I included na.rm = T. I also tried without it, and the error message was the same.
Please see below for sample data.
structure(list(Country = c(1, 1, 1, 1, 1, 1), HHID = c("12ae5148e245079f-122042",
"12ae5148e245079f-123032", "12ae5148e245079f-123027", "12ae5148e245079f-123028",
"12ae5148e245079f-N123001", "12ae5148e245079f-123041"), HHCode = c("122042",
"123032", "123027", "123028", "N123001", "123041"), A103 = c(2,
2, 2, 2, 2, 2), A104 = c("22", "23", "23", "23", "23", "23"),
Community = c("Mehmada", "Dhobgama", "Dhobgama", "Dhobgama",
"Dhobgama", "Dhobgama"), E301 = c(3, 3, 3, 3, 3, 3), E302 = c(3,
2, 4, 4, 3, 3), E303 = c(3, 2, 3, 3, 3, 3), E304 = c(3, 4,
4, 4, 3, 3), E305 = c(3, 2, 3, 3, 3, 3), E306 = c(3, 3, 3,
3, 3, 3), E307 = c(3, 3, 3, 3, 3, 3), E308 = c(3, 1, 3, 3,
3, 3), B201.1 = c(NA, 1, 1, 1, 1, 1), B202.1 = c(NA, 1, 1,
1, 1, 1), B203.1 = c(NA, 1, 1, 2, 2, 1), B204.1 = c(NA, 2,
1, 2, 1, 1), B205.1 = c(NA, 2, 1, 2, 2, 2), B206.1 = c(NA,
1, 1, 1, 2, 1), B207.1 = c(NA, 2, 1, 2, 2, 1), B208.1 = c(NA,
2, 2, 2, 2, 2), B209.1 = c(NA, 2, 1, 1, 1, 1), B210.1 = c(NA,
1, 1, 1, 1, 1)), row.names = c(NA, 6L), class = "data.frame")
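One issue is na.rm = TRUE: dplyr's recode() has no such argument, so drop it. Note also that the signature in the error message, function (var, recodes, as.factor, ...), is not dplyr's; it looks like recode() from the car package, which may be masking dplyr's version. If the error persists after removing na.rm, call dplyr::recode explicitly.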
library(dplyr)
df %>%
mutate_at(vars(E301, E302, E303), recode,'2'='1', '3'='1', '1'='0', '4'='0') %>%
mutate_at(vars(B201, B202, B203), recode, '2'='0')
Or, using column positions:
library(dplyr)
df %>%
mutate_at(1:7, recode, '2'='1', '3'='1', '1'='0', '4'='0') %>%
mutate_at(15:24, recode, '2'='0')
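For comparison, here is a base R sketch of the same recoding using a named lookup vector rather than dplyr::recode. It is an alternative technique, not what the answers above use; the small data frame is made up to mirror the E301 to E303 columns of the sample data, with an NA added to show that missing values pass through untouched.

```r
# Made-up data shaped like the E30x columns in the question
df <- data.frame(E301 = c(3, 2, NA, 1),
                 E302 = c(4, 4, 3, 2),
                 E303 = c(1, 2, 3, 4))

# Old code -> new code (2 and 3 become 1; 1 and 4 become 0)
lookup <- c("1" = 0, "2" = 1, "3" = 1, "4" = 0)

# Apply the same mapping to every selected column;
# as.character() turns each old code into a name to look up
cols <- c("E301", "E302", "E303")
df[cols] <- lapply(df[cols], function(x) unname(lookup[as.character(x)]))

print(df$E301)  # 1 1 NA 0
```

Because the lookup is plain vector indexing, it does not depend on which package's recode() happens to be attached.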

"sample sizes in the longitudinal and event processes differ" in JointModel in r

I am trying to perform a joint model analysis with simulated data. I believe I have formatted the data properly, but I receive this error:
"Error in jointModel(lmeFitJ, coxFit, timeVar = "time.point") :
sample sizes in the longitudinal and event processes differ; maybe you forgot the cluster() argument."
I only see this mentioned in the source code for JM and in one brief and unresolved troubleshooting thread. Where have I messed up? Thank you for any help!
Minimal complete example with first 4 participants:
#required packages
library(readxl)
library(nlme)
library(survival)
library(JM)
#long_data
long_data <- structure(list(particip.id = c(1, 1, 1, 1, 2, 2, 3, 4, 4, 4,
4), time.point = c(1, 2, 3, 4, 1, 2, 1, 1, 2, 3, 4), school4me = c("DPU",
"DPU", "DPU", "DPU", "DPU", "DPU", "DPU", "DPU", "DPU", "DPU",
"DPU"), hours.a = c(3, 3, 2, 3, 0, 0, 6, 10, 13, 16, 15), hours.b = c(4,
6, 0, 0, 0, 1, 3, 7, 15, 9, 10), enrolled = c(1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1), TimeQ = c(4, 4, 4, 4, 2.9369807105977, 2.9369807105977,
1.50240888306871, 4, 4, 4, 4)), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame"))
#short_data
short_data <- structure(list(particip.id = c(1, 2, 3, 4), time.point = c(3,
2, 3, 4), school4me = c("DPU", "DPU", "DPU", "DPU"), enrolled = c(0,
0, 0, 1), TimeQ = c(2.376576055, 1.152660467, 2.300307851, 4),
actual = c(1, 1, 1, 0)), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))
#Analysis
lmeFitJ <- lme(hours.a ~ time.point + time.point:school4me, data=long_data, random = ~time.point | particip.id)
coxFit <- coxph(Surv(TimeQ, actual) ~ school4me, data = short_data, x = TRUE)
fitJOINT <- jointModel(lmeFitJ, coxFit, timeVar = "time.point")
#analysis produces: "Error in jointModel(lmeFitJ, coxFit, timeVar = "time.point") : sample sizes in
#the longitudinal and event processes differ; maybe you forgot the cluster() argument."
In the source code you can find
if (is.null(survObject$model))
stop("\nplease refit the Cox model including in the ",
"call to coxph() the argument 'model = TRUE'.")
and
nT <- length(unique(idT))
if (LongFormat && is.null(survObject$model$cluster))
stop("\nuse argument 'model = TRUE' and cluster() in coxph().")
Unfortunately, the check that raises the longitudinal-process error runs first, so you never see these messages:
stop("sample sizes in the longitudinal and event processes differ; ",
"maybe you forgot the cluster() argument.\n")
Try adding model = TRUE and cluster(particip.id) to your coxFit, i.e.
coxFit <- coxph(Surv(TimeQ, actual) ~ school4me + cluster(particip.id), data = short_data, x = TRUE, model = TRUE)
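Before refitting, it can also help to compare the two data sets directly. The sketch below assumes the error is about the number of distinct subjects on each side (the nT <- length(unique(idT)) line quoted above suggests this, but I have not traced the full check); the toy data frames are simplified stand-ins for long_data and short_data.

```r
# Toy stand-ins: repeated measurements per subject vs. one row per subject
long_data  <- data.frame(particip.id = c(1, 1, 1, 1, 2, 2, 3, 4, 4, 4, 4),
                         time.point  = c(1, 2, 3, 4, 1, 2, 1, 1, 2, 3, 4))
short_data <- data.frame(particip.id = c(1, 2, 3, 4),
                         TimeQ       = c(2.38, 1.15, 2.30, 4))

# Number of distinct subjects on each side
n_long  <- length(unique(long_data$particip.id))
n_short <- length(unique(short_data$particip.id))

# Subjects that appear in one data set but not the other
missing_in_short <- setdiff(long_data$particip.id, short_data$particip.id)
missing_in_long  <- setdiff(short_data$particip.id, long_data$particip.id)

print(c(n_long = n_long, n_short = n_short))  # both 4 for this toy data
```

If the counts agree and no IDs are missing on either side, the mismatch is more likely to come from how the Cox model was specified than from the data themselves.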

Make boxplots of columns in R

I am a beginner in R, and have a question about making boxplots of columns in R. I just made a dataframe:
SUS <- data.frame(RD = c(4, 3, 4, 1, 2, 2, 4, 2, 4, 1), TK = c(4, 2, 4, 2, 2, 2, 4, 4, 3, 1),
WK = c(3, 2, 4, 1, 3, 3, 4, 2, 4, 2), NW = c(2, 2, 4, 2, NA, NA, 5, 1, 4, 2),
BW = c(3, 2, 4, 1, 4, 1, 4, 1, 5, 1), EK = c(2, 4, 3, 1, 2, 4, 2, 2, 4, 2),
AN = c(3, 2, 4, 2, 3, 3, 3, 2, 4, 2))
rownames(SUS) <- c('Pleasant to use', 'Unnecessary complex', 'Easy to use',
'Need help of a technical person', 'Different functions well integrated','Various function incohorent', 'Imagine that it is easy to learn',
'Difficult to use', 'Confident during use', 'Long duration untill I could work with it')
I tried a number of times, but I did not succeed in making boxplots for all rows. Can someone help me out here?
You can also do it using the tidyverse:
library(tidyverse)
SUS %>%
#create new column and save the row.names in it
mutate(variable = row.names(.)) %>%
#convert your data from wide to long
tidyr::gather("var", "value", 1:7) %>%
#plot it using ggplot2
ggplot(., aes(x = variable, y = value)) +
geom_boxplot()+
theme(axis.text.x = element_text(angle=35,hjust=1))
As @blondeclover says in the comment, boxplot() should work fine for doing a boxplot of each column.
If what you want is a boxplot for each row, then actually your current rows need to be your columns. If you need to do this, you can transpose the data frame before plotting:
SUS.new <- as.data.frame(t(SUS))
boxplot(SUS.new)
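As a quick check of the transpose step, this sketch uses a made-up three-row excerpt in the shape of SUS and asks boxplot() for its statistics without drawing (plot = FALSE), so the grouping can be inspected first.

```r
# Made-up excerpt in the shape of SUS: statements in rows, raters in columns
SUS <- data.frame(RD = c(4, 3, 4), TK = c(4, 2, 4), NW = c(2, 2, NA))
rownames(SUS) <- c("Pleasant to use", "Unnecessary complex", "Easy to use")

# After transposing, each statement becomes a column, i.e. one box
SUS.new <- as.data.frame(t(SUS))
print(dim(SUS.new))  # 3 raters x 3 statements

# Compute the boxplot statistics without drawing anything
stats <- boxplot(SUS.new, plot = FALSE)
print(stats$n)  # non-missing values per statement: 3 3 2
```

Dropping plot = FALSE draws the boxes; adding las = 2 rotates the axis labels, which is one way to keep the long statement names readable.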

Is there a simple way of pairing unique data points in a data frame?

I want to extract pairs of data from a data frame, where they are paired with data that is not in their own column. Each number in column 1 is paired with all the numbers to the right of that column. Likewise numbers in column 2 are only paired with numbers in columns 3 or above.
I have created a script that does it using a bird's nest of 'for' loops but I feel there should be a more elegant way to do it.
Example data:
structure(list(A = 1:3, B = 4:6, C = 7:9), .Names = c("A", "B",
"C"), class = "data.frame", row.names = c(NA, -3L))
Desired output:
structure(list(X1 = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3,
3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6), X2 = c(4, 5, 6, 7,
8, 9, 4, 5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 7, 8, 9, 7, 8, 9, 7,
8, 9)), .Names = c("X1", "X2"), row.names = c(NA, 27L), class = "data.frame")
Here's an approach using the data.table package and its very efficient CJ and rbindlist functions (assuming your data set is called df):
library(data.table)
res <- rbindlist(lapply(seq_len(length(df) - 1),
function(i) CJ(df[, i], unlist(df[, -(seq_len(i))]))))
You could then set your column names by reference (if you insist on "X1" and "X2") using setnames
setnames(res, 1:2, c("X1", "X2"))
You can also convert back to data.frame by reference (if you want to match your desired output "exactly") by using setDF()
setDF(res)
Here df is the input dataset
out1 <- do.call(rbind,lapply(1:(ncol(df)-1), function(i) {
x1 <- df[,i:(ncol(df))]
Un1 <-unique(unlist(x1[,-1]))
data.frame(X1=rep(x1[,1], each=length(Un1)), X2= Un1)}))
all.equal(out, out1) #if `out` is the expected output
#[1] TRUE
Another approach:
res <- do.call(rbind, unlist(lapply(seq(ncol(dat) - 1), function(x)
lapply(seq(x + 1, ncol(dat)), function(y)
"names<-"(expand.grid(dat[c(x, y)]), c("X1", "X2")))),
recursive = FALSE))
where dat is the name of your data frame.
You can sort the result with this command:
res[order(res[[1]], res[[2]]), ]
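A further base R variant, offered as a sketch rather than a replacement for the answers above: combn() enumerates the column pairs (left column first) and expand.grid() builds every combination for each pair. Here dat is the example data from the question.

```r
# Example data from the question
dat <- data.frame(A = 1:3, B = 4:6, C = 7:9)

# For every pair of columns (i < j), cross all values of column i with column j
pairs <- combn(ncol(dat), 2, function(idx) {
  expand.grid(X1 = dat[[idx[1]]], X2 = dat[[idx[2]]])
}, simplify = FALSE)

# Stack the pieces and sort to match the desired output
res <- do.call(rbind, pairs)
res <- res[order(res$X1, res$X2), ]
rownames(res) <- NULL

print(nrow(res))  # 27
```

Note that X1 and X2 come out as integers here, whereas the desired output in the question stores them as doubles; the values are the same.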
