R Corrplot Text Spacing - r

Is there any way to remove the spaces around the dashes in the axis labels? For example, the label "IL-1alpha" is drawn as "IL - 1alpha", and I would like to remove that white space. I'm not sure if there's an argument I could add to do this. Thanks in advance!
Here is a random dataset I created along with the code I use:
dat <- data.frame("Eotaxin" = sample(100, size = 66, replace = TRUE), "GRO-alpha" = sample(100, size = 66, replace = TRUE),
"IL-1alpha" = sample(100, size = 66, replace = TRUE), "IL-ra" = sample(100, size = 66, replace = TRUE),
"IL-8" = sample(100, size = 66, replace = TRUE), "IP-10" = sample(100, size = 66, replace = TRUE),
"MIP-1beta" = sample(100, size = 66, replace = TRUE),
"SDF-1alpha" = sample(100, size = 66, replace = TRUE))
library('corrplot')
matrix_cor <- cor(dat)
colnames(matrix_cor) <- c("Eotaxin",
":GRO*alpha",
":IL-1*alpha",
"IL-ra",
"IL-8",
"IP-10",
":MIP-1*beta",
":SDF-1*alpha")
rownames(matrix_cor) <- c("Eotaxin",
":GRO*alpha",
":IL-1*alpha",
"IL-ra",
"IL-8",
"IP-10",
":MIP-1*beta",
":SDF-1*alpha")
corrplot(matrix_cor, type = "upper",tl.col="black", tl.cex = 1)
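One possible fix, assuming corrplot hands labels that start with ":" to R's expression parser (which the ":GRO*alpha" labels above already rely on): in a plotmath expression a bare - is the binary minus operator and is drawn with space around it, while quoted text is drawn literally. Quoting the hyphenated part of each label should therefore remove the gap. This is a sketch with stand-in random data, and the corrplot call is skipped if the package is not installed.

```r
set.seed(1)
# Eight random columns standing in for the cytokine measurements
dat <- as.data.frame(replicate(8, sample(100, size = 66, replace = TRUE)))
matrix_cor <- cor(dat)

# Quote the hyphenated part so plotmath treats it as text, not as a subtraction
labels <- c("Eotaxin", ":GRO*alpha", ":'IL-1'*alpha", "IL-ra",
            "IL-8", "IP-10", ":'MIP-1'*beta", ":'SDF-1'*alpha")
dimnames(matrix_cor) <- list(labels, labels)

# What the parser sees once the leading ":" is dropped
unquoted <- parse(text = "IL-1*alpha")[[1]]    # a call to `-`
quoted   <- parse(text = "'IL-1'*alpha")[[1]]  # a call to `*` on the string "IL-1"

if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(matrix_cor, type = "upper", tl.col = "black", tl.cex = 1)
}
```

The plain labels such as "IL-8" need no change because they do not start with ":" and are drawn as ordinary text.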

Related

Loop inside a function, how to store function output to an existing dataframe

My goal is to run linear regressions with my defined equation, and then store the model residuals to my original dataset.
library(tidyverse)
library(stringr)
set.seed(5)
df <- data.frame(
id = c(1:100),
age = sample(20:80, 100, replace = TRUE),
sex = sample(c("M", "F"), 100, replace = TRUE, prob = c(0.7, 0.3)),
type = sample(letters[1:4], 100, replace = TRUE),
bmi = sample(15:35, 100, replace = TRUE),
sbp = sample(75:160, 100, replace = TRUE),
cat_outcome1 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.68, 0.32)),
cat_outcome2 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.65, 0.35)),
cat_outcome3 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.60, 0.40)),
cat_outcome4 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.45, 0.55)),
dog_outcome1 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.68, 0.32)),
dog_outcome2 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.65, 0.35)),
dog_outcome3 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.60, 0.40)),
dog_outcome4 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.45, 0.55))
)
outcome = colnames(df)[str_detect(colnames(df), "outcome")]
test_function = function(vars_dep, vars_indep, input_data){
for (z in vars_dep) {
formula = as.formula(paste0(z, " ~ ", vars_indep))
model = lm(formula, data = input_data, na.action = na.exclude)
# Take the residual from each model, create a new col with the suffix '.res'
input_data[, paste0(z, ".res")] = residuals(model)
}
}
As shown above, I would like to save the residuals with a suffix that depends on which y I use in the model, and then store these residuals as columns in my original dataframe df. I expected to see cat_outcome1.res, cat_outcome2.res, and so on as new columns, but they were not saved in df. Any suggestions are greatly appreciated!
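The loop only modifies the function's local copy of input_data, so the new columns are lost when the function exits. Return the modified data frame from the function and assign the result back to df: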
test_function <- function(vars_dep, vars_indep, input_data) {
  for (z in vars_dep) {
    formula = as.formula(paste0(z, " ~ ", vars_indep))
    model = lm(formula, data = input_data, na.action = na.exclude)
    # Take the residual from each model, create a new col with the suffix '.res'
    input_data[[paste0(z, ".res")]] <- residuals(model)
  }
  return(input_data)
}
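A minimal usage sketch, with a smaller stand-in data frame and "age + bmi" as an assumed value for vars_indep (the question does not say which predictors are used). The key step is the reassignment on the last lines: calling the function without assigning its result leaves df untouched.

```r
set.seed(5)
# Reduced stand-in for the question's data frame
df <- data.frame(
  age = sample(20:80, 100, replace = TRUE),
  bmi = sample(15:35, 100, replace = TRUE),
  cat_outcome1 = sample(c(0L, 1L), 100, replace = TRUE),
  dog_outcome1 = sample(c(0L, 1L), 100, replace = TRUE)
)
outcome <- grep("outcome", colnames(df), value = TRUE)

test_function <- function(vars_dep, vars_indep, input_data) {
  for (z in vars_dep) {
    model <- lm(as.formula(paste0(z, " ~ ", vars_indep)),
                data = input_data, na.action = na.exclude)
    # One residual column per outcome, named with the suffix '.res'
    input_data[[paste0(z, ".res")]] <- residuals(model)
  }
  input_data
}

# Assign the returned data frame back to df to keep the new columns
df <- test_function(outcome, "age + bmi", df)
names(df)
```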

For loop list of tibbles R

I want to create a list of random tibbles using a for loop. I have a large data set where I will need to apply functions to lists of tibbles and create lists of tibbles as the outputs. I understand there might be better ways to do this and would also appreciate hearing those but am trying to wrap my head around how for loops work.
I can create a list of random tibbles with each tibble in the list named:
tibble_random1 <- tibble(Number = sample((1:100), 10, replace = TRUE),
Letter = sample((LETTERS), 10, replace = TRUE),
Logical = sample(c("True", "False"), 10, replace = TRUE))
tibble_random2 <- tibble(Number = sample((1:100), 10, replace = TRUE),
Letter = sample((LETTERS), 10, replace = TRUE),
Logical = sample(c("True", "False"), 10, replace = TRUE))
tibble_random3 <- tibble(Number = sample((1:100), 10, replace = TRUE),
Letter = sample((LETTERS), 10, replace = TRUE),
Logical = sample(c("True", "False"), 10, replace = TRUE))
tibble_random <- list(tibble1 = tibble_random1,
tibble2 = tibble_random2,
tibble3 = tibble_random3)
I cannot figure out how to do this with a for loop or if a for loop is completely inappropriate for this.
Thanks.
Initialise a list of the required length, then fill in one tibble per iteration of the for loop.
tibble_random <- vector('list', 3)
for(i in seq_along(tibble_random)) {
tibble_random[[i]] <- tibble(Number = sample((1:100), 10, replace = TRUE),
Letter = sample((LETTERS), 10, replace = TRUE),
Logical = sample(c("True", "False"), 10, replace = TRUE))
}
You can also use replicate or lapply to do this without a for loop.
tibble_random <- replicate(3, tibble(Number = sample((1:100), 10, replace = TRUE),
Letter = sample((LETTERS), 10, replace = TRUE),
Logical = sample(c("True", "False"), 10, replace = TRUE)), simplify = FALSE)
To name the list elements, you can use:
names(tibble_random) <- paste0('tibble', seq_along(tibble_random))
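The lapply variant mentioned above is not shown, so here is a rough sketch of it. make_tibble is a hypothetical helper name introduced for this example; it ignores its argument and simply builds one random tibble per call.

```r
library(tibble)

# Hypothetical helper: builds one random tibble; the index i is unused
make_tibble <- function(i) {
  tibble(Number  = sample(1:100, 10, replace = TRUE),
         Letter  = sample(LETTERS, 10, replace = TRUE),
         Logical = sample(c("True", "False"), 10, replace = TRUE))
}

# lapply returns a list with one element per value of 1:3
tibble_random <- lapply(1:3, make_tibble)
names(tibble_random) <- paste0("tibble", seq_along(tibble_random))
```

The same pattern carries over to the later processing steps: lapply(tibble_random, some_function) applies a function to every tibble and returns a new list with the same names.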

Arguments must have same length when using tapply

data.frame(q1 = sample(c(1, 5), 200, replace = T, prob = c(1/2, 1/2)),
gender = sample(c("M", "F"), 200, replace = T, prob = c(2/3, 1/3))
) %>% tapply(.$q1,list(.$gender),FUN=sum)
I just want to use tapply to sum q1 by gender, but I get the error below:
Error in tapply(., .$q1, list(.$gender), FUN = sum) :
arguments must have same length
Where's the problem?
For the sum example, you can use data.table syntax:
library(data.table)
df <- data.frame(q1 = sample(c(1, 5), 200, replace = T, prob = c(1/2, 1/2)),
gender = sample(c("M", "F"), 200, replace = T, prob = c(2/3, 1/3)))
as.data.table(df)[, sum(q1), by = gender]
This will also work with a function that returns multiple values:
as.data.table(df)[, shapiro.test(q1), by = gender]
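As for where the problem is: the error message shows the call that was actually evaluated, tapply(., .$q1, list(.$gender), FUN = sum). The pipe still inserts the data frame as the first argument, because the dot only appears inside nested expressions such as .$q1. tapply then compares the length of the data frame (2, its number of columns) with a grouping vector of length 200. A base R sketch that avoids the pipe altogether:

```r
set.seed(1)
df <- data.frame(
  q1 = sample(c(1, 5), 200, replace = TRUE, prob = c(1/2, 1/2)),
  gender = sample(c("M", "F"), 200, replace = TRUE, prob = c(2/3, 1/3))
)

# The length of a data frame is its number of columns, not its number of rows
length(df)

# with() evaluates the call inside df, so q1 and gender are the 200-element columns
sums <- with(df, tapply(q1, gender, sum))
sums

# With magrittr, wrapping the call in braces should stop the pipe from
# inserting the data frame as the first argument:
# df %>% { tapply(.$q1, .$gender, sum) }
```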

A lm() dynamic function - R

Let's assume I have this dataframe:
N <- 50
df <- data.frame(
LA1 = sample(1:10, size = N, replace = TRUE),
LA2 = sample(1:10, size = N, replace = TRUE),
LA3 = sample(1:10, size = N, replace = TRUE),
LA4 = sample(1:10, size = N, replace = TRUE),
LA5 = sample(1:10, size = N, replace = TRUE),
LA6 = sample(1:10, size = N, replace = TRUE),
LA7 = sample(1:10, size = N, replace = TRUE),
LA8 = sample(1:10, size = N, replace = TRUE),
LAY = sample(1:10, size = N, replace = TRUE),
UF1 = sample(1:10, size = N, replace = TRUE),
UF2 = sample(1:10, size = N, replace = TRUE),
UF3 = sample(1:10, size = N, replace = TRUE),
UF4 = sample(1:10, size = N, replace = TRUE),
UF5 = sample(1:10, size = N, replace = TRUE),
UF6 = sample(1:10, size = N, replace = TRUE),
UFY = sample(1:10, size = N, replace = TRUE),
EK1 = sample(1:10, size = N, replace = TRUE),
EK2 = sample(1:10, size = N, replace = TRUE),
EK3 = sample(1:10, size = N, replace = TRUE),
EK4 = sample(1:10, size = N, replace = TRUE),
EK5 = sample(1:10, size = N, replace = TRUE),
EK6 = sample(1:10, size = N, replace = TRUE),
EK7 = sample(1:10, size = N, replace = TRUE),
EK8 = sample(1:10, size = N, replace = TRUE),
EK9 = sample(1:10, size = N, replace = TRUE),
EK10 = sample(1:10, size = N, replace = TRUE),
EK11 = sample(1:10, size = N, replace = TRUE),
EK12 = sample(1:10, size = N, replace = TRUE),
EKY = sample(1:10, size = N, replace = TRUE),
Z1 = sample(1:10, size = N, replace = TRUE),
Z2 = sample(1:10, size = N, replace = TRUE),
Z3 = sample(1:10, size = N, replace = TRUE)
)
I want to fit these models:
m1=lm(formula = LAY ~ LA1+LA2+LA3+LA4+LA5+LA6+LA7+LA8, data = df)
m11=step(m1,direction="both")
m2=lm(formula = UFY ~ UF1+UF2+UF3+UF4+UF5+UF6,data = df)
m22=step(m2,direction="both")
m3=lm(formula = EKY ~ EK1+EK2+EK3+EK4+EK5+EK6+EK7+EK8+EK9+EK10+EK11+EK12, data = df)
m33=step(m3,direction="both")
m8=lm(formula = Z1 ~ LAY+UFY+EKY, data = df)
m88=step(m8,direction="both")
m9=lm(formula = Z2 ~ LAY+UFY+EKY, data = df)
m99=step(m9,direction="both")
m10=lm(formula = Z3 ~ LAY+UFY+EKY, data = df)
m100=step(m10,direction="both")
As you can see, if the dimensionality of the data increases (more LA, UF, or EK independent variables), I have to modify the model inputs manually. So, I'm looking for a way to do the following:
Given any number of independent variables (5, 10, 30, or more) for a given category (LA, UF, or EK), the input for the model changes automatically.
Although I have found alternative formula syntax (like X*Z = [(X+Z)^3]), I can't find a way to make this computation more dynamic.
Considerations:
The number of independent variables (LA, UF, EK) can change.
The number of dependent variables (LAY, UFY, EKY) never changes.
Only the coefficient vector is extracted from the output of these models.
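One way this could be made dynamic, sketched under the assumption that every predictor is named as a category prefix followed by digits (LA1, LA2, ...) and every response as the prefix followed by Y: select the predictor columns by pattern and build the formula with reformulate(). build_formula and fit_group are hypothetical helper names introduced for this example, and the data frame is a reduced stand-in with only the LA and UF groups.

```r
set.seed(1)
N <- 50
cols <- c(paste0("LA", 1:8), "LAY", paste0("UF", 1:6), "UFY")
df <- as.data.frame(setNames(
  replicate(length(cols), sample(1:10, size = N, replace = TRUE), simplify = FALSE),
  cols
))

# Hypothetical helper: <prefix>Y ~ every column named <prefix><digits>
build_formula <- function(prefix, data) {
  predictors <- grep(paste0("^", prefix, "[0-9]+$"), names(data), value = TRUE)
  reformulate(predictors, response = paste0(prefix, "Y"))
}

# Hypothetical helper: fit the full model, then run stepwise selection quietly
fit_group <- function(prefix, data) {
  full <- lm(build_formula(prefix, data), data = data)
  step(full, direction = "both", trace = 0)
}

models <- lapply(c(LA = "LA", UF = "UF"), fit_group, data = df)
coefs <- lapply(models, coef)  # one coefficient vector per category
```

Adding LA9 or UF7 to the data frame then changes the formulas without any edit to the code. The Z1, Z2, and Z3 models use the fixed predictors LAY, UFY, and EKY, so they can stay as written.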

Sort a vector where the largest is at the center in r

I know this is a simple question, but I have searched everywhere and I am pretty sure that there is no answer to my question.
I want to sort a vector so that the largest value is in the middle and the values decrease toward both tails.
For example:
c( 20, 30, 40, 50, 60)
I want to have:
c(20, 40, 60, 50, 30 ) or c(30, 50, 60, 40, 20 )
It does not matter.
Can anyone offer me a quick solution?
Thanks!
This is much easier to solve if you assume that you have 2n (n is a natural number) distinct observations. Here is one solution:
ints = sample.int(100, size = 30, replace = FALSE)
ints_o = ints[order(ints)]
ints_tent = c(ints_o[seq.int(from = 1, to = (length(ints) - 1), by = 2)],
rev(ints_o[seq.int(from = 2, to = length(ints), by = 2)]))
Edit:
Here is a function that handles both even and odd lengths:
makeTent = function(ints) {
  ints_o = ints[order(ints)]
  if((length(ints) %% 2) == 0) {
    # even number of observations
    ints_tent = c(ints_o[seq.int(from = 1, to = (length(ints) - 1), by = 2)],
                  rev(ints_o[seq.int(from = 2, to = length(ints), by = 2)]))
  } else {
    # odd number of observations
    ints_tent = c(ints_o[seq.int(from = 2, to = (length(ints) - 1), by = 2)],
                  rev(ints_o[seq.int(from = 1, to = length(ints), by = 2)]))
  }
  return(ints_tent)
}
# test the function
ints_even = sample.int(100, size = 30, replace = FALSE)
ints_odd = sample.int(100, size = 31, replace = FALSE)
makeTent(ints_odd)
makeTent(ints_even)
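A more compact variant of the same idea, offered as a sketch rather than a replacement: after sorting, put the values at odd positions on the left in ascending order and the values at even positions on the right in descending order. Logical indexing avoids the separate even and odd branches. tent_sort is a hypothetical name introduced for this example.

```r
# Hypothetical helper: ascending odd-position values, then descending even-position values
tent_sort <- function(x) {
  s <- sort(x)
  odd <- seq_along(s) %% 2 == 1
  c(s[odd], rev(s[!odd]))
}

tent_sort(c(20, 30, 40, 50, 60))
# returns 20 40 60 50 30

tent_sort(c(1, 2, 3, 4))
# returns 1 3 4 2
```

For the vector in the question this gives the first of the two orderings that were asked for.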
