Creating a Kendall correlation matrix in R

I have data that looks like this (38 columns in total).
Data code sample:
df <- structure(
list(
Christensenellaceae = c(
0.010484508,
0.008641566,
0.010017172,
0.010741488,
0.1,
0.2,
0.3,
0.4,
0.7,
0.8,
0.9,
0.1,
0.3,
0.45,
0.5,
0.55
),
Date=c(27,27,27,27,27,27,27,27,28,28,28,28,28,28,28,28),
Treatment = c(
"Treatment 1",
"Treatment 1",
"Treatment 1",
"Treatment 1",
"Treatment 2",
"Treatment 2",
"Treatment 2",
"Treatment 2",
"Treatment 1",
"Treatment 1",
"Treatment 1",
"Treatment 1",
"Treatment 2",
"Treatment 2",
"Treatment 2",
"Treatment 2"
)
),class = "data.frame",
row.names = c(NA,-16L)
)
What I wish to do is create a Kendall correlation matrix (the data doesn't behave linearly) between the treatment types (10 in total, but 2 in the example) for every column except Treatment and Date, so 36 correlation matrices in total, each of size 10x10 (here it will be 2x2).
This is my code:
res2 <- cor(as.matrix(data),method ="kendall")
But I get the error:
Error in cor(data, method = "kendall") : 'x' must be numeric
Is there any way to solve this? Thank you :)

You can do this with a tidyverse approach: first wrangle the data, then use correlate() to calculate the pairwise correlation for every combination of variables.
library(corrr)
library(tidyverse)
df |>
  # Transform data into wide format
  pivot_wider(id_cols = Date,
              names_from = Treatment,
              values_from = -starts_with(c("Treatment", "Date"))) |>
  # Unnest lists inside each column
  unnest(cols = starts_with("Treatment")) |>
  # Remove Date from the columns
  select(-Date) |>
  # Correlate all columns using kendall
  correlate(method = "kendall")
# A tibble: 2 x 3
# term `Treatment 1` `Treatment 2`
# <chr> <dbl> <dbl>
#1 Treatment 1 NA 0.546
#2 Treatment 2 0.546 NA
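The original error comes from as.matrix() being applied to a data frame that still contains the character column Treatment, which gives a character matrix that cor() rejects. Below is a minimal base R sketch of the same idea. It assumes every treatment has the same number of observations and that values are paired by their order within each treatment; the small data frame is a stand-in, not the real data.

```r
# Stand-in data: one abundance column, two treatments of equal size (assumed)
df <- data.frame(
  Christensenellaceae = c(0.0105, 0.0086, 0.0100, 0.0107, 0.1, 0.2, 0.3, 0.4),
  Date = 27,
  Treatment = rep(c("Treatment 1", "Treatment 2"), each = 4)
)

# A data frame with a character column becomes a character matrix
is.character(as.matrix(df))

# Every column except the grouping columns gets its own correlation matrix
value_cols <- setdiff(names(df), c("Treatment", "Date"))

res_list <- lapply(setNames(value_cols, value_cols), function(col) {
  # One column per treatment, rows paired by order within the treatment
  m <- do.call(cbind, split(df[[col]], df$Treatment))
  cor(m, method = "kendall")
})

res_list$Christensenellaceae
```

With 10 treatments and 36 value columns this would give a list of 36 matrices, each 10x10, provided the equal-size assumption holds.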

Related

How to italicize some words in a sentence in axis text in R

I would like to italicize a part of a term in axis text (not title) in R ggplot2.
I have some bacterial species names that I should write in italic and besides I have the strain name that should be in plain text.
Here is an example of what I have:
My data frame looks like this
MyDF <- data.frame(Activity=rep(c("Activity 1", "Activity 2"), each = 3),
Bacteria = c(sample(c("Escherichia coli Strain 1", "Escherichia coli Strain 2"), 3, TRUE, prob = c(0.3, 0.7)),
sample(c("Escherichia coli Strain 1", "Escherichia coli Strain 2"), 3, TRUE, prob = c(0.5, 0.5))))
MyDF
Activity Bacteria
1 Activity 1 Escherichia coli Strain 2
2 Activity 1 Escherichia coli Strain 2
3 Activity 1 Escherichia coli Strain 1
4 Activity 2 Escherichia coli Strain 1
5 Activity 2 Escherichia coli Strain 2
6 Activity 2 Escherichia coli Strain 1
And the code used to generate the plot is:
MyPlot <- ggplot(data = MyDF, mapping = aes(x =Activity , y =Bacteria )) +
xlab(label = "Activities") +
ylab(label = "Strains") +
theme(axis.text.y = element_text(face = "italic", size = 10, family = "serif"))
MyPlot
So my question is how to make "Escherichia coli" in italic and keep "Strain 1" in plain text.
Any help is really appreciated.
Best,
Najoua
You could use scale_y_discrete with expression and italic like this:
MyDF <- data.frame(Activity=rep(c("Activity 1", "Activity 2"), each = 3),
Bacteria = c(sample(c("Escherichia coli Strain 1", "Escherichia coli Strain 2"), 3, TRUE, prob = c(0.3, 0.7)),
sample(c("Escherichia coli Strain 1", "Escherichia coli Strain 2"), 3, TRUE, prob = c(0.5, 0.5))))
library(ggplot2)
MyPlot <- ggplot(data = MyDF, mapping = aes(x =Activity , y =Bacteria )) +
xlab(label = "Activities") +
ylab(label = "Strains") +
scale_y_discrete('Strains', labels = expression(~italic("Escherichia coli")~'Strain 1', ~italic("Escherichia coli")~'Strain 2'))
MyPlot
Created on 2022-10-12 with reprex v2.0.2
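If there are many strains, typing each expression by hand gets tedious. The sketch below builds the labels with a function instead; italic_species is a hypothetical helper name, and it assumes every label starts with the same species name followed by the strain.

```r
# Hypothetical helper: italicise the species, keep the strain in plain text
italic_species <- function(x, species = "Escherichia coli") {
  # Drop the species name, leaving only the strain part of each label
  strain <- trimws(sub(species, "", x, fixed = TRUE))
  # Build one plotmath expression per label
  parse(text = sprintf('italic("%s")~"%s"', species, strain))
}

labs <- italic_species(c("Escherichia coli Strain 1",
                         "Escherichia coli Strain 2"))

# The plot itself needs ggplot2; it is skipped if the package is missing
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  MyDF <- data.frame(
    Activity = rep(c("Activity 1", "Activity 2"), each = 2),
    Bacteria = rep(c("Escherichia coli Strain 1",
                     "Escherichia coli Strain 2"), times = 2)
  )
  p <- ggplot(MyDF, aes(x = Activity, y = Bacteria)) +
    geom_point() +
    scale_y_discrete("Strains", labels = italic_species)
  print(p)
}
```

Passing the function to labels means the labels follow whatever strains are present in the data, rather than relying on the order of a hand-written expression() vector.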

table1() Output Labeling all Data as "Missing"

I am trying to make a descriptive statistics table in R and my code functions properly (producing a table) but despite the fact that I have no missing values in my dataset, the table outputs all of my values as missing. I am still a novice in R, so I do not have a broad enough knowledge base to troubleshoot.
My code:
data <- read_excel("Data.xlsx")
data$stage <-
factor(data$stage, levels=c(1,2,3,4,5,6,7),
labels =c("Stage 0", "Stage 1", "Stage 2", "Stage 3", "Unsure", "Unsure (Early Stage)", "Unsure (Late Stage)"))
data$primary_language <-factor(data$primary_language, levels=c(1,2), labels = c("Spanish", "English"))
data$status_zipcode <- factor(data$status_zipcode, levels = (1:3), labels = c("Minority", "Majority", "Diverse"))
data$status_censusblock <- factor(data$status_censusblock, levels = c(0:2), labels = c("Minority", "Majority", "Diverse"))
data$self_identity <- factor(data$self_identity, levels = c(0:1), labels = c("Hispanic/Latina","White/Caucasian"))
data$subjective_identity <- factor(data$subjective_identity, levels = c(0,1,2,4), labels = c("Hispanic/Latina", "White/Caucasian", "Multiracial", "Asian"))
label (data$stage)<- "Stage at Diagnosis"
label(data$age) <- "Age"
label(data$primary_language) <- "Primary language"
label(data$status_zipcode)<- "Demographic Status in Zipcode Area"
label(data$status_censusblock)<- "Demographic Status in Census Block Group"
label(data$self_identity) <- "Self-Identified Racial/Ethnic Group"
label(data$subjective_identity)<- "Racial/Ethnic Group as Identified by Others"
table1(~ stage +age + primary_language + status_zipcode + status_censusblock + self_identity + subjective_identity| primary_language, data=data)
[Image: table output]
[Image: data set]
When I run the data set the values are there. It actually worked for me when I re-did the spacing:
data$stage <- factor(data$stage,
levels = c(1,2,3,4,5,6,7),
labels = c("Stage 0", "Stage 1", "Stage 2", "Stage 3", "Unsure", "Unsure (Early Stage)", "Unsure (Late Stage)"))
When I did it exactly as you typed it came up with NA's, too. Try the first and see if it works for you that way. Then check the spacing for the others. That may be all it is.
I do end up with one NA on the stage column because 0 is not defined in your levels.
Edit: I ran the rest, so here are some other points.
You end up with an NA in stage because one of your values is 0, but 0 is not among the defined levels.
You end up with NA's in language because your data has 0 and 1 but you define the levels as 1 and 2, so you need to change the levels to match the values. You end up with NA's in other columns for what looks like the same kind of reason: the levels written with : (for example 1:3 for status_zipcode) do not match the codes in the data.
Change your code to this and you should have the values you need, except for that initial 0 in "stage":
data$stage <- factor(data$stage,
levels=c(1,2,3,4,5,6,7),
labels =c("Stage 0", "Stage 1", "Stage 2", "Stage 3", "Unsure", "Unsure (Early Stage)", "Unsure (Late Stage)"))
data$primary_language <-factor(data$primary_language,
levels=c(0,1),
labels = c("Spanish", "English"))
data$status_zipcode <- factor(data$status_zipcode,
levels = c(0,1,2),
labels = c("Minority", "Majority", "Diverse"))
data$status_censusblock <- factor(data$status_censusblock,
levels = c(0,1,2),
labels = c("Minority", "Majority", "Diverse"))
data$self_identity <- factor(data$self_identity,
levels = c(0,1),
labels = c("Hispanic/Latina","White/Caucasian"))
data$subjective_identity <- factor(data$subjective_identity,
levels = c(0,1,2,4),
labels = c("Hispanic/Latina", "White/Caucasian", "Multiracial", "Asian"))
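The underlying behaviour can be seen without the real data: any value that is not listed in levels is turned into NA by factor(), which table1 then reports as missing. A small sketch with made-up codes (the 0 = Spanish, 1 = English coding is an assumption taken from the corrected code above):

```r
# Hypothetical codes as they might appear in the spreadsheet
codes <- c(0, 1, 1, 0)

# Levels 1 and 2 do not cover the code 0, so those entries become NA
bad <- factor(codes, levels = c(1, 2), labels = c("Spanish", "English"))

# Levels that match the codes actually present keep every value
good <- factor(codes, levels = c(0, 1), labels = c("Spanish", "English"))

table(bad, useNA = "ifany")
table(good, useNA = "ifany")

# Quick check: which codes in the data are not covered by the intended levels?
setdiff(unique(codes), c(1, 2))
```

Running the setdiff() check on each column before calling factor() is a quick way to spot a mismatch between the codes in the file and the levels in the script.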

Meta analysis: transform forest plot output to percentage

I am a new R user. I am trying to transform proportions to percentages on a forest plot I have generated using metaprop.
I have looked at "Quick question about transforming proportions to percentages - forest function in R" and at the link that post refers to.
mytransf = function(x)
(x) * 100
studies <- c("Study 1", "Study 2", "Study 3")
obs <- c(104, 101,79670)
denom <- c(1146, 2613, 147766)
m1 <- metaprop(obs, denom, studies, comb.random=FALSE,
byseparator=": ")
forest(m1, print.tau2 = FALSE, col.by="black", text.fixed = "Total number of events",
text.fixed.w = "Subtotal", rightcols = c("effect","ci"),
leftlabs=c("Study","Events","Total"),
xlim=c(0,0.7),
transf=mytransf)
The output remains as proportions, not percentages. I tried "atransf" as well. Is anyone able to help me with this? This is what I can generate currently: (picture of output)
You can use the pscale option of metaprop:
library(meta)
studies <- c("Study 1", "Study 2", "Study 3")
obs <- c(104, 101,79670)
denom <- c(1146, 2613, 147766)
m1 <- metaprop(obs, denom, studies, comb.random=FALSE,
byseparator=": ",
pscale=100)
forest(m1, print.tau2 = FALSE, col.by="black",
text.fixed = "Total number of events",
text.fixed.w = "Subtotal",
rightlabs = c("Prop. (%)","[95% CI]"),
leftlabs=c("Study","Events","Total"),
xlim=c(0,70))
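Because pscale = 100 puts the results on a 0 to 100 scale, the axis limits have to change with it, which is why xlim goes from c(0, 0.7) to c(0, 70). A quick base R check of the raw study-level percentages, as a sketch only (it does not reproduce the pooled estimate that metaprop computes):

```r
obs   <- c(104, 101, 79670)
denom <- c(1146, 2613, 147766)

# Raw proportion of events in each study
prop <- obs / denom

# The same values expressed as percentages
pct <- 100 * prop
round(pct, 2)

# All percentages should fall inside the xlim = c(0, 70) used for the plot
range(pct)
```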

How to change label fontsize on circlize chordDiagram in R

I have the following matrix:
> circ_mat
N chr Y N LG1 N LG2 N PA N chr X N other
chr X 1546 128758 109464 71862 6926164 524087
PA 17415 140985 190831 7156005 145783 953412
chr 2 73977 157666 6588917 151092 137082 1027603
chr 1 17258 4552095 1414285 184986 70962 541434
chr Y 39822 921 12621 1688 4811 39199
And produce the chordDiagram as follows:
circos.clear()
circos.par(start.degree = 90, clock.wise = FALSE)
chordDiagram(circ_mat, annotationTrack = c("name", "grid"),
order = c("chr Y", "chr X", "chr 1", "chr 2", "PA", "N other", "N PA", "N LG2", "N LG1", "N chr X", "N chr Y"))
Producing the attached diagram. My question is: how can I make all the labels "chr X", "chr Y", etc bigger?
You can set par() options before calling chordDiagram; cex scales the label text:
par(cex = 3, mar = c(0, 0, 0, 0))
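Put together with the code from the question, the order of calls might look like the sketch below. The matrix is re-typed from the question; saving and restoring the old par() settings is an addition of this sketch, and the cex value will likely need tuning for the size of the plotting device.

```r
# Matrix re-typed from the question
circ_mat <- matrix(
  c( 1546,  128758,  109464,   71862, 6926164,  524087,
    17415,  140985,  190831, 7156005,  145783,  953412,
    73977,  157666, 6588917,  151092,  137082, 1027603,
    17258, 4552095, 1414285,  184986,   70962,  541434,
    39822,     921,   12621,    1688,    4811,   39199),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("chr X", "PA", "chr 2", "chr 1", "chr Y"),
    c("N chr Y", "N LG1", "N LG2", "N PA", "N chr X", "N other")
  )
)

# The plot itself needs circlize; it is skipped if the package is missing
if (requireNamespace("circlize", quietly = TRUE)) {
  library(circlize)
  # Set the text size before plotting and remember the previous settings
  old_par <- par(cex = 3, mar = c(0, 0, 0, 0))
  circos.clear()
  circos.par(start.degree = 90, clock.wise = FALSE)
  chordDiagram(circ_mat, annotationTrack = c("name", "grid"),
               order = c("chr Y", "chr X", "chr 1", "chr 2", "PA",
                         "N other", "N PA", "N LG2", "N LG1",
                         "N chr X", "N chr Y"))
  circos.clear()
  # Restore the previous graphics settings
  par(old_par)
}
```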

using duplicate factor to plot using ggplot2

I am trying to plot a ggplot_dumbbell with the following code:
library(ggplot2)
library(ggalt)
theme_set(theme_classic())
df_senPhi <- structure(list(phi = c(0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7,
0.8, 0.9, 0.9, 1), W = c(7833625.7334, 8291583.0188, 8762978.0131,
8169317.158, 8460793.8918, 8765222.8718, 8266025.5499, 8311199.2075,
8265304.816, 8289392.5799, 8273733.0523, 8284554.5615), Type = c("A, B, C",
"A, B, C", "A, B, C", "D, E", "D, E", "D, E", "F, G", "F, G",
"H, I", "H, I", "I, J", "I, J"), pChange = c(-0.0533144181552553,
0.00202924695507283, 0.0589968453118437, -0.0127464560859453,
0.0224782062508261, 0.0592681341679742, -0.00105934677399903,
0.00439984310620854, -0.00114644672167306, 0.00176453467558519,
-0.000127903066776307, 0.00117986514708678)), class = "data.frame", row.names = c(NA,
-12L), .Names = c("phi", "W", "Type", "pChange"))
df_senPhi$phi <- factor(df_senPhi$phi, levels=as.character(df_senPhi$phi)) # for right ordering of the dumbells
gg <- ggplot(df_senPhi, aes(x=0, xend=pChange, y=phi, color = Type)) +
geom_dumbbell(#colour="#a3c4dc",
size=0.75,
colour_xend="#0e668b") +
scale_x_continuous(label=scales::percent)
plot(gg)
If you run this code, you will get a warning saying "duplicate levels in factors are deprecated".
If you look closely at df_senPhi you can see 12 records. However, only 11 of them are plotted. The 10th and 11th records have the same phi value in the data frame, so they are mapped to the same level. That also causes the two phi bars to overlap in the plot (which is probably why I'm seeing only 11 dumbbells).
I want all 12 records to be plotted such that the second 0.9 phi's dumbbell appears just above the first just like they were two different values.
Is there a way to achieve this ?
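This uses a bit of dplyr, but it seems to give what you are looking for:
library(dplyr)
df_senPhi %>%
  # Give each record its own row number so duplicated phi values stay separate
  mutate(row = 1:n()) %>%
  ggplot(aes(0, row, color = Type)) +
  geom_dumbbell(aes(xend = pChange)) +
  # Label each row with its phi value
  scale_y_continuous(labels = as.character(df_senPhi$phi),
                     breaks = 1:12)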
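The warning in the question comes from factor() being given the value 0.9 twice as a level; levels have to be unique, and I believe recent R versions raise an error here rather than a warning. If you would rather keep a discrete axis, one option is to make the levels unique first. A base R sketch of that step only, using make.unique() (this is an alternative to the row-number approach, not part of it):

```r
phi <- c(0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.9, 1)

# Position of the first repeated value (0 would mean no duplicates)
anyDuplicated(as.character(phi))

# make.unique() appends a suffix to repeats, giving one id per record
phi_id <- make.unique(as.character(phi))

# A factor with one level per record, in the original row order
phi_f <- factor(phi_id, levels = phi_id)
nlevels(phi_f)
```

phi_f could then be mapped to y, with the original phi values supplied as axis labels so the suffix does not show up in the plot.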
