How to rearrange the rows of a matrix in R

I have the following data frame:
> agg_2
# A tibble: 3 × 3
bcs default_flag pred_default
<chr> <dbl> <dbl>
1 high-score 0.00907 0.0121
2 low-score 0.0345 0.0353
3 mid-score 0.0210 0.0204
I plot it as a bar plot using the following code:
barplot(t(as.matrix(agg_2[,-1])),
main = "Actual Default vs Predicted Default",
xlab = "Score Category",
ylab = "Default Rate",
names.arg = c("High Score", "Low Score", "Mid Score"),
col = gray.colors(2),
beside = TRUE)
legend("topleft",
c("Default", "Pred. Default"),
fill = gray.colors(2))
and it gives me this:
How can I rearrange the data frame/matrix so that the pairs of bars in the bar plot are as follows: Low Score then Mid Score then High Score?

Here is one potential solution:
agg_2 <- read.table(text = "bcs default_flag pred_default
high-score 0.00907 0.0121
low-score 0.0345 0.0353
mid-score 0.0210 0.0204", header = TRUE)
agg_2$bcs <- factor(agg_2$bcs, levels = c("low-score", "mid-score", "high-score"), ordered = TRUE)
agg_2 <- agg_2[order(agg_2$bcs),]
barplot(t(as.matrix(agg_2[,-1])),
main = "Actual Default vs Predicted Default",
xlab = "Score Category",
ylab = "Default Rate",
names.arg = agg_2$bcs,
col = gray.colors(2),
beside = TRUE)
legend("topright",
c("Default", "Pred. Default"),
fill = gray.colors(2))
Created on 2022-06-21 by the reprex package (v2.0.1)
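As an alternative to the factor-based ordering, here is a minimal sketch that reorders the rows directly with match(). The names wanted, agg_sorted and m are illustrative and do not come from the original post; the data are the three rows shown in the question.

```r
# Rebuild the example data from the question
agg_2 <- data.frame(
  bcs          = c("high-score", "low-score", "mid-score"),
  default_flag = c(0.00907, 0.0345, 0.0210),
  pred_default = c(0.0121, 0.0353, 0.0204)
)

# Desired left-to-right order of the bar pairs (illustrative name)
wanted <- c("low-score", "mid-score", "high-score")

# match() returns the row position of each wanted label, in the wanted order
agg_sorted <- agg_2[match(wanted, agg_2$bcs), ]

# Transpose so that each column of the matrix is one score category
m <- t(as.matrix(agg_sorted[, -1]))
colnames(m) <- agg_sorted$bcs
```

Passing m to barplot(m, beside = TRUE) should then draw the pairs in the order Low, Mid, High, with the column names used as bar labels.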

Related

Increasing font size of auto-generated R and p-value

This is a rather straightforward question: I'd like to increase the font size of the automatically generated R and p-value on my correlation plot made with ggscatter. I've tried using cex, but it doesn't seem to work. I would appreciate any help with this, thanks.
My plot
My script
cpsbs <- read_csv("cpsbs.csv")
View(cpsbs)
psbs600 <-ggscatter(cpsbs, x = "npq600", y = "rd",
add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "pearson",
xlab = "Max NPQ (600s)", ylab = "PsbS relative density")+ theme(text = element_text(size = 18))
My data frame
data.frame(cpsbs)
id gen line npq600 npq900 rd delcq
1 1 PsbS L1.1 3.053330 0.19666 1.2211420 4.862588
2 2 PsbS L1.2 3.133333 0.17000 1.5918041 5.470889
3 3 PsbS L1.3 2.756667 0.17000 2.1668718 4.773088
4 4 PsbS L1.4 3.160000 0.21000 2.6198157 3.809744
5 5 PsbS L1.5 3.306667 0.20700 1.5571007 4.169890
6 6 PsbS L1.6 0.480000 0.33000 0.0000000 0.000000
7 7 PsbS L1.7 2.960000 0.20000 1.0520551 4.485594
8 8 PsbS L1.8 2.946667 0.21000 0.4648043 3.900248
9 9 PsbS L1.9 2.986667 0.18000 1.9454836 3.782560
I think you're looking for the cor.coef.size argument:
ggscatter(cpsbs, x = "npq600", y = "rd",
add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "pearson",
cor.coef.size = 10,
xlab = "Max NPQ (600s)", ylab = "PsbS relative density")+
theme(text = element_text(size = 18))

R loop xts plots

I am stuck on what is probably a simple problem: Loop on xts objects.
I would like to make four different plots for the elements in the basket: basket <- cbind(AAPLG, GEG, SPYG, WMTG)
> head(basket)
new.close new.close.1 new.close.2 new.close.3
2000-01-04 1.0000000 1.0000000 1.0000000 1.0000000
2000-01-05 1.0146341 0.9982639 1.0017889 0.9766755
2000-01-06 0.9268293 1.0115972 0.9856887 0.9903592
2000-01-07 0.9707317 1.0507639 1.0429338 1.0651532
2000-01-10 0.9536585 1.0503472 1.0465116 1.0457161
2000-01-11 0.9048780 1.0520833 1.0339893 1.0301664
This is my idea so far, as I cannot simply put in i as column name:
tickers <- c("AAPLG", "GEG", "SPYG", "WMTG")
par(mfrow=c(2,2))
for (i in 1:4){
print(plot(x = basket[, [i]], xlab = "Time", ylab = "Cumulative Return",
main = "Cumulative Returns", ylim = c(0.0, 3.5), major.ticks= "years",
minor.ticks = FALSE, col = "red"))
}
This is the error I get when running the script:
Error: unexpected ',' in " main = "Cumulative Returns","
> minor.ticks = FALSE, col = "red"))
Error: unexpected ',' in " minor.ticks = FALSE,"
> }
Error: unexpected '}' in "}"
Any help is very much appreciated.
As mentioned, remove the square brackets around i:
par(mfrow=c(2,2))
for (i in 1:4){
print(plot(x = basket[, i], xlab = "Time", ylab = "Cumulative Return",
main = "Cumulative Returns", ylim = c(0.0, 3.5), major.ticks= "years",
minor.ticks = FALSE, col = "red"))
}
Even better, assign column names when building the xts object with cbind, or rename the columns afterwards as you would for a data frame. You can then iterate over the column names and use them both to select columns and to build plot titles:
# PASS NAMES WITH cbind
basket <- cbind(AAPLG=AAPLG, GEG=GEG, SPYG=SPYG, WMTG=WMTG)
# RENAME AFTER cbind
# basket <- cbind(AAPLG, GEG, SPYG, WMTG)
# colnames(basket) <- c("AAPLG", "GEG", "SPYG", "WMTG")
par(mfrow=c(2,2))
# ITERATE OVER COLUMN NAMES; EACH NAME SELECTS A COLUMN AND LABELS ITS PLOT
sapply(colnames(basket), function(col)
print(plot(x = basket[, col], xlab = "Time", ylab = "Cumulative Return",
main = paste(col, "Cumulative Returns"), ylim = c(0.0, 3.5),
major.ticks= "years", minor.ticks = FALSE, col = "red"))
)
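The naming step can be checked without any plotting. The sketch below is only an illustration: it uses plain numeric vectors (the first three rows of head(basket) from the question) in place of the real xts series, so it shows the cbind naming and the per-column titles, not the xts plotting itself.

```r
# Stand-ins for the xts series: first three values of each column
# from head(basket) in the question (plain vectors, an assumption)
AAPLG <- c(1.0000000, 1.0146341, 0.9268293)
GEG   <- c(1.0000000, 0.9982639, 1.0115972)
SPYG  <- c(1.0000000, 1.0017889, 0.9856887)
WMTG  <- c(1.0000000, 0.9766755, 0.9903592)

# Naming the arguments of cbind() sets the column names directly
basket <- cbind(AAPLG = AAPLG, GEG = GEG, SPYG = SPYG, WMTG = WMTG)

# One title per column, built from the column name
titles <- vapply(colnames(basket),
                 function(col) paste(col, "Cumulative Returns"),
                 character(1))
```

Each element of titles is named after its column, so titles[["GEG"]] gives the string that would be passed as main for the GEG panel.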

Obtain Specific Column From Correlation Heatmap

I have a dataset called allDataNoNAs which has 19 columns for different variables.
First, using the packages:
library(corrplot)
library(corrgram)
library(GGally)
From dput(cor(allDataNoNAs)), my sample correlation matrix:
structure(c(1, 0.116349634765185, 0.547691763989625, 0.291991636906379,
0.52347996305183, 0.497643100595069, 0.0129815335193983, 0.418358158731718,
0.471373794854162, 0.505419557447448, 0.276128001065287, 0.114921357444725,
0.483335903285957, 0.0322484793148408, 0.360658177617753, 0.163989166178892,
0.145358618474009, 0.549222657694447, 0.0283182668409127, 0.116349634765185,
1, 0.542678597132992, 0.228195095236888, 0.341733815370385, 0.449234592784623,
0.040928188236085, 0.306532564182676, 0.246214540314882, 0.368735099181333,
0.0974107116463065, 0.118633970020044, 0.0663374870504325, 0.00324065971750887,
0.429993810524071, 0.0660128392326907, -0.208834964557656, 0.517351517191311,
0.00340750071414792, 0.547691763989625, 0.542678597132992, 1,
0.503509567685111, 0.834074832294578, 0.87458120333133, 0.11646402536793,
0.709723789822138, 0.545685105436571, 0.691116703644981, 0.251055925294139,
0.137145560677364, 0.677547477041307, 0.0138408591129587, 0.574449939471671,
0.289088705565296, -0.0151310469001056, 0.995636799856898, 0.00806307965229721,
0.291991636906379, 0.228195095236888, 0.503509567685111, 1, 0.5928306942291,
0.419860437848609, 0.202947501799892, 0.600369342626932, 0.3036531414462,
0.31218278418869, 0.0665676462597262, 0.0706549436236251, 0.463190217918095,
0.017439704947323, 0.20361820902537, 0.563054610829996, 0.367022482937022,
0.539278002253207, 0.0146950545295136, 0.52347996305183, 0.341733815370385,
0.834074832294578, 0.5928306942291, 1, 0.877884027429435, 0.249913906532112,
0.770346073267575, 0.581478562237408, 0.62684315599784, 0.158950811299692,
0.0709795609883571, 0.707727230043996, 0.0374999988906861, 0.36979003972634,
0.532230871495189, 0.237891979696682, 0.868052149324532, 0.0301272383779361,
0.497643100595069, 0.449234592784623, 0.87458120333133, 0.419860437848609,
0.877884027429435, 1, 0.0578337272432955, 0.625271696806798,
0.642882384190134, 0.742158234646655, 0.18412573265697, 0.0846354163480033,
0.636899685921357, 0.00136017420567482, 0.442530075276962, 0.166101818463978,
-0.122330359121607, 0.870582759035652, -0.00536057317986459,
0.0129815335193983, 0.040928188236085, 0.11646402536793, 0.202947501799892,
0.249913906532112, 0.0578337272432955, 1, 0.168170227241747,
0.0103942343836554, 0.0146416101891029, 0.0274638568337838, 0.0232209281980358,
0.438976017479895, 0.00664290788845518, 0.0558346558356874, 0.576321333713829,
0.205483416691572, 0.160939456560856, 0.00633413505889225, 0.418358158731718,
0.306532564182676, 0.709723789822138, 0.600369342626932, 0.770346073267575,
0.625271696806798, 0.168170227241747, 1, 0.421695218774506, 0.481156860252289,
0.109952341757847, 0.0400601095104961, 0.560225169205313, 0.0470119529030615,
0.311744196849895, 0.445382213345548, 0.237447342653341, 0.743416109744227,
0.0437634515476897, 0.471373794854162, 0.246214540314882, 0.545685105436571,
0.3036531414462, 0.581478562237408, 0.642882384190134, 0.0103942343836554,
0.421695218774506, 1, 0.809375500184827, 0.201944501698817, 0.098871956246993,
0.46496436444905, -0.00410066612855966, 0.34093890132072, 0.0955588133868073,
-0.0561387410393148, 0.542950578488189, -0.00611403179202383,
0.505419557447448, 0.368735099181333, 0.691116703644981, 0.31218278418869,
0.62684315599784, 0.742158234646655, 0.0146416101891029, 0.481156860252289,
0.809375500184827, 1, 0.166272569833104, 0.0642480288154233,
0.493094322495752, -0.0143825404077684, 0.420509020130084, 0.0763222806834054,
-0.137267266981321, 0.675599964220607, -0.0155210421858565, 0.276128001065287,
0.0974107116463065, 0.251055925294139, 0.0665676462597262, 0.158950811299692,
0.18412573265697, 0.0274638568337838, 0.109952341757847, 0.201944501698817,
0.166272569833104, 1, 0.803405447808051, 0.209386276142885, 0.019611871344881,
0.698294870666248, 0.024793538949468, 0.00921044459805193, 0.243573446480239,
0.0182042685108301, 0.114921357444725, 0.118633970020044, 0.137145560677364,
0.0706549436236251, 0.0709795609883571, 0.0846354163480033, 0.0232209281980358,
0.0400601095104961, 0.098871956246993, 0.0642480288154233, 0.803405447808051,
1, 0.0518698024423593, 0.0195654257050434, 0.534756730460756,
0.00851489725348713, -0.00157091125920201, 0.131294046914676,
0.0196406046872536, 0.483335903285957, 0.0663374870504325, 0.677547477041307,
0.463190217918095, 0.707727230043996, 0.636899685921357, 0.438976017479895,
0.560225169205313, 0.46496436444905, 0.493094322495752, 0.209386276142885,
0.0518698024423593, 1, 0.00595760440442105, 0.332127234258051,
0.402991372365854, 0.130619402830307, 0.702714128886842, 0.000759081836999778,
0.0322484793148408, 0.00324065971750887, 0.0138408591129587,
0.017439704947323, 0.0374999988906861, 0.00136017420567482, 0.00664290788845518,
0.0470119529030615, -0.00410066612855966, -0.0143825404077684,
0.019611871344881, 0.0195654257050434, 0.00595760440442105, 1,
0.0240839070381978, 0.0543455541899934, 0.121224926189405, 0.0181415673103803,
0.999560527964641, 0.360658177617753, 0.429993810524071, 0.574449939471671,
0.20361820902537, 0.36979003972634, 0.442530075276962, 0.0558346558356874,
0.311744196849895, 0.34093890132072, 0.420509020130084, 0.698294870666248,
0.534756730460756, 0.332127234258051, 0.0240839070381978, 1,
0.101917219961389, -0.0673808764564209, 0.55786516587572, 0.0226512629105265,
0.163989166178892, 0.0660128392326907, 0.289088705565296, 0.563054610829996,
0.532230871495189, 0.166101818463978, 0.576321333713829, 0.445382213345548,
0.0955588133868073, 0.0763222806834054, 0.024793538949468, 0.00851489725348713,
0.402991372365854, 0.0543455541899934, 0.101917219961389, 1,
0.562085375561417, 0.360237027957389, 0.0519977244267395, 0.145358618474009,
-0.208834964557656, -0.0151310469001056, 0.367022482937022, 0.237891979696682,
-0.122330359121607, 0.205483416691572, 0.237447342653341, -0.0561387410393148,
-0.137267266981321, 0.00921044459805193, -0.00157091125920201,
0.130619402830307, 0.121224926189405, -0.0673808764564209, 0.562085375561417,
1, 0.041068964081757, 0.119487910165712, 0.549222657694447, 0.517351517191311,
0.995636799856898, 0.539278002253207, 0.868052149324532, 0.870582759035652,
0.160939456560856, 0.743416109744227, 0.542950578488189, 0.675599964220607,
0.243573446480239, 0.131294046914676, 0.702714128886842, 0.0181415673103803,
0.55786516587572, 0.360237027957389, 0.041068964081757, 1, 0.0121897372730556,
0.0283182668409127, 0.00340750071414792, 0.00806307965229721,
0.0146950545295136, 0.0301272383779361, -0.00536057317986459,
0.00633413505889225, 0.0437634515476897, -0.00611403179202383,
-0.0155210421858565, 0.0182042685108301, 0.0196406046872536,
0.000759081836999778, 0.999560527964641, 0.0226512629105265,
0.0519977244267395, 0.119487910165712, 0.0121897372730556, 1), .Dim = c(19L,
19L), .Dimnames = list(c("RPE", "Duration", "Distance", "Max Speed",
"HML Distance", "HML Efforts", "Sprint Distance", "Sprints",
"Accelerations", "Decelerations", "Average Heart Rate", "Max Heart Rate",
"Average Metabolic Power", "Dynamic Stress Load", "Heart Rate Exertion",
"High Speed Running (Relative)", "HML Density", "Speed Intensity",
"Impacts"), c("RPE", "Duration", "Distance", "Max Speed", "HML Distance",
"HML Efforts", "Sprint Distance", "Sprints", "Accelerations",
"Decelerations", "Average Heart Rate", "Max Heart Rate", "Average Metabolic Power",
"Dynamic Stress Load", "Heart Rate Exertion", "High Speed Running (Relative)",
"HML Density", "Speed Intensity", "Impacts")))
Using the correlation data above, I am trying to obtain just the first column, which holds the correlation between RPE and each of the other 18 variables. I can get the values with cor(allDataNoNAs)[,1], but when I try to plot them as a correlogram using corrplot(corrgram(allDataNoNAs))[,1], it plots all 19x19 correlations and is a mess, when I only need the RPE correlation column.
Using ggcorr() as such:
ggcorr(allDataNoNAs, method = c("everything"), label = TRUE,label_size = 2, label_round = 4)
I obtain the cleaner looking heatmap that I want. But, switching the data parameter to allDataNoNAs[,1] or cor(allDataNoNAs)[,1] does not do the trick to only obtain that one RPE correlation column.
Is it possible to only return one column of a correlation heatmap?
I was able to figure out an answer to my own question. It is not exactly what I wanted (I wanted it from ggcorr()), but this version suffices:
With my same variable names as before
#x is the variable you want to be comparing the y variables with
myCorDF <- cor(x = allDataNoNAs$RPE, y = allDataNoNAs[2:19], use = "everything")
#just changing it to colors that seem better to me
col2 <- colorRampPalette(c("#00007F", "blue", "#007FFF", "cyan", "white",
"yellow", "#FF7F00", "red", "#7F0000"))
#this is how I obtain the one column for RPE correlation against other all variables
corrplot(myCorDF, tl.srt = 45, method = "color", addCoef.col = "black",
cl.cex = 0.56, col = col2(50))
A generic version without my custom colors would look like this:
corDF <- cor(x = DF$x, y = DF[2:5], use = "everything")
corrplot(corDF, tl.srt = 45, method = "color", addCoef.col = "black",
cl.cex = 0.56)
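To see what cor(x = ..., y = ...) returns before handing it to corrplot, here is a small sketch on the built-in mtcars data. Using mpg in place of RPE is an assumption made purely for illustration, as are the names corRow and fullCor.

```r
# mtcars ships with R; mpg stands in for RPE (illustrative assumption)
corRow <- cor(x = mtcars$mpg, y = mtcars[2:5])

# The result is a one-row matrix; label the row so a plot can show it
rownames(corRow) <- "mpg"

# The same numbers are the corresponding slice of the full correlation matrix
fullCor <- cor(mtcars)
sameValues <- isTRUE(all.equal(as.vector(corRow),
                               unname(fullCor["mpg", 2:5])))
```

Because corRow is a matrix with a single row rather than a plain vector, it keeps the two-dimensional shape that a heatmap function expects.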

What is wrong with my custom colour palette in this plot?

Using ggsurvplot to draw some Kaplan-Meier curves.
5 curves should be plotted and I want control over their colours.
Here is the output of the survfit being plotted:
> elective_30Decadesurv
Call: survfit(formula = elective30Surv ~ electives$Decade)
n events median 0.95LCL 0.95UCL
electives$Decade=50 14 0 NA NA NA
electives$Decade=60 173 2 NA NA NA
electives$Decade=70 442 5 NA NA NA
electives$Decade=80 168 4 NA NA NA
electives$Decade=90 2 0 NA NA NA
Here is a working plot using the default colour palette, "hue":
> ggsurvplot(elective_30Decadesurv,
data = electives,
palette = "hue",
title = "30 day survival after elective EVAR",
legend = "none",
legend.title = "Decade",
legend.labs = c("5th",
"6th",
"7th",
"8th",
"9th"
),
censor.shape = 124,
ggtheme = survPlotTheme,
risk.table = "nrisk_cumevents",
risk.table.y.text.col = TRUE,
risk.table.fontsize = 3,
risk.table.height = 0.3,
break.time.by = 5,
ylim = c(0.95,
1
),
pval = TRUE,
pval.size = 3,
pval.coord = c(1,
0.96
)
)
See plot in section 3.1.4 of this webpage for the output of the above
The Decade group has 5 entries, so I'm trying to provide five colours to palette.
However, both:
> ggsurvplot(elective_30Decadesurv,
data = electives,
palette = c("#440154",
"#3B528B",
"#21908C",
"#5DC863",
"#5DC863"
),
title = "30 day survival after elective EVAR",
legend = "none",
legend.title = "Decade",
legend.labs = c("5th",
"6th",
"7th",
"8th",
"9th"
),
censor.shape = 124,
ggtheme = survPlotTheme,
risk.table = "nrisk_cumevents",
risk.table.y.text.col = TRUE,
risk.table.fontsize = 3,
risk.table.height = 0.3,
break.time.by = 5,
ylim = c(0.95,
1
),
pval = TRUE,
pval.size = 3,
pval.coord = c(1,
0.96
)
)
And:
> fiveColours <- c("#440154",
"#3B528B",
"#21908C",
"#5DC863",
"#5DC863"
)
> ggsurvplot(elective_30Decadesurv,
data = electives,
palette = fiveColours,
title = "30 day survival after elective EVAR",
legend = "none",
legend.title = "Decade",
legend.labs = c("5th",
"6th",
"7th",
"8th",
"9th"
),
censor.shape = 124,
ggtheme = survPlotTheme,
risk.table = "nrisk_cumevents",
risk.table.y.text.col = TRUE,
risk.table.fontsize = 3,
risk.table.height = 0.3,
break.time.by = 5,
ylim = c(0.95,
1
),
pval = TRUE,
pval.size = 3,
pval.coord = c(1,
0.96
)
)
Give the same error:
Error in names(.cols) <- grp.levels :
'names' attribute [5] must be the same length as the vector [4]
What vector is length [4]?
Is 'names' attribute my colour vector?
If I take one of the colours out of the custom palette, eg fiveColours <- c("#440154","#3B528B","#21908C","#5DC863") I get this error:
Error: Insufficient values in manual scale. 5 needed but only 4 provided.
Which implies the number of colours provided is correct but something else is causing the issue.
I've troubleshot to the limits of my own ability. Help please!
FYI:
> electives %>% select(Decade) %>% group_by(Decade) %>% summarise(n())
# A tibble: 5 x 2
Decade `n()`
<fct> <int>
1 50 14
2 60 173
3 70 442
4 80 168
5 90 2
That should confirm that the Decade variable has five levels. Here is how the survival object and survfit were generated:
> elective5Surv <- Surv(electives$surv5Y, electives$dead5Y)
> elective_5Decadesurv <- survfit(elective5Surv ~ electives$Decade)
Ok, I have sorted my own mistake by proof-reading!
Of the five hex colours I’d provided, two were identical (not on purpose.)
I changed the fifth colour to a different hex value (what it was meant to be in the first place) and it works now.
Thanks, Rui, for your response earlier, it helped me down the path!
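A quick way to catch this kind of slip is to check the palette for duplicates before plotting. This is a minimal base R sketch; the replacement colour "#FDE725" is only an illustrative assumption, since the post does not say which value was finally used.

```r
# The palette from the question, with the accidental repeat
fiveColours <- c("#440154", "#3B528B", "#21908C", "#5DC863", "#5DC863")

# anyDuplicated() gives the index of the first repeated entry, or 0 if none
dupIndex <- anyDuplicated(fiveColours)

# Number of distinct colours actually supplied
nUnique <- length(unique(fiveColours))

# Replace the repeated entry with a distinct colour (hypothetical value)
fixedColours <- fiveColours
fixedColours[dupIndex] <- "#FDE725"
```

Here nUnique is 4, which matches the length-4 vector named in the error message. That suggests, although the post does not confirm it, that the duplicated colours were collapsed somewhere inside the plotting code.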

barplot column for <NA>

I would like to have a column in my barplot for missing data.
adult <- read.csv(
"http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
header = FALSE,
na.strings = "?",
strip.white = TRUE
)
colnames(adult) <- c("age", "workClass", "fnlwgt", "education", "educationNum", "maritalStatus", "occupation", "relationship", "race", "sex", "capitalGain", "capitalLoss", "hoursPerWeek", "nativeCountry", "prediction")
barplot(table(adult$workClass), main="Job Distribution", xlab="Job", ylab="Count",las=2)
I know that in this dataset, there are 1836 missing values for workClass, from
length(which(is.na(adult$workClass)))
You can use the argument useNA = "ifany" in table.
tab <- table(adult$workClass, useNA = "ifany")
# Federal-gov Local-gov Never-worked Private
# 960 2093 7 22696
# Self-emp-inc Self-emp-not-inc State-gov Without-pay
# 1116 2541 1298 14
# <NA>
# 1836
By default, the name of the NA count is NA itself. You can change the name to the character string "NA" with the following command.
names(tab)[is.na(names(tab))] <- "NA"
Now, the plot displays the name "NA" on the x axis too.
barplot(tab, main = "Job Distribution", xlab = "Job", ylab = "Count", las = 2)
You can also combine useNA = "ifany" in table() with names.arg in barplot():
barplot(table(adult$workClass, useNA = "ifany"),
names.arg = c(levels(factor(adult$workClass)), "NA's"))
c(levels(factor(adult$workClass)), "NA's") creates a vector containing the names of all levels (categories) of the variable plus the custom label "NA's" for the missing values. Wrapping the column in factor() keeps this working when read.csv() returns character columns, which is the default since R 4.0.0.
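The useNA = "ifany" approach can be tried on a small vector without downloading the dataset. The values below are made up for illustration and are not taken from the adult data.

```r
# Toy stand-in for adult$workClass (made-up values, two of them missing)
workClass <- c("Private", "State-gov", NA, "Private", NA, "Local-gov")

# Count every category and keep a cell for the missing values
tab <- table(workClass, useNA = "ifany")

# The missing-value cell is named NA; turn that into the string "NA"
names(tab)[is.na(names(tab))] <- "NA"
```

After the renaming step, barplot(tab) labels the last bar "NA" like any other category.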
