Related
Is it possible to choose your own color for a cell when creating a table using kable.
In the folowing code from the kable documentation, it shows how you can choose a spectrum (from a couple of options) of colors for a row or column, but I want 1 color for 1 cell. That I can export to a PDF, using knitr.
vs_dt <- iris[1:10, ]
vs_dt[5] <- cell_spec(vs_dt[[5]], color = "white", bold = T,
background = spec_color(1:10, end = 0.9, option = "D", direction = -1))
15
kbl(vs_dt, booktabs = T, escape = F, align = "c")
Thank you
If you would only need 1 color for 1 cell, you can specify the row and column of that cell to be modified.
vs_dt <- iris[1:10, ]
#needed as they are factors
vs_dt$Species <- as.character(vs_dt$Species)
#only the first entry of the Species column
vs_dt[1,5] <- cell_spec(vs_dt[1,5], color = "white", bold = T,
background = "red")
# you can also change some of them excluding the cell.
vs_dt[2:10,5] <- cell_spec(vs_dt[2:10,5], bold = T)
kbl(vs_dt, booktabs = T, escape = F, align = "c")
I have two data.frames (each originally a dimension from a larger parent 3-d array). One holds numeric values. The other has T/F values indicating whether the confidence interval for each value in the first array overlaps a reference confidence interval. The confidence intervals are different for every value in the array, so in formatting the table I can’t refer to constants, only to the array of T/F values.
I want to show a table of the first array, with background color of each cell based on the second array. So that formattable can see the columns with T/F values, I created a single data frame that binds the columns from both 3rd dimensions. In the real data there are ~20 columns of numeric values. Here is a simplified example:
orig.data <- array(dim = c(3, 4, 2))
dimnames(orig.data) <- list (c("site1", "site2", "site3"), c("model1", "model2", "model3",
"model4"), c("mean.val", "is.in.CI"))
orig.data[,,1] <- round(runif(12, 2, 10), 2)
orig.data[,,2] <- as.logical(round(runif(12, 0, 1)))
ft.data <- data.frame(orig.data[,,2], stringsAsFactors = F)
colnames(ft.data) <- paste0("match.", colnames(ft.data))
ft.data <- cbind(data.frame(orig.data[,,1], stringsAsFactors = F),
ft.data)
I can create the table formatting I want by calling each column by name. There are two special considerations. First, for the first four columns, the choice of background color is conditional on a second column. Second, the last four columns I would like to hide. Is there a way to do this with apply or some similar succinct dynamic syntax?
Here is the long version that I’d like to consolidate.
yes.color <- "lightgreen"
no.color <- "pink"
formattable::formattable(ft.data, list(
`model1` = formatter("span", style = ~ style(display = "block",
"border-radius" = "4px", "padding-right" = "4px",
"background-color" = ifelse(`match.model1`, yes.color, no.color))),
`model2` = formatter("span", style = ~ style(display = "block",
"border-radius" = "4px", "padding-right" = "4px",
"background-color" = ifelse(`match.model2`, yes.color, no.color))),
`model3` = formatter("span", style = ~ style(display = "block",
"border-radius" = "4px", "padding-right" = "4px",
"background-color" = ifelse(`match.model3`, yes.color, no.color))),
`model4` = formatter("span", style = ~ style(display = "block",
"border-radius" = "4px", "padding-right" = "4px",
"background-color" = ifelse(`match.model4`, yes.color, no.color))),
match.model1 = F,
match.model2 = F,
match.model3 = F,
match.model4 = F))
This question is similar to the second of my questions and is unanswered: Loop, hide columns and r formattable
Failed attempts to automate hiding the T/F columns follow. I don’t have any ideas for automating the 2-column references.
formattable(ft.data[, 1:4])
If I omit the columns with the T/F designations, the formatter doesn’t know they exist.
area(col = 5:8) = F)) # no effect
Outside the formattable command, create a string:
formatter.string <- paste( unlist(paste0("match.", c(“model1”, “model2”, “model3”, “model4”), " = #F,\n\t")), collapse='')
then within the list for formattable, add
eval(parse(formatter.string)))) # no effect OR
lapply(5:8, function(m.col){m.col = F}) # also no effect
This is the best I could come up with using some eval / parse magic:
format <- sapply(names(ft.data)[1:4],function(x)
{
eval( #evaluate the following expression
parse(text= #parse the following string to an expression
sub("_SUB_", #find "_SUB_"
paste0("`match.",x,"`"), #replace with name of column
"formatter(\"span\", style = ~ style(display = \"block\", #in the string containing the formatter call
\"border-radius\" = \"4px\", \"padding-right\" = \"4px\",
\"background-color\" = ifelse(_SUB_, yes.color, no.color)))")))
},simplify=F,USE.NAMES = T)
#hiding part. Same concept as above
hide <- sapply(names(ft.data[5:8]), function(x) eval(parse(text=sub("_SUB_",x,"_SUB_ = F"))),
simplify=F,USE.NAMES=T)
formattable::formattable(ft.data,c(format,hide))
I am wondering if it is possible to extract out the main hex colors from image files containing team sports logos. I have the following vector of logos:
dput(team.logos[1:5))
c("https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa/500/399.png",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa/500/2066.png",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa/500/42.png",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa/500/311.png",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa/500/160.png")
Using the following website (https://html-color-codes.info/colors-from-image/) - I am able to see that the hex color values in the first image (UAlbany) are #FEBE10 for yellow, and #3F1E6B for the purple, as well as white.
My question is - is there any way to scrape these hex values for each image in my vector in R (so I don't have to manually load each image and click to find each hex value).
Thanks!
Another option using the imager package...
require('imager')
require('data.table')
team.logos <- c("https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa/500/399.png",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa/500/2066.png",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa/500/42.png",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa/500/311.png",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa/500/160.png")
#this function takes an image in imager's cimg format and
#returns the hex colour codes for any colours covering more than
#a threshold proportion of pixels (default is set to 0.05)
getHexPrimaries <- function(img, pcnt.threshold = 0.05){
#convert cimg to workable format
channel.labels <- c('R','G','B','A')[1:dim(img)[4]]
img <- as.data.table(as.data.frame(img))
img[,channel := factor(cc ,labels=channel.labels)]
img <- dcast(img, x+y ~ channel, value.var = "value")
#sort by unique rgb combinations and identify the primary colours
colours.sorted <- img[, .N, by=list(R,G,B)][order(-N)]
colours.sorted[ , primary := N/sum(N) > pcnt.threshold]
#convert to hex
hex.primaries <-
apply(colours.sorted[primary==TRUE], 1, function(row){
hex <- rgb(row[1], row[2], row[3], maxColorValue=1)
hex
})
hex.primaries
}
hex.list <- lapply(team.logos, function(logo.url) {
download.file(logo.url,'temp.png', mode = 'wb')
img <- load.image('temp.png')
getHexPrimaries(img)
})
Give this a try. The png library allows one to load a RGB file and then it is a matter of converting the three channels into the Hex codes.
I confirmed the codes are correct for the first image, good luck with the rest.
logos<-c("https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa/500/399.png",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa/500/2066.png",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa/500/42.png",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa/500/311.png",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa/500/160.png")
plot(NA, xlim = c(0, 2), ylim = c(0, 5), type = "n", xaxt = "n", yaxt = "n", xlab = "", ylab = "")
library(png)
for (filen in seq_along(logos)) {
#download and read file
#this will overwrite the file each time,
#create a list if you would like to save the files for the future.
download.file(logos[filen], "file1.png")
image1<-readPNG("file1.png")
#plot if desired
#plot(NA, xlim = c(0, 2), ylim = c(0, 5), type = "n", xaxt = "n", yaxt = "n", xlab = "", ylab = "")
rasterImage(image1, 0, filen-1, 1, filen)
#convert the rgb channels to Hex
outR<-as.hexmode(as.integer(image1[,,1]*255))
outG<-as.hexmode(as.integer(image1[,,2]*255))
outB<-as.hexmode(as.integer(image1[,,3]*255))
#paste into to hex value
hex<-paste0(outR, outG, outB)
#remove the white and black
hex<-hex[hex != "ffffff" & hex != "000000"]
#print top 5 colors
print(head(sort(table(hex), decreasing = TRUE)))
}
Here is the sample output, the hex color with the number of pixels with that color.
print(head(sort(table(hex), decreasing = TRUE)))
#hex
#c3c4c6 00275d 00265c c2c3c5 001e57 00255c
#67929 39781 838 744 649 633
Given a data frame containing mixed variables (i.e. both categorical and continuous) like,
digits = 0:9
# set seed for reproducibility
set.seed(17)
# function to create random string
createRandString <- function(n = 5000) {
a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
paste0(a, sprintf("%04d", sample(9999, n, TRUE)), sample(LETTERS, n, TRUE))
}
df <- data.frame(ID=c(1:10), name=sample(letters[1:10]),
studLoc=sample(createRandString(10)),
finalmark=sample(c(0:100),10),
subj1mark=sample(c(0:100),10),subj2mark=sample(c(0:100),10)
)
I perform unsupervised feature selection using the package FactoMineR
df.princomp <- FactoMineR::FAMD(df, graph = FALSE)
The variable df.princomp is a list.
Thereafter, to visualize the principal components I use
fviz_screeplot() and fviz_contrib() like,
#library(factoextra)
factoextra::fviz_screeplot(df.princomp, addlabels = TRUE,
barfill = "gray", barcolor = "black",
ylim = c(0, 50), xlab = "Principal Component",
ylab = "Percentage of explained variance",
main = "Principal Component (PC) for mixed variables")
factoextra::fviz_contrib(df.princomp, choice = "var",
axes = 1, top = 10, sort.val = c("desc"))
which gives the following Fig1
and Fig2
Explanation of Fig1: The Fig1 is a scree plot. A Scree Plot is a simple line segment plot that shows the fraction of total variance in the data as explained or represented by each Principal Component (PC). So we can see the first three PCs collectively are responsible for 43.8% of total variance. The question now naturally arises, "What are these variables?". This I have shown in Fig2.
Explanation of Fig2: This figure visualizes the contribution of rows/columns from the results of Principal Component Analysis (PCA). From here I can see the variables, name, studLoc and finalMark are the most important variables that can be used for further analysis.
Further Analysis- where I'm stuck at: To derive the contribution of the aforementioned variables name, studLoc, finalMark. I use the principal component variable df.princomp (see above) like df.princomp$quanti.var$contrib[,4]and df.princomp$quali.var$contrib[,2:3].
I've to manually specify the column indices [,2:3] and [,4].
What I want: I want to know how to do dynamic column index assignment, such that I do not have to manually code the column index [,2:3] in the list df.princomp?
I've already looked at the following similar questions 1, 2, 3 and 4 but cannot find my solution? Any help or suggestions to solve this problem will be helpful.
Not sure if my interpretation of your question is correct, apologies if not. From what I gather you are using PCA as an initial tool to show you what variables are the most important in explaining the dataset. You then want to go back to your original data, select these variables quickly without manual coding each time, and use them for some other analysis.
If this is correct then I have saved the data from the contribution plot, filtered out the variables that have the greatest contribution, and used that result to create a new data frame with these variables alone.
digits = 0:9
# set seed for reproducibility
set.seed(17)
# function to create random string
createRandString <- function(n = 5000) {
a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
paste0(a, sprintf("%04d", sample(9999, n, TRUE)), sample(LETTERS, n, TRUE))
}
df <- data.frame(ID=c(1:10), name=sample(letters[1:10]),
studLoc=sample(createRandString(10)),
finalmark=sample(c(0:100),10),
subj1mark=sample(c(0:100),10),subj2mark=sample(c(0:100),10)
)
df.princomp <- FactoMineR::FAMD(df, graph = FALSE)
factoextra::fviz_screeplot(df.princomp, addlabels = TRUE,
barfill = "gray", barcolor = "black",
ylim = c(0, 50), xlab = "Principal Component",
ylab = "Percentage of explained variance",
main = "Principal Component (PC) for mixed variables")
#find the top contributing variables to the overall variation in the dataset
#here I am choosing the top 10 variables (although we only have 6 in our df).
#note you can specify which axes you want to look at with axes=, you can even do axes=c(1,2)
f<-factoextra::fviz_contrib(df.princomp, choice = "var",
axes = c(1), top = 10, sort.val = c("desc"))
#save data from contribution plot
dat<-f$data
#filter out ID's that are higher than, say, 20
r<-rownames(dat[dat$contrib>20,])
#extract these from your original data frame into a new data frame for further analysis
new<-df[r]
new
#finalmark name studLoc
#1 53 b POTYQ0002N
#2 73 i LWMTW1195I
#3 95 d VTUGO1685F
#4 39 f YCGGS5755N
#5 97 c GOSWE3283C
#6 58 g APBQD6181U
#7 67 a VUJOG1460V
#8 64 h YXOGP1897F
#9 15 j NFUOB6042V
#10 81 e QYTHG0783G
Based on your comment, where you said you wanted to 'Find variables with value greater than 5 in Dim.1 AND Dim.2 and save these variables to a new data frame', I would do this:
#top contributors to both Dim 1 and 2
f<-factoextra::fviz_contrib(df.princomp, choice = "var",
axes = c(1,2), top = 10, sort.val = c("desc"))
#save data from contribution plot
dat<-f$data
#filter out ID's that are higher than 5
r<-rownames(dat[dat$contrib>5,])
#extract these from your original data frame into a new data frame for further analysis
new<-df[r]
new
(This keeps all the original variables in our new data frame since they all contributed more than 5% to the total variance)
There are a lot of ways to extract contributions of individual variables to PCs. For numeric input, one can run a PCA with prcomp and look at $rotation (I spoke to soon and forgot you've got factors here so prcomp won't work directly). Since you are using factoextra::fviz_contrib, it makes sense to check how that function extracts this information under the hood. Key factoextra::fviz_contrib and read the function:
> factoextra::fviz_contrib
function (X, choice = c("row", "col", "var", "ind", "quanti.var",
"quali.var", "group", "partial.axes"), axes = 1, fill = "steelblue",
color = "steelblue", sort.val = c("desc", "asc", "none"),
top = Inf, xtickslab.rt = 45, ggtheme = theme_minimal(),
...)
{
sort.val <- match.arg(sort.val)
choice = match.arg(choice)
title <- .build_title(choice[1], "Contribution", axes)
dd <- facto_summarize(X, element = choice, result = "contrib",
axes = axes)
contrib <- dd$contrib
names(contrib) <- rownames(dd)
theo_contrib <- 100/length(contrib)
if (length(axes) > 1) {
eig <- get_eigenvalue(X)[axes, 1]
theo_contrib <- sum(theo_contrib * eig)/sum(eig)
}
df <- data.frame(name = factor(names(contrib), levels = names(contrib)),
contrib = contrib)
if (choice == "quanti.var") {
df$Groups <- .get_quanti_var_groups(X)
if (missing(fill))
fill <- "Groups"
if (missing(color))
color <- "Groups"
}
p <- ggpubr::ggbarplot(df, x = "name", y = "contrib", fill = fill,
color = color, sort.val = sort.val, top = top, main = title,
xlab = FALSE, ylab = "Contributions (%)", xtickslab.rt = xtickslab.rt,
ggtheme = ggtheme, sort.by.groups = FALSE, ...) + geom_hline(yintercept = theo_contrib,
linetype = 2, color = "red")
p
}
<environment: namespace:factoextra>
So it's really just calling facto_summarize from the same package. By analogy you can do the same thing, simply call:
> dd <- factoextra::facto_summarize(df.princomp, element = "var", result = "contrib", axes = 1)
> dd
name contrib
ID ID 0.9924561
finalmark finalmark 21.4149175
subj1mark subj1mark 7.1874438
subj2mark subj2mark 16.6831560
name name 26.8610132
studLoc studLoc 26.8610132
And that's the table corresponding to your figure 2. For PC2 use axes = 2 and so on.
Regarding "how to programmatically determine the column indices of the PCs", I'm not 100% sure I understand what you want, but if you just want to say for column "finalmark", grab its contribution to PC3 you can do the following:
library(tidyverse)
# make a tidy table of all column names in the original df with their contributions to all PCs
contribution_df <- map_df(set_names(1:5), ~factoextra::facto_summarize(df.princomp, element = "var", result = "contrib", axes = .x), .id = "PC")
# get the contribution of column 'finalmark' by name
contribution_df %>%
filter(name == "finalmark")
# get the contribution of column 'finalmark' to PC3
contribution_df %>%
filter(name == "finalmark" & PC == 3)
# or, just the numeric value of contribution
filter(contribution_df, name == "finalmark" & PC == 3)$contrib
BTW I think ID in your example is treated as numeric instead of factor, but since it's just an example I'm not bothering with it.
I found a nice tutorial of self organizing map clustering in R in which, it is explained how to display your input data in the unit space (see below). In order to set up some rules for the labeling, I would like to compute the probability of each class in each neuron and plot it. Computing the probability is rather easy: take for each unit the number of observations of class i and divide it by the total number of observations in this unit. I end up with data.frame pc. Now I struggle to map this result, any clue on how to do it?
library(kohonen)
data(yeast)
set.seed(7)
yeast.supersom <- supersom(yeast, somgrid(8, 8, "hexagonal"),whatmap = 3:6)
classes <- levels(yeast$class)
colors <- c("yellow", "green", "blue", "red", "orange")
par(mfrow = c(3, 2))
plot(yeast.supersom, type = "mapping",pch = 1, main = "All", keepMargins = TRUE,bgcol = gray(0.85))
library(plyr)
pc <- data.frame(Var1=c(1:64))
for (i in seq(along = classes)) {
X.class <- lapply(yeast, function(x) subset(x, yeast$class == classes[i]))
X.map <- map(yeast.supersom, X.class)
plot(yeast.supersom, type = "mapping", classif = X.map,
col = colors[i], pch = 1, main = classes[i], add=TRUE)
# compute percentage per unit
v1F <- levels(as.factor(X.map$unit.classif))
v2F <- levels(as.factor(yeast.supersom$unit.classif))
fList<- base::union(v2F,v1F)
pc <- join(pc,as.data.frame(table(factor(X.map$unit.classif,levels=fList))/table(factor(yeast.supersom$unit.classif,levels=fList))*100),by = 'Var1')
colnames(pc)[NCOL(pc)]<-classes[i]
}
OKay guys here is a solution:
Once I have computed the probability, it derives a color code from a defined gradient (rbPal). The gradient is defined by a upper and a lower bound and the shade of the colors are proportional to their interval. THis is done with the function findInterval.
# compute percentage per unit
v1F <- levels(as.factor(X.map$unit.classif))
v2F <- levels(as.factor(yeast.supersom$unit.classif))
fList<- base::union(v2F,v1F)
pc <- join(pc,as.data.frame(table(factor(X.map$unit.classif,levels=fList))/table(factor(yeast.supersom$unit.classif,levels=fList))*100),by = 'Var1')
colnames(pc)[NCOL(pc)]<-classes[i]
rbPal <- colorRampPalette(c('blue','yellow','red'))
plot(yeast.supersom, type="mapping", bgcol = rbPal((100))[(findInterval(pc[,which(colnames(pc)==as.character(classes[i]))], seq(0:100))+1)], main = paste("Probabily Clusters:", classes[i]))