I have data like this:
df<- structure(list(14, FALSE, c(1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12,
13, 6), c(0, 0, 0, 0, 0, 6, 6, 6, 6, 6, 6, 6, 0), c(0, 1, 2,
3, 4, 12, 5, 6, 7, 8, 9, 10, 11), c(0, 1, 2, 3, 4, 12, 5, 6,
7, 8, 9, 10, 11), c(0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13), c(0, 6, 6, 6, 6, 6, 6, 13, 13, 13, 13, 13, 13, 13, 13
), list(c(1, 0, 1), structure(list(), names = character(0)),
list(name = c("Bestman", "Tera1", "Tera2", "Tera3", "Tera4",
"Tera5", "Tetra", "Brownie1", "Brownie2", "Brownie3", "Brownie4",
"Brownie5", "Brownie6", "Brownie7")), list()), <environment>), class = "igraph")
I can plot it like this:
plot(df)
If I want to remove the labels, I can do this:
plot(df,vertex.label=NA)
but that removes every label, and I want to keep the labels on the core nodes.
I want to plot the graph with ggplot, hiding the label on every node except the main core nodes and removing the outline around the circles.
Your dput is not directly reproducible (igraph objects contain an environment, which dput does not include). The graph I can recover has two linked central nodes, each with several child nodes.
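For readers who want something runnable, here is a minimal sketch of how the graph could be rebuilt from the values visible in the dput. The edge list is read off the printed vectors, so treat it as an assumption rather than the asker's exact object; the names `edges`, `hubs` and `labels` are introduced here for illustration. The last lines show a quick base igraph plot that labels only the two hubs and drops the vertex outlines.

```r
library(igraph)

# Edge list read off the dput (assumed reconstruction):
# Tera1..Tera5 attach to Bestman, Brownie1..Brownie7 attach to Tetra,
# and the two hubs are linked to each other.
edges <- data.frame(
  from = c(paste0("Tera", 1:5), paste0("Brownie", 1:7), "Tetra"),
  to   = c(rep("Bestman", 5), rep("Tetra", 7), "Bestman")
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Keep the label only for the two hubs; NA suppresses a label
hubs <- c("Bestman", "Tetra")
labels <- ifelse(V(g)$name %in% hubs, V(g)$name, NA)

# vertex.frame.color = NA removes the line around each circle
plot(g, vertex.label = labels, vertex.frame.color = NA)
```

The vertex order differs from the original object, but the structure (14 vertices, 13 edges) is the same.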
We can draw graphs with plentiful customization options using the tidygraph and ggraph packages:
library(ggraph)
library(tidygraph)
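library(extrafont)  # makes installed system fonts such as "Roboto Condensed" available to R
# g is the igraph object recovered from the question's dput
as_tbl_graph(g) %>%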
activate(nodes) %>%
mutate(type = ifelse(name %in% c("Bestman", "Tetra"), "root", "branch")) %>%
mutate(group = ifelse(name == "Bestman" | grepl("Tera", name),
"Bestman", "Tera")) %>%
ggraph(layout = "igraph", algorithm = "nicely") +
geom_edge_link(width = 2, alpha = 0.1) +
geom_node_circle(aes(r = ifelse(type == "root", 0.4, 0.1), fill = group),
color = NA) +
geom_node_text(aes(label = ifelse(type == "root", name, "")), size = 5,
color = "gray40", family = "Roboto Condensed", fontface = 2) +
theme_graph() +
coord_equal() +
scale_fill_brewer(palette = "Pastel2", guide = "none")
I am quite new to R and I am trying to run a PCA for an incomplete data set with the code:
res.comp <- imputePCA(questionaire_results_PCA, ncp = nb$ncp)
but R tells me:
Error: Must use a vector in [, not an object of class matrix.
Run rlang::last_error() to see where the error occurred.
So I run:
rlang::last_error()
R says:
1. missMDA::imputePCA(questionaire_results_PCA, ncp = nb$ncp)
4. tibble:::`[.tbl_df`(X, !is.na(X))
5. tibble:::check_names_df(i, x)
Run `rlang::last_trace()` to see the full context
So I run:
rlang::last_trace()
And R says:
Must use a vector in `[`, not an object of class matrix.
Backtrace:
█
1. └─missMDA::imputePCA(questionaire_results_PCA, ncp = nb$ncp)
2. ├─base::mean((res.impute$fittedX[!is.na(X)] - X[!is.na(X)])^2)
3. ├─X[!is.na(X)]
4. └─tibble:::`[.tbl_df`(X, !is.na(X))
5. └─tibble:::check_names_df(i, x)
Does anyone know what this means and how I could get it to work?
I have run:
dput(head(questionaire_results_PCA))
and I got:
structure(list(Active = c(6, 6, 5, 7, 5, 6), `Aggressive to people` = c(NA,
4, NA, 2, NA, 1), Anxious = c(NA, 4, NA, 3, NA, 2), Calm = c(NA,
5, NA, 5, NA, 6), Cooperative = c(7, 6, 7, 6, 6, 6), Curious = c(7,
2, 7, 7, 7, 6), Depressed = c(1, 3, 1, 1, 1, 1), Eccentric = c(1,
3, 1, 4, 1, 4), Excitable = c(5, 2, 5, 5, 4, 4), `Fearful of people` = c(1,
2, 1, 2, 1, 1), `friendly of people` = c(5, 6, 7, 7, 7, 7), Insecure = c(2,
5, 2, 3, 2, 2), Playful = c(4, 6, 2, 5, 6, 6), `Self assured` = c(7,
6, 7, 5, 6, 6), Smart = c(6, 2, 7, 5, 7, 3), Solitary = c(4,
4, 3, 4, 3, 2), Tense = c(1, 2, 1, 3, 1, 2), Timid = c(2, 2,
2, 2, 2, 2), Trusting = c(6, 6, 6, 6, 6, 6), Vigilant = c(7,
6, 5, 3, 5, 3), Vocal = c(2, 7, 1, 6, 1, 7)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
I then ran the code:
dput(nb$ncp)
and got:
3L
Here's the answer in case anyone comes across the same issue. Using the data provided by OP:
class(questionaire_results_PCA)
[1] "tbl_df" "tbl" "data.frame"
imputePCA expects a data.frame or a matrix as input, but it does not work with a tibble. So we need to convert the input back to a matrix or data.frame:
library(missMDA)
res.comp <- imputePCA(data.frame(questionaire_results_PCA), ncp = 2)
Error in eigen(crossprod(t(X), t(X)), symmetric = TRUE) :
infinite or missing values in 'x'
I get this error because this is only a subset of the data and some of its columns have zero standard deviation (they are constant). We work around this by dropping those columns first:
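# na.rm = TRUE so that columns containing NA are not silently dropped along with the constant ones
sel = which(apply(questionaire_results_PCA, 2, sd, na.rm = TRUE) != 0)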
# returns you a data.frame
res1 <- imputePCA(as.data.frame(questionaire_results_PCA[,sel]), ncp = 2)
# returns you a matrix
res2 <- imputePCA(as.matrix(questionaire_results_PCA[,sel]), ncp = 2)
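One detail worth checking when filtering columns by standard deviation: sd() returns NA for any column that contains a missing value unless na.rm = TRUE is passed, and which() skips NA entries. The following base R sketch uses a made-up data frame (`toy` and its columns are hypothetical, not the OP's data) to show the difference.

```r
# Hypothetical toy data: 'a' and 'c' contain NAs, 'b' is constant
toy <- data.frame(
  a = c(1, NA, 3),
  b = c(2, 2, 2),
  c = c(5, 6, NA)
)

# Without na.rm, columns with missing values get an NA standard deviation
sds_default <- apply(toy, 2, sd)

# With na.rm = TRUE, only the observed values are used
sds_narm <- apply(toy, 2, sd, na.rm = TRUE)

# which() ignores NA, so the default version would keep no column here,
# while the na.rm version keeps the two non-constant columns
keep_default <- names(which(sds_default != 0))
keep_narm    <- names(which(sds_narm != 0))

print(keep_default)
print(keep_narm)
```

With data that is about to be imputed, the na.rm = TRUE variant is usually the one that matches the intent of removing only constant columns.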
My goal is to find much simpler code, which can generalize, that shows the relationships between responses to two survey questions. In the MWE, one question asked respondents to rank eight marketing selections from 1 to 8 and the other asked them to rank nine attribute selections from 1 to 9. Higher rankings indicate the respondent favored the selection more. Here is the data frame.
structure(list(Email = c("a", "b", "c", "d", "e", "f", "g", "h",
"i"), Ads = c(2, 1, 1, 1, 1, 2, 1, 1, 1), Alumni = c(3, 2, 2,
3, 2, 3, 2, 2, 2), Articles = c(6, 4, 3, 2, 3, 4, 3, 3, 3), Referrals = c(4,
3, 4, 8, 7, 8, 8, 6, 4), Speeches = c(7, 7, 6, 7, 4, 7, 4, 5,
5), Updates = c(8, 6, 6, 5, 5, 5, 5, 7, 6), Visits = c(5, 8,
7, 6, 6, 6, 6, 4, 8), `Business Savvy` = c(10, 6, 10, 10, 4,
4, 6, 8, 9), Communication = c(4, 3, 8, 3, 3, 9, 7, 6, 7), Experience = c(7,
7, 7, 9, 2, 8, 5, 9, 5), Innovation = c(2, 1, 4, 2, 1, 2, 2,
1, 1), Nearby = c(3, 2, 2, 1, 5, 3, 3, 2, 2), Personal = c(8,
10, 6, 8, 6, 10, 4, 3, 3), Rates = c(9, 5, 9, 6, 9, 7, 10, 5,
4), `Staffing Model` = c(6, 8, 5, 5, 7, 5, 8, 7, 8), `Total Cost` = c(5,
4, 3, 7, 8, 6, 9, 4, 6)), row.names = c(NA, -9L), class = c("tbl_df",
"tbl", "data.frame"))
If numeric rankings cannot be used for my solution to calculating relationships (correlations), please correct me.
Hoping they can be used, I arrived at the following plodding code, which I hope calculates the correlation matrix of each method selection against each attribute selection.
library(psych)
dataframe2 <- psych::corr.test(dataframe[ , c(2, 9:17)])[[1]][1:10] # the first method vs all attributes
dataframe3 <- psych::corr.test(dataframe[ , c(3, 9:17)])[[1]][1:10] # the 2nd method vs all attributes and so on
dataframe4 <- psych::corr.test(dataframe[ , c(4, 9:17)])[[1]][1:10]
dataframe5 <- psych::corr.test(dataframe[ , c(5, 9:17)])[[1]][1:10]
dataframe6 <- psych::corr.test(dataframe[ , c(6, 9:17)])[[1]][1:10]
dataframe7 <- psych::corr.test(dataframe[ , c(7, 9:17)])[[1]][1:10]
dataframe8 <- psych::corr.test(dataframe[ , c(8, 9:17)])[[1]][1:10]
# create a dataframe from the rbinded rows
bind <- data.frame(rbind(dataframe2, dataframe3, dataframe4, dataframe5, dataframe6, dataframe7, dataframe8))
Rename rows and columns:
colnames(bind) <- c("Sel", colnames(dataframe[9:17]))
rownames(bind) <- colnames(dataframe[2:8])
How can I accomplish the above more efficiently?
By the way, the bind data frame also allows one to produce a heat map with the DataExplorer package.
library(DataExplorer)
DataExplorer::plot_correlation(bind)
[Summary]
In the scope of our discussion, there are two ways to get the correlation data.
Use stats::cor, i.e., cor(subset(dataframe, select = -Email))
Use psych::corr.test, i.e., corr.test(subset(dataframe, select = -Email))[[1]]
Then you may subset the correlation matrix with the desired rows and columns.
In order to use DataExplorer::plot_correlation, you can simply do plot_correlation(dataframe, type = "c"). Note: the output heatmap will include correlations for all columns, so you can just ignore columns that are not of interest.
[Original Answer]
## Create data
dataframe <- structure(
list(
Email = c("a", "b", "c", "d", "e", "f", "g", "h", "i"),
Ads = c(2, 1, 1, 1, 1, 2, 1, 1, 1),
Alumni = c(3, 2, 2, 3, 2, 3, 2, 2, 2),
Articles = c(6, 4, 3, 2, 3, 4, 3, 3, 3),
Referrals = c(4, 3, 4, 8, 7, 8, 8, 6, 4),
Speeches = c(7, 7, 6, 7, 4, 7, 4, 5, 5),
Updates = c(8, 6, 6, 5, 5, 5, 5, 7, 6),
Visits = c(5, 8, 7, 6, 6, 6, 6, 4, 8),
`Business Savvy` = c(10, 6, 10, 10, 4, 4, 6, 8, 9),
Communication = c(4, 3, 8, 3, 3, 9, 7, 6, 7),
Experience = c(7, 7, 7, 9, 2, 8, 5, 9, 5),
Innovation = c(2, 1, 4, 2, 1, 2, 2, 1, 1),
Nearby = c(3, 2, 2, 1, 5, 3, 3, 2, 2),
Personal = c(8, 10, 6, 8, 6, 10, 4, 3, 3),
Rates = c(9, 5, 9, 6, 9, 7, 10, 5, 4),
`Staffing Model` = c(6, 8, 5, 5, 7, 5, 8, 7, 8),
`Total Cost` = c(5, 4, 3, 7, 8, 6, 9, 4, 6)
),
row.names = c(NA, -9L),
class = c("tbl_df", "tbl", "data.frame")
)
Following your example strictly, we can do the following:
## Calculate correlation
df2 <- subset(dataframe, select = -Email)
marketing_selections <- names(df2)[1:7]
attribute_selections <- names(df2)[8:16]
corr_matrix <- psych::corr.test(df2)[[1]]
bind <- subset(corr_matrix,
subset = rownames(corr_matrix) %in% marketing_selections,
select = attribute_selections)
DataExplorer::plot_correlation(bind)
WARNING
However, is this what you really want? psych::corr.test generates the correlation matrix, and DataExplorer::plot_correlation calculates the correlation again. It is like the correlation of the correlation.
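One way to sidestep both the repeated corr.test calls and the double calculation is to pass two sets of columns to stats::cor, which then returns only the cross-correlations. This is a sketch using the question's data with the Email column left out; the names `rankings`, `marketing_cols`, `attribute_cols` and `cross_cor` are introduced here for illustration.

```r
# Survey data from the question (Email omitted because it is not numeric)
rankings <- data.frame(
  Ads = c(2, 1, 1, 1, 1, 2, 1, 1, 1),
  Alumni = c(3, 2, 2, 3, 2, 3, 2, 2, 2),
  Articles = c(6, 4, 3, 2, 3, 4, 3, 3, 3),
  Referrals = c(4, 3, 4, 8, 7, 8, 8, 6, 4),
  Speeches = c(7, 7, 6, 7, 4, 7, 4, 5, 5),
  Updates = c(8, 6, 6, 5, 5, 5, 5, 7, 6),
  Visits = c(5, 8, 7, 6, 6, 6, 6, 4, 8),
  `Business Savvy` = c(10, 6, 10, 10, 4, 4, 6, 8, 9),
  Communication = c(4, 3, 8, 3, 3, 9, 7, 6, 7),
  Experience = c(7, 7, 7, 9, 2, 8, 5, 9, 5),
  Innovation = c(2, 1, 4, 2, 1, 2, 2, 1, 1),
  Nearby = c(3, 2, 2, 1, 5, 3, 3, 2, 2),
  Personal = c(8, 10, 6, 8, 6, 10, 4, 3, 3),
  Rates = c(9, 5, 9, 6, 9, 7, 10, 5, 4),
  `Staffing Model` = c(6, 8, 5, 5, 7, 5, 8, 7, 8),
  `Total Cost` = c(5, 4, 3, 7, 8, 6, 9, 4, 6),
  check.names = FALSE  # keep the column names that contain spaces
)

marketing_cols <- names(rankings)[1:7]
attribute_cols <- names(rankings)[8:16]

# cor(x, y) correlates every column of x with every column of y,
# giving a 7 x 9 matrix (marketing selections by attribute selections)
cross_cor <- cor(rankings[, marketing_cols], rankings[, attribute_cols])
print(round(cross_cor, 2))
```

Because the values are ranks, passing method = "spearman" to cor may be worth considering; the default shown here is Pearson, matching the default of the code above.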
I have a data frame, in wide format, with each column representing one questionnaire item for one particular version of a questionnaire for a particular time point (repeated measures design).
My data would look something like the following:
df <- data.frame(id = c(1:5), t1_QOL_child_Q1 = c(5, 3, 6, 2, 7), t1_QOL_child_Q2 = c(5, 2, 3, 7, 1), t1_QOL_child_Q3 = c(7, 7, 6, 2, 5), t1_QOL_child_joy = c(9,9, 5, 3, 6), t1_QOL_teen_Q1 = c(5, 3, 6, 2, 7), t1_QOL_teen_Q2 = c(5, 2, 3, 7, 1), t1_QOL_teen_Q3 = c(7, 7, 6, 2, 5), t1_QOL_teen_joy = c(5, 7, 4, 7, 9), t1_QOL_adult_Q1 = c(5, 3, 6, 2, 7), t1_QOL_adult_Q2 = c(5, 2, 3, 7, 1), t1_QOL_adult_Q3 = c(7, 7, 6, 2, 5), t1_QOL_adult_joy = c(6, 5, 3, 3, 2), t2_QOL_child_Q1 = c(5, 3, 6, 2, 7), t2_QOL_child_Q2 = c(5, 2, 3, 7, 1), t2_QOL_child_Q3 = c(7, 7, 6, 2, 5), t2_QOL_child_joy = c(9,9, 5, 3, 6), t2_QOL_teen_Q1 = c(5, 3, 6, 2, 7), t2_QOL_teen_Q2 = c(5, 2, 3, 7, 1), t2_QOL_teen_Q3 = c(7, 7, 6, 2, 5), t2_QOL_teen_joy = c(5, 7, 4, 7, 9), t2_QOL_adult_Q1 = c(5, 3, 6, 2, 7), t2_QOL_adult_Q2 = c(5, 2, 3, 7, 1), t2_QOL_adult_Q3 = c(7, 7, 6, 2, 5), t2_QOL_adult_joy = c(6, 5, 3, 3, 2))
For example, column t1_QOL_child_Q1 would mean Question 1 (Q1) of the child version (child) of Quality of Life (QOL) questionnaire, with time point 1 (t1) data.
I want to select only the subscales/columns whose suffixes are labelled differently. In the sample data above, these would be the columns ending with "joy".
I have over 3000 columns and many more suffixes and it would be a pain to use the following:
select(df, ends_with("joy"), ends_with(<another suffix>), ends_with(<another suffix>))
I have thought of putting all the potential suffixes in a character vector and using that vector as the input to the ends_with function, but ends_with would only take a single string instead of a vector of strings.
I searched Stack Overflow and found a solution that can accommodate a small vector of strings:
select(df, sapply(vector_of_strings, ends_with))
However, I have too many suffixes in my vector of strings, and it results in the following error message:
Error: sapply(vector_of_strings, ends_with) must resolve to integer column positions, not a list
Help appreciated. Thanks!
We can use a single matches() call with multiple patterns separated by | and anchored to the end of the string with $:
library(dplyr)
df %>%
select(matches("(joy|Q2)$"))
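When the suffixes already live in a character vector, the pattern does not have to be typed by hand. The sketch below builds it with paste() and applies it with base R's grepl() so that it runs without extra packages; the same `pattern` string could be passed to matches(). The data frame is a cut-down stand-in for the question's data, and `suffixes`, `pattern` and `selected` are illustrative names.

```r
# Cut-down stand-in for the wide questionnaire data
df <- data.frame(
  id = 1:3,
  t1_QOL_child_Q1  = c(5, 3, 6),
  t1_QOL_child_Q2  = c(5, 2, 3),
  t1_QOL_child_joy = c(9, 9, 5),
  t2_QOL_teen_Q2   = c(5, 2, 3),
  t2_QOL_teen_joy  = c(5, 7, 4)
)

# Hypothetical vector of suffixes to keep
suffixes <- c("joy", "Q2")

# Collapse into one alternation anchored at the end of the name: "(joy|Q2)$"
pattern <- paste0("(", paste(suffixes, collapse = "|"), ")$")

# Keep only the columns whose names match the pattern
selected <- df[, grepl(pattern, names(df)), drop = FALSE]
print(names(selected))
```

If a suffix contains regex metacharacters such as "." or "+", it would need escaping before being pasted into the pattern.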
Let's say I have a SpatialPolygons object named groupexc that contains 3 polygons:
library(raster)
p1 <- matrix(c(2, 3, 4, 5, 6, 5, 4, 3, 2, 4, 5, 6, 5, 4, 3, 2, 3, 4), ncol=2)
p2 <- matrix(c(8, 9, 10, 11, 12, 11, 10, 9, 8, 4, 5, 6, 5, 4, 3, 2, 3, 4), ncol=2)
p3 <- matrix(c(5, 6, 7, 8, 9, 8, 7, 6, 5, 9, 10, 11, 10, 9, 8, 7, 8, 9), ncol=2)
groupexc <- spPolygons(p1, p2, p3)
And a SpatialPolygons object zoneexc that represents a single zone:
zoneexc = spPolygons(matrix(c(2,1,3,4,6,8,10,13,14,14,12,10,8,6,4,2,1,3,7,10,12,14,12,6,4,3,1,1,1,1,1,1), ncol=2))
Is there a way to expand the polygons in groupexc until they reach the boundary of zoneexc?
Before:
plot(zoneexc, border='red', lwd=3)
plot(groupexc, add=TRUE, border='blue', lwd=2)
text(groupexc, letters[1:3])
After: each group polygon should be expanded outward until the three together fill zoneexc.
Any help would be appreciated.
Here is an approximate solution. This approach might break for large problems, and it depends on having a sufficient number of nodes in each polygon. But it may be good enough for your purpose.
# example data
library(raster)
p1 <- matrix(c(2, 3, 4, 5, 6, 5, 4, 3, 2, 4, 5, 6, 5, 4, 3, 2, 3, 4), ncol=2)
p2 <- matrix(c(8, 9, 10, 11, 12, 11, 10, 9, 8, 4, 5, 6, 5, 4, 3, 2, 3, 4), ncol=2)
p3 <- matrix(c(5, 6, 7, 8, 9, 8, 7, 6, 5, 9, 10, 11, 10, 9, 8, 7, 8, 9), ncol=2)
groups <- spPolygons(p1, p2, p3, attr=data.frame(name=c('a', 'b', 'c')))
zone <- spPolygons(matrix(c(2,1,3,4,6,8,10,13,14,14,12,10,8,6,4,2,1,3,7,10,12,14,12,6,4,3,1,1,1,1,1,1), ncol=2))
Now create nearest neighbor polygons. For this to work as shown below, you need dismo version 1.1-1 (or higher).
library(dismo)
# get the coordinates of the polygons
g <- unique(geom(groups))
v <- voronoi(g[, c('x', 'y')], ext=extent(zone))
# plot(v)
# assign group id to the new polygons
v$group <- g[v$id, 1]
# aggregate (dissolve) polygons by group id
a <- aggregate(v, 'group')
# remove areas outside of the zone
i <- crop(a, zone)
# add another identifier
i$name <- groups$name[i$group]
plot(i, col=rainbow(3))
text(i, "name", cex=2)
plot(groups, add=TRUE, lwd=2, border='white', lty=2)
To see how it works:
points(g[, c('x', 'y')], pch=20, cex=2)
plot(v, add=TRUE)
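The idea behind the Voronoi step can be illustrated without any spatial packages: every location is assigned to the group that owns the nearest polygon vertex. The base R sketch below is only an illustration of that rule, not a replacement for the dismo workflow above; `verts`, `group` and `nearest_group` are names made up for this example.

```r
# Vertex coordinates of the three group polygons (same as in the example data)
p1 <- matrix(c(2, 3, 4, 5, 6, 5, 4, 3, 2, 4, 5, 6, 5, 4, 3, 2, 3, 4), ncol = 2)
p2 <- matrix(c(8, 9, 10, 11, 12, 11, 10, 9, 8, 4, 5, 6, 5, 4, 3, 2, 3, 4), ncol = 2)
p3 <- matrix(c(5, 6, 7, 8, 9, 8, 7, 6, 5, 9, 10, 11, 10, 9, 8, 7, 8, 9), ncol = 2)

# Stack all vertices and remember which group each one belongs to
verts <- rbind(p1, p2, p3)
group <- rep(c("a", "b", "c"), times = c(nrow(p1), nrow(p2), nrow(p3)))

# Return the group of the vertex closest to the point (x, y)
nearest_group <- function(x, y) {
  d2 <- (verts[, 1] - x)^2 + (verts[, 2] - y)^2  # squared Euclidean distance
  group[which.min(d2)]
}

# Classify a coarse grid of points covering the zone's bounding box
grid <- expand.grid(x = seq(1, 14, by = 1), y = seq(1, 14, by = 1))
grid$group <- mapply(nearest_group, grid$x, grid$y)

# Number of grid points claimed by each group
print(table(grid$group))
```

This also shows why the result depends on the number of nodes per polygon: with few vertices, the nearest vertex is a rough proxy for the nearest polygon edge.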