I would like to aggregate the variables of the following data frame, in order to reduce the number of constructs. It contains only binary variables that correspond to "yes/no" answers; the first 10 rows are shown below, while the original data frame contains 169 rows.
outcome <-
structure(list(Q9_Automazione.processi = c(0, 0, 0, 0, 0, 0,
1, 1, 1, 0), Q9_Velocita.Prod = c(1, 0, 0, 1, 0, 0, 1, 1, 1,
0), Q9_Flessibilita.Prod = c(0, 0, 0, 1, 0, 0, 1, 1, 0, 1), Q9_Controllo.processi = c(0,
0, 0, 1, 0, 0, 1, 1, 0, 0), Q9_Effic.Magazzino = c(0, 0, 0, 1,
0, 0, 0, 0, 0, 0), Q9_Riduz.Costi = c(0, 1, 0, 0, 0, 0, 0, 0,
0, 1), Q9_Miglior.Sicurezza = c(0, 0, 0, 0, 0, 0, 1, 0, 1, 1),
Q9_Connett.Interna = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 0), Q9_Connett.Esterna = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), Q9_Virtualizzazione = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0), Q9_Innov.Prod = c(0, 0, 0, 0, 0,
1, 0, 0, 0, 1), Q9_Person.Prod = c(0, 1, 0, 1, 0, 1, 0, 0,
0, 1), Q9_Nuovi.Mercati = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
Q9_Nuovi.BM = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Q9_Perform.Energ = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), Q9_Perform.SostAmb = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 10L), class = "data.frame")
I have tried performing factor analysis via the tetrachoric method on the resulting correlation matrix (the value obtained from the KMO function turns out to be inadequate), both directly on the data frame and then using tetrachoric correlations in the fa function (with cor = "tet" I get a negative Tucker-Lewis Index).
I have been reading up on this but cannot find a methodology that is adequate and whose correctness I am certain of.
So basically what I would like to achieve is to aggregate similar constructs, e.g., assess whether column 5 has the value 1 (i.e., "yes") almost always when column 11 has the value 1, and then aggregate the two (see the sketch after my code below).
Here is the code that I tried to use:
library(psych)
library(corrplot)
library(ggcorrplot)
library(magrittr)  # for %>%

# tetrachoric correlations and their matrix
tet <- tetrachoric(outcome)
corr_matrix <- tet$rho
corrplot(corr_matrix, "ellipse", tl.cex = 0.75, tl.col = "black")
corr_matrix %>%
  ggcorrplot(show.diag = FALSE, type = "lower", lab = TRUE, lab_size = 2)

# adequacy checks and factor analysis on the tetrachoric matrix
KMO(corr_matrix)
cortest.bartlett(corr_matrix, n = nrow(outcome))  # n is needed for a matrix input
fa.parallel(corr_matrix, fm = "ml")
factor <- fa(corr_matrix, nfactors = 3, rotate = "oblimin", fm = "ml")
print(factor, cut = 0.3, digits = 3)
# -------- Pearson --------
cor(outcome, method = "pearson", use = "pairwise.complete.obs") %>%
  ggcorrplot(show.diag = FALSE, type = "lower", lab = TRUE, lab_size = 2)
KMO(outcome)
cortest.bartlett(outcome)
fa.parallel(outcome)
factor1 <- fa(outcome, nfactors = 3, rotate = "oblimin", cor = "tet", fm = "ml")
print(factor1, cut = 0.3, digits = 3)
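To make the pairwise check described above concrete, here is a minimal sketch (my own addition, not part of the original attempt) that estimates, for every ordered pair of columns of outcome, how often one column equals 1 given that the other does; pairs with values close to 1 would be candidates for aggregation:
# estimated P(col j = 1 | col i = 1) for every pair of binary columns
cooccur <- function(df) {
  p <- ncol(df)
  m <- matrix(NA_real_, p, p, dimnames = list(names(df), names(df)))
  for (i in seq_len(p)) {
    ones_i <- df[[i]] == 1
    if (any(ones_i)) {
      for (j in seq_len(p)) m[i, j] <- mean(df[[j]][ones_i] == 1)
    }
  }
  m
}
round(cooccur(outcome), 2)  # columns that are all 0 in the sample stay NA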
I would like to produce a multidimensional scaling plot from the following table (this is just a shortened form of the whole table).
I have been trying to do it in R (I am quite new here...), but now I am not even sure whether this type of data is suitable for multidimensional scaling. The whole table is meant to mirror a semantic (linguistic) map (that is why I thought MDS should be suitable), and the rows mean that informants saw some pictures and gave different expressions (columns) for the pictures, so they described them differently.
The numbers in the columns are not judgments on a scale from 1 to 10 or anything like that; they show how many people used the given expression for pic1, pic2, and so forth.
Could anyone help me work out whether MDS is actually the appropriate model for what I am trying to do? (Sorry, I am just too confused after reading a lot about different methods over the last few days...)
If so, here is the code I used (just to be sure).
Thanks a lot for any advice!
daten <- structure(list(photos = c("p1", "p5", "p8", "p13", "p19", "p23", "p29", "p34", "p36", "p40", "p59", "p2", "p14"), expression1 = c(18, 8, 11, 15, 14, 16, 10, 12, 15, 18, 18, 0, 0), expression2 = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0), expression3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1), expression4 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 17), expression5 = c(0, 3, 5, 0, 0, 0, 1, 5, 1, 0, 0, 0, 0), expression6 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), expression7 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), expression8 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -13L), class = c("tbl_df", "tbl", "data.frame"))
library("tibble")
has_rownames(daten)
cr<-column_to_rownames(daten, var="photo")
has_rownames(cr)
matr_cr <- as.matrix(cr[,-1])
matr_cr
d <- dist(matr_cr)
fit <- cmdscale(d, eig = TRUE, k = 2)
x <- fit$points[, 1]
y <- fit$points[, 2]
plot(x, y, xlab="Coordinate 1", ylab="Coordinate 2",
main="Multidimensional Scaling", type="n")
text(x, y, labels = row.names(matr_cr), cex=.6, col="red")
cr
Plotting multidimensional data is difficult, and what to do depends on the type of data and analysis. First of all, if you have several variables, it may be useful to cluster your data; one possible method is k-means, which you can find in the package "ClusterR". Another possibility is to transform your variables by rotating the axes in order to lower the dimensionality with a Principal Component Analysis (PCA); you can find more about PCA in R at http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/ (a small sketch follows the ggplot2 examples below).
If you opt to plot your data as they are, without a prior analysis, you may use the ggplot2 package to make more useful and elegant plots. To plot different data attributes you can map them to size, colour, shape, etc., with each scale representing a different dimension. The limitation of this option is that you cannot plot more than a handful of dimensions.
If I understand you correctly, you have pictures and people (informants) who review the pictures, and the reviews are separated into different levels (dimensions). If so, your dimensions are pictures, reviewers, and each level of the reviews, which makes 2 + N variables. Note that you can easily plot up to 5 dimensions with this kind of data: the x-axis and y-axis give you 2 dimensions, then you can use a size scale for another, a colour scale for another, and, depending on your data and preference, a text or shape scale for the fifth. I do not see the informants (reviewers) dimension in the table you provided. Below you will find two examples of such plots using ggplot2; note that the shape scale requires a discrete variable. To get plots that are both attractive and meaningful you will have to try which type of scale works best for each of your variables, which will strongly depend on your data. Lastly, if you have several dimensions you should normally first assess whether your data are clustered, or do a PCA.
library(ggplot2)
daten <- structure(list(photos = c("p1", "p5", "p8", "p13", "p19", "p23", "p29", "p34", "p36", "p40", "p59", "p2", "p14"), expression1 = c(18, 8, 11, 15, 14, 16, 10, 12, 15, 18, 18, 0, 0), expression2 = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0), expression3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1), expression4 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 17), expression5 = c(0, 3, 5, 0, 0, 0, 1, 5, 1, 0, 0, 0, 0), expression6 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), expression7 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), expression8 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -13L), class = c("tbl_df", "tbl", "data.frame"))
# with shape scale
ggplot(data = daten, aes(x = photos, y = expression1, col = expression2,
                         size = expression3, shape = as.factor(expression4))) +
  geom_point()
# with text scale
ggplot(data = daten, aes(x = expression4, y = expression1, col = expression2,
                         size = expression3, label = photos)) +
  geom_text()
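To make the clustering/PCA suggestion concrete, here is a minimal sketch using base R's kmeans() and prcomp() on the numeric columns of daten (k = 3 is an arbitrary choice for illustration; the ClusterR package offers alternatives such as KMeans_rcpp()):
# cluster the photos and project them onto the first two principal components
num <- as.matrix(daten[, -1])                 # numeric expression columns only
km  <- kmeans(num, centers = 3, nstart = 20)  # k-means with an arbitrary k = 3
pca <- prcomp(num)                            # no scaling: all columns are counts
plot(pca$x[, 1:2], col = km$cluster, pch = 19,
     xlab = "PC1", ylab = "PC2", main = "PCA of photos, coloured by k-means cluster")
text(pca$x[, 1:2], labels = daten$photos, pos = 3, cex = 0.6)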
I am trying to calculate robustness, a graph-theory measure, using R (the brainGraph package).
Robustness = robustness(my_networkgraph, type = c("vertex"), measure = ("btwn.cent"))
I get the following error, when I use the above robustness function:
Error in order(vertex_attr(g, measure), decreasing = TRUE) : argument 1 is not a vector
Any idea what I am doing wrong here?
My network, which is a matrix, has been converted to an igraph object, and robustness was calculated.
My network as a matrix:
mynetwork <- matrix(c(0, 1, 0, 1, 0, 0, 0, 0,
1, 0, 1, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 1, 1, 0, 1, 1,
0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0), nrow = 8)
This matrix was converted to an igraph object using the following code:
library(igraph)
my_networkgraph <- graph_from_adjacency_matrix(mynetwork, mode = "undirected",
                                               weighted = NULL, diag = TRUE,
                                               add.colnames = NULL, add.rownames = NA)
Please help me to understand the above error
Thanks
Priya
There was a bug in the above function. To run the robustness code, you will need to supply a vertex attribute to your network:
V(network)$degree <- degree(network)
V(network)$btwn.cent <- centr_betw(network)$res
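Put together with the matrix from the question, a minimal end-to-end sketch would look as follows (my own assembly; the point is that robustness() looks up the vertex attribute named by measure, so that attribute must exist before the call):
library(igraph)
library(brainGraph)
my_networkgraph <- graph_from_adjacency_matrix(mynetwork, mode = "undirected",
                                               weighted = NULL, diag = TRUE)
# attach the vertex attributes that robustness() expects to find by name
V(my_networkgraph)$degree    <- degree(my_networkgraph)
V(my_networkgraph)$btwn.cent <- centr_betw(my_networkgraph)$res
Robustness <- robustness(my_networkgraph, type = "vertex", measure = "btwn.cent")
Robustness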
As the question states: I know there are several solutions (see the output of GA and check that the values and constraints are correct), but I can't get them out of Gurobi.
Edit after #Paleo13's answer: As he states, his answer is a good workaround. However, I would also love to see whether there is a more efficient option, so I added a bounty. See here and here for what I know.
Reproducible example:
my_fun <- function(x) {
  f <- sum(model$obj * x)
  penalty <- sum(abs(model$A %*% x - model$rhs))
  # large penalty so that infeasible solutions can never win;
  # other weights tried: sum(model$obj^2), 1e7
  return_value <- -f - 1e8 * penalty
  return(return_value)
}
model <- structure(
list(modelsense = "min",
obj = c(0, 40, 20, 40, 0, 20, 20, 20, 0),
A = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 1, 0, 0, 1,
1, 0, -1, 0, 0, 0, 0, -1, 1, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 1, -1, 0, 0, 1, 0, -1, 0, 1, 0,
0, 1, 0, 0, -1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
.Dim = c(7L, 9L),
.Dimnames = list(
c("constraint1", "constraint2", "", "", "", "", ""),
NULL)),
rhs = c(1, 1, 0, 0, 0, 1, 1),
sense = c("=", "=", "=", "=", "=", "=", "="),
vtype = "B"),
.Names = c("modelsense", "obj", "A", "rhs", "sense", "vtype"))
# Gurobi:
params <- list(OutputFlag = 1, Presolve = 2, LogToConsole = 1, PoolSearchMode = 2, PoolSolutions = 10)
ilp_result <- gurobi::gurobi(model, params)
print(ilp_result$x)
# GA for cross-check
GA <- GA::ga(type = "binary", fitness = my_fun, nBits = length(model$obj),
maxiter = 3000, run = 2000, popSize = 10, seed = 12)
# Crosscheck:
summary(GA)
my_fun(ilp_result$x)
my_fun(GA@solution[1, ])
my_fun(GA@solution[2, ])
sum(abs(model$A %*% ilp_result$x - model$rhs))
sum(abs(model$A %*% GA@solution[1, ] - model$rhs))
sum(abs(model$A %*% GA@solution[2, ] - model$rhs))
What you describe can be done with the solution pool. Gurobi added the R API for the solution pool in version 8.0. You set parameters to control the solution pool, and the multiple solutions are returned in the pool named component of the result. This is illustrated in the poolsearch.R example, which can also be found in the examples/R subdirectory.
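A minimal sketch of what that looks like from R, modelled on poolsearch.R (this assumes Gurobi >= 8.0; the exact pool component names are worth checking against your version's documentation):
params <- list(PoolSearchMode = 2,  # systematically search for additional solutions
               PoolSolutions = 10)  # keep up to 10 of them
result <- gurobi::gurobi(model, params)
print(result$x)  # the incumbent solution
# the additional solutions are returned in the pool component
for (i in seq_along(result$pool)) {
  cat("pool solution", i, "objective:", result$pool[[i]]$objval, "\n")
  print(result$pool[[i]]$xn)
}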
Disclaimer: I manage technical support for Gurobi.
Gurobi can indeed store feasible solutions that it encounters while searching for the optimal solution (or rather a solution that fits within a specified optimality gap). These solutions are stored in a "solution pool". Unfortunately, the gurobi R package does not have the functionality to access the solutions in the solution pool, so if we are looking for a solution that uses only R then we cannot use the pool. It is also worth noting that the solution pool does not necessarily contain all the feasible solutions; it only contains the solutions that Gurobi found along the way. So if we require all the feasible solutions, we cannot rely on the solution pool from a single run of Gurobi.
So, with regard to your question, one strategy is to repeatedly add "no-good" cuts (sometimes described as Benders-style cuts). This basically involves solving the problem, adding a constraint that forbids the solution we just obtained, solving the problem again, and repeating this process until no feasible solutions remain. I have written a function below that implements this method using the gurobi R package and applied it to your example. This method may not scale very well to problems with a large number of feasible solutions, because ideally we would access the solution pool to reduce the total number of Gurobi runs, but it is the best approach to my knowledge (I would love to hear if anyone has a better idea).
# define functions
find_all_feasible_solutions <- function(model, params) {
  # initialize variables
  counter <- 0
  solutions <- list()
  objs <- numeric(0)
  # search for feasible solutions until no more exist
  while (TRUE) {
    # increment counter
    counter <- counter + 1
    # solve problem
    s <- gurobi::gurobi(model, params)
    # stop if the status indicates that no feasible solution was found
    if (s$status == "INFEASIBLE") break
    # store this solution and its objective value
    solutions[[counter]] <- s$x
    objs[[counter]] <- s$objval
    # add a no-good cut to forbid this solution: coefficients are +1 where
    # x_i = 1 and -1 where x_i = 0, with right-hand side (number of ones) - 1
    model$rhs <- c(model$rhs, sum(s$x) - 1)
    model$sense <- c(model$sense, "<=")
    model$A <- rbind(model$A, (s$x * 2) - 1)
  }
  # throw an error if no feasible solutions were found
  if (length(solutions) == 0) {
    stop("no feasible solutions found.")
  }
  # return solutions as a matrix plus their objective values
  list(x = do.call(rbind, solutions), obj = objs)
}
# create initial model
model <- list(
modelsense = "min",
obj = c(0, 40, 20, 40, 0, 20, 20, 20, 0),
A = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 1, 0, 0, 1,
1, 0, -1, 0, 0, 0, 0, -1, 1, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 1, -1, 0, 0, 1, 0, -1, 0, 1, 0,
0, 1, 0, 0, -1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
.Dim = c(7L, 9L),
.Dimnames = list(c("constraint1", "constraint2", "", "", "", "", ""),
NULL)),
rhs = c(1, 1, 0, 0, 0, 1, 1),
sense = c("=", "=", "=", "=", "=", "=", "="),
vtype = "B")
# create parameters
params <- list(OutputFlag = 1, Presolve = 2, LogToConsole = 1)
# find all feasible solutions
output <- find_all_feasible_solutions(model, params)
# print number of feasible solutions
print(length(output$obj))
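output$x then holds one feasible solution per row and output$obj the matching objective values, so, for example, output$x[which.min(output$obj), ] recovers an optimal solution.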
I have this data frame
d1 <- c(1, 0, 0, 1, 0, 0, 0, 1)
d2 <- c(0, 1, 0, 1, 1, 0, 0, 0)
d3 <- c(0, 0, 1, 0, 0, 0, 1, 0)
d4 <- c(0, 0, 0, 1, 0, 0, 0, 0)
d5 <- c(0, 0, 0, 0, 0, 0, 1, 0)
d6 <- c(0, 0, 0, 1, 0, 1, 0, 1)
d7 <- c(0, 0, 1, 0, 0, 1, 0, 1)
d8 <- c(1, 0, 0, 0, 0, 0, 0, 1)
d9 <- c(0, 0, 0, 0, 0, 1, 0, 1)
d10 <- c(1, 1, 0, 0, 0, 1, 0, 1)
df <- as.data.frame(rbind(d1,d2,d3,d4,d5,d6,d7,d8,d9,d10))
str(df)
I select all rows where V8 == 1 and find the relative frequencies for each column like this (for example for column 2, V2):
table(df[which(df$V8==1),][2])/sum(as.numeric(df[which(df$V8==1),]$V8))
0 1
0.8333333 0.1666667
My question is how I can get each relative frequency individually, say, assign it to a new variable. I found this
How to extract value from table function in R
but it does not work in my case, since the 0 and 1 here are numeric values.
table(df[which(df$V8==1),][2])/sum(as.numeric(df[which(df$V8==1),]$V8))["1"]
Use as.numeric, and then, after that, turn the counts into ratios.
The numbers 0 and 1 are extracted with
as.numeric(names(table(data)))
and the counts (the 64 and 17 of the linked example) are extracted with
counts <- as.numeric(table(data))
then
ratios <- counts / sum(counts)
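Applied to the df from the question, the steps combine like this (a small sketch of my own):
tab    <- table(df[df$V8 == 1, 2])  # counts of 0s and 1s in V2 where V8 == 1
values <- as.numeric(names(tab))    # the levels 0 and 1 as numbers
counts <- as.numeric(tab)           # their counts
ratios <- counts / sum(counts)      # the relative frequencies
ratios[values == 1]                 # the frequency of 1 on its own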
Not completely sure what you're trying to do, but...
sapply(subset(df, V8==1), function(x) sum(x==1)/length(x))
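Since sapply() returns a named vector, each relative frequency can then be pulled out individually, e.g. res <- sapply(subset(df, V8 == 1), function(x) sum(x == 1)/length(x)); res["V2"] gives the 0.1666667 shown in the table above.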