There are quite a few answers to this question, not only on Stack Overflow but across the internet. However, none of them solved my problem. I have two problems.
Here is some simulated data:
df <- structure(list(Group = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
2, 2, 2), var1 = c(2, 3, 1, 2, 3, 2, 3, 3, 5, 6, 7, 6, 8, 5,
5), var2 = c(9, 9, 9, 8, 7, 8, 9, 3, 2, 2, 1, 1, 2, 3, 3), var3 = c(6,
7, 6, 6, 5, 6, 7, 1, 2, 1, 2, 3, 1, 1, 2)), .Names = c("Group",
"var1", "var2", "var3"), row.names = c(NA, -15L), class = "data.frame")
Then I do the following:
library(MASS) # lda() lives in the MASS package
fit <- lda(Group ~ ., data = df)
plot(fit)
I end up with groups appearing in two different plots.
How can I plot my results in one figure, like e.g. "Linear discriminant analysis plot" or "Linear discriminant analysis plot using ggplot2", or as any other nice-looking plot?
The plot() function actually calls plot.lda(), whose source code you can inspect by running getAnywhere("plot.lda"). plot.lda() does quite a lot of processing of the LDA object you pass in before plotting: it rebuilds the model matrix from the formula, centres it, and projects it onto the linear discriminants. As a result, if you want to customize how your plots look, you will probably have to write your own function that extracts this information from the lda object and then passes it to a plotting function. Here is an example. I don't know much about LDA, so I just trimmed the source code of the default plot.lda() and used the ggplot2 package (very flexible) to create a few plots.
#If you don't have ggplot2 package, here is the code to install it and load it
install.packages("ggplot2")
library("ggplot2")
library("MASS")
#this is your code. The only thing I've changed here is the Group labels because you want a character vector instead of numeric labels
df <- structure(list(Group = c("a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b"),
var1 = c(2, 3, 1, 2, 3, 2, 3, 3, 5, 6, 7, 6, 8, 5, 5),
var2 = c(9, 9, 9, 8, 7, 8, 9, 3, 2, 2, 1, 1, 2, 3, 3),
var3 = c(6, 7, 6, 6, 5, 6, 7, 1, 2, 1, 2, 3, 1, 1, 2)),
.Names = c("Group","var1", "var2", "var3"),
row.names = c(NA, -15L), class = "data.frame")
fit <- lda(Group~., data=df)
#here is the custom function I made that extracts the proper information from the LDA object. You might want to write your own version of this to make sure it works with all cases (all I did here was trim the original plot.lda() function, but I might've deleted some code that might be relevant for other examples)
ggplotLDAPrep <- function(x){
if (!is.null(Terms <- x$terms)) {
data <- model.frame(x)
X <- model.matrix(delete.response(Terms), data)
g <- model.response(data)
xint <- match("(Intercept)", colnames(X), nomatch = 0L)
if (xint > 0L)
X <- X[, -xint, drop = FALSE]
}
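# centre on the (unweighted) mean of the group means, then project onto the discriminants
means <- colMeans(x$means)
X <- scale(X, center = means, scale = FALSE) %*% x$scaling
# one column per discriminant (LD1, LD2, ...) plus the group labels
rtrn <- data.frame(X, labels = as.character(g))
return(rtrn)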
}
fitGraph <- ggplotLDAPrep(fit)
#Here are some examples of using ggplot to display your results. If you like what you see, I suggest learning more about ggplot2; then you can easily customize your plots
#this is similar to the result you get when you run plot(fit)
ggplot(fitGraph, aes(LD1))+geom_histogram()+facet_wrap(~labels, ncol=1)
#Same as previous, but all the groups are on the same graph
ggplot(fitGraph, aes(LD1,fill=labels))+geom_histogram()
The following example won't work with your data. With only two groups, LDA yields a single discriminant (LD1), so there is no LD2. It is, however, equivalent to the scatter plot in the external example you provided, so I've used the iris data here as a demo:
ldaobject <- lda(Species~., data=iris)
fitGraph <- ggplotLDAPrep(ldaobject)
ggplot(fitGraph, aes(LD1,LD2, color=labels))+geom_point()
I didn't customize the ggplot settings much, but you can make your graphs look however you like if you play around with them. Hope this helps!
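Below is a base-R cross-check of what ggplotLDAPrep() (trimmed from plot.lda()) computes. It projects the iris predictors by hand and compares the result with predict(). It also confirms that the two-group data from the question produce only LD1. This is a sketch, not part of the answer above, and it assumes the MASS package. The agreement with predict() relies on the iris priors being equal. As far as I can tell from the MASS source, predict() centres on the prior-weighted group means. plot.lda() instead centres on their plain average, and the two coincide only when the priors are equal.

```r
library(MASS)  # lda(); iris comes from the datasets package

# Fit LDA on iris: three species with 50 rows each, so the priors are equal
fit_iris <- lda(Species ~ ., data = iris)

# Reproduce the projection used in plot.lda() / ggplotLDAPrep():
# centre on the unweighted mean of the group means, then multiply by the
# discriminant coefficients stored in fit_iris$scaling
X <- as.matrix(iris[, rownames(fit_iris$scaling)])
centre <- colMeans(fit_iris$means)
scores <- scale(X, center = centre, scale = FALSE) %*% fit_iris$scaling

# predict() returns the discriminant scores in its $x component
pred_scores <- predict(fit_iris)$x

# Two groups, as in the question, give a single discriminant, LD1
df2 <- data.frame(Group = rep(c("a", "b"), times = c(7, 8)),
                  var1 = c(2, 3, 1, 2, 3, 2, 3, 3, 5, 6, 7, 6, 8, 5, 5),
                  var2 = c(9, 9, 9, 8, 7, 8, 9, 3, 2, 2, 1, 1, 2, 3, 3),
                  var3 = c(6, 7, 6, 6, 5, 6, 7, 1, 2, 1, 2, 3, 1, 1, 2))
fit2 <- lda(Group ~ ., data = df2)
colnames(fit2$scaling)
```

The scores matrix has the same LD1/LD2 columns that ggplotLDAPrep() returns, minus the labels column.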
I want to calculate the sum of y along the x-axis. The range for summation is contained in the separate columns xmin and xmax.
df <- data.frame (group = c("A","A","A","A","A","B","B","B","B","B" ),
x = c(1,2,3,4,5,1,2,3,4,5),
y= c(1,2,3,2,1,4,5,6,5,4),
xmin=c(2,2,2,2,2,1,1,1,1,1),
xmax=c(4,4,4,4,4,5,5,5,5,5))
For group A, x ranges from 2 to 4, so the sum is 2 + 3 + 2 = 7.
For group B, x ranges from 1 to 5, so the sum is 4 + 5 + 6 + 5 + 4 = 24.
Is there a way to do this?
I have experimented a bit, but I'm not sure whether the following is going in the right direction:
df %>% rowwise() %>% mutate(sumX=sum(df$y[df$x>=df$min & df$x<=df$max]))
Use data.table::between() to subset the rows, then just sum by group with tapply().
subset(df, data.table::between(x, xmin, xmax)) |>
with(tapply(y, group, sum))
# A B
# 7 24
Note: requires R >= 4.1 for the native pipe |>.
Data:
df <- structure(list(group = c("A", "A", "A", "A", "A", "B", "B", "B",
"B", "B"), x = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5), y = c(1, 2, 3,
2, 1, 4, 5, 6, 5, 4), xmin = c(2, 2, 2, 2, 2, 1, 1, 1, 1, 1),
xmax = c(4, 4, 4, 4, 4, 5, 5, 5, 5, 5)), class = "data.frame", row.names = c(NA,
-10L))
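If you'd rather avoid the data.table dependency and the R 4.1 pipe, the same logic can be written in base R alone. This minimal sketch builds a logical mask of rows whose x lies in [xmin, xmax] and passes the selected y values to tapply(). It also shows a per-row variant with ave(), closer to what the rowwise() attempt was aiming for. The names in_range, range_sums and sumY are just illustrative.

```r
df <- data.frame(group = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
                 x = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
                 y = c(1, 2, 3, 2, 1, 4, 5, 6, 5, 4),
                 xmin = c(2, 2, 2, 2, 2, 1, 1, 1, 1, 1),
                 xmax = c(4, 4, 4, 4, 4, 5, 5, 5, 5, 5))

# TRUE for rows whose x lies inside that row's [xmin, xmax] (inclusive)
in_range <- df$x >= df$xmin & df$x <= df$xmax

# One sum per group: A = 7, B = 24
range_sums <- tapply(df$y[in_range], df$group[in_range], sum)

# Per-row version: out-of-range y values count as 0, and every row
# receives its group's total (useful for adding as a new column)
df$sumY <- ave(ifelse(in_range, df$y, 0), df$group, FUN = sum)
```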
I am a beginner in R, and have a question about making boxplots of columns in R. I just made a dataframe:
SUS <- data.frame(RD = c(4, 3, 4, 1, 2, 2, 4, 2, 4, 1), TK = c(4, 2, 4, 2, 2, 2, 4, 4, 3, 1),
WK = c(3, 2, 4, 1, 3, 3, 4, 2, 4, 2), NW = c(2, 2, 4, 2, NA, NA, 5, 1, 4, 2),
BW = c(3, 2, 4, 1, 4, 1, 4, 1, 5, 1), EK = c(2, 4, 3, 1, 2, 4, 2, 2, 4, 2),
AN = c(3, 2, 4, 2, 3, 3, 3, 2, 4, 2))
rownames(SUS) <- c('Pleasant to use', 'Unnecessarily complex', 'Easy to use',
'Need help of a technical person', 'Different functions well integrated', 'Various functions incoherent', 'Imagine that it is easy to learn',
'Difficult to use', 'Confident during use', 'Long duration until I could work with it')
I tried a number of times, but I did not succeed in making boxplots for all rows. Can someone help me out here?
You can also do it with the tidyverse:
library(tidyverse)
SUS %>%
#create new column and save the row.names in it
mutate(variable = row.names(.)) %>%
#convert your data from wide to long (pivot_longer() supersedes gather())
tidyr::pivot_longer(cols = 1:7, names_to = "var", values_to = "value") %>%
#plot it using ggplot2
ggplot(., aes(x = variable, y = value)) +
geom_boxplot()+
theme(axis.text.x = element_text(angle=35,hjust=1))
As @blondeclover says in the comments, boxplot() works fine for drawing one boxplot per column.
If what you want is one boxplot per row, then your current rows need to become columns. To do this, you can transpose the data frame before plotting:
SUS.new <- as.data.frame(t(SUS))
boxplot(SUS.new)
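Alternatively, base R's boxplot() method for matrices has a use.cols argument, so it can draw one box per row without transposing. This sketch uses the question's data. To keep it short, the long row labels are replaced by placeholder names Q1 to Q10 (an assumption; swap in the real labels). plot = FALSE is used only to inspect the computed statistics; drop it to draw the plot.

```r
SUS <- data.frame(RD = c(4, 3, 4, 1, 2, 2, 4, 2, 4, 1), TK = c(4, 2, 4, 2, 2, 2, 4, 4, 3, 1),
                  WK = c(3, 2, 4, 1, 3, 3, 4, 2, 4, 2), NW = c(2, 2, 4, 2, NA, NA, 5, 1, 4, 2),
                  BW = c(3, 2, 4, 1, 4, 1, 4, 1, 5, 1), EK = c(2, 4, 3, 1, 2, 4, 2, 2, 4, 2),
                  AN = c(3, 2, 4, 2, 3, 3, 3, 2, 4, 2))
rownames(SUS) <- paste0("Q", 1:10)  # placeholder labels for this sketch only

# use.cols = FALSE: one box per row (questionnaire item) instead of per column
bx <- boxplot(as.matrix(SUS), use.cols = FALSE, plot = FALSE)
bx$n  # non-missing values per row; rows 5 and 6 have an NA in NW

# To actually draw it, with axis labels rotated:
# boxplot(as.matrix(SUS), use.cols = FALSE, las = 2)
```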
My problem is similar to this one: when I generate plot objects (in this case histograms) in a loop, it seems that all of them get overwritten by the most recent plot.
To debug, within the loop, I am printing the index and the generated plot, both of which appear correctly. But when I look at the plots stored in the list, they are all identical except for the label.
(I'm using multiplot to make a composite image, but you get the same outcome if you print myplots[[1]] through myplots[[4]] one at a time with print().)
Because I already have an attached dataframe (unlike the poster of the similar problem), I am not sure how to solve the problem.
(By the way, the columns are factors in the original dataset I am approximating here, but the same problem occurs if they are integers.)
Here is a reproducible example:
library(ggplot2)
source("http://peterhaschke.com/Code/multiplot.R") #load multiplot function
#make sample data
col1 <- c(2, 4, 1, 2, 5, 1, 2, 0, 1, 4, 4, 3, 5, 2, 4, 3, 3, 6, 5, 3, 6, 4, 3, 4, 4, 3, 4,
2, 4, 3, 3, 5, 3, 5, 5, 0, 0, 3, 3, 6, 5, 4, 4, 1, 3, 3, 2, 0, 5, 3, 6, 6, 2, 3,
3, 1, 5, 3, 4, 6)
col2 <- c(2, 4, 4, 0, 4, 4, 4, 4, 1, 4, 4, 3, 5, 0, 4, 5, 3, 6, 5, 3, 6, 4, 4, 2, 4, 4, 4,
1, 1, 2, 2, 3, 3, 5, 0, 3, 4, 2, 4, 5, 5, 4, 4, 2, 3, 5, 2, 6, 5, 2, 4, 6, 3, 3,
3, 1, 4, 3, 5, 4)
col3 <- c(2, 5, 4, 1, 4, 2, 3, 0, 1, 3, 4, 2, 5, 1, 4, 3, 4, 6, 3, 4, 6, 4, 1, 3, 5, 4, 3,
2, 1, 3, 2, 2, 2, 4, 0, 1, 4, 4, 3, 5, 3, 2, 5, 2, 3, 3, 4, 2, 4, 2, 4, 5, 1, 3,
3, 3, 4, 3, 5, 4)
col4 <- c(2, 5, 2, 1, 4, 1, 3, 4, 1, 3, 5, 2, 4, 3, 5, 3, 4, 6, 3, 4, 6, 4, 3, 2, 5, 5, 4,
2, 3, 2, 2, 3, 3, 4, 0, 1, 4, 3, 3, 5, 4, 4, 4, 3, 3, 5, 4, 3, 5, 3, 6, 6, 4, 2,
3, 3, 4, 4, 4, 6)
data2 <- data.frame(col1,col2,col3,col4)
data2[,1:4] <- lapply(data2[,1:4], as.factor)
colnames(data2)<- c("A","B","C", "D")
#generate plots
myplots <- list() # new empty list
for (i in 1:4) {
p1 <- ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+
geom_histogram(fill="lightgreen") +
xlab(colnames(data2)[ i])
print(i)
print(p1)
myplots[[i]] <- p1 # add each plot into plot list
}
multiplot(plotlist = myplots, cols = 4)
When I look at a summary of a plot object in the plot list, this is what I see:
> summary(myplots[[1]])
data: A, B, C, D [60x4]
mapping: x = data2[, i]
faceting: facet_null()
-----------------------------------
geom_histogram: fill = lightgreen
stat_bin:
position_stack: (width = NULL, height = NULL)
I think that mapping: x = data2[, i] is the problem, but I am stumped! I can't post images, so you'll need to run my example and look at the graphs if my explanation of the problem is confusing.
Thanks!
In addition to the other excellent answer, here's a solution that uses “normal”-looking evaluation rather than eval(). Since for loops have no separate variable scope (i.e. they run in the current environment), we need to wrap the body of the loop in local(). In addition, we need to make i a local variable, which we can do by reassigning it to its own name (see footnote 1):
myplots <- vector('list', ncol(data2))
for (i in seq_along(data2)) {
message(i)
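myplots[[i]] <- local({
i <- i # local copy of the loop index, fixed at its current value
p1 <- ggplot(data2, aes(x = data2[[i]])) +
geom_histogram(fill = "lightgreen") +
xlab(colnames(data2)[i])
print(p1) # print() returns p1 invisibly, so local() returns the plot
})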
}
However, an altogether cleaner way is to forgo the for loop entirely and use list functions to build the result. There are several ways to do this; the following is the easiest in my opinion:
plot_data_column = function (data, column) {
# .data[[column]] replaces aes_string(), which is deprecated in current ggplot2
ggplot(data, aes(x = .data[[column]])) +
geom_histogram(fill = "lightgreen") +
xlab(column)
}
myplots <- lapply(colnames(data2), plot_data_column, data = data2)
This has several advantages: it’s simpler, and it won’t clutter the environment (with the loop variable i).
1 This might seem confusing: why does i <- i have any effect at all? Because performing the assignment creates a new, local variable with the same name as the variable in the outer scope. We could equally have used a different name, e.g. local_i <- i.
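The same capture problem can be reproduced without ggplot2, using plain closures. This is an analogy, not ggplot2's own code: aes() stores its expressions together with the environment they came from, much like a function does. Functions created in a plain for loop all look up the same i, while local() with i <- i, or lapply(), gives each one its own binding. The variable names below are illustrative.

```r
# 1. Plain for loop: every closure looks up i in the same environment
fs_loop <- list()
for (i in 1:3) {
  fs_loop[[i]] <- function() i
}
loop_vals <- sapply(fs_loop, function(f) f())    # 3 3 3: i's final value

# 2. local() + i <- i: each closure sees its own copy of i
fs_local <- list()
for (i in 1:3) {
  fs_local[[i]] <- local({
    i <- i
    function() i
  })
}
local_vals <- sapply(fs_local, function(f) f())  # 1 2 3

# 3. lapply(): each call of the anonymous function has its own environment
fs_lapply <- lapply(1:3, function(i) function() i)
lapply_vals <- sapply(fs_lapply, function(f) f())  # 1 2 3 (R >= 3.2.0)
```

The lapply() case relies on R 3.2.0 and later forcing the arguments passed by higher-order functions such as lapply().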
Because aes() quotes its arguments instead of evaluating them, data2[, i] is only evaluated when a plot is actually drawn. The plots stored in the list are drawn after the loop has finished, when i holds its final value (4). You can get around this by using eval(substitute(...)) to substitute the current value of i into the expression during each iteration.
myplots <- list() # new empty list
for (i in 1:4) {
p1 <- eval(substitute(
ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+
geom_histogram(fill="lightgreen") +
xlab(colnames(data2)[ i])
,list(i = i)))
print(i)
print(p1)
myplots[[i]] <- p1 # add each plot into plot list
}
multiplot(plotlist = myplots, cols = 4)
Using lapply() works too, because each call of the anonymous function has its own environment in which x is bound (here using mtcars as data):
library(ggplot2)
library(ggthemes) # for theme_wsj() and scale_colour_wsj()
plots <- lapply(seq_len(ncol(mtcars)), FUN = function(x) {
ggplot(data = mtcars) +
geom_line(aes(x = mpg, y = mtcars[ , x]), size = 1.4, color = "midnightblue", inherit.aes = FALSE) +
labs(x="Date", y="Value", title = "Revisions 1M", subtitle = colnames(mtcars)[x]) +
theme_wsj() +
scale_colour_wsj("colors6")
})
I ran the code from the question and from the answer, changing geom_histogram() to geom_bar() to avoid the error "StatBin requires a continuous x variable" (the columns of data2 are factors).
Here is the code:
Question
#generate plots
myplots <- list() # new empty list
for (i in 1:4) {
p1 <- ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+
geom_bar(fill="lightgreen") +
xlab(colnames(data2)[ i])
print(i)
print(p1)
myplots[[i]] <- p1 # add each plot into plot list
}
multiplot(plotlist = myplots, cols = 4)
#> Loading required package: grid
Answer
myplots <- vector('list', ncol(data2))
for (i in seq_along(data2)) {
message(i)
myplots[[i]] <- local({
i <- i
p1 <- ggplot(data2, aes(x = data2[[i]])) +
geom_bar(fill = "lightgreen") +
xlab(colnames(data2)[i])
print(p1)
})
}
multiplot(plotlist = myplots, cols = 4)
Same result using lapply:
plot_data_column = function (data, column) {
# .data[[column]] replaces aes_string(), which is deprecated in current ggplot2
ggplot(data, aes(x = .data[[column]])) +
geom_bar(fill = "lightgreen") +
xlab(column)
}
myplots <- lapply(colnames(data2), plot_data_column, data = data2)
multiplot(plotlist = myplots, cols = 4)
#> Loading required package: grid
Created on 2021-04-09 by the reprex package (v0.3.0)
Error in data - x : non-numeric argument to binary operator
My code is as follows:
library(fpc) # dbscan() with MinPts, method, seeds and showplot arguments is fpc::dbscan()
x <- as.factor(c(2, 2, 8, 5, 7, 6, 1, 4))
y <- as.factor(c(10, 5, 4, 8, 5, 4, 2, 9))
coordinates <- data.frame(x, y)
colnames(coordinates) <- c("x_coordinate", "y_coordinate")
print(coordinates)
point_clusters <- dbscan(coordinates, 2, MinPts = 2, scale = FALSE,
method = c("hybrid", "raw", "dist"), seeds = TRUE,
showplot = 1, countmode = NULL)
point_clusters
But I get the following error when running the code above:
> point_clusters <- dbscan(coordinates, 2, MinPts = 2, scale = FALSE, method = c("hybrid", "r ..." ... [TRUNCATED]
Error in data - x : non-numeric argument to binary operator
I don't know what the problem with the code above is.
I solved the problem for my needs. The root cause is that as.factor() makes x and y non-numeric. I also saw somewhere that the data need to be a numeric matrix, although I'm not sure about that. So, here is what I did:
x <- c(2, 2, 8, 5, 7, 6, 1, 4)
y <- c(10, 5, 4, 8, 5, 4, 2, 9)
coordinates <- matrix(c(x, y), nrow = 8, byrow = FALSE)
The rest of the code is the same as above. Now it works fine for me.
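Here is a small base-R sketch (not fpc code) showing where the error message can come from. as.matrix() on a data frame of factors produces a character matrix, and subtracting from a character matrix raises exactly "non-numeric argument to binary operator". Whether dbscan() hits this through an internal as.matrix() call is a plausible guess, not something verified here. The sketch also shows why as.numeric() alone is not the fix for factors that hold numbers: it returns the level codes.

```r
x <- as.factor(c(2, 2, 8, 5, 7, 6, 1, 4))
y <- as.factor(c(10, 5, 4, 8, 5, 4, 2, 9))

# A data frame of factors becomes a *character* matrix
m_chr <- as.matrix(data.frame(x, y))

# Arithmetic on it fails with the message from the question
msg <- tryCatch(m_chr - m_chr[1, ], error = conditionMessage)

# as.numeric() on a factor returns level codes, not the original numbers
codes <- as.numeric(x)  # 2 2 7 4 6 5 1 3

# Convert via character to recover the values, then build a numeric matrix
x_num <- as.numeric(as.character(x))
y_num <- as.numeric(as.character(y))
coordinates <- cbind(x_coordinate = x_num, y_coordinate = y_num)
```

The resulting coordinates matrix can then be passed to dbscan() as in the question.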
I want to extract pairs of values from a data frame, where each value is paired only with values from other columns. Each number in column 1 is paired with all the numbers in the columns to its right. Likewise, numbers in column 2 are only paired with numbers in columns 3 or above.
I have created a script that does it using a bird's nest of 'for' loops but I feel there should be a more elegant way to do it.
Example data:
structure(list(A = 1:3, B = 4:6, C = 7:9), .Names = c("A", "B",
"C"), class = "data.frame", row.names = c(NA, -3L))
Desired output:
structure(list(X1 = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3,
3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6), X2 = c(4, 5, 6, 7,
8, 9, 4, 5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 7, 8, 9, 7, 8, 9, 7,
8, 9)), .Names = c("X1", "X2"), row.names = c(NA, 27L), class = "data.frame")
Here's an approach using the data.table package and its very efficient CJ() and rbindlist() functions (assuming your data set is called df):
library(data.table)
res <- rbindlist(lapply(seq_len(length(df) - 1),
function(i) CJ(df[, i], unlist(df[, -(seq_len(i))]))))
You could then set your column names by reference (if you insist on "X1" and "X2") using setnames
setnames(res, 1:2, c("X1", "X2"))
You can also convert back to data.frame by reference (if you want to match your desired output "exactly") by using setDF()
setDF(res)
Here df is the input dataset
out1 <- do.call(rbind,lapply(1:(ncol(df)-1), function(i) {
x1 <- df[,i:(ncol(df))]
Un1 <-unique(unlist(x1[,-1]))
data.frame(X1=rep(x1[,1], each=length(Un1)), X2= Un1)}))
all.equal(out, out1) #if `out` is the expected output
#[1] TRUE
Another approach:
res <- do.call(rbind, unlist(lapply(seq(ncol(dat) - 1), function(x)
lapply(seq(x + 1, ncol(dat)), function(y)
"names<-"(expand.grid(dat[c(x, y)]), c("X1", "X2")))),
recursive = FALSE))
where dat is the name of your data frame.
You can sort the result with this command:
res[order(res[[1]], res[[2]]), ]
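For completeness, here is a base-R sketch that needs no extra packages and returns the rows already in the order of the desired output. Unlike the unique()-based answer, it keeps repeated values from the right-hand columns. pairs_right() is just an illustrative name.

```r
df <- data.frame(A = 1:3, B = 4:6, C = 7:9)

pairs_right <- function(d) {
  pieces <- lapply(seq_len(ncol(d) - 1), function(i) {
    # all values in the columns to the right of column i
    right <- unlist(d[-seq_len(i)], use.names = FALSE)
    # pair every value of column i with every value on its right
    data.frame(X1 = rep(d[[i]], each = length(right)),
               X2 = rep(right, times = nrow(d)))
  })
  do.call(rbind, pieces)
}

res <- pairs_right(df)
```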