Any idea why omitting NA values does not work with this code?
d <- density(Data$item2) %>%
na.omit()
I get the error: Error in density.default(Data$item2) : 'x' contains missing values
This didn't work either:
d <- Data %>% na.omit() %>%
density(Data$item2)
My data:
structure(list(item1 = c(5, 5, 5, 5, 4, 4, 2, 1, 3,
4, 4, 3, 2, 5, 2, 4, 4, 3, 6, 5, 3, 2, 5, 3, 3, 1, 3, 5, 1, 3,
2, 6, 3, 5, 4, 4, 3, 5, 6, 3, 2, 6, 6, 5, 2, 2, 2, 3, 3, 3),
item2 = c(5, 4, 5, 1, 2, 2, 3, 2, 2, 2, 2, 3, 2,
5, 1, 4, 4, 3, 3, 5, 3, 2, 4, 4, 3, 4, 4, 3, 7, NA, 2, 4,
2, 4, 2, 3, 5, 3, 5, 3, 2, 6, 6, 7, 2, 3, 2, 3, 1, 4)), row.names = c(NA,
-50L), class = c("tbl_df", "tbl", "data.frame"))
I also tried to omit all the NAs at the beginning with this code, but it did not solve the problem:
Data <- read_excel("C:/location/Data.xlsx") %>%
na.omit()
So, how to do this? Thanks for your help!
You need to remove the NA values from your data, not from the density object.
Data$item2 %>%
na.omit() %>%
density() %>%
plot()
Alternatively, use the na.rm = TRUE argument in density:
Data$item2 %>%
density(na.rm = TRUE) %>%
plot()
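Why the first attempt fails: in density(Data$item2) %>% na.omit() the pipe applies na.omit() to the result of density(), so density() has already received the NA. (Likewise, in the second attempt the pipe passes the cleaned data frame as density()'s first argument, while Data$item2 still refers to the original column with the NA.) A minimal base R sketch, using a small hypothetical vector x in place of Data$item2, shows the error and both fixes:

```r
# Hypothetical toy vector standing in for Data$item2
x <- c(5, 4, NA, 2, 3)

# density() stops on NA unless told otherwise
msg <- tryCatch(density(x), error = function(e) conditionMessage(e))
print(msg)

# Fix 1: drop the NAs before estimating
d1 <- density(na.omit(x))

# Fix 2: let density() drop them itself
d2 <- density(x, na.rm = TRUE)

# Both estimates are based on the same 4 non-missing observations
print(c(d1$n, d2$n))
```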
You can use:
d <- Data %>% na.omit()
density(d$item2)
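Note that na.omit() on the whole data frame drops every row that has an NA in any column, so item1 loses an observation too. That is harmless for density(d$item2), but it matters if you later use d$item1. A minimal sketch with a hypothetical data frame df:

```r
# Hypothetical two-column data frame with one NA in item2
df <- data.frame(item1 = c(5, 4, 3, 2),
                 item2 = c(1, NA, 3, 4))

# na.omit() on the data frame removes the whole second row,
# so item1 also loses its value 4
complete <- na.omit(df)
print(nrow(complete))

# na.omit() on a single column only removes that column's NA
item2_only <- na.omit(df$item2)
print(length(item2_only))

d <- density(complete$item2)
```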
I have a problem similar to the one found here. I have a loop that runs some modelling for different pairs of variables. I probably should not have used loops for this, but it is too late to change that now. I then want to create a plot for each run. At first, nothing showed up at all. After implementing the best answer from that post, I could at least print the plots, but they still were not stored. The idea is to generate the plots and then use grid.arrange to plot them together. Could someone show me how to fix this? Here is some random data and the loop from the example:
col1 <- c(2, 4, 1, 2, 5, 1, 2, 0, 1, 4, 4, 3, 5, 2, 4, 3, 3, 6, 5, 3, 6, 4, 3, 4, 4, 3, 4,
2, 4, 3, 3, 5, 3, 5, 5, 0, 0, 3, 3, 6, 5, 4, 4, 1, 3, 3, 2, 0, 5, 3, 6, 6, 2, 3)
col2 <- c(2, 4, 4, 0, 4, 4, 4, 4, 1, 4, 4, 3, 5, 0, 4, 5, 3, 6, 5, 3, 6, 4, 4, 2, 4, 4, 4,
1, 1, 2, 2, 3, 3, 5, 0, 3, 4, 2, 4, 5, 5, 4, 4, 2, 3, 5, 2, 6, 5, 2, 4, 6, 3, 3)
col3 <- c(2, 5, 4, 1, 4, 2, 3, 0, 1, 3, 4, 2, 5, 1, 4, 3, 4, 6, 3, 4, 6, 4, 1, 3, 5, 4, 3,
2, 1, 3, 2, 2, 2, 4, 0, 1, 4, 4, 3, 5, 3, 2, 5, 2, 3, 3, 4, 2, 4, 2, 4, 5, 1, 3)
data2 <- data.frame(col1,col2,col3)
data2[,1:3] <- lapply(data2[,1:3], as.factor)
colnames(data2)<- c("A","B","C")
myplots <- vector('list', ncol(data2))
for (i in seq_along(data2)) {
message(i)
myplots[[i]] <- local({
i <- i
p1 <- ggplot(data2, aes(x = data2[[i]])) +
geom_histogram(fill = "lightgreen") +
xlab(colnames(data2)[i])
print(p1)
})
}
I tried changing print to return, but to no avail. I get the plots printed in the View window in RStudio, but the plots are not stored at all.
You can use the following code:
library(ggplot2)
myplots <- vector('list', ncol(data2))
for (i in seq_along(data2)) {
  # !! injects the current column name now, so each stored plot keeps its own column
  myplots[[i]] <- ggplot(data2, aes(x = .data[[!!colnames(data2)[i]]])) +
    geom_histogram(fill = "lightgreen")
}
However, using lapply would be easier.
myplots <- lapply(names(data2), function(x)
ggplot(data2, aes(x = .data[[x]])) + geom_histogram(fill = "lightgreen"))
Plot the list of plots with grid.arrange.
gridExtra::grid.arrange(grobs = myplots)
Data (kept numeric, since geom_histogram() needs a continuous x variable):
A <- c(2, 4, 1, 2, 5, 1, 2, 0, 1, 4, 4, 3, 5, 2, 4, 3, 3, 6, 5, 3, 6, 4, 3, 4, 4, 3, 4,
2, 4, 3, 3, 5, 3, 5, 5, 0, 0, 3, 3, 6, 5, 4, 4, 1, 3, 3, 2, 0, 5, 3, 6, 6, 2, 3)
B <- c(2, 4, 4, 0, 4, 4, 4, 4, 1, 4, 4, 3, 5, 0, 4, 5, 3, 6, 5, 3, 6, 4, 4, 2, 4, 4, 4,
1, 1, 2, 2, 3, 3, 5, 0, 3, 4, 2, 4, 5, 5, 4, 4, 2, 3, 5, 2, 6, 5, 2, 4, 6, 3, 3)
C <- c(2, 5, 4, 1, 4, 2, 3, 0, 1, 3, 4, 2, 5, 1, 4, 3, 4, 6, 3, 4, 6, 4, 1, 3, 5, 4, 3,
2, 1, 3, 2, 2, 2, 4, 0, 1, 4, 4, 3, 5, 3, 2, 5, 2, 3, 3, 4, 2, 4, 2, 4, 5, 1, 3)
data2 <- data.frame(A,B,C)
Does this work for you? With patchwork and purrr::reduce we can combine these graphs, stacking them horizontally or vertically. Use slashes (/) instead of plus (+) in reduce to stack them vertically instead of horizontally. To plot a histogram you need continuous data; if you want to plot counts for discrete data, use geom_bar instead, and for geom_bar you would convert the columns to factors. I am not sure which plot you want, so I am assuming you have continuous data and want a histogram. Please let me know if it doesn't work in your scenario.
library(tidyverse)
library(patchwork)
data2 <- data.frame(col1, col2, col3) ## No conversion of factors
nm <- names(data2)
g1 <- reduce(map2(data2,nm, ~ggplot(data2,aes(x =.x )) + geom_histogram(fill = "yellow4") + labs(x=.y, y = 'count')), `+`)
print(g1)
Or with slashes:
g2 <- reduce(map2(data2,nm, ~ggplot(data2,aes(x =.x )) + geom_histogram(fill = "yellow4") + labs(x=.y, y = 'count')), `/`)
print(g2)
Or, if you want to use a for loop, you can do this as well (you have already initialised myplots, so I am not adding that here):
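for (i in seq_along(data2)) {
  # !! injects the column name now; plain data2[[i]] would be looked up
  # only when the plot is drawn, when i already holds its last value
  myplots[[i]] <-
    ggplot(data2, aes(x = .data[[!!names(data2)[i]]])) +
    geom_histogram(fill = "lightgreen") +
    xlab(colnames(data2)[i])
}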
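A caveat on building plots in a for loop: aes() records an expression such as data2[[i]] rather than its value and evaluates it only when the plot is drawn, by which time i holds its last value, so every stored plot can end up showing the last column. The local({ i <- i ... }) wrapper in the question guards against this, and lapply() avoids it because each call gets its own x. The same pitfall can be sketched in base R with plain closures (an analogy, not ggplot2 code):

```r
# Each function looks up `i` only when it is called
fs <- vector("list", 3)
for (i in 1:3) {
  fs[[i]] <- function() i
}
late <- sapply(fs, function(f) f())
print(late)   # every function sees the final value of i

# local() gives each function its own copy of i, taken at creation time
gs <- vector("list", 3)
for (i in 1:3) {
  gs[[i]] <- local({
    i <- i
    function() i
  })
}
early <- sapply(gs, function(g) g())
print(early)
```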
Explanation:
Now you can use reduce on myplots to arrange them (note that myplots should contain your 3 plots):
reduce(myplots, `+`)
The map2 and reduce solution is similar: map2 saves the 3 plots into a list, so the code below returns 3 objects:
plots <- map2(data2,nm, ~ggplot(data2,aes(x =.x )) + geom_histogram(fill = "yellow4") + labs(x=.y, y = 'count'))
To arrange them, all you have to do is add them with patchwork:
plots[[1]] + plots[[2]] + plots[[3]]
but that is quite cumbersome, so we use reduce to do it:
reduce(plots, `+`)
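reduce() folds the list from the left, so reduce(plots, `+`) builds ((plots[[1]] + plots[[2]]) + plots[[3]]), which is exactly how R parses plots[[1]] + plots[[2]] + plots[[3]]. A base R sketch using Reduce() (the base counterpart of purrr::reduce) with character placeholders instead of ggplot objects makes the nesting visible:

```r
# Placeholders standing in for the three ggplot objects
plots <- c("p1", "p2", "p3")

# Record each binary step as text instead of performing it
show_plus  <- function(a, b) paste0("(", a, " + ", b, ")")
show_slash <- function(a, b) paste0("(", a, " / ", b, ")")

horizontal <- Reduce(show_plus, plots)
vertical   <- Reduce(show_slash, plots)
print(horizontal)  # "((p1 + p2) + p3)"
print(vertical)    # "((p1 / p2) / p3)"

# R parses a chain of + the same way: the left operand is p1 + p2
print(deparse(quote(p1 + p2 + p3)[[2]]))  # "p1 + p2"
```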
Also, as I mentioned earlier, you can use a slash instead of a plus to arrange the plots vertically rather than horizontally. With the plot_layout option in patchwork you can create more flexible layouts. You can check here.
With gridExtra: gridExtra::grid.arrange(grobs = myplots). Again, instead of myplots you can pass any list that contains ggplot objects.
I'm currently trying to produce a result similar to this link. I have a large number of columns and several different labels for the x-axis.
col1 <- c(2, 4, 1, 2, 5, 1, 2, 0, 1, 4, 4, 3, 5, 2, 4, 3, 3, 6, 5, 3, 6, 4, 3, 4, 4, 3, 4,
2, 4, 3, 3, 5, 3, 5, 5, 0, 0, 3, 3, 6, 5, 4, 4, 1, 3, 3, 2, 0, 5, 3, 6, 6, 2, 3,
3, 1, 5, 3, 4, 6)
col2 <- c(2, 4, 4, 0, 4, 4, 4, 4, 1, 4, 4, 3, 5, 0, 4, 5, 3, 6, 5, 3, 6, 4, 4, 2, 4, 4, 4,
1, 1, 2, 2, 3, 3, 5, 0, 3, 4, 2, 4, 5, 5, 4, 4, 2, 3, 5, 2, 6, 5, 2, 4, 6, 3, 3,
3, 1, 4, 3, 5, 4)
col3 <- c(2, 5, 4, 1, 4, 2, 3, 0, 1, 3, 4, 2, 5, 1, 4, 3, 4, 6, 3, 4, 6, 4, 1, 3, 5, 4, 3,
2, 1, 3, 2, 2, 2, 4, 0, 1, 4, 4, 3, 5, 3, 2, 5, 2, 3, 3, 4, 2, 4, 2, 4, 5, 1, 3,
3, 3, 4, 3, 5, 4)
col4 <- c(2, 5, 2, 1, 4, 1, 3, 4, 1, 3, 5, 2, 4, 3, 5, 3, 4, 6, 3, 4, 6, 4, 3, 2, 5, 5, 4,
2, 3, 2, 2, 3, 3, 4, 0, 1, 4, 3, 3, 5, 4, 4, 4, 3, 3, 5, 4, 3, 5, 3, 6, 6, 4, 2,
3, 3, 4, 4, 4, 6)
data2 <- data.frame(col1,col2,col3,col4)
data2[,1:4] <- lapply(data2[,1:4], as.factor)
colnames(data2)<- c("A","B","C", "D")
> x.axis.list
[[1]]
expression(beta[paste(1, ",", 1L)])
[[2]]
expression(beta[paste(1, ",", 2L)])
[[3]]
expression(beta[paste(1, ",", 3L)])
[[4]]
expression(beta[paste(1, ",", 4L)])
myplots <- vector('list', ncol(data2))
for (i in seq_along(data2)) {
message(i)
myplots[[i]] <- local({
i <- i
p1 <- ggplot(data2, aes(x = data2[[i]])) +
geom_histogram(fill = "lightgreen") +
xlab(x.axis.list[[i]])
print(p1)
})
}
In the past, I've been able to do something similar, where I just put x.axis.list[[i]] in my loop to change the symbols. However, I keep getting the word "expression" on the axis. The Beta symbol and the subscript are correct, but the word "expression" remains. I'm not sure what I'm doing wrong; for a moment I was able to produce a plot without "expression", but it has since come back.
I want to be able to produce this plot, or one with the title on the y-axis, without the word "expression".
My image currently looks like this. I'm not worried about this example data or the resulting plot; I'm wondering how to get rid of "expression" so that only the math symbol shows.
Thanks in advance.
You can do:
for (i in seq_along(data2)) {
df <- data2[i]
names(df)[1] <- "x"
myplots[[i]] <- local({
p1 <- ggplot(df, aes(x = x)) +
geom_bar(fill = "lightgreen", stat = "count") +
xlab(x.axis.list[[i]])
})
}
And we can show all the plots together:
library(patchwork)
(myplots[[1]] + myplots[[2]]) / (myplots[[3]] + myplots[[4]])
Note I created the expression list like this:
x.axis.list <- lapply(1:4, function(i){
parse(text = paste0("beta[paste(1, \",\", ", i, ")]"))
})
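To narrow down where "expression" comes from, it can help to check what kind of object each label is. The answer's parse(text = ...) construction yields objects of class expression, which plotmath draws as math. If your own list instead holds unevaluated calls to expression() (an assumption; the post does not show how the list was built), plotmath would likely draw expression(...) as an ordinary function call, matching the symptom, and eval() turns such a call into a real expression. A base R sketch of both checks:

```r
# Build the axis labels exactly as in the answer above
x.axis.list <- lapply(1:4, function(i) {
  parse(text = paste0("beta[paste(1, \",\", ", i, ")]"))
})

# Each element should be an expression vector
print(sapply(x.axis.list, is.expression))

# The plotmath text of the first label
print(deparse(x.axis.list[[1]][[1]]))

# Hypothetical case: a label stored as an unevaluated call to expression()
maybe_call <- quote(expression(beta[paste(1, ",", 1L)]))
print(class(maybe_call))  # "call", not "expression"
fixed <- eval(maybe_call)  # evaluating the call yields a real expression
print(is.expression(fixed))
```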
I have a 5 x 5 scatterplot matrix that I created using ggplot. I made histograms for the X and Y axes, but I also need a histogram along the diagonal of the matrix.
Edit: added the data.
data <- structure(c(5, 5, 5, 3, 4, 4, 2, 4, 4, 4, 5, 4, 5, 4, 5, 1, 4,
3, 5, 4, 5, 2, 3, 3, 3, 4, 2, 5, 2, 4, 3, 3, 3, 3, 5, 4, 3, 4,
4, 4, 3, 3, 5, 3, 1, 3, 4, 5, 5, 3, 2, 4, 5, 4, 4, 5, 3, 5, 1,
3, 4, 5, 3, 2, 4, 3, 4, 1, 4, 3, 5, 2, 3, 3, 4, 5, 5, 5, 4, 3,
1, 1, 4, 2, 5, 4, 4, 1, 5, 3, 4, 2, 4, 3, 4, 4, 5, 4, 5, 1, 4,
5, 5, 5, 3, 4, 4, 2, 4, 4, 4, 5, 4, 5, 4, 5, 1, 4, 3, 5, 4, 5,
2, 3, 3, 3, 4, 2, 5, 2, 4, 3, 3, 3, 3, 5, 4, 3, 4, 4, 4, 3, 3,
5, 3, 1, 3, 4, 5, 5, 3, 2, 4, 5, 4, 4, 5, 3, 5, 1, 3, 3, 5, 2,
1, 1, 4, 5, 4, 5, 1, 1, 5, 4, 5, 3, 1, 3, 5, 5, 5, 5, 2, 1, 1,
1, 2, 3, 5, 1, 2, 5, 3, 5, 4, 5, 2, 2, 5, 2, 3, 5), .Dim = c(101L,
2L))
Here is the code:
library(ggplot2)
library(gridExtra)
data <- as.data.frame(data)
x <- data$V2
y <- data$V1
xhist <- qplot(x, geom="histogram", binwidth = 0.5)
yhist <- qplot(y, geom="histogram", binwidth = 0.5) + coord_flip()
none <- ggplot()+geom_point(aes(1,1), colour="white") +
theme(axis.ticks=element_blank(), panel.background=element_blank(),
axis.text.x=element_blank(), axis.text.y=element_blank(),
axis.title.x=element_blank(), axis.title.y=element_blank())
g1 <- ggplot(data, aes(x,y)) +
geom_point(size = 1, position = position_jitter(w=0.3, h=0.3))
grid.arrange(yhist, g1, none, xhist, ncol=2, nrow=2, widths=c(1, 4), heights=c(4,1))
Is there a way to plot this "z-axis" (diagonal) histogram directly from this data alone? What I want is to remove the 'none' panel and instead place a histogram of the data points across the diagonal.
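One possible reading of "a histogram across the diagonal" is the distribution of the points perpendicular to the line y = x, i.e. of the difference x - y: all points on the same diagonal line share the same difference. A hedged base R sketch that computes those counts for a small hypothetical data frame toy standing in for data (drawing the result in place of the 'none' panel is left open):

```r
# Hypothetical 1-5 ratings standing in for data$V1 (y) and data$V2 (x)
toy <- data.frame(V1 = c(1, 2, 2, 3, 5, 4, 3, 1),
                  V2 = c(1, 3, 2, 5, 5, 2, 3, 4))

# Points on the same diagonal (parallel to y = x) share the same x - y
diag_offset <- toy$V2 - toy$V1

# Keep every possible offset for 1-5 data (-4 to 4) so empty diagonals show 0
counts <- table(factor(diag_offset, levels = -4:4))
print(counts)

barplot(counts, xlab = "x - y (diagonal offset)", ylab = "count")
```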
I have a dataset (dat), which I am hard-coding in here:
dat = c(5, 9, 5, 6, 5, 6, 8, 4, 6, 4, 6, 6, 4, 6, 4, 6, 5, 5, 6, 5, 6, 7, 4, 5, 4, 4, 6, 4, 4, 5, 7, 6, 3, 5, 5, 5, 5, 4, 6, 3, 6, 5, 4, 6, 5, 8, 4, 8, 5, 5, 4, 4, 6, 6, 4, 6, 4, 7, 4, 1, 4, 6, 3, 6, 3, 4, 6, 6, 3, 6, 6, 2, 5, 5, 4, 7, 6)
table(dat)
Running the table function above on the data shows a count of 1 for the value 1 and a count of 1 for the value 2. However, when I plot the data using hist, I get a count of 2.
hist(dat, col="lightgreen", labels = TRUE, xlim=c(0,10), ylim=c(0,27))
This is the first problem. The other problem is that I want to label each bin on the x-axis with its value (there should be 11 bins, labeled 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10). Even though I have no 0 or 10 values, I would like to show that they have a count of 0 and have their bins labeled like the rest. How can I accomplish that?
Thanks.
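The extra count comes from how hist() builds bins: by default they are right-closed intervals (a, b], and include.lowest = TRUE also closes the first one on the left. With the default breaks for this data (which pretty() places at 1, 2, ..., 9), the first bin is [1, 2] and holds both the 1 and the 2. A small base R sketch with a hypothetical vector x:

```r
x <- c(1, 2, 3, 3, 4)

# Default right-closed bins: [1,2], (2,3], (3,4]
right_closed <- hist(x, breaks = 1:4, plot = FALSE)
print(right_closed$counts)  # 2 2 1

# right = FALSE gives left-closed bins: [1,2), [2,3), [3,4]
left_closed <- hist(x, breaks = 1:4, right = FALSE, plot = FALSE)
print(left_closed$counts)   # 1 1 3
```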
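# bins (-1,0], (0,1], ..., (8,9]: each integer value gets its own bin
am = hist(dat, col="lightgreen", labels = TRUE,
breaks=seq(min(dat)-2,max(dat)),
axes=FALSE)
axis(2)
# label each bin at its midpoint with its upper edge (0, 1, ..., 9)
axis(1,at=am$mids,seq(min(dat)-1,max(dat)))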
Did you mean something like this?
hist(dat, col="lightgreen", labels = TRUE,
xlim=c(0,10), ylim=c(0,27), breaks = 0:10, at=0:10)
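Another option (a sketch, not taken from either answer): put the breaks at half-integers, so every integer from 0 to 10 gets its own bin centred on it. Each value then lands in exactly one bin, values that never occur show a count of 0, and the bin midpoints can label the axis directly:

```r
dat <- c(5, 9, 5, 6, 5, 6, 8, 4, 6, 4, 6, 6, 4, 6, 4, 6, 5, 5, 6, 5, 6, 7, 4, 5, 4, 4, 6, 4, 4, 5, 7, 6, 3, 5, 5, 5, 5, 4, 6, 3, 6, 5, 4, 6, 5, 8, 4, 8, 5, 5, 4, 4, 6, 6, 4, 6, 4, 7, 4, 1, 4, 6, 3, 6, 3, 4, 6, 6, 3, 6, 6, 2, 5, 5, 4, 7, 6)

# Breaks at -0.5, 0.5, ..., 10.5 give 11 unit-wide bins centred on 0:10
h <- hist(dat, breaks = seq(-0.5, 10.5, by = 1),
          col = "lightgreen", labels = TRUE, ylim = c(0, 27), axes = FALSE)
axis(2)
axis(1, at = h$mids, labels = h$mids)  # one label per bin: 0, 1, ..., 10

print(h$counts)
```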