R: trying to omit missing values before plotting

Any idea why omitting N/A does not work with this code?
d <- density(Data$item2) %>%
na.omit()
I get the error Error in density.default(Data$item2) : 'x' contains missing values
This didn't work either
d <- Data %>% na.omit() %>%
density(Data$item2)
My data
structure(list(item1 = c(5, 5, 5, 5, 4, 4, 2, 1, 3,
4, 4, 3, 2, 5, 2, 4, 4, 3, 6, 5, 3, 2, 5, 3, 3, 1, 3, 5, 1, 3,
2, 6, 3, 5, 4, 4, 3, 5, 6, 3, 2, 6, 6, 5, 2, 2, 2, 3, 3, 3),
item2 = c(5, 4, 5, 1, 2, 2, 3, 2, 2, 2, 2, 3, 2,
5, 1, 4, 4, 3, 3, 5, 3, 2, 4, 4, 3, 4, 4, 3, 7, NA, 2, 4,
2, 4, 2, 3, 5, 3, 5, 3, 2, 6, 6, 7, 2, 3, 2, 3, 1, 4)), row.names = c(NA,
-50L), class = c("tbl_df", "tbl", "data.frame"))
I also tried to omit all the N/A in the beginning with this code, but it did not solve the problem
Data <- read_excel("C:/location/Data.xlsx") %>%
na.omit()
So, how to do this? Thanks for your help!

You need to remove the NA values from your data, not from the density object.
Data$item2 %>%
na.omit() %>%
density() %>%
plot()
Alternatively, use the na.rm = TRUE argument in density:
Data$item2 %>%
density(na.rm = TRUE) %>%
plot()
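As a quick check that the two approaches agree, here is a minimal base-R sketch. The vector x is made-up illustration data, not the asker's Data$item2.

```r
# Hypothetical sample vector with one missing value (illustration only)
x <- c(5, 4, 5, 1, 2, NA, 3, 2, 4, 4)

# Option 1: drop the NA before calling density()
d1 <- density(na.omit(x))

# Option 2: let density() drop the NA itself
d2 <- density(x, na.rm = TRUE)

# Both estimates are computed from the same 9 non-missing observations
same_estimate <- isTRUE(all.equal(d1$y, d2$y))
```

Either form should be fine here; na.rm = TRUE simply keeps the NA handling inside the density() call.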

You can use:
d <- Data %>% na.omit()
density(d$item2)
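One thing to keep in mind with this approach: na.omit() on a whole data frame drops every row that has an NA in any column, so it can also discard valid values of the item you want to plot. A small sketch with a made-up data frame (df and its columns are hypothetical):

```r
# Hypothetical data: item2 and item3 each have one NA, in different rows
df <- data.frame(
  item1 = c(5, 4, 3, 2, 1),
  item2 = c(5, NA, 3, 2, 4),
  item3 = c(NA, 4, 6, 5, 7)
)

# Rows that are complete in every column
rows_all <- nrow(na.omit(df))

# Non-missing values of item2 alone
n_item2 <- sum(!is.na(df$item2))

# Dropping NAs only for the plotted column keeps all of its valid values
d <- density(df$item2, na.rm = TRUE)
```

If only one column is plotted, removing NAs from that column alone avoids losing observations because of gaps elsewhere.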

Related

R: item name missing in the plot legend

With this code I get the plot I want
d <- density(mydata$item1)
plot(d)
This code is the same, but omits NAs. However, there is a flaw in the plot's legend: as you can see, it doesn't say which item is plotted (it shows x = .).
Can you tell what the problem is and how to fix it? Thank you for your help.
My data
structure(list(item1 = c(5, 5, 5, 5, 4, 4, 2, 1, 3, 4, 4, 3,
2, 5, 2, 4, 4, 3, 6, 5, 3, 2, 5, 3, 3, 1, 3, 5, 1, 3, 2, 6, 3,
5, 4, 4, 3, 5, 6, 3, 2, 6, 6, 5, 2, 2, 2, 3, 3, 3), item2 = c(5,
4, 5, 1, 2, 2, 3, 2, 2, 2, 2, 3, 2, 5, 1, 4, 4, 3, 3, 5, 3, 2,
4, 4, 3, 4, 4, 3, 7, NA, 2, 4, 2, 4, 2, 3, 5, 3, 5, 3, 2, 6,
6, 7, 2, 3, 2, 3, 1, 4), item3 = c(5, 5, 6, 7, 3, 4, 5, 2, 2,
6, 4, 2, 5, 7, 1, 2, 4, 5, 6, 6, 5, 2, 6, 5, 6, 4, 6, 4, 6, 4,
6, 5, 5, 6, 6, 6, 5, 6, 7, 5, 5, 7, 7, 6, 2, 6, 6, 6, 5, 3)), row.names = c(NA,
-50L), class = c("tbl_df", "tbl", "data.frame"))
Use the main = argument inside plot to make the title say whatever you want it to.
Data$item2 %>%
na.omit() %>%
density() %>%
plot(main = 'Density of Data$item2')
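The odd default title comes from how plot() labels a density object: when main is not supplied, plot.density() uses the call stored in the object. With %>%, the piped value reaches density() under the placeholder name ., which is why the title shows x = .. A minimal base-R sketch (x is made-up data):

```r
# Hypothetical sample vector with one missing value (illustration only)
x <- c(5, 4, 5, 1, 2, NA, 3, 2, 4, 4)

d <- density(x, na.rm = TRUE)

# The call recorded in the density object; plot() shows it when main is missing
default_title <- deparse(d$call)

# Default title: the recorded call
plot(d)

# Explicit title: whatever text you pass as main
plot(d, main = "Density of item2")
```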
You had a small typo in your code: the density() call was piped into a plot() call that referred to the variable being assigned, which may have produced the strange plot.
In general, density() does not accept NA values according to the documentation, so you have to set na.rm = TRUE (the default is FALSE) for the plot to work correctly. Also, as @AllanCameron pointed out in an earlier answer, you can set the plot title manually.
d <- density(mydata$item2, na.rm = TRUE)
plot(d)
Alternatively, you may be able to substitute, interpolate or impute the NA values so that you do not have to remove them before the density() call, though this obviously depends on your data, context and goals.

How can I change size of y-axis text labels on a likert() object in R?

I'm working with the likert() library to generate nice looking diverging stacked bar charts in R. Most of the formatting has come together, but I can't seem to find a way to shrink the text for the y-axis labels (e.g. "You and your family in the UK", "People in your local area..." etc.) which are too large for the plot. Any ideas here? I'm starting to wonder if I need to revert to ggplot, which will require more code, but have more customisability...
# Ingest data to make reproducible example:
climate_experience_data <- structure(list(Q25_self_and_family = c(4, 2, 3, 5, 3, 3, 4, 2,
4, 2, 4, 4, 3, 3, 2, 5, 3, 4, 1, 3, 3, 2, 4, 2, 2, 2, 4, 3, 3,
3, 2, 5, 5, 4, 2, 2, 2, 3, 1, 3, 2, 1, 2, 4, 2), Q25_local_area = c(3,
3, 3, 5, 3, 2, 4, 2, 4, 2, 4, 3, 2, 3, 2, 5, 4, 5, 1, 4, 3, 3,
4, 2, 3, 2, 3, 3, 2, 3, 2, 5, 5, 2, 2, 2, 2, 3, 1, 1, 2, 1, 2,
4, 3), Q25_uk = c(4, 3, 3, 5, 2, 3, 5, 2, 4, 2, 4, 3, 3, 3, 3,
5, 4, 5, 2, 3, 3, 2, 4, 2, 4, 3, 4, 3, 2, 4, 4, 5, 5, 4, 3, 3,
2, 4, 2, 5, 2, 2, 2, 3, 3), Q25_outside_uk = c(4, 4, 3, 5, 4,
4, 5, 2, 4, 3, 3, 3, 3, 4, 3, 5, 4, 5, 4, 3, 3, 2, 4, 2, 5, 3,
3, 2, 2, 3, 4, 4, 5, 4, 4, 3, 2, 4, 4, 5, 2, 3, 2, 2, 2)), row.names = c(NA,
-45L), class = c("tbl_df", "tbl", "data.frame"))
# load libraries:
require(tidyverse)
require(likert)
# Q25 - generate diverging stacked bar chart using likert()
q25_data <- select(climate_experience_data, Q25_self_and_family:Q25_outside_uk)
names(q25_data) <- c("You and your family in the UK", "People in your local area or city", "The UK as a whole", "Your family and/or friends living outside the UK")
# Set up levels text for question responses
q25_levels <- paste(c("not at all", "somewhat", "moderately", "very", "extremely"),
"serious")
q25_likert_table <- q25_data %>%
mutate(across(everything(),
factor, ordered = TRUE, levels = 1:5, labels=q25_levels)) %>%
as.data.frame() %>%
likert()
# make plot:
plot(q25_likert_table, wrap=20, text.size=3, ordered=FALSE, low.color='#B18839', high.color='#590048') +
ggtitle(title) +
labs(title = "How serious a threat do you think \nclimate change poses to the following?", y="") +
guides(fill = guide_legend(title = NULL)) +
theme_ipsum_rc() +
theme()
Here's a sample of output:
As your plot is still a ggplot object you could adjust the size of the y axis labels via theme(axis.text.y = ...):
library(tidyverse)
library(likert)
library(hrbrthemes)
q25_likert_table <- q25_data %>%
mutate(across(everything(),
factor,
ordered = TRUE, levels = 1:5, labels = q25_levels
)) %>%
as.data.frame() %>%
likert()
plot(q25_likert_table, wrap = 20, text.size = 3, ordered = FALSE, low.color = "#B18839", high.color = "#590048") +
ggtitle(title) +
labs(title = "How serious a threat do you think \nclimate change poses to the following?", y = "") +
guides(fill = guide_legend(title = NULL)) +
theme_ipsum_rc() +
theme(axis.text.y = element_text(size = 4))
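The same theme() tweak works on any ggplot object. Here is a minimal sketch with made-up data (a plain bar chart rather than a likert object; the data frame and its values are assumptions for illustration), assuming ggplot2 is installed:

```r
library(ggplot2)

# Hypothetical data with long category labels
df <- data.frame(
  question = c("You and your family in the UK",
               "People in your local area or city",
               "The UK as a whole"),
  share = c(40, 55, 70)
)

p <- ggplot(df, aes(x = share, y = question)) +
  geom_col() +
  labs(y = "") +
  # Shrink only the y-axis tick labels; size is given in points
  theme(axis.text.y = element_text(size = 6))
```

Add the theme() call after any complete theme such as theme_ipsum_rc(); a complete theme added later would reset the setting.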

Plots are not stored in list during loop

I have a problem similar to what is found here. I have a loop that runs some modelling for different pairs of variables. I probably should not have used a loop for this, but it is too late to change that now. I then want to create a plot for each run. At first nothing was shown; after implementing the best answer from that post I could at least print the plots, but they were still not stored. The idea is to generate the plots and then use grid.arrange to plot them together. Could someone show how to fix this? Here is some random data and the loop from the example:
col1 <- c(2, 4, 1, 2, 5, 1, 2, 0, 1, 4, 4, 3, 5, 2, 4, 3, 3, 6, 5, 3, 6, 4, 3, 4, 4, 3, 4,
2, 4, 3, 3, 5, 3, 5, 5, 0, 0, 3, 3, 6, 5, 4, 4, 1, 3, 3, 2, 0, 5, 3, 6, 6, 2, 3)
col2 <- c(2, 4, 4, 0, 4, 4, 4, 4, 1, 4, 4, 3, 5, 0, 4, 5, 3, 6, 5, 3, 6, 4, 4, 2, 4, 4, 4,
1, 1, 2, 2, 3, 3, 5, 0, 3, 4, 2, 4, 5, 5, 4, 4, 2, 3, 5, 2, 6, 5, 2, 4, 6, 3, 3)
col3 <- c(2, 5, 4, 1, 4, 2, 3, 0, 1, 3, 4, 2, 5, 1, 4, 3, 4, 6, 3, 4, 6, 4, 1, 3, 5, 4, 3,
2, 1, 3, 2, 2, 2, 4, 0, 1, 4, 4, 3, 5, 3, 2, 5, 2, 3, 3, 4, 2, 4, 2, 4, 5, 1, 3)
data2 <- data.frame(col1,col2,col3)
data2[,1:3] <- lapply(data2[,1:3], as.factor)
colnames(data2)<- c("A","B","C")
myplots <- vector('list', ncol(data2))
for (i in seq_along(data2)) {
message(i)
myplots[[i]] <- local({
i <- i
p1 <- ggplot(data2, aes(x = data2[[i]])) +
geom_histogram(fill = "lightgreen") +
xlab(colnames(data2)[i])
print(p1)
})
}
I tried changing print to return, but to no avail. The plots are printed in the View window in RStudio, but they are not stored at all.
You can use the following code -
library(ggplot2)
myplots <- vector('list', ncol(data2))
for (i in seq_along(data2)) {
# aes() is evaluated lazily, so inject the column name with !! to fix it per iteration
myplots[[i]] <- ggplot(data2, aes(x = .data[[!!colnames(data2)[i]]])) +
geom_histogram(fill = "lightgreen")
}
However, using lapply would be easier.
myplots <- lapply(names(data2), function(x)
ggplot(data2, aes(x = .data[[x]])) + geom_histogram(fill = "lightgreen"))
Plot the list of plots with grid.arrange.
gridExtra::grid.arrange(grobs = myplots)
data
A <- c(2, 4, 1, 2, 5, 1, 2, 0, 1, 4, 4, 3, 5, 2, 4, 3, 3, 6, 5, 3, 6, 4, 3, 4, 4, 3, 4,
2, 4, 3, 3, 5, 3, 5, 5, 0, 0, 3, 3, 6, 5, 4, 4, 1, 3, 3, 2, 0, 5, 3, 6, 6, 2, 3)
B <- c(2, 4, 4, 0, 4, 4, 4, 4, 1, 4, 4, 3, 5, 0, 4, 5, 3, 6, 5, 3, 6, 4, 4, 2, 4, 4, 4,
1, 1, 2, 2, 3, 3, 5, 0, 3, 4, 2, 4, 5, 5, 4, 4, 2, 3, 5, 2, 6, 5, 2, 4, 6, 3, 3)
C <- c(2, 5, 4, 1, 4, 2, 3, 0, 1, 3, 4, 2, 5, 1, 4, 3, 4, 6, 3, 4, 6, 4, 1, 3, 5, 4, 3,
2, 1, 3, 2, 2, 2, 4, 0, 1, 4, 4, 3, 5, 3, 2, 5, 2, 3, 3, 4, 2, 4, 2, 4, 5, 1, 3)
data2 <- data.frame(A,B,C)
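One caveat when building ggplot objects in a for loop: aes() captures its arguments unevaluated, so a loop index such as i is only looked up when the plot is drawn, by which time the loop has finished. Injecting the value with !! (or building the plots inside a function, as lapply does) avoids every plot ending up with the last column. A minimal sketch with made-up data, assuming ggplot2 is installed:

```r
library(ggplot2)

# Hypothetical data: two columns with clearly different ranges
df <- data.frame(A = c(1, 2, 2, 3, 3, 3), B = c(10, 10, 20, 20, 30, 30))

plots <- vector("list", ncol(df))
for (i in seq_along(df)) {
  col <- names(df)[i]
  # !!col injects the current column name into the aesthetic right away
  plots[[i]] <- ggplot(df, aes(x = .data[[!!col]])) +
    geom_histogram(bins = 5) +
    xlab(col)
}

# Bin positions computed for each stored plot; they differ per column
x1 <- layer_data(plots[[1]])$x
x2 <- layer_data(plots[[2]])$x
```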
Does this work for you? With patchwork and purrr::reduce we can combine these plots, either side by side or stacked. Use a slash (/) instead of a plus (+) in reduce to stack the plots vertically rather than horizontally. A histogram requires continuous data; if you want to plot counts for discrete data, try geom_bar instead, in which case you need to convert the columns to factors. I am not sure which plot you want, so I am assuming that your data are continuous and that you want histograms. Please let me know if it doesn't work in your scenario.
library(tidyverse)
library(patchwork)
data2 <- data.frame(col1, col2, col3) ## No conversion of factors
nm <- names(data2)
g1 <- reduce(map2(data2,nm, ~ggplot(data2,aes(x =.x )) + geom_histogram(fill = "yellow4") + labs(x=.y, y = 'count')), `+`)
print(g1)
Or with slashes:
g2 <- reduce(map2(data2,nm, ~ggplot(data2,aes(x =.x )) + geom_histogram(fill = "yellow4") + labs(x=.y, y = 'count')), `/`)
print(g2)
Or, if you prefer a for loop, you can do the following (myplots is already initialised, so I am not repeating that here):
for (i in seq_along(data2)) {
# inject the column name with !! so each plot keeps its own column
myplots[[i]] <-
ggplot(data2, aes(x = .data[[!!colnames(data2)[i]]])) +
geom_histogram(fill = "lightgreen") +
xlab(colnames(data2)[i])
}
Explanation:
You can now use reduce on myplots to arrange the plots. Note that myplots should contain your 3 plots at this point:
reduce(myplots, `+`)
The map2 and reduce solution is similar. map2 saves 3 plots into a list, so the code below returns 3 objects:
plots <- map2(data2,nm, ~ggplot(data2,aes(x =.x )) + geom_histogram(fill = "yellow4") + labs(x=.y, y = 'count'))
To arrange them with patchwork you could write plots[[1]] + plots[[2]] + plots[[3]], but that is cumbersome, so we use reduce instead:
reduce(plots, `+`)
As mentioned earlier, you can use a slash instead of a plus to arrange the plots vertically rather than horizontally. With the plot_layout option in patchwork you can create more flexible layouts.
With gridExtra: gridExtra::grid.arrange(grobs = myplots). Again, instead of myplots you can pass any list that contains ggplot objects.
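To make the arrangement step concrete, here is a minimal sketch that combines a list of plots horizontally and vertically. It uses made-up data and base R's Reduce in place of purrr::reduce (a substitution on my part), and assumes ggplot2 and patchwork are installed:

```r
library(ggplot2)
library(patchwork)

# Hypothetical continuous data (illustration only)
df <- data.frame(A = c(1, 2, 2, 3, 3, 3), B = c(2, 2, 4, 4, 6, 6))

# One histogram per column, stored in a list
plots <- lapply(names(df), function(nm) {
  ggplot(df, aes(x = .data[[nm]])) + geom_histogram(bins = 5)
})

# `+` places plots next to each other, `/` stacks them
side_by_side <- Reduce(`+`, plots)
stacked      <- Reduce(`/`, plots)

# wrap_plots() takes the list directly; ncol = 1 also stacks the plots
also_stacked <- wrap_plots(plots, ncol = 1)
```

wrap_plots() may be the more convenient choice when the number of plots is not known in advance.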

Math Symbols within for loop of GGplots in R

I'm trying to achieve a result similar to the one in this link. I have a large number of columns and several different labels for the x-axis.
col1 <- c(2, 4, 1, 2, 5, 1, 2, 0, 1, 4, 4, 3, 5, 2, 4, 3, 3, 6, 5, 3, 6, 4, 3, 4, 4, 3, 4,
2, 4, 3, 3, 5, 3, 5, 5, 0, 0, 3, 3, 6, 5, 4, 4, 1, 3, 3, 2, 0, 5, 3, 6, 6, 2, 3,
3, 1, 5, 3, 4, 6)
col2 <- c(2, 4, 4, 0, 4, 4, 4, 4, 1, 4, 4, 3, 5, 0, 4, 5, 3, 6, 5, 3, 6, 4, 4, 2, 4, 4, 4,
1, 1, 2, 2, 3, 3, 5, 0, 3, 4, 2, 4, 5, 5, 4, 4, 2, 3, 5, 2, 6, 5, 2, 4, 6, 3, 3,
3, 1, 4, 3, 5, 4)
col3 <- c(2, 5, 4, 1, 4, 2, 3, 0, 1, 3, 4, 2, 5, 1, 4, 3, 4, 6, 3, 4, 6, 4, 1, 3, 5, 4, 3,
2, 1, 3, 2, 2, 2, 4, 0, 1, 4, 4, 3, 5, 3, 2, 5, 2, 3, 3, 4, 2, 4, 2, 4, 5, 1, 3,
3, 3, 4, 3, 5, 4)
col4 <- c(2, 5, 2, 1, 4, 1, 3, 4, 1, 3, 5, 2, 4, 3, 5, 3, 4, 6, 3, 4, 6, 4, 3, 2, 5, 5, 4,
2, 3, 2, 2, 3, 3, 4, 0, 1, 4, 3, 3, 5, 4, 4, 4, 3, 3, 5, 4, 3, 5, 3, 6, 6, 4, 2,
3, 3, 4, 4, 4, 6)
data2 <- data.frame(col1,col2,col3,col4)
data2[,1:4] <- lapply(data2[,1:4], as.factor)
colnames(data2)<- c("A","B","C", "D")
> x.axis.list
[[1]]
expression(beta[paste(1, ",", 1L)])
[[2]]
expression(beta[paste(1, ",", 2L)])
[[3]]
expression(beta[paste(1, ",", 3L)])
[[4]]
expression(beta[paste(1, ",", 4L)])
myplots <- vector('list', ncol(data2))
for (i in seq_along(data2)) {
message(i)
myplots[[i]] <- local({
i <- i
p1 <- ggplot(data2, aes(x = data2[[i]])) +
geom_histogram(fill = "lightgreen") +
xlab(x.axis.list[[i]])
print(p1)
})
}
In the past I've been able to do something similar, simply putting x.axis.list[[i]] in my loop to change the symbols. However, I keep getting the term "expression" on the axis: the symbol for beta and the subscript are correct, but the word "expression" remains. I'm not sure what I'm doing wrong. For a moment I was able to produce a plot without "expression", but it has since come back.
I want to be able to produce this plot, or one with the title on the y-axis, without the word "expression".
I'm not worried about this example data or the resulting plot; I only want to get rid of "expression" so that only the math symbol shows.
Thanks in advance.
You can do:
for (i in seq_along(data2)) {
df <- data2[i]
names(df)[1] <- "x"
myplots[[i]] <- local({
p1 <- ggplot(df, aes(x = x)) +
geom_bar(fill = "lightgreen", stat = "count") +
xlab(x.axis.list[[i]])
})
}
And we can show all the plots together:
library(patchwork)
(myplots[[1]] + myplots[[2]]) / (myplots[[3]] + myplots[[4]])
Note I created the expression list like this:
x.axis.list <- lapply(1:4, function(i){
parse(text = paste0("beta[paste(1, \",\", ", i, ")]"))
})
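One plausible cause of the stray word (an assumption, since the question does not show how x.axis.list was built) is that the list held unevaluated calls to expression() rather than expression objects. Both print the same way, but plotmath treats the former as a function named expression. A base-R sketch of the difference, plus a bquote() alternative to parse():

```r
# A call to expression(): prints as expression(...), but is a plain call
wrong <- quote(expression(beta[paste(1, ",", 1L)]))

# An actual expression object, as produced by parse() in the answer above
right <- parse(text = "beta[paste(1, \",\", 1L)]")

# bquote() builds each label directly and substitutes the index via .()
labels <- lapply(1:4, function(j) bquote(beta[paste(1, ",", .(j))]))
```

The elements of labels are language objects, which xlab() accepts in the same way as the parsed expressions.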

Histogram for diagonal axis in scatterplot

I have a 5 x 5 scatterplot matrix that I created using ggplot. I made histograms for X and Y axis, but I needed an additional histogram for the diagonals of the matrix as well.
Edited for data
data <- structure(c(5, 5, 5, 3, 4, 4, 2, 4, 4, 4, 5, 4, 5, 4, 5, 1, 4,
3, 5, 4, 5, 2, 3, 3, 3, 4, 2, 5, 2, 4, 3, 3, 3, 3, 5, 4, 3, 4,
4, 4, 3, 3, 5, 3, 1, 3, 4, 5, 5, 3, 2, 4, 5, 4, 4, 5, 3, 5, 1,
3, 4, 5, 3, 2, 4, 3, 4, 1, 4, 3, 5, 2, 3, 3, 4, 5, 5, 5, 4, 3,
1, 1, 4, 2, 5, 4, 4, 1, 5, 3, 4, 2, 4, 3, 4, 4, 5, 4, 5, 1, 4,
5, 5, 5, 3, 4, 4, 2, 4, 4, 4, 5, 4, 5, 4, 5, 1, 4, 3, 5, 4, 5,
2, 3, 3, 3, 4, 2, 5, 2, 4, 3, 3, 3, 3, 5, 4, 3, 4, 4, 4, 3, 3,
5, 3, 1, 3, 4, 5, 5, 3, 2, 4, 5, 4, 4, 5, 3, 5, 1, 3, 3, 5, 2,
1, 1, 4, 5, 4, 5, 1, 1, 5, 4, 5, 3, 1, 3, 5, 5, 5, 5, 2, 1, 1,
1, 2, 3, 5, 1, 2, 5, 3, 5, 4, 5, 2, 2, 5, 2, 3, 5), .Dim = c(101L,
2L))
Here is the code
library(ggplot2)
library(gridExtra)
data <- as.data.frame(data)
x <- data$V2
y <- data$V1
xhist <- qplot(x, geom="histogram", binwidth = 0.5)
yhist <- qplot(y, geom="histogram", binwidth = 0.5) + coord_flip()
none <- ggplot()+geom_point(aes(1,1), colour="white") +
theme(axis.ticks=element_blank(), panel.background=element_blank(),
axis.text.x=element_blank(), axis.text.y=element_blank(),
axis.title.x=element_blank(), axis.title.y=element_blank())
g1 <- ggplot(data, aes(x,y)) +
geom_point(size = 1, position = position_jitter(w=0.3, h=0.3))
grid.arrange(yhist, g1, none, xhist, ncol=2, nrow=2, widths=c(1, 4), heights=c(4,1))
Is there a way to directly plot a z-axis histogram from this data alone? What I want is to remove the 'none' panel and instead place a histogram of the data points along the diagonal.
