How to apply ggplot2 to each row in a data frame - r

I want to code a ggplot2 visualization as a function, and then apply the function on each row of a dataframe (I want to use apply to avoid a for loop, as suggested here.)
The data:
library(ggplot2)
point1 <- c(1,2)
point2 <- c(2,2)
points <-as.data.frame(rbind(point1,point2))
I saved points as a data frame and it runs fine in ggplot2:
ggplot(data = points) +
geom_point(aes(x = points[, 1], y = points[, 2])) +
xlim(-3, 3) +
ylim(-3, 3) +
theme_bw()
That's not really the plot I want though: I would like two plots, each one with one point.
Now I build a function that will loop through the rows of the data frame:
plot_data <- function(data) {
ggplot(data) +
geom_point(aes(x = data[, 1], y = data[, 2])) +
xlim(-3, 3) +
ylim(-3, 3) +
theme_bw()
}
I create a list to store the plots:
myplots <- list()
And here is the call to apply, following this suggestion:
myplots <- apply(points, 1, plot_data)
But I get the following error:
#> Error: `data` must be a data frame, or other object coercible by `fortify()`, not a numeric vector
But my data are a data frame.
Is this because: "apply() will try to convert the data.frame into a matrix (see the help docs). If it does not gracefully convert due to mixed data types, I'm not quite sure what would result" as noted in a comment to the answer I referred to?
Still, if I check the data class after the call to apply, the data are still a dataframe:
class(points)
#> [1] "data.frame"
Created on 2021-04-09 by the reprex package (v0.3.0)

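apply() coerces the data frame to a matrix and hands each row to the function as a plain numeric vector, which is why ggplot() rejects it. As suggested by Gregor Thomas in the comments, iterate over the row indices with lapply() instead, so that each points[i, ] is still a one-row data frame: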
library(ggplot2)
point1 <- c(1, 2)
point2 <- c(2, 2)
points <- as.data.frame(rbind(point1, point2))
plot_data <- function(data) {
ggplot(data) +
geom_point(aes(x = data[, 1], y = data[, 2])) +
xlim(-3, 3) +
ylim(-3, 3) +
theme_bw()
}
# Iterate over row indices so each call receives a one-row data frame
myplots <- lapply(seq_len(nrow(points)), function(i) plot_data(points[i, ]))
myplots
#> [[1]]
#>
#> [[2]]
Created on 2021-04-09 by the reprex package (v0.3.0)
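To see why the original apply() call failed, it can help to inspect what each approach actually hands to the plotting function. The following is a minimal sketch in base R; the names `classes_via_apply`, `classes_via_index` and `rows` are made up for illustration, and the code only reports the class of each row rather than drawing anything.

```r
# Rebuild the two-point data frame from the question
points <- as.data.frame(rbind(point1 = c(1, 2), point2 = c(2, 2)))

# apply() coerces the data frame to a matrix, so each row arrives as a numeric vector
classes_via_apply <- apply(points, 1, function(row) class(row)[1])

# Indexing with points[i, ] keeps each row as a one-row data frame
classes_via_index <- vapply(
  seq_len(nrow(points)),
  function(i) class(points[i, ])[1],
  character(1)
)

# split() is another way to get a list of one-row data frames for lapply()
rows <- split(points, seq_len(nrow(points)))
```

With `rows` in hand, `lapply(rows, plot_data)` should behave like the index-based version above.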

Related

problem plotting time series graph in r with date

I need to plot a time series graph but the data that I'm using is proving to be quite challenging.
Ideally, I'd like a graph that looks something like this:
But mine looks like this:
I have tried a series of different things but none of them have worked.
The dataset can be found here and I'll attach a picture of what the dataset itself looks like:
Some of the code I have tried:
ggplot( aes(x=date, y=northEast)) +
geom_area(fill="#69b3a2", alpha=0.5) +
geom_line(color="#69b3a2") +
ylab("test") +
theme_ipsum()
ggplot(covidData2) +
geom_line(
mapping = aes(x = weekBeginning, y=northEast, group=northEast)
)
Any help would be greatly appreciated!
You need to tidy your data up before plotting it. If you look at your data frame, all of the "numeric" columns have been interpreted as character vectors because the column names are nested and therefore appear in the first couple of rows. You need to consolidate these and convert them to column names. Then, you need to convert the numeric columns to numbers. Finally, you need to parse the dates, as ggplot will simply read the periods as character vectors:
library(readxl)
library(lubridate)
library(ggplot2)
library(hrbrthemes)
wb <- read_xlsx(path.expand("~/covid.xlsx"), sheet = "Table 9")
df <- as.data.frame(wb)
df[1, 1] <- ""
for(i in 2:length(df)) {
if(is.na(df[1, i])) df[1, i] <- df[1, i - 1]
}
nms <- trimws(paste(df[1,], df[2,]))
df <- df[-c(1:2),]
names(df) <- nms
df <- df[sapply(df, function(x) !all(is.na(x)))]
df[-1] <- lapply(df[-1], as.numeric)
df <- head(df, -3)
df$Period <- dmy(substr(df$Period, 1, 10))
Now we can plot:
ggplot(df, aes(x = Period, y = `North East Rate`)) +
geom_area(fill = "#69b3a2", alpha=0.5) +
geom_line(color = "#69b3a2") +
ylab("Rate per 100,000") +
xlab("") +
theme_ipsum()
Created on 2022-03-08 by the reprex package (v2.0.1)
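The header clean-up is easier to follow on a tiny table. The sketch below uses a made-up data frame (`raw`, with invented column names and values) that mimics a sheet with a two-row nested header. It assumes the period strings start with a dd/mm/yyyy date, and it uses base `as.Date()` in place of `lubridate::dmy()` so that it runs without extra packages.

```r
# Hypothetical stand-in for the imported sheet: two header rows, then data rows
raw <- data.frame(
  X1 = c(NA, "Period", "01/03/2020 to 07/03/2020", "08/03/2020 to 14/03/2020"),
  X2 = c("North East", "Cases", "10", "12"),
  X3 = c(NA, "Rate", "0.4", "0.5"),
  stringsAsFactors = FALSE
)

# Forward-fill the top header row so a merged cell covers all of its columns
top <- unlist(raw[1, ], use.names = FALSE)
top[1] <- ""
for (i in 2:length(top)) {
  if (is.na(top[i])) top[i] <- top[i - 1]
}

# Combine both header rows into single column names
nms <- trimws(paste(top, unlist(raw[2, ], use.names = FALSE)))

# Drop the header rows, apply the names, then fix the column types
tidy <- raw[-(1:2), ]
names(tidy) <- nms
tidy[-1] <- lapply(tidy[-1], as.numeric)
tidy$Period <- as.Date(substr(tidy$Period, 1, 10), format = "%d/%m/%Y")
```

After this, `tidy` has a Date column and numeric value columns, which is the shape ggplot needs for a time series.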

Inserting horizontal line to line chart in ggplot2

I must plot 25 plots, each with its own dataset. I need to insert a horizontal line into each plot. Problem is, the coordinates cannot be hardcoded as each dataset's range varies.
I need the horizontal line to always sit at the first value of the corresponding dataset.
This is the geom I tried for the line (the y-intercept is hardcoded here, which doesn't help):
+ geom_hline(yintercept=c(75,0), linetype="dotted")
I can grab the value for each line's y-intercept (it is at the same position in every dataset) with this:
dataset[1, 6]
which I could also store in a vector like this
coord <- dataset[1, 6]
But I have not had any success bringing this together.
I tried this, with no luck:
+ geom_hline(yintercept=coord, linetype="dotted")
Example Code:
a <- c(10,40,30,22)
b <- c(1,2,3,4)
df <- data.frame(a,b)
try <- df %>% ggplot(aes(x = b, y = a)) + geom_line() + scale_y_continuous(expand = c(0,0), limits = c(0, NA)) + geom_hline(yintercept=c(30,0), linetype="dotted") + theme_tq()
Thanks in advance
I don't understand what exactly is causing you trouble. If I loop through a list of dataframes, I can set the yintercept of each corresponding plot without too much trouble. Example below:
library(ggplot2)
library(patchwork)
# Split the economics dataset as an example
datasets <- split(economics, cut(seq_len(nrow(economics)), 9))
# Loop through list of dataframes, set hline to [1, 6] (drop because tibble)
plots <- lapply(datasets, function(df) {
ggplot(df, aes(date, unemploy)) +
geom_line() +
scale_y_continuous(limits = c(0, NA)) +
geom_hline(yintercept = c(df[1, 6, drop = TRUE], 0),
linetype = "dotted")
})
# For visualisation purposes
wrap_plots(plots)
Created on 2020-12-04 by the reprex package (v0.3.0)
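The same idea can be applied to data shaped like the example in the question. This is only a sketch: the helper `plot_with_baseline()` and the second toy dataset are invented for illustration, and the first value is read from column `a` by name rather than from position `[1, 6]`.

```r
library(ggplot2)

# Hypothetical helper: draw the line chart and put the dotted line at the first y value
plot_with_baseline <- function(dataset) {
  baseline <- dataset$a[1]
  ggplot(dataset, aes(x = b, y = a)) +
    geom_line() +
    scale_y_continuous(expand = c(0, 0), limits = c(0, NA)) +
    geom_hline(yintercept = c(baseline, 0), linetype = "dotted")
}

# Two toy datasets with different ranges; the first matches the question
datasets <- list(
  first = data.frame(a = c(10, 40, 30, 22), b = 1:4),
  second = data.frame(a = c(5, 8, 2, 9), b = 1:4)
)

# One plot per dataset, each with its own intercept
plots <- lapply(datasets, plot_with_baseline)
```

Because the intercept is computed inside the function, nothing has to be hardcoded per plot.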

List of ggplot density plots across datasets

I have several data-sets which are simple transformations of one another, e.g.
iris0 <- iris ; iris1 <- iris; iris2 <- iris
iris1[,1:4] <- sqrt(iris0[,1:4])
iris2[,1:4] <- log(iris0[,1:4])
I want to visualise how the densities of distributions of each attribute are affected by transformations, using density plots in ggplot2.
I could use code of the following kind:
ggplot() + geom_density(aes(x=Attr), fill="red", data=vec_from_dataset1, alpha=.5) + geom_density(aes(x=Attr), fill="blue", data=vec_from_dataset2, alpha=.5)
or, for example, bind the attributes together and then treat them as one dataset. What is the cleanest and most efficient way (probably using Map) to generate a list of density plots in which iris0 is compared to each other dataset (iris1 and iris2) across each numerical attribute, i.e. columns 1-4? (So in this case, there would be 4*2 = 8 density plots in total.)
(I should clarify--no package except base R+ggplot2 please, dplyr if absolutely necessary)
Edit:
Based on the top answer here: Creating density plots from two different data-frames using ggplot2, I had the following go:
combs = expand.grid(Attributes=names(X),Datasets=c("iris1","iris2"))
plots <-
Map(function(.x, .y, ds2) {
ggplot(data=iris0, aes(x=.x)) +
geom_density(fill="red") +
geom_density(data=get(ds2), fill="purple") +
xlab(.y) + ggtitle(label=paste0("Density plot for the ",.y))
}, X[names(X)], names(X), as.character(combs[[2]]))
But the output is just the density from the first dataset for each attribute (iris0), filled in purple. Can anyone help?
Here's one approach leveraging rbindlist() from package data.table that gives you a list of ggplot objects you can print or do whatever with downstream.
library(data.table)
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 3.6.3
iris0 <- iris ; iris1 <- iris; iris2 <- iris
iris1[,1:4] <- sqrt(iris0[,1:4])
iris2[,1:4] <- log(iris0[,1:4])
dt <- rbindlist(list(iris0 = iris0, iris1 = iris1, iris2 = iris2), idcol = TRUE)
plot_list <- expand.grid(dat = c("iris1", "iris2"),
var = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
stringsAsFactors = FALSE)
zz <- lapply(1:nrow(plot_list), function(i) {
plot_dat <- dt[.id %in% c("iris0", plot_list[i, "dat"]), c(".id", plot_list[i, "var"]), with = FALSE]
plot_names <- names(plot_dat)
ggplot(plot_dat, aes_string(x = plot_names[[2]], fill = plot_names[[1]])) +
geom_density(alpha = .5) +
scale_fill_manual("", values = c("red", "blue")) +
theme_bw() +
theme(legend.position = c(.8, .8))
})
zz[[3]]
Created on 2020-05-14 by the reprex package (v0.3.0)
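Since the question asked for base R plus ggplot2 only, here is a rough sketch of the same idea without data.table, using Map() as the question suggested. The helper `density_pair()` and the object names are invented for illustration; the colours are left at ggplot2's defaults.

```r
library(ggplot2)

iris0 <- iris; iris1 <- iris; iris2 <- iris
iris1[, 1:4] <- sqrt(iris0[, 1:4])
iris2[, 1:4] <- log(iris0[, 1:4])

# The datasets to compare against iris0, and every attribute/dataset combination
transformed <- list(iris1 = iris1, iris2 = iris2)
combos <- expand.grid(
  var = names(iris0)[1:4],
  dat = names(transformed),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stack one attribute from iris0 and one other dataset, then plot
density_pair <- function(var, dat) {
  long <- rbind(
    data.frame(value = iris0[[var]], dataset = "iris0"),
    data.frame(value = transformed[[dat]][[var]], dataset = dat)
  )
  ggplot(long, aes(x = value, fill = dataset)) +
    geom_density(alpha = 0.5) +
    labs(x = var, title = paste("Density of", var, "in iris0 and", dat))
}

# One plot per row of combos: 4 attributes x 2 datasets = 8 plots
density_plots <- Map(density_pair, combos$var, combos$dat)
```

Mapping `fill` to the dataset label is what makes both densities appear; in the attempt from the question, both layers ended up drawing the same `.x` values.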

how to loop a geographic mapping function over a list of dataframes (or a subsetted dataframe)

I have a data frame consisting of species names and longitude and latitude coordinates. There are 115 different species with 25000 lat/long coordinates. I need to make individual maps that show the observations for each species.
First, I created a function, called platmaps, that generates the kind of map I want. When I call the function on my full dataset (platmaps(df1)), it creates a map displaying all lat/long observations.
Then I constructed a for loop that was supposed to subset my df by species name and pass the subsetted data frame to my platmaps function. It runs for a couple of minutes and then nothing happens.
So I then split the data frame by species name, created a list of data frames (out1), and used lapply(out1, platmaps), but it only returned a list of the names of my dfs.
Then I tried a variation of an example that I saw here, but it also did not work.
function
platmaps<-function(df1){
wm <- borders("world", colour="gray50", fill="gray50")
ggplot()+
coord_fixed()+
wm +
geom_point(data =df1 , aes(x = decimalLongitude, y = decimalLatitude),
colour = "pink", size = 0.5)
}
subset
for(i in 1:nrow(PP)){
query<-paste(PP$species[i])
p<-subset(df1, df1$species== query)
platmaps(p)
}
list
for (i in 1:length(out1)){
pp<-out1[[i]]
platmaps(pp)
}
applied example
p =
wm <- wm <- borders("world", colour="gray50", fill="gray50")
ggplot()+
coord_fixed()+
wm +
geom_point(data =df1 , aes(x = decimalLongitude, y = decimalLatitude),
colour = "pink", size = 0.5)
plots = df1 %>%
group_by(species) %>%
do(plots = p %+% . + facet_wrap(~species))
the error for the applied example is:
Error: Cannot add ggproto objects together. Did you forget to add this
object to a ggplot object?
As I'm new to R (and coding), I assume I'm getting the syntax wrong, or am not applying my function correctly to/within either of my loops, or I fundamentally misunderstand the way looping works.
data frame sample
species decimalLongitude decimalLatitude
Platanthera lacera -71.90000 42.80000
Platanthera lacera -90.54861 40.12083
Platanthera lacera -71.00889 42.15500
Platanthera lacera -93.20833 45.20028
Platanthera lacera -72.45833 41.91666
Platanthera bifolia 5.19800 59.64310
Platanthera sparsiflora -117.67472 34.36278
fixed platmaps function
ggplot(data=df1 %>% filter(species == s))+
coord_fixed()+
borders("world", colour="gray50", fill="gray50")+
geom_point(aes(x = decimalLongitude, y = decimalLatitude),
colour = "pink", size = 0.5)+
labs(title=as.character(s))
Because you didn't provide a test data set, let me give you a general idea how to make multiple plots you can inspect later. The code below will plot a parameter for a number of countries and save plot pdfs to a given path. You can replace the code behind the pl variable in the loop with your function.
library(ggplot2)
library(dplyr)
df <- data.frame(country = c(rep('USA',20), rep('Canada',20), rep('Mexico',20)),
wave = c(1:20, 1:20, 1:20),
par = c(1:20 + 5*runif(20), 21:40 + 10*runif(20), 1:20 + 15*runif(20)))
countries <- unique(df$country)
plot_list <- list()
i <- 1
for (c in countries){
pl <- ggplot(data = df %>% filter(country == c)) +
geom_point(aes(wave, par), size = 3, color = 'red') +
labs(title = as.character(c), x = 'wave', y = 'value') +
theme_bw(base_size = 16)
plot_list[[i]] <- pl
i <- i + 1
}
# Set the page size when opening the device; pdf.options() only changes defaults for devices opened afterwards
pdf('path/to/pdf', width = 9, height = 7)
for (i in 1:length(plot_list)){
print(plot_list[[i]])
}
dev.off()
Once the plots are collected in plot_list, we open the PDF device and print each plot to it. Finally, we close the device with dev.off().
There is a neat way to apply any function to a list of items. I have outlined it below using the data you added. I could not get platmaps to work, so I have just made a scatter plot.
The method is to split your data frame into individual subsets using split() and then apply the plotting function to the resulting list using lapply(). Since lapply() returns a list, this can be passed directly to a function such as ggpubr::ggarrange() for visualizing.
library(ggplot2)
library(dplyr) # provides the %>% pipe used below
plot_function <- function(x){
p <- ggplot(x, aes(x = decimalLongitude, y = decimalLatitude)) + geom_point()
p
}
plot_list <-
df %>%
split(.$species) %>% # Separate df into subset dfs based on species column
lapply(., plot_function) # map plot_function to list
# Display on a grid (many ways to do this - I just find this package simple)
ggpubr::ggarrange(plotlist = plot_list)
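One likely reason the original for loops appeared to do nothing is that a ggplot object is only drawn when it is printed, and R does not auto-print values inside a loop. The sketch below uses a few rows from the sample data in the question; `species_plot()` is a made-up stand-in for platmaps that leaves out the world-map layer so it runs without the maps package.

```r
library(ggplot2)

# A few observations taken from the sample in the question
obs <- data.frame(
  species = c("Platanthera lacera", "Platanthera lacera",
              "Platanthera bifolia", "Platanthera sparsiflora"),
  decimalLongitude = c(-71.90000, -90.54861, 5.19800, -117.67472),
  decimalLatitude = c(42.80000, 40.12083, 59.64310, 34.36278)
)

# Hypothetical stand-in for platmaps(): it returns the plot instead of relying on auto-printing
species_plot <- function(d) {
  ggplot(d, aes(x = decimalLongitude, y = decimalLatitude)) +
    coord_fixed() +
    geom_point(colour = "pink", size = 0.5) +
    labs(title = d$species[1])
}

# One data frame per species, then one plot per data frame
by_species <- split(obs, obs$species)
plots <- lapply(by_species, species_plot)

# Inside a loop, each plot has to be printed explicitly to be drawn
for (p in plots) print(p)
```

The list returned by `lapply()` is named by species, so a single map can be drawn with, for example, `plots[["Platanthera lacera"]]`.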

How to plot three point lines using ggplot2 instead of the default plot in R

I have three matrices and I want to plot them using ggplot2. My data and code are below.
library(cluster)
require(ggplot2)
require(scales)
require(reshape2)
data(ruspini)
x <- as.matrix(ruspini[-1])
w <- matrix(W[4,])
df <- melt(data.frame(max_Wmk, min_Wmk, w, my_time = 1:10), id.var = 'my_time')
ggplot(df, aes(colour = variable, x = my_time, y = value)) +
geom_point(size = 3) +
geom_line() +
scale_y_continuous(labels = comma) +
theme_minimal()
I want to combine the three series into one good-looking ggplot2 plot.
Moreover, I want points with different values to have different colors.
I'm not quite sure what you're after, so here's a guess.
Your data...
max <- c(175523.9, 33026.97, 21823.36, 12607.78, 9577.648, 9474.148, 4553.296, 3876.221, 2646.405, 2295.504)
min <- c(175523.9, 33026.97, 13098.45, 5246.146, 3251.847, 2282.869, 1695.64, 1204.969, 852.1595, 653.7845)
w <- c(175523.947, 33026.971, 21823.364, 5246.146, 3354.839, 2767.610, 2748.689, 1593.822, 1101.469, 1850.013)
Slight modification to your base plot code to make it work...
plot(1:10,max,type='b',xlab='Number',ylab='groups',col=3)
points(1:10,min,type='b', col=2)
points(1:10,w,type='b',col=1)
Is this what you meant?
If you want to reproduce this with ggplot2, you might do something like this...
# ggplot likes a long table, rather than a wide one, so reshape the data, and add the 'time' variable explicitly (ie. my_time = 1:10)
require(reshape2)
df <- melt(data.frame(max, min, w, my_time = 1:10), id.var = 'my_time')
# now plot, with some minor customisations...
require(ggplot2); require(scales)
ggplot(df, aes(colour = variable, x = my_time, y = value)) +
geom_point(size = 3) +
geom_line() +
scale_y_continuous(labels = comma) +
theme_minimal()
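To make the reshaping step concrete, here is a sketch of the long table that the plot expects, built with base R instead of melt(). The three-row `wide` table and its values are made up for illustration; the column layout (my_time, variable, value) mirrors what the melt() call above produces.

```r
# Small made-up wide table: one column per series plus the shared time index
wide <- data.frame(
  max = c(9, 7, 5),
  min = c(3, 2, 1),
  w = c(6, 4, 2),
  my_time = 1:3
)

# Every column except the id column becomes a series
series <- setdiff(names(wide), "my_time")

# Stack the series: repeat the time index, label each block, concatenate the values
long <- data.frame(
  my_time = rep(wide$my_time, times = length(series)),
  variable = factor(rep(series, each = nrow(wide)), levels = series),
  value = unlist(wide[series], use.names = FALSE)
)
```

Each row of `long` is one point, and `variable` is the column mapped to `colour` in the ggplot call.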
UPDATE after the question was edited and the example data changed, here's an edit to suit the new example data:
Here's your example data (there's scope for simplification and speed gains here, but that's another question):
library(cluster)
require(ggplot2)
require(scales)
require(reshape2)
data(ruspini)
x <- as.matrix(ruspini[-1])
wss <- NULL
W=matrix(data=NA,ncol=10,nrow=100)
for(j in 1:100){
k=10
for(i in 1: k){
wss[i]=kmeans(x,i)$tot.withinss
}
W[j,]=as.matrix(wss)
}
max_Wmk <- matrix(data=NA, nrow=1,ncol=10)
for(i in 1:10){
max_Wmk[,i]=max(W[,i],na.rm=TRUE)
}
min_Wmk <- matrix(data=NA, nrow=1,ncol=10)
for(i in 1:10){
min_Wmk[,i]=min(W[,i],na.rm=TRUE)
}
w <- matrix(W[4,])
Here's what you need to do to make the three objects into vectors so you can make the data frame as expected:
max_Wmk <- as.numeric(max_Wmk)
min_Wmk <- as.numeric(min_Wmk)
w <- as.numeric(w)
Now reshape and plot as before...
df <- melt(data.frame(max_Wmk, min_Wmk, w, my_time = 1:10), id.var = 'my_time')
ggplot(df, aes(colour = variable, x = my_time, y = value)) +
geom_point(size = 3) +
geom_line() +
scale_y_continuous(labels = comma) +
theme_minimal()
And here's the result:
