Function for formatting and plotting in R

I am currently trying to create a function that will format my data properly and return a sorted bar plot. Yet for some reason I keep getting this error:
Error in `$<-.data.frame`(`*tmp*`, "Var1", value = integer(0)) :
replacement has 0 rows, data has 3
I have tried debugging it, but have had no luck. I have an example of what I expect down at the bottom. Can anyone spot what I am doing wrong?
x <- rep(c("Mark","Jimmy","Jones","Jones","Jones","Jimmy"),2)
y <- rnorm(12)
df <- data.frame(x,y)
plottingfunction <- function(data, name,xlabel,ylabel,header){
newDf <- data.frame(table(data))
order <- newDf[order(newDf$Freq, decreasing = FALSE), ]$Var1
newDf$Var1 <- factor(newDf$Var1,order)
colnames(newDf)[1] <- name
plot <- ggplot(newDf, aes(x=name, y=Freq)) +
xlab(xlabel) +
ylab(ylabel) +
ggtitle(header) +
geom_bar(stat="identity", fill="lightblue", colour="black") +
coord_flip()
return(plot)
}
plottingfunction(df$x, "names","xlabel","ylabel","header")

A few comments. Your function doesn't work because this line is wrong:
order <- newDf[order(newDf$Freq, decreasing = FALSE), ]$Var1
Inside the function there is no guarantee that newDf has a column named Var1. What probably happened is that while testing your code you ran:
newDf <- data.frame(table(df$x))
which names the first column Var1. Inside the function the call is table(data), so the column is named data instead, newDf$Var1 is NULL, and the assignment fails. To get around this, I recommend being explicit about column names. In this example I used the dplyr library to make my life easier. Following your code and logic, it would look like this:
newDf <- data %>% group_by_(col_name) %>% tally
order <- newDf[order(newDf$n, decreasing = FALSE), col_name][[col_name]]
data[,col_name] <- factor(data[,col_name], order)
Then within your ggplot we can use aes_string to refer to the column name of the data frame instead. So then the whole function would look like this:
library(dplyr)
library(ggplot2)
plottingFunction <- function(data, col_name, xlabel, ylabel, header) {
  # Count the rows per value of col_name, preserving the name of the
  # column that you're counting on
  newDf <- data %>% group_by_(col_name) %>% tally
  # Order the values by ascending count and use that order as factor levels
  order <- newDf[order(newDf$n, decreasing = FALSE), col_name][[col_name]]
  data[,col_name] <- factor(data[,col_name], order)
  plot <- ggplot(data, aes_string(col_name)) +
    xlab(xlabel) +
    ylab(ylabel) +
    ggtitle(header) +
    geom_bar(fill="lightblue", colour="black") +
    coord_flip()
  return(plot)
}
plottingFunction(df, "x", "xlabel","ylabel","header")
This produces a horizontal bar chart with the bars sorted by count (the image is not included here).
I think stat="identity" is redundant for your plot, since geom_bar can count the rows of your original data frame directly, so you don't need a pre-aggregated one.
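Both group_by_() and aes_string() are deprecated in current dplyr and ggplot2 releases. The same idea can be sketched without them, assuming a ggplot2 version that supports the .data pronoun inside aes(). The function name plot_sorted_counts is hypothetical and is not part of the original answer:

```r
library(ggplot2)

# Hypothetical helper name; the arguments mirror the function in the answer above.
plot_sorted_counts <- function(data, col_name, xlabel, ylabel, header) {
  # Count each value and sort ascending, so the most frequent value
  # ends up at the top after coord_flip()
  counts <- sort(table(data[[col_name]]))
  data[[col_name]] <- factor(data[[col_name]], levels = names(counts))

  # .data[[col_name]] looks the column up by its string name
  ggplot(data, aes(x = .data[[col_name]])) +
    geom_bar(fill = "lightblue", colour = "black") +
    labs(x = xlabel, y = ylabel, title = header) +
    coord_flip()
}

# Same example data as in the question
df <- data.frame(
  x = rep(c("Mark", "Jimmy", "Jones", "Jones", "Jones", "Jimmy"), 2),
  y = rnorm(12)
)
p <- plot_sorted_counts(df, "x", "xlabel", "ylabel", "header")
```

Printing p draws the chart. The factor levels, not the row order, control the order of the bars.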

Related

Changing title of plots in a loop with colnames() in R

I am creating a for loop which creates a ggplot2 plot for each of the first six columns in a dataframe. Everything works except for the looping of the title names. I have been trying to use title = colnames(df[,i]) and title = paste0(colnames(df[,i])) to create the proper title, but it simply ends up repeating the 2nd column name. The plots themselves show the correct data for each column, but the title does not loop. The first plot gets the correct title, but from the second plot onward it just keeps repeating the third column name, completely skipping over the second column name. I even tried creating a variable within the loop to store the respective title name, changetitle <- colnames(df[,i]), and then using title = changetitle, but that also loops incorrectly.
Here is an example of what I have so far:
plot_6 <- list()
for(i in df[1:6]){
plot_6[i] <- print(ggplot(df, aes(x = i, ...) ...) +
... +
labs(title = colnames(df[,i]),
x = ...) +
...)
}
Thank you very much.

df[1:6] is a data frame with six columns. When it is used as the loop sequence, i is a vector of column values, not a column name or index, each time through the loop. This might "work" in the sense that ggplot will produce a plot, but it breaks the link between the data frame provided to ggplot (df in this case) and the mapping of df's columns to ggplot's aesthetics.
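A small base R sketch makes the difference visible. It uses mtcars as a stand-in for the asker's df, which is not shown in the question:

```r
# mtcars stands in for the asker's df (an assumption for illustration)
seen <- list()
for (i in mtcars[1:3]) {
  # i is the column's values (a numeric vector), not its name or position
  seen[[length(seen) + 1]] <- class(i)
}
unlist(seen)

# Looping over the names keeps the link to the column
for (nm in names(mtcars)[1:3]) {
  cat(nm, "has", length(mtcars[[nm]]), "values\n")
}
```

Because i holds values in the first loop, expressions such as colnames(df[, i]) no longer refer to the column being plotted.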
Here are a few options, using the built-in mtcars data frame:
library(tidyverse)
library(patchwork)
plot_6 <- list()
for(i in 1:6) {
var = names(mtcars)[i]
plot_6[[i]] <- ggplot(mtcars, aes(x = !!sym(var))) +
geom_density() +
labs(title = var)
}
# Use column names directly as loop variable
plot_6 <- list()
for(i in names(mtcars)[1:6]) {
plot_6[[i]] <- ggplot(mtcars, aes(x = !!sym(i))) +
geom_density() +
labs(title = i)
}
# Use map, which directly generates a list of plots
plot_6 = map(names(mtcars)[1:6],
~ggplot(mtcars, aes(x = !!sym(.x))) +
geom_density() +
labs(title = .x)
)
Any of these produces the same list of plots:
wrap_plots(plot_6)

How to use column names starting with numbers in ggplot functions

I have a huge dataframe whose column names start with a number, such as `1_variable`. I am trying to create a function that takes these column names as arguments and plots a few boxplots using ggplot. The function needs the name as a string, but ggplot needs it wrapped in backticks. I am not sure how to convert a character string such as "1_variable" into the input `1_variable` that ggplot expects.
small reproducible example:
dfx = data.frame(`1ev`=c(rep(1,5), rep(2,5)), `2ev`=sample(10:99, 10),
`3ev`=10:1, check.names = FALSE)
If I were to plot the figure manually, the input would look like this:
dfx$`1ev` <- as.factor(dfx$`1ev`)
ggplot(dfx, aes(x = `1ev`, y = `2ev`))+
geom_boxplot()
the function I'd like to be able to run for the dataframe is this one:
plot_boxplot <- function(data, group, value){
data = data[c(group, value)]
data[,group] = as.factor(data[,group])
plot <- ggplot(data, aes(x = group, y = value))+
geom_boxplot()
return(plot)
}
1. Try
plot_boxplot(dfx, `1ev`, `2ev`)
which gives me an error saying Error in [.data.frame(data, c(group, value)) : object '1ev' not found
2. Try
entering the arguments with double quotes "" gives me unexpectedly this:
plot_boxplot(dfx, "1ev", "2ev")
3. Try
I also tried to replace the double quotes of the string with gsub in the function
gsub('\"', '`', group)
but that does not change anything about its output.
4. Try
Finally, I also tried to make use of aes_string, but that just gives me the same errors.
plot_boxplot <- function(data, group, value){
data = data[c(as.character(group), as.character(value))]
data[,group] = as.factor(data[,group])
plot <- ggplot(data, aes_string(x= group, y=value))+
geom_boxplot()
return(plot)
}
plot_boxplot(dfx, `1ev`, `2ev`)
plot_boxplot(dfx, "1ev", "2ev")
Ideally I would like to run the function to produce this output:
plot_boxplot(dfx, group = "1ev", value = "2ev")
[can be produced with this code manually]
ggplot(dfx, aes(x= `1ev`, y=`2ev`)) +
geom_boxplot()
Any help would be greatly appreciated.

One way to do this is a combination of aes_ and as.name():
plot_boxplot <- function(data, group, value){
data = data[c(group, value)]
data[,group] = as.factor(data[,group])
plot <- ggplot(data, aes_(x= as.name(group), y=as.name(value))) +
geom_boxplot()
return(plot)
}
And passing in strings for group and value:
plot_boxplot(dfx, "1ev", "2ev")
It's not the same plot you show above, but it looks to align with the data.
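aes_() is deprecated in current ggplot2 releases. As a hedged alternative, assuming a ggplot2 version that supports the .data pronoun, the same function can index columns by string, which also works for names that start with a digit:

```r
library(ggplot2)

plot_boxplot <- function(data, group, value) {
  data[[group]] <- as.factor(data[[group]])
  # .data[[...]] takes the column name as a plain string, so no backticks are needed
  ggplot(data, aes(x = .data[[group]], y = .data[[value]])) +
    geom_boxplot() +
    labs(x = group, y = value)
}

# Example data from the question
dfx <- data.frame(`1ev` = c(rep(1, 5), rep(2, 5)), `2ev` = sample(10:99, 10),
                  `3ev` = 10:1, check.names = FALSE)
p <- plot_boxplot(dfx, group = "1ev", value = "2ev")
```

The labs() call is there because the default axis titles would otherwise show the .data[[...]] expression instead of the column names.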

Plot all columns from a data.frame in a subplot with ggplot2

As the title suggests, I want to plot all columns from my data.frame, but I want to do it in a generic way. All my columns are factors.
Here is my code so far:
nums <- sapply(train_dataset, is.factor) #Select factor columns
factor_columns <- train_dataset[ , nums]
plotList <- list()
for (i in c(1:NCOL(factor_columns))){
name = names(factor_columns)[i]
p <- ggplot(data = factor_columns) + geom_bar(mapping = aes(x = name))
plotList[[i]] <- p
}
multiplot(plotList, cols = 3)
where multiplot function came from here: http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/
And my dataset came from Kaggle (house pricing prediction): https://www.kaggle.com/c/house-prices-advanced-regression-techniques
What I get from my code is the image below, which appears to be the last column badly represented.
This would be the last column well represented:
EDIT:
Using gridExtra as @LAP suggested also doesn't give me a good result. I used this instead of multiplot:
nCol <- floor(sqrt(length(plotList)))
do.call("grid.arrange", c(plotList, ncol=nCol))
but what I get is this:
Again, SaleCondition is the only thing printed and not very well.
PS: I also tried cowplot, same result.

Using tidyr you can do something like the following:
factor_columns %>%
gather(factor, level) %>%
ggplot(aes(level)) + geom_bar() + facet_wrap(~factor, scales = "free_x")
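In current tidyr, gather() is superseded by pivot_longer(). The following is a minimal sketch of the same reshaping. The data frame here is a toy stand-in for factor_columns with invented values, because the Kaggle data is not reproduced on this page:

```r
library(tidyr)
library(ggplot2)

# Toy stand-in for factor_columns (values are invented for illustration)
factor_columns <- data.frame(
  Street = factor(c("Pave", "Pave", "Grvl", "Pave")),
  SaleCondition = factor(c("Normal", "Abnorml", "Normal", "Partial"))
)

# Convert to character first so columns with different levels combine cleanly
chr_columns <- factor_columns
chr_columns[] <- lapply(chr_columns, as.character)

# One row per (column name, value) pair
long <- pivot_longer(chr_columns, cols = everything(),
                     names_to = "factor", values_to = "level")

p <- ggplot(long, aes(level)) +
  geom_bar() +
  facet_wrap(~factor, scales = "free_x")
```

Each original column becomes one facet, so no plot list or multiplot call is needed.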

creating a subset of data frame when running a loop

I'm quite new to R and trying to find my way around. I have created a new data frame based on the "original" data frame.
library(dplyr)
prdgrp <- as.vector(mth['MMITCL'])
prdgrp %>% distinct(MMITCL)
When doing this, the result is a list of unique values of the column MMITCL. I would like to use this data in a loop that first creates a new subset of the original data and then prints a graph based on it:
#START LOOP
for (i in 1:length(prdgrp))
{
# mth[c(MMITCL==prdgrp[i],]
mth_1 <- mth[c(mth$MMITCL==prdgrp[i]),]
# Development of TPC by month
library(ggplot2)
library(scales)
ggplot(mth_1, aes(Date, TPC_MTD))+ geom_line()
}
# END LOOP
Doing this gives me the following error message:
Error in mth$MMITCL == prdgrp[i] :
comparison of these types is not implemented
In addition: Warning:
In `[.data.frame`(mth, c(mth$MMITCL == prdgrp[i]), ) :
Incompatible methods ("Ops.factor", "Ops.data.frame") for "=="
What am I doing wrong?

If you just want to plot the outputs, there is no need to subset the dataframe; it is simpler to put ggplot in a loop (or, more likely, to use facet_wrap). Without seeing your data it is a bit hard to give you a precise answer. However, there are two generic iris examples below, which will hopefully also show where you made the error in subsetting your dataframe. Please let me know if you have any questions.
library(ggplot2)
#looping example
for(i in 1:length(unique(iris$Species))){
g <- ggplot(data = iris[iris$Species == unique(iris$Species)[i], ],
aes(x = Sepal.Length,
y = Sepal.Width)) +
geom_point()
print(g)
}
#facet_wrap example
g <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_point() +
facet_wrap(~Species)
g
However, if you need to save the data frames for later use, one option is to put them into a list. If you only need the data frame within the loop, you can drop the list and use whatever variable name you wish.
myData4Later <- list()
for(i in 1:length(unique(iris$Species))){
myData4Later[[i]] <- iris[iris$Species == unique(iris$Species)[i], ]
g <- ggplot(data = myData4Later[[i]],
aes(x = Sepal.Length,
y = Sepal.Width)) +
geom_point()
print(g)
}
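The error message in the question mentions Ops.data.frame, which suggests that prdgrp was still a data frame and not a vector of values. A minimal sketch of the difference follows, using a toy stand-in for mth; the column names come from the question and the values are invented:

```r
# Toy stand-in for mth (values are invented for illustration)
mth <- data.frame(
  MMITCL  = factor(c("A", "B", "A", "C")),
  TPC_MTD = c(10, 20, 30, 40)
)

# Single brackets keep the data frame wrapper; double brackets extract the column
is.data.frame(mth["MMITCL"])
is.factor(mth[["MMITCL"]])

# A plain character vector of group values is safe to compare against the column
prdgrp <- unique(as.character(mth[["MMITCL"]]))
subsets <- lapply(prdgrp, function(g) mth[mth$MMITCL == g, ])
names(subsets) <- prdgrp
```

Each element of subsets can then be passed to ggplot() inside a loop, with print() around the plot so that it is actually drawn.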

Passing through data frames into functions and into ggplot by column

I'm trying to write my first function in R. I have a dataframe with an indeterminate number of columns, and I want to create a ggplot for each pair of columns: 1 & 2, 1 & 3, 1 & 4, etc.
However, when I try the following function I get the "object not found" error, but only when execution reaches the ggplot portion.
Thanks,
BrandPlot=function(Brand){
NoCol=ncol(Brand)
count=2
while (count<=NoCol){
return(ggplot(Brand, aes(x=Brand[,1], y=Brand[,count]))+geom_point())
count=(count+1)
}
}
To clarify, this is the effect I'm trying to get:
ggplot(Brand, aes(x=Brand[,1], y=Brand[,2]))+geom_point()
ggplot(Brand, aes(x=Brand[,1], y=Brand[,3]))+geom_point()
ggplot(Brand, aes(x=Brand[,1], y=Brand[,4]))+geom_point()
ggplot(Brand, aes(x=Brand[,1], y=Brand[,5]))+geom_point()
(I also plan on adding things like geom_smooth(), but I want to get it working first.)

Per the note above, something like this may be what you're looking for...
brandplot <- function(x){
require(reshape2)
require(ggplot2)
x_melt <- melt(x, id.vars = names(x)[1])
ggplot(x_melt,
aes_string(x = names(x_melt)[1],
y = 'value',
group = 'variable')) +
geom_point() +
facet_wrap( ~ variable)
}
dat <- data.frame(a = sample(1:10, 25, T),
b = sample(20:30, 25, T),
c = sample(40:50, 25, T))
brandplot(dat)

[Note: @maloneypatr's solution is a better way to use ggplot for your application.]
To answer your question directly, there are a couple of problems.
Your function returns after the first run through the loop (e.g., count=2), so you will never get more than one plot from this.
ggplot evaluates arguments to aes(...) in the context of the data frame defined in data=..., so it is looking for something like Brand$Brand (e.g., a column named Brand in the dataframe Brand). Since there is no such column, you get the Object not found error.
The following code will generate a series of n-1 plots where n = ncol(Brand).
BrandPlot=function(Brand){
for (count in 2:ncol(Brand)){
ggp <- ggplot(Brand, aes_string(x=names(Brand)[1], y=names(Brand)[count]))
ggp <- ggp + geom_point()
ggp <- ggp + ggtitle(paste(names(Brand)[count], " vs. ", names(Brand)[1]))
plot(ggp)
}
}
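If the plots need to be kept, not just drawn, one option is to return them in a list. This is a sketch under the assumption that the installed ggplot2 supports the .data pronoun; the name brand_plots is hypothetical:

```r
library(ggplot2)

# Hypothetical helper: returns a named list of plots, one per column after the first
brand_plots <- function(brand) {
  x_name <- names(brand)[1]
  y_names <- names(brand)[-1]
  plots <- lapply(y_names, function(y_name) {
    ggplot(brand, aes(x = .data[[x_name]], y = .data[[y_name]])) +
      geom_point() +
      ggtitle(paste(y_name, "vs.", x_name))
  })
  names(plots) <- y_names
  plots
}

# Small invented data frame for illustration
dat <- data.frame(a = 1:5, b = 6:10, c = 11:15)
plots <- brand_plots(dat)
```

Calling print(plots[["b"]]) draws a single plot, and the list can later be extended with layers such as geom_smooth().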
