R for loop overwriting variable data

I am trying to use a for loop to create a ggplot for each column in a dataframe. I am pretty new to this so my approach may be very wrong here.
I have written a function to create the ggplot:
create_scatter <- function(df, x, y) {
ggplot(df, aes(x, y)) +
geom_point() +
xlab(name) +
ylab("quality")
}
And a for loop to iterate through the data frame columns by name (to get the name of the column for use later), then get the contents of the column for the plotting function.
for (name in names(whiteWines)) {
for (column in whiteWines[name]) {
assign(paste0(name, "_scatter"),
create_scatter(whiteWines, column, whiteWines$quality))
}
}
Using assign() I am able to create a variable name from the column name on the fly and assign the results of ggplot to it.
I am then using grid.arrange to arrange the resulting plots in a 3 x 4 grid.
grid.arrange(fixed.acidity_scatter,
volatile.acidity_scatter,
citric.acid_scatter,
residual.sugar_scatter,
chlorides_scatter,
free.sulfur.dioxide_scatter,
total.sulfur.dioxide_scatter,
density_scatter,
pH_scatter,
sulphates_scatter,
alcohol_scatter,
layout_matrix = rbind(c(1,2,3), c(4,5,6), c(7,8,9), c(10,11,12)))
When executed, all scatter plots are created; however, they all contain the data from the last scatter plot in the loop.
(Image: undesired results)
If I wrap the assign statement in a print() statement then I do get the desired outcome in the grid, but each individual plot gets printed as well.
(Image: desired results)
(Link: dataset)

You're probably looking for something more like this:
library(readr)
library(tidyr)
library(dplyr)
library(ggplot2)
ww <- read_delim(file = "~/Downloads/winequality-white.csv",delim = ";")
ww_long <- ww %>%
gather(key = measure,value = value,`fixed acidity`:`alcohol`)
ggplot(data = ww_long,aes(x = quality,y = value)) +
facet_wrap(~measure,scales = "free_y") +
geom_point()
R has some tools that can be very tempting for beginners as they think through solving a problem. Among them are assign(), get() and eval(parse(text = )). A solution built on those will usually cause more problems than it solves; there is typically a better way, but finding it requires digging a little deeper into the "normal" way of doing things in R.
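If one separate plot object per column is still wanted (for example to pass to grid.arrange), a named list built with lapply avoids assign() altogether. The sketch below is an illustration, not the answerer's code: it assumes ggplot2 3.0.0 or later for the .data pronoun, and it uses a tiny stand-in data frame built from the first three sample rows of the dataset with only three of its columns. Each call to the anonymous function gets its own nm, so the lazily evaluated aesthetics of one plot should not be affected by later iterations.

```r
library(ggplot2)

# Stand-in data: first three sample rows of the dataset, three columns only
whiteWines <- data.frame(
  fixed.acidity = c(7, 6.3, 8.1),
  chlorides     = c(0.045, 0.049, 0.05),
  quality       = c(6, 6, 6)
)

# Every column except the response gets its own scatter plot
predictors <- setdiff(names(whiteWines), "quality")

plots <- lapply(predictors, function(nm) {
  # .data[[nm]] looks the column up by name inside the plot's data
  ggplot(whiteWines, aes(x = .data[[nm]], y = .data[["quality"]])) +
    geom_point() +
    xlab(nm) +
    ylab("quality")
})

# Name the list elements the way the question named its variables
names(plots) <- paste0(predictors, "_scatter")
```

Individual plots are then available as plots[["chlorides_scatter"]], and the whole list can be handed to gridExtra::grid.arrange(grobs = plots, ncol = 3).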

The following are the variables in the data:
"fixed acidity";"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality"
The following are sample rows:
7;0.27;0.36;20.7;0.045;45;170;1.001;3;0.45;8.8;6
6.3;0.3;0.34;1.6;0.049;14;132;0.994;3.3;0.49;9.5;6
8.1;0.28;0.4;6.9;0.05;30;97;0.9951;3.26;0.44;10.1;6
7.2;0.23;0.32;8.5;0.058;47;186;0.9956;3.19;0.4;9.9;6
7.2;0.23;0.32;8.5;0.058;47;186;0.9956;3.19;0.4;9.9;6
8.1;0.28;0.4;6.9;0.05;30;97;0.9951;3.26;0.44;10.1;6
6.2;0.32;0.16;7;0.045;30;136;0.9949;3.18;0.47;9.6;6
7;0.27;0.36;20.7;0.045;45;170;1.001;3;0.45;8.8;6
6.3;0.3;0.34;1.6;0.049;14;132;0.994;3.3;0.49;9.5;6
8.1;0.22;0.43;1.5;0.044;28;129;0.9938;3.22;0.45;11;6
All from the Excel sheet.

Related

Omitting NA values from ggplot when using multiple dataframes to plot multiple lines

My dataframes sometimes contain NA values. These were previously blanks, characters like 'BAD' or actual 'NA' characters from the imported .csv file. I have changed everything in my dataframes to numeric - this changes all non-numeric characters to NA. So far, so good.
I am aware I can use the following using dataframe 'df' to ensure a line is always drawn between data points, ensuring there are no gaps:
ggplot(na.omit(df), aes(x=Time, y=pH)) +
geom_line()
However, sometimes I wish to plot 2 or more dataframes using ggplot2 to get a single plot. I do this because my x axis (Time) is indeed the same for all dataframes, but the specific numbers are different. I was having immense trouble merging these dataframes because the rows are not equal. Otherwise I would merge, melt the data and use ggplot2 as normal to make a multiple-lined line plot.
I have since learnt you can plot multiple dataframes manually on ggplot at the 'geom level':
ggplot() +
geom_line(data = df1, aes(x=Time1, y=pH1), colour='green') +
geom_line(data = df2, aes(x=Time2, y=pH2), colour='red') +
geom_line(data = df3, aes(x=Time3, y=pH3), colour='blue') +
geom_line(data = df4, aes(x=Time4, y=pH4), colour='yellow')
However, how can I now ensure NA values are omitted and the lines are connected?! It all seems to work, but my 4 plots have gaps in them where the NA values are!
I am new to R, but enjoying it so far and realise there are usually multiple solutions to an issue. Any help or advice appreciated.
EDIT (for anyone who later sees this)
So, after playing around for 30 minutes I realised I could first apply na.omit() to each data frame separately, name the resulting objects, and then plot those with ggplot instead. This works fine. The code above was also incorrect if I wanted a suitable legend.
New, correct code:
df1.omit <- na.omit(df1)
df2.omit <- na.omit(df2)
df3.omit <- na.omit(df3)
df4.omit <- na.omit(df4)
ggplot() +
geom_line(data = df1.omit, aes(x=Time1, y=pH1, colour="Variable 1")) +
geom_line(data = df2.omit, aes(x=Time2, y=pH2, colour="Variable 2")) +
geom_line(data = df3.omit, aes(x=Time3, y=pH3, colour="Variable 3")) +
geom_line(data = df4.omit, aes(x=Time4, y=pH4, colour="Variable 4"))
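An alternative to merging by row is to stack the data frames into one long table, which does not require them to have the same number of rows. The sketch below is only an illustration under assumptions: df1 and df2 are small made-up stand-ins for the question's data frames, stack_series is a hypothetical helper, and each data frame is assumed to hold its time values in the first column and its pH values in the second.

```r
library(ggplot2)

# Hypothetical stand-ins for the question's data frames (values are made up)
df1 <- data.frame(Time1 = c(0, 1, 2, 3), pH1 = c(7.0, NA, 7.2, 7.1))
df2 <- data.frame(Time2 = c(0.5, 1.5, 2.5), pH2 = c(6.8, 6.9, NA))

# Hypothetical helper: rename the columns to a common scheme and add a label
stack_series <- function(df, label) {
  data.frame(Time = df[[1]], pH = df[[2]], series = label)
}

# Stack the series on top of each other, then drop rows containing NA
long <- rbind(stack_series(df1, "Variable 1"),
              stack_series(df2, "Variable 2"))
long <- na.omit(long)

# One geom_line call; the colour mapping produces the legend
p <- ggplot(long, aes(x = Time, y = pH, colour = series)) +
  geom_line()
```

Printing p draws one connected line per series, since the rows with NA were removed before plotting.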

R: Producing several barcharts with ggplot2 and lapply: how to insert the subtitles according to a list?

My aim is to produce and save several bar plots with lapply and ggplot2. For this purpose, I have created a list out of my data. Everything works fine apart from the subtitles: I would like to insert the names of the elements of my list into the graphs. So far I could only insert the name of the first element.
I have found another post, which helped me a lot to get this far. I'm new here, so I hope I'm posting this question in the right way (I haven't found an option to relate it to this other post).
I adapted this code from that question because I have a follow-up question to the case provided there.
###creating some random data:
df <- data.frame(value = floor(runif(20,min=0,max=30)),
Intervall = paste("Intervall",rep(1:10,2)), type = rep(c("a", "b")))
list1 <- split(df, df$type)
###producing plots with lapply and ggplot
plots <- lapply(list1, function(x) {
ggplot(x, aes(Intervall, value)) +
geom_bar(stat="identity") +
labs(title="Intervalle", subtitle =names(list1))})
lapply(names(plots),
function(x) ggsave(filename=paste(x,".emf",sep=""), plot=plots[[x]]))
The elements of my list are called a and b. Now the first graph should have the subtitle "a", and the second graph the subtitle "b".
How can I do so? (also how can I first see my plots in the console before saving them?)
With names(list1) "a" becomes the subtitle for both graphs…
The issue is not related to the second command; it comes from the creation of 'plots'. In the subtitle we are passing the whole of names(list1) instead of the corresponding element, so only the first name is used. If we loop through the names of 'list1' instead, it is easy to get the corresponding name for each list element, and the list can be subset by the same names:
plots <- lapply(names(list1), function(nm) {
ggplot(list1[[nm]], aes(Intervall, value)) +
geom_bar(stat="identity") +
labs(title="Intervalle", subtitle =nm)})
names(plots) <- names(list1)
Now we use the same command as in the OP (changing the .emf to .png):
lapply(names(plots), function(nm) ggsave(filename =
paste(path, nm, ".png", sep=""), plot = plots[[nm]]))
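Another way to keep each element paired with its name is Map(), which walks the list and its names in parallel and keeps the list names on the result. This is a sketch of that variant, not the answerer's code, and it uses a smaller made-up data frame than the question's random one so the result is reproducible.

```r
library(ggplot2)

# Small made-up data with the same columns as in the question
df <- data.frame(value = c(3, 8, 5, 1),
                 Intervall = paste("Intervall", rep(1:2, 2)),
                 type = rep(c("a", "b"), each = 2))
list1 <- split(df, df$type)

# x is one list element, nm is the name of that same element
plots <- Map(function(x, nm) {
  ggplot(x, aes(Intervall, value)) +
    geom_bar(stat = "identity") +
    labs(title = "Intervalle", subtitle = nm)
}, list1, names(list1))
```

To look at a plot before saving it, print a single element, for example print(plots[["a"]]).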
(Image: the resulting plots)
Consider by in place of split + lapply and use type in subtitle and filename arguments.
# NAMED LIST OF PLOTS
plots <- by(df, df$type, function(sub) {
p <- ggplot(sub, aes(Intervall, value)) +
geom_bar(stat="identity") +
labs(title="Intervalle", subtitle = sub$type[1])
ggsave(filename=paste0(sub$type[1],".emf"), plot=p)
return(p)
})

How to `dput` a `ggplot` object?

I am looking for a way to save some ggplot objects for later use. The dput function creates a string that, when passed to dget(), returns errors about an unexpected <:
The first one is here: .internal.selfref = <. This can be easily solved by setting .internal.selfref to NULL.
The remaining seven are distributed across different attributes, with the arguments being <environment>. I tried to change the <environment>'s to something like NULL or environment(), but none of them works - the environment is not set right and the object not found error is returned.
Some searches led me to the function ggedit::dput.ggedit. But it gives me the error:
# Error in sprintf("%s = %s", item, y) :
# invalid type of argument[2]: 'symbol'
I am thinking, either I set the environments right in using the dput function, or I figure out why ggedit::dput.ggedit does not work...
Any idea?
Not using dput(), but to save your ggplot objects for later use, you could save them as .rds files (just like any R objects).
Example:
my_plot <- ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
saveRDS(my_plot, "my_plot.rds")
And to restore your object in another session, another script, etc.
my_plot <- readRDS("my_plot.rds")
You can try a tidyverse approach.
Save the plot beside the data in a tibble using nest and map.
library(tidyverse)
res <- mtcars %>%
as_tibble() %>%
nest(data = everything()) %>%
mutate(res=map(data, ~ggplot(.,aes(mpg, disp)) + geom_point()))
Then save the data.frame using save or saveRDS.
Finally, call the plot:
res$res
The size is 4kb for tibble(mtcars) vs. 21kb with plot.

R, from a list create plots and save it with his name

I have a list that contains 75 named matrices, and I want to make a plot for each matrix and save each plot under the name of its matrix.
My code makes the plots with a loop and it works: I get 75 correct plots. The problem is that the name of each plot file looks like a vector, "c(99,86,94....)", which is too long, and I can't tell which plot is which.
I'm using the code below, which probably isn't the best. I'm a beginner, and I have been looking for a solution for a week without success.
for (i in ssamblist) {
svg(paste("Corr",i,".svg", sep=""),width = 45, height = 45)
pairs(~CDWA+CDWM+HI+NGM2+TKW+YIELD10+GDD_EA,
data=i,lower.panel=panel.smooth, upper.panel=panel.cor,
pch=0, main=i)
dev.off()}
How can I give each plot its name?
I tried changing "i" to names(i), but then the name was the name of the first column, and only one plot was created. I tried to do it with lapply but I couldn't.
PS: the plots are huge, and I have to expand the margins. I'm using RStudio.
Thank you!
Using for loop or apply:
# dummy data
ssamblist <- list(a = mtcars[1:10, 1:4], b = mtcars[11:20, 1:4], c = mtcars[21:30, 1:4])
# using for loop
for(i in names(ssamblist)) {
svg(paste0("Corr_", i, ".svg"))
pairs(ssamblist[[i]], main = i)
dev.off()}
# using sapply
sapply(names(ssamblist), function(i){
svg(paste0("Corr_", i, ".svg"))
pairs(ssamblist[[i]], main = i)
dev.off()})
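The reason the original loop produced file names like "c(99,86,94....)" is that for (i in ssamblist) hands the loop each element itself, and paste() then converts the data to text. The sketch below only builds the file names, without plotting, to make the difference visible; the two one-column data frames are made-up stand-ins.

```r
# Made-up stand-in for the list in the question
ssamblist <- list(a = data.frame(x = c(99, 86, 94)),
                  b = data.frame(x = c(1, 2, 3)))

# Iterating over the elements: the data itself ends up in the file name
bad <- vapply(ssamblist,
              function(el) paste0("Corr", el, ".svg"),
              character(1))

# Iterating over the names: the file name contains only the element's name
good <- paste0("Corr_", names(ssamblist), ".svg")

print(bad)
print(good)
```

Looping over names(ssamblist) and indexing with ssamblist[[i]], as in the answer above, gives access to both the name and the data.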

Reading from CSV and Plotting Boxes in R

I am looking for the most convenient way of creating boxplots for different values and groups read from a CSV file in R.
First, I read my Sheet into memory:
Sheet <- read.csv("D:/mydata/Table.csv", sep = ";")
This works fine.
names(Sheet)
correctly gives me the headings of the different columns.
I can also access and filter different groups into separate lists, like
myData1 <- Sheet[Sheet$Group == 'Group1',]$MyValue
myData2 <- Sheet[Sheet$Group == 'Group2',]$MyValue
...
and draw a boxplot using
boxplot(myData1, myData2, ..., main = "Distribution")
where the ... stand for more lists I have filled using the selection method above.
However, I have seen that a formula can do these steps of selection and boxplotting in one go. But when I use something like
boxplot(Sheet~Group, Sheet)
it won't work because I get the following error:
invalid type (list) for variable 'Sheet'
The data in the CSV looks like this:
No;Gender;Type;Volume;Survival
1;m;HCM;150;45
2;m;UCM;202;103
3;f;HCM;192;5
4;m;T4;204;101
...
So I have multiple possible groups and different values which I'd like to represent as a box plot for each group. For example, I could group by gender or group by type.
How can I easily draw multiple boxes from my CSV data without having to grab them all manually out of the data?
Thanks for your help.
Try it like this:
Sheet <- data.frame(Group = gl(2, 50, labels=c("Group1", "Group2")),
MyValue = runif(100))
boxplot(MyValue ~ Group, data=Sheet)
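Applied to the CSV layout from the question, the formula needs a value column on the left and a grouping column on the right; putting the whole data frame (Sheet) on the left is what triggers the "invalid type (list)" error. The sketch below reads the four sample rows from the question through the text argument of read.csv instead of a file, so it is a minimal illustration rather than the full data.

```r
# The four sample rows shown in the question, read from a string
csv_text <- "No;Gender;Type;Volume;Survival
1;m;HCM;150;45
2;m;UCM;202;103
3;f;HCM;192;5
4;m;T4;204;101"
Sheet <- read.csv(text = csv_text, sep = ";")

# value column ~ grouping column, both looked up in Sheet
stats_by_gender <- boxplot(Volume ~ Gender, data = Sheet,
                           main = "Volume by gender")
stats_by_type <- boxplot(Survival ~ Type, data = Sheet,
                         main = "Survival by type")
```

Switching the grouping is then just a matter of changing the right-hand side of the formula.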
Using ggplot2:
library(ggplot2)
ggplot(Sheet, aes(x = Group, y = MyValue)) +
geom_boxplot()
The advantage of using ggplot2 is that you have lots of possibilities for customizing the appearance of your boxplot.
