I want to create a series of x-y scatter charts where y is always the same variable and x is, in turn, each of the variables I want to check for correlation with it. As an example, let's use the mtcars dataset.
I am relatively new to R but getting better.
The code below works: the list charts contains all the charts, except that the x axis is labelled "x" and I want it to be the name of the variable. I tried numerous combinations of xlab= and cannot seem to get it right.
If I use names(data) I see the names I want to use. I guess I want to reference the first element of names(data) on the first iteration of apply, the second on the second iteration, and so on. How can I do that?
The next step would be to print them together in a lattice. I assume lapply or sapply with the print function will do the trick. I would appreciate ideas for this too; just pointers, I do not need a full solution.
library(lattice) # provides xyplot() and the panel.* functions
data(mtcars)
mypanel <- function(x,y,...) {
panel.xyplot(x,data[,y],...)
panel.grid(x=-1,y=-1)
panel.lmline(x,y,col="red",lwd=1,lty=1)
}
data <- mtcars[,2:11]
charts <- apply(data,2,function(x) xyplot (mtcars[,1] ~ x, panel=mypanel,ylab="MPG"))
This all started because I was not able to use the panel function to cycle.
I did not find that this code "worked", so I modified it to do so:
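library(lattice)
mypanel <- function(x,y,...) {
panel.xyplot(x, y, ...)
panel.grid(h=-1, v=-1) # negative h and v align the grid lines with the axis ticks
panel.lmline(x,y,col="red",lwd=1,lty=1)
}
data <- mtcars[,2:11]
# loop over the column names so that each chart can use its name as the x-label
charts <- lapply(names(data), function(x) { xyplot (mtcars[,1] ~ mtcars[,x],
panel=mypanel,ylab="MPG", xlab=x)})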
I needed to remove the 'data[,y]' from the panel function and pass names instead of column vectors, so that there was something to use for the x-label.
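For the follow-up about showing the charts together, one option is print() with its split and more arguments, which lattice provides for placing several trellis objects on one page. This is a minimal sketch, assuming a grid that is 4 columns wide; the names xvars, n_col and n_row and the inline panel function are my own choices, not part of the question.

```r
library(lattice)

# Build one chart per predictor, using the column name as the x-label
xvars <- names(mtcars)[2:11]
charts <- lapply(xvars, function(v) {
  xyplot(mtcars$mpg ~ mtcars[[v]], xlab = v, ylab = "MPG",
         panel = function(x, y, ...) {
           panel.grid(h = -1, v = -1)
           panel.xyplot(x, y, ...)
           panel.lmline(x, y, col = "red")
         })
})
names(charts) <- xvars

# Assumed layout: 4 columns, as many rows as needed
n_col <- 4
n_row <- ceiling(length(charts) / n_col)

for (i in seq_along(charts)) {
  col_i <- (i - 1) %% n_col + 1
  row_i <- (i - 1) %/% n_col + 1
  # split = c(column, row, total columns, total rows);
  # more = TRUE keeps the page open for the next chart
  print(charts[[i]], split = c(col_i, row_i, n_col, n_row),
        more = i < length(charts))
}
```

Since y is the same in every chart, a single xyplot() call on reshaped (long) data with one panel per variable would be another way to get the same page.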
I am supposed to create these plots for my course, but I'm very confused with using the looping function. I added an image of my dataframe and the plot I was asked to create. Does anyone know how I can get started?
I named my dataframe 'gutan' and I tried to do this so that I can loop through the dataframe:
gutan
gd <- gutan[ , i]
Here I tried to see how it would work if I looped through it, but it did not work, because I'm not sure how to make it use one column as x and another as y on the plot:
i = 1
for (i in 3:8) {
plot(gd) }
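No data for gutan was posted, so the following is only a sketch with a made-up stand-in: it assumes that column 2 (called day here) holds the x values and that columns 3 to 8 hold the y values. The point is to index the data frame inside the loop, so that each iteration picks a different y column.

```r
# Hypothetical stand-in for 'gutan': two id-like columns plus six measurements
set.seed(1)
gutan <- data.frame(sample = 1:10, day = 1:10)
for (k in 1:6) gutan[[paste0("m", k)]] <- rnorm(10)

op <- par(mfrow = c(2, 3))   # 2 x 3 grid, one panel per y column
for (i in 3:8) {
  # x is fixed; the y column and its label change with i
  plot(gutan$day, gutan[[i]],
       xlab = "day", ylab = names(gutan)[i], main = names(gutan)[i])
}
par(op)                      # restore the previous graphics settings
```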
I'm writing a function that should return two ggplot objects to me in RStudio, based on two different data frames generated within the function. Instead, I get a plot with all the data frame values "printed" in it, not a normal scatterplot.
I tried:
return(list(df1, df2))
Plots<- list(df1, df2), return(Plots)
View(df1) View(df2)
ggplot without storing it into an object
Just return a single ggplot and not using list() to return two.
Print() instead of return or view.
Every result has the same outcome (picture):
As you can see on the bottom right, I do not get a scatter plot. The console does show output [1] and [[2]], but nothing else. The code itself is working perfectly.
I ran debug and got no errors and, above all, when I replaced ggplot with plot(), this DID return the preferred scatterplot. So I assume the problem is not related to the code itself.
However, I am much more familiar with customizing ggplot than plot(), so if anyone knows how to solve this issue it would be amazing. Below I added some sample data and some sample code, although I'm not sure whether that is relevant to this issue.
The code I used within my function to create and return the ggplots is:
MD_filter_trial<- function(dataframe, mz_col, a = 0.00112, b = 0.01953){
MZ<- mz_col
MZR<- trunc(mz_col, digits = 0)#Either floor() or trunc() can be used for this part.
MD<- as.numeric(MZ-MZR)
MD.limit<- b + a*mz_col
dataframe<- dataframe%>%
dplyr::mutate(MD, MZ, MD.limit)%>%
dplyr::select(MD, MZ, MD.limit)
highlight_df <- dataframe %>% filter(MD >= MD.limit) #Notice how this is the exact opposite from the
MD_plot<- ggplot(data=dataframe, aes(x=MZ, y=MD))+
geom_point()+
geom_point(data=highlight_df, aes(x=MZ,y=MD), color='red')+#I added this one, so the data which will be removed will be highlighted in red.
ggtitle(paste("Unfiltered MD data - ", dataframe))
filtered<- dataframe%>%
filter(MD <= MD.limit)# As I understood: Basically all are coordinates. The maxima equation basically gives coordinates
MD_plot_2<- ggplot(data=filtered, aes(x=MZ, y=MD))+ #Filtered is basically the second dataframe, #which subsets datapoints with an Y value (which is the MD), below the linear equation MD...
geom_point()+
ggtitle(paste("Filtered MD data - ", dataframe))
N_Removed_datapoints <- nrow(dataframe) - nrow(filtered)
print(paste("Number of peaks removed:", N_Removed_datapoints))
MD_PLOTS<-list(dataframe, filtered, MD_plot, MD_plot_2)
return(MD_PLOTS)
}
Sample data:
structure(list(mz_col= c(99.0001, 99.0056, 99.0079, 99.0097, 99.0105,
99.0116, 99.0158, 99.0169, 99.019, 99.0196, 99.0207, 99.0215,
99.0239, 99.0252, 99.026, 99.0269, 99.0288, 99.0295, 99.0302,
99.0311, 99.0318, 99.0332, 99.034, 99.0346, 99.0355, 99.0376,
99.039, 99.04, 99.0405, 99.0414, 99.0421, 99.043, 99.0444, 99.0473,
99.048, 99.0517, 99.0536, 99.0547, 99.0556, 99.057, 99.0575,
99.0586, 99.0599, 99.0606, 99.0621, 99.0637, 99.0652, 99.0661,
99.0668, 99.0686, 99.0694, 99.0699, 99.0707, 99.0714, 99.072,
99.075, 99.0762, 99.0794, 99.0808, 99.0836, 99.0888, 99.0901,
99.0911, 99.092, 99.095, 99.0962, 99.1001, 99.1064, 99.1173,
99.4889, 99.5059, 99.5084, 99.5126, 99.5158, 99.5165, 99.5173,
99.5183, 99.526, 99.5266, 99.5315, 99.5345, 99.5358, 99.5402,
99.543, 99.5472, 99.548, 99.5529, 99.5572, 99.5577, 99.9408,
99.9551, 99.9599, 99.9646, 99.9718, 99.9887)), row.names = c(NA,
-95L), class = c("tbl_df", "tbl", "data.frame"))
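In your ggtitle calls perhaps you mean:
ggtitle(paste("Filtered MD data -", deparse(substitute(dataframe))))
Within a function this takes the name of the object passed to the dataframe argument and pastes it into a string, rather than putting the whole data frame in. It has to be evaluated before dataframe is reassigned inside the function, because after reassignment substitute() returns the value instead of the argument expression.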
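One caveat, sketched below in base R: substitute() only recovers the argument expression while the argument has not been reassigned, and the function in the question overwrites dataframe before building the titles. The helper names title_late and title_early and the tiny peaks data frame are made up for illustration.

```r
# Reassigning the argument first: substitute() now sees an ordinary
# variable and returns its value, so the whole data frame gets deparsed
title_late <- function(dataframe) {
  dataframe <- head(dataframe, 2)
  paste("Filtered MD data -", deparse(substitute(dataframe)))
}

# Capturing the name on the first line, before any reassignment
title_early <- function(dataframe) {
  df_name <- deparse(substitute(dataframe))
  dataframe <- head(dataframe, 2)
  paste("Filtered MD data -", df_name)
}

peaks <- data.frame(mz_col = c(99.0001, 99.0056, 99.0079))
title_early(peaks)   # "Filtered MD data - peaks"
title_late(peaks)    # contains the deparsed data, not the name
```

In the question's function this would mean storing the name in a variable at the top and using that variable in both ggtitle() calls.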
I have a question about creating a function that creates ggplots. I want to create my own function to graph values in multiple data frames quickly instead of writing a whole ggplot with each argument filled out each time. What I want to do is to input a vector of the names of the data frames, have the function create the graphs and have each saved as a new object with a different name. Example of my idea is…
myfunction <- function(x) {
ggplot(x, aes(x = time, y = result)) +
geom_point()
}
I want to be able to do something like
myfunction(c(testtype1, testtype2, testtype3))
and have the function create objects plot1, plot2, plot3. As of now, I can only do
plot1 <- myfunction(testtype1)
plot2 <- myfunction(testtype2)
plot3 <- myfunction(testtype3)
I don’t want to keep typing that over and over, especially if I have a lot of test types. Is there a way that the function can be modified to use the function to name the objects according to some formula?
With this, you can provide any number of (appropriate) data frames, and l_my_fun returns a list containing the plots.
l_my_fun <- function(x, ...) {
l <- list(x, ...)
ps <- lapply(l, myfunction)
ps
}
out <- l_my_fun(testtype1, testtype2, testtype3)
For example, now access the second plot as
out[[2]]
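If the data frames are easier to give as a vector of names, as the question describes, a variation is to look them up with mget() and name the resulting list. This is a sketch that assumes ggplot2 is installed; the three small data frames and the df_names vector are invented for the example.

```r
library(ggplot2)

myfunction <- function(x) {
  ggplot(x, aes(x = time, y = result)) +
    geom_point()
}

# Made-up data frames standing in for the real test types
testtype1 <- data.frame(time = 1:5, result = c(2, 4, 3, 5, 6))
testtype2 <- data.frame(time = 1:5, result = c(1, 1, 2, 3, 5))
testtype3 <- data.frame(time = 1:5, result = c(9, 7, 8, 6, 5))

# Look the data frames up by name in the current environment
df_names <- c("testtype1", "testtype2", "testtype3")
dfs <- mget(df_names, envir = environment())

# One plot per data frame, stored under plot1, plot2, ...
plots <- lapply(dfs, myfunction)
names(plots) <- paste0("plot", seq_along(plots))
```

plots$plot2 (or plots[["plot2"]]) then draws the second plot. Keeping the plots in one named list avoids creating a separate top-level object for every test type.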
I was wondering if anyone could help me use a variable name within a function.
I've put together a dot plot that sorts variables and then produces a bitmap, but I can't get R to pass the variable name to the plot title.
Example data
id<-c(1,2,3)
blood<-c(1,2,10)
weight<-c(1,2,13)
mydata<-as.data.frame(cbind(id,blood,weight))
mydata$blood
#######SORTED DOT PLOT####
Dplotter<-function (id,x,Title=""){
if (is.null(Title)) {Title=""} else {Title=Title}
DIR<-paste("C:/temp/WholePlots/New/",Title,".bmp",sep="")
D<-as.data.frame(cbind(id,x))
x1<-as.data.frame(D[order(x),])
bmp(DIR)
dotchart(x1$x,labels=id,main=Title,pch=16)
dev.off()
}
###############
Dplotter(mydata$id,mydata$blood,"Blood")
Dplotter(mydata$id,mydata$weight,"Weight")
In the second line of the function, I'd like to pass on the variable
name, something like
`if (is.null(Title)) {Title=varname(x)} else {Title=Title}`
so that I don't have to put "Blood" in the function Title field
(e.g. Dplotter(mydata$id,mydata$blood)).
Basically, how does one paste in the variable name in a function? It
would be even better if one could take out the dataset name from the
Title (without attaching the dataset, which I've been told is bad
practice), so that instead of getting mydata$blood, you just get
"blood" in the title.
I've failed to find an easy solution to paste in a variable name in
a function. As you can guess, putting the variable name in a
paste() function returns the values of the variable (so that the
plot title is filled with values rather than the variable name).
I'd also like to automate the function even further, so that I can
just put the dataset and the ID, and then have the function repeated
for each variable in the dataset. Obviously this requires solving
question 1 first, otherwise both title and filenames will encounter
problems.
The general answer is deparse(substitute(x)). E.g.
fooPlot <- function(x, main, ...) {
if(missing(main))
main <- deparse(substitute(x))
plot(x, main = main, ...)
}
Here it is in use:
set.seed(42)
dat <- data.frame(x = rnorm(10), y = rnorm(10))
fooPlot(dat, col = "red")
Which produces:
In your particular example though, this won't work because you don't want dat$x as the title, you want just x. We could do a little more manipulation however:
fooPlot <- function(x, main, ...) {
if(missing(main)) {
main <- deparse(substitute(x))
if(grepl("\\$", main)) {
main <- strsplit(main, "\\$")[[1]][2]
}
}
plot(x, main = main, ...)
}
Which for fooPlot(dat$x, col = "red") gives:
Note that this code makes some assumptions: that main is not a vector, and that there will only ever be one $ in the object passed to plot (i.e. you couldn't use a nested list with the above code).
You need to retrieve a set of strings, the variable names, and use them for the title of your plots and filenames as well.
I will use the longley dataset to illustrate the trick.
data(longley, package="datasets")
#return a vector with variable names
colnames(longley)
names(longley) #equivalent
#get the name of a specific variable (column number):
names(longley)[1]
To plot each variable, get two sets of strings: variable names and filenames:
var.names=names(longley)
file.names=paste(var.names, "bmp", sep=".")
#with an extra step to prefix a directory to those filenames
for (i in 1:ncol(longley) ) {
bmp(file=file.names[i])
plot(longley[[i]], main=var.names[i], ylab="")
dev.off()
}
I set ylab="" because otherwise the y-label is a silly "longley[[i]]", and using var.names[i] as ylab would be redundant with the title.
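The directory prefix mentioned in the comment can be added with file.path(), and the same loop can draw the sorted dot chart from the question. This is a sketch under a few assumptions of mine: the output folder lives under tempdir(), the row names of longley serve as the labels, and pdf() is used only because that device is available on every R build (bmp() can be swapped in where it is supported).

```r
data(longley, package = "datasets")

# Hypothetical output folder; replace with a real path as needed
out_dir <- file.path(tempdir(), "WholePlots")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

var.names  <- names(longley)
file.names <- file.path(out_dir, paste0(var.names, ".pdf"))

for (i in seq_along(var.names)) {
  x   <- longley[[i]]
  ord <- order(x)                  # sort values and labels together
  pdf(file.names[i])               # assumption: pdf instead of bmp
  dotchart(x[ord], labels = rownames(longley)[ord],
           main = var.names[i], pch = 16)
  dev.off()
}
```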
I'm an R newbie and I'm trying to understand the xyplot function in lattice.
I have a dataframe:
df <- data.frame(Mean=as.vector(abc), Cycle=seq_len(nrow(abc)), Sample=rep(colnames(abc), each=nrow(abc)))
and I can plot it using
xyplot(Mean ~ Cycle, group=Sample, df, type="b", pch=20, auto.key=list(lines=TRUE, points=FALSE, columns=2), file="abc-quality")
My question is, what are Mean and Cycle? Looking at ?xyplot I can see that this is some kind of function and I understand they are coming from the data frame df, but I can't see them with ls() and >Mean gives Error: object 'Mean' not found. I tried to replicate the plot by substituting df[1] and df[2] for Mean and Cycle respectively thinking that these would be equal but that doesn't seem to be the case. Could someone explain what data types these are (objects, variables, etc) and if there is a generic way to access them (like df[1] and df[2])?
Thanks!
EDIT: xyplot works fine, I'm just trying to understand what Mean and Cycle are in terms of how they relate to df (column labels?) and if there is a way to put them in the xyplot function without referencing them by name, like df[1] instead of Mean.
These are simply references to columns of df.
If you'd like to access them by name without mentioning df every time, you could write with(df,{ ...your code goes here... }). The ...your code goes here... block can access the columns as simply Mean and Cycle.
A more direct way to get to those columns is df$Mean and df$Cycle. You can also reference them by position as df[,1] and df[,2], but I struggle to see why you would want to do that.
The reason your xyplot call works is that it implicitly does the equivalent of with(df), where df is your third argument to xyplot. Many R functions are like this; for example, lm(y~x,obs) would also correctly pick up columns x and y from the data frame obs.
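If the goal is to avoid typing the column names in the formula, one possibility is to build the formula from names(df) by position with reformulate(). This is a sketch; the small abc matrix is made up, since the original one was not posted.

```r
library(lattice)

# Made-up stand-in for the 'abc' matrix from the question
set.seed(1)
abc <- matrix(rnorm(12), nrow = 4,
              dimnames = list(NULL, c("s1", "s2", "s3")))
df <- data.frame(Mean   = as.vector(abc),
                 Cycle  = seq_len(nrow(abc)),
                 Sample = rep(colnames(abc), each = nrow(abc)))

# Build Mean ~ Cycle from column positions instead of typing the names
f <- reformulate(names(df)[2], response = names(df)[1])

# xyplot() still looks the variables up in df, exactly as before
p <- xyplot(f, data = df, groups = Sample, type = "b", pch = 20)
print(p)
```

The formula only carries names; the values are found in df when xyplot() evaluates it, which is why typing Mean at the prompt gives "object not found".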
You need to add , data=df to your call to xyplot():
xyplot(Mean ~ Cycle, data=df, # added data= argument
group=Sample, type="b", pch=20,
auto.key=list(lines=TRUE, points=FALSE, columns=2),
file="abc-quality")
Alternatively, you can use with(df, ....) and place your existing call where I left the four dots.