I'm an R newbie and I'm trying to understand the xyplot function in lattice.
I have a dataframe:
df <- data.frame(Mean=as.vector(abc), Cycle=seq_len(nrow(abc)), Sample=rep(colnames(abc), each=nrow(abc)))
and I can plot it using
xyplot(Mean ~ Cycle, group=Sample, df, type="b", pch=20, auto.key=list(lines=TRUE, points=FALSE, columns=2), file="abc-quality")
My question is, what are Mean and Cycle? Looking at ?xyplot I can see that this is some kind of function and I understand they are coming from the data frame df, but I can't see them with ls() and >Mean gives Error: object 'Mean' not found. I tried to replicate the plot by substituting df[1] and df[2] for Mean and Cycle respectively thinking that these would be equal but that doesn't seem to be the case. Could someone explain what data types these are (objects, variables, etc) and if there is a generic way to access them (like df[1] and df[2])?
Thanks!
EDIT: xyplot works fine, I'm just trying to understand what Mean and Cycle are in terms of how they relate to df (column labels?) and if there is a way to put them in the xyplot function without referencing them by name, like df[1] instead of Mean.
These are simply references to columns of df.
If you'd like to access them by name without mentioning df every time, you could write with(df,{ ...your code goes here... }). The ...your code goes here... block can access the columns simply as Mean and Cycle.
A more direct way to get to those columns is df$Mean and df$Cycle. You can also reference them by position as df[,1] and df[,2], but I struggle to see why you would want to do that.
The reason your xyplot call works is that it implicitly does the equivalent of with(df), where df is the data frame you pass to xyplot as its data argument. Many R functions work like this; for example, lm(y~x,obs) would also correctly pick up columns x and y from the data frame obs.
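To make the column lookup concrete, here is a minimal sketch. The toy data frame `d` and its values are made up for illustration (the real `abc` matrix is not shown in the question). It also suggests why substituting df[1] and df[2] did not behave like Mean and Cycle: single brackets return a one-column data frame, not the column vector itself.

```r
library(lattice)

# Hypothetical stand-in for the question's data frame
d <- data.frame(
  Mean   = c(30.1, 29.8, 29.5, 31.0, 30.4, 30.2),
  Cycle  = rep(1:3, times = 2),
  Sample = rep(c("s1", "s2"), each = 3)
)

# Three equivalent ways to reach the same column vector
by_dollar <- d$Mean
by_index  <- d[[1]]        # double brackets return the vector
by_with   <- with(d, Mean)

# Single brackets keep the data frame wrapper, which is not the same thing
wrapped <- d[1]

# Building the formula from column positions instead of typing the names
f <- reformulate(names(d)[2], response = names(d)[1])   # Mean ~ Cycle

p_named    <- xyplot(Mean ~ Cycle, data = d, groups = Sample, type = "b")
p_position <- xyplot(f, data = d, groups = Sample, type = "b")
```

Calling print(p_named) and print(p_position) should draw the same plot, since both formulas name the same two columns of d.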
You need to add , data=df to your call to xyplot():
xyplot(Mean ~ Cycle, data=df, # added data= argument
group=Sample, type="b", pch=20,
auto.key=list(lines=TRUE, points=FALSE, columns=2),
file="abc-quality")
Alternatively, you can use with(df, ....) and place your existing call where I left the four dots.
I am trying to plot 9 barplots in a 3x3 matrix in R using base R inside a for loop. (I am working on a workhorse solution for visualizing every column before I begin manipulating the data.) Below is the code:
library(ISLR);
library(ggplot2);
# load wage data
data(Wage)
par(mfrow=c(3,3))
for(i in 1:(dim(Wage)[2]-2)){
plot(Wage[,i],main = paste0(names(Wage)[i]),las = 2)
}
Unfortunately, this does not work properly for the first 2 columns because they are numeric and actually need a histogram. I understand that I need to fit an if-else condition somewhere inside the for() statement, but that gives me errors. Below is the output, where the first 2 columns are plotted wrong. (Age and year are actually numeric, and I may need to put them on the x-axis instead of defaulting them to y.)
Could someone suggest an edit or hack? I also learned that I can't use par() when I am wrapping ggplot inside a for loop, so I had to use base R; otherwise ggplot would have been great aesthetically.
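One possible shape for the if-else described above, as a sketch only: the data frame `demo` below is a made-up stand-in for a table with mixed column types (it is not the Wage data), and `plot_column` is a hypothetical helper name. Numeric columns get a histogram and everything else gets a bar plot of counts.

```r
# Hypothetical stand-in for a data frame with numeric and factor columns
set.seed(1)
demo <- data.frame(
  year      = sample(2003:2009, 200, replace = TRUE),
  age       = round(rnorm(200, mean = 42, sd = 11)),
  education = factor(sample(c("HS", "College", "Advanced"), 200, replace = TRUE))
)

# Draw one column and report which plot type was chosen
plot_column <- function(x, title) {
  if (is.numeric(x)) {
    hist(x, main = title, xlab = title)
    "hist"
  } else {
    barplot(table(x), main = title, las = 2)
    "barplot"
  }
}

# One row of three panels; the previous layout is restored afterwards
old_par <- par(mfrow = c(1, 3))
kinds <- vapply(names(demo), function(nm) plot_column(demo[[nm]], nm), character(1))
par(old_par)
```

The same helper could be called from the original for loop in place of the plot() call, with par(mfrow=c(3,3)) set beforehand.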
I'm an R beginner and spent almost two days figuring out how to draw two time series in one graph using "ts.plot". This should be a very simple task, but for some reason there was always something wrong.
My Dataset looks like this:
[Image: Data]
I figured it out, and there are several ways to accomplish the task.
This is the most straightforward way: assign "variable_1" to "x" and "variable_2" to "y". Then use "ts.plot" to plot the graph:
x <- usa$central_bank_assets_gdp_percent
y <- usa$domestic_credit_private_sector_gdp
ts.plot(ts(x), ts(y), col=1:2)
Attaching the master dataset first, and then using the real variable names directly in the code:
attach(usa)
ts.plot(ts(central_bank_assets_gdp_percent), ts(domestic_credit_private_sector_gdp), col=1:2)
detach(usa)
Using the "$" sign as an alternative to specifying the location of the data:
ts.plot(ts(usa$central_bank_assets_gdp_percent), ts(usa$domestic_credit_private_sector_gdp), col=1:2)
Using "data.frame()" one can include the variables:
ts.plot(data.frame(usa$central_bank_assets_gdp_percent, usa$financial_system_deposits_gdp_percent), col=1:2)
This is the way specified in the help: using "ts.plot(..., gpars = list())". In this case "..." are the variables, and all other graphical parameters go in "gpars = list()":
ts.plot(ts(usa$central_bank_assets_gdp_percent), ts(usa$financial_system_deposits_gdp_percent), gpars = list(col=1:2))
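A further variant, sketched here with made-up data: with() gives the same short variable names as attach() but only for the duration of one expression, so nothing has to be detached afterwards. The data frame `usa_demo` and its two shortened column names are assumptions for illustration, not the real dataset.

```r
# Hypothetical stand-in for the 'usa' data frame (random walks, not real figures)
set.seed(1)
usa_demo <- data.frame(
  central_bank_assets = cumsum(rnorm(30)),
  private_credit      = cumsum(rnorm(30))
)

# Columns are visible by name inside with(); nothing is attached to the search path
with(usa_demo, {
  ts.plot(ts(central_bank_assets), ts(private_credit),
          col = 1:2, ylab = "Percent of GDP")
  legend("topleft", legend = c("central_bank_assets", "private_credit"),
         col = 1:2, lty = 1)
})
```

The legend call is optional; it is included because two lines distinguished only by color are hard to tell apart without one.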
I have a data matrix with approximately one hundred variables and I want to do box plots of these variables. Doing them one by one is possible, but tedious. The code I use for my box plots is:
boxplot(myVar ~ Group*Trt*Time,data=exp,col=c('red','blue'),frame.plot=T,las=2, ylab='Counts', at=c(1,2,3,4,6,7,8,9,11,12,13,14,16,17,18,19))
I started doing them one by one, but realized there must be better options. As far as I can tell, the boxplot call takes only one variable at a time (I may be wrong), so I am looking for a way to get it done in one go. A for loop? I would also like to print the name of the current variable (the column name) on each plot in order to tell them apart.
Appreciate suggestions.
Thank you.
jd
Why not try the following:
data(something)
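panel.bxp <- function(x, ...)
{
# Save the current user coordinates and restore them when the panel is done
usr <- par("usr"); on.exit(par(usr = usr))
# Rescale the x-axis to c(0, 2) so the box, drawn at x = 1, sits in the middle
par(usr = c(0, 2, usr[3:4]))
boxplot(x, add=TRUE)
}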
Then, to run the function, you can try something like the following:
pairs(something, diag.panel = panel.bxp, text.panel = function(...){})
EDIT: There is also a nice link to an article here on R-bloggers which you might want to have a look at.
Being very new to R, I've tried to follow my 'old' thinking - making a for-loop. Here is what I came up with. Probably very primitive, and therefore, I'd appreciate comments/suggestions. Anyway: the loop:
for (i in 1:ncol(final)) {
#print(i)
c <- colnames(final)[i]
#print(c)
b <- final[,i]
#b <- t(b)
#dim(b)
#print(b)
exp <- data.frame(Group,Trt,Time,b)
#dim(exp)
#print(exp)
boxplot(b ~ Group*Trt*Time,data=exp,col=c('red','blue'),frame.plot=T, las=2, ylab='Counts',main=c, at=c(1,2,3,4,6,7,8,9,11,12,13,14,16,17,18,19))
}
The loop runs through the data matrix 'final', (48rows x 67cols). Picks up the column header, c, which is used in the boxplot call as main title. Picks up the data column, b. Sets up the experiment using the Group, Trt, and Time factors established outside the loop, and calls the boxplot.
This seems to do what I want. Oddly, RStudio does not allow more than about 25 plots to be stored in the plots pane, so I have to run this loop in a couple of rounds.
Anyway, sorry for answering my own question. Better solutions are greatly appreciated, since I suspect my way is pretty amateurish.
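One way around the plot-history limit mentioned above is to send the plots to a PDF device, one page per variable, instead of the plots pane. The sketch below uses a small made-up matrix and only two factors to stay short; `final`, `Group` and `Time` mirror the names in the loop above, but their contents here are invented, and the output path is an arbitrary temporary file.

```r
# Hypothetical stand-ins for the objects used in the loop above
set.seed(1)
final <- matrix(rpois(48 * 3, lambda = 20), nrow = 48,
                dimnames = list(NULL, c("varA", "varB", "varC")))
Group <- factor(rep(c("ctrl", "treated"), times = 24))
Time  <- factor(rep(c("t1", "t2", "t3"), each = 16))

# Each boxplot() call becomes one page of the PDF
out_file <- file.path(tempdir(), "boxplots.pdf")
pdf(out_file)
for (nm in colnames(final)) {
  d <- data.frame(Group, Time, value = final[, nm])
  boxplot(value ~ Group * Time, data = d, col = c("red", "blue"),
          las = 2, ylab = "Counts", main = nm)
}
dev.off()
```

Looping over colnames(final) rather than over indices gives the title directly and avoids reusing the names c and exp, which are also base R functions.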
I want to create a series of x-y scatter charts, where y is always the same variable and x are the variables I want to check if they are correlated with. As an example lets use the mtcars dataset.
I am relatively new to R but getting better.
The code below works: the list charts contains all the charts, except that the x-axis is labeled "x" and I want it to be the name of the variable. I tried numerous combinations of xlab= and cannot seem to get it right.
If I use names(data) I see the names I want to use. I guess I want to reference the first element of names(data) on the first iteration of apply, the second on the second iteration, and so on. How can I do that?
The next step would be to print them together in a lattice. I assume lapply or sapply with the print function will do the trick. I would appreciate ideas for this too; just pointers, I do not need a full solution.
data(mtcars)
mypanel <- function(x,y,...) {
panel.xyplot(x,data[,y],...)
panel.grid(x=-1,y=-1)
panel.lmline(x,y,col="red",lwd=1,lty=1)
}
data <- mtcars[,2:11]
charts <- apply(data,2,function(x) xyplot (mtcars[,1] ~ x, panel=mypanel,ylab="MPG"))
This all started because I was not able to use the panel function to cycle.
I did not find that this code "worked", so I modified it to do so:
mypanel <- function(x,y,...) {
panel.xyplot(x, y, ...)
panel.grid(x=-1, y=-1)
panel.lmline(x,y,col="red",lwd=1,lty=1)
}
data <- mtcars[,2:11]
charts <- lapply(names(data), function(x) { xyplot (mtcars[,1] ~ mtcars[,x],
panel=mypanel,ylab="MPG", xlab=x)})
I needed to remove the 'data[,y]' from the panel function and pass names instead of column vectors so that there was something to use for an x-label.
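For the follow-up about showing the charts together, one option is the split and more arguments of lattice's print method for trellis objects. The sketch below is an illustration under assumptions: it uses only four predictors so that a 2 x 2 grid is filled, and the grid arithmetic is one choice among several.

```r
library(lattice)

# Four predictor columns of mtcars, chosen only to fill a 2 x 2 grid
predictors <- names(mtcars)[2:5]

# One trellis object per predictor, labeled with the column name
charts <- lapply(predictors, function(nm) {
  xyplot(mtcars$mpg ~ mtcars[[nm]], xlab = nm, ylab = "MPG",
         panel = function(x, y, ...) {
           panel.grid(h = -1, v = -1)
           panel.xyplot(x, y, ...)
           panel.lmline(x, y, col = "red")
         })
})
names(charts) <- predictors

# split = c(column, row, n_columns, n_rows); more = TRUE keeps the page open
for (i in seq_along(charts)) {
  col_pos <- (i - 1) %% 2 + 1
  row_pos <- (i - 1) %/% 2 + 1
  print(charts[[i]], split = c(col_pos, row_pos, 2, 2),
        more = i < length(charts))
}
```

Reshaping the data to long form and using a single conditioned formula such as mpg ~ value | variable would be another route, at the cost of a shared axis-label setup.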
I just discovered the great plyr package and am taking it for a spin.
A question I have is the following: is there some way to access the grouping variables from within d_ply?
Say I have a dataframe df with columns x,y,z, and I would like to plot for each z the values x versus y. If I do the following:
plotxy = function(df, ...) {plot(df$x, df$y, ...)}
d_ply(df, .(z), plotxy(df, main=.(z)))
then the titles that show up on the plots are all "z", and not the values of the z variable. Is there a way to access those values from within d_ply?
EDIT: As @Justin pointed out, the above formulation is wrong because I am passing the whole of df to plotxy. Hence the line
d_ply(df, .(z), plotxy(df, main=.(z)))
should be
d_ply(df, .(z), plotxy, main=.(z))
in order to make sense in terms of my original question (I guess that's also what @joran was hinting at).
However, I realized something else. Even though df gets sliced along z by d_ply, the sub-dataframe that the function receives still has a z column -- simply with always the same value. Hence the problem can apparently be solved as follows:
plotxy = function(df, ...) {plot(df$x, df$y, main=df$z[1])}
d_ply(df, .(z), plotxy)
By way of example, I'll expand on Joran's concern.
df <- data.frame(x=rnorm(100), y=rnorm(100), z=letters[1:10])
Let's use your function and see what we get without plyr:
plotxy(df, main=.(z))
versus the maybe more expected(?):
plotxy(df, main=df$z)
However, in your code, you are splitting your data frame on z and then sending the whole data.frame df to your function again. Instead, you could make a wrapper function:
d_ply(df, .(z), function(ply.df) plotxy(ply.df, main=unique(ply.df$z)))
This way the plotxy function is only seeing the smaller split data.frame ply.df that you pass through the wrapper function.
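For comparison, the same split-then-plot idea can be sketched in base R without plyr. This is a swapped-in technique (split() plus a loop), not what d_ply does internally; the names `pieces` and `level` are hypothetical.

```r
# Same toy data as above; z recycles the ten letters across 100 rows
set.seed(42)
df <- data.frame(x = rnorm(100), y = rnorm(100), z = letters[1:10])

plotxy <- function(d, ...) plot(d$x, d$y, ...)

# split() returns a named list with one sub-data-frame per level of z
pieces <- split(df, df$z)

# The list names carry the grouping value, so they can serve as plot titles
for (level in names(pieces)) {
  plotxy(pieces[[level]], main = level)
}
```

Here the grouping value comes from the list names rather than from a column inside each piece, which avoids relying on z being constant within the slice.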