Argument for "n" when plotting with ggplot2 - r

I am fairly new to R (coming from a Stata background) and I am finding it difficult to deal with some arguments when plotting with ggplot2. Please consider the following:
library(ggplot2)
library(scales)  # provides percent() for the axis labels
test <- data.frame(
time=c(1,2,3,1,2,3,1,2,3),
experiment=c(2,1,2,1,1,2,1,2,2)
)
test$time2 <- factor(test$time,
levels=c("1","2","3"),
labels=c("R1", "R2", "R3")
)
test$experiment2 <- factor(test$experiment,
levels=c(1,2),
labels=c("Yes", "No")
)
# Inside aes(), refer to columns by name rather than as test$...
ggplot(test, aes(time2, ..count../3))+
geom_bar(aes(fill=experiment2))+
scale_y_continuous(labels=percent)
The above is just a silly example I made up to ask how to use "n" (the number of observations) properly. If you run the code above, you will see that it draws a stacked bar plot (percentages). However, to get there I had to hard-code the group size: ..count../3
What I would like to find out is how to replace that "3" with a generic argument. Searching online I could not find anything, and I tentatively tried "N" and "n" to no avail. Thanks a lot for your help; the move from Stata to R is exciting, but not as easy as one would think.
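A minimal sketch of two generic replacements for the hard-coded 3, assuming ggplot2 3.3.0 or later (for after_stat(), which supersedes the ..count.. notation). The object names p1 and p2 are illustrative. position = "fill" is the simplest route; the second plot shows the explicit "n per bar" version, dividing each segment's count by the total count at the same x.

```r
library(ggplot2)  # assumption: ggplot2 >= 3.3.0 for after_stat()
library(scales)   # percent()

test <- data.frame(
  time = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  experiment = c(2, 1, 2, 1, 1, 2, 1, 2, 2)
)
test$time2 <- factor(test$time, levels = 1:3, labels = c("R1", "R2", "R3"))
test$experiment2 <- factor(test$experiment, levels = 1:2, labels = c("Yes", "No"))

# Option 1: position = "fill" rescales every stacked bar to a total of 1,
# so the group size never has to be written by hand.
p1 <- ggplot(test, aes(x = time2, fill = experiment2)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = percent) +
  labs(x = NULL, y = "share within time point")

# Option 2: compute the per-bar n explicitly. Inside after_stat(), `count` and
# `x` are columns computed by stat_count(); ave() sums `count` over all fill
# segments sharing the same x, i.e. the generic "n" of each bar.
p2 <- ggplot(test, aes(x = time2, fill = experiment2)) +
  geom_bar(aes(y = after_stat(count / stats::ave(count, as.numeric(x), FUN = sum)))) +
  scale_y_continuous(labels = percent)

p1
```

Both plots stack to 100% per time point whatever the number of observations in each group, so nothing needs to change if the data grow.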

Related

R: Cleaning GGally Plots

I am using the R programming language and I am new to the GGally library. I followed some basic tutorials online and ran the following code:
#load libraries
library(GGally)
library(survival)
library(plotly)
I changed some of the data types:
#manipulate the data
data(lung)
data = lung
data$sex = as.factor(data$sex)
data$status = as.factor(data$status)
data$ph.ecog = as.factor(data$ph.ecog)
Now I visualize:
#make the plots
#I don't know why, but this comes out messy
ggparcoord(data, groupColumn = "sex")
#Cleaner
ggparcoord(data)
Both ggparcoord() calls ran successfully; however, the first one came out pretty messy (the axis labels seem to have been corrupted). Is there a way to fix the labels?
In the second graph, it is difficult to tell how the factor variables are placed on their respective axes (e.g. for the "sex" column, is "male" or "female" the bottom point?). Does anyone know if there is a way to fix this?
Finally, is there a way to use the ggplotly() function on GGally objects?
e.g.
a = ggparcoord(data)
ggplotly(a)
Thanks
It looks like your data columns get converted to a factor when you add the groupColumn. To prevent that, you can exclude the groupColumn from the columns to be plotted:
library(GGally)
library(survival)
data(lung)
data = lung
data$sex = as.factor(data$sex)
data$status = as.factor(data$status)
data$ph.ecog = as.factor(data$ph.ecog)
# Plot every column except "sex"; "sex" is used only to colour the lines
ggparcoord(data, columns = seq(ncol(data))[!names(data) %in% "sex"], groupColumn = "sex")
BTW: I'm not sure about the general case, but at least for ggparcoord, ggplotly() works.
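For the ggplotly() part of the question, a minimal sketch, assuming the plotly package is installed. It uses survival::lung directly instead of data(lung), and the names df, cols, p and widget are illustrative.

```r
library(GGally)
library(plotly)

df <- survival::lung
df$sex <- as.factor(df$sex)
df$status <- as.factor(df$status)
df$ph.ecog <- as.factor(df$ph.ecog)

# Plot every column except the grouping column and colour the lines by sex
cols <- which(names(df) != "sex")
p <- ggparcoord(df, columns = cols, groupColumn = "sex")

# ggparcoord() returns an ordinary ggplot object, so ggplotly() can convert it
widget <- ggplotly(p)
widget
```

This only relies on ggparcoord() returning a ggplot object; whether every other GGally function converts equally well is not covered here.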

error (unused argument) using plyr with lattice xyplot

Hello everybody on stackoverflow,
it's my first question here... (well, actually the first one nobody had already answered!).
I'm trying to use the lattice xyplot function to plot a big df (2,362,422 rows) that should be split by a variable into several subplots (each with about 52 panels).
This is a highly simplified reproduction of the df and of the code I'm using:
library(lattice)
library(plyr)
set.seed(1)
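# data.frame() keeps x and z numeric (cbind() would turn every column into character)
df <- data.frame(x = rnorm(30), y = 1:2, z = rnorm(30), q = c("a","b","c","d","e"), stringsAsFactors = TRUE)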
grpro <- function () {xyplot (x ~ z| q, data=df)}
grpro()
When I try to call the grpro function with d_ply to plot all the subplots based on the y variable, with the following code
d_ply(df, .(y), grpro)
I get the following error
Error in .fun(.data[[i]], ...) : unused argument (.data[[i]])
As I understand it, d_ply splits the df into several data frames, in this case two, based on the values "1" and "2" of y.
I assume my code works on each of those pieces, and that every argument used in grpro still makes sense after the df is split by y.
So, where am I wrong?
Thanks a lot for your help,
MZ
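The error message itself points at the cause: d_ply() calls .fun(piece, ...) once per subset, but grpro() takes no arguments, hence "unused argument (.data[[i]])". A minimal sketch of a fix, assuming the simplified df from the question; grpro is redefined with a hypothetical `piece` argument, and the plot title is an illustrative addition.

```r
library(lattice)
library(plyr)

set.seed(1)
df <- data.frame(x = rnorm(30), y = 1:2, z = rnorm(30),
                 q = c("a", "b", "c", "d", "e"), stringsAsFactors = TRUE)

# d_ply() passes each subset of df as the first argument, so the function
# must accept it and plot *that subset* rather than the global df
grpro <- function(piece) {
  p <- xyplot(x ~ z | q, data = piece, main = paste("y =", piece$y[1]))
  # lattice objects are not auto-printed inside a function, so print explicitly
  print(p)
  invisible(p)
}

d_ply(df, .(y), grpro)
```

Alternatively, grpro can simply return the xyplot object and d_ply can print each result via its .print = TRUE argument.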

rCharts Polychart: Adding horizontal or vertical lines to a plot

I'm having some trouble understanding how to customize graphs using the rPlot function in the rCharts package. Say I have the following code:
#Install rCharts if you do not already have it
#This will require devtools, which can be downloaded from CRAN
require(devtools)
install_github('ramnathv/rCharts')
library(rCharts)
#simulate some random normal data
x <- rnorm(100, 50, 5)
y <- rnorm(100, 30, 2)
#store in a data frame for easy retrieval
demoData <- data.frame(x,y)
#generate the rPlot Object
demoChart <- rPlot(y~x, data = demoData, type = 'point')
#return the object // view the plot
demoChart
This will generate a plot, which is nice, but how would I go about adding horizontal lines across it? For example, I want a green line at the average y-value and red lines at +/- 3 standard deviations from the average. If anybody knows of relevant documentation and could point me to it, that would be great. The only documentation I could find was for polychart.js (https://github.com/Polychart/polychart2), and I'm not quite sure how to apply it to the rPlot function in rCharts.
I have done some digging and I feel like the answer is going to have something to do with adding/modifying the layers parameter within the rPlot object.
#look at the slots in this object
demoChart$params$layers
#doing this will return the following output (which will be different for
#everybody because I didn't set a seed). Also, I removed rows 6:100 of the data.
demoChart$params$layers
[[1]]
[[1]]$x
[1] "x"
[[1]]$y
[1] "y"
[[1]]$data
x y
1 49.66518 32.75435
2 42.59585 30.54304
3 53.40338 31.71185
4 58.01907 28.98096
5 55.67123 29.15870
[[1]]$facet
NULL
[[1]]$type
[1] "point"
If I figure this out I will post a solution, but I would appreciate any help/advice in the meantime! I don't have much experience working with objects in R. I suspect this is supposed to resemble ggplot2, which I also don't have much experience with.
Thanks for any advice!
You can overlay additional graphs onto your rCharts plot using layers. Add the values for any additional layers as columns of your original data.frame; copy_layer lets the extra layers use the values from that data.frame.
# Regression Plots using rCharts
require(rCharts)
mtcars$avg <- mean(mtcars$mpg)
mtcars$sdplus <- mtcars$avg + sd(mtcars$mpg)
mtcars$sdneg <- mtcars$avg - sd(mtcars$mpg)
p1 <- rPlot(mpg~wt, data=mtcars, type='point')
p1$layer(y='avg', copy_layer=TRUE, type='line', color=list(const='red'))
p1$layer(y='sdplus', copy_layer=TRUE, type='line', color=list(const='green'))
p1$layer(y='sdneg', copy_layer=TRUE, type='line', color=list(const='green'))
p1
Here are a couple of examples: one from the main rCharts website and the other showing how to overlay a regression line.
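Applied to the question's own request (a green line at the mean and red lines at +/- 3 standard deviations), the same layer approach might look like the sketch below. It assumes rCharts is installed from GitHub and reuses only the layer(), copy_layer and color = list(const = ...) usage shown in the answer above; the seed and the column names avg, upper and lower are illustrative.

```r
# rCharts is not on CRAN; install it with devtools::install_github("ramnathv/rCharts")
library(rCharts)

set.seed(42)  # assumption: fixed seed so the example is reproducible
demoData <- data.frame(x = rnorm(100, 50, 5), y = rnorm(100, 30, 2))

# Constant columns for the reference lines: mean and +/- 3 standard deviations
demoData$avg   <- mean(demoData$y)
demoData$upper <- demoData$avg + 3 * sd(demoData$y)
demoData$lower <- demoData$avg - 3 * sd(demoData$y)

demoChart <- rPlot(y ~ x, data = demoData, type = 'point')
# copy_layer = TRUE lets each extra layer read the new columns of demoData
demoChart$layer(y = 'avg',   copy_layer = TRUE, type = 'line', color = list(const = 'green'))
demoChart$layer(y = 'upper', copy_layer = TRUE, type = 'line', color = list(const = 'red'))
demoChart$layer(y = 'lower', copy_layer = TRUE, type = 'line', color = list(const = 'red'))
demoChart
```

Because each reference column holds a single repeated value, the "line" layers come out horizontal regardless of how the x values are ordered.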

R : Bad graphic of ordered boxplot according to median

Here is what I am trying to do: I have a data.frame (data) of 160 rows with 2 variables (fact, with 8 groups, and response), and I want a boxplot of response ~ fact, ordered by increasing median.
Code:
data <- read.table("box.txt",header=T)
attach(data)
index <- order(tapply(response,fact,median))
ordered <- factor(rep(index,rep(20,8)))
boxplot(response~ordered,notch=T,names=as.character(index),xlab="treatments",ylab="response")
But in the graphic the boxes are plotted badly (not in the right order, and with "false" min, max, etc.).
I'm using RStudio with R 3.0.2 on Windows 7.
Any clue what that means?
One reproducible and seemingly correct answer would be:
set.seed(1)
data <- data.frame(response=10*rnorm(160), fact=factor(rep(1:8), labels=letters[1:8]))
data$fact <- reorder(data$fact, data$response, median)
boxplot(response~fact, data=data, notch=TRUE, xlab="treatments", ylab="response")
Names on the ticks of the x axis are correct, without further ado.
No idea why it looks 'bad', but the order is wrong because you use order instead of rank to find the index. For the other issues you probably have to make a reproducible example.
The reproducible example is as follows, with two boxplots to compare. In my case the plot (possibly) looks bad because of the devil's ears. Regarding the OP's question, I interpret "bad" as meaning that using order() instead of rank() caused other mishaps as well (although I wouldn't know why).
data <- data.frame(response=rnorm(160), fact=factor(rep(1:8), labels=letters[1:8]))
boxplot(response~fact, data=data, notch=TRUE, xlab="treatments", ylab="response")
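# Look up the median rank of each row's own group (indexing by the factor's integer codes)
med_rank <- rank(tapply(data$response, data$fact, median))
data$ordered <- med_rank[as.integer(data$fact)]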
boxplot(response~ordered, data=data, notch=TRUE, xlab="treatments", ylab="response")
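To make the order() versus rank() point concrete, here is a sketch on simulated data laid out like the question's (rows sorted in blocks of 20 per group). The seed and the names meds and ordered are illustrative assumptions. order() lists which group comes first, second, and so on; rank() says where each group lands, which is what every row needs for its label.

```r
set.seed(1)
# Rows sorted in blocks of 20 per group, the layout that the question's
# rep(index, rep(20, 8)) implicitly assumes
data <- data.frame(response = rnorm(160),
                   fact = factor(rep(letters[1:8], each = 20)))
meds <- tapply(data$response, data$fact, median)

order(meds)  # which group is 1st, 2nd, ... when sorted by median
rank(meds)   # where each group lands in that sorted sequence

# Label every row with the rank of its own group's median
data$ordered <- factor(rank(meds)[as.integer(data$fact)])

# Boxes are drawn in rank order; name them after the groups sorted by median
boxplot(response ~ ordered, data = data, names = names(sort(meds)),
        xlab = "treatments", ylab = "response")
```

For everyday use, reorder(data$fact, data$response, median), as in the first answer, does the same job with less bookkeeping.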

How to name the "superscriptions" of dimensions in r?

Perhaps a dumb question, but I cannot find an answer.
If I make a mosaic plot with the vcd package like this:
library(vcd)
test<-matrix(c(65,31,495,651), ncol=2,byrow=T)
colnames(test)<-c("2010", "2011")
rownames(test)<-c("yes", "no")
mosaic(test, shade=TRUE, legend=TRUE)
it works like a charm, except that the headings ("superscriptions") over the years and the outputs (yes/no) are shown as "A" and "B".
I would like to name these "Years" and "Outputs" but I cannot find a parameter for this.
How could I do this? Thanks in advance.
You can name the dimnames this way (the list gives rows first, then columns):
dimnames(test) <- list(Outputs=rownames(test), Years=colnames(test))
mosaic(test, shade=TRUE, legend=TRUE)
In fact, mosaic is better suited to contingency tables, where the labels are set by the table() function:
color <- sample(c("red","blue"),10,replace=TRUE)
color2 <- sample(c("yellow","green"),10,replace=TRUE)
tab <- table(color,color2)
mosaic(tab, shade=T)
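A minimal sketch of both routes for the question's own numbers: building the table with named dimnames up front, or naming the dimnames of an existing matrix with names(dimnames(m)) <- .... The object name m is illustrative.

```r
library(vcd)

# Named dimnames become the headings that mosaic() otherwise shows as "A" and "B"
test <- matrix(c(65, 31, 495, 651), ncol = 2, byrow = TRUE,
               dimnames = list(Outputs = c("yes", "no"),
                               Years   = c("2010", "2011")))
test <- as.table(test)

# For an existing matrix, naming the dimnames list has the same effect
m <- matrix(c(65, 31, 495, 651), ncol = 2, byrow = TRUE)
rownames(m) <- c("yes", "no")
colnames(m) <- c("2010", "2011")
names(dimnames(m)) <- c("Outputs", "Years")

mosaic(test, shade = TRUE, legend = TRUE)
```

Either way, the row dimension must come first in the dimnames list; swapping the two entries would silently attach the year labels to the yes/no rows.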
