Plotting Two Factors on the same Graph - r

Say I have two factors with the same levels, and I want to graph them on the same plot.
s1 <- c(rep("male",20), rep("female", 30))
s2 <- c(rep("male",10), rep("female", 40))
s1 <- factor(s1, levels=c("male", "female"))
s2 <- factor(s2, levels=c("male", "female"))
I would have thought that the table function would give me counts suitable for graphing, but instead it produces this:
table(s1, s2)
        s2
s1       male female
  male     10     10
  female    0     30
So really two questions: what is the table function doing to get this result, and what other function can I use to create a graph with 2 series using factors with the same levels?
Also, in case it is relevant, I'm using barplot2 in the gplots package to graph it.

You can achieve slightly more detailed results with the lattice package:
s1 <- factor(c(rep("male",20), rep("female", 30)))
s2 <- factor(c(rep("male",10), rep("female", 40)))
D <- data.frame(s1, s2)
library(lattice)
histogram(~s1+s2, D, col = c("pink", "lightblue"))
Or if you want males/females side by side for easier comparison:
t1 <- table(s1)
t2 <- table(s2)
barchart(cbind(t1, t2), stack = FALSE, horizontal = FALSE)

From ?table:
‘table’ uses the cross-classifying factors to build a contingency
table of the counts at each combination of factor levels.
When you do table(s1,s2) what happens is that the function considers s1 and s2 as paired results. Effectively it tells you that if you were to take cbind(s1,s2) then there would be 10 rows of male-male, 10 of male-female and so on.
To understand this consider a very trivial example:
a <- c("M","M","F","F")
b <- c("F","F","M","M")
table(a,b)
   b
a   F M
  F 0 2
  M 2 0
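The same pairing can be checked on the factors from the question. This is only a minimal sketch; the names paired, by_hand and counts are made up for illustration. It shows that table(s1, s2) counts (s1[i], s2[i]) pairs, whereas two separate one-way tables give the per-factor counts that a two-series bar chart needs.

```r
# Same factors as in the question
s1 <- factor(c(rep("male", 20), rep("female", 30)), levels = c("male", "female"))
s2 <- factor(c(rep("male", 10), rep("female", 40)), levels = c("male", "female"))

# Two-way table: how often each (s1[i], s2[i]) pair occurs
paired <- table(s1, s2)

# Counting the pairs by hand gives the same numbers
by_hand <- table(paste(s1, s2, sep = "-"))

# One-way counts, one column per factor (what a two-series bar chart needs)
counts <- cbind(s1 = table(s1), s2 = table(s2))

print(paired)
print(by_hand)
print(counts)
```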
What you should do is:
t1 <- table(s1)
t2 <- table(s2)
barplot(cbind(t1,t2), beside=TRUE, col=c("lightblue", "salmon"))

Two options producing slightly different forms of plots are
plot(s1, s2)
and
plot(table(s1,s2))
The former is a spineplot, a special case of the mosaic plot; the latter is a mosaic plot, which is what the plot method for table objects produces. See ?spineplot and ?mosaicplot for more details; you can also call these functions directly, rather than the generic plot(), if you wish.
Also take a look at the mosaic() function in the vcd package on CRAN by Meyer et al.
table() is producing the contingency table for the two factors.
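As a rough sketch of calling the two functions directly, using the factors from the question (the object name tab and the plot titles are assumptions added for illustration):

```r
s1 <- factor(c(rep("male", 20), rep("female", 30)), levels = c("male", "female"))
s2 <- factor(c(rep("male", 10), rep("female", 40)), levels = c("male", "female"))
tab <- table(s1, s2)

# Draw both plots side by side, then restore the graphics settings
op <- par(mfrow = c(1, 2))
spineplot(s1, s2, main = "spineplot(s1, s2)")
mosaicplot(tab, main = "mosaicplot(table(s1, s2))", color = TRUE)
par(op)
```

Both plots show the paired counts, so they answer "how do s1 and s2 co-occur", not "how do the two series compare".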

Hmm, I don't think creating a contingency table is what Cameron was looking for. If I understood him correctly, he wanted to create a data frame with two variables in it, where s1 and s2 seem to be vectors of the same size (length(s1)==length(s2)).
In this case, he would simply need to create a "table" (I think he meant data.frame) using:
df <- data.frame(s1=s1, s2=s2)
And then plot the 2 series in the same plot.
So as for the second question of plotting these things, I'd use matplot. For example:
matplot(1:10, data.frame(a=rnorm(10), b=rnorm(10)), type="l", lty=1, lwd=1, col=c("blue","red"))
Given that he has his data of 2 vectors organized in a single data.frame named "df", he can just do something like:
matplot(df, type="l", lty=1, lwd=1, col=c("blue","red"))
Hope this helps.

Related

Combining dotplot R

I'm trying to combine two plots into the same plot in R.
My code looks like this:
#----------------------------------------------------------------------------------------#
# RING data: Mikkel
#----------------------------------------------------------------------------------------#
# Set working directory
setwd("/Users/mikkelastrup/Dropbox/Master/RING R")
#### Read data & Converting factors ####
dat <- read.table("R SUM kopi.txt", header=TRUE)
str(dat)
dat$Vial <- as.factor(dat$Vial)
dat$Line <- as.factor(dat$Line)
dat$rep <- as.factor(dat$rep)
dat$fly <- as.factor(dat$fly)
str(dat)
mtdata <- droplevels(dat[dat$Line=="20",])
mt1data <- droplevels(mtdata[mtdata$rep=="1",])
tdata <- melt(mt1data, id=c("rep","Conc","Sex","Line","Vial", "fly"))
tdata$variable <- as.factor(tdata$variable)
tfdata <- droplevels(tdata[tdata$Sex=="f",])
tmdata <- droplevels(tdata[tdata$Sex=="m",])
####Plotting####
d1 <- dotplot(tfdata$value~tdata$variable|tdata$Conc,
main="Y Position over time Line 20 Female",
xlab="Time", ylab="mm above buttom")
d2 <- dotplot(tmdata$value~tdata$variable|tdata$Conc,
main="Y Position over time Line 20 Male",
xlab="Time", ylab="mm above buttom")
grid.arrange(d1,d2,ncol=2)
And that looks like this:
I'm trying to combine them into one plot, with different colors for male and female. I have tried putting both into one dotplot call, separated by a comma or wrapped in parentheses, but that doesn't work, and when I don't split the data and use tdata instead of tfdata and tmdata, I get all the dots in the same color. I'm open to suggestions, including another package or another way of plotting the data, as long as the result still looks somewhat like this, since I'm new to R.
All you need to do is to use the groups argument.
dotplot(value~variable|Conc, groups=Sex, data=tdata,
main="Y Position over time Line 20 All",
xlab="Time", ylab="mm above buttom")
Also, don't use the $ notation in these functions; notice that you're using value from tfdata but variable and Conc from tdata. This is a problem because there are twice as many rows in tdata! Instead, use the data argument to specify which data frame to get the variables from.
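A self-contained sketch of the same idea on invented data (the data frame toy and its values are hypothetical stand-ins for the melted data in the question):

```r
library(lattice)

# Hypothetical stand-in for the melted data: 3 time points x 2 concentrations
# x 2 sexes x 4 flies
set.seed(1)
toy <- expand.grid(variable = factor(c("t1", "t2", "t3")),
                   Conc = factor(c("low", "high")),
                   Sex = factor(c("f", "m")),
                   fly = 1:4)
toy$value <- runif(nrow(toy), 0, 50)

# One panel per concentration; colour (and legend) by sex via groups
p <- dotplot(value ~ variable | Conc, groups = Sex, data = toy,
             auto.key = list(columns = 2),
             xlab = "Time", ylab = "mm above bottom")
print(p)
```

Because every variable comes from the single data frame passed as data, the formula terms and the grouping variable are guaranteed to have the same length.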

R: plot() uses lines in a scatterplot after as.data.frame()

I want to create a simple scatterplot using a table with 2 variables.
The Table looks like this:
> freqs
  Var1 Freq
1    1  200
2    2   50
3    3   20
I got it using freqs <- as.data.frame(table(data$V2)) to compute the frequency of numbers in another table.
What I do right now is:
plot(freqs, log="xy", main="Frequency of frequencies",
xlab="frequencies", ylab="frequency of frequencies")
The problem is that I get a plot with lines, not dots, and I don't know why.
For another list plot() behaved differently and used dots.
It looks like this:
I know that plot depends on the datatype that it gets.
So is the problem in the way I generate freqs?
Edit:
Here is the data as requested: link
Steps were:
data <- read.csv(file="out-kant.txt",head=FALSE,sep="\t")
freqs <- as.data.frame(table(data$V2))
plot(freqs,log="xy",main="Frequency of frequencies", xlab="frequencies", ylab="frequency of frequencies")
It seems that one of your variables is not numeric. You get a scatterplot when both x and y are numeric (integers, for example). When you run this code you will get a scatterplot, because read.table automatically reads both variables as integers:
freqs <- read.table(header=TRUE, text='Var1 freq
1 200
2 50
3 20')
plot(freqs, log="xy", main="Frequency of frequencies", xlab="frequencies", ylab="frequency of frequencies")
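Check what class your variables are with (class() rather than typeof(), because typeof() reports "integer" for a factor):
class(freqs$freq)
class(freqs$Var1)
Then, if one of them is not numeric, fix it. Go through as.character() so that a factor is converted to its labels rather than to its internal level codes:
freqs$freq <- as.integer(as.character(freqs$freq))
freqs$Var1 <- as.integer(as.character(freqs$Var1))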
EDIT: So I managed to reproduce your issue when I ran:
freqs$Var1 <- as.factor(freqs$Var1)
plot(freqs, log="xy", main="Frequency of frequencies", xlab="frequencies", ylab="frequency of frequencies")
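Your Var1 variable is probably a factor, since as.data.frame(table(...)) stores the tabulated values as a factor. Convert it via as.character() so that you get the original values rather than the internal level codes:
freqs$Var1 <- as.numeric(as.character(freqs$Var1))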
EDIT2: Used the above code to make freqs$Var1 numeric on the data provided on the main question's edit, which fixed the issue.
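To make the factor issue concrete, here is a small sketch on made-up values (dat, codes and values are hypothetical names; the real data from the question is not reproduced here). It shows why the conversion should go through as.character().

```r
# Made-up values standing in for data$V2 in the question
dat <- data.frame(V2 = c(5, 5, 5, 10, 10, 200))
freqs <- as.data.frame(table(dat$V2))

class(freqs$Var1)   # "factor"

codes  <- as.numeric(freqs$Var1)                 # 1 2 3: internal level codes
values <- as.numeric(as.character(freqs$Var1))   # 5 10 200: the tabulated values

# With a numeric Var1, plot() draws points instead of box-plot style lines
freqs$Var1 <- values
plot(freqs, log = "xy", xlab = "frequencies", ylab = "frequency of frequencies")
```

If the tabulated values happen to be 1, 2, 3, ... without gaps, the level codes and the values coincide, which is why the shortcut as.numeric(freqs$Var1) can appear to work.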

How can I make 3d plot with stacked 2d plot?

I want to plot as below. I tried to search several packages and plot functions but I couldn't find a solution.
My data has four columns.
ID F R M
1 2 3 4
2 4 6 7
...
I want to see the relationship between M and R with respect to each F value (1, 2, 3, ...). So, I'd like F along the x-axis, R along the y-axis, and M as the z-axis as in the below graph.
Thanks.
You can do this kind of thing with lattice cloud plots, using panel.3dpolygon from latticeExtra.
library(latticeExtra)
# generating random data
d <- data.frame(x=rep(1:40, 7), y=rep(1:7, each=40),
                z=c(sapply(1:7, function(x) runif(40, 10*x, 10*x+20))))
# define the panel function
f <- function(x, y, z, groups, subscripts, ...) {
  colorz <- c('#8dd3c7', '#ffffb3', '#bebada', '#fb8072', '#80b1d3',
              '#fdb462', '#b3de69')
  sapply(sort(unique(groups), decreasing=TRUE), function(i) {
    zz <- z[subscripts][groups==i]
    yy <- y[subscripts][groups==i]
    xx <- x[subscripts][groups==i]
    panel.3dpolygon(c(xx, rev(xx)), c(yy, yy),
                    c(zz, rep(-0.5, length(zz))),
                    col=colorz[i], ...)
  })
}
# plot
cloud(z~x+y, d, groups=y, panel.3d.cloud=f, scales=list(arrows=FALSE))
I'm sure I don't need to loop over groups in the panel function, but I always forget the correct incantation for subscripts and groups to work as intended.
As others have mentioned in comments, this type of plot might look snazzy, but can obscure data.

Plotting a large number of time series with xyplot

Here is a minimal example of the type of data I'm struggling to plot:
These curves are drawn from two processes.
library("lattice")
x0<-matrix(NA,350,76)
for(i in 1:150) x0[i,]<-arima.sim(list(order=c(1,0,0),ar=0.9),n=76)
for(i in 151:350) x0[i,]<-arima.sim(list(order=c(1,0,0),ar=-0.9),n=76)
I'd like to plot them as line plots in a lattice made of two boxes. The box located above
would contain the first 150 curves (in orange) and the box below should display
the next 200 curves (which should be in blue). I don't need a
label or legend. I've tried to use the example shown on the man-page:
aa<-t(x0)
colnames(aa)<-c(paste0("v0_",1:150),paste0("v1_",1:200))
aa<-as.ts(aa)
xyplot(aa,screens=list(v0_="0","1"),col=list(v0_="orange",v1_="blue"),auto.key=FALSE)
but somehow it doesn't work.
This works without additional factors (although agstudy's solution is less of a hack than this one):
# This is equivalent to your for-loops, use whatever you prefer
x0 <- do.call(rbind, lapply(1:350, function(i) {
  arima.sim(list(order=c(1,0,0), ar=ifelse(i <= 150, 0.9, -0.9)), n=76)
}))
plotStuff <- function(indices, ...) {
  plot.new()
  plot.window(xlim=c(1, ncol(x0)), ylim=range(x0[indices,]))
  box()
  for (i in indices)
    lines(x0[i,], ...)
}
par(mfrow=c(2,1), mar=rep(1,4)) # two rows, reduced margin
plotStuff(1:150, col="orange")
plotStuff(151:350, col="blue")
You should put your data in the long format like this:
      Var1   Var2     value group
1        1   v0_1 2.0696016    v0
2        2   v0_1 1.3954414    v0
...
26599   75 v1_200 0.3488131    v1
26600   76 v1_200 0.2957114    v1
For example, using reshape2:
library(reshape2)
aa.m <- melt(aa)
aa.m$group <- gsub('^(v[0-9])_(.*)','\\1',aa.m$Var2)
xyplot(value~Var1|group,data=aa.m,type='l',groups=group)
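A variation on the long-format approach, sketched with base R reshaping and a reduced number of curves (15 + 20 instead of 150 + 200) so it runs quickly. The names sim, wide, long and series_col are invented for this example. It groups by individual series so that each curve is drawn as its own line, and picks one colour per series according to the process it came from.

```r
library(lattice)

set.seed(42)
n_t <- 76

# Simulate n AR(1) curves of length n_t, one per column
sim <- function(n, ar) {
  replicate(n, as.numeric(arima.sim(list(order = c(1, 0, 0), ar = ar), n = n_t)))
}
wide <- cbind(sim(15, 0.9), sim(20, -0.9))   # 76 x 35 matrix
colnames(wide) <- c(paste0("v0_", 1:15), paste0("v1_", 1:20))

# Long format: one row per (time, series) pair
long <- data.frame(
  time   = rep(seq_len(n_t), times = ncol(wide)),
  series = factor(rep(colnames(wide), each = n_t), levels = colnames(wide)),
  value  = as.vector(wide)
)
long$group <- factor(sub("_.*$", "", long$series))

# One colour per series level, chosen by the process the series belongs to
series_col <- ifelse(grepl("^v0_", levels(long$series)), "orange", "blue")

# Two stacked panels (v0 on top), one line per series
p <- xyplot(value ~ time | group, data = long, groups = series,
            type = "l", col = series_col, layout = c(1, 2), as.table = TRUE)
print(p)
```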

Densityplots using colwise - different colors for each line?

I need a plot of several density lines, each in a different color. This is some example code (much smaller than my real code), using the built-in data.frame USArrests. I hope it is OK to use it?
colors <- heat.colors(3)
plot(density(USArrests[,2], bw=1, kernel="epanechnikov", na.rm=TRUE),col=colors[1])
lines1E <- function(x)lines(density(x,bw=1,kernel="epanechnikov",na.rm=TRUE))
lines1EUSA <- colwise(lines1E)(USArrests[,3:4])
Currently the code with colwise() produces just one color. How can I get each line in a different color? Or is there a better way to plot several density lines with different colors?
I don't quite follow your example, so I've created my own example data set. First, create a matrix with three columns:
m = matrix(rnorm(60), ncol=3)
Then plot the density of the first column:
plot(density(m[,1]), col=2)
Using your lines1E function as a template:
lines1E = function(x) {lines(density(x))}
We can add multiple curves to the plot:
library(plyr)  # colwise() comes from the plyr package
colwise(lines1E)(as.data.frame(m[ ,2:3]))
Personally, I would just use:
##Added in NA for illustration
m = matrix(rnorm(60), ncol=3)
m[1,] = NA
plot(density(m[,1], na.rm=TRUE))
sapply(2:ncol(m), function(i) lines(density(m[,i], na.rm=TRUE), col=i))
This draws one density curve per column, with columns 2 and 3 each in their own color.
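Applied to the USArrests columns from the question, a base R sketch along the same lines might look as follows (the names vars, dens, xlim and ylim are introduced here for illustration, and the choice to precompute the densities is an addition so that the axes cover every curve):

```r
cols <- heat.colors(3)
vars <- names(USArrests)[2:4]   # the three columns used in the question

# Estimate all densities first so the axis limits can cover every curve
dens <- lapply(USArrests[vars], density,
               bw = 1, kernel = "epanechnikov", na.rm = TRUE)
xlim <- range(sapply(dens, function(d) range(d$x)))
ylim <- range(sapply(dens, function(d) range(d$y)))

# Empty plot with the right limits, then one coloured line per column
plot(NULL, xlim = xlim, ylim = ylim, xlab = "value", ylab = "density",
     main = "Densities of three USArrests columns")
for (i in seq_along(dens)) lines(dens[[i]], col = cols[i])
legend("topright", legend = vars, col = cols, lty = 1)
```

Passing the colour by index inside the loop is what colwise() cannot do on its own, since it calls the function on each column without telling it which column it is.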
