How to plot three categorical values? - r

I have data like the following (three categorical values):
data <- data.frame(Comp = (rep(c('Oral','Text'), each = 8)),
Cat = rep(c('Declative','Non-declative'), 4),
Type = rep(c('Free','Used'), each = 4))
I want to draw an interaction.plot or barplot of these three categorical variables in R. Could you give me any tips?

Your supplied data don't produce a very informative display, since all the cells are the same size, but you can use base R to make a mosaic plot.
data <- data.frame(Comp = (rep(c('Oral','Text'), each = 8)),
Cat = rep(c('Declative','Non-declative'), 4),
Type = rep(c('Free','Used'), each = 4))
mosaicplot(table( data$Comp,data$Cat,data$Type))
Here's a variant on your data that shows it a little better.
data <- data.frame(Comp = (rep(c('Oral','Text'), each = 8)),
Cat = rep(c('Declative','Non-declative'), 4),
Type = c(rep(c('Free','Used'), each = 3), c('Used', 'Used')))
mosaicplot(table( data$Comp,data$Cat, data$Type))
Of course, you can turn to specialized packages for other variations. vcd is one, and if you search you will find others.
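Since the question also asks about a barplot, here is a minimal base R sketch of that alternative. It is not part of the answer above; it reuses the variant data, and the names tab and Group are made up for illustration. The counts are cross-tabulated with table() and drawn as grouped bars, one cluster per Comp/Cat combination.

```r
# Variant data from the answer above (Type is recycled to 16 rows)
data <- data.frame(Comp = rep(c('Oral', 'Text'), each = 8),
                   Cat = rep(c('Declative', 'Non-declative'), 4),
                   Type = c(rep(c('Free', 'Used'), each = 3), c('Used', 'Used')))

# Rows: Type; columns: every Comp / Cat combination
tab <- table(Type = data$Type,
             Group = interaction(data$Comp, data$Cat, sep = " / "))

# Grouped bars, one cluster per Comp / Cat combination, legend taken from the row names
barplot(tab, beside = TRUE, legend.text = TRUE, ylab = "Count", las = 2)
```

Whether this reads better than the mosaic plot depends on the data; with only a few observations per cell the bar heights are easier to compare than tile areas.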

Related

How do I display box-plots of different data sets above each other in R?

I'm new to R and wondering whether it is possible to display these two box plots either side by side or above each other to allow for comparison, rather than producing two separate box plots.
PBe <- PB$`Enterococci (cfu/100ml)`
BRe <- BR$`Enterococci (cfu/100ml)`
boxplot(BRe, horizontal = TRUE, col = "3", outline=FALSE)
boxplot(PBe, horizontal = TRUE, col = "4", outline=FALSE)
You could use the boxplot function directly:
boxplot(list(BRe = BRe, PBe = PBe), col = c(3, 4))
You can add any other parameters you wish.
We obviously don't have your data, so let's make a minimal reproducible example.
First we create two data frames, one called PB and one called BR. Each has a numeric column called Enterococci (cfu/100ml) containing random numbers between 100 and 1000:
set.seed(1)
PB <- data.frame(a = sample(100:1000, 100, TRUE))
BR <- data.frame(a = sample(100:1000, 50, TRUE))
names(PB) <- "Enterococci (cfu/100ml)"
names(BR) <- "Enterococci (cfu/100ml)"
Now, if we extract these columns as per your code, we can concatenate them together using c:
PBe <- PB$`Enterococci (cfu/100ml)`
BRe <- BR$`Enterococci (cfu/100ml)`
value <- c(PBe, BRe)
Now, the trick is to create another vector that labels which data frame these numbers originally came from as a factor variable. We can do that with:
dataset <- factor(c(rep("PB", nrow(PB)), rep("BR", nrow(BR))))
And now we can just call plot on these two vectors. This will automatically give us a side-by-side boxplot:
plot(dataset, value, xlab = "Data set", ylab = "Enterococci (cfu/100ml)")
If you would prefer it to be horizontal, we can do:
boxplot(value ~ dataset, horizontal = TRUE,
ylab = "Data set",
xlab = "Enterococci (cfu/100ml)")
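The question also mentions stacking the plots above each other. A minimal sketch of that option, assuming simulated vectors in place of the real data: par(mfrow = c(2, 1)) splits the device into two rows, and passing the same ylim to both calls keeps the value axes comparable. The name rng is made up for illustration.

```r
set.seed(1)
# Simulated stand-ins for the two extracted columns
PBe <- sample(100:1000, 100, TRUE)
BRe <- sample(100:1000, 50, TRUE)

# Common range so both panels share the same value axis
rng <- range(PBe, BRe)

old <- par(mfrow = c(2, 1))   # two rows, one column
boxplot(BRe, horizontal = TRUE, col = 3, outline = FALSE, ylim = rng, main = "BR")
boxplot(PBe, horizontal = TRUE, col = 4, outline = FALSE, ylim = rng, main = "PB")
par(old)                      # restore the previous layout
```

For boxplot(), ylim refers to the data axis even when horizontal = TRUE, which is why it is used here rather than xlim.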

Saving multiple pairwise correlation plots in a pdf via for loop

I am trying to create multiple pairwise correlation plots (I use pairs.panels from the psych package) by looping over a list of datasets and saving each plot to its own PDF with a changing file name. A single PDF with titles would also be fine (I can't get the main = "xyz" argument of pairs.panels to work).
Another format (jpeg, png, whatever) would also be fine.
My dataset (called "df" here) has 8 variables, one of which is a grouping variable (hence the "group" variable). I want a correlation matrix of variables 2 to 7, with one plot per group (I know these look terrible here; my data simulation skills are poor). The object df2 is a list of datasets (only 3 here); I just divided df by group.
I tried using the pdf() function inside and outside the loop, using two loops (one for creating the graphs and one for the pdf device), and making a list of graphs and using that.
I can get the plot to run, and I can save it if I use the literal numbers 1, 2, 3 etc. in the brackets instead of i, which makes me think the problem is in the loop.
If you can explain what I need to do, that would be great!
# shitty data simulation
set.seed(16)
df <- data.frame(group = rep(letters[1:3], each = 3),
x = rnorm(n = 9, mean = 0, sd = 1),
y = rnorm(n = 9, mean = 0, sd = 1),
z = rnorm(n = 9, mean = 0, sd = 1),
a = rnorm(n = 9, mean = 0, sd = 1),
b = rnorm(n = 9, mean = 0, sd = 1),
d = rnorm(n = 9, mean = 0, sd = 1))
df2 <- as.list(split(df, df$group))
for(i in 1:3){ # setting the loop for the pdf device
pdf(paste("myplot", i, ".pdf"), onefile = F) # opening the device, plots are calles plot1, plot2 etc.
for(i in 1:length(df2)){
pairs.panels(df2[[i]][,c(2:7)], stars = TRUE, pch = ".")
mtext(side = 3, line = 3, df2[[i]]$group) # the only option I manage to get a header in there
}
dev.off()
}
Either I cannot open the file, or it takes hours to finish and then I only get 2 plots. I usually don't get error messages about the loop (only about the correlations, which I already know about; some of the variables have missing data, but the plot still prints).
Sometimes I get a big file with one plot repeated multiple times and no title, and sometimes the file is just corrupted.
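No answer is recorded here, so the following is only a sketch of one likely fix, not a confirmed solution. In the code above both loops reuse i, the inner loop draws every group into each file, and paste() inserts spaces into the file name. A single loop that opens one device per group, draws once, and closes the device avoids all three. To stay self-contained the sketch uses base pairs() as a stand-in for psych::pairs.panels, uses 10 rows per group instead of 3, and writes to tempdir(); those choices and the names out_dir and files are assumptions.

```r
set.seed(16)
# Simulated data: one grouping column plus six numeric columns
df <- data.frame(group = rep(letters[1:3], each = 10),
                 x = rnorm(30), y = rnorm(30), z = rnorm(30),
                 a = rnorm(30), b = rnorm(30), d = rnorm(30))
df2 <- split(df, df$group)   # split() already returns a named list

out_dir <- tempdir()         # assumption: write to a temporary directory
files <- character(0)

for (nm in names(df2)) {
  f <- file.path(out_dir, paste0("myplot_", nm, ".pdf"))  # paste0 adds no spaces
  pdf(f)                                                  # open one device per group
  # Stand-in for pairs.panels(); the group name goes into the title
  pairs(df2[[nm]][, 2:7], pch = ".", main = paste("Group", nm))
  dev.off()                                               # close before the next file
  files <- c(files, f)
}
```

To get one multi-page PDF instead, the pdf() and dev.off() calls would move outside the loop, so that each plot becomes a new page of the same file.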

Simulating a discrete distribution on a different scale in R

I'm new to R and have this question. As mentioned in the title, I have a distribution of die rolls reported by students. In this task, they are given a six-sided die (faces 1-6) and asked to throw it in private. The data are plotted as in the picture.
However, I wonder whether I can use this data to simulate the situation where they are given a ten-sided die instead (faces 1-10). How can I achieve this in R?
OK, a second attempt, in case you want to use your existing six-sided die data. I use the sn package to fit a skew-normal distribution to your existing data, then scale it to represent a ten-sided die and make it discrete using round.
First I will simulate your data
set.seed(9999)
n=112
a = rnorm( 42, 3, 1 )
b = rnorm( 70, 5, 0.5 )
dat = round(c( a, b))
dat[!(dat %in% 1:6)] = NA
dat=dat[complete.cases(dat)]
hist(dat,breaks = seq(0.5, 6.5,1), col = rgb(0,0,1,0.25))
Just set dat as your existing data if you want.
Now parameterise the distribution using the sn package. (You can try fitting other distributions if you prefer.)
require(sn)
cp.est = sn.mple(y=dat,opt.method = "nlminb")$cp
dp.est = cp2dp(cp.est,family="SN")
##example to sample from the distribution and compare to existing
sim = rsn(n, xi=dp.est[1], omega=dp.est[2], alpha=dp.est[3])
sim = round(sim)
sim[!(sim %in% 1:6)] = NA
hist(sim,breaks = seq(0.5, 6.5,1), col = rgb(1,0,0,0.25), add=T)
Now scale the distribution to represent a ten-sided die.
sim = rsn(n, xi=dp.est[1], omega=dp.est[2], alpha=dp.est[3])/6*10
sim <- round(sim)
sim[!(sim %in% 1:10)] = NA
hist(sim,breaks = seq(0.5, 10.5,1), col = rgb(0,1,0,0.25))
To simulate 112 students rolling a fair ten-sided die and plot the results in a histogram:
n=112
res = sample(1:10, size = n, replace = T)
hist(res)
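One caveat with integer data like this: with hist()'s default breaks, the values 1 and 2 may end up in the same bar, because the bins are closed on the right and the first bin also includes its lower edge. A small sketch of two ways around that, assuming the same simulated rolls: tabulate the faces explicitly, or pass half-integer breaks as in the earlier calls. The name counts is made up for illustration.

```r
set.seed(1)
n <- 112
res <- sample(1:10, size = n, replace = TRUE)

# Count each face explicitly; factor() keeps faces that were never rolled
counts <- table(factor(res, levels = 1:10))
barplot(counts, xlab = "Face", ylab = "Count")

# Alternatively, centre one histogram bin on each face
hist(res, breaks = seq(0.5, 10.5, 1), xlab = "Face", main = "Ten-sided die")
```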

Printing plot depending on variable conditions on 2 pdf pages

I'm trying to print a plot that depends on a variable with 12 terms (levels). The plot is the result of a cluster classification of sequences, using OM distance.
I print this plot on one pdf page:
pdf("YYY.pdf", height=11,width=20)
seqIplot(XXX.seq, group=XXX$variable, cex.legend = 2, cex.plot = 1.5, border = NA, sortv =XXX.om)
dev.off()
But the print is too small, so I tried to print it on 2 pages, like this:
pdf("YYY.pdf", height=11,width=20)
seqIplot(XXX.seq, group=XXX$variable, variable="1":"6", cex.legend = 2, cex.plot = 1.5, border = NA, sortv =XXX.om)
seqIplot(XXX.seq, group=XXX$variable, variable="7":"12", cex.legend = 2, cex.plot = 1.5, border = NA, sortv = XXX.om)
dev.off()
But it doesn't work. Do you know how I can ask R to separate the variable's terms into two groups, so as to print 6 graphics per pdf page?
The solution is to plot separately the subset of groups you want on each page. Here is an example using the biofam data provided by TraMineR. The group variable p02r04 is religious participation which takes 10 different values.
library(TraMineR)
data(biofam)
bs <- seqdef(biofam[,10:25])
group <- factor(biofam$p02r04)
lv <- levels(group)
sel <- (group %in% lv[1:6])
seqIplot(bs[sel,], group=group[sel], sortv="from.end", withlegend=FALSE)
seqIplot(bs[!sel,], group=group[!sel], sortv="from.end")
If you are sorting the index plot with a variable you should indeed take the same subset of the sort variable, e.g. sortv=XXX.om[sel] in your case.
I'm not sure I understood your question; you could post some data to help us reproduce what you want. Maybe this helps: to plot six graphs on one page, you should set the mfrow parameter. Is that what you wanted?
pdf("test.pdf")
par(mfrow=c(3,2))
plot(1:10, 21:30)
plot(1:10, 21:30, pch=20)
hist(rnorm(1000))
barplot(VADeaths)
...
dev.off()
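The two ideas above (subsetting the groups and using mfrow) can be combined in plain base R. The sketch below does not use TraMineR: it splits the levels of a made-up 12-level factor into chunks of six and draws one page of six panels per chunk, with hist() standing in for the per-group sequence plots. The names grp, val, pages and f are hypothetical.

```r
set.seed(1)
# Hypothetical data: a 12-level grouping variable and a numeric value
grp <- factor(rep(1:12, each = 50))
val <- rnorm(600)

lv <- levels(grp)
# Split the levels into chunks of six: levels 1-6 and levels 7-12
pages <- split(lv, ceiling(seq_along(lv) / 6))

f <- file.path(tempdir(), "two_pages.pdf")
pdf(f, height = 11, width = 20)
for (page in pages) {
  par(mfrow = c(3, 2))   # reset to a 3 x 2 grid; the next plot starts a new page
  for (l in page) {
    hist(val[grp == l], main = paste("Group", l), xlab = "value")
  }
}
dev.off()
```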

Making a 3D surface from time series data in R

I have a large data set from which I would like to make a 3D surface. I would like the x-axis to be the date, the y-axis to be the time (24h) and the z-axis (height) to be a value I have ($). I am a beginner with R, so the simpler the better!
http://www.quantmod.com/examples/chartSeries3d/ has a nice example, but the code is way too complicated for my skill level!
Any help would be much appreciated. Everything I have researched so far needs the data to be sorted, which I think is not suitable.
Several options present themselves, persp() and wireframe(), the latter in package lattice.
First some dummy data:
set.seed(3)
dat <- data.frame(Dates = rep(seq(Sys.Date(), Sys.Date() + 9, by = 1),
each = 24),
Times = rep(0:23, times = 10),
Value = rep(c(0:12,11:1), times = 10) + rnorm(240))
persp() needs the data as the x and y grid locations and a matrix z of observations.
new.dates <- with(dat, sort(unique(Dates)))
new.times <- with(dat, sort(unique(Times)))
new.values <- with(dat, matrix(Value, nrow = 10, ncol = 24, byrow = TRUE))
and can be plotted using:
persp(new.dates, new.times, new.values, ticktype = "detailed", r = 10,
theta = 35, scale = FALSE)
The facets can be coloured using the col argument. You could do a lot worse than study the code for chartSeries3d0() at the page you linked to. Most of the code is just drawing proper axes as neither persp() nor wireframe() handle Date objects easily.
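The matrix(..., byrow = TRUE) step relies on the rows being ordered by date and then by time. If the data are not sorted, one possible alternative is to let tapply() build the grid, since it places each value by its date and time labels rather than by row position. This is a sketch under that assumption; it uses a fixed start date instead of Sys.Date() so it is reproducible, and the names shuffled, z and z_sorted are made up.

```r
set.seed(3)
dat <- data.frame(Dates = rep(seq(as.Date("2024-01-01"), by = 1, length.out = 10),
                              each = 24),
                  Times = rep(0:23, times = 10),
                  Value = rep(c(0:12, 11:1), times = 10) + rnorm(240))

# Deliberately scramble the row order to mimic unsorted data
shuffled <- dat[sample(nrow(dat)), ]

# 10 x 24 grid: rows are dates, columns are hours, whatever the row order was
z <- with(shuffled, tapply(Value, list(Dates, Times), mean))

# Same grid as the byrow = TRUE construction on the sorted data
z_sorted <- with(dat, matrix(Value, nrow = 10, ncol = 24, byrow = TRUE))
same <- isTRUE(all.equal(unname(z), z_sorted))

# Plot against a plain day index to sidestep Date handling in persp()
persp(seq_len(nrow(z)), as.numeric(colnames(z)), z, theta = 35,
      xlab = "Day", ylab = "Hour", zlab = "Value")
```

If a date and hour combination occurs more than once, mean averages the duplicates; a missing combination becomes NA in the grid.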
As for wireframe(), we can use:
require(lattice)
wireframe(Value ~ as.numeric(Dates) + Times, data = dat, drape = TRUE)
You'll need to do a bit of work to sort out the axis labelling, as wireframe() doesn't work with objects of class "Date" at the moment (hence the cast to numeric).

Resources