adding strip to single dotplot from lattice package - r

I have two different kinds of plots and I need to plot them together, each with a strip on top stating its heading. I have to use the lattice package only, so ggplot2 is not an option. Thanks.
Here is some sample code:
library(lattice)
library(latticeExtra)
X<-rnorm(100)
Y<-rnorm(100)
S<-rnorm(100)
df1<-data.frame(X,Y,S)
p1<-dotplot(X~Y, data=df1)
p2<-dotplot(X~S, data=df1)
#combining plots
c(p1,p2)

As mentioned in the comments, you need to stack the variables Y and S and create an additional column indicating what variable each value comes from.
> df2 <- reshape(df1, varying=2:3, v.names='Value', timevar='Var', times=c('Y','S'), direction='long')
> head(df2)
             X Var      Value id
1.Y -2.3720450   Y  2.3965643  1
2.Y  0.8082862   Y  0.0215850  2
3.Y  0.3774736   Y -0.6385176  3
4.Y  0.7161986   Y -0.3908185  4
5.Y -0.3633583   Y -0.9611222  5
6.Y -0.3484920   Y  3.3387813  6
Then
> dotplot(X ~ Value | Var, data = df2)
should do what you want.
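The same long-format data frame can also be assembled by hand, which makes the stacking step explicit. This is only a minimal sketch: it assumes S has the same length as X and Y, and the seed and the order of the factor levels are illustrative choices that are not in the original code.

```r
library(lattice)

set.seed(42)  # assumption: fixed seed only so the example is reproducible
df1 <- data.frame(X = rnorm(100), Y = rnorm(100), S = rnorm(100))

# Stack Y and S into one column and record which variable each value came from
df2 <- data.frame(
  X     = rep(df1$X, times = 2),
  Var   = factor(rep(c("Y", "S"), each = nrow(df1)), levels = c("Y", "S")),
  Value = c(df1$Y, df1$S)
)

# One panel per level of Var; the strip above each panel shows the level name
p <- dotplot(X ~ Value | Var, data = df2)
print(p)
```

Setting the levels of Var explicitly controls the order of the panels and the text shown in each strip.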

Either:
p1 <- dotplot(X~Y, data=df1)
p2 <- dotplot(X~S, data=df1)
c(
"Here is plot 1" = p1
,"And here second one" = p2
)
or:
p1 <- dotplot(X~Y|"Here is plot 1", data=df1)
p2 <- dotplot(X~S|"And here second one", data=df1)
c(p1, p2)
The first one seems better because you create the strip where you need it.

Related

Boxplot in for-loop over multiple columns in r

I have a dataset with 7 columns and want to plot each of the last six against the first one. Doing it for a single column works, but apparently I am missing something when looping.
Here is my dataframe:
colnames(df) <- c("real", "est1", "est2", "est3", "est4", "est5", "est6")
head(df)
real est1 est2 est3 est4 est5 est6
1 6 1.040217e-05 7.693853e-05 0.0006782929 0.002676282 0.033385059 0.9631730251
2 6 1.065455e-05 7.880501e-05 0.0006947352 0.002740934 0.034161665 0.9623132055
3 5 1.037427e-03 7.607541e-03 0.0624143732 0.185340034 0.536009785 0.2075908392
4 1 2.345527e-01 4.855757e-01 0.2374464964 0.032691816 0.008846185 0.0008870667
5 5 3.506084e-04 2.585847e-03 0.0222474072 0.079120851 0.458854341 0.4368409455
6 3 1.710639e-03 1.247417e-02 0.0978889632 0.250555703 0.500355545 0.1370149767
and the code
boxplot( est1 ~ real, data=df, main="Estimated Probability for Category 1 Given the Real Categories")
works fine, but if I do exactly the same in a loop, it doesn't:
looper <- c("est1", "est2", "est3", "est4", "est5", "est6") #to get the column names to loop over
counter <- 0 # for the boxplot's title
for (i in all_of(looper)){
counter <- counter +1
boxplot( i ~ real, data=df,
main = paste("Estimated Probability for Category",counter,"Given the Real Categories")
)
}
I suppose it has to do with the way i is used. I also tried "i" and a backtick (`), and I always get one of the following errors:
For i: Error in stats::model.frame.default(formula = i ~ real, data = eval.m1_sim) :
variable lengths differ (found for 'real')
For "i" or ` : Error in terms.formula(formula, data = data) :
invalid term in model formula
What am I missing?
You could go via column numbers:
# random example data as no reproducible example was given
df <- data.frame(
  real = sample(1:4, 20, TRUE),
  one = runif(20),
  two = runif(20),
  three = runif(20)
)
# graphics parameters so we see all at once
par(mfrow = c(3,1), mar = c(2, 2, 1, 1))
# the easiest way is through column numbers
for(column in 2:4)
boxplot(df[[column]] ~ df$real)
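The loop in the question fails because i holds a string, while the formula i ~ real is taken literally and refers to a variable named i rather than to the column it names. One way to keep looping over column names, sketched here with base R's reformulate() and made-up data in the question's column layout, is to build the formula from the name:

```r
set.seed(1)  # assumption: random stand-in data, since the original df was not provided
df <- data.frame(real = sample(1:6, 60, replace = TRUE))
looper <- paste0("est", 1:6)
for (nm in looper) df[[nm]] <- runif(60)

par(mfrow = c(2, 3), mar = c(2, 2, 2, 1))
for (k in seq_along(looper)) {
  # reformulate("real", response = "est1") yields the formula est1 ~ real
  f <- reformulate("real", response = looper[k])
  boxplot(f, data = df, main = paste("Category", k))
}
```

Looping over seq_along(looper) also removes the need for a separate counter variable for the title.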
Another option:
library(tidyverse)
df %>%
pivot_longer(-real) %>%
mutate(real = factor(real)) %>%
ggplot(aes(real, value)) +
geom_boxplot() +
facet_wrap(~name)

paired data for a facet_wrap

Imagine I have data foo below. Each row contains a measurement (y) on a species and each species is paired with another (species.pair). So in the example below, species a is paired with e, b with f, and so on. The number of observations for each species varies. I'd like to plot the density of each species's distribution along with its partner's distribution in its own facet. Below I hand coded this with the column sppPairs. The species are all unique and each has a match in species.pair. I'm unsure of how to make the grouping column sppPairs below. I'm sure there is some clever way to do this with {dplyr} but I can't figure out what to do. Some kind of pasting species to species.pair I imagine? Any help much appreciated.
foo <- data.frame(species = rep(letters[1:8],each=10),
species.pair = rep(letters[c(5:8,1:4)],each=10),
y=rnorm(80))
# species and species pair match exactly
all(unique(foo$species) %in% unique(foo$species.pair))
# what I want
foo$sppPairs <- c(rep("a:e",10),
rep("b:f",10),
rep("c:g",10),
rep("d:h",10),
rep("a:e",10),
rep("b:f",10),
rep("c:g",10),
rep("d:h",10))
p1 <- ggplot(foo,aes(y,fill=species))
p1 <- p1 + geom_density(alpha=0.5)
p1 <- p1 + facet_wrap(~sppPairs)
p1
Yes, you can use apply on the appropriate columns to paste the sorted elements together in the correct order (otherwise a:e is different from e:a and so on, and you end up with 8 groups instead of 4):
library(ggplot2)
foo <- data.frame(species = rep(letters[1:8], each = 10),
species.pair = rep(letters[c(5:8, 1:4)], each = 10),
y = rnorm(80))
foo$sppPairs <- apply(foo[c("species", "species.pair")], 1,
function(x) paste(sort(x), collapse = ":"))
ggplot(foo, aes(y, fill = species)) +
geom_density(alpha = 0.5) +
facet_wrap(~sppPairs)
Created on 2020-10-05 by the reprex package (v0.3.0)
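A vectorised alternative to the row-wise apply() call is sketched below. It relies on pmin() and pmax() comparing character vectors element by element, and it converts the columns with as.character() on the assumption that they might be factors; the seed is only there for reproducibility.

```r
set.seed(1)
foo <- data.frame(species = rep(letters[1:8], each = 10),
                  species.pair = rep(letters[c(5:8, 1:4)], each = 10),
                  y = rnorm(80))

# as.character() guards against factor columns (the default before R 4.0)
s1 <- as.character(foo$species)
s2 <- as.character(foo$species.pair)

# pmin()/pmax() pick the alphabetically first/last name of each pair, row by row
foo$sppPairs <- paste(pmin(s1, s2), pmax(s1, s2), sep = ":")

unique(foo$sppPairs)
```

Because the smaller name always comes first, a:e and e:a collapse into the same group, just as with sort() inside apply().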

How to use ggplot with prop.table(table(x))?

First, I have a data frame with two categorical variables, like this:
nombre <- c("A","B","C","A","D","F","F","H","I","J")
sexo <- c(rep("man",4),rep("woman",6))
edad <- c (25,14,25,76,12,90,65,45,56,43)
pais <- c(rep("spain",3),rep("italy",4),rep("portugal",3))
data <- data.frame(nombre=nombre,sexo=sexo,edad=edad,pais=pais)
If I use:
prop.table(table(data$sexo,data$pais), margin=1)
I can see the relative frequency of the levels, for example for Italy (man = 0.25, woman = 0.5),
but when I try to plot prop.table(table(x)) I get something different:
ggplot(as.data.frame(prop.table(table(data),margin=1)), aes(x=pais ,y =Freq, fill=sexo))+geom_bar(stat="identity")
The Y axis runs from 0 to 3 and, for example, the Italy bar shows woman = 2 and man = 2.5.
I don't want that (and I don't know what it is showing); I want the same values I got from prop.table(table(x)).
I think the problem is related to margin=1.
Thank you!
You need to make the same table
tab = prop.table(table(data$sexo,data$pais), margin=1)
tab = as.data.frame(tab)
Then plot:
ggplot(tab,aes(x=Var2,y=Freq,fill=Var1)) + geom_col()
Or simply:
barplot(prop.table(table(data$sexo,data$pais), margin=1))
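The plot in the question differs because table(data) cross-tabulates all four columns, so margin=1 normalises within each value of nombre instead of within each sexo. As a minimal sketch, naming the two table dimensions keeps the original column names after the conversion to a data frame; the side-by-side base-graphics call at the end is an illustrative addition.

```r
nombre <- c("A","B","C","A","D","F","F","H","I","J")
sexo <- c(rep("man", 4), rep("woman", 6))
edad <- c(25, 14, 25, 76, 12, 90, 65, 45, 56, 43)
pais <- c(rep("spain", 3), rep("italy", 4), rep("portugal", 3))
data <- data.frame(nombre = nombre, sexo = sexo, edad = edad, pais = pais)

# Name the table dimensions so the columns come out as sexo/pais instead of Var1/Var2
tab <- as.data.frame(prop.table(table(sexo = data$sexo, pais = data$pais), margin = 1))
tab

# Base-graphics version with the bars side by side
barplot(prop.table(table(data$sexo, data$pais), margin = 1),
        beside = TRUE, legend.text = TRUE)
```

With ggplot2 loaded, the aesthetics can then refer to the original names, for example aes(x = pais, y = Freq, fill = sexo).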
You're probably looking for something like position = "dodge"
If I run the following on your data :
P <- prop.table(table(data$sexo,data$pais), margin=1)
ggplot(as.data.frame(P), aes(x = Var2, y = Freq, fill = Var1)) +
geom_bar(stat="identity", position = "dodge")
This gives a bar chart with the bars for each sex drawn side by side.

Wrapping labels in tree plots

I am using several tree plots (ctree, evtree, rpart, chaid) and I rely on categorical data. Levels of data are described with text labels.
In the plot, it is not clear whether the displayed text belongs to the left or the right node.
Is it possible to either wrap the text labels in the plot, or provide a slightly different vertical alignment for the text displayed in the left and right nodes?
As requested, here is a code producing such an issue in the plot:
Df1 <- data.frame(
  y = as.factor(sample(1:3, 200, replace = TRUE)),
  x1 = as.factor(sample(1:3, 200, replace = TRUE)),
  x2 = as.factor(sample(1:3, 200, replace = TRUE)),
  x3 = as.factor(sample(1:3, 200, replace = TRUE)),
  x4 = as.factor(sample(1:3, 200, replace = TRUE))
)
Df1[1:5] <- lapply(Df1[1:5], function(x) factor(x, levels = c(1,2,3),labels = c("long long long long long text","text1","lorem ipsum dolor")))
library("partykit")
library("rpart")
library("evtree")
library("CHAID")
rp <- rpart(y ~ .,data=Df1, minbucket=30)
plot(as.party(rp))
ct <- ctree(y~ . , data = Df1, minbucket=50)
plot(ct)
ev <- evtree(y ~ ., data = Df1, maxdepth = 5)
plot(ev)
ctrl <- chaid_control(minsplit=90, minbucket=30, minprob=0.05,alpha2=0.01, alpha3=-1, alpha4=0.01)
chaid1 <- chaid( y ~ ., data= Df1, control=ctrl)
plot(chaid1,cex=0.6)
Can't see how this would be possible using parameters in ?plot.party. You could, however, add \n (new line) to factor levels.
levels(Df1$x2)[1] <- "long long long \n long long text"
plot(as.party(rp))
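Rather than editing each level by hand, the line breaks can be inserted automatically with base R's strwrap(). The helper below is a hypothetical sketch (wrap_levels is not part of partykit or any of the tree packages), and the width of 12 characters is an arbitrary choice.

```r
# Hypothetical helper: wrap every level label of a factor to a given width
wrap_levels <- function(f, width = 12) {
  levels(f) <- vapply(levels(f),
                      function(l) paste(strwrap(l, width = width), collapse = "\n"),
                      character(1), USE.NAMES = FALSE)
  f
}

x <- factor(c("long long long long long text", "text1", "lorem ipsum dolor"))
levels(wrap_levels(x))
```

Applying it to every column, for example with Df1[] <- lapply(Df1, wrap_levels) before fitting the trees, should give all plots the wrapped labels, assuming every column of Df1 is a factor.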

Generating "2D" histogram in R

I am new to R and I would like to know how to generate histograms for the following situation :
I initially have a regular frequency table with 2 columns : Column A is the category (or bin) and Column B is the number of cases that fall in that category
Col A    Col B
1-10     7
11-20    4
21-30    5
From this initial frequency table, I create a table with 3 columns : Col A is again the category (or bin), but now Col B is the "fraction of total cases", so for the category 1-10, column B will have the value 7/(7+4+5) = 7/16 . Now there is also a third column, Col C which is "fraction of total cases falling between the categories 1-20", so for 1-10, the value for Col C would be 7/(7+4) = 7/11. The complete table would look like below :
Col A    Col B    Col C
1-10     7/16     7/11
11-20    4/16     4/11
21-30    5/16     0
How do I generate a histogram from this 3-column table above ? My X axis should be the bin (1-10, 11-20 etc.) and my Y axis should be the fraction, however for every bin I have two fractions (Col B and Col C), so there will be two fraction "bars" for every bin in the histogram.
Any help would be greatly appreciated.
The data:
dat <- data.frame(A = c("1-10", "11-20", "21-30"), B = c(7, 4, 5))
Now, calculate the proportions and create a new object:
dat2 <- rbind(B = dat$B/sum(dat$B), C = c(dat$B[1:2]/sum(dat$B[1:2]), 0))
colnames(dat2) <- dat$A
Plot:
barplot(dat2, beside = TRUE, legend = rownames(dat2))
Your title should say "dodged bar chart" rather than "2D histogram": a histogram has a continuous scale on the x axis, unlike a bar chart, and is used to compare distributions of univariate data, or distributions of univariate data conditioned on a factor. You are trying to compare colB with colC, which can be visualized effectively with a 2D scatter plot but not with a bar chart. A better way to compare the distributions of colB and colC using histograms would be to plot two separate histograms and check how the location of the data points changes.
If you want to compare the distributions of colB and colC, try the following code. I rounded the values to get reasonable data matching your description. Note that random sampling (without replacement) is involved, so every time you run the code the distribution will change slightly, but that will not affect the comparison between colB and colC.
library("ggplot2")
# 44 datapoints between 1-10
a <- rep(1:10, 4)
a <- c(a, sample(a, size=4, replace=FALSE))
# 25 datapoints between 11-20
b <- rep(11:20, 2)
b <- c(b, sample(b, size=5, replace=FALSE))
# 31 datapoints between 21-30
c <- rep(21:30, 3)
c <- c(c, sample(c, size=1, replace=FALSE))
colB <- c(a, b, c)
# 64 datapoints between 1-10
a <- rep(1:10, 6)
a <- c(a, sample(a, size=4, replace=FALSE))
# 36 datapoints between 11-20
b <- rep(11:20, 3)
b <- c(b, sample(b, size=6, replace=FALSE))
colC <- c(a, b)
df <- data.frame(cbind(colB, colC=colC))
write.table(df, file = "data")
data <- read.table("data", header=TRUE)
data
ggplot(data=data, aes(x=colB, xmin=1, xmax=30)) + stat_bin(binwidth = 1)
ggplot(data=data, aes(x=colC, xmin=1, xmax=30)) + stat_bin(binwidth = 1)
# if you want density distribution, then you can try something like this:
ggplot(data=data, aes(x=colB, y = ..density.., xmin=1, xmax=30)) + stat_bin(binwidth = 1)
ggplot(data=data, aes(x=colC, y = ..density.., xmin=1, xmax=30)) + stat_bin(binwidth = 1)
HTH
-Sathish
