Given the following example and chart, how can I plot the exact x-axis value next to each point?
x <- mtcars[order(mtcars$mpg),] # sort by mpg
x$cyl <- factor(x$cyl) # it must be a factor
x$color[x$cyl==4] <- "red"
x$color[x$cyl==6] <- "blue"
x$color[x$cyl==8] <- "darkgreen"
dotchart(x$mpg,labels=row.names(x),cex=.7,groups= x$cyl,
main="Gas Mileage for Car Models\ngrouped by cylinder",
xlab="Miles Per Gallon", gcolor="black", color=x$color)
I tried modifying the labels argument, but it only adjusts the y axis label name.
You'll need to sort by category, then by x. Then you can use text as Ricardo suggests, accounting for the breaks between categories.
x <- mtcars[order(-mtcars$cyl, mtcars$mpg),]
# sort by category, then by position within category
# As above
x$cyl <- factor(x$cyl) # it must be a factor
x$color[x$cyl==4] <- "red"
x$color[x$cyl==6] <- "blue"
x$color[x$cyl==8] <- "darkgreen"
dotchart(x$mpg,labels=row.names(x),cex=.7,groups= x$cyl,
main="Gas Mileage for Car Models\ngrouped by cylinder",
xlab="Miles Per Gallon", gcolor="black", color=x$color)
# Adding text
text(x = x$mpg,
y = 1:nrow(x) + ifelse(x$cyl == "6", 2, ifelse(x$cyl == "4", 4, 0)),
labels= x$mpg,
cex = 0.5,
pos = 4)
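The hard-coded offsets (0, 2 and 4) can also be derived from the data. The following is a minimal sketch, assuming that dotchart() leaves a gap of two rows between consecutive groups, as the offsets above imply; grp_index and y_pos are names introduced here for illustration only.

```r
# Same sorting as above: by category, then by position within category
x <- mtcars[order(-mtcars$cyl, mtcars$mpg), ]
x$cyl <- factor(x$cyl)

# 0 for the first (bottom) group, 1 for the next one, and so on
grp_index <- cumsum(c(0, diff(as.integer(x$cyl)) != 0))

# Assumed layout: one row per point plus two extra rows per group break
y_pos <- seq_len(nrow(x)) + 2 * grp_index

dotchart(x$mpg, labels = row.names(x), cex = .7, groups = x$cyl,
         xlab = "Miles Per Gallon", gcolor = "black")
text(x = x$mpg, y = y_pos, labels = x$mpg, cex = 0.5, pos = 4)
```

This gives the same positions as the nested ifelse() call, but it does not need to be edited when the number of groups changes.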
Related
I need to simulate 400 pairs of observations and draw a scatter plot of X1 against X2, with the pairs shown in different colors, in R. I used the code below, but I do not believe it is correct.
group <- rbinom(200, 1, 0.3) + 0 # Create grouping variable
group_pch <- group # Create variable for symbols
group_pch[group_pch == 1] <- 16
group_pch[group_pch == 0] <- 16
group_col <- group # Create variable for colors
group_col[group_col == 1] <- "red"
group_col[group_col == 0] <- "green"
plot(x, y, # Scatterplot with two groups
pch = group_pch,
col = group_col)
First create a data frame with 400 x- and y-values, here X1 and X2. I generated the values as random numbers with rnorm(); see help(rnorm) for details. Then I used cut() to create a third variable, Color, which is labelled higher if the respective X1 value is at least 3 (the mean used for X1) and lower otherwise; see help(cut) for further information.
Finally, plot(x = df$X1, y = df$X2) draws a simple scatterplot. With the argument col = df$Color, the points are separated by color according to breaks = c(-Inf, 3, +Inf).
Unfortunately, your question does not make clear how many colors/separations/groups/... are required, or which rule should be used to differentiate between them.
# if you need to submit your code:
set.seed(1209) # to make results reproducible; see:
# ?set.seed()
# Create dataframe with 400 x and y-values
df <- data.frame(
X1 = rnorm(400, mean=3, sd=1),
X2 = rnorm(400, mean=5, sd=3))
# Create new variable (color) for plotting
df$Color <- cut(df$X1, breaks = c(-Inf, 3, +Inf),
labels = c("lower", "higher"),
right = FALSE)
# Plot #1
# Plot the points and differentiate X1 smaller/greater 3 by color
plot(x = df$X1, y = df$X2,
col = df$Color,
main = "Plot Title",
pch = 19, cex = 0.5)
If you are asked to use specific colors, then do the following:
# class(df$Color) # returns factor - perfect!
preferredColors <- c("red", "green")[df$Color]
# you need as many colors as labels, here 2 (lower, higher)
# Plot #2
plot(x = df$X1, y = df$X2,
col = preferredColors,
main = "Plot Title",
xlab = "Description of x-axis",
ylab = "Description of y-axis",
pch = 19, cex = 0.5)
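If more than two groups are needed, the same idea extends directly. The following is only a sketch under assumptions: the break points 2 and 4, the labels, the colors and the names Group and groupColors are arbitrary choices made for illustration, since the question does not state a rule.

```r
set.seed(1209)
df <- data.frame(
  X1 = rnorm(400, mean = 3, sd = 1),
  X2 = rnorm(400, mean = 5, sd = 3))

# Three groups instead of two; the break points are arbitrary
df$Group <- cut(df$X1, breaks = c(-Inf, 2, 4, Inf),
                labels = c("low", "middle", "high"),
                right = FALSE)

# One color per label, in the same order as levels(df$Group)
groupColors <- c("red", "green", "blue")

plot(x = df$X1, y = df$X2,
     col = groupColors[df$Group],
     xlab = "X1", ylab = "X2",
     pch = 19, cex = 0.5)
legend("topright", legend = levels(df$Group),
       col = groupColors, pch = 19, title = "X1 group")
```

Indexing the color vector by the factor keeps the legend and the points consistent, because both use the order of the factor levels.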
Just for illustration; copy previous code only:
# output first three rows of the df for inspection purpose:
head(df, n=3)
#> X1 X2 Color
#> 1 2.450875 5.6721845 lower
#> 2 5.582115 4.8569917 higher
#> 3 2.324129 -0.3660018 lower
Created on 2021-09-12 by the reprex package (v2.0.1)
Output, Plot #2:
I am trying to include a legend for a scatterplot in which the size of each point indicates the number of pairings.
freqData <- as.data.frame(table(galton$child, galton$parent))
names(freqData) <- c("child", "parent", "freq")
plot(as.numeric(as.vector(freqData$parent)),
as.numeric(as.vector(freqData$child)),
pch = 21, col = "black", bg = "lightblue",
cex = .10 * freqData$freq,
xlab = "parent", ylab = "child")
legend("bottomright","(freqData)",pch=21, title="freqData")
Changing the size of points in the legend can be done by passing a vector of pt.cex values to legend(). The following code was used to generate the sample plot. The example uses a square root of the frequency so that the point area is proportional to the count in that pairing.
# historical data
library('HistData')
# Galton Data
rawData <- Galton
# making a set of unique pairings and counting frequency
freqData <- unique(rawData)
freqData$count <- NA
for(i in seq_len(nrow(freqData))){
# number of raw rows that match both the parent and the child value of pairing i
freqData$count[i] <- sum(rawData$parent == freqData$parent[i] & rawData$child == freqData$child[i])
}
# making plots
plot(freqData$parent
,freqData$child
,pch=19 # plot symbol
,cex=0.1*sqrt(freqData$count)) # point expansion
# adding legend
legend('bottomright' # location
,legend=c(1,5,10,15,20,25,30,35) # entries
,title='count' # title
,pt.cex=0.1*sqrt(c(1,5,10,15,20,25,30,35)) # point expansion
,pch=19 # plot symbol
,ncol=2 # number of columns
)
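The key point is that the plot and the legend must use the same scaling from count to cex. A minimal sketch of that idea follows, using simulated data so that it runs without the HistData package; the data, the helper pt_size() and the factor 0.5 are assumptions made for illustration, and aggregate() replaces the counting loop.

```r
set.seed(42)
# Hypothetical stand-in for the Galton data
rawData <- data.frame(parent = sample(64:72, 300, replace = TRUE),
                      child  = sample(62:74, 300, replace = TRUE))

# Count how often each (parent, child) pairing occurs
freqData <- aggregate(count ~ parent + child,
                      data = cbind(rawData, count = 1), FUN = sum)

# One scaling function shared by the plot and the legend
pt_size <- function(n) 0.5 * sqrt(n)

# A few representative counts to show in the legend
legend_counts <- unique(round(seq(1, max(freqData$count), length.out = 4)))

plot(freqData$parent, freqData$child, pch = 19,
     cex = pt_size(freqData$count),
     xlab = "parent", ylab = "child")
legend("bottomright", legend = legend_counts, title = "count",
       pt.cex = pt_size(legend_counts), pch = 19, ncol = 2)
```

Because pt_size() is used in both calls, changing the scaling in one place keeps the legend honest.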
I am plotting boxplots of fish biomass by reef name, in order of median biomass. Every reef name (site) is either inside or outside an MPA, i.e. MPA=="1" or MPA=="0". Currently all boxes are drawn in green.
How can I show MPA=="0" sites in blue and MPA=="1" sites in green, for example, while maintaining the order by fish biomass?
MPA <- factor(Fish$MPA)
bymedian <- with(Fish, reorder(ReefName, log10(Biomassm+1), median))
boxplot(log10(Biomassm+1) ~ bymedian, data = Fish,
xlab = "ReefName", ylab = "Biomassm",
main = "Biomassm in Caribbean", varwidth = TRUE,
col=(c("darkgreen")), las=3, cex.axis=0.3)
Thank you
It might be a better idea to use the ggplot2 package for this. Your code would then look like this:
library(ggplot2)
ggplot(data=Fish, aes(x=reorder(ReefName, log10(Biomassm+1), median), y=Biomassm, fill=factor(MPA))) +
geom_boxplot() +
scale_y_log10("Biomassm") +
xlab("ReefName") +
scale_fill_manual(values=c("blue", "green")) +
ggtitle("Biomassm in Caribbean")
Here's a set of boxplots coloured depending on the value of MPA:
# generate some data
set.seed(1)
X = matrix(rnorm(100), ncol=10)
# order by median
X = X[,order(apply(X, 2, median))]
# some fake MPA values
MPA = round(runif(n=10, min=0, max=1))
# generate boxplots and check if MPA==1
boxplot(X, col=ifelse(test=MPA==1, yes='green', no='blue'))
# add legend
legend(x='bottomleft', fill=c('green','blue'), legend=c('MPA=1', 'MPA=0'), inset=c(0.01))
The output of ifelse is a vector of colours according to the MPA values and these are used to colour the boxes:
[1] "blue" "blue" "green" "blue" "blue" "green" "green" "blue" "blue" "green"
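The same ifelse() approach can be applied to long-format data like that in the question. The sketch below uses a made-up Fish data frame (the values are simulated and only the column names follow the question); mpa_by_reef and box_cols are names introduced here. It assumes each reef has a single MPA value.

```r
set.seed(1)
# Hypothetical data with the column names used in the question
Fish <- data.frame(
  ReefName = factor(rep(paste0("Reef", 1:6), each = 20)),
  MPA      = rep(c(0, 1, 1, 0, 1, 0), each = 20))
Fish$Biomassm <- rexp(nrow(Fish), rate = 1 / (10 + 5 * Fish$MPA))

# Order the reefs by median log biomass
bymedian <- with(Fish, reorder(ReefName, log10(Biomassm + 1), median))

# One MPA value per reef, in the same order as the boxes are drawn
mpa_by_reef <- Fish$MPA[match(levels(bymedian), Fish$ReefName)]
box_cols <- ifelse(mpa_by_reef == 1, "darkgreen", "blue")

boxplot(log10(Biomassm + 1) ~ bymedian, data = Fish,
        xlab = "ReefName", ylab = "log10(Biomassm + 1)",
        col = box_cols, las = 3)
legend("topleft", fill = c("darkgreen", "blue"),
       legend = c("MPA=1", "MPA=0"))
```

The colors are looked up through levels(bymedian), so they follow the boxes when the median order changes.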
The following code :
avector <- as.vector(top.links.added.overall$Amount)
x <- as.vector(top.links.added.overall[order(avector),])
row.names(x) <- c("Yahoo" ,"Cnn", "Google")
x$color[x$Amount == 100] <- "red"
x$color[x$Amount == 500] <- "blue"
x$color[x$Amount == 1000] <- "darkgreen"
dotchart(x$Amount,
labels = row.names(x),
cex=.7,
groups = x$Amount,
gcolor = "black",
color = x$color,
pch=19,
main = "Gas Mileage for Car Models\ngrouped by cylinder",
xlab = "Miles Per Gallon")
Generates this graph:
Here is the file for the dataset top.links.added.overall:
Amount,Name
1000,Google
500,Cnn
100,Yahoo
When I remove the code:
row.names(x) <- c("Yahoo" ,"Cnn", "Google")
I get row names of 1,2,3.
I don't think I should need to set the names of the 'y' axis by hand. How can the code for the graph be amended so that the company with the lowest numerical value (in this case Yahoo) starts at the beginning of the 'y' axis instead of at the top, which is what currently occurs?
I don't think I can test it with the offered R data objects but perhaps something along these lines:
x <- as.vector(top.links.added.overall[order(-avector),])
row.names(x) <- rev( c("Yahoo" ,"Cnn", "Google") )
This applies mathematical negation to the order() argument and uses the rev() (reverse) function on the names.
Edit: I now understand your frustration, but after looking at the code I decided to try this which seems to do it:
dotchart(x$Amount,
labels = row.names(x),
cex=.7,
groups = -x$Amount, # the code sorts by `as.numeric(groups)`
gcolor = "black",
color = x$color,
pch=19,
main = "Gas Mileage for Car Models\ngrouped by cylinder",
xlab = "Miles Per Gallon")
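An alternative that avoids the groups argument altogether might look like the sketch below. It relies on dotchart() drawing the first element at the bottom when no groups are given, and it rebuilds the data from the CSV shown in the question; the object name links and the axis label "Amount" are assumptions made for illustration.

```r
# Rebuild the data from the file shown in the question
links <- read.csv(text = "Amount,Name
1000,Google
500,Cnn
100,Yahoo", stringsAsFactors = FALSE)

# Ascending order: the smallest amount ends up at the bottom of the chart
links <- links[order(links$Amount), ]

# Look up a color per amount instead of assigning each one separately
links$color <- c("red", "blue", "darkgreen")[match(links$Amount, c(100, 500, 1000))]

dotchart(links$Amount,
         labels = links$Name,   # labels come from the data, not from row names
         cex = .7,
         color = links$color,
         pch = 19,
         xlab = "Amount")
```

Taking the labels from the Name column also removes the need to set row names by hand.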
I'm using the following code in R to draw two density curves on a single graph;
mydata1<-read.csv(file="myfile1.csv",head=TRUE,sep=",")
mydata2<-read.csv(file="myfile2.csv",head=TRUE,sep=",")
pdf("comparison.pdf")
plot.multi.dens <- function(s)
{
junk.x = NULL
junk.y = NULL
for(i in 1:length(s)) {
junk.x = c(junk.x, density(s[[i]])$x)
junk.y = c(junk.y, density(s[[i]])$y)
}
xr <- range(junk.x)
yr <- range(junk.y)
plot(density(s[[1]]), xlim = xr, ylim = yr, xlab="Usage",main = "comparison")
for(i in 1:length(s)) {
lines(density(s[[i]]), xlim = xr, ylim = yr, col = i)
}
}
plot.multi.dens( list(mydata2$usage,mydata1$usage))
dev.off()
The problem is that the resulting graph shows two lines but does not indicate which line is which. For example, the output should show that the red line is "a" and the black line is "b". I'm a newbie to R, which is why I'm having some difficulty. Any help will be appreciated!
Answer from the Quick-R website:
# Compare MPG distributions for cars with
# 4,6, or 8 cylinders
library(sm)
attach(mtcars)
# create value labels
cyl.f <- factor(cyl, levels= c(4,6,8),
labels = c("4 cylinder", "6 cylinder", "8 cylinder"))
# plot densities
sm.density.compare(mpg, cyl, xlab="Miles Per Gallon")
title(main="MPG Distribution by Car Cylinders")
# add legend via mouse click
colfill<-c(2:(1+length(levels(cyl.f)))) # one fill colour per level
legend(locator(1), levels(cyl.f), fill=colfill)
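The function from the question can also be given a legend directly, without the sm package. This is a minimal sketch, assuming the list passed in is named so that the names can serve as legend labels; plot_multi_dens, the labels argument and the simulated usage data are introduced here for illustration.

```r
plot_multi_dens <- function(s, labels = names(s),
                            main = "comparison", xlab = "Usage") {
  # Estimate each density once and reuse it for ranges and drawing
  dens <- lapply(s, density)
  xr <- range(unlist(lapply(dens, `[[`, "x")))
  yr <- range(unlist(lapply(dens, `[[`, "y")))

  plot(dens[[1]], xlim = xr, ylim = yr, xlab = xlab, main = main)
  for (i in seq_along(dens)) {
    lines(dens[[i]], col = i)
  }

  # Legend colors match the line colors used above
  legend("topright", legend = labels, col = seq_along(dens), lty = 1)
  invisible(dens)
}

# Simulated stand-ins for mydata1$usage and mydata2$usage
set.seed(1)
usage <- list(a = rnorm(100, mean = 10, sd = 2),
              b = rnorm(100, mean = 14, sd = 3))
dens <- plot_multi_dens(usage)
```

With the data from the question, the call would be along the lines of plot_multi_dens(list(a = mydata2$usage, b = mydata1$usage)), placed between pdf() and dev.off().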