R plotting frequency distribution

I know that we normally do it this way:
x=c(rep(0.3,100),rep(0.5,700))
plot(table(x))
However, this only produces a few dots or vertical lines in the graph.
What should I do if I want 100 dots above 0.3 and 700 dots above 0.5?

Something like this?
x <- c(rep(.3,100), rep(.5, 700))
y <- c(seq(0,1, length.out=100), seq(0,1,length.out=700))
plot(x,y)
Edit (following the OP's comment):
In that case, something like this should work.
x <- rep(seq(1, 10)/10, seq(100, 1000, by=100))
x.t <- as.matrix(table(x))
y <- unlist(apply(x.t, 1, function(x) seq(1,x)))
plot(x,y)
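A shorter way to get the same stacked y values, offered as an alternative sketch rather than the answer's own code, is ave() with FUN = seq_along. It numbers the observations within each group of equal x values, so it does not depend on x being sorted the way the rows of table(x) are.

```r
x <- rep(seq(1, 10) / 10, seq(100, 1000, by = 100))
# ave() applies seq_along() within each group of equal x values, so every
# observation gets its running count (1, 2, 3, ...) inside its own group
y <- ave(x, x, FUN = seq_along)
plot(x, y, pch = 20, cex = 0.3)
```

Each column of dots then rises to the count for that x value (100 at 0.1, 1000 at 1.0).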

You can play with the line type and line width settings:
plot(table(x),lty=3,lwd=0.5)

For smaller numbers (counts) you can use stripchart with method="stack" like this:
stripchart(c(rep(0.3,10),rep(0.5,70)), pch=19, method="stack", ylim=c(0,100))
But stripchart does not work well for 700 dots: stacked that high, the points no longer fit in the plot region.
Edit:
The dots() function from the package TeachingDemos is probably what you want:
require(TeachingDemos)
dots(x)

Related

Plotting list of functions using for loop in R

How can you plot a list of functions in one graph using a for loop in R? The following code does the trick, but it requires a separate call to plot() for the first function outside the for loop, which is very clunky. Is there a way to handle all the plotting inside the for loop without creating multiple plots?
vec <- 1:10
funcs <- lapply(vec, function(base){function(exponent){base^exponent}})
x_vals <- seq(0, 10, length.out=100)
plot(x_vals, funcs[[1]](x_vals), type="l", ylim=c(0,100))
for (i in 2:length(vec)) {
  lines(x_vals, funcs[[i]](x_vals))
}
You can also do the computations first and plotting after, like this:
vec <- 1:10
funcs <- lapply(vec, function(base) function(exponent){base^exponent})
x_vals <- seq(0, 10, length.out=100)
y_vals <- sapply(funcs, \(f) f(x_vals))
plot(1, xlim=range(x_vals), ylim=range(y_vals), type='n', log='y',
xlab='x', ylab='y')
apply(y_vals, 2, lines, x=x_vals)
This way you know the range of your y values before opening the plot and can set the y-axis limits accordingly, if you want to. Note that I chose to use a logarithmic y axis here.
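Once the y values sit in a matrix, as with y_vals above, matplot() can draw one line per column in a single call, so neither a loop nor apply() is needed. A minimal sketch of that alternative, using function(f) instead of the \(f) shorthand so it also runs on R versions older than 4.1:

```r
vec <- 1:10
funcs <- lapply(vec, function(base) function(exponent) base^exponent)
x_vals <- seq(0, 10, length.out = 100)
# One column of y values per function: a 100 x 10 matrix
y_vals <- sapply(funcs, function(f) f(x_vals))
# matplot() draws every column as its own line; lty = 1 keeps all lines solid
matplot(x_vals, y_vals, type = "l", lty = 1, log = "y", xlab = "x", ylab = "y")
```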
Based on MrFlick's comment, something like this would be one way to do what I'm looking for, but it is still not great:
vec <- 1:10
funcs <- lapply(vec, function(base){function(exponent){base^exponent}})
x_vals <- seq(0, 10, length.out=100)
plot(NULL, xlim=c(0,10), ylim=c(0,100))
for (i in seq_along(vec)) {
  lines(x_vals, funcs[[i]](x_vals))
}
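To keep all the plotting inside the loop, as the question asks, one option is to branch on the iteration: the first pass calls plot() to open the plot and fix the axis limits, and every later pass only calls lines(). This is a sketch using the question's setup; the ylim of c(0, 100) is carried over from the question and clips the steeper curves.

```r
vec <- 1:10
funcs <- lapply(vec, function(base) function(exponent) base^exponent)
x_vals <- seq(0, 10, length.out = 100)
for (i in seq_along(funcs)) {
  y <- funcs[[i]](x_vals)
  if (i == 1) {
    # First iteration opens the plot and sets the axis limits
    plot(x_vals, y, type = "l", ylim = c(0, 100), xlab = "x", ylab = "y")
  } else {
    # Later iterations add lines to the existing plot
    lines(x_vals, y)
  }
}
```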

How to fix overlapping issue

plot(USArrests$Murder, USArrests$UrbanPop,
xlab="murder", ylab="% urban population", pch=20, col="grey",
ylim=c(20, 100), xlim=c(0, 20))
text(USArrests$Murder, USArrests$UrbanPop, labels=rownames(USArrests),
cex=0.7, pos=3)
I tried everything: reducing the font size with cex, changing the label positions, and adjusting ylim and xlim to fit. I also tried changing the margins, but that didn't really help, so I removed those changes. At this point, I don't know how to do this with base R tools. I know how to do it with ggplot, which is much easier, but I want to know whether I can do the same with base plot() and text().
To find neighbors that are too close to each other, you could run a kmeans() cluster analysis on the data. It's quite a hack, though!
First, subset your data.
dat <- USArrests[c("Murder", "UrbanPop")]
Set a seed. Play around with that. Different seeds => different results.
set.seed(42)
Find clusters with kmeans(); the centers argument sets the number of clusters, so play around with that too.
dat$cl <- kmeans(dat, centers=10, nstart=5)$cluster
Now split the data and assign alternating pos values, used later for positioning in the text() command.
l <- split(dat, dat$cl)
l <- lapply(l, function(x) within(x, {
if (nrow(x) == 1)
pos <- 2 # for those with just one observation in cluster
else
pos <- as.numeric(as.character(factor((1:nrow(x)) %% 2, labels=c(2, 4))))
}))
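The factor() / as.character() / as.numeric() chain in the else branch maps odd-numbered rows to pos 4 (right of the point) and even-numbered rows to pos 2 (left). A shorter equivalent, shown as an alternative rather than the answer's own code, uses ifelse(); the helper name alternate_pos is hypothetical. The single-row branch is still needed in the original, because factor() with one observed level rejects two labels.

```r
# Hypothetical helper: 4, 2, 4, 2, ... for rows 1, 2, 3, 4, ...
alternate_pos <- function(n) ifelse(seq_len(n) %% 2 == 1, 4, 2)

alternate_pos(5)
```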
Assemble.
dat <- do.call(rbind, unname(l))
Now plot into a PNG with a somewhat higher resolution; I chose 800x800 pixels.
png("plot.png", 800, 800, "px")
plot(dat$Murder, dat$UrbanPop, xlab="murder", ylab="% urban population",
pch=20, col="grey", ylim=c(20, 100), xlim=c(0, 20))
# the sapply assigns the text position according to `pos` column
sapply(c(4, 2), function(x)
with(dat[dat$pos == x, ],
text(Murder, UrbanPop, labels=rownames(dat[dat$pos == x, ]),
cex=0.7, pos=x)))
dev.off()
This writes the labelled scatter plot to plot.png.
I'm sure you can optimize this further.
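Another base-R option, sketched here under stated assumptions rather than as a tested recipe, is greedy label placement: for each point, try the four text() positions and keep the first whose approximate label box does not overlap a label placed earlier. The helper place_labels is hypothetical; the boxes are only approximations built from strwidth() and strheight(), so labels can still collide with points, with each other when every position is taken, or with the plot edge.

```r
# Hypothetical helper (not from any package): choose a text() `pos`
# (1 = below, 2 = left, 3 = above, 4 = right) per label so that its
# approximate box avoids boxes of earlier labels. Needs an open plot,
# because strwidth()/strheight() measure in current user coordinates.
place_labels <- function(x, y, labels, cex = 0.7, offset = 0.5) {
  w <- strwidth(labels, cex = cex)
  h <- strheight(labels, cex = cex)
  gx <- offset * strwidth("M", cex = cex)   # rough horizontal gap
  gy <- offset * strheight("M", cex = cex)  # rough vertical gap
  placed <- matrix(numeric(0), ncol = 4)    # columns: x0, x1, y0, y1
  pos <- integer(length(x))
  for (i in seq_along(x)) {
    # Approximate label boxes for pos = 1, 2, 3, 4 (one row each)
    boxes <- rbind(
      c(x[i] - w[i] / 2, x[i] + w[i] / 2, y[i] - gy - h[i], y[i] - gy),
      c(x[i] - gx - w[i], x[i] - gx, y[i] - h[i] / 2, y[i] + h[i] / 2),
      c(x[i] - w[i] / 2, x[i] + w[i] / 2, y[i] + gy, y[i] + gy + h[i]),
      c(x[i] + gx, x[i] + gx + w[i], y[i] - h[i] / 2, y[i] + h[i] / 2)
    )
    pos[i] <- 3L  # fallback when every candidate overlaps
    for (p in c(3L, 1L, 4L, 2L)) {
      b <- boxes[p, ]
      hit <- placed[, 1] < b[2] & b[1] < placed[, 2] &
        placed[, 3] < b[4] & b[3] < placed[, 4]
      if (!any(hit)) {
        pos[i] <- p
        break
      }
    }
    placed <- rbind(placed, boxes[pos[i], ])
  }
  pos
}

plot(USArrests$Murder, USArrests$UrbanPop,
     xlab = "murder", ylab = "% urban population", pch = 20, col = "grey",
     ylim = c(20, 100), xlim = c(0, 20))
p <- place_labels(USArrests$Murder, USArrests$UrbanPop, rownames(USArrests))
text(USArrests$Murder, USArrests$UrbanPop, labels = rownames(USArrests),
     cex = 0.7, pos = p)
```

Unlike the kmeans approach, this needs no random seed, but the result depends on the order of the rows.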

R scientific notation in plots

I have a simple plot:
#!/usr/bin/Rscript
png('plot.png')
y <- c(102, 258, 2314)
x <- c(482563, 922167, 4462665)
plot(x,y)
dev.off()
R uses 500, 1000, 1500, etc. for the y axis. Is there a way to use scientific notation for the y axis and put * 10^3 at the top of the axis, like in the figure below?
A similar technique is to use eaxis (extended / engineering axis) from the sfsmisc package.
It works like this:
library(sfsmisc)
x <- c(482563, 922167, 4462665)
y <- c(102, 258, 2314)
plot(x, y, xaxt="n", yaxt="n")
eaxis(1) # x-axis
eaxis(2) # y-axis
This is sort of a hacky way, but there's nothing wrong with it:
plot(x,y/1e3, ylab="y /10^3")
How you get the labels onto your axis depends on the plotting system you use (base, ggplot2 or lattice).
You can use functions from the scales package to format your axis numbers:
library(scales)
x <- 10 ^ (1:10)
scientific_format(1)(x)
[1] "1e+01" "1e+02" "1e+03" "1e+04" "1e+05" "1e+06" "1e+07" "1e+08" "1e+09" "1e+10"
Here is an example using ggplot2:
library(ggplot2)
library(scales)
dat <- data.frame(x = c(102, 258, 2314),
                  y = c(482563, 922167, 4462665))
ggplot(dat, aes(x = x, y = y)) +
  geom_point() +
  scale_y_continuous(labels = scientific_format(digits = 1)) +
  theme(axis.text.y = element_text(size = 50))
EDIT: The OP has a specific need. Here are some ideas I used to accomplish it:
You can customize your axis labels with the axis() function.
Use mtext() to put text in the margins around the plot region.
Use expression() to take advantage of the plotmath features.
y <- c(102, 258, 2314)
x <- c(482563, 922167, 4462665)
plot(x,y,ylab='',yaxt='n')
mtext(expression(10^3),adj=0,padj=-1,outer=FALSE)
axis(side=2,at=y,labels=round(y/1000,2))
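The same idea also works with the tick positions R chooses itself instead of the data values. A minimal sketch, assuming the default tick positions are acceptable: axTicks(2) returns the y-axis ticks of the current plot (even with yaxt = "n"), the labels are divided by 1000, and the plotmath expression "" %*% 10^3 writes ×10^3 above the axis.

```r
y <- c(102, 258, 2314)
x <- c(482563, 922167, 4462665)
plot(x, y, ylab = "", yaxt = "n")
# Tick positions R would have used for the y axis
ticks <- axTicks(2)
# Label the ticks in thousands, horizontally
axis(side = 2, at = ticks, labels = ticks / 1000, las = 1)
# Write "x 10^3" in the top margin, left-aligned above the y axis
mtext(expression("" %*% 10^3), side = 3, adj = 0, line = 0.5)
```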

R arrowed labelling of data points on a plot

I am looking to label data points with indices -- to identify the index number easily by visual examination.
So for instance,
x<-ts.plot(rnorm(10,0,1)) # would like to visually identify the data point indices easily through arrow labelling
Of course, if there's a better way of achieving this, please suggest it.
You can use arrows function:
set.seed(1); ts.plot(x <-rnorm(10,0,1), ylim=c(-1.6,1.6)) # some random data
arrows(x0=1:length(x), y0=0, y1=x, code=2, col=2, length=.1) # adding arrows
text(x=1:10, y=x+.1, labels=round(x,2), adj=0, cex=0.65) # adding text
abline(h=0) # adding a horizontal line at y=0
Use my.symbols from package TeachingDemos to get arrows pointing to the locations you want:
require(TeachingDemos)
d <- rnorm(10,0,1)
plot(d, type="l", ylim=c(min(d)-1, max(d)+1))
my.symbols(x=1:10, y=d, ms.arrows, angle=pi/2, add=TRUE, symb.plots=TRUE, adj=1.5)
You can also use text() for this; by default it labels each point with its index:
n <- 10
d <- rnorm(n)
plot(d, type="l", ylim=c(min(d)-1, max(d)+1))
text(1:n, d+par("cxy")[2]/2,col=2) # Upside
text(1:n, d-par("cxy")[2]/2,col=3) # Downside
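text() also has a pos argument that places each label below (1), to the left of (2), above (3) or to the right of (4) its coordinates, offset by a fraction of a character width (the offset argument, default 0.5). That avoids the par("cxy") arithmetic. A small sketch of the same index labelling, with a seed added so it is reproducible:

```r
set.seed(1)
n <- 10
d <- rnorm(n)
plot(d, type = "l", ylim = c(min(d) - 1, max(d) + 1))
# pos = 3 puts each index label above its point, pos = 1 below it
text(seq_len(n), d, labels = seq_len(n), pos = 3, col = 2)
text(seq_len(n), d, labels = seq_len(n), pos = 1, col = 3)
```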
Here is a lattice version, to show the lattice analogues of some base functions:
library(lattice)
set.seed(1234)
dat <- data.frame(x = 1:10, y = rnorm(10, 0, 1))
xyplot(y ~ x, data = dat, type = c('l', 'p'),
       panel = function(x, y, ...) {
         panel.fill(col = rgb(1, 1, 0, 0.5))  # translucent yellow background
         panel.xyplot(x, y, ...)
         panel.arrows(x, y0 = 0, x1 = x, y1 = y, code = 2, col = 2, length = .1)
         panel.text(x, y, labels = round(y, 2), adj = 1.2, cex = 1.5)
         panel.abline(h = 0)                  # horizontal line at y = 0
       })

Make histograms of stacked rectangles rather than columns

With the following code, I get the histogram below:
x <- rnorm(100)
hist(x,col="gray")
What can I do to get to display the bars as stacked rectangles (visible by their outlines, rather than a change in fill color) instead of uniform columns? Each rectangle represents a frequency of, for example, 1, although I want to be able to change this through a parameter.
From an answer to this question (h/t Vincent Zoonekynd):
x <- rnorm(100)
hist(x,col="gray")
abline(h=seq(5,40,5),col="white")
Here is a function to get you started (it is actually a modification of part of the examples for the tkBrush function in the TeachingDemos package):
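rechist <- function(x, ...) {
  tmp <- hist(x, plot = FALSE)
  br <- tmp$breaks
  # bin index of each observation
  w <- as.numeric(cut(x, br, include.lowest = TRUE))
  # running count within each bin (1, 2, ...), in bin order
  sy <- unlist(lapply(tmp$counts, seq_len))
  my <- max(sy)
  sy <- sy / my              # rescale stack heights to [0, 1]
  my <- 1 / my               # height of one rectangle
  sy <- sy[order(order(x))]  # map heights back to the original order of x
  plot.new()
  plot.window(xlim = range(br), ylim = c(0, 1))
  # one rectangle per observation
  rect(br[w], sy - my, br[w + 1], sy,
       border = TRUE, col = 'grey')
  # outline of each bar
  rect(br[-length(br)], 0, br[-1], tmp$counts * my)
  axis(1)
}
rechist(iris$Petal.Length)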
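The question also asks for a parameter that sets how many observations each rectangle represents, which rechist does not offer. A sketch of one way to add it, assuming the default hist() breaks are fine: the hypothetical function stackhist stacks rectangles of height unit within each bar, and the top rectangle is shorter when the count is not a multiple of unit. Extra arguments such as border are passed on to rect().

```r
# Hypothetical helper, not from any package: each bar is drawn as a stack of
# rectangles, each standing for `unit` observations.
stackhist <- function(x, unit = 1, breaks = "Sturges", col = "grey", ...) {
  h <- hist(x, breaks = breaks, plot = FALSE)
  br <- h$breaks
  plot.new()
  plot.window(xlim = range(br), ylim = c(0, max(h$counts)))
  for (i in seq_along(h$counts)) {
    n <- h$counts[i]
    if (n == 0) next
    # Bottoms of the rectangles: 0, unit, 2 * unit, ...
    lower <- seq(0, by = unit, length.out = ceiling(n / unit))
    # Tops, with the last one capped at the bar's count
    upper <- pmin(lower + unit, n)
    rect(br[i], lower, br[i + 1], upper, col = col, ...)
  }
  axis(1)
  axis(2)
  invisible(h)
}

set.seed(1)
stackhist(rnorm(100), unit = 2, border = "black")
```

Because the y axis stays in raw counts, changing unit changes only how finely each bar is subdivided, not the bar heights.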
