Dealing with zero log values in R?

I am plotting a scatter plot with plot(x, y) and want the values on a log scale, so I do plot(log(x), log(y)). I'd like to deal with cases where some value in x is 0, and thus missing from the plot, while the corresponding y value is nonzero.
I'd like to display the scatter with log-spaced ticks but natural-number labels: if the scale is log2, the ticks should read 2^0, 2^1, 2^2, and so on. That would also let me put 0 on the scale, so those points are not lost.
Here's an example:
> x = c(0, 1, 20, 100, 200, 500)
> y = c(1, 16, 32, 105, 300, 50)
> plot(x, y)
There are six points. If I use:
> plot(log2(x), log2(y))
There are only five plotted, since the first point (x[1], y[1]) is omitted because its x-value is 0. Therefore, I'd like to plot the log values but label the ticks with natural numbers marked on a log scale. The same axis could then show 0, 2^0 (which is 1, of course), 2^1, 2^2, and so on, and the point (x[1], y[1]) would still be plotted while keeping the log scale.
(I know some people deal with this by adding an arbitrarily small constant to all points, but I'd like to avoid that as it is messy.) Thanks.

If I understand correctly, you want to plot x versus y on a log scale? Here is an example using lattice and latticeExtra:
# Some reproducible data
tm <- data.frame(x = seq(0, 10, 1), y = seq(0, 10, 1))
library(lattice)
library(latticeExtra)
xyplot(x ~ y, data = tm,
       scales = list(x = list(log = 2),
                     y = list(log = 2)),
       xscale.components = xscale.components.logpower,  ## to get pretty scales
       yscale.components = yscale.components.logpower)
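For comparison, here is a minimal base-graphics sketch of the idea described in the question: plot the log2 values, label the ticks with natural numbers, and reserve a slot for 0 below 2^0. The position chosen for the 0 tick (2^-1 here) is an arbitrary assumption, not a standard convention.
x <- c(0, 1, 20, 100, 200, 500)
y <- c(1, 16, 32, 105, 300, 50)
zero_pos <- -1                                # log2 position standing in for 0
lx <- ifelse(x == 0, zero_pos, log2(x))
ly <- ifelse(y == 0, zero_pos, log2(y))
plot(lx, ly, xaxt = "n", yaxt = "n", xlab = "x", ylab = "y")
ticks <- c(zero_pos, 0:9)                     # 0, 2^0, 2^1, ..., 2^9
axis(1, at = ticks, labels = c(0, 2^(0:9)))
axis(2, at = ticks, labels = c(0, 2^(0:9)))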

Related

Extend an axis limit to -1 when that axis is logarithmic in R

I am doing the following:
x = c(0, 1, 2, 3, 4, 5)
y = x ^ 2
plot(x, y, log="y")
What I want is for the graph to also show the point at (x, y) = (0, 0).
I know that log(0) = -Inf. That would be the case if I were computing log(x), but here I am not computing log(x); I am just changing the scale of the y-axis to be logarithmic. Therefore, I need to know if there is some way to display the point (x, y) = (0, 0) as well.
No, what you are asking is mathematically impossible, because log(0) = -Inf. The point (0, 0) cannot be shown on a log-scale plot.
A log-scale is produced by log-transforming the data values and exponentiating the values at the axis ticks. For example, to plot the value 100 in a log-10 scale, you first log-transform 100 to log10(100) = 2, and then you transform the corresponding axis tick from 2 to 10^2 = 100. Thus, to plot the value 0 in a log-scale plot, you still need to calculate log10(0), even if the corresponding axis tick would be 10^-Inf = 0.
If your aim is to have a non-linear y-axis, and not necessarily a log-scale, then you can follow something like what's below.
# transform the y-values
ny <- sqrt(y)
# plot the transformed values
plot(x, ny, yaxt='n', ylab = "y")
# label the y-axis
axis(side = 2, at = ny, labels = y)
Also, if you know what you want to replace log(0) with, then you can do that via ny, but I don't advise using a log scale when there is a zero.
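If you do decide to substitute something for log(0), a minimal sketch of that idea (with -1 as an arbitrary stand-in value) could look like this:
x <- c(0, 1, 2, 3, 4, 5)
y <- x ^ 2
ny <- ifelse(y == 0, -1, log10(y))   # -1 is an arbitrary stand-in for log10(0)
plot(x, ny, yaxt = 'n', ylab = "y")
axis(side = 2, at = ny, labels = y)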

How to plot degree of network

How can I plot a degree graph like that?
The picture is only indicative, the result may not be identical to the image.
The important thing is that on the X axis there are labels of the nodes and on the Y axis the degree of each node.
The degree can be represented as a histogram (as in the figure), with points, etc.; the exact form is not important.
This is what I tried to do and did not come close to what I want:
d = degree(net, mode="all")
hist(d)
or
t = table(degree(net))
plot(t, xlim=c(1,77), ylim=c(0, 40), xlab="Degree", ylab="Frequency")
I think it is a trivial thing, but this is the first time I have used R.
Thank you
This is what I have now:
I would like a graph that is more readable (I have 77 bars), that is, with more space between the bars and between the labels.
My aim is to show how one node (Valjean) has a higher value than the others; I don't know if I am using the right kind of graph.
You can just use a barplot and specify the row names. For example,
library(sna)  # provides rgraph() and degree()
net = rgraph(10)  # a random 10-node adjacency matrix
rownames(net) = colnames(net) = LETTERS[1:10]
d = degree(net)
barplot(d, names.arg = rownames(net))
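To address the follow-up about readability with 77 bars, one option is a horizontal, sorted barplot with smaller labels; the margin and cex.names values below are arbitrary choices:
par(mar = c(4, 6, 1, 1))                  # widen the left margin for the names
names(d) <- rownames(net)                 # label each bar with its node name
barplot(sort(d), horiz = TRUE, las = 1,
        cex.names = 0.6, xlab = "Degree")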

How are trellis axis limits calculated?

Say I want to create an ordinary xyplot without explicitly specifying axis limits, then how are axis limits calculated?
The following line of code produces a simple scatter plot. However, axis limits do not exactly range from 1 to 10, but are slightly expanded to the left and right and top and bottom sides (roughly by 0.5).
library(lattice)
xyplot(1:10 ~ 1:10, cex = 1.5, pch = 20, col = "black",
xlab = "x", ylab = "y")
Is there any way to determine the factor by which the axes were expanded on each side, e.g. using trellis.par.get? I already tried the following after executing the above-mentioned xyplot command:
library(grid)
downViewport(trellis.vpname(name = "figure"))
current.panel.limits()
$xlim
[1] 0 1
$ylim
[1] 0 1
Unfortunately, the panel limits are returned as normalized parent coordinates, which makes it impossible to obtain the "real" limits. Any suggestions would be highly appreciated!
Update:
Using base-R plot, the data range (and consequently the axis limits) is by default extended by 4% on each side, see ?par. But this factor doesn't seem to apply to 'trellis' objects. So what I am looking for is an analogue to the 'xaxs' (and 'yaxs') argument implemented in par.
Axis limits for xyplot are calculated in the extend.limits function. This function isn't exported from the lattice package, so to see it, type lattice:::extend.limits. For a numeric vector, the function is passed the range of the corresponding data (c(1, 10) in this example). The final limits are calculated according to the following equation:
lim + prop * d * c(-1, 1)
lim is the range of the data, in this case c(1, 10)
prop is lattice.getOption("axis.padding")$numeric, which is 0.07 by default
d is diff(as.numeric(lim)), in this case 9
The result in this case is c(0.37, 10.63).
In case you're interested, the call stack from xyplot to extend.limits is
xyplot
xyplot.formula
limits.and.aspect
limitsFromLimitList
extend.limits
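As a quick sanity check of the formula (assuming the default lattice options), the padding can be reproduced by hand:
library(lattice)
lim <- c(1, 10)
prop <- lattice.getOption("axis.padding")$numeric   # 0.07 by default
lim + prop * diff(lim) * c(-1, 1)
# [1]  0.37 10.63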

R: Histogram with custom breaks for custom x axis range

I need to plot a vector of numbers. Let's say these numbers range from 0 to 1000. I need to make a histogram where the x axis goes from 100 to 500, and I want to specify the number of bins to be 10. How do I do this?
I know how to use xlim and breaks separately, but I don't know how to get a given number of bins inside the custom range.
This is a very good question, actually! It has bothered me for a long time, and your question has finally prompted me to solve it :-)
Well, in this case we cannot simply do hist(x, xlim = c(100, 500), breaks = 9), because breaks refers to the whole range of x and is unrelated to xlim (in other words, xlim is used only for plotting, not for computing the histogram and setting the actual breaks). This is a clear limitation of the hist function, and there is no simple remedy in the documentation.
I think the easiest way out is to "xlim" the values before they go to the hist function:
x <- runif(1000, 0, 1000) # example data
hist(x[x > 100 & x < 500], breaks = 9)
Note that a single breaks value is only a suggestion to hist (the actual break points are chosen by pretty()), so the number of cells may not be exactly what you ask for; to get exactly 10 bins, pass the break points explicitly.
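A sketch of that, assuming the goal is exactly 10 equal-width bins between 100 and 500 (11 break points):
x <- runif(1000, 0, 1000)  # example data
hist(x[x >= 100 & x <= 500], breaks = seq(100, 500, length.out = 11))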
For more info see ?hist

R: changing the size of some (but not all) plotted data points according to their weighting

I have generated a plot in R in which the size of each data point corresponds to its individual weighting, for instance like this:
x <- runif(10, 2, 200)
y <- runif(10, 5.0, 7.5)
weighting <- c(1, 1, 1, 1, 1, 10, 15, 15, 25, 25)
I have adjusted the size of the plotted data points with cex:
plot(x, y, cex = weighting)
Since some data points in the plot are very large because of their high weighting factors, I have reduced the size of all points with plot(x, y, cex = weighting / 5).
Unfortunately, data points with a small weighting are now tiny. I'm sure there is a way to shrink only the points with a high weighting factor and to plot the others (i.e. weighting = 1) at normal size. I don't know how to do that; can anybody help?
You may also have a look at scale_size_area in ggplot2:
# you need to keep your data in a data.frame
library(ggplot2)
df <- data.frame(x = x, y = y, weighting = weighting)
ggplot(data = df, aes(x = x, y = y, size = weighting)) +
  geom_point() +
  scale_size_area()
Update, on cex and scaling of point size
Because the topic of the question is cex, I take the opportunity to cite a post by Bert Gunter on R-help:
"Here's the problem: in order to accurately
represent the value, the "point" = circle area must be proportional
to the value. That is, the eye "sees" the areas, not the radii, as the
point "size." A delightful reference on this is Howard Wainer's 1982
or so (can't remember exactly) article in THE AMERICAN STATISTICIAN,
"How to Graph Data Badly" (or maybe "Plot" Data).
Anyway, using cex, I have no idea whether a point drawn with cex =
1.23 is 1.23 times the area or radius -- or neither -- of a point
drawn with cex =1. Indeed, it might vary depending on the
implementation/OS/graphics fonts. So it seems better to me to "draw"
the point with symbols(), where you can have complete control over the
size.
Obviously, let me know if I'm wrong about this." End quotation.
In the same thread, Gabor Grothendieck points to a nice article where the base function symbols is used: one example where "[c]ircles [are] incorrectly sized by radius instead of area. Large values appear much bigger", one where "Circles [are] correctly sized by area", and one where the inches argument is used to set the size of the largest bubble. I think this might be a base equivalent to scale_size_area() in ggplot.
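A minimal sketch of that base approach, using symbols() so that circle area (rather than radius) is proportional to the weighting; the inches value, which sets the size of the largest bubble, is an arbitrary choice:
radius <- sqrt(weighting / pi)          # area proportional to the weighting
symbols(x, y, circles = radius, inches = 0.25,
        fg = "black", bg = "grey", xlab = "x", ylab = "y")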
How about plotting with the log of the weighting for the size? (Note that log10(1) = 0, so with these weights the smallest points would vanish; adding an offset, e.g. log10(weighting) + 1, avoids that.)
plot(x, y, cex = log10(weighting))
The function pmax might help, by enforcing a minimum point size:
minCex <- 1
plot(x, y, cex = pmax(minCex, weighting / 5))  # never smaller than minCex
