This seems like a simple question, but I can't find an answer anywhere. The wordcloud function in the R {wordcloud} package takes a scale argument. The full documentation (here: https://cran.r-project.org/web/packages/wordcloud/wordcloud.pdf) describes it only as: "A vector of length 2 indicating the range of the size of the words."
I can't seem to make any sense of the values though, and I can't find any other documentation. For instance, examples have scale=c(4,.5) or scale=c(8,.3). What do these numbers mean?
I've messed around with different values a bit, but I can't seem to figure out the pattern.
Thanks in advance for any help,
Seth
wordcloud internally calculates
size <- (scale[1] - scale[2]) * normedFreq + scale[2]
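Here normedFreq is each word's frequency relative to the most frequent word, so size has one value per word, running from scale[1] for the most frequent word down towards scale[2] for the rarest. That size is then used as the text size when strheight and strwidth are computed for each word. These are graphics functions described as follows:
These functions compute the width or height, respectively, of the
given strings or mathematical expressions s[i] on the current plotting
device in user coordinates, inches or as fraction of the figure width
par("fin").
So, long story short, scale is the range of word sizes: the first number is the size of the largest word and the second is the lower limit for the smallest.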
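To make the formula concrete, here is a minimal sketch that applies it to a few made-up frequencies. The freq values are hypothetical, and defining normedFreq as each frequency divided by the maximum is an assumption about the package internals rather than something quoted above.

```r
# Hypothetical word frequencies (not from the original post)
freq  <- c(50, 20, 5, 1)
scale <- c(4, 0.5)

# Assumption: frequencies are normalised by the largest frequency
normedFreq <- freq / max(freq)

# The interpolation quoted in the answer
size <- (scale[1] - scale[2]) * normedFreq + scale[2]
print(size)
```

Under these assumptions the most frequent word gets size scale[1], and the sizes of rarer words shrink linearly towards scale[2].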
I'm having trouble interpreting what binwidth means in ggplot2, and I am looking for a more precise definition.
For example:
# this example is taken from Elegant Graphics for Data Analysis
library(ggplot2)
qplot(percbelowpoverty, data = midwest,binwidth=1)
How do I interpret binwidth=1? What are its units? How does that relate to the number of bins that are calculated? I have no clue, and I'm not finding ?stat_bin helpful in answering my question:
binwidth
The width of the bins. Can be specified as a numeric value, or a function that calculates width from x. The default is to use bins bins that cover the range of the data. You should always override this value, exploring multiple widths to find the best to illustrate the stories in your data.
The bin width of a date variable is the number of days in each time; the bin width of a time variable is the number of seconds.
Maybe I just don't know where to look for documentation of things like this, because I am having difficulty understanding a number of related issues (such as what the "weight" aesthetic is all about).
I think I've answered my own question. I was having trouble because I misread the x-axis units. The values in the midwest$percwhite column are stored as plain numbers on a 0 to 100 scale rather than as fractions (i.e., 96.7 is meant to be read as 96.7%, but as data it is just the number 96.7). That is why I was confused about how to interpret the binwidth argument. Now I see that it has the standard interpretation that MrFlick provided in the comment:
Setting binwidth=1 means each bin should be one x unit wide, e.g. (1,2], (2,3], (3,4], etc. The units are whatever units the midwest$percbelowpoverty values are in.
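As a rough illustration of how binwidth relates to the number of bins, the sketch below bins some made-up values by hand in base R. The data are hypothetical, and the breaks are aligned to whole numbers for simplicity; ggplot2 may place its bin edges differently, so treat this as an approximation of the idea rather than a reproduction of stat_bin.

```r
# Hypothetical percentages on a 0-100 scale (not the midwest data)
x <- c(2.2, 3.7, 3.9, 5.1, 9.8, 12.4, 12.6, 19.3)
binwidth <- 1

# Bin edges one x unit apart, covering the range of the data
breaks <- seq(floor(min(x)), ceiling(max(x)), by = binwidth)

# cut() uses right-closed intervals (a, b] by default
counts <- table(cut(x, breaks = breaks))

length(counts)   # number of bins is roughly diff(range(x)) / binwidth
print(counts[counts > 0])
```

Doubling binwidth roughly halves the number of bins, which is why the units of binwidth are simply the units of the x variable.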
I am writing a module that creates a scatter plot from a two-dimensional array of numbers provided by the user (x and y values). The intent is that each axis is scaled to encompass all the input numbers, while also rounding to an aesthetically pleasing value. For example, if the maximum value entered is 4.56, I would like the maximum axis value to round to 5. If the maximum value is 850, I'd like the axis to round to 1000.
This initially seems like a simple task: take the max value and round up. What makes it difficult is that my module could be dealing with input values as small as 0.00000001 or as large as many billions.
Can anyone suggest a workflow for making this happen? I don't need the code itself, just the process required. The only way I have come up with is an extremely cumbersome iterative approach that still handles unusual values rather poorly.
Any advice would be most appreciated!
Thanks,
Greg
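One possible process, offered as a sketch rather than a definitive answer: find the power of ten just below the maximum, express the maximum as a multiple of that power, and round the multiple up to the next value in a short list of "nice" steps. The function name nice_ceiling and the step list 1, 2, 5, 10 are assumptions chosen for illustration; they reproduce the two examples in the question.

```r
# Round a positive value up to the next "nice" number.
# nice_ceiling and its default steps are hypothetical, for illustration only.
nice_ceiling <- function(x, steps = c(1, 2, 5, 10)) {
  stopifnot(length(x) == 1, x > 0)
  magnitude <- 10^floor(log10(x))   # power of ten at or just below x
  fraction  <- x / magnitude        # roughly in [1, 10)
  # First step that is not smaller than the fraction (small tolerance for
  # floating-point error)
  step <- steps[which(fraction <= steps * (1 + 1e-9))[1]]
  step * magnitude
}

nice_ceiling(4.56)
nice_ceiling(850)
nice_ceiling(0.00000001)
nice_ceiling(3.2e9)
```

Because the scale is derived from log10 of the value, the same few lines cover very small and very large inputs without iteration. In R specifically, the base function pretty() does a related job of choosing round axis breakpoints.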
I am trying to make plots using non-integer values for lwd. However, I have realised that values such as 1.5 or 0.5 give me the same plot as lwd=1.
How can I adjust the line width to intermediate values?
If you check ?par it says that lwd represents:
The line width, a positive number, defaulting to 1. The interpretation is device-specific, and some devices do not implement line widths less than one. (See the help on the device for details of the interpretation.)
So check the help page for the device you are drawing to. For example, if you are outputting the plot to pdf, ?pdf states how line width is interpreted (as multiples of 1/96th of an inch) and whether widths below 1 are allowed. On a device whose minimum is 1, it is not possible to make use of a "0.5" line width.
Consider what you are trying to show in your plot and perhaps scale the values up to a range of integers that make sense.
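Since the interpretation is device-specific, one way to find out what a given device does is to draw the same line at several widths and look at the output. The sketch below does this on a PDF device; the output path and the particular widths are arbitrary choices for illustration.

```r
# Draw one line per width on a PDF device, labelled with its lwd value
out    <- tempfile(fileext = ".pdf")   # hypothetical output location
widths <- c(0.5, 1, 1.5, 3)

pdf(out, width = 5, height = 4)
plot(c(0, 1), c(0, length(widths) + 1), type = "n",
     xlab = "", ylab = "", yaxt = "n")
for (i in seq_along(widths)) {
  lines(c(0.1, 0.9), c(i, i), lwd = widths[i])
  text(0.05, i, labels = widths[i], cex = 0.8)
}
dev.off()
```

Swapping pdf() for another device (for example png()) and comparing the results shows whether fractional widths are honoured there.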
I have a dataset of about 100 observations, all in (x,y) format, and every y is an integer. I need proc sgplot to plot them. My y values range from 1 to 150. I would like to force the graph to show every y value that occurs as a tick on the y-axis, instead of automatically thinning the ticks to a small number so they display clearly. For example, if the first five values of y are (1,3,4,6,7,....), I would like the y ticks to show exactly (1,3,4,6,7,....) instead of (1,5,...).
I tried
yaxis value=(1 to 150 by 1) valueshint display=all;
It does not work, maybe because I have too many observations. I know the result may be overwhelming, but I just want to see it. Thanks.
You don't say if you're using SAS/GRAPH or ODS GRAPHICS (SGPLOT etc.), so I'll answer the latter which is what I know; the answer should be useful for both in concept.
You likely cannot get SAS to plot so much on the axis unless the axis is very large itself. This means you have two options.
Raise the size of the graphic produced a lot in terms of pixels (and then shrink that to a usable size via the image's physical size, or using an external tool). This is not necessarily usable in all cases, but it produces a very high resolution plot (which is also very large in file size). This page explains how to do that for ODS graphics (use image_dpi as a high number, and width and height in inches as a normal number), and this page explains for SAS/GRAPH. You may need to make your font small to make it work (if you're adding numbers, which I assume you are), or you may need to make an initially large plot first and then go into paint/photoshop/gimp/etc. and make it smaller.
Use annotate to create the axis marks. This is fairly easy if you know how to use annotate, as you're just writing to the location of the axis (y) and the item (x), and then a bit below that for the text. This will make it very easy to make a total garbage plot, but it will likely work ultimately.
These likely work in both SAS/GRAPH and ODS GRAPHICS, and I can't test either as you don't post any code or simulated data to test with, but I think both approaches have some merit (as does the approach of "don't do this", but you've thought that through).
Is there a function in R that does the same job as Matlab's "bar" function?
R does have a "barplot" function in the graphics package; however, it is not the same.
The Matlab bar(X,Y) (verbatim excerpt from MATLAB documentation) "draws a bar for each element in Y at locations specified in X, where X is a vector defining the x-axis intervals for the vertical bars." (emphasis mine)
However, the R barplot function does not allow one to specify locations.
Perhaps there is a method in ggplot2 that supports this? I am only able to find standard bar charts in ggplot2.
No, barplot is not the same as bar, but you should read the whole help. You can do many things to position the bars. The first is simply their order in Y. You could insert spaces if you wish (additional 0s). If you have X and Y then sort Y on X (Y[order(X)]) and plot it. If you need to change positions use the "space" and "width" arguments. It's not as straightforward as specifying X values I suppose but it's definitely more useful in most situations. Generally what you want to adjust is widths of bars and spaces between bars. Their position on the X-axis should be arbitrary. If the position on the X-axis is really meaningful then you should be using line plots, not bar graphs.
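A minimal sketch of the sort-then-plot suggestion, with made-up X and Y values and arbitrary width and space settings chosen only for illustration:

```r
# Hypothetical data: X gives the intended order/labels, Y the bar heights
X <- c(5, 1, 4, 2, 6)
Y <- c(4, 3, 3, 4, 2)

ord <- order(X)   # sort Y on X

# barplot() returns the x coordinates of the bar midpoints, which is handy
# for adding labels or checking the spacing
mids <- barplot(Y[ord], names.arg = X[ord], width = 0.8, space = 0.25)
print(mids)
```

Here the bars are labelled with the sorted X values but remain evenly spaced; barplot treats position as order, not as a numeric location.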
In R:
barplot(rbind(1:10, 2:11), beside=TRUE, names.arg=1:10)
In MATLAB:
>> bar(1:10, [(1:10)' (2:11)'])
Read up on par. Then observe, for example:
x <- c(1, 2, 4, 5, 6)
y <- c(3, 4, 3, 4, 2)
plot(x, y, type = 'h', lwd = 6)
Edit: yes, I know this doesn't (yet) plot multiple data sets, but I would hope you can see simple ways to make that happen, with spacings, colors, etc. specified to your exact liking :-)
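Another way to get bars at explicit x locations in base R is to draw the rectangles directly with rect(). The helper below is a hypothetical sketch (bar_at is not an existing function), and the bar width of 0.8 is an arbitrary choice.

```r
# Draw one bar per (x, y) pair, centred on x. bar_at is a hypothetical helper.
bar_at <- function(x, y, width = 0.8, ...) {
  stopifnot(length(x) == length(y))
  left  <- x - width / 2
  right <- x + width / 2
  # Empty plot sized to hold all bars, then one rectangle per bar
  plot(range(c(left, right)), range(c(0, y)), type = "n",
       xlab = "x", ylab = "y")
  rect(left, 0, right, y, ...)
  invisible(data.frame(left = left, right = right, height = y))
}

x <- c(1, 2, 4, 5, 6)
y <- c(3, 4, 3, 4, 2)
bars <- bar_at(x, y, col = "grey")
```

Because the x positions are numeric, the gap at x = 3 is preserved. A second data set could be drawn by calling rect() again with the edges shifted by a fraction of the width.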
Sounds vaguely like the R stepfun. On the other hand, one would need to know what "draws a bar" means before saying it is not the same as barplot(..., horiz=TRUE). One would, of course, need to examine more detailed evidence, such as data and plots, before arriving at a conclusion. @John Colby should be congratulated for adding some specificity to the discussion. The axis function is probably what Quant Guy needs to read up on.