I have this sample code where I am plotting two variables on two vertical axes.
One has a real range of 0-1 and the other has an integer range of 0-4. I would like to show
both values simultaneously, but I do not know how.
I looked into this example on Cross Validated that was accepted, but I noticed that when I run the code, both vertical axes have the same scale.
I need the two scales to be different. Please help.
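For what it's worth, here is a minimal base R sketch of one common way to get two differently scaled vertical axes: draw the second series on top of the first with par(new = TRUE) and add its axis on the right with axis(4). The data (x, y1, y2) are made up for illustration, since the original sample code is not shown.

```r
# Hypothetical data: y1 is real-valued in [0, 1], y2 is an integer in 0..4
set.seed(1)
x  <- 1:20
y1 <- runif(20)
y2 <- sample(0:4, 20, replace = TRUE)

op <- par(mar = c(5, 4, 4, 4) + 0.1)    # leave room for the right-hand axis
plot(x, y1, type = "l", ylim = c(0, 1), ylab = "y1 (0-1)")
par(new = TRUE)                          # draw the next plot over the first one
plot(x, y2, type = "p", pch = 16, col = "red", ylim = c(0, 4),
     axes = FALSE, xlab = "", ylab = "")
axis(side = 4, at = 0:4)                 # right-hand axis on the 0-4 scale
mtext("y2 (0-4)", side = 4, line = 3)
par(op)                                  # restore the previous margins
```

The left axis keeps the 0-1 scale and the right axis shows 0-4, so the two scales stay independent.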
Related
I've come up with a graph (a scatterplot) of log(1+inf) (inf = number of people infected with a given disease) on the y-axis against one of the explanatory variables in my model, in this case the population density (pop./km²), on the x-axis. The log transformation was used merely for visualization, because it spreads the distribution of the data and allows for more aesthetically appealing plots. Basically, what I want is for both axes to show the values of the variables before the log transformation. The dots need to be plotted like plot(log(1+inf),log(populational_density)), but the numbers on the axes should refer to plot(inf,populational_density). I've provided a picture of my graph with some manual editing on the y-axis to show you the idea of what I want.
The numbers in red would be the 'inf' values equivalent to log(inf).
Please, bear in mind that those values in red do not correspond to reality.
I understand the whole concept of y = f(x), but I've been asked to provide it. Is this possible? I'm using the ggplot2 package for plotting.
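One way this could be done, as a rough sketch: keep plotting the transformed values, but place the axis breaks at the transformed positions and label them with the original values. The data frame dat, its columns and the tick values below are all hypothetical stand-ins for the real data.

```r
library(ggplot2)

# Hypothetical data standing in for the real 'inf' and population density
set.seed(42)
dat <- data.frame(
  pop_density = exp(runif(50, 0, 8)),
  inf         = rpois(50, lambda = 20)
)

y_ticks <- c(0, 5, 10, 20, 40)       # labels in original 'inf' units
x_ticks <- c(1, 10, 100, 1000)       # labels in original density units

p <- ggplot(dat, aes(x = log(pop_density), y = log1p(inf))) +
  geom_point() +
  # breaks sit at the transformed positions, labels show the raw values
  scale_y_continuous(breaks = log1p(y_ticks), labels = y_ticks) +
  scale_x_continuous(breaks = log(x_ticks), labels = x_ticks) +
  labs(x = "Population density (pop./km²)", y = "inf")
print(p)
```

The points sit exactly where plot(log(1+inf), log(populational_density)) would put them; only the tick labels change.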
I have a sample dataset
d=data.frame(n=rep(c(1,1,1,1,1,1,2,2,2,3),2),group=rep(c("A","B"),each=20),stringsAsFactors = F)
And I want to draw two separate histograms based on the group variable.
I tried this method suggested by @jenesaisquoi in a separate post here
Generating Multiple Plots in ggplot by Factor
ggplot(data=d)+geom_histogram(aes(x=n,y=..count../sum(..count..)),binwidth = 1)+facet_wrap(~group)
It did the trick but if you look closely, the proportions are wrong. It didn't calculate the proportion for each group but rather a grand proportion. I want the proportion to be 0.6 for number 1 for each group, not 0.3.
Then I tried the dplyr package, and it didn't even create two graphs; it ignored the group_by command. The proportion is right this time, though.
d%>%group_by(group)%>%ggplot(data=.)+geom_histogram(aes(x=n,y=..count../sum(..count..)),binwidth = 1)
Finally I tried factoring with color
ggplot(data=d)+geom_histogram(aes(x=n,y=..count../sum(..count..),color=group),binwidth = 1)
But the result is far from ideal. I was going to accept one output but with the bins side by side, not on top of each other.
In conclusion, I want to draw two separate histograms with correct proportions calculated within each group. If there is no easy way to do this, I can live with one graph but having the bins side by side, and with correct proportions for each group. In this example, number 1 should have 0.6 as its proportion.
Changing ..count../sum(..count..) to ..density.. gives you the desired proportion (with binwidth = 1, the density of a bin equals its proportion):
ggplot(data=d)+geom_histogram(aes(x=n,y=..density..),binwidth = 1)+facet_wrap(~group)
You actually have the separation of charts by variable correct! Especially with ggplot, you sometimes need to consider the scales of the graph separately from the shape. facet_wrap splits your data into panels, regardless of scale. It will behave the same, no matter what your axes are. You could also try adding scale_y_log10() as a layer, and you'll notice that the overall shape and style of your graph is the same; you've just changed the axes.
What you actually need is a fix to your scales. Understandable - frequency plots can be confusing. ..count../sum(..count..) treats each bin as an independent unit, regardless of its value. See a good explanation of this here: Show % instead of counts in charts of categorical variables
What you want is ..density.., which is the count in a bin divided by the total count in that panel and by the bin width. With binwidth = 1, that is exactly the within-group proportion. The difference is subtle in principle, but the important bit is that the value on the x-axis matters. For an extreme case of this, see here: Normalizing y-axis in histograms in R ggplot to proportion, where tiny x-axis values produced huge densities.
Your original code will still work, just substituting the aesthetics I described above.
ggplot(data=d)+geom_histogram(aes(x=n,y=..density..),binwidth = 1)+facet_wrap(~group)
If you're still confused about density, so are lots of people. Hadley Wickham wrote a long piece about it; you can find that here: http://vita.had.co.nz/papers/density-estimation.pdf
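To make the difference concrete, here is a small base R check on the sample dataset from the question. It only reproduces the arithmetic (it does not call ggplot2), and it assumes bins of width 1 centred on the integer values, which is what binwidth = 1 gives for this data.

```r
# Sample dataset from the question
d <- data.frame(n = rep(c(1,1,1,1,1,1,2,2,2,3), 2),
                group = rep(c("A", "B"), each = 20),
                stringsAsFactors = FALSE)

counts <- table(d$group, d$n)              # counts per group and value

grand  <- counts / sum(counts)             # divides by all 40 rows
within <- prop.table(counts, margin = 1)   # divides by each group's 20 rows

binwidth <- 1
dens <- counts / (rowSums(counts) * binwidth)  # density = count / (n * binwidth)

grand["A", "1"]    # 0.3, the "grand proportion" seen with facet_wrap
within["A", "1"]   # 0.6, the proportion within group A
dens["A", "1"]     # 0.6, same as the within-group proportion when binwidth = 1
```

With any other bin width, the density would no longer equal the proportion, so you would have to multiply it by the bin width to get proportions back.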
I ran into a problem while making several maps in R. I want to plot several maps with some geom_points on them. Each map has points with different values, so the legend scales (size and color) change between maps. All I want is to have exactly the same legend on every map, representing the same values (for both color and size). I've tried with breaks etc., but my data is continuous, so I didn't find any way to fix it.
EDIT: Simple example
Will try to explain with simple example by myself. Imagine I have these two arrays to be plotted into different coordinates for 2 different days:
day1<-c(1,2,3,2,1)
day2<-c(1,9,2,1,2)
What I want is for the legend to always represent the range 1-9 for the geom_points, regardless of the specific values on a given day, so that the legend is always the same and the scale does not change if I put the maps into slides.
Any ideas?
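A possible approach, sketched under assumptions: pass the same limits (and breaks) to the size and colour scales of every plot, so the legend no longer depends on the values of a single day. The coordinates, the helper plot_day and the shared range 1-9 are hypothetical and only mirror the simple example above.

```r
library(ggplot2)

# Hypothetical coordinates; 'value' holds the two example arrays
day1 <- data.frame(x = 1:5, y = 1:5, value = c(1, 2, 3, 2, 1))
day2 <- data.frame(x = 1:5, y = 1:5, value = c(1, 9, 2, 1, 2))

shared_limits <- c(1, 9)            # assumed overall range across all days
shared_breaks <- c(1, 3, 5, 7, 9)   # same legend entries on every plot

plot_day <- function(df) {
  ggplot(df, aes(x, y, size = value, colour = value)) +
    geom_point() +
    # fixed limits keep the mapping identical between plots
    scale_size_continuous(limits = shared_limits, breaks = shared_breaks) +
    scale_colour_gradient(limits = shared_limits, breaks = shared_breaks)
}

p1 <- plot_day(day1)
p2 <- plot_day(day2)
```

Both p1 and p2 then show a legend covering 1-9, even though day1 only contains values up to 3. Values outside the limits would be dropped from the scale, so the limits need to cover the full range of the data.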
I'm trying to create a dot chart in Stata, splitting it into two categories.
Running this chunk of code:
sysuse nlsw88, clear
drop if race == 3
graph dot (mean) wage, over(occ) by(race)
creates this output:
So far so good, but I'd like to remove the Y axis labels from the right graph to give the data some more space.
The only way I've been able to do that was to manually edit graph and hide the axis label object:
Is there a way to do it programmatically? I do know I could use one more over() but in some graphs of mine that is already taken.
I believe the solution is buried in help bystyle and help by_option. However, I can't get it to work with your example (I'm on Stata 12). But the description is clear. For example:
A bystyle determines the overall look of the combined graphs, including
whether the individual graphs have their own axes and labels or if instead the axes and labels are shared across graphs arrayed in the same row and/or in the same column;
...
There are options that let you control each of the above attributes -- see [G-3] by_option --
And also
iyaxes and ixaxes (and noiyaxes and noixaxes) specify whether the y axes and x axes are to be displayed with each graph. The default with most styles and schemes is to place y axes on the leftmost graph of each row and to place x axes on the bottommost graph of each column. The y and x axes include the default ticks and labels but exclude the axes titles.
If for some reason that doesn't work out, something like
sysuse nlsw88, clear
drop if race == 3
graph dot (mean) wage, over(occ) by(race)
gr_edit .plotregion1.grpaxis[2].draw_view.setstyle, style(no)
does (but I don't really like the approach). You can mess with at least the axis number [#] to do a bit of customization. I guess recording changes in the graphical editor and then recycling the corresponding code may be one way out of difficult situations.
I have computed values for several categories for three networks. I'd like to create a bar plot in R to show the differences between these parameters for the networks. So far I plotted this with the barplot R function with the categories on the x-axis, their values on the y-axis and to each category three bars (one for each network).
But now I have one value which is much higher than all the others. Therefore the differences for the rest cannot be seen since they're represented only by a thin line because of that one large bar which almost fills the whole plot.
My idea was now to plot the values on the y-axis on an irregular scale, meaning, for example, that one half represents the values from 0 to 300, and the other half from 300 to 3000. Is there any way to do this? Or a good alternative approach to handle this problem? I also thought of plotting the logarithm, but unfortunately I also have negative values.
I would suggest that an irregular scale isn't a good plan - I think it confuses viewers of the chart. Instead, you could use the layout() function to plot three separate barplots in a horizontal layout. Thus, each category could have its own plot, with its own scale.
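A minimal sketch of the layout() idea, with made-up numbers (the matrix vals, the network names and the category names are all hypothetical):

```r
# Hypothetical values: 3 networks (rows) x 3 categories (columns)
vals <- matrix(c(120,   80, -40,
                 200, 3000, 150,
                  90,   60,  30),
               nrow = 3,
               dimnames = list(c("net1", "net2", "net3"),
                               c("catA", "catB", "catC")))

layout(matrix(1:3, nrow = 1))            # three panels side by side
for (j in seq_len(ncol(vals))) {
  # each category gets its own barplot and therefore its own y scale
  barplot(vals[, j], main = colnames(vals)[j], names.arg = rownames(vals))
}
layout(1)                                # back to a single panel
```

Each panel picks its own y range, so the 3000 in catB no longer flattens the bars in the other categories.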
If, however, you still have a single bar at 3000, while everything else is at 300, that won't really help. In that case, you could manually set your y-axis limits with ylim=c(min,max). To keep the bar from stretching off the screen, you can just use simple logic to define anything > 300 as 300, or something similar. Then, put a text point there stating the actual value (using text, maybe with arrow).
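Roughly, the capping idea could look like this. It is only a sketch with hypothetical numbers; the cap of 300 comes from the example above and everything else is assumed.

```r
# Hypothetical values: 3 networks (rows) x 3 categories (columns)
vals <- matrix(c(120,   80, -40,
                 200, 3000, 150,
                  90,   60,  30),
               nrow = 3,
               dimnames = list(c("net1", "net2", "net3"),
                               c("catA", "catB", "catC")))

cap    <- 300
capped <- pmin(vals, cap)                # anything above the cap is drawn at the cap

# barplot() returns the bar midpoints, which are used to place the labels
mids <- barplot(capped, beside = TRUE,
                ylim = c(min(vals), cap * 1.15),
                legend.text = rownames(vals))

over <- vals > cap                       # bars that were truncated
text(mids[over], cap, labels = vals[over], pos = 3, cex = 0.8)
```

The truncated bar stops at 300 and carries a label with its real value (3000 here), while negative values are still drawn below zero.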
With those ideas out there, I would suggest that a graph where one value is 10x the other values might not really be worth presenting, or if it is, the main takeaway from it isn't going to be "how do values 2 and 3 compare to each other", it's going to be "holy moley look how much bigger 1 is than 2 and 3". So, it might not be a big deal if one bar is giant and two are small, as long as you aren't doing all 9 on a single plot (which would screw up other, relevant comparisons). So, if you split them using layout(), then it wouldn't be as big of a deal.