ggplot2: how to read the scale transformation from a plot object

I'm trying to extract information about the limits and transform of an existing ggplot object. I'm getting close, but need some help. Here's my code
library(ggplot2)
data = data.frame(x=c(1,10,100),y=(c(1,10,100)))
p = ggplot(data=data,aes(x=x,y=y)) + geom_point()
p = p + scale_y_log10()
q = ggplot_build(p)
r = q$panel$y_scales
trans.y = (q$panel$y_scales)[[1]]$trans$name
range.y = (q$panel$y_scales)[[1]]$range
print(trans.y) gives me exactly what I want
[1] "log-10"
But range.y is a funky S4 object (see below).
> print(range.y)
Reference class object of class "Continuous"
Field "range":
[1] 0 2
> unclass(range.y)
<S4 Type Object>
attr(,".xData")
<environment: 0x11c9a0630>
I don't really understand S4 objects or how to query their attributes and methods. Or, if I'm just going down the wrong rabbit hole here, a better solution would be great :) In Matlab, I could just use the commands "get(gca,'YScale')" and "get(gca,'YLim')", so I wonder if I'm making this harder than it needs to be.

As @MikeWise points out in the comments, this all becomes a lot easier if you update ggplot2 to v2.0. It now uses ggproto objects instead of proto, and these are more convenient to get info from.
It's easy to find now what you need. Just printing ggplot_build(p) gives you a nice list of all that's there.
ggplot_build(p)$panel$y_scales[[1]]$range here gives you a ggproto object. You can see that contains several parts, one of which is range (again), which contains the data range. All the way down, you end up with:
ggplot_build(p)$panel$y_scales[[1]]$range$range
# [1] 0 2
Where 0 is 10^0 = 1 and 2 is 10^2 = 100.
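For the older Reference class object printed in the question, fields are read with `$` in much the same way. The following is a minimal base R sketch; `ToyRange` is a hypothetical stand-in class invented for illustration, not the class ggplot2 or scales actually uses.

```r
library(methods)

# Hypothetical stand-in for the range object (assumption, not the real class)
ToyRange <- setRefClass("ToyRange", fields = list(range = "numeric"))

rng <- ToyRange$new(range = c(0, 2))

# Read a field of a Reference class object with $
print(rng$range)

# List the fields that the class defines
print(ToyRange$fields())
```

So for the object in the question, `range.y$range` would be the analogous call.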
Another way might be to just look it up in $data part like this:
apply(ggplot_build(p)$data[[1]][1:2], 2, range)
#      y   x
# [1,] 0   1
# [2,] 2 100
You can also get the actual range of the plotting window with:
ggplot_build(p)$panel$ranges[[1]]$y.range
[1] -0.1 2.1
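In more recent ggplot2 releases the internals of the built object were reorganised, so `$panel$y_scales` may no longer exist. A hedged sketch of the same lookup using `layer_scales()`, which is exported by ggplot2; the `panel_params` element name is an assumption about recent versions and may differ in yours.

```r
library(ggplot2)

data <- data.frame(x = c(1, 10, 100), y = c(1, 10, 100))
p <- ggplot(data, aes(x = x, y = y)) + geom_point() + scale_y_log10()

# layer_scales() returns the trained x and y scales for a layer
y_scale <- layer_scales(p)$y

# Data range on the transformed (log10) scale
y_range_log <- y_scale$range$range
print(y_range_log)

# The same range back on the original data scale
print(10^y_range_log)

# Range of the plotting window (assumed element name in recent versions)
print(ggplot_build(p)$layout$panel_params[[1]]$y.range)
```

Going through `layer_scales()` avoids depending on the exact layout of the list returned by `ggplot_build()`, which has changed between versions.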

Related

Why isn't the mean() function in R giving me the right result?

I've been practicing basics in R (3.6.3) and I'm stuck trying to understand this problem for hours already. This was the exercise:
Step 1: Generate sequence of data between 1 and 3 of total length 100; #use the jitter function (with a large factor) to add noise to your data
Step 2: Compute the vector of rolling averages roll.mean with the average of 5 consecutive points. This vector has only 96 averages.
Step 3: add the vector of these averages to your plot
Step 4: generalize step 2 and step 3 by making a function with parameters consec (default=5) and y.
y88 = seq(1,3,0.02)
y = jitter(y88, 120, set.seed(1))
y = y[-99] # removed one guy so y can have 100 elements, as asked
roll.meanT = rep(0,96)
for (i in 1:length(roll.meanT)) # my 'reference i' is roll.mean[i], not y[i]
{
roll.meanT[i] = (y[i+4]+y[i+3]+y[i+2]+y[i+1]+y[i])/5
}
plot(y)
lines(roll.meanT, col=3, lwd=2)
This produced this plot:
Then I generalized steps 2 and 3 into a function (the exercise asks only for those steps, so I skipped the data creation and treat y as fixed):
fun50 = function(consec=5,y)
{
roll.mean <- rep(NA,96) # Apparently, we just leave NAs as NA, since length(y) is always greater than length(roll.mean)
for (i in 1:96)
{
roll.mean[i] <- mean(y[i:i+consec-1]) # Using mean(), I'm able to generalize.
}
plot(y)
lines(roll.mean, col=3, lwd=2)
}
Which gave me a completely different plot:
When I manually check whether mean(y[1:5]) produces the right mean, it does. I know I could have used the mean() function in the first part as well, but I would really like to get the same results using (y[i+4]+y[i+3]+y[i+2]+y[i+1]+y[i])/5 or mean(y[1:5],......).

You have the line
roll.mean[i] <- mean(y[i:i+consec-1]) # Using mean(), I'm able to generalize.
I believe your intention is to grab the values with indices i to (i+consec-1). Unfortunately for you, the : operator takes precedence over binary arithmetic operators such as + and -, so i:i+consec-1 is parsed as (i:i)+consec-1.
> 1:1+5-1 #(this is what your code would do for i=1, consec=5)
[1] 5
> (1:1)+5-1 # this is what it's actually doing for you
[1] 5
> 2:2+5-1 #(this is what your code would do for i=2, consec=5)
[1] 6
> 3:3+5-1 #(this is what your code would do for i=3, consec=5)
[1] 7
> 3:(3+5-1) #(this is what you want your code to do for i=3, consec=5)
[1] 3 4 5 6 7
So to fix it, just add some parentheses:
roll.mean[i] <- mean(y[i:(i+consec-1)]) # Using mean(), I'm able to generalize.
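With the parentheses in place, the hard-coded 96 can also be derived from the inputs. A minimal sketch of the generalized step 2; the function name `roll_mean` and the way `y` is generated here are illustrative assumptions, not the asker's exact code.

```r
# Rolling mean over `consec` consecutive points (illustrative helper)
roll_mean <- function(y, consec = 5) {
  n_out <- length(y) - consec + 1
  vapply(
    seq_len(n_out),
    function(i) mean(y[i:(i + consec - 1)]),  # note the parentheses
    numeric(1)
  )
}

set.seed(1)
y <- jitter(seq(1, 3, length.out = 100), factor = 120)

rm5 <- roll_mean(y, consec = 5)
length(rm5)  # 96

plot(y)
lines(rm5, col = 3, lwd = 2)
```

Computing the output length as `length(y) - consec + 1` keeps the function correct for window sizes other than 5.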

In UpSetR, how to show decimal number on the intersection bar

I am making an upset diagram for the following data in percentages. This is a dummy example for my more complicated data.
x <- c(a=80, b=9.9, c=5, 'a&b'=0.1, 'a&c'=1.65, 'c&b'=3.35)
upset(fromExpression(x), order.by = "freq")
I want these percentages to appear as decimal numbers and all the bars visible even if it is 0.1%. All the data is important in this plot.

The upset'ting plot
library(UpSetR)
x <- c(a=80, b=9.9, c=5, 'a&b'=0.1, 'a&c'=1.65, 'c&b'=3.35)
upset(fromExpression(x), order.by = "freq", show.numbers = 'yes')
Your question
So you want two things:
percentages to appear as decimal numbers
bars visible even if it is 0.1%
Percentages to appear as decimal numbers
You start by converting your vector of percentages to counts (integer) with fromExpression. So the input to upset is then a dataframe:
library(UpSetR)
x <- c(a=80, b=9.9, c=5, 'a&b'=0.1, 'a&c'=1.65, 'c&b'=3.35)
str(fromExpression(x))
#> 'data.frame': 98 obs. of 3 variables:
#> $ a: num 1 1 1 1 1 1 1 1 1 1 ...
#> $ b: num 0 0 0 0 0 0 0 0 0 0 ...
#> $ c: num 0 0 0 0 0 0 0 0 0 0 ...
upset internally then gets the labels from this data, so the link to your original percentages is no longer present inside upset.
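The 98 rows reported by `str()` are consistent with each percentage being truncated to a whole number before the rows are generated. A quick base R check of that arithmetic; it only mimics the truncation and does not call UpSetR.

```r
x <- c(a = 80, b = 9.9, c = 5, "a&b" = 0.1, "a&c" = 1.65, "c&b" = 3.35)

# Drop the fractional part of every value, as integer counts would
whole <- floor(x)
print(whole)

# 80 + 9 + 5 + 0 + 1 + 3
sum(whole)  # 98
```

Note that the 0.1% intersection contributes no rows at all under this truncation, which would explain why its bar cannot appear.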
Having labels as percentages, or some other custom labels, does not seem to be a supported option for the function upset from the UpSetR package at the moment.
There is the show.numbers argument, but it only allows showing the absolute frequencies on top of the bars (show.numbers = "yes" or show.numbers = "Yes") or hiding them (any other value for show.numbers). Here's the relevant bit of code:
https://github.com/hms-dbmi/UpSetR/blob/fe2812c8cbe87af18c063dcee9941391c836e7b2/R/MainBar.R#L130-L132
So I think you need to change that piece of code, i.e., the geom_text and aes_string, to use a different aesthetic mapping (your relative frequencies). So maybe ask the developer to do it?
Bars visible even if it is 0.1%
Well, this ultimately depends on your y-axis dynamic range and the size of your plot, i.e., if the tallest bar is a lot taller than the shortest, then it might be impossible to see both in the same chart (unless you make the y-axis discontinuous).
Conclusion
I understand this is not really a solution to your problem but it is an answer that hopefully points you in the direction of the solution to your problem.

Two facts are standing in the way of a quick and easy solution to this problem:
UpSetR is very strongly oriented toward discrete sets of countable objects.
A potential solution would be to use fractional objects instead of whole objects, but the first thing upset() does is check which columns of your data frame have "0" and "1" as their only levels. This is hardcoded. If this fails, the startend object becomes NULL and there is no way the function will be able to do anything.
UpSetR does not give very good access to the plots it creates.
Once the plots are made, you are left with no return value from upset(). This means you cannot modify the plot objects themselves or change the way they are plotted beyond the arguments that upset() accepts.
So, what can you do?
Depending on how complicated your real plot is (and how often you have to replot it) you might just do this:
x <- c(a=80, b=9.9, c=5, 'a&b'=0.1, 'a&c'=1.65, 'c&b'=3.35)
upset(fromExpression(x*100), order.by = "freq")
and then edit in inkscape/illustrator. (BAD)
Fork UpSetR and hijack the scale.intersections and scale.sets parameters. In the Make_main_bar() function you would just change the way it handles a "percent" argument to scale.intersections, and change the way Make_size_plot() handles the same argument to scale.sets. This would then become:
x <- c(a=80, b=9.9, c=5, 'a&b'=0.1, 'a&c'=1.65, 'c&b'=3.35)
upset(fromExpression(x*100), order.by = "freq",
scale.intersections="percent", scale.sets="percent")
I have personally forked UpSetR for other purposes, but the package in general needs a major refactoring before it can be applied to additional use cases. The authors may have wanted to prevent uses of the concept outside of their original design.
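The scaling step used in both options can be sketched in base R. The scale factor of 100 and the label format are assumptions chosen for this example; the labels would still have to be applied by hand or inside a fork, since upset() does not accept them.

```r
x <- c(a = 80, b = 9.9, c = 5, "a&b" = 0.1, "a&c" = 1.65, "c&b" = 3.35)

# Assumption: two decimal places are enough to represent every percentage
scale_factor <- 100

# round() guards against floating-point residue before counts are used
counts <- round(x * scale_factor)

# Fail early if the chosen scale factor loses precision
stopifnot(all(abs(counts - x * scale_factor) < 1e-8))

# Percentage labels to put back on the bars afterwards
labels <- setNames(sprintf("%.2f%%", counts / scale_factor), names(counts))
print(counts)
print(labels)
```

After scaling, the smallest intersection has a count of 10 rather than 0, so it is no longer lost when the values are turned into whole objects.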

Confusing geom_map error choropleth

I am getting an error using geom_map that I do not get while using geom_polygon when trying to make a choropleth map.
I am following the ggplot2 documentation as closely as possible.
I have a data.frame of positions plus an id column (lease_number) to reference later:
dfmap<- read.table(text="id long lat order hole piece group lease_number
1 -90.38103 28.78907 1 FALSE 1 1.1 00016
1 -90.38065 28.82965 2 FALSE 1 1.1 A0016
1 -90.33457 28.82930 3 FALSE 1 1.1 A0016
1 -90.33497 28.78872 4 FALSE 1 1.1 A0016
1 -90.38103 28.78907 5 FALSE 1 1.1 A0016", header=T)
And a data.frame of values with the corresponding id column (just one value here):
df <- data.frame(lease_number="A0016", var1=10)
Following the documentation's structure exactly:
ggplot(df, aes(fill=var1)) +
geom_map(aes(map_id=lease_number), map=dfmap) +
expand_limits(dfmap)
Gives the following error:
Error in unit(x, default.units) : 'x' and 'units' must have length > 0
By merging the data like so I can produce a correct plot,
ggplot() +
geom_polygon(data=merge(dfmap, df, by='lease_number', all=T),
aes(x=long, y=lat, group=lease_number, fill=var1))
but I want to avoid this because I will need to reference a lot of different things and it will be much better to be able to reference between two data.frames with the lease_number column.
I have seen this question but that answer does not apply here since I cannot even get a base map to show up with geom_map if I remove the fill= argument. Does anyone know how to take on this error?
Thanks.

How do you plot a histogram of the terms that occur n or more times?

I have a list of words coming straight from a file, one per line, that I import with read.csv, which produces a data.frame. What I need to do is compute and plot the number of occurrences of each of these words. That I can do easily, but the problem is that I have several hundred words, most of which occur just once or twice in the list, so I'm not interested in them.
EDIT https://gist.github.com/anonymous/404a321840936bf15dd2#file-wordlist-csv here is a sample wordlist that you can use to try. It isn't the same I used, I can't share that as it's actual data from actual experiments and I'm not allowed to share it. For all intents and purposes, this list is comparable.
A "simple"
df <- data.frame(table(words$word))
df[df$Freq > 2, ]
does the trick, I now have a list of the words that occur more than twice, as well as a hard headache as to why I have to go from a data.frame to an array and back to a data.frame just to do that, let alone the fact that I have to repeat the name of the data.frame in the actual selection string. Beats me completely.
The problem is that now the filtered data.frame is useless for charting. Suppose this is what I get after filtering
Var1 Freq
6 aspect 3
24 colour 7
41 differ 18
55 featur 7
58 function 19
81 look 4
82 make 3
85 mean 7
95 opposit 14
108 properti 3
109 purpos 6
112 relat 3
116 rhythm 4
118 shape 6
120 similar 5
123 sound 3
obviously if I just do a
plot(df[df$Freq > 2, ])
I get this
which obviously (obviously?) has all the original terms on the x axis, while the y axis only shows the filtered values. So the next logical step is to try and force R's hand
plot(x=df[df$Freq > 2, ]$Var1, y=df[df$Freq > 2, ]$Freq)
But clearly R knows best and already did that, because I get the exact same result. Using ggplot2 things get a little better
qplot(x=df[df$Freq > 2, ]$Var1, y=df[df$Freq > 2, ]$Freq)
(yay for consistency) but I'd like that to show an actual histogram, y'know, with bars, like the ones they teach in sixth grade, so if I ask that
qplot(x=df[df$Freq > 2, ]$Var1, y=df[df$Freq > 2, ]$Freq) + geom_bar()
I get
Error : Mapping a variable to y and also using stat="bin".
With stat="bin", it will attempt to set the y value to the count of cases in each group.
This can result in unexpected behavior and will not be allowed in a future version of ggplot2.
If you want y to represent counts of cases, use stat="bin" and don't map a variable to y.
If you want y to represent values in the data, use stat="identity".
See ?geom_bar for examples. (Defunct; last used in version 0.9.2)
so let us try the last suggestion, shall we?
qplot(df[df$Freq > 2, ]$Var1, stat='identity') + geom_bar()
fair enough, but where are my bars? So, back to basics
qplot(words$word) + geom_bar() # even if geom_bar() is probably unnecessary this time
gives me this
Am I crazy or [substitute a long list of ramblings and complaints about R]?

I generate some random data
set.seed(1)
df <- data.frame(Var1 = letters, Freq = sample(1: 8, 26, T))
Then I use dplyr::filter because it is very fast and easy.
library(ggplot2); library(dplyr)
qplot(data = filter(df, Freq > 2), Var1, Freq, geom= "bar", stat = "identity")
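In current ggplot2 releases `qplot()` is deprecated and, as far as I know, no longer accepts a `stat` argument, so the call above may fail. A hedged sketch of the same idea with `ggplot()` and `geom_col()`, using base R subsetting in place of `dplyr::filter()`:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(Var1 = letters, Freq = sample(1:8, 26, replace = TRUE))

# Keep only the terms that occur more than twice
kept <- df[df$Freq > 2, ]

# geom_col() draws bars whose height is the value mapped to y
p <- ggplot(kept, aes(x = Var1, y = Freq)) + geom_col()
print(p)
```

`geom_col()` is the counterpart of `geom_bar(stat = "identity")`, which is what the error message in the question was pointing toward.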

First of all, at least with plot(), there's no reason to force a data.frame. plot() understands table objects. You can do
plot(table(words$word))
# or
plot(table(words$word), type="p")
# or
barplot(table(words$word))
We can use Filter to filter rows, unfortunately that drops the table class. But we can add that back on with as.table. This looks like
plot(as.table(Filter(function(x) x>2, table(words$word))), type="p")
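The extra terms on the x axis in the question's plots are consistent with `Var1` being a factor that keeps all of its original levels after filtering. A minimal base R sketch of dropping the unused levels; the `words` data here is a small hypothetical sample, not the asker's list.

```r
# Hypothetical sample data standing in for the real word list
words <- data.frame(
  word = c("differ", "differ", "differ", "look", "make", "make", "make", "mean")
)

# Frequency table as a data.frame with columns Var1 (factor) and Freq
df <- as.data.frame(table(words$word))

# Filter, then drop factor levels that no longer occur
kept <- droplevels(df[df$Freq > 2, ])
print(levels(kept$Var1))

# Bar chart of the remaining terms only
barplot(kept$Freq, names.arg = as.character(kept$Var1))
```

Without `droplevels()`, the filtered data frame still reports all four levels, which is why plotting functions keep a slot for every original term.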

Simple line plot using R ggplot2

I have the following data in .csv format. I am new to ggplot2 and cannot work out how to plot it.
T L
141.5453333 1
148.7116667 1
154.7373333 1
228.2396667 1
148.4423333 1
131.3893333 1
139.2673333 1
140.5556667 2
143.719 2
214.3326667 2
134.4513333 3
169.309 8
161.1313333 4
I tried to plot a line graph using the following code
data<-read.csv("sample.csv",head=TRUE,sep=",")
ggplot(data,aes(T,L))+geom_line()
but the image I got is not what I want.
I want an image like the following:
Can anybody help me?

You want to use a variable for the x-axis that has lots of duplicated values and expect the software to guess that the order you want those points plotted is given by the order they appear in the data set. This also means the values of the variable for the x-axis no longer correspond to the actual coordinates in the coordinate system you're plotting in, i.e., you want to map a value of "L=1" to different locations on the x-axis depending on where it appears in your data.
This kind of fairly nonsensical mapping does not work in ggplot2 out of the box. You have to define a separate variable that has a proper mapping to values on the x-axis ("id" in the code below) and then overwrite the labels with the values for "L".
The code below shows you how to do this, but it seems like a different graphical display would probably be better suited for this kind of data.
data <- as.data.frame(matrix(scan(text="
141.5453333 1
148.7116667 1
154.7373333 1
228.2396667 1
148.4423333 1
131.3893333 1
139.2673333 1
140.5556667 2
143.719 2
214.3326667 2
134.4513333 3
169.309 8
161.1313333 4
"), ncol=2, byrow=TRUE))
names(data) <- c("T", "L")
data$id <- 1:nrow(data)
ggplot(data,aes(x=id, y=T))+geom_line() + xlab("L") +
scale_x_continuous(breaks=data$id, labels=data$L)

You have an error in your code, try this:
ggplot(data,aes(x=L, y=T))+geom_line()
Default arguments for aes are:
aes(x, y, ...)
