I'm analyzing data from the result of pulling 10 numbered balls from a jar with replacement, repeated 70 times. Here's my code (data included):
numbers <- c(8, 3, 9, 5, 1, 9, 10, 8, 8, 1, 9, 9, 8, 5, 1, 10, 5, 9, 6, 4, 10, 3,
10, 9, 8, 4, 8, 8, 9, 9, 1, 5, 9, 8, 4, 1, 8, 6, 7, 8, 2, 9, 5, 6,
10, 9, 1, 1, 5, 6, 2, 8, 6, 5, 2, 5, 4, 10, 10, 2, 2, 4, 9, 6, 9,
9, 6, 10, 9, 10)
library(ggplot2)
num_frame <- data.frame(numbers)
ggplot(num_frame) +
  geom_dotplot(aes(numbers), binwidth = 1, dotsize = 0.4) +
  theme_bw() +
  xlab("Numbers") +
  ylab("Frequency")
The resulting plot is nice, except it labels gridlines at 0, 2.5, 5, 7.5, and 10, which is obviously not what I want. The scale is fine, but I would like the gridlines to be at integer values 1 through 10 (0 is fine too if necessary). How can I do this? I'd also like the y-axis to adjust likewise so that the grid is still square. Thanks!
Just add:
scale_x_continuous(breaks=1:10, minor_breaks=NULL)
minor_breaks=NULL suppresses the gridlines that would otherwise be drawn between the breaks.
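For reference, here is a minimal sketch of that line added to the plot from the question. It uses only the first ten draws to stay short; the full numbers vector works the same way. The object name p is just for illustration.

```r
library(ggplot2)

# First ten draws from the question's data, to keep the sketch short
numbers <- c(8, 3, 9, 5, 1, 9, 10, 8, 8, 1)

p <- ggplot(data.frame(numbers), aes(numbers)) +
  geom_dotplot(binwidth = 1, dotsize = 0.4) +
  # major gridlines and labels at every integer, no minor gridlines in between
  scale_x_continuous(breaks = 1:10, minor_breaks = NULL) +
  theme_bw() +
  xlab("Numbers") +
  ylab("Frequency")

print(p)
```

Any vector of positions can be passed to breaks, so 0:10 would also label the gridline at 0.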
I am trying to create a diagram using ggplot2. There are several very small values to be displayed and a few larger ones. I'd like to display all of them in an appropriate way using logarithmic scaling. This is what I do:
plotPointsPre <- ggplot(data = solverEntries, aes(x = val, y = instance,
color = solver, group = solver))
...
finalPlot <- plotPointsPre + coord_trans(x = 'log10') + geom_point() +
xlab("costs") + ylab("instance")
This is the result:
It is just the same as without coord_trans(x = 'log10').
However, if I use it with the y-axis:
How do I achieve the logarithmic scaling on the x-axis? Besides, it is not about the x-axis, if I switch the values of x and y, then it works on the x-axis and no longer on the y-axis. So there seems to be some problem with the displayed values. Does anybody have an idea how to fix this?
Edit - Here's the used data contained in solverEntries:
solverEntries <- data.frame(instance = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 19, 20, 20, 20, 20),
solver = c(4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1),
time = c(1, 24, 13, 6, 1, 41, 15, 5, 1, 26, 16, 5, 1, 39, 7, 4, 1, 28, 11, 3, 1, 31, 12, 3, 1, 38, 20, 3, 1, 37, 10, 4, 1, 25, 11, 3, 1, 32, 18, 4, 1, 27, 21, 3, 1, 23, 22, 3, 1, 30, 17, 2, 1, 36, 8, 3, 1, 37, 19, 4, 1, 40, 21, 3, 1, 29, 11, 4, 1, 33, 10, 3, 1, 34, 9, 3, 1, 35, 14, 3),
val = c(6553.48, 6565.6, 6565.6, 6577.72, 6568.04, 7117.14, 6578.98, 6609.28, 6559.54, 6561.98, 6561.98, 6592.28, 6547.42, 7537.64, 6549.86, 6555.92, 6546.24, 6557.18, 6557.18, 6589.92, 6586.22, 6588.66, 6588.66, 6631.08, 6547.42, 7172.86, 6569.3, 6582.6, 6547.42, 6583.78, 6547.42, 6575.28, 6555.92, 6565.68, 6565.68, 6575.36, 6551.04, 6551.04, 6551.04, 6563.16, 6549.86, 6549.86, 6549.86, 6555.92, 6544.98, 6549.86, 6549.86, 6561.98, 6558.36, 6563.24, 6563.24, 6578.98, 6566.86, 7080.78, 6570.48, 6572.92, 6565.6, 7073.46, 6580.16, 6612.9, 6557.18, 7351.04, 6562.06, 6593.54, 6547.42, 6552.3, 6552.3, 6558.36, 6553.48, 6576.54, 6576.54, 6612.9, 6555.92, 6560.8, 6560.8, 6570.48, 6566.86, 6617.78, 6572.92, 6578.98))
In its current form your data does not span a range where a log scale helps: most val values sit around 6500 and some are about 10% higher. If you want to stretch the data, you could define a custom transformation with scales::trans_new(), or use this simpler version, which just subtracts a baseline value so that a log transform becomes useful. After subtracting 6500, the small values map to around 50 and the large values to around 1000, which is a more appropriate range for a log scale. Then we apply the same shift to the breaks so that the labels appear in the right spots (i.e. the label 6550 is placed at 6550 - 6500 = 50 on the shifted scale).
This method helps if you want to make the underlying values more distinguishable, but at the cost of distorting the underlying proportions between values. You might be able to help with this by picking useful breaks and labeling them with scaling stats, e.g. a two-line label that reads "7000" on the first line and "+7% over min" on the second.
library(ggplot2)
my_breaks <- c(6550, 6600, 6750, 7000, 7500)
baseline <- 6500
ggplot(data = solverEntries,
       aes(x = val - baseline, y = instance,
           color = solver, group = solver)) +
  geom_point() +
  # breaks are shifted like the data; labels show the original values
  scale_x_log10(breaks = my_breaks - baseline,
                labels = my_breaks, name = "val")
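The custom-transformation route mentioned above could look roughly like the sketch below. The name shifted_log10 and the label text are made up for illustration, and min_val is the smallest val in the question's data; treat this as one possible setup rather than the only way to do it.

```r
library(scales)

baseline  <- 6500                                 # same offset as above
my_breaks <- c(6550, 6600, 6750, 7000, 7500)

# Hypothetical transformation: log10 of the distance above `baseline`
shifted_log10 <- trans_new(
  name      = "shifted_log10",
  transform = function(x) log10(x - baseline),
  inverse   = function(x) 10^x + baseline,
  domain    = c(baseline, Inf)
)

# Two-line labels: the break itself plus how far it sits above the minimum
min_val   <- 6544.98                              # min(solverEntries$val)
pct_over  <- round(100 * (my_breaks / min_val - 1))
my_labels <- paste0(my_breaks, "\n+", pct_over, "% over min")

# Possible use with the question's data (needs ggplot2 and solverEntries):
# ggplot(solverEntries, aes(val, instance, color = solver)) +
#   geom_point() +
#   scale_x_continuous(trans = shifted_log10,
#                      breaks = my_breaks, labels = my_labels)
```

With this version the aesthetic stays x = val, so the breaks no longer need to be shifted by hand. Recent ggplot2 releases (3.5.0 and later) name the scale argument transform instead of trans.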
Is this what you're looking for?
library(ggplot2)
x_data <- seq(from = 1, to = 50)
y_data <- 2 * x_data + rnorm(n = 50, mean = 0, sd = 5)
# linear y scale
ggplot() +
  aes(x = x_data, y = y_data) +
  geom_point()
# log y scale (any non-positive y values are dropped with a warning)
ggplot() +
  aes(x = x_data, y = y_data) +
  geom_point() +
  scale_y_log10()
# log x scale
ggplot() +
  aes(x = x_data, y = y_data) +
  geom_point() +
  scale_x_log10()
Let's say I have a SpatialPolygons object named groupexc that contains 3 polygons:
library(raster)
p1 <- matrix(c(2, 3, 4, 5, 6, 5, 4, 3, 2, 4, 5, 6, 5, 4, 3, 2, 3, 4), ncol=2)
p2 <- matrix(c(8, 9, 10, 11, 12, 11, 10, 9, 8, 4, 5, 6, 5, 4, 3, 2, 3, 4), ncol=2)
p3 <- matrix(c(5, 6, 7, 8, 9, 8, 7, 6, 5, 9, 10, 11, 10, 9, 8, 7, 8, 9), ncol=2)
groupexc <- spPolygons(p1, p2, p3)
And a SpatialPolygons object zoneexc that represents a single zone:
zoneexc = spPolygons(matrix(c(2,1,3,4,6,8,10,13,14,14,12,10,8,6,4,2,1,3,7,10,12,14,12,6,4,3,1,1,1,1,1,1), ncol=2))
Is there a way for me to expand the polygons in groupexc until they reach the boundary of zoneexc?
Before:
plot(zoneexc, border='red', lwd=3)
plot(groupexc, add=TRUE, border='blue', lwd=2)
text(groupexc, letters[1:3])
after:
Any help would be appreciated.
Here is an approximate solution. This approach might break for large problems, and it depends on having a sufficient number of nodes in each polygon. But it may be good enough for your purpose.
# example data
library(raster)
p1 <- matrix(c(2, 3, 4, 5, 6, 5, 4, 3, 2, 4, 5, 6, 5, 4, 3, 2, 3, 4), ncol=2)
p2 <- matrix(c(8, 9, 10, 11, 12, 11, 10, 9, 8, 4, 5, 6, 5, 4, 3, 2, 3, 4), ncol=2)
p3 <- matrix(c(5, 6, 7, 8, 9, 8, 7, 6, 5, 9, 10, 11, 10, 9, 8, 7, 8, 9), ncol=2)
groups <- spPolygons(p1, p2, p3, attr=data.frame(name=c('a', 'b', 'c')))
zone <- spPolygons(matrix(c(2,1,3,4,6,8,10,13,14,14,12,10,8,6,4,2,1,3,7,10,12,14,12,6,4,3,1,1,1,1,1,1), ncol=2))
Now create nearest neighbor polygons. For this to work as below, you need dismo version 1.1-1 (or higher)
library(dismo)
# get the coordinates of the polygons
g <- unique(geom(groups))
v <- voronoi(g[, c('x', 'y')], ext=extent(zone))
# plot(v)
# assign group id to the new polygons
v$group <- g[v$id, 1]
# aggregate (dissolve) polygons by group id
a <- aggregate(v, 'group')
# remove areas outside of the zone
i <- crop(a, zone)
# add another identifier
i$name <- groups$name[i$group]
plot(i, col=rainbow(3))
text(i, "name", cex=2)
plot(groups, add=TRUE, lwd=2, border='white', lty=2)
To see how it works:
points(g[, c('x', 'y')], pch=20, cex=2)
plot(v, add=TRUE)
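The idea behind the Voronoi step can be sketched in base R: each Voronoi cell holds the locations that are closer to one node than to any other, so after dissolving by group every location ends up with the group that owns its nearest node. The helper nearest_group below is hypothetical and only illustrates that rule; it does not build any polygons.

```r
# Node coordinates of the three example polygons (same values as p1, p2, p3)
p1 <- matrix(c(2, 3, 4, 5, 6, 5, 4, 3, 2, 4, 5, 6, 5, 4, 3, 2, 3, 4), ncol = 2)
p2 <- matrix(c(8, 9, 10, 11, 12, 11, 10, 9, 8, 4, 5, 6, 5, 4, 3, 2, 3, 4), ncol = 2)
p3 <- matrix(c(5, 6, 7, 8, 9, 8, 7, 6, 5, 9, 10, 11, 10, 9, 8, 7, 8, 9), ncol = 2)

nodes <- rbind(p1, p2, p3)
group <- rep(1:3, times = c(nrow(p1), nrow(p2), nrow(p3)))

# Hypothetical helper: group id of the node closest to location (x, y)
nearest_group <- function(x, y) {
  d2 <- (nodes[, 1] - x)^2 + (nodes[, 2] - y)^2   # squared distances
  group[which.min(d2)]
}

nearest_group(4, 4)    # 1
nearest_group(13, 2)   # 2
nearest_group(7, 12)   # 3
```

This is also why the result depends on the number of nodes: with only a few nodes per polygon, the nearest node is a coarse stand-in for the nearest polygon edge.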
This is my data
x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22)
y = c(1, 6, 2, 5, 4, 7, 9, 6, 8, 4, 5, 6, 5, 5, 6, 7, 5, 8, 9,
5, 4, 7)
plot(x, y)
fit <- lm(y ~ x)
fit
abline(fit, col = "black", lwd = "1")
I would like the plot to split the data into two groups: observations above the regression line and those below it. How can I do this?
You can use predict to get the fitted value at each x, and then a logical comparison between the observed and fitted to test if they're above or below the line. Then set the colors you plot based on this logical comparison.
prediction <- predict(fit)
# 1 (black) for points above the line, 2 (red) for points on or below it
colors <- ifelse(y > prediction, 1, 2)
plot(x, y, col = colors)
abline(fit, col = "black", lwd = 1)
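If you need the two groups as data rather than just as colors, the same comparison can be written with the residuals, since a positive residual means the observation lies above the line. A small sketch, where side and by_side are illustrative names:

```r
x <- 1:22
y <- c(1, 6, 2, 5, 4, 7, 9, 6, 8, 4, 5, 6, 5, 5, 6, 7, 5, 8, 9, 5, 4, 7)
fit <- lm(y ~ x)

# residual = observed - fitted, so > 0 means "above the regression line"
side <- ifelse(residuals(fit) > 0, "above", "below")

# list with one data frame per group
by_side <- split(data.frame(x, y), side)

table(side)      # how many observations fall in each group
by_side$above    # the observations above the line
```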
chocolate <- data.frame(
Sabor =
c(5, 7, 3,
4, 2, 6,
5, 3, 6,
5, 6, 0,
7, 4, 0,
7, 7, 0,
6, 6, 0,
4, 6, 1,
6, 4, 0,
7, 7, 0,
2, 4, 0,
5, 7, 4,
7, 5, 0,
4, 5, 0,
6, 6, 3
),
Tipo = factor(rep(c("A", "B", "C"), 15)),
Provador = factor(rep(1:15, rep(3, 15))))
tapply(chocolate$Sabor, chocolate$Tipo, mean)
ajuste <- lm(chocolate$Sabor ~ chocolate$Tipo + chocolate$Provador)
summary(ajuste)
anova(ajuste)
a1 <- aov(chocolate$Sabor ~ chocolate$Tipo + chocolate$Provador)
posthoc <- TukeyHSD(x=a1, 'chocolate$Tipo', conf.level=0.95)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = chocolate$Sabor ~ chocolate$Tipo + chocolate$Provador)

$`chocolate$Tipo`
           diff       lwr       upr     p adj
B-A -0.06666667 -1.803101  1.669768 0.9950379
C-A -3.80000000 -5.536435 -2.063565 0.0000260
C-B -3.73333333 -5.469768 -1.996899 0.0000337
Here is some sample code using TukeyHSD. The output contains a matrix, and I want its values to be displayed in scientific notation. I've tried using scipen and setting options(digits = 20), but some of the values from my actual data are still so small that the p adj values print as 0.00000000000000000000
How can I get the values to be displayed in scientific notation?
You could do this:
format(posthoc, scientific = TRUE)
If you want to change the number of digits, for instance using 3, you could do this:
format(posthoc, scientific = TRUE, digits = 3)
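A variant that may be easier to work with is to format the matrix stored inside the TukeyHSD result instead of the whole object. The sketch below refits the model with data = chocolate so that the element is simply called Tipo (with the original call it is named `chocolate$Tipo`); tab and sci are illustrative names.

```r
chocolate <- data.frame(
  Sabor = c(5, 7, 3, 4, 2, 6, 5, 3, 6, 5, 6, 0, 7, 4, 0,
            7, 7, 0, 6, 6, 0, 4, 6, 1, 6, 4, 0, 7, 7, 0,
            2, 4, 0, 5, 7, 4, 7, 5, 0, 4, 5, 0, 6, 6, 3),
  Tipo = factor(rep(c("A", "B", "C"), 15)),
  Provador = factor(rep(1:15, each = 3))
)

a1 <- aov(Sabor ~ Tipo + Provador, data = chocolate)
posthoc <- TukeyHSD(a1, "Tipo", conf.level = 0.95)

tab <- posthoc$Tipo                              # numeric matrix: diff, lwr, upr, p adj
sci <- format(tab, scientific = TRUE, digits = 3)

print(sci, quote = FALSE)                        # character matrix, same row and column names
```

Note that sci is a character matrix, so keep tab around for any further calculations. A value that is stored as exactly zero will still be shown as zero in scientific notation.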