How to make scatter plot points into numbers? - r

I am creating a scatter plot using ggplot/geom_point. Here is my code for building the function in ggplot.
AddPoints <- function(x) {
list(geom_point(data = dat , mapping = aes(x = x, y = y) , shape = 1 , size = 1.5 ,
color = "blue"))
}
I am wondering if it would be possible to replace the standard points on the plot with numbers. That is, instead of seeing a dot on the plot, you would see a number on the plot to represent each observation. I would like that number to correspond to a column for that given observation (column name 'RP'). Thanks in advance.
Sample data.
Data <- data.frame(
X = sample(1:10),
Y = sample(3:12),
RP = sample(c(4,8,9,12,3,1,1,2,7,7)))

Use geom_text() and map the RP column to the label aesthetic.
ggplot(Data, aes(x = X, y = Y, label = RP)) +
geom_text()
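Since the question wraps its layer in a helper function, the same idea can be kept in that form. The following is only a sketch: AddLabels is a hypothetical name, and the colour and size values are assumptions carried over loosely from the original AddPoints().

```r
library(ggplot2)

set.seed(1)  # make the sampled example data reproducible
Data <- data.frame(
  X  = sample(1:10),
  Y  = sample(3:12),
  RP = sample(c(4, 8, 9, 12, 3, 1, 1, 2, 7, 7))
)

# Hypothetical counterpart to AddPoints(): returns a list holding one text layer
# that prints the RP value at each (X, Y) position instead of a dot.
AddLabels <- function(data) {
  list(geom_text(data = data,
                 mapping = aes(x = X, y = Y, label = RP),
                 colour = "blue", size = 3))
}

p <- ggplot() + AddLabels(Data)
```

Printing p draws the plot. Because the helper returns a list of layers, it can be added to any existing ggplot object with +.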

Related

Add a labelling function to just first or last ggplot label

I often find myself working with data with long-tail distributions, so that a huge amount of the range in values falls in the top 1-2% of the data. When I plot the data, the upper outliers cause variation in the rest of the data to wash out, but I want to show those differences.
I know there are other ways of handling this, but I found that capping the values towards the end of the distribution and then applying a continuous color palette (i.e., in ggplot) is one way that works for me to represent the data. However, I want to ensure the legend stays accurate by adding a >= sign to the last legend label.
The picture below shows the kind of legend I'd like to achieve programmatically, with the >= sign drawn in messily in red.
I also know I can manually set breaks and labels, but I'd really like to just do something like if (it's the last label) paste0(">=", label) else label (in pseudocode).
Reproducible example:
(I want to alter the plot legend to prefix just the last label)
set.seed(123)
x <- rnorm(1:1e3)
y <- rnorm(1:1e3)
z <- rnorm(1e3, mean = 50, sd = 15)
d <- tibble(x = x
,y = y
,z = z)
d %>%
ggplot(aes(x = x
,y = y
,fill = z
,color = z)) +
geom_point() +
scale_color_viridis_c()
One option would be to pass a function to the labels argument which replaces the last element or label with your desired label like so:
library(ggplot2)
set.seed(123)
x <- rnorm(1:1e3)
y <- rnorm(1:1e3)
z <- rnorm(1e3, mean = 50, sd = 15)
d <- data.frame(
x = x,
y = y,
z = z
)
ggplot(d, aes(
x = x,
y = y,
fill = z,
color = z
)) +
geom_point() +
scale_fill_continuous(labels = function(x) {
x[length(x)] <- paste0(">=", x[length(x)])
x
}, aesthetics = c("color", "fill"))
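A slightly more defensive version of the labelling function can be sketched as follows. It assumes that the breaks passed to the function may contain NA (for breaks that fall outside the scale limits) and that the data are capped beforehand with pmin(). The names label_last_ge and cap are hypothetical, and the cap value of 80 is an arbitrary choice for illustration.

```r
# Hypothetical helper (not part of ggplot2): prefix the last non-missing label.
label_last_ge <- function(breaks) {
  labels <- as.character(breaks)
  shown <- which(!is.na(breaks))  # positions of breaks that are not NA
  if (length(shown) > 0) {
    last <- shown[length(shown)]
    labels[last] <- paste0(">=", labels[last])
  }
  labels
}

# Cap the long tail before plotting so that ">=" is a truthful label.
set.seed(123)
z <- rnorm(1e3, mean = 50, sd = 15)
cap <- 80                  # assumed cap value
z_capped <- pmin(z, cap)   # every value above the cap becomes the cap

label_last_ge(c(20, 40, 60, 80))
# [1] "20"   "40"   "60"   ">=80"
```

The function can be passed as labels = label_last_ge to a continuous colour scale such as scale_color_viridis_c(), with z_capped mapped to colour. The prefixed label is only accurate if the top break coincides with the cap, which can be arranged through the scale's breaks argument.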

How to plot new points in ggplot using the colors of existing data?

I know similar questions have been asked before, but mine is different. Consider data points data1 whose colors depend on their x and y coordinates, which I plot with ggplot:
x = 1:100
y = 1:100
d = expand.grid(x,y)
data1 <- data.frame(
xval = d$Var1,
yval = d$Var2,
col = d$Var1+d$Var2)
data2 <- data.frame(
xnew = c(1.5, 90.5),
ynew = c(95.5, 4))
ggplot(data1, aes(xval, yval, colour = col)) + geom_point()
But I don't want the last line to plot anything from data1; I want to plot only the data2 points, colored according to the colors of data1. For example, I sketched what I want to plot for data2:
I changed the last line to:
ggplot(data1, aes(xval, yval, colour = col)) +
geom_point(data = data2, aes(x = xnew, y = ynew))
Now I expect ggplot to draw just the 2 points of data2, but I get an error:
Don't know how to automatically pick scale for object of type function. Defaulting to continuous.
Error: Column colour must be a 1d atomic vector or a list
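The problem is that data2 has no col column, so the colour = col mapping inherited from ggplot() cannot be resolved against data2. R then finds the base function col() instead, which is why the message mentions an object of type function.
Please try the following: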
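ggplot(data2, aes(x = xnew, y = ynew, colour = xnew)) + geom_point() +
# use a colour scale: a fill scale has no effect on the colour aesthetic
scale_colour_gradientn(colours=c(2,1),
values = range(data1$xval),
rescaler = function(x,...) x,
# depending on the ggplot2 version, oob is called with extra arguments, so ignore them
oob = function(x,...) x)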
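An alternative sketch, assuming the colour of a point is defined by the same rule as in data1 (col = x + y): compute col for data2 with that rule and fix the scale limits to the range of data1$col, so both data sets share one colour scale. The axis limits set through coord_cartesian() are an assumption added so the two points appear in the same coordinate range as data1.

```r
library(ggplot2)

grid <- expand.grid(x = 1:100, y = 1:100)
data1 <- data.frame(xval = grid$x, yval = grid$y, col = grid$x + grid$y)
data2 <- data.frame(xnew = c(1.5, 90.5), ynew = c(95.5, 4))

# Give data2 its own col column, using the rule that defines col in data1.
data2$col <- data2$xnew + data2$ynew

p <- ggplot(data2, aes(x = xnew, y = ynew, colour = col)) +
  geom_point(size = 3) +
  # Pin the colour scale to data1's range so the colours are comparable.
  scale_colour_gradient(limits = range(data1$col)) +
  coord_cartesian(xlim = range(data1$xval), ylim = range(data1$yval))
```

If the colours of data1 did not follow a known formula, the col values for the new points would have to be looked up or interpolated from data1 instead. That case is not covered by this sketch.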

Adding dummy values on axis in ggplot2 to add asymmetric distance between ticks

How can I add dummy values on the x-axis in ggplot2?
I have 0, 2, 4, 6, 12, 14, 18, 22, 26 in my data, and I have plotted them on the x-axis. Is there a way to add the remaining even numbers for which there is no data in the table? This would create the appropriate gaps on the x-axis.
Afterwards, the x-axis should show 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26.
I have already tried using rbind.fill to add dummy data, but when I convert the values to a factor, the values 8, 10, 12, etc. come last.
Thanks
Hope this makes sense:
library(ggplot2)
gvals <- factor(letters[1:3])
xvals <- factor(c(0,2,4,6,12,14,18,22,26), levels = seq(0, 26, by = 2))
yvals <- rnorm(10000, mean = 2)
df <- data.frame(x = sample(xvals, size = length(yvals), replace = TRUE),
y = yvals,
group = sample(gvals, size = length(yvals), replace = TRUE))
ggplot(df, aes(x = x, y = y)) + geom_boxplot(aes(fill = group)) +
scale_x_discrete(drop = FALSE)
The trick is to create the x variable as a factor with all the levels you need and to specify drop = FALSE in the scale.
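To see why this works, it can help to inspect the factor on its own. The short base R sketch below (the variable names are chosen for illustration) shows that the levels include the even values with no observations, in numeric order, which is what drop = FALSE then keeps on the axis.

```r
observed <- c(0, 2, 4, 6, 12, 14, 18, 22, 26)

# Declare every even value from 0 to 26 as a level, observed or not.
x <- factor(observed, levels = seq(0, 26, by = 2))

levels(x)   # all 14 levels, in the order given to `levels`
table(x)    # levels without observations have a count of zero

# Levels that exist only as placeholders on the axis.
missing_levels <- setdiff(levels(x), as.character(observed))
missing_levels
# [1] "8"  "10" "16" "20" "24"
```

Because the order of the levels is set explicitly, the placeholder values sit between their neighbours instead of being appended at the end.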

Add multiple ggplot2 geom_segment() based on mean() and sd() data

I have a data frame mydataAll with columns DESWC, journal, and highlight. To calculate the average and standard deviation of DESWC for each journal, I do
avg <- aggregate(DESWC ~ journal, data = mydataAll, mean)
stddev <- aggregate(DESWC ~ journal, data = mydataAll, sd)
Now I plot a horizontal stripchart with the values of DESWC along the x-axis and each journal along the y-axis. But for each journal, I want to indicate the standard deviation and average with a simple line. Here is my current code and the results.
stripchart2 <-
ggplot(data=mydataAll, aes(x=mydataAll$DESWC, y=mydataAll$journal, color=highlight)) +
geom_segment(aes(x=avg[1,2] - stddev[1,2],
y = avg[1,1],
xend=avg[1,2] + stddev[1,2],
yend = avg[1,1]), color="gray78") +
geom_segment(aes(x=avg[2,2] - stddev[2,2],
y = avg[2,1],
xend=avg[2,2] + stddev[2,2],
yend = avg[2,1]), color="gray78") +
geom_segment(aes(x=avg[3,2] - stddev[3,2],
y = avg[3,1],
xend=avg[3,2] + stddev[3,2],
yend = avg[3,1]), color="gray78") +
geom_point(size=3, aes(alpha=highlight)) +
scale_x_continuous(limit=x_axis_range) +
scale_y_discrete(limits=mydataAll$journal) +
scale_alpha_discrete(range = c(1.0, 0.5), guide='none')
show(stripchart2)
See the three horizontal geom_segments at the bottom of the image indicating the spread? I want to do that for all journals, but without handcrafting each one. I tried using the solution from this question, but when I put everything in a loop and remove the aes(), it gives me an error that says:
Error in x - from[1] : non-numeric argument to binary operator
Can anyone help me condense the geom_segment() statements?
I generated some dummy data to demonstrate. First, we use aggregate like you have done, then we combine those results to create a data.frame in which we create upper and lower columns. Then, we pass these to the geom_segment specifying our new dataset. Also, I specify x as the character variable and y as the numeric variable, and then use coord_flip():
library(ggplot2)
set.seed(123)
df <- data.frame(lets = sample(letters[1:8], 100, replace = TRUE),
vals = rnorm(100),
stringsAsFactors = FALSE)
means <- aggregate(vals~lets, data = df, FUN = mean)
sds <- aggregate(vals~lets, data = df, FUN = sd)
df2 <- data.frame(means, sds)
df2$upper = df2$vals + df2$vals.1
df2$lower = df2$vals - df2$vals.1
ggplot(df, aes(x = lets, y = vals))+geom_point()+
geom_segment(data = df2, aes(x = lets, xend = lets, y = lower, yend = upper))+
coord_flip()+theme_bw()
Here, the lets column would resemble your character variable.
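A variation on the same idea, sketched under the same dummy data: merge() can join the two aggregate() results by group with explicit suffixes, which avoids relying on the automatic vals.1 column name. The segments are drawn horizontally here by mapping the numeric variable to x directly, as an alternative to coord_flip(). The name stats and the column suffixes are choices made for this illustration.

```r
library(ggplot2)

set.seed(123)
df <- data.frame(lets = sample(letters[1:8], 100, replace = TRUE),
                 vals = rnorm(100),
                 stringsAsFactors = FALSE)

means <- aggregate(vals ~ lets, data = df, FUN = mean)
sds   <- aggregate(vals ~ lets, data = df, FUN = sd)

# Join by group; the value columns become vals.mean and vals.sd.
stats <- merge(means, sds, by = "lets", suffixes = c(".mean", ".sd"))
stats$lower <- stats$vals.mean - stats$vals.sd
stats$upper <- stats$vals.mean + stats$vals.sd

# One geom_segment() layer draws the mean +/- sd line for every group.
p <- ggplot(df, aes(x = vals, y = lets)) +
  geom_segment(data = stats,
               aes(x = lower, xend = upper, y = lets, yend = lets),
               colour = "gray78", inherit.aes = FALSE) +
  geom_point()
```

With the data from the question, lets would correspond to journal and vals to DESWC.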

plotting with ggplot2. Error

I am trying to plot data using the ggplot2 package, but I am running into an error.
The data are a set of columns representing daily values (the values change with altitude):
V1 V2.... V500
2E-15.....3E-14
3e-14.....3E-21
1.3E-15....NA
I want to plot all the data on two axes, with the fill showing the values.
Code:
a<-data.frame("/../vertical_value.csv",sep=",",header=F)
am<-melt(t(a))
dataset<-expand.grid(X = 1:500, H = seq(1,25,by=1))
dataset$axp<-am$value
g<-ggplot(dataset, aes(x = X, y = H, fill = axp)) + geom_tile()
error:
Error: Casting formula contains variables not found in molten data: XHaxp
Looking at this again, I think that you should be able to bypass this just by dropping NA rows after you melt.
# read.csv() loads the file; data.frame() would only store the path string
a<-read.csv("/../vertical_value.csv",sep=",",header=F)
am<-melt(t(a))
am <- na.omit(am) ## ADD THIS LINE
dataset<-expand.grid(X = 1:500, H = seq(1,25,by=1))
dataset$axp<-am$value
g<-ggplot(dataset, aes(x = X, y = H, fill = axp)) + geom_tile()
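One caveat is that dropping rows can leave the value vector shorter than the grid it is assigned to. A sketch of an alternative, assuming the goal is one tile per (day, altitude) cell: build the long table directly from the matrix with row() and col(), and keep the NA cells so every value stays aligned with its coordinates. The small random matrix below is a stand-in for the real file, and the names m and long are hypothetical.

```r
library(ggplot2)

set.seed(1)
# Stand-in data: 25 altitude levels (rows) by 6 days (columns), with one gap.
m <- matrix(runif(25 * 6), nrow = 25, ncol = 6)
m[25, 6] <- NA

# One row per cell; row() and col() give each value's matrix indices.
long <- data.frame(H   = as.vector(row(m)),
                   X   = as.vector(col(m)),
                   axp = as.vector(m))

# NA cells are kept and drawn in the fill scale's na.value colour.
p <- ggplot(long, aes(x = X, y = H, fill = axp)) + geom_tile()
```

If the NA cells should not be drawn at all, they can be removed from long with na.omit(long) before plotting, since the coordinates travel with the values in the same table.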
