Colored points in R

I have a table with 3 numeric columns. Two of them are coordinates and the third one encodes color. There are hundreds of rows in my text file.
I want to make a picture where the first two numbers are the coordinates of each point and the third one is the color of the point: the bigger the number, the darker the point.
How can I do this?
An example row from my file:
99.421875 48.921875 0.000362286050144

Will this do?
require(ggplot2)
# sample data
set.seed(45)
df <- data.frame(x = runif(100) * sample(1:10, 100, replace = TRUE),
                 y = runif(100) * sample(1:50, 100, replace = TRUE),
                 col = runif(100) / sample(1:100))
# assuming your data is in df and x, y, and col are the column names.
# alpha is mapped to col, so larger values give more opaque (darker) points
ggplot(data = df, aes(x = x, y = y)) +
  geom_point(colour = "red", size = 3, aes(alpha = col))

A lattice solution:
library(lattice)
mydata <- matrix(c(1,2,3,1,1,1,2,5,10),nrow=3)
xyplot(mydata[,2] ~ mydata[,1], col = mydata[,3], pch= 19 ,
alpha = (mydata[,3]/10), cex = 15)
alpha here controls the transparency.

Here is a base R solution:
##Generate data
##Here z lies between 0 and 10
dd = data.frame(x = runif(100), y= runif(100), z= runif(100, 0, 10))
First normalise z:
dd$z = dd$z- min(dd$z)
dd$z = dd$z/max(dd$z)
Then plot as normal, using z for the shading. gray(0) is black and gray(1) is white, so pass 1 - z to make larger values darker:
##See ?gray for other colour combinations
##pch=19 gives solid points. See ?points for other shapes
plot(dd$x, dd$y, col=gray(1 - dd$z), pch=19)
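As a small self-contained sketch of the same idea (not part of the answer above), the rows can be read in the format shown in the question and shaded so that larger values come out darker. The text below stands in for the file; only its first row comes from the question, while the other two rows and the column names x, y, z are made up for illustration.

```r
# Stand-in for the text file: first row is from the question,
# the other two rows are invented for this example.
txt <- "99.421875 48.921875 0.000362286050144
100.5 49.25 0.002
101.0 50.0 0.01"
pts <- read.table(text = txt, col.names = c("x", "y", "z"))

# Rescale z to [0, 1], then invert it: gray(0) is black, gray(1) is white
z01 <- (pts$z - min(pts$z)) / diff(range(pts$z))
pts$shade <- gray(1 - z01)

plot(pts$x, pts$y, col = pts$shade, pch = 19)
```

The smallest value maps to pure white, which is invisible on a white background; using something like gray(0.9 * (1 - z01)) is one way to keep it visible.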

Another solution using base graphics. To change the colour, replace one or two of the data[,3] arguments inside rgb() with 0:
n <- 1000
data <- data.frame(x=runif(n),y=runif(n),col=runif(n))
plot(data[,1:2],col=rgb(data[,3],data[,3],data[,3],maxColorValue = max(data[,3])),pch=20)
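To illustrate the remark about setting channels to 0, here is a minimal sketch with made-up values (v is a hypothetical stand-in for data[,3]). Keeping only the red channel gives a black-to-red ramp, and subtracting from the maximum inverts the grayscale so that larger values are darker.

```r
v <- c(0, 2, 4)   # made-up values standing in for data[,3]
m <- max(v)

# Only the red channel varies: 0 -> black, max -> pure red
red_ramp <- rgb(v, 0, 0, maxColorValue = m)

# Inverted grayscale: 0 -> white, max -> black (bigger number, darker point)
dark_ramp <- rgb(m - v, m - v, m - v, maxColorValue = m)

plot(seq_along(v), v, col = red_ramp, pch = 20, cex = 3)
```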

Related

Indicating the maximum values and adding corresponding labels on a ggplot

ggplot(data = dat) + geom_line(aes(x=foo,y=bar)) +geom_line(aes(x=foo_land,y=bar_land))
which creates a plot like the following:
I want to try and indicate the maximum values on this plot as well as add corresponding labels to the axis like:
The data for the maximum x and y values is stored in the dat file.
I was attempting to use
geom_hline() + geom_vline()
but I couldn't get this to work. I also believe that these lines will continue through the rest of the plot, which is not what I am trying to achieve. I should note that I would like to indicate the maximum y-value and its corresponding x value. The x-value is not indicated here since it is already labelled on the axis.
Reproducible example:
library(ggplot2)
col1 <- c(1,2,3)
col2 <- c(2,9,6)
df <- data.frame(col1,col2)
ggplot(data = df) +
geom_line(aes(x=col1,y=col2))
I would like to include a line which travels up from 2 on the x-axis and horizontally to the y-axis indicating the point 9, the maximum value of this graph.
Here's a start, although it does not make the axis text red where that maximal point is:
MaxLines <- data.frame(col1 = c(rep(df$col1[which.max(df$col2)], 2),
-Inf),
col2 = c(-Inf, rep(max(df$col2), 2)))
MaxLines holds the three points that define the two segments: from the x-axis up to the maximum, and from the maximum across to the y-axis.
ggplot(data = df) +
geom_line(aes(x=col1,y=col2)) +
geom_path(data = MaxLines, aes(x = col1, y = col2),
inherit.aes = F, color = "red") +
scale_x_continuous(breaks = c(seq(1, 3, by = 0.5), df$col1[which.max(df$col2)])) +
scale_y_continuous(breaks = c(seq(2, 9, by = 2), max(df$col2)))
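A possible variation, sketched here under the assumption that two explicit segments are easier to read than one path: build a small data frame with one row per segment and draw it with geom_segment(). The names peak and segs are introduced only for this example, and the plotting part is skipped if ggplot2 is not installed.

```r
df <- data.frame(col1 = c(1, 2, 3), col2 = c(2, 9, 6))

# Row holding the maximum y-value and its x-value
peak <- df[which.max(df$col2), ]

# One row per segment: x-axis up to the peak, then peak across to the y-axis.
# -Inf places an end point at the edge of the panel.
segs <- data.frame(
  x    = c(peak$col1, peak$col1),
  y    = c(-Inf,      peak$col2),
  xend = c(peak$col1, -Inf),
  yend = c(peak$col2, peak$col2)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = col1, y = col2)) +
    geom_line() +
    geom_segment(data = segs,
                 aes(x = x, y = y, xend = xend, yend = yend),
                 inherit.aes = FALSE, colour = "red", linetype = 2)
  print(p)
}
```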

How do I move lines in a ggplot to create a 3D effect and add a pseudo-axis that indicates the name of each line?

I have 3 lines in a plot, and I want to shift each one by 1 unit on the X axis and 100 units on the Y axis to create a 3D effect, as in the example below. So far I have only been able to draw the lines. I tried the position_nudge() function, but it did not have the effect I expected: it changed the scale of the axes, but not the position of the lines.
Bonus: it would be great if the framed plot looked like a cube.
Example:
MWE:
library(ggplot2)
Group <- c("A", "B", "C")
Time <- 0:10
DF <- expand.grid(Time = Time,
Group = Group)
DF$Y <- c(rep(1,5), 100, rep(1,5),
rep(1,5), 500, rep(1,5),
rep(1,5), 1000, rep(1,5))
ggplot(data = DF,
aes(x = Time,
y = Y,
color = Group)) +
geom_line(position = position_nudge(y = 100, x=1)) +
theme_bw()

Can I draw a horizontal line at specific number of range of values using ggplot2?

I have data (from Excel) with ranges on the y-axis (also calculated in Excel) and cell counts on the x-axis, and I would like to draw a horizontal line at a specific value within the ranges, like a reference line. I tried geom_hline(yintercept = 450), but I suspect that is naive and does not work that way for a number inside a range. Are there any better suggestions? :)
plot.new()
library(ggplot2)
d <- read.delim("C:/Users/35389/Desktop/R.txt", sep = "\t")
head(d)
d <- cbind(row.names(d), data.frame(d), row.names=NULL)
d
g <- ggplot(d, aes(d$CTRL,d$Bin.range))+ geom_col()
g + geom_hline(yintercept = 450)
First of all, have a look at my comments.
Second, this is how I suggest you proceed: don't calculate those ranges in Excel. Let ggplot do it for you.
Say, your data is like this:
df <- data.frame(x = runif(100, 0, 500))
head(df)
#> x
#>1 322.76123
#>2 57.46708
#>3 223.31943
#>4 498.91870
#>5 155.05416
#>6 107.27830
Then you can make a plot like this:
library(ggplot2)
ggplot(df) +
geom_histogram(aes(x = x),
boundary = 0,
binwidth = 50,
fill = "steelblue",
colour = "white") +
geom_vline(xintercept = 450, colour = "red", linetype = 2, size = 1) +
coord_flip()
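For reference, the binning that geom_histogram does here can be sketched in base R with cut() and table(). This only approximates what ggplot2 does internally and uses simulated data like the answer's; the breaks are chosen to mirror boundary = 0 and binwidth = 50.

```r
set.seed(1)
x <- runif(100, 0, 500)   # simulated values, as in the answer

# Ten bins of width 50 starting at 0
bins <- cut(x, breaks = seq(0, 500, by = 50), include.lowest = TRUE)
counts <- table(bins)
counts

# Horizontal bars, one per range
barplot(counts, horiz = TRUE, las = 1)
```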
We don't have your data, but the following data frame is of a similar structure:
d <- data.frame(CTRL = sample(100, 10),
Bin.range = paste(0:9 * 50, 0:9 * 50 + 49.9, sep = "-"))
The first thing to note is that your y axis does not have your ranges ordered correctly. You have 50-99.9 at the top of the y axis. This is because your ranges are stored as characters and ggplot will automatically arrange these alphabetically, not numerically. So you need to reorder the factor levels of your ranges:
d$Bin.range <- factor(d$Bin.range, d$Bin.range)
When you create your plot, don't use d$Bin.range, but instead just use Bin.range. ggplot knows to look for this variable in the data frame you have passed.
g <- ggplot(d, aes(CTRL, Bin.range)) + geom_col()
If you want to draw a horizontal line, your two options are to specify the y axis label at which you want to draw the line (i.e. yintercept = "400-449.9") or, which is what I suspect you want, use a numeric value of 9.5 which will put it between the top two values:
g + geom_hline(yintercept = 9.5, linetype = 2)
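The value 9.5 can also be derived rather than hard-coded. A minimal sketch, assuming the bins are the ten 50-wide ranges from the example above (bin_lower, threshold and yint are names introduced here for illustration): on a discrete axis the levels sit at positions 1, 2, 3, and so on, so a bin boundary lies half a step above the number of bins that start below the threshold.

```r
bin_lower <- 0:9 * 50   # lower bound of each range, in axis order
threshold <- 450        # numeric value where the reference line should go

# Nine bins start below 450, so the boundary sits at position 9.5
yint <- sum(bin_lower < threshold) + 0.5
yint
```

With that, g + geom_hline(yintercept = yint, linetype = 2) should draw the same line as above. This only makes sense when the threshold coincides with a bin boundary.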

Color every point in a polygon depending on another dataset of points, in R

Problem:
1.) I have a shapefile that looks like this:
Extreme values for coordinates are: xmin = 300,000, xmax = 620,000, ymin = 31,000 and ymax = 190,000.
2.) I have a dataset of approx. 2 million points (every point is inside the given polygon); each one belongs to one of 5 categories.
Now, for every point inside the border (the distance between points has to be 10, so that would give us 580,800,000 points) I want to determine a color, depending on the category of the nearest point in the dataset.
In the end I would like to draw a ggplot where the color of every point depends on its category (so I'll use 5 different colors).
What I have so far:
My ideas for a solution are not optimized, and it takes R forever to determine categories for every point inside the polygon.
1.) I created a new dataset of points in the shape of a rectangle spanning the extreme coordinate values, with 10 units between points. From the new dataset I selected the points that fall inside the border of the polygon (with the function pnt.in.poly from the package SDMTools). Then I wanted to find, for every point in the polygon, the nearest point from the dataset and determine its category, but I never managed to get a subset from 580,800,000 points (obviously).
2.) I tried to take the 2 million points and color an area around each of them, depending on its category, but that did not work right.
I know that it is not possible to plot so many points and see the difference between a plot with 200,000,000 points and a plot with 1,000,000 points, but I would like to have accurate coloring when zooming in on (drawing) only one little spot in the polygon (of size 100 x 100, for example).
Question: Is there a better way of coloring so many points in a polygon (by creating a new shapefile or grouping points)?
Thank you for your ideas!
It’s really helpful if you include some data with your question, even (especially) if it’s a toy data set. As you don’t, I’ve made a toy example. First, I define a simple shape data frame and a data frame of synthetic data that includes x, y, and grp (i.e., a categorical variable with 5 levels). I crop the latter to the former and plot the results,
# Dummy shape function
df_shape <- data.frame(x = c(0, 0.5, 1, 0.5, 0),
y = c(0, 0.2, 1, 0.8, 0))
# Load library
library(ggplot2)
library(sgeostat) # For in.polygon function
# Data frame of synthetic data: random [x, y] and category (grp)
df_synth <- data.frame(x = runif(500),
y = runif(500),
grp = factor(sample(1:5, 500, replace = TRUE)))
# Remove points outside polygon
df_synth <- df_synth[in.polygon(df_synth$x, df_synth$y, df_shape$x, df_shape$y), ]
# Plot shape and synthetic data
g <- ggplot(df_shape, aes(x = x, y = y)) + geom_path(colour = "#FF3300", size = 1.5)
g <- g + ggthemes::theme_clean()
g <- g + geom_point(data = df_synth, aes(x = x, y = y, colour = grp))
g
Next, I create a regular grid and crop that using the polygon.
# Create a grid
df_grid <- expand.grid(x = seq(0, 1, length.out = 50),
y = seq(0, 1, length.out = 50))
# Check if grid points are in polygon
df_grid <- df_grid[in.polygon(df_grid$x, df_grid$y, df_shape$x, df_shape$y), ]
# Plot shape and show points are inside
g <- ggplot(df_shape, aes(x = x, y = y)) + geom_path(colour = "#FF3300", size = 1.5)
g <- g + ggthemes::theme_clean()
g <- g + geom_point(data = df_grid, aes(x = x, y = y))
g
To classify each point on this grid by the nearest point in the synthetic data set, I use knn or k-nearest-neighbours with k = 1. That gives something like this.
# Classify grid points according to synthetic data set using k-nearest neighbour
df_grid$grp <- class::knn(df_synth[, 1:2], df_grid, df_synth[, 3])
# Show categorised points
g <- ggplot()
g <- g + ggthemes::theme_clean()
g <- g + geom_point(data = df_grid, aes(x = x, y = y, colour = grp))
g
So, that's how I'd address that part of your question about classifying points on a grid.
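To make the nearest-neighbour step concrete, here is a tiny sketch with hand-picked points, where the expected label of each query point is obvious by eye. The data frames train and query and the labels are made up for illustration; class::knn is the same function used above, with k = 1 written out explicitly.

```r
library(class)   # a recommended package included in standard R installations

# Three labelled points at the corners of a triangle
train  <- data.frame(x = c(0, 1, 0), y = c(0, 0, 1))
labels <- factor(c("a", "b", "c"))

# Three query points, each placed close to one of the labelled points
query <- data.frame(x = c(0.1, 0.9, 0.2), y = c(0.1, 0.2, 0.9))

# With k = 1 every query point takes the label of its single nearest neighbour
pred <- knn(train, query, cl = labels, k = 1)
pred
```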
The other part of your question seems to be about resolution. If I understand correctly, you want the same resolution even if you're zoomed in. Also, you don't want to plot so many points when zoomed out, as you can't even see them. Here, I create a plotting function that lets you specify the resolution. First, I plot all the points in the shape with 50 points in each direction. Then, I plot a subregion (i.e., zoom), but keep the number of points in each direction the same, so that it looks much like the previous plot in terms of the number of dots.
res_plot <- function(xlim, xn, ylim, yn, df_data, df_sh){
# Create a grid
df_gr <- expand.grid(x = seq(xlim[1], xlim[2], length.out = xn),
y = seq(ylim[1], ylim[2], length.out = yn))
# Check if grid points are in polygon
df_gr <- df_gr[in.polygon(df_gr$x, df_gr$y, df_sh$x, df_sh$y), ]
# Classify grid points according to synthetic data set using k-nearest neighbour
df_gr$grp <- class::knn(df_data[, 1:2], df_gr, df_data[, 3])
g <- ggplot()
g <- g + ggthemes::theme_clean()
g <- g + geom_point(data = df_gr, aes(x = x, y = y, colour = grp))
g <- g + xlim(xlim) + ylim(ylim)
g
}
# Example plot
res_plot(c(0, 1), 50, c(0, 1), 50, df_synth, df_shape)
# Same resolution, but different limits
res_plot(c(0.25, 0.75), 50, c(0, 1), 50, df_synth, df_shape)
Created on 2019-05-31 by the reprex package (v0.3.0)
Hopefully, that addresses your question.

Detect outer rows in the dataset

I have a data set that contains the positions of objects:
library(dplyr)
library(ggplot2)
so <- data.frame(x = rep(c(1:5), each = 5), y = rep(1:5, 5))
so1 <- so %>% mutate(x = x + 5, y = y +2)
so2 <- rbind(so, so1) %>% mutate(x = x + 13, y = y + 7)
so3 <- so2 %>% mutate(x = x + 10)
ggplot(aes(x = x, y = y), data = rbind(so, so1, so2, so3)) + geom_point()
What I want to know is whether there is a method in R that can detect that an object is located in an outer row of the data set, as I have to exclude such objects from the analysis. I want to exclude the objects shown in red in the picture.
So far I have used min, max and ifelse, but this is tedious and I could not create something that generalises to different data sets with different layouts of x and y.
Is there a package that does this? And/or is it possible to solve such a problem?
You could perhaps use a "spatial" approach?
Visualizing your data as a spatial object, your problem would become to remove the borders of your patches...
This can be done quite straightforwardly using the package raster: find the boundaries and mask your data accordingly.
library(dplyr)
library(raster)
# Your reproducible example
myDF = rbind(so,so1,so2,so3)
myDF$z = 1 # there may actually be more 'z' variables
# Rasterize your data
r = rasterFromXYZ(myDF) # if there are more vars, this will be a RasterBrick
par(mfrow=c(2,2))
plot(r, main='Original data')
# Here I artificially add 1 row above and below and 1 column left and right.
# This trick is needed so that `boundaries` in the next step also flags
# the cells located at the border of your raster.
newextent = extent(r) + c(-res(r)[1], res(r)[1], -res(r)[2], res(r)[2] )
r = extend(r, newextent)
plot(r, main='Artificially extended')
plot(rasterToPoints(r, spatial=T), add=T, col='blue', pch=20, cex=0.3)
# Get the cells to remove, i.e. the boundaries
bounds = boundaries(r[[1]], asNA=T) #[[1]]: in case r is a RasterBrick
plot(bounds, main='Cells to remove (where 1)')
plot(rasterToPoints(bounds, spatial=T), add=T, col='red', pch=20, cex=0.3)
# Then mask your data (i.e. subset to remove boundaries)
subr = mask(r, bounds, maskvalue=1)
plot(subr, main='Resulting data')
plot(rasterToPoints(subr, spatial=T), add=T, col='blue', pch=20, cex=0.3)
# This is your new data (rasterToPoints skips NA cells, so the added ones do not appear)
myDF2 = rasterToPoints(subr)
Would it help you?
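If installing raster is not an option, the same idea can be sketched in base R by counting neighbours. This is an illustrative alternative, not the method above, and it assumes the points lie on a regular grid with unit spacing: a point is treated as "outer" when at least one of its 8 surrounding grid positions is empty. The helper key and the columns n_neighbours and outer are names made up for this sketch, which uses only the first patch from the question.

```r
# One 5 x 5 patch, as in the question's first data frame
so <- data.frame(x = rep(1:5, each = 5), y = rep(1:5, 5))

# Text key identifying a grid position
key <- function(x, y) paste(x, y)
present <- key(so$x, so$y)

# The 8 neighbouring offsets (all combinations of -1, 0, 1 except 0, 0)
offsets <- expand.grid(dx = -1:1, dy = -1:1)
offsets <- offsets[!(offsets$dx == 0 & offsets$dy == 0), ]

# For each point, count how many of its 8 neighbours exist in the data
has_neighbour <- sapply(seq_len(nrow(offsets)), function(i) {
  key(so$x + offsets$dx[i], so$y + offsets$dy[i]) %in% present
})
so$n_neighbours <- rowSums(has_neighbour)

# Points with a missing neighbour are on the outer edge of the patch
so$outer <- so$n_neighbours < 8
inner <- so[!so$outer, ]
```

For this patch only the central 3 x 3 block is kept. Patches that touch each other are treated as one shape, which is also how the raster approach behaves.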
