Too many x-axis values on ggplot2 scatterplot - r

I would be very thankful for anyone with advice on this. I think this is a similar to question to one previously posted here (Too many factors on x axis).
I have a dataset as follows:
> head(outputDF)
var1 var2 snpR stepD
1 A B 1.55809163171629 6
2 A C 1.57475543745267 6
3 A D 1.36003481988361 4
4 A E 1.60338829251054 4
5 A F 1.54720598772132 5
6 B C 1.10321616677002 2
I have a nice scatterplot from the function:
ggplot(outputDF, aes(x=snpR, y=stepD)) +geom_point(shape=1) +xlab("SNPR Distance") +
ylab("StepD Distance")
But the problem is that since there are so many distinct snpR values on the x-axis, the x-axis numbers are unreadable, and there are too many vertical grids coming off each of these x-axis number labels.
I know it is a trick with scale_x_continuous but I am just lost playing around with it...

Related

Connect observations with lines to a common point (i.e. (0,0))

I would like to connect observations from my df with a common point, i.e. the centerpoint (0,0) using ggplot2.
x y
1 5 4
2 -4 -2
3 -1 5
4 2 -8
Using geom_point(), I get the following.
Now, I would like to have lines connecting the four observations with the centerpoint at (0,0), like in the following (not made with R):
Is this possible at all using ggplot2?
I found a solution:
ggplot(df) + geom_point(aes(x,y)) + geom_segment(aes(xend=0, yend=0))
Answer based on #roland comments on a question.

divide not rectangle plot into subplots within spatstat package in R

I have data that contains information about sub-plots with different numbers and their corresponding species types (more than 3 species within each subplot). Every species have X & Y coordinates.
> df
subplot species X Y
1 1 Apiaceae 268675 4487472
2 1 Ceyperaceae 268672 4487470
3 1 Vitaceae 268669 4487469
4 2 Ceyperaceae 268665 4487466
5 2 Apiaceae 268662 4487453
6 2 Magnoliaceae 268664 4487453
7 3 Magnoliaceae 268664 4487453
8 3 Apiaceae 268664 4487456
9 3 Vitaceae 268664 4487458
with these data, I have created ppp for the points of each subplot within a window of general plot (big).
grp <- factor(data$subplot)
win <- ripras(data$X, data$Y)
p.p <- ppp(data$X, data$Y, window = window, marks = grp)
Now I want to divide a plot into equal 3 x 3 sub-plots because there are 9 subplots. The genetal plot is not rectangular looks similar to rombo shape when I plot.
I could use quadrats() funcion as below but it has divided my plot into unequal subplots. Some are quadrat, others are traingle etc which I don't want. I want all the subplots to be equal sized quadrats (divide it by lines that paralel to each sides). Can you anyone guide me for this?
divide <-quadrats(p.patt,3,3)
plot(divide)
Thank you!
Could you break up the plot canvas into 3x3, then run each plot?
> par(mfrow=c(3,3))
> # run code for plot 1
> # run code for plot 2
...
> # run code for plot 9
To return back to one plot on the canvas type
> par(mfrow=c(1,1))
This is a question about the spatstat package.
You can use the function quantess to divide the window into tiles of equal area. If you want the tile boundaries to be vertical lines, and you want 7 tiles, use
B <- quantess(Window(p.patt), "x", 7)
where p.patt is your point pattern.

Color the individuals of a R PCoA plot by groups

Should be a simple question, but I haven't found exactly how to do it so far.
I have a matrix as follow:
sample var1 var2 var3 etc.
1 5 7 3 1
2 0 1 6 8
3 7 6 8 9
4 5 3 2 4
I performed a PCoA using Vegan and plotted the results. Now my problem is that I want to color the samples according to a pre-defined group:
group sample
1 1
1 2
2 3
2 4
How can I import the groups and then plot the points colored according to the group tey belong to? It looks simple but I have been scratching my head over this.
Thanks!
Seb
You said you used vegan PCoA which I assume to mean wcmdscale function. The default vegan::wcmdscale only returns a scores matrix similarly as standard stats::cmdscale, but if you added some special arguments (such as eig = TRUE) you get a full wcmdscale result object with dedicated plot and points methods and you can do:
plot(<pcoa-result>, type="n") # no reproducible example: edit like needed
points(<pcoa-result>, col = group) # no reproducible example: group must be visible
If you have a modern vegan (2.5.x) the following also works:
library(magrittr)
plot(<full-pcoa-result>, type = "n") %>% points("sites", col = group)

Plotting numeric y-value data with non-numeric x-values in R?

Let's say I have data looking like this:
type value
A 1
A 1
A 2
A 2
A 3
B 2
B 2
B 2
B 3
C 2
C 3
C 4
C 5
How can I plot this in one graph, so I have the A, B, and C types on the x-axis, and then the corresponding y-values for each type plotted as dots?
So kind of a scatter plot, but with fixed x-values.
Try using ggplot2. It automatically identifies categorical variables and treats them accordingly.
library(ggplot)
#say your dataframe is stored as data
ggplot(aes(x=data$type,y=data$value))+geom_point()
As Ian points out, this will indeed over plot. You can read about it here. So if you are ok with a 'small amount of random variation to the location of each point', then +geom_jitter is a useful way of handling overplotting.

Connecting grouped dots/points on a scatter plot based on distance

I have 2 sets of depth point measurements, for example:
> a
depth value
1 2 2
2 4 3
3 6 4
4 8 5
5 16 40
6 18 45
7 20 58
> b
depth value
1 10 10
2 12 20
3 14 35
I want to show both groups in one figure plotted with depth and with different symbols as you can see here
plot(a$value, a$depth, type='b', col='green', pch=15)
points(b$value, b$depth, type='b', col='red', pch=14)
The plot seems okay, but the annoying part is that the green symbols are all connected (though I want connected lines also). I want connection only when one group has a continued data points at 2 m interval i.e. the symbols should be connected with a line from 2 to 8 m (green) and then group B symbols should be connected from 10-14 m (red) and again group A symbols should be connected (green), which means I do NOT want to see the connection between 8 m sample with the 16 m for group A.
An easy solution may be dividing the group A into two parts (say, A-shallow and A-deep) and then plotting A-shallow, B, and A-deep separately. But this is completely impractical because I have thousands of data points with hundreds of groups i.e. I have to produce many depth profiles. Therefore, there has to be a way to program so that dots are NOT connected beyond a prescribed frequency/depth interval (e.g. 2 m in this case) for a particular group of samples. Any idea?
If plot or lines encounters and NA value, it will automatically break the line. Using that, we can insert NA values for missing measurements in your data and that would fix the problem. One way is this
rng<-range(range(a$depth), range(b$depth))
rng<-seq(rng[1], rng[2], by=2)
aa<-rep(NA, length(rng))
aa[match(a$depth, rng)]<-a$value
bb<-rep(NA, length(rng))
bb[match(b$depth, rng)]<-b$value
plot(aa, rng, type='b', col='green', pch=15)
points(bb, rng, type='b', col='red', pch=14)
Which produces
Note that this code assumes that all depth measurements are evenly divisible by 2.
I'm not sure if you really have separate data.frames for all of your groups, but there may be better ways to fill in missing values depending on your real data structure.
We can use the fact that lines will but breaks in when there is a NA, like MrFlick suggests. There might be a simpler way, though:
#Merge the two sets together
all = merge(a,b,by='depth', all=T)
#Plot the lines
plot(all$value.x, all$depth, type='b', col='green', pch=15)
points(all$value.y, all$depth, type='b', col='red', pch=14)

Resources