Let's say I have data looking like this:
type value
A 1
A 1
A 2
A 2
A 3
B 2
B 2
B 2
B 3
C 2
C 3
C 4
C 5
How can I plot this in one graph, so I have the A, B, and C types on the x-axis, and then the corresponding y-values for each type plotted as dots?
So kind of a scatter plot, but with fixed x-values.
Try using ggplot2. It automatically identifies categorical variables and treats them accordingly.
library(ggplot2)
# say your data frame is stored as data
ggplot(data, aes(x = type, y = value)) + geom_point()
As Ian points out, this will indeed overplot: repeated values such as the two (A, 1) rows are drawn on top of each other. If you are OK with a 'small amount of random variation to the location of each point', then + geom_jitter() is a useful way of handling overplotting.
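If you would rather stay in base R, stripchart() does the same job: one strip per category, with optional sideways jitter. A minimal sketch using the data from the question (the data frame name d is just for illustration):

```r
# Data from the question; type is a factor so the x-axis is categorical
d <- data.frame(
  type  = factor(c("A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C")),
  value = c(1, 1, 2, 2, 3, 2, 2, 2, 3, 2, 3, 4, 5)
)

# How many points share each (type, value) position, i.e. how much overplotting
overlap <- table(d$type, d$value)

# One vertical strip per type; method = "jitter" spreads the points sideways
# only, so the y-values stay exact
stripchart(value ~ type, data = d, vertical = TRUE,
           method = "jitter", jitter = 0.1, pch = 16,
           xlab = "type", ylab = "value")
```

In ggplot2 the same effect comes from geom_jitter(width = 0.1, height = 0); setting height = 0 keeps the y-values exact, which matters when y is the measurement. In stripchart(), method = "stack" is an alternative that stacks identical values instead of jittering them.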
Should be a simple question, but I haven't found exactly how to do it so far.
I have a matrix as follows:
sample var1 var2 var3 etc.
1 5 7 3 1
2 0 1 6 8
3 7 6 8 9
4 5 3 2 4
I performed a PCoA using vegan and plotted the results. Now my problem is that I want to color the samples according to a predefined group:
group sample
1 1
1 2
2 3
2 4
How can I import the groups and then plot the points colored according to the group they belong to? It looks simple, but I have been scratching my head over this.
Thanks!
Seb
You said you used vegan PCoA, which I assume means the wcmdscale function. By default vegan::wcmdscale only returns a matrix of scores, just like the standard stats::cmdscale, but if you add certain arguments (such as eig = TRUE) you get a full wcmdscale result object with dedicated plot and points methods, and you can do:
plot(<pcoa-result>, type = "n")     # no reproducible example: edit as needed
points(<pcoa-result>, col = group)  # no reproducible example: group must be visible (defined in your workspace)
If you have a modern vegan (2.5.x) the following also works:
library(magrittr)
plot(<full-pcoa-result>, type = "n") %>% points("sites", col = group)
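If you don't have a full wcmdscale object, the same idea works in plain base R: compute the scores, look up each sample's group with match(), and use that as the color. A minimal sketch, assuming Euclidean distances with stats::cmdscale (with vegan you would more likely build the distances with vegdist(), e.g. Bray-Curtis); the matrix is the one from the question, and naming its fourth column var4 is only for illustration:

```r
# Abundance matrix from the question; rows are samples 1-4
m <- matrix(c(5, 7, 3, 1,
              0, 1, 6, 8,
              7, 6, 8, 9,
              5, 3, 2, 4),
            nrow = 4, byrow = TRUE,
            dimnames = list(as.character(1:4),
                            c("var1", "var2", "var3", "var4")))  # var4 is a placeholder name

# Group table from the question (in practice, e.g. read with read.table())
grp <- data.frame(group = c(1, 1, 2, 2), sample = c(1, 2, 3, 4))

# Classical PCoA on Euclidean distances, keeping 2 axes;
# the row names of scores are the sample labels "1".."4"
scores <- cmdscale(dist(m), k = 2)

# Look up each row's group by sample name, so the colors stay correct
# even if grp is sorted differently from the matrix
cols <- grp$group[match(rownames(scores), grp$sample)]

plot(scores, col = cols, pch = 16, xlab = "PCoA 1", ylab = "PCoA 2")
legend("topright", legend = paste("group", sort(unique(cols))),
       col = sort(unique(cols)), pch = 16)
```

Using match() rather than passing grp$group directly is the design choice that matters: it ties each point to its group by sample ID instead of relying on both tables being in the same row order.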
Still learning R, and have been struggling with plotting. Below is part of my data, and I will try to explain the type of plot:
> head(bees.net.counts)
Month Block Treatment Flower Bee_Richness Bee_Abundance
1 May 1 UB POSI 1 1
2 May 2 DS ERST 4 38
3 May 2 UB RUBU 2 2
4 May 3 DS ERST 3 4
5 May 3 DS TROH 1 10
6 May 3 GS ERST 1 1
I want to make a plot where Flower is on the x-axis (there are 54 different ones), Bee_Richness or Bee_Abundance is on the y-axis, the symbol color shows Block (n=4), and the amount of shading in each symbol shows Treatment (n=3). For example, Block 1 Treatment UB is an unfilled red circle, Block 1 Treatment DS is a half-shaded red circle, and Block 1 Treatment GS is a fully shaded red circle.
The problem I have is that each row is plotted separately, instead of every point being placed above its flower species (there are multiple rows with, say, CHFA, but those represent different Blocks and Treatments).
I have also tried this by month, where I separated the four months to make different graphs (to limit the length of the x-axis). There are 10 records in May, with 4 different flower species. I still can't figure out a way to do this.
Thank you for your help!!
Edit: Here is a sketch of what I hope to get: plot idea
This uses the idea of @d.b's solution, but improves the axis labels.
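# df is your data frame, e.g. df <- bees.net.counts
flower <- factor(df$Flower)  # works whether Flower is character or factor
plot(x = as.numeric(flower), y = df$Bee_Richness,
     pch = as.numeric(as.factor(df$Block)),
     col = as.numeric(as.factor(df$Treatment)),
     xaxt = "n", xlab = "Flower", ylab = "Richness")
# label the x positions with the flower names instead of 1, 2, 3, ...
axis(1, at = seq_along(levels(flower)), labels = levels(flower))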
Some added explanation
Here the plotting character is based on the Block and the color on the Treatment. Take color/Treatment first. The trick is that when you make Treatment a factor, each value is represented internally as an integer, so as.numeric() on the factor translates DS to 1, GS to 2 and UB to 3 (factor levels are sorted alphabetically by default). That makes the argument
col = as.numeric(as.factor(df$Treatment))
give DS color 1, GS color 2 and UB color 3. In base graphics the numbers 1-8 index the colors of the current palette(), so they are easy-to-access colors. Since you only need 3, this works fine.
Similarly,
pch = as.numeric(as.factor(df$Block))
picks plotting characters 1 through 3 for the three Block values in the small test data.
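The question actually asked for color by Block and shading by Treatment. Base R's filled symbols (pch = 21 to 25) take a border color (col) and a separate fill (bg), so one way to approximate that is sketched below. It uses the six rows from head(bees.net.counts); the block colors are a hypothetical choice, and using transparency via adjustcolor() for "half shaded" is an assumption, since base R has no half-filled symbol.

```r
# The six rows shown by head(bees.net.counts) in the question
df <- data.frame(
  Month         = "May",
  Block         = c(1, 2, 2, 3, 3, 3),
  Treatment     = c("UB", "DS", "UB", "DS", "DS", "GS"),
  Flower        = c("POSI", "ERST", "RUBU", "ERST", "TROH", "ERST"),
  Bee_Richness  = c(1, 4, 2, 3, 1, 1),
  Bee_Abundance = c(1, 38, 2, 4, 10, 1)
)

flower <- factor(df$Flower)               # one x position per species
block  <- factor(df$Block, levels = 1:4)  # keep all four blocks, even if absent
# Explicit level order from the question: UB unfilled, DS half, GS full
treatment <- factor(df$Treatment, levels = c("UB", "DS", "GS"))

block_cols <- c("red", "blue", "darkgreen", "orange")  # hypothetical colors
fill_alpha <- c(0, 0.5, 1)                             # opacity per treatment

border_col <- block_cols[as.integer(block)]
# adjustcolor() takes a single alpha.f, so apply it point by point
fill_col <- mapply(adjustcolor, border_col,
                   alpha.f = fill_alpha[as.integer(treatment)],
                   USE.NAMES = FALSE)

plot(as.integer(flower), df$Bee_Richness,
     pch = 21, cex = 1.6, col = border_col, bg = fill_col,
     xaxt = "n", xlab = "Flower", ylab = "Bee richness")
axis(1, at = seq_along(levels(flower)), labels = levels(flower))
legend("topright", legend = levels(treatment), pch = 21, col = "black",
       pt.bg = c(NA, adjustcolor("black", 0.5), "black"), title = "Treatment")
```

Passing levels explicitly to factor() controls which integer each category maps to, instead of relying on alphabetical order (which gives DS = 1, GS = 2, UB = 3). A second legend for the block colors can be added the same way.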
I have a set of paired data, x and y, that I want to plot, but some of the y values are NA. How can I plot x and y only where both variables have data?
x y
10 1
2 3
4 NA # not plotted
10 40
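Try plot(na.omit(df)): na.omit() drops every row that has an NA in any column, so only complete x/y pairs are plotted. (na.pass() does not help here, because it returns the data unchanged. Base plot() also skips any point whose x or y is NA, so plot(df$x, df$y) already gives the same picture.)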
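A minimal sketch with the data from the question, selecting the complete pairs explicitly with complete.cases():

```r
df <- data.frame(x = c(10, 2, 4, 10),
                 y = c(1, 3, NA, 40))

# TRUE for rows where neither x nor y is NA
keep <- complete.cases(df)

plot(df$x[keep], df$y[keep], xlab = "x", ylab = "y")
```

The explicit filter is mostly useful when you go on to compute something from the same pairs, so the plot and the numbers are based on exactly the same rows.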
I have 2 sets of depth point measurements, for example:
> a
depth value
1 2 2
2 4 3
3 6 4
4 8 5
5 16 40
6 18 45
7 20 58
> b
depth value
1 10 10
2 12 20
3 14 35
I want to show both groups in one figure, plotted against depth and with different symbols, like this:
plot(a$value, a$depth, type='b', col='green', pch=15)
points(b$value, b$depth, type='b', col='red', pch=14)
The plot looks okay, but the annoying part is that all the green symbols are connected (I do want connecting lines, just not all of them). I want points connected only where a group has continuous measurements at a 2 m interval, i.e. the green symbols should be connected from 2 to 8 m, then the red group B symbols from 10 to 14 m, and then the green group A symbols again from 16 to 20 m. In other words, I do NOT want to see a line between the 8 m and 16 m samples of group A.
An easy solution might be to divide group A into two parts (say, A-shallow and A-deep) and plot A-shallow, B, and A-deep separately. But this is completely impractical because I have thousands of data points in hundreds of groups, i.e. I have to produce many depth profiles. So there must be a way to program it so that points are NOT connected beyond a prescribed frequency/depth interval (e.g. 2 m in this case) within a particular group of samples. Any ideas?
If plot or lines encounters an NA value, it automatically breaks the line. Using that, we can insert NA values for the missing measurements in your data, which fixes the problem. One way to do this:
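rng <- range(a$depth, b$depth)              # overall depth range
rng <- seq(rng[1], rng[2], by = 2)          # regular 2 m grid
aa <- rep(NA, length(rng))
aa[match(a$depth, rng)] <- a$value          # group A on the grid, NA elsewhere
bb <- rep(NA, length(rng))
bb[match(b$depth, rng)] <- b$value          # group B on the grid, NA elsewhere
plot(aa, rng, type = 'b', col = 'green', pch = 15,
     xlim = range(aa, bb, na.rm = TRUE))    # make room for both groups
points(bb, rng, type = 'b', col = 'red', pch = 14)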
Note that this code assumes that all depth measurements lie on a regular 2 m grid starting at the shallowest depth (true here, since every depth is even).
I'm not sure if you really have separate data.frames for all of your groups, but there may be better ways to fill in missing values depending on your real data structure.
We can use the fact that lines will put breaks in wherever there is an NA, as MrFlick suggests. There might be a simpler way, though:
# Merge the two sets together (full outer join on depth)
ab <- merge(a, b, by = 'depth', all = TRUE)
# Plot the lines; value.x is group A, value.y is group B
plot(ab$value.x, ab$depth, type = 'b', col = 'green', pch = 15)
points(ab$value.y, ab$depth, type = 'b', col = 'red', pch = 14)
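The merge() version relies on the other group having measurements inside each gap: if a group skips depths where no other group was measured, no NA row is created and the line is not broken. A more general sketch, assuming the data can be stacked into one long data frame with a group column, inserts an NA wherever consecutive depths within a group are more than 2 m apart. The function name break_gaps and its max_gap parameter are hypothetical, not from any package:

```r
a <- data.frame(depth = c(2, 4, 6, 8, 16, 18, 20),
                value = c(2, 3, 4, 5, 40, 45, 58))
b <- data.frame(depth = c(10, 12, 14),
                value = c(10, 20, 35))

# Hypothetical helper: sort one group's points by depth and insert an NA row
# after every point whose next depth is more than max_gap deeper, so
# points(type = "b") leaves a gap there instead of drawing a segment.
break_gaps <- function(depth, value, max_gap = 2) {
  o <- order(depth)
  depth <- depth[o]
  value <- value[o]
  gap_after <- which(diff(depth) > max_gap)           # last point before each gap
  n_na <- length(gap_after)
  pos <- order(c(seq_along(depth), gap_after + 0.5))  # NA slots between points
  data.frame(depth = c(depth, rep(NA, n_na))[pos],
             value = c(value, rep(NA, n_na))[pos])
}

# Long format: one row per measurement, tagged with its group
long <- rbind(data.frame(group = "A", a), data.frame(group = "B", b))
cols <- c(A = "green", B = "red")  # one color per group
pchs <- c(A = 15, B = 14)          # one symbol per group

plot(long$value, long$depth, type = "n", xlab = "value", ylab = "depth")
for (g in unique(long$group)) {
  sub <- long[long$group == g, ]
  seg <- break_gaps(sub$depth, sub$value, max_gap = 2)
  points(seg$value, seg$depth, type = "b", col = cols[[g]], pch = pchs[[g]])
}
```

For hundreds of groups, the color and symbol vectors can be generated rather than typed, e.g. setNames(rainbow(n), group_names). If the depths are not whole numbers, a slightly larger max_gap (say 2 + 1e-8) avoids rounding surprises in the diff() comparison.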
I would be very thankful for any advice on this. I think this is a similar question to one previously posted here (Too many factors on x axis).
I have a dataset as follows:
> head(outputDF)
var1 var2 snpR stepD
1 A B 1.55809163171629 6
2 A C 1.57475543745267 6
3 A D 1.36003481988361 4
4 A E 1.60338829251054 4
5 A F 1.54720598772132 5
6 B C 1.10321616677002 2
I have a nice scatterplot from the function:
ggplot(outputDF, aes(x = snpR, y = stepD)) +
  geom_point(shape = 1) +
  xlab("SNPR Distance") +
  ylab("StepD Distance")
But the problem is that since there are so many distinct snpR values on the x-axis, the x-axis numbers are unreadable, and there are far too many vertical grid lines, one for each of these x-axis labels.
I know it is a trick with scale_x_continuous but I am just lost playing around with it...
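This is a guess, since the question doesn't show str(outputDF), but one tick label and one grid line per distinct snpR value is what ggplot2 draws when snpR is a factor or character column rather than a number; a continuous axis gets a handful of automatic breaks. A minimal sketch, assuming snpR was read in as a factor (the data below rebuilds the head() output that way on purpose):

```r
# The six rows from head(outputDF); snpR is deliberately stored as a factor
# here to reproduce the suspected problem
outputDF <- data.frame(
  var1  = c("A", "A", "A", "A", "A", "B"),
  var2  = c("B", "C", "D", "E", "F", "C"),
  snpR  = factor(c("1.55809163171629", "1.57475543745267", "1.36003481988361",
                   "1.60338829251054", "1.54720598772132", "1.10321616677002")),
  stepD = c(6, 6, 4, 4, 5, 2)
)

# Convert via character: as.numeric() directly on a factor would return
# the level codes (1, 2, 3, ...) instead of the values
if (is.factor(outputDF$snpR) || is.character(outputDF$snpR)) {
  outputDF$snpR <- as.numeric(as.character(outputDF$snpR))
}

# With a numeric snpR, ggplot2 picks a few breaks by itself;
# scale_x_continuous(breaks = ...) sets them explicitly if you want to
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(outputDF, aes(x = snpR, y = stepD)) +
    geom_point(shape = 1) +
    scale_x_continuous(breaks = pretty(outputDF$snpR, n = 5)) +
    xlab("SNPR Distance") +
    ylab("StepD Distance")
  print(p)
}
```

If str(outputDF) already shows snpR as num, the conversion is skipped and only the breaks argument matters.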