Give different pch to names in a scatter plot in R?

I have these:
x=c(2,1,5,2) ; y=c(6,11,7,3)
x1=c(7,6,7,3) ; y1=c(3,9,4,3)
names(y1) = c("B", "C","A","D"); names(x1) = c("A", "B","C","D")
names(y) = c("C", "B","A","D");names(x) = c("D", "A","B","D")
plot(x1,y1,col="green")
The problem is that this plots the first value of x1 (7) against the first value of y1 (3), which pairs “A” with “B”. I would like the values to be matched by name instead, so that A from x1 (7) is plotted against A from y1 (4).
Also, I want to give a different pch to each letter and add a legend (at the moment all points are drawn as circles).
Any hints on this?

I'd recommend storing your data in data frames, not separate vectors. In this case, using data frames makes it easy to merge your x and y data so that they line up by name:
dx = data.frame(name = names(x1), x1 = x1)
dy = data.frame(name = names(y1), y1 = y1)
d = merge(dx, dy)
d
# name x1 y1
# 1 A 7 4
# 2 B 6 3
# 3 C 7 9
# 4 D 3 3
Then plotting works pretty easily, again using the data frame:
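# name is a character column in R >= 4.0, so convert it to a factor before taking integer codes
with(d, plot(x1, y1, col = "green", pch = as.integer(factor(name))))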
I'll leave adding the legend to you - just search for "how to add a legend to a plot in R", or look at ?legend.
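For reference, here is a minimal sketch of the legend step in base graphics. It rebuilds the merged data frame from the question's x1 and y1 and assumes that the pch codes are simply the integer codes of the name factor; the "topright" position is an arbitrary choice.

```r
# Rebuild the merged data from the question's named vectors
x1 <- c(A = 7, B = 6, C = 7, D = 3)
y1 <- c(B = 3, C = 9, A = 4, D = 3)
d <- merge(data.frame(name = names(x1), x1 = x1),
           data.frame(name = names(y1), y1 = y1))

# One plotting symbol per name: use the factor's integer codes as pch
d$name <- factor(d$name)
pch_codes <- as.integer(d$name)

plot(d$x1, d$y1, col = "green", pch = pch_codes, xlab = "x1", ylab = "y1")

# The legend reuses the same level-to-symbol mapping
legend("topright", legend = levels(d$name),
       pch = seq_along(levels(d$name)), col = "green")
```

Because both the plot and the legend derive their symbols from the factor levels, they stay in sync if names are added or removed.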
As a side-note, ggplot2 is very popular for plotting. It automatically adds legends, like this:
library(ggplot2)
ggplot(d, aes(x = x1, y = y1, shape = name)) +
geom_point(color = "green") +
theme_bw()
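If you would rather keep the separate named vectors, the same alignment can be sketched with name-based indexing. This assumes every name in x1 occurs exactly once in y1; y1_aligned is a name introduced here for illustration.

```r
x1 <- c(A = 7, B = 6, C = 7, D = 3)
y1 <- c(B = 3, C = 9, A = 4, D = 3)

# Reorder y1 so that its names follow the order of x1's names
y1_aligned <- y1[names(x1)]

# Each point now pairs the x and y values that share a name
plot(x1, y1_aligned, col = "green", pch = seq_along(x1), ylab = "y1")
```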

Related

R: How to get a scatter plot from matrix data with discrete x axis

I'm pretty new at R and coding so I don't know how to explain it well on this site but I couldn't find a better forum to ask.
Basically I have a 6x6 matrix with each row being a discrete gene and each column being a sample.
I want the genes as the x-axis and the y-axis being the values of the samples, so that each gene will have its 6 samples above at their respective value.
I have this matrix in Excel and when I highlight it and plot it it gives me exactly what I want.
But trying to reproduce it in R gives me a giant lattice plot at best.
I've tried boxplot(), scatterchart(), plot(), and ggplot().
I'm assuming I have to alter my matrix but I don't know how.
This may help:
library(tidyverse)
gene <- c("a", "b", "c", "d", "e", "f")
x1 <- c(1,2,3,4,5,6)
x2 <- c(2,3,4,5,-6,7)
x3 <- c(3,4,5,6,7,8)
x4 <- c(4,-5,6,7,8,9)
x5 <- c(9,8,7,6,5,4)
x6 <- c(5,4,3,2,-1,0)
df <- data.frame(gene, x1, x2, x3, x4, x5, x6) #creates data.frame
as_tibble(df) # convenient way to check data.frame values and column format types
df <- df %>% gather(sample, observation, 2:7) # here's the conversion to long format
as_tibble(df) #watch df change
#example plots
p1 <- ggplot(df, aes(x = gene, y = observation, color = sample)) + geom_point()
p1
p2 <- ggplot(df, aes(x = gene, y = observation, group = sample, color = sample)) +
geom_line()
p2
p3 <- p2 + geom_point()
p3
This is very easy to solve. If your matrix is 6x6 with one gene per row and one observation per column (thus six observations per gene), you first need to put it into long format (36 values). With such a simple layout this can be done using unlist, and you then plot the result against a vector of numbers representing the genes:
# Here I make some dummy data - a 6x6 matrix of random numbers:
df1 <- matrix(rnorm(36,0,1), ncol = 6)
# To help show which way the data unlists, and make the
# genes different, I add 4 to gene 1:
df1[1,] <- df1[1,] + 4
#### TL;DR - HERE IS THE SOLUTION ####
# Then plot it, using rep to make the x-axis data vector
plot(x = rep(1:6, times = 6), y = unlist(df1))
To improve the readability add axis labels:
# With axis labels
plot(x = rep(1:6, times = 6), y = unlist(df1),
xlab = 'Gene', ylab = 'Value')
You could also use ggplot2 with geom_point or geom_jitter, e.g.:
library(ggplot2)
ggplot() +
geom_jitter(mapping = aes(x = rep(1:6, times = 6), y = as.numeric(unlist(data.frame(df1)))))
Note that you can also create a "jitter" effect in base R using rnorm() on the x values, tweaking the amount of jittering with the last argument of the rnorm() function:
plot(x = rep(1:6, times = 6) + rnorm(36, 0, 0.05), y = unlist(df1), xlab = 'Gene', ylab = 'Value')
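The approach above relies on R storing matrices column by column, so the row (gene) index cycles fastest when the matrix is flattened. A small sketch that makes this explicit with row() and labels the axis with gene names; the gene1..gene6 and sample1..sample6 dimnames are made up for illustration.

```r
set.seed(1)
m <- matrix(rnorm(36), ncol = 6,
            dimnames = list(paste0("gene", 1:6), paste0("sample", 1:6)))

# Flatten column by column; row(m) gives the matching gene index for each value
values <- as.vector(m)
gene_index <- as.vector(row(m))

# Suppress the default numeric axis and label the ticks with the gene names
plot(gene_index, values, xaxt = "n", xlab = "Gene", ylab = "Value")
axis(1, at = seq_len(nrow(m)), labels = rownames(m))
```

Using row(m) instead of a hand-written rep() call avoids mixing up rows and columns if the matrix is not square.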

ggplot display geom_segment as a sequence of points

I am trying to display some data, where I don't only need to display a point using geom_point, but also want to trace a line to it from the axis. I figured I can do it with geom_segment, but I want to display a sequence of discrete dots instead.
Say I have a data like this:
df2 <- data_frame(x = c("a", "b", "c" ,"d"), y = c(3:6))
# A tibble: 4 × 2
x y
<chr> <int>
1 a 3
2 b 4
3 c 5
4 d 6
What I want to get is like the graph below, only having a dot in each of 4 variables between 0 and their value (with the desired points marked manually in red):
ggplot(df2, aes(x=x)) + geom_point(aes(y=y)) + geom_point(aes(y=0))
This works... you could wrap it up in a function to make it more generalizable if needed.
First we use expand.grid to create all combinations of x and 1:(max(y) - 1), join it to the original data, and filter out the unnecessary ones.
library(dplyr)
df3 = left_join(expand.grid(x = unique(df2$x), i = 1:max(df2$y - 1)),
df2) %>%
filter(i < y)
Once the data is constructed, the plotting is easy:
ggplot(df2, aes(x=x)) +
geom_point(aes(y=y)) +
geom_point(y = 0) +
geom_point(data = df3, aes(y = i), color = "red") +
expand_limits(y = 0)
I'm not sure if you actually want the dots to be red. If you want them all to look the same, you could use 1:max(df2$y) (omit the -1), use <= in the filter, and then plot only the resulting data frame.
If you wanted to use a data.table approach, using a similar expansion methodology you could use:
library(data.table)
dt <- setDT(df2)
dt_expand<-dt[rep(seq(nrow(dt)),dt$y),]
dt_expand[,y2:=(1:.N),by=.(x)]
ggplot(dt_expand, aes(x=x)) + geom_point(aes(y=y2)) + geom_point(aes(y=0))
Note I didn't include the red coloring, but that is easily done if you want it
Here is a solution that builds the data in base R. The idea is to create two different datasets, one for the red points:
dat1 <- do.call(rbind,Map(function(x,y)data.frame(x=x,y=seq(0,y)),df2$x,df2$y))
And another for the black points
dat2 <- do.call(rbind,Map(function(x,y)data.frame(x=x,y=c(0,y)),df2$x,df2$y))
Then the plot is just the juxtaposition of two layers of the same plot, each with a different dataset:
library(ggplot2)
ggplot(data=dat1,aes(x=x,y=y)) +
geom_point(col="red") +
geom_point(data=dat2)
Yet another option, which is similar to @Gregor's in that it creates a new data vector.
d <- data.frame(x = c("a", "b", "c" ,"d"), y = c(3:6))
new_points <- mapply(seq, 0, d$y)
new <- data.frame(new = unlist(lapply(new_points, as.data.frame)),
x = rep(letters[1:4], d$y + 1),
group = 1)
d <- merge(d, new, by = "x")
d$group <- as.factor(ifelse(d$y == d$new|d$new == 0, 2, d$group))
ggplot(d, aes(x, new, color = group)) +
geom_point() +
scale_color_manual(values = c("red", "black")) +
theme(legend.position = "none")
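The row-expansion idea shared by the answers above can also be sketched without any packages, using rep() and sequence() for the data and base graphics for the plot. The names n_points, expanded and endpoint are introduced here for illustration only.

```r
df2 <- data.frame(x = c("a", "b", "c", "d"), y = 3:6)

# Number of points per category: 0, 1, ..., y
n_points <- df2$y + 1

# Repeat each category n_points times and number the points from 0 within it
expanded <- data.frame(
  x = rep(df2$x, times = n_points),
  i = sequence(n_points) - 1
)

# Flag the original end points (0 and y) so they can be drawn differently
expanded$endpoint <- expanded$i == 0 |
  expanded$i == rep(df2$y, times = n_points)

plot(as.integer(factor(expanded$x)), expanded$i, pch = 19,
     col = ifelse(expanded$endpoint, "black", "red"),
     xaxt = "n", xlab = "x", ylab = "y")
axis(1, at = seq_along(df2$x), labels = df2$x)
```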

How to plot on two Y axis based on X value in R?

Here is my question. I have data like this:
A B C D
a 24 1 2 3
b 26 2 3 1
c 25 3 1 2
Now I would like to plot A on one Y axis (0 to 30) and B to D on another Y axis (0 to 5) in one graph. Also, I want the rows a, b and c to be linked by a line (let's say a, b, c represent mouse IDs). Could anyone come up with ideas on how to do it? I prefer using R. Thanks in advance!
# create some data
data = as.data.frame(list(A = c(24,26,25),
B = c(1,2,3),
C = c(2,3,1),
D = c(3,1,2)))
# adjust your margins to allow room for your second axis
par(mar=c(5, 4, 4, 4) + 0.1)
# create your first plot
plot(1:3,data$A,pch = 19,ylab = "1st ylab",xlab="index")
# set par(new = TRUE) so you don't overwrite your current plot
par(new = TRUE)
# set axes = FALSE, set your ylim and remove your labels
plot(1:3,data$B,ylim = c(0,5), pch = 19, col = 2,
xlab="", ylab="",axes = FALSE)
# add your points
points(1:3,data$C,pch = 19,col = 3)
points(1:3,data$D, pch = 19,col = 4)
# set the placement for your axis and add text
axis(4) # axis() has no ylim argument; the scale comes from the second plot() call
mtext("2nd ylab",side=4,line=2.5)
I greatly prefer using ggplot2 for plotting. Sadly, ggplot2 deliberately does not support two independent y-axes; a secondary axis there can only be a transformation of the primary one.
I would like to propose an alternative which uses facets, i.e. subplots. Note that to be able to plot the data using ggplot2, we need to change the data structure. We do this using gather from the tidyr package. In addition, I use the programming style as defined in dplyr (which uses piping a lot):
library(ggplot2)
library(dplyr)
library(tidyr)
df = data.frame(A = c(24, 26, 25), B = 1:3, C = c(2, 3, 1), D = c(3, 1, 2))
plot_data = df %>% mutate(x_value = rownames(df)) %>% gather(variable, value, -x_value)
ggplot(plot_data) + geom_line(aes(x = x_value, y = value, group = variable)) +
facet_wrap(~ variable, scales = 'free_y')
Here, each subplot has its own y-axis.
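If a single panel is still preferred, ggplot2 versions from 2.2.0 onwards provide sec_axis(), which draws a second axis as a rescaling of the first. The following is only a sketch under assumptions: the id column, the scale_factor of 30 / 5 and the axis titles are choices made here, and B to D are multiplied by that factor so they share the 0 to 30 range with A.

```r
library(ggplot2)

df <- data.frame(id = c("a", "b", "c"),
                 A = c(24, 26, 25), B = 1:3, C = c(2, 3, 1), D = c(3, 1, 2))

# Assumed mapping between the two scales: 0-5 is stretched onto 0-30
scale_factor <- 30 / 5

# Long format built in base R; B, C and D are rescaled onto A's range
long <- data.frame(
  id = rep(df$id, times = 4),
  variable = rep(c("A", "B", "C", "D"), each = nrow(df)),
  value = c(df$A, df$B * scale_factor, df$C * scale_factor, df$D * scale_factor)
)

p <- ggplot(long, aes(x = id, y = value, colour = variable, group = variable)) +
  geom_line() +
  geom_point() +
  # The secondary axis undoes the rescaling so its labels read 0 to 5
  scale_y_continuous(name = "A", limits = c(0, 30),
                     sec.axis = sec_axis(~ . / scale_factor, name = "B, C, D"))
print(p)
```

Note that both axes describe the same drawn positions, so readers need the legend to know which axis applies to which line.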

How do I plot the following in R?

I'm new to plotting in R so I ask for your help. Say I have the following matrix.
mat1 <- matrix(seq(1:6), 3)
dimnames(mat1)[[2]] <- c("x", "y")
dimnames(mat1)[[1]] <- c("a", "b", "c")
mat1
x y
a 1 4
b 2 5
c 3 6
I want to plot this, where the x-axis contains each rowname (a, b, c) and the y-axis is the value of each rowname (a = 1 and 4, b = 2 and 5, c = 3 and 6). Any help would be appreciated!
|        o
|     o  x
|  o  x
|  x
|_________
   a  b  c
Here's one way using base graphics:
plot(c(1,3),range(mat1),type = "n",xaxt ="n")
points(1:3,mat1[,2])
points(1:3,mat1[,1],pch = "x")
axis(1,at = 1:3,labels = rownames(mat1))
Edited to include different plotting symbol
matplot() was designed for data in just this format:
matplot(y = mat1, pch = c(4,1), col = "black", xaxt ="n",
xlab = "x-axis", ylab = "y-axis")
axis(1, at = 1:nrow(mat1), labels = rownames(mat1)) ## Thanks, Joran
And finally, a lattice solution
library(lattice)
dfmat <- as.data.frame(mat1)
xyplot( x + y ~ factor(rownames(dfmat)), data=dfmat, pch=c(4,1), cex=2)
You could do it in base graphics, but if you're going to use R for much more than this I think it is worth getting to know the ggplot2 package. Note that ggplot2 only takes data frames - but then, it is often more useful to keep your data in data frames rather than matrices.
library(reshape2) # provides melt()
library(ggplot2)
d <- as.data.frame(mat1) # convert to a data frame
d$cat <- rownames(d) # add the 'cat' column
dm <- melt(d, id.vars = "cat")
dm # look at dm to get an idea of what melt is doing
# define the data the plot will use, and the 'aesthetics' (i.e., how the data are mapped to visible space)
ggplot(dm, aes(x = cat, y = value, shape = variable)) +
geom_point() # represent the data with points

Smooth curve of scatter data frame data in R and add confidence interval

I have many data frames, each containing many columns. Say my first data.frame has col1=a, col2=b, col3=c.
I want to plot b/a on the x-axis and a on the y-axis. I managed to plot them (scatter plot):
plot (dataframe$b/dataframe$a, dataframe$a, xlim=...,ylim=..)
Now I need to get the pattern of the scattered data (I don't want linear regression, as both x and y are changing). I used the command loess(..) and was able to show the pattern.
lo_smooth<-loess(x,y, f=number, iter=number)
How can I add confidence intervals (CI) to the graph? My goal is to check whether two data frames fall within each other's CIs.
A solution that uses your attempt (Well done!) plus your clarification
Some dummy data
dppm <- data.frame(a = runif(100, 1, 100), b = runif(100,1, 100))
dppm_2 <- data.frame(a = runif(100, 1, 75), b = runif(100,1,75))
dppm_3 <- data.frame(a = runif(100, 1,50), b = runif(100,1,50))
Using reshape2 to melt these data into a single data frame
library(reshape2)
data_list <- list(dppm1 = dppm, dppm2 = dppm_2, dppm3 = dppm_3)
all_data <- melt(data_list, id.vars = c('a','b'))
This single data frame has a column L1 that is the identifier (the name of the list component in data_list).
head(all_data)
a b L1
## 1 83.202896 36.94026 dppm1
## 2 42.618987 11.23863 dppm1
## 3 29.505029 11.91742 dppm1
## 4 63.569487 59.07395 dppm1
## 5 94.499772 47.32779 dppm1
## 6 4.535389 64.11570 dppm1
We can then plot this combined data set and colour by this identifier.
We also set the fill for the smooth to the same identifier so that the CIs will be coloured in the same way.
library(ggplot2)
ggplot(all_data,aes(x = b/a, y = a, colour = L1)) +
geom_point() +
stat_smooth(method = "loess", se = TRUE,level = 0.90, aes(fill = L1))+
coord_cartesian(ylim = c(0, 100))
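If you need the interval values themselves rather than just the shaded band, a rough base R sketch is to fit loess() with a formula and ask predict() for standard errors. The data below is synthetic (x stands in for b/a and y for a), and the t-based 90% band is one common construction, not necessarily the exact one stat_smooth draws.

```r
set.seed(42)
# Synthetic stand-in data; replace with your own b/a and a columns
x <- runif(100, 0, 10)
y <- 50 + 10 * sin(x) + rnorm(100, sd = 3)

fit <- loess(y ~ x)

# Predict on a grid inside the observed range (loess does not extrapolate by default)
grid <- data.frame(x = seq(min(x), max(x), length.out = 50))
pred <- predict(fit, newdata = grid, se = TRUE)

# Pointwise 90% band from the standard errors and the t distribution
crit <- qt(0.95, df = pred$df)
lower <- pred$fit - crit * pred$se.fit
upper <- pred$fit + crit * pred$se.fit

plot(x, y, xlab = "b/a (synthetic)", ylab = "a (synthetic)")
lines(grid$x, pred$fit)
lines(grid$x, lower, lty = 2)
lines(grid$x, upper, lty = 2)
```

Computing lower and upper on a shared grid for two data frames lets you compare the bands numerically, for example by checking where one fitted curve falls outside the other's interval.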
