How do I plot the following in R? - r

I'm new to plotting in R so I ask for your help. Say I have the following matrix.
mat1 <- matrix(seq(1:6), 3)
dimnames(mat1)[[2]] <- c("x", "y")
dimnames(mat1)[[1]] <- c("a", "b", "c")
mat1
  x y
a 1 4
b 2 5
c 3 6
I want to plot this, where the x-axis contains each rowname (a, b, c) and the y-axis is the value of each rowname (a = 1 and 4, b = 2 and 5, c = 3 and 6). Any help would be appreciated!
|     o
|   o x
| o x
| x
|_______
  a b c

Here's one way using base graphics:
plot(c(1,3),range(mat1),type = "n",xaxt ="n")
points(1:3,mat1[,2])
points(1:3,mat1[,1],pch = "x")
axis(1,at = 1:3,labels = rownames(mat1))
Edited to include different plotting symbol

matplot() was designed for data in just this format:
matplot(y = mat1, pch = c(4,1), col = "black", xaxt ="n",
xlab = "x-axis", ylab = "y-axis")
axis(1, at = 1:nrow(mat1), labels = rownames(mat1)) ## Thanks, Joran

And finally, a lattice solution
library(lattice)
dfmat <- as.data.frame(mat1)
xyplot( x + y ~ factor(rownames(dfmat)), data=dfmat, pch=c(4,1), cex=2)

You could do it in base graphics, but if you're going to use R for much more than this I think it is worth getting to know the ggplot2 package. Note that ggplot2 only takes data frames - but then, it is often more useful to keep your data in data frames rather than matrices.
library(reshape2) #provides melt()
library(ggplot2)
d <- as.data.frame(mat1) #convert to a data frame
d$cat <- rownames(d) #add the 'cat' column
dm <- melt(d, id.vars = "cat") #reshape to long format, keeping 'cat' as the identifier
dm #look at dm to get an idea of what melt is doing
ggplot(dm, aes(x=cat, y=value, shape=variable)) + #define the data the plot will use, and the 'aesthetics' (i.e., how the data are mapped to visible space)
  geom_point() #represent the data with points
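If you would rather not load a reshaping package, the same long format can be sketched in base R. This is only an illustration of what the melt step produces; the object name `long` and its column names are chosen here to mirror the ggplot2 example and are not from the original answer.

```r
# Matrix from the question
mat1 <- matrix(1:6, 3, dimnames = list(c("a", "b", "c"), c("x", "y")))

# as.table() + as.data.frame() gives one row per matrix cell (long format)
long <- as.data.frame(as.table(mat1))
names(long) <- c("cat", "variable", "value")
long

# Base-graphics version of the same plot, one symbol per original column
plot(as.integer(long$cat), long$value,
     pch = c(4, 1)[as.integer(long$variable)],
     xaxt = "n", xlab = "", ylab = "value")
axis(1, at = seq_along(levels(long$cat)), labels = levels(long$cat))
```

Each row of `long` holds one (row name, column name, value) triple, which is the shape that ggplot2 and lattice expect.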

Related

R: How to get a scatter plot from matrix data with discrete x axis

I'm pretty new to R and to coding, so I may not explain this well, but I couldn't find a better forum to ask.
Basically, I have a 6x6 matrix in which each row is a gene and each column is a sample.
I want the genes on the x-axis and the sample values on the y-axis, so that each gene has its 6 samples plotted above it at their respective values.
I have this matrix in Excel, and when I highlight it and plot it, I get exactly what I want.
But trying to replicate it in R gives me a giant lattice plot at best.
I've tried boxplot(), scatterchart(), plot(), and ggplot().
I'm assuming I have to alter my matrix, but I don't know how.
This may help:
library(tidyverse)
gene <- c("a", "b", "c", "d", "e", "f")
x1 <- c(1,2,3,4,5,6)
x2 <- c(2,3,4,5,-6,7)
x3 <- c(3,4,5,6,7,8)
x4 <- c(4,-5,6,7,8,9)
x5 <- c(9,8,7,6,5,4)
x6 <- c(5,4,3,2,-1,0)
df <- data.frame(gene, x1, x2, x3, x4, x5, x6) #creates data.frame
as_tibble(df) # convenient way to check data.frame values and column format types
df <- df %>% gather(sample, observation, 2:7) # here's the conversion to long format
as_tibble(df) #watch df change
#example plots
p1 <- ggplot(df, aes(x = gene, y = observation, color = sample)) + geom_point()
p1
p2 <- ggplot(df, aes(x = gene, y = observation, group = sample, color = sample)) +
geom_line()
p2
p3 <- p2 + geom_point()
p3
This is easy to solve. If your matrix is 6x6 with one gene per row and one observation per column (thus six observations per gene), you first need to convert it to long format (36 values). With such a simple layout this can be done using unlist, and you then plot those values against a vector of numbers representing the genes:
# Here I make some dummy data - a 6x6 matrix of random numbers:
df1 <- matrix(rnorm(36,0,1), ncol = 6)
# To help show which way the data unlists, and make the
# genes different, I add 4 to gene 1:
df1[1,] <- df1[1,] + 4
#### TL;DR - HERE IS THE SOLUTION ####
# Then plot it, using rep to make the x-axis data vector
plot(x = rep(1:6, times = 6), y = unlist(df1))
To improve the readability add axis labels:
# With axis labels
plot(x = rep(1:6, times = 6), y = unlist(df1),
xlab = 'Gene', ylab = 'Value')
You could also use ggplot2 with geom_point or geom_jitter, e.g.:
ggplot() +
geom_jitter(mapping = aes(x = rep(1:6, times = 6), y = as.numeric(unlist(data.frame(df1)))))
Note that you can also create a "jitter" effect in base R using rnorm() on the x values, tweaking the amount of jittering with the last argument of the rnorm() function:
plot(x = rep(1:6, times = 6) + rnorm(36, 0, 0.05), y = unlist(df1), xlab = 'Gene', ylab = 'Value')
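To check that the x vector really lines up with the flattened matrix, you can build the gene index from the matrix itself. The sketch below assumes genes are in rows, as in the question; the names `m`, `gene1`..`gene6` and `s1`..`s6` are made up for illustration.

```r
# Hypothetical 6x6 matrix: genes in rows, samples in columns
set.seed(1)
m <- matrix(rnorm(36), nrow = 6,
            dimnames = list(paste0("gene", 1:6), paste0("s", 1:6)))

# as.vector() flattens column by column, so the row (gene) index cycles 1..6
values   <- as.vector(m)
gene_idx <- as.vector(row(m))   # equals rep(1:6, times = 6) for a 6x6 matrix

# Plot with the gene names on a discrete x-axis
plot(gene_idx, values, xaxt = "n", xlab = "Gene", ylab = "Value")
axis(1, at = seq_len(nrow(m)), labels = rownames(m))
```

Because R stores matrices column-major, `rep(1:6, times = 6)` is the right companion vector when genes are rows; if genes were in columns you would need `rep(1:6, each = 6)` instead.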

give different pch to names in scatter plot R?

I have these:
x=c(2,1,5,2) ; y=c(6,11,7,3)
x1=c(7,6,7,3) ; y1=c(3,9,4,3)
names(y1) = c("B", "C","A","D"); names(x1) = c("A", "B","C","D")
names(y) = c("C", "B","A","D");names(x) = c("D", "A","B","D")
plot(x1,y1,col="green")
The problem here is that it takes the first value of x1 (7) and the first value of y1 (3) and plots them together, which pairs “A” with “B”. I would like the values to be matched by name, so that A from x1 (7) is plotted against A from y1 (4).
I also want to give each letter a different pch and add a legend (at the moment all points in the plot are circles).
Any hints on this?
I'd recommend storing your data in data frames, not separate vectors. In this case, using data frames makes it easy to merge your x and y data so that they line up by name:
dx = data.frame(name = names(x1), x1 = x1)
dy = data.frame(name = names(y1), y1 = y1)
d = merge(dx, dy)
d
#   name x1 y1
# 1    A  7  4
# 2    B  6  3
# 3    C  7  9
# 4    D  3  3
Then plotting works pretty easily, again using the data frame:
# factor() is needed because name is a character column from R 4.0.0 onward
with(d, plot(x1,y1,col="green", pch = as.integer(factor(name))))
I'll leave adding the legend to you - just search for "how to add a legend to a plot in R", or look at ?legend.
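For reference, here is one possible way to add the legend in base graphics. It is a sketch rather than the only approach: the helper vector `pch_by_name` and the legend position are choices made for this example.

```r
# Data from the question, merged by name
x1 <- c(A = 7, B = 6, C = 7, D = 3)
y1 <- c(B = 3, C = 9, A = 4, D = 3)
d <- merge(data.frame(name = names(x1), x1 = x1),
           data.frame(name = names(y1), y1 = y1))

# One plotting symbol per name; factor() makes the integer codes explicit
d$name <- factor(d$name)
pch_by_name <- seq_along(levels(d$name))   # 1, 2, 3, 4

plot(d$x1, d$y1, col = "green", pch = pch_by_name[as.integer(d$name)],
     xlab = "x1", ylab = "y1")
legend("topleft", legend = levels(d$name), pch = pch_by_name, col = "green")
```

Passing the same `pch_by_name` vector to both `plot()` and `legend()` keeps the symbols in the legend in sync with the points.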
As a side-note, ggplot2 is very popular for plotting. It automatically adds legends, like this:
library(ggplot2)
ggplot(d, aes(x = x1, y = y1, shape = name)) +
geom_point(color = "green") +
theme_bw()

creating histogram bins in r

I have this code.
a = c("a", 1)
b = c("b",2)
c = c('c',3)
d = c('d',4)
e = c('e',5)
z = data.frame(a,b,c,d,e)
hist = hist(as.numeric(z[2,]))
I am trying to have a histogram such that the bins would be a,b,c,d,e
and the freq values would be 1,2,3,4,5.
However, it gives me an empty screen(no bins at all for histogram model)
You are plotting the factor levels of each column for row 2, which in this case are always 1.
Add stringsAsFactors=FALSE when creating the data frame to avoid converting the numbers to factors (this is already the default from R 4.0.0 onward). This should work:
z = data.frame(a,b,c,d,e,stringsAsFactors=FALSE)
hist(as.numeric(z[2,]))
Perhaps this would work for you: it creates a data frame with the x elements being the letters 'a' through 'e', and the y elements being the numbers 1 through 5. It then draws one bar per letter with the height given by y, so ggplot does not perform any binning.
library(ggplot2)
tmp <- data.frame(x = letters[1:5], y = 1:5)
ggplot(tmp, aes(x = x, y = y)) + geom_col()
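Since the goal is labelled bars with given heights rather than binned counts, base R's barplot() may be closer to what was asked. The sketch below rebuilds the asker's data frame and turns its two rows into a named vector; the name `freq` is introduced here for illustration.

```r
# Data frame from the question, keeping the values as character strings
a <- c("a", 1); b <- c("b", 2); c <- c("c", 3); d <- c("d", 4); e <- c("e", 5)
z <- data.frame(a, b, c, d, e, stringsAsFactors = FALSE)

# Row 1 holds the labels, row 2 the heights
freq <- setNames(as.numeric(unlist(z[2, ])), unname(unlist(z[1, ])))

# Names become the bar labels, values the bar heights
mids <- barplot(freq, xlab = "bin", ylab = "freq")
```

barplot() returns the x positions of the bar midpoints, which is handy if you later want to add text or points above the bars.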

R plot 2D surface of a matrix of numbers

Given an n*p matrix of numbers, I am trying to plot a graph with n*p squares, where the colour of each square depends on the corresponding number in the matrix.
The matrix is defined as follows:
ll <- list(c(1,3,4,3,6,5,8),c(1,1,4,5,7,6,8),c(1,3,1,1,3,4,8),c(2,1,1,2,1,3,5))
mm <- do.call(rbind,ll)
More generally, I would like to define colours for groups of numbers.
For example:
Yellow for the group {1,2}
Orange for the group {3,4,5}
Red for the group {6,7,8}
And then "plot" the matrix, like the colourful matplotlib picture at this link:
http://activeintelligence.org/blog/archive/matplotlib-sparse-matrix-plot/
I really have no clue how to do it, and any point of view would be greatly appreciated!
cc <- mm # make copy to modify
cc[] <- findInterval(cc, c(0, 2.5, 5.5, 8.5 ) ) # values 1:3
cc
image(seq(dim(cc)[1]), seq(dim(cc)[2]), cc, col=c("yellow","orange","red"))
The values in the cc -matrix will pull from the color vector.
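To see how findInterval assigns each matrix value to a colour, it can help to run it on the possible values 1 to 8 on their own. This is a small illustration using the same break points as above; the names `breaks`, `cols` and `grp` are introduced here.

```r
# Group boundaries: [0, 2.5), [2.5, 5.5), [5.5, 8.5)
breaks <- c(0, 2.5, 5.5, 8.5)
cols   <- c("yellow", "orange", "red")

# Group index (1, 2 or 3) for each possible matrix value
grp <- findInterval(1:8, breaks)

# Lookup table: value, group index and the colour it will be drawn in
data.frame(value = 1:8, group = grp, colour = cols[grp])
```

Placing the breaks halfway between integers avoids any ambiguity about which group a boundary value falls into.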
I assume that the position of each square is given by its row and column index in the matrix. You could handle this with ggplot2 and reshape2.
ll <- list(c(1,3,4,3,6,5,8),c(1,1,4,5,7,6,8),c(1,3,1,1,3,4,8),c(2,1,1,2,1,3,5))
mm <- do.call(rbind,ll)
rownames(mm) = 1:nrow(mm)
colnames(mm) = 1:ncol(mm)
library(reshape2)
library(ggplot2)
mm_long = melt(mm)
colnames(mm_long) = c("x", "y", "group")
mm_long$colour_group = NA
mm_long$colour_group[mm_long$group %in% c(1,2)] = 1
mm_long$colour_group[mm_long$group %in% c(3,4,5)] = 2
mm_long$colour_group[mm_long$group %in% c(6,7,8)] = 3
mm_long$group = factor(mm_long$group)
mm_long$colour_group = factor(mm_long$colour_group)
ggplot(mm_long, aes(x=x, y=y)) +
geom_point(aes(colour=colour_group), shape=15, size=10) +
scale_colour_manual(values = c("yellow","orange", "red"))
Basically following suggestions from @MrFlick & @BondedDust:
cols=c(rep("yellow", 2), rep("orange", 3), rep("red", 3))
image(1:ncol(mm), 1:nrow(mm), t(mm), col=cols, breaks=c(0:length(cols))+0.5, xlab="", ylab="")
or
heatmap(mm, col=cols, breaks=c(0:length(cols))+0.5, Colv=NA, Rowv=NA, scale="none")

smooth curve of scatter data frame data in R and add confidence interval

I have many data frames, each containing many columns. Say my first data frame has col1=a, col2=b, col3=c.
I want to plot b/a on the x-axis and a on the y-axis. I managed to plot them as a scatter plot:
plot (dataframe$b/dataframe$a, dataframe$a, xlim=...,ylim=..)
Now I need to get the pattern of the scattered data (I don't want linear regression, as both x and y are changing). I used the command loess(..) and was able to show the pattern.
lo_smooth<-loess(x,y, f=number, iter=number)
How can I add confidence intervals (CI) to the graph? My goal is to check whether two data frames fall within each other's CI or not.
A solution that uses your attempt (Well done!) plus your clarification
Some dummy data
dppm <- data.frame(a = runif(100, 1, 100), b = runif(100,1, 100))
dppm_2 <- data.frame(a = runif(100, 1, 75), b = runif(100,1,75))
dppm_3 <- data.frame(a = runif(100, 1,50), b = runif(100,1,50))
Using reshape2 to melt these data into a single data frame
library(reshape2)
data_list <- list(dppm1 = dppm, dppm2 = dppm_2, dppm3 = dppm_3)
all_data <- melt(data_list, id.vars = c('a','b'))
This single data frame has a column L1 that is the identifier (the name of the list component in data_list).
head(all_data)
##           a        b    L1
## 1 83.202896 36.94026 dppm1
## 2 42.618987 11.23863 dppm1
## 3 29.505029 11.91742 dppm1
## 4 63.569487 59.07395 dppm1
## 5 94.499772 47.32779 dppm1
## 6  4.535389 64.11570 dppm1
We can then plot this combined data set and colour by this identifier.
We also set the fill for the smooth to the same identifier so that the CIs will be coloured in the same way.
library(ggplot2)
ggplot(all_data,aes(x = b/a, y = a, colour = L1)) +
  geom_point() +
  stat_smooth(method = "loess", se = TRUE,level = 0.90, aes(fill = L1))+
  coord_cartesian(ylim = c(0, 100))
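If you need the interval as numbers, for example to test whether one curve lies inside another's band, a rough base R sketch is to ask predict() for standard errors and build a t-based pointwise interval. The data frame `df`, the grid size and the 90% level are assumptions made for this example.

```r
set.seed(42)
# Hypothetical data frame with the columns a and b from the question
df <- data.frame(a = runif(100, 10, 100), b = runif(100, 10, 100))
df$ratio <- df$b / df$a

# Fit the smoother and predict on an evenly spaced grid inside the data range
fit  <- loess(a ~ ratio, data = df)
grid <- data.frame(ratio = seq(min(df$ratio), max(df$ratio), length.out = 50))
pr   <- predict(fit, newdata = grid, se = TRUE)

# Approximate 90% pointwise interval from the standard errors
crit  <- qt(0.95, pr$df)
lower <- pr$fit - crit * pr$se.fit
upper <- pr$fit + crit * pr$se.fit

plot(df$ratio, df$a, xlab = "b/a", ylab = "a")
lines(grid$ratio, pr$fit)
lines(grid$ratio, lower, lty = 2)
lines(grid$ratio, upper, lty = 2)
```

Evaluating two fitted models on the same grid gives `lower` and `upper` vectors that can be compared element by element. Keep in mind that these are pointwise intervals, not a simultaneous band for the whole curve.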
