Plotting a matrix using ggplot2 in R

I want to use ggplot2 to plot the distribution of 5 variables corresponding to a matrix's column names:
a <- matrix(runif(5),ncol=5,nrow=1)
colnames(a) <- c("a","b","c","d","e")
rownames(a) <- c("count")
I tried:
ggplot(data=melt(a),aes(x=colnames(a),y=a[,1]))+ geom_point()
However, this gives a result as if all columns had the same y value.
Note: I'm using the reshape package for the melt() function.

All columns look like they have the same y-value because you are only specifying one number in the y= statement. You are saying y=a[,1], and if you type a[,1] at the console you will find it is a single value (0.556 in my run, the number at which everything appears). I think this is what you want:
library(reshape2)
library(ggplot2)
a_melt <- melt(a)
ggplot(data=a_melt,aes(x=Var2,y=value))+ geom_point()
Note that I saved the melted data as a new dataset called a_melt so that things are easier to reference. Also, since the data was melted, it is cleaner to define the x-values as the Var2 column of a_melt rather than the column names of a.
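To see what the long format looks like, here is a minimal sketch that builds the same kind of one-row-per-cell table with base R instead of melt(). The seed, the name a_long and the renamed columns are assumptions made for illustration; melt() from reshape2 names its columns Var1, Var2 and value instead.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the example is reproducible
a <- matrix(runif(5), nrow = 1,
            dimnames = list("count", c("a", "b", "c", "d", "e")))

# as.table() + as.data.frame() yields one row per matrix cell
a_long <- as.data.frame(as.table(a))
names(a_long) <- c("row", "column", "value")  # illustrative column names

# each column of the matrix now has its own y value
p <- ggplot(a_long, aes(x = column, y = value)) + geom_point()
print(p)
```

Because every cell sits in its own row, aes() can map the column name and the value from the same data frame, which is what the original attempt was missing.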

Related

Print histograms including variable name for all variables in R

I'm trying to generate a simple histogram for every variable in my dataframe, which I can do using sapply below. But, how can I include the name of the variable in either the title or the x-axis so I know which one I'm looking at? (I have about 20 variables.)
Here is my current code:
x = # initialize dataframe
sapply(x, hist)
Here's a way to modify your existing approach to include column name as the title of each histogram, using the iris dataset as an example:
# loop over column *names* instead of actual columns
sapply(names(iris), function(cname){
  # (make sure we only plot the numeric columns)
  if (is.numeric(iris[[cname]]))
    # use the `main` param to put column name as plot title
    print(hist(iris[[cname]], main=cname))
})
After you run that, you'll be able to flip through the plots with the arrows in the viewer pane (assuming you're using RStudio).
P.S. Check out grid::grob(), gridExtra::grid.arrange(), and related functions if you want to arrange the histograms onto a single plot window and save them to a single file.
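If you would rather stay in base graphics, one possible alternative to the grid functions is par(mfrow = ...), which splits a single device into panels. The sketch below assumes a 2 x 2 layout is enough (iris has four numeric columns) and writes to a temporary PDF; the file name is hypothetical.

```r
# keep only the numeric columns of iris
num_cols <- names(iris)[sapply(iris, is.numeric)]

# hypothetical output file in the session's temporary directory
out_file <- file.path(tempdir(), "iris_histograms.pdf")

pdf(out_file, width = 8, height = 8)
old_par <- par(mfrow = c(2, 2))   # 2 x 2 grid of panels on one page
for (cname in num_cols) {
  hist(iris[[cname]], main = cname, xlab = cname)
}
par(old_par)                      # restore the previous layout settings
dev.off()
```

With about 20 variables you would need a larger grid, for example par(mfrow = c(4, 5)), and probably a larger page.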
How about this? Assuming you have wide data, you can transform it to long format with gather, then build a ggplot solution with geom_histogram and facet_wrap:
library(tidyverse)
# make wide data (20 columns)
df <- matrix(rnorm(1000), ncol = 20)
df <- as.data.frame(df)
colnames(df) <- LETTERS[1:20]
# transform to long format (2 columns)
df <- gather(df, key = "name", value = "value")
# plot histograms per name
ggplot(df) +
  geom_histogram(aes(value)) +
  facet_wrap(~name, ncol = 5)
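gather() still works, but the tidyr documentation describes it as superseded by pivot_longer(). A rough equivalent of the reshaping step, assuming a recent tidyr (1.0.0 or later), might look like this; wide and long are illustrative names, and the seed and bins = 30 are assumptions.

```r
library(tidyr)
library(ggplot2)

set.seed(123)  # assumption: fixed seed for reproducibility
wide <- as.data.frame(matrix(rnorm(1000), ncol = 20))
colnames(wide) <- LETTERS[1:20]

# one row per (column name, value) pair
long <- pivot_longer(wide, cols = everything(),
                     names_to = "name", values_to = "value")

# one histogram panel per original column
ggplot(long) +
  geom_histogram(aes(value), bins = 30) +
  facet_wrap(~name, ncol = 5)
```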

Using an apply function with ggplot2 to create bar plots for more than one variable in a data.frame

Is there a way to use an apply function in R in order to create barplots with ggplot2?
Say we have a data frame containing only factor variables, one of which is boolean. In my case I have a data frame with 40+ variables. Can one plot all the variables against the boolean one with a single line of code?
data("diamonds")
factors <- sapply(diamonds, function(x) is.factor(x))
factors_only <- diamonds[,factors]
factors_only$binary <- sample(c(1, 0), nrow(factors_only), replace=TRUE)
factors_only$binary <- as.factor(factors_only$binary)
But I want to create barplots like this one:
qplot(factors_only$color, data=factors_only, geom="bar", fill=factors_only$binary)
This does not work:
sapply(factors_only,function(x) qplot(x, data=factors_only, geom="bar", fill=binary))
Please advise
You can use lapply to run through the variable names and then use get to pull up the current variable.
temp <- lapply(names(factors_only),
               function(x) qplot(get(x), data=factors_only, geom="bar", fill=binary, xlab=x))
To display a single plot, print the corresponding list item:
print(temp[[1]])
A nice feature of running through the variable names is that you can use these to dynamically name the labels in the figure.
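The same idea can be written with ggplot() directly. This is a sketch rather than the answer's method: it assumes a ggplot2 version that supports the .data pronoun inside aes() (3.0.0 or later), and vars and plots are names introduced here for illustration.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the random binary column is reproducible
factors_only <- as.data.frame(diamonds[, sapply(diamonds, is.factor)])
factors_only$binary <- factor(sample(c(0, 1), nrow(factors_only), replace = TRUE))

# every factor column except the binary one used for the fill
vars <- setdiff(names(factors_only), "binary")

# build one bar plot per variable, looking the column up by name
plots <- lapply(vars, function(v) {
  ggplot(factors_only, aes(x = .data[[v]], fill = binary)) +
    geom_bar() +
    xlab(v)
})
names(plots) <- vars

print(plots[["color"]])
```

Naming the list elements means a plot can be pulled out by variable name instead of by position.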

How to plot three histograms adjacent to each other for 3 given series?

I have one data table (tab separated) as follows.
data.csv
A B C
0.0509259 0.0634984 0.0334984
0.12037 0.0599042 0.0299042
0.00925926 0.0109824 0.0599042
0.990741 0.976837 0.059442
0.99537 0.997404 0.0549042
0.99537 0.997404 0.0529042
0.00462963 0.0109824 0.0699042
0.986111 0.975839 0.0999042
0.12963 0.0758786 0.0899042
0.00462963 0.00419329 0.0499042
0.865741 0.876597 0.0519042
0.865741 0.870807 0.0539042
How can I plot multi-series data in one histogram, as explained below?
data<-read.table("C:/Users/User/Desktop/data.csv",header=T)
hist(data$A)
hist(data$B)
hist(data$C)
How can I merge these three histograms so that I can see the three different series together, in different colors, in one plot?
Here are two ways. In base R:
barplot(t(as.matrix(data)),beside=TRUE,
        col=c("red","green","blue"),names=rownames(data))
Using ggplot:
library(ggplot2)
library(reshape2)
gg <- melt(data.frame(id=rownames(data),data),id="id")
gg$id <- factor(gg$id,levels=unique(gg$id))
ggplot(gg,aes(x=id,y=value,fill=variable))+geom_bar(stat="identity",position="dodge")
The ggplot approach, which ultimately is much more flexible, is also more work. You have to add a column based on the row names (or a sequence 1:nrow(data), if you prefer), and convert the data from wide to long format (as in the other answer). But you're still not done: ggplot converts the ids to a factor and then orders them alphabetically, so the groups are, e.g., 1, 10, 11, 12, 2, 3, ... You don't want that, so you have to reorder the factor first, and then plot.
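The alphabetical ordering described above can be reproduced without ggplot at all, since it comes from how factor() picks its default levels. A small sketch with 12 hypothetical ids (the sample data also has 12 rows):

```r
id <- as.character(1:12)

# default: levels are the sorted unique strings, so "10" comes before "2"
alphabetical <- levels(factor(id))

# passing levels = unique(id) keeps the order in which the ids first appear
as_given <- levels(factor(id, levels = unique(id)))

print(alphabetical)
print(as_given)
```

This is what the line gg$id <- factor(gg$id,levels=unique(gg$id)) does before plotting.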
If you're ok with ggplot2, you can do it as follows:
library(reshape2)
library(ggplot2)
1: Rearrange the data frame into long format, so that A, B and C become the levels of a factor:
dat3 <- melt(dat2, varnames = c('A','B','C'))
2: Plot using the factors:
qplot(data=dat3, value, fill=variable, position = 'dodge')
Can't say too many good things about ggplot2

Convert CSV data to a matrix to a heatmap in R

I'm doing some visualizations for a paper I'm writing and am stuck transferring data from a CSV-loaded table to a matrix (to be able to plot a heatmap from it afterwards).
I'm doing this:
dta.tesiscsv<- read.csv("dtatesis.csv", header=TRUE)
to load a data sample that looks like this:
Col,Row,Kf
1,1,100
1,2,97.14285714
2,1,100
...,...,...
but I'm kind of lost on the next step (creating an empty matrix and transferring data from the table to it based on a formula):
X<- matrix(nrow= 48, ncol=12)
X[dta.test[,c(1:2)]] <- dta.test$Kf
You can use acast from the reshape2 package to get the data into the matrix form you want.
require(reshape2)
acast(dta.test, Row ~ Col, value.var = "Kf")
This fills missing cells with NA. If you want to fill them with 0 instead, for example, use:
acast(dta.test, Row ~ Col, value.var = "Kf", fill = 0)
You can then wrap the result in heatmap() to get the heatmap.
How about (which should make sense if there is one row per Col/Row-combination):
dta.tesiscsv <- read.table(text="Col,Row,Kf
1,1,100
1,2,97.14285714
2,1,100",header=TRUE,sep=",")
X <- tapply(dta.tesiscsv[,3],dta.tesiscsv[,2:1],head,1)
heatmap(X)
You're really close. To use matrix indexing, the indices have to be a matrix, not a data.frame, and the first column of that matrix is taken as the row index, so select Row before Col:
X[as.matrix(dta.test[,c("Row","Col")])] <- dta.test$Kf
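As a small worked sketch of matrix indexing, here are the three sample rows from the question filled into a matrix. The name dta and the choice to size the matrix from the data (rather than the question's 48 x 12) are assumptions for illustration. Note that the index matrix needs the row index in its first column and the column index in its second.

```r
# the three sample rows shown in the question
dta <- data.frame(Col = c(1, 1, 2),
                  Row = c(1, 2, 1),
                  Kf  = c(100, 97.14285714, 100))

# empty matrix sized from the largest Row and Col values
X <- matrix(NA_real_, nrow = max(dta$Row), ncol = max(dta$Col))

# a two-column index matrix addresses one cell per row: (row, column)
X[as.matrix(dta[, c("Row", "Col")])] <- dta$Kf

print(X)
```

Cells with no matching Col/Row pair stay NA, which heatmap() will not accept in every configuration, so a real data set should cover every cell or be filled first.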

Dealing with Factors

I have an object that is a factor with a number of levels:
x <- as.factor(c(rep("A",20),rep("B",10),rep("C",15)))
In the shortest manner possible, I would like to use ggplot to create a bar graph of the % frequency of each factor.
I keep finding that there are a lot of little annoyances that get in between summarizing and plotting when I have a factor. Here are a few examples of what I mean by annoyances:
as.data.frame(summary(x))
You have to rename the columns, and the first column's values end up as row names in the example above. In the next one, you have to cheat to use cast, and then you have to relabel because it defaults to a column name of "(all)".
dat <- as.data.frame(q1$com.preferred)
dat$value <- 1
colnames(dat) <- c("pref", "value")
cast(dat, pref ~.)
colnames(dat)[2] <- "value"
Here's another example, somewhat better, but less than ideal.
data.frame(x=names(summary(x)),y=summary(x))
If there's a quick way to do this within ggplot, I'd be more than interested to see it. So far, my biggest problem is changing counts to frequencies.
Following up on @dirk's and @joran's suggestions (@joran really gets the credit: I thought as.data.frame(), and not just data.frame(), was necessary, but it turns out @joran's right):
x <- as.factor(c(rep("A",20),rep("B",10),rep("C",15)))
t1 <- table(x)
t2 <- data.frame(t1)
t3 <- data.frame(prop.table(t1))
qplot(x,Freq,data=t2,geom="bar",ylab="Count")
qplot(x,Freq,data=t3,geom="bar",ylab="Proportion")
Edit: shortened slightly (incorporated @Chase's prop.table too).
You can have qplot do the summary work for you without the outside computations. Try any of the following:
x <- rep(c('A','B','C'), c(20,10,15))
qplot(x, weight=1/length(x), ylab='Proportion')
qplot(x, weight=100/length(x), ylab='Percent')
qplot(x, weight=1/length(x), ylab='Percent') + scale_y_continuous(labels=scales::percent)
ggplot(data.frame(x=x),aes(x, weight=1/length(x))) + geom_bar() + ylab('Proportion')
There is probably a way to do this using transformations inside the ggplot functions as well, but I have not found it yet.
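One such in-ggplot transformation, available in later ggplot2 releases (3.3.0 and up, as far as I know), is after_stat(), which lets an aesthetic be computed from the counts that geom_bar() produces. A minimal sketch, assuming the scales package is installed for the percent labels:

```r
library(ggplot2)

x <- rep(c("A", "B", "C"), c(20, 10, 15))

# count is computed per bar by the count stat;
# dividing by the total turns counts into proportions
p <- ggplot(data.frame(x = x), aes(x, y = after_stat(count / sum(count)))) +
  geom_bar() +
  scale_y_continuous(labels = scales::percent) +
  ylab("Percent")
print(p)
```

Here the proportions are computed over the whole layer, so with facets or grouping the denominator may need more care.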
Did you try the ggplot equivalent of just calling barplot(table(x)/length(x))? I.e.
R> x <- as.factor(c(rep("A",20),rep("B",10),rep("C",15)))
R> table(x)
x
A B C
20 10 15
which we turn into percentages easily
R> table(x)/length(x)*100
x
A B C
44.4444 22.2222 33.3333
and can then plot
R> barplot(table(x)/length(x)*100)
just fine.
