Heat scatter plot with non-numerical output - r

Image of excel data set
I have a table in excel with 100 columns and 100 rows. The column starts at 0% and works up to 100%. Same for the row, starts at 0% and goes up to 100%. It is a 2-way sensitivity analysis, i.e. which drug would be optimal if x(variable in column)=10% and y(variable in row)=30%.
I have a 100 by 100 table, with the names of four different drugs scattered across it. I want to take this data into R and create a scatter plot, essentially a square with 10000 smaller squares. I then want R to colour each square based on the drug which is optimal for that combination of X and Y.
I've attached an image of dummy data showing the same example in a 10 by 10 table.
Hope you can help!

You'll need to start by prepping the data -- reading it in, using something like the pivot_longer() function from the tidyr package (part of the tidyverse) to make the columns into rows, and then likely doing some clean-up on the percentages.
After that, the plot (using ggplot2) itself may be pretty straightforward. The geom_tile() function is the one that creates the squares.
library(tidyverse)
# Create test data
df <- expand_grid(x = 1:100, y = 1:100) %>%
  mutate(drug = sample(LETTERS[1:4], size = 10000, replace = TRUE))
# Make the plot
df %>%
  ggplot(aes(x, y, fill = drug)) +
  geom_tile()
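The prep step described above might look roughly like the sketch below. It assumes the spreadsheet has already been read into a data frame (for example with readxl::read_excel(), not shown) whose first column, hypothetically named y here, holds the row percentages and whose remaining column names hold the column percentages. The tiny 3 by 3 table and all of its names are made up for illustration.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Hypothetical 3 x 3 stand-in for the spreadsheet: the first column holds the
# row variable, the remaining column names hold the column variable.
wide <- tibble(
  y      = c("0%", "50%", "100%"),
  `0%`   = c("A", "A", "B"),
  `50%`  = c("A", "C", "D"),
  `100%` = c("B", "D", "D")
)

# One row per cell, then turn labels such as "50%" into numbers
long <- wide %>%
  pivot_longer(cols = -y, names_to = "x", values_to = "drug") %>%
  mutate(x = as.numeric(sub("%", "", x, fixed = TRUE)),
         y = as.numeric(sub("%", "", y, fixed = TRUE)))

# Same plot call as above, now on the reshaped data
ggplot(long, aes(x, y, fill = drug)) +
  geom_tile()
```

Because drug is a character column, ggplot2 picks a discrete fill scale, so each drug gets its own colour.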


Is there a way to replace "Corr" values with actual graphs in R?

Here's the dataset I'm using: https://www.dropbox.com/s/b3nv38jjo5dxcl6/nba_2013.csv?dl=0, it contains statistics for NBA players.
And I want to see how different columns correlate with each other, thus I want to draw pairwise scatterplots.
Here's my code:
library(dplyr)   # provides %>% and select()
library(GGally)
nba %>%
  select(ast, fg, trb) %>%
  ggpairs()
The nba variable contains the whole dataset.
And when I want to draw the pairwise scatterplot, I get something like this:
Generated Output
However, some of the graphs are replaced by "Corr" values, is there a way to replace these "Corr" values with actual graphs so that the output looks as follows:
Desired Output
Here is one approach:
nba %>%
  select(ast, fg, trb) %>%
  ggpairs(upper = list(continuous = "points",
                       combo = "facethist", discrete = "facetbar", na = "na"))

Creating multiple density plots using only summary statistics (no raw data) in R

I work with a massive 4D nifti file (x - y - z - subject; MRI data) and due to the size I can't convert to a csv file and open in R. I would like to get a series of overlaying density plots (classic example here) one for each subject with the idea to just visualise that there is not much variance in density distributions across the sample.
I could however, extract summary statistics for each subject (mean, median, SD, range etc. of the variable of interest) and use these to create the density plots (at least for the variables that are normally distributed). Something like this would be fantastic but I am not sure how to do it for density plots.
Your help will be much appreciated.
So these really aren't density plots per se - they are plots of the densities of normal distributions with given means and standard deviations.
That can be done in ggplot2, but you need to expand your table of subjects and summaries into grids of points and normal densities at those points.
Here's an example. First, make up some data, consisting of subject IDs and some simulated sample averages and sample standard deviations.
library(tidyverse)
set.seed(1)
foo <- tibble(Subject = LETTERS[1:10], avg = runif(10, 10, 20), stdev = runif(10, 1, 2))  # tibble() replaces the deprecated data_frame()
Now, for each subject we need to obtain a suitable grid of "x" values along with the normal density (for that subject's avg and stdev) evaluated at those "x" values. I've chosen plus/minus 4 standard deviations. This can be done using do. But that produces a funny data frame with a column consisting of data frames. I use unnest to explode out the data frame.
bar <- foo %>%
  group_by(Subject) %>%
  do(densities = tibble(x = seq(.$avg - 4*.$stdev, .$avg + 4*.$stdev, length.out = 50),
                        density = dnorm(x, .$avg, .$stdev))) %>%
  unnest(densities)  # name the list column explicitly
Have a look at bar to see what happened. Now we can use ggplot2 to put all these normal densities on the same plot. I'm guessing with lots of subjects you wouldn't want a legend for the plot.
bar %>%
ggplot(aes(x=x, y=density, color=Subject)) +
geom_line(show.legend = FALSE)
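In recent dplyr versions (1.1.0 and later) do() is superseded, and the same grid of points can be built with reframe(). The following is only a sketch of that alternative, reusing the made-up data and column names from the example above and assuming a sufficiently new dplyr is installed.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
foo <- tibble(Subject = LETTERS[1:10],
              avg = runif(10, 10, 20),
              stdev = runif(10, 1, 2))

# reframe() may return many rows per group, so no list column or unnest() is needed
bar <- foo %>%
  group_by(Subject) %>%
  reframe(x = seq(avg - 4 * stdev, avg + 4 * stdev, length.out = 50),
          density = dnorm(x, avg, stdev))

ggplot(bar, aes(x = x, y = density, color = Subject)) +
  geom_line(show.legend = FALSE)
```

The result has one row per grid point (50 per subject) with the columns Subject, x and density.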

ggplot2: specifying different scales for rows in facet layout for bar plots

My data are visualized in the package ggplot2 via bar plots with several (~10) facets. I want first to split these facets in several rows. I can use function facet_grid() or facet_wrap() for this. In the minimal example data here I build 8 facets in two rows (4x2). However I need to adjust scales for different facets, namely: first row contains data on small scale, and in the second row values are bigger. So I need to have same scale for all data in the first row to compare them along the row, and another scale for the second row.
Here is the minimal example and possible solutions.
#loading necessary libraries and example data
library(dplyr)
library(tidyr)
library(ggplot2)
trial.facets<-read.csv(text="period,xx,yy
A,2,3
B,1.5,2.5
C,3.2,0.5
D,2.5,1.5
E,11,13
F,16,14
G,8,5
H,5,4")
#arranging data to long format with omission of the "period" variable
trial.facets.tidied<-trial.facets %>% gather(key=newvar,value=newvalue,-period)
And now plotting itself:
#First variant (position is not an aesthetic, so it is dropped from aes())
ggplot(trial.facets.tidied, aes(x=newvar, y=newvalue)) + geom_bar(stat="identity") + facet_grid(.~period)
#Second variant:
ggplot(trial.facets.tidied, aes(x=newvar, y=newvalue)) + geom_bar(stat="identity") + facet_wrap(~period, nrow=2, scales="free")
The results for the first and second variants are as follows:
In both examples we have either free scales for all graphs, or fixed scales for all graphs. Meanwhile the first row (first 4 facets) needs a scale up to about 5, and the second row a scale up to about 15.
As a solution using the facet_grid() function, I can add a fake variable "row" which specifies which row the corresponding letter should belong to. The new dataset, trial.facets.row (only three lines shown), would look as follows:
period,xx,yy,row
C,3.2,0.5,1
D,2.5,1.5,1
E,11,13,2
Then I can perform the same rearrangement into long format, omitting variables "period" and "row":
trial.facets.tidied.2<-trial.facets.row %>% gather(key=newvar,value=newvalue,-period,-row)
Then I arrange facets along variables "row" and "period" in the hope to use the option scales="free_y" to adjust scales only across rows:
ggplot(trial.facets.tidied.2, aes(x=newvar, y=newvalue)) + geom_bar(stat="identity") + facet_grid(row~period, scales="free_y")
and - surprise: the problem with scales is solved, however, I get two groups of empty bars, and the whole data is again stretched across a long strip:
None of the manual pages and handbooks I found (which usually use the mpg and mtcars datasets) cover this situation with unwanted or dummy data.
I used a combination of your first method (facet_wrap) and your second method (using a dummy variable for the different rows):
# create fake variable "row"
trial.facets.row <- trial.facets %>% mutate(row = ifelse(period %in% c("A", "B", "C", "D"), 1, 2))
# rearrange to long format
trial.facets.tidied.2 <- trial.facets.row %>% gather(key=newvar, value=newvalue, -period, -row)
# specify the maximum height for each row
trial.facets.tidied.3 <- trial.facets.tidied.2 %>%
  group_by(row) %>%
  mutate(max.height = max(newvalue)) %>%
  ungroup()
ggplot(trial.facets.tidied.3,
       aes(x=newvar, y=newvalue)) +
  geom_bar(stat = "identity") +
  geom_blank(aes(y=max.height)) + # add blank geom to force facets on the same row to the same height
  facet_wrap(~period, nrow=2, scales="free")
Note: based on this reproducible example, I'm assuming that all your plots already share a common ymin at 0. If that's not the case, simply create another dummy variable for min.height & add another geom_blank to your ggplot.
Looking over SO I encountered a solution which might be a bit tricky - from here
The idea is to create a second fake dataset which would plot a single point at each facet. This point will be drawn in the position, corresponding to the highest desired value for y scale in every case. So heights of scales can be manually adjusted for each facet. Here is the solution for the dataset in question. We want y scale (maximum y value) 5 for the first row, and 17 for the second row. So create
df3=data.frame(newvar = rep("xx",8),
period = c("A","B","C","D","E","F","G","H"),
newvalue = c(5,5,5,5,17,17,17,17))
And now superimpose the new data on our graph using geom_point() .
ggplot(trial.facets.tidied, aes(x=newvar, y=newvalue)) +
  geom_bar(stat="identity") +
  facet_wrap(~period, nrow=2, scales="free_y") +
  geom_point(data=df3, aes(x=newvar, y=newvalue), alpha=1)
Here is what we get:
Here I intentionally draw this extra point to make things clear. Next we need to make it invisible, which can be achieved by setting alpha=0 instead of 1 in the last command.
This approach draws an invisible line at the maximum for each row:
#loading necessary libraries and example data
library(dplyr)
library(tidyr)
library(ggplot2)
trial.facets<-read.csv(text="period,xx,yy
A,2,3
B,1.5,2.5
C,3.2,0.5
D,2.5,1.5
E,11,13
F,16,14
G,8,5
H,5,4")
# define desired number of columns
n_col <- 4
# assign a row number: integer division of the 0-based index by the number of columns
trial.facets$row <- seq(0, nrow(trial.facets)-1) %/% n_col
# determine the max by row, and round up to the next multiple of 5
# join back to original
trial.facets.max <- trial.facets %>%
  group_by(row) %>%
  summarize(maxvalue = (1 + max(xx, yy) %/% 5) * 5)
trial.facets <- trial.facets %>% inner_join(trial.facets.max, by = "row")
# make long format carrying period, row and maxvalue
trial.facets.tidied <- trial.facets %>% gather(key=newvar, value=newvalue, -period, -row, -maxvalue)
# plot an invisible line at the max
ggplot(trial.facets.tidied, aes(x=newvar, y=newvalue)) +
  geom_bar(stat="identity") +
  geom_hline(aes(yintercept=maxvalue), alpha=0) +
  facet_wrap(~period, ncol=n_col, scales="free")
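The rounding expression used for maxvalue can be checked in isolation. The helper names below (round_up_next, round_up_ceiling) are hypothetical and only wrap the arithmetic from the answer. Note that the integer-division form pushes a value that is already a multiple of 5 up to the next multiple; the ceiling() variant is shown as one possible alternative if that is not wanted.

```r
# Round a value up to the next multiple of `step` using integer division,
# mirroring the (1 + max %/% 5) * 5 expression in the answer.
round_up_next <- function(value, step = 5) {
  (1 + value %/% step) * step
}

round_up_next(3.2)  # 5  (maximum of the first row of facets)
round_up_next(16)   # 20 (maximum of the second row of facets)
round_up_next(15)   # 20: an exact multiple is still bumped up

# Variant that leaves exact multiples unchanged
round_up_ceiling <- function(value, step = 5) {
  ceiling(value / step) * step
}

round_up_ceiling(15)  # 15
round_up_ceiling(16)  # 20
```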

plotting each column of a matrix individually in a single graph in R

I have a 10x10 matrix and I want to plot each column(in the form of lines) in the following way
1. There should be one y-axis which will cover the scale of all columns of matrix.
2. There should be single x-axis with 10 points(= the number of columns).
3. the first column of matrix should be plotted within the point-1 and point-2 of x-axis, the second column of matrix within the point 2 and point-3, third column within the point-3 and point-4 and so on....
I have already seen related posts, but they all produce multiple plots, which does not meet my requirements. Could you please help me with how this can be done in R?
You could convert your data from wide to long format and then use a standard plotting utility like ggplot to appropriately group your data and position it:
# Build a sample matrix, dat
set.seed(144)
dat <- matrix(rnorm(100), nrow=10)
# Build a data frame, to.plot, where each element represents one value in the matrix
to.plot <- expand.grid(row=seq(nrow(dat)), col=seq(ncol(dat)))
to.plot$dat <- dat[cbind(to.plot$row, to.plot$col)]
to.plot$col <- as.factor(to.plot$col)
# Plot
library(ggplot2)
# row stays numeric so that (row-1)/max(row) can offset each point within its column
ggplot(to.plot, aes(x=as.numeric(col)+(row-1)/max(row), y=dat, group=col, col=col)) +
  geom_line() + scale_x_continuous(breaks=1:10) + xlab("Column")
Here's how you do it with matplot.
matplot(y = myData,
        x = matrix(seq(prod(dim(myData)))/nrow(myData),
                   nrow=nrow(myData), byrow=FALSE)
            - 1/nrow(myData) + 1,
        type = "l")  # draw lines rather than the default points
The trick is constructing the right matrix for the x values.
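To see what that x matrix contains, here is the same expression applied to a small made-up 2 by 3 matrix (m is a hypothetical stand-in for myData). Each column of the result starts at its column index and steps by 1/nrow, so column j is drawn between j and j + 1.

```r
# Hypothetical 2 x 3 matrix standing in for myData
m <- matrix(c(5, 6, 1, 2, 8, 9), nrow = 2)

# Same construction as in the matplot() call above
x_pos <- matrix(seq(prod(dim(m))) / nrow(m), nrow = nrow(m), byrow = FALSE) -
  1 / nrow(m) + 1

x_pos
# column 1 holds 1.0 and 1.5, column 2 holds 2.0 and 2.5, column 3 holds 3.0 and 3.5

# Each column of m is drawn against the matching column of x_pos
matplot(x = x_pos, y = m, type = "l")
```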

Easy way to view multiple Y variables against same X

I want to visualize many time series at once. I am new at R, and have spent about 6 hours searching the web and reading about how to tackle this relatively simple problem. My dataset has five time points arranged as rows, and 100 columns. I can easily plot any column against the time points with qplot(time, var2, geom="line"). But I want to learn how to do this for a flexible number of columns, and how to print 6 to 12 of the individual graphs on one page.
Here I learned about the multiplot function, got that to work in terms of layout.
What I am stuck on is how to get the list of variables into a FOR statement so I can have one statement to plot all the variables against the same five time points.
This is what I am playing with. It makes 9 plots, 3 columns wide, but I do not know how to get all my variables into the array for yvars?
plots <- list()  # initialise the list that collects the plots
for (i in 1:9) {
  p1 = qplot(symbol, yvar, geom = "smooth", main = i)
  plots[[i]] <- p1 # add each plot into plot list
}
multiplot(plotlist = plots, cols = 3)
Stupidly on my part right now it makes 9 identical plots. So how do I create the list so the above will cycle through all my columns and make those plots?
First, melt all your data using the reshape2 package:
library(reshape2)
datm <- melt(your.original.data.frame, id = "time")
Now plot it using facets:
library(ggplot2)
qplot(time, value, data = datm, facets = variable ~ ., geom = "point")
Let me know if this works. If you could, please upload your data, it would help tremendously.
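If separate plot objects are still wanted, as in the loop from the question, one possible sketch is to iterate over the column names and look each column up with the .data pronoun in ggplot2. The data frame dat and its columns var1 to var3 are made up for illustration; only the time column name comes from the question.

```r
library(ggplot2)

# Hypothetical data: five time points and three series
set.seed(42)
dat <- data.frame(time = 1:5,
                  var1 = rnorm(5),
                  var2 = rnorm(5),
                  var3 = rnorm(5))

# Every column except time is treated as a y variable
yvars <- setdiff(names(dat), "time")

# Build one plot per column; .data[[v]] selects the column named by the string v
plots <- lapply(yvars, function(v) {
  ggplot(dat, aes(x = time, y = .data[[v]])) +
    geom_line() +
    labs(title = v, y = v)
})
names(plots) <- yvars
```

The resulting list could then be passed to the multiplot() function mentioned in the question, for example multiplot(plotlist = plots, cols = 3), assuming that function is already defined in the session.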
