Pairs scatter plot; one vs many [duplicate] - r

This question already has answers here:
Plot one numeric variable against n numeric variables in n plots
(4 answers)
Closed 5 years ago.
Is there a parsimonious way to create a pairs plot that only compares one variable to the many others? In other words, can I plot just one row or column of the standard pairs scatter plot matrix without using a loop?

Melt your data then use ggplot with facet.
library("ggplot2")
library("reshape2")
#dummy data
df <- data.frame(x=1:10,
a=runif(10),
b=runif(10),
c=runif(10))
#melt your data
df_melt <- melt(df,"x")
#scatterplot per group
ggplot(df_melt,aes(x,value)) +
geom_point() +
facet_grid(.~variable)
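The same reshape can also be done with tidyr instead of reshape2. The following is only a sketch under that assumption: pivot_longer() stands in for melt(), and the name df_long and the seed are illustrative, not part of the answer above.

```r
library(ggplot2)
library(tidyr)

set.seed(42)  # illustrative seed so the dummy data is reproducible
df <- data.frame(x = 1:10, a = runif(10), b = runif(10), c = runif(10))

# Wide -> long: one row per (x, variable) pair, keeping x as the id column
df_long <- pivot_longer(df, cols = -x, names_to = "variable", values_to = "value")

# One scatter panel per original column, all plotted against x
p <- ggplot(df_long, aes(x, value)) +
  geom_point() +
  facet_grid(. ~ variable)
print(p)
```

The column names variable and value are chosen to match what melt() produces, so the plotting code stays the same.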

I'll round it out with a base plotting option (using df from @zx8754):
layout(matrix(seq(ncol(df)-1),nrow=1))
Map(function(x,y) plot(df[c(x,y)]), names(df[1]), names(df[-1]))
Although arguably this is still a loop using Map.

For fun, with lattice (using @zx8754's df_melt):
library(lattice)
xyplot(value ~ x | variable, data = df_melt, layout = c(3,1),
between = list(x=1))


How do I make my row names appear on my x axis, and the numbers from my variables appear on the y axis?

I created a data frame with countries as row names and percentages as the observations of each variable, but when I make a histogram, the percentages occupy the x axis and the country names do not appear at all. How do I put the countries' names on the x axis and the variables' values on the y axis?
Country <- c('Albania','Armenia','Austria','Belarus','Belgium','Bosnia and Herzegovina','Bulgaria','Croatia','Cyprus','Czechia','Denmark','Estonia','Finland','France','Georgia','Germany','Greece','Hungary','Iceland','Ireland','Italy','Latvia','Lithuania','Luxembourg','Malta','Moldova','Montenegro','Netherlands','Norway','Poland','Portugal','Romania','Russia','Serbia','Slovakia','Slovenia','Spain','Sweden','Switzerland','Turkey','Ukraine','United Kingdom')
Anxiety.Disorders <- c(3.38,2.73,5.22,3.03,4.92,3.70,3.84,3.74,5.61,3.59,5.18,3.01,3.59,6.37,2.46,6.37,5.58,3.69,5.15,5.66,5.57,3.04,3.06,5.19,5.14,2.77,3.55,6.43,7.33,3.68,5.52,3.41,3.02,3.60,3.61,3.60,5.14,5.16,5.28,3.85,3.09,4.43)
Depressive.Disorders <- c(2.42,3.16,3.66,4.84,4.35,2.88,3.30,3.60,3.88,3.25,3.62,4.78,5.08,4.55,2.98,4.42,4.56,3.53,3.55,4.37,3.94,4.44,5.20,3.95,3.69,3.77,2.96,4.34,3.95,2.72,5.27,2.88,4.36,3.15,2.87,3.58,3.91,4.84,4.17,3.76,5.02,4.35)
Bipolar.Disorder <- c(0.72,0.77,0.95,0.73,0.91,0.79,0.67,0.77,1.04,0.75,0.99,0.71,0.99,0.93,0.67,0.79,0.93,0.74,0.97,0.80,0.95,0.71,0.73,0.95,0.97,0.67,0.74,0.94,0.85,0.76,0.97,0.78,0.70,0.74,0.76,0.75,0.97,1.04,0.98,0.85,0.73,1.05)
G08 <- data.frame(Country, Anxiety.Disorders, Depressive.Disorders, Bipolar.Disorder)
row.names(G08) <- G08$Country
G08[1] <- NULL
hist(G08$Anxiety.Disorders)
I use the melt() call to create one observation per row. Then, I use ggplot to produce the bar plot.
library(ggplot2)
library(reshape2)
Country <- c('Albania','Armenia','Austria','Belarus','Belgium','Bosnia-Herzegovina','Bulgaria','Croatia','Cyprus','Czechia','Denmark','Estonia','Finland','France','Georgia','Germany','Greece','Hungary','Iceland','Ireland','Italy','Latvia','Lithuania','Luxembourg','Malta','Moldova','Montenegro','Netherlands','Norway','Poland','Portugal','Romania','Russia','Serbia','Slovakia','Slovenia','Spain','Sweden','Switzerland','Turkey','Ukraine','United Kingdom')
Anxiety.Disorders <- c(3.38,2.73,5.22,3.03,4.92,3.70,3.84,3.74,5.61,3.59,5.18,3.01,3.59,6.37,2.46,6.37,5.58,3.69,5.15,5.66,5.57,3.04,3.06,5.19,5.14,2.77,3.55,6.43,7.33,3.68,5.52,3.41,3.02,3.60,3.61,3.60,5.14,5.16,5.28,3.85,3.09,4.43)
Depressive.Disorders <- c(2.42,3.16,3.66,4.84,4.35,2.88,3.30,3.60,3.88,3.25,3.62,4.78,5.08,4.55,2.98,4.42,4.56,3.53,3.55,4.37,3.94,4.44,5.20,3.95,3.69,3.77,2.96,4.34,3.95,2.72,5.27,2.88,4.36,3.15,2.87,3.58,3.91,4.84,4.17,3.76,5.02,4.35)
Bipolar.Disorder <- c(0.72,0.77,0.95,0.73,0.91,0.79,0.67,0.77,1.04,0.75,0.99,0.71,0.99,0.93,0.67,0.79,0.93,0.74,0.97,0.80,0.95,0.71,0.73,0.95,0.97,0.67,0.74,0.94,0.85,0.76,0.97,0.78,0.70,0.74,0.76,0.75,0.97,1.04,0.98,0.85,0.73,1.05)
G08 <- data.frame(Country, Anxiety.Disorders, Depressive.Disorders, Bipolar.Disorder)
G08melt <- melt(G08, "Country")
G08.bar <- ggplot(G08melt, aes(x = Country, y=value)) +
geom_bar(aes(fill=variable),stat="identity", position ="dodge") +
theme_bw()+
theme(axis.text.x = element_text(angle=-40, hjust=.1))
G08.bar
Looking at your question, I think you wanted a grouped column diagram rather than a histogram. You can draw it directly with the barplot function from the graphics package, but first you need to convert your data frame into a matrix. Starting from G08 as returned by data.frame() (with Country still its first column), I removed that column:
mat <- G08[, -1]
Now simply call barplot on the transpose of mat and use its names.arg parameter to write the country names on the x-axis:
barplot(t(mat), beside=TRUE, col=c('red','blue','gold'), border=NA, names.arg=G08$Country, cex.names=0.45, las=2)
legend('topright', c("Anxiety","Depressive","Bipolar"), fill=c("red","blue","gold"), cex=0.5, title='Disorder types')
Suggestion:
For a bit more 'fresh air' in the graph, you can set beside=FALSE in barplot and get a stacked column diagram.
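As a minimal sketch of the stacked variant, the example below uses only the first three countries from the data above so that it stays self-contained. The matrix is built directly with disorders as rows, so no transpose is needed; the names mids and cols and the ylim value are illustrative choices.

```r
# First three countries only, values copied from the question's data
mat <- rbind(
  Anxiety    = c(Albania = 3.38, Armenia = 2.73, Austria = 5.22),
  Depressive = c(Albania = 2.42, Armenia = 3.16, Austria = 3.66),
  Bipolar    = c(Albania = 0.72, Armenia = 0.77, Austria = 0.95)
)
cols <- c("red", "blue", "gold")

# beside = FALSE stacks the rows of mat within one bar per column (country)
mids <- barplot(mat, beside = FALSE, col = cols, border = NA, las = 2,
                ylim = c(0, 12))  # extra headroom so the legend does not cover bars
legend("topright", rownames(mat), fill = cols, cex = 0.8,
       title = "Disorder types")
```

Each bar's total height is the sum of the three percentages for that country, which is worth keeping in mind when reading the stacked version.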

Sorting bars in a bar chart with ggplot2 [duplicate]

This question already has an answer here:
ggplot legends - change labels, order and title
(1 answer)
Closed 6 years ago.
This is my first time asking here, so forgive me if I'm not clear enough.
So far I have seen many replies to similar questions that explain how to sort bars by some field of a data frame, but I have not been able to find out how to sort them by the default stat "count" of geom_bar (which is obviously NOT a field of the data frame).
For example, I run this code:
library(ggplot2)
library(plotly) # provides ggplotly()
Name <- c( 'Juan','Michael','Andrea','Charles','Jonás','Juan','Donata','Flavia' )
City <- c('Madrid','New York','Madrid','Liverpool','Madrid','Buenos Aires','Rome','Liverpool')
City.Id <- c(1,2,1,3,1,4,5,3)
df = data.frame( Name,City,City.Id )
a <- ggplot( df,aes( x = City, text=paste("City.Id=",City.Id)) ) +
geom_bar()
ggplotly(a)
And then I would like to visualize the resulting bars ordered by their height (=count.) Note that I must keep the "City.Id" info to show in the final plot. How can this be done?
Given that you're already using ggplot2, I'd suggest looking into what else the tidyverse can offer. Namely the forcats package for working with factors.
forcats has a nice function fct_infreq() which will (re)set the levels of a factor to be in the order of their frequency. If the data is a character vector not already a factor (like City is in your data) then it will first make it a factor, and then set the levels to be in frequency order.
Try this code:
# Load packages
library(ggplot2)
library(forcats)
# Create data
Name <- c( 'Juan','Michael','Andrea','Charles','Jonás','Juan','Donata','Flavia' )
City <- c('Madrid','New York','Madrid','Liverpool','Madrid','Buenos Aires','Rome','Liverpool')
City.Id <- c(1,2,1,3,1,4,5,3)
df = data.frame( Name,City,City.Id )
# Create plot
a <- ggplot(df, aes(x = fct_infreq(City), text=paste("City.Id=",City.Id)) ) +
geom_bar()
a
One could use reorder:
df$City <- reorder(df$City,df$City.Id,length)
and then plot with the code in the question.
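To make the effect of reorder() visible, here is a small sketch that inspects the resulting factor levels. Ordering by length sorts cities by ascending count; negating the length (an addition here, not part of the answer above) puts the most frequent city first. The names asc and desc are illustrative.

```r
City <- c('Madrid','New York','Madrid','Liverpool','Madrid','Buenos Aires','Rome','Liverpool')
City.Id <- c(1,2,1,3,1,4,5,3)

# Levels sorted by how many rows each city has, smallest count first
asc <- reorder(City, City.Id, FUN = length)

# Negating the count reverses the order, so the tallest bar comes first
desc <- reorder(City, City.Id, FUN = function(v) -length(v))

levels(asc)
levels(desc)
```

Because ggplot2 draws discrete axes in factor-level order, assigning either result to df$City changes the bar order without touching the City.Id column.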

How do I put multiple boxplots in the same graph in R?

Sorry I don't have example code for this question.
All I want to know is if it is possible to create multiple side-by-side boxplots in R representing different columns/variables within my data frame. Each boxplot would also only represent a single variable--I would like to set the y-scale to a range of (0,6).
If this isn't possible, how can I use something like the panel option in ggplot2 if I only want to create a boxplot using a single variable? Thanks!
Ideally, I want something like the image below but without factor grouping like in ggplot2. Again, each boxplot would represent completely separate and single columns.
ggplot2 requires that all the data to be plotted on the y-axis be in a single column.
Here is an example:
set.seed(1)
df <- data.frame(
value = runif(810,0,6),
group = 1:9
)
df
library(ggplot2)
ggplot(df, aes(factor(group), value)) + geom_boxplot() + coord_cartesian(ylim = c(0,6))
The coord_cartesian(ylim = c(0,6)) call limits the visible y-axis to the range 0 to 6.
If your data are in separate columns, you can get them into long form using melt from reshape2 or gather from tidyr (other methods are also available).
You can do this if you reshape your data into long format
## Some sample data
dat <- data.frame(a=rnorm(100), b=rnorm(100), c=rnorm(100))
## Reshape data wide -> long
library(reshape2)
long <- melt(dat)
plot(value ~ variable, data=long)
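If reshaping is not wanted at all, base boxplot() also accepts a data frame directly and draws one box per column. A minimal sketch, with made-up uniform data on the (0, 6) range mentioned in the question:

```r
set.seed(1)  # illustrative seed
dat <- data.frame(a = runif(100, 0, 6),
                  b = runif(100, 0, 6),
                  c = runif(100, 0, 6))

# One box per column, side by side, with the y-axis fixed to 0..6
bp <- boxplot(dat, ylim = c(0, 6), ylab = "value")

# The returned list records which column each box came from
bp$names
```

This keeps each column as its own variable, with no grouping factor involved.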

Visualize summary-statistics with R

My dataset looks similar to the one described here (I have more variables, i.e. columns, and more observations):
dat=cbind(var1=c(100,20,33,400),var2=c(1,0,1,1),var3=c(0,1,0,0))
Now I want to create a bar graph with R where the x axis shows the names of all the variables and the y axis shows the mean of the respective variable.
As a second task, it would be great to show not only the mean but also the standard deviation within the same plot.
It would be nice to solve this with ggplot or qplot.
Thanks
Using base R:
dat <- cbind(var1=c(1,0.20,0.33,4),var2=c(1,0,1,1),var3=c(0,1,0,0))
dat <- as.data.frame(dat) # get this into a data frame as early as possible
barplot(sapply(dat,mean))
Using ggplot
library(ggplot2)
library(reshape2) # for melt(...)
df <- melt(dat)
ggplot(df, aes(x=variable,y=value)) +
stat_summary(fun=mean,geom="bar",color="grey20",fill="lightgreen")+ # fun replaces the deprecated fun.y
stat_summary(fun.data=mean_sdl,fun.args=list(mult=1)) # mean_sdl needs the Hmisc package installed
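An alternative sketch that avoids the summary helpers: compute the mean and standard deviation per column first, then plot that small table with explicit error bars. This uses the question's dat; the data frame name stats is illustrative.

```r
library(ggplot2)

dat <- data.frame(var1 = c(100, 20, 33, 400),
                  var2 = c(1, 0, 1, 1),
                  var3 = c(0, 1, 0, 0))

# One row per variable with its mean and standard deviation
stats <- data.frame(variable = names(dat),
                    mean = sapply(dat, mean),
                    sd = sapply(dat, sd))

# Bars show the mean; error bars span mean - sd to mean + sd
p <- ggplot(stats, aes(x = variable, y = mean)) +
  geom_col(fill = "lightgreen", colour = "grey20") +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2)
print(p)
```

Precomputing the statistics makes it easy to check the numbers before plotting, at the cost of an extra step.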

Facet for continuous variables in ggplot2 [duplicate]

This question already has an answer here:
Closed 10 years ago.
Possible Duplicate:
ggplot - facet by function output
ggplot2's facets option is great for showing multiple plots by factors, but I've had trouble learning to efficiently convert continuous variables to factors within it. With data like:
DF <- data.frame(WindDir=sample(0:180, 20, replace=T),
WindSpeed=sample(1:40, 20, replace=T),
Force=sample(1:40, 20, replace=T))
qplot(WindSpeed, Force, data=DF, facets=~cut(WindDir, seq(0,180,30)))
I get the error : At least one layer must contain all variables used for facetting
I would like to examine the relationship Force~WindSpeed by discrete 30 degree intervals, but it seems facet requires factors to be attached to the data frame being used (obviously I could do DF$DiscreteWindDir <- cut(...), but that seems unnecessary). Is there a way to use facets while converting continuous variables to factors?
Here is an example of how you can use transform to do the transformation inline:
qplot(WindSpeed, Force,
data = transform(DF,
fct = cut(WindDir, seq(0,180,30))),
facets=~fct)
You don't "pollute" data with the faceting variable, but it is in the data frame for ggplot to facet on (rather than being a function of columns in the facet specification).
This works just as well in the expanded syntax:
ggplot(transform(DF,
fct = cut(WindDir, seq(0,180,30))),
aes(WindSpeed, Force)) +
geom_point() +
facet_wrap(~fct)
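To see what the faceting variable looks like for the 30 degree bins the question asks for, here is a small sketch of cut() on a few hand-picked directions. By default the intervals are open on the left, so a wind direction of exactly 0 falls outside the first bin unless include.lowest = TRUE is set (an option not used in the answer above).

```r
breaks <- seq(0, 180, 30)

# Intervals are (0,30], (30,60], ..., (150,180] by default
bins <- cut(c(10, 45, 180), breaks)
as.character(bins)

# 0 is not inside (0,30], so it becomes NA ...
zero_default <- cut(0, breaks)

# ... unless the lowest break is explicitly included
zero_included <- cut(0, breaks, include.lowest = TRUE)
as.character(zero_included)
```

Since the sample data draws WindDir from 0:180, rows with a direction of 0 would end up in an NA facet without include.lowest = TRUE.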
