Visualize summary-statistics with R
My dataset looks similar to the one below (I have more variables, i.e. columns, and more observations):
dat=cbind(var1=c(100,20,33,400),var2=c(1,0,1,1),var3=c(0,1,0,0))
Now I want to create a bar graph with R where the x axis shows the names of all the variables and the y axis shows the mean of the respective variable.
As a second task, it would be great to show not only the mean but also the standard deviation within the same plot.
It would be nice to solve this with ggplot or qplot.
Thanks
Using base R:
dat <- cbind(var1=c(1,0.20,0.33,4),var2=c(1,0,1,1),var3=c(0,1,0,0))
dat <- as.data.frame(dat) # get this into a data frame as early as possible
barplot(sapply(dat,mean))
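The base R call above shows only the means. A minimal sketch of how the standard deviation could be added in base R, assuming error bars of plus/minus one standard deviation are what is wanted (the names means, sds and mids are introduced here for illustration):

```r
# Same example data as above, built directly as a data frame.
dat <- data.frame(var1 = c(1, 0.20, 0.33, 4),
                  var2 = c(1, 0, 1, 1),
                  var3 = c(0, 1, 0, 0))

means <- sapply(dat, mean)  # one mean per column
sds   <- sapply(dat, sd)    # one standard deviation per column

# barplot() returns the x positions of the bar midpoints,
# which are needed to place the error bars.
mids <- barplot(means,
                ylim = c(min(0, means - sds), max(means + sds)),
                ylab = "mean +/- 1 SD")

# Flat-ended arrows (angle = 90, code = 3) serve as error bars.
arrows(mids, means - sds, mids, means + sds,
       angle = 90, code = 3, length = 0.05)
```

The y limits are widened by hand because barplot() only knows about the bar heights, not the error bars.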
Using ggplot:
library(ggplot2)
library(reshape2) # for melt(...)
df <- melt(dat)
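ggplot(df, aes(x=variable,y=value)) +
  # fun replaces fun.y, which is deprecated since ggplot2 3.3.0
  stat_summary(fun=mean,geom="bar",color="grey20",fill="lightgreen")+
  # mean_sdl needs the Hmisc package; mult=1 (mean +/- 1 sd) is passed via fun.args
  stat_summary(fun.data="mean_sdl",fun.args=list(mult=1))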
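As an alternative to stat_summary, the mean and standard deviation can be computed first and then plotted. This is only a sketch under the assumption that plus/minus one standard deviation is the desired error bar; the summary data frame summ and its column names are made up for this example:

```r
library(ggplot2)

dat <- data.frame(var1 = c(1, 0.20, 0.33, 4),
                  var2 = c(1, 0, 1, 1),
                  var3 = c(0, 1, 0, 0))

# One row per variable, with its mean and standard deviation.
summ <- data.frame(variable = names(dat),
                   mean = sapply(dat, mean),
                   sd = sapply(dat, sd))

p <- ggplot(summ, aes(x = variable, y = mean)) +
  geom_col(color = "grey20", fill = "lightgreen") +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2)
print(p)
```

Computing the summary yourself avoids the dependency on Hmisc and makes it easy to swap in a different spread measure, such as the standard error.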
Related
How do I make my row names appear on my x axis, and the numbers from my variables appear on the y axis?
I created a data frame with countries as row names and percentages as observations of the variables, but when I make a histogram, the percentages occupy the x axis and the country names don't appear at all. How do I make it so that the countries' names are on the x axis and the variables on the y axis?
Country <- c('Albania','Armenia','Austria','Belarus','Belgium','Bosnia and Herzegovina','Bulgaria','Croatia','Cyprus','Czechia','Denmark','Estonia','Finland','France','Georgia','Germany','Greece','Hungary','Iceland','Ireland','Italy','Latvia','Lithuania','Luxembourg','Malta','Moldova','Montenegro','Netherlands','Norway','Poland','Portugal','Romania','Russia','Serbia','Slovakia','Slovenia','Spain','Sweden','Switzerland','Turkey','Ukraine','United Kingdom')
Anxiety.Disorders <- c(3.38,2.73,5.22,3.03,4.92,3.70,3.84,3.74,5.61,3.59,5.18,3.01,3.59,6.37,2.46,6.37,5.58,3.69,5.15,5.66,5.57,3.04,3.06,5.19,5.14,2.77,3.55,6.43,7.33,3.68,5.52,3.41,3.02,3.60,3.61,3.60,5.14,5.16,5.28,3.85,3.09,4.43)
Depressive.Disorders <- c(2.42,3.16,3.66,4.84,4.35,2.88,3.30,3.60,3.88,3.25,3.62,4.78,5.08,4.55,2.98,4.42,4.56,3.53,3.55,4.37,3.94,4.44,5.20,3.95,3.69,3.77,2.96,4.34,3.95,2.72,5.27,2.88,4.36,3.15,2.87,3.58,3.91,4.84,4.17,3.76,5.02,4.35)
Bipolar.Disorder <- c(0.72,0.77,0.95,0.73,0.91,0.79,0.67,0.77,1.04,0.75,0.99,0.71,0.99,0.93,0.67,0.79,0.93,0.74,0.97,0.80,0.95,0.71,0.73,0.95,0.97,0.67,0.74,0.94,0.85,0.76,0.97,0.78,0.70,0.74,0.76,0.75,0.97,1.04,0.98,0.85,0.73,1.05)
G08 <- data.frame(Country, Anxiety.Disorders, Depressive.Disorders, Bipolar.Disorder)
row.names(G08) <- G08$Country
G08[1] <- NULL
hist(G08$Anxiety.Disorders)
I use the melt() call to create one observation per row. Then, I use ggplot to produce the bar plot.
library(ggplot2)
library(reshape2)
Country <- c('Albania','Armenia','Austria','Belarus','Belgium','Bosnia-Herzegovina','Bulgaria','Croatia','Cyprus','Czechia','Denmark','Estonia','Finland','France','Georgia','Germany','Greece','Hungary','Iceland','Ireland','Italy','Latvia','Lithuania','Luxembourg','Malta','Moldova','Montenegro','Netherlands','Norway','Poland','Portugal','Romania','Russia','Serbia','Slovakia','Slovenia','Spain','Sweden','Switzerland','Turkey','Ukraine','United Kingdom')
Anxiety.Disorders <- c(3.38,2.73,5.22,3.03,4.92,3.70,3.84,3.74,5.61,3.59,5.18,3.01,3.59,6.37,2.46,6.37,5.58,3.69,5.15,5.66,5.57,3.04,3.06,5.19,5.14,2.77,3.55,6.43,7.33,3.68,5.52,3.41,3.02,3.60,3.61,3.60,5.14,5.16,5.28,3.85,3.09,4.43)
Depressive.Disorders <- c(2.42,3.16,3.66,4.84,4.35,2.88,3.30,3.60,3.88,3.25,3.62,4.78,5.08,4.55,2.98,4.42,4.56,3.53,3.55,4.37,3.94,4.44,5.20,3.95,3.69,3.77,2.96,4.34,3.95,2.72,5.27,2.88,4.36,3.15,2.87,3.58,3.91,4.84,4.17,3.76,5.02,4.35)
Bipolar.Disorder <- c(0.72,0.77,0.95,0.73,0.91,0.79,0.67,0.77,1.04,0.75,0.99,0.71,0.99,0.93,0.67,0.79,0.93,0.74,0.97,0.80,0.95,0.71,0.73,0.95,0.97,0.67,0.74,0.94,0.85,0.76,0.97,0.78,0.70,0.74,0.76,0.75,0.97,1.04,0.98,0.85,0.73,1.05)
G08 <- data.frame(Country, Anxiety.Disorders, Depressive.Disorders, Bipolar.Disorder)
G08melt <- melt(G08, "Country")
G08.bar <- ggplot(G08melt, aes(x = Country, y=value)) +
  geom_bar(aes(fill=variable),stat="identity", position ="dodge") +
  theme_bw()+
  theme(axis.text.x = element_text(angle=-40, hjust=.1))
G08.bar
Looking at your question, I think you wanted a grouped column diagram rather than a histogram. You can draw the plot directly with the barplot function from the graphics package. barplot needs a matrix rather than a data frame, so I first removed the first column (Country) from G08:
mat <- G08[,-1]
Now simply use the barplot function on the transpose of mat (t() returns a matrix) and use the names.arg parameter of barplot to write the names of the countries on the x axis:
barplot(t(mat),beside=TRUE,col=c('red','blue','gold'),border=NA,names.arg=G08$Country,cex.names=0.45,las=2)
legend('topright',c("Anxiety","Depressive","Bipolar"),fill=c("red","blue","gold"),cex=0.5,title='Disorder types')
Suggestion: For a bit more 'fresh air' in the graph, you can set beside=FALSE in barplot and get a stacked column diagram.
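For the setup in the question, where the countries are row names and only one variable is plotted, a bar plot labelled with the row names may already be enough. hist() bins the values of a variable, which is why the percentages ended up on the x axis and the countries were lost. A small sketch using only the first five countries from the question (G08_small is a made-up name for this subset):

```r
# First five countries and their values, taken from the question's data.
G08_small <- data.frame(
  Anxiety.Disorders = c(3.38, 2.73, 5.22, 3.03, 4.92),
  row.names = c("Albania", "Armenia", "Austria", "Belarus", "Belgium")
)

# One bar per row; the row names become the x-axis labels.
# las = 2 rotates the labels so longer country names stay readable.
mids <- barplot(G08_small$Anxiety.Disorders,
                names.arg = rownames(G08_small),
                las = 2,
                ylab = "Anxiety disorders (%)")
```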
Graphing 3 axis accelerometer data in R
I have data from a 3-axis accelerometer that I would like to create a graph of in R. The data is currently in a CSV file that looks like this:
time,X_value,Y_value,Z_value
0.000,0.00000,0.00000,0.00000
0.014,-0.76674,3.02088,10.41717
0.076,-0.64344,3.08493,8.82323
0.132,-0.68893,3.01071,8.82862
0.193,0.48483,2.40438,9.73482
0.255,-0.71168,2.07637,8.94174
0.312,-0.32920,0.79188,10.77690
0.389,-0.54468,2.08236,9.77732
0.434,-1.53648,-0.00898,11.77887
I want to show the change in all three over time in one graph. Any suggestions on how I might do that?
You'll want to read up on plotting in R. This is a fairly common analysis. See:
R: plot multiple lines in one graph
https://stats.stackexchange.com/questions/7439/how-to-change-data-between-wide-and-long-formats-in-r
You will want to melt the data frame and then plot it, grouped by variable (x axis, y axis, z axis).
library(ggplot2)
library(reshape2)
t <- 1:10
x <- rnorm(10)
y <- rnorm(10)
z <- rnorm(10)
df <- data.frame(t,x,y,z)
dfm <- melt(df, id.vars = "t")
ggplot(dfm, aes(x=t, y=value)) +
  geom_line(aes(color=variable))
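If no reshaping is wanted, base R's matplot() can draw one line per column directly. The sketch below reads the sample rows from the question through read.csv(text = ...) so that it is self-contained; with a real file you would pass the file path instead. The colours and axis labels are arbitrary choices, and the question does not state the unit of the values.

```r
# The sample rows from the question, inlined so the example runs on its own.
csv_text <- "time,X_value,Y_value,Z_value
0.000,0.00000,0.00000,0.00000
0.014,-0.76674,3.02088,10.41717
0.076,-0.64344,3.08493,8.82323
0.132,-0.68893,3.01071,8.82862
0.193,0.48483,2.40438,9.73482
0.255,-0.71168,2.07637,8.94174
0.312,-0.32920,0.79188,10.77690
0.389,-0.54468,2.08236,9.77732
0.434,-1.53648,-0.00898,11.77887"

acc <- read.csv(text = csv_text)

axes <- c("X_value", "Y_value", "Z_value")
cols <- c("red", "darkgreen", "blue")

# matplot() plots every column of the matrix against the same x vector.
matplot(acc$time, as.matrix(acc[, axes]),
        type = "l", lty = 1, col = cols,
        xlab = "time", ylab = "acceleration")
legend("right", legend = axes, col = cols, lty = 1)
```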
How do I put multiple boxplots in the same graph in R?
Sorry I don't have example code for this question. All I want to know is if it is possible to create multiple side-by-side boxplots in R representing different columns/variables within my data frame. Each boxplot would also only represent a single variable--I would like to set the y-scale to a range of (0,6). If this isn't possible, how can I use something like the panel option in ggplot2 if I only want to create a boxplot using a single variable? Thanks! Ideally, I want something like the image below but without factor grouping like in ggplot2. Again, each boxplot would represent completely separate and single columns.
ggplot2 requires that the data to be plotted on the y axis are all in one column. Here is an example:
set.seed(1)
df <- data.frame(
  value = runif(810,0,6),
  group = 1:9
)
head(df)
library(ggplot2)
ggplot(df, aes(factor(group), value)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0,6))
The coord_cartesian(ylim = c(0,6)) call sets the y axis to run from 0 to 6. If your data are in separate columns, you can get them into long form using melt from reshape2 or gather from tidyr (other methods are also available).
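One of the other methods for going from separate columns to long form is base R's stack(), which needs no extra reshaping package. A minimal sketch with made-up columns a, b and c:

```r
library(ggplot2)

set.seed(1)
# Hypothetical wide data: one column per variable.
wide <- data.frame(a = runif(50, 0, 6),
                   b = runif(50, 0, 6),
                   c = runif(50, 0, 6))

# stack() returns two columns: 'values' (the numbers) and
# 'ind' (a factor naming the column each number came from).
long <- stack(wide)

p <- ggplot(long, aes(x = ind, y = values)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 6))
print(p)
```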
You can do this if you reshape your data into long format:
## Some sample data
dat <- data.frame(a=rnorm(100), b=rnorm(100), c=rnorm(100))
## Reshape data wide -> long
library(reshape2)
long <- melt(dat)
plot(value ~ variable, data=long)
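In base R the reshaping step can also be skipped: boxplot() accepts a data frame and draws one box per column. A short sketch with invented sample data in the 0 to 6 range the question mentions:

```r
set.seed(1)
# Hypothetical data: three separate, unrelated columns.
dat <- data.frame(a = runif(100, 0, 6),
                  b = runif(100, 0, 6),
                  c = runif(100, 0, 6))

# One box per column, side by side, with the y scale fixed to (0, 6).
bp <- boxplot(dat, ylim = c(0, 6), ylab = "value")
```

The returned object bp holds the five-number summaries per column in bp$stats, which can be handy for checking the plot.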
Correlation matrix plot with ggplot2
I want to create a correlation matrix plot, i.e. a plot where each variable is plotted in a scatterplot against each other variable, as with pairs() or splom(). I want to do this with ggplot2. See here for examples. The link mentions some code someone wrote for doing this in ggplot2; however, it is outdated and no longer works (even after you swap out the deprecated parts).
One could do this with a loop in a loop and then multiplot(), but there must be a better way. I tried melting the dataset to long format, copying the value and variable columns, and then using facets. This almost gives you something correct.
d = data.frame(x1=rnorm(100), x2=rnorm(100), x3=rnorm(100), x4=rnorm(100), x5=rnorm(100))
library(reshape2)
d = melt(d)
d$value2 = d$value
d$variable2 = d$variable
library(ggplot2)
ggplot(data=d, aes(x=value, y=value2)) +
  geom_point() +
  facet_grid(variable ~ variable2)
This gets the general structure right, but only works for plotting each variable against itself. Is there some more clever way of doing this without resorting to 2 loops?
library(GGally)
set.seed(42)
d = data.frame(x1=rnorm(100), x2=rnorm(100), x3=rnorm(100), x4=rnorm(100), x5=rnorm(100))
# estimated density in diagonal
ggpairs(d)
# blank
ggpairs(d, diag = list("continuous"="blank"))
Using the PerformanceAnalytics library:
library("PerformanceAnalytics")
# df stands for your numeric data frame, e.g. d from the question
chart.Correlation(df, histogram = TRUE, pch = 19)
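The facet idea from the question can also be made to work with plain ggplot2 by building a long data frame that contains every pair of variables, not just each variable with itself. This is only a sketch of that idea, not what GGally does internally; pairs_idx and pair_long are names invented here, and three variables are used to keep it short:

```r
library(ggplot2)

set.seed(42)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))

# All ordered pairs of column names (including each column with itself).
vars <- names(d)
pairs_idx <- expand.grid(xvar = vars, yvar = vars, stringsAsFactors = FALSE)

# For every pair, take the two columns and label the rows with the pair,
# then bind everything into one long data frame.
pair_long <- do.call(rbind, lapply(seq_len(nrow(pairs_idx)), function(i) {
  data.frame(xvar = pairs_idx$xvar[i],
             yvar = pairs_idx$yvar[i],
             x = d[[pairs_idx$xvar[i]]],
             y = d[[pairs_idx$yvar[i]]])
}))

p <- ggplot(pair_long, aes(x = x, y = y)) +
  geom_point() +
  facet_grid(yvar ~ xvar)
print(p)
```

The long data frame has one block of rows per panel, so its size grows with the square of the number of variables; for many variables ggpairs() is likely the more practical choice.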
Multiple boxplots in ggplot2
I have three vectors, and I would like to make side-by-side boxplots of them in ggplot2. Each vector contains observations from one of three separate samples, so ideally I would like to identify each boxplot. I know of course how to accomplish that with the simple boxplot command, but in ggplot2 it seems to be more complicated, at least for a newbie such as myself. Could you please tell me whether there is a painless way to proceed here? Thank you.
library(ggplot2)
library(reshape2)
# re-create your samples via runif (though I should have set.seed first)
obs_1 <- runif(100)
obs_2 <- runif(100)
obs_3 <- runif(100)
# you need a data frame, but you can do it on the fly
# this makes 3 columns from each of your samples
# then uses melt to do wide to long (which is what geom_boxplot needs)
gg <- ggplot(melt(data.frame(obs_1, obs_2, obs_3)), aes(x=variable, y=value))
gg <- gg + geom_boxplot()
gg
You should really make a proper data frame, do the melt and rename the columns as needed. This was just to show a quick example.
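A sketch of the more careful version described above: build the data frame first, melt it with explicit column names, and then plot. The names samples, samples_long, sample and observation are placeholders chosen for this example:

```r
library(ggplot2)
library(reshape2)

set.seed(1)  # make the random samples reproducible
samples <- data.frame(obs_1 = runif(100),
                      obs_2 = runif(100),
                      obs_3 = runif(100))

# Wide -> long, naming the two resulting columns explicitly
# instead of keeping melt's defaults ("variable" and "value").
samples_long <- melt(samples,
                     measure.vars = c("obs_1", "obs_2", "obs_3"),
                     variable.name = "sample",
                     value.name = "observation")

p <- ggplot(samples_long, aes(x = sample, y = observation)) +
  geom_boxplot()
print(p)
```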