Vertical data groups from table in R - r

I have data in a table arranged in groups of 7 columns, which I need to plot against each other to find relationships. How do I split the columns into groups of 7, and what plots would be helpful for nice graphics? I looked at "car", which seems appropriate.
A_0,A_1,A_2,A_3,A_4,A_5,A_6,B_0,B_1,B_2,B_3,B_4,B_5,B_6,C_0,C_1,C_2,C_3,C_4,C_5,C_6,D_0,D_1,D_2,D_3,D_4,D_5,D_6,D_0,D_1,D_2,D_3,D_4,D_5,D_6,E_0,E_1,E_2,E_3,E_4,E_5,E_6
0,2,0,0,1,1,1,0,1,0,0,1,1,1,0,0.000003,0,0,0.000004,0.000004,0.000004,0,1,0,0,1,1,1,0,2,0,0,1,1,1,0,0.000003,0,0,0.000002,0.000002,0.000002
2,1,0,0,0,0,4,2,1,0,0,0,0,4,0.000007,0.000003,0,0,0,0,0.000012,1,1,0,0,0,0,1,2,1,0,0,0,0,4,0.000003,0.000002,0,0,0,0,0.000006
1,0,0,0,0,4,1,1,0,0,0,0,4,1,0.000003,0,0,0,0,0.000012,0.000003,1,0,0,0,0,1,1,1,0,0,0,0,4,1,0.000002,0,0,0,0,0.000006,0.000002
1,0,0,1,0,0,1,1,0,0,1,0,0,1,0.000003,0,0,0.000003,0,0,0.000001,1,0,0,1,0,0,1,1,0,0,1,0,0,1,0.000002,0,0,0.000001,0,0,0
0,1,0,0,1,2,3,0,1,0,0,1,2,3,0,0.000003,0,0,0.000001,0.000002,0.000003,0,1,0,0,1,1,1,0,1,0,0,1,2,3,0,0.000001,0,0,0,0.000001,0.000002
1,0,0,1,2,3,0,1,0,0,1,2,3,0,0.000003,0,0,0.000001,0.000002,0.000003,0,1,0,0,1,1,1,0,1,0,0,1,2,3,0,0.000001,0,0,0,0.000001,0.000002,0
1,0,0,0,2,0,2,1,0,0,0,2,0,2,0.000001,0,0,0,0.000002,0,0.000002,1,0,0,0,1,0,1,1,0,0,0,2,0,2,0,0,0,0,0.000001,0,0.000001
0,2,0,0,0,1,1,0,2,0,0,0,1,1,0,0.000002,0,0,0,0.000001,0.000001,0,1,0,0,0,1,1,0,2,0,0,0,1,1,0,0.000001,0,0,0,0.000001,0
2,0,0,0,1,1,2,2,0,0,0,1,1,2,0.000002,0,0,0,0.000001,0.000001,0.000002,1,0,0,0,1,1,1,2,0,0,0,1,1,2,0.000001,0,0,0,0.000001,0,0.000001
0,1,1,2,2,0,3,0,1,1,2,2,0,2,0,0.000001,0.000001,0.000002,0.000002,0,0.000003,0,1,1,1,1,0,1,0,1,1,2,2,0,3,0,0.000001,0,0.000001,0.000001,0,0.000001
1,1,2,2,0,3,1,1,1,2,2,0,2,1,0.000001,0.000001,0.000002,0.000002,0,0.000003,0.000001,1,1,1,1,0,1,1,1,1,2,2,0,3,1,0.000001,0,0.000001,0.000001,0,0.000001,0.000001
0,1,1,1,2,2,0,0,1,1,1,1,2,0,0,0.000001,0.000001,0.000001,0.000002,0.000002,0,0,1,1,1,1,1,0,0,1,1,1,2,2,0,0,0.000001,0.000001,0,0.000001,0.000001,0
0,1,1,0,0,0,12,0,1,1,0,0,0,6,0,0.000001,0.000001,0,0,0,0.000007,0,1,1,0,0,0,1,0,1,1,0,0,0,12,0,0,0,0,0,0,0.000006
1,1,0,0,0,12,34,1,1,0,0,0,6,15,0.000001,0.000001,0,0,0,0.000007,0.000017,1,1,0,0,0,1,1,1,1,0,0,0,12,34,0,0,0,0,0,0.000006,0.000015
1,0,0,0,12,34,1,1,0,0,0,6,15,0,0.000001,0,0,0,0.000007,0.000017,0.000001,1,0,0,0,1,1,1,1,0,0,0,12,34,1,0,0,0,0,0.000006,0.000015,0
1,0,13,4,11,9,10,0,0,1,2,1,5,7,0.000001,0,0.000002,0.000002,0.000002,0.000005,0.000008,1,0,1,1,1,1,1,1,0,13,4,11,9,10,0,0,0.000006,0.000002,0.000005,0.000004,0.000005
0,13,4,11,9,10,18,0,1,2,1,5,7,4,0,0.000002,0.000002,0.000002,0.000005,0.000008,0.000006,0,1,1,1,1,1,1,0,13,4,11,9,10,18,0,0.000006,0.000002,0.000005,0.000004,0.000005,0.000008
13,4,11,9,10,18,39,1,2,1,5,7,4,9,0.000002,0.000002,0.000002,0.000005,0.000008,0.000006,0.000011,1,1,1,1,1,1,1,13,4,11,9,10,18,39,0.000006,0.000002,0.000005,0.000004,0.000005,0.000008,0.000017
4,11,9,10,18,39,8,2,1,5,7,4,9,4,0.000002,0.000002,0.000005,0.000008,0.000006,0.000011,0.000005,1,1,1,1,1,1,1,4,11,9,10,18,39,8,0.000002,0.000005,0.000004,0.000005,0.000008,0.000017,0.000004
11,9,10,18,39,8,16,1,5,7,4,9,4,5,0.000002,0.000005,0.000008,0.000006,0.000011,0.000005,0.000006,1,1,1,1,1,1,1,11,9,10,18,39,8,16,0.000005,0.000004,0.000005,0.000008,0.000017,0.000004,0.000007
9,10,18,39,8,16,0,5,7,4,9,4,5,0,0.000005,0.000008,0.000006,0.000011,0.000005,0.000006,0,1,1,1,1,1,1,0,9,10,18,39,8,16,0,0.000004,0.000005,0.000008,0.000017,0.000004,0.000007,0
1,0,1,6,0,0,2,1,0,1,3,0,0,2,0.000001,0,0.000001,0.000003,0,0,0.000002,1,0,1,1,0,0,1,1,0,1,6,0,0,2,0,0,0,0.000003,0,0,0.000001
0,1,6,0,0,2,2,0,1,3,0,0,2,2,0,0.000001,0.000003,0,0,0.000002,0.000002,0,1,1,0,0,1,1,0,1,6,0,0,2,2,0,0,0.000003,0,0,0.000001,0.000001
0,2,2,0,0,8,4,0,2,2,0,0,8,4,0,0.000002,0.000002,0,0,0.000011,0.000006,0,1,1,0,0,1,1,0,2,2,0,0,17,19,0,0.000001,0.000001,0,0,0.000008,0.000009
2,2,0,0,8,4,1,2,2,0,0,8,4,1,0.000002,0.000002,0,0,0.000011,0.000006,0.000002,1,1,0,0,1,1,1,2,2,0,0,17,19,3,0.000001,0.000001,0,0,0.000008,0.000009,0.000001
2,0,0,8,4,1,1,2,0,0,8,4,1,2,0.000002,0,0,0.000011,0.000006,0.000002,0.000003,1,0,0,1,1,1,0.5,2,0,0,17,19,3,5,0.000001,0,0,0.000008,0.000009,0.000001,0.000002

Since you haven't posted data or specified what sort of chart you want, here is untested code assuming a bubble plot:
library(ggplot2)
library(reshape2)
datamelted <- melt(data, id.vars = 'ABcol')
ggplot(datamelted, aes(x = ABcol, y = variable, size = value)) + geom_point()
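One way to form the groups of 7 asked about in the question is to split the columns by position and then plot within or across groups. The sketch below uses base R only. The data frame `dat` is a made-up stand-in with three groups (A, B, C), not the posted table, and the choice of `pairs()` plus a simple scatter plot is an assumption about what "plot against each other" means here.

```r
# Hypothetical stand-in for the posted table: 3 groups (A, B, C) of 7 columns.
set.seed(1)
dat <- as.data.frame(matrix(rpois(10 * 21, lambda = 2), nrow = 10))
names(dat) <- paste0(rep(c("A", "B", "C"), each = 7), "_", 0:6)

# Assign each column to a block of 7 by position, then split the columns.
block <- rep(seq_len(ncol(dat) / 7), each = 7)
groups <- split.default(dat, block)
names(groups) <- c("A", "B", "C")

# Scatterplot matrix of the 7 columns inside one group.
pairs(groups[["A"]], main = "Group A")

# Plot one group against another, matching columns by position (A_0 vs B_0, ...).
plot(unlist(groups[["A"]]), unlist(groups[["B"]]),
     xlab = "A values", ylab = "B values")
```

Splitting by position rather than by name prefix may be the safer choice for the posted table, because its header repeats D_0 to D_6 and a prefix-based split would lump both D blocks together.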

Related

How do I create a grouped boxplot in R?

I have a data frame containing 5 probes which are my variables in a dataframe, cg02823866, cg13474877, cg14305799, cg15837913 and cg19724470. I want to create a boxplot that will group cg02823866 and cg14305799 into a group called 'GeneBody' and then cg13474877, cg14305799 and cg19724470 into a group called 'Promoter'. I then want to colour code the boxplots to represent the probe names. I can't figure out how to group those variables into groups to plot the graph.
I created an ungrouped boxplot of the five probes and it looked like this.
I want there to be the titles 'Promoter' and 'GeneBody' on the x axis. Above the 'GeneBody' title there are the 2 boxplots for the cg02823866 and cg14305799 probes. Then a 'Promoter' label with the boxplots for cg13474877, cg14305799 and cg19724470. I then want each boxplots colour coded to represent each different probe.
My data frame that I imported into RStudio looks like this: https://i.stack.imgur.com/r4gEC.png
Assuming you have some data with variable names Beta (your y axis), Probe (your current x axis), and group (either "GeneBody" or "Promoter"), you can do something like the following:
library(ggplot2)
ggplot(data, aes(x = group, y = Beta, fill = Probe)) +
geom_boxplot()
If you provide a reproducible set of data, I can probably do better.
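As a rough illustration of the answer above, the sketch below builds the `group` column from the probe names and then draws the grouped boxplot. The Beta values are simulated, and the probe-to-group assignment is an assumption (the question lists cg14305799 under both groups, so here only cg02823866 and cg14305799 are treated as GeneBody and everything else as Promoter).

```r
library(ggplot2)

# Simulated long-format data: one row per measurement (hypothetical values).
set.seed(42)
probes <- c("cg02823866", "cg13474877", "cg14305799",
            "cg15837913", "cg19724470")
data <- data.frame(Probe = rep(probes, each = 20), Beta = runif(100))

# Assumed assignment of probes to groups; adjust to the real annotation.
gene_body <- c("cg02823866", "cg14305799")
data$group <- ifelse(data$Probe %in% gene_body, "GeneBody", "Promoter")

# One cluster of boxes per group on the x axis, one fill colour per probe.
p <- ggplot(data, aes(x = group, y = Beta, fill = Probe)) +
  geom_boxplot()
print(p)
```

If the real data frame is wide (one column per probe), it would first need reshaping to this long layout, for example with `reshape2::melt()`.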
Adding to Ben's answer, here is the traditional iris data frame example, which you can easily load with data(iris):
ggplot(iris) +
aes(x = "", y = Sepal.Length, group = Species) +
geom_boxplot(shape = "circle", fill = "#112446") +
theme_minimal()
So you just need a column which indicates the group membership.
It gets more difficult, of course, with uncleaned data, where you might need to transpose the data first, etc. But those are follow-up questions, I guess.
Also, if you want to make your life easier, use the esquisse RStudio add-in.

Determine order of several boxplots in one plot in R qqplot

I tried to create a relatively simple boxplot in R's ggplot2: one value on the x axis and several variables on the y axis. I'm using code similar to this:
ggplot() +
# Boxplot 1
geom_boxplot(df[which(df$Xvalue=="Boxplot1"),],
mapping = aes(X, "Y")) +
# Boxplot 2
geom_boxplot(df[which(df$Xvalue=="Boxplot2"),],
mapping = aes(X, "Y")) +
# Boxplot 3
geom_boxplot(df[which(df$Xvalue=="Boxplot3"),],
mapping = aes(X, "Y"))
The boxplots in my real code are ordered alphabetically; however, I need them in a customized, categorical order.
I'm aware I could restructure my data frame so that I don't use a subset and a new geom_boxplot command for each boxplot, but I've structured the data that way for other reasons and that's not the solution I'm looking for right now.
Maybe there is an easy way using scale_Y_manual or something else? Any help is appreciated!
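One possible approach that keeps the one-layer-per-subset structure is to fix the order on the discrete x scale with `scale_x_discrete(limits = ...)`. This is only a sketch: the data frame `df`, its columns `Xvalue` and `Y`, and the order in `box_order` are all hypothetical stand-ins for the real data.

```r
library(ggplot2)

# Hypothetical data in the shape the question describes.
set.seed(1)
df <- data.frame(
  Xvalue = rep(c("Boxplot1", "Boxplot2", "Boxplot3"), each = 15),
  Y = rnorm(45)
)

# Desired left-to-right order (an arbitrary example order).
box_order <- c("Boxplot3", "Boxplot1", "Boxplot2")

# One geom_boxplot() layer per subset, as in the question; the shared
# discrete x scale then places the boxes in the order given by `limits`.
p <- ggplot() +
  geom_boxplot(data = df[df$Xvalue == "Boxplot1", ], mapping = aes(Xvalue, Y)) +
  geom_boxplot(data = df[df$Xvalue == "Boxplot2", ], mapping = aes(Xvalue, Y)) +
  geom_boxplot(data = df[df$Xvalue == "Boxplot3", ], mapping = aes(Xvalue, Y)) +
  scale_x_discrete(limits = box_order)
print(p)
```

Any category left out of `limits` is dropped from the plot, so the vector should list every box that is meant to appear.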

How to plot bar charts with 'n' number of columns and group by another column?

I am currently learning R, and I have an R data frame containing data I scraped from a football website.
There are 58 columns (variables, attributes) for each row. Of these variables, I wish to plot 3 in a single bar chart. The 3 important variables are 'Name', 'Goals.with.right.foot', and 'Goals.with.left.foot'.
What I want to build is a bar chart with each 'Name' on the x-axis and 2 independent bars representing the other 2 variables.
Sample row entry:
{......., RONALDO, 10(left), 5(right),............}
I have tried playing around a lot with ggplot2 geom_bar with no success.
I have also searched for similar questions, but I cannot understand the answers. Can anyone explain simply how to solve this problem?
My data frame is called 'Forwards' and holds the strikers in a game of football. They have the attributes Name, Goals.with.left.foot and Goals.with.right.foot.
barplot(counts, main="Goals",
xlab="Goals", col=c("darkblue","red"),
legend = rownames(counts))
You could try it this way:
I simulated a frame as a stand in for yours, just replace it with a frame containing the columns you're interested in:
df <- data.frame(names = letters[1:5], r.foot = runif(5,1,10), l.foot = runif(5,1,10))
# transform your df to long format
library(reshape2)
plotDf <- melt(df, variable.name = 'footing', value.name = 'goals')
# plot it
library(ggplot2)
ggplot(plotDf, aes(x = names, y = goals, group = footing, fill = footing)) +
geom_col(position = position_dodge()) #does the same as geom_bar, but uses stat_identity instead of stat_count
This results in a bar chart with two side-by-side bars (one per foot) for each name.
This works because ggplot expects one variable containing the values for the y-axis and one or more variables containing the grouping factor(s).
With the melt function, your data.frame is reshaped into the so-called 'long format', which is exactly the orientation of data that ggplot needs.
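To make the 'long format' concrete, here is a tiny sketch with two made-up players. The values are invented for illustration, and `id.vars` is spelled out explicitly so that melt does not have to guess which column identifies each row.

```r
library(reshape2)

# Two hypothetical players in wide format: one column per foot.
df <- data.frame(names = c("a", "b"),
                 r.foot = c(5, 3),
                 l.foot = c(10, 1))

# Long format: one row per (player, foot) combination.
plotDf <- melt(df, id.vars = "names",
               variable.name = "footing", value.name = "goals")
print(plotDf)
#   names footing goals
# 1     a  r.foot     5
# 2     b  r.foot     3
# 3     a  l.foot    10
# 4     b  l.foot     1
```

Each wide row becomes two long rows, and the former column names end up in `footing`, which is the column that `fill` and `group` are mapped to in the plot.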

Clustered bar chart R using 2 Numeric Variables/Metrics

I want to create a clustered Bar chart in R using 2 numeric variables, e.g:
Movie Genre (X-axis) and Gross$ + Budget$ should be Y-axis
It's a very straightforward chart to create in Excel. However, in R, I have put Genre in my X-axis and Gross$ in Y-axis.
My question is: Where do I need to put another Numeric variable ie Budget$ in my code so that the new Budget$ will be visible beside Gross$ in the chart?
Here is my Code:
ggplot(data=HW, aes(reorder(x=HW$Genre,-HW$Gross...US, sum),
y=HW$Gross...US))+
geom_col()
P.S. In aes I have just put reorder to sort the categories.
Appreciate help!
Could you give us some data so we can recreate it?
I think you are looking for geom_bar() and one of its options, position="dodge", which tells ggplot to put the bars side by side. But without knowing your data and its structure I can't further help you.
Melting the dataset should help in this case. A dummy-data based example below:
Data
HW <- data.frame(Genre = letters[sample(1:6, 100, replace = T)],
Gross...US = rnorm(100, 1e6, sd=1e5),
Budget...US = rnorm(100, 1e5, sd=1e4))
Code
library(tidyverse)
library(reshape2)
HW %>%
melt %>%
ggplot(aes(Genre, value, fill=variable)) + geom_col(position = 'dodge')
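A possible refinement, assuming each bar is meant to show a per-genre total: with several rows per genre, `geom_col()` receives many values for the same bar, so it may be clearer to aggregate first and then melt. The sketch below reuses the dummy `HW` layout from the answer; summing (rather than averaging) is an assumption based on the `sum` used in the question's `reorder()` call.

```r
library(ggplot2)
library(reshape2)

# Dummy data with the same column names as in the answer above.
set.seed(1)
HW <- data.frame(Genre = sample(letters[1:6], 100, replace = TRUE),
                 Gross...US = rnorm(100, 1e6, sd = 1e5),
                 Budget...US = rnorm(100, 1e5, sd = 1e4))

# One row per genre, with both metrics summed.
totals <- aggregate(cbind(Gross...US, Budget...US) ~ Genre,
                    data = HW, FUN = sum)

# Long format: one row per (genre, metric) pair.
long <- melt(totals, id.vars = "Genre")

# Side-by-side bars: Gross and Budget next to each other for each genre.
p <- ggplot(long, aes(x = Genre, y = value, fill = variable)) +
  geom_col(position = "dodge")
print(p)
```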

Reordering data based on a column in [r] to order x-value items from lowest to highest y-values in ggplot

I have a dataframe that I want to reorder to make a ggplot so I can easily see which items have the highest and lowest values in them. In my case, I've grouped the data into two groups, and it'd be nice to have a visual representation of which group tends to score higher. Based on this question I came up with:
library(ggplot2)
cor.data<- read.csv("https://dl.dropbox.com/s/p4uy6uf1vhe8yzs/cor.data.csv?dl=0",stringsAsFactors = F)
cor.data.sorted = cor.data[with(cor.data,order(r.val,pic)),] #<-- line that doesn't seem to be working
ggplot(cor.data.sorted,aes(x=pic,y=r.val,size=df.val,color=exp)) + geom_point()
which produces this:
I've tried quite a few variants to reorder the data, and I feel like this should be pretty simple to achieve. To clarify, if I had successfully reorganised the data, the y-values would increase as the plot moves along the x-axis. So maybe I'm focusing on the wrong part of the code to achieve this in a ggplot figure?
You could do something like this?
library(tidyverse);
cor.data %>%
mutate(pic = factor(pic, levels = as.character(pic)[order(r.val)])) %>%
ggplot(aes(x = pic, y = r.val, size = df.val, color = exp)) + geom_point()
This obviously still needs some polishing to deal with the x axis label clutter etc.
Rather than trying to order the data before creating the plot, I can reorder it at the time of writing the plot:
cor.data<- read.csv("https://dl.dropbox.com/s/p4uy6uf1vhe8yzs/cor.data.csv?dl=0",stringsAsFactors = F)
cor.data.sorted = cor.data[with(cor.data,order(r.val,pic)),] # controls the order in which points are drawn, for a (slightly) more readable plot
ggplot(cor.data.sorted,aes(x=reorder(pic,r.val),y=r.val,size=df.val,color=exp)) + geom_point()
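The reason sorting the rows has no visible effect is that a discrete axis follows the factor levels (alphabetical for a character column), not the row order. A small base R sketch with made-up values shows what `reorder()` changes; the names `p1` to `p4` and the numbers are invented for illustration.

```r
# Hypothetical miniature of cor.data: four pictures with one r value each.
cor.data <- data.frame(pic = c("p1", "p2", "p3", "p4"),
                       r.val = c(0.4, -0.2, 0.9, 0.1),
                       stringsAsFactors = FALSE)

# Default factor levels are alphabetical, regardless of row order.
levels(factor(cor.data$pic))
# "p1" "p2" "p3" "p4"

# reorder() sorts the levels by r.val, lowest first.
ordered_pic <- reorder(cor.data$pic, cor.data$r.val)
levels(ordered_pic)
# "p2" "p4" "p1" "p3"
```

Mapping `x = reorder(pic, r.val)` inside `aes()` applies the same reordering on the fly, so the axis runs from the lowest to the highest r value.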
