This question already has answers here: Plot each column against each column (2 answers); Plot all pairs of variables in R data frame based on column type [duplicate] (1 answer). Closed 17 days ago.
I have the output of 3 different algorithms as continuous vectors. Instead of comparing their correlations one by one, I would like to plot them all simultaneously in the same plot, but in different panels. The data frame looks like this (but contains more than 10k ids):
df <- data.frame(id=1:5,
feature1=runif(5),
feature2=runif(5,min = 3,max=5),
feature3=runif(5, min = 5,max=8))
Ideally, the resulting plot should look something like this:
I am fairly sure there is some simple tidyr function that expands my data frame so that I can simply use ggplot2 with facet_grid, but I searched and couldn't find anything.
Any help is much appreciated!
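The image of the desired plot isn't included, so this is a minimal sketch that assumes a grid of scatterplots with one panel per pair of features. It assumes tidyr 1.0 or later (for pivot_longer()) and ggplot2: reshape to long format, join the long table to itself on id so that every feature is paired with every other feature of the same id, then facet by the two feature names. The seed and the names long/paired are illustrative only.

```r
library(tidyr)
library(ggplot2)

set.seed(1)  # only to make the random example data reproducible
df <- data.frame(id = 1:5,
                 feature1 = runif(5),
                 feature2 = runif(5, min = 3, max = 5),
                 feature3 = runif(5, min = 5, max = 8))

# One row per (id, feature) combination
long <- as.data.frame(pivot_longer(df, -id, names_to = "feature", values_to = "value"))

# Self-join on id: pairs every feature with every feature of the same id,
# then drop the feature-with-itself pairs
paired <- merge(long, long, by = "id", suffixes = c("_x", "_y"))
paired <- paired[paired$feature_x != paired$feature_y, ]

# One scatterplot panel per (feature_x, feature_y) pair
p <- ggplot(paired, aes(x = value_x, y = value_y)) +
  geom_point() +
  facet_grid(feature_y ~ feature_x, scales = "free")
```

With 3 features and 5 ids the self-join yields 45 rows, 30 after removing self-pairs. If a ready-made scatterplot matrix is acceptable instead of a hand-built facet grid, GGally's ggpairs() draws this kind of layout directly.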
I realize the title is generic and not very descriptive; I have no idea how to concisely describe what I am trying to do.
I have a two-column data frame in R: column A holds data values, and column B holds data that has now been binned (it was the year associated with column A and is now a bin label based on year ranges).
I need to generate a new data frame that uses the bin labels as columns and the associated data values as row entries, preferably sorted, and padded with NA so that all columns have the same length.
Sample data:
df <- data.frame(values=c(1,NA,3,NA,5:6,7:9),
bins=rep(c("yr1_yr2","yr2_yr3","yr3_yr4"),each=3))
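The reshape described above (one column per bin, sorted values, NA padding) can be sketched in base R with the sample data. This is one possible approach, not the method the asker ended up using; note that sort() drops the existing NAs, so each column is padded with NA only at the end. The names groups, n and wide are illustrative.

```r
df <- data.frame(values = c(1, NA, 3, NA, 5:6, 7:9),
                 bins = rep(c("yr1_yr2", "yr2_yr3", "yr3_yr4"), each = 3))

# Split the values by bin label and sort each group (sort() removes NAs)
groups <- lapply(split(df$values, df$bins), sort)

# Pad every group with NA up to the length of the longest group
n <- max(lengths(groups))
wide <- as.data.frame(lapply(groups, function(v) c(v, rep(NA, n - length(v)))))
wide
#   yr1_yr2 yr2_yr3 yr3_yr4
# 1       1       5       7
# 2       3       6       8
# 3      NA      NA       9
```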
SOLUTION EDIT: After a lot of experimentation, I got what I wanted by using the cut_width() function from ggplot2 to slice my data into bins and then plotting it as a distribution graph.
Thank you all for your attempts, and sorry again for the vague question and lack of sample data.
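The asker's final data and plot aren't shown, so the following is only a sketch of the cut_width() approach on hypothetical data (10 random values per year from 2000 to 2019), using a boxplot per bin as the "distribution graph". The column names year, value and bin, and the choice of geom_boxplot(), are assumptions.

```r
library(ggplot2)

# Hypothetical data: 10 values for each year from 2000 to 2019
set.seed(42)
dat <- data.frame(year = rep(2000:2019, each = 10),
                  value = rnorm(200))

# cut_width() slices a continuous variable into fixed-width bins;
# boundary = 2000 aligns the bin edges to 2000, 2005, 2010, ...
dat$bin <- cut_width(dat$year, width = 5, boundary = 2000)

# Distribution of the values within each year bin
p <- ggplot(dat, aes(x = bin, y = value)) +
  geom_boxplot()
```

With the default closed = "right", the first bin includes both 2000 and 2005, so it holds 6 years of data while the last bin holds only 2016 to 2019.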
Not quite sure if this is getting close to what you want...
library(tidyverse)
reshape2::melt(df, id.vars='bins', measure.vars='values')
returns
bins variable value
1 yr1_yr2 values 1
2 yr1_yr2 values NA
3 yr1_yr2 values 3
4 yr2_yr3 values NA
5 yr2_yr3 values 5
6 yr2_yr3 values 6
7 yr3_yr4 values 7
8 yr3_yr4 values 8
9 yr3_yr4 values 9
This is my data (dataA): https://www.dropbox.com/s/msf0ro8saav7wbl/data1.txt?dl=0. I want to extract "Habitat" into a frequency table so that I can calculate statistics such as the mean and variance, and also make plots such as a boxplot with ggplot2.
I tried the solution in the suggested duplicate, R: How to get common counts (frequency) of levels of two factor variables by ID Variable (as new data frame), but I don't think it solves my problem.
Here's the easiest way to get a data.frame of frequencies using table(). I use t() to transpose and as.data.frame.matrix() to convert the result into a data.frame.
as.data.frame.matrix(t(table(data1)))
A B C
Adult 1 2 1
Juvenile 2 0 0
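The linked data file isn't reproduced here, so this sketch uses a hypothetical two-column data1 built to give the same table as above; only Habitat is named in the question, and the second column name Stage is an assumption. It then shows one way to get per-habitat means and variances of the counts, and a long-format table (as.data.frame() on a table gives a Freq column) that ggplot2 can use.

```r
library(ggplot2)

# Hypothetical stand-in for data1; the column name "Stage" is assumed
data1 <- data.frame(Habitat = c("A", "A", "A", "B", "B", "C"),
                    Stage   = c("Adult", "Juvenile", "Juvenile", "Adult", "Adult", "Adult"))

# Cross-tabulate, transpose so habitats become columns, convert to a data.frame
freq <- as.data.frame.matrix(t(table(data1)))

# Per-habitat summary statistics of the counts
habitat_means <- sapply(freq, mean)
habitat_vars  <- sapply(freq, var)

# Long format: one row per Habitat/Stage combination with its count in Freq
freq_long <- as.data.frame(table(data1))
p <- ggplot(freq_long, aes(x = Habitat, y = Freq)) + geom_boxplot()
```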
This question already has an answer here: Grouping & Visualizing cumulative features in R (1 answer). Closed 6 years ago.
I have a set of data that I would like to plot like this:
This was plotted using LibreOffice Calc in Ubuntu. I have tried to do this in R using the following code:
ggplot(DATA, aes(x="Samples", y="Count", fill=factor(Sample1)))+geom_bar(stat="identity")
This does not give me a stacked bar for each sample, but rather one single bar. I asked a similar question, using a different data frame, which was answered here. However, in this problem I don't have just one sample, but data for at least three. In LibreOffice Calc or Excel I can choose the stacked bar graph option and then choose to use rows as the data series. How can I achieve a similar graph in ggplot2?
Here is the dataframe/object for which I am trying to produce the graph:
Aminoacid Sequence,Sample1,Sample2,Sample3
Sequence 1,16,10,33
Sequence 2,2,2,7
Sequence 3,1,1,6
Sequence 4,4,1,1
Sequence 5,1,2,4
Sequence 6,4,3,14
Sequence 7,2,2,2
Sequence 8,8,5,12
Sequence 9,1,3,17
Sequence 10,7,1,4
Sequence 11,1,1,1
Sequence 12,1,1,2
Sequence 13,1,1,1
Sequence 14,1,2,2
Sequence 15,5,4,7
Sequence 16,3,1,8
Sequence 17,7,5,20
Sequence 18,3,3,21
Sequence 19,2,1,5
Sequence 20,1,1,1
Sequence 21,2,2,5
Sequence 22,1,1,3
Sequence 23,4,2,9
Sequence 24,2,1,1
Sequence 25,4,4,3
Sequence 26,4,1,3
I copied the contents of a .csv file; is that reproducible enough? Reading the file with read.csv() worked for me in R.
Edit:
Thank you for redirecting me to another post with a very similar problem; I hadn't found it before. That post brought me a lot closer to the solution. I had to change the code just a little to fit my problem, but here is the solution:
library(reshape2)
library(ggplot2)
df <- read.csv("example.csv")
df2 <- melt(df, id.vars = "Aminoacid.Sequence")  # stack Sample1..Sample3 into variable/value
ggplot(df2, aes(x=variable, y=value, fill=Aminoacid.Sequence))+geom_bar(stat="identity")
Using variable on the x-axis makes one bar for each sample (Sample1 to Sample3 in the example). Using y=value puts the value in each cell for that sample on the y-axis. And most importantly, using fill=Aminoacid.Sequence stacks the values for each sequence on top of each other, giving me the same graph as in the screenshot above!
Thank you for your help!
Try something along the following lines:
library(reshape2)
library(ggplot2)
df <- melt(DATA) # you probably need to adjust the id.vars here...
ggplot(df, aes(x=variable, y=value)) + geom_bar(stat="identity")
Note that you need to adjust the ggplot and melt code somewhat, but since you haven't provided sample data, no one can give you the exact code. The above shows the basic approach for dealing with multiple columns that represent your samples, though: melt will "stack" the columns on top of each other and create a column holding the old column name, which you can then use as x in ggplot.
Note that if you have other data in the data frame as well, melt will also stack these. For that reason you will need to adjust the commands to fit your data.
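To illustrate the point about other columns, here is a small sketch with a hypothetical extra numeric column, seq_length (not in the posted data): called without arguments, reshape2's melt() uses only the character/factor columns as ids and stacks every numeric column, including seq_length; naming id.vars and measure.vars explicitly keeps it out of the stack.

```r
library(reshape2)

# Hypothetical data with an extra numeric column that should NOT be stacked
DATA <- data.frame(Aminoacid.Sequence = c("Sequence 1", "Sequence 2"),
                   seq_length = c(12, 15),
                   Sample1 = c(16, 2),
                   Sample2 = c(10, 2))

# Default: only the character column is an id, so seq_length is stacked too
all_stacked <- melt(DATA)

# Explicit ids and measures: seq_length stays a separate id column
df2 <- melt(DATA, id.vars = c("Aminoacid.Sequence", "seq_length"),
            measure.vars = c("Sample1", "Sample2"))
```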
Edit: using your data:
library(reshape2)
library(ggplot2)
### reading your data:
# df <- read.table(file="clipboard", header=T, sep=",")
df2 <- melt(df)
head(df2)
Aminoacid.Sequence variable value
1 Sequence 1 Sample1 16
2 Sequence 2 Sample1 2
3 Sequence 3 Sample1 1
4 Sequence 4 Sample1 4
5 Sequence 5 Sample1 1
6 Sequence 6 Sample1 4
This can then be plotted with:
ggplot(df2, aes(x=variable, y=value, fill=Aminoacid.Sequence)) + geom_bar(stat="identity")
I am sure you will want to change some details of the graph, such as the colors, but this should answer your initial question.
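For comparison, the same reshape can be done with tidyr's pivot_longer() instead of reshape2's melt(). This is a sketch on the first three rows of the posted CSV only, assuming tidyr 1.0 or later; geom_col() is ggplot2's shorthand for geom_bar(stat = "identity").

```r
library(tidyr)
library(ggplot2)

# First three rows of the posted CSV, for illustration
DATA <- data.frame(Aminoacid.Sequence = c("Sequence 1", "Sequence 2", "Sequence 3"),
                   Sample1 = c(16, 2, 1),
                   Sample2 = c(10, 2, 1),
                   Sample3 = c(33, 7, 6))

# Stack the Sample columns into variable/value pairs, like melt() does
long <- pivot_longer(DATA, cols = starts_with("Sample"),
                     names_to = "variable", values_to = "value")

# One stacked bar per sample, segments coloured by sequence
p <- ggplot(long, aes(x = variable, y = value, fill = Aminoacid.Sequence)) +
  geom_col()
```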
This question already has answers here. Closed 10 years ago.
Possible Duplicate: dropping factor levels in a subsetted data frame in R
I have a data frame with a factor column, and I would like to use subset to extract only part of its data. But the extracted data frame's factor column still has the same levels, even though some levels no longer have any values. This affects my subsequent steps (such as visualization with ggplot).
Here is some sample code:
d<-data.frame(c1=factor(c(1,1,2,3)),c2=c("a","b","c","d"))
d<-subset(d,c1 %in% c(1,2))
d$c1
The column c1 still has 3 levels (1, 2, 3), but I'd like it to have only (1, 2), because there is no value for level 3. That way, no graph is drawn for level 3 in the visualization.
How can I achieve that? Thanks
Use droplevels:
d <- droplevels(d)
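A quick check with the question's own sample data shows the levels before and after droplevels(). Re-creating a single column with factor() is an alternative that drops unused levels from just that column.

```r
d <- data.frame(c1 = factor(c(1, 1, 2, 3)), c2 = c("a", "b", "c", "d"))
d <- subset(d, c1 %in% c(1, 2))
levels(d$c1)    # "1" "2" "3": the unused level 3 is still present

d2 <- droplevels(d)  # drops unused levels from every factor column
levels(d2$c1)   # "1" "2"

# Alternative: rebuild just this one column as a factor
d$c1 <- factor(d$c1)
levels(d$c1)    # "1" "2"
```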