Adding Legend in R using row names - r

I have a data frame, and I want to pass the row values of its first two columns, together with the variable names, to the legend.
Inside df the data are grouped by letters from a to h.
What I want is for something like 78_256_DQ0_a,
78_256_DQ1_a and 78_256_DQ2_a to appear in the legend for group a, and so on for the other groups.
I don't know how to pass this format to ggplot.
Any help will be appreciated.
Let's say I have a data frame like this:
df <- do.call(rbind,lapply(1,function(x){
AC <- as.character(rep(rep(c(78,110),each=10),times=3))
AR <- as.character(rep(rep(c(256,320,384),each=20),times=1))
state <- rep(rep(c("Group 1","Group 2"),each=5),times=6)
V <- rep(c(seq(2,40,length.out=5),seq(-2,-40,length.out=5)),times=2)
DQ0 = sort(replicate(6, runif(10,0.001:1)))
DQ1 = sort(replicate(6, runif(10,0.001:1)))
DQ2 = sort(replicate(6, runif(10,0.001:1)))
No = c(replicate(1,rep(letters[1:6],each=10)))
data.frame(AC,AR,V,DQ0,DQ1,DQ2,No)
}))
head(df)
AC AR V DQ0 DQ1 DQ2 No
1 78 256 2.0 0.003944916 0.00902776 0.00228837 a
2 78 256 11.5 0.006629239 0.01739512 0.01649540 a
3 78 256 21.0 0.048515226 0.02034436 0.04525160 a
4 78 256 30.5 0.079483625 0.04346118 0.04778420 a
5 78 256 40.0 0.099462310 0.04430493 0.05086738 a
6 78 256 -2.0 0.103686255 0.04440260 0.09931459 a
*****************************************************
This is the code for plotting the df:
library(reshape2)
df_new <- melt(df,id=c("V","No"),measure=c("DQ0","DQ1","DQ2"))
library(ggplot2)
ggplot(df_new,aes(y=value,x=V,group=No,colour=No))+
geom_point()+
geom_line()

Adding lty = variable to your aesthetics, like so:
ggplot(df_new, aes(y = value, x = V, lty = variable, colour = No)) +
geom_point() +
geom_line()
will give you separate lines for DQ0, DQ1, and DQ2.
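If the legend should instead show the combined names from the question (for example 78_256_DQ0_a), one possible approach is to build a label column before plotting. The sketch below is only an assumption about the desired result: it uses a small made-up data frame with the same column names, reshapes it with base R's stack() rather than melt(), and the column name label is hypothetical.

```r
# Small stand-in data with the same columns as the question's df (values are made up)
df <- data.frame(
  AC  = rep(c("78", "110"), each = 4),
  AR  = "256",
  V   = rep(c(2, 40, -2, -40), times = 2),
  DQ0 = seq(0.1, 0.8, length.out = 8),
  DQ1 = seq(0.2, 0.9, length.out = 8),
  DQ2 = seq(0.3, 1.0, length.out = 8),
  No  = rep(c("a", "b"), each = 4),
  stringsAsFactors = FALSE
)

# Long format in base R: stack() puts all DQ0 values first, then DQ1, then DQ2,
# so the id columns are repeated once per DQ column to line up with it
dq_cols <- c("DQ0", "DQ1", "DQ2")
id_part <- df[rep(seq_len(nrow(df)), times = length(dq_cols)), c("AC", "AR", "V", "No")]
rownames(id_part) <- NULL
long <- cbind(id_part, stack(df[dq_cols]))
names(long)[names(long) == "values"] <- "value"
names(long)[names(long) == "ind"] <- "variable"

# Combined legend label, e.g. "78_256_DQ0_a" (hypothetical column name)
long$label <- paste(long$AC, long$AR, long$variable, long$No, sep = "_")

# Plot only if ggplot2 is installed; call print(p) to draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = V, y = value, colour = label, group = label)) +
    ggplot2::geom_point() +
    ggplot2::geom_line()
}
```

Each distinct value of label then gets its own legend entry, at the cost of a longer legend when there are many groups.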

Related

Adding labels to outliers in a scatterplot

I currently have a matrix (table) like this that contains 6 women's height and weight:
V1 V2 V3 V4
1 Bella 161 60
2 Jessica 160 55
3 Indigo 179 72
4 Tina 165 54
5 Sofia 178 70
6 Fiona 163 51
On the scatterplot (height vs weight), I want to label the outliers using the female's name. Is there a method to do so? I've tried
text(V4, V3, labels=V2)
but it doesn't seem to work.
The text function doesn't know where to get V4 and such.
A few options, sticking with base graphics:
text(dat$V4, dat$V3, labels = dat$V2)
with(dat, text(V4, V3, labels = V2))
text(V3 ~ V4, data = dat, labels = V2)
Demonstration:
plot(mpg ~ disp, data = mtcars, pch = 16, col = "gray90")
text(mpg ~ disp, data = mtcars[2:4,], labels = cyl)
You can also try a ggplot2 approach. Here I include the code using your data but you would have to define what is an outlier. Here the approach:
library(ggplot2)
#Data
df <- data.frame(V1=1:6,
V2=c('Bella','Jessica','Indigo','Tina','Sofia','Fiona'),
V3=c(161,160,179,165,178,163),
V4=c(60,55,72,54,70,51),stringsAsFactors = F)
We have to define the outlier. Let's say values greater than 175 in V3 and greater than 69 in V4 are outliers:
#Create label
df$label <- ifelse(df$V3>175 & df$V4>69,df$V2,NA)
Now we plot:
#Plot
ggplot(df,aes(x=V3,y=V4))+
geom_point()+
geom_text(aes(label=label),vjust=-0.5)
Output:
Also take note of the great suggestions from @r2evans. They are very clear!
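The thresholds above are hand-picked. As one possible alternative, the sketch below flags a point when both its height and weight lie more than one standard deviation from the mean, and labels only those points with base graphics. The cutoff of 1 and the name is_outlier are assumptions made for illustration, not a standard definition of an outlier.

```r
# The six rows from the question: V2 = name, V3 = height, V4 = weight
dat <- data.frame(
  V2 = c("Bella", "Jessica", "Indigo", "Tina", "Sofia", "Fiona"),
  V3 = c(161, 160, 179, 165, 178, 163),
  V4 = c(60, 55, 72, 54, 70, 51),
  stringsAsFactors = FALSE
)

# Standardise both variables (scale() centres and divides by the sample sd)
z_height <- as.numeric(scale(dat$V3))
z_weight <- as.numeric(scale(dat$V4))

# Assumed rule: an outlier is more than 1 sd from the mean on both axes
is_outlier <- abs(z_height) > 1 & abs(z_weight) > 1

# Scatterplot with labels drawn only for the flagged rows
plot(V4 ~ V3, data = dat, xlab = "Height", ylab = "Weight", pch = 16)
text(dat$V3[is_outlier], dat$V4[is_outlier],
     labels = dat$V2[is_outlier], pos = 2)
```

With this data the rule flags Indigo and Sofia, the same two rows as the hand-picked thresholds.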

Heatmap in ggplot2 issue with fill

I'm trying to make a heatmap using ggplot2. What I want to be plotted is in the form of a matrix which is the result of a function.
Here is the data:
Image A B C D E F
1 3 23 45 23 45 90
2 4 34 34 34 34 89
3 34 33 24 89 23 67
4 3 45 234 90 12 78
5 78 89 34 23 12 56
6 56 90 56 67 34 45
Here is the function:
vector_a <- names(master)[2:4]
vector_b <- names(master)[5:6]
heatmap_prep <- function(dataframe, vector_a,vector_b){
dummy <- as.data.frame(matrix(0, nrow=length(vector_a), ncol=length(vector_b)))
for (i in 1:length(vector_a)){
first_value <- dataframe[[ vector_a[i] ]]
# print(first_value)
for(j in 1:length(vector_b)){
second_value <- dataframe[[ vector_b[j] ]]
result <- cor(first_value, second_value, method = "spearman")
dummy [i,j] <- result
}
}
rownames(dummy) <- vector_a
return(as.matrix(dummy))
}
heatmap_data_matrix1 <- heatmap_prep(master,vector_a, vector_b)
Using the data in heatmap_data_matrix1, I want to create a heatmap using the following code:
library(ggplot2)
if (length(grep("ggplot2", (.packages() ))) == 0){
library(ggplot2)
}
p <- ggplot(data = heatmap_data_matrix1, aes(x = vector_a, y = vector_b)
+ geom_tile(aes(fill = ))
However, this does not work. How should I reformat my data/code so this heatmap can be created? What should I put under "fill="?
Thanks!
Because many R functions are vectorized, and because for the most part you don't need to pre-allocate or define a vector, the for loop is unnecessary. You can simply run cor(x, y, method = "spearman") without the complications of the loop.
Regarding your question of what to put in for fill, you'll need to reshape your data to the configuration that ggplot2 uses (long format).
The gather function from tidyr does this, placing the rows/columns of the correlation into separate columns, and then using the r value for fill.
library(tidyverse) # for tidyr, tibble, ggplot2, and magrittr
heatmap_function <- function(df, a, b) {
cor_data <- cor(df[a], df[b], method = "spearman") %>%
as.data.frame(rownames = a) %>%
rownames_to_column("x") %>%
gather(y, fill, -x)
ggplot(cor_data, aes(x = x, y = y, fill = fill)) +
geom_tile()
}
This results in:
heatmap_function(master, c("A","B","C"), c("D","E"))
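For readers who want to see the long format without the tidyverse, here is a minimal base R sketch of the same reshaping step. It rebuilds master from the table in the question; the names cor_long, x, y and r are hypothetical. In recent tidyr versions gather() is superseded by pivot_longer(), so this may also be a useful fallback.

```r
# The data from the question, typed in by hand
master <- data.frame(
  Image = 1:6,
  A = c(3, 4, 34, 3, 78, 56),
  B = c(23, 34, 33, 45, 89, 90),
  C = c(45, 34, 24, 234, 34, 56),
  D = c(23, 34, 89, 90, 23, 67),
  E = c(45, 34, 23, 12, 12, 34),
  F = c(90, 89, 67, 78, 56, 45)
)

# cor() accepts two sets of columns and returns the full correlation matrix
cor_mat <- cor(master[c("A", "B", "C")], master[c("D", "E")], method = "spearman")

# Matrix -> long format: one row per (row name, column name, value)
cor_long <- as.data.frame(as.table(cor_mat), responseName = "r")
names(cor_long)[1:2] <- c("x", "y")

# Plot only if ggplot2 is installed; call print(p) to draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(cor_long, ggplot2::aes(x = x, y = y, fill = r)) +
    ggplot2::geom_tile()
}
```

The key point is that fill is mapped to a column of the long data frame, which is why the original matrix could not be passed to ggplot directly.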

Plot data from different csv files in one graph using R + ggplot

I have multiple .csv files, each of which has a column (called Data) that I want to compare across files. But first, I have to group the values by another column in each file. In the end I want to have multiple colored "lines", one per file, showing the mean value of each group in one graph. Below I describe the process I use to get the graph I want. This works for a single file, but I don't know how to add multiple "lines" from multiple files to one graph using ggplot.
This is what I got so far:
data <- read.csv(file="my01data.csv", header=FALSE, sep=",", col.names=c("ID","Data","Range"))
A single .csv file looks like the following, but without the header line:
ID Data Range
1,63,5.01
2,61,5.02
3,65,5.00
4,62,4.99
5,62,4.98
6,64,5.01
7,71,4.90
8,72,4.93
9,82,4.89
10,82,4.80
11,83,4.82
10,85,4.79
11,81,4.80
After getting the data I group it with the following lines:
data["Group"] <- NA
data[(data$Range>4.95), "Group"] <- 5.0
data[(data$Range>4.85 & data$Range<4.95), "Group"] <- 4.9
data[(data$Range>4.75 & data$Range<4.85), "Group"] <- 4.8
The final data looks like this:
myTable <- "ID Data Range Group
1 63 5.01 5.00
2 61 5.02 5.00
3 65 5.00 5.00
4 62 4.99 5.00
5 62 4.98 5.00
6 64 5.01 5.00
7 71 4.90 4.90
8 72 4.93 4.90
9 72 4.89 4.90
10 82 4.80 4.80
11 83 4.82 4.80
10 85 4.79 4.80
11 81 4.80 4.80"
myData <- read.table(text=myTable, header = TRUE)
To plot this dataframe I use the following lines:
( pplot <- ggplot(data=myData, aes(x=Group, y=Data))
+ stat_summary(fun.y = mean, geom = "line", color='red')
+ xlab("Group")
+ ylab("Data")
)
Which results in a graph like this:
I assume you have the names of your .csv-files stored in a vector named file_names. Then you can run the following code and should get a different line for each file:
library(ggplot2)
data_list <- lapply(file_names, read.csv, header=FALSE, sep=",", col.names=c("ID","Data","Range"))
data_list <- lapply(seq_along(data_list), function(i){
df <- data_list[[i]]
df$Group <- round(df$Range, 1)
df$DataNumber <- i
df
})
finalTable <- do.call(rbind, data_list)
finalTable$DataNumber <- factor(finalTable$DataNumber)
ggplot(finalTable, aes(x=Group, y=Data, group = DataNumber, color = DataNumber)) +
stat_summary(fun.y = mean, geom = "line") +
xlab("Group") +
ylab("Data")
How it works
First the different datasets are read with read.csv into a list data_list. Then each data.frame in that list is assigned a Group.
I used round here with digits = 1, which means it rounds to one decimal place (I figured that's what you are doing).
Then also a unique number (in this case simply the index of the list) is assigned to each data.frame. After that the list is combined to one data.frame with rbind and then DataNumber is turned into a factor (prettier for plotting). Finally I added DataNumber as a group and color variable to the plot.
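To try the steps above without real files, the sketch below writes two small made-up CSV files to a temporary directory, reads them back, and computes the per-group means that stat_summary would draw. The file contents and the name group_means are assumptions for illustration; col.names is passed to read.csv because the files have no header line.

```r
# Write two small header-less CSV files with made-up values
tmp_dir <- tempfile("csv_demo")
dir.create(tmp_dir)
demo <- data.frame(ID = 1:6,
                   Data = c(63, 61, 71, 72, 82, 83),
                   Range = c(5.01, 5.02, 4.90, 4.93, 4.80, 4.82))
demo2 <- transform(demo, Data = Data * 0.5)
write.table(demo, file.path(tmp_dir, "my01data.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)
write.table(demo2, file.path(tmp_dir, "my02data.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)

# Read every file, name the columns, and tag each row with its file index
file_names <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
data_list <- lapply(seq_along(file_names), function(i) {
  df <- read.csv(file_names[i], header = FALSE,
                 col.names = c("ID", "Data", "Range"))
  df$Group <- round(df$Range, digits = 1)
  df$DataNumber <- i
  df
})
finalTable <- do.call(rbind, data_list)
finalTable$DataNumber <- factor(finalTable$DataNumber)

# Mean of Data per group and file: the values each line in the plot passes through
group_means <- aggregate(Data ~ Group + DataNumber, data = finalTable, FUN = mean)
```

Inspecting group_means is a quick way to check the grouping before plotting finalTable with ggplot.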
You can add another line by using stat_summary again; you can define the data and aes argument to any other dataset:
#some pseudo data for testing
my_other_data <- myData
my_other_data$Data <- my_other_data$Data * 0.5
pplot <- ggplot(data=myData, aes(x=Group, y=Data)) +
stat_summary(fun.y = mean, geom = "line", color='red') +
stat_summary(data=my_other_data, aes(x=Group, y=Data),
fun.y = mean, geom = "line", color='green') +
xlab("Group") +
ylab("Data")
pplot
Why not create a classifying column ("Class")?
myTable1$Class <- "table1"
myTable1
"ID Data Range Group Class
1 63 5.01 5.00 table1
2 61 5.02 5.00 table1
3 65 5.00 5.00 table1"
myTable2$Class <- "table2"
myTable2
"ID Data Range Group Class
1 63 5.01 5.00 table2
2 61 5.02 5.00 table2
3 65 5.00 5.00 table2"
Then merge the data frames:
dfBIND <- rbind(myTable1, myTable2)
so that you can use ggplot with a grouping or coloring variable:
pplot <- ggplot(data=dfBIND, aes(x=Group, y=Data, group=Class, color=Class)) +
stat_summary(fun.y = mean, geom = "line") +
xlab("Group") +
ylab("Data")

R Plot Bar graph transposed dataframe

I'm trying to plot the following data frame as a bar plot, where the values for the filteredprovince column are listed in a separate column (n).
Usually, ggplot and the other plotting functions work on a horizontal data frame, and after several searches I have not been able to find a way to plot this "transposed" version of the data frame.
Each cluster should group a set of bars, and within each cluster I would like to plot each filteredprovince based on the value of the n column.
Thank you for the support.
d <- read.table(text=
" cluster PROVINCIA n filteredprovince
1 1 08 765 08
2 1 28 665 28
3 1 41 440 41
4 1 11 437 11
5 1 46 276 46
6 1 18 229 18
7 1 35 181 other
8 1 29 170 other
9 1 33 165 other
10 1 38 153 other ", header=TRUE,stringsAsFactors = FALSE)
UPDATE
Thanks to the suggestion in the comments I almost achieved the desired format:
ggplot(tab_s, aes(x = cluster, y = n, fill = factor(filteredprovince))) + geom_col()
Is there any way to show percentages instead of frequencies on the y axis?
If I understand correctly, you're trying to use geom_bar(), which gives you problems because it wants to count the rows itself (a sort of histogram), but you have already done this kind of summary.
(If you had provided code which you have tried so far I would not have to guess)
In that case you can use geom_col() instead.
ggplot(d, aes(x = filteredprovince, y = n, fill = factor(PROVINCIA))) + geom_col()
Alternatively, you can change the default stat of geom_bar() from "count" to "identity"
ggplot(d, aes(x = filteredprovince, y = n, fill = factor(PROVINCIA))) +
geom_bar(stat = "identity")
See this SO question for what a stat is
EDIT: Update in response to OP's update:
To display percentages, you will have to modify the data itself.
Just divide n by the sum of all n and multiply by 100.
d$percentage <- d$n / sum(d$n) * 100
ggplot(d, aes(x = cluster, y = percentage, fill = factor(filteredprovince))) + geom_col()
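Dividing by sum(d$n) gives each row's share of the grand total. If the data contains several clusters and each bar should add up to 100%, one possible variation is to divide by the per-cluster total using ave(). The sketch below assumes a second cluster with made-up counts, since the sample data only has cluster 1.

```r
# Three rows per cluster; the cluster 2 counts are made up for illustration
d <- data.frame(
  cluster = rep(c(1, 2), each = 3),
  filteredprovince = rep(c("08", "28", "other"), times = 2),
  n = c(765, 665, 181, 100, 300, 600),
  stringsAsFactors = FALSE
)

# ave() returns the cluster total repeated on every row of that cluster
d$percentage <- d$n / ave(d$n, d$cluster, FUN = sum) * 100

# Plot only if ggplot2 is installed; call print(p) to draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(d, ggplot2::aes(x = factor(cluster), y = percentage,
                                       fill = filteredprovince)) +
    ggplot2::geom_col()
}
```

With a single cluster this gives the same result as dividing by sum(d$n).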
I'm not sure I perfectly understand, but if the problem is the orientation of your dataframe, you can transpose it with t(data) where data is your dataframe.

ggplot-How to create a legend using row&column names

I have a data frame, and I want to use the row values of its first two columns together with the
variable column names to create the legend.
Inside df the data are grouped by letters from a to h. In particular, I want to combine the row values of the AC and AR columns with the DQ0:DQ2 variable names, and show them in the legend in that format:
something like 78_256_DQ0, 78_256_DQ1 and 78_256_DQ2 for data group a,
and the same for the rest of the letters in the df.
My reproducible df looks like this:
df <- do.call(rbind,lapply(1,function(x){
AC <- as.character(rep(rep(c(78,110),each=10),times=3))
AR <- as.character(rep(rep(c(256,320,384),each=20),times=1))
V <- rep(c(seq(2,40,length.out=5),seq(-2,-40,length.out=5)),times=2)
DQ0 = sort(replicate(6, runif(10,0.001:1)))
DQ1 = sort(replicate(6, runif(10,0.001:1)))
DQ2 = sort(replicate(6, runif(10,0.001:1)))
No = c(replicate(1,rep(letters[1:6],each=10)))
data.frame(AC,AR,V,DQ0,DQ1,DQ2,No)
}))
head(df)
AC AR V DQ0 DQ1 DQ2 No
1 78 256 2.0 0.003944916 0.00902776 0.00228837 a
2 78 256 11.5 0.006629239 0.01739512 0.01649540 a
3 78 256 21.0 0.048515226 0.02034436 0.04525160 a
4 78 256 30.5 0.079483625 0.04346118 0.04778420 a
5 78 256 40.0 0.099462310 0.04430493 0.05086738 a
6 78 256 -2.0 0.103686255 0.04440260 0.09931459 a
*****************************************************
library(reshape2)
df_new <- melt(df,id=c("V","No"),measure=c("DQ0","DQ1","DQ2"))
library(ggplot2)
ggplot(df_new,aes(y=value,x=V,group=No,colour=No))+
geom_point()+
geom_line()
UPDATE
After @...'s answer I made a little progress. The solution only partially works, because when we add names and melt with
df$names <- interaction(df$AC,df$AR,names(df)[4:6])
df_new <- melt(df,id=c("V","No","names"),measure=c("DQ0","DQ1","DQ2"))
this command plots 4 rows for each group a to h.
The output becomes like this:
head(df)
AC AR V DQ0 DQ1 DQ2 No names
1 78 256 2.0 0.002576547 0.04294134 0.008302918 a 78.256.DQ0
2 78 256 11.5 0.010150299 0.04570650 0.011749370 a 78.256.DQ1
3 78 256 21.0 0.012540026 0.06977744 0.013887357 a 78.256.DQ2
4 78 256 30.5 0.036532977 0.11460343 0.071172301 a 78.256.DQ0
5 78 256 40.0 0.042801967 0.11518191 0.073756228 a 78.256.DQ1
6 78 256 -2.0 0.043275144 0.13033194 0.076569977 a 78.256.DQ2
**************************************************************
and with modification of the plot command
ggplot(df_new,aes(y=value,x=V,lty=variable,colour=names))+
geom_point()+
geom_line()
The output format I would prefer is one where I can refer to all rows of DQ0, DQ1 and DQ2 inside each group. Any suggestions?
last condition
You can use df$names <- interaction(v$AC,v$AR,DQ0) and then also set names as an id in your melt command. Later you use color=names in your aes function.
This adds a column named names that combines the chosen columns. You can also set sep='_' if you prefer that over the default '.'.
If you now use this column for colouring, you will get those labels as legend names.
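As a small illustration of what interaction() returns, the sketch below combines three short made-up vectors. The vector names and values are assumptions; sep and drop are documented arguments of base R's interaction().

```r
# Made-up vectors standing in for the AC, AR and DQ columns
AC <- c("78", "78", "110")
AR <- c("256", "256", "320")
DQ <- c("DQ0", "DQ1", "DQ0")

# Default: "." separator, and every possible combination is kept as a level
lab_default <- interaction(AC, AR, DQ)

# sep = "_" changes the separator; drop = TRUE keeps only combinations that occur
lab <- interaction(AC, AR, DQ, sep = "_", drop = TRUE)

as.character(lab)     # one combined label per row
nlevels(lab_default)  # 2 * 2 * 2 possible combinations
nlevels(lab)          # only the combinations present in the data
```

Using drop = TRUE matters for plotting, because unused factor levels can otherwise show up as empty legend entries depending on the scale settings.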
Finally I found a way using gather from tidyr.
df_gather <- df %>% gather(DQ, value,-No, -AC, -AR, -V)
and using the interaction function from @drmariod's answer
df_gather$names <- interaction(df_gather$AC,df_gather$AR,df_gather$DQ)
and here is the result for this question :)
