R and multi-columns graphic - r

I'm more or less a newbie in R: I worked with it during my university studies, but that was a long time ago.
I have a table with 4 columns: vine ID, and 3 columns for NDVI (a vegetation index) values at 3 dates.
ID 09052017 25052017 16062017
1 233 244 238
2 225 234 247
3 224 231 245
4 124 115 124
I know how to read my table, create variables from it, select columns or rows, and make a plot(x, y).
My goal is to draw, for each ID, a line through its 3 NDVI values, with all the lines in the same graph window.
But I'm a bit confused about how to do this.
Can somebody give me some ideas for creating this?

Like this?
library(ggplot2)
library(dplyr)
library(tidyr)
df %>%
  gather(date, NDVI, -ID) %>%
  ggplot(aes(x = as.Date(date, '%d%m%Y'), y = NDVI, group = ID, col = factor(ID))) +
  geom_line() +
  xlab("Date")
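One thing that can trip up the snippet above: by default read.table() renames columns that start with a digit (09052017 becomes X09052017), and the date parsing then returns NA. Below is a minimal, self-contained sketch in base R, assuming the table is read with check.names = FALSE. The object names long and p are only for illustration. It builds the same long layout that gather() produces, and the plot object is only created if ggplot2 is installed.

```r
# Read the example table; check.names = FALSE keeps the date-like column names intact
df <- read.table(text = "ID 09052017 25052017 16062017
1 233 244 238
2 225 234 247
3 224 231 245
4 124 115 124", header = TRUE, check.names = FALSE)

# Reshape to long format: one row per (ID, date) pair
long <- data.frame(
  ID   = rep(df$ID, times = ncol(df) - 1),
  date = as.Date(rep(names(df)[-1], each = nrow(df)), format = "%d%m%Y"),
  NDVI = unlist(df[-1], use.names = FALSE)
)

# One line per vine ID (skipped when ggplot2 is not available);
# call print(p) to draw the plot in a non-interactive session
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(date, NDVI, colour = factor(ID), group = ID)) +
    ggplot2::geom_line() +
    ggplot2::xlab("Date")
}
```

If the table has already been read with the default settings, stripping the leading X from the column names before parsing the dates should work just as well.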

Related

Plot Column of Dataframe as Y, Another Column as X, and Group by Another Column in ggplot in R [duplicate]

I have a survey file in which the rows are observations and the columns are questions.
Here is some fake data showing what it looks like:
People,Food,Music,People
P1,Very Bad,Bad,Good
P2,Good,Good,Very Bad
P3,Good,Bad,Good
P4,Good,Very Bad,Very Good
P5,Bad,Good,Very Good
P6,Bad,Good,Very Good
My aim is to create this kind of plot with ggplot2.
I absolutely don't care about the colors, design, etc.
The plot doesn't correspond to the fake data.
Here are my fake data:
raw <- read.csv("http://pastebin.com/raw.php?i=L8cEKcxS",sep=",")
raw[,2]<-factor(raw[,2],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,3]<-factor(raw[,3],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,4]<-factor(raw[,4],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
But if I choose Y as the count, I run into the problem of choosing the X and the group values... I don't know if I can succeed without using reshape2... I've also tried to use reshape with the melt function, but I don't understand how to use it...
EDIT: Many years later
For a pure ggplot2 + utils::stack() solution, see the answer by @markus!
A somewhat verbose tidyverse solution, with all non-base packages explicitly stated so that you know where each function comes from:
library(magrittr) # needed for %>% if dplyr is not attached
"http://pastebin.com/raw.php?i=L8cEKcxS" %>%
  utils::read.csv(sep = ",") %>%
  tidyr::pivot_longer(cols = c(Food, Music, People.1),
                      names_to = "variable",
                      values_to = "value") %>%
  dplyr::group_by(variable, value) %>%
  dplyr::summarise(n = dplyr::n()) %>%
  dplyr::mutate(value = factor(
    value,
    levels = c("Very Bad", "Bad", "Good", "Very Good"))
  ) %>%
  ggplot2::ggplot(ggplot2::aes(variable, n)) +
  ggplot2::geom_bar(ggplot2::aes(fill = value),
                    position = "dodge",
                    stat = "identity")
The original answer:
First you need to get the counts for each category, i.e. how many Bads and Goods and so on are there for each group (Food, Music, People). This would be done like so:
raw <- read.csv("http://pastebin.com/raw.php?i=L8cEKcxS",sep=",")
raw[,2]<-factor(raw[,2],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,3]<-factor(raw[,3],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,4]<-factor(raw[,4],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw=raw[,c(2,3,4)] # getting rid of the "people" variable as I see no use for it
freq=table(col(raw), as.matrix(raw)) # get the counts of each factor level
Then you need to create a data frame out of it, melt it and plot it:
library(reshape2) # provides melt()
library(ggplot2)
Names=c("Food","Music","People") # create a vector of names
data=data.frame(cbind(freq),Names) # combine them into a data frame
data=data[,c(5,3,1,2,4)] # reorder columns: Names, Very.Bad, Bad, Good, Very.Good
# melt the data frame for plotting
data.m <- melt(data, id.vars='Names')
# plot everything
ggplot(data.m, aes(Names, value)) +
  geom_bar(aes(fill = variable), position = "dodge", stat="identity")
Is this what you're after?
To clarify a little bit: in the question "ggplot multiple grouping bar" you had a data frame that looked like this:
> head(df)
ID Type Annee X1PCE X2PCE X3PCE X4PCE X5PCE X6PCE
1 1 A 1980 450 338 154 36 13 9
2 2 A 2000 288 407 212 54 16 23
3 3 A 2020 196 434 246 68 19 36
4 4 B 1980 111 326 441 90 21 11
5 5 B 2000 63 298 443 133 42 21
6 6 B 2020 36 257 462 162 55 30
Since you have numerical values in columns 4-9, which would later be plotted on the y axis, this can be easily transformed with reshape and plotted.
For our current data set, we needed something similar, so we used freq=table(col(raw), as.matrix(raw)) to get this:
> data
Names Very.Bad Bad Good Very.Good
1 Food 7 6 5 2
2 Music 5 5 7 3
3 People 6 3 7 4
Just imagine you have Very.Bad, Bad, Good and so on instead of X1PCE, X2PCE, X3PCE. See the similarity? But we needed to create such structure first. Hence the freq=table(col(raw), as.matrix(raw)).
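To see what table(col(raw), as.matrix(raw)) does, here is a small sketch on a made-up two-column data frame (the toy values are an assumption, not the survey data). col() returns the column index of every cell, so crossing it with the cell contents counts each answer per column.

```r
# Toy data, invented for illustration only
raw <- data.frame(
  Food  = c("Bad", "Good", "Good"),
  Music = c("Bad", "Bad", "Good"),
  stringsAsFactors = FALSE
)

# Rows of freq are column indices ("1" = Food, "2" = Music),
# columns of freq are the distinct answers in alphabetical order
freq <- table(col(raw), as.matrix(raw))
print(freq)
```

The row labels are plain column numbers, which is why the answer attaches the Names vector afterwards.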
In @jakub's answer the calculations are done before the data is passed to ggplot(), which is why the stat in geom_bar is set to "identity" (i.e. take the data as is and do nothing with it).
Another approach is to let ggplot do the counting for you, hence we can make use of stat = "count", the default of geom_bar:
library(ggplot2)
ggplot(stack(df1[, -1]), aes(ind, fill = values)) +
  geom_bar(position = "dodge")
data
df1 <- read.csv(text = "People,Food,Music,People
P1,Very Bad,Bad,Good
P2,Good,Good,Very Bad
P3,Good,Bad,Good
P4,Good,Very Bad,Very Good
P5,Bad,Good,Very Good
P6,Bad,Good,Very Good
P7,Bad,Very Bad,Good
P8,Very Good,Very Bad,Good
P9,Very Bad,Good,Bad
P10,Bad,Good,Very Bad
P11,Good,Bad,Very Bad
P12,Very Bad,Bad,Very Good
P13,Bad,Very Good,Bad
P14,Bad,Very Good,Very Bad
P15,Good,Good,Good
P16,Very Bad,Very Good,Very Bad
P17,Very Bad,Good,Good
P18,Very Bad,Very Bad,Bad
P19,Very Good,Very Bad,Very Bad
P20,Very Bad,Bad,Good", header = TRUE)
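For readers unfamiliar with stack(), a minimal sketch of what it returns, using a small made-up data frame (toy values, assumed for illustration). The columns are kept as character here because stack() only picks up plain vector columns.

```r
# Toy data, invented for illustration only
toy <- data.frame(
  People = c("P1", "P2", "P3"),
  Food   = c("Bad", "Good", "Good"),
  Music  = c("Bad", "Bad", "Good"),
  stringsAsFactors = FALSE
)

# stack() returns two columns: 'values' (the answers) and 'ind' (the source column)
long <- stack(toy[, -1])

# The same counts that geom_bar() computes with its default stat = "count"
counts <- table(long$ind, long$values)
print(counts)
```

This is why the plot above maps ind to x and values to fill.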

ggplot: how to draw a plot with colored columns?

Hi guys, I need to replicate this kind of plot: real wage differences for each city.
The x axis shows the code of each city, and the y axis shows the values of the real wage.
Actually, I have 2 different variables for the real wage (Real Wage 1 and Real Wage 2).
Real Wage 1 is always bigger than Real Wage 2, hence the orange part of each bar should represent the percentage difference relative to Real Wage 1 (the blue part).
My database is something like this
#database
City Code Real Wage 1 Real Wage 2
91 530 500
92 520 490
93 410 390
94 300 270
95 205 200
96 501 434
97 700 678
98 800 730
99 900 820
100 740 700
101 590 560
102 420 400
103 340 320
104 290 270
105 120 100
How can I do that?
I don't even know if it is possible in ggplot2 to overlap the bars of 2 variables or to write the percentage in the orange part of each bar.
UPDATE
Thanks to @Chamkrai for the code.
Does anyone have any idea how to write the % difference within each bar, as in the picture I've posted?
UPDATE 2
Thanks to @r2evans for the remaining part of the code.
UPDATE 3
(I've replaced the variable City Code with the variable Provincia, a character variable containing the name of each city.)
UPDATE 4
It works!!
I should have used select before mutate!!
THANKS TO ALL
library(tidyverse)
df %>%
  pivot_longer(-City_Code) %>%
  ggplot() +
  aes(x = reorder(City_Code, value), y = value, fill = name) +
  geom_col() +
  coord_flip()
This expanded code is based on @Chamkrai's excellent first answer; please accept that answer over this one. I offer it here to clarify the points raised in the comments.
df %>%
  mutate(pct = scales::percent((Real_Wage_2 - Real_Wage_1) / Real_Wage_1)) %>%
  pivot_longer(-c(City_Code, pct)) %>%
  group_by(City_Code) %>%
  mutate(pcty = sum(value)) %>%
  ggplot() +
  aes(x = reorder(City_Code, value), y = value, fill = name) +
  geom_col() +
  coord_flip() +
  geom_text(aes(label = pct, y = pcty), data = ~ subset(., name == "Real_Wage_1"), hjust = 1.1)
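A possible variation, sketched in base R under my own assumptions rather than taken from either answer: if the total height of each bar should equal Real_Wage_1, one option is to stack Real_Wage_2 and the difference instead of the two wages themselves. The column names Difference and pct and the object long are hypothetical, and only the first three cities of the question's table are used.

```r
# First three rows of the table from the question
df <- data.frame(
  City_Code   = c(91, 92, 93),
  Real_Wage_1 = c(530, 520, 410),
  Real_Wage_2 = c(500, 490, 390)
)

# Part of Real_Wage_1 that exceeds Real_Wage_2, and its share of Real_Wage_1
df$Difference <- df$Real_Wage_1 - df$Real_Wage_2
df$pct <- sprintf("%.1f%%", 100 * df$Difference / df$Real_Wage_1)

# Long layout: two segments per city whose heights add up to Real_Wage_1
long <- data.frame(
  City_Code = rep(df$City_Code, times = 2),
  name      = rep(c("Real_Wage_2", "Difference"), each = nrow(df)),
  value     = c(df$Real_Wage_2, df$Difference)
)
print(df)
```

The long data frame can then be passed to geom_col() with fill = name in the same way as in the answers above, and pct used as the label.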

R Plot Bar graph transposed dataframe

I'm trying to plot the following data frame as a bar plot, where the values for the filteredprovince column are listed in a separate column (n).
Usually ggplot and the other plotting functions work on a horizontal data frame, and after several searches I have not been able to find a way to plot this "transposed" version of the data frame.
The bars should be grouped by cluster, and within each cluster I would like to plot each filteredprovince based on the value of the n column.
Thank you for the support.
d <- read.table(text=
" cluster PROVINCIA n filteredprovince
1 1 08 765 08
2 1 28 665 28
3 1 41 440 41
4 1 11 437 11
5 1 46 276 46
6 1 18 229 18
7 1 35 181 other
8 1 29 170 other
9 1 33 165 other
10 1 38 153 other ", header=TRUE,stringsAsFactors = FALSE)
UPDATE
Thanks to the suggestion in the comments I have almost achieved the desired format:
ggplot(tab_s, aes(x = cluster, y = n, fill = factor(filteredprovince))) + geom_col()
Is there any way to show percentages instead of frequencies on the y axis?
If I understand correctly, you're trying to use geom_bar(), which gives you problems because it wants to build a kind of histogram (it counts rows), but you have already done that summary yourself.
(If you had provided the code you have tried so far, I would not have to guess.)
In that case you can use geom_col() instead.
ggplot(d, aes(x = filteredprovince, y = n, fill = factor(PROVINCIA))) + geom_col()
Alternatively, you can change the default stat of geom_bar() from "count" to "identity"
ggplot(d, aes(x = filteredprovince, y = n, fill = factor(PROVINCIA))) +
geom_bar(stat = "identity")
See this SO question for what a stat is
EDIT: Update in response to OP's update:
To display percentages, you will have to modify the data itself.
Just divide n by the sum of all n and multiply by 100.
d$percentage <- d$n / sum(d$n) * 100
ggplot(d, aes(x = cluster, y = percentage, fill = factor(filteredprovince))) + geom_col()
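The division above uses the grand total, which is fine for the single cluster in the example. If the real data has several clusters and each bar should add up to 100%, a per-cluster total is probably what is wanted. A minimal sketch with made-up numbers (the second cluster is an assumption, it is not in the question):

```r
# Toy data with two clusters, invented for illustration only
d <- data.frame(
  cluster          = c(1, 1, 2, 2),
  filteredprovince = c("08", "other", "08", "other"),
  n                = c(765, 235, 300, 100)
)

# ave() returns the per-cluster sum of n, repeated for every row of that cluster
d$percentage <- d$n / ave(d$n, d$cluster, FUN = sum) * 100
print(d)
```

With this column on the y axis, every cluster's stacked bar reaches exactly 100.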
I'm not sure I understand perfectly, but if the problem is the orientation of your data frame, you can transpose it with t(data), where data is your data frame.

Change the name of different groups with same prefix in a long data frame

I have a data frame in long format in which I want to merge different groups that share the same prefix. The structure of the df I am working with is as follows:
id point variable value
1531 36 P6A area 290
1532 48 P6B area 230
1533 60 P5A area 20
1534 72 P5B area 180
1535 84 P4A area 100
1536 96 P4B area 90
I want to change it to something like this:
id point variable value
1531 36 P6 area 260
1533 60 P5 area 100
1535 84 P4 area 80
Note that the new value is the mean of PxA and PxB.
I would be very grateful if somebody could help me out.
Murillo
If you just need to keep the first two characters from each value (as in your example data) then you can do:
library(dplyr)
dfMean = df %>% mutate(point = substr(point, 1,2)) %>%
  group_by(point) %>%
  summarise(PointMean = mean(value))
A base R option would be to use aggregate:
df$point2 = substr(df$point, 1,2)
aggregate(value ~ point2, data=df, mean)
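If the prefixes are not always two characters long (say P10A, which is an assumption about the real data), dropping the trailing letter with a regular expression is a possible alternative to substr(). A minimal base R sketch on the example rows, where the column name prefix and the object res are hypothetical:

```r
# Example rows from the question
df <- data.frame(
  id       = c(36, 48, 60, 72, 84, 96),
  point    = c("P6A", "P6B", "P5A", "P5B", "P4A", "P4B"),
  variable = "area",
  value    = c(290, 230, 20, 180, 100, 90)
)

# Drop a single trailing capital letter, whatever the length of the prefix
df$prefix <- sub("[A-Z]$", "", df$point)

# Mean value per prefix (and per variable, in case there are several)
res <- aggregate(value ~ prefix + variable, data = df, FUN = mean)
print(res)
```

Grouping by variable as well keeps the means separate if the long data frame holds more than just area.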

Creating Heatmaps in R

I want to create a heatmap using R.
Here is what my dataset looks like:
sortCC
Genus Location Number propn
86 Flavobacterium CC 580 0.3081827843
130 Algoriphagus CC 569 0.3023379384
88 Joostella CC 175 0.0929861849
215 Paracoccus CC 122 0.0648246546
31 Leifsonia CC 48 0.0255047821
sortN
Genus Location Number propn
119 Niastella N 316 0.08205661
206 Aminobacter N 252 0.06543755
51 Nocardioides N 222 0.05764736
121 Niabella N 205 0.05323293
257 Pseudorhodoferax??? N 193 0.05011685
149 Pedobacter N 175 0.04544274
Here is the code I have so far:
row.names(sortCC) <- sortCC$Genus
sortCC_matrix <- data.matrix(sortCC)
sortCC_heatmap <- heatmap(sortCC_matrix, Rowv=NA, Colv=NA, col = cm.colors(256), scale="column", margins=c(5,10))
I was going to generate 2 separate heatmaps, but the result of the code above looked wrong.
Questions: 1) Is it possible to combine the two data sets, since they have the same genera and differ only in location, number and proportion? 2) If it is not possible to combine the two, how do I exclude the Location column from the heatmap?
Any suggestions will be much appreciated! Thanks!
Since you have the same columns, you can bind your data.frames and use facets to distinguish them. Here is a solution based on ggplot2:
dat <- rbind(sortCC,sortN)
library(ggplot2)
ggplot(dat, aes(y = factor(Number),x = factor(Genus))) +
  geom_tile(aes(fill = propn)) +
  theme_bw() +
  theme(axis.text.x=element_text(angle=90)) +
  facet_grid(Location~.)
To remove the extra column, you can use subset:
subset(dat,select=-c(Location))
If you still want to merge the data by Genus, you can do this, for example:
sortCC <- subset(sortCC,select=-c(Location))
sortN <- subset(sortN,select=-c(Location))
merge(sortCC,sortN,by='Genus')
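By default merge() keeps only the genera present in both tables and names the clashing columns Number.x, Number.y and so on. The sketch below shows one possible variation: all = TRUE keeps every genus and suffixes labels the columns by location. The two small data frames are toy stand-ins, and the Pedobacter row for CC is invented so that there is an overlap to merge on.

```r
# Toy stand-ins for the two tables (the CC row for Pedobacter is made up)
sortCC <- data.frame(Genus = c("Flavobacterium", "Pedobacter"),
                     Location = "CC",
                     Number = c(580, 40),
                     propn = c(0.308, 0.021))
sortN <- data.frame(Genus = c("Pedobacter", "Niastella"),
                    Location = "N",
                    Number = c(175, 316),
                    propn = c(0.045, 0.082))

# Full outer join on Genus; unmatched genera get NA for the other location
merged <- merge(subset(sortCC, select = -Location),
                subset(sortN, select = -Location),
                by = "Genus", all = TRUE, suffixes = c(".CC", ".N"))
print(merged)
```

The wide result has one row per genus, which is closer to the matrix layout that heatmap() expects.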

Resources