Format for discrete data in a ggplot barplot


I'm sure this is stupidly simple but it's been some time since I've used R, and I have never made a barchart with ggplot.
I have the following averages from a larger dataset:
> mean_gc
PVC1 PVC2 PVC3 PVC4 PVC5 PVC6 PVC7 PVC8 PVC9 PVC10 PVC11
0.4019026 0.4479259 0.4494118 0.4729437 0.4800556 0.4492290 0.4905295 0.4457566 0.4271259 0.4850341 0.4369965
PVC12 PVC13 PVC14 PVC15 PVC16
0.4064052 0.3743776 0.3603853 0.3965469 0.3654610
My end goal is to plot a bar chart (since each "PVC#" is discrete), and fit a step-function across it in R to try and find subtle 'breakpoints' - but that's a problem for later...
The only way I've been able to achieve a barplot from this is using barplot which creates the graph below.
Which is fine, but it's ugly compared to ggplot.
I've tried storing the above data as a dataframe, both with the PVC labels as a column and as rownames, but I just can't get the syntax right and I'm at my wits' end!
What am I missing?
EDIT FOR CLARITY ON DATAFRAMES
The above was just the printed output in R (not the best way to show it, my apologies). I have the data in the following column-based format:
mean_gc
PVC1 0.4019026
PVC2 0.4479259
PVC3 0.4494118
PVC4 0.4729437
PVC5 0.4800556
PVC6 0.4492290
PVC7 0.4905295
PVC8 0.4457566
PVC9 0.4271259
PVC10 0.4850341
PVC11 0.4369965
PVC12 0.4064052
PVC13 0.3743776
PVC14 0.3603853
PVC15 0.3965469
PVC16 0.3654610
Where PVC# are the row.names. I also have the same dataset where the row.names are present as the first column, in case that is required (but I suspect not).

You have to melt() your data before you can use ggplot2, because it assumes a tidy data structure.
library(reshape2)
library(ggplot2)
ggplot(melt(df), aes(variable, value)) +
geom_bar(stat = "identity")
Data
df <- structure(list(PVC1 = 0.4019026, PVC2 = 0.4479259, PVC3 = 0.4494118,
PVC4 = 0.4729437, PVC5 = 0.4800556, PVC6 = 0.449229, PVC7 = 0.4905295,
PVC8 = 0.4457566, PVC9 = 0.4271259, PVC10 = 0.4850341, PVC11 = 0.4369965,
PVC12 = 0.4064052, PVC13 = 0.3743776, PVC14 = 0.3603853,
PVC15 = 0.3965469, PVC16 = 0.365461), .Names = c("PVC1",
"PVC2", "PVC3", "PVC4", "PVC5", "PVC6", "PVC7", "PVC8", "PVC9",
"PVC10", "PVC11", "PVC12", "PVC13", "PVC14", "PVC15", "PVC16"
), class = "data.frame", row.names = c(NA, -1L))
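For the column-based layout shown in the question's edit (one mean_gc column with the PVC labels as row names), no melting is needed. The following is a minimal sketch, not the answerer's code: the names plot_df, pvc and value are made up for illustration, and the data frame is rebuilt here from the printed values. Turning the labels into a factor with explicit levels is one way to keep the bars in PVC1..PVC16 order rather than alphabetical order.

```r
library(ggplot2)

# Rebuild the column-format data from the question:
# a single column, with the PVC labels as row names.
mean_gc <- data.frame(
  mean_gc = c(0.4019026, 0.4479259, 0.4494118, 0.4729437, 0.4800556, 0.4492290,
              0.4905295, 0.4457566, 0.4271259, 0.4850341, 0.4369965, 0.4064052,
              0.3743776, 0.3603853, 0.3965469, 0.3654610),
  row.names = paste0("PVC", 1:16)
)

# Move the row names into a column. Fixing the factor levels keeps the
# original order instead of alphabetical (PVC1, PVC10, PVC11, ...).
plot_df <- data.frame(
  pvc   = factor(rownames(mean_gc), levels = rownames(mean_gc)),
  value = mean_gc$mean_gc
)

# geom_col() draws bars whose heights are the values themselves,
# i.e. the same thing as geom_bar(stat = "identity").
p <- ggplot(plot_df, aes(x = pvc, y = value)) +
  geom_col() +
  labs(x = NULL, y = "mean_gc")
print(p)
```

If the labels are already stored as the first column, the same factor(..., levels = ...) step can be applied to that column directly.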

Related

Scatter plot with ggplot2 colored by specific dates interval in r

I'm trying to assign different colors to the scatterplot based on their dates, more specifically the year.
This is what my dataset looks like:
> dput(head(CORt_r100_stack_join_fspec,10))
structure(list(Date = structure(c(16779, 16834, 16884, 16924,
16973, 16997, 17031, 17184, 17214, 17254), class = "Date"), meanNDVIN_int = c(0.677501157246889,
0.632728796482024, 0.578636981692124, 0.547002029242488, 0.632635423362751,
NA, 0.699596252720458, 0.670059391804396, 0.643347941166436,
0.674034259709311), meanNDVIW_int = c(0.784142418592418, 0.652437451242156,
0.648319814752948, 0.593432266488189, 0.767890365415717, NA,
0.779249089832163, 0.71974944410843, 0.715777992826006, 0.685045115352089
), meanNDVIE_int = c(0.703614512017928, 0.701963337684803, 0.488628353756438,
0.631309466083632, 0.781589421376217, NA, 0.799663418920722,
0.78910564747191, 0.710962969930836, 0.715644011856453), meanNDVINr_int_f = c(0.677501157246889,
0.632728796482024, 0.578636981692124, 0.547002029242488, 0.632635423362751,
0.687343078509066, 0.699596252720458, 0.670059391804396, 0.643347941166436,
0.674034259709311), meanNDVIWr_int_f = c(0.784142418592418, 0.652437451242156,
0.648319814752948, 0.593432266488189, 0.767890365415717, 0.749505859407419,
0.779249089832163, 0.71974944410843, 0.715777992826006, 0.685045115352089
), meanNDVIEr_int_f = c(0.703614512017928, 0.701963337684803,
0.488628353756438, 0.631309466083632, 0.781589421376217, 0.625916155640988,
0.799663418920722, 0.78910564747191, 0.710962969930836, 0.715644011856453
), NDVI_N = c(0.17221248, 0.644239685, 0.57222623, 0.558666635,
0.51654034, 0.42053949, 0.396706695, 0.641767447, 0.641008268,
0.662841949), NDVI_W = c(0.08182944, 0.69112807, 0.637699375,
0.629429605, 0.658829525, 0.60621678, 0.57186129, 0.72636742,
0.724193596, 0.738424976), NDVI_E = c(0.17135712, 0.659222803,
0.58665977, 0.573081253, 0.533498035, 0.437643585, 0.412841468,
0.652057206, 0.651854988, 0.670345511), NDVI_U = c(0.40520304,
0.578414833, 0.455746833, 0.428289893, 0.208847548, 0, 0, 0.475193691,
0.478691084, 0.505043773)), row.names = c(NA, 10L), class = "data.frame")
I've been plotting meanNDVIN_int against NDVI_N using this code:
ggplot(CORt_r100_join_fspec_2NDVIday,aes(x=NDVI_N)) +
geom_point(aes(y=meanNDVIN_int), colour="red") +
theme_bw()+
ylab("meanNDVIN_int")+
xlab("NDVI_N")
Now I want to color each point differently (no matter the color) based on their year, 2015, 2016, and 2017.
I've used the scale_color_manual function to introduce the dates but no success so far.
Any help will be much appreciated.

Here is an alternative: take the first four characters of Date (the year) and map them to color. Here df is the data frame from the question.
ggplot(df,aes(x=NDVI_N)) +
geom_point(aes(y=meanNDVIN_int, color=substring(Date,1,4))) +
labs(color="Year")+
theme_bw()+
ylab("meanNDVIN_int")+
xlab("NDVI_N")

I created a year variable with lubridate and stored it as a factor for discrete colouring. All you were missing was moving color inside aes() so that it is mapped to year.
# dplyr provides %>% and mutate(); ggplot2 is needed for the plot
library(dplyr)
library(ggplot2)
# Add year variable
CORt_r100_stack_join_fspec <- CORt_r100_stack_join_fspec %>% mutate(
year = as.factor(lubridate::year(Date))
)
# Plot;
ggplot(CORt_r100_stack_join_fspec,aes(x=NDVI_N)) +
geom_point(aes(y=meanNDVIN_int, color = year)) +
theme_bw() +
ylab("meanNDVIN_int")+
xlab("NDVI_N")
Note: the data frame you provided (CORt_r100_stack_join_fspec) does not have the same name as the one in your plot call (CORt_r100_join_fspec_2NDVIday). So I changed CORt_r100_join_fspec_2NDVIday to CORt_r100_stack_join_fspec so that the mutate and plot calls both run.
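Since the question mentions scale_color_manual, here is a hedged sketch of how it could be combined with a discrete year variable. It uses only the first ten rows posted in the question, reduced to three columns. The data frame name df and the particular colours are arbitrary choices for illustration. The year is derived with base R's format() rather than lubridate, which is a substitution made to keep the example dependency-free.

```r
library(ggplot2)

# First ten rows of the question's data, reduced to the columns used here.
df <- data.frame(
  Date = as.Date(c(16779, 16834, 16884, 16924, 16973,
                   16997, 17031, 17184, 17214, 17254),
                 origin = "1970-01-01"),
  NDVI_N = c(0.17221248, 0.644239685, 0.57222623, 0.558666635, 0.51654034,
             0.42053949, 0.396706695, 0.641767447, 0.641008268, 0.662841949),
  meanNDVIN_int = c(0.677501157246889, 0.632728796482024, 0.578636981692124,
                    0.547002029242488, 0.632635423362751, NA,
                    0.699596252720458, 0.670059391804396, 0.643347941166436,
                    0.674034259709311)
)

# A discrete year variable: format() on a Date returns a character string,
# and the factor makes ggplot2 treat it as categorical.
df$year <- factor(format(df$Date, "%Y"))

# Map colour to year inside aes(), then choose the colours by name.
p <- ggplot(df, aes(x = NDVI_N, y = meanNDVIN_int, color = year)) +
  geom_point(na.rm = TRUE) +   # one row has a missing y value
  scale_color_manual(values = c("2015" = "red",
                                "2016" = "blue",
                                "2017" = "darkgreen")) +
  theme_bw()
print(p)
```

scale_color_manual only has an effect once color is mapped inside aes(); a constant colour="red" outside aes() overrides any scale.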

Making a Stacked Bar Chart Out of Table Columns in R

I'm trying to create a stacked bar graph showing body composition. I have a table/data set (I don't know the correct term) that looks like this:
structure(list(data.Date = structure(1:7, .Label = c("2021-03-06",
"2021-03-07", "2021-03-08", "2021-03-09", "2021-03-10", "2021-03-11",
"2021-03-12"), class = "factor"), total_bf = c(19.6612, 18.2182,
19.6803, 21.7047, 18.126, 19.7, 19.1424), total_muscle = c(41.5948,
43.043, 42.1578, 42.1866, 43.4017, 42.2, 42.2728), other = c(37.544,
38.8388, 38.0619, 38.0087, 39.1723, 38.1, 38.2848)), class = "data.frame", row.names = c(NA,
-7L))
Each column is a weight in kilograms. Together they add up to the total body weight of the subject. What I want is a stacked bar graph where each bar represents a date and each bar is split by total_bf, total_muscle and other. All of the guides and Q&As I've seen don't seem to apply to my situation. Maybe this is because I am new but nothing I've tried has worked yet.
An example of what I'm trying to achieve:
The only difference is that on my graph blue would be body fat (total_bf), green would be other and red would be muscle (total_muscle).

You can convert data from the wide format to the long format using tidyr::pivot_longer() function:
library(ggplot2)
df <- structure(list(
data.Date = structure(
1:7,
.Label = c("2021-03-06", "2021-03-07", "2021-03-08", "2021-03-09",
"2021-03-10", "2021-03-11", "2021-03-12"), class = "factor"),
total_bf = c(19.6612, 18.2182, 19.6803, 21.7047, 18.126, 19.7, 19.1424),
total_muscle = c(41.5948, 43.043, 42.1578, 42.1866, 43.4017, 42.2, 42.2728),
other = c(37.544, 38.8388, 38.0619, 38.0087, 39.1723, 38.1, 38.2848)
), class = "data.frame", row.names = c(NA, -7L))
long <- tidyr::pivot_longer(df, -data.Date)
Then using ggplot2, the defaults already make a stacked bar chart, so you just need to specify x, y and fill aesthetics.
ggplot(long, aes(data.Date, value, fill = name)) +
geom_col()
Since your date is encoded as a factor, if you want to encode it as a real date you can convert it as follows:
long$date <- as.Date(strptime(as.character(long$data.Date), format = "%Y-%m-%d"))
ggplot(long, aes(date, value, fill = name)) +
geom_col()
Created on 2021-03-12 by the reprex package (v0.3.0)
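To get the specific colours asked for in the question (blue for body fat, green for other, red for muscle), a named vector can be passed to scale_fill_manual. This is a sketch under the assumption that the long-format column is called name, which is the pivot_longer default used above. The dates are built directly as Date values here instead of being converted from a factor.

```r
library(ggplot2)

# Same values as in the question, with real Date values for the x axis.
df <- data.frame(
  data.Date    = as.Date("2021-03-06") + 0:6,
  total_bf     = c(19.6612, 18.2182, 19.6803, 21.7047, 18.126, 19.7, 19.1424),
  total_muscle = c(41.5948, 43.043, 42.1578, 42.1866, 43.4017, 42.2, 42.2728),
  other        = c(37.544, 38.8388, 38.0619, 38.0087, 39.1723, 38.1, 38.2848)
)

# Wide to long: one row per date and component.
long <- tidyr::pivot_longer(df, -data.Date)

# The names of the values vector must match the entries of the fill column.
p <- ggplot(long, aes(data.Date, value, fill = name)) +
  geom_col() +
  scale_fill_manual(values = c(total_bf     = "blue",
                               other        = "green",
                               total_muscle = "red")) +
  labs(x = "Date", y = "Weight (kg)", fill = NULL)
print(p)
```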

How to edit the labels of a facet_wrap/grid if there are two variables?

In ggplot I have faceted by two variables (tau and z) but can only change the label of the first:
df<-data.frame(x=runif(1e3),y=runif(1e3),tau=rep(c("A","aBc"),each=500),z=rep(c("DDD","EEE"),each=500))
tauNames <- c(
`A` = "10% load",
`aBc` = "40% load"
)
df%>%
ggplot(aes(x=x,y=y))+
geom_point(alpha=0.4)+
xlab(label = "Time[s]")+
ylab(label = "Dose")+
facet_grid(tau~z,labeller = as_labeller(tauNames))+
ggpubr::theme_pubclean()
As you can see I can change one of the labels but not both. Any thoughts are much appreciated

The examples under ?as_labeller show how to supply labels for several faceting variables: pass one named lookup vector per variable to labeller().
library(tidyverse)
df<-data.frame(x=runif(1e3),y=runif(1e3),tau=rep(c("A","aBc"),each=500),z=rep(c("DDD","EEE"),each=500))
tauNames <- c(
`A` = "10% load",
`aBc` = "40% load"
)
df%>%
ggplot(aes(x=x,y=y))+
geom_point(alpha=0.4)+
xlab(label = "Time[s]")+
ylab(label = "Dose")+
facet_grid(tau~z,labeller = labeller(tau = tauNames,
z = c("DDD" = "D", "EEE" = "E")))+
ggpubr::theme_pubclean()

Plot multiple rows as columns with ggplotly

I have the following data
dput(head(new_data))
structure(list(series = c("serie1", "serie2", "serie3",
"serie4"), Chr1_Coverage = c(0.99593043561, 0.995148711122,
0.996666194154, 1.00012127128), Chr2_Coverage = c(0.998909597935,
0.999350808049, 0.999696737431, 0.999091916132), Chr3_Coverage = c(1.0016871729,
1.00161108919, 0.997719609642, 0.999887319775), Chr4_Coverage = c(1.00238874787,
1.00024296426, 1.0032143002, 1.00118558895), Chr5_Coverage = c(1.00361001984,
1.00233184803, 1.00250793369, 1.00019989912), Chr6_Coverage = c(1.00145962318,
1.00085036645, 0.999767433622, 1.00018523387), Chr7_Coverage = c(1.00089620637,
1.00201715802, 1.00430458519, 1.00027257509), Chr8_Coverage = c(1.00130277775,
1.00332841536, 1.0027493578, 0.998107829176), Chr9_Coverage = c(0.998473062701,
0.999400379593, 1.00130178863, 0.9992796405), Chr10_Coverage = c(0.996508132358,
0.999973856701, 1.00180072957, 1.00172163916), Chr11_Coverage = c(1.00044015107,
0.998982489577, 1.00072330837, 0.998947935281), Chr12_Coverage = c(0.999707836898,
0.996654676531, 0.995380321719, 1.00116773966), Chr13_Coverage = c(1.00199118466,
0.99941499519, 0.999850500793, 0.999717689167), Chr14_Coverage = c(1.00133747054,
1.00232593477, 1.00059139379, 1.00233368187), Chr15_Coverage = c(0.997036875653,
1.0023727983, 1.00020943048, 1.00089130742), Chr16_Coverage = c(1.00527426537,
1.00318861724, 1.0004269482, 1.00471256502), Chr17_Coverage = c(0.995530811404,
0.995103514254, 0.995135851149, 0.99992196636), Chr18_Coverage = c(0.99893371568,
1.00452723685, 1.00006262572, 1.00418478844), Chr19_Coverage = c(1.00510422346,
1.00711968194, 1.00552123413, 1.00527171097), Chr20_Coverage = c(1.00113612137,
1.00130658886, 0.999390191542, 1.00178637085), Chr21_Coverage = c(1.00368753618,
1.00162782873, 1.00056883447, 0.999797571642), Chr22_Coverage = c(0.99677846234,
1.00168287612, 0.997645576841, 0.999297594524), ChrX_Coverage = c(1.04015901555,
0.934772492047, 0.98981339011, 0.999960536561), ChrY_Coverage = c(9.61374227868e-09,
2.50609172398e-07, 8.30448295172e-08, 1.23741398572e-08)), .Names = c("series",
"Chr1_Coverage", "Chr2_Coverage", "Chr3_Coverage", "Chr4_Coverage",
"Chr5_Coverage", "Chr6_Coverage", "Chr7_Coverage", "Chr8_Coverage",
"Chr9_Coverage", "Chr10_Coverage", "Chr11_Coverage", "Chr12_Coverage",
"Chr13_Coverage", "Chr14_Coverage", "Chr15_Coverage", "Chr16_Coverage",
"Chr17_Coverage", "Chr18_Coverage", "Chr19_Coverage", "Chr20_Coverage",
"Chr21_Coverage", "Chr22_Coverage", "ChrX_Coverage", "ChrY_Coverage"
), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
and I would like to plot it as this
I thought of transposing the data starting from the second column and name the new transposed data by the first column in the initial data with the following code:
output$Plot_1 <- renderPlotly({
Plot_1_new_data[,2:24] <- lapply(Plot_1_new_data[,2:24], as.numeric)
# first remember the names
n <- as.data.frame(Plot_1_new_data[0:nrow(Plot_1_new_data),1])
# transpose all but the first column (name)
Plot_1_new_data_T <- as.data.frame(t(Plot_1_new_data[,-1]))
colnames(Plot_1_new_data_T) <- n
#plot data
library(reshape)
melt_Transposed_Plot_1_new_data <- melt(Plot_1_new_data_T,id="series")
ggplotly(melt_Transposed_Plot_1_new_data,aes(x=series,y=value,colour=variable,group=variable)) + geom_line()
})
However, when I check Plot_1_new_data_T, the first column seems to be named c("serie1","serie2",..."serie14") and the rest are named NA.
Any idea how to proceed? I am new to both R and Shiny.

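Something like this? Here x is the data frame from the question (new_data).
library(reshape2) # melt(); the question loads reshape, which also provides melt()
library(ggplot2)
library(plotly)
xm = melt(x)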
ggplot(xm[xm$variable != 'ChrY_Coverage' & xm$variable != 'ChrX_Coverage', ],
aes(as.integer(variable), value, color=series)) +
geom_line() +
scale_x_continuous(breaks = as.integer(xm$variable),
labels = as.character(xm$variable)) +
theme(axis.text.x = element_text( angle=45, hjust = 1))
ggplotly()
Note that the last two columns were removed from this plot, because they are of such a different scale that including them masks any variation in the other columns. If you want to include all the columns, you could use this instead:
ggplot(xm, aes(as.integer(variable), value, color=series)) +
geom_line() +
...

time series plot in R

My data looks something like this:
There are 10,000 rows, each representing a city and all months since 1998-01 to 2013-9:
RegionName| State| Metro| CountyName| 1998-01| 1998-02| 1998-03
New York| NY| New York| Queens| 1.3414| 1.344| 1.3514
Los Angeles| CA| Los Angeles| Los Angeles| 12.8841| 12.5466| 12.2737
Philadelphia| PA| Philadelphia| Philadelphia| 1.626| 0.5639| 0.2414
Phoenix| AZ| Phoenix| Maricopa| 2.7046| 2.5525| 2.3472
I want to be able to do a plot for all months since 1998 for any city or more than one city.
I tried this but i get an error. I am not sure if i am even attempting this right. Any help will be appreciated. Thank you.
forecl <- ts(forecl, start=c(1998, 1), end=c(2013, 9), frequency=12)
plot(forecl)
Error in plotts(x = x, y = y, plot.type = plot.type, xy.labels = xy.labels, :
cannot plot more than 10 series as "multiple"

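You might try the following on the original data frame (before the ts() conversion). The id.vars names are placeholders; substitute your actual identifier columns.
require(reshape)
require(ggplot2)
forecl <- melt(forecl, id.vars = c("region","state","city"), variable_name = "month")
# "1998-01" has no day part, so append one before converting to Date
forecl$month <- as.Date(paste0(forecl$month, "-01"))
ggplot(forecl, aes(x = month, y = value, color = city)) + geom_line()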

To add to @JLLagrange's answer, you might want to pass city through facet_grid() if there are so many cities that the colors would be hard to distinguish.
ggplot(forecl, aes(x = month, y = value, color = city, group = city)) +
geom_line() +
facet_grid( ~ city)

Could you provide an example of your data, e.g. dput(head(forecl)), before converting to a time-series object? The problem might also be with the ts object.
In any case, I think there are two problems.
First, the data are in wide format. I'm not sure about your column names, since they should start with a letter, but in any case the general idea would be to do something like this:
test <- structure(list(
city = structure(1:2, .Label = c("New York", "Philly"),
class = "factor"), state = structure(1:2, .Label = c("NY",
"PA"), class = "factor"), a2005.1 = c(1, 1), a2005.2 = c(2, 5
)), .Names = c("city", "state", "a2005.1", "a2005.2"), row.names = c(NA,
-2L), class = "data.frame")
test.long <- reshape(test, varying=c(3:4), direction="long")
Second, I think you are trying to plot too many cities at the same time. Try:
plot(forecl[, 1])
or
plot(forecl[, 1:5])
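Putting the reshaping and plotting ideas together, here is a small sketch that uses the four sample rows from the question. It assumes the month columns are literally named "1998-01", "1998-02" and so on, which requires check.names = FALSE when the data frame is built or read. The names long, month, value and cities are illustrative, not from the original data. It uses tidyr::pivot_longer in place of melt() or reshape().

```r
library(ggplot2)

# Sample rows from the question; check.names = FALSE keeps "1998-01" as-is.
forecl <- data.frame(
  RegionName = c("New York", "Los Angeles", "Philadelphia", "Phoenix"),
  State      = c("NY", "CA", "PA", "AZ"),
  `1998-01`  = c(1.3414, 12.8841, 1.626, 2.7046),
  `1998-02`  = c(1.344, 12.5466, 0.5639, 2.5525),
  `1998-03`  = c(1.3514, 12.2737, 0.2414, 2.3472),
  check.names = FALSE
)

# Wide to long: one row per city and month.
long <- tidyr::pivot_longer(forecl,
                            cols = -c(RegionName, State),
                            names_to = "month",
                            values_to = "value")

# Append a day so that "1998-01" can be parsed as a Date.
long$month <- as.Date(paste0(long$month, "-01"))

# Plot any subset of cities as separate coloured lines.
cities <- c("New York", "Phoenix")
p <- ggplot(subset(long, RegionName %in% cities),
            aes(x = month, y = value, color = RegionName)) +
  geom_line()
print(p)
```

Changing the cities vector selects which series are drawn, so one city or several can be plotted without hitting the ten-series limit of plot() on a ts object.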
