Sub-setting variable column in melt data frame


Data:
Cat <- c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4)
variable <- c("IL_1_Flag_p", "IL_1_Flag_p", "IL_1_Flag_p", "IL_1_Flag_p", "IL_2_Flag_p", "IL_2_Flag_p", "IL_2_Flag_p","IL_2_Flag_p", "IL_3_Flag_p", "IL_3_Flag_p", "IL_3_Flag_p", "IL_3_Flag_p", "IL_4_Flag_p", "IL_4_Flag_p", "IL_4_Flag_p", "IL_4_Flag_p", "IL_5_Flag_p", "IL_5_Flag_p", "IL_5_Flag_p", "IL_5_Flag_p")
value <- c(21,17,16,210,20,17,15,189,20,17,15,188,19,17,15,188,20,17,15,194)
agg_melt_p <- data.frame(Cat, variable, value)
I want to plot a line chart for only "IL_5_Flag_p", which is in the variable column. I tried using subset from the plyr package, but it is not working and shows an error. I am combining 2 plots (a bar chart and this line chart). The original data is a data frame melted with melt() from reshape2.
For ggplot I am using this piece of code:
ggplot() + geom_line(data = agg_melt_p, aes(x = Cat, y = value, colour = variable))
Please help

One solution using dplyr:
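library(dplyr)
library(ggplot2)
# keep only the IL_5_Flag_p rows, then pipe them straight into ggplot
agg_melt_p %>%
  filter(variable == "IL_5_Flag_p") %>%
  ggplot() +
  geom_line(aes(x = Cat, y = value, colour = variable))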
This subsets the data frame the way you want without modifying the original object, and then passes the result to your ggplot command. The colour = variable mapping is not necessary, but you can leave it in if you want a legend to be generated automatically.
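Since the question mentions subset(), which is actually part of base R rather than plyr, here is a minimal sketch without dplyr. It assumes the question's data (rebuilt here with rep()) and uses a made-up bar layer as a stand-in for the asker's bar chart. The point is that each ggplot2 layer can take its own data argument, so a bar layer on the full data and a line layer on the subset can share one plot.

```r
library(ggplot2)

# Same data as in the question, built with rep()
Cat <- rep(1:4, times = 5)
variable <- rep(paste0("IL_", 1:5, "_Flag_p"), each = 4)
value <- c(21, 17, 16, 210, 20, 17, 15, 189, 20, 17, 15, 188,
           19, 17, 15, 188, 20, 17, 15, 194)
agg_melt_p <- data.frame(Cat, variable, value)

# Base R subset(): keep only the IL_5_Flag_p rows (no extra package needed)
il5 <- subset(agg_melt_p, variable == "IL_5_Flag_p")

# Each layer gets its own data: bars from the full data (hypothetical
# stand-in for the asker's bar chart), the line from the subset only
p <- ggplot() +
  geom_col(data = agg_melt_p, aes(x = Cat, y = value, fill = variable),
           position = "dodge") +
  geom_line(data = il5, aes(x = Cat, y = value), colour = "black")
print(p)
```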

Related

Trying to Plot Each Column of Dataframe as its Own Histogram

What I'm currently stuck on is trying to plot each column of my dataframe as its own histogram in ggplot. I attached a screenshot below:
Ideally I would be able to compare the values in every 'Esteem' column side-by-side by plotting multiple histograms.
I tried using the melt() function to reshape my dataframe, and then feed into ggplot() but somewhere along the way I'm going wrong...

You could pivot to long, then facet by column:
library(tidyr)
library(ggplot2)
esteem81_long <- esteem81 %>%
  pivot_longer(
    Esteem81_1:Esteem81_10,
    names_to = "Column",
    values_to = "Value"
  )
ggplot(esteem81_long, aes(Value)) +
  geom_bar() +
  facet_wrap(vars(Column))
Or for a list of separate plots, just loop over the column names:
plots <- list()
for (col in names(esteem81)[-1]) {
  plots[[col]] <- ggplot(esteem81) +
    geom_bar(aes(.data[[col]]))
}
plots[["Esteem81_4"]]
Example data:
set.seed(13)
esteem81 <- data.frame(Subject = c(2,6,7,8,9))
for (i in 1:10) {
  esteem81[[paste0("Esteem81_", i)]] <- sample(1:4, 5, replace = TRUE)
}
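If the goal is to look at the plots one at a time outside R, one option (a sketch, not part of the original answer) is to write each element of the list to its own file with ggsave(). The output folder (tempdir()) and the PDF format are placeholders; the example data and the list of plots are rebuilt exactly as in the answer.

```r
library(ggplot2)

# Example data and list of plots, as in the answer
set.seed(13)
esteem81 <- data.frame(Subject = c(2, 6, 7, 8, 9))
for (i in 1:10) {
  esteem81[[paste0("Esteem81_", i)]] <- sample(1:4, 5, replace = TRUE)
}
plots <- list()
for (col in names(esteem81)[-1]) {
  plots[[col]] <- ggplot(esteem81) + geom_bar(aes(.data[[col]]))
}

# Write every plot to its own PDF; out_dir is a placeholder output folder
out_dir <- tempdir()
for (col in names(plots)) {
  ggsave(file.path(out_dir, paste0(col, ".pdf")), plots[[col]],
         width = 4, height = 3)
}
```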

library(tidyr)
library(ggplot2)
esteem_long <- esteem81 %>% pivot_longer(cols = -c(Subject))
plot <- ggplot(esteem_long, aes(x = value)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(vars(name))
plot
I'm using pivot_longer() from tidyr and ggplot2 for the plotting.
The line pivot_longer(cols = -c(Subject)) reads as "apart from the Subject column, all the other columns should be pivoted into long-form data." I've left the default new column names ("name" and "value"); if you rename them, be sure to change the downstream code.
geom_histogram automates the binning and tallying of the data into histogram format - change the binwidth parameter to suit your desired outcome.
facet_wrap() allows you to specify a grouping variable (here name) and will replicate the plot for each group.
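To make the default names concrete, the sketch below (reusing the example data from the previous answer) shows the size and columns of the long data, and a variant where the output columns are renamed with names_to and values_to. Note that aes() and facet_wrap() then have to refer to the new names; item and score are hypothetical names chosen for illustration.

```r
library(tidyr)
library(ggplot2)

# Example data from the previous answer
set.seed(13)
esteem81 <- data.frame(Subject = c(2, 6, 7, 8, 9))
for (i in 1:10) {
  esteem81[[paste0("Esteem81_", i)]] <- sample(1:4, 5, replace = TRUE)
}

# Default output columns: Subject is kept, the rest become name/value pairs
esteem_long <- pivot_longer(esteem81, cols = -Subject)
print(head(esteem_long, 3))

# Renamed output columns (hypothetical names): downstream code must follow
esteem_long2 <- pivot_longer(esteem81, cols = -Subject,
                             names_to = "item", values_to = "score")
plot2 <- ggplot(esteem_long2, aes(x = score)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(vars(item))
```

With 5 subjects and 10 Esteem columns, the long data has 5 x 10 = 50 rows, one per subject/column combination.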

Making a ggplot boxplot where each column is its own boxplot

When using the simple R boxplot function, I can easily put my data frame directly inside the parentheses and a perfect boxplot emerges, e.g.:
baseline <- c(0,0,0,0,1)
post_cap <- c(1,5,5,6,11)
qx314 <- c(0,0,0,3,7)
naive_capqx <- data.frame(baseline, post_cap, qx314)
boxplot(naive_capqx)
This is an image of the boxplot made with the simple R boxplot function.
However, I need to make this boxplot slightly more aesthetic and so I need to use ggplot. When I place the dataframe itself in, the boxplot cannot form as I need to specify x, y and fill coordinates, which I don't have. My y coordinates are the values for each vector in the dataframe and my x coordinates are just the name of the vector. How can I do this using ggplot? Is there a way to reform my dataframe so I can split it into coordinates, or is there a way ggplot can read my data?

geom_boxplot expects tidy data. Your data isn't tidy because the column names contain information. So the first thing to do is to tidy your data by using pivot_longer...
library(tidyverse)
naive_capqx %>%
  pivot_longer(everything(), values_to = "Value", names_to = "Variable") %>%
  ggplot() +
  geom_boxplot(aes(x = Variable, y = Value))
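If you would rather not load the tidyverse, base R's stack() performs the same wide-to-long reshaping: it returns a data frame with a values column and an ind column holding the former column names as a factor. A minimal sketch using the question's data:

```r
library(ggplot2)

baseline <- c(0, 0, 0, 0, 1)
post_cap <- c(1, 5, 5, 6, 11)
qx314 <- c(0, 0, 0, 3, 7)
naive_capqx <- data.frame(baseline, post_cap, qx314)

# stack(): one row per original cell; ind records which column it came from
long <- stack(naive_capqx)

# ind (former column name) on x, values on y: one box per original column
p <- ggplot(long, aes(x = ind, y = values)) +
  geom_boxplot()
print(p)
```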

Turn the df into a long format df. Below, I use gather() to lengthen the df; I use group_by() to ensure boxplot calculation by key (formerly column name).
pacman::p_load(ggplot2, tidyverse)
baseline <- c(0,0,0,0,1)
post_cap <- c(1,5,5,6,11)
qx314 <- c(0,0,0,3,7)
naive_capqx <- data.frame(baseline, post_cap, qx314) %>%
  gather("key", "value") %>%
  group_by(key)
ggplot(naive_capqx, mapping = aes(x = key, y = value)) +
  geom_boxplot()
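ggplot2 already forms one group per level of a discrete x aesthetic, so the group_by() step above is most likely unnecessary. A quick check, sketched here with the same data, compares the computed box statistics with and without it; it loads dplyr, tidyr and ggplot2 individually instead of going through pacman.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

baseline <- c(0, 0, 0, 0, 1)
post_cap <- c(1, 5, 5, 6, 11)
qx314 <- c(0, 0, 0, 3, 7)
long <- data.frame(baseline, post_cap, qx314) %>%
  gather("key", "value")

# Same boxplot built from grouped and ungrouped data
p_grouped <- ggplot(group_by(long, key), aes(x = key, y = value)) + geom_boxplot()
p_plain   <- ggplot(long, aes(x = key, y = value)) + geom_boxplot()

# Compare the five-number summaries ggplot2 computed for each box
stats <- c("ymin", "lower", "middle", "upper", "ymax")
same <- isTRUE(all.equal(layer_data(p_grouped)[stats],
                         layer_data(p_plain)[stats],
                         check.attributes = FALSE))
print(same)
```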

How to use ggplot to create facets with two factors?

I'm trying to do a plot with facets with some data from a previous model. As a simple example:
t=1:10;
x1=t^2;
x2=sqrt(t);
y1=sin(t);
y2=cos(t);
How can I plot this data in a 2x2 grid, with the rows being one factor (levels x and y, plotted with different colors) and the columns another factor (levels 1 and 2, plotted with different linetypes)?
Note: t is the common variable for the X axis of all subplots.

ggplot will be more helpful if the data can be first put into tidy form. df is your data, df_tidy is that data in tidy form, where the series is identified in one column that can be mapped in ggplot -- in this case to the facet.
library(tidyverse)
df <- tibble(
  t = 1:10,
  x1 = t^2,
  x2 = sqrt(t),
  y1 = sin(t),
  y2 = cos(t)
)
df_tidy <- df %>%
  gather(series, value, -t)
ggplot(df_tidy, aes(t, value)) +
  geom_line() +
  facet_wrap(~series, scales = "free_y")
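The facet_wrap() version gives four panels but does not map the two factors from the question to rows, columns, colour and linetype. A sketch that does, assuming the series names always have the form letter + digit (as x1, x2, y1 and y2 do): split each name into two columns and pass them to facet_grid(). The column names letter and number are chosen for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- tibble(t = 1:10, x1 = t^2, x2 = sqrt(t), y1 = sin(t), y2 = cos(t))

df_grid <- df %>%
  pivot_longer(-t, names_to = "series", values_to = "value") %>%
  # "x1" -> letter "x" (row factor), number "1" (column factor)
  mutate(letter = substr(series, 1, 1),
         number = substr(series, 2, 2))

# Rows = letter (also colour), columns = number (also linetype)
p <- ggplot(df_grid, aes(t, value, colour = letter, linetype = number)) +
  geom_line() +
  facet_grid(rows = vars(letter), cols = vars(number), scales = "free_y")
print(p)
```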

Iterate over a dataframe and create one plot for each column

I would like to iterate over a data frame and plot each column against a particular column such as price.
What I have done so far is:
for(i in ncol(dat.train)) {
ggplot(dat.train, aes(dat.train[[,i]],price)) + geom_point()
}
What I want is a first overview of my data (approximately 300 columns) by plotting each column against the decision variable (i.e., price).
I know that there is a similar question, though I cannot really understand why the above is not really working.
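As for why the loop above does not work: ncol(dat.train) is a single number, so for (i in ncol(dat.train)) runs only once, for the last column; dat.train[[,i]] is not valid indexing for a data frame (dat.train[[i]] would be); and ggplot objects created inside a for loop are not auto-printed, so nothing appears unless print() is called. A corrected sketch, using mtcars with mpg as a hypothetical stand-in for dat.train and price:

```r
library(ggplot2)

dat <- mtcars     # hypothetical stand-in for dat.train
target <- "mpg"   # hypothetical stand-in for "price"

plots <- list()
# Loop over every column name except the target (not just ncol(dat))
for (col in setdiff(names(dat), target)) {
  p <- ggplot(dat, aes(x = .data[[col]], y = .data[[target]])) +
    geom_point() +
    labs(x = col, y = target)
  plots[[col]] <- p
  # Inside a for loop, plots are only drawn when printed explicitly
  print(p)
}
```

Keeping the plots in a named list also lets you revisit any of them later, e.g. plots[["wt"]].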

You can do this. I have used the mtcars data to plot the other continuous variables against mpg. You have to reshape the data into long form (using gather()) and then use ggplot to plot these continuous variables (disp, drat, qsec, etc.) against mpg. In your case, instead of mpg you would use price, and all the other continuous variables would be gathered (like disp, drat, qsec, etc. here); the remaining categorical variables can optionally be mapped to shape, colour, etc.
library(tidyverse)
mtcars %>%
  gather(-mpg, -hp, -cyl, key = "var", value = "value") %>%
  ggplot(aes(x = value, y = mpg, color = hp, shape = factor(cyl))) +
  geom_point() +
  facet_wrap(~ var, scales = "free") +
  theme_bw()
EDIT:
This is another solution in case we need separate graphs for each of the variables.
Create a list of variables like this: lyst <- list("disp","hp"); you can use the colnames() function to get all the variable names. Use lapply() to loop through all the elements of "lyst" on your data frame.
setwd("path") ### set the working directory here; this is where the file will be saved
lyst <- list("disp", "hp")
pdf(file = "one.pdf")
# print() draws each plot onto its own page; .data[[i]] replaces the deprecated aes_string()
invisible(lapply(lyst, function(i) print(ggplot(mtcars, aes(x = .data[[i]], y = mpg)) + geom_point())))
dev.off()
A single PDF file (one.pdf), with one page per graph, will be generated in the working directory you have set.
Output from the first solution:

ggplot bar chart for time series

I'm reading the book by Hadley Wickham about ggplot, but I have trouble plotting certain weights over time in a bar chart. Here is sample data:
dates <- c("20040101","20050101","20060101")
dates.f <- strptime(dates,format="%Y%m%d")
m <- rbind(c(0.2,0.5,0.15,0.1,0.05),c(0.5,0.1,0.1,0.2,0.1),c(0.2,0.2,0.2,0.2,0.2))
m <- cbind(dates.f,as.data.frame(m))
This data.frame has the dates in the first column, and each row holds the corresponding weights. I would like to plot the weights for each year in a bar chart using the "fill" argument.
I'm able to plot the weights as bars using:
p <- ggplot(m,aes(dates.f))
p+geom_bar()
However, this is not exactly what I want. I would like to see the contribution of each weight within each bar. Moreover, I don't understand why the x-axis has a strange format, i.e. why "2004-07" and "2005-07" are displayed.
Thanks for the help

Hope this is what you are looking for:
ggplot2 requires data in a long format.
library(reshape2)
library(ggplot2)
m_molten <- melt(m, "dates.f")
Plotting itself is done by
ggplot(m_molten, aes(x = dates.f, y = value, fill = variable)) +
  geom_bar(stat = "identity")
You can add position="dodge" to geom_bar if you want them side by side.
EDIT
If you want yearly breaks only, convert m_molten$dates.f to Date:
library(scales)
m_molten$dates.f <- as.Date(m_molten$dates.f)
ggplot(m_molten, aes(x = dates.f, y = value, fill = variable)) +
  geom_bar(stat = "identity") +
  scale_x_date(labels = date_format("%y"), breaks = date_breaks("year"))
P.S.: See http://vita.had.co.nz/papers/tidy-data.pdf for Hadley's philosophy of tidy data.

To create the plot you need, you have to reshape your data from "wide" to "tall". There are many ways of doing this, including the reshape() function in base R (not recommended), reshape2 and tidyr.
In the tidyr package you have two functions to reshape data, gather() and spread().
The function gather() transforms from wide to tall. In this case, you have to gather your columns V1:V5.
Try this:
library("tidyr")
library("ggplot2")
tidy_m <- gather(m, var, value, V1:V5)
ggplot(tidy_m, aes(x = dates.f, y = value, fill = var)) +
  geom_bar(stat = "identity")
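Newer versions of tidyr supersede gather() with pivot_longer(). The sketch below uses that function and also addresses the "2004-07" labels from the question: in the answers above the x axis is a continuous date scale, which picks its own intermediate breaks, whereas mapping x to the year as a factor gives exactly one labelled bar per year. The dates are rebuilt with as.Date() rather than strptime() for simplicity, and geom_col() is the shorthand for geom_bar(stat = "identity").

```r
library(tidyr)
library(ggplot2)

# Same weights as in the question; dates built directly as Date
dates.f <- as.Date(c("2004-01-01", "2005-01-01", "2006-01-01"))
w <- rbind(c(0.2, 0.5, 0.15, 0.1, 0.05),
           c(0.5, 0.1, 0.1, 0.2, 0.1),
           c(0.2, 0.2, 0.2, 0.2, 0.2))
m <- cbind(data.frame(dates.f), as.data.frame(w))  # columns dates.f, V1..V5

# Wide to long: one row per (date, weight) pair
tidy_m <- pivot_longer(m, V1:V5, names_to = "var", values_to = "value")

# Discrete year axis: one bar and one label per year
tidy_m$year <- factor(format(tidy_m$dates.f, "%Y"))

p <- ggplot(tidy_m, aes(x = year, y = value, fill = var)) +
  geom_col()
print(p)
```

Because each row of weights sums to 1, every stacked bar should reach a height of 1.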
