Converting a factor into a string column in a dataset


I am trying to plot the Id column against some other variables, which I have managed to do with geom_col. However, in the resulting plot R seems to treat the "Id" column as a number rather than as a label, so I am not getting the result I am looking for.
How can I convert the column into a string so that the plot shows all 33 users who participated in the survey? Here is my code so far:
activity_distance <-
merged_activity_calories %>%
group_by(Id) %>%
summarise(
mean_activity_distance= mean(VeryActiveDistance),
mean_ma_distance= mean(ModeratelyActiveDistance),
mean_la_distance= mean(LightActiveDistance),
mean_sa_distance= mean(SedentaryActiveDistance)
)
ggplot(data= activity_distance) +
geom_col(mapping= aes(x=Id , y= mean_activity_distance))

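You can convert Id into a factor inside aes() so that ggplot treats it as a discrete variable and draws one bar per Id.
library(ggplot2)
df <- data.frame(id = c(1232121321321321, 123213213213, 123213213213213),
y = c(123, 234, 22.4))
# Numeric id: the x axis is continuous
ggplot(df) +
geom_col(mapping = aes(x = id, y = y))
# id converted to a factor: the x axis is discrete
ggplot(df) +
geom_col(mapping = aes(x = factor(id), y = y))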
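Since the question asks for a string column specifically, here is a minimal sketch of converting the column itself rather than wrapping it in factor() inside aes(). The data frame below is a hypothetical stand-in for activity_distance (the Id values and means are made up for illustration), and the rotated axis labels are only a readability suggestion.

```r
library(ggplot2)

# Hypothetical stand-in for activity_distance: numeric Ids and one summary column.
activity_distance <- data.frame(
  Id = c(1503960366, 1624580081, 1644430081),
  mean_activity_distance = c(2.9, 0.9, 0.7)
)

# Convert the numeric Id into a character column so ggplot treats it as discrete.
# format() with scientific = FALSE avoids labels in scientific notation.
activity_distance$Id <- format(activity_distance$Id, scientific = FALSE, trim = TRUE)

# One bar per Id; rotating the labels keeps long Ids readable.
p <- ggplot(activity_distance) +
  geom_col(aes(x = Id, y = mean_activity_distance)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
```

Converting the column once is convenient when the same Id is reused across several plots; factor(Id) inside aes() is enough for a one-off plot.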

Related

Splitting a dataframe by every n unique values of a variable

I have a dataframe of Lots, Time, Value with the same structure as the sample data below.
df <- tibble(Lot = c(rep(123,4),rep(265,5),rep(132,3),rep(455,4)),
time = c(seq(4), seq(5), seq(3), seq(4)), Value = runif(16))
I'd like to split the dataframe by every N Lots and plot them. The Lots are different sizes so I can't subset the data by every n rows!
I've been using an approach like this but it's not scalable for a large dataset.
df %>% filter(Lot %in% c(123, 265)) %>% ggplot(., aes(x = time, y = Value)) +
geom_point() + stat_smooth()
How can I do this?

Create a lot number column, assign every n unique lot values to a group, and build one plot per group.
This gives you a list of plots.
library(tidyverse)
lot_n <- 2
df %>%
mutate(Lot_number = match(Lot, unique(Lot)),
group = ceiling(Lot_number/lot_n)) %>%
group_split(group) %>%
map(~ggplot(.x, aes(x = time, y = Value)) +
geom_point() + stat_smooth()) -> list_plots
list_plots
Individual plots can be accessed via list_plots[[1]], list_plots[[2]] etc.
You can also plot the data with facets.
df %>%
mutate(Lot_number = match(Lot, unique(Lot)),
group = ceiling(Lot_number/lot_n)) %>%
ggplot(aes(x = time, y = Value)) +
geom_point() + stat_smooth() +
facet_wrap(~group, scales = 'free')
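To see what the grouping step does on its own, here is a small base R sketch using the Lot values from the question. The name lots_per_group is introduced only for illustration.

```r
# Lot values from the question's sample data.
Lot <- c(rep(123, 4), rep(265, 5), rep(132, 3), rep(455, 4))
lot_n <- 2

# match() numbers each lot by order of first appearance: 123 -> 1, 265 -> 2, ...
Lot_number <- match(Lot, unique(Lot))

# Every lot_n consecutive lot numbers share one group id.
group <- ceiling(Lot_number / lot_n)

# Which lots ended up in which group.
lots_per_group <- lapply(split(Lot, group), unique)
```

Because the grouping depends only on the order in which lots first appear, lots of different sizes are handled without counting rows.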

Method of ordering groups in ggplot line plot

I have created a plot with the following code:
df %>%
mutate(vars = factor(vars, levels = reord)) %>%
ggplot(aes(x = EI1, y = vars, group = groups)) +
geom_line(aes(color=groups)) +
geom_point() +
xlab("EI1 (Expected Influence with Neighbor)") +
ylab("Variables")
The result is:
While the ei1_other group is in descending order on x, the ei1_gun points are ordered by variables. I would like both groups to follow the same order, such that ei1_gun and ei1_other both start at Drugs and then descend in order of the variables, rather than descending by order of x values.

The issue is that the order by which geom_line connects the points is determined by the value on the x-axis. To solve this issue simply swap x and y and make use of coord_flip.
As no sample dataset was provided I use an example dataset based on mtcars to illustrate the issue and the solution. In my example data make is your vars, value your EI1 and name your groups:
library(ggplot2)
library(dplyr)
library(tidyr)
library(forcats)
example_data <- mtcars %>%
mutate(make = row.names(.)) %>%
select(make, hp, mpg) %>%
mutate(make = fct_reorder(make, hp)) %>%
pivot_longer(-make)
Mapping value on x and make on y results in an unordered line plot as in your example. The reason is that the order in which the points get connected is determined by value:
example_data %>%
ggplot(aes(x = value, y = make, color = name, group = name)) +
geom_line() +
geom_point() +
xlab("EI1 (Expected Influence with Neighbor)") +
ylab("Variables")
In contrast, swapping x and y, i.e. mapping make on x and value on y, and making use of coord_flip gives a nicely ordered line plot, as the order in which the points get connected is now determined by make (of course we also have to swap xlab and ylab):
example_data %>%
ggplot(aes(x = make, y = value, color = name, group = name)) +
geom_line() +
geom_point() +
coord_flip() +
ylab("EI1 (Expected Influence with Neighbor)") +
xlab("Variables")
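As an alternative to coord_flip, and not part of the original answer, geom_line accepts an orientation argument in ggplot2 3.3.0 and later. The sketch below assumes that version and uses a hypothetical toy data frame; with orientation = "y" the points are connected in order of the y variable, so x and y do not need to be swapped.

```r
library(ggplot2)

# Hypothetical toy data: two groups measured on three ordered variables.
toy <- data.frame(
  vars   = factor(rep(c("a", "b", "c"), times = 2), levels = c("a", "b", "c")),
  EI1    = c(3, 1, 2, 1, 2, 3),
  groups = rep(c("g1", "g2"), each = 3)
)

# orientation = "y" makes geom_line connect points along the y axis,
# i.e. in the order of the factor levels of vars.
p <- ggplot(toy, aes(x = EI1, y = vars, color = groups, group = groups)) +
  geom_line(orientation = "y") +
  geom_point()
```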

Within a function, how to create a discrete axis with _repeated and ordered_ labels

I want to create a function that makes a heatmap where the y axis will have unique breaks, but repeated and ordered labels. I know that this might not be great practice. I am also aware that similar questions have been asked before. For example: ggplot in R, reordering the bars. But I want to achieve these repeated and ordered labels through sorting within a function, not by typing them manually. I am aware of solutions for reordering axes based on the values of a factor (e.g., Order Bars in ggplot2 bar graph), but I don't think they apply, or I can't see how to apply them to my case, where the breaks are unique but the labels repeat.
Here is some code to reproduce the problem and some of my attempts:
Libraries and data
library(ggplot2)
library(dplyr)
library(tidyr)
set.seed(4)
id <- LETTERS[1:10]
lab <- paste(c("AB", "CD"), 1:5, sep = "_") %>%
sample(., size = 10, replace = TRUE)
val <- sample.int(n = 6, size = 10, replace = TRUE)
tes <- ifelse(val >= 4, 1, 0)
dat <- data.frame(id, lab, val, tes)
A heatmap with unique breaks on the y axis
dat2 <- dat %>% gather(kind, value, val:tes)
ggplot(dat2) +
geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1)
A heatmap where the y axis is labeled with repeated labels instead of the unique breaks
This works, to the point that labels are used instead of unique ids, but the y axis is not ordered by the labels. Also, I am not sure about setting breaks and labels from the data frame in wide format (dat), rather than the data frame in long format used by ggplot (dat2).
dat2 <- dat %>% gather(kind, value, val:tes)
ggplot(dat2) +
geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1) +
scale_y_discrete(breaks=dat$id, labels=dat$lab)
Mapping the vector with repeated values on the y axis obviously doesn't work
dat2 <- dat %>% gather(kind, value, val:tes)
ggplot(dat2) +
geom_tile(aes(x = kind, y = lab, fill = value), color="white", size=1)
Repeated and ordered labels, try 1
As expected, merely sorting the input data by the non-unique lab variable does not work.
dat2 <- dat %>% gather(kind, value, val:tes) %>%
arrange(lab)
ggplot(dat2) +
geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1) +
scale_y_discrete(breaks=id, label=lab)
Repeated and ordered labels, try 2
Try to create a named breaks vector ordered by the (repeating) labels. This gets me nowhere. Half the labels are missing and they are still not sorted.
dat2 <- dat %>% gather(kind, value, val:tes)
brks <- setNames(dat$id, dat$lab)[sort(dat$lab)]
ggplot(dat2) +
geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1) +
scale_y_discrete(breaks = brks, labels = names(brks))
Repeated and ordered labels, try 3
Starting with the data frame sorted by label, try to create an ordered factor for lab. Then sort the table by this ordered factor. No luck.
dat2 <- dat %>% gather(kind, value, val:tes) %>% arrange(lab)
dat2 <- mutate(dat2, lab_f=factor(lab, levels=sort(unique(lab)), ordered = TRUE))
dat2 <- arrange(dat2, lab_f)
# check
dat2$lab_f
ggplot(dat2) +
geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1) +
scale_y_discrete(breaks = dat2$id, labels = dat2$lab_f)
A workaround, which I can use if I have to, but I am trying to avoid
We can create a combination of id and lab which will be unique and use it for the y axis
dat2 <- dat %>% gather(kind, value, val:tes) %>%
mutate(id_lab=paste(lab, id, sep="_"))
ggplot(dat2) +
geom_tile(aes(x = kind, y = id_lab, fill = value), color="white", size=1)
I must be missing something. Any help is much appreciated.
The goal is to have a function that will take an arbitrarily long table and plot a y axis with unique breaks but (possibly) repeated and ordered labels.
heat <- function(dat) {
dat2 <- dat %>% gather(kind, value, val:tes)
# any other manipulation here
ggplot(dat2) +
geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1)
# scale_y_discrete() (if needed)
}
The plot I am looking for is something like this (created in inkscape)

Using limits instead of breaks sets the order:
ggplot(dat2) +
geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1) +
geom_text(aes(x = 1, y = id, label = id), col = 'white') +
scale_y_discrete(limits = dat$id[order(dat$lab)], labels = sort(dat$lab))
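The limits approach can be wrapped into the function the question asks for. This is a minimal sketch, assuming the input always has columns id, lab, val and tes as in the question; y_axis_order is a hypothetical helper name, pivot_longer is used here in place of gather, and the small data frame is made up for illustration.

```r
library(ggplot2)
library(tidyr)

# Hypothetical helper: unique ids ordered by their (possibly repeated) labels.
y_axis_order <- function(dat) {
  ord <- order(dat$lab)
  list(limits = dat$id[ord], labels = dat$lab[ord])
}

heat <- function(dat) {
  dat2 <- pivot_longer(dat, cols = c(val, tes),
                       names_to = "kind", values_to = "value")
  ax <- y_axis_order(dat)
  ggplot(dat2) +
    geom_tile(aes(x = kind, y = id, fill = value), color = "white") +
    # limits fixes the order of the unique breaks; labels supplies the text.
    scale_y_discrete(limits = ax$limits, labels = ax$labels)
}

# Small made-up input with a repeated label.
dat <- data.frame(
  id  = c("A", "B", "C", "D"),
  lab = c("CD_2", "AB_1", "CD_2", "AB_3"),
  val = c(5, 2, 6, 1),
  tes = c(1, 0, 1, 0),
  stringsAsFactors = FALSE
)
ax <- y_axis_order(dat)
p <- heat(dat)
```

Using the same order() result for both limits and labels keeps each id paired with its own label, even when labels repeat.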

How to plot multiple facets histogram with ggplot in r?

I have a data frame structured like this:
Elem  Category  SEZa  SEZb  SEZc
A     ONE       1     3     4
B     TWO       4     5     6
I want to plot three histograms in three different facets (SEZa, SEZb, SEZc) with ggplot, where the x values are the category values (ONE and TWO) and the y values are the numbers in columns SEZa, SEZb and SEZc.
Something like this:
How can I do this? Thank you for your suggestions!

Assume df is your data.frame, I would first convert from wide format to a long format:
new_df <- reshape2::melt(df, id.vars = c("Elem", "Category"))
And then make the plot using geom_col() instead of geom_histogram() because it seems you've precomputed the y-values and wouldn't need ggplot to calculate these values for you.
ggplot(new_df, aes(x = Category, y = value, fill = Elem)) +
geom_col() +
facet_grid(variable ~ .)
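If reshape2 is not available, the same wide-to-long step can be sketched with tidyr::pivot_longer. This is an alternative to melt, not what the answer itself uses; the data frame is rebuilt from the table in the question, and the column names variable and value are chosen to match melt's defaults.

```r
library(tidyr)

# Data from the question's table.
df <- data.frame(
  Elem = c("A", "B"),
  Category = c("ONE", "TWO"),
  SEZa = c(1, 4), SEZb = c(3, 5), SEZc = c(4, 6)
)

# One row per (Elem, Category, SEZ column) combination.
new_df <- pivot_longer(df, cols = c(SEZa, SEZb, SEZc),
                       names_to = "variable", values_to = "value")
```

The resulting new_df has the columns the plotting code expects (Category, value, Elem, variable).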

I think what you are looking for is something like this:
library(ggplot2)
library(reshape2)
df <- data.frame(Category = c("One", "Two"),
SEZa = c(1, 4),
SEZb = c(3, 5),
SEZc = c(4, 6))
df <- melt(df)
ggplot(df, aes(x = Category, y = value)) +
geom_col(aes(fill = variable)) +
facet_grid(variable ~ .)
This was inspired by:
http://felixfan.github.io/stacking-plots-same-x/

How to not plot gaps in timeseries with R

I have some time series data with gaps.
df<-read.table(header=T,sep=";", text="Date;x1;x2
2014-01-10;2;5
2014-01-11;4;7
2014-01-20;8;9
2014-01-21;10;15
")
df$Date <- strptime(df$Date,"%Y-%m-%d")
df.long <- melt(df,id="Date")
ggplot(df.long, aes(x = Date, y = value, fill = variable, order=desc(variable))) +
geom_area(position = 'stack')
Now ggplot fills in the missing dates (12th, 13th, ...). What I want is just ggplot to not interpolate the missing dates and just draw the data available. I've tried filling NA with merge for the missing dates, which results in an error message of removed rows.
Is this possible? Thanks.

You can add an additional variable, group, to your data frame indicating whether the difference between two dates is not equal to one day:
df.long$group <- c(0, cumsum(diff(df.long$Date) != 1))
ggplot(df.long, aes(x = Date, y = value, fill = variable,
order=desc(variable))) +
geom_area(position = 'stack', aes(group = group))
Update:
In order to remove the space between the groups, I recommend faceting:
library(plyr)
df.long2 <- ddply(df.long, .(variable), mutate,
group = c(0, cumsum(diff(Date) > 1)))
ggplot(df.long2, aes(x = Date, y = value, fill = variable,
order=desc(variable))) +
geom_area(position = 'stack') +
facet_wrap( ~ group, scales = "free_x")
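The grouping trick can be checked in isolation. Here is a minimal base R sketch on the four dates from the question, assuming they are stored as Date rather than as the result of strptime.

```r
# The four dates from the question, as Date objects.
dates <- as.Date(c("2014-01-10", "2014-01-11", "2014-01-20", "2014-01-21"))

# Day differences between consecutive observations: 1, 9, 1.
gaps <- as.numeric(diff(dates), units = "days")

# A gap larger than one day starts a new group; cumsum turns the flags into group ids.
group <- c(0, cumsum(gaps > 1))
```

Each run of consecutive days gets its own group id, which is what lets geom_area or facet_wrap treat the runs separately.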
