ggplot2 - Pie/Bar Chart from Multiple Columns in Data Frame - r

I have a data frame that looks like the one below. I have three variables per observation, and I would like to create a bar graph per observation showing these three variables. However, ggplot2 doesn't appear to have a way to specify multiple columns from the same data frame. What is the correct way to graph this data?
I'm aiming for something similar to an example chart from Wikimedia, with one graph per observation.
x English German French
Sample 1 5 10 14
Sample 2 4 4 14
Sample 3 5 10 53

I don't know why there are 2 rows per x-value.
This makes no sense. What do you want to plot? The sum per A, B, C? The mean?
Assuming you want the mean, just do:
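library(reshape2)  # for melt()
library(ggplot2)

dat <- read.table(text = "
x A B C
1 5 10 14
1 4 4 14
2 5 10 14
2 4 4 14
3 5 10 14
3 4 4 14
", header = TRUE)
dat <- aggregate(. ~ x, data = dat, FUN = mean)  # instead of mean you can use any summary function
dat_molten <- melt(dat, id.vars = "x")           # long format: columns x, variable, value
ggplot(dat_molten, aes(x = variable, y = value)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ x)                                # one panel (bar chart) per x value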
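The key step is reshaping: ggplot2 wants one row per bar, not one column per variable. As a minimal sketch that skips reshape2, base R's stack() can do the same wide-to-long conversion on the question's own English/German/French data. The column names value and language are my own choice; the ggplot2 call is left as a comment since it assumes ggplot2 is installed.

```r
# The question's data in wide format: one column per language
wide <- data.frame(
  x       = c("Sample 1", "Sample 2", "Sample 3"),
  English = c(5, 4, 5),
  German  = c(10, 4, 10),
  French  = c(14, 14, 53)
)

# stack() returns two columns: values (the numbers) and ind (the source column name).
# Columns are stacked one after another, so the sample labels repeat 3 times.
long <- data.frame(
  x = rep(wide$x, times = 3),
  stack(wide[, c("English", "German", "French")])
)
names(long) <- c("x", "value", "language")
print(long)

# With ggplot2 loaded, one bar chart per sample would then be:
# ggplot(long, aes(x = language, y = value)) + geom_col() + facet_wrap(~ x)
```

geom_col() is shorthand for geom_bar(stat = "identity").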


How do you randomly assign data into equal sized control and treatment groups in R?

sample(1:534, 90, replace = FALSE)  # base R: pick 90 of the 534 house numbers
library(dplyr)                      # for filter()
df.orig <- read.csv("project1data.csv")
df.groups <- filter(df.orig, participate == "y")
I randomly selected 90 house numbers out of 534, recorded in an Excel sheet whether each household was willing to participate in the study, and then filtered out the households that declined. How do I now randomly assign the participants to two equally sized groups (control and treatment)?
You haven't provided data or runnable code, so I'll generate some fake data to show the idea.
# Create dataset with three variables
# Participate are the ones that we wish to include in the study.
# You have those in your excel file.
fakedata <- data.frame(houseid=1:534,
size=rbinom(534, size=5, prob=.5),
participate=sample(c("y", "n"), size=534, replace=TRUE))
which produces something like the following (the values are random):
houseid size participate
1 1 3 y
2 2 4 n
3 3 2 n
4 4 2 y
5 5 4 y
6 6 2 n
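Now we can use dplyr (from the tidyverse) to generate a random permutation of cases/controls. First we create a label vector of the correct length (using rep with length.out) and then we shuffle it using sample.
library(dplyr)  # for %>%, filter(), mutate() and n()

fakedata %>%                       # take the data
  filter(participate == "y") %>%   # keep only the participants
  mutate(group = sample(rep(c("Case", "Ctrl"), length.out = n())))  # balanced labels, shuffled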
This gives
houseid size participate group
1 1 3 y Case
2 4 2 y Case
3 5 4 y Ctrl
4 7 4 y Case
5 8 1 y Case
6 9 4 y Ctrl
7 13 3 y Case
8 16 1 y Ctrl
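The same idea works in base R without dplyr. A minimal sketch, assuming data shaped like fakedata above; the seed is arbitrary and only there for reproducibility. One design tweak compared with the dplyr version: with an odd number of participants, rep(c("Case", "Ctrl"), ...) always gives the extra person to "Case", so here the label order is randomised first.

```r
set.seed(42)  # arbitrary seed, only so the example is reproducible

# Hypothetical data in the same shape as fakedata
fakedata <- data.frame(
  houseid     = 1:534,
  size        = rbinom(534, size = 5, prob = 0.5),
  participate = sample(c("y", "n"), size = 534, replace = TRUE)
)

# Keep only the households that agreed to take part
participants <- fakedata[fakedata$participate == "y", ]
n <- nrow(participants)

# Randomise which label comes first, so that with an odd n the
# extra participant is not always placed in the same group
labels <- sample(c("Case", "Ctrl"))

# Build a balanced vector of labels, then shuffle it across participants
participants$group <- sample(rep(labels, length.out = n))

counts <- table(participants$group)
print(counts)  # the two group sizes differ by at most 1
```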

Sum up certain variables (columns) by variable name

I want to sum certain variables (columns in a data frame).
I would like to select those variables by parts of their names.
The complication is that I have several conditions, so a single dplyr contains() call does not work.
Here is an example:
ab_yy <- 1:5
bc_yy <- 5:9
cd_yy <- 2:6
de_xx <- 3:7
dat <- data.frame(ab_yy, bc_yy, cd_yy, de_xx)
  ab_yy bc_yy cd_yy de_xx
1 1 5 2 3
2 2 6 3 4
3 3 7 4 5
4 4 8 5 6
5 5 9 6 7
#sum up all variables that contain yy and certain extra conditions
#may look something like this: rowSums(select(dat, contains(("yy&ab")|("yy&bc")) ) )
Desired result (ab_yy + bc_yy):
6 8 10 12 14
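If you want to use dplyr, try matches(), which takes a regular expression:
library(dplyr)
dat %>%
  select(matches("yy$")) %>%       # columns ending in "yy"
  select(matches("^(ab|bc)")) %>%  # ...that also start with "ab" or "bc"
  rowSums()
[1] 6 8 10 12 14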
It may not be the best way, but you can also do it with grepl:
rowSums(dat[, grepl("ab.*yy|bc.*yy", colnames(dat))])
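The pseudocode in the question, contains(("yy&ab")|("yy&bc")), can be expressed almost literally in base R by building one logical vector per condition and combining them with & and |. A minimal sketch using startsWith() and endsWith() (base R 3.3.0 and later), assuming the conditions are "starts with" and "ends with" tests as in the example column names:

```r
dat <- data.frame(ab_yy = 1:5, bc_yy = 5:9, cd_yy = 2:6, de_xx = 3:7)

nms <- names(dat)

# "ends with yy AND (starts with ab OR starts with bc)"
keep <- endsWith(nms, "yy") & (startsWith(nms, "ab") | startsWith(nms, "bc"))

# drop = FALSE keeps a data frame even if only one column matches
result <- rowSums(dat[, keep, drop = FALSE])
print(result)  # 6 8 10 12 14, the desired result
```

Each extra condition is just another logical vector, which keeps complex rules readable compared with one long regular expression.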

R plotly: Customize x-axis values in box plot

I have a data frame with 3 variables and 260 rows (sample below):
HouseID Town Occupants
1 D 5
2 A 3
3 B 2
4 C 4
5 A 5
6 B 2
7 C 3
8 C 8
9 C 1
10 A 3
I want to create a box plot of the distribution of Occupants per Town, with the x-axis ordered by descending Town frequency:
Town Freq
A 3
B 2
C 4
D 1
I tried sorting the data frame, but the box plot x-axis is still displayed in alphabetical order by default. Is there a way to do this?
You simply have to use factor() to reorder the levels of df$Town by their counts. Counting with table() rather than summary() works whether Town is a factor or a character column (since R 4.0.0, data.frame() and read.csv() no longer convert strings to factors by default):
library(plotly)
df$Town <- factor(df$Town, levels = names(sort(table(df$Town), decreasing = TRUE)))
plot_ly(df, x = ~Town, y = ~Occupants, type = "box")
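To check that the level order really follows the frequencies, here is a minimal base-R sketch on the 10 sample rows from the question, counting with table(). Base graphics' boxplot() also draws groups in factor-level order, so it serves as a quick check without plotly.

```r
df <- data.frame(
  HouseID   = 1:10,
  Town      = c("D", "A", "B", "C", "A", "B", "C", "C", "C", "A"),
  Occupants = c(5, 3, 2, 4, 5, 2, 3, 8, 1, 3)
)

# Count rows per town, sort the counts from largest to smallest,
# and use the resulting town names as the factor levels
town_order <- names(sort(table(df$Town), decreasing = TRUE))
df$Town <- factor(df$Town, levels = town_order)

print(levels(df$Town))  # "C" "A" "B" "D" (counts 4, 3, 2, 1)

# Boxes are drawn in level order, i.e. by descending frequency
boxplot(Occupants ~ Town, data = df)
```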

Group and label rows in a data frame in blocks of n rows in R

I need to group and label every x observations (rows) in a dataset in R.
I also need to know whether the last group of rows in the dataset has fewer than x observations.
For example:
Say I have a dataset with 10 observations and 2 variables, and I want to group every 3 rows.
I want to add a new column so that the dataset looks like this:
speed dist newcol
4 2 1
4 10 1
7 4 1
7 22 2
8 16 2
9 10 2
10 18 3
10 26 3
10 34 3
11 17 4
df$group <- rep(1:(nrow(df)/3), each = 3)
This works only if the number of rows is an exact multiple of 3; each block of three rows then gets a consecutive group number. Otherwise the vector is shorter than the data frame and the assignment fails.
A quick and dirty way to find out how incomplete the final group is: take the remainder of nrow(df) divided by the group size, nrow(df) %% 3 (change the divisor to your group size). A remainder of 0 means the last group is complete; otherwise it is the number of rows in the last group.
Assuming your data is df, you can do:
df$newcol = rep(1:ceiling(nrow(df)/3), each = 3)[1:nrow(df)]
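An alternative that avoids building and truncating a longer vector is integer division on the row index. A minimal sketch, assuming the example rows are the first 10 rows of R's built-in cars data set (they match), with the group size kept in a variable:

```r
df <- head(cars, 10)  # assumption: the example shows the first 10 rows of cars
group_size <- 3

# 0-based row index %/% group size gives 0,0,0,1,1,1,...; add 1 to start at 1.
# Works for any number of rows, including a final partial group.
df$newcol <- (seq_len(nrow(df)) - 1) %/% group_size + 1

# The last group is short exactly when the row count is not a multiple of group_size
remainder <- nrow(df) %% group_size
last_group_short <- remainder != 0

print(df)
print(remainder)  # number of rows in the incomplete last group (0 if it is full)
```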

How to plot two data sets having different maximum X-axis values in a single plot?

For example, I have the two data sets shown below. Using position as X and count as Y, how can I plot them as lines of different colours in a single plot using ggplot2's geom_line?
dataset a:
position count
1 3
2 9
3 10
4 15
5 19
6 28
7 15
8 13
9 11
10 5
dataset b:
position count
1 4
2 8
3 16
4 17
5 19
6 10
The trick is to combine your two data frames into a single data frame. First, we create a new identifier column on each data frame:
a$dataset = "a"
b$dataset = "b"
Then we combine them
dd = rbind(a, b)
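All that's left is to add geom_line and map the colour to the dataset column. The x-axis automatically covers the combined range of both data sets, so line b simply ends at position 6:
library(ggplot2)
ggplot(dd) + geom_line(aes(position, count, colour = dataset))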
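If ggplot2 isn't available, the same combined data frame works with base graphics. The one thing to watch is that the axis limits must come from the combined data; otherwise, plotting the shorter series first would clip the longer one. A minimal sketch using the values from the question (the colours are an arbitrary choice):

```r
a <- data.frame(position = 1:10, count = c(3, 9, 10, 15, 19, 28, 15, 13, 11, 5))
b <- data.frame(position = 1:6,  count = c(4, 8, 16, 17, 19, 10))

# Tag each data set, then stack them into one data frame
a$dataset <- "a"
b$dataset <- "b"
dd <- rbind(a, b)

cols <- c(a = "red", b = "blue")  # arbitrary colours, one per data set

# Empty plot whose limits cover both data sets
plot(NA, xlim = range(dd$position), ylim = range(dd$count),
     xlab = "position", ylab = "count")

# Draw one line per data set
for (d in split(dd, dd$dataset)) {
  lines(d$position, d$count, col = cols[d$dataset[1]])
}
legend("topright", legend = names(cols), col = cols, lty = 1)
```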
