For a report I am summarizing data by group. Due to copyright issues I have created some dummy data below (the first column is the group, followed by the values):
X A B C D
1 1 12 0 12 0
2 2 24 0 15 0
3 3 56 0 48 0
4 4 89 0 96 0
5 5 13 3 65 0
6 6 11 16 0 0
7 7 25 19 0 0
8 8 24 98 0 0
9 9 18 111 0 0
10 10 173 125 0 0
11 11 10 65 0 0
I would like to create a barplot for every group (1:11) with a loop:
for(i in 1:11){x<-dummyloop[i,]
barplot(as.matrix(x), main=paste("Group", i), ylim=c(0,200))}
This works: I get a barplot for every iteration. However, they all end up in one 4 by 4 plotting window, as if I had used par(mfrow=c(4,4)).
I need individual bar plots.
So I used par(mfrow=c(1,1)), which for some reason fixed the problem (I don't use par EVER, because I am only exporting for a scientific report featuring individual graphs), however the "main" is cut off on the top.
I would also like each bar to be a different color, so I used:
for(i in 1:11){x<-dummyloop[i,]
barplot(as.matrix(x), main=paste("Group", i), col=c(1:5),
ylim=c(0,200))}
Realizing that the coloring vector then only uses the first color, I tried variations of this:
for(i in 1:11){x<-dummyloop[i,]
barplot(as.matrix(x), main=paste("Group", i), col=c(4:10)[1:ncol(x)],
ylim=c(0,200))}
which doesn't do the trick...
I seem to be missing some key detail in the for loop here, thanks for help. I'm an R novice getting better every day thanks to the people here ;).
I don't know why base plotting behaves that way. Here is an alternative using ggplot2.
library(tidyr)    # gather()
library(ggplot2)
for(i in 1:11){x <- gather(dummyloop[i,])   # long format: columns key and value
# mapping fill to key gives every bar its own color
print(ggplot(data = x, aes(x = key, y = value, fill = key)) +
geom_bar(stat = "identity", show.legend = FALSE) +
ggtitle(paste("Group", i)) + theme(plot.title = element_text(hjust = 0.5)) +
ylim(0,200))
}
So is your main still cut off?
Then extend the margin at the top of the plot by running this before plotting:
par(mar = c(2, 2, 3, 2)) # c(bottom, left, top, right)
You can reset your settings with dev.off() when experimenting.
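As a sketch of the same idea that does not require closing the device: par() returns the previous values of whatever it changes, so they can be stored and restored afterwards. The name op and the four bar values are placeholders of mine, and mfrow = c(1, 1) is included only because the question mentions plots landing in a grid.

```r
# par() invisibly returns the old values of the settings it changes
op <- par(mfrow = c(1, 1), mar = c(2, 2, 3, 2))  # c(bottom, left, top, right)

# placeholder values, only there to have something to draw
barplot(c(A = 12, B = 0, C = 12, D = 0), main = "Group 1", ylim = c(0, 200))

par(op)  # back to the settings that were active before
```

If the title is still clipped, increasing the third element of mar is the first thing to try.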
Staying in base R, you could simply use by and set col according to the group.
colors <- rainbow(length(unique(dat$X))) # define colors, 11 in your case
by(dat, dat$X, function(x)
barplot(as.matrix(x), main=paste("Group", x$X), ylim=c(0, 200), col=colors[x$X]))
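One likely reason the col vector in the question seemed to use only its first color: as.matrix() on a one-row data frame gives a 1 x n matrix, and barplot() treats each matrix column as a stacked bar, so col is applied to the segments inside a bar rather than across bars. A minimal sketch of one color per bar, assuming the group column X should not be drawn as a bar; the palette and the name bar_cols are arbitrary choices of mine:

```r
# Dummy data from the question (X is the group id, A to D are the values)
dat <- data.frame(
  X = 1:11,
  A = c(12, 24, 56, 89, 13, 11, 25, 24, 18, 173, 10),
  B = c(0, 0, 0, 0, 3, 16, 19, 98, 111, 125, 65),
  C = c(12, 15, 48, 96, 65, 0, 0, 0, 0, 0, 0),
  D = 0
)

bar_cols <- c("steelblue", "tomato", "seagreen", "gold")  # hypothetical palette

for (i in seq_len(nrow(dat))) {
  vals <- unlist(dat[i, -1])  # named numeric vector: one bar per column
  barplot(vals, main = paste("Group", dat$X[i]), col = bar_cols,
          ylim = c(0, 200))
}
```

Because vals is a plain vector, the colors are matched to the bars A, B, C and D in order.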
Data
dat <- structure(list(X = 1:11, A = c(12L, 24L, 56L, 89L, 13L, 11L,
25L, 24L, 18L, 173L, 10L), B = c(0L, 0L, 0L, 0L, 3L, 16L, 19L,
98L, 111L, 125L, 65L), C = c(12L, 15L, 48L, 96L, 65L, 0L, 0L,
0L, 0L, 0L, 0L), D = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L)), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6", "7", "8", "9", "10", "11"))
Related
I made a frequency table that has overlapping strata, which leads to data as follows:
dat <- structure(list(`[0,25)` = 5L, `[100,250)` = 43L, `[100,500)` = 0L,
`[1000,1000000]` = 20L, `[1000,1500)` = 0L, `[1500,3000)` = 0L,
`[25,100)` = 38L, `[25,50)` = 0L, `[250,500)` = 27L, `[3000,1000000]` = 0L,
`[50,100)` = 0L, `[500,1000)` = 44L, `[500,1000000]` = 0L), row.names = "Type_A", class = "data.frame")
The intervals, which are the column names of dat, are in alphabetical order. I need them to be in numerical order, but I find it quite tricky.
Desired order:
[0,25), [25,50), [25,100), [50,100), ..., [3000,1000000]
What would be the best way to do this?
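Here is a slightly tricky, but maybe useful, way. It orders the intervals by the sum of their two bounds (the pipe %>% comes from magrittr):
library(magrittr)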
ord <- gsub("\\[|\\]|\\)", "", colnames(dat)) %>%
strsplit(",") %>%
lapply(as.numeric) %>%
lapply(sum) %>%
unlist %>%
order()
colnames(dat)[ord]
# [1] "[0,25)" "[25,50)" "[25,100)" "[50,100)"
# [5] "[100,250)" "[100,500)" "[250,500)" "[500,1000)"
# [9] "[1000,1500)" "[1500,3000)" "[500,1000000]" "[1000,1000000]"
# [13] "[3000,1000000]"
dat[ord]
#        [0,25) [25,50) [25,100) [50,100) [100,250) [100,500) [250,500) [500,1000)
# Type_A      5       0       38        0        43         0        27         44
#        [1000,1500) [1500,3000) [500,1000000] [1000,1000000] [3000,1000000]
# Type_A           0           0             0             20              0
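Ordering by the sum of the bounds happens to match the first few desired positions, but it is a heuristic. An alternative sketch, assuming the intended order is "lower bound first, upper bound as tie-breaker" (the question only shows the beginning and end of the desired order, so this is a guess); nms, lower and upper are names I made up:

```r
# Column names as they appear in the question (alphabetical order)
nms <- c("[0,25)", "[100,250)", "[100,500)", "[1000,1000000]", "[1000,1500)",
         "[1500,3000)", "[25,100)", "[25,50)", "[250,500)", "[3000,1000000]",
         "[50,100)", "[500,1000)", "[500,1000000]")

# Strip the brackets, then split every label into its two bounds
bounds <- strsplit(gsub("\\[|\\]|\\)", "", nms), ",")
lower  <- as.numeric(sapply(bounds, `[`, 1))
upper  <- as.numeric(sapply(bounds, `[`, 2))

# Sort by lower bound, break ties with the upper bound
ord <- order(lower, upper)
nms[ord]
```

With the data frame from the question, dat[ord] would then reorder the columns. Compared with the sum-based order, this puts [500,1000000] directly after [500,1000) instead of after [1500,3000).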
I want to compare the means of the variables in a barplot.
This is a portion of my dataframe.
Group Gender Age Anxiety_score Depression_score IUS OBSC
1 Anxiety 0 25 32 29 12
2 Anxiety 1 48 34 28 11
3 Anxiety 0 32 48 32 12
4 Anxiety 1 24 43 26 12
5 Anxiety 1 18 44 26 15
6 Control 0 45 12 11 3
7 Control 0 44 11 11 5
8 Control 1 26 21 10 5
9 Control 1 38 12 NA 2
10 Control 0 18 13 10 1
I'd like to create a barplot where each variable (Gender, Age, Anxiety_score, depression_score, IUS, ...) represents a bar and I'd like to have this for each group (anxiety vs control next to each other, not stacked) on the same graph. The height of the bar would represent the mean. For gender, I'd like to have the gender ratio. I also want to map the variables on the y axis. How do I do this in R?
This type of problem generally comes down to reshaping the data: it is in wide format and should be in long format. See this post on how to reshape the data from wide to long format.
Then, group by Group and name, compute the means and plot.
library(dplyr)
library(tidyr)
library(ggplot2)
df1 %>%
pivot_longer(-Group) %>%
group_by(Group, name) %>%
summarise(value = mean(value, na.rm = TRUE), .groups = "drop") %>%  # na.rm: Depression_score has an NA
ggplot(aes(name, value, fill = Group)) +
geom_col(position = position_dodge()) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Data
df1 <-
structure(list(Group = c("Anxiety", "Anxiety", "Anxiety", "Anxiety",
"Anxiety", "Control", "Control", "Control", "Control", "Control"
), Gender = c(0L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L), Age = c(25L,
48L, 32L, 24L, 18L, 45L, 44L, 26L, 38L, 18L), Anxiety_score = c(32L,
34L, 48L, 43L, 44L, 12L, 11L, 21L, 12L, 13L), Depression_score = c(29L,
28L, 32L, 26L, 26L, 11L, 11L, 10L, NA, 10L), IUS = c(12L, 11L,
12L, 12L, 15L, 3L, 5L, 5L, 2L, 1L)), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
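For comparison, the "compute the means per group, then plot them side by side" step can also be sketched in base R. This is only an illustration, not the method of the answer above; means is a name I chose. Since Gender is coded 0/1, its mean is the share of rows coded 1, which may serve as the gender ratio asked for.

```r
df1 <- data.frame(
  Group = rep(c("Anxiety", "Control"), each = 5),
  Gender = c(0, 1, 0, 1, 1, 0, 0, 1, 1, 0),
  Age = c(25, 48, 32, 24, 18, 45, 44, 26, 38, 18),
  Anxiety_score = c(32, 34, 48, 43, 44, 12, 11, 21, 12, 13),
  Depression_score = c(29, 28, 32, 26, 26, 11, 11, 10, NA, 10),
  IUS = c(12, 11, 12, 12, 15, 3, 5, 5, 2, 1)
)

# One column of means per group; na.rm = TRUE skips the missing score
means <- sapply(split(df1[-1], df1$Group), colMeans, na.rm = TRUE)

# t() puts the groups in rows, so beside = TRUE draws them next to each other
barplot(t(means), beside = TRUE, legend.text = TRUE, las = 2)
```

Without na.rm = TRUE the Control mean of Depression_score would be NA and its bar would be missing.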
Are you looking for something like this?
library(tidyverse)
df %>%
pivot_longer(
-Group
) %>%
group_by(Group, name) %>%
summarise(Mean=mean(value, na.rm=TRUE)) %>%
ggplot(aes(x=factor(Group), y=Mean, fill=name))+
geom_col(aes(group=name), position = "dodge") +
geom_text(
aes(label = Mean, y = Mean + 0.05),
position = position_dodge(0.9),
vjust = 0
)
I have a dataset with the number of impressions from a unique user and whether this user has been converted = 1, or not (=0). I want to create a col chart that displays the conversion rate for intervals of 20 impressions. Meaning that for each interval, the conversion rate is the number of converted users in that interval of impressions, divided by the number of unique users in that interval.
So for instance, for this dataset:
# A tibble: 19 x 2
converted tot_impr
<dbl> <dbl>
1 0 19
2 0 4
3 1 19
4 0 13
5 0 18
6 1 9
7 1 17
8 1 8
9 1 8
10 1 11
11 0 8
12 0 19
13 1 8
14 0 8
15 1 18
16 0 12
17 1 5
18 1 12
19 0 1
I should be seeing those conversion rates:
I have managed to count the number of converted users per interval using ggplot2 geom_col using the following code:
ggplot(data = db) +
geom_col(mapping = aes(x = tot_impr, y = converted), width=5)
I am struggling to force geom_col to display not the converted count in the y-axis, but to display the percentage of converted in relation to the total number of individual samples in that interval of impressions.
Could someone help me out?
Thank you in advance!
Try with this. It is better to compute your variables before plotting:
library(dplyr)
library(ggplot2)
#Code
df %>% mutate(Cut=cut(tot_impr,breaks = seq(0,20,by=5),include.lowest = T,
right = T,dig.lab = 10)) %>%
group_by(Cut) %>%
summarise(N=n(),converted=sum(converted)) %>%
mutate(conv_rate=converted/N) %>%
ggplot(aes(x=Cut,y=conv_rate))+
geom_bar(stat='identity',fill='magenta')+
scale_y_continuous(labels = scales::percent)
Output:
Some data used:
#Data
df <- structure(list(converted = c(0L, 0L, 1L, 0L, 0L, 1L, 1L, 1L,
1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L), tot_impr = c(19L,
4L, 19L, 13L, 18L, 9L, 17L, 8L, 8L, 11L, 8L, 19L, 8L, 8L, 18L,
12L, 5L, 12L, 1L)), row.names = c(NA, -19L), class = "data.frame")
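Because converted is coded 0/1, the conversion rate of a bin is simply the mean of converted within that bin. A base R sketch of that calculation on the sample data, using the same bins of width 5 as above (bins and conv_rate are names I picked; for intervals of 20 impressions on the full data, the breaks would need adjusting):

```r
df <- data.frame(
  converted = c(0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0),
  tot_impr  = c(19, 4, 19, 13, 18, 9, 17, 8, 8, 11, 8, 19, 8, 8, 18, 12, 5, 12, 1)
)

# Assign every user to an impression interval
bins <- cut(df$tot_impr, breaks = seq(0, 20, by = 5), include.lowest = TRUE)

# Mean of a 0/1 indicator = converted users / all users in the interval
conv_rate <- tapply(df$converted, bins, mean)

barplot(conv_rate, xlab = "Impressions", ylab = "Conversion rate", ylim = c(0, 1))
```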
Please help. I am trying to put all the columns on the x-axis and then make side-by-side bars by date.
This is my data. I really tried, but to no avail.
dateVisited hh_visited hh_ind_confirmed new_in_mig out_mig deaths HOH_death Preg_Obs Preg_Outcome child_forms
102 2020-07-21 292 1170 131 86 18 7 3 14 79
103 2020-07-22 400 1553 115 100 25 10 11 18 107
104 2020-07-23 381 1458 103 67 21 9 5 23 87
105 2020-07-24 345 1379 90 98 12 4 3 20 89
106 2020-07-25 436 1585 131 119 13 2 7 20 117
107 2020-07-26 0 0 0 0 0 0 0 0 0
I think you're looking for something like this:
library(tidyr)
library(ggplot2)
df %>%
pivot_longer(cols = -1) %>%
ggplot(aes(name, value)) +
geom_col(aes(fill = dateVisited), width = 0.6,
position = position_dodge(width = 0.8)) +
guides(x = guide_axis(angle = 45))
Reproducible Data from question
df <- structure(list(dateVisited = structure(1:6, .Label = c("2020-07-21",
"2020-07-22", "2020-07-23", "2020-07-24", "2020-07-25", "2020-07-26"
), class = "factor"), hh_visited = c(292L, 400L, 381L, 345L,
436L, 0L), hh_ind_confirmed = c(1170L, 1553L, 1458L, 1379L, 1585L,
0L), new_in_mig = c(131L, 115L, 103L, 90L, 131L, 0L), out_mig = c(86L,
100L, 67L, 98L, 119L, 0L), deaths = c(18L, 25L, 21L, 12L, 13L,
0L), HOH_death = c(7L, 10L, 9L, 4L, 2L, 0L), Preg_Obs = c(3L,
11L, 5L, 3L, 7L, 0L), Preg_Outcome = c(14L, 18L, 23L, 20L, 20L,
0L), child_forms = c(79L, 107L, 87L, 89L, 117L, 0L)), class = "data.frame",
row.names = c("102", "103", "104", "105", "106", "107"))
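If a base R plot is enough, side-by-side bars do not need the long format: barplot() accepts a matrix and, with beside = TRUE, draws one bar per row within each column. A small sketch on three dates and three columns taken from the question (the subset and the name m are my choices, purely to keep the example short):

```r
df <- data.frame(
  dateVisited = c("2020-07-21", "2020-07-22", "2020-07-23"),
  hh_visited  = c(292, 400, 381),
  new_in_mig  = c(131, 115, 103),
  out_mig     = c(86, 100, 67)
)

m <- as.matrix(df[-1])        # rows = dates, columns = variables
rownames(m) <- df$dateVisited

# One group of bars per variable, one bar per date within each group
barplot(m, beside = TRUE, legend.text = TRUE, las = 2)
```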
Your data cannot be used easily, since it takes time to format it into something that can be ingested by R. Here is something to get you started. I made up a hypothetical data frame of 4 columns that resembles your data, used the function melt from the reshape2 package to put the data into a format that ggplot2 understands, and used ggplot2 to generate a bar plot.
library(ggplot2)
# 13 dates, 30 days apart, with random values in every column
df <- data.frame(dateVisited = seq(as.Date('2019-01-01'), as.Date('2019-12-31'), 30),
hh_visited = runif(13, 0, 436),
hh_ind_confirmed = runif(13, 0, 1585),
new_in_mig = runif(13, 0, 131))
# long format: one row per date and variable
df <- reshape2::melt(df, id.vars = 'dateVisited')
ggplot(data = df, aes(x = dateVisited, y = value, fill = variable))+
geom_col(position = 'dodge')
I am a novice in R. I have a tab-separated text file with sales data for each day. The format is product-id, day0, day1, day2, day3 and so on. Part of the input file is given below:
productid 0 1 2 3 4 5 6
1 53 40 37 45 69 105 62
4 0 0 2 4 0 8 0
5 57 133 60 126 90 87 107
6 108 130 143 92 88 101 66
10 0 0 2 0 4 0 36
11 17 22 16 15 45 32 36
I used the code below to read the file:
pdInfo <- read.csv("products.txt",header = TRUE, sep="\t")
This reads the entire file, and the variable pdInfo is a data frame. I would like to convert this data frame to a time series object for further processing. When I run a stationarity test, the augmented Dickey–Fuller (ADF) test, I get an error. I tried the code below:
x <- ts(data.matrix(pdInfo),frequency = 1)
adf <- adf.test(x)
error: Error in adf.test(x) : x is not a vector or univariate time series
Thanks in advance for the suggestions
In R, time series are usually in the form "one row per date", where your data is in the form "one column per date". You probably need to transpose the data before you convert to a ts object.
First transpose it:
y= t(pdInfo)
Then turn the top row (the product ids) into the column names:
colnames(y) = y[1,]
y= y[-1,] # to drop the first row
This should work:
x = ts(y, frequency = 1)
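Note that x is now a multivariate series with one column per product, and adf.test() from tseries only accepts a vector or a univariate series, so it has to be called once per column. A rough sketch under that assumption, rebuilding the sample rows by hand with the X0 to X6 column names that read.csv would typically produce; the requireNamespace() guard and the name p_values are my additions, and with only seven observations per product the results should be read with caution:

```r
pdInfo <- data.frame(
  productid = c(1, 4, 5, 6, 10, 11),
  X0 = c(53, 0, 57, 108, 0, 17),
  X1 = c(40, 0, 133, 130, 0, 22),
  X2 = c(37, 2, 60, 143, 2, 16),
  X3 = c(45, 4, 126, 92, 0, 15),
  X4 = c(69, 0, 90, 88, 4, 45),
  X5 = c(105, 8, 87, 101, 0, 32),
  X6 = c(62, 0, 107, 66, 36, 36)
)

y <- t(pdInfo[-1])                             # rows = days, columns = products
colnames(y) <- as.character(pdInfo$productid)
x <- ts(y, frequency = 1)

if (requireNamespace("tseries", quietly = TRUE)) {
  # One ADF test per product column; the result is a named vector of p-values
  p_values <- sapply(colnames(x), function(id) {
    suppressWarnings(tseries::adf.test(x[, id]))$p.value
  })
  print(p_values)
}
```

Dropping the productid column before transposing avoids having to remove the first row afterwards.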
library(purrr)
library(dplyr)
library(tidyr)
library(tseries)
# create the data
df <- structure(list(productid = c(1L, 4L, 5L, 6L, 10L, 11L),
X0 = c(53L, 0L, 57L, 108L, 0L, 17L),
X1 = c(40L, 0L, 133L, 130L, 0L, 22L),
X2 = c(37L, 2L, 60L, 143L, 2L, 16L),
X3 = c(45L, 4L, 126L, 92L, 0L, 15L),
X4 = c(69L, 0L, 90L, 88L, 4L, 45L),
X5 = c(105L, 8L, 87L, 101L, 0L, 32L),
X6 = c(62L, 0L, 107L, 66L, 36L, 36L)),
.Names = c("productid", "0", "1", "2", "3", "4", "5", "6"),
class = "data.frame", row.names = c(NA, -6L))
# apply adf.test to each productid and return p.value
adfTest <- df %>% gather(key = day, value = sales, -productid) %>%
arrange(productid, day) %>%
group_by(productid) %>%
nest() %>%
mutate(adf = data %>% map(., ~adf.test(as.ts(.$sales)))
,adf.p.value = adf %>% map_dbl(., "p.value")) %>%
select(productid, adf.p.value)