I have a data frame that looks like this:
Year Women Men
1 2013 145169 889190
2 2014 119064 849778
3 2015 210107 1079592
4 2016 221217 1427639
5 2017 205000 1692592
6 2018 273721 1703456
7 2019 434407 2010493
I want to make a bar chart with geom_bar where x is the year and each year has two bars, one for Women and one for Men. The solutions I have found require the table to be in a different shape, but I'm wondering whether there is a way to work with this one. Thank you for any help :)
You can use the following code
library(tidyverse)
df %>%
pivot_longer(cols = -c(Year,Sl), values_to = "Value", names_to = "Name") %>%
ggplot(aes(x = Year, y = Value, fill = Name))+geom_col(position = "dodge")
Data
df = structure(list(Sl = 1:7, Year = 2013:2019, Women = c(145169L,
119064L, 210107L, 221217L, 205000L, 273721L, 434407L), Men = c(889190L,
849778L, 1079592L, 1427639L, 1692592L, 1703456L, 2010493L)), class = "data.frame", row.names = c(NA,
-7L))
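To see what the reshaping step does before it reaches ggplot, here is a minimal sketch that rebuilds the table exactly as printed in the question (so without the Sl column, which exists only in the answer's sample data) and pivots it to long format. The column names "Name" and "Value" follow the answer; everything else is an illustration, not the only way to do it.

```r
library(tidyr)

# Data as printed in the question; the leading 1..7 are row names, not a column
df <- data.frame(
  Year  = 2013:2019,
  Women = c(145169, 119064, 210107, 221217, 205000, 273721, 434407),
  Men   = c(889190, 849778, 1079592, 1427639, 1692592, 1703456, 2010493)
)

# Every column except Year becomes a (Name, Value) pair: one row per year and group
long <- pivot_longer(df, cols = -Year, names_to = "Name", values_to = "Value")
head(long, 4)
```

With no Sl column present, `cols = -Year` is enough; the long table can then be piped into the same ggplot call as above.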
Related
I have a large dataset that is tidy and it is formatted as such
year Risk_score
2019 a
2019 b
2019 c
2020 d
2020 e
2020 f
2021 g
2021 h
2021 i
where the letters stand for different numeric values.
Whenever I try to use geom_boxplot to visualize the data, it shows either a big diagonal line or a straight line:
senderoriskyy %>% ggplot(aes(x = factor(Risk_score), y = year)) + geom_boxplot()
I've tried changing the variables to factors and to numeric variables, but it keeps giving me the same result.
Output from running
dput(head(senderoriskyy, 10)):
structure(list(year = c("Risk_Score_in_2020", "Risk_Score_in_2021",
"Risk_Score_in_2020", "Risk_Score_in_2021", "Risk_Score_in_2019",
"Risk_Score_in_2020", "Risk_Score_in_2021", "Risk_Score_in_2019",
"Risk_Score_in_2020", "Risk_Score_in_2021"), Risk_score = c(`0.33040000000000003` = 0.3304,
`0.30687999999999999` = 0.30688, `2.9` = 2.9, `0.46500000000000002` = 0.465,
`1.16256` = 1.16256, `0.32256000000000001` = 0.32256, `0.27776000000000001` = 0.27776,
`0.19488` = 0.19488, `26.905999999999999` = 26.906, `23.581` = 23.581
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
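One possible cause, offered as a guess rather than a confirmed diagnosis: the call above maps the numeric Risk_score to x (as a factor, so nearly every value becomes its own group) and the year label to y. A minimal sketch with the aesthetics swapped, using the ten rows from the dput() output; stripping the "Risk_Score_in_" prefix is a hypothetical cleanup step for readable axis labels.

```r
library(ggplot2)

# First ten rows from the dput() output in the question (value names dropped)
senderoriskyy <- data.frame(
  year = c("Risk_Score_in_2020", "Risk_Score_in_2021", "Risk_Score_in_2020",
           "Risk_Score_in_2021", "Risk_Score_in_2019", "Risk_Score_in_2020",
           "Risk_Score_in_2021", "Risk_Score_in_2019", "Risk_Score_in_2020",
           "Risk_Score_in_2021"),
  Risk_score = c(0.3304, 0.30688, 2.9, 0.465, 1.16256,
                 0.32256, 0.27776, 0.19488, 26.906, 23.581)
)

# Hypothetical cleanup: keep only the four-digit year as the group label
senderoriskyy$year <- sub("Risk_Score_in_", "", senderoriskyy$year, fixed = TRUE)

# Discrete grouping variable on x, numeric variable on y: one box per year
p <- ggplot(senderoriskyy, aes(x = factor(year), y = Risk_score)) +
  geom_boxplot() +
  labs(x = "Year", y = "Risk score")
p
```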
How can I build a stacked bar chart from this data, with the years on the x axis and OLD and NEW distinguished by colour within each bar?
I want to avoid manual coding and automate the process.
structure(list(`1998` = c(11, 826), `2000` = c(217, 620), `2007` = c(625,
212), `2012` = c(836, 1)), class = "data.frame", row.names = c("NEW",
"OLD"))
1998 2000 2007 2012
NEW 11 217 625 836
OLD 826 620 212 1
Expected output:
Looking for something like this?
library(tidyverse)
df %>%
# rownames to column
mutate(type = rownames(.)) %>%
# convert to long data
pivot_longer(-"type") %>%
# plot
ggplot() +
geom_col(aes(x = name, y = value, fill = type))
I will show my data first, to make the question easier to understand:
amount city agent address
1 Madras Vinod 45/BA
2 Kalkta Bola 56/AS
3 Mumbai Pavan 44/AA
4 Tasha Barez 58/SD
5 Tasha Khan 22/AW
6 Madras Baaz 56/QE
7 Mumbai Neer 99/CC
8 Mumbai Bazan 97/DF
I am learning R. I want to calculate the total amount for each city and then draw a bar chart showing all cities. Given the data above, I want something like this:
amount city
7 Madras
2 Kalkta
18 Mumbai
9 Tasha
After some searching I found that the aggregate function can help, but I ran into an error saying that the lengths are not the same.
Could you please tell me how I can achieve this?
base R
res <- do.call(rbind,
by(dat, dat$city, FUN = function(z) data.frame(city = z$city[1], amount = sum(z$amount)))
)
barplot(res$amount, names.arg=res$city)
tidyverse
library(dplyr)
res <- dat %>%
group_by(city) %>%
summarize(amount = sum(amount))
barplot(res$amount, names.arg=res$city)
Data
dat <- structure(list(amount = 1:8, city = c("Madras", "Kalkta", "Mumbai", "Tasha", "Tasha", "Madras", "Mumbai", "Mumbai"), agent = c("Vinod", "Bola", "Pavan", "Barez", "Khan", "Baaz", "Neer", "Bazan"), address = c("45/BA", "56/AS", "44/AA", "58/SD", "22/AW", "56/QE", "99/CC", "97/DF")), class = "data.frame", row.names = c(NA, -8L))
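Since the question mentions aggregate, here is a minimal sketch of that route using the formula interface. Passing the whole data frame through `data =` means there are no separately supplied vectors whose lengths could disagree, which may be what triggered the reported error (an assumption, since the failing call is not shown).

```r
# Same sample data as above, reduced to the two columns that matter here
dat <- data.frame(
  amount = 1:8,
  city = c("Madras", "Kalkta", "Mumbai", "Tasha",
           "Tasha", "Madras", "Mumbai", "Mumbai")
)

# Sum amount within each city; the result has one row per city
res <- aggregate(amount ~ city, data = dat, FUN = sum)
res

barplot(res$amount, names.arg = res$city)
```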
Another way to do it using the tidyverse:
library(tidyverse)
amount <- c(1,2,3,4,5,6,7,8)
city <- c("Madras", "Kalkta", "Mumbai", "Tasha", "Tasha", "Madras", "Mumbai",
"Mumbai")
df <- tibble(amount = amount, city = city)
df %>%
group_by(city) %>%
summarise(amount = sum(amount, na.rm = TRUE)) %>%
ggplot(aes(x = city, y = amount)) +
geom_col() +
geom_label(aes(label = amount)) +
theme_bw()
I'm trying to graph multiple data frame columns in R
(as in "Graphing multiple variables in R").
bid ask date
1 20.12 20.14 2014-10-31
2 20.09 20.12 2014-11-03
3 20.03 20.06 2014-11-04
4 19.86 19.89 2014-11-05
This is my data.
I can make a single line graph like this:
`data%>% select(bid,ask,date) %>% hchart(type='line', hcaes(x='date', y='bid'))`
I want to add the ask line to the same graph.
One way is to reshape (gather) the values to plot and then add a group aesthetic to the hchart function:
library(dplyr)
library(tidyr)
library(highcharter)
data %>% select(bid,ask,date) %>%
gather("key", "value", bid, ask) %>%
hchart(type='line', hcaes(x='date', y='value', group='key'))
P.S. Don't forget to load all the necessary libraries.
You can use the following code
library(reshape2)
library(highcharter)
df_m <- melt(df, id="date")
hchart(df_m, "line", hcaes(x = date, y = value, group = variable))
Here is the data
df = structure(list(bid = c(20.12, 20.09, 20.03, 19.86), ask = c(20.14,
20.12, 20.06, 19.89), date = structure(c(4L, 1L, 2L, 3L), .Label = c("03/11/2014",
"04/11/2014", "05/11/2014", "31/10/2014"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
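One thing to watch with this sample data: date is a factor of "dd/mm/yyyy" strings, so its levels sort alphabetically and 31/10/2014 ends up after the November dates. A minimal sketch, assuming you want a chronological axis, that parses the column to Date before reshaping; it uses tidyr's pivot_longer as a stand-in for melt/gather, and the output column names "variable" and "value" are chosen to match the hchart call above.

```r
library(tidyr)

# Same values as the sample data, with dates as plain strings in row order
df <- data.frame(
  bid  = c(20.12, 20.09, 20.03, 19.86),
  ask  = c(20.14, 20.12, 20.06, 19.89),
  date = c("31/10/2014", "03/11/2014", "04/11/2014", "05/11/2014")
)

# Parse day/month/year so the axis is ordered by time, not alphabetically
df$date <- as.Date(df$date, format = "%d/%m/%Y")

# One row per (date, variable) pair, ready for a grouped line chart
df_long <- pivot_longer(df, cols = c(bid, ask),
                        names_to = "variable", values_to = "value")
df_long

# Then, as in the answer above:
# hchart(df_long, "line", hcaes(x = date, y = value, group = variable))
```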
My data looks something like this:
There are 10,000 rows, each representing a city, with one column for every month from 1998-01 to 2013-09:
RegionName| State| Metro| CountyName| 1998-01| 1998-02| 1998-03
New York| NY| New York| Queens| 1.3414| 1.344| 1.3514
Los Angeles| CA| Los Angeles| Los Angeles| 12.8841| 12.5466| 12.2737
Philadelphia| PA| Philadelphia| Philadelphia| 1.626| 0.5639| 0.2414
Phoenix| AZ| Phoenix| Maricopa| 2.7046| 2.5525| 2.3472
I want to be able to plot all months since 1998 for any one city, or for several cities at once.
I tried the following but I get an error. I am not sure whether I am even attempting this the right way. Any help will be appreciated. Thank you.
forecl <- ts(forecl, start=c(1998, 1), end=c(2013, 9), frequency=12)
plot(forecl)
Error in plotts(x = x, y = y, plot.type = plot.type, xy.labels = xy.labels, :
cannot plot more than 10 series as "multiple"
You might try
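library(reshape)
library(ggplot2)
# id.vars must name the identifier columns as they appear in your data
forecl <- melt(forecl, id.vars = c("region","state","city"), variable_name = "month")
# month labels such as "1998-01" carry no day, so append one before parsing
forecl$month <- as.Date(paste0(forecl$month, "-01"))
ggplot(forecl, aes(x = month, y = value, color = city)) + geom_line()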
To add to @JLLagrange's answer, you might want to pass city to facet_grid() if there are so many cities that the colors become hard to distinguish.
ggplot(forecl, aes(x = month, y = value, color = city, group = city)) +
geom_line() +
facet_grid( ~ city)
Could you provide an example of your data, e.g. dput(head(forecl)), before converting to a time-series object? The problem might also be with the ts object.
In any case, I think there are two problems.
First, the data are in wide format. I'm not sure about your column names, since they should start with a letter, but in any case the general idea would be to do something like this:
test <- structure(list(
city = structure(1:2, .Label = c("New York", "Philly"),
class = "factor"), state = structure(1:2, .Label = c("NY",
"PA"), class = "factor"), a2005.1 = c(1, 1), a2005.2 = c(2, 5
)), .Names = c("city", "state", "a2005.1", "a2005.2"), row.names = c(NA,
-2L), class = "data.frame")
test.long <- reshape(test, varying=c(3:4), direction="long")
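The same wide-to-long idea can be sketched with the column layout shown in the question. This assumes the month columns really are named like "1998-01" (for example, a file read with check.names = FALSE); the rows are the four sample rows from the question, and the use of tidyr here is a substitution for reshape(), not part of the original answer.

```r
library(tidyr)
library(ggplot2)

# Sample rows from the question; check.names = FALSE keeps "1998-01" style names
forecl <- data.frame(
  RegionName = c("New York", "Los Angeles", "Philadelphia", "Phoenix"),
  State      = c("NY", "CA", "PA", "AZ"),
  Metro      = c("New York", "Los Angeles", "Philadelphia", "Phoenix"),
  CountyName = c("Queens", "Los Angeles", "Philadelphia", "Maricopa"),
  `1998-01`  = c(1.3414, 12.8841, 1.626, 2.7046),
  `1998-02`  = c(1.344, 12.5466, 0.5639, 2.5525),
  `1998-03`  = c(1.3514, 12.2737, 0.2414, 2.3472),
  check.names = FALSE
)

# Gather every "yyyy-mm" column into (month, value) pairs
forecl_long <- pivot_longer(forecl, cols = matches("^[0-9]{4}-[0-9]{2}$"),
                            names_to = "month", values_to = "value")

# "1998-01" has no day component, so add one before converting to Date
forecl_long$month <- as.Date(paste0(forecl_long$month, "-01"))

# One line per city; filter forecl_long first to plot only selected cities
p <- ggplot(forecl_long, aes(x = month, y = value, color = RegionName)) +
  geom_line()
p
```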
Second, I think you are trying to plot too many cities at the same time. Try:
plot(forecl[, 1])
or
plot(forecl[, 1:5])
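If you do want more than ten series in one base plot, a further option is plot.type = "single", which draws all series in one panel instead of one panel each. A minimal sketch on made-up numbers (the random values and city names are placeholders, not the question's data); it also assumes the series are arranged with months in rows and cities in columns, which is what ts() expects.

```r
set.seed(1)

n_months <- 189  # 1998-01 through 2013-09
n_cities <- 12   # more than the 10-series limit of the default plot type

# Placeholder data in the question's orientation: one row per city
wide <- matrix(runif(n_cities * n_months), nrow = n_cities,
               dimnames = list(paste0("city", seq_len(n_cities)), NULL))

# Transpose so that each column is one city's monthly series
city_ts <- ts(t(wide), start = c(1998, 1), frequency = 12)

# All series share one panel, so the 10-series limit does not apply
plot(city_ts, plot.type = "single", col = seq_len(n_cities), ylab = "value")
```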