How to use the mutate function in R?

I am trying to create a plot in R. For that, I first need to add a new column called "export_ratio" with mutate(). I have tried the code below, but I get the error shown after it. Can someone please help?
eu_macro %>%
mutate(export_ratio = (exports/gdp)*100) %>%
filter(year>1995) %>%
filter(country %in% c("Germany","France","Spain","Sweden")) %>%
ggplot(aes(year,export_ratio))+
geom_line(aes(color=country))
Error: Problem with mutate() column export_ratio.
i export_ratio = (exports/gdp) * 100.
x non-numeric argument to binary operator

The problem here is that your exports variable is of character type, so the division (exports/gdp) is not defined and mutate() fails. Convert it to numeric first:
eu_macro %>%
mutate(export_ratio = (as.numeric(exports)/gdp)*100) %>%
# ... rest of the pipeline unchanged (filter() calls and ggplot())
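To confirm the diagnosis before converting, it can help to inspect the column types. The sketch below uses a small made-up data frame (toy_macro and its values are hypothetical stand-ins for eu_macro) to show the check and the conversion:

```r
library(dplyr)

# Hypothetical stand-in for eu_macro: exports was read in as character
toy_macro <- data.frame(
  country = c("Germany", "France"),
  year    = c(2000, 2000),
  exports = c("500", "300"),
  gdp     = c(2000, 1500),
  stringsAsFactors = FALSE
)

# Inspect the column types; exports is reported as "character"
sapply(toy_macro, class)

# Convert inside mutate(), then compute the ratio from the converted column
toy_ratio <- toy_macro %>%
  mutate(exports = as.numeric(exports),
         export_ratio = (exports / gdp) * 100)

toy_ratio$export_ratio
```

If as.numeric() returns NA with a warning, the column contains text that is not a plain number (for example thousands separators), and that text needs cleaning before the conversion.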

Related

How do I convert a column from character to double in R?

I am trying to group a dataset on a certain value and then sum a column based on this grouped value.
UN.surface.area.share <- left_join(countries, UN.surface.area, by = 'country') %>% drop_na() %>%
rename('surface.area' = 'Surface.area..km2.') %>% group_by(region) %>% summarise(total.area = sum(surface.area))
When I run this I get this error:
Error: Problem with `summarise()` input `total.area`.
x invalid 'type' (character) of argument
i Input `total.area` is `sum(surface.area)`.
i The error occurred in group 1: region = "Africa".
I think the problem is that the 'surface.area' column is of the character type and therefore the sum function doesn't work. I tried adding %>% as.numeric('surface.area') to the previous code:
UN.surface.area.share <- left_join(countries, UN.surface.area, by = 'country') %>% drop_na() %>%
rename('surface.area' = 'Surface.area..km2.') %>% as.numeric('surface.area') %>% group_by(region) %>% summarise(total.area = sum(surface.area))
But this gives the following error:
Error in group_by(., region) :
'list' object cannot be coerced to type 'double'
I think this problem can be solved by changing the 'surface.area' column to a numeric datatype but I am not sure how to do this. I checked the column and it only consists of numbers.
Use dplyr::mutate()
So instead of:
... %>% as.numeric('surface.area') %>%...
do:
...%>% mutate(surface.area = as.numeric(surface.area)) %>%...
mutate() changes one or more variables within a data frame. When you pipe to as.numeric(), as you're currently doing, you're effectively asking R to run
as.numeric(data.frame.you.piped.in, 'surface.area')
as.numeric() then tries to convert the whole data frame into a number, which it can't do because a data frame is a list object. Hence your error. The second argument, 'surface.area', is not treated as a column selector either, so the call would not convert that column even if the first argument were a plain vector.
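A minimal sketch of the corrected pipeline, run on made-up data (toy_area and its values are hypothetical; the real join and rename steps are left out):

```r
library(dplyr)

# Hypothetical data: surface.area is stored as character
toy_area <- data.frame(
  region       = c("Africa", "Africa", "Europe"),
  surface.area = c("100", "250", "40"),
  stringsAsFactors = FALSE
)

area_by_region <- toy_area %>%
  # convert the column inside mutate(), not the whole data frame
  mutate(surface.area = as.numeric(surface.area)) %>%
  group_by(region) %>%
  summarise(total.area = sum(surface.area))

area_by_region
```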

Applying map function to a nested tibble in R

I'm trying to replicate an 'old' R script I found for the tidyverse package.
library(dslabs)
DataTib<-as_tibble(us_contagious_diseases)
DataTib_nested <- DataTib %>%
group_by(disease) %>%
nest()
Mean_count_nested <- DataTib_nested %>%
mutate(mean_count = map(.x=DataTib_nested$data, ~mean(.x$count)))
As I understand, I have a tibble where data was grouped by disease and the remaining variables/data were nested, and then I'm trying to add a new column which should represent the average for variable "count" on that nested dataframe.
But I get the error, which I don't quite understand:
Error: Problem with `mutate()` input `mean_count`.
x Input `mean_count` can't be recycled to size 1.
i Input `mean_count` is `map(.x = DataTib_nested$data, ~mean(.x$count))`.
i Input `mean_count` must be size 1, not 7.
i The error occured in group 1: disease = "Hepatitis A".
Thanks in advance and best regards!
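The problem is that DataTib_nested is still grouped by disease, so mutate() works on one row at a time, while DataTib_nested$data always refers to the full list column of 7 elements. Refer to the column by its bare name instead: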
DataTib_nested <- DataTib %>%
group_by(disease) %>%
nest(data = - disease)
Mean_count_nested <- DataTib_nested %>%
mutate(mean_count = map_dbl(data, ~mean(.x$count)))
Note that I use map_dbl instead of map since the return value is numeric.
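The same pattern can be tried without the dslabs package. This is a sketch that assumes the built-in mtcars data as a stand-in; the names cars_nested and cars_means are made up for illustration:

```r
library(dplyr)
library(tidyr)
library(purrr)

# mtcars is used only as a convenient built-in data set
cars_nested <- mtcars %>%
  group_by(cyl) %>%
  nest()

# Use the bare column name (data), not cars_nested$data,
# so each group sees only its own nested data frame
cars_means <- cars_nested %>%
  mutate(mean_mpg = map_dbl(data, ~ mean(.x$mpg)))

cars_means
```

Because map_dbl() returns a plain numeric vector, mean_mpg is an ordinary double column rather than a list column.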

R : doesn't recognise column in a new table

This is part of an online course I am doing, R for data analysis.
A tibble is created using the group_by and summarise functions on the diamonds data set - the new tibble indeed exists and looks as you would expect, I checked. Now a bar plot has to be created using these summary values in the new tibble, but it gives me all sorts of errors associated with not recognising the columns.
I transformed the tibble into a data frame, and still get the same problem.
Here is the code:
diamonds_by_color <- group_by(diamonds, color)
diamonds_mp_by_color <- summarise(diamonds_by_color, mean_price = mean(price))
diamonds_mp_by_color <- as.data.frame(diamonds_mp_by_color)
colorcounts <- count(diamonds_by_color$mean_price)
colorbarplot <- barplot(diamonds_by_color$mean_price, names.arg = diamonds_by_color$color,
main = "Average price for different colour diamonds")
The error I get when running the function count is:
Error in UseMethod("summarise_") :
no applicable method for 'summarise_' applied to an object of class "NULL"
In addition: Warning message:
Unknown or uninitialised column: 'mean_price'.
It's probably something trivial but I have been reading quite a lot and tried a few things and can't figure it out. Any help will be super appreciated :)
Your diamonds_by_color never has mean_price assigned to it.
Your last two lines of code work if you reference diamonds_mp_by_color instead:
colorcounts <- count(diamonds_mp_by_color, mean_price)
barplot(diamonds_mp_by_color$mean_price,
names.arg=diamonds_mp_by_color$color,
main="Average price for different colour diamonds")
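To see why the column is missing, note that group_by() only records the grouping and adds no columns; the new column exists only in the object returned by summarise(). A small sketch, assuming mtcars as a stand-in for diamonds so that no extra packages are needed:

```r
library(dplyr)

# mtcars stands in for diamonds here
cars_by_cyl <- group_by(mtcars, cyl)
cars_mean_by_cyl <- summarise(cars_by_cyl, mean_mpg = mean(mpg))

"mean_mpg" %in% names(cars_by_cyl)       # FALSE: grouping adds no columns
"mean_mpg" %in% names(cars_mean_by_cyl)  # TRUE: summarise() created it
```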
Here is a way to summarise the price by color using dplyr and piping straight to a barplot using ggplot2.
diamonds %>% group_by(color) %>%
summarise(mean.price=mean(price,na.rm=TRUE)) %>%
ggplot(aes(color,mean.price)) + geom_bar(stat='identity')
The idiomatic dplyr approach is to avoid declaring a temporary result for each operation and write a single pipe instead. The %>% notation is also clearer because you don't have to keep passing the data frame as the first argument of each operation:
diamonds %>%
group_by(color) %>%
summarise(mean_price = mean(price)) %>%
# with() lets base barplot() see the columns of the summarised tibble
with(barplot(mean_price, names.arg = color,
main = "Average price for different colour diamonds"))
(Something like that. You can assign the output of the pipe before the barplot if you like. I'm transiting through an airport so I can't check it in R.)

With dplyr and enquo my code works but not when I pass to purrr::map

I want to create a plot for each column in a vector called dates. My data frame contains only these columns and I want to group on it, count the occurrences and then plot it.
The code below works, except for the map() call, which I want to use to iterate over a previously unknown number of columns. I think I'm using map() correctly; I've had success with it before. I'm new to quosures, but given that my direct function call works, I'm not sure what is wrong. I've looked at several other posts that appear to be set up this way.
df <- data.frame(
date1 = c("2018-01-01","2018-01-01","2018-01-01","2018-01-02","2018-01-02","2018-01-02"),
date2 = c("2018-01-01","2018-01-01","2018-01-01","2018-01-02","2018-01-02","2018-01-02"),
stringsAsFactors = FALSE
)
dates<-names(df)
library(tidyverse)
dates.count<-function(.x){
group_by<-enquo(.x)
df %>% group_by(!!group_by) %>% summarise(count=n()) %>% ungroup() %>% ggplot() + geom_point(aes(y=count,x=!!group_by))
}
dates.count(date1)
map(dates,~dates.count(.x))
I get this error: Error in grouped_df_impl(data, unname(vars), drop) : Column .x is unknown
When you pass the variable names to map() you are using strings, which indicates you need ensym() instead of enquo().
So your function would look like
dates.count <- function(.x){
group_by = ensym(.x)
df %>%
group_by(!!group_by) %>%
summarise(count=n()) %>%
ungroup() %>%
ggplot() +
geom_point(aes(y=count,x=!!group_by))
}
And you would use the variable names as strings for the argument.
dates.count("date2")
Note that tidyeval doesn't always play nicely with the formula interface of map() (I think I'm remembering that correctly). You can always do an anonymous function instead, but in your case where you want to map the column names to a function with a single argument you can just do
map(dates, dates.count)
Using the formula interface in map() I needed an extra !!:
map(dates, ~dates.count(!!.x))
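If the function only ever receives column names as strings, another option is rlang::sym(), which converts an already evaluated string into a symbol and so does not depend on how map() passes the argument. This swaps sym() in for ensym(); the helper count_by_name and the toy_dates data are hypothetical, and the plotting step is left out:

```r
library(dplyr)
library(purrr)
library(rlang)

# Hypothetical data with two date columns
toy_dates <- data.frame(
  date1 = c("2018-01-01", "2018-01-01", "2018-01-02"),
  date2 = c("2018-01-01", "2018-01-02", "2018-01-02"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: takes a column name as a string
count_by_name <- function(col_name) {
  col <- sym(col_name)  # string -> symbol
  toy_dates %>%
    group_by(!!col) %>%
    summarise(count = n()) %>%
    ungroup()
}

# One summary table per column
counts <- map(names(toy_dates), count_by_name)
```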

Passing column names as both variables and columns in a single dplyr function in R

I am writing code in which a column name (e.g. "Category") is supplied by the user and assigned to a variable biz.area. For example...
biz.area <- "Category"
The original data frame is saved as risk.data. User also supplies the range of columns to analyze by providing column names for variables first.column and last.column.
Text in these columns will be broken up into bigrams for further text analysis including tf_idf.
My code for this analysis is given below.
x.bigrams <- risk.data %>%
gather(fields, alldata, first.column:last.column) %>%
unnest_tokens(bigrams,alldata,token = "ngrams", n=2) %>%
count(bigrams, biz.area, sort=TRUE) %>%
bind_tf_idf(bigrams, biz.area, n) %>%
arrange(desc(tf_idf))
However, I get the following error.
Error in grouped_df_impl(data, unname(vars), drop) : Column
x.biz.area is unknown
This is because count() expects a column name text string instead of variable biz.area. If I use count_() instead, I get the following error.
Error in compat_lazy_dots(vars, caller_env()) : object 'bigrams'
not found
This is because count_() expects to find only variables and bigrams is not a variable.
How can I pass both a constant and a variable to count() or count_()?
Thanks for your suggestion!
It looks to me like you need quosures, so that you can pass column names as variables, rather than as strings or values. Since you're already using dplyr, you can use dplyr's non-standard evaluation techniques.
Try something along these lines:
library(tidyverse)
library(tidytext) # provides unnest_tokens() and bind_tf_idf()
analyze_risk <- function(area, firstcol, lastcol) {
# turn your arguments into quosures
areaq <- enquo(area)
firstcolq <- enquo(firstcol)
lastcolq <- enquo(lastcol)
# run your analysis on the risk data
risk.data %>%
gather(fields, alldata, (!!firstcolq):(!!lastcolq)) %>%
unnest_tokens(bigrams,alldata,token = "ngrams", n=2) %>%
count(bigrams, !!areaq, sort=TRUE) %>%
bind_tf_idf(bigrams, !!areaq, n) %>%
arrange(desc(tf_idf))
}
In this case, your users would pass bare column names into the function like this:
myresults <- analyze_risk(Category, Name_of_Firstcol, Name_of_Lastcol)
If you want users to pass in strings, you'll need to convert them to symbols with rlang::sym() (or use ensym()) instead of enquo().
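The two calling styles can be compared on a small example. This is a sketch under stated assumptions: toy_risk, count_bare and count_string are hypothetical names, the tidytext steps are left out, and the string case assumes rlang::sym() for the conversion:

```r
library(dplyr)
library(rlang)

# Hypothetical stand-in for risk.data
toy_risk <- data.frame(
  Category = c("A", "A", "B"),
  stringsAsFactors = FALSE
)

# Bare column name: capture it with enquo()
count_bare <- function(area) {
  areaq <- enquo(area)
  toy_risk %>% count(!!areaq, sort = TRUE)
}

# Column name held in a string: convert it with sym()
count_string <- function(area) {
  area_sym <- sym(area)
  toy_risk %>% count(!!area_sym, sort = TRUE)
}

count_bare(Category)

biz.area <- "Category"
count_string(biz.area)
```

Both calls count the same column; the string version fits the case in the question, where the column name arrives in a variable such as biz.area.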
