error argument "df1" is missing, with no default - r

A friend of mine is working with the R language and asked me what she did wrong, but I can't seem to find the problem. Does someone know what it is?
The code she sent me:
# 10*. Pipe that to a ggplot command and create a histogram with 4 bins.
# Hint: you will NOT write ggplot(df, aes(...)) because the df is already piped in.
# Instead, just write: ggplot(aes(...)) etc.
# Title the histogram, "Distribution of Sunday tips for bills over $20"
# Feel free to style the plot (not required; this would be a typical exploratory
# analysis where only you will see it, so it doesn't have to be perfect).
df %>%
filter(total_bill > 20 & day == "Sun") %>%
ggplot(aes(x=total_bill, fill=size)) +
geom_histogram(bins=4) +
ggtitle("Distribution of Sunday tips for bills over $20")
The error:
Error in df(.) : argument "df1" is missing, with no default

Type ?df in your console, and you will see that df is a function with the following arguments:
df(x, df1, df2, ncp, log = FALSE)
where df1 and df2 are the degrees of freedom of the F distribution (df() is its density function). So the error message is saying that R cannot find the first argument of the df function.
It seems that in this code example, your friend is trying to pipe a data frame called df into the filter() function from the dplyr package and then into the ggplot() function from the ggplot2 package to create a plot.
So my guess is that your friend needs to define df as a data frame first. Otherwise, R will treat df as the function stats::df and keep throwing this error.
By the way, since df is already a function in R, it is not a good name for a data frame, even though people use df as a name for a data frame all the time. Next time, try a different name for the data frame, such as dat.
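A minimal base-R sketch of both points: in a fresh session, df is the F-density function, and calling it without df1 reproduces the message; once the data live in a properly named data frame, the subsetting works. The data frame dat and its rows are hypothetical stand-ins that only reuse the column names from the question.

```r
# In a fresh session with no data frame called `df`, the name refers to
# stats::df, the density function of the F distribution.
print(is.function(df))

# Calling it without its degrees-of-freedom arguments gives the same message.
msg <- tryCatch(df(1), error = function(e) conditionMessage(e))
print(msg)

# Fix: create the data frame first, under a name that does not clash with
# a function. These rows are hypothetical, made-up values.
dat <- data.frame(
  total_bill = c(15, 25, 30, 22, 40),
  day        = c("Sun", "Sun", "Sat", "Sun", "Sun"),
  size       = c(2, 4, 4, 3, 4)
)

# Base-R equivalent of filter(total_bill > 20 & day == "Sun")
sunday_big <- dat[dat$total_bill > 20 & dat$day == "Sun", ]
print(nrow(sunday_big))
```

With dplyr and ggplot2 loaded, the original pipeline would then start from dat %>% instead of df %>%.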

Related

ENP function in mutate

Currently, I am cleaning my dataset (Comparative Manifesto Project) and trying to compute the effective number of parties using the enp function from the electoral package (https://www.rdocumentation.org/packages/electoral/versions/0.1.2/topics/enp). However, I am running into some issues.
When I run this code:
cmp_1990 %>%
mutate(enp_vote = round(pervote, digits = 2)) %>%
mutate(enp_vote = as.numeric(enp_vote)) %>%
relocate(enp_vote, .before = parfam) %>%
mutate(enp_vote = enp(votes = cmp_1990$enp_vote)) %>%
relocate(enp, .before = parfam)
I get the error message:
Error: Can't subset columns that don't exist.
x Column `enp` doesn't exist.
I suppose R treats the function enp as a single column, even though I have installed the package and loaded it with library().
I tried it with differently rounded numbers and by running the enp command separately from the rest of the pipeline, but so far nothing has worked. Oh, and the cmp_1990$enp_vote reference was necessary because otherwise the enp function treated enp_vote as a categorical rather than a numerical value.
Sorry, by the way, if my code doesn't look the nicest; it's my first time using R, haha.
Thanks very much in advance!
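Two details in the code above stand out: relocate(enp, .before = parfam) names a column called enp, which the pipeline never creates (the result of enp() is stored in enp_vote), and inside mutate(), cmp_1990$enp_vote refers to the original cmp_1990 object rather than to the data flowing through the pipe. For illustration, here is a minimal base-R sketch that computes the effective number of parties by hand with the Laakso-Taagepera formula ENP = 1 / sum(p_i^2), where p_i are vote shares. The helper enp_manual, the toy data with its election and pervote columns, and the assumption that this matches what electoral::enp computes are all hypothetical.

```r
# Hypothetical helper (not from the electoral package): Laakso-Taagepera
# effective number of parties, ENP = 1 / sum(p_i^2), with p_i = vote shares.
enp_manual <- function(votes) {
  shares <- votes / sum(votes, na.rm = TRUE)  # normalise so shares sum to 1
  1 / sum(shares^2, na.rm = TRUE)
}

# Hypothetical toy data: party vote percentages in two elections.
cmp_toy <- data.frame(
  election = c("A", "A", "A", "A", "B", "B"),
  pervote  = c(40, 30, 20, 10, 50, 50)
)

# One ENP value per election, repeated on every party row of that election,
# stored under a new column name that later steps can refer to.
cmp_toy$enp <- ave(cmp_toy$pervote, cmp_toy$election, FUN = enp_manual)
print(cmp_toy)
```

In dplyr terms, the analogous step is a group_by() on the election identifier followed by a mutate() that creates a column actually named enp before relocate(enp, ...) is called.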

How to one-hot encode/generate dummy columns using sparklyr

I know there are a number of questions similar to this one here, but 1) most of the solutions rely on deprecated functions like ml_create_dummy_variables and 2) the other solutions are incomplete.
Is there a function or an approach to easily one-hot encode a categorical variable into multiple dummy variables in sparklyr?
This post asks for a solution in SparkR; incidentally, a sparklyr solution is given that only works when the categories in a given column are unique, which renders it pointless.
This solution results in a single dummy instead of one dummy per category (it grabs the first category). It is also the solution I stumbled onto (based on this post), and it does not cut it:
iris_sdf <- copy_to(sc, iris, overwrite = TRUE)
iris_sdf %>%
ft_string_indexer(input_col = "Species", output_col = "species_num") %>%
mutate(cat_num = species_num + 1) %>%
ft_one_hot_encoder("species_num", "species_dum") %>%
ft_vector_assembler(c("species_dum"))
I'm looking for a solution that will take Species from the iris dataset and generate three columns, one for each category in Species (virginica, setosa, and versicolor). In plain R, the fastDummies package has what I need, but I'm wondering how to achieve similar functionality in sparklyr.
Again, I'll note that ml_create_dummy_variables (suggested by this post) produced the following error:
Error in ml_create_dummy_variables(., "species_num", "species_dum") :
could not find function "ml_create_dummy_variables"
Note: I'm using sparklyr_1.3.1
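One workaround that avoids the deprecated function is to build one 0/1 column per category with plain dplyr verbs, which sparklyr hands to dbplyr for translation into Spark SQL. The following is a sketch under assumptions: it needs dplyr (and rlang, which dplyr depends on), it assumes dbplyr translates `==` and as.numeric() for Spark, and the helper name add_dummies is hypothetical. It is exercised here on the local iris data frame.

```r
library(dplyr)

# Hypothetical helper: adds one 0/1 column per distinct value of `col`.
# It only uses distinct(), pull() and mutate(), so the same code should run
# on a local data frame or (by assumption) on a Spark table via dbplyr.
add_dummies <- function(tbl, col) {
  col_sym <- rlang::sym(col)
  # Collect the distinct categories (a small query when run against Spark).
  lvls <- tbl %>% distinct(!!col_sym) %>% pull() %>% as.character() %>% sort()
  for (lvl in lvls) {
    new_col <- paste0(col, "_", lvl)
    # 1 when the row belongs to this category, 0 otherwise.
    tbl <- tbl %>% mutate(!!new_col := as.numeric((!!col_sym) == lvl))
  }
  tbl
}

iris_dummies <- add_dummies(iris, "Species")
head(iris_dummies)

# On Spark (assumes an open connection `sc`, as in the question):
# iris_sdf <- copy_to(sc, iris, overwrite = TRUE)
# add_dummies(iris_sdf, "Species")
```

Unlike ft_one_hot_encoder(), which packs the indicators into a single vector column, this produces one ordinary numeric column per category, which is what fastDummies does locally.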

R doesn't read in URL from data into Web-Crawler

Hello to all the professionals out there,
I have created a CSV file that contains cities and their corresponding Tripadvisor URLs. If I search my list for a specific link, for example Munich as here, the subset function returns the URL. I then try to read this URL, which is stored in search_url, using read_html(), unfortunately without success.
The relevant part of my code is the following.
search_url <- subset(data, city %in% "München", select = url)
pages <- read_html(search_url)
pages <- pages %>%
html_nodes("._15_ydu6b") %>%
html_attr('href')
When I print search_url, I get the following output:
https://www.tripadvisor.de/Restaurants-g187323-Berlin.html
But when I run the code above, read_html fails with the following error:
Error in UseMethod("read_xml") :
no applicable method for 'read_xml' applied to an object of class "data.frame"
I have now spent several hours on this, but unfortunately I have not found a suitable tip anywhere. It would be wonderful if you could help me here.
That's because subset() returns a data frame here, even though the actual result is just one string. Check this simple example with mtcars:
# this will be data.frame although the result is one numeric value 21.4
class(subset(mtcars, disp == 258, select = mpg))
# [1] "data.frame"
So you can probably use
pages <- read_html(as.character(search_url))
if you are sure that your subset returns only one character value; otherwise,
pages <- read_html(search_url[1, 1])
should work as well, using the first result of your subset.
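A minimal base-R sketch of the difference, using a hypothetical two-row stand-in for the CSV with placeholder URLs: `[[` extracts the url column as a plain vector, `[1]` keeps the first match, and as.character() also covers the case where the column was read as a factor. The read_html() call stays a comment because it needs rvest/xml2 and network access.

```r
# Hypothetical stand-in for the CSV of cities and Tripadvisor URLs
# (placeholder URLs, not real Tripadvisor links).
data <- data.frame(
  city = c("M\u00fcnchen", "Berlin"),
  url  = c("https://example.org/munich", "https://example.org/berlin")
)

# subset() with select = keeps a (one-column) data frame.
search_df <- subset(data, city %in% "M\u00fcnchen", select = url)
print(class(search_df))

# [["url"]] pulls the column out as a vector, [1] keeps the first match,
# and as.character() turns a factor column (if any) into its text labels.
search_url <- as.character(search_df[["url"]][1])
print(search_url)

# With rvest loaded, this character string can be passed on:
# pages <- read_html(search_url)
```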

How do I solve the error object not found after I created this variable using mutate?

I have added a variable that is the sum of all policies for each customer:
mhomes %>% mutate(total_policies = rowSums(select(., starts_with("num"))))
However, when I now want to use this total_policies variable in plots or when using summary() it says: Error in summary(total_policies) : object 'total_policies' not found.
I don't understand what I did wrong or what I should do differently here.
This may be slightly roundabout, but I feel it serves the purpose. Assuming df is the dataset and it has customer_id, policy_id and policy_amount as variables, the command below should work:
req_output = df %>% group_by(customer_id) %>% summarise(total_policies = sum(policy_amount))
If you still face the issue, convert the result to a data frame and try plotting again:
req_output = as.data.frame(req_output)
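The error message itself comes from two details: mutate() returns a modified copy, so the new column only survives if the result is assigned back (for example mhomes <- mhomes %>% mutate(...)), and total_policies then lives inside the data frame, so functions such as summary() need mhomes$total_policies rather than the bare name. A minimal base-R sketch, with a hypothetical three-customer mhomes and base transform()/rowSums() standing in for mutate():

```r
# Hypothetical stand-in for mhomes: two policy-count columns starting with "num".
mhomes <- data.frame(
  customer_id = 1:3,
  num_car     = c(1, 0, 2),
  num_life    = c(0, 1, 1)
)

# Computing the column without assigning the result leaves mhomes unchanged,
# and no standalone object called total_policies is created either.
invisible(transform(mhomes, total_policies = num_car + num_life))
print("total_policies" %in% names(mhomes))
print(exists("total_policies"))

# Assign the result back to the data frame ...
num_cols <- startsWith(names(mhomes), "num")
mhomes$total_policies <- rowSums(mhomes[num_cols])

# ... and refer to the new variable as a column of that data frame.
summary(mhomes$total_policies)
```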

"could not find function %>%<-", issue with tidyr package and the %>% operator

I'm working on a script for a swirl lesson on using the tidyr package, and I'm having some trouble with the %>% operator. I've got a data frame called passed that contains the name, class number, and final grade of 4 students. I want to add a new column called status and populate it with the value "passed". Before that, I used select() to grab some columns from a data frame called students4 and stored the result in a data frame called gradebook:
gradebook <- students4 %>%
select(id, class, midterm, final) %>%
passed<-passed %>% mutate(status="passed")
Swirl problems build on each other, and the previous one just had me run the first two lines of code, so I think those two are correct. The third line was what was suggested after a couple of wrong attempts, so I think there's something about %>% that I'm not understanding. When I run the code, I get an error that says:
Error in students4 %>% select(id, class, midterm, final) %>% passed <- passed %>% :
could not find function "%>%<-"
I found another user who asked about the "could not find function "%>%"" error and was able to resolve it by installing the magrittr package, but that didn't do the trick for me. Any input on the issues in my code would be super appreciated!
It’s not a problem with the package or the operator. You’re piping into a line that starts a new assignment.
The %>% operator passes the result on its left into the function on its right as that function’s first argument (here, the data frame argument).
Instead of doing all of this:
Gradebook <- select(students4, id, class, midterm, final)
Gradebook2 <- mutate(Gradebook, test4 = 100)
Gradebook3 <- arrange(Gradebook2, desc(final))
If you’re working on the same data frame, you can pipe each result into the next function instead:
Gradebook <- students4 %>%
select(id, class, midterm, final) %>%
mutate(test4 = 100) %>%
arrange(desc(final))
Much cleaner and easier to read.
In your second line, the trailing %>% tries to pass the result to a new function, but instead of a function call, the next line suddenly defines a variable. I don’t know the exercise you’re doing, but you should remove the second %>% (the one at the end of the select() line):
gradebook <- students4 %>%
select(id, class, midterm, final)
passed <- passed %>% mutate(status="passed")
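To see where the odd function name "%>%<-" comes from, the failing line can be parsed without running it. A minimal base-R sketch (quote() and parse() only, so neither dplyr nor the data frames need to exist):

```r
# quote() only parses the expression; nothing is evaluated.
e <- quote(
  gradebook <- students4 %>% select(id, class, midterm, final) %>%
    passed <- passed %>% mutate(status = "passed")
)

# %>% binds more tightly than <-, and <- groups from the right, so the line
# parses as: gradebook <- ((students4 %>% select(...) %>% passed) <- value)
inner <- e[[3]]          # right-hand side of the outer assignment
print(inner[[1]])        # another `<-`: a nested assignment
print(inner[[2]][[1]])   # its target is a call to %>%

# Assigning to a call f(...) makes R look for a replacement function `f<-`,
# which for f = %>% is "%>%<-", hence "could not find function".

# The corrected code is two separate statements.
fixed <- parse(text = c(
  "gradebook <- students4 %>% select(id, class, midterm, final)",
  "passed <- passed %>% mutate(status = \"passed\")"
))
print(length(fixed))
```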
