currently, I am cleaning my dataset (Comparative Manifesto Project) and try to compute the effective number of parties using the enp function from the electoral package (https://www.rdocumentation.org/packages/electoral/versions/0.1.2/topics/enp). However, I am running in some issues.
When I run this code:
cmp_1990 %>%
mutate(enp_vote = round(pervote, digits = 2)) %>%
mutate(enp_vote = as.numeric(enp_vote)) %>%
relocate(enp_vote, .before = parfam) %>%
mutate(enp_vote = enp(votes = cmp_1990$enp_vote)) %>%
relocate(enp, .before = parfam)
I get the error message:
Fehler: Can't subset columns that don't exist.
x Column `enp` doesn't exist.
I suppose, r thinks of the function enp as single column even though I have installed and used library on the package.
I tried it with differently rounded numbers and by using the enp command outside of the rest of the command but up until now nothing worked. Oh and the cmp_1990$enp_vote command was necessary as otherwise the enp function thought of enp_vote as categorical and not numerical value.
Sorry by the way if my code doesnt look like the nicest, its my first time using r haha.
Thanks very much in advance!
Related
I am trying to generate lagged variable in R using the following code
library(dpylr)
dataretail<-dataretail %>%
group_by(PERMNO) %>%
mutate(newsheat_lag = lag(newsheat, n = 1,order_by = YYYYQ,default = NA)
but for some reason my lagged variable is identical to the original one. The same code used to work correctly a few months ago. Any idea what is going wrong?
I would use data.table::shift() as I've find it to be more reliable.
mtcars$previousMPG<-data.table::shift(mtcars$mpg,1)
head(mtcars[,c(1,12)])
I know there are number of questions similar to this here but 1) most of the solutions rely on deprecated functions like ml_create_dummy_variables and 2) other solutions are incomplete.
Is there a function or an approach to easily hot encode a categorical variable into multiple dummy variables in sparklyr?
This post asks for a solution in SparkR, incidentally a sparklyr solution is given that only works when the categories are unique in a given column, which renders its pointless.
This solution, results in a single dummy instead of a dummy for each category (grabs the first category). This is also the solution I stumbled onto (based on this post), which does not cut it:
iris_sdf <- copy_to(sc, iris, overwrite = TRUE)
iris_sdf %>%
ft_string_indexer(input_col = "Species", output_col = "species_num") %>%
mutate(cat_num = species_num + 1) %>%
ft_one_hot_encoder("species_num", "species_dum") %>%
ft_vector_assembler(c("species_dum"))
I'm looking for a solution that will take Species from the iris dataset and generate three columns -one for each category in Species (virginica, setosa, and versicolor). Using R, fastDummies package has what I need, but I'm left wondering how to achieve similar functionality in sparklyr.
Again, I'll note that ml_create_dummy_variables (suggested by this post) produced the following error:
Error in ml_create_dummy_variables(., "species_num", "species_dum") : Error in ml_create_dummy_variables(., "species_num", "species_dum") :
could not find function "ml_create_dummy_variables"
Note: I'm using sparklyr_1.3.1
I have added a variable that is the sum of all policies for each customer:
mhomes %>% mutate(total_policies = rowSums(select(., starts_with("num"))))
However, when I now want to use this total_policies variable in plots or when using summary() it says: Error in summary(total_policies) : object 'total_policies' not found.
I don't understand what I did wrong or what I should do differently here.
May be slightly round about, but feel solves the purpose. Considering df is the dataset and it has customer_id, policy_id and policy_amount as variables then the below command should work
req_output = df %>% group_by(customer_id) %>% summarise (total_policies = sum (policy_amount)
if you still face the issue, kindly convert to data frame and try plotting
req_output = as.data.frame(req_output)
I'm working on a script for a swirl lesson on using the tidyr package and I'm having some trouble with the %>% operator. I've got a data frame called passed that contains the name, class number, and final grade of 4 students. I want to add a new column called status and populate it with a character vector that says "passed". Before that, I used select to grab some columns from a data frame called students4 and stored it in a data frame called grade book
gradebook <- students4 %>%
select(id, class, midterm, final) %>%
passed<-passed %>% mutate(status="passed")
Swirl problems build on each other, and the last one just had me running the first to lines of code, so I think those two are correct. The third line was what was suggested after a couple of wrong attempts, so I think there's something about %>% that I'm not understanding. When I run the code I get an error that says;
Error in students4 %>% select(id, class, midterm, final) %>% passed <- passed %>% :
could not find function "%>%<-
I found another user who asked about the "could not find function "%>%" who was able to resolve the issue by installing the magrittr package, but that didn't do the trick for me. Any input on the issues in my code would be super appreciated!
It’s not a problem with the package or the operator. You’re trying to pipe into a new line with a new variable.
The %>%passes the previous dataframe into the next function as that functions df argument.
Instead of doing all of this:
Gradebook <- select(students4, id, class, midterm, final)
Gradebook2 <- mutate(Gradebook, test4 = 100)
Gradebook3 <- arrange(Gradebook2, desc(final))
You can pipe operator into the next argument if you’re working on the same dataframe.
Gradebook <- students4 %>%
select(students4, id, class, midterm, final) %>%
mutate(test4 = 100) %>%
arrange(desc(final))
Much cleaner and easier to read.
In your second line you’re trying to pass it to a new function but instead of there being a function you’re all of a sudden defining a variable. I don’t know the exercise you’re doing but you should remove the second operator.
gradebook <- students4 %>%
select(id, class, midterm, final)
passed <- passed %>% mutate(status="passed")
Using the latest version of tibble the output of wide tibbles is not properly displayed when setting width = Inf.
Based on my tests with previous versions wide tibbles were printed nicely until versions later than 1.3.0. This is what I would like the output to be printed like:
...but this is what it looks like using the latest version of tibble:
I tinkered around with the old sources but to no avail. I would like to incorporate this in a package so the solution should pass R CMD check. When I just copied a load of functions from tibble v1.3.0 I managed to restore the old behavior but could not pass the check.
There's an open issue on Github related to this problem but it's apparently 'not high priority'. Is there a way to print tibbles properly with the new version?
Try out this function:
print_width_inf <- function(df, n = 6) {
df %>%
head(n = n) %>%
as.data.frame() %>%
tibble:::shrink_mat(width = Inf, rows = NA, n = n, star = FALSE) %>%
`[[`("table") %>%
print()
}
This seems to have change, now one can just use:
options(tibble.width = Inf)