I am trying to generate a lagged variable in R using the following code:
library(dplyr)
dataretail <- dataretail %>%
  group_by(PERMNO) %>%
  mutate(newsheat_lag = lag(newsheat, n = 1, order_by = YYYYQ, default = NA))
but for some reason my lagged variable is identical to the original one. The same code used to work correctly a few months ago. Any idea what is going wrong?
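One possible cause, offered as a guess rather than a diagnosis: if a different lag() is being called instead of dplyr's (for example stats::lag(), which leaves the values of a plain vector unchanged), the "lagged" column would match the original. Below is a minimal sketch that calls dplyr::lag() explicitly. The toy data frame and its values are made up for illustration; only the column names come from the question.

```r
library(dplyr)

# Toy stand-in for dataretail (hypothetical values, rows deliberately unsorted)
dataretail <- data.frame(
  PERMNO   = c(1, 1, 1, 2, 2),
  YYYYQ    = c(20202, 20201, 20203, 20201, 20202),
  newsheat = c(20, 10, 30, 5, 6)
)

lagged <- dataretail %>%
  group_by(PERMNO) %>%
  # Namespace-qualify lag() so no other package's lag() can be picked up
  mutate(newsheat_lag = dplyr::lag(newsheat, n = 1, order_by = YYYYQ)) %>%
  ungroup()

print(lagged)
```

Running conflicts() or typing lag at the console shows which package's lag() is currently found first.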
I would use data.table::shift(), as I've found it to be more reliable.
mtcars$previousMPG<-data.table::shift(mtcars$mpg,1)
head(mtcars[,c(1,12)])
I currently have a dataset that has two variables, winner_entry and winner_seed. In a few instances the data was entered incorrectly: the winner_entry value was put into the winner_seed variable.
Atp_singles_2022 %>%
filter(winner_seed == "WC") %>%
select(winner_seed, winner_entry, winner_name, tourney_name) %>%
print(n=10)
This produces the output below
Atp_singles_2022 %>%
  mutate(winner_seed = str_replace_all(winner_seed, fixed("WC"), NA_character_))
I was thinking of doing this, but that wouldn't fix winner_entry, which needs to be changed to WC.
This may be solved using an ifelse statement within mutate.
Atp_singles_2022 <- Atp_singles_2022 %>%
mutate(winner_entry = ifelse(is.na(winner_entry),"WC",winner_entry))
This code says that if is.na(winner_entry) (AKA if winner_entry is NA), change it to WC, else leave it as winner_entry. With this code, you can change the contents of a column based on values in that column, or you could change it based on another column.
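To illustrate the "based on another column" case, here is a minimal sketch that moves a misplaced "WC" from winner_seed into winner_entry. The toy tibble and the helper column `misplaced` are hypothetical; only the column names winner_seed, winner_entry and winner_name come from the question.

```r
library(dplyr)

# Toy stand-in for Atp_singles_2022 (made-up rows)
atp_toy <- tibble(
  winner_name  = c("A", "B", "C"),
  winner_seed  = c("WC", "3", NA),
  winner_entry = c(NA, NA, "Q")
)

atp_fixed <- atp_toy %>%
  mutate(
    # TRUE only where the entry code ended up in the seed column
    misplaced    = !is.na(winner_seed) & winner_seed == "WC",
    # Move "WC" into winner_entry, leave other rows untouched
    winner_entry = if_else(misplaced, "WC", winner_entry),
    # Blank out the misplaced seed value
    winner_seed  = if_else(misplaced, NA_character_, winner_seed)
  ) %>%
  select(-misplaced)

print(atp_fixed)
```

Unlike testing is.na(winner_entry), this only touches rows where winner_seed actually holds "WC".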
I am currently cleaning my dataset (Comparative Manifesto Project) and trying to compute the effective number of parties using the enp function from the electoral package (https://www.rdocumentation.org/packages/electoral/versions/0.1.2/topics/enp). However, I am running into some issues.
When I run this code:
cmp_1990 %>%
mutate(enp_vote = round(pervote, digits = 2)) %>%
mutate(enp_vote = as.numeric(enp_vote)) %>%
relocate(enp_vote, .before = parfam) %>%
mutate(enp_vote = enp(votes = cmp_1990$enp_vote)) %>%
relocate(enp, .before = parfam)
I get the error message:
Error: Can't subset columns that don't exist.
x Column `enp` doesn't exist.
I suppose R treats the function enp as a single column, even though I have installed the package and loaded it with library().
I tried it with differently rounded numbers and by using the enp command outside of the rest of the command but up until now nothing worked. Oh and the cmp_1990$enp_vote command was necessary as otherwise the enp function thought of enp_vote as categorical and not numerical value.
Sorry, by the way, if my code doesn't look the nicest; it's my first time using R, haha.
Thanks very much in advance!
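Reading the error message literally: the final relocate() asks for a column named `enp`, but the preceding mutate() created `enp_vote`, so no column called `enp` exists. A minimal sketch of the corrected last step follows. The toy data is made up, and `enp_sketch()` is a hypothetical stand-in (inverse sum of squared vote shares) so the example runs without the electoral package; it is not claimed to match electoral::enp exactly.

```r
library(dplyr)

# Hypothetical stand-in for electoral::enp: 1 / sum of squared vote shares
enp_sketch <- function(votes) {
  shares <- votes / sum(votes)
  1 / sum(shares^2)
}

# Toy stand-in for cmp_1990 (made-up values)
cmp_toy <- tibble(
  party   = c("A", "B", "C", "D"),
  parfam  = c(10, 20, 30, 40),
  pervote = c(40, 30, 20, 10)
)

cmp_out <- cmp_toy %>%
  # Inside mutate() the column can be used directly; no cmp_toy$ prefix needed
  mutate(enp_vote = enp_sketch(pervote)) %>%
  # Relocate the column that was actually created (enp_vote, not enp)
  relocate(enp_vote, .before = parfam)

print(cmp_out)
```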
I know there are a number of questions similar to this here, but 1) most of the solutions rely on deprecated functions like ml_create_dummy_variables and 2) other solutions are incomplete.
Is there a function or an approach to easily one-hot encode a categorical variable into multiple dummy variables in sparklyr?
This post asks for a solution in SparkR; incidentally, a sparklyr solution is given that only works when the categories are unique in a given column, which renders it pointless.
This solution results in a single dummy instead of a dummy for each category (it grabs the first category). This is also the solution I stumbled onto (based on this post), which does not cut it:
iris_sdf <- copy_to(sc, iris, overwrite = TRUE)
iris_sdf %>%
ft_string_indexer(input_col = "Species", output_col = "species_num") %>%
mutate(cat_num = species_num + 1) %>%
ft_one_hot_encoder("species_num", "species_dum") %>%
ft_vector_assembler(c("species_dum"))
I'm looking for a solution that will take Species from the iris dataset and generate three columns, one for each category in Species (virginica, setosa, and versicolor). In plain R, the fastDummies package has what I need, but I'm left wondering how to achieve similar functionality in sparklyr.
Again, I'll note that ml_create_dummy_variables (suggested by this post) produced the following error:
Error in ml_create_dummy_variables(., "species_num", "species_dum") :
  could not find function "ml_create_dummy_variables"
Note: I'm using sparklyr_1.3.1
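One approach that avoids the deprecated helper is to spell out one dummy per category with mutate() and if_else(). The sketch below runs on the local iris data frame so that it is self-contained. The assumption, not verified here against sparklyr 1.3.1, is that the same mutate() call works on iris_sdf because if_else() comparisons are translated to SQL. The column names species_setosa etc. are made up for illustration.

```r
library(dplyr)

# Local data frame used for the sketch; on Spark this would be iris_sdf (assumption)
iris_dummies <- iris %>%
  mutate(
    species_setosa     = if_else(Species == "setosa", 1L, 0L),
    species_versicolor = if_else(Species == "versicolor", 1L, 0L),
    species_virginica  = if_else(Species == "virginica", 1L, 0L)
  )

# Each row should have exactly one dummy set to 1
head(iris_dummies)
```

This is explicit rather than automatic, so it only suits variables with a small, known set of categories.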
I have added a variable that is the sum of all policies for each customer:
mhomes %>% mutate(total_policies = rowSums(select(., starts_with("num"))))
However, when I now want to use this total_policies variable in plots or when using summary() it says: Error in summary(total_policies) : object 'total_policies' not found.
I don't understand what I did wrong or what I should do differently here.
This may be slightly roundabout, but I feel it serves the purpose. Assuming df is the dataset and it has customer_id, policy_id and policy_amount as variables, the command below should work:
req_output = df %>% group_by(customer_id) %>% summarise(total_policies = sum(policy_amount))
If you still face the issue, convert the result to a data frame and try plotting:
req_output = as.data.frame(req_output)
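Another reading of the "object 'total_policies' not found" error is that the mutate() result was never assigned back, and that summary() was given a bare column name rather than a column of the data frame. A minimal sketch under that assumption, with a made-up toy version of mhomes (the column names num_car and num_fire are hypothetical):

```r
library(dplyr)

# Toy stand-in for mhomes (made-up values)
mhomes_toy <- tibble(
  customer = 1:3,
  num_car  = c(1, 0, 2),
  num_fire = c(1, 1, 0)
)

# Assign the result back, otherwise the new column is only printed and then lost
mhomes_toy <- mhomes_toy %>%
  mutate(total_policies = rowSums(select(., starts_with("num"))))

# Refer to the column through the data frame
summary(mhomes_toy$total_policies)
```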
With the latest version of tibble, wide tibbles are not displayed properly when setting width = Inf.
Based on my tests with previous versions, wide tibbles were printed nicely up to version 1.3.0 and no longer in later versions. This is what I would like the output to look like:
...but this is what it looks like using the latest version of tibble:
I tinkered around with the old sources but to no avail. I would like to incorporate this in a package so the solution should pass R CMD check. When I just copied a load of functions from tibble v1.3.0 I managed to restore the old behavior but could not pass the check.
There's an open issue on GitHub related to this problem, but it's apparently 'not high priority'. Is there a way to print tibbles properly with the new version?
Try out this function:
print_width_inf <- function(df, n = 6) {
df %>%
head(n = n) %>%
as.data.frame() %>%
tibble:::shrink_mat(width = Inf, rows = NA, n = n, star = FALSE) %>%
`[[`("table") %>%
print()
}
This seems to have changed; now one can just use:
options(tibble.width = Inf)
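A small sketch of both the global option and a per-call alternative, using a made-up 30-column tibble. The exact layout of the printed output depends on the installed tibble/pillar version, so none is shown here.

```r
library(tibble)

# Made-up wide tibble: one row, 30 integer columns named col_01 ... col_30
wide <- as_tibble(setNames(as.list(1:30), sprintf("col_%02d", 1:30)))

# Global: set the option, print, then restore the previous setting
old <- options(tibble.width = Inf)
out_global <- capture.output(print(wide))
options(old)

# Per call: pass width = Inf to print() without touching global options
out_call <- capture.output(print(wide, width = Inf))

cat(out_call, sep = "\n")
```

Inside a package, the per-call form avoids changing the user's global options.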