Problem with mutate keyword and functions in R

I have a problem using mutate(); please see the following code block.
output1 <- mytibble %>%
  mutate(newfield = FND(mytibble$ndoc))
output1
Here FND is a function that filters a large tibble (loaded from a 5 GB file):
FND <- function(n){
  result <- LARGETIBBLE %>% filter(LARGETIBBLE$id == n)
  return(paste(unique(result$somefield), collapse = " "))
}
I want FND to run once for each row, but it is executed only once.

Avoid using $ inside dplyr pipes; it is rarely needed there, because dplyr verbs look up column names in the data directly. You can change your FND function to:
library(dplyr)
FND <- function(n){
  LARGETIBBLE %>% filter(id == n) %>% pull(somefield) %>%
    unique() %>% paste(collapse = " ")
}
Now apply this function to every ndoc value in mytibble:
mytibble %>% mutate(newfield = purrr::map_chr(ndoc, FND))
You can also use sapply:
mytibble$newfield <- sapply(mytibble$ndoc, FND)
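As a minimal sketch of the per-element call described above, the following uses two tiny made-up tibbles in place of the real LARGETIBBLE and mytibble (their contents are assumptions for illustration only). It uses base R's vapply() where the answer uses purrr::map_chr(); both call FND once per ndoc value and require a single string back each time.

```r
library(dplyr)

# Hypothetical stand-ins for the real data (not from the question)
LARGETIBBLE <- tibble(id = c(1, 1, 2, 3), somefield = c("a", "b", "c", "d"))
mytibble <- tibble(ndoc = c(1, 2, 3))

# Same logic as the rewritten FND above: one id in, one string out
FND <- function(n) {
  LARGETIBBLE %>%
    filter(id == n) %>%
    pull(somefield) %>%
    unique() %>%
    paste(collapse = " ")
}

# vapply() calls FND separately for each element of ndoc
output1 <- mytibble %>%
  mutate(newfield = vapply(ndoc, FND, character(1)))

output1$newfield
# "a b" "c" "d"
```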

The form FND(mytibble$ndoc) is base-R data-frame style. Inside verbs such as mutate() you refer to a column by its name alone, without the tibble name, because the verb evaluates the expression against the data that %>% passes in. Your example would then be:
FND <- function(n){
  result <- LARGETIBBLE %>% filter(id == n)
  return(paste(unique(result$somefield), collapse = " "))
}
output1 <- mytibble %>%
  mutate(newfield = FND(ndoc))
This is only a sketch: mutate() still calls FND once with the entire ndoc column, so it gives per-row results only if FND is vectorised. If it does not work, please post a small reproducible example with data and the result you are trying to achieve.
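If calling FND once per row proves slow on a table of this size, one alternative that is not taken from the answers here is to collapse the large table once per id and then join the result onto mytibble. The sketch below uses made-up stand-in data, and the name lookup is hypothetical. Note one difference in behaviour: an ndoc with no match gets NA from the join, whereas FND would return an empty string.

```r
library(dplyr)

# Hypothetical stand-ins for the real data (not from the question)
LARGETIBBLE <- tibble(id = c(1, 1, 2, 3), somefield = c("a", "b", "c", "d"))
mytibble <- tibble(ndoc = c(1, 2, 4))

# Collapse somefield once per id instead of filtering once per row
lookup <- LARGETIBBLE %>%
  group_by(id) %>%
  summarise(newfield = paste(unique(somefield), collapse = " "))

# Attach the collapsed strings; ndoc = 4 has no match and becomes NA
output1 <- mytibble %>%
  left_join(lookup, by = c("ndoc" = "id"))

output1$newfield
# "a b" "c" NA
```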

Related

Using a for loop in R to assign value labels

Context: I have a large dataset (CoreData) with an accompanying datafile (CoreValues) that contains the code and values for each variable within the dataset.
Problem: I want to use a loop to assign each variable within the dataset (CoreData) the correct value labels (from the CoreValues data).
What I've tried so far:
I have created a character vector that identifies which variables within my main data (CoreData) have values that need to be added:
Core_VarwithValueLabels<- unique(CoreValues$Abbreviation)
I have tried a for loop over that vector to create the vectors for the levels and labels arguments that feed into the factor() function.
for (i in Core_VarwithValueLabels){
  assign(paste0(i, 'Labels'),
         CoreValues %>%
           filter(Abbreviation == i) %>%
           select(Description) %>%
           unique() %>%
           unlist()
  )
  assign(paste0(i, 'Levels'),
         CoreValues %>%
           filter(Abbreviation == i) %>%
           select(Code) %>%
           unique() %>%
           unlist()
  )
  CoreData[i] <- factor(CoreData[i], levels=paste0(i, 'Levels'), labels = paste0(i, 'Labels'))
}
This creates the correct label and level vectors, however, they are not being picked up properly within the factor function.
Question: Can you help me identify how to get my factor function to work within this loop or if there is a more appropriate method?
Sample data:
CoreValues:
example data from CoreValues
CoreData:
example data from CoreData
UPDATE: RESOLVED
I have now resolved this by calling get() inside factor(): get() takes the string I build with paste0() and returns the vector of that name.
for (i in Core_VarwithValueLabels){
  assign(paste0(i, 'Labels'),
         CoreValues %>%
           filter(Abbreviation == i) %>%
           select(Description) %>%
           unique() %>%
           unlist()
  )
  assign(paste0(i, 'Levels'),
         CoreValues %>%
           filter(Abbreviation == i) %>%
           select(Code) %>%
           unique() %>%
           unlist()
  )
  # [[i]] extracts the column as a vector; [i] would pass a one-column data frame to factor()
  CoreData[[i]] <- factor(CoreData[[i]], levels=get(paste0(i, 'Levels')), labels = get(paste0(i, 'Labels')))
}
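The same result can be had without assign() and get() by keeping each variable's codes and descriptions in local objects inside the loop. The following base-R sketch assumes the column names Abbreviation, Code and Description from the question; the data values themselves are invented for illustration.

```r
# Hypothetical sample data using the column names from the question
CoreValues <- data.frame(
  Abbreviation = c("sex", "sex", "smoker", "smoker"),
  Code = c(1, 2, 0, 1),
  Description = c("Male", "Female", "No", "Yes"),
  stringsAsFactors = FALSE
)
CoreData <- data.frame(sex = c(1, 2, 2), smoker = c(0, 1, 0), age = c(30, 41, 25))

for (v in unique(CoreValues$Abbreviation)) {
  # Rows describing this variable, one row per distinct code
  rows <- CoreValues[CoreValues$Abbreviation == v, ]
  rows <- rows[!duplicated(rows$Code), ]
  # [[v]] works on the column as a vector, so factor() sees the raw codes
  CoreData[[v]] <- factor(CoreData[[v]], levels = rows$Code, labels = rows$Description)
}

as.character(CoreData$sex)
# "Male" "Female" "Female"
```

Keeping each code and its description in the same row also avoids the two vectors drifting out of alignment, which can happen when unique() is applied to each column separately.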

How to properly parse (?) mdsets in expss within a loop?

I'm new to R and I don't know all the basic concepts yet. The task is to produce one merged table from multiple response sets. I am trying to do this with the expss library and a loop.
This is the code in R without a loop (works fine):
#libraries
#blah, blah...
#path
df.path = "C:/dataset.sav"
#dataset load
df = read_sav(df.path)
#table
table_undropped1 = df %>%
tab_cells(mdset(q20s1i1 %to% q20s1i8)) %>%
tab_total_row_position("none") %>%
tab_stat_cpct() %>%
tab_pivot()
There are 10 multiple response sets, so I need to create 10 tables in the manner shown above, then transpose and merge them. To simplify the code (and learn something new) I decided to produce the tables in a loop, but nothing works. I looked for a solution, and the closest I have come is:
#this generates a message: '1' not found
for(i in 1:10) {
assign(paste0("table_undropped",i),1) = df %>%
tab_cells(mdset(assign(paste0("q20s",i,"i1"),1) %to% assign(paste0("q20s",i,"i8"),1)))
tab_total_row_position("none") %>%
tab_stat_cpct() %>%
tab_pivot()
}
It still causes the error noted in the comment above the code.
Alternatively, an SPSS macro for that would be (published only to better express the problem because I have to avoid SPSS):
define macro1 (x = !tokens (1)
/y = !tokens (1))
!do !i = !x !to !y.
mrsets
/mdgroup name = !concat($SET_,!i)
variables = !concat("q20s",!i,"i1") to !concat("q20s",!i,"i8")
value = 1.
ctables
/table !concat($SET_,!i) [colpct.responses.count pct40.0].
!doend
!enddefine.
*** MACRO CALL.
macro1 x = 1 y = 10.
In other words I am looking for a working substitute of !concat() in R.
%to% is not suited for parametric variable selection. There is a set of special functions for parametric variable selection and assignment. One of them is mdset_t:
for(i in 1:10) {
    table_name = paste0("table_undropped",i)
    ..$table_name = df %>%
        tab_cells(mdset_t("q20s{i}i{1:8}")) %>% # expressions in the curly brackets will be evaluated and substituted
        tab_total_row_position("none") %>%
        tab_stat_cpct() %>%
        tab_pivot()
}
However, it is not good practice to store all the tables as separate variables in the global environment. A better approach is to save them all in a list:
all_tables = lapply(1:10, function(i)
    df %>%
        tab_cells(mdset_t("q20s{i}i{1:8}")) %>%
        tab_total_row_position("none") %>%
        tab_stat_cpct() %>%
        tab_pivot()
)
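For the literal question of a substitute for !concat(), base R's paste0() builds the same kind of name strings. The short sketch below only shows which names the pattern q20s, set number, i, item number produces for one set; it assumes, going by the comment in the answer's code, that the template "q20s{i}i{1:8}" refers to these same eight variables.

```r
i <- 3  # hypothetical set number, as in one iteration of the loop

# paste0() recycles i across 1:8 and returns one name per item
vars <- paste0("q20s", i, "i", 1:8)

vars
# "q20s3i1" "q20s3i2" "q20s3i3" "q20s3i4" "q20s3i5" "q20s3i6" "q20s3i7" "q20s3i8"
```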
UPDATE.
Generally speaking, there is no need to merge. You can do all your work with tab_*:
my_big_table = df %>%
    tab_total_row_position("none")
for(i in 1:10) {
    my_big_table = my_big_table %>%
        tab_cells(mdset_t("q20s{i}i{1:8}")) %>% # expressions in the curly brackets will be evaluated and substituted
        tab_stat_cpct()
}
my_big_table = my_big_table %>%
    tab_pivot(stat_position = "inside_columns") # here we say that we need combine subtables horizontally

How to make this work without global variable in R?

I am parsing some JSON files containing metadata into similar data frames. I am using tidyjson. I finally made it work like this:
append_arrays_and_objects <- function (tbl) {
  objs <- tbl %>%
    filter(is_json_object(.)) %>% gather_object %>%
    append_values_string
  arr <- tbl %>%
    filter(is_json_array(.)) %>% gather_array %>%
    append_values_string
  if (nrow(objs) > 0) append_arrays_and_objects(objs)
  if (nrow(arr) > 0) append_arrays_and_objects(arr)
  print(objs)
  print(arr)
  res1 <- merge(objs,arr, all=TRUE)
  result <<- merge(result,res1, all=TRUE)
  result
}
#parse microdata
result <- data.frame()
md <- dataHighest$JSON %>%
  enter_object(microdata) %>%
  append_arrays_and_objects
rm(result)
It just bothers me that I can't make it work without the global data frame result. Whenever I tried returning some combination of the local data frames, I always ended up with only the "first-level" data frame. It seems that once all the data has been collected, I cannot pass it back up. Should be trivial to solve?
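In the function above, the values returned by the two recursive calls are discarded, which is why only the <<- assignment carries data back out. The general pattern for dropping the global is to have every call return its rows and let the caller combine them. The sketch below shows that pattern on a plain nested list rather than on tidyjson objects; collect_leaves and the sample data are hypothetical and do not use the tidyjson API.

```r
# Hypothetical helper: return every leaf of a nested list as (path, value) rows
collect_leaves <- function(x, path = "") {
  # Base case: a leaf becomes a small data frame that is returned to the caller
  if (!is.list(x)) {
    return(data.frame(path = path, value = as.character(x), stringsAsFactors = FALSE))
  }
  keys <- names(x)
  if (is.null(keys)) keys <- as.character(seq_along(x))
  # Recursive case: keep what each child call returns instead of discarding it
  parts <- lapply(seq_along(x), function(k) {
    child_path <- if (nzchar(path)) paste(path, keys[k], sep = ".") else keys[k]
    collect_leaves(x[[k]], child_path)
  })
  # Combine the children's rows and pass them back up one level
  do.call(rbind, parts)
}

nested <- list(name = "item", tags = list("a", "b"), meta = list(depth = list(level = 2)))
result <- collect_leaves(nested)

result$path
# "name" "tags.1" "tags.2" "meta.depth.level"
```

Because each level returns the combined rows of everything beneath it, the top-level call receives the full result and no variable outside the function is needed.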

How do I make a for loop with the filter function?

I'm having a problem using the filter() function inside a for loop: it doesn't filter the data frame and instead just creates an i value. The code is below:
library(tidyverse)
library(magrittr)
library(dplyr)
funcexrds <- readRDS("C:/Users/chlav/Dropbox/Antidumping/Data/ano_pais_imp/funcex.rds")
funcexrds <- funcexrds %>% arrange(desc_cnae, pais)
View(funcexrds)
funcexpais_lista <- funcexrds %>% select(pais) %>% as.list()
funcexcnae_lista <- funcexrds %>% select(desc_cnae) %>% as.list()
subset1 <- filter(funcexrds, pais == "África do Sul", desc_cnae == "Abate de reses, exceto suínos")
for (i in 1:length(unique(funcexpais_lista))) {
  funcexrds_t <- filter(funcexrds, pais == "i")
}
As you can see if you reproduce the code, subset1 returns the filtered dataset as expected, but the for loop doesn't.
I agree with @Clemsang. If you're trying to get the for loop to pull out the rows where pais == 1, pais == 2, and so on, write i without quotes: "i" is the literal string i, whereas a bare i is the loop variable that takes the values you set up in
for (i in 1:length(unique(funcexpais_lista)))
Also, as a bit of housekeeping: library(tidyverse) already attaches dplyr and makes the %>% pipe available, so loading tidyverse alone is enough here.
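Since pais holds country names rather than numbers, a loop over the distinct names themselves may be closer to what the question is after. The sketch below assumes that reading; funcex_demo, paises and subsets are made-up names and data. It also stores each subset in a list, because the loop in the question overwrites funcexrds_t on every iteration.

```r
library(dplyr)

# Hypothetical stand-in for funcexrds
funcex_demo <- tibble(
  pais = c("Brasil", "Chile", "Brasil", "Peru"),
  valor = c(10, 20, 30, 40)
)

# Distinct country names taken from the column itself
paises <- unique(funcex_demo$pais)
subsets <- vector("list", length(paises))
names(subsets) <- paises

for (p in paises) {
  # p is the loop variable (unquoted), so each pass filters on a different country
  subsets[[p]] <- filter(funcex_demo, pais == p)
}

nrow(subsets[["Brasil"]])
# 2
```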

Simple loop with subset and variable name assignment

I am learning R and I don't understand why this simple assignment does not work. I would like to subset by year using the filter function from the dplyr package. After several attempts, here is a reproducible example using the gapminder dataset.
I could use the subset function, lapply, or even an anonymous function to solve this problem, but here I just want to understand why this specific code is not working.
library(gapminder)
library(dplyr)
for (i in unique(gapminder$year)) {
  paste0("gapminder", i) <- print(gapminder %>%
                                    filter(year == i))
}
With or without print, same problem
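It's because the left-hand side of your assignment is a function call (paste0), not a variable name. paste0() only returns a string; R does not treat that string as the name of a variable to assign to.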
If you remove that part, it prints each filtered data frame:
library(gapminder)
library(dplyr)
for (i in unique(gapminder$year)) {
  print(gapminder %>% filter(year == i))
}
You could assign each to a list, like so:
my_list <- list()
library(gapminder)
library(dplyr)
for (i in seq_along(unique(gapminder$year))) {
  year_filter <- unique(gapminder$year)[i] # each iteration we get another year
  my_list[[i]] <- gapminder %>% filter(year == year_filter)
  cat(paste0("gapminder", year_filter, " ")) # use cat if you want to print at each iteration
}
paste0 just concatenates vectors after converting them to character.
Use the assign function to store the output:
for (i in unique(gapminder$year))
{
  assign(paste0("gapminder", i), print(gapminder %>% filter(year == i)))
}
To retrieve a specific output, use the get function:
out_i = get(paste0("gapminder", i))
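As an alternative to creating one variable per year, base R's split() returns a named list with one data frame per year. This sketch uses a small made-up data frame in place of gapminder so that it runs without the gapminder package; df and by_year are hypothetical names.

```r
# Hypothetical stand-in for the gapminder data
df <- data.frame(year = c(1952, 1952, 1957, 1962), pop = c(1, 2, 3, 4))

# split() groups the rows by year and names each element after its year
by_year <- split(df, df$year)
names(by_year) <- paste0("gapminder", names(by_year))

names(by_year)
# "gapminder1952" "gapminder1957" "gapminder1962"

# Look up one year by name, much as get() would for a separate variable
by_year[["gapminder1952"]]
```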
