I have a list of species and I am running an ensemble SDM modelling function on the dataset, filtering by each species, to give one ensemble SDM per species.
I have used the purrr package to get it running, and the code works fine when no naming is added. However, every Ensemble.SDM it outputs is named "ensemble.sdm", so I cannot stack them because the names clash.
I would like to be able to name each output of the model something different, ideally linked to the species name picked out in the line: data <- Occ_full %>% filter(NAME == .x)
The working code is written below:
list_of_species <- unique(unlist(Occ_full$NAME))
# Return unique values
output <- purrr::map(limit_list_of_species, ~ {
data <- Occ_full %>% filter(NAME == .x)
ensemble_modelling(c('GAM'), data, Env_Vars,
Xcol = 'LONGITUDE', Ycol = 'LATITUDE', rep = 1)
})
The code I tried in order to name each output is below, but it does not work: it names the output with many repetitions of the row number.
output <- purrr::map(limit_list_of_species, ~ {
data <- Occ_full %>% filter(NAME == .x)
label <- as.character(data)
ensemble_modelling(c('GAM'), data, Env_Vars,
Xcol = 'LONGITUDE', Ycol = 'LATITUDE', rep = 1, name = label )
})
Could anyone help me please? I simply want each "output" to be named with the species name specified in the filter. Thank you
Try using split with imap -
list_of_species <- split(Occ_full, Occ_full$NAME)
output <- purrr::imap(list_of_species,~{
ensemble_modelling(c('GAM'), .x, Env_Vars,Xcol = 'LONGITUDE',
Ycol = 'LATITUDE', rep = 1, name = .y)
})
split returns a list whose elements are named after the species, and imap passes each element's name to the function as .y, so it can be used for the name argument.
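To see the naming mechanism in isolation, here is a minimal sketch that runs without the SDM package. The toy `occ` data frame and the `fit_stub()` function are hypothetical stand-ins for `Occ_full` and `ensemble_modelling()`; only `split()` and `purrr::imap()` behave as they would in the real code.

```r
library(purrr)

# Toy stand-in for Occ_full (hypothetical values)
occ <- data.frame(
  NAME      = c("sp_a", "sp_a", "sp_b", "sp_c", "sp_c"),
  LONGITUDE = c(1.0, 1.5, 2.0, 3.0, 3.5),
  LATITUDE  = c(50.0, 50.5, 51.0, 52.0, 52.5)
)

# split() returns one data frame per species, named by species
by_species <- split(occ, occ$NAME)

# fit_stub() is a hypothetical placeholder for ensemble_modelling();
# it only records the name it was given and the number of rows it saw
fit_stub <- function(data, name) {
  list(name = name, n_records = nrow(data))
}

# In imap(), .x is the list element and .y is that element's name
output <- imap(by_species, ~ fit_stub(.x, name = .y))

names(output)
# "sp_a" "sp_b" "sp_c"
```

Because the result keeps the names of the input list, each model can afterwards be retrieved by species, for example `output$sp_a`.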
I tried to run a panel var on dataset I got from Statistics Sweden and here is what I get:
df<- read_excel("Inkfördelning per kommun.xlsx")
nujavlar <- pvarfeols(dependent_vars = c("Kvintil-1", "Kvintil-4", "Kvintil-5"),
lags = 1,
transformation = "demean",
data = df,
panel_identifier = c("Kommun", "Year")
)
Error: Can't subset columns that don't exist.
x Column `Kvintil-1` doesn't exist.
I often get this message too:
Warning in xtfrm.data.frame(x) : cannot xtfrm data frames
Error: Can't subset columns that don't exist.
x Location 2 doesn't exist.
ℹ There are only 1 column.
I have made sure that all data is numeric. I have also tried clearing my workspace and restarting the program. I also tried to convert it into a panel data frame with the plm package, and I tried converting my entity variable "Kommun" (Municipality) into a factor, but it still doesn't work.
Here's the data if someone wants to give it a go.
https://docs.google.com/spreadsheets/d/16Ak_Z2n6my-5wEw69G29_NLryQKcrYZC/edit?usp=sharing&ouid=113164216369677216623&rtpof=true&sd=true
The column names in your data frame are Kvintil 1, not Kvintil-1, so the variable you are referring to really does not exist. Note that in R a syntactic variable name cannot contain a hyphen or a space; such names only work when wrapped in backticks, so it is good practice to avoid them. I have included a reproducible example below.
library(tidyverse)
library(gsheet)
library(panelvar)
url <- 'docs.google.com/spreadsheets/d/16Ak_Z2n6my-5wEw69G29_NLryQKcrYZC'
df <- gsheet2tbl(url) %>%
rename(Kvintil1 = `Kvintil 1`) %>%
rename(Kvintil2 = `Kvintil 2`) %>%
rename(Kvintil3 = `Kvintil 3`) %>%
rename(Kvintil4 = `Kvintil 4`) %>%
rename(Kvintil5 = `Kvintil 5`) %>%
as.data.frame()
nujavlar <- pvarfeols(
dependent_vars = c("Kvintil1", "Kvintil4", "Kvintil5"),
lags = 1,
transformation = "demean",
data = df,
panel_identifier = c("Kommun", "Year"))
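If you prefer not to rename each column by hand, the spaces can be stripped from all names at once. This is a minimal base R sketch on a made-up two-row data frame (the values are invented and only the column names mirror the spreadsheet); it does not call pvarfeols.

```r
# Hypothetical miniature of the spreadsheet; check.names = FALSE keeps
# the spaces in the column names, as read_excel() would
df <- data.frame(
  Kommun      = c("A", "B"),
  Year        = c(2019, 2019),
  `Kvintil 1` = c(0.10, 0.20),
  `Kvintil 4` = c(0.30, 0.40),
  check.names = FALSE
)

# The name with a hyphen is not among the columns, hence the error
"Kvintil-1" %in% names(df)
# FALSE

# Remove every space so the names become syntactic
names(df) <- gsub(" ", "", names(df), fixed = TRUE)

names(df)
# "Kommun" "Year" "Kvintil1" "Kvintil4"
```

Checking `names(df)` before passing column names as strings to a modelling function is a quick way to catch this kind of mismatch.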
I found a useful and simple function which creates a codebook for .dta datasets in R, the code is the following:
codebook <- map_df(dt, function(x) attributes(x)$label) %>%
gather(key = Code, value = Label) %>%
mutate(Type = map_chr(dt, typeof),
Mean = map_dbl(dt, mean, na.rm = T),
Prop_miss = map_dbl(dt, function(x) mean(is.na(x))))
The function works just fine unless the dataset contains unlabeled variables, in which case it fails. I would like to modify it so that it also reports variables with no labels as "UNLABELED" along with the rest of the information. I tried something like this:
if (map_df(dt, function(x) attributes(x)$label) == NULL) {
attr(function(x) dt$(x), "label") <- "NO LABEL"
}
But it doesn't work (I am not really skilled with coding on R).
Thanks in advance for the help.
The if/else condition should go inside the function that is mapped over the columns:
library(purrr)
map_dfr(dt, ~ {
if(is.null(attributes(.x)$label)) {
attr(.x, "label") <- "NO LABEL"
}
attributes(.x)$label})
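Putting the fallback together with the other codebook columns could look like the sketch below. It assumes a small made-up `dt` in which only one column carries a label, and it builds the result with `data.frame()` rather than `gather()`, so it is a variant of the original codebook code and not a drop-in replacement.

```r
library(purrr)

# Hypothetical data: 'age' is labeled, 'income' is not
dt <- data.frame(age = c(30, 41, NA), income = c(10, 20, 30))
attr(dt$age, "label") <- "Age in years"

# One label per column, with a fallback for unlabeled variables
labels <- map_chr(dt, function(x) {
  lab <- attr(x, "label", exact = TRUE)
  if (is.null(lab)) "UNLABELED" else lab
})

codebook <- data.frame(
  Code      = names(dt),
  Label     = unname(labels),
  Type      = unname(map_chr(dt, typeof)),
  Mean      = unname(map_dbl(dt, mean, na.rm = TRUE)),
  Prop_miss = unname(map_dbl(dt, function(x) mean(is.na(x)))),
  stringsAsFactors = FALSE
)

codebook
```

Note that `Mean` only makes sense for numeric columns; for a dataset with character columns, that line would need its own guard.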
I desperately need help!
I am trying to predict drug use based on 5 characteristics: Age, Gender, Education, Ethnicity, Country. I have already built a tree model in R with rpart
DrugTree3 <- rpart(formula = DrugUser ~ Age+Gender+Education+Ethnicity+Country, data = traindata)
, a logistic regression model
DrugLog <- glm(formula = DrugUser ~ Age+Gender+Ethnicity+Education+Country,data = traindata, family = binomial)
, and a knn model
KnnModel <- train(form = DrugUser~., data = ModelData,method ='knn',tuneGrid=expand.grid(.k=1:100),metric='Accuracy',trControl=trainControl(method='repeatedcv',number=10,repeats=10)) .
I saved those as RDS files and uploaded them successfully in Power BI.
I then created tables for each characterization and created okviz filters for them.
Then I tried to predict whether a customer gets predicted as a drug user or a non-drug user based on the selections in the okviz filters. This is when everything went horribly wrong:
I created a custom R visual for each model prediction and inserted the following code in each visual:
# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:
# dataset <- data.frame(chunk_id, model_id, model_str, AgeLabel, GenderLabel, CountryLabel, EducationLabel, EthnicityLabel)
# dataset <- unique(dataset)
# Paste or type your script code here:
library(dplyr)
from_byte_string = function(x) {
xcharvec = strsplit(x, " ")[[1]]
xhex = as.hexmode(xcharvec)
xraw = as.raw(xhex)
unserialize(xraw)
}
# R Visual imports tables with read.csv but no argument for strings_as_factors = F.
# This means some of the chunks are truncated (ie if they had a " " at the end).
# If you convert to a character and add a space if nchar == 9999 the deserialization works.
# (Thanks to Danny Shah)
dataset <- dataset %>%
mutate( model_str = as.character(model_str) ) %>%
mutate( model_str = ifelse(nchar(model_str) == 9999, paste0(model_str, " "), model_str) )
model_vct <- dataset %>%
filter(model_id == 1) %>%
distinct(model_id, chunk_id, model_str) %>%
arrange(model_id, chunk_id) %>%
pull(model_str)
finalfit.str <- paste( model_vct, collapse = "" )
finalfit <- from_byte_string(finalfit.str)
# get the user parameters
userdata <- dataset %>% select(AgeLabel,GenderLabel,CountryLabel,EducationLabel,EthnicityLabel) %>% unique()
# and then using them to make a prediction
myprediction <- predict(finalfit,newdata=data.frame(Age=userdata$AgeLabel,Gender=userdata$GenderLabel,Country=userdata$CountryLabel, Education=userdata$EducationLabel,Ethnicity=userdata$EthnicityLabel))
maxpred <- which(myprediction==max(myprediction))
myclass <- maxpred - 1
myprob <- myprediction[[maxpred]]
plot.new()
text(0.5,0.5,labels=sprintf("P(class = %s) = %s",myclass,as.character(round(myprob,2))),cex=3.5)
Error: Can't determine relationship between fields.
What has gone wrong here?
When I then clicked on the diagonal arrow to get to R Studio, this happens: Unable to construct R script data for use in external R IDE.
I need help as I am literally going crazy over this and I don't know how to resolve the issue! I would be really happy if you can help me
You made an error on line 25 and on line 34.
Below is a fixed version of your code.
# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:
# dataset <- data.frame(chunk_id, model_id, model_str, AgeLabel, GenderLabel, CountryLabel, EducationLabel, EthnicityLabel)
# dataset <- unique(dataset)
# Paste or type your script code here:
library(dplyr)
from_byte_string = function(x) {
xcharvec = strsplit(x, " ")[[1]]
xhex = as.hexmode(xcharvec)
xraw = as.raw(xhex)
unserialize(xraw)
}
# R Visual imports tables with read.csv but no argument for strings_as_factors = F.
# This means some of the chunks are truncated (ie if they had a " " at the end).
# If you convert to a character and add a space if nchar == 9999 the deserialization works.
# (Thanks to Danny Shah)
dataset <- dataset %>%
mutate( model_str = as.character(model_str) ) %>%
mutate( model_str = ifelse(nchar(model_str) == 9999, paste0(model_str, " "), model_str) )
model_vct <- dataset %>%
filter(model_id == 1) %>%
distinct(model_id, chunk_id, model_str) %>%
arrange(model_id, chunk_id) %>%
pull(model_str)
finalfit.str <- paste( model_vct, collapse = "" )
finalfit <- from_byte_string(finalfit.str)
# get the user parameters
userdata <- dataset %>% select(AgeLabel,GenderLabel,CountryLabel,EducationLabel,EthnicityLabel) %>% unique()
# and then using them to make a prediction
myprediction <- predict(finalfit,newdata=data.frame(Age=userdata$AgeLabel,Gender=userdata$GenderLabel,Country=userdata$CountryLabel, Education=userdata$EducationLabel,Ethnicity=userdata$EthnicityLabel))
maxpred <- which(myprediction==max(myprediction))
myclass <- maxpred - 1
myprob <- myprediction[[maxpred]]
plot.new()
text(0.5,0.5,labels=sprintf("P(class =
Good Luck!
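Independently of the Power BI error, the deserialization step can be checked in plain R before the model strings are loaded into a visual. The sketch below is an assumption about how the byte strings were produced (hex bytes separated by spaces, cut into chunks of 9999 characters); the `lm` model on the built-in `cars` data is only a stand-in for the real models.

```r
# Same helper as in the visual: hex string -> raw bytes -> R object
from_byte_string <- function(x) {
  xcharvec <- strsplit(x, " ")[[1]]
  xhex <- as.hexmode(xcharvec)
  xraw <- as.raw(xhex)
  unserialize(xraw)
}

# Stand-in model (hypothetical; replace with the saved model)
fit <- lm(dist ~ speed, data = cars)

# Assumed encoding: serialize to raw, print each byte as hex, join with spaces
model_str <- paste(as.character(serialize(fit, NULL)), collapse = " ")

# Cut into chunks of at most 9999 characters, as the comments in the visual imply
chunk_size <- 9999
starts <- seq(1, nchar(model_str), by = chunk_size)
ends <- pmin(starts + chunk_size - 1, nchar(model_str))
chunks <- substring(model_str, starts, ends)

# Reassemble and deserialize, then compare predictions with the original
restored <- from_byte_string(paste(chunks, collapse = ""))
new_obs <- data.frame(speed = 10)
all.equal(predict(restored, newdata = new_obs), predict(fit, newdata = new_obs))
```

If this round trip works for the real model outside Power BI, the remaining problem lies in how the fields are combined in the visual and not in the R script itself.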
I'm a newbie in R, so please have some patience and... tips are most welcome.
My goal is to create a tibble that holds a "Full Name" (of a person, who may have 2 to 4 names) and his/her gender. I must start from a tibble that contains typical male and female names.
Below I present a minimum working example.
My problem: I can call get_name() multiple times (in a for loop of 10,000 iterations!) and get the right answer, but I was looking for a more 'elegant' way of doing it. replicate() unfortunately returns a vector, which makes it unusable here.
My doubts: I know I have some (very few... right!!) issues, like the if statement, that is evaluated every time (which is redundant), but I don't find another way to do it. Any suggestion?
Any other suggestions about code structure are also welcome.
Thank you very much in advance for your help.
# Dummy name list
unit_names <- tribble(
~Women, ~Man,
"fem1", "male1",
"fem2", "male2",
"fem3", "male3",
"fem4", "male4",
"fem5", "male5",
"fem6", NA,
"fem7", NA
)
set.seed(12345) # seed for test
# Create a tibble with the full names
full_name <- tibble("Full Name" = character(), "Gender" = character() )
get_name <- function() {
# Get the Number of 'Unit-names' to compose a 'Full-name'
nbr_names <- sample(2:4, 1, replace = TRUE)
# Randomize the Gender
gender <- sample(c("Women", "Man"), 1, replace = TRUE)
if (gender == "Women") {
lim_names <- sum( !is.na(unit_names$"Women"))
} else {
lim_names <- sum( !is.na(unit_names$"Man"))
}
# Sample the Fem/Man List names (may have duplicate)
sample(unlist(unit_names[1:lim_names, gender]), nbr_names, replace = TRUE) %>%
# Form a Full-name
paste ( . , collapse = " ") %>%
# Add it to the tibble (INCLUDE the Gender)
add_row(full_name, "Full Name" = . , "Gender" = gender)
}
# How can I make 10k of this?
full_name <- get_name()
If you pass a larger number than 1 to sample this problem becomes easier to vectorise.
One thing that currently makes your problem much harder is the layout of your unit_names table: you are effectively treating male and female names as individually paired, but they clearly aren’t: hence they shouldn’t be in columns of the same table. Use a list of two vectors, for instance:
unit_names = list(
Women = c("fem1", "fem2", "fem3", "fem4", "fem5", "fem6", "fem7"),
Men = c("male1", "male2", "male3", "male4", "male5")
)
Then you can generate random names to your heart’s delight:
generate_names = function (n, unit_names) {
name_length = sample(2 : 4, n, replace = TRUE)
genders = sample(c('Women', 'Men'), n, replace = TRUE)
names = Map(sample, unit_names[genders], name_length, replace = TRUE) %>%
lapply(paste, collapse = ' ') %>%
unlist()
tibble(`Full name` = names, Gender = genders)
}
A note on style: unlike your function, the above doesn’t use any global variables. Furthermore, don’t "quote" variable names (you do this in unit_names$"Women" and for the arguments of add_row). R allows this, but this is arguably a mistake in the language specification: these are not strings, they’re variable names, and making them look like strings is misleading. You don’t quote your other variable names, after all. You do need to backtick-quote the `Full name` column name, since it contains a space. However, the use of backticks, rather than quotes, signifies that this is a variable name.
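For reference, the same idea can be written and checked with base R only. This is a variant sketch, not the code above: it replaces the pipe with `vapply()` and returns a plain data frame instead of a tibble, and the checks at the end are added purely for illustration.

```r
unit_names <- list(
  Women = c("fem1", "fem2", "fem3", "fem4", "fem5", "fem6", "fem7"),
  Men   = c("male1", "male2", "male3", "male4", "male5")
)

generate_names <- function(n, unit_names) {
  # Draw all name lengths and genders in one vectorised call each
  name_length <- sample(2:4, n, replace = TRUE)
  genders <- sample(names(unit_names), n, replace = TRUE)
  # One sample() call per person, drawing from the list for that gender
  parts <- Map(sample, unit_names[genders], name_length, replace = TRUE)
  full <- vapply(parts, paste, character(1), collapse = " ")
  data.frame(
    `Full name` = unname(full),
    Gender = genders,
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
}

set.seed(12345)
full_name <- generate_names(10000, unit_names)

# Sanity checks: every full name has 2 to 4 parts, and the parts match the gender
n_parts <- lengths(strsplit(full_name$`Full name`, " "))
all(n_parts %in% 2:4)
all(grepl("^fem", full_name$`Full name`[full_name$Gender == "Women"]))
```

Passing `unit_names` as an argument keeps the function free of global state, which is the style point made above.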
I am not 100% sure what you are trying to get, but if I got it right, did you try mutate from dplyr? For example:
result= mutate(data.frame,
concated_column = paste(column1, column2, column3, column4, sep = '_'))
With a LITTLE help from Konrad Rudolph, I arrived at the elegant (and vectorized, and fast) solution that I was looking for. map2 does the necessary trick.
Here is the full working example if someone needs it:
(Just a side note: I kept the initial conversion from tibble to list because the data arrives to me as a tibble...)
Once again thanks to Konrad.
# Dummy name list
unit_names <- tribble(
~Women, ~Men,
"fem1", "male1",
"fem2", "male2",
"fem3", "male3",
"fem4", "male4",
"fem5", "male5",
"fem6", NA,
"fem7", NA
)
name_list <- list(
Women = unit_names$Women[!is.na(unit_names$Women)],
Men = unit_names$Men[!is.na(unit_names$Men)]
)
generate_names = function (n, name_list) {
name_length = sample(2 : 4, n, replace = TRUE)
genders = sample(c('Women', 'Men'), n, replace = TRUE)
#names = lapply(name_list[genders], sample, name_length) %>%
names = map2(name_list[genders], name_length, sample) %>%
lapply(paste, collapse = ' ') %>%
unlist()
tibble(`Full name` = names, Gender = genders)
}
full_name <- generate_names(10000, name_list)
I have a data frame (all.table) that I have subsetted into 3 different tables (A1.table, B25.table, and C48.table):
all.table = read.table(file.path(input_file_name), header=T, sep = "\t")
A1.table = subset(all.table, ID == "A1")
B25.table = subset(all.table, ID == "B25")
C48.table = subset(all.table, ID == "C48")
For each graph type I want, I want to generate it based on all 4 tables
for (i in list(all.table, A1.table, B25.table, C48.table)){
ggplot(i, aes(x=Position, fill=Frequency)) + #other plot options
ggsave(file.path(full_output_path, "uniqueFileName.pdf"))
#additional plots
#additional saves
}
My problem is with the ggsave command: how do I generate the 'uniqueFileName.pdf'? I would like to name it as some form of all.table.graph1.pdf, all.table.graph2.pdf and A1.table.graph1.pdf, A1.table.graph2.pdf etc.
My question is how do I turn the name of the iterator i into a string, and add that string to a '.graph1.pdf' string?
Coming from a python background this seems like it should be rather simple. I am not very versed in R (as is likely obvious from this question) and anything resembling an answer I have found seems incredibly over complicated.
This is a workflow that uses the tidyverse suite of functions. iwalk is similar to lapply in base, but it requires a function that takes 2 arguments, and it automatically inputs the names of the list as the 2nd argument.
The short answer for what you want is paste0, which lets you combine strings.
library(tidyverse)
all.table %>%
filter(ID %in% c("A1", "B25", "C48")) %>% # only needed if there are more IDs than the 3 explicitly listed
split(., .$ID) %>% # creates the list of data frames
c(list(all.table = all.table), .) %>% # adds "all.table" as a list element
iwalk(function(df, label) {
ggplot(df, aes(x = Position, fill = Frequency)) +
...
ggsave(file.path(full_output_path, paste0(label, ".graph1.pdf")))
})
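The file-naming part can be tried on its own without ggplot2. In this sketch the small `all.table`, the use of `tempdir()` as output folder, and the base graphics `pdf()`/`plot()` calls are assumptions made so the example runs without extra packages; the point is only how `paste0()` turns each list name into a file name.

```r
# Hypothetical miniature of all.table
all.table <- data.frame(
  ID        = c("A1", "A1", "B25", "C48"),
  Position  = 1:4,
  Frequency = c(0.1, 0.4, 0.2, 0.3)
)

# Named list: the full table plus one table per ID
tables <- c(list(all.table = all.table), split(all.table, all.table$ID))

out_dir <- tempdir()  # stand-in for full_output_path
paths <- character(0)

for (label in names(tables)) {
  # The list name is already a string, so it can be pasted into the file name
  path <- file.path(out_dir, paste0(label, ".graph1.pdf"))
  pdf(path)
  plot(tables[[label]]$Position, tables[[label]]$Frequency)
  dev.off()
  paths <- c(paths, path)
}

basename(paths)
# "all.table.graph1.pdf" "A1.graph1.pdf" "B25.graph1.pdf" "C48.graph1.pdf"
```

Iterating over `names(tables)` instead of the tables themselves is what makes the label available inside the loop.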
Figured out a solution by looking for a python dictionary equivalent:
all.table = read.table(file.path(input_file_name), header=T, sep = "\t")
A1.table = subset(all.table, ID == "A1")
B25.table = subset(all.table, ID == "B25")
C48.table = subset(all.table, ID == "C48")
#Generate a named list of tables
list_of_tables = list(all = all.table, A1 = A1.table, B25 = B25.table, C48 = C48.table)
for (name in names(list_of_tables)){
p <- ggplot(list_of_tables[[name]], aes(x=Frequency, fill=Category)) # + more options
ggsave(file.path(full_output_path, paste0(name, ".graph1.pdf")), plot = p)
}
I'm not sure if there is a downside to not using other libraries (ie tidyverse), but this seems like the simplest answer?