Purrr-ing through in R

I'm using fingertipsR to obtain public health data.
There are indicators at different geographic levels and these indicators are also grouped at profile level.
Here's some code:
library(fingertipsR)
library(fingertipscharts)
library(tidyverse)
library(ggthemes)
fingertips_stats()
inds <- indicators_unique()
profs <- profiles()
It's possible to pull the unique indicators for a single profile and then add a column, like this:
smoking <- indicators_unique(ProfileID = 18, DomainID = NULL) %>% mutate(prof_id = "18")
What I'd like to do is:
for each unique profile ID, generate a dataframe of indicators. There are 53 unique profiles:
uniqueprofs <- as_tibble(unique(profs$ProfileID))
How can I purrr through this? Or loop?
I am routinely stuck on these iteration type problems.
EDIT:
So, if you Ctrl+click on
indicators_unique
you'll see this line:
df <- unique(df[, c("IndicatorID", "IndicatorName")])
I copied all of the function and called it something else
function (ProfileID = NULL, DomainID = NULL, path)
{
  if (missing(path))
    path <- fingertips_endpoint()
  # fingertips_ensure_api_available(endpoint = path)
  df <- indicators(ProfileID, DomainID, path = path)
  df <- unique(df[, c("IndicatorID", "IndicatorName", "ProfileID")])
  return(df)
}
And I now get a dataframe containing the ProfileID. If I add "DomainID" I can have that too....
Edit:
Annoyingly, I'd asked a similar question before and have now updated it with a dplyr group_by/group_walk approach.
I can do this:
inds %>% group_by(ProfileID) %>% group_walk(~ write.csv(.x, paste0(.y$ProfileID, ".csv")))
How do I group_walk and write the dataframes/tibbles to the environment, rather than writing them to disk and then loading them back in?
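One option is to skip the disk round trip entirely: group_split() keeps one tibble per group in a list, and group_keys() supplies the matching names. A runnable sketch with a toy stand-in for inds (the real one would come from indicators_unique()):

```r
library(tidyverse)

# Toy stand-in for `inds` -- the real data comes from indicators_unique()
inds <- tibble(ProfileID = c(18, 18, 19),
               IndicatorName = c("a", "b", "c"))

# group_split() returns a list with one tibble per ProfileID, and
# group_keys() gives the matching IDs to use as names
grouped    <- inds %>% group_by(ProfileID)
split_inds <- grouped %>%
  group_split() %>%
  set_names(group_keys(grouped)$ProfileID)

# Only if separate objects in the global environment are really needed:
# list2env(split_inds, envir = .GlobalEnv)
```

A named list is usually easier to work with downstream than loose objects, since you can map over it or index it by profile ID.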

Start with some minimal initial code:
library(fingertipsR)
library(tidyverse)
profs <- profiles()
indicators_unique is already vectorised, so rather than loading the ProfileIDs into a tibble, put them in a list and then you can do a simple:
unique_profs <- list(unique(profs$ProfileID))
indicators_unique(ProfileID = unique_profs, DomainID = NULL)
The issue is adding your desired prof_id column. I'm not familiar with these packages. Is there any dataframe that links ProfileID to either IndicatorID or IndicatorName that you can do a join on?
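Alternatively, purrr can loop over the IDs and add the prof_id column in one pass via map_dfr()'s .id argument. Below is a runnable sketch that uses a stub in place of fingertipsR::indicators_unique() so the pattern works offline; the stub's column values are invented, and in real use you would call the actual function over unique(profs$ProfileID):

```r
library(tidyverse)

# Stub mimicking the shape of fingertipsR::indicators_unique() so the
# pattern runs offline -- swap the real function in for actual use
get_indicators <- function(profile_id) {
  tibble(IndicatorID   = profile_id * 100 + 1:2,
         IndicatorName = paste("Indicator", profile_id, 1:2))
}

profile_ids <- c(18, 19, 26)  # in real use: unique(profs$ProfileID)

# set_names() labels each element with its own ID; map_dfr() row-binds
# the per-profile tibbles, and .id turns those labels into a prof_id column
all_inds <- profile_ids %>%
  set_names() %>%
  map_dfr(~ get_indicators(.x), .id = "prof_id")
```

This avoids the need to join ProfileID back on afterwards, since .id carries it along with each row-bound piece.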

Related

select a value from several dataframes (file.csv) and merge them into a new dataframe with R

Maybe I'm asking for something too simple, but I can't solve this problem.
I want to create a script that recursively enters the folders present in a base_folder, opens a specific file whose name is always the same (w3nu), and selects a precise value: I need to select the email of the subject from the Response column, filtering for the corresponding entry in the Question.Key column.
I want my script to repeat itself in the same way for all the folders present in the base folder.
Finally, I want to merge all the emails into a new dataframe.
I have created this script but it does not work.
library(tidyverse)
base_folder <- "data/raw/exp_1_participants/sbj"
files <- list.files(base_folder, recursive = TRUE, full.names = TRUE)
demo_email <- files[str_detect(files, "w3nu")]
email_extraction <- function(demo_email){
  demo_email <- read.csv(task, header = T)
  demo_email <- demo_email %>%
    filter(Question.Key == "respondent-email") %>%
    select(Response)
}
email_list_jolly <- vector(mode = "list", length = length(demo_email))
for (i in 1:length(email_list_jolly)) {
  email_list_jolly[[i]] <- email_extraction(demo_email[i])
}
email_list_stud <- cbind(email_list_jolly)
write.csv(email_list_stud, 'data/cleaned/email_list_stud.csv')
Can you help me? Thanks!
From comments:
Looks like you haven't defined task within the script shown above, but you're telling read.csv to find it. Did you mean to pass demo_email to read.csv instead? task is probably a random vector in your workspace.
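Following that comment, here is a corrected sketch: the function should read the path it is given, and purrr::map_dfr() can replace the preallocated list, the for loop, and the cbind. The column names match the question; the map_dfr() call at the end is shown commented out since it depends on the questioner's files:

```r
library(tidyverse)

# Corrected: read the path the function receives (the original passed
# the undefined `task` to read.csv)
email_extraction <- function(path) {
  read.csv(path, header = TRUE) %>%
    filter(Question.Key == "respondent-email") %>%
    select(Response)
}

# map_dfr() applies the function to each file and row-binds the one-row
# results into a single data frame:
# demo_email      <- files[str_detect(files, "w3nu")]
# email_list_stud <- map_dfr(demo_email, email_extraction)
```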

extracting list-in-a-list-in-a-list to build dataframe in R

I am trying to build a data frame with book id, title, author, rating, collection, start and finish date from the LibraryThing api with my personal data. I am able to get a nested list fairly easily, and I have figured out how to build a data frame with everything but the dates (perhaps in not the best way but it works). My issue is with the dates.
The list I'm working with normally has 20 elements, but it adds the startfinishdates element only if I added dates to the book in my account. This causes two issues:
1. If it were always there, I could extract it like everything else; it would be NA most of the time, and I could use cbind to line it up with the other information.
2. When I extract it by name and get an object with fewer elements, I have no way to join it back to everything else (it doesn't have the book id).
Ultimately, I want to build this data frame and an answer that tells me how to pull out the book id and associate it with each startfinishdate so I can join on book id is acceptable. I would just add that to the code I have.
I'm also open to learning a better approach from the jump and re-designing the entire thing as I have not worked with lists much in R and what I put together was after much trial and error. I do want to use R though, as ultimately I am going to use this to create an R Markdown page for my web site (for instance, a plot that shows finish dates of books).
You can run the code below and get the data (no api key required).
library(jsonlite)
library(tidyverse)
library(assertr)
data<-fromJSON("http://www.librarything.com/api_getdata.php?userid=cau83&key=392812157&max=450&showCollections=1&responseType=json&showDates=1")
books.lst<-data$books
#create df from json
create.df <- function(item){
  df <- map_df(.x = books.lst, ~ .x[[item]])
  df2 <- t(df)
  return(df2)
}
ids<-create.df(1)
titles<-create.df(2)
ratings<-create.df(12)
authors<-create.df(4)
#need to get the book id when i build the date df's
startdates.df<-map_df(.x=books.lst,~.x$startfinishdates) %>% select(started_stamp,started_date)
finishdates.df<-map_df(.x=books.lst,~.x$startfinishdates) %>% select(finished_stamp,finished_date)
collections.df<-map_df(.x=books.lst,~.x$collections)
#from assertr: will create a vector of same length as df with all values concatenated
collections.v<-col_concat(collections.df, sep = ", ")
#assemble df
books.df<-as.data.frame(cbind(ids,titles,authors,ratings,collections.v))
names(books.df)<-c("ID","Title","Author","Rating","Collections")
books.df<-books.df %>% mutate(ID=as.character(ID),Title=as.character(Title),Author=as.character(Author),
Rating=as.character(Rating),Collections=as.character(Collections))
This approach is outside the tidyverse meta-package; using base R you can make it work with the following code.
Map will apply the user-defined function to each element of data$books and extract the required fields for your data frame. Reduce will then take all the individual dataframes and merge (reduce) them into a single data.frame, booksdf.
library(jsonlite)
data<-fromJSON("http://www.librarything.com/api_getdata.php?userid=cau83&key=392812157&max=450&showCollections=1&responseType=json&showDates=1")
booksdf=Reduce(function(x,y){rbind(x,y)},
Map(function(x){
lenofelements = length(x)
if(lenofelements>20){
if(!is.null(x$startfinishdates$started_date)){
started_date = x$startfinishdates$started_date
}else{
started_date=NA
}
if(!is.null(x$startfinishdates$started_stamp)){
started_stamp = x$startfinishdates$started_stamp
}else{
started_stamp=NA
}
if(!is.null(x$startfinishdates$finished_date)){
finished_date = x$startfinishdates$finished_date
}else{
finished_date=NA
}
if(!is.null(x$startfinishdates$finished_stamp)){
finished_stamp = x$startfinishdates$finished_stamp
}else{
finished_stamp=NA
}
}else{
started_stamp = NA
started_date = NA
finished_stamp = NA
finished_date = NA
}
book_id = x$book_id
title = x$title
author = x$author_fl
rating = x$rating
collections = paste(unlist(x$collections),collapse = ",")
return(data.frame(ID=book_id,Title=title,Author=author,Rating=rating,
Collections=collections,Started_date=started_date,Started_stamp=started_stamp,
Finished_date=finished_date,Finished_stamp=finished_stamp))
},data$books))

Using R, group data together based on two values in a table

I need to build a "profile" of a data set, showing the number of data entries that lie between pairs of values. I have been able to achieve this using the group_by function; however, the resulting output is not in a format that I can use further down my workflow. Here is that output:
What I need, is something that looks like this:
The "Data Count" column, I've not been able to populate but is there for illustration.
The code I am using is as follows:
library(formattable)
PML_Start = 0
PML_Max = 100000000
PML_Interval = 5000000
Lower_Band <- currency(seq(PML_Start, PML_Max-PML_Interval, PML_Interval),digits=0)
Upper_Band <- currency(seq(PML_Start+PML_Interval,PML_Max,PML_Interval),digits = 0)
PML_Profile <- data.frame("Lower Band"=Lower_Band,"Upper Band"=Upper_Band,"Data Count")
I now cannot figure out how to populate this table further. I gave this a go, but didn't really believe it would work:
PML_Profile <- Profiles_on_Data_Provided_26_9_17 %>%
group_by (Lower_Band) %>%
summarise("Premium" = sum(Profiles_on_Data_Provided_26_9_17$`Written Premium - Total`))
Any thoughts?
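Without the underlying data it's hard to be exact, but base R's cut() combined with dplyr's count() is a common way to build such a banded profile. A minimal sketch with toy premium values (the column name premium is an assumption, since the question's data wasn't shown):

```r
library(tidyverse)

# Toy premium values standing in for the real data
premiums <- tibble(premium = c(2e6, 7e6, 12e6, 4.5e6, 9e6))

breaks <- seq(0, 100e6, by = 5e6)  # 20 bands of 5,000,000

# cut() assigns each value to a band; count(.drop = FALSE) keeps empty
# bands so every interval appears in the profile
profile <- premiums %>%
  mutate(band = cut(premium, breaks = breaks, right = FALSE)) %>%
  count(band, name = "Data Count", .drop = FALSE)
```

The band column can then be split or relabelled into the Lower/Upper Band layout shown in the question.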

creating data frame using for loop for string data in R

I have a csv file with the following data.
I wanted to put this data in a dataframe "dfSubClass".
Then I will find the unique subject list as "uniquesubject" and the unique class list as "uniqueclass" from "dfSubClass".
Using "uniquesubject", "uniqueclass" and a for loop, I wanted to create all subject and class combinations, as in the csv and expected data.
I tried the following but it's not working.
dfSubClass <- read.csv("SubjectClass.csv",header = TRUE)
uniquesubject = unique(planningItems["Subject"])
uniqueclass = unique(planningItems["Class"])
newDF <- data.frame()
for (Subject in 1:nrow(uniquesubject)) {
  for (Class in 1:nrow(uniqueclass)) {
    newDF = rbind(newDF, c(uniquesubject[Subject,], uniqueclass[Class,]))
  }
}
This is not giving me the desired output; please help.
I would suggest using the function expand.grid, which will automatically generate all the combinations.
Also, unique(planningItems["Subject"]) in your code returns a data frame, which is not a good fit for this case; a vector would be better.
Here is my code:
uniquesubject = unique(dfSubClass$Subject)
uniqueclass = unique(dfSubClass$Class)
newDF=expand.grid(uniquesubject,uniqueclass)
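A self-contained illustration of that call, using toy subject and class values since the question's csv wasn't shown:

```r
uniquesubject <- c("Maths", "Physics")
uniqueclass   <- c("A", "B", "C")

# expand.grid() builds one row for every combination of its arguments
newDF <- expand.grid(Subject = uniquesubject, Class = uniqueclass)
nrow(newDF)  # 2 subjects x 3 classes = 6 rows
```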
If you prefer for loops, the main issue in your code is with the rbind() call. Here is my code:
uniquesubject = unique(dfSubClass$Subject)
uniqueclass = unique(dfSubClass$Class)
newDF = data.frame()
for (Subject in 1:length(uniquesubject)) {
  for (Class in 1:length(uniqueclass)) {
    newDF = rbind(newDF, data.frame("Subject" = uniquesubject[Subject], "Class" = uniqueclass[Class]))
  }
}
I think the main difference from your code is that I created a dataframe inside rbind() instead of creating a vector using c(). This makes sure the result has a dataframe structure instead of being a matrix.

Dynamic variable in grepl()

This is the continuation of the following thread:
Creating Binary Identifiers Based On Condition Of Word Combinations For Filter
Expected output is the same as per the said thread.
I am now writing a function that can take dynamic names as variables.
This is the code that I am aiming at, if I am to run it manually:
df <- df %>% group_by(id, date) %>% mutate(flag1 = if(eval(parse(text=conditions))) grepl(pattern, item_name2) else FALSE)
To make it take into consideration dynamic variable names, I have been doing the code this way:
groupcolumns <- c(id, date)
# where id and date will be entered into the function as character strings by the user
variable <- list(~if(eval(parse(text=conditions))) grepl(pattern, item) else FALSE)
# converting to formula to use with dynamically generated column names
# "conditions" being the following character vector, which I can generate automatically
# (single quotes inside the string so it parses):
conditions <- "any(grepl('Alpha', Item)) & any(grepl('Bravo', Item))"
This becomes:
df <- df %>% group_by_(.dots = groupcolumns) %>% mutate_(.dots = setNames(variable, flags[1]))
# where flags[1] is a predefined vector of columns names that I have created
flags <- paste("flag", seq(1:100), sep = "")
The problem is, I am unable to do anything to the grepl function; to specify the "item" dynamically. If I do it this way, as "df$item", and do a eval(parse(text="df$item")), the intention of piping fails as I am doing a group_by_ and it results in an error (naturally). This also applies to the conditions that I set.
Does a way exists for me to tell grepl to use a dynamic variable name?
Thanks a lot (especially to akrun)!
edit 1:
I tried the following, and now there is no problem passing the item name into grepl:
variable <- list(~if(eval(parse(text=conditions))) grepl(pattern, as.name(item)) else FALSE)
However, the problem is that piping doesn't seem to work: the output of as.name(item) is treated as an object, which does not exist in the environment.
edit 2:
trying do() in dplyr:
variable <- list(~if(eval(parse(text=conditions))) grepl(pattern, .$deparse(as.name(item))) else FALSE)
df <- df %>% group_by_(.dots = groupcolumns) %>% do_(.dots = setNames(variable, combiflags[1]))
which throws me the error:
Error: object 'Item' not found
If I understand your question correctly, you want to be able to dynamically input both patterns and the object to be searched by these patterns in grepl? The best solution for you will depend entirely on how you choose to store the patterns and how you choose to store the objects to be searched. I have a few ideas that should help you though.
For dynamic patterns, try inputting a list of patterns using the paste function. This will allow you to search many different patterns at once.
grepl(paste(your.pattern.list, collapse="|"), item)
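For instance, with toy patterns and items just to show the alternation:

```r
your.pattern.list <- c("apple", "banana", "cherry")
items <- c("apple pie", "grape juice", "banana split")

# paste(..., collapse = "|") builds the regex "apple|banana|cherry",
# so a single grepl() call tests every pattern at once
hits <- grepl(paste(your.pattern.list, collapse = "|"), items)
hits  # TRUE FALSE TRUE
```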
Let's say you want to set up a scenario where you are storing many patterns of interest in a directory, perhaps collected automatically from a server or from some other output. You can create lists of patterns from separate files using this:
#set working directory
setwd("/path/to/files/i/want")
#make a list of all files in this directory
inFilePaths = list.files(path=".", pattern=glob2rx("*"), full.names=TRUE)
#perform a function for each file in the list
for (inFilePath in inFilePaths)
{
  # grepl function goes here
  # if each file in the folder is a table/matrix/dataframe of patterns, try this
  inFileData = read_csv(inFilePath)
  vectorData = as.vector(inFileData$ColumnOfPatterns)
  grepl(paste(vectorData, collapse = "|"), item)
}
For dynamically specifying the item, you can use an almost identical framework
#set working directory
setwd("/path/to/files/i/want")
#make a list of all files in this directory
inFilePaths = list.files(path=".", pattern=glob2rx("*"), full.names=TRUE)
#perform a function for each file in the list
for (inFilePath in inFilePaths)
{
  # grepl function goes here
  # if each file in the folder is a table/matrix/dataframe of data to be searched, try this
  inFileData = read_csv(inFilePath)
  grepl(pattern, inFileData$ColumnToBeSearched)
}
If this is too far off from what you envisioned, please update your question with details about how the data you are using is stored.
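As a side note, the group_by_/mutate_ verbs used in the question were later deprecated in dplyr. A sketch of how the dynamic column name could be handled with current tidy evaluation, where .data[[col]] looks a column up by its character name (the function and column names here are illustrative, not from the thread):

```r
library(tidyverse)

# Both the grouping columns and the column searched by grepl() are
# passed as character strings; .data[[...]] resolves them per group
flag_matches <- function(df, group_cols, search_col, pattern) {
  df %>%
    group_by(across(all_of(group_cols))) %>%
    mutate(flag1 = grepl(pattern, .data[[search_col]])) %>%
    ungroup()
}

# example
df <- tibble(id = c(1, 1, 2), date = c("d1", "d1", "d2"),
             item = c("Alpha", "Bravo", "Charlie"))
flag_matches(df, c("id", "date"), "item", "Alpha")
```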
