Issue scraping flairs from Reddit: Arguments imply differing number of rows? - r

I'm trying to scrape a subreddit using a RedditExtractoR library that I've modified, but I keep running into this error. I modified the get_thread_content.R file in the source code so that it looks like this:
#' Get thread contents of Reddit URLs
#'
#' This function takes a collection of URLs and returns a list with 2 data frames:
#' 1. a data frame containing meta data describing each thread
#' 2. a data frame with comments found in all threads
#'
#' The URLs are being retained in both tables which would allow you to join them if needed
#'
#' #param urls A vector of strings pointing to a Reddit thread
#' #return A list with 2 data frames "threads" and "comments"
#' #export
get_thread_content <- function(urls){
data <- lapply(urls, parse_thread_url)
list(
threads = lapply(data, function(z) z[["thread"]]) |> rbind_list(),
comments = lapply(data, function(z) z[["comments"]]) |> remove_na() |> rbind_list()
)
}
# Build a data frame with thread attributes of interest
build_thread_content_df <- function(json, request_url) {
if (is.null(json$link_flair_text))
{thread_flair_info <- 'no flair'}
else {thread_flair_info <- json$link_flair_text}
if (is.null(json$author_flair_text))
{author_flair_info <- 'no flair'}
else {author_flair_info <- json$author_flair_text}
df <- data.frame(
url = strip_json(request_url),
author = json$author,
author_flair_text = author_flair_info,
date = timestamp_to_date(json$created_utc),
timestamp = json$created_utc,
title = json$title,
text = json$selftext,
thread_flair = thread_flair_info,
subreddit = json$subreddit,
score = json$score,
upvotes = json$ups,
downvotes = json$downs,
up_ratio = json$upvote_ratio,
total_awards_received = json$total_awards_received,
golds = json$gilded,
cross_posts = json$num_crossposts,
comments = json$num_comments,
stringsAsFactors = FALSE
)
return(df)
}
nullfix <- function(x){
if(is.null(x))
{x <- "no flair"}
else {x}
}
# Build a data frame with comments and their attributes.
build_comments_content_df <- function(json, request_url) {
data.frame(
url = strip_json(request_url),
author = extract_comments_attributes(json, "author"),
comment_author_flair = nullfix(extract_comments_attributes(json, "author_flair_text")),
date = extract_comments_attributes(json, "created_utc") |> timestamp_to_date(),
timestamp = extract_comments_attributes(json, "created_utc"),
score = extract_comments_attributes(json, "score"),
upvotes = extract_comments_attributes(json, "ups"),
downvotes = extract_comments_attributes(json, "downs"),
golds = extract_comments_attributes(json, "gilded"),
comment = extract_comments_attributes(json, "body"),
comment_id = build_comment_ids(json),
stringsAsFactors = FALSE
)
}
Everything works except the comment_author_flair part. I initially tried using similar code to what I used for the get_thread_content function, but that failed, so I separated out the function (see: nullfix) and tried to apply it to the basic code, but that still isn't working, as I get the same error.
As a reproducable example, after modifying that library:
thread <- get_thread_content("https://www.reddit.com/r/SSBM/comments/10ys20y/who_were_the_least_clutch_players/")
I would expect to get 69 flair values with this code, one for each comment, with NULL values replaced by "no flair" because of the nullfix function. Instead I get the error:
Error in data.frame(url = strip_json(request_url), author = extract_comments_attributes(json, :
arguments imply differing number of rows: 1, 69, 29

Related

R: Collect All Function Definitions from a Library

I am working with R. I found this previous post on stackoverflow which shows how to get a "list" of all functions that belong to a given library:
How to find all functions in an R package?
For example:
#load desired library
library(ParBayesianOptimization)
#find out all functions from this library
getNamespaceExports("ParBayesianOptimization")
[1] "addIterations" "getLocalOptimums" "bayesOpt" "getBestPars" "changeSaveFile" "updateGP"
The above code tells me the name of all functions that are used in the "ParBayesianOptimization" library. From here, I could manually inspect each one of these functions - for example:
# manually inspect any one of these functions
getAnywhere(bayesOpt)
A single object matching ‘bayesOpt’ was found
It was found in the following places
package:ParBayesianOptimization
namespace:ParBayesianOptimization
with value
#function stats here
function (FUN, bounds, saveFile = NULL, initGrid, initPoints = 4,
iters.n = 3, iters.k = 1, otherHalting = list(timeLimit = Inf,
minUtility = 0), acq = "ucb", kappa = 2.576, eps = 0,
parallel = FALSE, gsPoints = pmax(100, length(bounds)^3),
convThresh = 1e+08, acqThresh = 1, errorHandling = "stop",
plotProgress = FALSE, verbose = 1, ...)
{
startT <- Sys.time()
optObj <- list()
etc etc etc ...
saveFile = saveFile, verbose = verbose, ...)
return(optObj)
}
#function ends here
<bytecode: 0x000001cbb4145db0>
<environment: namespace:ParBayesianOptimization>
Goal : Is it possible to take each one of these functions and create a notepad file with their full definitions?
Something that would look like this:
My attempt:
I thought I could first make an "object" in R that contained all the functions found in this library:
library(plyr)
a = getNamespaceExports("ParBayesianOptimization")
my_list = do.call("rbind.fill", lapply(a, as.data.frame))
X[[i]]
1 addIterations
2 getLocalOptimums
3 bayesOpt
4 getBestPars
5 changeSaveFile
6 updateGP
Then, I could manually create an "assignment arrow":
header_text <- rep("<-")
Then, "paste" this to each function name:
combined_list <- as.character(paste(my_list, header_text, sep = ""))
But this is not looking correct:
combined_list
[1] "c(\"addIterations\", \"getLocalOptimums\", \"bayesOpt\", \"getBestPars\", \"changeSaveFile\", \"updateGP\")<- "
The goal is to automate the process of manually copying/pasting :
function_1 = getAnywhere("first function ParBayesianOptimization library")
function_2 = getAnywhere("second function ParBayesianOptimization library")
etc
final_list = c(function_1, function_2 ...)
And removing the generic description from each function:
A single object matching ‘bayesOpt’ was found
It was found in the following places
package:ParBayesianOptimization
namespace:ParBayesianOptimization
with value
In the end, if I were to "call" the final_list object, all the functions from this library should get recreated and reassigned.
Can someone please show me how to do this?
Thanks
You can use the dump function for this
pkg <- "ParBayesianOptimization"
dump(getNamespaceExports(pkg), file="funs.R", envir = asNamespace(pkg))
This code will help you write the function definitions of all the functions in a library to a text file.
fn_list <- getNamespaceExports("ParBayesianOptimization")
for(i in seq_along(fn_list)) {
header <- paste('\n\n####Function', i, '\n\n\n')
cat(paste0(header, paste0(getAnywhere(fn_list[i]), collapse = '\n'), '\n\n'),
file = 'function.txt', append = TRUE)
}

Problem with for loop when downloading species occurrence data

I want to download the occurrence data from gbif website and I use the following R script. When I run the script, I got an error with the following message "Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 0)". It would be highly appreciated if anyone could help me with this.
My data: data
My R script:
flist<-read_excel("Mekong fish.xlsx",sheet="Sheet1")
##Loop
fname<-list()
Occ<-list()
datfish<-list()
name_list<-unique(flist$Updated_name)
# create for loop to produce ggplot2 graphs
for (i in seq_along(name_list)) {
# create plot for each Occurrence in df
Occ[[i]] <-occ_search(scientificName = name_list[i], limit=2)
fname[[i]]<-occ_search(scientificName = name_list[i],
fields = c("species", "country","decimalLatitude", "decimalLongitude"),
hasCoordinate=T, limit= Occ[[i]]$meta[4],return ="data")
datfish[[i]]<-as.data.frame(fname[[i]]$data)
}
I got a different error:
Expecting logical in D1424 / R1424C4: got 'in Lao'Expecting logical in D1426 / R1426C4: got 'in China'Expecting logical in D1467 / R1467C4: got 'only Cambodia'Expecting logical in D1469 / R1469C4: got 'only in VN'Expecting logical in D1473 / R1473C4: got 'only in China'Expecting logical in D1486 / R1486C4: got 'only in Malaysia'Expecting logical in D1488 / R1488C4: got 'only 1 point in VN'
I think the problem is caused in some fields in the 4th column. I don't have the right packages installed to run your code. But I got a different error (package missing) once i dropped the fourth column.
flist<-read_excel("~/Downloads/Mekong fish.xlsx",sheet="Sheet1")
flist <=subset(flist, select = -4)
...
EDIT:
This worked for me. read_excel assigned column 4 the type boolean. When I explicitly set it to text it worked.
library(readxl)
library(rgbif)
library(raster)
flist<-read_excel("~/Downloads/Mekong fish.xlsx",
sheet="Sheet1",
col_types = c("numeric", "text", "numeric", "text"))
flist
##Loop
fname<-list()
Occ<-list()
datfish<-list()
name_list<-unique(flist$Updated_name)
# create for loop to produce ggplot2 graphs
for (i in seq_along(name_list[1:2])) {
message(i)
# # create plot for each Occurrence in df
Occ[[i]] <-occ_search(scientificName = name_list[i], limit=2)
message(Occ[[i]])
fname[[i]]<-occ_search(scientificName = name_list[i],
fields = c("species", "country","decimalLatitude", "decimalLongitude"),
hasCoordinate=T, limit= Occ[[i]]$meta[4],return ="data")
message(fname[[i]])
datfish[[i]]<-as.data.frame(fname[[i]]$data)
message(datfish[[i]])
}
> 1
> list(offset = 0, limit = 2, endOfRecords = FALSE, count = >15)list(list(name = c("Animalia", "Chordata", "Actinopterygii",
> "Cypriniformes", "Cyprinidae", "Aaptosyax", "Aaptosyax grypus"), key = > > c("1", "44", "204", "1153", "7336", "2363805", "2363806"),
> etc...

Difficulty in downloading TCGA data

I am trying to download the TCGA data but I am getting this error:
Error in summarizeMaf(maf = maf, anno = clinicalData, chatty =
verbose): Tumor_Sample_Barcode column not found in provided clinical
data. Rename column containing sample names to Tumor_Sample_Barcode if
necessary.
This is my code:
library("TCGAbiolinks")
library("tidyverse")
library(maftools)
query <- GDCquery( project = "TCGA-LIHC",
data.category = "Clinical",
file.type = "xml",
legacy = FALSE)
GDCdownload(query,directory = ".")
clinical <- GDCprepare_clinic(query, clinical.info = "patient",directory = ".")
#getting the survival time of event data
survival_data <- as_tibble(clinical[,c("days_to_last_followup","days_to_death","vital_status","bcr_patient_barcode","patient_id")])
survival_data <- filter(survival_data,!is.na(days_to_last_followup)|!is.na(days_to_death)) #not both NA
survival_data <- filter(survival_data,!is.na(days_to_last_followup)|days_to_last_followup>0 &is.na(days_to_death)|days_to_death > 0 ) #ensuring positive values
survival_data <- survival_data[!duplicated(survival_data$patient_id),] #ensuring no duplicates
dim(survival_data) #should be 371
maf <- GDCquery_Maf("LIHC", pipelines = "muse")
#maf <- GDCquery_Maf("LIHC", pipelines = "somaticsniper")
#clin <- GDCquery_clinic("TCGA-LIHC","clinical")
#print(clin )
laml = read.maf(
maf,
clinicalData = clinical,
removeDuplicatedVariants = TRUE,
useAll = TRUE,
gisticAllLesionsFile = NULL,
gisticAmpGenesFile = NULL,
gisticDelGenesFile = NULL,
gisticScoresFile = NULL,
cnLevel = "all",
cnTable = NULL,
isTCGA = TRUE,
vc_nonSyn = NULL,
verbose = TRUE
)
You should have: a) loaded with library(maftools) and b) included what was printed out before that error message:
-Validating
-Silent variants: 18306
-Summarizing
--Possible FLAGS among top ten genes:
TTN
MUC16
OBSCN
FLG
-Processing clinical data
Available fields in provided annotations..
[1] "bcr_patient_barcode" "additional_studies"
[3] "tissue_source_site" "patient_id"
# snipped remaining 78 column names
Notice that the first column is not named "Tumor_Sample_Barcode", so you need to follow the helpful error message directions and rename the appropriate column which appears to be the first one:
ns. After doing so I get:
-Validating
-Silent variants: 18306
-Summarizing
--Possible FLAGS among top ten genes:
TTN
MUC16
OBSCN
FLG
-Processing clinical data
-Finished in 1.911s elapsed (2.470s cpu)

Why does the font size change after I click the save button?

first post and I hope I'm doing everything correctly.
I'm using RStudio: Version 1.1.423; R-Version: 3.4.2, OS: Windows 10
Short problem summary: I start a gwidgets based GUI and the text shows in the desired font size, but when I click the save button, it suddenly changes to a smaller font.
Context: I'm currently doing a systematic literature review and I use the Metagear package by mjlajeunesse (https://github.com/cran/metagear) to screen the abstracts and decide if I shuold include them. Here's a picture of the original GUI:
Original Abstract Screener GUI
Short description of functionality: I just click on one of the buttons (Yes, No, Maybe) and the abstract screener moves on to the next abstract in the file. I then click save to save my progress.
Changes I made to the code: However, I added some additional fields so I had to change that behaviour, so that it doesn't immediately jump to the next entry as soon as I click one of the buttons. So I merged that part of the function with the save function.
Now I get the following behaviour: When I first open the abstract screener the fontsize is as desired as seen in this picture:
Adapted Abstract Screener before clicking Save & Next
As soon as I click Save & Next it shows me the next abstract (as desired), but suddenly the font changes to a much smaller one, making it rather unpleasant to read, because it's so small, as seen here
Adapted Abstract Screener after clicking Save & Next
Here's the code that I use/adapted to override the original functions of the metagear package. I iteratively adapted the code so I only included the functions of the metagear package that I needed to make things work.
Here's the code I'm using(I included the code I use to execute the different functions at the bottom, including some sample data frame, instead of my CSV file):
# Load libraries
library(metagear)
library(EBImage)
library(cairoDevice)
library(gWidgets)
library(gWidgetsRGtk2)
library(gWidgets2)
library(gWidgets2RGtk2)
library(RGtk2)
library(utils)
##############################
# dirty adaptations for the METAGEAR package start here
##############################
##########################
# better_effort_utils (important for effort_save)
##########################
effort_save <- function (aDataFrame,
column_name = "REVIEWERS",
directory = getwd(),
quiet = FALSE) {
split_List <- split(aDataFrame, f = aDataFrame[, column_name])
lapply(split_List, function(x, column_name) {
fileName <- paste0("effort_", x[1, column_name], ".csv")
if(file.exists(file.path(directory, fileName))) {
while (file.exists(file.path(directory, fileName)) != FALSE) fileName <- renameFile(fileName)
if(!quiet) message(paste0("File already existed, renamed: ", fileName))
}
fileName <- file.path(directory, fileName)
write.csv(x, file = fileName, na = "NA", row.names = FALSE)
}, column_name = column_name)
if(!quiet) message(paste0(length(split_List), " files saved in: ", directory))
}
######################
# better_effort_initialize
######################
better_effort_initialize <- function(aDataFrame,
study_ID = TRUE,
unscreenedValue = "not vetted",
unscreenedCommentValue = "Please paste text here",
studyTypeValue = "not vetted",
studyTypeCommentValue = "Please paste text here",
dual = FALSE,
front = TRUE) {
# adding new columns start, add the new columns as part of the "if" statement as well as the "else" statement
if(dual == TRUE) {
new_cols <- data.frame(REVIEWERS_A = NA,
INCLUDE_A = unscreenedValue,
REVIEWERS_B = NA,
INCLUDE_B = unscreenedValue,
INCLUDE_COMMENT = unscreenedCommentValue,
TYPE_OF_STUDY = studyTypeValue,
TYPE_OF_STUDY_COMMENT = studyTypeCommentValue) # There will have to be a version A and version B, just like with the reviewers
}
else new_cols <- data.frame(REVIEWERS = NA,
INCLUDE = unscreenedValue,
INCLUDE_COMMENT = unscreenedCommentValue,
TYPE_OF_STUDY = studyTypeValue,
TYPE_OF_STUDY_COMMENT = studyTypeCommentValue)
if(study_ID == TRUE) new_cols <- cbind(data.frame(STUDY_ID = 1:nrow(aDataFrame)),
new_cols)
if(front == TRUE) newDataFrame <- cbind(new_cols, aDataFrame)
else newDataFrame <- cbind(aDataFrame, new_cols)
return(newDataFrame)
}
#################################
# better_effort_distribute
#################################
better_effort_distribute <- function (aDataFrame = NULL,
dual = FALSE,
reviewers = theTeam,
column_name = "REVIEWERS",
effort = NULL,
initialize = FALSE,
save_split = FALSE,
directory = getwd() ) {
if(is.null(reviewers)) .metagearPROBLEM("error",
"no reviewers were assigned")
if(is.null(aDataFrame)) .metagearPROBLEM("error",
"a dataframe with refs was not specified")
number_REFS <- nrow(aDataFrame)
number_reviewers <- length(reviewers)
# add REVIEWER, STUDY_ID, and INCLUDE columns to reference library
if(initialize == TRUE) aDataFrame <- better_effort_initialize(aDataFrame,
dual = dual)
if(dual == TRUE) {
if(!is.null(effort)) .metagearPROBLEM("error",
"can only assign dual effort evenly among reviewers")
if(number_reviewers %% 2 == 1) .metagearPROBLEM("error",
"can only assign dual effort with an even number of reviewers")
reviewers_A <- reviewers[1:number_reviewers %% 2 == 1]
theEffort_A <- gl(length(reviewers_A),
ceiling(number_REFS / length(reviewers_A)),
number_REFS,
labels = reviewers_A)
reviewers_B <- reviewers[1:number_reviewers %% 2 == 0]
theEffort_B <- gl(length(reviewers_B),
ceiling(number_REFS / length(reviewers_B)),
number_REFS,
labels = reviewers_B)
dualEffort <- data.frame(A = theEffort_A, B = theEffort_B)
theEffort <- dualEffort[sample(nrow(dualEffort),
nrow(dualEffort), replace = FALSE), ]
aDataFrame["REVIEWERS_A"] <- theEffort["A"]
aDataFrame["REVIEWERS_B"] <- theEffort["B"]
}
else {
# generate reviewers tasks evenly or via custom 'effort'
if(is.vector(effort) && length(unique(effort)) != 1) {
if(sum(effort) != 100) .metagearPROBLEM("error",
"Effort does not sum to 100.")
theEffort <- rep(reviewers, round((number_REFS * (effort / 100))))
} else {
theEffort <- gl(number_REFS,
ceiling(number_REFS / number_reviewers),
number_REFS,
labels = reviewers)
}
# randomly populate REVIEWERS column with tasks
aDataFrame[, column_name] <- sample(theEffort,
length(theEffort),
replace = FALSE)
}
# splits reference library into seperate reviewer csv files and
# hides teams if dual reviewing
if(save_split == TRUE) {
if(dual == TRUE) {
removeVars <- names(aDataFrame) %in% c("REVIEWERS_B", "INCLUDE_B")
effort_save(aDataFrame[!removeVars],
column_name = "REVIEWERS_A", directory)
removeVars <- names(aDataFrame) %in% c("REVIEWERS_A", "INCLUDE_A")
effort_save(aDataFrame[!removeVars],
column_name = "REVIEWERS_B", directory)
}
else effort_save(aDataFrame, column_name, directory)
}
return(aDataFrame)
}
#######################################
# better_abstract_screener
#######################################
#' #param file The file name and location of a .csv file containing the
#' abstracts and titles. The .csv file should have been initialized with
#' \code{effort_initialize} and populated with screeners (reviewers) using
#' \code{better_effort_distribute}.
#' #param aReviewer The name (a string) of the reviewer to screen abstracts.
#' It is used when there are multiple reviewers assigned to screen abstracts.
#' The default column label is "REVIEWERS" as initialized with
#' \code{better_effort_distribute}.
#' #param reviewerColumnName The name of the column heading in the .csv file
#' that contains the reviewer names that will screen abstracts. The default
#' column label is "REVIEWERS".
#' #param unscreenedColumnName The name of the column heading in the .csv file
#' that contains the screening outcomes (i.e. vetting outcomes by a reviewer).
#' Unscreened references are by default labeled as "not vetted". The
#' reviewer then can code to "YES" (is a relevant study), "NO" is not relevant
#' and should be excluded, or "MAYBE" if the title/abstract is missing or
#' does not contains enough information to fully assess inclusivity.
#' The default label of this column is "INCLUDE".
#' #param unscreenedValue Changes the default coding (a string) of "not vetted"
#' that designates whether an abstract remains to be screened or vetted.
#' #param abstractColumnName The name of the column heading in the .csv file
#' that contains the abstracts. The default label of this column is
#' "ABSTRACT".
#' #param titleColumnName The name of the column heading in the .csv file
#' that contains the titles. The default label of this column is "TITLE".
#' #param browserSearch Change the url for the browser title search; the
#' default is Google.
#' #param protect When \code{"TRUE"}, prevents the title and abstract from being
#' clicked, selected or edited.
#' #param fontSize Change the font size of the title and abstract text.
#' #param windowWidth Change the default width of the GUI window.
#' #param windowHeight Change the default height of the GUI window.
#' #param buttonSize Change the default size of the "YES" and "NO" buttons.
#'
#'
#' #return NULL
#'
#' #examples \dontrun{
#'
#' data(example_references_metagear)
#' better_effort_distribute(example_references_metagear,
#' initialize = TRUE, reviewers = "marc", save_split = TRUE)
#' abstract_screener("effort_marc.csv", aReviewer = "marc")
#'}
#'
#' #note \strong{Installation and troubleshooting}\cr\cr Upon first use,
#' \code{abstract_screener} will download the gWidgets package
#' and associated toolkits needed to build GUI interfaces. A small window will
#' also prompt you to download GTK+ asking "Need GTK+ ?". From the listed
#' options answer: "Install GTK+" and click 'OK'. Once installed these will
#' not be downloaded again. Sometimes there is an issue with the installation
#' of GTK+, see \url{http://www.learnanalytics.in/blog/?p=31} for advice based
#' on the \code{Rattle} R Package (both \code{Rattle} and \code{metagear} use
#' the same GUI dependencies). \cr\cr \strong{How to use the screener}
#' \cr\cr The GUI itself will appear as a single window with the first
#' title/abstract listed in the .csv file. If abstracts have already been
#' screened/coded, it will begin at the nearest reference labeled as
#' "not vetted". The SEARCH WEB button opens the default browser and
#' searches Google with the title of the reference. The YES, MAYBE, NO
#' buttons, which also have shortcuts ALT-Y and ALT-N, are used to code the
#' inclusion/exclusion of the reference. Once clicked/coded the next
#' reference is loaded. The SAVE button is used to save the coding progress
#' of screening tasks. It will save coding progress directly to the
#' loaded .csv file. \strong{Closing the GUI and not saving will result in
#' the loss of screening efforts relative to last save.} \cr\cr There is
#' also an ISSUE FIXES menu bar with quick corrections to screening errors.
#' These include ISSUE FIXES: REFRESH TITLE AND ABSTRACT TEXT which reloads
#' the text of the current abstract in case portions were deleted when
#' copying and pasting sections (this can be avoided if
#' \code{protect = TRUE} is enabled), ISSUE FIXES: STATUS OF CURRENT ABSTRACT
#' which provides information on whether or not the abstract was previously
#' screened, and ISSUE FIXES: RETURN TO PREVIOUS ABSTRACT that
#' backtracks to the previous abstract if a selection error occurred (note a
#' warning will appear of there is a change to its inclusion/exclusion
#' coding).
#'
#' #import gWidgets
#' #import gWidgetsRGtk2
#' #importFrom utils browseURL read.csv write.csv
#' #export abstract_screener
############################
better_abstract_screener <- function(file = file.choose(),
aReviewer = NULL,
reviewerColumnName = "REVIEWERS",
unscreenedColumnName = "INCLUDE",
unscreenedValue = "not vetted",
unscreenedCommentColumnName = "INCLUDE_COMMENT",
unscreenedCommentValue = "",
studyTypeColumnName = "TYPE_OF_STUDY",
studyTypeValue = "not vetted",
studyTypeCommentColumnName = "TYPE_OF_STUDY_COMMENT",
studyTypeCommentValue = "",
abstractColumnName = "Abstract",
titleColumnName = "Title",
browserSearch = "https://scholar.google.de/scholar?hl=de&as_sdt=0%2C5&q=",
protect = FALSE,
fontSize = 13,
windowWidth = 700,
windowHeight = 400,
buttonSize = 30) {
# get file with abstract
aDataFrame <- read.csv(file, header = TRUE)
# subset abstracts based on the current screener (aka 'reviewer')
subData <- subset(aDataFrame, aDataFrame[reviewerColumnName] == aReviewer)
subData <- data.frame(lapply(subData, as.character), stringsAsFactors = FALSE)
# check if all abstracts have been already vetted
if(unscreenedValue %in% subData[, unscreenedColumnName] ) {
# start screener at first unvetted abstract
currentItem <- max.col(t(subData[unscreenedColumnName] == unscreenedValue),
"first")
} else {
.metagearPROBLEM("error",
paste("all abstracts have already been screened,
no more abstracts coded as:", unscreenedValue))
}
options("guiToolkit" = "RGtk2")
# used to update the reference list
theAnswer <- function(theValue, ...) {
updateAll(theValue, ...)
}
theAnswer_INCLUDE_COMMENT <- function(theValue_INCLUDE_COMMENT, ...) {
updateAll_INCLUDE_COMMENT(theValue_INCLUDE_COMMENT, ...)
}
theAnswer_ToS <- function(theValue_ToS, ...) {
updateAll_ToS(theValue_ToS, ...)
}
theAnswer_ToS_COMMENT <- function(theValue_ToS_COMMENT, ...) {
updateAll_ToS_COMMENT(theValue_ToS_COMMENT, ...)
}
# helper function used to update and keep track of screened abstracts
updateAll <- function(theValue, ...) {
if(currentItem <= nrow(subData)) {
subData[[currentItem, unscreenedColumnName]] <<- theValue
}
}
updateAll_INCLUDE_COMMENT <- function(theValue_INCLUDE_COMMENT, ...) { ################### EXPERIMENTAL
if(currentItem <= nrow(subData)) {
subData[[currentItem, unscreenedCommentColumnName]] <<- theValue_INCLUDE_COMMENT
}
}
updateAll_ToS <- function(theValue_ToS, ...) {
if(currentItem <= nrow(subData)) {
subData[[currentItem, studyTypeColumnName]] <<- theValue_ToS
}
}
updateAll_ToS_COMMENT <- function(theValue_ToS_COMMENT, ...) {
if(currentItem <= nrow(subData)) {
subData[[currentItem, studyTypeCommentColumnName]] <<- theValue_ToS_COMMENT
}
}
####################
# START of SCREENER GUI
####################
win <- gwindow("metagear: Abstract Screener", visible = TRUE)
paned <- ggroup(container = win, horizontal = FALSE)
# Frame(s) for title, abstract and websearch button
#####
#beginnig frame_TITLE
frame_TITLE <- gframe("Title", container = paned, horizontal = TRUE)
text_TITLE <- gtext(subData[currentItem, titleColumnName],
container = frame_TITLE,
expand = TRUE,
font.attr = list(style = "normal", size = fontSize))
size(text_TITLE) <- c(windowWidth, 50)
if(protect == TRUE) enabled(text_TITLE) <- FALSE
addSpace(frame_TITLE, 2)
aButton_webSearch <- gbutton("Search\n Web",
container = frame_TITLE,
handler = function(h, ...)
browseURL(paste0(browserSearch,
subData[currentItem, titleColumnName])))
size(aButton_webSearch) <- c(50, 40)
addSpace(frame_TITLE, 5)
# end of frame_TITLE
# beginning frame_Abstract
frame_ABSTRACT <- gframe("Abstract", container = paned, horizontal = FALSE)
text_ABSTRACT <- gtext(subData[currentItem, abstractColumnName],
container = frame_ABSTRACT,
expand = TRUE,
font.attr = list(style = "normal", size = fontSize)
)
size(text_ABSTRACT) <- c(windowWidth + 50, 300)
if(protect == TRUE) enabled(text_ABSTRACT) <- FALSE
# end of frame_ABSTRACT
# beginning: Type of Study (ToS), buttons and comment
frame_TYPE_OF_STUDY <- gframe("Type of Study", container = paned, horizontal = TRUE)
addSpace(frame_TYPE_OF_STUDY, 20)
radioButtonNames_ToS <- c("not vetted", "empirical", "conceptual", "review")
radioButtons_ToS <- gradio(radioButtonNames_ToS, selected = which(radioButtonNames_ToS == subData[currentItem, studyTypeColumnName]), horizontal = FALSE, container = frame_TYPE_OF_STUDY,
handler = function(h,...){
svalue(h) <- subData[currentItem, studyTypeColumnName]
theAnswer_ToS(theValue_ToS = radioButtonNames_ToS[svalue(radioButtons_ToS, index = TRUE)], ...) # EXPERIMENTAL: Need to find a way t make the 3 a dynamic option that shows current selection (Current solution seems to be working)
}
)
frame_TYPE_OF_STUDY_COMMENT <- gframe("Please copy the text snippet upon which you base your decision",
container = frame_TYPE_OF_STUDY, horizontal = TRUE)
text_TYPE_OF_STUDY_COMMENT <- gtext(subData[currentItem, studyTypeCommentColumnName],
container = frame_TYPE_OF_STUDY_COMMENT,
#container = frame_TYPE_OF_STUDY, ### change to this frame and disable frame_TYPE_OF_STUDY_COMMENT, if you want the text field width to be expandable
expand = TRUE,
font.attr = list(style = "normal", size = fontSize),
handler = function(h, ...) {
theAnswer_ToS_COMMENT(theValue_ToS_COMMENT = svalue(text_TYPE_OF_STUDY_COMMENT))
}
)
size(text_TYPE_OF_STUDY_COMMENT) <- c(windowWidth, 70)
# end: Type of Study (ToS), buttons and comment
# beginning: Include/Exclude (YES/NO), buttons and comment
frame_INCLUDE <- gframe("Should the Abstract be included?", container = paned, horizontal = TRUE)
addSpace(frame_INCLUDE, 20)
radioButtonNames_INCLUDE <- c("not vetted", "YES", "NO", "MAYBE")
radioButtons_INCLUDE <- gradio(radioButtonNames_INCLUDE, selected = which(radioButtonNames_INCLUDE == subData[currentItem, unscreenedColumnName]),
horizontal = FALSE, container = frame_INCLUDE,
handler = function(h,...){
svalue(h) <- subData[currentItem, unscreenedColumnName]
theAnswer(theValue = radioButtonNames_INCLUDE[svalue(radioButtons_INCLUDE, index = TRUE)], ...)
}
)
frame_INCLUDE_COMMENT <- gframe("Please copy the text snippet upon which you base your decision", container = frame_INCLUDE,
horizontal = TRUE)
text_INCLUDE_COMMENT <- gtext(subData[currentItem, unscreenedCommentColumnName],
container = frame_INCLUDE_COMMENT,
# container = frame_INCLUDE, ### change to this frame and disable frame_INCLUDE_COMMENT, if you want the text field width to be expandable
expand = TRUE,
font.attr = list(style = "normal", size = fontSize),
handler = function(h, ...) {
theAnswer_INCLUDE_COMMENT(theValue_INCLUDE_COMMENT = svalue(text_INCLUDE_COMMENT))
}
)
size(text_INCLUDE_COMMENT) <- c(windowWidth, 70)
#end: Include/Exclude (YES/NO), buttons and comment
#beginning: Progress and Save
buttons_paned <- ggroup(container = paned)
addSpace(buttons_paned, 50)
frame_PROGRESS <- gframe("Progress", container = buttons_paned, expand = TRUE)
addSpace(frame_PROGRESS, 10)
text_progress <- glabel(paste0("Reviewer: ",
aReviewer,
" | ",
round(((currentItem - 1.0)/nrow(subData)) * 100, digits = 1),
"% complete (", currentItem, " of ",
nrow(subData) , ")"),
container = frame_PROGRESS)
aButton_save <- gbutton("SAVE & CONTINUE",
container = frame_PROGRESS,
handler = function(h, ...)
{
{
write.csv(subData,
file = file, #file was dataFilename
row.names = FALSE)
svalue(text_lastSaved) <- paste("last saved: ", Sys.time())
}
currentItem <<- currentItem + 1
if(currentItem > nrow(subData))
{svalue(text_ABSTRACT) <- "You have screened all the Abstracts!"
svalue(text_TITLE) <- ""
} else
{svalue(text_ABSTRACT) <- subData[currentItem, abstractColumnName]
svalue(text_TITLE) <- subData[currentItem, titleColumnName]
svalue(text_progress) <- paste0(
round(((currentItem - 1.0)/nrow(subData)) * 100, digits = 1),
"% complete (", currentItem, " of ", nrow(subData), ")")
svalue(text_TYPE_OF_STUDY_COMMENT) <- subData[currentItem, studyTypeCommentColumnName]
svalue(radioButtons_ToS) <- subData[currentItem, studyTypeColumnName]
svalue(radioButtons_INCLUDE) <- subData[currentItem, unscreenedColumnName]
svalue(text_INCLUDE_COMMENT) <- subData[currentItem, unscreenedCommentColumnName]
}
}
)
size(aButton_save) <- c(130, 40)
text_lastSaved <- glabel(paste(" last saved: not this session"), # empty space at beginning on purpose to leave some distance to button, as addSpace() screwed things up
container = frame_PROGRESS)
#end of Progress and Save
#end of buttons_paned
#end of paned
visible(win) <- TRUE
# end of win
}
###############
# Preparation of the files and starting of the GUI
###############
# Set working directory
setwd("C:/Users/Hilser/Documents/R")
# Add team member(s)
theTeam <- c("Stefan")
#import .csv-file and name it review" ##### REPLACED BY SAMPLE DATA BELOW
# review <- read.csv("C:/Users/Hilser/Documents/R/Metagear/Better_Metagear/review_duplicates_removed.csv", header = TRUE, sep = ",")
# Create Sample Data instead of importing .csv-file
Title <- c("Title 1", "Title 2", "Title 3", "Title 4")
Abstract <- c( "Abstract 1", "Abstract 2", "Abstract 3", "Abstract 4")
review <- data.frame(Title, Abstract)
# Prepare file: Adding columns "STUDY_ID", "REVIEWERS", "INCLUDE" (and fill with "not vetted"), "INCLUDE_COMMENT", "TYPE_OF_STUDY"
review_initialized <- better_effort_initialize(review, study_ID = TRUE, unscreenedValue = "not vetted", dual = FALSE, front = TRUE)
# distribute effort amongst Reviewers, adding their name (reviewers = ) to the column "REVIEWERS"
review_distributed <- better_effort_distribute(review_initialized, dual = FALSE, reviewers = theTeam, column_name = "REVIEWERS",
effort = 100, initialize = FALSE, save_split = TRUE)
# opens a graphical user interface (GUI) that allows for abstract screening
better_abstract_screener("effort_Stefan.csv", aReviewer = "Stefan")
From the described behaviour, I assume that the error lies somewhere in one of the following functions:
aButton_save # The button that saves the file and moves to the next abstract
the theAnswer functions # used to update the reference list
the updateAll functions # helper function used to update and keep track of screened abstracts
Sorry, my guess is that the font.attribute you specify for text_TITLE isn't being preserved after you call svalue(text_TITLE) <-. The easy solution would be to call font(text_TITLE)<- ... after setting the text. The proper solution would be to fix this method: https://github.com/jverzani/gWidgets2RGtk2/blob/master/R/gtext.R#L84 (this should use the method insert_text https://github.com/jverzani/gWidgets2RGtk2/blob/master/R/gtext.R#L160 which does adjust for the font attribute.
With that in mind, you might try this 1-2 punch to set text:
svalue(text_TITLE) <- ""
text_TITLE$insert_text(new_text, "beginning")

How to access data saved in an assign construct?

I made a list, read the list into a for loop, do some calculations with it and export a modified dataframe to [1] "IAEA_C2_NoStdConditionResiduals1" [2] "IAEA_C2_EAstdResiduals2" ect. When I do View(IAEA_C2_NoStdConditionResiduals1) after the for loop then I get the following error message in the console: Error in print(IAEA_C2_NoStdConditionResiduals1) : object 'IAEA_C2_NoStdConditionResiduals1' not found, but I know it is there because RStudio tells me in its Environment view. So the question is: How can I access the saved data (in this assign construct) for further usage?
ResidualList = list(IAEA_C2_NoStdCondition = IAEA_C2_NoStdCondition,
IAEA_C2_EAstd = IAEA_C2_EAstd,
IAEA_C2_STstd = IAEA_C2_STstd,
IAEA_C2_Bothstd = IAEA_C2_Bothstd,
TIRI_I_NoStdCondition = TIRI_I_NoStdCondition,
TIRI_I_EAstd = TIRI_I_EAstd,
TIRI_I_STstd = TIRI_I_STstd,
TIRI_I_Bothstd = TIRI_I_Bothstd
)
C = 8
for(j in 1:C) {
#convert list Variable to string for later usage as Variable Name as unique identifier!!
SubNameString = names(ResidualList)[j]
SubNameString = paste0(SubNameString, "Residuals")
#print(SubNameString)
LoopVar = ResidualList[[j]]
LoopVar[ ,"F_corrected_normed"] = round(LoopVar[ ,"F_corrected_normed"] / mean(LoopVar[ ,"F_corrected_normed"]),
digit = 5
)
LoopVar[ ,"F_corrected_normed_error"] = round(LoopVar[ ,"F_corrected_normed_error"] / mean(LoopVar[ ,"F_corrected_normed_error"]),
digit = 5
)
assign(paste(SubNameString, j), LoopVar)
}
View(IAEA_C2_NoStdConditionResiduals1)
Not really a problem with assign and more with behavior of the paste function. This will build a variable name with a space in it:
assign(paste(SubNameString, j), LoopVar)
#simple example
> assign(paste("v", 1), "test")
> `v 1`
[1] "test"
,,,, so you need to get its value by putting backticks around its name so the space is not misinterpreted as a parse-able delimiter. See what happens when you type:
`IAEA_C2_NoStdCondition 1`
... and from here forward, use paste0 to avoid this problem.

Resources