Loop returning svalue from combobox - r

This is my first post here but I am a regular stackoverflow visitor.
For uploading new datasets, I am processing a data frame in which one column contains typing mistakes. I want users to correct each error through a gcombobox, so that the error and its correct value are stored and the correction is applied automatically the next time.
# Sample data which includes a wrong countryid
Incorrect_Country = data.frame(id=c(1,2,3), countryid=c("Canadada", "Peruru", "Chinanan"), othercolumn=c("777", "111", "333"))
# Dataframe where some previous pitfalls have been stored
# (it's useful because the model can learn from previous pitfalls)
Country_Recode = data.frame(id=c(1,2,3), Remote.Name=c("Frankekz", "Potuugal", "Mexxico"), Name=c("France", "Portugal", "Mexico"))
# This table presents values for the combobox
Master_Country = data.frame(name=c("Canada", "Peru", "China", "France", "Portugal", "Mexico"))
This is the code (GUI toolkit: gWidgetstcltk):
# Define errors in country
# Backticks quote the column name; single quotes would make it a string literal
Rewrite_Country = unique(sqldf("SELECT * FROM Incorrect_Country WHERE countryid NOT IN (SELECT `Remote.Name` FROM Country_Recode)"))
B <- 0
# Data frame in which to store the wrong names with their respective corrections
error <- data.frame(x=integer(0), y= character(0), z = character(0))
# Loop for each row with typing errors
for (i in Rewrite_Country["countryid"]) {
B <- B + 1
# I create a confirm dialog to prevent several windows from popping up at once,
# because that caused the value returned by a combobox to be assigned to the wrong recode name
gconfirm("New Value not specified. Do u want to change it?", handler=function(h,...){
# I create the window which will include a combobox of correct values
w <- gwindow("Recode Country for:")
gp <- ggroup(container=w)
## A group for the message and buttons
i.gp <- ggroup(horizontal=FALSE, container = gp)
glabel(i, container=i.gp)
## Combobox including the correct names
cb <- gcombobox(Master_Country[["name"]], selected=0, container=i.gp)
addHandlerChanged(cb,handler=function(h,...) {
# I assign the combobox's svalue to a new global variable
aNew <- as.character()
assign("aNew", svalue(cb), envir = as.environment(1))
print(svalue(cb))
})
## A group to organize the buttons
button.group <- ggroup(container = i.gp)
## Push buttons to right
addSpring(button.group)
# OK button for storing the results: (index, wrong value, correct value)
button <- gbutton("ok", handler = function(h,...) {
error <- rbind(error, c(B,i, aNew))
# In one of the last tries I set the new environment for the table
assign("error", error, envir = as.environment(1))
print(error)
dispose(w)
}, container = button.group)
gbutton("cancel", handler = function(h,...) dispose(w), container=button.group)
})
}
I don't get my expected outcome. I find it very hard to retrieve the svalue from the combobox, and impossible to store several results from the variable "aNew" when running the loop. Two other problems also occur:
1 - When I run the code including the loop, it usually does not show the widgets (the confirm popup)
2 - The loop exits after disposing of the first "Recode Country" window, that is, after processing "Canadada"
What I really want is that the user can fix the errors in the data.frame Incorrect_Country. Then the error and the solution are stored (data frame: error) for the program to know how to deal with it for future uploads.
How it should work:
1- confirm window (for stopping the loop till the user has corrected the previous error)
2- pop up shows error "canadada"
3- user selects from combobox "canada"
4- Pressing ok will store an integer, the error, and the corrected name in the table error
5- The loop runs again (press confirm and shows "Peruru")
6- Finally I get the error table such as
x, y, z
1, Canadada, Canada
2, Peruru, Peru
3, Chinanan, China
Any advice would be appreciated. Thanks
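For illustration only, here is a minimal sketch of the bookkeeping part of the steps above, with no GUI involved. The `picked` lookup is a hypothetical stand-in for the value the user would choose in the combobox. It also shows one thing worth checking in the loop: `Rewrite_Country["countryid"]` is a one-column data frame, which is a list of length one, so `for` runs a single iteration over it, whereas `[["countryid"]]` gives one iteration per value.

```r
Incorrect_Country <- data.frame(
  id = c(1, 2, 3),
  countryid = c("Canadada", "Peruru", "Chinanan"),
  stringsAsFactors = FALSE
)

# A one-column data frame is a list of length one, so looping over it
# yields the whole column in a single iteration.
n_iter_single_bracket <- length(Incorrect_Country["countryid"])

# Double brackets extract the vector, giving one iteration per value.
wrong_names <- as.character(Incorrect_Country[["countryid"]])

# Hypothetical stand-in for what the user would pick in the combobox.
picked <- c(Canadada = "Canada", Peruru = "Peru", Chinanan = "China")

error <- data.frame(x = integer(0), y = character(0), z = character(0),
                    stringsAsFactors = FALSE)

for (k in seq_along(wrong_names)) {
  wrong <- wrong_names[k]
  # Append one row: index, wrong value, corrected value.
  error[nrow(error) + 1, ] <- list(k, wrong, picked[[wrong]])
}

print(error)
```

Growing the data frame inside the loop keeps the sketch close to the original code; for many rows, collecting the pieces in a list and binding once at the end would be cheaper.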

Related

Why does one computer throw an error and another not, same code?

I have a report that I "flatten" because it comes out of the reporting interface rather kludgy. Something like this, repeating for every employee in the call center:
df <- data.frame(Date = c(NA, 2017, 2018, 2019, 2020), AH = c(NA, 1,2,3,4), NAH = c(NA,5,6,7,8), TH = c(NA,9,10,11,12))
df[1,3] <- "Supervisor: Malcolm Reynolds"
df[1,1] <- "Employee: Jane Cobb"
I have code to remove the Emp name and Super name from the row they are in, and add them to new columns I created earlier by entering the Emp name in each row for that person. Like this:
l <- grep("Employee:", adh$Date) ##find emp/super rows
m <- l[-1] ##new list without first row
m <- append(m,nrow(adh)) ##add last row to new list
##fill in new columns, column 8 is EmpName, 9 is SuperName
for (i in 1:length(l)) {
adh[as.integer(l[i]):as.integer(m[i]),8] <- adh[as.integer(l[i]),1]
adh[as.integer(l[i]):as.integer(m[i]),9] <- adh[as.integer(l[i]),3]
}
On my machine, everything works fine, it's all base package code. My employee runs this, and on the first line inside the for loop, it throws this error:
Error: Assigned data adh[as.integer(l[i]), 1] must be compatible with existing data.
i Error occurred for column Emp_Name.
x Can't convert to .
He just installed R and RStudio, I have a ton of other packages installed. I have no idea why this is happening for him and not me. Note, I'm not an under the hood user.

Wait for rgbif download to complete before proceeding

I am developing a small application in R Shiny. Part of the application will need to query GBIF to download species occurrence data. This is possible using rgbif. The function rgbif::occ_download() will download the data and rgbif::occ_download_meta() will check whether GBIF has fulfilled your request. For example:
geometry <- "POLYGON((30.1 10.1,40 40,20 40,10 20,30.1 10.1))"
res <- occ_download(paste0("geometry within ", geometry), type = "within", format = "SPECIES_LIST")
occ_download_meta(res)
<<gbif download metadata>>
Status: RUNNING
Format: SPECIES_LIST
Download key: 0004089-190415153152247
Created: 2019-04-25T09:18:20.952+0000
Modified: 2019-04-25T09:18:21.045+0000
Download link: http://api.gbif.org/v1/occurrence/download/request/0004089-190415153152247.zip
Total records: 0
So far, so good. However, the following function rgbif::occ_download_get() can't download the data for downstream analysis until occ_download_meta(res) has completed (when Status = SUCCEEDED).
How can I make the session wait until the download from GBIF has been completed? I cannot hard code a wait time into the script as different sized extents will take GBIF longer or shorter amounts of time to process. Also, the number of other active users querying the service could also alter wait times. I therefore need some sort of flag where Status == Succeeded before proceeding.
I have copied some skeleton code with comments below.
library(rgbif)
geometry <- "POLYGON((30.1 10.1,40 40,20 40,10 20,30.1 10.1))" # Define boundary
res <- occ_download(paste0("geometry within ", geometry), type = "within", format = "SPECIES_LIST")
# WAIT HERE UNTIL Status == SUCCEEDED
occ_download_meta(res)
x <- occ_download_get(res, overwrite = TRUE) # Download data
data<-occ_download_import(x) # Import into R
rgbif maintainer here. You could do something like we have within the occ_download_queue() function:
res <- occ_download(paste0("geometry within ", geometry), type = "within", format = "SPECIES_LIST")
still_running <- TRUE
status_ping <- 3
while (still_running) {
meta <- occ_download_meta(res)
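status <- tolower(meta$status) # the status is reported in upper case, e.g. "RUNNING"
still_running <- !(status %in% c("succeeded", "killed")) # keep polling until a terminal state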
Sys.sleep(status_ping) # sleep between pings
}
You probably want to check for both succeeded and killed, and do something different if the download was killed.
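As a rough sketch of the same polling idea in a reusable form: the helper `wait_for_status()` and the mock status source below are hypothetical names made up for this example, not part of rgbif. The loop stops on a terminal status and gives up after a fixed number of checks, so a stuck request cannot block the session forever.

```r
# Hypothetical helper: polls get_status() until it reports a terminal state.
wait_for_status <- function(get_status, done = c("succeeded", "killed"),
                            interval = 3, max_tries = 100) {
  for (k in seq_len(max_tries)) {
    status <- tolower(get_status())
    if (status %in% done) return(status)
    Sys.sleep(interval) # sleep between pings
  }
  stop("gave up waiting after ", max_tries, " checks")
}

# Mock status source for illustration: two pending states, then success.
statuses <- c("PREPARING", "RUNNING", "SUCCEEDED")
n_calls <- 0
mock_status <- function() {
  n_calls <<- n_calls + 1
  statuses[min(n_calls, length(statuses))]
}

final <- wait_for_status(mock_status, interval = 0)
print(final)
```

With rgbif, the status function would presumably be something like `function() occ_download_meta(res)$status`, and the caller can branch on the returned value to handle a killed download separately.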

Null datatable Shiny

I have a function to create a datatable in Shiny based on department numbers and how many times an event happened in that department during a time period. My issue is that if the date range is short enough, no departments will have had the event occur. In those instances, I get the error Error in rowSums(x) : 'x' must be an array of at least two dimensions which initially just appeared within the Shiny app and you could just ignore it. Now, the app crashes and you have to go back to R to look at it.
I understand why the error is occurring but I don't know if there's a way around it for my situation because I don't know if the events occur until the data is subset. The function is called a number of times in my code, so I don't want to write an if statement outside the function each time it is used.
I tried adding if(length(b$Department <= 1)){tab<-renderDataTable({datatable(NULL)})} right after defining b and then had an else statement around the remainder of the function, but I get the message Warning: Error in [.data.frame: undefined columns selected
I have also tried other if statements such as creating a dataframe full of NAs but this returned the original error message.
dept.table<-function(df, date1, date2){
a<-df[which(DATE >= as.Date(date1) & DATE <= as.Date(date2)),]
b<-as.data.frame(table((a[,c("Event", "Department")])))
d<-reshape(b, direction="wide", idvar="Event", timevar="Department")
names(d)<-sub('^Freq\\.', '', names(d))
d$Total<-round(rowSums(d[,-1]), 0)
levels(d$Event)<-c(levels(d$Event), "Total")
d<-rbind(d, c("Total", colSums(d[,-1])))
tab<-DT::renderDataTable({
datatable(d, extensions="FixedColumns", options=list(dom='t', scrollX=T, fixedColumns=list(leftColumns=1, rightColumns=1)), rownames=FALSE)
})
}
Sample data
df<-data.frame(Department=rep(100:109, 3), Event=rep(c("A", "B", "C"),10),
Date=sample(seq(as.Date('2018/01/01'), as.Date('2018/09/01'), by="day"), 30))
It's not pretty, but I figured out a solution. There were two different issues: one when there was no data and another when there were only 2 departments, so I needed two if statements.
dept.table<-function(df, date1, date2) {
  # Date is the column name used in the sample data
  a<-df[df$Date >= as.Date(date1) & df$Date <= as.Date(date2),]
  b<-as.data.frame(table((a[,c("Event", "Department")])))
  if(nrow(b)==0){
    # No events in the date range: render an empty table
    tab<-DT::renderDataTable(NULL)
  } else {
    d<-reshape(b, direction="wide", idvar="Event", timevar="Department")
    names(d)<-sub('^Freq\\.', '', names(d))
    if(ncol(d)>3){
      # Enough department columns to add row and column totals
      d$Total<-round(rowSums(d[,-1]), 0)
      levels(d$Event)<-c(levels(d$Event), "Total")
      d<-rbind(d, c("Total", colSums(d[,-1])))
      tab<-DT::renderDataTable({
        datatable(d, extensions="FixedColumns", options=list(dom='t', scrollX=T, fixedColumns=list(leftColumns=1, rightColumns=1)), rownames=FALSE)
      })
    } else {
      # Too few departments for totals: show the plain table
      tab<-DT::renderDataTable(datatable(d))
    }
  }
  tab
}
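A side note on where the rowSums error can come from, as a sketch rather than a replacement for the code above: when only one count column is left, `d[,-1]` drops to a plain vector, which has no dimensions. Subsetting with `drop = FALSE` keeps a one-column data frame, so rowSums still works. The small data frame here is made up for the example.

```r
# A one-department result: after reshaping there is a single count column.
d <- data.frame(Event = c("A", "B"), `100` = c(2, 1), check.names = FALSE)

counts_vec <- d[, -1]               # drops to a plain numeric vector
counts_df  <- d[, -1, drop = FALSE] # stays a one-column data frame

# rowSums(counts_vec) would fail: 'x' must be an array of at least two dimensions
d$Total <- rowSums(counts_df)
print(d)
```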

Nested Loop not Working to gather data from NOAA

I'm using the R package rnoaa (along with the other packages it requires) to gather historical weather data. I wrote this nested loop to gather all the data sets, but I keep getting errors when I run it. It seems to run fine for a moment.
The loop:
require('triebeard')
require('bindr')
require('colorspace')
require('mime')
require('curl')
require('openssl')
require('R6')
require('urltools')
require('httpcode')
require('stringr')
require('assertthat')
require('bindrcpp')
require('glue')
require('magrittr')
require('pkgconfig')
require('rlang')
require('Rcpp')
require('BH')
require('plogr')
require('purrr')
require('stringi')
require('tidyselect')
require('digest')
require('gtable')
require('plyr')
require('reshape2')
require('lazyeval')
require('RColorBrewer')
require('dichromat')
require('munsell')
require('labeling')
require('viridisLite')
require('data.table')
require('rjson')
require('httr')
require('crul')
require('lubridate')
require('dplyr')
require('tidyr')
require('ggplot2')
require('scales')
require('XML')
require('xml2')
require('jsonlite')
require('rappdirs')
require('gridExtra')
require('tibble')
require('isdparser')
require('geonames')
require('hoardr')
require('rnoaa')
install.packages('ncdf4')
install.packages("devtools")
library(devtools)
install_github("ropensci/rnoaa")
library(rnoaa)
list <- buoys(dataset='wlevel')
lid <- data.frame(list$id)
foo <- for(range in 1990:2017){
for(bid in lid){
bid_range <- buoy(dataset = 'wlevel', buoyid = bid, year = range)
bid.year.data <- data.frame(bid.year$data)
write.csv(bid.year.data, file='cwind/bid_range.csv')
}
}
The response:
Using c1990.nc
Using
Error: length(url) == 1 is not TRUE
It saves the first data set, but the loop variables are not applied to the file name; the file is just named bid_range.csv.
This error message indicates that there is no data for a given station id in 1990. Because you are using a for loop, it stops as soon as it hits an error.
Here I introduce the use of tidyverse to download the NOAA buoy data. A lot of the following functions are from the purrr package, which is part of the tidyverse.
# Load packages
library(tidyverse)
library(rnoaa)
Step 1: Create a "grid" containing all combinations of id and year
The expand function from tidyr can create all combinations of the given values.
data_list <- buoys(dataset = 'wlevel')
data_list2 <- data_list %>%
select(id) %>%
expand(id, year = 1990:2017)
Step 2: Create a "safe" version that does not break when there is no data.
Also make this function suitable for the map2 function
We will use map2 to loop through all the combinations of id and year via its .x and .y arguments, so we reorder the arguments of buoy to create buoy_modify. We also use the safely function to create a safe version of buoy_modify. Now, when it hits an error, it stores the error message and moves on to the next combination rather than stopping.
# Modify the buoy function
buoy_modify <- function(buoyid, year, dataset, ...){
buoy(dataset, buoyid = buoyid, year = year, ...)
}
# Create a safe version of buoy_modify
buoy_safe <- safely(buoy_modify)
Step 3: Apply the buoy_safe function
wlevel_data <- map2(data_list2$id, data_list2$year, buoy_safe, dataset = "wlevel")
# Assign name for the element in the list based on id and year
names(wlevel_data) <- paste(data_list2$id, data_list2$year, sep = "_")
After this step, all the data were downloaded in wlevel_data. Each element in wlevel_data has two parts. $result shows the data if the download is successful, otherwise, it shows NULL. $error shows NULL if the download is successful, otherwise, it shows the error message.
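To make the result/error structure concrete without hitting the NOAA servers, here is a small sketch using a toy function. `fetch` is a hypothetical stand-in for buoy_modify that fails for odd years; everything else is plain purrr.

```r
library(purrr)

# Toy stand-in for buoy_modify(): fails for odd years (hypothetical).
fetch <- function(id, year) {
  if (year %% 2 == 1) stop("no data for ", id, " in ", year)
  paste(id, year)
}
fetch_safe <- safely(fetch)

# One call per (id, year) pair; errors are captured instead of stopping the loop.
out <- map2(c("a", "b", "c"), c(2000, 2001, 2002), fetch_safe)

# TRUE where the call succeeded, FALSE where an error was stored.
ok <- map_lgl(out, function(x) is.null(x$error))
print(ok)
```

The second element has a NULL result and a stored error, while the first and third carry their results, which is the same shape as each element of wlevel_data.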
Step 4: Access the data
transpose can turn a list "inside out". So now wlevel_data2 has two elements: result and error. We can store these two and access the data.
# Turn the list "inside out"
wlevel_data2 <- transpose(wlevel_data)
# Get the error message
wlevel_error <- wlevel_data2$error
# Get the result
wlevel_result <- wlevel_data2$result
# Remove NULL element in wlevel_result
wlevel_result2 <- wlevel_result[!map_lgl(wlevel_result, is.null)]
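The question also mentioned that every file ended up named bid_range.csv. One possible way to handle that, sketched here with toy data frames standing in for the downloaded data, is to build each file name from the list names. The station ids, the columns, and the use of a temporary directory are assumptions made only for this example.

```r
# Toy stand-ins for successfully downloaded data sets, named id_year.
results <- list(
  "41001_1990" = data.frame(time = 1:2, level = c(0.1, 0.2)),
  "41002_1990" = data.frame(time = 1:2, level = c(0.3, 0.4))
)

# Write into a temporary "cwind" folder so the sketch leaves no files behind.
out_dir <- file.path(tempdir(), "cwind")
dir.create(out_dir, showWarnings = FALSE)

for (nm in names(results)) {
  # Build the file name from the element name, e.g. cwind/41001_1990.csv
  out_file <- file.path(out_dir, paste0(nm, ".csv"))
  write.csv(results[[nm]], file = out_file, row.names = FALSE)
}

written <- list.files(out_dir, pattern = "\\.csv$")
print(written)
```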

R & xml2: Locate elements by specific text value, store all children values in data.frame

I work with regularly refreshed XML reports and I would like to automate the munging process using R & xml2.
Here's a link to an entire example file.
Here's a sample of the XML:
<?xml version="1.0" ?>
<riDetailEnrolleeReport xmlns="http://vo.edge.fm.cms.hhs.gov">
<includedFileHeader>
<outboundFileIdentifier>f2e55625-e70e-4f9d-8278-fc5de7c04d47</outboundFileIdentifier>
<cmsBatchIdentifier>RIP-2015-00096</cmsBatchIdentifier>
<cmsJobIdentifier>16220</cmsJobIdentifier>
<snapShotFileName>25032.BACKUP.D03152016T032051.dat</snapShotFileName>
<snapShotFileHash>20d887c9a71fa920dbb91edc3d171eb64a784dd6</snapShotFileHash>
<outboundFileGenerationDateTime>2016-03-15T15:20:54</outboundFileGenerationDateTime>
<interfaceControlReleaseNumber>04.03.01</interfaceControlReleaseNumber>
<edgeServerVersion>EDGEServer_14.09_01_b0186</edgeServerVersion>
<edgeServerProcessIdentifier>8</edgeServerProcessIdentifier>
<outboundFileTypeCode>RIDE</outboundFileTypeCode>
<edgeServerIdentifier>2800273</edgeServerIdentifier>
<issuerIdentifier>25032</issuerIdentifier>
</includedFileHeader>
<calendarYear>2015</calendarYear>
<executionType>P</executionType>
<includedInsuredMemberIdentifier>
<insuredMemberIdentifier>ARS001</insuredMemberIdentifier>
<memberMonths>12.13</memberMonths>
<totalAllowedClaims>1000.00</totalAllowedClaims>
<totalPaidClaims>100.00</totalPaidClaims>
<moopAdjustedPaidClaims>100.00</moopAdjustedPaidClaims>
<cSRMOOPAdjustment>0.00</cSRMOOPAdjustment>
<estimatedRIPayment>0.00</estimatedRIPayment>
<coinsurancePercentPayments>0.00</coinsurancePercentPayments>
<includedPlanIdentifier>
<planIdentifier>25032VA013000101</planIdentifier>
<includedClaimIdentifier>
<claimIdentifier>CADULT4SM00101</claimIdentifier>
<claimPaidAmount>100.00</claimPaidAmount>
<crossYearClaimIndicator>N</crossYearClaimIndicator>
</includedClaimIdentifier>
</includedPlanIdentifier>
</includedInsuredMemberIdentifier>
<includedInsuredMemberIdentifier>
<insuredMemberIdentifier>ARS002</insuredMemberIdentifier>
<memberMonths>9.17</memberMonths>
<totalAllowedClaims>0.00</totalAllowedClaims>
<totalPaidClaims>0.00</totalPaidClaims>
<moopAdjustedPaidClaims>0.00</moopAdjustedPaidClaims>
<cSRMOOPAdjustment>0.00</cSRMOOPAdjustment>
<estimatedRIPayment>0.00</estimatedRIPayment>
<coinsurancePercentPayments>0.00</coinsurancePercentPayments>
<includedPlanIdentifier>
<planIdentifier>25032VA013000101</planIdentifier>
<includedClaimIdentifier>
<claimIdentifier></claimIdentifier>
<claimPaidAmount>0</claimPaidAmount>
<crossYearClaimIndicator>N</crossYearClaimIndicator>
</includedClaimIdentifier>
</includedPlanIdentifier>
</includedInsuredMemberIdentifier>
</riDetailEnrolleeReport>
I would like to:
1. Read the XML into R
2. Locate a specific insuredMemberIdentifier
3. Extract the planIdentifier and all claimIdentifier data associated with the member ID in (2)
4. Store all text and values for insuredMemberIdentifier, planIdentifier, claimIdentifier, and claimPaidAmount in a data.frame with a row for each unique claim ID (member ID to claim ID is a 1 to many)
So far, I have accomplished 1 and I'm in the ballpark on 2:
## Step 1 ##
ride <- read_xml("/Users/temp/Desktop/RIDetailEnrolleeReport.xml")
## Step 2 -- assume the insuredMemberIdentifier of interest is 'ARS001' ##
memID <- xml_find_all(ride, "//d1:insuredMemberIdentifier[text()='ARS001']", xml_ns(ride))
[I know that I can then use xml_text() to extract the text of the element.]
After the code in Step 2 above, I've tried using xml_parent() to locate the parent node of the insuredMemberIdentifier, saving that as a variable, and then repeating Step 2 for claim info on that saved variable node.
node <- xml_parent(memID)
xml_find_all(node, "//d1:claimIdentifier", xml_ns(ride))
But this just results in pulling all claimIdentifiers in the global file.
Any help/information on how to get to step 4, above, would be greatly appreciated. Thank you in advance.
Apologies for the late response, but for posterity, import data as above using xml2, then parse the xml file by ID, as hinted by har07.
# output object to collect all claims
res <- data.frame(
insuredMemberIdentifier = rep(NA, 1),
planIdentifier = NA,
claimIdentifier = NA,
claimPaidAmount = NA)
# vector of ids of interest
ids <- c('ARS001')
# indexing counter
starti <- 1
# loop through all ids
for (ii in seq_along(ids)) {
# find ii-th id
## Step 2 -- assume the insuredMemberIdentifier of interest is 'ARS001' ##
memID <- xml_find_all(x = ride,
xpath = paste0("//d1:insuredMemberIdentifier[text()='", ids[ii], "']"))
# find the parent node of the matched id
node <- xml_parent(memID)
# as har07's comment find claim id within this node
cid <- xml_find_all(node, ".//d1:claimIdentifier", xml_ns(ride))
pid <- xml_find_all(node, ".//d1:planIdentifier", xml_ns(ride))
cpa <- xml_find_all(node, ".//d1:claimPaidAmount", xml_ns(ride))
# add invalid data handling if necessary
if (length(cid) != length(cpa)) {
warning(paste("cid and cpa do not match for", ids[ii]))
next
}
# collect outputs
res[seq_along(cid) + starti - 1, ] <- list(
ids[ii],
xml_text(pid),
xml_text(cid),
xml_text(cpa))
# adjust counter to add next id into correct row
starti <- starti + length(cid)
}
res
# insuredMemberIdentifier planIdentifier claimIdentifier claimPaidAmount
# 1 ARS001 25032VA013000101 CADULT4SM00101 100.00
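The key difference between the two XPath forms can be seen on a tiny document. This is only an illustrative sketch: the XML, the namespace URI, and the element names are made up, and only the xml2 calls match the ones used above.

```r
library(xml2)

# Made-up document with a default namespace, like the report above.
doc <- read_xml('<report xmlns="http://example.org/ns">
  <member><id>A1</id><claim>C1</claim><claim>C2</claim></member>
  <member><id>A2</id><claim>C3</claim></member>
</report>')
ns <- xml_ns(doc) # the default namespace gets the prefix d1

# Locate one member through its id, then step up to the member node.
id_node <- xml_find_first(doc, "//d1:id[text()='A1']", ns)
member <- xml_parent(id_node)

# "//" starts from the document root, even when called on a child node.
all_claims <- xml_text(xml_find_all(member, "//d1:claim", ns))
# ".//" searches only below the current node.
own_claims <- xml_text(xml_find_all(member, ".//d1:claim", ns))

claims <- data.frame(id = xml_text(id_node), claim = own_claims,
                     stringsAsFactors = FALSE)
print(claims)
```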
