dataframe does not work inside of a function - r

When trying to generate a data.frame inside of a function, found that when calling the function, despite everything apparently worked well outside of the function, the data.frame was not generated.
Anybody could tell me how is this possible?
Species=c("a","b","c")
data=data.frame(Species)
df=data.frame(matrix(nrow=length(levels(data$Species)),ncol=43))
rm(df)
f<-function(data)
{
df=data.frame(matrix(nrow=length(levels(data$Species)),ncol=43))
}
f(data)
In my Rstudio no data.frame is generated when calling the function f!
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_Australia.1252
LC_CTYPE=English_Australia.1252
LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C LC_TIME=English_Australia.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] plyr_1.7.1 maptools_0.8-14 lattice_0.20-0 foreign_0.8-48
geosphere_1.2-26
[6] rgdal_0.7-8 outliers_0.14 XML_3.9-4.1 PBSmapping_2.62.34
dismo_0.7-14
[11] raster_1.9-58 sp_0.9-93
loaded via a namespace (and not attached):
[1] grid_2.14.1 tools_2.14.1

This should not be surprising. You haven't specified anywhere in the function what the function should return. This can be done the same way you display an object that you have created at the command prompt in R: type the name of the object. Alternatively, you can use return().
In other words, modify your function as follows (I've changed "df" to "mydf" and "data" to "mydata" to avoid any potential conflicts with base R functions):
f <- function(mydata)
{
mydf = data.frame(matrix(nrow=length(levels(data$Species)), ncol=43))
mydf
## Or, more explicitly
## return(mydf)
}
You can now run it using f(data). However, note that this will just display the output, not assign it to an object. If you wanted it assigned to an object ("mydf", for example) you need to use mydf <- f(data).
There is another option, use <<- in your function.
f <- function(mydata)
{
mydf <<- data.frame(matrix(nrow=length(levels(data$Species)), ncol=43))
## uncomment the next line if you want to *display* the output too
## mydf
}
> rm(mydf)
> ls(pattern = "mydf")
character(0)
> f(data) ## No ouput is displayed when you run the function
> ls(pattern = "mydf")
[1] "mydf"

Related

Mysterious misspelt R error in assign 'argumemt is not a character' after updating shiny

I am running a shiny app that basically generates SQL code for word-searches within a column 'WordText'.
While the code is working fine for other users running R 3.1.1, it has started throwing errors after I updated shiny. Please note that I was running R 3.2.3 prior to updating shiny and the shiny app worked fine.
ERROR MESSAGE:
Warning: Error in FUN: argumemt is not a character vector
Stack trace (innermost first):
74: lapply
73: paste8
72: HTML
71: assign
70: renderUI [U:\00 R\Shiny - Coursera\08 Tech05/server.R#174]
69: func
68: output$key1A_main
1: runApp
Also this is the first time I'm getting a stack trace! Not sure what triggered these.
The code snippet in question:
########### GENERATING MULTIPLE OUTPUTS ################
#### MULTIPLE KEYWORD DEPENDENCIES !!!
lapply(1:5, function(x){
## Defining as many functions as the number of times displayed - Word Search Generator, (Complaints and Monthly Trends TAB ) X2 - SS + TD
output[[sprintf("key%dA_main",x)]] <- output[[sprintf("key%dA_main_SS_1",x)]] <- output[[sprintf("key%dA_main_SS_2",x)]] <-
output[[sprintf("key%dA_main_TD_1",x)]] <- output[[sprintf("key%dA_main_TD_2",x)]] <- renderUI({
## Main Keyword Case Summary LIKE Statement
assign(sprintf("key%dA_start",x),
if(input[[sprintf("key%dA",x)]]=="") {""}
else {HTML(paste0("(",br(),"WordText like '%",input[[sprintf("key%dA",x)]],"%'",br(),em(sprintf("/* Main Keyword %d */",x)),br()))}
)
## 'AND' and Starting Parenthesis if any dependent keywords
assign(sprintf("key%dA_start_OR",x),
if(nchar(input[[sprintf("key%d_temp_1",x)]],allowNA = TRUE)==0) {" "} else {paste0("AND",br()," (",br()) }
)
## 1st Dependent Keyword Case Summary LIKE Statment
assign(sprintf("key%dA_first",x),
if(nchar(input[[sprintf("key%d_temp_1",x)]],allowNA = TRUE)==0) {" "} else {paste0("WordText like '%", input[[sprintf("key%d_temp_1",x)]], "%'", br())}
)
## All other Dependent Keywords Case Summary LIKE Statments
assign(sprintf("key%dA_other",x),HTML(
lapply(2:10, function(i) {
xy <- input[[sprintf("key%d_temp_%d",x, i)]]
if (nchar(xy,allowNA = TRUE)>0) paste0("OR WordText like '%", xy, "%'", br())
else " "
})#END lapply
))
## Ending Parenthesis if any dependent keywords
assign(sprintf("key%dA_end_OR",x),
if(nchar(input[[sprintf("key%d_temp_1",x)]],allowNA = TRUE)==0) {" "} else {paste0(")",br(),em(sprintf("/* Dependent Keyword(s) for Keyword %d */",x))) }
)
## Ending Parenthesis for entire criteria (Main Keyword + dependent keywords)
assign(sprintf("key%dA_end",x),
if(input[[sprintf("key%dA",x)]]=="") {" "} else {HTML(paste0(br(),")")) }
)
## Collating outputs
HTML(paste0( get(sprintf("key%dA_start",x))
, get(sprintf("key%dA_start_OR",x))
, get(sprintf("key%dA_first",x))
, get(sprintf("key%dA_other",x))
, get(sprintf("key%dA_end_OR",x))
, get(sprintf("key%dA_end",x))) )
})#END renderUI
})#END LAPPLY
The session info is as below:
sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252 LC_NUMERIC=C LC_TIME=English_Australia.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] scales_0.3.0 ggplot2_2.0.0 RODBC_1.3-12 shinythemes_1.0.1 DT_0.1 shiny_0.13.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.3 digest_0.6.9 mime_0.4 plyr_1.8.3 grid_3.2.3 R6_2.1.1 jsonlite_0.9.19 xtable_1.8-0 gtable_0.1.2 magrittr_1.5 tools_3.2.3
[12] htmlwidgets_0.5 munsell_0.4.2 httpuv_1.3.3 colorspace_1.2-6 htmltools_0.3

base::identical() returns TRUE, but the data frames is different

I've got a strange problem in dplyr (probably a bug?), but run into an even more strange problem when debugging.
The dplyr-part of code does already have an issue now, but please help me figure out why identical() doesn't detect differences?
The code (copied from the issue i created on dplyr's github) show the issue with swedish letters (å,ä,ö,Å,Ä,Ö), and as a result of that an example when base::identical(x,y) returns TRUE even when dataframe x and y are different.
# Script to show how dplyr::select() breakes dplyr::group_by() with swedish names
library(dplyr)
# Create data frame, column 1's name contains ä (specific swedish letters are åäöÅÄÖ)
my_df <- data.frame(användarnamn = letters[1:4], my_numvalues = 1:4,
my_text = c("stop","break","my","code"),
extra_col = LETTERS[1:4])
# use dplyr::select() to subset columns, then dplyr::group_by
# group_by fails on swedish column names if the df is subsetted with filter.
# If not subsetted or subsetted with [,1:3], everything works
my_df %>% select(1:3) %>% group_by(my_numvalues) # This works
my_df %>% select(1:3) %>% group_by(användarnamn) # This fails
my_df[,1:3] %>% group_by(användarnamn) # This works
my_df %>% group_by(användarnamn) # This works
# Same thing, but step by step
my_df_selected <- select(my_df, 1:3)
group_by(my_df_selected, användarnamn) # This fails
group_by(my_df_selected, my_numvalues) # This works
# and by %>%
my_df_selected %>% group_by(användarnamn) # This fails
my_df_selected %>% group_by(my_numvalues) # This works
# The names of the orignal df and the filtered is identical
identical(names(my_df)[1:3],names(my_df_selected))
# The function base::make.names() doesn't change the name, it's already valid
identical(names(my_df_selected), make.names(names(my_df_selected)))
# copy to a new df to rename
my_df_selected_renamed <- my_df_selected
# rename the df with it's own old names passing make.names()
names(my_df_selected_renamed) <- make.names(names(my_df_selected_renamed))
# The orignal subsetted and the renamed df is identical
# according to base::identical()
identical(my_df_selected, my_df_selected_renamed)
# Here's the strange thing, it works now! Why??? I REALLY don't understand!
my_df_selected_renamed %>% group_by(användarnamn) # This works now!
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=Swedish_Sweden.1252 LC_CTYPE=Swedish_Sweden.1252
[3] LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C
[5] LC_TIME=Swedish_Sweden.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.4.3
loaded via a namespace (and not attached):
[1] lazyeval_0.1.10 magrittr_1.5 R6_2.1.1 assertthat_0.1 parallel_3.2.2
[6] DBI_0.3.1 tools_3.2.2 Rcpp_0.12.1

release code error but not in debug or on source

My code runs fine but fails in a package
I boiled it down to
wtf<-function(r)
{
require(raster)
stopifnot(class(r) == "RasterLayer")
return(as.matrix(r))
}
When sourced, everything works fine. When the function is part of a package, it fails. It nicely runs in debug mode though, step by step.
library(mypackage)
r <- raster(ncol=6, nrow=6)
r[] <- runif(ncell(r),0,1)
extent(r) <- matrix(c(0, 0, 6, 6), nrow=2)
wtf(r)
# Error in as.vector(data) :
# no method for coercing this S4 class to a vector
# Traceback
# 5 as.vector(data)
# 4 array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),
# NULL) else NULL)
# 3 as.matrix.default(r)
# 2 as.matrix(r) at terrain.R#7
# 1 wtf(s)
I'm a bit puzzeled as to why this happens and to how proceed.
The build went fine, the check went clean, so what is going on?
What would be the next question to ask and explore in order to solve the problem?
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] raster_2.3-0 spdep_0.5-77 Matrix_1.1-4 minerva_1.4.1 gdata_2.13.3 rgdal_0.9-1 sp_1.0-15
loaded via a namespace (and not attached):
[1] boot_1.3-11 coda_0.16-1 deldir_0.1-5 grid_3.1.1 gtools_3.4.1 lattice_0.20-29 LearnBayes_2.15 MASS_7.3-33 nlme_3.1-118 parallel_3.1.1 splines_3.1.1 tools_3.1.1
The traceback shows that the default as.matrix is used, rather than the raster variant. I believe this problem goes away if you add this line to your Namespace file:
import(raster)
Or when you are explicit about which as.matrix you want:
wtf <- function(r) {
stopifnot(inherits(r, "RasterLayer"))
raster::as.matrix(r)
}
Rather than 'manually' testing for class membership, you might consider a more formal (S4) approach:
if (!isGeneric("wtf")) {
setGeneric("wtf", function(x, ...)
standardGeneric("wtf"))
}
setMethod("wtf", signature(x='RasterLayer'),
function(x, ...) {
raster::as.matrix(x)
}
)

Splitting a dataframe by group and printing group-specific rows to individual HTML files using pander and rapport

Say I have a tall dataframe with many rows per group, like so:
df <- data.frame(group = factor(rep(c("a","b","c"), each = 5)),
v1 = sample(1:100, 15, replace = TRUE),
v2 = sample(1:100, 15, replace = TRUE),
v3 = sample(1:100, 15, replace = TRUE))
What I want to do is split df into length(levels(df$group)) separate dataframes, e.g.,
df_a <- df[df$group=="a",]; df_b <- df[df$group == "b",] ; ...
And then print each dataframe in a separate HTML/PDF/DOCX file (probably using Rmarkdown and knitr).
I want to do this because I have a large dataframe and want to create a personalized report for each group a, b, c, etc. Thanks.
Update (11/18/14)
Following #daroczig 's advice in this thread and another thread, I attempted to make my own template that would simply print a nicely formatted table of all columns and rows per group to substitute into the "correlations" template call in the original sapply() function. I want to make my own template rather than just printing the nice table (e.g., the answer #Thomas graciously provided) because I'd like to build additional customization into the template once the simple printing works. Anyway, I've certainly butchered it:
<!--head
meta:
title: Sample Report
author: Nicapyke
description: This is a demo
packages: ~
inputs:
- name: eachgroup
class: character
standalone: TRUE
required: TRUE
head-->
### Records received up to present for Group <%= eachgroup %>
<%=
pandoc.table(df[df$group == eachgroup, ])
%>
Then, after saving that as groupreport.rapport in my working directory, I wrote the following R code, modeled after #daroczig's response:
allgroups <- unique(df$group)
library(rapport)
for (eachstate in allstates) {
rapport.docx("FILEPATHHERE", eachgroup = eachgroup)
}
I received the error:
Error in openFileInOS(f.out) : File not found!
I'm not sure what happened. I see from the pander documentation that this means it's looking for a system file, but that doesn't mean much to me. Anyway, this error doesn't get at the root of the problem, which is 1) what should go in the input section of the custom template YAML header, and 2) which R code should go in the rapport template vs. in the R script.
I realize I may be making a number of errors that reveal my lack of experience with rapport and pander. Thanks for your patience!
N.B.:
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitr_1.8 dplyr_0.3.0.2 rapport_0.51 yaml_2.1.13 pander_0.5.1
plyr_1.8.1 lattice_0.20-29
loaded via a namespace (and not attached):
[1] assertthat_0.1 DBI_0.3.1 digest_0.6.4 evaluate_0.5.5 formatR_1.0 grid_3.1.2
[7] lazyeval_0.1.9 magrittr_1.0.1 parallel_3.1.2 Rcpp_0.11.3 reshape_0.8.5 stringr_0.6.2
[13] tools_3.1.2
A slightly off-topic, but still R/markdown one-liner for separate reports with report templates:
> library(rapport)
> sapply(levels(df$group), function(g) rapport.html('correlations', data = df[df$group == g, ], vars = c('v1', 'v2', 'v3')))
Exported to */tmp/RtmpYyRLjf/rapport-correlations-1-0.[md|html]* under 0.683 seconds.
Exported to */tmp/RtmpYyRLjf/rapport-correlations-2-0.[md|html]* under 0.888 seconds.
Exported to */tmp/RtmpYyRLjf/rapport-correlations-3-0.[md|html]* under 1.063 seconds.
The rapport package can run (predefined or custom) report templates on any (sub)dataset in markdown, then export it to HTML/docx/PDF/other formats. For a quick demo, I've uploaded the resulting documents:
rapport-correlations-1-0.html
rapport-correlations-2-0.html
rapport-correlations-3-0.html
You can do this with by (or split) and xtable (from the xtable package). Here I create xtable objects of each subset, and then loop over them to print them to file:
library('xtable')
s <- by(df, df$group, xtable)
for(i in seq_along(s)) print(s[[i]], file = paste0('df',names(s)[i],'.tex'))
If you use the stargazer package, you can get a nice summary of the dataframe instead of the dataframe itself in just one line:
library('stargazer')
by(df, df$group, stargazer, out = paste0('df',unique(df$group),'.tex'))
You should be able to easily include each of these files in, e.g., a PDF report. You could also use HTML markup using either xtable or stargazer.

Find whether a particular date is an Option Expiration Friday - problem with timeDate package

I am trying to write a simple function that (should) return true if the parameter date(s) is an Op-Ex Friday.
require(timeDate)
require(quantmod)
getSymbols("^GSPC", adjust=TRUE, from="1960-01-01")
assign("SPX", GSPC, envir=.GlobalEnv)
names(SPX) <- c("SPX.Open", "SPX.High", "SPX.Low", "SPX.Close",
"SPX.Volume", "SPX.Adjusted")
dates <- last(index(SPX), n=10)
from <- as.numeric(format(as.Date(min(dates)), "%Y"))
to <- as.numeric(format(as.Date(max(dates)), "%Y"))
isOpExFriday <- ifelse(
isBizday(
timeDate(as.Date(dates)),
holidayNYSE(from:to)) & (as.Date(dates) == as.Date(
format(timeNthNdayInMonth(timeFirstDayInMonth(dates), nday=5, nth=3)))
), TRUE, FALSE)
Now, the result should be [1] "2011-09-16". But instead I get [1] "2011-09-15":
dates[isOpExFriday]
[1] "2011-09-15"
Am I doing something wrong, expecting something that timeDate package is not doing by design or is there a bug in timeDate?
I am guessing it's a timezone problem. What happens if you use this:
format(dates[isOpExFriday], tz="UTC")
On second look, you probably need to put the 'tz=' argument inside the format call inside the as.Date(format(...)) call. The format function "freezes" that dates value as text.
EDIT: On testing however I think you are right about it being a bug. (And I sent a bug report to the maintainer with this response.) Even after trying to insert various timezone specs and setting myFinCenter in RmetricsOptions, I still get the which stems from this error deep inside your choice of functions:
timeNthNdayInMonth(as.Date("2011-09-01"), nday=5, nth=3)
America/New_York
[1] [2011-09-15]
I suspect it is because of this code since as I understand it Julian dates are not adjusted for timezones or daylight savings times:
ct = 24 * 3600 * (as.integer(julian.POSIXt(lt)) +
(nth - 1) * 7 + (nday - lt1$wday)%%7)
class(ct) = "POSIXct"
The ct value in seconds is then coverted to POSIXct from second since "origin" simply by coercion of class. If I change the code to:
ct=as.POSIXct(ct, origin="1970-01-01") # correct results come back
My quantmod and timeDate versions are both current per CRAN. Running Mac with R 2.13.1 in 64 bit mode with a US locale. I have not yet tried to reproduce with a minimal session so there could still be some collision or hijacking with other packages:
> sessionInfo()
R version 2.13.1 RC (2011-07-03 r56263)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grid splines stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] quantmod_0.3-17 TTR_0.20-3 xts_0.8-2
[4] Defaults_1.1-1 timeDate_2130.93 zoo_1.7-4
[7] gplots_2.10.1 KernSmooth_2.23-6 caTools_1.12
[10] bitops_1.0-4.1 gdata_2.8.1 gtools_2.6.2
[13] wordnet_0.1-8 ggplot2_0.8.9 proto_0.3-9.2
[16] reshape_0.8.4 plyr_1.6 rattle_2.6.10
[19] RGtk2_2.20.17 rms_3.3-1 Hmisc_3.8-3
[22] survival_2.36-9 sos_1.3-0 brew_1.0-6
[25] lattice_0.19-30

Resources