R: trouble assigning values to a dynamic variable in a dataframe

I am trying to assign values to a dataframe variable defined by the user. The user specifies the name of the variable, let's call this x, in the dataframe df. For simplicity I want to assign a value of 3 to everything in the column the user specifies. The simplified code is:
variableName <- paste("df$", x, sep="")
eval(parse(text=variableName)) <- 3
But I get an error:
Error in file(filename, "r") : cannot open the connection
In addition: Warning message:
In file(filename, "r") :
cannot open file 'df$x': No such file or directory
I've tried all kinds of remedies to no avail. If I simply try to print the values of the column:
eval(parse(text=variableName))
I get no errors and it prints out ok. It's only when I try to give that column a value that I get the error. Any help would be appreciated.

I believe the issue is that there is no way to use the result of eval() on the LHS of an assignment.
df = data.frame(foo = 1:5,
                bar = -3)
x = "bar"
variableName <- paste("df$", x, sep="")
eval(parse(text=variableName)) <- 3
#> Warning in file(filename, "r"): cannot open file 'df$bar': No such file or
#> directory
#> Error in file(filename, "r"): cannot open the connection
## This error is a bit misleading. Breaking it apart I get a different error.
eval(expression(df$bar)) <- 3
#> Error in eval(expression(df$bar)) <- 3: could not find function "eval<-"
## And it works if you put it all in the string to be parsed.
ex1 <- paste0("df$", x, "<-3")
eval(parse(text=ex1))
df
#>   foo bar
#> 1   1   3
#> 2   2   3
#> 3   3   3
#> 4   4   3
#> 5   5   3
## But I doubt that's the best way to do it!
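A simpler base R route (not in the original answer, but standard) avoids parse()/eval() entirely: index the data frame with the column name string held in x. A minimal sketch using the same df and x as above:
x <- "bar"
df[[x]] <- 3   # double-bracket indexing accepts a column name stored in a variable
df[, x] <- 3   # equivalent single-bracket form
df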

Related

Error Message when Opening a CSV File in R

I attempted opening a CSV file in RStudio but got this warning message:
In readLines("persons.csv") : incomplete final line found on 'persons.csv'
What is wrong with the file, and how can I fix it?
You can likely ignore this, as the read probably still worked. Here is an example without a final newline, which gives that warning, and another with the final newline, which does not. Both read correctly.
cat("a,b\n1,2", file = "test1.csv")
read.csv("test1.csv")
##   a b
## 1 1 2
## Warning message:
## In read.table(file = file, header = header, sep = sep, quote = quote, :
## incomplete final line found by readTableHeader on 'test1.csv'
cat("a,b\n1,2\n", file = "test2.csv")
read.csv("test2.csv")
##   a b
## 1 1 2
To address this, try one of these:
1. Just ignore it, as it probably worked.
2. Bring the file into a text editor and write it out again; that often eliminates the warning.
3. Use readr::read_csv. The show_col_types = FALSE argument shown below suppresses many messages that are otherwise output by that command.
library(readr)
read_csv("test1.csv", show_col_types = FALSE)
## # A tibble: 1 x 2
##       a     b
##   <dbl> <dbl>
## 1     1     2
4. Use data.table::fread. It won't give that message.
library(data.table)
fread("test1.csv", data.table = FALSE)
##   a b
## 1 1 2
5. From the Windows cmd line use this (note the dot):
echo. >> test1.csv
or under bash (no dot):
echo >> test1.csv
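The trailing newline can also be appended from within R itself; a minimal sketch on the test1.csv file from above:
cat("\n", file = "test1.csv", append = TRUE)  # add the missing final newline in place
read.csv("test1.csv")                         # now reads without the warning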

Issue reading data with ipumsr using PUMAs

I'm trying to read some data from IPUMS USA, and it has worked before, but I'm suddenly getting the error "Error in `levels<-`(`*tmp*`, value = as.character(levels)) : factor level [2] is duplicated". Earlier, when just trying to display the PUMA data on a different computer, I also got "Error: 'labels' must be unique". I'll put the code I was using below, but I've been using this data with PUMA and this hasn't happened before. Can anyone tell me what this means or what changed?
ddi <- read_ipums_ddi("usa_00021.xml")
data <- read_ipums_micro(ddi)
data[13] #13 is the IND column and this produces the error
data$IND #this does not produce an error
This gets the "Error in `levels<-`(`*tmp*`, value = as.character(levels)) : factor level [2] is duplicated" error on my current computer:
ddi <- read_ipums_ddi("usa_00021.xml")
data <- read_ipums_micro(ddi)
data[8] #this is the PUMA column
This gets the "Error: 'labels' must be unique" error on the other computer. That computer has the same issue listed above but also gives me this one, and it is the computer I had been using with no previous issues.
(Sorry if anything is formatted wrong; this is my first question.)
This is related to an error in the print formatting introduced by recent versions of ipumsr and haven.
It has been fixed in a pull request to haven, so if you're able to install C++ packages from GitHub, you can run the following command:
# install.packages("devtools")
devtools::install_github("tidyverse/haven", pull = 425)
If that's not an option, you can disable the printing behavior by doing the following:
options(haven.show_pillar_labels = FALSE)
options(ipumsr.show_pillar_labels = FALSE)
Edit:
Just to confirm - this is how the options work on my computer - I'm curious why this wouldn't work on yours. If you have time, can you see if this code works for you?
library(ipumsr)
x <- tibble::tibble(x = haven::labelled(c(1, 2, 3), c(x = 1, x = 2)))
x
#> Error in `levels<-`(`*tmp*`, value = as.character(levels)): factor level [2] is duplicated
options(haven.show_pillar_labels = FALSE)
options(ipumsr.show_pillar_labels = FALSE)
x
#> # A tibble: 3 x 1
#>           x
#>   <dbl+lbl>
#> 1         1
#> 2         2
#> 3         3
Created on 2019-04-10 by the reprex package (v0.2.1)

R, getting an invalid argument to unary operator when using order function

I'm essentially doing the exact same thing 3 times, and when adding a new variable I get this error
Error in -emps$EV : invalid argument to unary operator
The code chunk causing this is
evps<-aggregate(EV~player,s1k,mean)
sort2<-evps[order(-evps$EV),]
head(sort2,10)
s1k$EM<-s1k$points-s1k$EV
emps<-aggregate(EM~player,s1k,mean)
sort3<-emps[order(-emps$EV),]
head(sort3,10)
Works like a charm for the first list, but the identical code thereafter causes the error.
This specific line is causing the error
sort3<-emps[order(-emps$EV),]
How can I fix/workaround this?
Full Code
library(RCurl)  # getURL() comes from RCurl
url <- getURL("https://raw.githubusercontent.com/M-ttM/Basketball/master/class.csv")
shots <- read.csv(text = url)
shots$make<-shots$points>0
shots2<-shots[which(!(shots$player=="Luc Richard Mbah a Moute")),]
fit1<-glm(make~factor(type)+factor(period), data=shots2,family="binomial")
summary(fit1)
shots2$makeodds<-fitted(fit1)
shots2$EV<-shots2$makeodds*ifelse(shots2$type=="3pt",3,2)
shots3<-shots2[which(shots2$y>7),]
locmakes<-data.frame(table(shots3[, c("x", "y")]))
s1k <- shots2[with(shots2, player %in% names(which(table(player)>=1000))), ]
pps<-aggregate(points~player,s1k,mean)
sort<-pps[order(-PPS$points),]
head(sort,10)
evps<-aggregate(EV~player,s1k,mean)
sort2<-evps[order(-evps$EV),]
head(sort2,10)
s1k$EM<-s1k$points-s1k$EV
emps<-aggregate(EM~player,s1k,mean)
sort3<-emps[order(-emps$EV),]
head(sort3,10)
The error message occurs when trying to order() a column of character (chr) data with the unary minus, which is not defined for character vectors. A possible workaround is to use the reverse function rev() instead of the minus sign, like so:
column_a = c("a","a","b","b","c","c")
column_b = seq(6)
df = data.frame(column_a, column_b)
df$column_a = as.character(df$column_a)
df[with(df, order(-column_a, column_b)),]
> Error in -column_a : invalid argument to unary operator
df[with(df, order(rev(column_a), column_b)),]
  column_a column_b
5        c        5
6        c        6
3        b        3
4        b        4
1        a        1
2        a        2
Let me know if it works in your case.
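Not from the original answer, but two other standard base R options for descending order on a character column, sketched on the same toy df:
df[with(df, order(-xtfrm(column_a), column_b)), ]             # xtfrm() maps the strings to sortable numbers, so unary minus works
df[with(df, order(column_a, column_b, decreasing = TRUE)), ]  # or reverse every sort key at once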
In the block below, emps$EV doesn't exist: aggregate(EM ~ player, s1k, mean) returns a data frame with columns player and EM, not EV.
s1k$EM<-s1k$points-s1k$EV
emps<-aggregate(EM~player,s1k,mean)
sort3<-emps[order(-emps$EV),]
head(sort3,10)
You probably meant
s1k$EM<-s1k$points-s1k$EV
emps<-aggregate(EM~player,s1k,mean)
sort3<-emps[order(-emps$EM),]
head(sort3,10)

Reading large fixed format text file in R

I am trying to read a large (> 70 MB) fixed-format text file into R. For a smaller file (< 1 MB), I can use the read.fwf() function as shown below.
condodattest1a <- read.fwf(impfile1,widths=testcsv3$Varlen,col.names=testcsv3$Varname)
When I try to run the line of code below,
condodattest1 <- read.fwf(impfile,widths=testcsv3$Varlen,col.names=testcsv3$Varname)
I get the following error message:
Error: cannot allocate vector of size 2 Kb
The only difference between the 2 lines is the size of the input file.
The formatting for the file I want to import is given in the dataframe called testcsv3. I show a small snippet of the dataframe below:
> head(testcsv3)
  Varlen Varname    Varclass Varsep Varforfmt
1      2    "V1" "character"      2    "A2.0"
2     15    "V2" "character"     17   "A15.0"
3     28    "V3" "character"     45   "A28.0"
4      3    "V4" "character"     48    "F3.0"
5      1    "V5" "character"     49    "A1.0"
6      3    "V6" "character"     52    "A3.0"
At least part of my problem is that I am reading in all the data as factors when I use read.fwf() and I end up exceeding the memory limit on my computer.
I tried to use read.table() as a way of formatting each variable but it seems I need a text delimiter with that function. There is a suggestion in section 3.3 in the link below that I could use sep to identify the column where every variable starts.
http://data.princeton.edu/R/readingData.html
However, when I use the command below:
condodattest1b <- read.table(impfile1,sep=testcsv3$Varsep,col.names=testcsv3$Varname, colClasses=testcsv3$Varclass)
I get the following error message:
Error in read.table(impfile1, sep = testcsv3$Varsep, col.names = testcsv3$Varname, : invalid 'sep' argument
Finally, I tried to use:
condodattest1c <- read.fortran(impfile1,lengths=testcsv3$Varlen, format=testcsv3$Varforfmt, col.names=testcsv3$Varname)
but I get the following message:
Error in processFormat(format) : missing lengths for some fields
In addition: Warning messages:
1: In processFormat(format) : NAs introduced by coercion
2: In processFormat(format) : NAs introduced by coercion
3: In processFormat(format) : NAs introduced by coercion
All I am trying to do at this point is format the data when they come into R as something other than factors. I am hoping this will limit the amount of memory I am using and allow me to actually input the file. I would appreciate any suggestions about how I can do this. I know the Fortran formats for all the variables and the column at which each variable begins.
Thank you,
Warren
Maybe this code works for you. Fill varlen with the field widths and pass the corresponding type strings (e.g. "numeric", "character", "integer") in colclasses:
my.readfwf <- function(filename, varlen, colclasses) {
  # compute start and end character positions of each field
  sidx <- cumsum(c(1, varlen[1:(length(varlen) - 1)]))
  eidx <- sidx + varlen - 1
  # read the file as one character string per line
  filecontent <- scan(filename, character(0), sep = "\n")
  if (any(diff(nchar(filecontent)) != 0))
    stop("line lengths differ!")
  nlines <- length(filecontent)
  res <- list()
  for (i in seq_along(varlen)) {
    # extract field i from every line and coerce it to the requested type
    res[[i]] <- sapply(filecontent, substring, first = sidx[i], last = eidx[i])
    mode(res[[i]]) <- colclasses[i]
  }
  # set names, row.names and class so the list becomes a data.frame
  attributes(res) <- list(names = paste("V", seq_along(res), sep = ""),
                          row.names = seq_along(res[[1]]),
                          class = "data.frame")
  return(res)
}
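If installing a package is an option, readr::read_fwf is usually less memory-hungry than read.fwf on files of this size; a sketch assuming the same testcsv3 metadata and impfile path from the question:
library(readr)
spec <- fwf_widths(testcsv3$Varlen, col_names = testcsv3$Varname)  # field widths and names
condodattest1 <- read_fwf(impfile, col_positions = spec,
                          col_types = cols(.default = col_character()))  # keep everything character, no factors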

What arguments were passed to the functions in the traceback?

In R, if execution stops because of an error, I can evaluate traceback() to see which function the error occurred in, which function that function was called from, etc. It'll give something like this:
8: ar.yw.default(x, aic = aic, order.max = order.max, na.action = na.action,
       series = series, ...)
7: ar.yw(x, aic = aic, order.max = order.max, na.action = na.action,
       series = series, ...)
6: ar(x[, i], aic = TRUE)
5: spectrum0.ar(x)
4: effectiveSize(x)
Is there a way to find what arguments were passed to these functions? In this case, I'd like to know what arguments were passed to effectiveSize(), i.e. what is x.
The error does not occur in my own code, but in a package function. Being new to R, I'm a bit lost.
Not knowing how to do this properly, I tried to find the package function's definition and modify it, but where the source file should be I only find an .rdb file. I assume this is something byte-compiled.
I'd suggest setting options(error=recover) and then running the offending code again. This time, when an error is encountered, you'll be thrown into an interactive debugging environment in which you are offered a choice of frames to investigate. It will look much like what traceback() gives you, except that you can type 7 to enter the evaluation environment of call 7 on the call stack. Typing ls() once you've entered a frame will give you the list of its arguments.
An example (based on that in ?traceback) is probably the best way to show this:
foo <- function(x) { print(1); bar(2) }
bar <- function(x) { x + a.variable.which.does.not.exist }
## First with traceback()
foo(2) # gives a strange error
# [1] 1
# Error in bar(2) : object 'a.variable.which.does.not.exist' not found
traceback()
# 2: bar(2) at #1
# 1: foo(2)
## Then with options(error=recover)
options(error=recover)
foo(2)
# [1] 1
# Error in bar(2) : object 'a.variable.which.does.not.exist' not found
#
# Enter a frame number, or 0 to exit
#
# 1: foo(2)
# 2: #1: bar(2)
Selection: 1
# Called from: top level
Browse[1]> ls()
# [1] "x"
Browse[1]> x
# [1] 2
Browse[1]> ## Just press return here to go back to the numbered list of envts.
#
# Enter a frame number, or 0 to exit
#
# 1: foo(2)
# 2: #1: bar(2)
R has many helpful debugging tools, most of which are discussed in the answers to this SO question from a few years back.
You can use trace() to tag or label a function as requiring a "detour" to another function, the logical choice being browser().
?trace
?browser
> trace(mean)
> mean(1:4)
trace: mean(1:4)
[1] 2.5
So that just displayed the call. This next mini-session shows trace actually detouring into the browser:
> trace(mean, browser)
Tracing function "mean" in package "base"
[1] "mean"
> mean(1:4)
Tracing mean(1:4) on entry
Called from: eval(expr, envir, enclos)
Browse[1]> x #once in the browser you can see what values are there
[1] 1 2 3 4
Browse[1]>
[1] 2.5
> untrace(mean)
Untracing function "mean" in package "base"
As for seeing what is inside a function: if it is exported, you can simply type its name at the console. If it is not exported, use getAnywhere(fn_name).
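For example, with ar.yw.default from the traceback above (which stats does not export), either of these shows its definition:
getAnywhere("ar.yw.default")   # searches all loaded namespaces
stats:::ar.yw.default          # direct access via the namespace operator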

Resources