R - Operate on a column w/o explicitly reassigning it? - r

I'm often writing things like:
dataframe$this_column <- as.Date(dataframe$this_column)
That is, when changing some column in my data frame [table], I'm constantly writing the column twice. Is there some function that allows me to directly change the data frame w/o explicitly reassigning it? Say: ch(dataframe$this_column, as.Date())
EDIT: While similar, the potential duplicate is not the same. I am not looking for a way to shorten self-referential reassignments. I'm looking to avoid the explicit reassignment altogether. The answer I accepted here is an appropriate solution (and much better than the answers provided in the "duplicate" question, with regard to their relevance to my question).

Here is an example using the magrittr package:
library(magrittr)
x = c('2015-12-12','2015-12-13','2015-12-14')
df = data.frame(x)
df$x %<>% as.Date
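For readers who would rather not depend on magrittr, the same idea can be sketched in base R. The helper ch below is hypothetical (its name is borrowed from the question; it is not an existing function in base R or magrittr). It captures the expression it was called with and assigns the transformed value back in the caller's environment. Treat it as an illustration under those assumptions, not as a recommended utility.

```r
# Hypothetical helper (an assumption for illustration): applies `f` to
# `target` and assigns the result back to the same place in the caller.
ch <- function(target, f, ...) {
  target_expr <- substitute(target)   # e.g. dataframe$this_column
  new_value <- f(target, ...)         # apply f to the current value
  # Build `target_expr <- new_value` and run it where ch() was called
  eval(call("<-", target_expr, new_value), parent.frame())
  invisible(new_value)
}

dataframe <- data.frame(this_column = c("2015-12-12", "2015-12-13"),
                        stringsAsFactors = FALSE)
ch(dataframe$this_column, as.Date)
class(dataframe$this_column)
```

The column is named only once in the call, which is what the question asks for; the reassignment still happens, just inside the helper.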

Related

Using "count" function in a loop in R

I'm quite new to R and I've been learning with the available resources on the internet.
I came across this issue where I have a vector (a) with vars "1", "2", and "3". I want to use the count function to generate a new df with the categories for each of those variables and its frequencies.
The function I want to use in a loop is this
b <- count(mydata, var1)
However, when I use this loop below;
for (i in (a)) {
'j' <- count(mydata[, i])
print (j)
}
The loop runs, but the only frequencies that get saved in j are those of the categorical variable "var 3".
Can someone assist me on this code please?
TIA!
In R there are generally better ways than to use loops to process data. In your particular case, the “straightforward” way fails, because the idea of the “tidyverse” is to have the data in tidy format (I highly recommend you read this article; it’s somewhat long but its explanation is really fundamental for any kind of data processing, even beyond the tidyverse). But (from the perspective of your code) your data is spread across multiple columns (wide format) rather than being in a single column (long form).
The other issue is that count (like many other tidyverse functions) expects an unevaluated column name. It does not accept the column name via a variable. akrun’s answer shows how you can work around this (using tidy evaluation and the bang-bang operator) but that’s a workaround that’s not necessary here.
The usual solution, instead of using a loop, would first require you to bring your data into long form, using pivot_longer.
After that, you can perform a single count on your data:
library(dplyr)
library(tidyr)
result <- mydata %>%
  pivot_longer(all_of(a), names_to = 'Var', values_to = 'Value') %>%
  count(Var, Value)
Some comments regarding your current approach:
Be wary of cryptic variable names: what are i, j and a? Use concise but descriptive variable names. There are some conventions where i and j are used but, if so, they almost exclusively refer to index variables in a loop over vector indices. Using them differently is therefore quite misleading.
There’s generally no need to put parentheses around a variable name in R (except when that name is the sole argument to a function call). That is, instead of for (i in (a)) it’s conventional to write for (i in a).
Don’t put quotes around your variable names! R happens to accept the code 'j' <- … but since quotes normally signify string literals, their use here is incredibly misleading, and additionally doesn’t serve a purpose.
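If a loop is still wanted, one likely reason that only the last column's result survives is that j is overwritten on every iteration. A minimal base R sketch, assuming made-up sample data and using table() in place of count() (the data values and the names column_names and frequencies are illustrative only), stores each result in a named list instead:

```r
# Made-up sample data standing in for `mydata`
mydata <- data.frame(
  var1 = c("a", "b", "a", "c"),
  var2 = c("x", "x", "y", "y"),
  var3 = c("m", "m", "m", "n")
)
column_names <- c("var1", "var2", "var3")

# One list element per column, so earlier results are not overwritten
frequencies <- list()
for (column_name in column_names) {
  frequencies[[column_name]] <- as.data.frame(
    table(mydata[[column_name]]),
    responseName = "n"
  )
}

frequencies$var1
```

Each element of frequencies is a small data frame with the category in the first column and its count in n.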

Is there a way to apply plyr's count() function to every column individually?

Similar to this question but for R. I want to get a summary count of every variable in each column of a data frame.
Currently, doing something like plyr::count(df[,1:10]) counts how many times each combination of values in a row occurs. Instead, I just want a quick way of printing out what all my variables even are. I know this can be done with C-style recursion, but I'm hoping for a more elegant/simpler solution.
You can use lapply:
lapply(df, plyr::count)
Alternatively, keeping everything in base R, you can use table with stack to get similar output:
lapply(df, function(x) stack(table(x)))
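As a small illustration of the base R variant, here it is applied to a made-up data frame (the name survey and its contents are assumptions for the example). Because a data frame is a list of columns, lapply visits each column in turn and returns one frequency table per column:

```r
# Made-up data frame for illustration
survey <- data.frame(
  color = c("red", "blue", "red", "red", "blue"),
  size  = c("S", "M", "M", "L", "M")
)

# One table per column; stack() turns each table into a two-column data frame
value_counts <- lapply(survey, function(column) stack(table(column)))

value_counts$color
```

The result is a named list, so the counts for a single column can be pulled out by name.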

Replace subset of data table with other data table

I feel a bit silly for asking this, but I only want to do something that I know how to do with a data.frame and have not yet found a nice way to do with a data.table. All other similar questions seem far more complicated than what I have in mind. I simply want to replace a subset of a data.table with another data.table, based only on a row index and a choice of columns.
MWE follows
library(data.table)
x.df <- data.frame(a=c(1,2,3),
                   b=c(2,NA,NA),
                   c=c(3,NA,NA))
x.dt <- data.table(x.df)
x.df.replace <- data.frame(b=c(10,11), c=c(22,21))
x.dt.replace <- data.table(x.df.replace)
This works like a charm with a data.frame:
x.df[is.na(x.df$b),2:3]<-x.df.replace
On the other hand I would like to call the columns by name and I only know how to replace each column individually, but not jointly
x.dt[is.na(b),]
x.dt[is.na(b),c:=x.dt.replace[,c]]
x.dt[is.na(b),b:=x.dt.replace[,b]]
x.dt[is.na(b), list(b,c)]<-x.dt.replace
x.dt[is.na(b), list(b,c):=x.dt.replace]
I was having the same issue and came across this question with no answer. The comments above helped me find the solution to my problem, so I decided to post it here. It may simply be a difference between data.table versions (I am using version 1.11.8), since this is a relatively old question.
The solution uses a () instead of a .() or a list() to declare the column names to be replaced:
colunas <- c("b","c")
x.dt[is.na(b), (colunas) := x.dt.replace]
Hope this is useful
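For completeness, here is a self-contained sketch of the same replacement, assuming the data.table package is installed. It uses the MWE data from the question and passes the column names as an inline character vector, a form := also accepts, instead of going through a variable:

```r
library(data.table)

x.dt <- data.table(a = c(1, 2, 3), b = c(2, NA, NA), c = c(3, NA, NA))
x.dt.replace <- data.table(b = c(10, 11), c = c(22, 21))

# The character vector on the left of := names the columns to overwrite;
# the right-hand side supplies one replacement column per name.
x.dt[is.na(b), c("b", "c") := x.dt.replace]

x.dt
```

The update happens by reference, so x.dt itself is modified and no reassignment is needed.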

How to operate non-standard-evaluation in correct manner for summarize{dplyr}

I want to pass variables to 'summarize' by way of non-standard-evaluation approach (see http://adv-r.had.co.nz/Computing-on-the-language.html#capturing-expressions).
My script is as follows:
library(dplyr)
library(pryr)
x2<-data.frame(x=runif(1000,1,10),y=rnorm(1:1000))
y2<-group_by(x2,x)
field2<-"x"
z<-substitute(summarize(y2,check=sum(x)),list(x=as.name(field2)))
eval(quote(z),parent.frame())
But the output is not a dataframe as I supposed but a string:
>eval(quote(z),parent.frame())
summarize(y2, check = sum(x))
I am a little bit confused with non-standard-evaluation although I have looked through a number of examples.
Could you specify what is wrong with my approach?
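One likely cause can be reproduced in base R without dplyr. quote(z) produces the symbol z, so evaluating it merely looks up z and returns the stored call without running it. Evaluating z directly runs the call. The sketch below uses made-up data and names (measurements, total_call) purely for illustration:

```r
field_name <- "x"
measurements <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))

# Build an unevaluated call, replacing the placeholder `column` with the symbol x
total_call <- substitute(sum(measurements$column),
                         list(column = as.name(field_name)))

# quote(total_call) is just the symbol; evaluating it returns the call itself
still_a_call <- eval(quote(total_call))

# Evaluating the call object runs sum(measurements$x)
total <- eval(total_call)

is.call(still_a_call)
total
```

Under that reading, replacing eval(quote(z), parent.frame()) with eval(z) in the script above would be the analogous change.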

Approaches to preserving object's attributes during extract/replace operations

Recently I encountered the following problem in my R code. In a function, accepting a data frame as an argument, I needed to add (or replace, if it exists) a column with data calculated based on values of the data frame's original column. I wrote the code, but the testing revealed that data frame extract/replace operations, which I've used, resulted in a loss of the object's special (user-defined) attributes.
After realizing that and confirming that behavior by reading R documentation (http://stat.ethz.ch/R-manual/R-patched/library/base/html/Extract.html), I decided to solve the problem very simply - by saving the attributes before the extract/replace operations and restoring them thereafter:
myTransformationFunction <- function (data) {
  # save object's attributes
  attrs <- attributes(data)
  # <data frame transformations; involves extract/replace operations on `data`>
  # restore the attributes
  attributes(data) <- attrs
  return (data)
}
This approach worked. However, accidentally, I ran across another piece of R documentation (http://stat.ethz.ch/R-manual/R-patched/library/base/html/Extract.data.frame.html), which offers IMHO an interesting (and, potentially, a more generic?) alternative approach to solving the same problem:
## keeping special attributes: use a class with a
## "as.data.frame" and "[" method:
as.data.frame.avector <- as.data.frame.vector
`[.avector` <- function(x,i,...) {
  r <- NextMethod("[")
  mostattributes(r) <- attributes(x)
  r
}
d <- data.frame(i = 0:7, f = gl(2,4),
u = structure(11:18, unit = "kg", class = "avector"))
str(d[2:4, -1]) # 'u' keeps its "unit"
I would really appreciate if people here could help by:
Comparing the two above-mentioned approaches, if they are comparable (I realize that the second approach as defined is for data frames, but I suspect it can be generalized to any object);
Explaining the syntax and meaning in the function definition in the second approach, especially as.data.frame.avector, as well as what is the purpose of the line as.data.frame.avector <- as.data.frame.vector.
I'm answering my own question, since I have just found an SO question (How to delete a row from a data.frame without losing the attributes), answers to which cover most of my questions posed above. However, additional explanations (for R beginners) for the second approach would still be appreciated.
UPDATE:
Another solution to this problem has been proposed in an answer to the following SO question: indexing operation removes attributes. Personally, however, I prefer the approach based on creating a new class, as it is IMHO semantically cleaner.
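As a hedged variation on the save-and-restore approach, the sketch below restores only the user-defined attributes. attributes(data) also contains names, row.names and class, and restoring those wholesale would undo or conflict with any added columns or removed rows. The function name add_doubled_column, the value column and the "source" attribute are all made up for the example:

```r
add_doubled_column <- function(data) {
  # Attributes that the data frame machinery maintains itself
  structural <- c("names", "row.names", "class")
  all_attrs <- attributes(data)
  custom <- all_attrs[setdiff(names(all_attrs), structural)]

  data <- data[data$value > 1, , drop = FALSE]  # row extraction may drop custom attributes
  data$doubled <- data$value * 2                # add (or replace) a derived column

  # Restore only the user-defined attributes
  for (attr_name in names(custom)) {
    attr(data, attr_name) <- custom[[attr_name]]
  }
  data
}

readings <- data.frame(value = c(1, 2, 3))
attr(readings, "source") <- "survey"
result <- add_doubled_column(readings)
attr(result, "source")
```

This keeps the new column and the reduced row set intact while carrying the custom metadata across the transformation.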
