Using values from a vector in R

Suppose I have a vector x = c(3,2,1) and a data frame d. I want to add a column to d that takes the value 1 where d$x equals 3 and 0 otherwise. This can be done with a simple ifelse. My problem is that I want the new column to be named var_3 (without quotes, obviously), where the 3 is extracted from x[1].
I have tried:
d$paste("var",x[1],sep="_")=ifelse(d$x==x[1],1,0)
which gives the error: target of assignment expands to non-language object. I assume this is because paste gives me my desired var_3, but as a quoted string. I have tried noquote too, with no luck.

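This won't work with the $ operator, which only accepts a literal name and does not evaluate what follows it, but it does work with the [ subscript operator, which evaluates its argument: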
d[, paste("var", x[1], sep="_")] <- ifelse(d$x == x[1], 1, 0)
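A minimal runnable sketch of the same idea. The question does not show d, so the data below is made up for illustration; the commented-out [[<- line is an equivalent way to assign a single column by a computed name.

```r
# Made-up data: the question does not show the contents of d.
x <- c(3, 2, 1)
d <- data.frame(x = c(1, 3, 2, 3))

# paste() returns an ordinary string, here "var_3".
new_name <- paste("var", x[1], sep = "_")

# [ (and [[) accept a computed name; $ only accepts a literal one.
d[, new_name] <- ifelse(d$x == x[1], 1, 0)
# d[[new_name]] <- ifelse(d$x == x[1], 1, 0)   # equivalent alternative

print(d)
```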

Related

Subset data.table based on value in column of type list

I have a data.table with one column of type list.
The elements of this list can hold various values, including NULL.
I tried to subset the data.table to keep only the rows for which this column is NULL.
Here are my attempts (for the example I named the column "ColofTypeList"):
DT[is.null(ColofTypeList)]
This returns an empty data.table.
Then I tried:
DT[ColofTypeList == NULL]
It returns the following error (I expected an error):
Error in .prepareFastSubset(isub = isub, x = x, enclos = parent.frame(), :
RHS of == is length 0 which is not 1 or nrow (96). For robustness, no recycling is allowed (other than of length 1 RHS). Consider %in% instead.
(For context, my original data.table has 96 rows, which is why the error message says
which is not 1 or nrow (96).
The number of rows is not the point.)
Then I tried this:
DT[ColofTypeList == list(NULL)]
It returns the following error:
Error: comparison of these types is not implemented
I also tried comparing against a list of the same length as the column, and got the same error.
So my question is simple: what is the correct data.table way to subset the rows for which the elements of "ColofTypeList" are NULL?
EDIT: here is a reproducible example
library(data.table)
DT <- data.table(Random_stuff = 1:9, ColofTypeList = rep(list(NULL, "hello", NULL), 3))
Have fun!
Since the column is a list, we can loop through it and apply is.null to each element, which returns a logical vector:
DT[unlist(lapply(ColofTypeList, is.null))]
#    ColofTypeList anotherCol
# 1:                        3
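Another option is lengths(), which gives 0 for NULL elements (note that it also gives 0 for empty vectors such as character(0)):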
DT[lengths(ColofTypeList)==0]
Data used:
library(data.table)
DT <- data.table(ColofTypeList = list(0, 1:5, NULL, NA), anotherCol = 1:4)
I have found another way that is also quite nice:
DT[lapply(ColofTypeList, is.null) == TRUE]
It is also worth mentioning that isTRUE() doesn't work here: it is not vectorised, so it returns a single FALSE for the whole list instead of one value per row.
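A sketch using the question's reproducible example, assuming the data.table package is installed. It shows why the first attempt returned nothing, and uses vapply() (a swapped-in alternative to unlist(lapply(...)) that guarantees one logical per row). The last lines illustrate where lengths() == 0 and is.null() disagree.

```r
library(data.table)

# Reproducible example from the question.
DT <- data.table(Random_stuff = 1:9,
                 ColofTypeList = rep(list(NULL, "hello", NULL), 3))

# is.null() on the whole column asks whether the column itself is NULL.
# It is not, so this is a single FALSE, hence the empty result.
whole_column_null <- is.null(DT$ColofTypeList)

# Test each element instead; vapply() returns one logical per row.
null_rows <- DT[vapply(ColofTypeList, is.null, logical(1))]

# lengths() == 0 also matches zero-length vectors, not only NULL.
elements <- list(NULL, character(0), "hello")
lengths(elements) == 0                   # TRUE TRUE FALSE
vapply(elements, is.null, logical(1))    # TRUE FALSE FALSE
```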

R selecting rows from dataframe using logical indexing: accessing columns by `$` vs `[]`

I have a simple R data.frame object df. I am trying to select rows from this dataframe based on logical indexing from a column col in df.
I am coming from Python, where I can select with either df[df[col] == 1] or df[df.col == 1] and get the same result.
However, in R, df[df$col == 1] gives a different (incorrect) result from df[df[,col] == 1] (confirmed with summary). I don't understand this difference, since links like http://adv-r.had.co.nz/Subsetting.html suggest that either way is fine. Also, str shows the same output for df$col and df[, col].
Are there any guidelines about when to use the $ vs. [] operator?
Edit:
Digging a little deeper, and using this question as a reference, it seems that the following code works correctly:
df[which(df$col == 1), ]
However, it is not clear to me how to guard against NA, or when to use which.
You confused many things.
In
df[,col]
col is usually a column number, although a column name given as a string also works. For example,
col = 2
x = df[,col]
would select the second column and store it in x.
In
df$col
col should be the column name. For example,
df=data.frame(aa=1:5,bb=10:14)
x = df$bb
would select the second column and store it in x. But you cannot write df$2.
Finally,
df[[col]]
is the same as df[,col] if col is a number. If col is a character ("character" in R means the same as string in other languages), then it selects the column with this name. Example:
df=data.frame(aa=1:5,bb=10:14)
foo = "bb"
x = df[[foo]]
y = df[[2]]
z = df[["bb"]]
Now x, y, and z all contain a copy of the second column of df.
The notation foo[[bar]] comes from lists, and the notation foo[,bar] comes from matrices. Since a data frame has features of both a matrix and a list, it supports both.
Use $ when you want to select one specific column by name: df$col_name.
Use [] when you want to select one or more columns by number (or by name):
df[,1] # select column with index 1
df[,1:3] # select columns with indexes 1 to 3
df[,c(1,3:5,7)] # select columns with indexes 1, 3 to 5 and 7
[[]] is mostly for lists.
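A small sketch using the answer's own df, checking which forms return the same vector and which return a data frame. It also shows that [ , ] accepts a column name held in a variable, not only a number.

```r
df <- data.frame(aa = 1:5, bb = 10:14)
col_name <- "bb"

v1 <- df$bb            # literal name only
v2 <- df[["bb"]]       # list-style extraction by name
v3 <- df[[col_name]]   # name held in a variable also works
v4 <- df[, 2]          # matrix-style, single column drops to a vector
v5 <- df[, col_name]   # matrix-style by name

several <- df[, 1:2]   # several columns: still a data frame
one_col <- df[2]       # single [ with no comma: a one-column data frame
```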
EDIT: df[which(df$col == 1), ] works because which() takes the logical vector df$col == 1 and returns the integer positions of its TRUE elements, dropping both FALSE and NA. Passing those positions to df[ , ] returns only the matching rows, with no all-NA rows for missing values.
See "Remove rows with NAs (missing values) in data.frame" to find out more about how to deal with missing values. It is always good practice to exclude missing values from the dataset.
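A sketch of the two pitfalls in the question, using made-up data with an NA in col: leaving out the comma makes [ select columns rather than rows, and a logical row index containing NA produces an extra all-NA row, which which() (or an explicit !is.na() guard) avoids.

```r
# Made-up data for illustration: col contains an NA.
df <- data.frame(col = c(1, NA, 2, 1), other = c("a", "b", "c", "d"))

# Without a comma, df[...] indexes columns (list-style), not rows,
# so df[df$col == 1] is not a row filter at all.

# With a comma, the NA in the condition yields an extra all-NA row.
with_na <- df[df$col == 1, ]
nrow(with_na)                    # 3: rows 1 and 4, plus an NA row

# which() keeps only the positions of TRUE values, dropping NA.
pos <- which(df$col == 1)        # 1 4
rows_which <- df[pos, ]

# An explicit guard against NA selects the same rows.
rows_guard <- df[!is.na(df$col) & df$col == 1, ]
```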

Subsetting a Spatial Data Frame using Input from InputSelect [duplicate]

I'm wondering how to use the subset function if I don't know the name of the column I want to test. The scenario is this: I have a Shiny app where the user can pick a variable on which to filter (subset) the data table. I receive the column name from the webapp as input, and I want to subset based on the value of that column, like so:
subset(myData, THECOLUMN == someValue)
Except where both THECOLUMN and someValue are variables. Is there a syntax for passing the desired column name as a string?
subset seems to want a bare word that is the column name, not a variable that holds the column name.
Both subset and with are designed for interactive use, and their help pages warn against using them within other functions. This stems from their strategy of evaluating arguments as expressions within an environment constructed from the names of their data arguments. These column/element names would otherwise not be "objects" in the R sense.
If THECOLUMN is the name of an object whose value is the name of the column and someValue is the name of an object whose value is the target, then you should use:
dfrm[ dfrm[[THECOLUMN]] == someValue , ]
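For the Shiny case in the question, the same idea can be wrapped in a small function. This is a sketch: filter_by_column is a hypothetical name, and the two variables stand in for whatever the app's input controls return. It uses only [ and [[, so both the column name and the value can be ordinary variables.

```r
# Hypothetical helper: keep the rows where data[[column]] equals value.
filter_by_column <- function(data, column, value) {
  if (!column %in% names(data)) {
    stop("unknown column: ", column)
  }
  keep <- data[[column]] == value
  # which() drops NA comparisons instead of returning all-NA rows.
  data[which(keep), , drop = FALSE]
}

d <- data.frame(x = letters[1:5], y = 1:5)
THECOLUMN <- "x"   # in a Shiny app this might come from an input control
someValue <- "c"
res <- filter_by_column(d, THECOLUMN, someValue)
```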
The fact that "[[" evaluates its argument is why it is superior to "$" for programming. If we use joran's example:
d <- data.frame(x = letters[1:5],y = runif(5))
THECOLUMN= "x"
someValue= "c"
d[ d[[THECOLUMN]] == someValue , ]
# x y
# 3 c 0.7556127
So in this case all these return the same atomic vector:
d[[ THECOLUMN ]]
d[[ 'x' ]]
d[ , 'x' ]
d[, THECOLUMN ]
d$x # of the three extraction functions: `$`, `[[`, and `[`,
# only `$` is unable to evaluate its argument
This is precisely why subset is a bad tool for anything other than interactive use:
d <- data.frame(x = letters[1:5],y = runif(5))
d[d[,'x'] == 'c',]
#   x         y
# 3 c 0.3080524
Fundamentally, extracting things in R is built around [. Use it.
I think you could use the following one-liner:
myData[ , grep(THECOLUMN, colnames(myData))]
where
colnames(myData)
outputs a vector containing all column names and
grep(THECOLUMN, colnames(myData))
should result in a numeric vector of length 1 (assuming the pattern matches exactly one column name) pointing to your column. See ?grep for information about pattern matching in R.
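One caveat with grep(): it does regular-expression substring matching, so a pattern can match more than one column name. A sketch with made-up column names, comparing it with exact matching via match(), which returns the position of the first exact match (or NA):

```r
d <- data.frame(x = 1:3, max = 4:6, y = 7:9)   # made-up columns

grep("x", colnames(d))     # 1 2: both "x" and "max" contain "x"
match("x", colnames(d))    # 1: exact match only

# Selecting by exact name avoids the problem entirely.
by_bracket <- d[, "x"]
by_double  <- d[["x"]]
```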


Losing Class information when I use apply in R

When I pass a row of a data frame to a function using apply, I lose the class information of the elements of that row: they all turn into 'character'. The following is a simple example. I want to add a couple of years to the Three Stooges' ages. When I try to add 2 to a value that had been numeric, R says "non-numeric argument to binary operator." How do I avoid this?
age = c(20, 30, 50)
who = c("Larry", "Curly", "Mo")
df = data.frame(who, age)
colnames(df) <- c( '_who_', '_age_')
dfunc <- function (er) {
print(er['_age_'])
print(er[2])
print(is.numeric(er[2]))
print(class(er[2]))
return (er[2] + 2)
}
a <- apply(df,1, dfunc)
Output follows:
_age_
"20"
_age_
"20"
[1] FALSE
[1] "character"
Error in er[2] + 2 : non-numeric argument to binary operator
apply only really works on matrices (which have the same type for all elements). When you run it on a data.frame, it simply calls as.matrix first.
The easiest way around this is to work on the numeric columns only:
# skips the first column
a <- apply(df[, -1, drop=FALSE],1, dfunc)
# Or in two steps:
m <- as.matrix(df[, -1, drop=FALSE])
a <- apply(m,1, dfunc)
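The drop=FALSE is needed so that selecting a single column still gives a one-column data frame (and hence a one-column matrix) rather than a plain vector, which apply cannot iterate over by row.
-1 means all-but-the-first column; you could instead explicitly specify the columns you want, for example df[, c('foo', 'bar')]. Note that each row passed to dfunc then holds only the selected columns, so dfunc should index by name (er['_age_']) rather than by position: here er[2] would be NA.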
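A sketch that reproduces the coercion and the fix with the question's data. add_two is a hypothetical replacement for dfunc that indexes by name, since each row now contains only the _age_ column. The last line shows that for this particular task a plain vectorised expression makes apply unnecessary.

```r
age <- c(20, 30, 50)
who <- c("Larry", "Curly", "Mo")
df <- data.frame(who, age)
colnames(df) <- c("_who_", "_age_")

# apply() calls as.matrix() first; a mixed data frame becomes character.
coerced_class <- class(as.matrix(df)[1, "_age_"])   # "character"

# Keeping only the numeric column keeps the matrix numeric.
# Each row has a single element, so index it by name.
add_two <- function(er) er[["_age_"]] + 2
a <- apply(df[, -1, drop = FALSE], 1, add_two)

# For this task no row-wise loop is needed at all.
b <- df[["_age_"]] + 2
```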
UPDATE
If you want your function to access one full data.frame row at a time, there are (at least) two options:
# "loop" over the index and extract a row at a time
sapply(seq_len(nrow(df)), function(i) dfunc(df[i,]))
# Use split to produce a list where each element is a row
sapply(split(df, seq_len(nrow(df))), dfunc)
The first option is probably better for large data frames since it doesn't have to create a huge list structure upfront.
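If the function needs several columns of the same row, another option (not from the original answer) is mapply(), which walks the columns in parallel so that each argument keeps its own class. A sketch with a hypothetical describe() function:

```r
age <- c(20, 30, 50)
who <- c("Larry", "Curly", "Mo")
df <- data.frame(who, age)
colnames(df) <- c("_who_", "_age_")

# Hypothetical row-wise function: each argument arrives with its own type.
describe <- function(name, years) {
  stopifnot(is.numeric(years))
  paste(name, "will be", years + 2)
}

# One call per row; columns are passed in parallel, never through a matrix.
res <- mapply(describe, df[["_who_"]], df[["_age_"]], USE.NAMES = FALSE)
```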
