paste0, subset Error: 'subset' must be logical

I would like to use paste0 to create a long string containing the conditions for the subset function.
I tried the following:
#rm(list=ls())
set.seed(1)
id<-1:20
ids<-sample(id, 3)
d <- subset(id, noquote(paste0("id==",ids,collapse="|")))
I get the following error:
Error in subset.default(id, noquote(paste0("id==", ids, collapse = "|"))) :
'subset' must be logical
I tried the same without noquote. Interestingly, when I run
noquote(paste0("id==",ids,collapse="|"))
I get [1] id==4|id==7|id==1. When I then paste this by hand into the subset call
d2<-subset(id,id==4|id==7|id==1)
everything runs fine. But why does subset(id, noquote(paste0("id==",ids,collapse="|"))) not work, although it seems to be the same? Thanks a lot for your help!
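Why it fails: paste0() returns a character string, and noquote() only changes how that string is printed, so subset() receives text rather than the logical vector it requires. A minimal sketch of two ways around this, reusing the question's id and ids. Building the logical vector directly with %in% is usually the simpler choice; parse()/eval() is shown only because it keeps the string-building approach:

```r
set.seed(1)
id <- 1:20
ids <- sample(id, 3)

# This is only text such as "id==4|id==7|id==1", not a logical vector
cond <- paste0("id==", ids, collapse = "|")

# Option 1: compute the logical vector directly
d1 <- subset(id, id %in% ids)

# Option 2: turn the string into an R expression, then evaluate it to get TRUE/FALSE values
d2 <- subset(id, eval(parse(text = cond)))
```

Both calls return the same elements of id, in their original order.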

Related

Keep getting an error when trying to use !duplicated function in R

I have imported an xlsx document into R, and I have found several duplicates in the document. When I try to delete those duplicates using the !duplicated() function, it keeps giving me the following error:
Error: Must subset columns with a valid subscript vector.
ℹ Logical subscripts must match the size of the indexed input.
x Input has size 30 but subscript !duplicated(export) has size 33376.
Below is the code I have so far:
cb<-read.csv("120Water Request_KE.csv")
export <- read_xlsx("Anderson, IN _ Ziptility Export_KE.xlsx")
cb<-clean_names(cb)
export<-clean_names(export)
export <- export[!duplicated[[export, ]
Thank you
I think what you are looking for is:
export <- export[!duplicated(export),]
or
library(tidyverse)
export <- export %>%
distinct(., .keep_all = TRUE)
This is a simple syntax error.
It's difficult to rewrite your code without access to your dataset, but the syntax you're looking for is probably this:
export[!duplicated(export$column_name), ]
Just change your square brackets to round ones and specify which column you want it to check. (A by = "column_name" argument only works if export is a data.table; duplicated() on a data frame or tibble does not use it.)
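A minimal sketch of both forms on a small hypothetical data frame (the columns account and address are made up, not from the asker's file). The comma after the logical vector is what makes it select rows:

```r
# Hypothetical stand-in for the imported spreadsheet
export <- data.frame(
  account = c("A1", "A2", "A1", "A3", "A2"),
  address = c("1 Main", "2 Oak", "1 Main", "3 Elm", "9 Pine"),
  stringsAsFactors = FALSE
)

# Drop rows that are exact duplicates across all columns
no_dup_rows <- export[!duplicated(export), ]

# Drop rows whose 'account' value has already appeared earlier
no_dup_account <- export[!duplicated(export$account), ]
```

Here no_dup_rows keeps 4 rows (only the repeated "A1"/"1 Main" row is removed), while no_dup_account keeps 3, because the second "A2" row differs only in address.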

Why am I getting Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

My code is as follows:
my_filtered_data <- my_data[, colSums(my_data != 0) >= 300]
set.seed(123)
data1.csv <- my_filtered_data[sample(nrow(my_filtered_data), 200), ]
data2.csv <- data.frame(data1.csv)
data3.csv <- scale(data2.csv, center = TRUE) # Gives error.
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
Can someone explain why I am receiving this error?
This is a bit long for a comment and may or may not answer the question, but I think this could be one of the issues with the OP's dataset.
You used data.frame(), and before R 4.0.0 its stringsAsFactors argument defaulted to TRUE, so it probably converted one of your columns to a factor. That is why you are getting this error. One way to avoid it is to put options(stringsAsFactors = FALSE) at the top of your code, or to use data.frame(your_object, stringsAsFactors = FALSE).
To recreate the error, you can use the iris dataset:
scale(iris[,1:5], center=TRUE, scale=TRUE)
## This fails with the same error because the last column of iris (Species) is a factor
but this works:
scale(iris[,1:4], center=TRUE, scale=TRUE)
Note that I am dropping the column here. In your case you might want to convert it to numeric instead; it depends on what you are trying to do. If you do want to convert a factor to numeric, try as.numeric(as.character(your_column)).
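The as.character() step matters because as.numeric() on a factor returns the internal level codes, not the printed values. A small sketch with a hypothetical factor f:

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("10", "5", "20"))

# Levels are sorted as text ("10", "20", "5"), so this gives level codes, not values
codes <- as.numeric(f)

# Converting to character first recovers the printed numbers
values <- as.numeric(as.character(f))
```

codes is 1, 3, 2, while values is 10, 5, 20.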
Also, as suggested in the comments, try to avoid object names that contain dots (such as data1.csv) in R.
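In short, try this:
data.frame(data1.csv, stringsAsFactors=FALSE), then run the scale command. Keep in mind that scale() needs every column to be numeric: a character column fails with the same error, so drop or convert any remaining non-numeric columns first.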
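A minimal sketch of keeping only the numeric columns before scaling, using a hypothetical data frame my_df. scale() passes its input through as.matrix(), so a single non-numeric column turns the whole matrix into character:

```r
# Hypothetical data frame with one text column
my_df <- data.frame(a = c(1, 2, 3),
                    b = c(10, 20, 30),
                    label = c("x", "y", "z"),
                    stringsAsFactors = FALSE)

# Reproduce the error: the character column makes as.matrix() return a character matrix
res <- tryCatch(scale(my_df), error = function(e) conditionMessage(e))

# TRUE for each numeric column
num_cols <- vapply(my_df, is.numeric, logical(1))

# Scale only the numeric columns
scaled <- scale(my_df[, num_cols])
```

res holds the "'x' must be numeric" message, and scaled is a 3 x 2 matrix with centred columns.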

data.table - Extract all the text features

As part of a function, I am trying to isolate all features that are either character or factor. My data set is a data.table.
text_features <- c(names(data_set[sapply(data_set, is.character)]), names(data_set[sapply(data_set, is.factor)]))
When I run the function, I get an exception message that says:
Error in [.data.table(data_set, sapply(data_set, is.character)) :
i evaluates to a logical vector length 87 but there are 12992 rows. Recycling of logical i is no longer allowed as it hides more bugs than is worth the rare convenience. Explicitly use rep(...,length=.N) if you really need to recycle.
I understand this error is thrown by recent versions of data.table. How should I change my code so it works the same way and avoids this error?
Note:
packageVersion("data.table")
[1] ‘1.10.4.3’
Thanks
You are getting this error because the comma is missing when you subset your inner data.tables. Without it, data.table treats the logical vector as a row subset (i), but you want a subset of the columns:
data_set[sapply(data_set, is.character)] # subsetting rows
data_set[,sapply(data_set, is.character), with = FALSE] # subsetting columns
All that said, I think a much cleaner solution would be:
text_cols <- names(data_set)[sapply(data_set, function(x) is.character(x) || is.factor(x))] # is.factor() also catches ordered factors
data_set[, ..text_cols] # subset data
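A minimal sketch of the column-selection approach on a small hypothetical table dt (it assumes the data.table package is installed). Testing each column with is.character() or is.factor() also picks up ordered factors, whose class() has two entries:

```r
library(data.table)

# Hypothetical example table
dt <- data.table(id = 1:4,
                 name = c("a", "b", "c", "d"),
                 grp = factor(c("x", "y", "x", "y")),
                 score = c(1.5, 2.5, 3.5, 4.5))

# TRUE for each column that holds text (character or factor)
is_text <- vapply(dt, function(col) is.character(col) || is.factor(col), logical(1))
text_cols <- names(dt)[is_text]

# with = FALSE makes j a vector of column names rather than an expression
text_dt <- dt[, text_cols, with = FALSE]
```

text_cols is c("name", "grp"), and dt[, ..text_cols] returns the same columns as text_dt.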

Naming elements of a list as a function of x in r

Here I am trying to name the individual elements of this list as a function of x so that I can index it later, as one would with a data frame or vector, yet I keep getting the error message:
Error: unexpected '=' in "Indxlist <- sapply(1:1600, function(x) list( (x) ="
Here is the code that I am attempting to use...
Indxlist <- sapply(1:1600, function(x) list( (x) = dataframe1[,x]))
Thanks!
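I think this cannot work: the name on the left of = inside list() must be a literal name, not an expression such as (x), so it cannot depend on x. Just assign the names after your command (numeric-looking names are not good practice anyway):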
names(Indxlist) <- 1:1600
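A minimal sketch on a hypothetical three-column dataframe1, building the list with lapply() and naming it afterwards. setNames() is a base R shortcut that does both steps in one expression; names are stored as character, so 1, 2, 3 become "1", "2", "3":

```r
# Hypothetical stand-in for dataframe1
dataframe1 <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# One list element per column, named afterwards
indx_list <- lapply(seq_along(dataframe1), function(x) dataframe1[, x])
names(indx_list) <- seq_along(dataframe1)

# Same result in a single expression
indx_list2 <- setNames(lapply(seq_along(dataframe1), function(x) dataframe1[, x]),
                       seq_along(dataframe1))
```

indx_list[["2"]] then returns the second column, 4:6.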

Using apply() over columns to output subsets

I have a data frame in R where the majority of columns are values, but there is one character column. For each column excluding the character column I want to subset the values that are over a threshold and obtain the corresponding value in the character column.
I'm unable to find a built-in dataset that contains the pattern of data I want, so a dput of my data can be accessed here.
When I use subsetting, I get the output I'm expecting:
> df[abs(df$PA3) > 0.32,1]
[1] "SSI_01" "SSI_02" "SSI_04" "SSI_05" "SSI_06" "SSI_07" "SSI_08" "SSI_09"
When I try to iterate over the columns of the data frame using apply, I get a recursive indexing error:
> apply(df[2:10], 2, function(x) df[abs(df[[x]])>0.32, 1])
Error in .subset2(x, i, exact = exact) :
recursive indexing failed at level 2
Any suggestions where I'm going wrong?
The reason your solution didn't work is that the x passed to your user-defined function is not a column name or index but the column itself (a vector of values), so df[[x]] attempts recursive indexing with that whole vector. Therefore, you could get your solution working with a small modification (replacing df[[x]] with x):
apply(df[2:10], 2, function(x) df[abs(x)>0.32, 1])
You could use the ... argument to apply to pass an extra argument. In this case, you would want to pass the first column:
apply(df[2:10], 2, function(x, y) y[abs(x) > 0.32], y=df[,1])
Yet another variation:
apply(abs(df[-1]) > .32, 2, subset, x=df[[1]])
The cute trick here is to "curry" subset by specifying the x parameter. I was hoping I could do it with [, but that doesn't handle named arguments in the typical way because it is a primitive function.
A quick and non-sophisticated solution might be:
sapply(2:10, function(x) df[abs(df[,x])>0.32, 1])
Try:
lapply(df[,2:10],function(x) df[abs(x)>0.32, 1])
Or using apply:
apply(df[2:10], 2, function(x) df[abs(x)>0.32, 1])
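A minimal sketch on a small hypothetical data frame (the item names and loadings are made up, not the asker's dput). lapply() always returns a named list with one element per column, whereas apply() and sapply() simplify to a matrix when every column happens to produce the same number of matches:

```r
# Hypothetical data: one character column followed by numeric loadings
df <- data.frame(item = c("SSI_01", "SSI_02", "SSI_03", "SSI_04"),
                 PA1 = c(0.50, -0.10, 0.40, 0.05),
                 PA2 = c(0.02, -0.60, 0.33, 0.90),
                 stringsAsFactors = FALSE)

# Each column is passed in as a plain numeric vector
hits <- lapply(df[-1], function(col) df$item[abs(col) > 0.32])
```

hits$PA1 is c("SSI_01", "SSI_03") and hits$PA2 is c("SSI_02", "SSI_03", "SSI_04").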
