R - selecting data with [a:b, c:d] not working

I have a data set with 285 rows and 70 columns (x__1 to x__70). I'm trying to select rows 12 and 13, from column x__5 through x__70.
I can select individual cells WPP[12, "x__5"], full columns WPP[, "x__5"], full rows WPP[12, ], and ranges of rows WPP[12:13, ], but I can't select ranges of columns.
I'd like WPP[12:13,"X__5":"x__70"] but I get this:
Error in "X__5":"x__70" : NA/NaN argument
In addition: Warning messages:
1: In check_names_df(j, x) : NAs introduced by coercion
2: In check_names_df(j, x) : NAs introduced by coercion

I also ran into this problem while converting some SAS code to R. R's bracket indexing has no syntax for a range of column names: : only builds numeric sequences, so "X__5":"x__70" tries to convert both strings to numbers, gets NA, and fails with the NA/NaN error.
You therefore have to use the available tools: subset columns by name (listing every column name explicitly) or by position index.
Below are two solutions:
WPP[12:13, paste0("X__", 5:70)] # explicit column names
requiredColIndex <- which(names(WPP) %in% paste0("X__", 5:70))
WPP[12:13, requiredColIndex] # using the index of columns
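If you would rather keep a compact first:last range, one option is to look up the positions of the two boundary names with match() and build the numeric range from those. The sketch below uses a made-up WPP with columns X__1 to X__70, and it assumes the columns you want are contiguous in the data frame.

```r
# Hypothetical stand-in for WPP: 285 rows, columns X__1 ... X__70
WPP <- as.data.frame(matrix(seq_len(285 * 70), nrow = 285,
                            dimnames = list(NULL, paste0("X__", 1:70))))

# `:` only works on numbers, so find the positions of the first and
# last column names, then build the numeric range between them
first_col <- match("X__5", names(WPP))
last_col  <- match("X__70", names(WPP))
sub <- WPP[12:13, first_col:last_col]

dim(sub)  # 2 rows, 66 columns (X__5 through X__70)
```

Unlike the paste0() approach, this does not depend on the column names following a numeric pattern; it only relies on their order.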
You can also use the subset function, as pointed out in the comments; its select argument accepts a range of bare column names such as wt:gear.
rowIndex <- 1:nrow(mtcars) %in% 12:15
subset(mtcars, rowIndex, select=wt:gear)
Selecting rows by position inside subset is a bit awkward, though, because its subset argument expects a logical vector, as the code above shows. Alternatively, you can index the rows first:
subset(mtcars[12:15, ], select=wt:gear)
which gives the same result.
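A quick way to check that the two subset() calls agree, and that they match plain bracket indexing with a positional range, is to compare them with identical() on the built-in mtcars data (the variable names below are just for illustration):

```r
# Logical row index marking rows 12 to 15, as in the first subset() call
rowIndex <- seq_len(nrow(mtcars)) %in% 12:15
via_logical <- subset(mtcars, rowIndex, select = wt:gear)

# Index rows with [ first, then let subset() pick the column range
via_rows_first <- subset(mtcars[12:15, ], select = wt:gear)

# Same column range with plain [ indexing, using column positions
cols <- match("wt", names(mtcars)):match("gear", names(mtcars))
by_position <- mtcars[12:15, cols]

identical(via_logical, via_rows_first)
identical(via_logical, by_position)
```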
That said, if you are already subsetting rows with the data-frame [ , ] syntax, it is usually simpler to subset the columns the same way.

Related

Can't select row in R - can't figure out why

I am trying to remove two rows from my dataset with this simple line of code:
my_data_screen <- my_data [-influential]
However, I get the error message Error: Can't negate columns that don't exist.
(The "influential" variable simply contains two numbers of rows, which is the result of calculating outliers from my sample.)
Even when I try to do something as simple as selecting a specific row (i.e. my_data [37]), I get the same error message.
Why is R interpreting my command as targeting columns, rather than rows?
Hi, with your code R cannot tell whether you are selecting rows or columns; with a single index and no comma, a data frame is indexed by column.
As @ThomasIsCoding suggested, you should use:
my_data_screen <- my_data[-influential,]
The comma indicates that the index refers to rows. If you want to delete columns instead, put the index after the comma:
my_data_screen <- my_data[,-influential]
In summary, the position of the comma tells R whether you want to delete rows or columns.
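A small base R sketch of the comma rule, on a made-up data frame (my_data and influential here are hypothetical stand-ins). The same rule applies to tibbles; the wording of the error in the question suggests my_data is a tibble, which refuses to drop columns that do not exist.

```r
# Hypothetical data: 5 rows, 3 columns
my_data <- data.frame(a = 1:5, b = letters[1:5], c = 11:15)
influential <- c(2, 4)  # hypothetical row numbers of the outliers

# Index BEFORE the comma: drop rows 2 and 4, keep every column
my_data_screen <- my_data[-influential, ]

# Index AFTER the comma: drop column 2 (b), keep every row
no_b <- my_data[, -2]

# No comma: a single index selects COLUMNS, as with a list
third_col <- my_data[3]    # one-column data frame holding column c
third_row <- my_data[3, ]  # row 3 with all three columns
```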
If my_data is a data.frame, then you should use
my_data[37, ]
since my_data[37] is indexing my_data in terms of columns by default.
For more on indexing, see https://rspatial.org/intr/4-indexing.html
If you use the tidyverse, you can use:
The filter() function to remove rows: filter(!(row_number() %in% influential))
The select() function to remove columns: select(-influential)

How to use filter_if to select rows where any one of a number of logical variables in the data frame is true

I have a large data frame where rows are medicines with one or more component generic drugs. The DF has 21 logical variables indicating whether the medicine contains one of a set of 21 generic drugs I want to filter for. Can I use filter_if to identify all the rows where any of these 21 variables is TRUE? Assuming I can, I think I am having trouble with the syntax for filter_if.
Here is my attempt so far and the errors I'm getting. In the following code, the variables I am testing are in the columns "IBUPROFEN":"BAICALIN/CATECHIN".
These are all logical TRUE/FALSE variables.
> Drug_Table_NamesNumberSML %>%
+ select("IBUPROFEN":"BAICALIN/CATECHIN") %>%
+ filter_if(isTRUE("IBUPROFEN":"BAICALIN/CATECHIN"))
Error in "IBUPROFEN":"BAICALIN/CATECHIN" : NA/NaN argument
In addition: Warning messages:
1: In isTRUE("IBUPROFEN":"BAICALIN/CATECHIN") : NAs introduced by coercion
2: In isTRUE("IBUPROFEN":"BAICALIN/CATECHIN") : NAs introduced by coercion
I don't understand where the NA/NaN argument error and the NAs introduced by coercion warnings are coming from.
I'm also not sure if this will do what I want it to do once those errors/warnings are addressed.
What I'd like to end up with is a data frame that contains only the rows that pertain to the 21 drugs for which I have logical variables to flag.
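It's hard to say without a sample of your data to play around with (please provide some!). The NA/NaN error comes from isTRUE("IBUPROFEN":"BAICALIN/CATECHIN"): there, : is plain base R, which tries to turn both strings into numbers, gets NA, and fails. I think filter_all could work instead.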
If I understood correctly, something like this should give you what you want:
Drug_Table_NamesNumberSML %>%
select("IBUPROFEN":"BAICALIN/CATECHIN") %>%
filter_all(any_vars(. == TRUE))
You can find more examples of how to use different filter functions here.
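If you want to keep the other columns (the select() step above drops everything except the flags), a base R alternative is to count the TRUE flags per row with rowSums(). The data below is a hypothetical stand-in with just two of the 21 flag columns, and it assumes the flag columns sit next to each other between IBUPROFEN and BAICALIN/CATECHIN.

```r
# Hypothetical stand-in for Drug_Table_NamesNumberSML
drugs <- data.frame(
  medicine = c("A", "B", "C", "D"),
  IBUPROFEN = c(TRUE, FALSE, FALSE, NA),
  `BAICALIN/CATECHIN` = c(FALSE, FALSE, TRUE, FALSE),
  check.names = FALSE  # keep the "/" in the column name
)

# Positions of the first and last flag columns; assumes they are contiguous
flag_cols <- match("IBUPROFEN", names(drugs)):match("BAICALIN/CATECHIN", names(drugs))

# rowSums() counts TRUE values per row; na.rm = TRUE treats NA flags as FALSE
keep <- rowSums(drugs[flag_cols], na.rm = TRUE) > 0
matched <- drugs[keep, ]  # all columns kept, only rows with at least one TRUE flag
```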

'row.names' is not a character vector of length

I am simply trying to create a dataframe.
I read in data by doing:
>example <- read.csv(choose.files(), header=TRUE, sep=";")
The data contains 2 columns with 8736 rows plus a header.
I then simply want to combine it with a column from another data frame that has the same number of rows (!) by doing:
>data_frame <- as.data.frame(example$x, example$y, otherdata$z)
It produces the following error
Warning message:
In as.data.frame.numeric(example$x, example$y, otherdata$z) :
'row.names' is not a character vector of length 8736 -- omitting it. Will be an error!
I have never had this problem before. It seems so easy to solve, but I can't figure it out at the moment.
Overview
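The warning appears because as.data.frame() converts a single object: its second positional argument is row.names, so example$y was treated as row names rather than as a column. As long as nrow(example) equals length(otherdata$z), use cbind.data.frame to combine the columns into one data frame. An advantage of cbind.data.frame() is that there is no need to extract the individual columns of example when binding them with otherdata$z.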
# create a new data frame that adds the 'z' field from another source
df_example <- cbind.data.frame(example, z = otherdata$z)
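For comparison, here is a sketch of two ways to build the combined data frame on small made-up example and otherdata objects: data.frame() with one named argument per column, and cbind.data.frame() with the new column given a name (left unnamed, it is named after the expression otherdata$z).

```r
# Hypothetical stand-ins for the question's objects (8736 rows in the original)
example   <- data.frame(x = 1:5, y = seq(0.5, 2.5, by = 0.5))
otherdata <- data.frame(z = 101:105)

# data.frame() takes each column as its own named argument
df_a <- data.frame(x = example$x, y = example$y, z = otherdata$z)

# cbind.data.frame() keeps example's columns and appends z under that name
df_b <- cbind.data.frame(example, z = otherdata$z)
```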

R Error - Error in x[j] : only 0's may be mixed with negative subscripts

priceSet <- subset(price, price$Source=='xyz', select = c(price$Category, price$AvgPrice))
I am connecting to a SQL Server DB using RODBC package and getting a few fields from the table as above.
But subset returns this error:
Error in x[j] : only 0's may be mixed with negative subscripts
AvgPrice does contain both negative and positive values, and I need to keep them.
How do I get past the error?
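The select argument only needs the column names, not the data frame those columns come from (which is already given in the x argument). With select = c(price$Category, price$AvgPrice) you pass the data values themselves, so subset tries to use them as column indices, and the negative prices become negative indices mixed with positive ones, which triggers the error. Use the bare column names instead: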
priceSet <- subset(price, Source=='xyz', select = c(Category, AvgPrice))
From the R help section:
The select argument exists only for the methods for data frames and
matrices. It works by first replacing column names in the selection
expression with the corresponding column numbers in the data frame and
then using the resulting integer vector to index the columns. This
allows the use of the standard indexing conventions so that for
example ranges of columns can be specified easily, or single columns
can be dropped (see the examples).
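A minimal reproduction with a made-up price table (the real one comes from SQL Server via RODBC) shows the corrected call, plus the drop-by-name form the help text mentions:

```r
# Hypothetical stand-in for the table fetched through RODBC
price <- data.frame(
  Source   = c("xyz", "abc", "xyz"),
  Category = c("A", "B", "C"),
  AvgPrice = c(10.5, -3.2, -7.0)
)

# Inside subset(), bare names refer to columns of `price`; select is
# translated to column positions, so negative prices never become indices
priceSet <- subset(price, Source == "xyz", select = c(Category, AvgPrice))

# A negated name drops that column, as described in the help text
no_source <- subset(price, select = -Source)
```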

How to escape error of length when suppressing NA values in data frame?

I have a data.frame with a lot of NAs. Once the NAs are removed, the columns no longer have the same length, and I would like to get an identical data frame but without the NAs.
So when I run:
for (i in 1:length(df[,1]))
(df[,i]<-df[,i][!is.na(df[,i])])
I get:
Error in `[<-.data.frame`(`*tmp*`, , i, value = c(2696L, 2696L, 2640L, :
Does anyone have an idea how to do it?
You cannot do what you're attempting in the code in your question because a data.frame is a list structure with a single key restriction: all vectors (or variables) in the list must have the same length. Your code is attempting to create vectors with different lengths, which is not allowed.
You probably just need the complete.cases function, which flags the rows that contain no NA; use it to index the rows:
df[complete.cases(df), ]
This removes all rows that have an NA value in any column.
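A sketch on a small made-up df showing both options: dropping incomplete rows, and, if you truly need every column stripped of its own NAs, keeping the ragged result as a list, since a data frame cannot hold columns of different lengths.

```r
# Hypothetical data frame with NAs scattered across columns
df <- data.frame(a = c(1, NA, 3, 4, 5),
                 b = c(NA, NA, 3, 4, 5),
                 c = c(1, 2, 3, NA, 5))

# Keep only the rows with no NA in any column (rows 3 and 5 here)
complete_rows <- df[complete.cases(df), ]

# Each column without its own NAs: lengths differ, so use a list
no_na_cols <- lapply(df, function(col) col[!is.na(col)])
lengths(no_na_cols)  # a: 4, b: 3, c: 4
```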
