Loop over columns with similar column names in R

I have the following variable and dataframe
welltypes <- c("LC","HC")
qccast <- data.frame(
LC_mean=1:10,
HC_mean=10:1,
BC_mean=rep(0,10)
)
Now I only want to see the welltypes I selected (in this case LC and HC, but it could also be different ones).
for(i in 1:length(welltypes)){
qccast$welltypes[i]_mean
}
This does not work, I know.
But how do I loop over those columns?
And it has to be driven by the variable, because welltypes is of unknown size.

The second argument to $ needs to be a column name of the first argument. I haven't run the code, but I would expect welltypes[i]_mean to be a syntax error. $ is similar to [[, so you can use paste to create the column name string and subset via [[.
For example:
qccast[[paste(welltypes[i],"_mean",sep="")]]
Depending on the rest of your code, you may be able to do something like this instead.
for(i in paste(welltypes,"_mean",sep="")){
qccast[[i]]
}
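As a minimal sketch of the paste and [[ approach above, the loop below builds the column names once and then does some work on each selected column. Taking the column mean is only an illustrative choice (the question does not say what should happen inside the loop), and col_names and col_means are made-up names.

```r
welltypes <- c("LC", "HC")
qccast <- data.frame(
  LC_mean = 1:10,
  HC_mean = 10:1,
  BC_mean = rep(0, 10)
)

# Build the column names once; paste0(x, y) is shorthand for paste(x, y, sep = "")
col_names <- paste0(welltypes, "_mean")

# Hypothetical per-column work: store the mean of every selected column
col_means <- numeric(0)
for (nm in col_names) {
  col_means[nm] <- mean(qccast[[nm]])
}
print(col_means)
```

Because the loop iterates over the names themselves, it keeps working when welltypes changes length.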

Here's another strategy:
qccast[ sapply(welltypes, grep, names(qccast)) ]
LC_mean HC_mean
1 1 10
2 2 9
3 3 8
4 4 7
5 5 6
6 6 5
7 7 4
8 8 3
9 9 2
10 10 1

Another easy way to access given welltypes
qccast[,paste(welltypes, '_mean', sep = "")]
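One caveat worth keeping in mind: indexing with a pasted name fails with an "undefined columns selected" error when a well type has no matching column, and the grep strategy matches the pattern anywhere in a name. A hedged sketch that selects by exact name and skips absent columns is shown below; "XX" is a made-up well type, and wanted, present, absent and selected are illustrative names.

```r
qccast <- data.frame(LC_mean = 1:10, HC_mean = 10:1, BC_mean = rep(0, 10))
welltypes <- c("LC", "HC", "XX")  # "XX" is hypothetical and has no column

wanted  <- paste0(welltypes, "_mean")
present <- intersect(wanted, names(qccast))  # exact matches, in the order of `wanted`
absent  <- setdiff(wanted, names(qccast))    # requested names with no column

# drop = FALSE keeps a data frame even if only one column is selected
selected <- qccast[, present, drop = FALSE]
```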

Related

Sorting dataframe in R in reverse order - Column name as a variable [duplicate]

I've looked and looked and the answer either does not work for me, or it's far too complex and unnecessary.
I have data, it can be any data, here is an example
chickens <- read.table(textConnection("
feathers beaks
2 3
6 4
1 5
2 4
4 5
10 11
9 8
12 11
7 9
1 4
5 9
"), header = TRUE)
I need to, very simply, sort the data by the first column in descending order. It's pretty straightforward, but I have found two things below that both do not work and give me an error which says:
"Error in order(var) : Object 'var' not found."
They are:
chickens <- chickens[order(-feathers),]
and
chickens <- chickens[sort(-feathers),]
I'm not sure what I'm doing wrong. I can get it to work if I put the df name in front of the varname, but that won't work if I put a minus sign in front of the varname to imply a descending sort.
I'd like to do this as simply as possible, i.e. no boolean logic variables, nothing like that. Something akin to SPSS's
SORT BY varname (D)
The answer is probably right in front of me, I apologize for the basic question.
Thank you!
You need to prefix the column with the data frame name:
chickens[order(chickens$feathers),]
To reverse the order, use the decreasing argument of order:
chickens[order(chickens$feathers, decreasing = TRUE),]
Base R syntax requires the data frame name as a prefix, as @dmi3kno has shown. You can also use with to avoid typing the data frame name and $ every time, as mentioned by @joran.
However, you can also do this with data.table:
library(data.table)
setDT(chickens)[order(-feathers)]
#Also
#setDT(chickens)[order(feathers, decreasing = TRUE)]
# feathers beaks
# 1: 12 11
# 2: 10 11
# 3: 9 8
# 4: 7 9
# 5: 6 4
# 6: 5 9
# 7: 4 5
# 8: 2 3
# 9: 2 4
#10: 1 5
#11: 1 4
and dplyr:
library(dplyr)
chickens %>% arrange(desc(feathers))
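For completeness, here is a small base R sketch of the with approach mentioned above, plus the case from the title where the column name is held in a variable. The data frame is rebuilt inline from the question's table, and sort_col, by_with and by_name are made-up names.

```r
chickens <- data.frame(
  feathers = c(2, 6, 1, 2, 4, 10, 9, 12, 7, 1, 5),
  beaks    = c(3, 4, 5, 4, 5, 11, 8, 11, 9, 4, 9)
)

# with() evaluates the expression inside the data frame, so the bare
# column name is found without the chickens$ prefix
by_with <- chickens[with(chickens, order(-feathers)), ]

# When the column name is stored in a variable, index with [[ ]]
sort_col <- "feathers"  # hypothetical variable holding the column name
by_name <- chickens[order(chickens[[sort_col]], decreasing = TRUE), ]
```

The minus sign only works for numeric columns; decreasing = TRUE also handles character columns.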

Very confusing R feature - completion of list item names

I found a very surprising and unpleasant feature of R - it completes list item names!!! See the following code:
a <- list(cov_spring = "spring")
a$cov <- c()
a$cov
# spring ## I expect it to be empty!!! I've set it empty!
a$co
# spring
a$c
I don't know what to do with that.... I need to be able to set $cov to NULL and have $cov_spring there at the same time!!! And use $cov separately!! This is annoying!
My question:
What is going on here? How is this possible, and what is the logic behind it?
Is there some easy fix, how to turn this completion off? I need to use list items cov_spring and cov independently as if they are normal variables. No damn completion please!!!
From help("$"):
'x$name' is equivalent to 'x[["name", exact = FALSE]]'
When you scroll back and read up on exact=:
exact: Controls possible partial matching of '[[' when extracting by
a character vector (for most objects, but see under
'Environments'). The default is no partial matching. Value
'NA' allows partial matching but issues a warning when it
occurs. Value 'FALSE' allows partial matching without any
warning.
So this provides you partial matching capability in both $ and [[ indexing:
mtcars$cy
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
mtcars[["cy"]]
# NULL
mtcars[["cy", exact=FALSE]]
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
As far as I can see, there is no way to disable the exact=FALSE default for $ (unless you want to mess with formals, which I do not recommend for the sake of reproducibility and consistent behavior).
Programmatic use of frames and lists (for defensive purposes) should prefer [[ over $ for precisely this reason. (It's rare, but I have been bitten by this permissive behavior.)
Edit:
For clarity on that last point:
mtcars$cyl becomes mtcars[["cyl"]]
mtcars$cyl[1:3] becomes mtcars[["cyl"]][1:3]
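mtcars[,"cy"] is not a problem, nor is mtcars[1:3,"cy"] (single-bracket indexing by name never partial-matches, so both fail with an "undefined columns selected" error instead of silently returning cyl)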
You can use [ or [[ instead.
a["cov"] will return a list with a NULL element.
a[["cov"]] will return the NULL element directly.
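To make this concrete, here is a short sketch using the list from the question. Note that a$cov <- c() assigns NULL, which deletes an element named cov if one exists and otherwise does nothing, so no cov element is ever created; storing an actual NULL element needs the list(NULL) idiom. The variable names partial, exact, after and old_opts are made up for illustration. The warnPartialMatchDollar option does not turn partial matching off; it only makes R warn when it happens.

```r
a <- list(cov_spring = "spring")

# $ falls back to partial matching when there is no exact name match
partial <- a$cov        # picks up cov_spring
exact   <- a[["cov"]]   # [[ ]] matches exactly by default

# Store a real NULL element named "cov" via single brackets and list(NULL)
a["cov"] <- list(NULL)
after <- a$cov          # the exact match "cov" now takes priority

# Optional: ask R to warn whenever $ partially matches a name
old_opts <- options(warnPartialMatchDollar = TRUE)
options(old_opts)       # restore the previous setting
```

After the list(NULL) assignment, cov and cov_spring coexist and can be used independently.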

looping over the name of the columns in R for creating new columns

I am trying to loop over the column names of an existing dataframe and create new columns based on each of the old columns. Here is my sample data:
sample<-list(c(10,12,17,7,9,10),c(NA,NA,NA,10,12,13),c(1,1,1,0,0,0))
sample<-as.data.frame(sample)
colnames(sample)<-c("x1","x2","D")
>sample
x1 x2 D
10 NA 1
12 NA 1
17 NA 1
7 10 0
9 12 0
10 13 0
Now I am trying to use a for loop to generate two variables, x1.imp and x2.imp, which take the values from the D=0 rows when D=1 and the values from the D=1 rows when D=0. (I don't actually need a for loop here, but my original dataset has many columns (variables), so there I really need the loop.) This is my attempt:
for (i in names(sample[,1:2])){
sample$i.imp<-with (sample, ifelse (D==1, i[D==0],i[D==1]))
i=i+1
return(sample)
}
Error in i + 1 : non-numeric argument to binary operator
However, the following works, but it doesn't name the new columns x1.imp and x2.imp:
for(i in sample[,1:2]){
impt.i<-with(sample,ifelse(D==1,i[D==0],i[D==1]))
i=i+1
print(as.data.frame(impt.i))
}
impt.i
1 7
2 9
3 10
4 10
5 12
6 17
impt.i
1 10
2 12
3 13
4 NA
5 NA
6 NA
Note that I already know the solution without a loop [here]. I want a solution with a loop.
Expected output:
x1 x2 D x1.imp x2.imp
10 NA 1 7 10
12 NA 1 9 12
17 NA 1 10 13
7 10 0 10 NA
9 12 0 12 NA
10 13 0 17 NA
I would greatly appreciate your valuable input in this regard.
This is nuts, but since you are asking for it... Your code with minimum changes would be:
for (i in colnames(sample)[1:2]){
sample[[paste0(i, '.impt')]] <- with(sample, ifelse(D==1, get(i)[D==0],get(i)[D==1]))
}
A few comments:
replaced names(sample[,1:2]) with the more elegant colnames(sample)[1:2]
the $ is for interactive usage. Instead, when programming, i.e. when the column name is to be interpreted, you need to use [ or [[, hence I replaced sample$i.imp with sample[[paste0(i, '.impt')]]
inside with, i[D==0] will not give you x1[D==0] when i is "x1", hence the need to dereference it using get.
you should not name your data.frame sample as it is also the name of a pretty common function
This should work,
test <- sample[,"D"] == 1
for (.name in names(sample)[1:2]){
newvar <- paste(.name, "impt", sep=".")
sample[[newvar]] <- ifelse(test, sample[!test, .name],
sample[test, .name])
}
sample
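Both answers rely on ifelse recycling the shorter yes and no vectors, which is easy to misread. As a hedged alternative sketch, the loop below makes the swap explicit with logical indexing. It assumes, as in the question's data, that there are equally many D == 1 and D == 0 rows; dat, is_one and new_col are made-up names, with dat replacing sample so that the base function is not masked.

```r
# `dat` is used instead of `sample` to avoid masking base::sample()
dat <- data.frame(
  x1 = c(10, 12, 17, 7, 9, 10),
  x2 = c(NA, NA, NA, 10, 12, 13),
  D  = c(1, 1, 1, 0, 0, 0)
)

is_one <- dat$D == 1
for (nm in c("x1", "x2")) {
  new_col <- numeric(nrow(dat))
  # Rows with D == 1 receive the D == 0 values, and vice versa
  new_col[is_one]  <- dat[[nm]][!is_one]
  new_col[!is_one] <- dat[[nm]][is_one]
  dat[[paste0(nm, ".impt")]] <- new_col
}
print(dat)
```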

R: Can't select a specific column in a data frame

I have a problem with a function to select a given column. I have a data frame called Volume from which I want to make a subset DateSearch:
DateSearch = subset(Volume,select=c("TRI",name))
For some reason it does not work. I have used browser(). I can select TRI or name, but I can't select both (either by name or by index). I have tried with and without quotes.
Does anyone know why that is?
Many thanks,
Vincent
I just did what (I think) you describe:
str(dfrm)
#'data.frame': 20 obs. of 8 variables:
# $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
# $ factor1: Factor w/ 4 levels "Not at all","To a small extent",..: 3 2 3 NA 3 NA 3 NA 4 1 ...
## <snip>
name = "factor1"
subset(dfrm, select=c("ID", name))
No error, .... results as expected.
Examine the spelling carefully. My guess is that you have a space at the beginning or end of the result of the as.character call. Perhaps even a non-printing character? You can use nchar(name) to check.
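A quick way to test that guess is sketched below. The data frame and the trailing space in name are invented to reproduce the suspected problem; trimws only removes leading and trailing whitespace, so other non-printing characters would need separate handling.

```r
# Hypothetical data frame and a column name with a stray trailing space
dfrm <- data.frame(TRI = 1:3, factor1 = c("a", "b", "c"))
name <- "factor1 "

n_chars  <- nchar(name)             # one more than the 7 visible characters
is_known <- name %in% names(dfrm)   # FALSE: no column has that exact name

clean_name <- trimws(name)          # strip leading and trailing whitespace
result <- subset(dfrm, select = c("TRI", clean_name))
```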
