Unique function: looping through variables in R

I am currently in a loop in my R code and I want to use the following code:
distinct.values <- unique(SQL_Table$column.names2[for.num])
column.names2 looks like this:
column.names2
[1] "plan" "gender" "marital_status" "acceleration" "extension"
[6] "inflation"
Depending on the value of for.num, I want a different column name after the $.
For example:
when for.num = 1,
I want distinct.values to be set to unique(SQL_Table$plan);
when for.num = 2,
I want distinct.values to be set to unique(SQL_Table$gender);
and so on.
How can I do this?

To expand on my comment on the original question, there are a few different ways to access columns of data frames.
my_df$column_name
When using the $ operator, "column_name" is specified as a literal token in the R script. Note that because the column name is unquoted, this method does not allow variable substitution.
my_df[["column_name"]]
When using the [[ ]] operator, a single string (or a column number) is expected. In this case, variable substitution is allowed, so the following is valid:
my_col <- "column_name"
my_df[[my_col]]
This would be equivalent to my_df$column_name.
So your code could be modified to read:
unique(SQL_Table[[column.names2[for.num]]])
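A minimal sketch of the whole loop, assuming a small hypothetical SQL_Table (the real one comes from the asker's database) and a shortened column.names2. Instead of overwriting distinct.values on every pass, it stores each column's distinct values in a named list; lapply() gives the same result without an explicit loop.

```r
# Hypothetical stand-in for SQL_Table; the real one comes from the question's database.
SQL_Table <- data.frame(
  plan   = c("basic", "premium", "basic"),
  gender = c("F", "M", "F"),
  stringsAsFactors = FALSE
)
column.names2 <- c("plan", "gender")

# Loop version: [[ ]] accepts the column name held in a variable.
distinct_by_column <- list()
for (for.num in seq_along(column.names2)) {
  col <- column.names2[for.num]
  distinct_by_column[[col]] <- unique(SQL_Table[[col]])
}
print(distinct_by_column)

# Loop-free version: apply unique() to each selected column.
distinct_lapply <- lapply(SQL_Table[column.names2], unique)
print(distinct_lapply)
```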

Related

How can list elements be used as factor names in R?

I have a list, e.g. mylist=c("A","B","C"), and I wish to use list elements to extract factors of a data frame in R.
If MyDataFrame has a column name "A", I can extract the column/factor as MyDataFrame$A. However,
MyDataFrame$mylist[1]
fails. At first I thought that this was because mylist[1] is "A", whereas I need $A without the quotes. However, using
MyDataFrame$as.name(mylist[1])
fails as well, presumably because R looks for the string as.name(mylist[1]) as a factor name rather than calling the function (the error it gives is "attempt to apply non-function"). Setting x=as.name(mylist[1]) and then using MyDataFrame$x runs into the same problem of x not being treated as a variable.
Is there a straightforward way to do this? I need to loop over a long list of column names in order to extract the factors of interest.
Try this, using [ rather than $:
MyDataFrame[,mylist[1]]
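To loop over the whole vector of names, something along these lines should work, assuming a hypothetical MyDataFrame with columns A, B and C. For a single column name, MyDataFrame[, nm] and MyDataFrame[[nm]] both return the column as a plain vector (or factor); with several names, single brackets keep a data frame.

```r
# Hypothetical data frame whose column names match mylist.
MyDataFrame <- data.frame(
  A = c(1, 2, 3),
  B = factor(c("x", "y", "x")),
  C = c(TRUE, FALSE, TRUE)
)
mylist <- c("A", "B", "C")

for (nm in mylist) {
  col <- MyDataFrame[, nm]  # same result as MyDataFrame[[nm]] for one name
  cat(nm, "has class", class(col), "\n")
}

# Several columns at once: single brackets return a data frame.
subset_df <- MyDataFrame[, mylist[1:2]]
print(subset_df)
```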

R encoding ASCII backtick

My list's names are printed with backticks around them. Prior lists did not have these backticks.
$`1KG_1_14106394`
[1] "PRDM2"
$`1KG_20_16729654`
[1] "OTOR"
I found out that this is an 'ASCII grave accent' and read the R help page on encoding types. However, what should I do about it? I am not clear whether this will affect some functions (such as matching on list names) or whether it is OK to leave it as is.
Encoding help page: https://stat.ethz.ch/R-manual/R-devel/library/base/html/Encoding.html
Thanks!
My understanding (and I could be wrong) is that the backticks are just a means of escaping a list name that could not otherwise be written as a bare name. One example of using backticks to refer to a list name is a name containing spaces:
lst <- list(1, 2, 3)
names(lst) <- c("one", "after one", "two")
If you wanted to refer to the list element containing the number two, you could do this using:
lst[["after one"]]
But if you want to use the dollar sign notation you will need to use backticks:
lst$`after one`
Update:
I just poked around on SO and found a post that discusses a question similar to yours. Backticks in variable names are necessary whenever a variable name would otherwise be forbidden. Spaces are one example, but so is using a reserved keyword as a variable name.
if <- 3 # forbidden because if is a keyword
`if` <- 3 # allowed, because we use backticks
In your case:
Your list has an element whose name begins with a number. The rules for variable names in R are pretty lax, but a syntactic name cannot begin with a number, hence:
1KG_1_14106394 <- 3 # fails, variable name starts with a number
KG_1_14106394 <- 3 # allowed, starts with a letter
`1KG_1_14106394` <- 3 # also allowed, since escaped in backticks
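To address the original concern directly: R adds the backticks only when it prints a name that is not syntactic; they are not part of the stored name, so matching on names is unaffected. A quick check, using a hypothetical list that reproduces the printed output from the question:

```r
# Hypothetical list reproducing the question's printed output.
genes <- list(`1KG_1_14106394` = "PRDM2", `1KG_20_16729654` = "OTOR")

# The stored names are plain strings without backticks.
print(names(genes))

# Matching on names and [[ ]] lookup work as usual.
print("1KG_1_14106394" %in% names(genes))
print(genes[["1KG_20_16729654"]])

# $ needs backticks only because the name starts with a digit.
print(genes$`1KG_1_14106394`)

# make.names() shows the syntactic names R would generate instead.
print(make.names(names(genes)))
```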

Paste function to construct existing data frame name and evaluate in R

I am working with a long list of data frames.
Here is a simple hypothetical example of a data frame:
DFrame<-data.frame(c(1,0),c("Yes","No"))
colnames(DFrame)<-c("ColOne","ColTwo")
I am trying to retrieve a specified column of the data frame using paste function.
get(paste("DFrame","$","ColTwo",sep=""))
The get function returns the following error when I try to retrieve a specified column:
Error in get(paste("DFrame", "$", "ColTwo", sep = "")) :object 'DFrame$ColTwo' not found
When I type the constructed expression DFrame$ColTwo directly at the console, it returns the desired output: the second column.
If I reconstruct an example without the '$' sign, then I get the desired answer from the get function. For example, the following code yields 2:
Ans <- 2
get(paste("An","s",sep=""))
[1] 2
I am looking for the same desired outcome, but struggling to get past the error that the object could not be found.
I also attempted the following format, but the quotation marks around the column name break the paste call:
paste("DFrame","[,"ColTwo"]",sep="")
Thank you very much for the input,
Kind regards
You can do that using the following syntax:
get("DFrame")[,"ColTwo"]
You can use paste() in both of these strings, for example:
get(paste("D", "Frame", sep=""))[,paste("Col", "Two", sep="")]
Edit: Despite someone downvoting this answer without leaving a comment, this does exactly what the original poster asked for. If you feel that it does not or is in some way dangerous, I would encourage you to leave a comment.
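The underlying distinction is that get() looks up an object by its name only; it never evaluates a string such as "DFrame$ColTwo" as code. A small sketch comparing the working and failing calls, with the data frame built directly with named columns (equivalent to the question's data.frame() plus colnames() steps):

```r
DFrame <- data.frame(ColOne = c(1, 0), ColTwo = c("Yes", "No"),
                     stringsAsFactors = FALSE)

# get() fetches the object by name; the column is then selected with [ , ].
df_name  <- paste("D", "Frame", sep = "")
col_name <- paste("Col", "Two", sep = "")
result <- get(df_name)[, col_name]
print(result)

# The failing call from the question: no object is literally named "DFrame$ColTwo".
failed <- tryCatch(get(paste("DFrame", "$", "ColTwo", sep = "")),
                   error = function(e) NULL)
print(is.null(failed))
```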
Stop trying to use paste and get entirely.
The whole point of having a list (of data frames, say) is that you can reference them using names:
DFrame<-data.frame(c(1,0),c("Yes","No"))
colnames(DFrame)<-c("ColOne","ColTwo")
#A list of data frames
l <- list(DFrame,DFrame)
#The data frames in the list can have names
names(l) <- c("DF1",'DF2')
# Now you just use `[[`
> l[["DF1"]][["ColOne"]]
[1] 1 0
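> l[["DF1"]][["ColTwo"]]
[1] Yes No
Levels: No Yes
# (Output from R < 4.0.0, where data.frame() converted strings to factors by default; R >= 4.0.0 prints [1] "Yes" "No" instead.)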
If you have to, you can use paste to construct the indices passed inside [[.
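A sketch of the list-based approach combined with paste(), assuming the data frames might already exist as separate objects called DF1 and DF2 (hypothetical names). mget() collects such objects into a named list in one step, after which plain [[ ]] indexing replaces get().

```r
DFrame <- data.frame(ColOne = c(1, 0), ColTwo = c("Yes", "No"),
                     stringsAsFactors = FALSE)
l <- list(DF1 = DFrame, DF2 = DFrame)

# Build both indices as strings and pass them to [[ ]].
df_name  <- paste0("DF", 1)
col_name <- paste("Col", "Two", sep = "")
print(l[[df_name]][[col_name]])

# If the data frames exist as separate objects, gather them by name once.
DF1 <- DFrame
DF2 <- DFrame
l2 <- mget(paste0("DF", 1:2), envir = environment())
print(names(l2))
```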

How to call expression result after paste command in R?

I want to get a cell value after passing its address dynamically. So I am trying to use paste0 to build the address of the cell, as follows:
paste0("DT1$", eval(cols[1]),"[1]")
where DT1 is a data.table, cols[1] refers to a column, and [1] is the first row of that column. Running this gives me a string (the address of the cell):
> paste0("DT1$", eval(cols[1]),"[1]")
[1] "DT1$BCC1[1]"
But I want the value of the cell like if I run:
> DT1$BCC1[1]
[1] 0
So how do I evaluate the result of the paste expression to get the value of the cell, like 0 in the previous example? I tried eval() and do.call(), but nothing seems to work. I am sorry for this basic question, as I am new to R. Any help is really appreciated.
You can use eval(), but you have to parse the string "DT1$BCC1[1]" first:
cell_expr <- paste0("DT1$", cols[1], "[1]")  # cols[1] is already a string, so eval() is not needed here
eval(parse(text = cell_expr))
The $ operator is best suited to interactive console use (it does partial name matching). In code, you should use the [ subsetting operator instead.
For example, you can call it like this:
DT1[1, cols[1]]
Or, more generally:
x <- 1
y <- "BCC1"
DT1[x, y]
Note that DT1 here is a data.frame, not a data.table. You can do the same thing with a data.table (this returns a one-row, one-column data.table; DT1[[y]][x] returns the bare value):
DT1[x, y, with = FALSE]
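Both approaches side by side, as a sketch in base R only (so no data.table package is needed), assuming a hypothetical DT1 whose first column BCC1 starts with 0 as in the question. Parsing a pasted string works, but indexing with the column name and row number avoids building code as text and gives the same value.

```r
# Hypothetical stand-in for the question's DT1, as a plain data.frame.
DT1 <- data.frame(BCC1 = c(0, 5, 7), BCC2 = c(1, 2, 3))
cols <- names(DT1)

# Option 1: build the expression as a string, then parse and evaluate it.
cell_expr <- paste0("DT1$", cols[1], "[1]")
via_parse <- eval(parse(text = cell_expr))

# Option 2: index directly with the column name and row number.
row <- 1
via_index <- DT1[[cols[1]]][row]

print(c(via_parse, via_index))
```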

Variable cell specification for a csv in R

I wish to use a variable to specify a particular cell in a csv file. I can use:
emp1 <- read.csv("C:/Database/data/emp1.csv",as.is=TRUE)
numberofemployee <- 1
> emp1["1", "X.name"]
[1] "ALEX"
but if I use:
> emp1["numberofemployee", "X.name"]
[1] NA
I assume R is looking for numberofemployee as a column header.
How do I get it to see it as an integer so I can specify my cells?
The csv file:
#name,mon,tue,wed,thu,fri
ALEX,98,95,73,88,18
BRAD,66,25,72,8,32
JOHN,22,41,78,43,36
The problem is that you pass strings to []. Strings are matched against row and column names. Using "1" works because read.csv() gives the rows the default row names "1", "2", "3", so "1" matches the name of the first row (not its position). When you pass "numberofemployee" in quotes, R looks for a row literally named numberofemployee; there is none, so you get NA. If you want to use the content of the variable numberofemployee, you need to omit the quotes. R will then interpret it as an R object whose value (here the number 1) is used as the row position:
emp1[numberofemployee, "X.name"]
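A self-contained check, assuming the csv contents from the question are supplied inline through read.csv(text = ...) instead of the original file path. It shows that "1" is matched as a row name, the quoted variable name finds no row, and the unquoted variable indexes by position, which also makes it easy to loop over every employee.

```r
# The question's csv, inlined so the example runs without the file.
csv_text <- "#name,mon,tue,wed,thu,fri
ALEX,98,95,73,88,18
BRAD,66,25,72,8,32
JOHN,22,41,78,43,36"
emp1 <- read.csv(text = csv_text, as.is = TRUE)  # header "#name" becomes X.name

numberofemployee <- 1
by_name     <- emp1["1", "X.name"]                 # matches row name "1"
not_found   <- emp1["numberofemployee", "X.name"]  # no such row name -> NA
by_position <- emp1[numberofemployee, "X.name"]    # row position 1
print(c(by_name, not_found, by_position))

# Looping over all rows by position.
for (i in seq_len(nrow(emp1))) {
  cat(emp1[i, "X.name"], emp1[i, "mon"], "\n")
}
```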
