Subset of values in result from RODBC - r

I select some values from a database in R with RODBC like this:
library(RODBC)
dbhandle <- odbcDriverConnect('driver={SQL Server};server=mydatabase, ...')
res <- sqlQuery(dbhandle, "select id, class, param1, param2 from table1 ...")
To analyse the data I need to select subsets of it. The column class is a varchar that defines subclasses such as set1 or set2.
For example, I need summary() for the whole result and then for each set separately. I would expect this to work:
summary(res) # works fine
summary(res[res["class"] == 'set1']) # does not work
summary(res[res["class"] == 'set2']) # does not work
Because I get this instead:
Length Class Mode
10788 character character
After filtering, the data comes back as one long character vector and not as a table. What is wrong here?

zx8754's answer shows what is wrong in your code. Another way to get it done is to use the subset function:
summary(subset(res, class == 'set2'))

Try this:
summary(res[res[,"class"] == "set1",])
Update:
res[row,column] - generally the first value is the row index and the second value is the column index, so:
res[,"class"] - select "class" column from res.
res[,"class"] == "set1" - compare "class" column values with string "set1", this will give TRUE, FALSE values.
res[res[,"class"] == "set1",] - TRUE, FALSE values define which rows to return.
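A minimal sketch of the difference, using a made-up data frame in place of the RODBC result (the column values are invented for illustration):

```r
# Toy stand-in for the query result; values are invented for illustration.
res <- data.frame(
  id     = 1:4,
  class  = c("set1", "set2", "set1", "set2"),
  param1 = c(0.5, 1.5, 2.5, 3.5),
  stringsAsFactors = FALSE
)

# res["class"] == "set1" is a logical matrix. Indexing with it and no comma
# selects individual cells of the table and returns a plain vector.
flat <- res[res["class"] == "set1"]
is.data.frame(flat)   # FALSE

# With the comma, the logical vector selects rows and all columns are kept.
set1 <- res[res[, "class"] == "set1", ]
is.data.frame(set1)   # TRUE
summary(set1)
```

This is why summary() reported a single character vector in the question: without the comma, the result is no longer a data frame.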

Related

Creating a function to extract a value from a data frame in R

I have a data frame named data, which contains names of cryptos and their values. I also have the following loop:
for(i in 1) {
x <- data$id[i] #id is the column with crypto names (such as "bitcoin")
c1 <- data[data$id == x,5] #5 is the column with values
}
c1 #returns the value of a crypto
For i in 1, it returns the value of the first crypto, for i in 2 value of the second crypto, and so on.
I would like to create a function that would return the value of any crypto-currency from the list by entering its name (id), so that it would work like this:
function("bitcoin")
#Returns the value of a crypto just like c1 above
Please let me know how I can do this.
A simple solution with some toy data:
crypto <- data.frame(name=c("bitcoin","etherium","solana","polcadot","elrond"),value=c(50000,2500,34,27,147))
crypto_values <- function(cryptoName){
value <- crypto[crypto$name==cryptoName,"value"]
return(value)
}
crypto_values("bitcoin")
[1] 50000
You have to be sure to pass the right name as input.
Remember R is case-sensitive, so "bitcoin" != "Bitcoin".
Also, note that this function is deliberately minimal; with little effort it can be improved, for example to make it case-insensitive or more generic.
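One possible way to make the lookup case-insensitive is sketched below. The helper name crypto_value_ci and the choice to return NA with a warning for unknown names are assumptions made for illustration, not part of the original answer:

```r
# Toy data, invented for illustration.
crypto <- data.frame(
  name  = c("bitcoin", "solana", "elrond"),
  value = c(50000, 34, 147),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up a value by name, ignoring case.
# Returns NA (with a warning) when the name is not in the table.
crypto_value_ci <- function(crypto_name, data = crypto) {
  hit <- tolower(data$name) == tolower(crypto_name)
  if (!any(hit)) {
    warning("no crypto named '", crypto_name, "'")
    return(NA_real_)
  }
  data$value[hit]
}

crypto_value_ci("Bitcoin")
```

Passing the data frame as an argument, instead of relying on a global object, is one way to make the function more generic.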

How to remove the first row from multiple dataframes?

I have multiple dataframes and would like to remove the first row in all of them.
I have tried using a for loop but cannot understand what I am doing wrong
for (i in cities){
i <- i[-1, ]
}
I get the following error code:
Error in i[-1, ] : incorrect number of dimensions
If we assume that the only objects in your workspace are dataframes then this might succeed:
cities <- objects()
for (i in cities) { assign(i, get(i)[-1,])}
Explanation:
Two things were wrong with the original code:
One was already mentioned in the comments: "df" is not the same as df. You need get() to turn a character value into a "true" R name that retrieves the object having that name. The result of objects() is only a character vector. In R the term "name" means a language object; see the help page ?mode. (There is potential for confusion with row names and column names, which are always of class "character".) R is not like SAS, a macro language that has no such distinction.
The second error was expecting substitution for the i on the left-hand side of <-. That would have failed even if you were working with actual R names. The assign function is designed to take character values, which are then converted to R names.
Say you get the names of all the tables in your environment and call the result cities. You can't just iterate over the values of cities and change things, because they are only character strings.
Here is what you need:
for (i in cities){
tmp <- get(i) # load the actual table
tmp <- tmp[-1, ] # remove first row
assign(i, tmp) # re-assign table to original table name
}
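An alternative that avoids get() and assign() altogether is to keep the data frames in a named list and use lapply(). This is a different technique from the loop above, sketched here with made-up data frames:

```r
# Made-up data frames standing in for the real ones.
cities <- list(
  berlin = data.frame(x = 1:3, y = c("a", "b", "c")),
  paris  = data.frame(x = 4:6, y = c("d", "e", "f"))
)

# Drop the first row of every data frame; drop = FALSE keeps the result a
# data frame even if a table has only one column.
cities <- lapply(cities, function(df) df[-1, , drop = FALSE])

sapply(cities, nrow)
```

Here cities holds the data frames themselves rather than their names, so i[-1, ] style indexing works on each element.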

Cannot assign column name by indexing within function argument

I am learning R, and I am trying to understand the indexing properties. I cannot seem to understand why the following code to change a column name does not work:
state.all <- as.data.frame(state.x77)
head(state.all)
state.all$States <- rownames(state.all)
rownames(state.all) <- NULL
# Why does the following line not work?
names(state.all["States"]) <- "Test"
colnames(state.all)
While this works:
state.all <- as.data.frame(state.x77)
head(state.all)
state.all$States <- rownames(state.all)
rownames(state.all) <- NULL
# This works
names(state.all)[which(colnames(state.all)=="States")] <- "Test"
colnames(state.all)
Shouldn't the function be able to overwrite the name of the column also in the first example? Is it something to do with the local vs. global environment?
Thanks in advance!
What you're trying to do is replace the name of column number 9.
The expression which(colnames(state.all)=="States") returns the index of the column named "States" (if there is one), and the assignment then replaces the value at that index in the names vector.
The expression state.all["States"] only extracts that column as a one-column data frame, so the new name is applied to the extracted copy and the column names of state.all itself stay unchanged.
I suggest something like colnames(state.all)[which(colnames(state.all)=="States")] <- "Test".
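A short sketch that contrasts the two forms on the same built-in data set used in the question (the logical index used here is a common shorthand for the which() call):

```r
state.all <- as.data.frame(state.x77)
state.all$States <- rownames(state.all)
rownames(state.all) <- NULL

# Renaming the extracted one-column data frame leaves state.all's names as is.
names(state.all["States"]) <- "Test"
"States" %in% names(state.all)   # still TRUE

# Assigning into the names vector of state.all itself does the rename.
names(state.all)[names(state.all) == "States"] <- "Test"
"Test" %in% names(state.all)     # TRUE
```

So this is about what is being assigned to, not about local versus global environments.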

How to conditionally replace values in r data frame using if/then statement

I'd like to learn how to conditionally replace values in R data frame using if/then statements. Suppose I have a data frame like this one:
df <- data.frame(
customer_id = c(568468,568468,568468,485342,847295,847295),
customer = c('paramount','paramount','paramount','miramax','pixar','pixar'));
I'd like to do something along the lines of,
"if customer in ('paramount','pixar') make customer_id 99. Else do nothing". I'm using this code, but it's not working:
if(df$customer %in% c('paramount','pixar')){
df$customer_id == 99
}else{
df$customer_id == df$customer_id
}
I get a warning message such as the condition has length > 1 and only the first element will be used. And the values aren't replaced.
I'd also like to know how to do this using logical operators to perform something like,
"if customer_id >= 500000, replace customer with 'fox'. Else, do nothing".
Very easy to do in SQL, but can't seem to figure it out in R.
My sense is that I'm missing a bracket somewhere?
How do I conditionally replace values in R data frame using if/then statements?
You can use ifelse, like this:
df$customer_id <- ifelse(df$customer %in% c('paramount', 'pixar'), 99, df$customer_id)
The syntax is simple:
ifelse(condition, result if TRUE, result if FALSE)
This is vectorized, so you can use it on a dataframe column.
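The second request in the question (the >= condition) follows the same pattern. A sketch, assuming customer is stored as character rather than as a factor (hence stringsAsFactors = FALSE):

```r
df <- data.frame(
  customer_id = c(568468, 568468, 568468, 485342, 847295, 847295),
  customer    = c("paramount", "paramount", "paramount",
                  "miramax", "pixar", "pixar"),
  stringsAsFactors = FALSE
)

# "If customer_id >= 500000, replace customer with 'fox'; else do nothing."
df$customer <- ifelse(df$customer_id >= 500000, "fox", df$customer)

# Equivalent in-place form using a logical index:
# df$customer[df$customer_id >= 500000] <- "fox"

df$customer
```

Only the miramax row keeps its original value, because its customer_id is below 500000.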
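You are using == (comparison) instead of = or <- (assignment) inside the if block, and I don't think the else block is needed, since it leaves the values unchanged. Note also that if() expects a single TRUE or FALSE, so it cannot test a whole column at once; assign through a logical index instead:
df$customer_id[df$customer %in% c('paramount','pixar')] <- 99
This changes customer_id only in the rows where customer matches.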

R: add column to dataframe, named based on formula

More 'feels like it should be' simple stuff which seems to be eluding me today. Thanks in advance for assistance.
Within a loop, that's within a function, I'm trying to add a column, and name it based on a formula.
I can bind a column & its name is taken from the bound object: data<-cbind(data,bothdata)
I can bind a column & manually name the bound object: data<-cbind(data,newname=bothdata)
I can bind a column which is the product of an equation & manually name the bound object: data<-cbind(data,newname2=bothdata-1)
Or another way: data <- transform(data, newColumn = bothdata-1)
What I can't do is have the name be the product of a formula. My actual formula-derived example name is paste("E_wgt",rev(which(rev(Esteps) == q))-1,"%") & equation for column: baddata - q.
A simpler one: data<-cbind(data,paste("magic",100,"beans")=bothdata-1). This fails because cbind isn't expecting the = even though it's fine in previous examples. Same fail for transform.
My first thought was assign, but while I've used this successfully for creating formula-named objects, I can't see how to get it to work for formula-named columns.
If I use an intermediary step to put the naming formula in an object container then use that, e.g.:
name <- paste("magic",100,"beans")
data<-cbind(data,name=bothdata-1)
the column name is "name" not "magic100beans". If I assign the equation result to a formula-named object:
assign(paste("magic",100,"beans"),bothdata-1)
Then try to cbind that via get:
data<-cbind(data,get(paste("magic",100,"beans")))
The column is called "get(paste("magic",100,"beans"))". Boo! Any thoughts anyone? It occurs to me that I can do cbind then separately colnames(data)[ncol(data)] <- paste("magic",100,"beans") which I guess I'll settle for for now, but I would still be interested to find out if there is a direct way.
Thanks.
Chances are that cbind is overkill for your use case. In almost every instance, you can simply mutate the underlying data frame using data$newname2 <- data$bothdata - 1.
In the case where the name of the column is dynamic, you can just refer to it using the [[ operator -- data[["newcol"]] <- data$newname + 1. See ?'[' and ?'[.data.frame' for other tips and usages.
EDIT: Incorporated @Marek's suggestion to use [["newcol"]] instead of [, "newcol"]
It may help you to know that data$col1 is the same as data[,"col1"], which is the same as data[,x] if x is "col1". This is how I usually access or set columns programmatically.
So this should work:
name <- paste0("magic",100,"beans") # paste() would insert spaces: "magic 100 beans"
data[,name] <- bothdata-1
Note that you don't have to use the temporary variable name. This is equivalent to:
data$magic100beans <- bothdata-1
Itself equivalent, for a data.frame, to:
data<-cbind(data, magic100beans=bothdata-1)
Just so you know, you could also set the names afterwards:
old_names <- names(data)
name <- paste("magic",100,"beans")
data <- cbind(data, bothdata-1)
data <- setNames(data, c(old_names, name))
# or
names(data) <- c(old_names, name)
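Putting the pieces together, a minimal runnable sketch of adding a column whose name is computed at run time. The contents of data and bothdata are made up for illustration:

```r
# Made-up inputs standing in for data and bothdata.
data     <- data.frame(a = 1:3)
bothdata <- c(10, 20, 30)

# paste0() joins without separators; paste() would insert spaces by default.
name <- paste0("magic", 100, "beans")

# [[<- accepts the column name as a string computed at run time.
data[[name]] <- bothdata - 1

names(data)
```

Because the name is an ordinary character value, the same assignment works inside a loop or a function without any need for assign() or get().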

Resources