Looping through R dataframe columns - r

I want to loop over a dataframe's columns and use them for something else (in this case, performing a chi-squared test on all my features).
for(i in (1:ncol(wdbc))){
wdbc[,i]
chisq.test(wdbc$diagnosis,wdbc[,i])
}
I've tried referring to the features in all kinds of ways, for example:
chisq.test(wdbc$diagnosis,wdbc[i]) ##looping through colnames(wdbc)
or
chisq.test(wdbc$diagnosis,wdbc$i) ##looping through colnames(wdbc)
but can't seem to solve the problem.

wdbc[i] will return a dataframe (rather than a vector), and wdbc$i looks for a column literally named i, so it doesn't work for looping through column names.
wdbc[,i] should work if wdbc is actually a dataframe. However, I've hit an error in this kind of situation before when my dataframe was not actually a dataframe but a tibble. The issue is that, for a tibble, wdbc[,i] will still be a tibble rather than a vector. Try converting it to a dataframe with as.data.frame(wdbc).
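One way to sidestep the data frame versus tibble difference is to extract each column with [[ ]], which returns a plain vector for both. The sketch below is only illustrative: the wdbc built here is a small made-up stand-in with hypothetical feature_a and feature_b columns, not the real dataset. It also stores each test result, because values computed inside a for loop are not printed automatically.

```r
# Hypothetical stand-in for wdbc: one outcome column plus categorical features.
set.seed(1)
wdbc <- data.frame(
  diagnosis = sample(c("B", "M"), 200, replace = TRUE),
  feature_a = sample(c("low", "high"), 200, replace = TRUE),
  feature_b = sample(c("x", "y", "z"), 200, replace = TRUE)
)

# Loop over every column except the outcome itself.
features <- setdiff(names(wdbc), "diagnosis")

results <- list()
for (name in features) {
  # wdbc[[name]] is a plain vector for data frames and tibbles alike.
  results[[name]] <- chisq.test(wdbc$diagnosis, wdbc[[name]])
}

# Collect the p-values into a named numeric vector.
p_values <- sapply(results, function(res) res$p.value)
print(p_values)
```

Looping over column names rather than positions keeps the results labelled, which makes the p-values easier to read afterwards.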

Related

R: combining data-frames row-wise when some variables are lists/dfs?

Is there a function in R that would let me combine/concatenate data-frames when some variables are either lists or data frames themselves? I've tried rbind(), rbindlist(), rbind.data.frame and bind_rows and they are all throwing out errors, e.g. duplicate 'row.names' are not allowed or Argument 4 can't be a list containing data frames.
After looking into it a bit, it seems that none of those functions support nested data-frames. Is there a function that would work for me? Or is there something (other than a for-loop that adds row by row) that I could do?
As a bit of background, I'm making API calls to a database and can get only 40 results at a time, so I am looping through those via multiple calls, and I want to combine the results without any loss of information. I am using jsonlite::fromJSON to convert to a df: could I/should I combine the info in JSON format first and then convert to a df?

Transform Dataframe to vector in R

I am attempting to pull a table from SQLServer and convert it to a vector in R.
I use sqlQuery() to return the table, which looks to be returned as a dataframe. I am curious, can I change all the values in this dataframe to be a vector?
I am currently using as.vector(nameofdataframe), which converts it to a list. I find that if I use as.vector(dataframe$column), it returns a vector, but I have many columns and I feel like there should be a much more simple way.
I was able to figure it out. If you take the data frame resulting from a sqlQuery(), you need to apply as.matrix() first and then as.vector() to the resulting matrix. Thank you all for your help.
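A minimal sketch of that conversion, using a small made-up data frame (query_result is a hypothetical name) in place of the real sqlQuery() result. It also shows unlist() as an alternative that gives the same values in the same column-by-column order. Note that if the columns have mixed types, both approaches coerce everything to a common type.

```r
# Hypothetical stand-in for the data frame returned by sqlQuery().
query_result <- data.frame(a = 1:3, b = 4:6)

# Approach from the answer: matrix first, then vector (column-major order).
v1 <- as.vector(as.matrix(query_result))

# Alternative: flatten the list of columns directly.
v2 <- unlist(query_result, use.names = FALSE)

print(v1)  # 1 2 3 4 5 6
```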

R data frame issue - non-numeric headers

This is definitely a rookie question but I'm not finding an answer for this (maybe because of my wording) so here goes:
I'm reading a data frame into RStudio (from a csv file) that has 24 columns with headers. There are only numbers in these columns (they're essentially concentrations of several chemicals). It's called all. I need to use them as numeric vectors. When I read them in and type
is.numeric(all[,1])
I get
TRUE
When I type
is.numeric(all[1])
I get
FALSE
I think this is because R interprets the header as a factor. I also tried reading in a table without headers and with header=FALSE, but R renames the columns to V1, V2, etc., so the result ends up being the same.
I need to work with functions where I invoke something like all[2:24]. How can I make R either "not see" the header or remove it altogether?
Thanks for the answers!
PS: the dataframe I am using (without headers - if it had headers, it would just have names instead of V1, V2, etc) is something like this:
This subsets the first column, not the first row:
all[,1] #subset first column
The following subsets the first row:
all[1,] #subset first row (headers of df not included)
To set column names:
colnames(all) <- c("col1","col2")
Your assumption is wrong. You have a data.frame and all[1] does list subsetting, which results in a data.frame, which is not a vector, and not a numeric vector in particular.
You should study help("[") and An Introduction to R.
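To make the difference concrete, here is a small sketch with a made-up three-column data frame (concentrations is a hypothetical name, used instead of all to avoid masking the base function all()). The header plays no part in the result: single brackets keep the data.frame wrapper, while double brackets or [, i] extract the column itself.

```r
# Hypothetical data standing in for the 24-column csv.
concentrations <- data.frame(V1 = c(0.1, 0.2), V2 = c(1.5, 2.5), V3 = c(3, 4))

is.numeric(concentrations[1])    # FALSE: a one-column data.frame
is.numeric(concentrations[[1]])  # TRUE: the column as a numeric vector
is.numeric(concentrations[, 1])  # TRUE: same for a plain data.frame

# To check several columns at once, test each column rather than the data.frame.
column_check <- sapply(concentrations[2:3], is.numeric)

# If a function needs one numeric object, a matrix is one option.
numeric_block <- as.matrix(concentrations[2:3])
```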

Multiple inputs to a function

I might just be missing a very obvious solution, but here's my question:
I have a function that takes a few inputs. I'm generating these inputs by getting values from a dataframe. What's the cleanest way to input my values into the function?
Let's say I have the function
sampleFunction<-function(input1, input2, input3){
return((input1+input2)-input3)
}
And an input that consists of a few columns of a row of a dataframe
sampleInput <- c(1,2,3)
I'd like to input the three values of my sampleInput into my sampleFunction. Is there a cleaner way than just doing
sampleFunction(sampleInput[1], sampleInput[2], sampleInput[3])
?
I would consider using the package data.table.
It doesn't directly answer the question of how to pass in a vector, however it does help address the greater contextual question of using the function for the rows in a table in a typing-efficient manner. You could do it in data.frame but your code would still look similar to what you're trying to avoid.
If you made your data.frame a data.table then your code would look like:
library(data.table)
sampleFunction<-function(input1, input2, input3){
return((input1+input2)-input3)
}
# mydt is assumed to be a data.table with columns colA, colB and colC
mydt[,sampleFunction(colA,colB,colC)] # do for all rows
mydt[1,sampleFunction(colA,colB,colC)] # do it for just row 1
You could then add that value as a column, return it independently, etc.
Try
do.call(sampleFunction, sampleInput)
P.S. sampleInput must be a list. If it is a vector, use as.list:
do.call(sampleFunction, as.list(sampleInput))
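A short sketch of the do.call approach in base R, including the case from the question where the inputs come from a row of a data frame. The data frame df and its columns colA, colB and colC are made up for illustration. The names are dropped with unname() so the values are matched to input1, input2 and input3 by position rather than by column name.

```r
sampleFunction <- function(input1, input2, input3) {
  return((input1 + input2) - input3)
}

# A plain vector has to be turned into a list of arguments first.
sampleInput <- c(1, 2, 3)
from_vector <- do.call(sampleFunction, as.list(sampleInput))  # (1 + 2) - 3 = 0

# Hypothetical data frame whose columns feed the three inputs.
df <- data.frame(colA = c(1, 10), colB = c(2, 20), colC = c(3, 5))

# One row: drop the column names so arguments are matched by position.
from_row <- do.call(sampleFunction, unname(as.list(df[2, ])))  # (10 + 20) - 5 = 25

# All rows at once works here because the function body is vectorised.
from_all_rows <- do.call(sampleFunction, unname(as.list(df)))
```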

How to order a matrix by all columns

Ok, I'm stuck in a dumbness loop. I've read through the helpful ideas at How to sort a dataframe by column(s)?, but need one more hint. I'd like a function that takes a matrix with an arbitrary number of columns and sorts by all columns in sequence. E.g., for a matrix foo with N columns, it does the equivalent of foo[order(foo[,1],foo[,2],...foo[,N]),].
I am happy to use a with or by construction, and if necessary define the colnames of my matrix, but I can't figure out how to automate the collection of arguments to order (or to with).
Or, I should say, I could build the entire bloody string with paste and then call it, but I'm sure there's a more straightforward way.
The most elegant (for certain values of "elegant") way would be to turn it into a data frame, and use do.call:
foo[do.call(order, as.data.frame(foo)), ]
This works because a data frame is just a list of variables with some associated attributes, and can be passed to functions expecting a list.
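As a quick check of that one-liner, the sketch below wraps it in a helper (sort_by_all_columns is a hypothetical name, not a base R function) and compares it with the hand-written two-column call on a small made-up matrix.

```r
# Hypothetical helper: sort a matrix by column 1, then column 2, and so on.
sort_by_all_columns <- function(m) {
  # Each column of the data frame becomes one argument to order().
  m[do.call(order, as.data.frame(m)), , drop = FALSE]
}

# Rows are (2,1), (1,2), (2,0), (1,1).
foo <- matrix(c(2, 1, 2, 1,
                1, 2, 0, 1), ncol = 2)

sorted <- sort_by_all_columns(foo)
print(sorted)  # rows in order: (1,1), (1,2), (2,0), (2,1)

# Same result as spelling out the columns by hand.
by_hand <- foo[order(foo[, 1], foo[, 2]), ]
```

The drop = FALSE argument is an addition here so that a one-row matrix stays a matrix instead of collapsing to a vector.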
