how to turn values in dataframe into objects - r

For a function that I am writing, the output is a dataframe. But how do I assign the values in one of the columns of my dataframe to objects?
For example, if I have 2 vectors that I cbind into a dataframe:
> numbers <- c(33, 44, 55, 66)
> names <- c("A", "B", "C", "D")
> MYdataframe <- data.frame(cbind(names, numbers))
I will get this:
> MYdataframe
  names numbers
1     A      33
2     B      44
3     C      55
4     D      66
But how do I assign the numbers (e.g. 33) to objects (e.g. A)?

It does not look like a very good idea: your function would be assigning variables in the global environment, or in its parent environment, instead of returning something. If you want to return several values, you can put them in a named list, e.g., list(A=3.14, B=2.71), or a vector if they all have the same type (they do, if you can put them in a data.frame).
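In addition, in your example, cbind first coerces both vectors into a character matrix, so the numbers end up as factors (or, from R 4.0.0 on, as character strings) rather than as numbers: I am not sure this is intentional. Calling data.frame(names, numbers) directly keeps them numeric.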
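The named-list suggestion can be sketched roughly as follows. This is only an illustration: make_values, labels and values are hypothetical names, not part of the original question.

```r
# Hypothetical function that returns a named list instead of
# creating variables in the caller's environment.
make_values <- function() {
  labels <- c("A", "B", "C", "D")
  values <- c(33, 44, 55, 66)
  # data.frame() without cbind() keeps `values` numeric
  df <- data.frame(names = labels, numbers = values, stringsAsFactors = FALSE)
  # turn the two columns into a named list: list(A = 33, B = 44, ...)
  setNames(as.list(df$numbers), df$names)
}

res <- make_values()
res$A        # access one value by name
names(res)   # the labels become the list names
```

The caller can then pick out res$A, res$B and so on without anything being written into the global environment.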
However, if you really insist, this can be done with assign.
library(plyr)
d_ply( MYdataframe, "names", function(u)
assign( as.character(u$names[1]), u$numbers, envir=.GlobalEnv)
)
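If plyr is not available, something similar could be done in base R. The sketch below assumes the labels and values are kept as plain vectors (labels and values are hypothetical names) and writes into a dedicated environment rather than .GlobalEnv, which keeps the side effects contained.

```r
labels <- c("A", "B", "C", "D")
values <- c(33, 44, 55, 66)

# A separate environment to hold the created objects
# (use .GlobalEnv here only if you really want global variables).
target <- new.env()

for (i in seq_along(labels)) {
  assign(labels[i], values[i], envir = target)
}

get("A", envir = target)   # retrieve the object named "A"
ls(target)                 # list the objects that were created
```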

If you really wanted to use the character values as names for the numeric vector, then this would do it:
names(numbers) <- names
numbers
# A B C D
#33 44 55 66
numbers["A"]
# A
# 33
Maybe you should say what you really want. Choosing names for your objects that are not function names (names is a function) would also help us keep things sorted out in our heads.

Related

Print column name in an r apply and save as new column on a dataframe

I've written an apply where I want to 'loop' over a subset of the columns in a dataframe and print some output. For the sake of an example I'm just transforming based on dividing one column by another (I know there are other ways to do this) so we have:
apply(df[c("a","b","c")], 2, function(x) {
  # x is the current column as a plain vector (it carries no column name)
  z <- x / df[["divisor"]]
}
)
I'd like to print the column name currently being operated on, but colnames(x) (for example) doesn't work.
Then I want to save a new column, based on each colname (a.exp,b.exp or whatever) into the same df.
For example, take
df <- data.frame(a = 1:3, b = 11:13, c = 21:23)
I'd like to print the column name currently being operated on, but
colnames(x) (for example) doesn't work.
Use sapply with column indices:
sapply(seq_len(ncol(df)), function(x) names(df)[x])
# [1] "a" "b" "c"
I want to save a new column, based on each colname (a.exp,b.exp or
whatever) into the same df.
Here is one way to do it:
(df <- cbind(df, setNames(as.data.frame(apply(df, 2, "^", 2)), paste(names(df), "sqr", sep = "."))))
# a b c a.sqr b.sqr c.sqr
# 1 1 11 21 1 121 441
# 2 2 12 22 4 144 484
# 3 3 13 23 9 169 529
I think a lot of people will look for this same issue, so I'm answering my own question (having eventually found the answers). As below, there are other answers to both parts (thanks!) but none combining these issues (and some of the examples are more complex).
First, it seems the column name really isn't something you can get from inside apply (seems weird to me!), so you 'loop' over the column names instead, and within the function fetch the actual vectors by name.
Then the key thing is that to assign, i.e. to create your new columns, from within the function, you use '<<-'
lapply(c("a","b","c"), function(x) {
  # x is the column name, so it can be printed or pasted into a new name
  z <- df[[x]] / df[["divisor"]]
  # <<- modifies df in the enclosing environment, not a local copy
  df[[paste0(x, "ind")]] <<- z
}
)
The <<- operator is discussed e.g. at https://stackoverflow.com/questions/2628621/how-do-you-use-scoping-assignment-in-r
I got confused because I only vaguely thought about wanting to save the outputs initially and I figured I needed both the column (I incorrectly assumed apply worked like a loop so I could use a counter as an index or something) and that there should be same way to get the name separately (e.g. colname(x)).
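For comparison, a plain for loop over the column names avoids '<<-' altogether, because the loop body runs in the same environment as df. This is a minimal sketch; the divisor values are made up for illustration and the ".exp" suffix follows the naming mentioned in the question.

```r
# Example data; the divisor column is an assumption for this sketch
df <- data.frame(a = 1:3, b = 11:13, c = 21:23, divisor = c(1, 2, 4))

for (nm in c("a", "b", "c")) {
  message("processing column: ", nm)            # print the current column name
  df[[paste0(nm, ".exp")]] <- df[[nm]] / df[["divisor"]]
}

names(df)   # original columns plus a.exp, b.exp, c.exp
```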
There are a couple of related stack questions:
https://stackoverflow.com/questions/9624866/access-to-column-name-of-dataframe-with-apply-function
https://stackoverflow.com/questions/21512041/printing-a-column-name-inside-lapply-function
https://stackoverflow.com/questions/10956873/how-to-print-the-name-of-current-row-when-using-apply-in-r
https://stackoverflow.com/questions/7681013/apply-over-matrix-by-column-any-way-to-get-column-name (easiest to understand)

Both extract and create names for columns within a matrix

I have a matrix with columns denoting 30 different frequency windows and rows denoting dates. I would like to extract each column and assign a variable to each resulting vector and have the name of that variable be the name of that frequency window (which I have in center values, so I'd like to name each variable something like f100). What is the best way to write a loop to both extract and name each variable?
Thanks!
If you want to create 30 variables in the global environment from the columns of the matrix, you could use list2env or assign (I would probably keep it together in a matrix/dataframe or even in a list and do all the necessary operations rather than cluttering the global environment with lots of variables).
list2env(lapply(as.data.frame(mat), function(x) x), envir=.GlobalEnv)
# <environment: R_GlobalEnv>
f1
#[1] 37 38 12 34 26 21 30 6 27 29
data
set.seed(42)
mat <- matrix(sample(1:40, 30*10, replace=TRUE), ncol=30,
dimnames=list(NULL, paste0("f", 1:30)))
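The alternative mentioned above, keeping the columns together in a named list rather than creating 30 global variables, might look like this. It is a sketch that reuses the same mat; cols and col_means are hypothetical names.

```r
set.seed(42)
mat <- matrix(sample(1:40, 30 * 10, replace = TRUE), ncol = 30,
              dimnames = list(NULL, paste0("f", 1:30)))

# One list element per column, named after the frequency window
cols <- as.list(as.data.frame(mat))

cols$f7                                          # a single column, by name
col_means <- vapply(cols, mean, numeric(1))      # operate on all columns at once
```

Each vector is still reachable by name (cols$f7, cols[["f12"]]), and operations over all windows stay a one-liner.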

How can I specify which columns to select using read.table in R

I have a dataset with 100 columns and it doesn't have a header.
I have an int vector containing some numbers between 1 and 100, for example a vector with "2 5 62 78".
Now when I read the dataset using read.table, all I want is to select column 2, 5, 62 and 78 from the dataset. How can I do that? Many thanks.
What you want is the option colClasses of read.table() (and the derivative functions). It allows you to pass a character vector with the classes of each column in the data. If you set that to "NULL" the column will be skipped. You can set the whole thing to "NULL" and then only change the ones you want to import (based on their class).
Proof of concept below.
cc <- rep('NULL', 100) ## skip all 100 columns
cc[c(2, 5)] <- 'integer' ## 2 and 5 are integer
cc[c(62, 78)] <- 'character' ## 62 and 78 will be imported as character
## header=FALSE because the file has no header row
df <- read.csv('really-wide-data.csv', header=FALSE, colClasses=cc)
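When the column classes are not known in advance, the same idea can be driven directly by the index vector. The sketch below builds a small header-less temporary file as a stand-in for the real data (tmp, demo, keep and selected are hypothetical names) and uses NA in colClasses so that read.table guesses the class of the kept columns.

```r
# Stand-in for the real file: 3 rows, 100 columns, no header
tmp <- tempfile(fileext = ".txt")
demo <- matrix(1:300, nrow = 3, ncol = 100)
write.table(demo, tmp, row.names = FALSE, col.names = FALSE)

keep <- c(2, 5, 62, 78)       # the columns to import

cc <- rep("NULL", 100)        # "NULL" means: skip this column
cc[keep] <- NA                # NA means: let read.table detect the class

selected <- read.table(tmp, header = FALSE, colClasses = cc)
dim(selected)                 # 3 rows, 4 columns
```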

R applying to a line

I have a data frame that contains multiple rows and multiple columns.
I have a character vector that contains the names of some of the columns in the data frame. The number of columns can vary.
For each line, I have to identify whether at least one of these columns is not NA (basically any(!is.na(df[namescolumns])) for each line), and then subset the rows for which this is TRUE.
Actually, any(!is.na(df[1,][namescolumns])) works well, but it's only for the first line.
I could easily do a for loop, which is my first reflex as a programmer and because it works for the first line, but I'm sure it's not the R way and that there is a way to do this with an "apply" (lapply, mapply, sapply, tapply or other), but I can't figure out which one and how.
Thank you.
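Try using apply over the first dimension (rows):
apply(df, 1, function(x) any(!is.na(x[namescolumns])))
Because the function returns a single value per row, the result is a logical vector with one element per row. (apply only returns a transposed matrix when the function returns several values per row; in that case you might want to wrap the whole statement inside of t(.).)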
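A vectorised alternative to the row-wise apply is to count the non-NA cells per row with rowSums. This is a minimal sketch on made-up data; only namescolumns comes from the question.

```r
# Small example data (made up for this sketch)
df <- data.frame(a = c(1, NA, NA),
                 b = c(NA, NA, 5),
                 c = c("x", "y", "z"))
namescolumns <- c("a", "b")

# TRUE for rows where at least one of the selected columns is not NA
keep <- rowSums(!is.na(df[namescolumns])) > 0

df[keep, ]   # subset of rows that have some non-missing value
```

Unlike apply, this does not convert the whole data frame to a character matrix first, since only the is.na() result is turned into a matrix.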
You can use a combination of lapply and Reduce
no.na.in.cols <- Reduce(`&`, lapply(colnames, function (name) !is.na(df[[name]])))
to get a logical vector that is TRUE for the rows with no NA in any of the columns in colnames (use `|` instead of `&` to keep rows where at least one of these columns is not NA, as the question asks), which can in turn be used to subset the data.
df[no.na.in.cols,]
For example. Given:
df <- data.frame(a = c(1,2,3,4,NA,6,7),
b = c(2,4,6,8,10,12,14),
c = c("one","two","three","four","five","six","seven"),
d = c("a",NA,"c","d","e","f","g")
)
colnames <- c("a","d")
You can get:
> df[Reduce(`&`, lapply(colnames, function (name) !is.na(df[name]))),]
  a  b     c d
1 1  2   one a
3 3  6 three c
4 4  8  four d
6 6 12   six f
7 7 14 seven g

aaply fails on a vector

I am trying to understand how to use the excellent plyr package's commands on a vector (in my case, of strings). I suppose I'd want to use aaply, but it fails, asking for a margin. But there aren't columns or rows in my vector!
To be a bit more concrete, the following command works, but returns results in a weird list. states.df is a data frame, and region is the name of the state (returned using Hadley's map_data("state") command). Thus, states.df$region is a vector of strings (specifically, state names). opinion.new is a vector of numbers, named using state names.
states.df <- map_data("state")
ch = sapply(states.df$region, function (x) { opinion.new[names(opinion.new)==x] } )
What I'd like to do is:
ch = aaply(states.df$region, function (x) { opinion.new[names(opinion.new)==x] } )
Where ch is the vector of numbers looked up or pulled from opinion.new. But aaply requires an array, and fails on a vector.
Thanks!
If you want to use plyr on a vector, you have to use l*ply, as follows:
v <- 1:10
sapply(v, function(x)x^2)
[1] 1 4 9 16 25 36 49 64 81 100
laply(v, function(x)x^2)
[1] 1 4 9 16 25 36 49 64 81 100
In other words, for this kind of use, sapply and laply give the same result.
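For the lookup in the question, no apply-style call may be needed at all, because a named vector can be indexed directly by a character vector. The sketch below uses made-up stand-ins: the values in opinion.new and the region vector are hypothetical, not taken from the real map_data("state") output.

```r
# Hypothetical named vector of opinions, keyed by state name
opinion.new <- c(alabama = 0.42, arizona = 0.48, arkansas = 0.55)

# Stand-in for states.df$region (one entry per map point, with repeats)
region <- c("alabama", "alabama", "arizona", "arkansas")

# Direct lookup by name; unname() drops the state names from the result
ch <- unname(opinion.new[region])

# Equivalent element-by-element version that always returns a numeric vector
ch2 <- vapply(region, function(x) opinion.new[[x]], numeric(1), USE.NAMES = FALSE)
```

A region that is missing from opinion.new gives NA in the first form and an error in the second, which may or may not be what you want.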

Resources