Data frame creation with dynamically assigned variable names - r

I'd like to assign a value to a variable the name of which is determined on the fly, and then assign that variable to a column of a data frame, something like:
x = rnorm(10)
y = 'z'
data.frame(assign(y, x))
While assign(y, x) creates z with the right values, it fails to name the data frame's column "z".

Based on the OP's comment, the solution would be:
#Code
assign(y, x)
For the other issue you can try:
#Code2
df <- data.frame(assign(y, x))
names(df)[1] <- y
Output:
df
z
1 -0.5611014
2 -2.2370362
3 0.9037152
4 -1.1543826
5 0.4997336
6 -0.4726948
7 -0.6566381
8 1.0173725
9 -0.5230326
10 -0.9362808
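A possibly cleaner alternative is to skip assign() entirely and attach the name when the data frame is built. This is a sketch of my own, not part of the answer above. It also avoids creating a stray variable z in the calling environment. It uses only base R (setNames() and [[<-); df2 is just an illustrative name.

```r
x <- rnorm(10)
y <- "z"

# Name the list element after the string stored in y; data.frame() uses it as the column name
df <- data.frame(setNames(list(x), y))

# Alternative: start from a data frame with the right number of rows and no columns,
# then add the column by its (dynamic) name with [[<-
df2 <- data.frame(row.names = seq_along(x))
df2[[y]] <- x

names(df)   # "z"
```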

Related

partial match a dataframe column to pick rows of interest

Say I create a dataframe as:
dataframe <- data.frame("x" = c("aaa/bbb", "ccc", "ddd/eee/fff"),
"y" = c(9,2,1),
"z" = c(7,5,8))
and another dataframe as
list <- data.frame("m" = c("ccc"))
then I can select the matching rows from the first dataframe as:
result<-merge(list,dataframe,by.x= "m",by.y="x")
but how can I match when my list dataframe is:
list <- data.frame("m" = c("fff","bbb"))
I am looking for a results like:
x y z
aaa/bbb 9 7
ddd/eee/fff 1 8
Thanks.
I think it's not a merge issue but a filter one. You can try this:
df1[grep(paste(df2$m, collapse = "|"), df1$x), ]
# x y z
# 1 aaa/bbb 9 7
# 3 ddd/eee/fff 1 8
It's not good practice to give variables the names of existing objects or functions, so I renamed your dataframe and list to df1 and df2.
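One caveat with grep(): it matches substrings, so a target such as "ee" would also pick up "ddd/eee/fff". If only whole /-separated segments should count, here is a base-R sketch that splits each x value and compares the segments exactly with %in%. It assumes "/" is always the separator.

```r
df1 <- data.frame(x = c("aaa/bbb", "ccc", "ddd/eee/fff"),
                  y = c(9, 2, 1),
                  z = c(7, 5, 8),
                  stringsAsFactors = FALSE)
df2 <- data.frame(m = c("fff", "bbb"), stringsAsFactors = FALSE)

# Split each x value into its "/"-separated segments (fixed = TRUE: no regex)
parts <- strsplit(df1$x, "/", fixed = TRUE)
# TRUE for rows where at least one segment equals one of the targets exactly
keep <- vapply(parts, function(p) any(p %in% df2$m), logical(1))
result <- df1[keep, ]
result
```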

How to look up values in another column and assign the corresponding ids

I have a toy example to explain what I am trying to work on:
aski = data.frame(x=c("a","b","c","a","d","d"),y=c("b","a","d","a","b","c"))
I managed to assign unique ids to column y, and the output now looks like:
aski2 = data.frame(x=c("a","b","c","a","d","d"),y=c("1","2","3","2","1","4"))
As you can see, "b" is present in both col x and col y and was assigned id=1 in col y, "a" was assigned id=2 in col y, and so on.
These values are also present in col x.
Col x has "a" as its first element; "a" was also in col y and was assigned id=2 there,
so I'll assign id=2 to "a" in col x too.
What I am trying to do next is look up each value of col x and, if it occurs in col y, assign it that id.
The final data frame should look like:
aski3 = data.frame(x=c("2","1","4","2","3","3"),y=c("1","2","3","2","1","4"))
Without the need to create aski2 as an intermediate, a possible solution is to use match with lapply to get the numeric representations of the letters:
# create a vector of the unique values in the order
# in which you want them assigned to '1' till '4'
v <- unique(aski$y)
# convert both columns to integer values with 'match' and 'lapply'
aski[] <- lapply(aski, match, v)
which gives:
> aski
x y
1 2 1
2 1 2
3 4 3
4 2 2
5 3 1
6 3 4
If you want the number as characters, you can additionally do:
aski[] <- lapply(aski, as.character)
First, convert both columns to character vectors.
Then, collect all unique values from the two columns to use as levels of a factor.
Convert both columns to factors, then numeric.
aski = data.frame(x=c("a","b","c","a","d","d"),y=c("b","a","d","a","b","c"))
aski$x <- as.character(aski$x)
aski$y <- as.character(aski$y)
lev <- unique(c(aski$y, aski$x))
aski$x <- factor(aski$x, levels=lev)
aski$y <- factor(aski$y, levels=lev)
aski$x <- as.numeric(aski$x)
aski$y <- as.numeric(aski$y)
aski
A solution using dplyr. We can first create a vector, vec, showing the relationship between index and letter with unique(aski$y). After this step, you can use Jaap's lapply solution, or you can use mutate_all from dplyr as follows.
# Create the vector showing the relationship of index and letter
vec <- unique(aski$y)
# View vec
vec
[1] "b" "a" "d" "c"
library(dplyr)
# Modify all columns
# funs() is deprecated in current dplyr; a ~ lambda does the same job
aski2 <- aski %>% mutate_all(~ match(., vec))
# View the results
aski2
x y
1 2 1
2 1 2
3 4 3
4 2 2
5 3 1
6 3 4
Data
aski <- data.frame(x = c("a","b","c","a","d","d"),
y = c("b","a","d","a","b","c"),
stringsAsFactors = FALSE)
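All three answers use unique(aski$y), or a level set that starts with it, as the lookup table. In this example every letter in x also appears in y. If x could contain a value that never occurs in y, however, match() against unique(aski$y) would return NA for it. Below is a minimal sketch of the match() approach that builds the lookup from both columns, as the factor-based answer does with lev. It assumes the ids should also cover values found only in x, and the data frame aski_extra is hypothetical.

```r
# Hypothetical data: "e" appears in x but never in y
aski_extra <- data.frame(x = c("a", "e"), y = c("b", "a"),
                         stringsAsFactors = FALSE)

# Lookup built from y only: "e" gets no id
v_y <- unique(aski_extra$y)
match(aski_extra$x, v_y)   # 2 NA

# Lookup built from y first, then any extra values found only in x
v_all <- unique(c(aski_extra$y, aski_extra$x))
aski_extra[] <- lapply(aski_extra, match, v_all)
aski_extra
```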

Functionalizing my code - calling an externally created data frame into a function

I have a data frame called "Region_Data" which I have created by performing some functions on it.
I want to take this data frame called "Region_Data", use it as an input, and subset it using the following function that I created. The function should produce the subset data frame:
Region_Analysis_Function <- function(Input_Region){
Subset_Region_Data = subset(Region_Data, Region == "Input_Region" )
Subset_Region_Data
}
However, when I create this function and then execute it using:
Region_Analysis_Function("North West")
I get 0 observations (though I know that there are xx observations in the data frame).
I read that there is something called global / local environment, but I'm not really clear on that.
How do I solve this issue? Thank you so much in advance!!
When you try to subset your data using subset(Region_Data, Region == "Input_Region" ), "Input_Region" is interpreted as a string literal rather than being evaluated to the value of the argument Input_Region. This means that unless the column Region in your object Region_Data contains some rows with the value "Input_Region", your function will return a zero-row subset. Removing the quotes will solve this, and changing == to %in% will make your function more general. Consider the following data set,
mydf <- data.frame(
x = 1:5,
y = rnorm(5),
z = letters[1:5])
##
R> mydf
x y z
1 1 -0.4015449 a
2 2 0.4875468 b
3 3 0.9375762 c
4 4 -0.7464501 d
5 5 0.8802209 e
and the following 3 functions,
qfoo <- function(Z) {
subset(mydf, z == "Z")
}
foo <- function(Z) {
subset(mydf, z == Z)
}
##
bar <- function(Z) {
subset(mydf, z %in% Z)
}
where qfoo represents the approach used in your question, foo implements the first change I noted, and bar implements both changes.
The last two functions work when the input value is a scalar,
R> qfoo("c")
[1] x y z
<0 rows> (or 0-length row.names)
##
R> foo("c")
x y z
3 3 0.9375762 c
##
R> bar("c")
x y z
3 3 0.9375762 c
but only the third will work if it is a vector:
R> foo(c("a","c"))
x y z
1 1 -0.4015449 a
Warning messages:
1: In is.na(e1) | is.na(e2) :
longer object length is not a multiple of shorter object length
2: In `==.default`(z, Z) :
longer object length is not a multiple of shorter object length
##
R> bar(c("a","c"))
x y z
1 1 -0.4015449 a
3 3 0.9375762 c
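On the global-environment part of the question: Region_Data is not defined inside the function, so R looks it up in the environment where the function was defined (here the global environment). A more self-contained sketch passes the data frame in as an argument and uses [ instead of subset(). R's own help page for subset() describes it as a convenience function intended for interactive use and recommends the standard subsetting functions for programming. This assumes Region_Data has a Region column. The small Region_Data and its Sales column below are made-up stand-ins.

```r
# Made-up stand-in for the OP's data
Region_Data <- data.frame(Region = c("North West", "South East", "North West"),
                          Sales  = c(10, 20, 30),
                          stringsAsFactors = FALSE)

# The data frame is an explicit argument, so nothing is looked up globally;
# %in% lets Input_Region hold one or several region names
Region_Analysis_Function <- function(data, Input_Region) {
  data[data$Region %in% Input_Region, , drop = FALSE]
}

nw   <- Region_Analysis_Function(Region_Data, "North West")
both <- Region_Analysis_Function(Region_Data, c("North West", "South East"))
```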

Replicate variable based off match of two other variables in R

I have a seemingly simple question that I can't answer. I have three vectors:
x <- c(1,2,3,4)
weight <- c(5,6,7,8)
y <- c(1,1,1,2,2,2)
I want to create a new vector that repeats the value of weight each time an element of x matches y, producing the following new weight vector associated with y:
y_weight <- c(5,5,5,6,6,6)
Any thoughts on how to do this (either loop or vectorized)? Thanks
You want the match function.
match(y, x)
to return the indices of the matches, then use those to build your new weight vector:
weight[match(y, x)]
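Here is a short worked check using the vectors from the question. match(y, x) gives the position in x of each element of y, and indexing weight with those positions yields one weight per element of y. An element of y with no counterpart in x produces NA (y2 below is a made-up example):

```r
x <- c(1, 2, 3, 4)
weight <- c(5, 6, 7, 8)
y <- c(1, 1, 1, 2, 2, 2)

idx <- match(y, x)        # positions of each y value within x
y_weight <- weight[idx]   # 5 5 5 6 6 6

# Hypothetical y containing a value (9) that does not occur in x
y2 <- c(1, 9)
weight[match(y2, x)]      # 5 NA
```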
#Using plyr
library(plyr)
df<-as.data.frame(cbind(x,weight)) # converting to dataframe
df<-rename(df,c(x="y")) # rename x as y for joining dataframes
y<-as.data.frame(y) # converting to dataframe
mydata <- join(df, y, by = "y",type="right")
> mydata
y weight
1 1 5
2 1 5
3 1 5
4 2 6
5 2 6
6 2 6
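The same right join can be sketched in base R with merge(), without plyr. Here all.x = TRUE keeps every row of the y data frame. One caveat: merge() sorts the result by the key column by default, so the original order of y is preserved here only because y is already sorted. The names y_df and lookup are illustrative.

```r
x <- c(1, 2, 3, 4)
weight <- c(5, 6, 7, 8)
y <- c(1, 1, 1, 2, 2, 2)

y_df   <- data.frame(y = y)
lookup <- data.frame(y = x, weight = weight)   # key column named y for the join

# Keep every row of y_df, attaching the matching weight (NA if no match)
mydata <- merge(y_df, lookup, by = "y", all.x = TRUE)
mydata
```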

Reorganizing Lists of data.frames

Let's say I have a list of data frames, where each data frame has columns like this:
lists$a
company, x, y ,z
lists$b
company, x, y, z
lists$c
company, x, y, z
Any thoughts on how I can change it to something like:
new.list$company
a,x,y,z
b,x,y,z
c,x,y,z
new.list$company2
a,x,y,z
b,x,y,z
c,x,y,z
I've been using:
new.list[[company]] <- ldply(lists, subset, company=company.name)
But this only does one at a time. Is there a shorter way?
Brandon,
You can use | in the cast formula to create lists. Using the example list dat_l from @Wojciech:
require(reshape)
dat.m <- melt(dat_l, "company")
cast(dat.m, L1 ~ variable | company)
cast(dat.m, L1 ~ variable | company)
Here's a way using the plyr package: start with @Wojciech's dat_l and put the whole thing in a single data frame using ldply:
require(plyr)
df <- ldply(dat_l)
and then turn it back into a list by splitting on the company column:
new_list <- dlply(df, .(company), subset, select = c(.id,x,y,z) )
> new_list[1:3]
$C
.id x y z
3 a 3 0.7209484 1.6247163
35 i 3 0.1630658 0.2158516
37 j 1 0.8779915 -0.9371671
$G
.id x y z
2 a 2 0.1132311 -1.8067876
10 c 2 0.1825166 1.8355509
28 g 4 0.6474877 -0.8052137
$H
.id x y z
1 a 1 0.9562020 -1.450522
25 g 1 0.1322886 0.584342
Example data
dat_l <- lapply(1:10,function(x) data.frame(x=1:4,y=rexp(4),
z=rnorm(4),company=sample(LETTERS,4)))
names(dat_l) <- letters[1:10]
Code
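# Number of rows in each data frame of the list
Nrec <- unlist(lapply(dat_l,nrow))
# Stack all data frames into one
dat <- do.call(rbind,dat_l)
# Record which list element (a, b, ...) each row came from
dat$A <- rep(names(Nrec),Nrec)
# Split by company, dropping the company column (column 4)
dat_new <- split(dat[-4],dat$company)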
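Both answers follow the same pattern: stack the list into one data frame while recording where each row came from, then split() on company. Below is a deterministic base-R sketch of that pattern. It uses a small made-up lists object in place of the random dat_l, plus a hypothetical source column that plays the role of .id / A.

```r
# Made-up list of data frames sharing the columns company, x, y, z
lists <- list(
  a = data.frame(company = c("C1", "C2"), x = 1:2, y = c(0.5, 1.5),
                 z = c(10, 20), stringsAsFactors = FALSE),
  b = data.frame(company = c("C1", "C2"), x = 3:4, y = c(2.5, 3.5),
                 z = c(30, 40), stringsAsFactors = FALSE)
)

# Tag every row with the name of the list element it came from
tagged  <- Map(function(df, nm) { df$source <- nm; df }, lists, names(lists))
# Stack everything into one data frame
stacked <- do.call(rbind, tagged)
# One data frame per company, without the company column itself
new.list <- split(stacked[, c("source", "x", "y", "z")], stacked$company)
new.list$C1
```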

Resources