I am trying to subset a data frame using a column name stored in an object. Is this possible? Here is an example:
ReallyLongColNameA <- c(1,2,3,4,5,6)
ReallyLongColNameB <- c(6,5,4,3,2,1)
ReallyLongColNameC <- c(7,8,9,10,11,12)
X <- data.frame(ReallyLongColNameA, ReallyLongColNameB, ReallyLongColNameC)
Can I store a column name like this:
ShortColNameB <- names(X[2])
and then subset using the column name stored in the object ShortColNameB?
I can subset the following:
subX <- X[X$ReallyLongColNameB == 6,]
To get:
ReallyLongColNameA ReallyLongColNameB ReallyLongColNameC
1 6 7
But what if I wanted the following output, using the column name stored in an object (ShortColNameB)?
ReallyLongColNameA ReallyLongColNameB
1 6
You can easily remove the last column by subsetting on column numbers.
X[X[[ShortColNameB]]==6,c(1,2)]
The row index X[[ShortColNameB]] == 6 keeps the rows where the column named in ShortColNameB equals 6, and the column index c(1, 2) keeps the first and second columns (A and B).
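The same subset can also be written with stored names for the columns instead of positions. This is only a sketch: ShortColNameA is a hypothetical second object introduced here for illustration and is not part of the question.

```r
# Rebuild the example data frame from the question
X <- data.frame(
  ReallyLongColNameA = c(1, 2, 3, 4, 5, 6),
  ReallyLongColNameB = c(6, 5, 4, 3, 2, 1),
  ReallyLongColNameC = c(7, 8, 9, 10, 11, 12)
)

# Store column names in objects (ShortColNameA is hypothetical)
ShortColNameA <- names(X)[1]
ShortColNameB <- names(X)[2]

# Rows: where the stored column equals 6; columns: picked by stored name
subX <- X[X[[ShortColNameB]] == 6, c(ShortColNameA, ShortColNameB)]
print(subX)
```

Selecting by name rather than by position keeps working if the column order of X changes later.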
I have a dataframe datasetselect that tells me which dataframe to use for each case of an analysis (let's call this the relevant dataframe).
The case is assigned dynamically, and therefore which dataframe is relevant depends on that case.
Based on the case, I would like to assign the relevant dataframe to a pointer "relevantdf". I tried:
datasetselect <- data.frame(case=c("case1","case2"),dataset=c("df1","df2"))
df1 <- data.frame(var1=letters[1:3],var2=1:3)
df2 <- data.frame(var1=letters[4:10],var2=4:10)
currentcase <- "case1"
relevantdf <- get(datasetselect[datasetselect$case == currentcase,"dataset"]) # relevantdf should point to df1
I can't tell whether the problem is with the get() function or with the subsetting.
You are almost there. The problem is that the dataset column of datasetselect is a factor (the default for string columns in data.frame() before R 4.0.0), so you just need to convert it to character.
You can add this line after the definition of datasetselect:
datasetselect$dataset <- as.character(datasetselect$dataset)
And you get your expected output:
> relevantdf
var1 var2
1 a 1
2 b 2
3 c 3
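An alternative that avoids get() altogether is to keep the data frames in a named list and look them up with [[ ]]. The sketch below assumes that approach is acceptable for the analysis; the list name datasets and the helper variable dataset_name are hypothetical and do not come from the question.

```r
datasetselect <- data.frame(
  case = c("case1", "case2"),
  dataset = c("df1", "df2"),
  stringsAsFactors = FALSE  # keep the names as character in any R version
)
df1 <- data.frame(var1 = letters[1:3], var2 = 1:3)
df2 <- data.frame(var1 = letters[4:10], var2 = 4:10)

# Hypothetical container: look data frames up by name instead of using get()
datasets <- list(df1 = df1, df2 = df2)

currentcase <- "case1"
dataset_name <- datasetselect$dataset[datasetselect$case == currentcase]
relevantdf <- datasets[[dataset_name]]
print(relevantdf)
```

A list lookup fails with an error if the name is missing from the list, instead of silently picking up an unrelated object of the same name from the workspace.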
I have this sample code to create a new data frame 'new_data' from the existing data frame 'my_data'.
new_data = NULL
n = 10 # this number corresponds to the number of rows in my_data
conditions = c("Bas_A", "Bas_T", "Oper_A", "Oper_T") # these strings are the target column names in my_data
for (cond in conditions){
  for (i in 1:n){
    new_data <- rbind(new_data, c(cond, my_data$cond[i]))
  }
}
The problem is that my_data$cond (where cond is a variable, and not the column name) is not accepted.
How can I call a column of a data frame by using, after the dollar sign, a variable value?
To access a column, use:
my_data[ , cond]
or
my_data[[cond]]
The ith row can be accessed with:
my_data[i, ]
Combine both to obtain the desired value:
my_data[i, cond]
or
my_data[[cond]][i]
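Putting this together, the nested loop from the question could be replaced by something like the sketch below. The contents of my_data are made up for illustration (the question does not show them), and the output column names condition and value are hypothetical.

```r
# Hypothetical stand-in for my_data; the values are invented for illustration
my_data <- data.frame(
  Bas_A = 1:3, Bas_T = 4:6, Oper_A = 7:9, Oper_T = 10:12
)
conditions <- c("Bas_A", "Bas_T", "Oper_A", "Oper_T")

# Build one block per condition with [[ ]], then stack the blocks once
blocks <- lapply(conditions, function(cond) {
  data.frame(
    condition = cond,
    value = my_data[[cond]],
    stringsAsFactors = FALSE
  )
})
new_data <- do.call(rbind, blocks)
print(head(new_data))
```

Unlike c(cond, value), which coerces everything to character, a data frame per block keeps the values numeric.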
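I guess you need get().
For example, get(x, list), where list is the list and x is a variable holding the element name as a string, returns the same element you would get with $ and that name.
The difference is that in get(x, list) x can be a variable, while the name after $ is taken literally and cannot be a variable.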
$ selects a whole column by its literal name, not individual elements of the column, and operations on that column are vectorized. The code
corrections$BookDate = as.Date(corrections$BookDate, format = "%m/%d/%Y")
converts the contents of the BookDate column of the corrections table from strings to Date objects. It does this in one operation, a single assignment.
In your case, drop the inner loop and look the whole column up with [[ ]], because $cond would search for a column literally named "cond":
new_data <- rbind(new_data, cbind(cond, my_data[[cond]]))
I have an Excel file with 5 columns and 100 rows, which I import into R.
I want to make a separate vector for each of the rows. Each vector would then contain 5 elements.
My issue is: how do I make R automatically create 100 unique variable names and assign each row's elements to those variables? I don't want to manually assign variable names to each row.
You can use the split function for that. An example:
# creating a data.frame
df <- data.frame(x=gl(2,10, labels=c("t","c")), y=runif(20))
# splitting the dataframe df into separate dataframes
lst <- split(df, 1:nrow(df))
This will create a list of dataframes lst. You can access the separate dataframes as follows:
> lst[1]
$`1`
x y
1 t 0.971842
A slightly different approach:
# creating a data.frame
set.seed(1)
df <- data.frame(x=rnorm(20), y=runif(20))
# creating a unique value for each row
df$unique <- paste0("u",seq_len(20))
# splitting the dataframe df into separate dataframes
lst <- split(df, df$unique)
This gives, for example:
> lst$u11
x y unique
11 1.511781 0.4776196 u11
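Since the question asks for a vector per row rather than a one-row data frame, each element of the split list can be flattened with unlist(). The following is a sketch under that reading of the question; the row labels r1, r2, ... and the names rows and row_vectors are arbitrary choices made here.

```r
set.seed(1)
df <- data.frame(x = rnorm(20), y = runif(20))

# One named list element per row; the labels r1, r2, ... are an arbitrary choice
rows <- split(df, paste0("r", seq_len(nrow(df))))

# Convert each one-row data frame into a named numeric vector
row_vectors <- lapply(rows, unlist)
print(row_vectors[["r11"]])
```

Note that unlist() coerces all columns of a row to a common type, so this works best when the columns are all numeric or all character.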
I have a data frame and I want to extract the rows where particular columns have a particular value. The column names are stored in a character array and the values are stored in a list.
data <- data.frame(A=c("a","b","b"), B=c(1,2,2), C=c(3,3,4))
column_key <- c("A", "B")
value_key <- list("b", 2)
Obviously, I can extract the information I want by simple indexing if I hardcode the column names of the keys:
desired_rows <- data[data$A=="b" & data$B==2,]
desired_rows =
A B C
2 b 2 3
3 b 2 4
But how do I do this if the column names are stored in variables? Ideally, it would be something like this:
key <- value_key
names(key) <- column_key
desired_rows <- data[key,]
But I cannot index a data.frame with a list.
I found this trick just before posting the question.
I can compare a data frame to a list that has the same length as a row which returns a logical matrix indicating which element in each row matches the corresponding element in the list. Because I want to find rows that match entirely, I apply the all function across the rows to get a logical index into the rows of data.
desired_rows <- data[apply(data[column_key]==value_key, 1, all),]
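A variant of the same idea compares one column at a time and then combines the results. This is a sketch of an alternative, not the method used above; the intermediate names matches and keep are hypothetical.

```r
data <- data.frame(A = c("a", "b", "b"), B = c(1, 2, 2), C = c(3, 3, 4))
column_key <- c("A", "B")
value_key <- list("b", 2)

# One logical vector per key column: does each row match the wanted value?
matches <- Map(function(col, val) data[[col]] == val, column_key, value_key)

# A row is kept only if it matches on every key column
keep <- Reduce(`&`, matches)
desired_rows <- data[keep, ]
print(desired_rows)
```

Because each column is compared on its own, no conversion of the data frame to a matrix is involved, which avoids type coercion when the key columns have mixed types.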
I have a dataframe and a vector. The vector has about 20 string values, which correspond to parts of the column names in the dataframe. The dataframe has several hundred column names. I have to subset the dataframe based on the partial column names present in the vector.
For example, if one of the column names in the dataframe is GRP20R.45.M, one of the values in the vector will be GRP20R
Thanks
Assuming that v.names is your vector of names, you can use grepl to filter with a single pattern that combines all the names (gsub is needed so that every separator is replaced, not just the first):
patt <- gsub(',\\s','|',toString(v.names))
id.group <- grepl(patt,colnames(df))
df[,id.group]
Here is an example:
v.names <- c('GRP20R','GRP20KA')
df <- data.frame(GRP20R.45.M=1,GRP20KA.25.8=2,hh=1)
patt <- gsub(',\\s','|',toString(v.names))
id.group <- grepl(patt,colnames(df))
df[,id.group]
GRP20R.45.M GRP20KA.25.8
1 1 2
where df is :
df
GRP20R.45.M GRP20KA.25.8 hh
1 1 2 1
EDIT: a one-liner solution (thanks @thelatemail)
df[,grepl(paste0(v.names,collapse="|"),colnames(df))]
Test data:
dat <- data.frame(
GRP20R.30.M="a",
GRP20R.45.M="a",
GRP40R.30.M="b",
GRP40R.45.M="b",
GRP60R.30.M="c",
GRP60R.45.M="c"
)
Only extract the columns partially matching the below strings:
strings <- c("GRP20R","GRP60R")
If your column names all had a predictable stem length, you could use:
dat[substr(colnames(dat),1,6) %in% strings]
If you wanted to more flexibly compare the part of the column name before the first period ., you could use:
dat[gsub("(.)?\\..+","\\1",colnames(dat)) %in% strings]
Both options give the result:
GRP20R.30.M GRP20R.45.M GRP60R.30.M GRP60R.45.M
1 a a c c
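If the strings should be treated as literal prefixes rather than regular expressions, base R's startsWith() (available since R 3.3.0) is another option. This is a sketch of an alternative to the approaches above; the names keep and result are hypothetical.

```r
dat <- data.frame(
  GRP20R.30.M = "a", GRP20R.45.M = "a",
  GRP40R.30.M = "b", GRP40R.45.M = "b",
  GRP60R.30.M = "c", GRP60R.45.M = "c"
)
strings <- c("GRP20R", "GRP60R")

# For each prefix, flag the column names that start with it (literal match, no regex),
# then keep a column if it matches at least one prefix
keep <- Reduce(`|`, lapply(strings, function(p) startsWith(colnames(dat), p)))

result <- dat[, keep, drop = FALSE]
print(colnames(result))
```

Because no regular expression is built, characters such as . or | in the prefixes need no escaping.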