R - Subset based on column name - r

My data frame has over 120 columns (variables) and I would like to create subsets based on column names.
For example I would like to create a subset where the column name includes the string "mood". Is this possible?

I generally use
SubData <- myData[,grep("whatIWant", colnames(myData))]
I know very well that the "," is not necessary and that colnames could be replaced by names, but that would not work with matrices, and I hate to change the formalism when changing objects.
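As a minimal sketch of the grep approach for the "mood" case in the question: the data frame and its column names below are made up for illustration. The drop = FALSE argument is an addition to the answer's code; it keeps the result a data frame even when only one column matches.

```r
# Hypothetical data frame; only the column names matter here.
myData <- data.frame(id      = 1:3,
                     mood_am = c(2, 4, 3),
                     mood_pm = c(5, 1, 2),
                     sleep   = c(7, 6, 8))

# grep() returns the positions of the column names that contain "mood".
mood_cols <- grep("mood", colnames(myData))

# drop = FALSE keeps the result a data frame even if a single column matches.
SubData <- myData[, mood_cols, drop = FALSE]
colnames(SubData)
```

The same call works on a matrix, since colnames() and [rows, columns] indexing are defined for both.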

Related

Is there a R methodology to select the columns from a dataframe that are listed in a separate array

I have a dataframe with over 100 columns. Post implementation of certain conditions, I need a subset of the dataframe with the columns that are listed in a separate array.
The array has 50 entries with 2 columns. The first column has the selected variable names and the second column has some associated values.
I wish to build a new data frame with just the variables mentioned in the first column of the separate array. Could you please point me as to how to proceed?
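Try this:
library(dplyr)
subset_data <- myData %>% select(all_of(dataframe_with_names$names))
all_of() selects exactly the listed column names and raises an error if any of them is missing; contains() would instead match any column whose name includes one of the strings.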
In R you can use square brackets [rows, columns] to select specific rows or specific columns. (Leaving either blank selects all).
If you had a vector of column names you wanted to keep called important_columns you could select only those columns with:
myData[,important_columns]
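In your case the vector of column names is actually a column in your array, so you select that column and use it as your vector:
myData[, array$names]
This assumes the array is a data frame with a column called names; if it is a matrix, use array[, 1] instead, because $ does not work on matrices.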
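A small base-R sketch of the same idea, under the assumption that the separate table is a data frame. The object names (selected, wanted, missing_cols) and the data are hypothetical. Converting the names with as.character() guards against the column having been read as a factor, and the setdiff() check reports names that do not exist in the data.

```r
# Hypothetical data: four columns, of which two are listed for selection.
myData <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)
selected <- data.frame(names = c("a", "c"),
                       value = c(0.5, 0.9),
                       stringsAsFactors = FALSE)

# Use plain character names; a factor would index by its integer codes.
wanted <- as.character(selected$names)

# Names listed in 'selected' that are absent from the data, if any.
missing_cols <- setdiff(wanted, colnames(myData))

# Keep only the listed columns that exist, as a data frame.
subset_data <- myData[, intersect(wanted, colnames(myData)), drop = FALSE]
colnames(subset_data)
```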

Remove multiple rows from a list of names in R (a list of 187 names to remove)?

I have a data frame in R containing over 29,000 rows. I need to remove multiple rows using only a list of names (187 names).
My dataset is about airlines, and I need to remove specific airlines from my data set that contains over 200 types of airlines. My first column contains all airline names, and I need to remove the entire row for those specific airlines.
I singled out all airline names that I want removed by this code: transmute(a_name_remove, airline_name). This gave me a table of all names of airlines that I want removed, now I have to remove that list of names from my original dataset named airlines.
I know there is a way to do this manually, which is: mydata[-c("a", "b"), ], for example. But writing out each name would be hectic.
Can you please help me by giving me a way to use the list that I have to forwardly remove those rows from my dataset?
I cannot write out each name on its own.
I also tried this: airlines[!(row.names(airlines) %in% c(remove)), ], in which I made my list "removed" into a data frame and as a vector, then used that code to remove it from my original dataset "airlines", still did not work.
Thank you!
You can create a function that negates %in%, e.g.
'%not_in%' <- Negate('%in%')
so per your code, it should look like this
airlines[row.names(airlines) %not_in% remove, ]
Additionally, I do not recommend using remove as a variable name, since it is a base function in R. If possible, rename the variable, e.g. discard_airlines:
airlines[row.names(airlines) %not_in% discard_airlines, ]
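Since the question says the airline names sit in the first column rather than in the row names, it may be necessary to test that column instead. The following is a sketch under that assumption; the column name airline_name is taken from the question's transmute() call, and the data are invented.

```r
# Hypothetical data with the airline names stored in a regular column.
airlines <- data.frame(airline_name = c("Alpha Air", "Beta Jet", "Gamma Wings", "Beta Jet"),
                       delay        = c(5, 12, 0, 7),
                       stringsAsFactors = FALSE)

# Names to drop; in practice this would be the vector of 187 names.
discard_airlines <- c("Beta Jet")

# Keep every row whose airline_name is NOT in the discard vector.
kept <- airlines[!(airlines$airline_name %in% discard_airlines), ]
kept
```

If the names to remove live in a data frame, pass the relevant column (for example a_name_remove$airline_name) rather than the whole data frame to %in%.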

Assigning name to rows in R

I would like to assign names to rows in R but so far I have only found ways to assign names to columns. My data is in two columns where the first column (geo) is assigned with the name of the specific location I'm investigating and the second column (skada) is the observed value at that specific location. To clarify, I want to be able to assign names for every location instead of just having them all in one .txt file so that the data is easier to work with. Anyone with more experience than me that knows how to handle this in R?
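First you need to import the data into your global environment, for example with the function read.table().
To name the rows (assuming your data.frame is named df):
rownames(df) <- df[, "geo"]
df <- df[, -1, drop = FALSE]
The drop = FALSE argument keeps df a data frame; with only two columns, df[, -1] on its own would return a plain vector and the row names would be lost.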
Well, your question is not that clear...
I assume you are trying to create a data.frame with named rows. If you look at the data.frame help you can see the description of the row.names parameter:
NULL or a single integer or character string specifying a column to be used as row names, or a character or integer vector giving the row names for the data frame.
which means that when you create the data.frame you can either specify the row names manually or name the column that contains them. The former can be achieved as follows
d = data.frame(x=rnorm(10), # 10 random data normally distributed
y=rnorm(10), # 10 random data normally distributed
row.names=letters[1:10] # take the first 10 letters and use them as row header
)
while the latter is
d = data.frame(x=rnorm(10), # 10 random data normally distributed
y=rnorm(10), # 10 random data normally distributed
r=letters[1:10], # take the first 10 letters
row.names=3 # the column with the row headers is the 3rd
)
If you are reading the data from a file, I will assume you are using the command read.table. Many of its parameters are the same as those of data.frame; in particular, you will find that its row.names parameter works the same way:
a vector of row names. This can be a vector giving the actual row names, or a single number giving the column of the table which contains the row names, or character string giving the name of the table column containing the row names.
Finally, if you have already read the data into a data.frame and you want to change the row names, Pierre's answer is your solution.
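To illustrate the read.table route for the two-column layout in the question (geo, skada), here is a sketch that reads from an inline string instead of a file. The location names and values are made up; with a real file, the file path would replace the text argument.

```r
# Hypothetical contents of the .txt file, inlined for the example.
raw <- "geo skada
Lund 1.2
Umea 3.4
Visby 0.7"

# row.names = "geo" uses the geo column as row names while reading.
df <- read.table(text = raw, header = TRUE, row.names = "geo")

# Rows can now be addressed by location name.
df["Umea", "skada"]
```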

How to use the names in a list to access a dataframe column

I have two lists and a dataframe. The columns in the dataframe have the same names as the entries in the list. The dataframe has other columns as well, other than the ones specified in the lists
category.list <- c('Reserve_Book','choicepriv_and_points','Latency_freeze_load','signin','gift_card','mystery_gift','credit_card','call_support','account')
crosstab.list <- c('browser','OS','Device','comment_cat','comment_focus','recommend')
Now, how do I iterate through the elements in the list and use them to access the dataframe columns?
Below is the code, I am trying but I am getting errors while trying to access the dataframe column via the iterator variable.
for (i in category.list){
for (j in crosstab.list){
ftable(dataframe[j]~dataframe[i])
}
}
Specifically to your question: dataframe[j] returns a one-column data frame, whereas the formula needs the column itself. Indexing with both a row and a column position returns the column as a vector, so
ftable(dataframe[j]~dataframe[i])
needs to be
ftable(dataframe[,j]~dataframe[,i])
Note the addition of commas.
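One further point when running this in a loop: results are not auto-printed inside for, so each table has to be printed or stored explicitly. The sketch below is an alternative formulation, not the answer's exact code; it uses [[ ]] to pull each column by name and builds the table with table() before passing it to ftable(). The data and the shortened name lists are hypothetical.

```r
# Hypothetical data using a few of the column names from the question.
dataframe <- data.frame(signin  = c("yes", "no", "yes", "no"),
                        account = c("no", "no", "yes", "yes"),
                        browser = c("Chrome", "Firefox", "Chrome", "Chrome"),
                        OS      = c("Win", "Mac", "Mac", "Win"))
category.list <- c("signin", "account")
crosstab.list <- c("browser", "OS")

results <- list()
for (i in category.list) {
  for (j in crosstab.list) {
    # dataframe[[i]] extracts the column named by the loop variable.
    tab <- ftable(table(dataframe[[i]], dataframe[[j]], dnn = c(i, j)))
    print(tab)                                   # explicit print inside the loop
    results[[paste(i, j, sep = "_by_")]] <- tab  # keep the table for later use
  }
}
names(results)
```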

Extract columns with same names from multiple data frames [R]

I am dealing with about 10 data frames that have the same column names, but different number of rows. I would like to create a list of all columns with the same names.
So, say I have 2 data frames with the same column names.
a<-seq(0,20,1)
b<-seq(20,40,1)
c<-seq(10,30,1)
df.abc.1<-data.frame(a,b,c)
a<-seq(20,50,1)
b<-seq(10,40,1)
c<-seq(30,60,1)
df.abc.2<-data.frame(a,b,c)
I know I can create a list from this data such as,
list(df.abc.1$a, df.abc.2$a)
but I don't want to type out my long data frame names and column names.
I was hoping to do something like this,
list(c(df.abc.1, df.abc.2)$a)
But, it returns a list of df.abc.1$a
Perhaps there could be a way to use the grep function across multiple data.frames?
Perhaps a loop could accomplish this task?
Not sure if it's any better, but maybe
lapply(list(df.abc.1, df.abc.2), function(x) x$a)
For more than one column
lapply(list(df.abc.1, df.abc.2), function(x) x[, c("a","b")])
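To avoid typing each data frame name, one possible variation is to collect the data frames by a name pattern with ls() and mget() and then apply the same lapply() call. This is a sketch that assumes all the data frames share the "df.abc." prefix and live in the current environment; the names frames and a_columns are introduced here for illustration.

```r
# The two example data frames from the question.
df.abc.1 <- data.frame(a = seq(0, 20, 1), b = seq(20, 40, 1), c = seq(10, 30, 1))
df.abc.2 <- data.frame(a = seq(20, 50, 1), b = seq(10, 40, 1), c = seq(30, 60, 1))

# Collect every object whose name starts with "df.abc." into a named list.
frames <- mget(ls(pattern = "^df\\.abc\\."))

# Extract column a from each data frame; the list keeps the data frame names.
a_columns <- lapply(frames, function(x) x$a)
lengths(a_columns)
```

Because the list is named, a_columns[["df.abc.2"]] retrieves the column from a specific data frame.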
