I have looked at similar answers but none of them quite answer my task.
I have found a very messy answer to my question but would like advice as to whether there is a simpler way.
I have a list of files, each holding a table, that I want to import into R, appending their columns to an empty data frame.
The row names (column 1) will be the same for each imported table, but the number of columns (sample IDs) will change.
At the moment I create a vector of the row names that I know won't change outside the loop and name it Sample_ID. Then I loop through the data frames and do a left_join on that shared column name.
Something like this:
library(readr)
library(dplyr)

# keep_ids: the row names that I want to extract (placeholder)
final_df <- data.frame(Sample_ID = keep_ids)
for (i in seq_along(files)) {
  my_df <- read_tsv(files[i])
  # get the table-specific sample names
  my_sn <- unlist(my_df[15, -c(1:3)])
  # keep the wanted rows, plus column 1 and the sample columns
  my_df <- my_df[my_df[[1]] %in% keep_ids, -c(2:3)]
  names(my_df) <- c("Sample_ID", my_sn)
  final_df <- left_join(final_df, my_df, by = "Sample_ID")
}
I'm thinking there must be a more elegant way.
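One possible tidier pattern, sketched in base R: read every file into a list with lapply(), keep only the wanted rows, then fold the list into one wide data frame with Reduce() and merge(all.x = TRUE), which behaves like a chain of left joins. It assumes each file already has a Sample_ID column first and one column per sample; the file contents and the keep_ids values below are hypothetical, and if the sample names really sit on row 15 you would rename the columns inside the lapply() function.

```r
# Hypothetical input: write two small TSV files so the example runs on its own.
# In practice `files` would already hold your real paths.
tmp <- tempdir()
files <- file.path(tmp, c("t1.tsv", "t2.tsv"))
write.table(data.frame(Sample_ID = c("g1", "g2", "g3"), s1 = 1:3, s2 = 4:6),
            files[1], sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(Sample_ID = c("g1", "g2", "g3"), s3 = 7:9),
            files[2], sep = "\t", row.names = FALSE, quote = FALSE)

# The row names to keep (hypothetical values)
keep_ids <- c("g1", "g3")

# Read every file into a list, keeping only the wanted rows
tables <- lapply(files, function(f) {
  d <- read.delim(f, stringsAsFactors = FALSE)
  d[d$Sample_ID %in% keep_ids, , drop = FALSE]
})

# Fold the list into one data frame: merge(all.x = TRUE) is a left join,
# and init supplies the starting column of IDs
final_df <- Reduce(function(x, y) merge(x, y, by = "Sample_ID", all.x = TRUE),
                   tables,
                   init = data.frame(Sample_ID = keep_ids, stringsAsFactors = FALSE))
final_df
```

With dplyr loaded, swapping merge() for left_join(x, y, by = "Sample_ID") inside the same Reduce() call follows the same idea.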
Related
I have a data frame in R containing over 29,000 rows. I need to remove multiple rows using only a list of names (187 names).
My dataset is about airlines and contains over 200 of them; I need to remove specific ones. The first column holds all the airline names, and I need to remove the entire row for each of those airlines.
I singled out all the airline names I want removed with this code: transmute(a_name_remove, airline_name). This gave me a table of every airline to remove; now I need to remove those names from my original dataset, named airlines.
I know one way to do this by hand, for example mydata[-c("a", "b"), ], but writing out each name would be tedious.
Could you show me how to use the list I already have to remove those rows from my dataset?
Writing out each name individually is not an option.
I also tried airlines[!(row.names(airlines) %in% c(remove)), ], after turning my list remove into both a data frame and a vector, but it still did not work.
Thank you!
You can create a function that negates %in%, e.g.
'%not_in%' <- Negate('%in%')
So, per your code, it should look like this:
airlines[row.names(airlines) %not_in% remove, ]
Additionally, I do not recommend using remove as a variable name, since remove() is a base R function. If possible, rename the variable, e.g. to discard_airlines:
airlines[row.names(airlines) %not_in% discard_airlines, ]
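If the airline names live in the first column rather than in the row names (as the question describes), row.names() will not find them. A minimal base R sketch under that assumption, with hypothetical data: pull the removal list out of the transmute() result as a plain character vector, then index on the column itself.

```r
# Hypothetical data: airline names are in the first column, not the row names
airlines <- data.frame(
  airline_name = c("Alpha Air", "Beta Jet", "Gamma Wings", "Omega Fly"),
  flights = c(120, 45, 300, 80),
  stringsAsFactors = FALSE
)

# Hypothetical removal table, shaped like a one-column transmute() result
a_name_remove <- data.frame(airline_name = c("Beta Jet", "Omega Fly"),
                            stringsAsFactors = FALSE)

# Extract the names as a character vector so %in% compares names to names
discard_airlines <- a_name_remove$airline_name

`%not_in%` <- Negate(`%in%`)

# Keep only the rows whose first-column value is not in the discard list
kept <- airlines[airlines[[1]] %not_in% discard_airlines, ]
kept
```

If both tables share an airline_name column, dplyr's anti_join(airlines, a_name_remove, by = "airline_name") should return the same rows.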
I came across a problem in a DataCamp exercise that basically asked: "Remove the column names in this vector that are not factors." I know what they wanted me to do: run glimpse(df) and manually delete the non-factor elements from the vector of column names. That wasn't satisfying, and I figured there must be a simple way to store the names of a data frame's factor columns in a vector. I tried two things that both worked, but I worry they might be inefficient.
Example data frame:
factorVar <- as.factor(LETTERS[1:10])
df1 <- data.frame(x = 1, y = 1:10, factorVar = sample(factorVar, 10))
My first solution was this:
library(dplyr)
vector1 <- names(select_if(df1, is.factor))
This worked, but select_if() builds an entire filtered data frame just so I can read off its column names. Surely there's an easier way...
Next, I tried this:
vector2 <- colnames(df1)[sapply(df1,is.factor)]
This also worked, but I wanted to know if there's a quicker, more efficient way of filtering column names based on their type and then storing the results as a vector.
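Two other base R options, sketched on the same df1: vapply() with a declared logical(1) result, which guarantees the index is a plain logical vector (sapply() does not enforce this), and Filter(), which keeps the factor columns directly. No timing comparison is implied; the names vector3 and vector4 are illustrative.

```r
factorVar <- as.factor(LETTERS[1:10])
df1 <- data.frame(x = 1, y = 1:10, factorVar = sample(factorVar, 10))

# vapply() requires each column to yield exactly one logical value
is_fac <- vapply(df1, is.factor, logical(1))
vector3 <- names(df1)[is_fac]

# Filter() keeps the columns for which is.factor() is TRUE; names() reads them off
vector4 <- names(Filter(is.factor, df1))

vector3
vector4
```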
I have two lists (character vectors of column names) and a data frame. The columns in the data frame have the same names as the entries in the lists, and the data frame also has other columns besides the ones listed.
category.list <- c('Reserve_Book','choicepriv_and_points','Latency_freeze_load','signin','gift_card','mystery_gift','credit_card','call_support','account')
crosstab.list <- c('browser','OS','Device','comment_cat','comment_focus','recommend')
Now, how do I iterate through the elements in the list and use them to access the dataframe columns?
Below is the code I am trying, but I get errors when I try to access the data frame columns through the loop variables.
for (i in category.list){
for (j in crosstab.list){
ftable(dataframe[j]~dataframe[i])
}
}
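Specifically to your question: dataframe[j] returns a one-column data frame, but the formula needs the column itself as a vector. To get that, index both dimensions, leaving the row position empty so that all rows are kept.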
ftable(dataframe[j]~dataframe[i])
needs to be
ftable(dataframe[,j]~dataframe[,i])
Note the added commas. Also, a for loop does not print results automatically, so wrap the call in print() to see each table.
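A runnable sketch of the corrected loop, using a small hypothetical data frame that has a few of the question's column names. Instead of the formula interface it uses table() on the two selected columns (a swapped-in but equivalent cross-tab), prints each result explicitly, and stores it in a named list; the results list and its naming scheme are illustrative.

```r
# Hypothetical survey data with a few of the question's column names
set.seed(1)
dataframe <- data.frame(
  signin  = sample(c("yes", "no"), 20, replace = TRUE),
  account = sample(c("yes", "no"), 20, replace = TRUE),
  browser = sample(c("Chrome", "Firefox"), 20, replace = TRUE),
  OS      = sample(c("Windows", "macOS"), 20, replace = TRUE),
  stringsAsFactors = FALSE
)
category.list <- c("signin", "account")
crosstab.list <- c("browser", "OS")

results <- list()
for (i in category.list) {
  for (j in crosstab.list) {
    # dataframe[c(i, j)] selects both columns by the names held in i and j;
    # table() cross-tabulates them and ftable() flattens the result
    tab <- ftable(table(dataframe[c(i, j)]))
    print(tab)  # loops do not auto-print, so print explicitly
    results[[paste(i, j, sep = " x ")]] <- tab
  }
}
```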
I looked at this solution: R-friendly way to convert R data.frame column to a vector?
but each solution seems to involve manually declaring the name of the vector being created.
I have a large data frame with about 224 columns. I would like to break it up into 224 separate vectors that keep their column names as variable names, without typing them all manually. Is there a way to step through the columns and create a vector named after each one, or am I dreaming?
I think it's a bad idea, but this would work (using the mtcars data set):
list2env(mtcars, .GlobalEnv)
attach() is another dangerous command that people use to access a data frame's columns directly by name. If you don't know why it's dangerous, though, don't use it.
Here's another bad idea:
for(i in names(mtcars)) assign(i, mtcars[,i])
Just for Richard:
for (x in names(mtcars))
eval(parse(text=paste(x, '<- c(', paste(mtcars[[x]], collapse=',') ,')')))
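If the real goal is simply to refer to each column by its name, a sketch of safer alternatives: keep the columns together as a named list, put them into a separate environment rather than the global one, or evaluate expressions with with(). None of these overwrite objects already in your workspace. The names cols, col_env and avg_mpg are illustrative.

```r
# Every column becomes its own vector, kept together in one named list
cols <- as.list(mtcars)
cols$mpg        # the mpg column as a plain numeric vector
length(cols)    # one element per column

# Or place the vectors in a fresh environment instead of the global one
col_env <- list2env(as.list(mtcars))
col_env$hp

# with() evaluates an expression using the columns as if they were variables
avg_mpg <- with(mtcars, mean(mpg))
avg_mpg
```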
I am dealing with about 10 data frames that have the same column names, but different number of rows. I would like to create a list of all columns with the same names.
Say I have 2 data frames with the same column names:
a<-seq(0,20,1)
b<-seq(20,40,1)
c<-seq(10,30,1)
df.abc.1<-data.frame(a,b,c)
a<-seq(20,50,1)
b<-seq(10,40,1)
c<-seq(30,60,1)
df.abc.2<-data.frame(a,b,c)
I know I can create a list from this data, such as
list(df.abc.1$a, df.abc.2$a)
but I don't want to type out my long data frame and column names.
I was hoping to do something like this,
list(c(df.abc.1, df.abc.2)$a)
But it returns a list containing only df.abc.1$a.
Perhaps there could be a way to use the grep function across multiple data.frames?
Perhaps a loop could accomplish this task?
Not sure if it's any better, but maybe
lapply(list(df.abc.1, df.abc.2), function(x) x$a)
For more than one column
lapply(list(df.abc.1, df.abc.2), function(x) x[, c("a","b")])
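To avoid typing the data frame names at all, one sketch (assuming the data frames follow the df.abc.N naming pattern from the question) collects them with mget(ls(pattern = ...)) and then builds one list per column name. The result by_col$a holds column a from every data frame, named after its source; dfs, col_names and by_col are illustrative names.

```r
# Recreate the two example data frames from the question
df.abc.1 <- data.frame(a = seq(0, 20, 1), b = seq(20, 40, 1), c = seq(10, 30, 1))
df.abc.2 <- data.frame(a = seq(20, 50, 1), b = seq(10, 40, 1), c = seq(30, 60, 1))

# Gather every object whose name starts with "df.abc." into a named list
dfs <- mget(ls(pattern = "^df\\.abc\\."))

# For each column name, take that column from every data frame
col_names <- names(dfs[[1]])
by_col <- lapply(setNames(nm = col_names), function(col) lapply(dfs, `[[`, col))

by_col$a  # df.abc.1$a and df.abc.2$a, named by source data frame
```

Because the pattern match does the selecting, adding df.abc.3 through df.abc.10 needs no change to the code.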