I am looping through a series of ids, loading 2 csvs for each, and applying some analysis to them. I need rename the columns of one of the 2 csvs to match the row values of the other. I need to do this inside the loop in order to apply it to the csvs for every id.
I have tried renaming the columns like this:
`names(LCC_diff)[2:length(LCC_diff)] <- c("Bare.areas" = "Bare areas",
"Tree." = "Tree ", "Urban.areas" = "Urban areas",
"Water.bodies" = "Water bodies")`
where LCC_diff is a dataframe and the first value in each pair is the original column name and the second is the name that i want to assign to that column, but it just replaces the column names in order, and does not match them.
This is a problem because not all column names need replaced, and the csvs for different ids have these columns in different orders.
How do I match the original column names to the strings that I want to use to replace them?
Try rename them first, it should be much easier if they have the same name.
library(stringr)
str_replace_all(c("Tree ","Bare areas")," ",".")
[1] "Tree." "Bare.areas"
Related
I have a complex names of big matrix. I'm supposed to replace the name by splitting the name of each column which are separated by "_".
sample of name
d__Bacteria.p__Firmicutes.c__Clostridia.o__Lachnospirales.f__Lachnospiraceae.g__Tuzzerella.__
now my target is to extract only family name of each group ending "aceae" (number 6 name) in all columns names and replace instead of such a big complex name.
may I ask you to help me?
I made vectors of columns and rows, and used library(stringr)
strsplit(colname_matrix, "_")
I have a list of split names but I do not know how I remove the rest and just keep names ending with "aceae" and apply it for all names in columns and rows.
matrix is symmetrical
x<-"d__Bacteria.p__Firmicutes.c__Clostridia.o__Lachnospirales.f__Lachnospiraceae.g__Tuzzerella.__ "
library(stringr)
str_extract(x, "(?<=f__)[^.g]+")
Base R if u do not want "aceae"
sub(".*.f__","",sub("aceae.*", "", x))
or
y<-as.vector(str_split(x,"__"))
y[[1]][str_detect(y[[1]], "aceae")]
I have a dataframe with over 100 columns. Post implementation of certain conditions, I need a subset of the dataframe with the columns that are listed in a separate array.
The array has 50 entries with 2 columns. The first column has the selected variable names and the second column has some associated values.
I wish to build a new data frame with just the variables mentioned in the the first column of the separate array. Could you please point me as to how to proceed?
Try this:
library(dplyr)
iris <- iris %>% select(contains(dataframe_with_names$names))
In R you can use square brackets [rows, columns] to select specific rows or specific columns. (Leaving either blank selects all).
If you had a vector of column names you wanted to keep called important_columns you could select only those columns with:
myData[,important_columns]
In your case the vector of column names is actually a column in your array. So you select that column and use it as your vector:
myData[, array$names]
I have a data frame in R containing over 29,000 rows. I need to remove multiple rows using only a list of names (187 names).
My dataset is about airlines, and I need to remove specific airlines from my data set that contains over 200 types of airlines. My first column contains all airline names, and I need to remove the entire row for those specific airlines.
I singled out all airline names that I want removed by this code: transmute(a_name_remove, airline_name). This gave me a table of all names of airlines that I want removed, now I have to remove that list of names from my original dataset named airlines.
I know there is a way to do this manually, which is: mydata[-c("a", "b"), ], for example. But writing out each name would be hectic.
Can you please help me by giving me a way to use the list that I have to forwardly remove those rows from my dataset?
I cannot write out each name on its own.
I also tried this: airlines[!(row.names(airlines) %in% c(remove)), ], in which I made my list "removed" into a data frame and as a vector, then used that code to remove it from my original dataset "airlines", still did not work.
Thank you!
You can create a function that negates %in%, e.g.
'%not_in%' <- Negate('%in%')
so per your code, it should look like this
airlines[row.names(airlines) %not_in% remove, ]
additionally, I do not recommend using remove as a variable name, since it is a base function in R, if possible rename the variable, e.g. discard_airlines ,
airlines[row.names(airlines) %not_in% discard_airlines, ]
I want to extract the values form file 2 to file matching the values in indicated columns. It is a simple lookup function in Excel.
but many solutions given are based on matching column names which I don't want change in my data set.
2 files having a matching column and file2 column to be inserted in file1
As your column names are different in the two data.frames you need to tell merge which columns correspond to each other:
merge(file1, unique(file2[, c("Symbol", "GeneID"))], by.x="UniprotBlastGeneSymbol", by.y="Symbol")
Your result column will be called GeneID, not Column4, of course. If file2 contains gene Ids that are not found in file1 then you may also want all.y=FALSE.
My data frame has over 120 columns (variables) and I would like to create subsets bases on column names.
For example I would like to create a subset where the column name includes the string "mood". Is this possible?
I generally use
SubData <- myData[,grep("whatIWant", colnames(myData))]
I know very well that the "," is not necessary and
colnames
could be replaced by
names
but it would not work with matrices and I hate to change the formalism when changing objects.