CSV including column zero R - r

I have a DataFrame in R full of unique values from unique rownames that come from different files. Therefore, I have a column zero with the name of the file where the column name came from and a column 1 with unique rownames.
col0 col1
path/file1 name
path/file1 age
path/file2 color
path/file3 tree
path/file3 house
I want to export this as a CSV, but I haven't been able to get write.csv to include column 0.
I have also tried mutating column 0 into a regular column, but that doesn't work either.
Any ideas?

Related

How to extract a common column from multiple tsv files and combine them into one dataframe in R?

I want to extract a common column named "framewise_displacement" from 162 tsv files named by subject ID (e.g., sub-CC123_timeseries.tsv, sub-CC124_timeseries.tsv, etc.), which have different numbers of columns but the same number of rows, and merge them into a single dataframe.
The new dataframe should have one "framewise_displacement" column per subject file, labeled with the subject ID, and the same rows as the original files.
I tried the vroom function in R, but it failed because the files have different numbers of columns.
I also tried this code, but the output stacked all the columns into a single column.
files = fs::dir_ls(path = "Documents/subject_timeseries", glob = "*.tsv")
merged_df <- map_df(files, ~vroom(.x, col_select=c(framewise_displacement)))
What should I do to merge them into one dataframe with the desired column side by side?
Any suggestions would be appreciated.
Many thanks!!!
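A minimal base R sketch of one way to get the columns side by side. The stacking in the attempt above comes from map_df, which row-binds its results. The temporary directory and the three toy files below are hypothetical stand-ins for the real data, and the file name pattern is assumed from the examples in the question.

```r
# Hypothetical setup: write three small TSV files to a temporary directory
# so the sketch runs without the real data.
dir <- file.path(tempdir(), "subject_timeseries")
dir.create(dir, showWarnings = FALSE)
for (id in c("sub-CC123", "sub-CC124", "sub-CC125")) {
  toy <- data.frame(framewise_displacement = runif(4), other = rnorm(4))
  write.table(toy, file.path(dir, paste0(id, "_timeseries.tsv")),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

files <- list.files(dir, pattern = "\\.tsv$", full.names = TRUE)

# Read each file and keep only the column of interest as a plain vector.
fd_list <- lapply(files, function(f) read.delim(f)[["framewise_displacement"]])

# Name each vector after the subject ID taken from the file name.
names(fd_list) <- sub("_timeseries\\.tsv$", "", basename(files))

# Bind the vectors column-wise: one column per subject, rows unchanged.
merged_df <- as.data.frame(do.call(cbind, fd_list))
```

This relies on every file having the same number of rows, as stated in the question; cbind would recycle or complain otherwise.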

R function for simple lookup replacement of excel

I want to pull values from file 2 into file 1 by matching the values in the indicated columns, like a simple lookup function in Excel.
However, many of the solutions I have found rely on matching column names, which I don't want to change in my data set.
The two files share a matching column, and a column from file2 should be inserted into file1.
As your column names are different in the two data.frames you need to tell merge which columns correspond to each other:
merge(file1, unique(file2[, c("Symbol", "GeneID")]), by.x="UniprotBlastGeneSymbol", by.y="Symbol")
Your result column will be called GeneID, not Column4, of course. Gene IDs in file2 that are not found in file1 are dropped by default (all.y=FALSE); if you also want to keep file1 rows that have no match in file2, add all.x=TRUE.
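To make the lookup concrete, here is a small sketch with toy data. The column names UniprotBlastGeneSymbol, Symbol and GeneID come from the answer above; the Score column and all values are made up for illustration. It also shows match() as an alternative that behaves more like an Excel lookup, since it keeps the row order of file1.

```r
# Toy stand-ins for the two files; values are for illustration only.
file1 <- data.frame(
  UniprotBlastGeneSymbol = c("TP53", "BRCA1", "XYZ1"),
  Score = c(0.9, 0.7, 0.4)
)
file2 <- data.frame(
  Symbol = c("TP53", "BRCA1", "BRCA1", "EGFR"),
  GeneID = c(7157, 672, 672, 1956)
)

# Deduplicate the lookup table so each symbol maps to a single row.
lookup <- unique(file2[, c("Symbol", "GeneID")])

# all.x = TRUE keeps every row of file1; unmatched symbols get NA.
result <- merge(file1, lookup,
                by.x = "UniprotBlastGeneSymbol", by.y = "Symbol",
                all.x = TRUE)

# Alternative: match() returns the position of the first match in file2,
# so the new column lines up with file1's original row order.
file1$GeneID <- file2$GeneID[match(file1$UniprotBlastGeneSymbol, file2$Symbol)]
```

Note that merge sorts the result by the key column by default, while the match() version leaves file1 in place.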

Rename dataframe columns by string matching in R

I am looping through a series of ids, loading 2 csvs for each, and applying some analysis to them. I need to rename the columns of one of the 2 csvs to match the row values of the other. I need to do this inside the loop so that it applies to the csvs for every id.
I have tried renaming the columns like this:
`names(LCC_diff)[2:length(LCC_diff)] <- c("Bare.areas" = "Bare areas",
"Tree." = "Tree ", "Urban.areas" = "Urban areas",
"Water.bodies" = "Water bodies")`
where LCC_diff is a dataframe, the first value in each pair is the original column name, and the second is the name that I want to assign to that column. However, this just replaces the column names in order and does not match them.
This is a problem because not all column names need to be replaced, and the csvs for different ids have these columns in different orders.
How do I match the original column names to the strings that I want to use to replace them?
Try renaming them first; it should be much easier if they have the same names.
library(stringr)
str_replace_all(c("Tree ","Bare areas")," ",".")
[1] "Tree." "Bare.areas"
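If the columns themselves should be renamed by matching, one possible sketch is to treat the named vector from the question as a lookup table and rename only the columns that appear in it. The toy LCC_diff below, including the id and Cropland columns, is hypothetical.

```r
# Hypothetical data frame with the columns in arbitrary order.
LCC_diff <- data.frame(id = 1:2, Water.bodies = c(3, 4),
                       Tree. = c(5, 6), Cropland = c(7, 8))

# Lookup vector: names are the current column names, values the new ones.
lookup <- c("Bare.areas" = "Bare areas", "Tree." = "Tree ",
            "Urban.areas" = "Urban areas", "Water.bodies" = "Water bodies")

# Find the columns that have an entry in the lookup and rename only those,
# indexing the lookup by name so column order does not matter.
hit <- names(LCC_diff) %in% names(lookup)
names(LCC_diff)[hit] <- unname(lookup[names(LCC_diff)[hit]])
```

Columns without an entry in the lookup (here id and Cropland) keep their names, and lookup entries with no matching column are ignored, so the same lines can run inside the loop for every id.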

Assigning name to rows in R

I would like to assign names to rows in R but so far I have only found ways to assign names to columns. My data is in two columns where the first column (geo) is assigned with the name of the specific location I'm investigating and the second column (skada) is the observed value at that specific location. To clarify, I want to be able to assign names for every location instead of just having them all in one .txt file so that the data is easier to work with. Anyone with more experience than me that knows how to handle this in R?
First you need to import the data to your global environment. Try the function read.table()
To name rows, try
(assuming your data.frame is named df):
rownames(df) <- df[, "geo"]
df <- df[, -1, drop = FALSE]  # drop = FALSE keeps df a data.frame when only one column remains
Well, your question is not that clear...
I assume you are trying to create a data.frame with named rows. If you look at the data.frame help, you can see the description of the row.names parameter:
NULL or a single integer or character string specifying a column to be used as row names, or a character or integer vector giving the row names for the data frame.
which means you can either specify the row names manually when you create the data.frame, or name the column that contains them. The former can be achieved as follows
d = data.frame(x=rnorm(10), # 10 random data normally distributed
y=rnorm(10), # 10 random data normally distributed
row.names=letters[1:10] # take the first 10 letters and use them as row header
)
while the latter is
d = data.frame(x=rnorm(10), # 10 random data normally distributed
y=rnorm(10), # 10 random data normally distributed
r=letters[1:10], # take the first 10 letters
row.names=3 # the column with the row headers is the 3rd
)
If you are reading the data from a file, I will assume you are using the command read.table. Many of its parameters are the same as those of data.frame; in particular, you will find that the row.names parameter works the same way:
a vector of row names. This can be a vector giving the actual row names, or a single number giving the column of the table which contains the row names, or character string giving the name of the table column containing the row names.
Finally, if you have already read the data.frame and you want to change the row names, Pierre's answer is your solution
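As a small illustration of the read.table case, the sketch below passes hypothetical file contents inline through the text argument instead of a file path. The column names geo and skada come from the question; the locations and values are made up.

```r
# Hypothetical file contents, supplied inline rather than read from disk.
txt <- "geo skada
Stockholm 12
Uppsala 7
Lund 3"

# row.names = "geo" uses that column as the row names and removes it
# from the regular columns.
df <- read.table(text = txt, header = TRUE, row.names = "geo")

# Rows can now be addressed by location name.
df["Uppsala", "skada"]
```

With a real file, the text argument would be replaced by the file path.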

How to avoid reading data from a dataframe when the passed column name do not match exactly?

I recently discovered that R will return data for a column name that does not exist exactly as passed, as long as the dataframe has a column whose name starts with what was passed.
So if you have a dataframe X with columns named, say, fruits and vegetables, and you try to retrieve data as X$fruit, it will give you the fruits column even though the passed name (fruit) does not match the column name (fruits). It stops doing so if there is also a column like fruitss (the ambiguous lookup returns NULL), I believe because R cannot decide whether fruits or fruitss should match X$fruit.
How to avoid this?
The $ operator does partial matching, which can create confusion when column names share a prefix, so it is better to use [[ or [ to extract columns, as they match the entire string and not a partial one.
X[["fruit"]]
Or
X[, "fruit"]
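A short sketch of the difference, using a toy data frame built from the column names in the question (the values are made up). It also shows the warnPartialMatchDollar option, which makes R warn whenever $ falls back to a partial match.

```r
X <- data.frame(fruits = c("apple", "pear"),
                vegetables = c("kale", "leek"))

partial <- X$fruit        # partial match: returns the fruits column
exact   <- X[["fruit"]]   # exact match only: returns NULL

# Exact match only: [ stops with an "undefined columns selected" error,
# captured here with try() so the script keeps running.
res <- try(X[, "fruit"], silent = TRUE)

# Optionally surface partial matches by $ as warnings from now on.
options(warnPartialMatchDollar = TRUE)
```

Whether a NULL or a hard error is preferable depends on the workflow; the error from [ is harder to overlook.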
