How can I reshape this data.frame so that there is one column with the cell lines and one column for each gene, without changing the rest?
Assuming that df is the name of your data frame and that Geneid is its first column:
df2 <- as.data.frame(t(df))[-1,]
colnames(df2) <- df$Geneid
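A minimal sketch of the same transpose on a made-up table, assuming Geneid is the first column and every other column is a cell line. The sample values and the names line1, line2 and cell_line are hypothetical. Transposing only the numeric columns keeps the values numeric (t() on a data frame that still contains the character Geneid column returns a character matrix), and the cell lines are copied from the row names into a column of their own.

```r
# Hypothetical example data: one row per gene, one column per cell line
df <- data.frame(Geneid = c("geneA", "geneB"),
                 line1 = c(1.5, 2.5),
                 line2 = c(3.0, 4.0))

# Transpose only the numeric part so the values stay numeric
df2 <- as.data.frame(t(df[-1]))
colnames(df2) <- df$Geneid

# Move the cell lines from the row names into a regular column
df2 <- cbind(cell_line = rownames(df2), df2)
rownames(df2) <- NULL

df2
```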
I want to loop through the rows of an R data frame (df1) and create columns based on the value of the variable v1 in each row.
v1 is a column of df1. For each of its values, I want to add a column with that value as its name to df2. v1 is of type <date>, so all the values are dates.
This is what I tried
for(row in 1:nrow(df1)){
df2 %>%
mutate(row$v1 == "value")
}
Here's my answer
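for (row in seq_len(nrow(df1))) {
  # Convert the date to character so it is used as a column name, not as a position
  colname <- as.character(df1$v1[row])
  df2[, colname] <- "value"
}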
You could do this directly without a loop :
df2[as.character(df1$v1)] <- 'value'
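As an illustration of the one-liner, here is a small sketch with made-up inputs; the contents of df1 and df2 are hypothetical. The as.character() call matters because a Date used directly as an index would be treated as a number rather than as a column name.

```r
# Hypothetical inputs: df1 holds the dates, df2 receives the new columns
df1 <- data.frame(v1 = as.Date(c("2020-01-01", "2020-01-02")))
df2 <- data.frame(id = 1:3)

# as.character() turns each Date into a name such as "2020-01-01"
new_cols <- as.character(df1$v1)
df2[new_cols] <- "value"

names(df2)
```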
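With dplyr, the new columns can be spliced into mutate() as a named list (mutate_at() only modifies columns that already exist, so it cannot create them):
library(dplyr)
new_cols <- setNames(rep(list("value"), nrow(df1)), as.character(df1$v1))
df2 %>%
  mutate(!!!new_cols)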
I am currently trying to rename a single column in multiple dataframes to match the dataframe name in R.
I have seen some questions and solutions on the site that are similar to what I am attempting, but none appear to do this dynamically. I have over 45 dataframes in which I need to rename a column, so typing in each individual name by hand is doable but time-consuming.
Dataframe1 <- column
Dataframe2 <- column
Dataframe3 <- column
I want it to look like this:
Dataframe1 <- Dataframe1
Dataframe2 <- Dataframe2
Dataframe3 <- Dataframe3
The ultimate goal is to have a master dataframe with columns Dataframe1, Dataframe2, and Dataframe3
We can get all the datasets into a list and rename at once in the list
lst1 <- lapply(mget(ls(pattern = "Dataframe\\d+")), function(x) {
names(x)[5] <- "newcol"
x})
Update
If the column gets a different name in each dataset, create a vector of column names with one entry per 'Dataframe', in the same order
nm1 <- c("col5A", "col5B", "col5C", ..., "col5Z")
lst2 <- Map(function(x, y) {names(x)[5] <- y; x},
mget(ls(pattern = "Dataframe\\d+")),
nm1)
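In the first snippet (lst1), the 5th column of every dataset is renamed to 'newcol'; in the update (lst2), the 5th column of each dataset gets the corresponding name from nm1.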
It can also be done using tidyverse
library(dplyr)
library(purrr)
map(mget(ls(pattern = "Dataframe\\d+")), ~ .x %>%
rename_at(5, ~ "newcol"))
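Since the question asks for each column to carry the name of its own data frame, the names of the list returned by mget() can serve as the new column names. A minimal base R sketch, assuming the column to rename is literally called column (as in the question) and using three made-up data frames; the shared id column used for merging is also a hypothetical addition.

```r
# Hypothetical data frames, each with a column named "column"
Dataframe1 <- data.frame(id = 1:2, column = c(10, 20))
Dataframe2 <- data.frame(id = 1:2, column = c(30, 40))
Dataframe3 <- data.frame(id = 1:2, column = c(50, 60))

# Collect the data frames in a list named after the objects
lst <- mget(ls(pattern = "^Dataframe\\d+$"))

# Rename "column" in each data frame to that data frame's name
lst <- Map(function(x, nm) {
  names(x)[names(x) == "column"] <- nm
  x
}, lst, names(lst))

# Merge on the shared id column to build the master data frame
master <- Reduce(function(a, b) merge(a, b, by = "id"), lst)
names(master)
```

Matching by column name rather than by position avoids renaming the wrong column if the data frames do not all share the same layout.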
I have a dataset, where I defined the factors for rows gene.fac and columns cell.fac.
load('Analysis.RData')
top200_groups <- data.frame (cluster = cell.fac, t(top200))
melted <- melt(top200_groups, id.vars=c("cluster"))
After applying the melt function, I get a data frame with the columns cluster, variable, and value.
Then I want to replace the gene names in melted$variable with the factors defined in gene.fac.
Is there an easy way to do this? Thanks.
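Here is a solution with dplyr:
library(reshape2)  # assumed source of melt(); the question does not say which package is used
library(dplyr)     # for full_join()
load('Analysis.RData')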
top200_groups <- data.frame (cluster = cell.fac, t(top200))
melted <- melt(top200_groups, id.vars=c("cluster"))
df2 <- as.data.frame(gene.fac)
df2$variable <- factor(rownames(df2))
df_new <- full_join(melted, df2, by = "variable")
The new data frame df_new contains both the old column (variable) and the new one (gene.fac); you can drop the old one if you no longer need it.
Running the code may produce a warning, but not an error: dplyr reports that it coerced the join column from factor to character.
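If gene.fac is a named factor whose names are the gene names (an assumption; the question does not show its structure), the same mapping can be sketched in base R as a lookup by name, without a join. The objects below are made-up stand-ins for the ones in the question, and gene_group is a hypothetical column name.

```r
# Hypothetical stand-ins for the objects in the question
gene.fac <- factor(c(geneA = "group1", geneB = "group2", geneC = "group1"))
melted <- data.frame(cluster = c("c1", "c1", "c2"),
                     variable = factor(c("geneA", "geneB", "geneA")),
                     value = c(0.1, 0.2, 0.3))

# Look up each gene's group by name; as.character() makes the lookup go
# by gene name instead of by the factor's integer codes
melted$gene_group <- unname(gene.fac[as.character(melted$variable)])

melted
```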
I would like to convert a data.frame into a list of data.frames by column, using base R functions and holding the first column constant. For example, I would like to split DF into a list of three data.frames, each of which includes the first column. That is, I would like to end up with the list named LONG without having to type out each list element separately. Thank you.
DF <- data.frame(OBS=1:10,HEIGHT=rnorm(10),WEIGHT=rnorm(10),TEMP=rnorm(10))
DF
LONG <- list(HEIGHT = DF[c("OBS", "HEIGHT")],
WEIGHT = DF[c("OBS", "WEIGHT")],
TEMP = DF[c("OBS", "TEMP" )])
LONG
SHORT <- as.list(DF)
SHORT
SPLIT <- split(DF, col(DF))
We can loop through the names of 'DF' except the first one and cbind the first column with each of the remaining columns, then name the list elements after those columns.
setNames(lapply(names(DF)[-1], function(x) cbind(DF[1], DF[x])), names(DF)[-1])
Or another option would be
Map(cbind, split.default(DF[-1], names(DF)[-1]), OBS=DF[1])
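To check that a programmatic version reproduces the hand-built list from the question, the two can be compared directly. A small sketch with fixed made-up numbers instead of rnorm() so that the result is reproducible; the names vars and BUILT are hypothetical.

```r
DF <- data.frame(OBS = 1:3,
                 HEIGHT = c(1.1, 1.2, 1.3),
                 WEIGHT = c(2.1, 2.2, 2.3),
                 TEMP = c(3.1, 3.2, 3.3))

# List typed out by hand, as in the question
LONG <- list(HEIGHT = DF[c("OBS", "HEIGHT")],
             WEIGHT = DF[c("OBS", "WEIGHT")],
             TEMP   = DF[c("OBS", "TEMP")])

# Same list built programmatically: pair OBS with each remaining column
vars <- names(DF)[-1]
BUILT <- setNames(lapply(vars, function(x) DF[c("OBS", x)]), vars)

identical(LONG, BUILT)
```

The Map variant should give the same data, but split.default() orders the list elements alphabetically by name and cbind() places OBS after the value column, so its result is not identical to LONG in layout.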
Using R, how do I make a column of a dataframe the dataframe's index? Let's assume I read in my data from a .csv file. One of the columns is called 'Date' and I want to make that column the index of my dataframe.
For example in Python, NumPy, Pandas; I would do the following:
df = pd.read_csv('/mydata.csv')
d = df.set_index('Date')
Now how do I do that in R?
I tried in R:
df <- read.csv("/mydata.csv")
d <- data.frame(V1=df['Date'])
# or
d <- data.frame(Index=df['Date'])
# but these just make a new dataframe with one 'Date' column.
#The Index is still 0,1,2,3... and not my Dates.
I assume that by "Index" you mean row names. You can assign to the row names vector:
rownames(df) <- df$Date
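A short sketch of what this looks like in practice, using a made-up data frame in place of the contents of mydata.csv (the price column is hypothetical). Note that row names in R must be unique, so this only works when no date is repeated.

```r
# Hypothetical data standing in for the contents of mydata.csv
df <- data.frame(Date = c("2021-01-01", "2021-01-02"),
                 price = c(10.5, 11.0))

# Use the Date column as row names, then drop the now-redundant column
rownames(df) <- df$Date
df$Date <- NULL

# Rows can now be looked up by date
df["2021-01-02", "price"]
```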
The index can be set while reading the data, in both pandas and R.
In pandas:
import pandas as pd
df = pd.read_csv('/mydata.csv', index_col="Date")
In R:
df <- read.csv("/mydata.csv", header=TRUE, row.names="Date")
The tidyverse solution:
library(tidyverse)
df %>% column_to_rownames(var = "Date")
When saving the data frame, use row.names = FALSE so that the row names are not written to the file as an extra column,
e.g. write.csv(prediction.df, "my_file.csv", row.names = FALSE)
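A minimal round-trip sketch that combines both steps, assuming a made-up data frame and a temporary file in place of a real path: the file is written without row names and then read back with the Date column as the index.

```r
# Hypothetical data; tempfile() stands in for a real file path
df <- data.frame(Date = c("2021-01-01", "2021-01-02"),
                 price = c(10.5, 11.0))
path <- tempfile(fileext = ".csv")

# Write without row names so the file has no unnamed first column
write.csv(df, path, row.names = FALSE)

# Read back, using the Date column as row names
back <- read.csv(path, row.names = "Date")
rownames(back)
```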