Renaming a subset of columns in R with paste0

I have a data frame (my_df) with columns named after individual county numbers. I melted/cast the data from a much larger set to get to this point. The first column is named year and holds the years 1970-2011. The next 3010 columns are counties. I'd like to rename the county columns to "county_" followed by the county number.
This code runs in R without error but doesn't update the column names; they remain just the numbers. Any help?
new_col_names = paste0("county_",colnames(my_df[,2:ncol(my_df)]))
colnames(my_df[,2:ncol(my_df)]) = new_col_names

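The problem is the subsetting within the colnames call: the new names are applied to a temporary copy of the selected columns, and when that copy is written back into my_df the existing column names are kept.
Try names(my_df) <- c(names(my_df)[1], new_col_names) instead.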
Note: names and colnames are interchangeable for data.frame objects.
EDIT: alternate approach suggested by flodel, subsetting outside the function call:
names(my_df)[-1] <- new_col_names
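To see the difference on a small case, here is a minimal sketch with a made-up three-column data frame. The county numbers 1001 and 1003 and the name broken are placeholders for illustration, not taken from the question.

```r
# Toy data frame; check.names = FALSE keeps the purely numeric column names
my_df <- data.frame(year = 1970:1972,
                    `1001` = c(1, 2, 3),
                    `1003` = c(4, 5, 6),
                    check.names = FALSE)

# Subsetting inside colnames(): runs without error, but the names stay the same
broken <- my_df
colnames(broken[, 2:ncol(broken)]) <- paste0("county_", colnames(broken)[-1])
unchanged_names <- names(broken)

# Subsetting the names vector itself: only the selected names are replaced
names(my_df)[-1] <- paste0("county_", names(my_df)[-1])
new_names <- names(my_df)
```

Here unchanged_names should still hold the bare numbers, while new_names should carry the county_ prefix on every column except year.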

colnames() is meant for a matrix (or matrix-like object); for a data.frame, simply use names().
Example:
my_df <- data.frame(a=c(1,2,3,4,5), b=rnorm(5), c=rnorm(5), d=rnorm(5))
new_col_names <- paste0("county_", colnames(my_df[,2:ncol(my_df)]))
names(my_df) <- c(names(my_df)[1], new_col_names)

Related

What is happening during assignment to a dataframe by lapply

Given a dataframe df and a function f which is applied to df:
df[] <- lapply(df, f)
What is the magic R is performing to replace columns in df with collection of vectors in the list from lapply? I see that the result from lapply is a list of vectors having the same names as the dataframe df. I assume some magic mapping is being done to map the vectors to df[], which is the collection of columns in df (methinks). Just works? Trying to better understand so that I remember what to use the next time.
A data.frame is simply a list of vectors that all have the same length. You can check this with is.list(a_data_frame), which returns TRUE.
[] can have a different meaning or action depending on the object it is applied to. It can even be redefined, since it is in fact a function.
[] lets you subset or insert columns of a data.frame:
df[1] gets the first column (as a one-column data.frame)
df[1] <- 2 replaces the first column with 2 (recycled to the same length as the other columns)
df[] returns the whole data.frame
df[] <- list(c1,c2,c3) replaces the current content of the data.frame with the given vectors, while df itself stays a data.frame
There are also many other ways to access or set data in a data.frame (by column name, by subset of rows, of columns, ...)
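A small sketch of the point above, assuming a toy data frame and a hypothetical function f that doubles each column. It contrasts assigning the lapply result to df (which yields a plain list) with assigning it to df[] (which fills the existing columns).

```r
df <- data.frame(a = 1:3, b = c(10, 20, 30))
f <- function(x) x * 2    # hypothetical column-wise function

as_list <- lapply(df, f)  # plain named list: the data.frame class is lost
df[] <- lapply(df, f)     # same list, written back into df's columns

is.data.frame(as_list)    # FALSE
is.data.frame(df)         # TRUE
```

Because df[] selects every column, the list elements are matched to the columns by position, and the attributes of df (class, names, row names) are left in place.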

How to create a matrix/data frame from a high number of single objects by using a loop?

I have a high number of single objects each one containing a mean value for a year. They are called cddmean1950, cddmean1951, ... ,cddmean2019.
Now I would like to put them together into a matrix or data frame with the first column being the year (1950 - 2019) and the second column being the single mean values.
This is a very long way to do it without looping:
matrix <- rbind(cddmean1950,cddmean1951,cddmean1952,...,cddmean2019)
Afterwards you transform the matrix to a data frame, create a vector with the years and add it to the data frame.
I am sure there must be a smarter and faster way to do this by using a loop or anything else?
I think this is an easy way to do it, provided all those single objects are in your current environment.
First, create a character vector of the variable names using the paste0 function:
YearRange <- 1950:2019
ObjectName <- paste0("cddmean", YearRange)
Then we can use lapply and get to retrieve the values of all these variables as a list.
Then, using do.call and rbind, we bind all these values into a single-column matrix and finally create the data frame you asked for:
ListofSingleObjects <- lapply(ObjectName, get)
MeanValues <- do.call(rbind,ListofSingleObjects)
df <- data.frame( year = YearRange , Mean = MeanValues )
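As a variation on the same idea, base R's mget() fetches several objects by name in one call. The sketch below assumes three stand-in objects with invented values; cdd_df and object_names are illustrative names, not from the question.

```r
# Stand-in objects; in the question these already exist for 1950 to 2019
cddmean1950 <- 10.5
cddmean1951 <- 11.2
cddmean1952 <- 9.8

years <- 1950:1952
object_names <- paste0("cddmean", years)

# mget() looks up several objects by name and returns them as a named list;
# unlist() flattens that list into a plain numeric vector
values <- unlist(mget(object_names, envir = environment()), use.names = FALSE)

cdd_df <- data.frame(year = years, Mean = values)
```

This only works as intended if each object holds a single value. If the values are produced in a loop in the first place, storing them directly in one vector avoids the lookup by name altogether.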

How to subset a single column dataframe and get a dataframe

I have a DataFrame with only one column and rownames
> head(UMIpCells_df, n=10)
UMIs
MB04_GATAACTGGCCT 4571.266
MB04_ACCCTGTCATTT 4534.992
MB04_GTAAGACGAATG 4793.417
MB04_AGGCTATTCCAA 4786.393
MB04_ATTATCTGATTT 4478.233
MB04_CCCGGGTCTGCC 4765.347
MB04_AAACGAGCTGAC 4571.253
MB04_TGTTGCTTTTCG 4167.119
MB04_ACGTCCCCCAAA 4778.961
MB04_GTCGCGCAGTTC 4664.638
I want to subset the first 5 rows, but I get a numeric vector:
> UMIpCells_df[1:5,]
[1] 4571.266 4534.992 4793.417 4786.393 4478.233
However, if I add an extra column to UMIpCells_df, the subset returns a df.
I found out that to return a df from a single-column dataframe I have to add:
drop = FALSE
> UMIpCells_df[(1:5), ,drop=FALSE]
UMIs
MB04_GATAACTGGCCT 4571.266
MB04_ACCCTGTCATTT 4534.992
MB04_GTAAGACGAATG 4793.417
MB04_AGGCTATTCCAA 4786.393
MB04_ATTATCTGATTT 4478.233
However, I found this odd, and as basic as it is, I would like to learn why subsetting the simplest df (only 1 column) has to be different from subsetting any other DataFrame (>1 column). I hope you are not offended by how elementary this question is.
Consider using tibbles (created with tibble(); the older data_frame() is deprecated) instead of the standard data.frame. While not base R, packages such as tibble and dplyr help to "correct" some of the behaviors you noticed that may no longer be beneficial.
Check out the vignette on tibbles here:
https://cran.r-project.org/web/packages/tibble/vignettes/tibble.html
And here is a brief comparison of tibbles to data frames as well as some comparisons when subsetting:
http://r4ds.had.co.nz/introduction-2.html#tibbles
head(UMIpCells_df, n=5) is also a data frame, so you can just do:
new.df <- head(UMIpCells_df, n=5)
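For the base R behavior itself: when row indices are given and the result has only one column, `[` on a data.frame simplifies it to a vector unless drop = FALSE is passed. A minimal sketch, assuming a hypothetical one-column data frame with invented row names:

```r
# Hypothetical one-column data frame with row names, mimicking UMIpCells_df
one_col <- data.frame(UMIs = c(4571.266, 4534.992, 4793.417),
                      row.names = c("cell_a", "cell_b", "cell_c"))

as_vector <- one_col[1:2, ]                # one column left, so it is dropped to a vector
as_df     <- one_col[1:2, , drop = FALSE]  # stays a one-column data frame

is.data.frame(as_vector)  # FALSE
is.data.frame(as_df)      # TRUE
```

With two or more columns the result cannot be simplified to a single vector, which is why the same call returns a data frame there.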

Find data by multiple variable names in R

I have a question regarding variables names in R.
In my dataset I have a list of 70 variable names as characters and I want to find the corresponding data (including the header) in the data.
As an example, I used the dataset iris. I don't want to select each variable individually with iris$Sepal.Length, since I have 70 variables in the dataset that I use. In my code I can print the data, but I am struggling with saving the data as a dataframe with the corresponding header names. Does anybody have any thoughts?
iris
head(iris)
colnames(iris)
b <- list("Sepal.Length","Petal.Length")
i=1
for (i in 1:length(b)){
#print(b[[i]])
print(iris[,c(b[[i]])])
c[,i]<-(iris[,c(b[[i]])])
}
It sounds like you're trying to get a subset of 70 columns from a data.frame or matrix. The 70 column names you have are stored in a list. R will let you select columns by a character vector, but not by a list, so you can just use unlist.
b <- list("Sepal.Length","Petal.Length")
newTable <- iris[,unlist(b)]
I find dplyr the best for this. If you turn iris into a tibble
iris <- as_tibble(iris)
You can then use the dplyr::select function either selecting by name (no quotes) or by position. You can even use the 1:5 notation selecting columns 1 to 5. A great place to start is: http://r4ds.had.co.nz
Are you looking for this?
b <- c("Sepal.Length","Petal.Length")
New_iris=iris[,b]
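With 70 names it can help to check first that every name really exists, since indexing with an unknown column name stops with an error. A minimal sketch using iris, where the third name is deliberately bogus and the variable names are illustrative only:

```r
wanted <- c("Sepal.Length", "Petal.Length", "Not.A.Column")  # last one is bogus on purpose

present       <- intersect(wanted, names(iris))  # names that actually exist
missing_names <- setdiff(wanted, names(iris))    # names that would cause an error

# drop = FALSE keeps a data frame (with headers) even if only one name matches
subset_df <- iris[, present, drop = FALSE]
```

Inspecting missing_names before subsetting makes typos in the name list easy to spot.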

R - creating dataframe from colMeans function

I've been trying to create a dataframe from my original dataframe, where each row in the new dataframe would represent the mean of every 20 rows of the old dataframe. I discovered a function called colMeans, which does the job pretty well. The only remaining problem is how to change that vector of results back into a dataframe that can be analysed further.
My code for colMeans (matrix1 is my original dataframe converted to a matrix; this was the only way I managed to get it to work):
a<-colMeans(matrix(matrix1, nrow=20));
But here I get a numeric sequence that has all the results concatenated in one single column (if I try, for example, as.data.frame(a)). How am I supposed to get this result back into a dataframe where each column contains only the results for its own column name, and not all the averages?
I hope my question is clear. Thanks for the help.
Based on the methods('as.data.frame'), as.data.frame.list is an option to convert each element of a vector to columns of a data.frame
as.data.frame.list(a)
data
m1 <- matrix(1:20, ncol=4, dimnames=list(NULL, paste0('V', 1:4)))
a <- colMeans(m1)
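For the original goal of one row of means per block of 20 rows, a possible sketch is to build a block index and let aggregate() compute the per-column means. The toy data frame and the names block, block_means and one_row are assumptions for illustration, not the asker's data.

```r
# Toy data: 40 rows, so two blocks of 20 rows each
df <- data.frame(x = 1:40, y = seq(2, 80, by = 2))

# Block index: 0 for rows 1-20, 1 for rows 21-40
block <- (seq_len(nrow(df)) - 1) %/% 20

# One row of column means per block, with the original column names kept
block_means <- aggregate(df, by = list(block = block), FUN = mean)

# A single colMeans() result can also be turned into a one-row data frame
one_row <- as.data.frame(t(colMeans(df)))
```

aggregate() returns a data frame directly, so no conversion from a flat vector is needed afterwards.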
