I have a question regarding variable names in R.
In my dataset I have a list of 70 variable names stored as characters, and I want to pull the corresponding columns (including the headers) out of the data.
For example, I used the iris dataset. I don't want to select each variable one by one with iris$Sepal.Length, since the dataset I actually use has 70 variables. My code can print the data, but I am struggling to save it as a data frame with the corresponding header names. Does anybody have any thoughts?
iris
head(iris)
colnames(iris)
b <- list("Sepal.Length","Petal.Length")
i=1
for (i in 1:length(b)){
#print(b[[i]])
print(iris[,c(b[[i]])])
c[,i]<-(iris[,c(b[[i]])])
}
It sounds like you're trying to get a subset of 70 columns from a data.frame or matrix. The names of those 70 columns are stored in a list. R will let you select columns named by a character vector, but not by a list, so you can just use unlist.
b <- list("Sepal.Length","Petal.Length")
newTable <- iris[,unlist(b)]
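As a minimal sketch of the same idea, the snippet below spells out the intermediate step. The name `wanted` is only illustrative, and `drop = FALSE` is an addition that keeps the result a data frame even when the list holds a single name.

```r
# The wanted column names, stored in a list as in the question
b <- list("Sepal.Length", "Petal.Length")

# unlist() flattens the list into a character vector that `[` accepts
wanted <- unlist(b)

# drop = FALSE keeps a data frame even if `wanted` holds only one name
newTable <- iris[, wanted, drop = FALSE]

head(newTable)
```

The headers come along automatically, so no loop over the names is needed.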
I find dplyr the best for this. If you turn iris into a tibble
library(dplyr)
iris <- as_tibble(iris)
You can then use the dplyr::select function either selecting by name (no quotes) or by position. You can even use the 1:5 notation selecting columns 1 to 5. A great place to start is: http://r4ds.had.co.nz
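A short sketch of the three selection styles, assuming the dplyr package is installed. The names `iris_tbl` and `wanted` are illustrative. Because the question starts from names stored as strings, the last line uses `all_of()`, the tidyselect helper that dplyr re-exports for selecting by a character vector.

```r
library(dplyr)

iris_tbl <- as_tibble(iris)

# Select by bare column name (no quotes)
by_name <- select(iris_tbl, Sepal.Length, Petal.Length)

# Select by position, here columns 1 to 3
by_position <- select(iris_tbl, 1:3)

# Select by a character vector of names, as in the original question
wanted <- c("Sepal.Length", "Petal.Length")
by_strings <- select(iris_tbl, all_of(wanted))
```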
Are you looking for this?
b <- c("Sepal.Length","Petal.Length")
New_iris=iris[,b]
Hello, I have a table consisting of a single row and several columns. I tried some code to extract my KD_PL parameters, without success. Do you know a way in R to extract all the KD_PL columns and store them in a vector or data frame?
I tried this:
KDPL <- select("KD_PL.", which(substr(colnames(max_LnData), start=1, stop=6)))
This should do the trick:
library(tidyverse)
KDPL <- max_LnData %>% select(starts_with("KD_PL."))
This function selects all columns from your old dataset starting with "KD_PL." and stores them in a new dataframe KDPL.
If you only want the names of the columns to be saved, you could use the following:
KDPL_names <- colnames(KDPL)
This saves the column names in the vector KDPL_names.
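If loading the tidyverse is not an option, roughly the same selection can be sketched in base R with `startsWith()`. The toy `max_LnData` below is hypothetical (its values and its extra columns are made up for illustration), since the original table is not shown.

```r
# Hypothetical one-row table standing in for the asker's max_LnData
max_LnData <- data.frame(KD_PL.1 = 0.12, KD_PL.2 = 0.34, other = 5, KD_PLX = 9)

# Logical mask: TRUE for columns whose name starts with the literal "KD_PL."
keep <- startsWith(colnames(max_LnData), "KD_PL.")

# Keep the matching columns as a data frame
KDPL <- max_LnData[, keep, drop = FALSE]

# Or flatten the single row into a named numeric vector
KDPL_vec <- unlist(KDPL)
```

`startsWith()` compares literally, so the dot in "KD_PL." is not treated as a regular-expression wildcard.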
I have a DataFrame with only one column and rownames
> head(UMIpCells_df, n=10)
UMIs
MB04_GATAACTGGCCT 4571.266
MB04_ACCCTGTCATTT 4534.992
MB04_GTAAGACGAATG 4793.417
MB04_AGGCTATTCCAA 4786.393
MB04_ATTATCTGATTT 4478.233
MB04_CCCGGGTCTGCC 4765.347
MB04_AAACGAGCTGAC 4571.253
MB04_TGTTGCTTTTCG 4167.119
MB04_ACGTCCCCCAAA 4778.961
MB04_GTCGCGCAGTTC 4664.638
I want to subset the first 5 rows, but I get a numeric vector:
> UMIpCells_df[1:5,]
[1] 4571.266 4534.992 4793.417 4786.393 4478.233
However, if I add an extra column to UMIpCells_df, the subset returns a data frame.
I found out that to return a data frame from a single-column data frame I have to add:
drop = FALSE
> UMIpCells_df[(1:5), ,drop=FALSE]
UMIs
MB04_GATAACTGGCCT 4571.266
MB04_ACCCTGTCATTT 4534.992
MB04_GTAAGACGAATG 4793.417
MB04_AGGCTATTCCAA 4786.393
MB04_ATTATCTGATTT 4478.233
However, I find this odd, and as basic as it is, I would like to learn why subsetting the simplest data frame (only 1 column) has to be different from subsetting any other data frame (>1 column). I hope you are not offended by how elementary this question is.
Consider using tibbles (created with tibble(); data_frame() is an older, deprecated alias) instead of the standard data.frame. While not base R, packages such as tibble and dplyr "correct" some of the behaviors you noticed, which may no longer be beneficial.
Check out the vignette on tibbles here:
https://cran.r-project.org/web/packages/tibble/vignettes/tibble.html
And here is a brief comparison of tibbles to data frames as well as some comparisons when subsetting:
http://r4ds.had.co.nz/introduction-2.html#tibbles
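For the "why" in the question: `[` on a data.frame has a `drop` argument, and by default it simplifies the result to a plain vector whenever only one column is left. The sketch below rebuilds the first three rows of the question's data to show both behaviours; the `extra` column is made up for the comparison.

```r
# First three rows of the data frame from the question
UMIpCells_df <- data.frame(
  UMIs = c(4571.266, 4534.992, 4793.417),
  row.names = c("MB04_GATAACTGGCCT", "MB04_ACCCTGTCATTT", "MB04_GTAAGACGAATG")
)

# Default: one column left, so the result is dropped to a numeric vector
as_vector <- UMIpCells_df[1:2, ]

# drop = FALSE switches the simplification off and keeps the row names
as_df <- UMIpCells_df[1:2, , drop = FALSE]

# With a second (made-up) column, nothing is dropped by default
two_cols <- cbind(UMIpCells_df, extra = 1:3)
still_df <- two_cols[1:2, ]
```

So a one-column data frame is not treated specially; the same default simply has a visible effect only when a single column remains.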
head(UMIpCells_df, n=5) is also a data frame, so you can just do:
new.df <- head(UMIpCells_df, n=5)
I am a naive user of R and am attempting to come to terms with the 'apply' series of functions which I now need to use due to the complexity of the data sets.
I have a large, ragged data frame that I wish to reshape before conducting a sequence of regression analyses. It is further complicated by having interlaced rows of descriptive data (characters).
My approach to date has been to use a factor to split the data frame into sets with equal row lengths (i.e. a list), then attempt to remove the trailing empty columns, make two new, matching lists, one of data and one of chars and then use reshape to produce a common column number, then recombine the sets in each list. e.g. a simplified example:
myDF <- as.data.frame(rbind(c("v1",as.character(1:10)),
c("v1",letters[1:10]),
c("v2",c(as.character(1:6),rep("",4))),
c("v2",c(letters[1:6], rep("",4)))))
myDF[,1] <- as.factor(myDF[,1])
myList <- split(myDF, myDF[,1])
myList[[1]]
I can remove the empty columns for an individual set and can split the data frame into two sets from the interlacing rows but have been stumped with the syntax in writing a function to apply the following function to the list - though 'lapply' with 'seq_along' should do it?
Thus for the individual set:
DF <- myList[[2]]
DF <- DF[,!sapply(DF, function(x) all(x==""))]
DF
(from an earlier answer to a similar, but simpler example on this site). I have a large data set and would like an elegant solution (I could use a loop but that would not use the capabilities of R effectively). Once I have done that I ought to be able to use the same rationale to reshape the frames and then recombine them.
regards
jac
Try
lapply(split(myDF, myDF$V1), function(x) x[!colSums(x=='')])
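As a sketch of how the per-set step from the question can be wrapped in a function and handed to lapply, the snippet below reuses the question's example data. The helper name `drop_empty_cols` is hypothetical. Unlike the colSums one-liner, which removes a column if any cell in it is empty, this version removes a column only when all of its cells are empty.

```r
# Example data from the question (kept as character columns)
myDF <- as.data.frame(rbind(c("v1", as.character(1:10)),
                            c("v1", letters[1:10]),
                            c("v2", c(as.character(1:6), rep("", 4))),
                            c("v2", c(letters[1:6], rep("", 4)))),
                      stringsAsFactors = FALSE)

# Hypothetical helper: drop the columns that are empty in every row
drop_empty_cols <- function(d) {
  d[, !sapply(d, function(x) all(x == "")), drop = FALSE]
}

# Split by the first column, then clean each piece
cleaned <- lapply(split(myDF, myDF$V1), drop_empty_cols)

ncol(cleaned$v1)
ncol(cleaned$v2)
```

Because lapply passes each element of the list to the function directly, `seq_along` is not needed here.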
Dear friends, I would appreciate it if someone could help me with a question in R.
I have a data frame with 8 variables, let's say (v1,v2,...,v8). I would like to produce groups of datasets based on all possible combinations of these variables. That is, with a set of 8 variables I am able to produce 2^8-1=255 subsets of variables, like {v1},{v2},...,{v8}, {v1,v2},....,{v1,v2,v3},....,{v1,v2,...,v8}.
My goal is to produce a specific statistic based on these groupings and then compare which subset produces the better statistic. My problem is how to produce these combinations.
thanks in advance
You need the function combn. It creates all the combinations of a vector that you provide it. For instance, in your example:
names(yourdataframe) <- c("V1","V2","V3","V4","V5","V6","V7","V8")
varnames <- names(yourdataframe)
combn(x = varnames,m = 3)
This gives you all combinations of V1-V8 taken 3 at a time.
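To cover every subset size rather than only m = 3, one option is to call combn once per size and collect the results. This is a sketch; the name `all_subsets` is illustrative.

```r
varnames <- paste0("V", 1:8)

# For each size m = 1..8, list every combination of m variable names
per_size <- lapply(seq_along(varnames),
                   function(m) combn(varnames, m, simplify = FALSE))

# Flatten one level: a single list with one character vector per subset
all_subsets <- unlist(per_size, recursive = FALSE)

length(all_subsets)
```

Each element can then be used to subset the data, for example `yourdataframe[, all_subsets[[10]], drop = FALSE]`.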
I'll use data.table instead of data.frame;
I'll include an extraneous variable for robustness.
This will get you your subsetted data frames:
library(data.table)
nn<-8L
dt<-setnames(as.data.table(cbind(1:100,matrix(rnorm(100*nn),ncol=nn))),
c("id",paste0("V",1:nn)))
#should be a smarter (read: more easily generalized) way to produce this,
# but it's eluding me for now...
#basically, this generates the indices to include when subsetting
x<-cbind(rep(c(0,1),each=128),
rep(rep(c(0,1),each=64),2),
rep(rep(c(0,1),each=32),4),
rep(rep(c(0,1),each=16),8),
rep(rep(c(0,1),each=8),16),
rep(rep(c(0,1),each=4),32),
rep(rep(c(0,1),each=2),64),
rep(c(0,1),128)) *
t(matrix(rep(1:nn),2^nn,nrow=nn))
#now get the correct column names for each subset
# by subscripting the nonzero elements
incl<-lapply(1:(2^nn),function(y){paste0("V",1:nn)[x[y,][x[y,]!=0]]})
#now subset the data.table for each subset
ans<-lapply(1:(2^nn),function(y){dt[,incl[[y]],with=FALSE]})
You said you wanted some statistics from each subset, in which case it may be more useful to instead specify the last line as:
ans2<-lapply(1:(2^nn),function(y){unlist(dt[,incl[[y]],with=FALSE])})
#exclude the first subset, which is empty (no columns selected)
means<-lapply(2:(2^nn),function(y){mean(ans2[[y]])})
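The answer notes that there should be a more easily generalized way to build the inclusion indices. One possibility, sketched here with base R only, is to let expand.grid enumerate every TRUE/FALSE pattern over the nn columns. The names `grid` and `vars` are illustrative, and the subsets come out in a different order than in the code above.

```r
nn <- 8L
vars <- paste0("V", 1:nn)

# One row per include/exclude pattern: 2^nn rows, nn logical columns
grid <- expand.grid(rep(list(c(FALSE, TRUE)), nn))

# Turn each pattern into the vector of column names it selects
incl <- lapply(seq_len(nrow(grid)), function(i) vars[unlist(grid[i, ])])

# Drop the all-FALSE pattern, which selects no columns
incl <- incl[lengths(incl) > 0]

length(incl)
```

Changing `nn` is then enough to handle a different number of variables.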
I have a data frame (my_df) with columns named after individual county numbers. I melted/cast the data from a much larger set to get to this point. The first column is named year and holds the years from 1970 to 2011. The next 3010 columns are counties. However, I'd like to rename the county columns to be "county_" + county number.
This code executes in R but, for whatever reason, doesn't update the column names. They remain solely the numbers. Any help?
new_col_names = paste0("county_",colnames(my_df[,2:ncol(my_df)]))
colnames(my_df[,2:ncol(my_df)]) = new_col_names
The problem is the subsetting within the colnames call.
Try names(my_df) <- c(names(my_df)[1], new_col_names) instead.
Note: names and colnames are interchangeable for data.frame objects.
EDIT: alternate approach suggested by flodel, subsetting outside the function call:
names(my_df)[-1] <- new_col_names
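A small sketch of both forms on a made-up three-county table (the county numbers and values are hypothetical). In the question's form, the renamed columns are assigned back into positions 2 onward of the data frame, and that assignment keeps the data frame's existing names. In the second form the assignment targets the names vector itself.

```r
# Hypothetical miniature of my_df: a year column plus numeric county names
my_df <- data.frame(year = 1970:1972,
                    `1001` = c(10, 11, 12),
                    `1003` = c(20, 21, 22),
                    check.names = FALSE)

# The question's form, tried on a copy: it runs, but as the question
# reports, the names of the data frame are not updated
attempt <- my_df
colnames(attempt[, 2:ncol(attempt)]) <- paste0("county_", colnames(attempt)[-1])
names(attempt)

# Subsetting outside the call changes the names themselves
names(my_df)[-1] <- paste0("county_", names(my_df)[-1])
names(my_df)
```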
colnames() is intended for a matrix (or matrix-like object); for a data.frame, simply try names().
Example:
# build the example data frame first, then derive the new names from it
my_df <- data.frame(a=c(1,2,3,4,5), b=rnorm(5), c=rnorm(5), d=rnorm(5))
new_col_names=paste0("county_",colnames(my_df[,2:ncol(my_df)]))
names(my_df) <- c(names(my_df)[1], new_col_names)