This question is about selecting a different number of columns on every row of a data frame. I have a data frame:
df = data.frame(
START=sample(1:2, 10, replace=T), END=sample(2:4, 10, replace=T),
X1=rnorm(10), X2=rnorm(10), X3=rnorm(10), X4=rnorm(10)
)
I would like to have a way without loops to select columns (START[i]:END[i])+2 on row i for all rows of my data frame.
Base R solution
lapply(split(df,1:nrow(df)),function(row) row[(row$START+2):(row$END+2)])
Or something similar to what was given in the comment above (I would store the output in a list):
library(plyr)
alply(df,1,function(row) row[(row$START+2):(row$END+2)])
Edit per request of OP:
To get a TRUE/FALSE index matrix, use the following R base solution
idx_matrix=col(df)>=df$START+2&col(df)<=df$END+2
df[idx_matrix]
Note, however, that you lose some information here (compared to the list based solution).
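To make the lost information concrete, here is a small sketch on a hand-made data frame (the values are invented for illustration and are not from the question). The matrix index returns one flat vector in column-major order, while `which(..., arr.ind = TRUE)` is one possible way to keep the row and column of each selected cell.

```r
# Toy data (hypothetical values) so the result is reproducible
df <- data.frame(
  START = c(1, 2, 1), END = c(2, 4, 3),
  X1 = c(11, 21, 31), X2 = c(12, 22, 32),
  X3 = c(13, 23, 33), X4 = c(14, 24, 34)
)

# TRUE where the column index lies in [START + 2, END + 2] for that row
idx_matrix <- col(df) >= df$START + 2 & col(df) <= df$END + 2

# Flat vector: values come out column by column, row membership is gone
flat <- df[idx_matrix]

# Keep the (row, col) position of every selected cell
pos <- which(idx_matrix, arr.ind = TRUE)
selected <- data.frame(row = pos[, "row"], col = pos[, "col"], value = flat)
```

Here `flat` starts with 11, 31 (column X1 of rows 1 and 3), which shows that consecutive values need not belong to the same row.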
Related
Sorry for the basic question here!
I am working with the following type of dataset:
data <- data.frame(a=c(1,2,3), b=c(2,3,4), c=c(10,11,12))
I am trying to turn it into a new data frame where there is only one column, containing all of these values in individual rows, i.e. a=c(1,2,3,2,3,4,10,11,12)
What function should I use to arrive at this output?
All the best,
Cameron
You can do data.frame(a = unlist(data)).
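As a quick check, the sketch below runs that idea on the example data. The `use.names = FALSE` argument is an addition that is not in the answer; without it the rows are labelled a1, a2, ..., c3 instead of 1 to 9.

```r
data <- data.frame(a = c(1, 2, 3), b = c(2, 3, 4), c = c(10, 11, 12))

# unlist() concatenates the columns in order: a, then b, then c
long <- data.frame(a = unlist(data, use.names = FALSE))
long$a
# 1 2 3 2 3 4 10 11 12
```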
I know there are other questions like this one, but none of them answers my specific problem.
In my data frame, I need to count the number of values in each row across columns 3 to 8.
I want a simple NB.VAL like in Excel.
base_graphs$NB <- rowSums(!is.na(base_graphs)) # with this code, I count all values except NAs but I can't select specific columns
How can I create this new column "NB" in my data frame "base_graphs"?
You were really close:
base_graphs$NB <- rowSums(!is.na(base_graphs[, 3:8]))
The [, 3:8] subsets and selects columns 3 through 8.
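A minimal sketch of this on made-up data; the `base_graphs` below is a hypothetical stand-in with two identifier columns followed by six value columns, since the original data was not posted.

```r
# Hypothetical stand-in for base_graphs: 2 id columns plus 6 value columns
base_graphs <- data.frame(
  id = 1:3, grp = c("a", "b", "c"),
  v1 = c(1, NA, 3), v2 = c(NA, NA, 1), v3 = c(2, 5, NA),
  v4 = c(4, NA, 7), v5 = c(NA, 1, 8), v6 = c(9, NA, NA)
)

# !is.na() marks non-missing cells; rowSums() counts them per row
base_graphs$NB <- rowSums(!is.na(base_graphs[, 3:8]))
base_graphs$NB
# 4 2 4
```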
apply can apply a function to each row of a data frame. To count the non-missing values, try:
base_graphs$NB <- apply(base_graphs[3:8], 1, function(x) sum(!is.na(x)))
I came across a problem in my DataCamp exercise that basically asked "Remove the column names in this vector that are not factors." I know what they -wanted- me to do, and that was to simply do glimpse(df) and manually delete elements of the vector containing the column names, but that wasn't satisfying for me. I figured there was a simple way to store the column names of the dataframe that are factors into a vector. So, I tried two things that ended up working, but I worry they might be inefficient.
Example data frame:
factorVar <- as.factor(LETTERS[1:10])
df1 <- data.frame(x = 1, y = 1:10, factorVar = sample(factorVar, 10))
My first solution was this:
library(dplyr)  # provides select_if()
vector1 <- names(select_if(df1, is.factor))
This worked, but select_if builds an entire filtered data frame just so that its column names can be read off. Surely there's an easier way...
Next, I tried this:
vector2 <- colnames(df1)[sapply(df1,is.factor)]
This also worked, but I wanted to know if there's a quicker, more efficient way of filtering column names based on their type and then storing the results as a vector.
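One possible variant of the second approach, offered only as a sketch: `vapply()` does the same per-column test as `sapply()` but declares the return type, so the result is always a logical vector. For data frames of ordinary size any speed difference is likely negligible. The example data is made deterministic here for illustration.

```r
df1 <- data.frame(x = 1, y = 1:10, factorVar = factor(LETTERS[1:10]))

# One TRUE/FALSE per column, guaranteed to be logical(1) each
is_fac <- vapply(df1, is.factor, logical(1))

# Index the column names with the logical vector
factor_cols <- names(df1)[is_fac]
factor_cols
# "factorVar"
```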
I am having trouble understanding what row.names is and how it works, and how I can set up my data so that it can do what row.names allows.
For example, I am creating some clusters with the code below (my data). I want to export the results; the sapply line does that, but only to the screen for now. The first column of my data frame (path_country) holds country names and the other columns hold other variables (integers). I don't see an easy way to export these clusters to a table or list of countries with their group membership.
I tried to make a dummy example using the example data sets in R. While working with mtcars, I noticed that its first column is shown as row.names. With mtcars I can create clusters, cut the tree into the specified number of groups with cutree, and then save the result as a data frame. With this approach I have the car names in the first column and the group number in the second column (more or less; it could be cleaned up to look nicer, but it is essentially what I am after), which is what I would like to happen with my data.
Any thoughts on this would be appreciated.
# my data
path_country <- read.csv("C:/path_country.csv")
patho <- subset(path_country, select=c(2:188))
patho.d <- dist(patho)
patho.hclust <- hclust(patho.d)
patho.hclust.groups11 = cutree(patho.hclust,11)
sapply(unique(patho.hclust.groups11),function(g)path_country$Country[patho.hclust.groups11 == g])
# mtcars data
car.d <- dist(mtcars)
car.h <- hclust(car.d)
car.h.11 <- cutree(car.h, 11)
nice_result <- as.data.frame(car.h.11)
write.table(nice_result, "test.txt", sep="\t")
1) You can create a data.frame with row names directly from the CSV file:
# Names in the first column
path_country <- read.table("C:/path_country.csv", sep=",", row.names=1)
# Names in column "Country"
path_country <- read.table("C:/path_country.csv", sep=",", row.names="Country", header=TRUE)
Note that in the second case you must specify header=TRUE so that the column names are read. sep="," is needed because read.table splits on whitespace by default.
Now rownames(path_country) should give you a vector of row names, and as.data.frame(patho.hclust.groups11) gives a nice result for export.
2) At any time you can set the row names of your data.frame with the command:
rownames(path_country) <- names.vector
where names.vector is a vector of unique names whose length equals the number of rows in the data.frame. In your example, cutree returns a plain vector rather than a data.frame, so use names() instead of rownames():
names(patho.hclust.groups11) <- path_country$Country
Note that if you use the first approach, you don't need this command.
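As a rough sketch of the export step, using mtcars because the original CSV is not available: `cutree()` returns a named vector, so the names can be moved into an explicit column to get a two-column table of item and group. The names `membership`, `name`, and `group` are made up for this example.

```r
# mtcars already carries the car names as row names
car.h <- hclust(dist(mtcars))
car.h.11 <- cutree(car.h, 11)

# Turn the names of the cutree() result into an explicit column
membership <- data.frame(
  name = names(car.h.11),
  group = unname(car.h.11),
  row.names = NULL
)
head(membership)

# To export without the automatic row numbers:
# write.table(membership, "test.txt", sep = "\t", row.names = FALSE)
```

The same pattern should apply to the country data once the row names (or the names of the cutree result) hold the country names.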
I have a data frame with several variables, and I want to build subsets of it based on some conditions. I want to use a loop because there will be a lot of subsets, and this would save a lot of time.
These are the conditions:
Variable A holds an ID for an area and variable B holds different species (1, 2, 3, etc.), and I want to compute subsets from these columns. The name of every subset should be the ID of a point, and the content should be all individuals of a certain species at this point.
For a better understanding:
This would be the code for one subset, and I want to use a loop for the rest:
A_2_NGF_Abies_alba <- subset(A_2_NGF, subset = Baumart %in% c("Abies alba"))
Is this possible in R?
Thanks
Does this help you?
Baumdaten <- data.frame(pointID=sample(c("A_2_SEF","A_2_LEF","A_3_LEF"), 10, T), Baumart=sample(c("Abies alba", "Betula pendula", "Fagus sylvatica"), 10, T))
split(Baumdaten, Baumdaten[, 1:2])
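A small deterministic sketch of the same idea, with invented rows so the result can be inspected. `drop = TRUE` is an addition here; it leaves out point/species combinations that have no rows. Each list element is named from the point ID and the species joined by a dot, so a single subset can be pulled out by name instead of creating one variable per subset.

```r
Baumdaten <- data.frame(
  pointID = c("A_2_SEF", "A_2_SEF", "A_2_LEF", "A_3_LEF"),
  Baumart = c("Abies alba", "Fagus sylvatica", "Abies alba", "Abies alba")
)

# One data frame per (pointID, Baumart) combination that actually occurs
subsets <- split(Baumdaten, Baumdaten[, c("pointID", "Baumart")], drop = TRUE)

names(subsets)
subsets[["A_2_SEF.Abies alba"]]
```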