Subset and lookup from separate table - r

I have this lookup data frame:
VAR1=c('X1')
VAR2=c('X2')
VAR3=c('X3')
VAR4=c('X4')
VAR5=c('NA')
df<-data.frame(VAR1,VAR2,VAR3,VAR4,VAR5)
which I need to cross-reference with a main data frame so that I select variables X1 to X5. Sometimes, as in this example, column 5 is simply NA.
I would typically use something like the below:
main_data <-subset(main_data, select=c(df[1,1],df[1,2],df[1,3]))
main_data <-subset(main_data, select=c(df[1,1:max(col(df))]))
but neither of these works: there are NAs, and the number of columns will be dynamic.
The other idea is to use grepl on main_data but I cannot get it to work with more than one variable at a time:
main_data <- main_data[, grepl(paste0(df[1:max(col(df))], colnames(main_data)))]
I am certain there is a straightforward way to do this but I cannot find it.

With Roman's help I got it:
df<-as.vector(unlist(df))
main_data<-main_data[, names(main_data) %in% df]
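A minimal sketch of the same idea that also drops the NA entry explicitly and keeps the column order given in the lookup table. The contents of main_data and the name lookup are made up for illustration; only the VAR1 to VAR5 layout comes from the question.

```r
# Hypothetical stand-in for the real main_data
main_data <- data.frame(X1 = 1:3, X2 = 4:6, X3 = 7:9, X4 = 10:12, X9 = 13:15)

# Lookup table with one row; VAR5 is a real NA here
lookup <- data.frame(VAR1 = "X1", VAR2 = "X2", VAR3 = "X3", VAR4 = "X4",
                     VAR5 = NA, stringsAsFactors = FALSE)

# Flatten the first row to a plain character vector
wanted <- as.character(unlist(lookup[1, ]))

# Drop real NAs as well as the literal string "NA" used in the question
wanted <- wanted[!is.na(wanted) & wanted != "NA"]

# Keep only names that actually exist in main_data
keep <- intersect(wanted, names(main_data))

# drop = FALSE keeps a data frame even if only one column survives
result <- main_data[, keep, drop = FALSE]
```

Because the column count is derived from the lookup row itself, nothing needs to change when the number of VAR columns varies.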

Related

Create data frame from list

I have a data frame (mydata) with >200 variables. I would like to automatically subset some of them into a smaller data frame.
The names of the variables I would like to subset follow a naming convention, e.g., "Q1Pct", "Q2Pct", ... "Q18Pct".
I can get a list of the variables using:
Q.names <- setNames(as.list(1:18),paste(paste0("mydata$Q",1:18,"Pct")))
I have tried to combine the list into a new data frame, but it isn't working:
df.QList <- data.frame(Q.names)
I'm sure there is a much better way to do this - please help.
You can try this:
library(dplyr)
select(mydata, all_of(paste0("Q",1:18,"Pct")))
Or, more simply (base R):
mydata[,paste0("Q",1:18,"Pct")]
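If the exact number of questions is not known in advance, a pattern match on the column names may be easier than building the names by hand. This is only a sketch: the small mydata below is invented, and the regular expression assumes the names are exactly "Q", digits, then "Pct".

```r
# Hypothetical stand-in for the real mydata
mydata <- data.frame(id = 1:2,
                     Q1Pct = c(0.1, 0.2),
                     Q2Pct = c(0.3, 0.4),
                     Q2Raw = c(5, 6))

# Names that match Q<digits>Pct exactly
q_cols <- grep("^Q[0-9]+Pct$", names(mydata), value = TRUE)

# drop = FALSE keeps a data frame even for a single match
q_data <- mydata[, q_cols, drop = FALSE]
```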

Appending a variable to existing data depending on rows

So I have two columns and I need to add a third. This third column needs to contain A for a specified number of rows at the start, and B for a specified number of rows after that.
I tried adding this:
data_exercise_3["newcolumn"] <- (1:6)
but it didn't work. Can someone tell me what I'm doing wrong, please?
Looks like you're having a problem with subsetting a data frame correctly. I'd recommend reviewing this concept before you proceed much further, either via a Coursera course or on a website like this UCLA R learning module on subsetting data frames. Subsetting is a crucial component of data wrangling with R, and you'll go much faster with a solid foundation of the basics!
You can assign values to a subset of a data frame by using [row, column] notation. Since your data frame is called data_exercise_3 and the column you'd like to assign values to is called 'newcolumn', then assuming you want the first 6 rows as 'A' and the next 3 as 'B', you could write it like this:
data_exercise_3[1:6,'newcolumn'] <- 'A'
data_exercise_3[7:9,'newcolumn'] <- 'B'
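Alternatively, build the whole column in one step. The vector's length must equal the number of rows in the data frame (here 6 + 6 = 12):
data_exercise_3$category <- c(rep("A",6),rep("B",6))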
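The group sizes can also be derived from the data frame so the lengths always line up. A small sketch, assuming a 9-row data frame with six A rows followed by B rows; the columns x and y and the helper names n_a and n_b are made up.

```r
# Hypothetical 9-row stand-in for data_exercise_3
data_exercise_3 <- data.frame(x = 1:9, y = letters[1:9])

n_a <- 6                              # rows that should get "A"
n_b <- nrow(data_exercise_3) - n_a    # remaining rows get "B"

# times = c(n_a, n_b) repeats "A" n_a times, then "B" n_b times
data_exercise_3$newcolumn <- rep(c("A", "B"), times = c(n_a, n_b))
```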

Error in 'colsplit' function?

I am trying to split a column of a data frame into 2 columns using transform and colsplit from the reshape package. I don't get what I am doing wrong. Here's an example...
library(reshape)
df1 <- data.frame(col1=c("x-1","y-2","z-3"))
Now I am trying to split col1 into col1.a and col1.b at the delimiter '-'. The following is my code...
df1 <- transform(df1,col1 = colsplit(col1,split='-',names = c('a','b')))
Now in my RStudio when I do View(df1) I do get to see col1.a and col1.b split the way I want to.
But when I run...
df1$col1.a or head(df1$col1.a) I get NULL. Apparently I am not able to make any further operations on these split columns. What exactly is wrong with this?
colsplit returns a list (a data frame with one column per piece). The easiest (and idiomatic) way to assign these to multiple columns in the data frame is to use [<-, e.g.:
df1[c('col1.a','col1.b')] <- colsplit(df1$col1,'-',c('a','b'))
It will be much harder to do this within transform (see "Assign multiple new variables on LHS in a single line in R").
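For comparison, the same split can be sketched in base R without the reshape package. This is a different technique from colsplit, not what it does internally, and it assumes every value contains exactly one '-'.

```r
df1 <- data.frame(col1 = c("x-1", "y-2", "z-3"), stringsAsFactors = FALSE)

# strsplit gives one character vector per row; rbind stacks them into a matrix
parts <- do.call(rbind, strsplit(df1$col1, "-", fixed = TRUE))

# Assign each matrix column to its own data frame column
df1$col1.a <- parts[, 1]
df1$col1.b <- as.integer(parts[, 2])
```

Because the new columns are ordinary top-level columns, df1$col1.a returns the values rather than NULL.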

Generating new variable values by subset

I have a data set, and I am trying to create a new variable with random values that are associated with a particular subset.
For example, given the data frame:
data(iris)
iris=iris
I want another variable that associates each value of iris$Species with a random number (between 0 and 1). This can be accomplished in a circuitous fashion by creating a data frame:
df=data.frame(unique(iris$Species),runif(length(unique(iris$Species))))
And merging it with the original data frame:
iris=merge(iris,df,by.x="Species",by.y="unique.iris.Species.")
This accomplishes what I want, but it is inelegant. Furthermore, if I wanted to replicate this process many times over different variables this process would be burdensome. What I would hope for is some quick indexing method that would hopefully look something like:
iris$Species.unif=runif(length(unique(iris$Species)))[iris$Species]
Given that indexing in R is typically very slick, I expect there is some way of doing this that I am not aware of.
Thank you in advance.
You may want to try using levels:
iris <- iris
iris$species_unif <- iris$Species
levels(iris$species_unif ) <- runif(length(levels(iris$Species)))
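Note that the levels approach leaves species_unif as a factor whose labels are the numbers stored as text. If a numeric column is needed, one option is a named lookup vector indexed by the species name. A sketch, where the name draws is made up:

```r
data(iris)
set.seed(1)  # only to make the sketch reproducible

# One random draw per species, named by level
draws <- setNames(runif(nlevels(iris$Species)), levels(iris$Species))

# Index the lookup by name; unname() drops the species labels from the result
iris$species_unif <- unname(draws[as.character(iris$Species)])
```

The same pattern can be repeated for other grouping variables by swapping the column name.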

Appending column to a data frame - R

Is it possible to append a column to data frame in the following scenario?
dfWithData <- data.frame(start=c(1,2,3), end=c(11,22,33))
dfBlank <- data.frame()
..how to append column start from dfWithData to dfBlank?
It looks like the data should be added when data frame is being initialized. I can do this:
dfBlank <- data.frame(dfWithData[1])
but I am more interested if it is possible to append columns to an empty (but inti)
I would suggest simply subsetting the data frame you get back from the RODBC call. Something like:
df[,c('A','B','D')]
perhaps, or you can also subset the columns you want with their numerical position, i.e.
df[,c(1,2,4)]
dfBlank[1:nrow(dfWithData),"start"] <- dfWithData$start
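Another option is to initialise the blank data frame with the right number of rows but no columns, after which ordinary $ assignment works. A sketch under that assumption:

```r
dfWithData <- data.frame(start = c(1, 2, 3), end = c(11, 22, 33))

# Zero columns, but as many rows as dfWithData
dfBlank <- data.frame(row.names = seq_len(nrow(dfWithData)))

# The lengths now match, so the column can be appended directly
dfBlank$start <- dfWithData$start
```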
