Extract factor column from data frame - r

My data frame is breaking when I extract some rows from a factor column:
data.df = data.frame(x = factor(letters[1:10]))
data.temp = data.df[1:3, ]
print(data.temp)
How can I avoid that? I also need the column name to be kept. Thanks!

You can add the argument drop = FALSE to keep the result as a data frame.
data.df = data.frame(x = factor(letters[1:10]))
data.temp = data.df[1:3, ,drop=FALSE]
print(data.temp)
x
1 a
2 b
3 c
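To see why the original code "breaks": when a subset leaves only one column, `[` simplifies the result to a bare vector (here a factor) unless drop = FALSE is given. A minimal sketch of the difference; the names as_vector and as_frame are only for illustration:

```r
data.df <- data.frame(x = factor(letters[1:10]))

# Default: the single-column result is simplified to a bare factor
as_vector <- data.df[1:3, ]
is.data.frame(as_vector)   # FALSE
is.factor(as_vector)       # TRUE

# drop = FALSE keeps the data frame structure and the column name
as_frame <- data.df[1:3, , drop = FALSE]
is.data.frame(as_frame)    # TRUE
names(as_frame)            # "x"
```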

Related

Subset columns of one data frame (by name) by information from second data frame

I would appreciate a solution for the following problem: I have the following example data frame:
df1 = data.frame(Tom = c(1,2,3,4), Tina = c(5,6,7,8), Todd = c(9,10,11,12), Brit = c(1,2,3,4))
I have a second data frame with information about Tom, Tina etc.
df2 = data.frame(ID = c("Tom","Todd","Tina","Brit"), value = c(1,3,2,1))
Now I would like to subset columns from data frame df1 if the "value" in df2 fulfils a particular condition, e.g. df2$value == 1 | df2$value == 2
The resulting table should look like:
desired_result_look_like = data.frame(Tom = c(1,2,3,4), Tina = c(5,6,7,8), Brit = c(1,2,3,4))
Thanks for your help.
Because you're using row values in one data frame to select the columns in another data frame, the solution isn't particularly clean, but if you wanted to stick with this approach, you could create a third data frame that filters the second data frame based on your conditions, then select the column names in the first data frame that correspond with values in the filtered data frame. The code would look something like this:
library(dplyr)
df2_filtered <- df2 %>% filter(value == 1 | value == 2)
desired_result <- df1[ , colnames(df1) %in% df2_filtered$ID]
(This is operating under the assumption that in your posted "desired result", you meant to include Tina instead of Todd)
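The same selection can probably be done in base R without the intermediate filtered data frame. This is only a sketch under the assumption that every ID in df2 is spelled exactly like a column name in df1; keep_ids and keep_cols are names made up for the example:

```r
df1 <- data.frame(Tom = c(1, 2, 3, 4), Tina = c(5, 6, 7, 8),
                  Todd = c(9, 10, 11, 12), Brit = c(1, 2, 3, 4))
df2 <- data.frame(ID = c("Tom", "Todd", "Tina", "Brit"),
                  value = c(1, 3, 2, 1))

# IDs whose value meets the condition (as.character guards against factors)
keep_ids <- as.character(df2$ID[df2$value %in% c(1, 2)])

# Keep the column order of df1; drop = FALSE protects the one-column case
keep_cols <- intersect(names(df1), keep_ids)
desired_result <- df1[, keep_cols, drop = FALSE]

names(desired_result)   # "Tom" "Tina" "Brit"
```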

adding unique rows from one data frame to another

I have a data frame which comprises a subset of records contained in a 2nd data frame. I would like to add the record rows of the 2nd data frame that are not common in the first data frame to the first... Thank you.
If you want all unique rows from both dataframes, this would work:
df1 <- data.frame(X = c('A','B','C'), Y = c(1,2,3))
df2 <- data.frame(X = 'A', Y = 1)
df <- rbind(df1,df2)
no.dupes <- df[!duplicated(df),]
no.dupes
# X Y
#1 A 1
#2 B 2
#3 C 3
But it won't work if there are duplicate rows in either data frame that you want to preserve.
You should look at dplyr's distinct() and bind_rows() functions.
Or better, provide some dummy data to work on and the expected output.
Suppose you have two data frames a and b, and you want to add the unique rows of a to b:
a = data.frame(
x = c(1,2,3,1,4,3),
y = c(5,2,3,5,3,3)
)
b = data.frame(
x = c(6,2,2,3,3),
y = c(19,13,12,3,1)
)
library(dplyr)
distinct(a) %>% bind_rows(.,b)
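If the goal is to append only the rows of the second data frame that are missing from the first, while leaving any duplicates already in the first untouched, one possible base R sketch is to compare rows through a pasted key. The helper row_key and the sample data are hypothetical, and the approach assumes both data frames have the same columns in the same order and that no value contains the separator character:

```r
first  <- data.frame(X = c("A", "B", "B"), Y = c(1, 2, 2))        # has a duplicate we keep
second <- data.frame(X = c("A", "B", "C", "D"), Y = c(1, 2, 3, 4))

# Collapse each row into a single string so rows can be compared with %in%
row_key <- function(d) do.call(paste, c(d, sep = "\r"))

# Rows of `second` that do not occur anywhere in `first`
new_rows <- second[!(row_key(second) %in% row_key(first)), , drop = FALSE]

combined <- rbind(first, new_rows)
rownames(combined) <- NULL   # tidy up the row names after binding
combined
```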

Store output of sapply into a data frame?

How can I store the output of sapply() in a data frame where the index value is stored in the first column and its value in the corresponding second column? For illustration, I have shown only 2 elements here, but there are 110 columns in my data. "loan" is the data frame.
cols <- sapply(loan,function(x) sum(is.na(x)))
cols
       id member_id 
        0         7 
I want output as:
var value
id 0
member_id 7
I know that sapply() returns a vector, but when I print the vector, the values are printed along with an "index", e.g. the column name if it is applied to a data frame. So, when I want to store it as a data frame with two columns, where the first column contains the index part and the second column contains the value, how can I do it?
I found an answer to my question. For those who actually did understand my problem, this answer might make sense:
cols <- data.frame(sapply(loan ,function(x) sum(is.na(x))))
cols <- cbind(variable = row.names(cols), cols)
I wanted the row.names to be in a column of the same data frame corresponding to the values obtained from sapply.
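The "index" that gets printed is the names attribute of the vector, so another way to build the two-column data frame is to read it with names(). A small sketch; the loan data frame below is a made-up stand-in, not the real data:

```r
# Hypothetical stand-in for the "loan" data frame
loan <- data.frame(id = 1:3, member_id = c(NA, 5, NA))

# Named vector: one NA count per column
cols <- sapply(loan, function(x) sum(is.na(x)))

# Names go in the first column, values in the second
result <- data.frame(var = names(cols), value = unname(cols))
result
```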
We can use stack
stack(mylist)[2:1]
data
mylist <- list(df = 1, rf = 2)
Is this what you want?
Your original list:
L <- c("df",1,"rf",2)
L
[1] "df" "1" "rf" "2"
As a data frame:
N <- length(L)
df <- data.frame( var = L[seq(1,N,2)], value = L[seq(2,N,2)] )
df
var value
1 df 1
2 rf 2

Changing character values in a data frame column based on a conversion data frame in R

I have a data frame in R that has a column of strings/characters. I am calling this "myDat" below.
I have another data frame in R that has two columns of strings/characters. I am calling this "conversionDat" below. One column ("Name") contains similar names as the column in "myDat". The other column ("Name2") contains names to which the "myDat" column should be converted to.
Here is a MWE of these two data frames:
myDat <- data.frame(Name = c("A","D","P","R"))
conversionDat <- data.frame(Name = c("D","R","A","P"), Name2 = c("S","T","B","Z"))
myDat$Name <- as.character(myDat$Name)
conversionDat$Name <- as.character(conversionDat$Name)
conversionDat$Name2 <- as.character(conversionDat$Name2)
I would like to find any case where "myDat" equals a value in "conversionDat$Name" and convert it to "conversionDat$Name2". So, in the MWE above, the "conversionDat" data frame would remain unchanged, but the "myDat" data frame would become:
B2
S2
Z2
T2
Is there a painless method to go about doing this? Any ideas would be much appreciated!
A painless method would be to simply merge both and then add the "2" you need in the Name2 column?
myDat <- data.frame(Name = c("A","D","P","R"))
conversionDat <- data.frame(Name = c("D","R","A","P"), Name2 = c("S","T","B","Z"))
myDat <- merge(myDat, conversionDat, by = "Name")
myDat$Name2 <- paste(myDat$Name2, "2", sep = "")
> myDat
Name Name2
1 A B2
2 D S2
3 P Z2
4 R T2
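Note that merge() sorts the result by the key and drops rows without a match. If the original row order matters, a lookup with match() may be an alternative. This is a sketch with altered sample data ("X" has no entry in the conversion table) to show that unmatched names are left as they are; idx is a name invented for the example:

```r
myDat <- data.frame(Name = c("P", "A", "X", "D"), stringsAsFactors = FALSE)
conversionDat <- data.frame(Name = c("D", "R", "A", "P"),
                            Name2 = c("S", "T", "B", "Z"),
                            stringsAsFactors = FALSE)

# Position of each myDat name in the conversion table (NA if absent)
idx <- match(myDat$Name, conversionDat$Name)

# Replace matched names with Name2 plus the "2" suffix, keep the rest
myDat$Name <- ifelse(is.na(idx),
                     myDat$Name,
                     paste0(conversionDat$Name2[idx], "2"))
myDat$Name   # "Z2" "B2" "X" "S2"
```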

Matching data from unequal length data frames in r

This seems like it should be really simple. I have 2 data frames of unequal length in R. One is simply a random subset of the larger data set. Therefore, they have exactly the same data and a UniqueID that is exactly the same. What I would like to do is put an indicator, say a 0 or 1, in the larger data set that says whether this row is in the smaller data set.
I can use which(long$UniqID %in% short$UniqID) but I can't seem to figure out how to match this indicator back to the long data set
Made some sample data.
long<-data.frame(UniqID=sample(letters[1:20],20))
short<-data.frame(UniqID=sample(letters[1:20],10))
You can use %in% without which() to get values TRUE and FALSE and then with as.numeric() convert them to 0 and 1.
long$sh<-as.numeric(long$UniqID %in% short$UniqID)
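For comparison, the positions returned by which() can also be mapped back onto the long data set by using them as row indices. A sketch of that route, which should give the same column as the one-liner above (set.seed and the name hits are additions for the example):

```r
set.seed(42)
long  <- data.frame(UniqID = sample(letters[1:20], 20))
short <- data.frame(UniqID = sample(letters[1:20], 10))

# which() gives the row positions in `long` that also occur in `short`
hits <- which(long$UniqID %in% short$UniqID)

# Start with all zeros, then flag the matching rows
long$sh <- 0
long$sh[hits] <- 1

# Same result as converting the logical vector directly
identical(long$sh, as.numeric(long$UniqID %in% short$UniqID))   # TRUE
sum(long$sh)                                                     # 10
```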
I'll use @AnandaMahto's data to illustrate another way using duplicated, which works whether or not you have a unique ID.
Case 1: Has unique id column
set.seed(1)
df1 <- data.frame(ID = 1:10, A = rnorm(10), B = rnorm(10))
df2 <- df1[sample(10, 4), ]
transform(df1, indicator = 1 * duplicated(rbind(df2, df1)[, "ID",
drop=FALSE])[-seq_len(nrow(df2))])
Case 2: Has no unique id column
set.seed(1)
df1 <- data.frame(A = rnorm(10), B = rnorm(10))
df2 <- df1[sample(10, 4), ]
transform(df1, indicator = 1 * duplicated(rbind(df2, df1))[-seq_len(nrow(df2))])
The answers so far are good. However, a question was raised: "what if there wasn't a "UniqID" column?"
At that point, perhaps merge can be of assistance:
Here's an example using merge and %in% where an ID is available:
set.seed(1)
df1 <- data.frame(ID = 1:10, A = rnorm(10), B = rnorm(10))
df2 <- df1[sample(10, 4), ]
temp <- merge(df1, df2, by = "ID")$ID
df1$matches <- as.integer(df1$ID %in% temp)
And, a similar example where an ID isn't available.
set.seed(1)
df1_NoID <- data.frame(A = rnorm(10), B = rnorm(10))
df2_NoID <- df1_NoID[sample(10, 4), ]
temp <- merge(df1_NoID, df2_NoID, by = "row.names")$Row.names
df1_NoID$matches <- as.integer(rownames(df1_NoID) %in% temp)
You can directly use the logical vector as a new column:
long$Indicator <- 1*(long$UniqID %in% short$UniqID)
See if this can get you started:
long <- data.frame(UniqID=sample(1:100)) #creating a long data frame
short <- data.frame(UniqID=long[sample(1:100, 30), ]) #creating a short one with the same ids.
long$indicator <- long$UniqID %in% short$UniqID #creating an indicator column in long.
> head(long)
UniqID indicator
1 87 TRUE
2 15 TRUE
3 100 TRUE
4 40 FALSE
5 89 FALSE
6 21 FALSE
