This should be pretty easy, not sure what I'm missing here. I want to select a single row from a data frame, let's say row 1000, and get all columns where that specific row is not NA.
This works
df<- df[1000,]
df<- df[, !is.na(df)]
This fails
df<- df[1000, !is.na(df)]
ERROR "undefined columns selected"
You missed indexing the part concerning with is.na, here's an approach:
df <- df[1000, !is.na(df[1000, ])]
Related
I am trying to take df1 (a summary table), and merge it into df2 (master summary table).
This is a snapshot of df2, ignore the random 42, just the answer to the ultimate question.
This is an example of what df1, looks like.
Lastly, I have a vector called Dates. This matches the dates that are the column names for df2.
I am trying to cycle through 20 file, and gather the summary statistics of that file. I then want to enter that data into df2 to be stored permanently. I only need to enter the Earned column.
I have tried to use merge but since they do not have shared column names, I am unable to.
My next attempt was to try this. But it gave an error, because of unequal row numbers.
df2[,paste(Dates[i])] <- cbind(df2,df1)
Then I thought that maybe if I specified the exact location, it might work.
df2[1:length(df1$Earned),Dates[i]] <- df1$Earned
But that gave and error "New columns would leave holes after existing columns"
So then I thought of trying that again, but with cbind.
df2[1:length(df1$Earned),Dates[i]] <- cbind(df2, df1$Earned)
##This gave an error for differing row numbers
df2 <- cbind(df2[1:length(df1$Earned),Dates[i]],df1$earned)
## This "worked" but it replaced all of df2 with df1$earned, so I basically lost the rest of the master table
Any ideas would be greatly appreciated. Thank you.
Something like this might work:
df1[df1$TreatyYear %in% df2$TreatyYear, Dates] <- df2$Earned
Example
df <- data.frame(matrix(NA,4,4))
df$X1 <- 1:4
df[df$X1 %in% c(1,2),c("X3","X4")] <- c(1,2)
The only solution that I have found so far is to force df1$Earned into a vector. Then append the vector to be the exact length of the df2. Then I am able to insert the values into df2 by the specific column.
temp_values <- append(df1$Earned,rep(0,(length(df2$TreatyYear)-length(df1$TreatyYear))),after=length(df1$Earned))
df2[,paste(Dates[i])] <- temp_values
This is kind of a roundabout way to fix it, but not a very pleasant way. Any better ideas would be appreciated.
I know, there is other questions like this one but none of them answer my specific problem.
On my data frame, I need to count the number of values in each rows between cols 3 and 8.
I want a simple NB.VAL like in Excel..
base_graphs$NB <- rowSums(!is.na(base_graphs)) # with this code, I count all values except NAs but I can't select specific columns
How to create this new column "NB" on my data frame "base_graphs" ?
You were really close:
base_graphs$NB <- rowSums(!is.na(base_graphs[, 3:8]))
The [, 3:8] subsets and selects columns 3 through 8.
apply can apply a function to each row of a data frame. Try:
base_graphs$NB <- apply(base_graphs[3:8], 1, function (x) sum(is.na(x)))
I have a dataframe df1 with 300+ Columns, and I am trying to subset it to df2 by column names based on a couple dozen inputs in list1. However, some of the items in list1 are not actually column names in df1. So, I get an "undefined columns" error:
df2 <- df1[,list1]
Error in [.data.frame(df1, , list1) :
undefined columns selected
I realize the reason for this error. BUT I can't go through list1 and see which ones occur in df1 and which ones don't. Because I have to do this many many times with many many different lists. Is there a way to simply ignore those (and not include those null column name values from list1 in the subset df2)?
Thanks for your kind help.
How about something like: df2 <- df1[,list1[list1 %in% names(df1)]]
I was hoping for a nicer/shorter) solution for my own use.
Here is another
df1[,intersect(names(df1), list1)]
I have a data frame that looks like this
df <- data.frame(cbind(1:10, sample(c(1:5), 10, replace=TRUE)))
# in real case the columns could be more than two
# and the column name could be anything.
What I want to do is to remove all rows where the value of all its columns
is smaller than 5.
What's the way to do it?
df[!apply(df,1,function(x)all(x<5)),]
First of all ...please stop using cbind to create data.frames. You will be sorry if you continue. R will punish you.
df[ !rowSums(df <5) == length(df), ]
(The length() function returns the number of columns in a dataframe.)
I've got 81,000 records in my test frame, and duplicated is showing me that 2039 are identical matches. One answer to Find duplicated rows (based on 2 columns) in Data Frame in R suggests a method for creating a smaller frame of just the duplicate records. This works for me, too:
dup <- data.frame(as.numeric(duplicated(df$var))) #creates df with binary var for duplicated rows
colnames(dup) <- c("dup") #renames column for simplicity
df2 <- cbind(df, dup) #bind to original df
df3 <- subset(df2, dup == 1) #subsets df using binary var for duplicated`
But it seems, as the poster noted, inelegant. Is there a cleaner way to get the same result: a view of just those records that are duplicates?
In my case I'm working with scraped data and I need to figure out whether the duplicates exist in the original or were introduced by me scraping.
duplicated(df) will give you a logical vector (all values consisting of either T/F), which you can then use as an index to your dataframe rows.
# indx will contain TRUE values wherever in df$var there is a duplicate
indx <- duplicated(df$var)
df[indx, ] #note the comma
You can put it all together in one line
df[duplicated(df$var), ] # again, the comma, to indicate we are selected rows
doops <- which(duplicated(df$var)==TRUE)
uniques <- df[-doops,]
duplicates <- df[doops,]
Is the logic I generally use when I am trying to remove the duplicate entrys from a data frame.