Replace only a part of a subsetted vector


Suppose I've got a data frame called someMatrix. In this data frame I want to replace only the first three rows of the fourth column.
I came up with this idea.
(someMatrix[,4])[1:3] <- replacement
but I get the following error: could not find function "(<-"
Any idea how I could solve this?
Thanks!

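You may subset with brackets as many times as you want, without bothering with parentheses:
a <- cbind(rnorm(10), rnorm(10))   # 10 x 2 numeric matrix
a[1:5, ][2:3, ][, 2][1]            # rows 1-5, then rows 2-3 of those, then column 2, then its first element (i.e. a[2, 2])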
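For the original question specifically, here is a minimal sketch (the data frame contents and replacement values are made up for illustration) showing that both a single [i, j] call and chained brackets work as assignment targets once the parentheses are dropped:

```r
# Hypothetical data frame with four columns
someMatrix <- data.frame(a = 1:5, b = 6:10, c = 11:15, d = 16:20)
replacement <- c(0, 0, 0)

# Index rows and column in one call: rows 1-3 of column 4
someMatrix[1:3, 4] <- replacement

# Chained brackets also work on the left-hand side of <-;
# it is only the parentheses in (someMatrix[,4])[1:3] that trigger the "(<-" error
someMatrix[, 4][1:3] <- c(-1, -1, -1)

someMatrix$d
```

The single-call form someMatrix[1:3, 4] is usually the more readable choice; the chained form is shown only to confirm that nested replacement itself is fine.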

Related

Merge data frames with unequal rows and no matching column names in R

I am trying to take df1 (a summary table) and merge it into df2 (the master summary table).
This is a snapshot of df2 (ignore the random 42; it's just the answer to the ultimate question).
This is an example of what df1 looks like.
Lastly, I have a vector called Dates. Its elements match the dates used as the column names of df2.
I am trying to cycle through 20 files and gather the summary statistics of each file. I then want to enter that data into df2 to be stored permanently. I only need to enter the Earned column.
I have tried to use merge, but since the two data frames share no column names, I am unable to.
My next attempt was the following, but it gave an error because of the unequal numbers of rows.
df2[,paste(Dates[i])] <- cbind(df2,df1)
Then I thought that maybe if I specified the exact location, it might work.
df2[1:length(df1$Earned),Dates[i]] <- df1$Earned
But that gave an error: "New columns would leave holes after existing columns"
So then I thought of trying that again, but with cbind.
df2[1:length(df1$Earned),Dates[i]] <- cbind(df2, df1$Earned)
## This gave an error about differing numbers of rows
df2 <- cbind(df2[1:length(df1$Earned),Dates[i]],df1$earned)
## This "worked" but it replaced all of df2 with df1$earned, so I basically lost the rest of the master table
Any ideas would be greatly appreciated. Thank you.

Something like this might work:
df2[df2$TreatyYear %in% df1$TreatyYear, as.character(Dates[i])] <- df1$Earned
Example
df <- data.frame(matrix(NA,4,4))               # 4 x 4 data frame of NA, columns X1..X4
df$X1 <- 1:4
df[df$X1 %in% c(1,2),c("X3","X4")] <- c(1,2)   # rows where X1 is 1 or 2: X3 and X4 each become 1, 2
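A variant of the %in% idea that also copes with df1 rows appearing in a different order than in df2 is to look up each row's position with match(). This is a sketch on small hypothetical data; it assumes every TreatyYear in df1 also appears in df2 (otherwise match() returns NA and the assignment fails). It also converts the date to a character column name: if Dates is of class Date (the question doesn't say), using Dates[i] directly as a column index passes a number rather than a name, which may explain the "New columns would leave holes" error.

```r
# Hypothetical data: df2 is the master table (one row per TreatyYear),
# df1 is one file's summary with fewer rows
df2 <- data.frame(TreatyYear = 2001:2005)
df1 <- data.frame(TreatyYear = c(2002, 2004), Earned = c(10, 20))
Dates <- as.Date(c("2015-01-31", "2015-02-28"))  # hypothetical dates
i <- 1

col <- as.character(Dates[i])                  # use the date as a column *name*
df2[[col]] <- NA                               # new column, NA for every row
rows <- match(df1$TreatyYear, df2$TreatyYear)  # position of each df1 row in df2
df2[rows, col] <- df1$Earned                   # fill only the matching rows

df2
```

Rows of df2 with no counterpart in df1 stay NA here; replace NA with 0 afterwards if zeros are preferred.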

The only solution I have found so far is to force df1$Earned into a vector, then pad that vector with zeros so it is exactly as long as df2 has rows. Then I am able to insert the values into the specific column of df2.
temp_values <- append(df1$Earned,rep(0,(length(df2$TreatyYear)-length(df1$TreatyYear))),after=length(df1$Earned))
df2[,paste(Dates[i])] <- temp_values
This works, but it is a roundabout and not very pleasant fix. Any better ideas would be appreciated.

R: Check for finite values in DataFrame

I need to check whether a data frame is "empty", in the sense that it contains zero finite values. If there is a mix of finite and non-finite values, it should NOT be considered "empty".
Referring to "How to check a data.frame for any non-finite", I came up with a one-liner that almost achieves this:
nrow(tmp[rowSums(sapply(tmp, function(x) is.finite(x))) > 0,]) == 0
where tmp is some data frame.
This code works fine in most cases, but it fails if the data frame contains a single row.
For example, the above code works fine for
tmp <- data.frame(a=c(NA,NA), b=c(NA,NA)) OR tmp <- data.frame(a=c(3,NA), b=c(4,NA))
but not for
tmp <- data.frame(a=NA, b=NA)
because I think rowSums expects at least two rows.
I looked at some other posts such as https://stats.stackexchange.com/questions/6142/how-to-calculate-the-rowmeans-with-some-single-rows-in-data, but I still couldn't come up with a solution for my problem.
My question is: is there a clean way (i.e. without loops, ideally a one-liner) to check whether any data frame is "empty"?
Thanks

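If you are checking all columns, then you can just do
!any(sapply(tmp, is.finite))
This returns TRUE when there is no finite value anywhere. Here we are using any rather than the rowSums trick, so we don't have to worry about preserving matrices: sapply returns a plain vector for a single-row data frame, which is what breaks rowSums, but any accepts either shape.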
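If the data frame might be zero-row or contain non-numeric columns, a per-column check with vapply() avoids sapply()'s shape-dependent simplification altogether. A sketch, where is_empty_df is a hypothetical helper name:

```r
# TRUE when a data frame contains no finite values at all
is_empty_df <- function(df) {
  # vapply() returns exactly one logical per column, whatever the number of rows,
  # so single-row and zero-row data frames need no special handling
  !any(vapply(df, function(col) any(is.finite(col)), logical(1)))
}

is_empty_df(data.frame(a = c(NA, NA), b = c(NA, NA)))  # TRUE
is_empty_df(data.frame(a = c(3, NA), b = c(4, NA)))    # FALSE
is_empty_df(data.frame(a = NA, b = NA))                # TRUE
```

Character columns never count as finite (is.finite() returns FALSE for them), so they neither make a data frame "empty" nor "non-empty" on their own.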

R sapply and mean for more than 1 column in split dataframe

I have a problem concerning sapply in R:
I have a data frame Test_ALL that I split by (at the moment) one column named activity. The data frame has about 20 columns with extra-long names (e.g. fBodyBodyGyroJerkMag-std()) that I don't want to write out explicitly. From this data frame I want to get the mean of each column. I tried this, and it worked for one named column:
aa<-split(Test_ALL,Test_ALL$activity)
y<-sapply(aa,function(x) colMeans(x [c("fBodyBodyGyroJerkMag-std()")]))
But when I tried to get means for more than one column, it didn't work:
aa<-split(Test_ALL,Test_ALL$activity)
y<-sapply(aa,function(x) colMeans(x [c("fBodyBodyGyroJerkMag-std()","fBodyAccMag-std()")]))
I tried this too, also without success:
namesERG<-names(Test_ALL)
aa<-split(Test_ALL,Test_ALL$activity)
y<-sapply(aa,function(x) colMeans(x[c(namesERG)]))
What am I doing wrong?
Thank you!

Without a reproducible example it is difficult to fully understand your problem. Anyway, I think part of the issue is that you have some non-numeric columns (for example, activity itself is included when you use names(Test_ALL)), and colMeans only accepts numeric data. Something like this could be a solution:
library(dplyr)
aa <- split(Test_ALL, Test_ALL$activity)
y <- sapply(aa, function(x) colMeans(select_if(x, is.numeric)))
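The same idea works in base R without loading dplyr. This sketch uses a small hypothetical stand-in for Test_ALL and Filter() to keep only the numeric columns of each group before calling colMeans():

```r
# Hypothetical stand-in for Test_ALL: a non-numeric grouping column plus
# numeric feature columns with long, non-syntactic names
Test_ALL <- data.frame(
  activity = c("WALKING", "WALKING", "SITTING", "SITTING"),
  `fBodyBodyGyroJerkMag-std()` = c(1, 3, 5, 7),
  `fBodyAccMag-std()` = c(2, 4, 6, 8),
  check.names = FALSE   # keep the original column names as-is
)

aa <- split(Test_ALL, Test_ALL$activity)
# Filter(is.numeric, x) drops the 'activity' column, so colMeans() only sees numbers
y <- sapply(aa, function(x) colMeans(Filter(is.numeric, x)))
y
```

The result is a matrix with one row per feature column and one column per activity.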

Using grepl() in a particular type of pattern matching

I'm not sure how to do this. I have a feeling that I can use grepl() for this, but I am not sure how.
I have a column in my dataset with names like "Abbot", "Baron", "William", hundreds of other names, and many blanks/missing values.
I want to extract the first letter of each name into a new column that contains only that letter, and fill in "unknown" where the value is missing.

Below I use a quick sapply statement and strsplit to grab the first letter. There is likely a better way to do this, but here's one solution. :)
test <- c('Abbot', 'Baron', 'William')
firstLetter <- sapply(test, function(x){unlist(strsplit(x,''))[1]})

What do you mean by "and if its missing a value then fill in with unknown"?
The following code using substr should be very fast with a large number of rows. It always returns the first letter and returns NA if the respective value in test$name is NA.
test <- data.frame(name = c('Abbot', 'Baron', 'William', NA))
test$first.letter <- substr(test$name, 1, 1)
If you want to convert all NA values in test$first.letter to "unknown", you can do this afterwards:
test$first.letter <- ifelse(is.na(test$first.letter), "unknown", test$first.letter)
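The question also mentions blanks. substr() returns "" (not NA) for an empty string, so the ifelse() above would leave blanks as "". A sketch that treats both NA and "" as unknown (the example vector is hypothetical):

```r
name <- c("Abbot", "Baron", "William", NA, "")
first_letter <- substr(name, 1, 1)   # NA stays NA, "" stays ""
# For NA entries, first_letter == "" is NA, but TRUE | NA is TRUE, so they are caught too
first_letter[is.na(first_letter) | first_letter == ""] <- "unknown"
first_letter
```

No pattern matching is needed for this task, so grepl() can be skipped entirely.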

Convert a List to a Data Frame after a Split

The top answer to this question helped me divide a vector into equally sized chunks:
Split a vector into chunks in R
My problem now is that I would like to construct data frames out of the output. Here is the problem in R syntax:
d <- rpois(73,5)
solution1 <- split(d, ceiling(seq_along(d)/20))
ERROR <- as.data.frame(solution1)
The error that you should see is "arguments imply differing number of rows." I'm especially confused because I thought that the as.data.frame() function could handle this problem, as shown here:
http://www.r-bloggers.com/converting-a-list-to-a-data-frame-2/
Thanks for all your help!
EDIT 1:
I am close to a solution with this line; however, it introduces NA values that distort the output I want:
ldply(solution1,data.frame)
(ldply is from the plyr package.)

Did you read the ?split help page? Did you notice the unsplit() function? That sounds like exactly what you're trying to do here.
d <- rpois(73,5)
f <- ceiling(seq_along(d)/20) #factor for splitting
solution1 <- split(d, f)
unsplit(solution1 , f)
I'm not sure what you expected your data.frame to look like, but you got that error message because as.data.frame() tries to create a new column for each element of solution1. Since the vectors in the list do not all have the same number of elements (splitting 73 values into chunks of 20 leaves the last chunk with only 13), you cannot make a data.frame from them. A data.frame requires every column to have the same number of rows.
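If a data frame is still what you need, there are two common shapes. A sketch of both, with a fixed seed added only to make the example reproducible: a long format via stack(), which needs no padding, and a wide format that pads the shorter chunks with NA (which is also where the NA values from ldply come from):

```r
set.seed(1)                       # reproducible example data
d <- rpois(73, 5)
f <- ceiling(seq_along(d) / 20)   # chunk ids 1, 2, 3, 4
solution1 <- split(d, f)

# Long format: one row per value, 'ind' records which chunk it came from
long <- stack(solution1)          # columns: values, ind
head(long)

# Wide format: one column per chunk, shorter chunks padded with NA
n <- max(lengths(solution1))
wide <- as.data.frame(lapply(solution1, `length<-`, n))
dim(wide)
```

The long format is usually easier to work with afterwards (e.g. for per-chunk summaries), since it contains no artificial NA values.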